bioRxiv preprint doi: https://doi.org/10.1101/2020.08.03.232421; this version posted August 3, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

1 Genomics of a killifish from the Seychelles islands supports transoceanic island colonization and reveals

2 relaxed selection of developmental genes

3

4 Rongfeng Cui1, Alexandra M Tyers1, Zahabiya Juzar Malubhoy1, Sadie Wisotsky2, Stefano Valdesalici3, Elvina

5 Henriette4, Sergei L Kosakovsky Pond2, Dario Riccardo Valenzano*1,5

6 1 Max Planck Institute for Biology of Ageing, Cologne, Germany

7 2 Department of Biology, Institute for Genomics and Evolutionary Medicine, Temple University, Temple, U.S.A.

8 3 Associazione Italiana Killifish, Italy

9 4 Island Biodiversity Conservation centre, University of Seychelles, Anse Royale, Mahe, Seychelles

10 5 University of Cologne, Cologne, Germany

11

12 *corresponding author: [email protected]

13

14

1 bioRxiv preprint doi: https://doi.org/10.1101/2020.08.03.232421; this version posted August 3, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

15 Abstract

16 How freshwater colonize remote islands remains an evolutionary puzzle. Tectonic drift and trans-oceanic

17 dispersal models have been proposed as possible alternative mechanisms. Integrating dating of known tectonic events

18 with population genetics and experimental test of salinity tolerance in the Seychelles islands golden panchax

19 ( playfairii), we found support for trans-oceanic dispersal being the most likely scenario. At the

20 macroevolutionary scale, the non-annual killifish golden panchax shows stronger genome-wide purifying selection

21 compared to annual killifishes from continental Africa. Reconstructing past demographies in isolated golden panchax

22 populations provides support for decline in effective population size, which could have allowed slightly deleterious

23 mutations to segregate in the population. Unlike annual killifishes, where relaxed selection preferentially targets

24 aging-related genes, relaxation of purifying selection in golden panchax affects genes involved in developmental

25 processes, including fgf10.

26

27

2 bioRxiv preprint doi: https://doi.org/10.1101/2020.08.03.232421; this version posted August 3, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

28 Introduction

29 Taxa with limited dispersal capabilities, such as flightless mammals, amphibians and freshwater are thought to

30 speciate largely through vicariance, i.e. population isolation caused by the presence of physical barriers, such as rivers

31 and mountain ranges (Wiley 1988). Whether species with limited dispersal capacity, such as freshwater fishes,

32 colonize remote islands through tectonic drift or trans-oceanic events is still debated (Briggs 2003; Sparks and Smith

33 2005).

34 Aplocheiloid killifish (, Cyprinotondiformes) are small fresh-water and brackish water fishes

35 distributed in South-east Asia, America, Africa, Madagascar, Seychelles and India. Killifishes are often adapted to

36 survive in extreme environments, such as those characterized by seasonal drought. Some killifishes evolved an annual

37 life cycle (Figure 1), which allows fertilized eggs to remain in diapause and survive through the dry season (Cellerino,

38 et al. 2016). The main mechanism explaining the current distribution of killifishes across the planet is the mesozoic

39 divergence of the major killifish clades, which followed the tectonic drift order of Gondwana (Murphy and Collier

40 1997; Costa 2013). Hence, if taxa mainly speciated as a consequence of geographical isolation due to tectonic events,

41 the phylogenetic branching order and timings are expected to agree with those of continental drift.

42 However, the Pachypanchax challenges this model. Pachypanchax are non-annual killifishes restricted to

43 Madagascar and the neighboring Seychelles, with the sister genus found in India and South-east Asia.

44 Only a single species, (the golden panchax) is found on the Seychelles islands. Despite the

45 current geographical proximity to Madagascar, the Seychelles islands have a closer geological relationship with the

46 Indian subcontinent (Plummer and Belle 1995; Seton, et al. 2012). Hence, based on the continental drift hypothesis of

47 taxa distribution, species from Seychelles with limited dispersal capacity, like freshwater fish, are expected to have

48 closer phylogenetic proximity to India than to Madagascar. However, Seychelles and Madagascar are home to fish of

49 the same genus (Pachypanchax), suggesting the possibility for transoceanic dispersal following tectonic drift as a

50 possible mechanism for species dispersal. Evidences for transoceanic dispersal have been found in other groups of

51 . For example, palaeontological and molecular clock analyses place the divergence time of cichlid fish in the

52 Palaeocene (65-57 Myr ago), much later than Mesozoic tectonic rifts of Gondwana (Friedman, et al. 2013). Based on

53 molecular clock and nucleotide substitution saturation analysis in mitochondrial genes, Malagasy amphibian, reptilian

54 and non-flying mammals are too closely related to continental sister groups to fit the ancient tectonic events (Vences

55 2004). The Malagasy chameleon Archaius tigris was shown to be sister to a continental African chameleon genus,

56 which diverged not until the Eocene–Oligocene, again supporting the possibility for transoceanic dispersal

3 bioRxiv preprint doi: https://doi.org/10.1101/2020.08.03.232421; this version posted August 3, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

57 (Townsend, et al. 2011). Molecular dating of the now extinct Malagasy Elephant Bird reveals its relatively recent

58 divergence with the sister group New Zealand Kiwis, suggesting transoceanic dispersal by their flying ancestors

59 (Grealy, et al. 2017).

60 To directly test whether tectonic or transoceanic dispersal can explain the evolution of the golden panchax

61 (Pachypanchax playfairii), we used phylogenetic and population genomic methods, as well as a lab experiment.

62 The golden panchax has a long evolutionary history of isolation on the Seychelles islands lasting several million years,

63 which combined with its limited dispersal capacity, represents a paradigmatic example of island evolution. Evolution

64 through island colonization has been used as an idealized model for understanding speciation (MacArthur and Wilson

65 2001). Endemic species are known to be more susceptible to extinction attributed to inbreeding (Frankham 1998),

66 despite some taxa evolving mechanisms for its mitigation (Wheelwright and Mauck 1998). Reduction of effective

67 population normally lowers the efficacy of selection, leading to the accumulation of deleterious mutations

68 (Charlesworth, et al. 1993). On the other hand, shifts in the ecological niche can also lead to relaxed selection by

69 removing niche-specific selective constraints (Wertheim, et al. 2014). Colonizing new island habitats is likely

70 accompanied by both a reduction of effective population size and by a change of the ecological niche, which may

71 affect a suite of phenotypes. For instance, relaxation of pre-existing selective pressures, such as a lack of predators in

72 novel environments, can explain the loss of anti-predator behaviors in oceanic tortoises (Austin and Nicholas Arnold

73 2001) and the loss of flight in the Kakapo (Livezey 1992) and Dodo (Livezey 1993). Positive selection has also been

74 invoked to explain island-specific phenotypes. For example, gigantism in oceanic tortoises is proposed to confer a

75 selective advantage during the unpredictable long dry seasons on islands (Jaffe, et al. 2011). However, still little is

76 known about the genomic patterns of species undergoing long-term island isolation.

77 Previous studies conducted in continental African killifishes have shown that relaxation of selection dominates

78 genome evolution in short-lived annual African killifishes, preferentially targeting aging-related pathways, likely

79 resulting from population fragmentation, population size reduction and a shift in the selective regime (Cui, et al. 2019;

80 Willemsen, et al. 2019). Here, to assess the effects of island isolation in the golden panchax, we inferred the historical

81 changes in effective population size in wild populations from different islands of the Seychelles archipelago and

82 conducted an unbiased characterization of the relative contribution of relaxation of selection, intensified selection and

83 positive selection in shaping the golden panchax genome.

84 Our results show that golden panchax likely colonized the Seychelles via transoceanic dispersal and indicate that

85 prolonged insularism led to progressive population size reduction. Detectable relaxation of purifying selection in

4 bioRxiv preprint doi: https://doi.org/10.1101/2020.08.03.232421; this version posted August 3, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

86 golden panchax appears to largely affect developmental genes, including fgf10 (important for cell cycle and organ

87 development), and is markedly distinct from the pattern observed in annual killifishes from continental Africa.

88 Results

89 The killifish biogeographic pattern supports transoceanic species dispersal

90 We constructed a phylogenetic tree using 4026 orthologous protein-coding sequences from 61 teleosts, adding the

91 newly sequenced genome of the Madagascar species (estimated genome size ~ 707 Mbp).

92 Based on calibrations obtained from a broader-scale teleost phylogeny (Near, et al. 2012), we inferred divergence

93 times with mcmctree (Figure 1A). South American representatives of the family diverged from African and

94 Indo-malagasy families ~59.85 myr ago (95% CI 52.66-67.08). , which are found in continental

95 Africa, diverged from the Indo-malagasy ~52.90 myr ago (95% CI 46.33-59.66). Continental African

96 killifishes are deeply diverged into western and eastern clades ~38.64 myr ago (95% CI 33.18-44.07). The Indian

97 genus Aplocheilus diverged from Pachypanchax ~37.45 myr ago (95% CI 31.55-43.50). Amongst all sampled genera

98 with more than one representative species, Pachypanchax has the most ancient common ancestor at ~14.95 myr ago

99 (95% CI 11.47-18.77), even deeper than the divergence between the sister genera and

100 Pronothobranchius, which dates to ~8 myr ago (95% CI 6.87-9.22) and between Callopanchax and

101 Scriptaphyosemion, which dates to ~8.9 myr ago (95% CI 7.61-10.32)(Figure 1A). The divergence between the two

102 extant species in the genus Pachypanchax is comparable with the divergence between more distantly related genera,

103 such as Fundulopanchax and Nothobranchius (14.4 myr ago, 95% CI 12.59-16.50) or Archiaphyosemion and

104 Callopanchax (14.5 myr ago, 95% CI 12.45-16.41). To our surprise, neither the order of divergence nor any of the

105 major divergence dates fit the known tectonic order of the Gondwanan continents, suggesting that transoceanic

106 dispersal rather than tectonic drift may explain the current biogeographic pattern within killifish (Figure 1A). We also

107 found that although the concatenated mitochondrial tree using codon positions 1 & 2 has only a moderate rapid

108 bootstrap support of 69 and a non-significant AU test statistics of 0.264 (Shimodaira 2002), the topology of the

109 maximum likelihood tree agrees with the nuclear genome (Figure S1), a result contradicting previous mitochondrial

110 trees built with fewer genes (Murphy and Collier 1997). Therefore, we find that the tree topology from mitochondrial

111 and nuclear genetic data are largely in agreement across all the major branches of killifish and support a significant

112 contribution of transoceanic species dispersal to the current killifish species distribution.

113 Studying population separation among golden panchax (Pachypanchax playfairii) from the Seychelles reveals a

114 branching pattern consistent with island distance (TreeMix analysis, Figure 1B); and the genome-wide statistics of

5 bioRxiv preprint doi: https://doi.org/10.1101/2020.08.03.232421; this version posted August 3, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

115 divergence dxy and Fst agree with the migration analysis based on TreeMix tree (Figure 1B). Overall, the level of

116 divergence between populations has the same range as within-population polymorphisms, suggesting recent

117 divergence. Interestingly, samples from northern Mahé island are very divergent from samples from the south, but we

118 cannot rule out the difference being driven by introduction of domesticated fish samples of the Mahé North island.

119 TreeMix analysis supports adding one migration edge from the La Digue population to the (B+C) population on the

120 neighboring Praslin island, with an estimated edge weight of 39-41%, suggesting high gene flow (Figure 1B). The

121 resulting divergence among populations remains unchanged with outgroup inclusion (Figure 2). Addition of further

122 migration edges is not supported as even when one migration was added, the network already explains on average

123 99.5% (outgroup excluded) or 99.99% (outgroup included) of the variation in the data (Figure S2).

124 Together, our results suggest that both macro- and micro-evolutionary divergence of killifishes can be in part

125 explained by transoceanic dispersal.

126

127 Pachypanchax embryos develop normally in sea water

128 To assess whether golden panchax would survive extended periods of time in the sea, which would be consistent with

129 the transoceanic species dispersal supported by the genomic data, we incubated in seawater golden panchax embryos

130 and compared their developmental trajectory to embryos raised in normal conditions, i.e. freshwater incubation.

131 Remarkably, we found that golden panchax embryos develop perfectly in seawater, with no detectable differences in

132 survival with control groups incubated in freshwater (Figure S3). Moreover, fry from golden panchax embryos raised

133 in seawater remain viable. This finding does not extend to the continental African killifish Nothobranchius furzeri,

134 whose embryos did not survive in seawater, with all embryos dead after 3 days post collection. Our findings suggest

135 that golden panchax could in principle sustain extended periods of time in seawater, which could be compatible with a

136 model of direct transoceanic dispersal vial ocean currents.

137

138 Annual killifishes from Madagascar and Seychelles accumulate deleterious mutations as population size

139 declines

140 Annual killifish from continental Africa display a larger proportion of genes under relaxed rather than intensified

141 selection compared to related non-annual species (Cui, et al. 2019). We asked whether the two annual species from

142 Madagascar and Seychelles (i.e. Pachypanchax omalonotus and playfairii), followed the same pattern of genes under

143 relaxed vs. intensified selection observed in the annual killifish species from continental Africa. We used phylogenetic

6 bioRxiv preprint doi: https://doi.org/10.1101/2020.08.03.232421; this version posted August 3, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

144 methods to examine intensity of selection at the macroevolutionary scale. We identified 444 genes under relaxed

145 purifying selection (p < 0.05, Table S4) and 877 genes under intensified selection (p < 0.05, Table S5) in the genus

146 Pachypanchax relative to other aplocheiloids. Compared to genera of annual killifish within the Nothobranchiidae

147 family, the genus Pachypanchax has a larger proportion of intensified rather than relaxed genes (Figure 2B, Figure

148 3), at a similar level as the non-annual killifish genus Scriptaphyosemion (from continental Africa). We used a

149 previously simulated dataset to compare the performance of the one-scaler and two-scaler parameterizations of the

150 RELAX test (Cui, et al. 2019). The simulated dataset was based on parameters obtained from the non-annual

151 Scriptaphyosemion clade. We simulated genes that are under five selection regimes: relaxed purifying selection,

152 intensified purifying selection, relaxed and positive selection, intensified and positive selection, and positive selection

153 only. Then we ran both the RELAX tests on these datasets and recorded the number of genes called as relaxed or

154 intensified. We found that the two-scaler parameterization of RELAX has 7.6% higher power to detect relaxed

155 selection when positive selection is also present in the same gene (Table S2). Simulations also show that the original

156 RELAX parameterization has a higher power (46.7% more) for detecting intensified selection (Table S2); we

157 therefore used this method for intensified selection, after excluding any genes already called relaxed by the two-scaler

158 parametrization method. Overall, our findings show that the island annual species of the genus Pachypanchax have a

159 larger proportion of genes under intensified rather than relaxed selection compared to annual killifishes, to a level that

160 is comparable to non-annual killifish genera living in continental Africa. Nevertheless, the total number of genes (444)

161 under relaxed selection in Pachypanchax is still much higher than the simulated false positive rate (Table S3;

162 expected = 253-269 in 10556, assuming 1-10% intensified genes, 90-99% genes with no shifts in selection and 2.5%

163 positively selected genes) of the test, suggesting that relaxed selection still fixed deleterious mutations in this non-

164 annual genus.

165 To investigate the possible cause of relaxation of selection in Pachypanchax, we focused on the more isolated

166 Seychelles species golden panchax (Pachypanchax playfairii). We collected individual golden panchax from several

167 natural localities (Table S1), sequenced their genomes, and we conducted population genetics analysis, using

168 Pachypanchax omalonotus as outgroup. We pooled individual fish sequences from the Praslin, Curieuse and La Digue

169 to analyze the asymptotic McDonald-Kreitman alpha (Messer and Petrov 2013). A positive alpha value is an estimate

170 of the proportion of substitutions fixed by positive selection since divergence from P. omalonotus. Negative alpha is

171 due to the segregation of slightly deleterious mutations in P. playfairii since divergence from the outgroup (Messer

172 and Petrov 2012). The estimated ratio of mutations fixed by positive selection is not significantly different from zero,

7 bioRxiv preprint doi: https://doi.org/10.1101/2020.08.03.232421; this version posted August 3, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

173 suggesting not many sites were positively selected in this Seychelles taxon. At lower derived allele frequencies, the

174 alpha statistics remains negative and does not reach the asymptotic level until 80% of derived allele frequency,

175 indicating that slightly deleterious variants are segregating at high frequency in the population (Figure 2C). Therefore,

176 although the macroevolutionary trend suggests that the Pachypanchax clade has more intensified selection compared

177 to other killifishes, the MK alpha measurements indicate the presence of some slightly deleterious variants.

178 We then asked what could be the leading cause for the extensive number of genes under relaxed selection and of the

179 deleterious gene variants segregating at high frequency in the golden panchax. A likely scenario is that decreased

180 effective population size may have occurred in this island species. To test this hypothesis, we then analyzed individual

181 genomes from wild-caught golden panchax using PSMC’, and found that most populations underwent a gradual

182 decline in effective population size since 200,000 generations ago, ending in a population size of about 5000-20,000 at

183 generation 5000 (Figure 2A). Different individuals from the same population show repeatable PSMC’ traces.

184 Population A from the Praslin island had a population expansion from ~20,000 generations ago until ~7,000

185 generation ago, after which it rapidly declined to less than 10,000. Overall, island annual killifish display a larger

186 proportion of the genome under intensified rather than relaxed selection, compared to annual killifishes from

187 continental Africa. However, their extended isolation and progressive reduction in population size may have caused

188 the genome-wide accumulation of nearly neutral polymorphisms at otherwise conserved sites in more recent

189 generations.

190

191 Relaxation of selection in non-coding regions

192 Besides protein-coding genes, molecular evolution of regulatory elements of genomes underlies important phenotypic

193 variations in response to selection in different organisms, including the loss of flight in Ratite birds (Sackton, et al.

194 2019) and degeneration of armor plates in Stickleback fish (O'Brown, et al. 2015). We therefore explored whether

195 non-coding regions in the Pachypanchax genome experienced relaxation of selection. First, we identified 50bp

196 conserved non-coding elements (totaled 86,087,429bp, excluding CDS 56,842,251 bp, excluding repeats

197 54,779,098bp) in the assembled genomes of the annual species Austrofundulus limnaeus, Callopanchax toddi,

198 Nothobranchius furzeri and Nothobranchius orthonotus using PhastCon. These elements are expected to have highly

199 conserved regulatory functions within a ~60myr divergence time. Together, 10223 (511.15kb, or 0.093%) of these

200 elements were significantly accelerated (i.e. had significant sequence divergence compared to the annual species) in

201 the Pachypanchax genome when compared to 4-fold degenerate sites. Because 4-fold degenerate sites are known to

8 bioRxiv preprint doi: https://doi.org/10.1101/2020.08.03.232421; this version posted August 3, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

202 evolve slower than completely neutral sites due to codon bias (Hershberg and Petrov 2008), we interpret these

203 accelerated elements to have likely undergone relaxed purifying selection in the genus Pachypanchax. We then asked

204 whether these accelerated conserved elements were enriched for regulatory sequence motifs (using DREME), and

205 found enrichment for the promoter TATA box (TAATTA, p= 5.6e-28) and the enhancer box (CAGCTG, p = 6.5e-13),

206 corroborating their potential role as regulators of gene expression. Then, we asked whether these non-coding

207 accelerated conserved elements were associated with downstream genes that evolved under intensified or relaxed

208 selection. We found that protein-coding genes within 80kb distance from an accelerated conserved non-coding

209 element were more likely to be also under relaxed selection, according to the 2-scaler RELAX test. Furthermore, the

210 median direction of selection (DoS, where negative values are indicative of genes evolving under relaxed selection)

211 was more negative for protein-coding genes near an accelerated conserved non-coding element (Figure S4). Overall,

212 we found that relaxation of selection in Pachypanchax extends beyond the coding regions, supporting a scenario

213 where both gene regulation and adjacent coding sequences are subject to similar fluctuations of evolutionary

214 constraints.

215

216 Intensified selection on RNA polymerase complex genes

217 We performed a GO term enrichment analysis on intensified genes (i.e. under positive or purifying selection) detected

218 with the original RELAX method (more powerful in identifying genes under intensified selection, based on our

219 simulations), after excluding genes detected to be relaxed by the new parameterization (more powerful in identifying

220 genes under relaxed selection, based on our simulations). Eight biological complexes were found to be enriched at an

221 FDR cutoff of 0.05. The most significantly enriched component was the RNA polymerase complex (p = 2.12e-6, FDR

222 = 0.000338), containing 19 genes detected to have undergone intensified selection in the Pachypanchax branch,

223 including 6 RNA polymerase subunits and 4 general transcription factor subunits. Other complexes under intensified

224 selection (FDR < 0.05) include the MICOS complex, smooth ER, nuclear envelope lumen and transferase complex

225 (Table S6).

226

227 Relaxed selection affects genes involved in early Pachypanchax development

228 Despite having some portion of the genome under relaxed purifying selection, fish of the genus Pachypanchax live

229 several years, which is a significantly longer time than the corresponding annual species from continental Africa,

230 whose lifespan can be as short as 4 to 6 months (Valenzano, et al. 2015; Willemsen, et al. 2019). Since relaxation of

9 bioRxiv preprint doi: https://doi.org/10.1101/2020.08.03.232421; this version posted August 3, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

231 purifying selection largely affects genes involved in late-life maintenance in annual African killifishes (Cui, et al.

232 2019), we asked what genes and gene functions were affected by relaxed selection in a clade with long-term island

233 isolation yet maintaining a normal lifespan. To answer this question, we performed gene ontology analysis on the

234 genes under relaxed selection (Figure S5; Table S7). Terms related to early development, such as anatomical

235 structure morphogenesis (p=7.00e-5, FDR = 0.0012), cell communication (p=1.47e-6, FDR=5.19e-5) and cell surface

236 receptor signaling pathway (p=1.68e-7, FDR=6.51e-5) are strongly enriched in genes under relaxed selection. Further

237 pathway analysis using ConsensusPathDB (Herwig, et al. 2016) of the same gene set reveals that 7 pathways are

238 enriched (Table S8; FDR < 0.05). Among these is the fibroblast receptor 2 ligand-binding pathway (p=0.000495,

239 FDR=0.046), which has been found to be important for early embryogenesis. Aligning the protein sequences of

240 human and killifishes, we found that both Pachypanchax omalonotus and Pachypanchax playfairii carried a serine

241 instead of a glycine at a critical position on fgf10, while this position is conserved from human to other fishes (Figure

242 4). This amino acid is homologous to G160 in human fgf10, which in turn interacts with R251 on the human receptor

243 fgfr2 (Yeh, et al. 2003). The homologous amino acid R251 of fgfr2 is conserved in all killifishes, including

244 Pachypanchax.

245 Since we found that relaxation of selection applies to both coding and non-coding regions, we asked what genes and

246 gene functions were enriched in proximity of accelerated non-coding elements in the genus Pachypanchax. This

247 analysis confirmed that accelerated non-coding elements map in proximity of genes involved in developmental

248 processes (Table S9). Therefore, not only did we find a strong correlation between relaxation of selection in coding

249 and adjacent non-coding regions (Figure S4), but both of the GO enrichments also support that non-annual fish of the

250 genus Pachypanchax underwent relaxation of selection at specific developmental genes.

251

252 Minimal evidence of positive selection in Pachypanchax

253 To detect positive selection in the genus Pachypanchax, we ran BUSTED-MH (Table S9), confirming that when

254 correcting for multi-nucleotides substitution and mutation rate variation, there is little evidence of positive selection in

255 this clade, as less than 5% of tested genes were called as under positive selection, not exceeding the p value cutoff of

256 0.05 (Figure 3). Gene ontology analysis of candidate positively selected genes reveals no enriched terms below an

257 FDR cutoff of 0.05 (Table S10). Therefore, the higher proportion of genes under intensified rather than relaxed

258 selection in Pachypanchax could be due to genes under purifying selection, rather than under positive selection.

259

260 10 bioRxiv preprint doi: https://doi.org/10.1101/2020.08.03.232421; this version posted August 3, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

261 Discussion

262 Transoceanic dispersal is compatible with killifish species distribution

263 There has been a great deal of debates on the extent to which tectonic events can explain the biogeographic pattern of

264 Gondwanan fauna. Killifish distribution have been previously suggested as an example of speciation following the

265 sequence of events that led to the separation of the Gondwanan plates, largely supported by the congruence between

266 mitochondrial phylogeny and the order of break-up of the geological tectonic plates. However, a small-scale

267 phylogenetic tree built on nuclear genomic information (Pohl, et al. 2015) has presented a different branching pattern,

268 placing continental African killifishes as sister species to the Indo-Malagasy clade. Here we present a whole-genome

269 phylogenetic tree, based on nuclear genome, which rejects the proposal of sister relationship between South American

270 and continental African taxa. Instead, our results support that Indo-Malagasy and continental African taxa form a

271 sister relationship. Furthermore, contrary to the species distribution predicted by the order of events congruous with

272 tectonic drift, Seychelles Pachypanchax are closely related to Malagasy taxa, rather than to the Indian genus

273 Aplocheilus. Molecular dating further shows that the 95% posterior distributions of divergence times are more recent

274 than the proposed separation time of the corresponding continental plates. Therefore, a more likely explanation for the

275 current distribution of killifishes must involve crossing oceanic barriers (Figure S6). However, the mechanisms by

276 which transoceanic migration might have occurred is unknown. Open-ocean “rafts” and dispersal by birds have been

277 suggested as plausible means (Silva, et al. 2019). In our work, we show that the freshwater golden panchax

278 (Pachypanchax playfairii), a non-annual killifish from the Seychelles islands, can complete embryonic development

279 and successfully hatch in seawater, providing a surprising support for the opportunity of transoceanic dispersal by

280 means of direct embryo or juvenile transport via oceanic currents.

281

282 Long term decline of population size accompanied by limited relaxed selection and positive selection

283 The amount of fixed and segregating deleterious mutations in Nothobranchius has been shown to negatively correlate

284 with effective population size (Cui, et al. 2019; Willemsen, et al. 2019). Hence, we asked whether a similar trend

285 occurred in non-annual species, such as the golden panchax (Pachypanchax playfairii). While population size steadily

286 declined over the past 200,000 generations in most of the golden panchax populations we analyzed, the ratio of genes

287 under relaxed selection detected with a phylogenetic method is lower than in genera within Nothobranchiidae,

288 especially when compared to the annual genera. Testing deleterious mutations at a more recent time-frame using the

289 McDonald-Kreitman alpha metrics, we found evidence of deleterious mutations being segregating at various

11 bioRxiv preprint doi: https://doi.org/10.1101/2020.08.03.232421; this version posted August 3, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

290 frequencies, although probably not yet fixed, hence likely preventing their detection by the phylogenetic method

291 (which scores only the more frequent or fixed alleles). Compared to populations of annual killifishes of the genus

292 Nothobranchius, individuals from Pachypanchax populations have lower levels of polymorphism, and yet they fixed

293 fewer deleterious mutations. This is not surprising, however, because accumulation of deleterious mutations is not

294 only a function of population size, but also relates to the selection regime, which eventually affects the shape of the

295 distribution of fitness effects (DFE) of new mutations. Golden panchax live in permanent water streams and are robust

296 and long-lived fish, unlike Nothobranchius, which reside in ephemeral waters and are short-lived. Despite the small

297 population size, natural selection acting over a long evolutionary time may have further prevented life-shortening

298 mutations from accumulating in this clade.

299 In agreement with recent findings that positive selection is in general difficult to detect especially in the presence of

300 repeated relaxed purifying selection (Chen, et al. 2020), we found weak evidence for positive selection in the golden

301 panchax using both population genetics and phylogenetic methods.

302 The limited genetic diversity of the extant golden panchax populations, together with their limited distribution within

303 the Seychelles archipelago and to their progressively declining effective population size, represents a serious concern

304 for the future survival of this species, and calls for effective conservation measures aimed at long-term preservation of

305 this species.

306

307 Relaxed selection in Pachypanchax affects embryonic developmental gene pathways

308 Despite relatively fewer genes having undergone relaxation of purifying selection in Pachypanchax compared

309 to annual killifishes, they still amount to hundreds. In annual killifish genera from continental Africa, genes under

310 relaxed selection are related to longevity and are differentially expressed between young and old fish (Cui, et al.

311 2019). In contrast, genes under relaxed selection, as well as accelerated non-coding elements in Pachypanchax are

312 significantly enriched for early embryo development genes. An intriguing scenario compatible with developmental

313 genes evolving under relaxed selection in Pachypanchax is that relaxation of selection could occur in both regulatory

314 regions and coding regions of genes related to embryo diapause. If diapause were an ancestral trait to both annual and

315 non-annual killifishes, once the ancestor of Pachypanchax colonized permanent water streams, they might have lost

316 the selective pressure to remove any deleterious mutation affecting diapause-related developmental genes.

317 Interestingly, although we did not detect extensive positive selection or highly enriched gene ontology terms in

318 candidate positively selected genes, the GO terms with a FDR cutoff from 0.05 to 0.2 for genes under positive

12 bioRxiv preprint doi: https://doi.org/10.1101/2020.08.03.232421; this version posted August 3, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

319 selection are also related to developmental processes (cell cycle and semaphorin receptors). This could hint that

320 remodeling of early developmental processes may have occurred in the ancestors of Pachypanchax, involving both

321 relaxation and positive selection for the same gene pathways. We provide a list of positively selected sites as candidate

322 targets for future studies in Table S11. Hence, by detecting relaxed selection in a non-annual killifish clade that lost

323 embryonic diapause, we may help reveal the genetic architecture of this unique trait that characterizes annual

324 killifishes.

325

326

13 bioRxiv preprint doi: https://doi.org/10.1101/2020.08.03.232421; this version posted August 3, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

327 Conclusions

328 Salt water barriers represent a considerable limit to the diffusion of freshwater taxa. Our analysis of the

329 branching pattern as well as our molecular dating results suggest that at least in Cyprinodontiforms, transoceanic

330 dispersal offers a likely mechanism to explain the current biogeographic pattern, and is supported by specific

331 physiologic adaptation (embryo development in seawater). We show that long-term isolation may lead to a steady

332 decline in the effective population size, which can cause recent relaxed selection leading to deleterious variants

333 segregating in the population. In permanent water, relaxed selection does not preferentially target aging related

334 pathways. Instead, relaxed selection targets both protein-coding and non-coding regions related to embryonic

335 development.

336

14 bioRxiv preprint doi: https://doi.org/10.1101/2020.08.03.232421; this version posted August 3, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

337 Methods

338 Sample collection

339 Specimens from most populations were collected during a dedicated field trip in April 2016 (and by Elvina Henriette

340 on the following month), using dip nets. Fish were identified in the field, small fin clips were taken from their caudal

341 fin and stored in 98% ethanol. Fish were then released back to their habitat. Voucher specimens (a random subsample

342 of both sexes) were taken from most populations. All field sampling and export procedures followed regulations of

343 Seychelles, with permits and research associateship issued by Seychelles Bureau of Standards Ref.A0157 dated 22

344 March 2016. Samples from Mahe North were from aquarium strains bred in captivity since 2013, for approximately 3-

345 4 generations (Table S1).

346

347 Library preparation and sequencing

348 Library preparation follows previous methods (Rowan, et al. 2015; Cui, et al. 2019). Briefly, genomic DNA was

349 extracted by incubating a small finclip (1mm x 2mm) in 50 µL lysis buffer (Tris-HCl pH=8.0 10mM, EDTA 10mM,

350 NaCl 10mM, SDS 0.5%) containing 0.5 µL Proteinase K (20mg/ml, NEB) at 50 for 2hr. Forty µL SPRI beads

351 (1mL SeraMag GE Healthcare, 65152105050250 beads in 100mL of PEG8000 20%, NaCl 2.5M, Tris-HCl pH=8.0

352 10mM, EDTA 1mM, Tween20 0.05%) were added directly to the mix to purify DNA with two washes by 80%

353 ethanol. For library preparation, 250ng of genomic DNA was digested with 0.665µL of Shearase (Zymo) in a 10 µL

354 volume at 42 for 20min, inactivated at 65 and purified with 0.8 volume of SPRI beads. End-repairing and A-

355 tailing were performed by incubating in 25µL of repair mix (NEB End-repair buffer 2.5 µL, Klenow fragment with 3’-

356 5’ exonuclease activity 0.5 µL, Taq polymerase 0.5 µL, T4 polynucleotide kinase 0.2 µL) at 25 for 30min and 75

357 for 30min, then purified with 1 volume of SPRI beads. Fragments were ligated to sequence adapters using quick ligase

358 (NEB M2200S) in a total volume of 25 µL at 20°C for 20 min. Ligation mix was diluted to 50 µL with nuclease-free

359 water, and purified by adding 29.15 µL SPRI beads and resuspended in 30 µL. PCR reaction was performed by adding

360 6µL of ligated fragments into 10 µL with Kapa Hifi HotStart ReadyMix and 2 µL of each barcoded primer (10mM),

361 and cycling with the PCR program 98 40s, 9 cycles of 98 15s, 65 30s, 72 30s and finally 1 min at 72.

362 After purifying with 0.8 volume of SPRI beads, samples were quantified with Qubit broad range DNA kit and pooled

363 at equal ratios for sequencing on HiSeqX. SPRI beads were left in with the samples throughout the protocol, except

364 before the quantification step during pooling.

15 bioRxiv preprint doi: https://doi.org/10.1101/2020.08.03.232421; this version posted August 3, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

365 For Pachypanchax omalonotus and Nothobranchius virgatus, a more simplified protocol was used. NEB Fragmentase

366 (1 µL) was used to shear gDNA in 10 µL for 20min, stopped by 5µL of 0.5 M EDTA then purified with 1.2X

367 SeraSure beads and eluted in 5 µL. End-repair and A-tailing were performed as before but in 10 µL, after which 15 µL

368 of ligation mix was directly added to the uncleaned repair mix to a total volume of 25µL. Protocol then proceeded as

369 before.

370

371 Read processing and genotype calling

372 Paired-end reads were adapter- and quality-trimmed with Trimmomatic v 0.32 (Bolger, et al. 2014) using the

373 following parameters: ILLUMINACLIP:adaptseqs.txt:3:7:7:1:true LEADING:10 TRAILING:10

374 SLIDINGWINDOW:4:10 MINLEN:20, and mapped using BWA-MEM (Li 2013) with default parameters to a

375 version of the P. playfairii reference genome (La Digue) assigned by ALLMaps (Tang, et al. 2015) to

376 pseudochromosomes by orthologous protein synteny information (Cui, et al. 2019) from Xiphophorus maculatus

377 (weight = 100), Nothobranchius furzeri (weight = 80) and Oryzias latipes (weight=50).

378 PCR duplicates were marked with the MarkDuplicates tool from Picard tools v.1.119 (McKenna, et al. 2010) with

379 default parameters. Cleaned reads were realigned around INDELs with the RealignerTargetCreator and IndelRealigner

380 commands using default parameters in the GATK package (McKenna, et al. 2010). Variants were called by bcftools

381 v1.6 with mpileup and call commands (Li, et al. 2009), with a minimal mapping quality of 30 for each individual.

382 For McDonald Kreitman (MK) and Direction of Selection (DoS) calculations in the pool of Pachypanchax

383 populations of Praslin, La Digue and Curieuse, variant calls were further converted to .pro format by Sam2Pro 0.6

384 with a minimal read coverage of 6. Allele frequencies were estimated by Package-GFE (Maruki and Lynch 2015) and

385 polarized by using Pachypanchax omalonotus.

386

387 Genome size estimation of P. omalonotus

388 As previously described, we mapped the trimmed reads of P. omalonotus to a draft genome of P. playfairii with

389 BWA-MEM, counting read coverage in the single-copy BUSCOs. Genome size is calculated as total read bp divided

390 by read coverage. This simple method was shown to agree well with flow-cytometry and not very sensitive to the

391 choice of reference genomes or read coverage (Cui, et al. 2019).

392

393 Phylogeny and molecular dating

16 bioRxiv preprint doi: https://doi.org/10.1101/2020.08.03.232421; this version posted August 3, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

394 To our previous dataset of orthologous protein-coding genes, we added newly available South American killifish

395 genomes (Kelley, et al. 2016; Reid, et al. 2016; Wagner, et al. 2018) in addition to other representative teleost

396 genomes from Ensembl and the newly obtained Pachypanchx omalonotus. Orthologs sequences were identified by the

397 UPhO package (Ballesteros and Hormiga 2016), aligned, filtered and trimmed as previous described (Cui, et al. 2019).

398 The resulting alignment contains 2,092,416 sites (697,472 codons) of 68 samples, including one individual from each

399 P. playfairii population. The proportion of missing data ranges from 0-31.89% (mean 2.61%) per taxon. RAxML 8.2.9

400 (Stamatakis 2006) was run to estimate a phylogenetic tree using the GTR substitution model with gamma rate

401 variation. Topological confidence was assessed by 100 rapid bootstraps. Before molecular dating, only a single sample

402 was retained for each species in the dataset and the tree.

403 Divergence time was estimated by running MCMCTree (dos Reis and Yang 2011) using the GTR substitution model

404 with a LogNormal distribution of clock rates, where clock rates were modeled as random variables for each branch

405 independently. Four external calibration points were placed on the following nodes based on the posterior distributions

406 in a previous study (Near, et al. 2012): MRCA(Spotted Gar, Pachypanchax) = 250 – 331.1 MYA, MRCA(Zebrafish,

407 Pachypanchax) = 150.9–235MYA, MRCA(Takifugu, Tetraodon)=32–50MYA, MRCA(Medaka, Pachypanchax) =

408 49.1–130.8MYA. Two independent chains were run for 5 million generations with a sampling frequency of every 10

409 generations. The first 20,000 generations were discarded as burn-ins. Tracer was used to inspect the posterior traces of

410 parameter estimates to ensure convergence between runs. The resulting tree was visualized with FigTree v1.4.4.

411

412 Population genetic statistics

413 The gVCF files of individuals were merged with the merge command in bcftools v1.6, keeping missing genotypes as

414 missing. The merged gVCF file was converted to .geno format with the parseVCF.py script from the genomic_general

415 package, requiring a minimal variant quality of 60. We used the popgenWindows.py script

416 (https://github.com/simonhmartin/genomics_general) to compute FST, dxy and θπ for all P. playfairii populations in

417 20kb non-overlapping windows, with the requirement of at least 5000bp of non-missing data. The mean statistics

418 across all windows are reported.

419

420 TreeMix analysis

421 Distributions of read coverage at each population level were estimated from merged bcf files. BCF files were then

422 converted into the TreeMix (Pickrell and Pritchard 2012) allele count format with a custom script, excluding sites with

17 bioRxiv preprint doi: https://doi.org/10.1101/2020.08.03.232421; this version posted August 3, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

423 read coverage below the 0.025 or above the 0.975 coverage quantiles. The OptM R package (Fitak 2019) was used to

424 determine the optimal number of migration edges. TreeMix was run with 0-6 migration edges in 20 iterations, starting

425 from –k=200 to 4000, with an increment of 200 per iteration. The output files from TreeMix were used as input for

426 OptM, where the Evano method was used to estimate the proportion of variance explained by different number of

427 migration edges. The ad hoc statistics Δm was used to select for the optimal number of migration edges. We ran this

428 analysis either with or leaving out the outgroup species P. omalonotus.

429

430 PSMC analysis

431 We followed the MSMC2 (Schiffels and Durbin 2014) manual. Briefly, sequencing depth was estimated by averaging

432 the per-base coverage computed by samtools depth command with no mapping quality cutoffs. The bamCaller.py tool

433 from MSMC-Tool package was used to call variants and generate a mask of regions with abnormal coverage. The files

434 were then converted to MSMC2 format by the generate_multihetsep.py tool. The PSMC’ method was run for each

435 individual’s pseudo-chromosome separately using unphased genotypes in MSMC2, with an initial rho/mu set to

436 2.61976354 based on mu estimated from the dated phylogeny and the assumption of 2

437 recombinations/meiosis/chromosome. The mutation rate and recombination rate were then optimized by the program

438 during the run.

439

440 Asymptotic McDonald-Kreitman analysis and direction of selection

441 Calculations of the MK and DoS statistics are as previously described (Cui, et al. 2019).

442

443 RELAX tests

444 Only single individuals were kept for RELAX (Wertheim, et al. 2014) and BUSTED-MH (Wisotsky, et al. 2020) tests.

445 Pseudogenomes were made by aligning short reads to the closest related reference genomes as previously described.

446 Protein-coding gene alignments were extracted, aligned and cleaned as described before. Due to the observation

447 (simulation details at DOI: http://dx.doi.org/10.17632/f9phgbv4rs.1#file-0ec9fe0f-8085-44c5-9179-b9ee01c94360)

448 that the original parameterization in the RELAX test would sometimes classify genes with both positive selection and

449 relaxed selection as intensified (Cui, et al. 2019), resulting in missing some genes with a signature of relaxed

450 selection, we modified the parameterization scheme to include two different k (intensification/relaxation) scaling

451 parameters, where k1 is used to transform site categories with ⍵ ≤ 1 (as ⍵k), and k2 (also called q) applied to the site

18 bioRxiv preprint doi: https://doi.org/10.1101/2020.08.03.232421; this version posted August 3, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

452 category with ⍵ > 1. Given two non-overlapping sets of tree branches (reference and test), the RELAX models

453 describe selective regimes on the reference branches using a random effect with three ⍵ categories: ⍵11 (weight

454 p1), ⍵21 (weight p2), and ⍵31 (weight 1-p1-p2). Selective regimes on test branches follow the distribution

k1 k1 k2 455 ⍵1 ≤1 (weight p1), ⍵2 ≤1 (weight p2), and ⍵3 ≥1 (weight 1-p1-p2). Using this parametrization, we defined three

456 models and fitted them to the data using maximum likelihood. Model 1 estimates k1 and k2 from the data, model 2

457 estimates k1 but fixes k2 =1 and model 3 fixes both parameters at 1. Relaxed purifying selection is tested by

458 performing a likelihood ratio test between models 3 and 2 with 1 degree of freedom (k1 > 1 implies intensification,

459 and k1 < 1 – relaxation). In theory, the comparison between Model 1 and Model 2 forms a test of positive selection,

460 but simulations (Table S2) show that the power is very low and thus we do not rely on this test for positive selection

461 (see below for BUSTED-MH instead). Compared to the original RELAX parameterization, the two-scaler

462 parameterization has a reduced power to detect intensified purifying selection (17% of the original power, Table S2),

463 thus we do not recommend it for that purpose. For our real dataset, we applied the two-scaler parameterization to

464 detect relaxed genes, and the original one-scaler parameterization for detecting intensified genes. We provide the two-

465 scaler parameterization analysis script (relax_q) for HYPHY 2.5.x (Pond, et al. 2019) in the supplementary

466 information.

467 To compare the relative proportions of relaxed and intensified genes, we repeated these tests in a previously published

468 dataset of continental African killifishes, with the addition the Nothobranchius virgatus genome.

469

470 Positive selection test with BUSTED-MH

471 Detecting positive selection in protein-coding sequences has recently been shown to be vulnerable to two violations in

472 model assumptions, namely instantaneous multinucleotide substitutions (Venkat, et al. 2018) and site-to-site variation

473 in synonymous substitution rates (Wisotsky, et al. 2020). We thus use a developmental version of the BUSTED test

474 that accounts for both processes, named BUSTED-MH, in order to test for positive selection in the genus

475 Pachypanchax. This script implementing the analysis can be run with HYPHY v2.5.10 or later and is available from

476 https://github.com/veg/hyphy-analyses/.

477

478 Conserved element detection and PhastCon analysis

479 We performed a whole-genome alignment using Progressive Cactus (Paten, et al. 2011) of 4 species in three annual

480 genera (Austrofundulus, Callopanchax and Nothobranchius), and 1 non annual species (P. playfairii). PhastCons in

19 bioRxiv preprint doi: https://doi.org/10.1101/2020.08.03.232421; this version posted August 3, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

481 the R rphast package (Hubisz, et al. 2011) was used to identify conserved genomic regions in the annual species,

482 leaving out the non-annual. Due to the long divergence of these annual genera, the conserved genomic regions likely

483 contain functionally important regulatory or protein-coding elements. After excluding protein-coding elements, we

484 then identified accelerated non-coding regions in P. playfairii that are otherwise conserved in annuals by comparing

485 the rate with 4-fold degenerate sites with PhyloFit in the rphast package. Because 4-fold degenerate sites evolve

486 slower than neutrality in many other organisms (Künstner, et al. 2011; Lawrie, et al. 2013), accelerated conserved

487 elements (ACEs) detected in this manner likely reflect relaxed selection.

488

489 DREME analysis

490 We extracted the 50bp Pachypanchax-ACE sequences from the homologous locations in N. furzeri, and used DREME

491 (Bailey 2011) to analyze the enrichment of hexamers. A literature search was performed to check whether any of the

492 enriched hexamers corresponds to known functional regulatory motifs.

493

494 Gene ontology and pathway analysis

495 Gene ontology and pathway enrichment analyses were performed on the ConsensusPathDB website (Herwig, et al.

496 2016). Genes called relaxed (p < 0.05) were mapped to Ensembl IDs of the human ortholog, and compared to a

497 background list of all genes that were entered into the RELAX analysis. For the ACEs, the immediately down-stream

498 protein-coding gene was used as the foreground, and a random set of genes sampled from the genome which are more

499 than 170kb away from any ACEs was used as the background.

500

501 Incubation of killifish embryos in sea water

502 We directly compared viability of killifish embryos incubated in autoclaved fish room water and artificial sea water

503 (33 g RedSea Marine Salt in 1L RO water). Four species/strains were examined: P. playfairii, N. furzeri GRZ-AD,

504 GRZ-Bell and MZCS-002. For each species or strain, fertilized eggs produced within 1 week were collected from 3-4

505 breeding tanks each consisted of 1 male and 3-4 females. Eggs in batch 1 was bleached with 0.5% hydrogen peroxide

506 to prevent fungal infections. We used n=100 eggs per N. furzeri strain and n=117 eggs for P. plaifairii, with

507 approximately 25-30 eggs per petri dish. Batch 2 was washed in autoclaved fresh water but not bleached. Batch 2

508 contains 320 GRZ-Bell eggs, 140 MZCS-002 eggs and 94 P. playfairii eggs, incubated at a density of 23-40 eggs/petri

509 dish. In both batches, embryos were checked daily. Dead embryos were recorded, removed and medium was

20 bioRxiv preprint doi: https://doi.org/10.1101/2020.08.03.232421; this version posted August 3, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

510 exchanged. Hatched P. playfairii were transferred to the hatching incubator and slowly acclimatized to freshwater

511 conditions if they were hatched in sea water (by replacing 50% of the artificial sea water with freshwater daily). N.

512 furzeri embryos were transferred to damp Whatman paper after reaching black eye stage for incubation. All embryos

513 were incubated at 28C.

514

515

21 bioRxiv preprint doi: https://doi.org/10.1101/2020.08.03.232421; this version posted August 3, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

516 Acknowledgments

517 We thank Francesca and Jacopo Valdesalici for their help on field sampling, the residents of Seychelles for their

518 continuous support and kindness. Hilton Seychelles Labriz Resort & Spa provided logistics and transport to Silhouette

519 island. We would like to thank Francois Baguette and Ella (Silhouette Island Conservation Officer) for help in

520 sampling fish on Silhouette island. We thank the members of the Valenzano lab at the Max Planck Institute for

521 Biology of Ageing for continuous discussions and feedback. All computations were performed on the Amalia HPC

522 cluster at MPI-AGE. This work was funded by the Max Planck Institute for Biology of Ageing and by the Max Planck

523 Society.

524

525 Author contributions

526 D.R.V., S.V., R.C. conceived and designed the study. S.V. and E.H. collected fish samples. Z.M and R.C. made the

527 sequencing libraries. A.T. performed sea-water embryo incubation. S.P. and R.C. implemented the 2-scaler RELAX

528 test and BUSTED-MH. R.C. analyzed the data. R.C. and D.R.V. wrote the manuscript. All authors read and

529 commented the manuscript.

530

531

22 bioRxiv preprint doi: https://doi.org/10.1101/2020.08.03.232421; this version posted August 3, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

532 Figure legends

533

534 Figure 1. A. Dated phylogeny of Cyprinodontiforms mapped to tectonic events of the Gondwana continents. Tree was

535 built with RAxML with 100 rapid bootstraps, and MCMCtree with GTR+LogNormal clock was used to infer

536 divergence times, calibrated from a previous teleost phylogenetic study (external taxa trimmed off in figure). Node

537 bars showing the 95% C.I. from the posterior distribution of divergence times. The order of continental breakup and

538 their timings do not fit the inferred divergence times of major killifish clades. Putative tectonic event-driven nodes are

539 numbered 1 through 4, with the mean divergence times indicated. B. Distribution of P. omalonotus and P. playfairii

540 on Madagascar and the Seychelles. Population history of sampled populations of P. playfairii was inferred by

541 TreeMix, allowing for two migration edges colored orange (weight given in percentages).

542

543 Figure 2. A. PSMC plot for wild-caught P. playfairii individuals from 3 localities in Praslin, one from Curieuse and

544 one from Mahé. B. Ratios plotted on a logarithmic scale of gene counts that are detected as relaxed and intensified

545 selection in five killifish genera. Relaxed selection detected by the 2-k parameterization, intensified selection with the

546 original 1-k parameterization. Confidence interval obtained by 1000 multinomial resampling. Red line indicates ratio

547 at 1. C. McDonald-Kreitman α plotted against derived allele frequencies, outgroup P. omalonotus.

548

549 Figure 3. Ideograms showing genes under relaxed purifying selection (dark blue) and intensified selection (bright

550 green) in the Pachypanchax clade plotted on 24 pseudochromosomes. Relaxed selection detected with HYPHY

551 RELAX with 2-k parameterization, intensified selection detected with the original 1-k parameterization. Genes under

552 positive selection (triangles) detected in the Pachypanchax lineage, accounting for multinucleotide substitutions and

553 mutational rate variation using HYPHY BUSTED-MH. Direction of Selection (DoS) statistics computed for wild-

554 caught P. playfairii individuals from Praslin, La Digue and Curieuse, using P. omalonotus as the outgroup.

555

556 Figure 4. Structure of the human FGF10-FGFR2 complex, focusing on the interaction of G160 and R251 (Yeh, et al.

557 2003). Partial protein sequence alignments of FGF10a and FGFR2 showing key interacting residues G160 and R251.

558 Two Pachypanchax species have a unique substitution from Glysine to Serine corresponding to residue 160 in human

559 FGF10. The key amino acid at position 251 corresponding to human FGFR2 is conserved.

560

23 bioRxiv preprint doi: https://doi.org/10.1101/2020.08.03.232421; this version posted August 3, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

561 Figure S1. Maximum likelihood tree inferred by RAxML on codon positions 1 and 2 of all mitochondrial protein-

562 coding genes. AU test does not reject the alternative topology of monophyly between South American and African

563 taxa, but the ML tree agrees with the nuclear genome.

564

565 Figure S2. TreeMix analyses with (A-C) or without (D-F) P. omalonotus as an outgroup. A, B The TreeMix graph

566 with the optimal number of migration edges identified by OptM. k=400 was used for plotting. B, E. Selection of the

567 most likely number of migration edges by the OptM R package. Distribution of log likelihood and variance explained

568 of Treemix models with 0-5 migration edges. Standard deviation generated by repeating analyses by varying k (snps

569 per window) from 200 to 4000, with a 200 increment. The number of migration edges that explains ~99.8% of the

570 variance is chosen as the optimal number of edges. C. F. Selection of the most likely number of migration edges by

571 plotting deltaM (OptM package).

572

573 Figure S3. Survival rate of P. playfairii (blue) and N. furzeri (orange) embryos in freshwater and sea water. Left

574 column shows incubation of eggs bleached with 0.5% hydrogen peroxide prior to experiment for fungal removal.

575 Right column shows incubation of freshly collected eggs without fungal removal. Top row shows total embryo

576 survival (including successfully hatched) rates of the two killifish species in freshwater (solid line) and sea-water

577 (dotted line). Bottom row shows hatched and surviving fry.

578

579 Figure S4. A. The probability of a gene immediately downstream of an accelerated conserved element in P. playfairii

580 to be relaxed in the lineage Pachypanchax is higher. Asterisks mark Fisher's exact tests with p < 0.05. Faint lines

581 generated by 500 bootstraps. B. An ideogram showing the method for detecting accelerated conserved elements in the

582 Pachypanchax genome. The protein coding gene immediately downstream of the focal element was used for

583 correlation. C. The median direction of selection (DoS) of the protein coding genes immediately downstream of

584 accelerated conserved elements in P. playfairii is more negative. Asterisks mark Wilcoxon rank sum tests with p <

585 0.05. Faint lines generated by 500 bootstraps.

586

587 Figure S5. Enriched gene ontology terms (q < 0.05) of the relaxed genes (p<0.05, k<1) detected in the Pachypanchax

588 lineage plotted on a semantic space using Evigo.

589

24 bioRxiv preprint doi: https://doi.org/10.1101/2020.08.03.232421; this version posted August 3, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

590 Figure S6. Putative transoceanic dispersal events plotted on reconstructed PALAEOMAP (https://

591 www.earthbyte.org/paleomap-paleoatlas-for-gplates/) at the inferred divergence times of killifish. The speciation

592 event numbering follows Figure 1.

593

25 bioRxiv preprint doi: https://doi.org/10.1101/2020.08.03.232421; this version posted August 3, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

594 References

595 Austin JJ, Nicholas Arnold E. 2001. Ancient mitochondrial DNA and morphology elucidate an extinct island 596 radiation of Indian Ocean giant tortoises (Cylindraspis). Proceedings of the Royal Society of London. Series 597 B: Biological Sciences 268:2515-2523. 598 Bailey TL. 2011. DREME: motif discovery in transcription factor ChIP-seq data. Bioinformatics 27:1653- 599 1659. 600 Ballesteros JA, Hormiga G. 2016. A new orthology assessment method for phylogenomic data: unrooted 601 phylogenetic orthology. Molecular Biology and Evolution 33:2117-2134. 602 Bolger AM, Lohse M, Usadel B. 2014. Trimmomatic: a flexible trimmer for Illumina sequence data. 603 Bioinformatics 30:2114-2120. 604 Briggs JC. 2003. Fishes and birds: Gondwana life rafts reconsidered. Systematic Biology 52:548-553. 605 Cellerino A, Valenzano DR, Reichard M. 2016. From the bush to the bench: the annual Nothobranchius 606 fishes as a new model system in biology. Biol Rev Camb Philos Soc 91:511-533. 607 Charlesworth D, Morgan MT, Charlesworth B. 1993. Mutation accumulation in finite outbreeding and 608 inbreeding populations. Genetical Research 61:39-56. 609 Chen Q, He Z, Feng X, Yang H, Shi S, Wu C-I. 2020. Two decades of suspect evidence for adaptive DNA- 610 sequence evolution – Less negative selection misconstrued as positive selection. 611 bioRxiv:2020.2004.2021.049973. 612 Costa W. 2013. Historical biogeography of aplocheiloid killifishes (Teleostei: ). 613 Vertebrate Zoology 63:139-154. 614 Cui R, Medeiros T, Willemsen D, Iasi LN, Collier GE, Graef M, Reichard M, Valenzano DR. 2019. Relaxed 615 selection limits lifespan by increasing mutation load. Cell 178:385-399. e320. 616 dos Reis M, Yang Z. 2011. Approximate likelihood calculation on a phylogeny for Bayesian estimation of 617 divergence times. Molecular Biology and Evolution 28:2161-2172. 618 Fitak RR. 2019. optM: an R package to optimize the number of migration edges using threshold models. 619 Journal of Heredity. 620 Frankham R. 1998. Inbreeding and Extinction: Island Populations. Conservation Biology 12:665-675. 621 Friedman M, Keck BP, Dornburg A, Eytan RI, Martin CH, Hulsey CD, Wainwright PC, Near TJ. 2013. 622 Molecular and fossil evidence place the origin of cichlid fishes long after Gondwanan rifting. Proceedings of 623 the Royal Society B: Biological Sciences 280:20131733. 624 Grealy A, Phillips M, Miller G, Gilbert MTP, Rouillard J-M, Lambert D, Bunce M, Haile J. 2017. Eggshell 625 palaeogenomics: Palaeognath evolutionary history revealed through ancient nuclear and mitochondrial 626 DNA from Madagascan elephant bird (Aepyornis sp.) eggshell. Molecular phylogenetics and evolution 627 109:151-163. 628 Hershberg R, Petrov DA. 2008. Selection on codon bias. Annu Rev Genet 42:287-299. 629 Herwig R, Hardt C, Lienhard M, Kamburov A. 2016. Analyzing and interpreting genome data at the network 630 level with ConsensusPathDB. Nature Protocols 11:1889-1907. 631 Hubisz MJ, Pollard KS, Siepel A. 2011. PHAST and RPHAST: phylogenetic analysis with space/time 632 models. Briefings in bioinformatics 12:41-51. 633 Jaffe AL, Slater GJ, Alfaro ME. 2011. The evolution of island gigantism and body size variation in tortoises 634 and turtles. Biology Letters 7:558-561. 635 Kelley JL, Yee M-C, Brown AP, Richardson RR, Tatarenkov A, Lee CC, Harkins TT, Bustamante CD, 636 Earley RL. 2016. The genome of the self-fertilizing mangrove rivulus fish, Kryptolebias marmoratus: a 637 model for studying phenotypic plasticity and adaptations to extreme environments. Genome biology and 638 evolution 8:2145-2154. 639 Künstner A, Nabholz B, Ellegren H. 2011. Significant selective constraint at 4-fold degenerate sites in the 640 avian genome and its consequence for detection of positive selection. Genome biology and evolution 641 3:1381-1389. 642 Lawrie DS, Messer PW, Hershberg R, Petrov DA. 2013. Strong purifying selection at synonymous sites in 643 D. melanogaster. PLoS genetics 9. 644 Li H. 2013. Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. arXiv 645 preprint arXiv:1303.3997. 646 Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, Marth G, Abecasis G, Durbin R. 2009. The 647 sequence alignment/map format and SAMtools. Bioinformatics 25:2078-2079. 648 Livezey BC. 1993. An ecomorphological review of the dodo (Raphus cucullatus) and solitaire (Pezophaps 649 solitaria), flightless Columbiformes of the Mascarene Islands. Journal of Zoology 230:247-292.

26 bioRxiv preprint doi: https://doi.org/10.1101/2020.08.03.232421; this version posted August 3, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

650 Livezey BC. 1992. Morphological corollaries and ecological implications of flightlessness in the kakapo 651 (Psittaciformes: Strigops habroptilus). Journal of Morphology 213:105-145. 652 MacArthur RH, Wilson EO. 2001. The theory of island biogeography: Princeton university press. 653 Maruki T, Lynch M. 2015. Genotype-frequency estimation from high-throughput sequencing data. Genetics 654 201:473-486. 655 McKenna A, Hanna M, Banks E, Sivachenko A, Cibulskis K, Kernytsky A, Garimella K, Altshuler D, Gabriel 656 S, Daly M, et al. 2010. The Genome Analysis Toolkit: A MapReduce framework for analyzing next- 657 generation DNA sequencing data. Genome Research 20:1297-1303. 658 Messer PW, Petrov DA. 2013. Frequent adaptation and the McDonald-Kreitman test. Proc Natl Acad Sci U 659 S A 110:8615-8620. 660 Messer PW, Petrov DA. 2012. The McDonald-Kreitman test and its extensions under frequent adaptation: 661 problems and solutions. Proceedings of the National Academy of Sciences of the United States of America 662 110:8615-8620. 663 Murphy WJ, Collier GE. 1997. A molecular phylogeny for aplocheiloid fishes (Atherinomorpha, 664 Cyprinodontiformes): the role of vicariance and the origins of annualism. Mol Biol Evol 14:790-799. 665 Near TJ, Eytan RI, Dornburg A, Kuhn KL, Moore JA, Davis MP, Wainwright PC, Friedman M, Smith WL. 666 2012. Resolution of ray-finned fish phylogeny and timing of diversification. Proceedings of the National 667 Academy of Sciences of the United States of America 109:13698-13703. 668 O'Brown NM, Summers BR, Jones FC, Brady SD, Kingsley DM. 2015. A recurrent regulatory change 669 underlying altered expression and Wnt response of the stickleback armor plates gene EDA. elife 4:e05290. 670 Paten B, Earl D, Nguyen N, Diekhans M, Zerbino D, Haussler D. 2011. Cactus: Algorithms for genome 671 multiple sequence alignment. Genome Research 21:1512-1528. 672 Pickrell J, Pritchard J. 2012. Inference of population splits and mixtures from genome-wide allele frequency 673 data. Nature Precedings:1-1. 674 Plummer PS, Belle ER. 1995. Mesozoic tectono-stratigraphic evolution of the Seychelles microcontinent. 675 Sedimentary Geology 96:73-91. 676 Pohl M, Milvertz FC, Meyer A, Vences M. 2015. Multigene phylogeny of cyprinodontiform fishes suggests 677 continental radiations and a rogue taxon position of Pantanodon. Vertebrate Zoology 65:37-44. 678 Pond SLK, Poon AFY, Velazquez R, Weaver S, Hepler NL, Murrell B, Shank SD, Magalis BR, Bouvier D, 679 Nekrutenko A, et al. 2019. HyPhy 2.5—A Customizable platform for evolutionary hypothesis testing using 680 phylogenies. Molecular Biology and Evolution 37:295-299. 681 Reid NM, Proestou DA, Clark BW, Warren WC, Colbourne JK, Shaw JR, Karchner SI, Hahn ME, Nacci D, 682 Oleksiak MF, et al. 2016. The genomic landscape of rapid repeated evolutionary adaptation to toxic 683 pollution in wild fish. Science 354:1305-1308. 684 Rowan BA, Patel V, Weigel D, Schneeberger K. 2015. Rapid and inexpensive whole-genome genotyping- 685 by-sequencing for crossover localization and fine-scale genetic mapping. G3-Genes Genomes Genetics 686 5:385-398. 687 Sackton TB, Grayson P, Cloutier A, Hu Z, Liu JS, Wheeler NE, Gardner PP, Clarke JA, Baker AJ, Clamp M. 688 2019. Convergent regulatory evolution and loss of flight in paleognathous birds. Science 364:74-78. 689 Schiffels S, Durbin R. 2014. Inferring human population size and separation history from multiple genome 690 sequences. Nature Genetics 46:919-925. 691 Seton M, Müller R, Zahirovic S, Gaina C, Torsvik T, Shephard G, Talsma A, Gurnis M, Turner M, Maus S. 692 2012. Global continental and ocean basin reconstructions since 200 Ma. Earth-Science Reviews 113:212- 693 270. 694 Shimodaira H. 2002. An approximately unbiased test of phylogenetic tree selection. Systematic Biology 695 51:492-508. 696 Silva GG, Weber V, Green AJ, Hoffmann P, Silva VS, Volcan MV, Lanés LEK, Stenert C, Reichard M, 697 Maltchik L. 2019. Killifish eggs can disperse via gut passage through waterfowl. Ecology 100:e02774. 698 Sparks JS, Smith WL. 2005. Freshwater Fishes, Dispersal Ability, and Nonevidence: “Gondwana Life 699 Rafts” to the Rescue. Systematic Biology 54:158-165. 700 Stamatakis A. 2006. RAxML-VI-HPC: maximum likelihood-based phylogenetic analyses with thousands of 701 taxa and mixed models. Bioinformatics 22:2688-2690. 702 Tang HB, Zhang XT, Miao CY, Zhang JS, Ming R, Schnable JC, Schnable PS, Lyons E, Lu JG. 2015. 703 ALLMAPS: robust scaffold ordering based on multiple maps. Genome Biology 16. 704 Townsend TM, Tolley KA, Glaw F, Böhme W, Vences M. 2011. Eastward from Africa: palaeocurrent- 705 mediated chameleon dispersal to the Seychelles islands. Biology Letters 7:225-228.

27 bioRxiv preprint doi: https://doi.org/10.1101/2020.08.03.232421; this version posted August 3, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

706 Valenzano DR, Benayoun BA, Singh PP, Zhang E, Etter PD, Hu CK, Clement-Ziza M, Willemsen D, Cui R, 707 Harel I, et al. 2015. The African turquoise killifish genome provides insights into evolution and genetic 708 architecture of lifespan. Cell 163:1539-1554. 709 Vences M. 2004. Origin of Madagascar's extant fauna: A perspective from amphibians, reptiles and other 710 nonflying vertebrates. Italian Journal of Zoology 71:217-228. 711 Venkat A, Hahn MW, Thornton JW. 2018. Multinucleotide mutations cause false inferences of lineage- 712 specific positive selection. Nature Ecology & Evolution 2:1280-1288. 713 Wagner JT, Singh PP, Romney AL, Riggs CL, Minx P, Woll SC, Roush J, Warren WC, Brunet A, Podrabsky 714 JE. 2018. The genome of Austrofundulus limnaeus offers insights into extreme vertebrate stress tolerance 715 and embryonic development. BMC Genomics 19:155. 716 Wertheim JO, Murrell B, Smith MD, Kosakovsky Pond SL, Scheffler K. 2014. RELAX: Detecting Relaxed 717 Selection in a Phylogenetic Framework. Molecular Biology and Evolution 32:820-832. 718 Wheelwright NT, Mauck RA. 1998. Philopatry, natal dispersal, and inbreeding avoidance in an island 719 population of savannah sparrows. Ecology 79:755-767. 720 Wiley EO. 1988. Vicariance biogeography. Annual Review of Ecology and Systematics 19:513-542. 721 Willemsen D, Cui R, Reichard M, Valenzano DR. 2019. Intra-species differences in population size shape 722 life history and genome evolution. bioRxiv. 723 Wisotsky SR, Kosakovsky Pond SL, Shank SD, Muse SV. 2020. Synonymous site-to-site substitution rate 724 variation dramatically inflates false positive rates of selection analyses: ignore at your own peril. Molecular 725 Biology and Evolution. 726 Yeh BK, Igarashi M, Eliseenkova AV, Plotnikov AN, Sher I, Ron D, Aaronson SA, Mohammadi M. 2003. 727 Structural basis by which alternative splicing confers specificity in fibroblast growth factor receptors. 728 Proceedings of the national academy of sciences 100:2266-2271. 729

28 bioRxiv preprint doi: https://doi.org/10.1101/2020.08.03.232421; this version posted August 3, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. bioRxiv preprint doi: https://doi.org/10.1101/2020.08.03.232421; this version posted August 3, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. bioRxiv preprint doi: https://doi.org/10.1101/2020.08.03.232421; this version posted August 3, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. bioRxiv preprint doi: https://doi.org/10.1101/2020.08.03.232421; this version posted August 3, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. bioRxiv preprint doi: https://doi.org/10.1101/2020.08.03.232421; this version posted August 3, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. bioRxiv preprint doi: https://doi.org/10.1101/2020.08.03.232421; this version posted August 3, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. bioRxiv preprint doi: https://doi.org/10.1101/2020.08.03.232421; this version posted August 3, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. bioRxiv preprint doi: https://doi.org/10.1101/2020.08.03.232421; this version posted August 3, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. bioRxiv preprint doi: https://doi.org/10.1101/2020.08.03.232421; this version posted August 3, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. bioRxiv preprint doi: https://doi.org/10.1101/2020.08.03.232421; this version posted August 3, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.