bioRxiv preprint doi: https://doi.org/10.1101/085431; this version posted December 22, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license.
1 TITLE:
2 Inferring the role of habitat dynamics in driving diversification: evidence for a species pump
3 in Lake Tanganyika cichlids
4 AUTHORS: Thijs Janzen1,2,3*, Rampal S. Etienne2
5
6 AUTHOR AFFILIATION
7 1. Max Planck Institute for Evolutionary Biology, Department of Evolutionary Theory,
8 August-Thienemann-Straße 2, 24306, Plön, Germany.
9 2. University of Groningen, Groningen Institute for Evolutionary Life Sciences, Box
10 11103, 9700 CC Groningen, The Netherlands
11 3. Carl von Ossietzky University of Oldenburg, Institute for Biology, Department of
12 Molecular Ecology, Carl-von-Ossietzky Straße 9, 26111, Oldenburg, Germany
13
14 *Corresponding author: Thijs Janzen, Carl von Ossietzky University of Oldenburg,
15 Institute for Biology, Department of Molecular Ecology, Carl-von-Ossietzky Straße 9,
16 26111, Oldenburg, Germany
17 e-mail: [email protected]
19 ABSTRACT
20 Geographic isolation that drives speciation is often assumed to slowly increase over time, for
21 instance through the formation of rivers, the formation of mountains or the movement of
22 tectonic plates. Cyclic changes in connectivity between areas may occur with the
23 advancement and retraction of glaciers, with water level fluctuations in seas between islands
24 or in lakes that have an uneven bathymetry. These habitat dynamics may act as a driver of
25 allopatric speciation and propel local diversity. Here we present a parsimonious model of the
26 interaction between cyclical (but not necessarily periodic) changes in the environment and
27 speciation, and provide an ABC-SMC method to infer the rates of allopatric and sympatric
28 speciation from a phylogenetic tree. We apply our approach to the posterior sample of an
29 updated phylogeny of the Lamprologini, a tribe of cichlid fish from Lake Tanganyika where
30 such cyclic changes in water level have occurred. We find that water level changes play a
31 crucial role in driving diversity in Lake Tanganyika. We note that if we apply our analysis to
32 the Maximum Clade Credibility (MCC) tree, we do not find evidence for water level changes
33 influencing diversity in the Lamprologini, suggesting that the MCC tree is a misleading
34 representation of the true species tree. Furthermore, we note that the signature of habitat
35 dynamics is found in the posterior sample despite the fact that this sample was constructed
36 using a species tree prior that ignores habitat dynamics. However, in other cases this species
37 tree prior might erase this signature. Hence we argue that in order to improve inference of the
38 effect of habitat dynamics on biodiversity, phylogenetic reconstruction methods should
39 include tree priors that explicitly take into account such dynamics.
40
42 INTRODUCTION
43
44 Environmental changes such as the formation of mountain ridges, the formation of rivers and
45 the movement of tectonic plates have long been known to be important drivers of speciation
46 (Coyne and Orr 2004). Repeated environmental changes may thus lead to diversification
47 patterns. Cyclic changes in the environment can cause populations to continuously switch
48 between an allopatric and sympatric stage, providing a continuously renewed potential for
49 speciation. And these cyclic changes can in turn drive diversity towards levels unexpected
50 given the current geography, sometimes referred to as a “species pump” (Heaney 1985;
51 Rossiter 1995). Examples of species pumps include environmental fluctuations fragmenting
52 habitats on the slopes of mountains (Weir 2006; Sedano and Burns 2010; Hutter et al. 2013),
53 glaciations and postglacial secondary contacts (Barnosky 2005), sea level changes causing
54 the fusion and fragmentation of islands (Glor et al. 2004; Thorpe et al. 2008, but see
55 Papadopoulou and Knowles 2015), and fluctuations in water level causing fragmentation and
56 fusion of lakes with uneven bathymetry, as in the African Rift Lakes (Cohen et al. 1997b;
57 Alin et al. 1999; McGlue et al. 2008; Ivory et al. 2016).
58
59 The African Rift Lakes provide a good starting point in studying the interplay between cyclic
60 habitat dynamics and speciation, because they have been subject to frequent water level
61 changes (Cohen et al. 1997b; Alin and Cohen 2003; Ivory et al. 2016), and are well known
62 for their tremendous biodiversity (Seehausen 2000, 2006; Turner et al. 2001; Wagner et al.
63 2012, 2014; Brawand et al. 2014). An estimated 2000 cichlid fish species (Turner
64 et al. 2001) have evolved in the African Rift Lakes over the past 10 million years (Genner et
65 al. 2007; Meyer et al. 2016), and comprise one of the most spectacular adaptive radiations
66 (Seehausen 2006). The most prominent water level changes took place in Lake Tanganyika,
67 where the water level has dropped substantially on multiple occasions over the past million
68 years, sometimes splitting the lake into multiple smaller lakes (Lezzar et al. 1996; Cohen et
69 al. 1997a, 2007). Being the oldest lake of the three large rift lakes (Cohen et al. 1993), Lake
70 Tanganyika contains the highest behavioral diversity (Konings 2007) and is the only lake
71 with a highly resolved phylogeny for cichlid fish. Evidence for the influence of changing
72 water levels comes from analysis of mitochondrial DNA, which shows that for Tropheus
73 species, some populations have experienced secondary contact upon changes in water level,
74 potentially increasing genetic diversity and driving speciation (Sturmbauer et al. 2001;
75 Koblmüller et al. 2011; Sefc et al. 2017). Similar patterns were found for Variabilichromis
76 moorii and Ophthalmotilapia nasuta (Sturmbauer et al. 2001), Telmatochromis temporalis
77 (Winkelmann et al. 2016), and Altolamprologus (Koblmüller et al. 2016). Comparison of
78 mitochondrial DNA between populations from deep and shallow areas emphasizes that the
79 deep areas are habitats that are more persistent over time, with lower genetic variation
80 (Nevado et al. 2013). Furthermore, Eretmodus lineages identified using mitochondrial DNA
81 are strongly associated with the bathymetric basins of Lake Tanganyika (Verheyen et al.
82 1996), suggesting that they have independently diversified at low water level.
83
84 Aguilée et al. (2013) developed a model for the African Rift Lakes in which populations at
85 different locations diverge from each other depending on the local habitat, and at the same
86 time allowed for sympatric speciation by implementing assortative mating that allows for a
87 single branching point in trait values. Over time the different locations become separated or
88 are reconnected, and this may drive the formation of new species. The authors conclude that
89 stable levels of diversity are best obtained by a fragmented habitat with recurrent merged
90 states and rapid fluctuations. However, Aguilée et al. (2013) do not compare their results to
91 empirical data. By contrast, Pigot et al. (2010) used a spatially explicit model of landscape
92 fragmentation, where consecutive splitting of species’ geographic ranges drives speciation,
93 and compared phylogenies generated with their model, with known bird phylogenies. They
94 found that including this geographical context of speciation explains a large part of the
95 features exhibited by the reconstructed avian trees. Hence, including a geographical context
96 of speciation seems a promising research avenue.
97
98 Here, we provide a method to infer whether or how cyclic changes in the environment
99 influence both the generation and the maintenance of biodiversity. We use an extension of the
100 standard constant-rates birth-death model. Because deriving an expression for the likelihood
101 of this model for a given set of phylogenetic branching times is difficult, but simulation of
102 phylogenies under the model is easy, we used approximate Bayesian computation (ABC)
103 based on sequential Monte Carlo sampling (SMC) to estimate parameters from phylogenies.
104 We applied our approach to an updated phylogeny of the Lamprologini, a tribe of cichlid fish
105 from Lake Tanganyika in order to assess the importance of these habitat dynamics in shaping
106 the current biodiversity of cichlids in Lake Tanganyika.
107
108
110 METHODS
111 Model
112 To model the interaction between environmental change and speciation, we envisage a lake
113 that consists of a single pocket at high water level, but that splits into two pockets when the
114 water level drops. When the water level drops, we assume that all species distribute
115 themselves equally over the two pockets; similarly, when the water level rises, all species
116 previously contained in the two pockets are combined into the single pocket. Allopatric
117 speciation can only occur when the water level is low. We assume a constant probability rate
118 for allopatric speciation, and hence the waiting time until the next speciation event is
119 exponentially distributed. After this waiting time, one of the two incipient species in either
120 pocket can speciate into a new species. If this allopatric speciation does not occur before the
121 water level rises again, i.e. reflecting that there has not been enough genetic divergence, the
122 two incipient species in the two pockets merge back into one species. This is conceptually
123 similar to the idea of protracted speciation (Etienne and Rosindell 2012): the water level drop
124 initiates the speciation process whereas the allopatric speciation event is the completion of
125 speciation under the protracted speciation model. Sympatric speciation can always occur in
126 our model, either at high water level in the lake, or in both pockets when the water level is
127 low. Extinction is considered to be a background process that occurs locally, i.e. within a
128 pocket. If the water level is high, this causes extinction of a species; if the water level is low,
129 this causes local extinction in one of the pockets.
130 We implemented our model using a Gillespie algorithm, where the time steps are chosen
131 depending on the rate of possible events. In the model there are five possible events:
132 1) A water level change event, inducing incipient species or merger of incipient species.
133 2) Sympatric speciation event at high water level, with its own rate.
134 3) Sympatric speciation event at low water level, with its own rate.
135 4) Allopatric speciation(-completion) event, at low water level, with its own rate.
136 5) Extinction event, with its own rate.
137 When the water level drops, all species distribute themselves over both pockets. Thus,
138 immediately after a water level drop, the number of incipient species is equal to twice the
139 number of species. When the water level rises, all incipient species that belong to the same
140 species merge into a single species. During a sympatric speciation event, a single species
141 splits into two new species, and the original (incipient) species is consumed in the process.
142 Here we assume that local disruptive selection causes divergence, similar to the
143 implementation of speciation by Aguilée and colleagues (Aguilée et al. 2011, 2013). If sympatric speciation
144 occurs when the water level is low, the species in the other pocket is retained, and thus three
145 new lineages arise: the first branching point occurs at the water level drop while the second
146 occurs at the sympatric speciation event (Figure 1).
147
148 Figure 1. Schematic representation of the consequences of the three different types of
149 speciation. Time proceeds from left to right. The dotted blue lines indicate water level
150 changes. (a) During a sympatric speciation event at high water level, diversification is not
151 aligned with any associated water level change. (b) During allopatric speciation at low water
152 level, speciation initiation (incipient species are indicated with a dotted line) coincides with
153 the water level drop, causing branching events (if speciation-completion occurs before water
154 level rise) to line up in time. Branching events are conditionally independent of the time of
155 speciation completion, hence, even when the actual speciation completion events occur at
156 different time points, branching events in the species tree are identical. (c) During a sympatric
157 speciation event at low water, the speciation event is independent of the water level changes.
158 Because the original species is consumed in the process, a new branching event is also added
159 at the water level change event. Hence, both speciation-completion (b) and sympatric
160 speciation at low water level (c) cause branching times to line up at the time of water level
161 drop. Please note that (a) and (c) represent the reconstructed species tree, but (b) does not; the
162 reconstructed species tree for (b) would not show the branching event in the uppermost part
163 of the tree.
164 During an extinction event, one (possibly incipient) species is removed from the simulation.
165 If the water level is low, this need not lead to the extinction of a species, because the sister
166 incipient species might remain in the other pocket, ensuring survival of the species.
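The event scheme described above can be illustrated with a minimal Gillespie-style sketch. This is not the implementation used for the analyses: the function name `simulate`, the argument names (`sym_high`, `sym_low`, `allo`, `ext`), and the representation of a species as the set of pockets it occupies are all hypothetical. The sketch further assumes that sympatric speciation and extinction act per (incipient) species, that allopatric speciation-completion acts per species present in both pockets, and it tracks only species numbers rather than the phylogeny.

```python
import random

def simulate(sym_high, sym_low, allo, ext, changes, t_end, seed=0):
    """Return the number of species at t_end, starting from one species at high water.

    changes: sorted forward-in-time moments at which the water level flips.
    Each species is stored as the set of pockets (0 and/or 1) it occupies.
    """
    rng = random.Random(seed)
    species = [{0}]           # a single species in the single pocket
    low = False               # the simulation starts at high water level
    t = 0.0
    boundaries = [c for c in changes if c < t_end] + [t_end]
    for boundary in boundaries:
        while species:
            # Every occupied pocket of a species is one (incipient) population.
            pops = [(i, p) for i, s in enumerate(species) for p in sorted(s)]
            split = [i for i, s in enumerate(species) if len(s) == 2]
            sym = sym_low if low else sym_high
            rates = [len(pops) * sym, len(split) * allo if low else 0.0,
                     len(pops) * ext]
            total = sum(rates)
            if total == 0.0:
                break
            t_next = t + rng.expovariate(total)   # Gillespie waiting time
            if t_next >= boundary:
                break                             # water level changes first
            t = t_next
            event = rng.choices(range(3), weights=rates)[0]
            if event == 1:
                # Allopatric speciation-completion: one species per pocket.
                i = rng.choice(split)
                species[i] = {0}
                species.append({1})
            else:
                # Sympatric speciation consumes the population and adds two
                # new species; extinction only removes the population.
                i, p = rng.choice(pops)
                species[i].discard(p)
                if not species[i]:
                    species.pop(i)
                if event == 0:
                    species.extend([{p}, {p}])
        t = boundary
        if boundary < t_end:
            low = not low
            # Drop: every species occupies both pockets; rise: populations merge.
            species = [{0, 1} if low else {0} for _ in species]
    return len(species)
```

For example, `simulate(0.0, 0.0, 1e6, 0.0, [1.0], 2.0)` returns 2: the single species is split over two pockets at the water level drop, and the very high completion rate turns the two incipient species into two species almost immediately.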
167 Maximum Likelihood
168 Without water level changes, our model reduces to the constant rates birth-death model (Nee
169 et al. 1994). As a reference therefore, we estimated parameters of the standard birth-death
170 model using Maximum Likelihood. The likelihood of the birth-death model was calculated
171 using the function “bd_ML” from the R package DDD (Etienne et al. 2012).
172
173 Fitting the model to empirical data
174 We performed two different fitting procedures: firstly, we performed a model selection
175 procedure, where three different water level scenarios were fitted simultaneously to the data
176 (more information about the chosen scenarios can be found in the next section). The model
177 selection procedure simultaneously estimates parameters and assesses the fit of the
178 models. However, because the model selection procedure primarily samples the best fitting
179 model (by design), it does not allow for the comparison of parameter estimates across
180 different models. Therefore, we also fitted the three different water level scenarios
181 independently to the empirical data, and obtained posterior distributions for the parameters
182 relevant to these scenarios.
183 We fitted our model to 100 trees randomly sampled from the MCMC chain obtained from the
184 *BEAST analysis (see below), and to the Maximum Clade Credibility (MCC) tree.
185 Water level scenarios
186 The main focus of our approach is to assess the impact of water level changes on the
187 diversification rate. Lake Tanganyika experienced low water level stands 35–40 thousand years ago
188 (kya) (-160 meter), 169–193 kya (-250 meter), 262–295 kya (-350 meter), 363–393 kya
189 (-350 meter) and 550–1100 kya (-650 to -700 meter) (Lezzar et al. 1996; Cohen et al. 1997a).
190 The southern and northern basin of Lake Tanganyika are separated from each other by a ridge
191 at a depth of 500 meter below present level. Although some of these water level changes may
192 not have split up the lake completely, we assume here that these water level changes still
193 caused sufficient disruption of migration between the northern and southern basin, to be
194 equivalent to physical separation. Consequently, high water levels occurred between 0–35
195 kya, 40–169 kya, 193–262 kya, 295–363 kya and 393–550 kya. Unfortunately the
196 geological record does not reveal whether any low water level stands occurred beyond 1.1
197 million years ago (Ma). This leaves us with two alternative scenarios: either no low water
198 level stands occurred beyond 1.1 Ma, or these low water level stands have not been preserved
199 accurately in the geological record.
200 In order to capture these two scenarios we performed inference using two alternative water
201 level implementations. Firstly we used the exact literature values, assuming a high water
202 level stand until 1.1 Ma. We refer to this scenario as LW (Literature Waterlevels). Secondly
203 we assumed that before 1.1 Ma, water level changes occurred at the same average rate as
204 in the most recent 1.1 million years. In the recent 1.1 million years, the
205 lake experienced 5 high water level stands, and 5 low water level stands, which amounts to
206 10 water level changes in total. To extrapolate water level changes to more than 1.1 Ma, we
207 drew waiting times until the next water level change from an exponential distribution with
208 rate 10. We refer to this scenario as EW (Extrapolated Water levels). Thirdly, we also tested
209 the null expectation of no effect of water level changes on speciation; we refer to this scenario
210 as NW (No Water levels). Without water level changes, the model reduces to the
211 constant-rates birth-death model.
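The LW and EW scenarios can be written down as lists of water level change times. The sketch below is illustrative only: the function names are hypothetical, time is measured in millions of years before present, and the rate of 10 is assumed to be expressed per million years (the text does not state the unit explicitly). The `crown_age` argument is likewise an assumption about how far back the extrapolation has to run.

```python
import random

# Low water level stands (start, end) in kya, as listed in the text.
LOW_STANDS_KYA = [(35, 40), (169, 193), (262, 295), (363, 393), (550, 1100)]

def literature_changes():
    """Times (in Ma) of the water level changes under the LW scenario."""
    return sorted(t / 1000.0 for stand in LOW_STANDS_KYA for t in stand)

def extrapolated_changes(crown_age, rate=10.0, seed=0):
    """LW changes plus exponentially spaced changes older than 1.1 Ma (EW)."""
    rng = random.Random(seed)
    times = literature_changes()
    t = times[-1]                      # 1.1 Ma, the oldest documented change
    while True:
        t += rng.expovariate(rate)     # waiting time until the next change
        if t >= crown_age:
            break
        times.append(t)
    return times
```

Calling `literature_changes()` yields the 10 documented changes between 0.035 and 1.1 Ma; `extrapolated_changes(4.0)` extends this list back to a crown age of 4 Ma. Under the NW scenario the list is simply empty.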
212 Parameter estimation
213 To fit the model to empirical data we used Approximate Bayesian Computation, in
214 combination with a Sequential Monte Carlo scheme (ABC-SMC) (Toni et al. 2009).
215 As summary statistics for the ABC analysis we chose the normalized Lineages Through Time
216 statistic (Janzen et al. 2015), tree size, Phylogenetic Diversity (AvPD, Schweiger et al. 2008)
217 and the γ statistic (Pybus and Harvey 2000). On all four rate parameters (the two sympatric
speciation rates, the allopatric speciation rate and the extinction rate) we chose
218 uniform priors U(-3, 2) on a log10 scale, such that the eventual prior distribution spans (10^-3,
219 10^2). A log10 scale was chosen to explore parameter space uniformly across orders of magnitude, and to put extra emphasis
220 on low values. The normal distribution used to perturb the
221 parameters had a mean of 0 and a standard deviation of 0.05 (on the log10
222 transformed parameter), and we updated one parameter each time (i.e. jumps were only made
223 in one dimension, to avoid extremely low acceptance rates). The number of particles used per
224 SMC step was 10,000, where a particle is a data structure containing the model choice and
225 the parameter estimates. To assess the fit of the model to the data we calculated the Euclidean
226 distance between the summary statistics of the simulated data and the empirical data. To
227 ensure that the differences in summary statistics were on the same scale, we normalized the
228 differences. Differences were normalized by dividing each difference by the standard
229 deviation of that summary statistic of 1,000,000 trees simulated using parameter values
230 sampled from the prior.
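The prior, the perturbation kernel and the normalized distance described above can be sketched as follows. All function and constant names are hypothetical, and the sketch covers only these three ingredients, not the full ABC-SMC scheme of Toni et al. (2009). Parameters are kept on the log10 scale throughout, and `prior_sd` is assumed to hold the standard deviation of each summary statistic across trees simulated from the prior.

```python
import math
import random

LOG10_LOWER, LOG10_UPPER = -3.0, 2.0   # prior bounds on the log10 scale
PERTURB_SD = 0.05                       # kernel standard deviation (log10 scale)

def sample_prior(n_params, rng):
    """Draw log10-transformed parameters from U(-3, 2)."""
    return [rng.uniform(LOG10_LOWER, LOG10_UPPER) for _ in range(n_params)]

def perturb(log_params, rng):
    """Perturb one randomly chosen parameter with a normal jump (mean 0)."""
    new = list(log_params)
    i = rng.randrange(len(new))
    new[i] += rng.gauss(0.0, PERTURB_SD)
    return new

def in_prior(log_params):
    """True if every parameter lies within the prior bounds."""
    return all(LOG10_LOWER <= p <= LOG10_UPPER for p in log_params)

def distance(simulated, empirical, prior_sd):
    """Euclidean distance between summary statistics, with each difference
    divided by the prior standard deviation of that statistic."""
    return math.sqrt(sum(((s - e) / sd) ** 2
                         for s, e, sd in zip(simulated, empirical, prior_sd)))
```

A rate on the natural scale is recovered as `10 ** p`, so a value of -3 on the log10 scale corresponds to a rate of 0.001.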
232 Model selection
233 To identify which model best explains the data, we performed ABC model selection, as
234 described in Toni et al. (2009; 2010). The main difference between standard ABC-SMC and
235 ABC-SMC including model selection is that the latter adds one parameter, which keeps track
236 of the model. As jumping kernel between models we assumed a 50% probability of staying at
237 the same model, and a 25% probability of jumping to either other model. We assumed a
238 uniform prior across all three models; this translates to a probability of 1/3 for each model in
239 the first iteration of the ABC-SMC procedure, and hence an expected number of 3333
240 particles assigned to each model in the first iteration. This reversible jump ABC-SMC model
241 selection procedure results in a posterior distribution over the three models, where the model
242 with most support is the model selected most across all particles. We can calculate the Bayes
243 factor by taking the ratio of the number of particles assigned to the respective models (Toni et
244 al. 2009). For example, the Bayes factor of LW/EW is the number of particles assigned to the
245 model with literature water level changes divided by the number of particles assigned to the
246 model with extrapolated water level changes. Because a model can receive zero particles, we
247 set the Bayes factor for each model compared to the model with zero particles to the
248 maximum support possible, which is the total number of particles: 10,000. To calculate the
249 posterior support for a model, we calculate 2 ln (Bayes factor), following Kass and Raftery
250 (1995). A transformed Bayes factor above 2 then corresponds to positive evidence
251 for the considered model (Kass and Raftery 1995).
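The model jump kernel and the Bayes factor calculation can be made concrete with a short sketch. The function names are hypothetical. The handling of a numerator model with zero particles (returning 1 / 10,000) is an assumption that mirrors the rule given in the text for a denominator model with zero particles; the text itself only specifies the latter case.

```python
import math
import random

MODELS = ["NW", "LW", "EW"]
N_PARTICLES = 10_000

def jump_model(current, rng):
    """Stay at the current model with probability 0.5, otherwise move to
    one of the two other models (probability 0.25 each)."""
    if rng.random() < 0.5:
        return current
    return rng.choice([m for m in MODELS if m != current])

def bayes_factor(counts, model_a, model_b, n_particles=N_PARTICLES):
    """Ratio of particle counts for model_a over model_b."""
    a, b = counts.get(model_a, 0), counts.get(model_b, 0)
    if a == 0 and b == 0:
        return 1.0                     # assumption: no information either way
    if b == 0:
        return float(n_particles)      # maximum support possible
    if a == 0:
        return 1.0 / n_particles       # assumption: mirror of the rule above
    return a / b

def transformed(bf):
    """2 ln (Bayes factor), the scale of Kass and Raftery (1995)."""
    return 2.0 * math.log(bf)
```

With 7,500 particles assigned to LW and 2,500 to EW, the Bayes factor of LW/EW is 3 and the transformed value is 2 ln 3, roughly 2.2. When EW receives no particles, the Bayes factor is capped at 10,000, which gives a transformed value of about 18.4.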
252 Model selection validation
253 To assess whether our ABC-SMC method can accurately infer the correct model, we
254 simulated 100 datasets for each model (NW, LW & EW), with parameter values drawn from
255 the prior. We report the median Bayes factor across the 100 replicates. If our method can
256 accurately infer the correct model, we expect the median Bayes Factor (after 2 ln
257 transformation) to be above 2 when comparing posterior support for the model with which
258 the data was simulated to the other two models.
259
260 Measurement uncertainty
261 A phylogeny generated with a high rate of allopatric speciation and a high rate of water level
262 changes tends to have multiple speciation events that are aligned in time (Figure 1, b). This is
263 due to the fact that the onset of speciation is given by the time of water level change.
264 Phylogenetic reconstruction methods such as BEAST (Bouckaert et al. 2014) currently do not
265 allow for simultaneous branching events. Hence, when fitting the model, trees are generated
266 that are by definition dissimilar from the empirical tree constructed using BEAST, even if
267 underlying events are close to the original events. To circumvent this we perturbed the
268 branching time of each node in the trees simulated using our model. In this way speciation
269 events that were previously aligned in time now occur on slightly different time points, as in
270 a tree from a BEAST analysis. We perturbed branching times by adding a random number
271 drawn from a truncated normal distribution with mean 0, standard deviation σ, truncated by
272 the minimum distance to either the daughter or the parent species. If there were no daughter
273 lineages present, and the node gave rise to an extant species, the normal distribution was
274 truncated to the minimum distance to the parent or the present time. Nodes were perturbed
275 from past to present (leaving the crown in place, to ensure a phylogenetic tree with the same
276 age as the empirical tree). The standard deviation of the perturbation kernel was included as
277 an extra parameter to be inferred, with a uniform prior on (10^-3, 10^0).
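The perturbation of a single branching time can be sketched as a draw from a symmetrically truncated normal distribution. The function name and arguments are hypothetical, node ages are assumed to be measured as time before present, and simple rejection sampling is used here for clarity; the text does not state how the truncated distribution was sampled.

```python
import random

def perturb_branching_time(t, parent_t, daughter_t, sd, rng):
    """Shift the node age t by a normal deviate (mean 0, standard deviation sd),
    truncated at the minimum distance to the parent or the daughter node.

    parent_t:   age of the parent node (parent_t > t)
    daughter_t: age of the oldest daughter node, or 0.0 (the present) if the
                node only gives rise to extant species
    """
    bound = min(parent_t - t, t - daughter_t)
    if bound <= 0.0:
        return t                      # no room to move this node
    while True:
        shift = rng.gauss(0.0, sd)
        if abs(shift) < bound:        # reject draws outside the truncation
            return t + shift
```

Because the shift never exceeds the distance to the nearest neighbouring node, the order of the nodes along a lineage is preserved. Rejection sampling becomes slow when `sd` is much larger than the truncation bound, in which case an inverse-CDF sampler would be a more efficient choice.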
279 Empirical data
280 We fitted our model to the phylogenetic tree of the tribe Lamprologini, the most diverse
281 tribe within Lake Tanganyika, containing 79 species of cichlids (Day et
282 al. 2007; Koblmüller et al. 2007; Sturmbauer et al. 2010). The Lamprologini are endemic to
283 Lake Tanganyika and its surrounding rivers and all species are substrate brooders with shared
284 paternal and maternal care. In contrast to the mouthbrooding species from the
285 Haplochromini, the Lamprologini show little sexual dimorphism and dichromatism, which
286 are well-known indicators for sexual selection (Kraaijeveld et al. 2011). Because sexual selection is thus less likely to dominate their diversification, we expect
287 that the Lamprologini are a good candidate for picking up signals from water level changes.
288 We reconstructed a new Lamprologini tree following the workflow of the most complete
289 Lamprologini tree to date, which is a consensus tree based on the mitochondrial ND2 gene
290 (Sturmbauer et al. 2010), but we added three newly described species (Lepidiolamprologus
291 mimicus (Schelly et al. 2007), Neolamprologus timidus (Kullander et al. 2014b) and
292 Chalinochromis cyanophleps (Kullander et al. 2014a)). Using phyloGenerator (Pearse and
293 Purvis 2013), we downloaded sequences from GenBank for nine genes (GenBank accession
294 numbers can be found in the Supplementary Information). Genes were selected on the basis
295 of species coverage (at least 25% of the 79 Lamprologini species for which molecular data is
296 available), and whether or not the gene was crucial for inclusion of a species (e.g. for a
297 number of species, the only available gene was ND2). After selection, our full dataset
298 consisted of three mitochondrial genes: the NADH dehydrogenase subunit 2 (ND2 gene,
299 sequences from Kocher et al. 1995; Klett and Meyer 2002; Clabaut et al. 2005; Duftner et al.
300 2005; Schelly et al. 2006; Day et al. 2007; Koblmüller et al. 2007, 2016; Schwarzer et al.
301 2009; Wagner et al. 2009; Sturmbauer et al. 2010; O’Quin et al. 2010; Kullander et al. 2014b;
302 Weiss et al. 2015), the cytochrome b gene (cytb, sequences from Salzburger et al. 2002;
303 Nevado et al. 2009; Wagner et al. 2009; O’Quin et al. 2010; Matschiner et al. 2011, 2016;
304 Kullander et al. 2014b; Shirai et al. 2014) and the cytochrome c oxidase subunit I (COI gene,
305 sequences from Sparks and Smith 2004; Nevado et al. 2013; Kullander et al. 2014a, 2014b;
306 Breman et al. 2016; Matschiner et al. 2016); and six nuclear genes: the nuclear locus 38A
307 (38A, sequences from Muschick et al. 2012; Meyer et al. 2016), the 18S ribosomal RNA
308 internal-transcribed spacer 1–2 with 5.8S and 28S ribosomal RNA partial sequences (18S,
309 sequences from Nevado et al. 2009; Koblmüller et al. 2016), the recombinase activating
310 protein 1 (rag1, sequences from Clabaut et al. 2005; Nevado et al. 2009; Kullander et al.
311 2014b; Shirai et al. 2014; Koblmüller et al. 2016; Meyer et al. 2016), the endothelin receptor
312 B1 gene (ednrb1, sequences from Muschick et al. 2012; Santos et al. 2014), the ribosomal
313 protein S7 gene (rps7, sequences from Schelly et al. 2006; Meyer et al. 2016) and the rod
314 opsin gene (RH1, sequences from Sugawara et al. 2002; Spady et al. 2005; Nagai et al. 2011;
315 Meyer et al. 2015). GenBank accession numbers for the used sequences can be found in the
316 supplementary material.
317 Sequences were aligned using MAFFT (setting: --auto) (Katoh and Standley 2013), and
318 subsequently, sequences were cleaned using trimAl (sites with >80% data missing were
319 removed, i.e. setting -gt 0.2) (Capella-Gutiérrez et al. 2009). Rather than concatenating the
320 alignments, we partitioned the data into subsets with independent sequence evolution models,
321 which is more suitable for a dataset which is expected to show incomplete lineage sorting or
322 hybridization (Meyer et al. 2016). To prepare alignments for use with PartitionFinder,
323 alignments were combined using SequenceMatrix 1.8 (Vaidya et al. 2011). The best
324 partitioning found by PartitionFinder 2.1.1 (Lanfear et al. 2012, 2016) partitioned the data
325 into 5 subsets (unlinked branches, AICc selection criterion), with all nuclear genes into one
326 subset (Rps7, ednrb1, 38A, 18S, RAG1 and RH1), with substitution model HKY+I+Γ. The
327 remaining three mitochondrial genes (ND2, COI and cytb) were placed in separate subsets,
328 each with a GTR+I+Γ substitution model.
329 Using *BEAST (Heled and Drummond 2010) within the BEAST 2 package (Bouckaert et al.
330 2014), we inferred the time-calibrated species tree. We used an uncorrelated log-normal
331 relaxed clock and applied two calibration points. Firstly, we calibrated the crown of the
332 Lamprologini to be 4 million years old (log-normal prior, mean of 4 Myr, 95% confidence interval:
333 [3, 5]), based on the results from Meyer et al. (2016). Secondly, we included two riverine
334 Lamprologini species (L. congoensis and L. teugelsi), and calibrated the onset of their
335 branching event at 1.7 Ma (offset 1.1, log-normal distribution with mean 1.7, 95% confidence
336 interval [1.15, 3.47], “use originate = true”), following Koblmüller (2010). We applied 1/X
337 priors on the clock rates, and log-normal priors on the substitution rates. All other priors were
338 left at their default setting. As tree model we used the birth-death model. The BEAST
339 configuration file used (the BEAUti XML) can be found in the supplementary material.
340 We ran 10 independent *BEAST MCMC chains, of 750M trees each. Each chain was
341 verified to have ESS values of at least 100 for all parameters. The first 10M trees were
342 discarded from each chain as burn-in, after which the chains were combined (we used the species tree,
343 rather than the individual gene trees) into one large chain (of 7400M trees). Chains were
344 thinned by taking only every 5,000th tree. Using TreeAnnotator (from the BEAST 2 suite) we
345 constructed a Maximum Clade Credibility tree (using all 1.48M trees after thinning), storing
346 the mean heights.
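The chain sizes reported above follow from simple bookkeeping, which the short Python sketch below reproduces. The variable names are illustrative only; the numbers are those given in the text.

```python
# Chain bookkeeping, using the numbers given in the text.
n_chains = 10
trees_per_chain = 750_000_000   # 750M trees per chain
burn_in = 10_000_000            # first 10M trees of each chain discarded
thin_every = 5_000              # keep every 5,000th tree

# Trees remaining after burn-in, summed over all chains.
combined = n_chains * (trees_per_chain - burn_in)

# Trees remaining after thinning the combined chain.
retained = combined // thin_every

print(f"combined chain: {combined / 1e6:.0f}M trees")
print(f"after thinning: {retained / 1e6:.2f}M trees")
```

The two printed values correspond to the 7400M and 1.48M trees mentioned in the text.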
347 We then pruned the riverine species from the tree to obtain the pure Lamprologini tree on which
348 we fitted our model. Instead of performing one ABC-SMC inference on the obtained MCC
349 tree using a huge number of particles, which would be more accurate but computationally
350 extremely demanding, we performed 100 parallel inferences using 10,000 particles each. We
351 report the mean Bayes factor across these replicates.
352
353 Branching time uncertainty in the empirical tree
354 To account for uncertainty in the estimates of branching times in the Lamprologini tree we
355 sampled 100 trees from the posterior distribution obtained by *BEAST. Sampling was
356 performed at random, irrespective of the likelihood of the trees. In the Supplementary
357 material we show that the distribution of summary statistics of the subset of 100 trees is
358 similar to the distribution of the thinned chain. The 100 sampled trees were, like the
359 Maximum Clade Credibility tree, also pruned to remove the riverine taxa and stored
360 separately. For all 100 trees we performed both the ABC-SMC model selection algorithm and
361 the ABC-SMC parameter estimation algorithm, to determine the impact of different
362 branching times on the inferred water level model and associated parameters, and to
363 determine whether the MCC tree is a good representation of the underlying variability.
364 RESULTS
365 Lamprologini phylogeny
366
367 The onset of diversification within the Lamprologini is estimated to be around 3.96 Ma (95%
368 Highest Posterior Density interval (HPD): [3.09, 4.91]), which is very close to the prior we
369 put on the node age, based on previous estimates (Meyer et al. 2016). Furthermore, we
370 estimate the branching off of the Congo species (L. congoensis and L. teugelsi) from the
371 Lamprologini in Lake Tanganyika to have occurred around 1.35 Ma (HPD: [1.12, 1.68]),
372 which is a bit younger than previously obtained estimates (1.70 Ma; Sturmbauer et al.
373 2010). The topology of the Maximum Clade Credibility tree is largely consistent with
374 previous findings (Sturmbauer et al. 2010) (Figure 1). Placement of Neolamprologus
375 fasciatus as a close relative to N. wauthioni seems to reiterate previously published evidence
376 for introgressive hybridization (Koblmüller et al. 2007). For the three species not previously
377 included in the Lamprologini phylogeny, Lepidiolamprologus mimicus was placed as a close
378 relative to the other species within the genus Lepidiolamprologus, Chalinochromis
379 cyanophleps was placed as a sister species to Chalinochromis brichardi, within the group of
380 Chalinochromis and Julidochromis species, in agreement with a previous analysis
381 (Kullander et al. 2014b). In contrast to previous findings (Kullander et al. 2014b),
382 Neolamprologus timidus is not placed as a sister species to Neolamprologus furcifer, but
383 rather associates with N. mondabu and N. falcicula. Again, in contrast to other previous
384 findings (Gante et al. 2016), we place N. olivaceous outside the brichardi complex, which
385 includes the model system species N. brichardi and N. pulcher (but see the DensiTree
386 representation, which shows that this is not true for all trees). Interestingly, we also do not
387 infer N. savoryi to be phylogenetically clustered within the brichardi complex (the ‘Princess
388 cichlids’ (Gante et al. 2016)), in contrast to Gante et al. (2016). We should take into account,
389 however, that the analysis by Gante et al. is based on full genome sequences from only a
390 small group of species, in contrast to the limited number of markers from a large number of
391 species that we used.
392 As a reference we inferred speciation and extinction using the constant-rates birth-death
393 model (Nee et al. 1994). Using Maximum Likelihood (the function bd_ML in the DDD
394 package (Etienne et al. 2012)), we obtained an estimate of 1.871 myr-1 for the speciation rate,
395 and an estimate of 0.993 myr-1 for the extinction rate, for the MCC tree. We find an estimate
396 of 0.87 myr-1 for the diversification rate (speciation - extinction) and an estimate of 0.531
397 myr-1 for the turnover rate (extinction / speciation). For the 100 trees sampled from the full
398 chain, we obtain estimates of 3.02 myr-1 (95% HPD: [1.608, 4.947]) for the speciation rate
399 and 2.409 myr-1 (95% HPD: [0.884, 4.530]) for the extinction rate. This translates into
400 estimates of 0.61 myr-1 (95% HPD: [0.313, 0.985]) for the diversification rate, and 0.765
401 myr-1 (95% HPD: [0.518, 0.930]) for the turnover rate. Estimates for the birth-death model
402 obtained during reconstruction of the tree using BEAST indicate an estimate of 0.864 myr-1
403 (95% HPD: [0.287, 1.459]) for the speciation rate and an estimate of 0.613 myr-1 (95% HPD:
404 [0.181, 0.953]) for the relative death rate, which translates into an estimate for the extinction
405 rate of 0.52 myr-1 (95% HPD: [0.156, 0.823]). This yields estimates of
406 0.334 myr-1 and 0.613 myr-1 for the diversification and turnover rate respectively. The BEAST
407 inferences include the riverine species, so speciation and extinction rates are expected to be a
408 bit different.
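The derived quantities in this paragraph use the definitions given in the text: diversification is speciation minus extinction, and turnover is extinction divided by speciation. The Python sketch below applies these definitions to the MCC-tree and BEAST point estimates. The function names are illustrative. We assume here that the relative death rate reported by BEAST equals extinction divided by speciation, which is consistent with the values in the text.

```python
def diversification(speciation, extinction):
    """Net diversification rate: speciation minus extinction."""
    return speciation - extinction


def turnover(speciation, extinction):
    """Turnover: extinction divided by speciation."""
    return extinction / speciation


def extinction_from_relative_death(speciation, relative_death):
    """Convert a relative death rate (assumed to be extinction / speciation)
    back into an absolute extinction rate."""
    return relative_death * speciation


# Maximum Likelihood point estimates for the MCC tree (from the text).
mcc_div = diversification(1.871, 0.993)    # about 0.878
mcc_turn = turnover(1.871, 0.993)          # about 0.531

# BEAST estimates: speciation 0.864, relative death rate 0.613 (from the text).
beast_ext = extinction_from_relative_death(0.864, 0.613)  # about 0.530
beast_div = diversification(0.864, beast_ext)             # about 0.334
```

For the 100 sampled trees, the reported summaries are presumably computed per tree and then summarized across trees, so they need not equal the same functions applied to the summarized rates (for instance, 2.409 / 3.02 is roughly 0.80, whereas the reported turnover is 0.765).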
409
410 Figure 2. Phylogenetic hypothesis for the Lamprologini and outgroups, based on 3 mitochondrial and 6 nuclear genes, and two calibrations: 4 million years for the root of the Lamprologini clade, and 1.1 - 3.5 million years for the Congo Lamprologini species. Left panel: DensiTree (Bouckaert and Heled 2014) representation of the MCMC chain obtained using *BEAST. Shown are trees from a thinned posterior chain, after selecting every 100,000th tree. Riverine species are indicated in grey. Right panel: Maximum Clade Credibility tree. Bars around the nodes span the 95% HPD for each node. Please note that for the dual display of both the DensiTree representation and the MCC phylogeny, some tips of the MCC phylogeny might appear slightly misaligned. A high resolution version of both the DensiTree representation and the MCC phylogeny can be found in the supplementary information.
421
422 Parameter estimation
423 We estimated parameter values for the three models for all 100 trees sampled from the
424 posterior. We report parameter values from the combined posterior across all 100 trees.
425 Note that variation in the parameter estimates results from two sources:
426 branching time variation across the 100 trees, and variation in the parameter
427 estimate within each ABC-SMC inference.
428 The model without water level changes is identical to the constant-rates birth-death model,
429 and we find that our ABC-SMC estimates for sympatric speciation at high water level
430 are slightly lower than the Maximum Likelihood estimate of the birth rate under the
431 constant-rates birth-death model (2.644 Myr-1 (95% HPD: [1.208, 4.633]) versus 3.02,
432 see also Table 1). Similarly, we infer the extinction rate (µ) to also be slightly lower (1.950
433 Myr-1 (95% HPD: [0.188, 4.101]) versus 2.409, see also Table 1). We obtain estimates of
434 0.694 and 0.738 for diversification and turnover respectively, which are close to the estimates
435 obtained using Maximum Likelihood (a diversification rate of 0.610 and a turnover rate of
436 0.765 respectively). Taking into account the 95% confidence intervals on the obtained
437 parameter estimates and the fact that the ABC-SMC estimates are potentially affected by the
438 prior while the ML estimates are not, we are confident that estimates obtained using our
439 ABC-SMC method for the model without water level changes are consistent with the
440 maximum likelihood estimates under the constant-rates birth-death model.
441 Using the LW model, which implements water level changes following the literature (i.e.
442 high water level until ~1.1 Ma, after which a series of water level changes took place), we
443 infer a lower rate of sympatric speciation at high water level (0.871 Myr-1 (95% HPD: [0.227,
444 3.642])), which is compensated by a high rate of allopatric speciation (6.412 Myr-1 (95%
445 HPD: [0.001, 14.195])) but not with a high rate of sympatric speciation at low water level
446 (0.028 Myr-1 (95% HPD: [0.001, 0.651])), suggesting that water level dynamics are important
447 drivers of biodiversity, but only through allopatric speciation. Extinction is inferred to be low
448 (0.037 Myr-1 (95% HPD: [0.001, 2.133])). Because of the non-trivial relationship between
449 speciation at high and low water level, we can no longer calculate diversification and
450 turnover rates.
451 Using the EW model, where water level changes are extrapolated beyond 1.1 Ma, we observe
452 that the rate of sympatric speciation at high water level is inferred to be similar to that without
453 water level changes (2.753 Myr-1 (95% HPD: [1.347, 4.383])). Extinction, however, is lower
454 (0.111 Myr-1 (95% HPD: [0.001, 1.627])), and allopatric speciation and sympatric speciation
455 at low water level are both inferred to be much lower than for the literature water scenario
456 (0.022 Myr-1 (95% HPD: [0.001, 0.466]) and 0.033 Myr-1 (95% HPD: [0.001, 0.504])
457 respectively).
458 Across the three water level models we observe that the distribution of the post-hoc
459 perturbations does not differ substantially from the prior for the NW and EW water models,
460 with low estimates (0.036 (95% HPD: [0.001, 0.569]) and 0.030 (95% HPD: [0.001, 0.484])
461 for the NW and EW model respectively, Table 1). We notice a much higher value of the
462 perturbation parameter associated with LW (0.174 (95% HPD: [0.001, 0.680])), which also has a much higher
463 estimate for allopatric speciation at low water level. Allopatric speciation at low water level
464 potentially causes temporal alignment of branching times and we introduced the perturbation parameter
465 to correct simulated phylogenies for this, to allow comparison with phylogenies generated by
466 *BEAST, which does not allow for temporally aligned branching times. Hence, the higher
467 inferred value of the perturbation parameter for the LW model supports the application of our post-hoc
468 perturbation.
469
471 Table 1. Median posterior density estimate for sympatric speciation at high water, extinction (µ), perturbation, allopatric speciation and sympatric speciation at low water (columns in that order). Shown are results for the model with no water level changes (NW), literature values for water level changes (LW) and water level changes extrapolated beyond the literature range (EW). The 95% credibility interval is shown between square brackets. All values are rates per million years. A dash indicates a parameter that is not part of the model.

Model  Symp. (high water)     Extinction (µ)         Perturbation           Allopatric              Symp. (low water)
NW     2.644 [1.208, 4.633]   1.950 [0.188, 4.101]   0.036 [0.001, 0.569]   -                       -
LW     0.871 [0.227, 3.642]   0.037 [0.001, 2.133]   0.174 [0.001, 0.680]   6.412 [0.001, 14.195]   0.028 [0.001, 0.651]
EW     2.753 [1.347, 4.383]   0.111 [0.001, 1.627]   0.030 [0.001, 0.484]   0.022 [0.001, 0.466]    0.033 [0.001, 0.504]
477
478
479 Figure 3. Posterior densities of the pooled posterior distribution across 100 randomly drawn trees from the posterior MCMC chain. Shown are estimates for the three water level scenarios (no water level changes (NW), literature values for water level changes (LW) and extrapolated values for water level changes (EW)). Shown are the posterior density (black line) and the 95% credibility interval (shaded area; blue for NW, gold for LW and green for EW). X-axes are on a log10 scale. The first column shows a sample water level profile, with the water level on the y-axis, and the time before present (in million years) on the x-axis. Note that for the EW model, a new profile was generated for each simulation, and that the shown profile is only one example of such a profile. Because allopatric speciation and sympatric speciation at low water have no meaning without water level changes, their posterior distributions are not shown for the NW scenario.
492 Model fitting
493
494 Figure 4. Model selection results on 100 trees randomly drawn from the *BEAST posterior of the Lamprologini tree. The top row shows 2 ln(Bayes factors) comparing posterior support of the LW (literature water changes) model with the NW (no water level changes) model; the bottom row shows 2 ln(Bayes factors) of the comparison between the posterior support for the LW model with the EW (extrapolated water level changes) model. A 2 ln(Bayes factor) higher than 2 is generally considered to provide substantial evidence in favor of the respective model (Kass and Raftery 1995), which is indicated by the solid lines. The thin dotted line indicates the median 2 ln(Bayes factor) obtained for the MCC tree, for which we do not find substantial support for any of the three models. The thick dotted line indicates the median 2 ln(Bayes factor) for the trees drawn from the *BEAST posterior (i.e. the median of the distribution shown), which is in both cases above 2, indicating substantial support for the LW model compared to the other two models. 2 ln(Bayes factors) higher than 10 are grouped together into one category.
510 Model selection
511 When we apply the model selection algorithm to the MCC tree, we find median Bayes factors
512 (we report 2 ln(Bayes factor) rather than the raw numbers, but for brevity refer to them as
513 Bayes factors) of 1.64 and 0.9 when comparing the LW model with the NW and EW model
514 respectively. We thus find no convincing evidence for any of the three models, when fitting
515 our model to the MCC tree. In contrast, when we fit to 100 trees randomly sampled from
516 the *BEAST posterior, we find Bayes factors of 3.60 and 3.65 when comparing the LW model
517 with the NW and EW model respectively. Furthermore, in 77 out of the 100 trees we select
518 the LW model as the most likely model (based on the Bayes factor), in 17 out of 100 trees we
519 select the EW model, and only in 6 out of 100 trees we select the model without any water
520 level changes.
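The 2 ln(Bayes factor) scale and the threshold of 2 used above can be made concrete with a minimal Python sketch. We assume here that the Bayes factor is obtained as the ratio of posterior model probabilities divided by the ratio of prior model probabilities, which is the standard definition; how the posterior model probabilities are estimated within the ABC-SMC algorithm is not reproduced. The function names and the example probabilities are hypothetical.

```python
import math


def two_ln_bayes_factor(post_a, post_b, prior_a=0.5, prior_b=0.5):
    """Return 2 ln(Bayes factor) of model A over model B.

    post_a, post_b: posterior model probabilities (must be positive).
    prior_a, prior_b: prior model probabilities (equal by default).
    """
    if post_a <= 0 or post_b <= 0:
        raise ValueError("posterior probabilities must be positive")
    bayes_factor = (post_a / post_b) / (prior_a / prior_b)
    return 2.0 * math.log(bayes_factor)


def verdict(score, threshold=2.0):
    """Classify a 2 ln(Bayes factor) using a symmetric threshold."""
    if score > threshold:
        return "supports A"
    if score < -threshold:
        return "supports B"
    return "no substantial support"


# Hypothetical posterior model probabilities for two competing models.
strong = two_ln_bayes_factor(0.86, 0.14)   # roughly 3.6
weak = two_ln_bayes_factor(0.60, 0.40)     # roughly 0.8
print(verdict(strong), "|", verdict(weak))
```

Under this convention a value of 3.6, as found for the sampled trees, corresponds to posterior odds of roughly 6 to 1, whereas a value below 2, as found for the MCC tree, corresponds to odds below about 2.7 to 1.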
521
522
523 Validation of the model selection procedure
524
525 Figure 5. Validation of the ability of our ABC-SMC algorithm to infer the correct model. 100 replicate datasets were generated for each water level model (no water level changes, NW; water level changes from the literature, LW; or water level changes extrapolated beyond the literature range, EW). The plots show the distribution of the 2 ln(Bayes factor) across all 100 replicate inferences. The dotted line indicates the median 2 ln(Bayes factor). A 2 ln(Bayes factor) higher than 2 is generally considered to provide substantial evidence in favor of the respective model (Kass and Raftery 1995). 2 ln(Bayes factors) higher than 10 are grouped together into one category.
533
534 Model validation shows that when we simulated data using the NW model, the NW model
535 was selected by our model selection algorithm more often than the other two models (59 out of
536 100 replicates). Median Bayes factors are both higher than 2, with a median of 5.28 and 3.83
537 versus the LW and EW model respectively, indicating considerable support for the NW
538 model over the other two models. When data was simulated with the LW model, we selected
539 the correct model in the majority of 100 replicates (84 out of 100 replicates). The Bayes
540 factors reflect this, with medians of 18.4 (this is the maximum score) versus both the NW and
541 EW model. Lastly, when we simulated data using the EW model, we selected the correct
542 model more often than the other two models, in 65 out of 100 replicates. This was reflected by the
543 Bayes factors as well, as the median Bayes factor versus the NW model was 18.4, and the
544 median Bayes factor versus the LW model was 7.47.
545 More interesting is the correct detection rate of a model, which is given by the fraction of
546 trees for which that model is selected that were also simulated with that model. This is equal to asking whether,
547 given posterior support for a respective model, we also find that the tree for which we find
548 this support was simulated with the respective model. If our model selection procedure
549 cannot detect models accurately, we expect a detection rate of around 50%, as support is always
550 divided between two (not three) models. Detection rates larger than 50% support the
551 conclusion that our model selection procedure can adequately infer the correct model.
552 We find that across the 300 simulated trees, 120 trees received considerable support for the
553 LW model over the NW model (i.e. 2 ln (BF LW/NW) > 2). Of these 120 trees, 92 trees were
554 simulated with the LW model, which leads to a correct detection rate of 77% (see Figure 6).
555 Furthermore, out of 107 trees that received considerable support for the LW model over the
556 EW model, we find that 83 trees were simulated using the LW model, which translates to a
557 detection rate of 78%. We find similar detection rates for the NW model: 90% against the
558 LW model (62 out of 69 detected trees) and 83% against the EW model (54 out of 65
559 detected trees). Lastly, detection rates for the EW model mirror these findings: a detection
560 rate of 79% against the NW model (79 out of 100 detected trees), and of 86% against the LW
561 model (68 out of 79 detected trees).
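The detection rates reported above can be recomputed directly from the counts in the text. The sketch below is a minimal Python illustration; the function name and the dictionary layout are our own, and each key lists the supported model first and the competing model second.

```python
def detection_rate(n_correct, n_supported):
    """Fraction of trees supporting a model that were simulated with it."""
    if n_supported <= 0:
        raise ValueError("need at least one supported tree")
    return n_correct / n_supported


# (supported model, competing model): (correctly assigned, total supported),
# with counts taken from the text.
comparisons = {
    ("LW", "NW"): (92, 120),
    ("LW", "EW"): (83, 107),
    ("NW", "LW"): (62, 69),
    ("NW", "EW"): (54, 65),
    ("EW", "NW"): (79, 100),
    ("EW", "LW"): (68, 79),
}

# Detection rates as rounded percentages.
rates = {
    pair: round(100 * detection_rate(*counts))
    for pair, counts in comparisons.items()
}
for (model, rival), pct in rates.items():
    print(f"{model} against {rival}: {pct}%")
```

All six rates lie well above the 50% expected when the procedure cannot distinguish between two models.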
562
563
564 Figure 6. Accuracy of assignment of models depending on their posterior support. A: The relative fraction of trees simulated with either LW (dark) or NW (light), receiving support for LW (2ln(Bayes Factor LW/NW) > 2), support for NW (2ln(Bayes Factor LW/NW) < -2), or receiving no support for either model. B: The relative fraction of trees simulated with either LW (dark) or EW (light), receiving support for LW (2ln(Bayes Factor LW/EW) > 2), support for EW (2ln(Bayes Factor LW/EW) < -2), or receiving no support for either model. C: The relative fraction of trees simulated with either NW (dark) or EW (light), receiving support for NW (2ln(Bayes Factor NW/EW) > 2), support for EW (2ln(Bayes Factor NW/EW) < -2), or receiving no support for either model.
574 DISCUSSION
575 We have presented a model that infers past speciation and extinction rates, and their
576 interactions with changes in the environment, from a given phylogeny. We have shown that
577 our model is able to accurately select between different scenarios, including or excluding
578 environmental change. We applied our model to an updated phylogeny of the cichlid fish
579 tribe of Lamprologini and found evidence that past water level changes have shaped current
580 cichlid diversity in Lake Tanganyika, when we applied our model to a sample from the
581 posterior distribution of trees of the Lamprologini, as inferred by *BEAST. We asked the
582 model to select the best fitting of three scenarios: a scenario without any water level changes,
583 a scenario using the values found in the literature, and a scenario using the mean rate of water
584 level change found in the literature to extrapolate water level changes beyond the range of
585 literature values available. We found that the model following literature water levels received
586 most support, which suggests that water level changes have been an important driver of
587 diversity in the Lamprologini. We note that a model without effect of water level changes on
588 diversification (NW) can sometimes generate patterns that resemble the predictions of the
589 preferred model (LW). Yet, we find when fitting our model to trees drawn from the *BEAST
590 posterior that the distribution of Bayes Factors is skewed towards the model following
591 literature water levels (LW) and we find support for the model without an effect of water
592 level changes on diversification (NW) only for a small number of trees, suggesting that this
593 effect is relatively small.
594
595 When we applied our model selection algorithm to the Maximum Clade Credibility (MCC)
596 tree, we found contrasting results. Support for both models including water level changes
597 diminished, and posterior support for the model without any water level changes increased.
598 Nevertheless, no single model could yield enough support to convincingly reject the other
599 two. Moreover, results using the MCC tree are markedly different from those using trees
600 sampled from the posterior. We conclude therefore that the MCC tree, at least for the
601 Lamprologini, but most likely more generally, provides a poor summary of the true species
602 tree and of the underlying variation in branching patterns. Hence, we suggest not
603 reporting MCC trees, but instead providing the reader with the full posterior distribution, for
604 instance through a DensiTree plot (Bouckaert and Heled 2014). Posterior inference, for
605 instance of speciation and extinction rates, should preferably also be performed on multiple
607 variation might lead to very different results, as we have shown here.
608
609 Discrepancies between the MCC tree and the posterior distribution of trees could also
610 potentially clarify previously recovered inconsistencies when studying diversification, for
611 example in shrews in the Philippines. The Philippines have been subject to strong sea level
612 fluctuations, causing the fission and fusion of several islands, primarily during the
613 Pleistocene (Brown et al. 2013). Population genetic evidence has convincingly shown that the
614 location of such fused islands correlates strongly with genetic divergence between
615 populations in many different species (Evans et al. 2003; Linkem et al. 2010; Siler et al.
616 2010; Oaks et al. 2013). Phylogenetic analysis, however, has failed to show any evidence of
617 diversification associated with Pleistocene water level changes (Esselstyn and Brown 2009).
618 The basis for this phylogenetic analysis, however, was an MCC tree. Repeating the analysis
619 on the posterior distribution underlying the MCC tree could mitigate these problems, and
620 could clarify the impact of Pleistocene water level changes on diversification in the
621 Philippine archipelago.
622
623 When allopatric speciation rates are high, the resulting phylogenetic trees have internal nodes
624 that have synchronized branching times, i.e. branching times that align with episodes of
625 water level change. Although phylogenetic reconstruction software is able to infer
626 simultaneous branching events, it typically uses only two parameters (birth and death) to infer
627 all branching events of the tree. Therefore, if it can accommodate the simultaneous events, it
628 is unlikely to fit well to the non-simultaneous events, and vice-versa. Our finding of evidence
629 for a substantial role of habitat dynamics in diversification can therefore be regarded as
630 conservative. To improve the fit of trees generated by our model with trees generated by
631 *BEAST we included an a posteriori perturbation parameter in our model. This parameter
632 determines the standard deviation of a Gaussian perturbation kernel that is applied to each
633 node after the simulation has completed. By perturbing each node, we minimized the
634 probability that branching times align in time. We found that the standard deviation increased in
635 size with an increase in allopatric speciation, as expected. A less ad hoc solution to deal with
636 the alignment of branching times in the tree would be to incorporate the model presented here
637 as a tree prior in phylogenetic reconstruction software. Although this need not introduce any
638 significant differences in the tree topology, the distribution of branching times could be
639 substantially influenced, and any subsequent inference focusing on such patterns could be
640 very different. Including such models in tree reconstruction software may require
641 incorporation of ABC methods, and will be extremely computationally demanding, but our
642 results justify such an endeavor.
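The a posteriori perturbation described above (a Gaussian kernel applied to each node after simulation) can be sketched in Python as follows. This is an illustration under stated assumptions, not our implementation: the function name is hypothetical, the sketch works on a list of branching times rather than on a tree object, and clamping perturbed times between the present (0) and the crown age is an assumption about boundary handling.

```python
import random


def perturb_branching_times(times, sigma, crown_age, rng=None):
    """Add Gaussian noise with standard deviation sigma to each branching time.

    times: branching times in million years before present.
    crown_age: upper bound for perturbed times (assumed boundary rule).
    rng: optional random.Random instance, for reproducibility.
    Returns the perturbed times sorted from oldest to youngest.
    """
    rng = rng or random.Random()
    perturbed = []
    for t in times:
        new_t = rng.gauss(t, sigma)
        # Keep the perturbed time between the present and the crown age.
        new_t = min(max(new_t, 0.0), crown_age)
        perturbed.append(new_t)
    return sorted(perturbed, reverse=True)


# Three branching events aligned at 1.0 Ma, plus one event at 0.4 Ma.
aligned = [1.0, 1.0, 1.0, 0.4]
jittered = perturb_branching_times(aligned, sigma=0.05, crown_age=2.0,
                                   rng=random.Random(1))
print(jittered)
```

After perturbation the three events that coincided at 1.0 Ma receive distinct times, which is the property needed to compare simulated trees with trees from software that does not produce simultaneous branching events. A larger standard deviation spreads the aligned events further apart.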
643
644 Given that water level changes are only prevalent during the last million years before present,
645 we cannot exclude the possibility that factors other than
646 changing water levels have driven increased diversification during this period. On average, the LW
647 model could be represented by a simple birth-death model with a rate shift around one
648 million years ago. We expect however that although such a model could accommodate the
649 increased average diversification, it cannot replicate temporal alignment in branching events
650 due to water level changes. To examine this in more detail, we fitted a simple birth-death
651 model with a rate shift around one million years ago to the trees obtained from the posterior
652 (see Supplementary Information). In the absence of a likelihood for the LW model, we
653 compared the nLTT statistic for the rate-shift model with that of the LW model, as the nLTT
654 statistic should be sensitive to temporal alignment of branching events. We find that
655 our model is much closer to the empirical data than the rate-shift model. We attempted to
656 improve the fit of the rate-shift model by allowing the speciation rate in the model to shift up
657 and down in line with the literature values of the water level changes. The two rates inferred
658 by the model then represent speciation at low and at high water level, respectively. Although
659 we do find an increase in the rate of speciation at low water level, the fit of this rate-shift
660 model is still worse than that of the LW model. This supports our conclusion that water level
661 changes influence the phylogeny not only through an increased speciation rate, but also
662 through temporal alignment of branching times.
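To make the comparison concrete, the following Python sketch computes a distance between two normalized lineages-through-time curves, in the spirit of the nLTT statistic (Janzen et al. 2015). It is an illustrative simplification, not the code used here: it assumes that each tree is summarized by the ages of its internal nodes (crown included), that time and lineage counts are both rescaled to [0, 1], and that the distance is the integrated absolute difference between the two step functions. The function names are hypothetical.

```python
import bisect


def nltt_curve(branching_ages):
    """Return (times, values) of a normalized lineages-through-time step function.

    branching_ages: ages before present of all internal nodes, crown included.
    Time runs from 0 (crown) to 1 (present); lineage counts are divided by
    the number of tips.
    """
    ages = sorted(branching_ages, reverse=True)
    crown = ages[0]
    n_tips = len(ages) + 1
    times = [(crown - age) / crown for age in ages]
    # After the k-th branching event (k = 0 is the crown) there are k + 2 lineages.
    values = [(k + 2) / n_tips for k in range(len(ages))]
    return times, values


def step_value(times, values, t):
    """Value of the step function at normalized time t (requires t >= 0)."""
    idx = bisect.bisect_right(times, t) - 1
    return values[idx]


def nltt_distance(ages_a, ages_b):
    """Integrated absolute difference between two normalized LTT curves."""
    ta, va = nltt_curve(ages_a)
    tb, vb = nltt_curve(ages_b)
    # Both curves are constant between consecutive event times, so the
    # integral reduces to a sum over the merged grid of event times.
    grid = sorted(set(ta + tb + [1.0]))
    total = 0.0
    for left, right in zip(grid[:-1], grid[1:]):
        diff = abs(step_value(ta, va, left) - step_value(tb, vb, left))
        total += diff * (right - left)
    return total


# Toy example: one tree with two simultaneous branching events, one without.
aligned_tree = [4.0, 1.0, 1.0, 0.5]
spread_tree = [4.0, 3.0, 2.0, 0.5]
print(nltt_distance(aligned_tree, spread_tree))
```

Simultaneous branching events produce abrupt jumps in the lineages-through-time curve, so a model that only raises the average speciation rate would be expected to lie further from such a tree than a model that reproduces the jumps.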
663
664 Although we refer in our model to the different implementations of speciation as sympatric
665 and allopatric, care should be taken in interpreting these forms of speciation. We consider
666 here allopatric speciation only on a large scale, where populations become allopatric over
667 stretches of hundreds of kilometers (Sturmbauer et al. 2001). Large-scale isolation might not
668 be necessary for cichlids, as some species can already be limited in gene flow by a sand
669 stretch of 50 meters separating populations (Rico and Turner 2002). Such micro-allopatric
670 speciation events are not captured by the allopatric speciation rate in our model. Rather, these
671 local-scale events are captured in our model by sympatric speciation. Hence, sympatric
672 speciation in our model covers all degrees of speciation ranging from full sympatry to
673 allopatry, provided that geographical isolation is smaller than that imposed by a water level
674 change. Allopatric speciation in our model then solely refers to speciation events caused by
675 geographical isolation over a large distance, driven by changes in water level, and inducing
676 simultaneous branching events.
677
678 In our model we have assumed that when the water level drops, species distribute themselves
679 equally over the two pockets of water that survive the water level drop. A more realistic
680 model would allow for a skew towards one of the pockets, either dependent on the respective
681 sizes of the pockets, the distribution of the species over the lake at high water level, or both.
682 We have here refrained from including a parameter that regulates the distribution of species
683 over the two pockets in order to avoid overfitting. Another possible extension of our model
684 would be to extend the approach to three or more pockets, possibly combined with
685 a parameter governing the distribution of species across these pockets during a water
686 level drop. Bathymetric maps of Lake Tanganyika suggest that for some water level changes
687 it might split into three lakes (Coulter 1991). How a split of a species into three populations,
688 and the associated allopatric divergence and speciation, affects phylogenetic structure and the
689 temporal alignment of branching times remains unexplored and would be an
690 interesting avenue for future work.
691
692 Our results are strongly in line with population genomic analyses in a number of cichlid
693 species including Eretmodus cyanostictus (Verheyen et al. 1996), Tropheus moorii
694 (Koblmüller et al. 2011; Nevado et al. 2013; Sefc et al. 2017), Variabilichromis moorii
695 (Nevado et al. 2013), Altolamprologus (Koblmüller et al. 2016) and Telmatochromis
696 temporalis (Winkelmann et al. 2016), and resonate with population genomic findings across
697 the three African Rift Lakes (Sturmbauer et al. 2001). Furthermore, population genetic
698 studies have shown that water level fluctuations in Lake Malawi have been associated with
699 population expansion in cichlid species (Arnegard et al. 1999; Sturmbauer et al. 2001;
700 Genner et al. 2010), suggesting a potential role for water level changes in Lake Malawi as
701 well. Phylogenetic reconstruction for Malawi cichlid species is problematic, however,
702 partially due to the young age of the species. Nevertheless, considering that the geological record
703 of Lake Malawi spans a much larger part of the total lifespan of the lake (Delvaux 1995;
704 Lyons et al. 2015; Ivory et al. 2016) and thus provides a much better record of water level
705 fluctuations since the colonization of the lake by cichlids, we expect that modern genetic
706 developments will soon allow for a thorough understanding of the impact of water level
707 changes on cichlids in Lake Malawi as well.
708
709 Conclusion
710 Our model integrates standard constant-rate birth-death mechanics with environmental
711 change and with speciation induced by geographical isolation. We analyzed the phylogeny of
712 the tribe of Lamprologini to see whether past water level changes in Lake Tanganyika have
713 contributed to the current diversity of cichlid fish in Lake Tanganyika. We find an important
714 role for environmental changes in driving diversity, and find evidence that past water level
715 changes have shaped current standing diversity in the tribe of Lamprologini. However, we
716 found that inference of past environmental changes from a single phylogeny, and more
717 specifically, from the MCC tree, tends to lead to unreliable results. We therefore advocate
718 caution when using the MCC tree as a basis for further analysis. Furthermore, we argue for
719 the inclusion of more detailed branching models in phylogenetic reconstruction software,
720 which allow for the inclusion of an interaction between the environment and speciation rates.
721
722 Acknowledgements
723 We thank Lucas Molleman for useful discussions. We thank the Netherlands Organisation for
724 Scientific Research (NWO) for financial support through VIDI and VICI grants awarded to
725 RSE. We thank the Donald Smits Center for Information Technology of the University of
726 Groningen for their support and for providing access to the Millipede and Peregrine high-
727 performance computing clusters. We thank the Max Planck Institute for Evolutionary Biology
728 for their support and for providing access to their high-performance computing cluster. We thank
729 the Carl von Ossietzky Universität Oldenburg for their support and for providing access to the
730 CARL computing cluster.
731
732 REFERENCES
733 Aguilée R., Claessen D., Lambert A. 2013. Adaptive radiation driven by the interplay of eco-
734 evolutionary and landscape dynamics. Evolution (N. Y). 67:1291–1306.
735 Aguilée R., Lambert A., Claessen D. 2011. Ecological speciation in dynamic landscapes. J.
736 Evol. Biol. 24:2663–77.
737 Alin S., Cohen A., Bills R. 1999. Effects of landscape disturbance on animal communities in
738 Lake Tanganyika, East Africa. Conservation. 13:1017–1033.
739 Alin S., Cohen A.S. 2003. Lake-level history of Lake Tanganyika, East Africa, for the past
740 2500 years based on ostracode-inferred water-depth reconstruction. Palaeogeogr.
741 Palaeoclimatol. Palaeoecol. 199:31–49.
742 Arnegard M.E., Markert J.A., Danley P.D., Stauffer J.R., Ambali A.J., Kocher T.D. 1999.
743 Population structure and colour variation of the cichlid fishes Labeotropheus fuelleborni
744 Ahl along a recently formed archipelago of rocky habitat patches in southern Lake
745 Malawi. Proc. R. Soc. Lond. B. 266:119–130.
746 Barnosky A. 2005. Effects of Quaternary climatic change on speciation in mammals. J.
747 Mamm. Evol. 12:247–264.
748 Bouckaert R., Heled J. 2014. DensiTree 2: Seeing trees through the forest. bioRxiv.:1–11.
749 Bouckaert R., Heled J., Kühnert D., Vaughan T., Wu C.H., Xie D., Suchard M.A., Rambaut
750 A., Drummond A.J. 2014. BEAST 2: A Software Platform for Bayesian Evolutionary
751 Analysis. PLoS Comput. Biol. 10:1–6.
752 Brawand D., Wagner C.E., Li Y.I., Malinsky M., Keller I., Fan S., Simakov O., Ng A.Y.,
753 Lim Z.W., Bezault E., Turner-Maier, J. Johnson J., Alcazar R., Noh H.J., Russell P.,
754 Aken B., Alföldi J., Amemiya C., Azzouzi N., Baroiller J.-F., Barloy-Hubler F., Berlin
755 A., Bloomquist R., Carleton K.L., Conte M.A., D’Cotta H., Eshel O., Gaffney L.,
756 Galibert F., Gante H.F., Gnerre S., Greuter L., Guyon R., Haddad N.S., Haerty W.,
757 Harris R.M., Hofmann H. a., Hourlier T., Hulata G., Jaffe D.B., Lara M., A.P. L.,
758 MacCallum I., Mwaiko S., Nikaido M., Nishihara H., Ozouf-Costaz C., Penman D.J.,
759 Przybylski D., Rakotomanga M., Renn S.C.P., Ribeiro F.J., Ron M., Salzburger W.,
760 Sanchez-Pulido L., Santos M.E., Searle S., Sharpe T., Swofford R., Tan F.J., Williams
761 L., Young S., Yin S., Okada N., Kocher T.D., Miska E. a., Lander E.S., Venkatesh B.,
762 Fernald R.D., Meyer A., Ponting C.P., Streelman J.T., Lindblad-Toh K., Seehausen O.,
763 Di Palma F. 2014. The genomic substrate for adaptive radiation in African cichlid fish.
764 Nature. 513:375–381.
765 Breman F.C., Loix S., Jordaens K., Snoeks J., Van Steenberge M. 2016. Testing the potential
766 of DNA barcoding in vertebrate radiations: the case of the littoral cichlids (Pisces,
767 Perciformes, Cichlidae) from Lake Tanganyika. Mol. Ecol. Resour. 16:1455–1464.
768 Brown R.M., Siler C.D., Oliveros C.H., Esselstyn J. a., Diesmos A.C., Hosner P. a., Linkem
769 C.W., Barley A.J., Oaks J.R., Sanguila M.B., Welton L.J., Blackburn D.C., Moyle R.G.,
770 Townsend Peterson a., Alcala A.C. 2013. Evolutionary processes of diversification in a
771 model island archipelago. Annu. Rev. Ecol. Evol. Syst. 44:411–435.
772 Capella-Gutiérrez S., Silla-Martínez J.M., Gabaldón T. 2009. trimAl: A tool for automated
773 alignment trimming in large-scale phylogenetic analyses. Bioinformatics. 25:1972–
774 1973.
775 Clabaut C., Salzburger W., Meyer A. 2005. Comparative phylogenetic analyses of the
776 adaptive radiation of Lake Tanganyika cichlid fish: nuclear sequences are less
777 homoplasious but also less informative than mitochondrial DNA. J. Mol. Evol. 61:666–
778 81.
779 Cohen A.S., Lezzar K.E., Tiercelin J.J., Soreghan M. 1997a. New palaeogeographic and lake-
780 level reconstructions of Lake Tanganyika: implications for tectonic, climatic and
781 biological evolution in a rift lake. Basin Res. 9:107–132.
782 Cohen A.S., Soreghan M., Scholz C.A. 1993. Estimating the age of formation of lakes: an
783 example from Lake Tanganyika, East African Rift system. Geology. 21:511.
784 Cohen A.S., Stone J.R., Beuning K.R.M., Park L.E., Reinthal P.N., Dettman D., Scholz C.A.,
785 Johnson T.C., King J.W., Talbot M.R., Brown E.T., Ivory S.J. 2007. Ecological
786 consequences of early Late Pleistocene megadroughts in tropical Africa. Proc. Natl.
787 Acad. Sci. 104:16422–7.
788 Cohen A.S., Talbot M.R., Awramik S.M., Dettman D.L., Abell P. 1997b. Lake level and
789 paleoenvironmental history of Lake Tanganyika, Africa, as inferred from late Holocene
790 and modern stromatolites. Geol. Soc. Am. Bull. 109:444–460.
791 Coulter G. 1991. Lake Tanganyika and its life. Oxford University Press.
792 Coyne J., Orr H. 2004. Speciation. Sunderland, Massachusetts U.S.A.: Sinauer Associates.
793 Day J.J., Santini S., Garcia-Moreno J. 2007. Phylogenetic relationships of the Lake
794 Tanganyika cichlid tribe Lamprologini: the story from mitochondrial DNA. Mol.
795 Phylogenet. Evol. 45:629–42.
796 Delvaux D. 1995. Age of Lake Malawi (Nyasa) and water level fluctuations. Mus. roy. Afrr.
797 centr., Tervuren (Belg.), Dept. Geol. Min., Rapp. ann. 1993 1994. 108:99–108.
798 Duftner N., Koblmüller S., Sturmbauer C. 2005. Evolutionary relationships of the
799 Limnochromini, a tribe of benthic deepwater cichlid fish endemic to Lake Tanganyika,
800 East Africa. J. Mol. Evol. 60:277–289.
801 Esselstyn J.A., Brown R.M. 2009. The role of repeated sea-level fluctuations in the
802 generation of shrew (Soricidae: Crocidura) diversity in the Philippine Archipelago. Mol.
803 Phylogenet. Evol. 53:171–181.
804 Etienne R.S., Haegeman B., Stadler T., Aze T., Pearson P.N., Purvis A., Phillimore A.B.
805 2012. Diversity-dependence brings molecular phylogenies closer to agreement with the
806 fossil record. Proc. R. Soc. B Biol. Sci. 279:1300–1309.
807 Etienne R.S., Rosindell J. 2012. Prolonging the past counteracts the pull of the present:
808 protracted speciation can explain observed slowdowns in diversification. Syst. Biol.
809 61:204–13.
810 Evans B.J., Brown R.M., McGuire J.A., Supriatna J., Andayani N., Diesmos A., Iskandar D.,
811 Melnick D.J., Cannatella D.C. 2003. Phylogenetics of fanged frogs: testing
812 biogeographical hypotheses at the interface of the Asian and Australian faunal zones.
813 Syst. Biol. 52:794–819.
814 Gante H.F., Matschiner M., Malmstrøm M., Jakobsen K.S., Jentoft S., Salzburger W. 2016.
815 Genomics of speciation and introgression in Princess cichlid fishes from Lake
816 Tanganyika. Mol. Ecol.
817 Genner M.J., Knight M.E., Haesler M.P., Turner G.F. 2010. Establishment and expansion of
818 Lake Malawi rock fish populations after a dramatic Late Pleistocene lake level rise. Mol.
819 Ecol. 19:170–82.
820 Genner M.J., Seehausen O., Lunt D.H., Joyce D.A., Shaw P.W., Carvalho G.R., Turner G.F.
821 2007. Age of cichlids: new dates for ancient lake fish radiations. Mol. Biol. Evol.
822 24:1269–82.
823 Glor R.E., Gifford M.E., Larson A., Losos J.B., Schettino L.R., Chamizo Lara A.R., Jackman
824 T.R. 2004. Partial island submergence and speciation in an adaptive radiation: a
825 multilocus analysis of the Cuban green anoles. Proc. R. Soc. B Biol. Sci. 271:2257–65.
826 Heaney L.R. 1985. Zoogeographic Evidence for Middle and Late Pleistocene Land Bridges
827 to the Philippine Islands. Mod. Quat. Res. Southeast Asia. 9:127–143.
828 Heled J., Drummond A.J. 2010. Bayesian Inference of Species Trees from Multilocus Data.
829 Mol. Biol. Evol. 27:570–580.
830 Hutter C.R., Guayasamin J.M., Wiens J.J. 2013. Explaining Andean megadiversity: The
831 evolutionary and ecological causes of glassfrog elevational richness patterns. Ecol. Lett.
832 16:1135–1144.
833 Ivory S.J., Blome M.W., King J.W., McGlue M.M., Cole J.E., Cohen A.S. 2016.
834 Environmental change explains cichlid adaptive radiation at Lake Malawi over the past
835 1.2 million years. Proc. Natl. Acad. Sci.:201611028.
836 Janzen T., Höhna S., Etienne R.S. 2015. Approximate Bayesian Computation of
837 diversification rates from molecular phylogenies: introducing a new efficient summary
838 statistic, the nLTT. Methods Ecol. Evol. 6:566–575.
839 Kass R., Raftery A. 1995. Bayes Factors. J. Amer. Stat. Assoc. 90:773–795.
840 Katoh K., Standley D.M. 2013. MAFFT multiple sequence alignment software version 7:
841 Improvements in performance and usability. Mol. Biol. Evol. 30:772–780.
842 Klett V., Meyer A. 2002. What, if anything, is a Tilapia?-mitochondrial ND2 phylogeny of
843 tilapiines and the evolution of parental care systems in the African cichlid fishes. Mol.
844 Biol. Evol. 19:865–883.
845 Koblmüller S., Duftner N., Sefc K.M., Aibara M., Stipacek M., Blanc M., Egger B.,
846 Sturmbauer C. 2007. Reticulate phylogeny of gastropod-shell-breeding cichlids from
847 Lake Tanganyika--the result of repeated introgressive hybridization. BMC Evol. Biol.
848 7:7.
849 Koblmüller S., Nevado B., Makasa L., van Steenberge M., Vanhove M.P.M., Verheyen E.,
850 Sturmbauer C., Sefc K.M. 2016. Phylogeny and phylogeography of Altolamprologus:
851 ancient introgression and recent divergence in a rock-dwelling Lake Tanganyika cichlid
852 genus. Hydrobiologia.:1–16.
853 Koblmüller S., Salzburger W., Obermüller B., Eigner E., Sturmbauer C., Sefc K.M. 2011.
854 Separated by sand, fused by dropping water: habitat barriers and fluctuating water levels
855 steer the evolution of rock-dwelling cichlid populations in Lake Tanganyika. Mol. Ecol.
856 20:2272–90.
857 Kocher T.D., Conroy J. a, McKaye K.R., Stauffer J.R., Lockwood S.F. 1995. Evolution of
858 NADH dehydrogenase subunit 2 in east African cichlid fish. Mol. Phylogenet. Evol.
859 4:420–432.
860 Kraaijeveld K., Kraaijeveld-Smit F.J.L., Maan M.E. 2011. Sexual selection and speciation:
861 the comparative evidence revisited. Biol. Rev. Camb. Philos. Soc. 86:367–77.
862 Kullander S.O., Karlsson M., Norén M. 2014a. Chalinochromis cyanophleps, a new species
863 of cichlid fish (Teleostei: Cichlidae) from Lake Tanganyika. Zootaxa. 3790:425–438.
864 Kullander S.O., Norén M., Karlsson M., Karlsson M. 2014b. Description of Neolamprologus
865 timidus, new species, and review of N. furcifer from Lake Tanganyika (Teleostei:
866 Cichlidae). Ichthyol. Explor. Freshwaters. 24:301–328.
867 Lanfear R., Calcott B., Ho S.Y.W., Guindon S. 2012. PartitionFinder: Combined selection of
868 partitioning schemes and substitution models for phylogenetic analyses. Mol. Biol. Evol.
869 29:1695–1701.
870 Lanfear R., Frandsen P.B., Wright A.M., Senfeld T., Calcott B. 2016. PartitionFinder 2: New
871 Methods for Selecting Partitioned Models of Evolution for Molecular and
872 Morphological Phylogenetic Analyses. Mol. Biol. Evol.:msw260.
873 Lezzar K.E., Tiercelin J.J., Batist M., Cohen A.S., Bandora T., Rensbergen P., Turdu C.,
874 Mifundu W., Klerkx J. 1996. New seismic stratigraphy and Late Tertiary history of the
875 North Tanganyika Basin, East African Rift system, deduced from multichannel and
876 high-resolution reflection seismic data and piston core evidence. Basin Res. 8:1–28.
877 Linkem C.W., Hesed K.M., Diesmos A.C., Brown R.M. 2010. Species boundaries and
878 cryptic lineage diversity in a Philippine forest skink complex (Reptilia; Squamata;
879 Scincidae: Lygosominae). Mol. Phylogenet. Evol. 56:572–585.
880 Lyons R.P., Scholz C.A., Cohen A.S., King J.W., Brown E.T., Ivory S.J., Johnson T.C.,
881 Deino A.L., Reinthal P.N., Mcglue M.M., Blome M.W. 2015. Continuous 1.3-million-
882 year record of East African hydroclimate, and implications for patterns of evolution and
883 biodiversity. PNAS.:2–7.
884 Matschiner M., Hanel R., Salzburger W. 2011. On the origin and trigger of the notothenioid
885 adaptive radiation. PLoS One. 6.
886 Matschiner M., Musilová Z., Barth J.M.I., Starostova Z., Salzburger W., Steel M., Bouckaert
887 R. 2016. Bayesian Phylogenetic Estimation of Clade Ages Supports Trans-Atlantic
888 Dispersal of Cichlid Fishes. Syst. Biol.
889 McGlue M.M., Lezzar K.E., Cohen A.S., Russell J.M., Tiercelin J.-J., Felton A. a., Mbede E.,
890 Nkotagu H.H. 2008. Seismic records of late Pleistocene aridity in Lake Tanganyika,
891 tropical East Africa. J. Paleolimnol. 40:635–653.
892 Meyer B.S., Matschiner M., Salzburger W. 2015. A tribal level phylogeny of Lake
893 Tanganyika cichlid fishes based on a genomic multi-marker approach. Mol. Phylogenet.
894 Evol. 83:56–71.
895 Meyer B.S., Matschiner M., Salzburger W. 2016. Disentangling incomplete lineage sorting
896 and introgression to refine species-tree estimates for Lake Tanganyika cichlid fishes.
897 Syst. Biol. syw069:1–20.
898 Muschick M., Indermaur A., Salzburger W. 2012. Convergent Evolution within an Adaptive
899 Radiation of Cichlid Fishes. Curr. Biol.:1–7.
900 Nagai H., Terai Y., Sugawara T., Imai H., Nishihara H., Hori M., Okada N. 2011. Reverse
901 evolution in RH1 for adaptation of cichlids to water depth in Lake Tanganyika. Mol.
902 Biol. Evol. 28:1769–1776.
903 Nee S., May R.M., Harvey P.H. 1994. The reconstructed evolutionary process. Philos. Trans.
904 R. Soc. Lond. B. Biol. Sci. 344:305–11.
905 Nevado B., Koblmüller S., Sturmbauer C., Snoeks J., Usano-Alemany J., Verheyen E. 2009.
906 Complete mitochondrial DNA replacement in a Lake Tanganyika cichlid fish. Mol.
907 Ecol. 18:4240–4255.
908 Nevado B., Mautner S., Sturmbauer C., Verheyen E. 2013. Water-level fluctuations and
909 metapopulation dynamics as drivers of genetic diversity in populations of three
910 Tanganyikan cichlid fish species. Mol. Ecol. 22:3933–48.
911 O’Quin K.E., Hofmann C.M., Hofmann H.A., Carleton K.L. 2010. Parallel Evolution of
912 opsin gene expression in African cichlid fishes. Mol. Biol. Evol. 27:2839–2854.
913 Oaks J.R., Sukumaran J., Esselstyn J. a, Linkem C.W., Siler C.D., Holder M.T., Brown R.M.
914 2013. Evidence for climate-driven diversification? A caution for interpreting ABC
915 inferences of simultaneous historical events. Evolution (N. Y). 67:991–1010.
916 Papadopoulou A., Knowles L.L. 2015. Genomic tests of the species-pump hypothesis: Recent
917 island connectivity cycles drive population divergence but not speciation in Caribbean
918 crickets across the Virgin Islands. Evolution (N. Y). 69:1501–1517.
919 Pearse W.D., Purvis A. 2013. phyloGenerator: An automated phylogeny generation tool for
920 ecologists. Methods Ecol. Evol. 4:692–698.
921 Pybus O., Harvey P. 2000. Testing macro–evolutionary models using incomplete molecular
922 phylogenies. Proc. R. Soc. B Biol. Sci. 267:2267–72.
923 Rico C., Turner G.F. 2002. Extreme microallopatric divergence in a cichlid species from
924 Lake Malawi. Mol. Ecol. 11:1585–90.
925 Rossiter A. 1995. The cichlid fish assemblages of Lake Tanganyika: ecology, behaviour and
926 evolution of its species flocks. Adv. Ecol. Res.
927 Salzburger W., Baric S., Sturmbauer C. 2002. Speciation via introgressive hybridization in
928 East African cichlids? Mol. Ecol. 11:619–25.
929 Santos M.E., Braasch I., Boileau N., Meyer B.S., Sauteur L., Böhne A., Belting H.-G.,
930 Affolter M., Salzburger W. 2014. The evolution of cichlid fish egg-spots is linked with a
931 cis-regulatory change. Nat. Commun. 5:5149.
932 Schelly R., Salzburger W., Koblmüller S., Duftner N., Sturmbauer C. 2006. Phylogenetic
933 relationships of the lamprologine cichlid genus Lepidiolamprologus (Teleostei:
934 Perciformes) based on mitochondrial and nuclear sequences, suggesting introgressive
935 hybridization. Mol. Phylogenet. Evol. 38:426–438.
936 Schelly R., Takahashi T., Bills R., Hori M. 2007. The first case of aggressive mimicry among
937 lamprologines in a new species of Lepidiolamprologus (Perciformes: Cichlidae) from
938 Lake Tanganyika. Zootaxa. 49:39–49.
939 Schwarzer J., Misof B., Tautz D., Schliewen U.K. 2009. The root of the East African cichlid
940 radiations. BMC Evol. Biol. 9:186.
941 Schweiger O., Klotz S., Durka W., Kühn I. 2008. A comparative test of phylogenetic
942 diversity indices. Oecologia. 157:485–95.
943 Sedano R.E., Burns K.J. 2010. Are the Northern Andes a species pump for Neotropical birds?
944 Phylogenetics and biogeography of a clade of Neotropical tanagers (Aves: Thraupini). J.
945 Biogeogr. 37:325–343.
946 Seehausen O. 2000. Explosive speciation rates and unusual species richness in
947 haplochromine cichlid fishes: effects of sexual selection. Adv. Ecol. Res. 31:237–274.
948 Seehausen O. 2006. African cichlid fish: a model system in adaptive radiation research. Proc.
949 R. Soc. B Biol. Sci. 273:1987–98.
950 Sefc K.M., Mattersdorfer K., Ziegelbecker A., Neuhüttler N., Steiner O., Goessler W.,
951 Koblmüller S. 2017. Shifting barriers and phenotypic diversification by hybridisation.
952 Ecol. Lett.
953 Shirai K., Inomata N., Mizoiri S., Aibara M., Terai Y., Okada N., Tachida H. 2014. High
954 prevalence of non-synonymous substitutions in mtDNA of cichlid fishes from Lake
955 Victoria. Gene. 552:239–245.
956 Siler C.D., Oaks J.R., Esselstyn J.A., Diesmos A.C., Brown R.M. 2010. Phylogeny and
957 biogeography of Philippine bent-toed geckos (Gekkonidae: Cyrtodactylus) contradict a
958 prevailing model of Pleistocene diversification. Mol. Phylogenet. Evol. 55:699–710.
959 Spady T.C., Seehausen O., Loew E.R., Jordan R.C., Kocher T.D., Carleton K.L. 2005.
960 Adaptive molecular evolution in the opsin genes of rapidly speciating cichlid species.
961 Mol. Biol. Evol. 22:1412–1422.
962 Sparks J.S., Smith W.L. 2004. Phylogeny and biogeography of cichlid fishes (Teleostei:
963 Perciformes: Cichlidae). Cladistics. 20:501–517.
964 Sturmbauer C., Baric S., Salzburger W., Rüber L., Verheyen E. 2001. Lake level fluctuations
965 synchronize genetic divergences of cichlid fishes in African lakes. Mol. Biol. Evol.
966 18:144–54.
967 Sturmbauer C., Salzburger W., Duftner N., Schelly R., Koblmüller S. 2010. Evolutionary
968 history of the Lake Tanganyika cichlid tribe Lamprologini (Teleostei: Perciformes)
969 derived from mitochondrial and nuclear DNA data. Mol. Phylogenet. Evol. 57:266–84.
970 Sugawara T., Terai Y., Okada N. 2002. Natural Selection of the Rhodopsin Gene During the
971 Adaptive Radiation of East African Great Lakes Cichlid Fishes. Mol. Biol. Evol.
972 19:1807–1811.
973 Thorpe R.S., Surget-Groba Y., Johansson H. 2008. The relative importance of ecology and
974 geographic isolation for speciation in anoles. Philos. Trans. R. Soc. Lond. B. Biol. Sci.
975 363:3071–81.
976 Toni T., Stumpf M.P.H. 2010. Simulation-based model selection for dynamical systems in
977 systems and population biology. Bioinformatics. 26:104–110.
978 Toni T., Welch D., Strelkowa N., Ipsen A., Stumpf M.P.H. 2009. Approximate Bayesian
979 computation scheme for parameter inference and model selection in dynamical systems.
980 J. R. Soc. Interface. 6:187–202.
981 Turner G.F., Seehausen O., Knight M.E., Allender C.J., Robinson R. 2001. How many
982 species of cichlid fishes are there in African lakes? Mol. Ecol. 10:793–806.
983 Vaidya G., Lohman D.J., Meier R. 2011. SequenceMatrix: Concatenation software for the
984 fast assembly of multi-gene datasets with character set and codon information.
985 Cladistics. 27:171–180.
986 Verheyen E., Rüber L., Snoeks J., Meyer A. 1996. Mitochondrial phylogeography of rock-
987 dwelling cichlid fishes reveals evolutionary influence of historical lake level fluctuations
988 of Lake Tanganyika, Africa. Philos. Trans. R. Soc. Lond. B. Biol. Sci. 351:797–805.
989 Wagner C.E., Harmon L.J., Seehausen O. 2012. Ecological opportunity and sexual selection
990 together predict adaptive radiation. Nature. 487:366–9.
991 Wagner C.E., Harmon L.J., Seehausen O. 2014. Cichlid species-area relationships are shaped
992 by adaptive radiations that scale with area. Ecol. Lett. 17:583–592.
993 Wagner C.E., McIntyre P.B., Buels K.S., Gilbert D.M., Michel E. 2009. Diet predicts
994 intestine length in Lake Tanganyika’s cichlid fishes. Funct. Ecol. 23:1122–1131.
995 Weir J.T. 2006. Divergent Timing and Patterns of Species Accumulation in Lowland and
996 Highland Neotropical Birds. Evolution (N. Y). 60:842–855.
997 Weiss J.D., Cotterill F.P.D., Schliewen U.K. 2015. Lake Tanganyika - A ’Melting Pot’ of
998 Ancient and Young Cichlid Lineages (Teleostei: Cichlidae)? :1–29.
999 Winkelmann K., Rüber L., Genner M.J. 2016. Lake level fluctuations and divergence of
1000 cichlid fish ecomorphs in Lake Tanganyika. Hydrobiologia.:1–14.
1001
1002