Early arrival and climatically-linked geographic expansion of New World monkeys from tiny African ancestors
Daniele Silvestro1,2,3, Marcelo F. Tejedor1,4,5, Martha L. Serrano-Serrano2,
Oriane Loiseau2, Victor Rossier2, Jonathan Rolland2,6, Alexander Zizka1,3,
Sebastian Hohna¨ 7, Alexandre Antonelli1,3,8, Nicolas Salamin2
1 Department of Biological and Environmental Sciences, University of Gothenburg, 413 19
Gothenburg, Sweden;
2 Department of Computational Biology, University of Lausanne, 1015 Lausanne, Switzerland;
3 Gothenburg Global Biodiversity Center, Gothenburg, Sweden;
4 Instituto Patagonico de Geologia y Paleontologia (CCT CONICET-CENPAT), Boulevard
Almirante Brown 2915, 9120 Puerto Madryn, Chubut, Argentina;
5 Facultad de Ciencias Naturales, Sede Trelew, Universidad Nacional de la Patagonia ‘San Juan
Bosco’, 9100 Trelew, Chubut, Argentina;
6 Department of Zoology, University of British Columbia, Vancouver, Canada;
7 Division of Evolutionary Biology, Ludwig-Maximilians-Universit¨at,M¨unchen,Germany ;
8 Gothenburg Botanical Garden, Carl Skottsbergs gata 22A, 413 19 Gothenburg, Sweden;
1 Supplementary Text
Simulated data sets
We assessed the accuracy and the efficiency of our Bayesian framework by analyzing
simulated data sets and comparing the estimated rates, trends, and ancestral states with
the true values. For each simulation, we generated a complete phylogenetic tree (extinct
and extant taxa) under a constant rate of birth-death with 100 extant tips (sim.bd.taxa
function with parameters λ = 0.4, and µ = 0.2, in the R package TreePar, (Stadler 2011)).
The number of fossils simulated on the tree was defined by a Poisson distribution with expected number of occurrences (N) equal to the total branch length times the preservation rate (q). We fixed the number of expected fossils N = 20 by setting q to the
ratio between N and the sum of branch lengths. Then, each fossil was placed randomly on
the tree and extinct tips without fossils were pruned out (note that this procedure is
equivalent to that implemented in the FBD original paper (Heath et al. 2014)). Finally, the
fossilized tree was re-scaled to an arbitrary root height of 1.
Phenotypic data were simulated on every tree with the following parameters: rate of
2 evolution (σ ), phenotypic trend (µ0), number of shifts in rate, number of shifts in trend, magnitude of the shifts. We simulated data sets under six evolutionary scenarios:
2 1. constant σ drawn from a gamma distribution Γ(2, 5), and fixed µ0 = 0
2 2. fixed σ = 0.1, and constant µ0 drawn from a normal distribution (2, 0.5) N
2 3. baseline σr = 0.1, fixed µ0 = 0, and one rate shift in a randomly selected clade so that σ2 = m σ2, where m (8, 16). i × r ∼ U
2 2 4. baseline σ = 0.1, fixed µ0 = 0, and two shifts in σ drawn from an uniform distribution representing a change with magnitude between 8 and 16 fold
1 2 5. fixed σ = 0.1, fixed initial µ0 = 0, and one shift in µ0 drawn from a normal distribution (2r, 0.5), where r was randomly set to -1 or 1 (simulating negative or positive trends, N respectively).
2 6. fixed σ = 0.1, fixed µ0 = 0, and two shifts in µ0 each one drawn from a normal distribution (2r, 0.5), where r was randomly set to -1 or 1 (simulating negative or N positive trends, respectively).
2 For the last two scenarios the location of shifts (in σ or µ0) was randomly selected by choosing clades that contained between 25 to 50 tips. The branches within the chosen
clades were assigned with the new parameter value before performing the phenotypic data
simulation using a mapped tree (make.era.map function in R package phytools (Revell
2012)). We simulated 100 data sets under each of the six scenarios.
Analysis of simulated data
We analyzed each simulated dataset to estimate the rate and trend parameters of
2 the BM model (σ and µ0) and the ancestral states. Each dataset was run for 500,000 MCMC generations, sampling every 500 steps. We summarized the results in different
ways. First, for each simulation we graphically inspected the results by plotting the
phylogeny with the width of the branches proportional to the true and estimated rates. We
2 plotted the true versus estimated σ , and µ0 for each branch on the tree (Fig. 2, main text). The true and estimated ancestral states are compared in a phenogram plot (Revell 2012).
Secondly, we numerically quantified the overall accuracy of the parameter estimates
across all simulations using different summary statistics for each set of parameters. For the
BM rate parameters (σ2) we calculated the mean absolute percentage error (MAPE),
defined as: N ˆ2 2 ! 2 1 X σi σi MAP Ej(σ ) = | − | (1) N σ2 i=1 i
2 ˆ2 2 where j is the simulation number, σi is the estimated rate at branch i, σi is the true rate at branch i, and N is the number of branches in the tree. Because the trend parameter can take both negative and positive values, we used the mean absolute error (MAE) to quantify the accuracy of its estimates:
N 1 X MAE (µ ) = µˆi µi (2) j 0 N 0 0 i=1 | − |
ˆi i where µ0 is the estimated trend at branch i and µ0 is the true trend at branch i. We quantified the accuracy of the ancestral state estimates in terms of coefficient of determination (R2) between the true and the estimated values. These summary statistics were computed for each simulation scenario (across 100 replicates) and are provided in
Figs. 2 (main text) and S1.
Finally, we assessed the ability of the BDMCMC algorithm to identify the correct
BM model of evolution in terms of number of shifts in rate and trend parameters. We calculated the mean probability estimated for models with different number of rate shifts
(Kσ2 ranging from 0 to 4) and shifts in trends (Kµ0 ranging from 0 to 4). Note that K = 0 indicates a model with constant rate and/or trend parameter across branches. The estimated posterior probability of a given number of rate shifts was obtained from the frequency at which that model was sampled during the MCMC (Stephens 2000). We averaged these probabilities across 100 simulations under each scenario. We additionally calculated the percentage of simulations in which each model was selected as the best
fitting model (i.e., it was sampled most frequently). These summary statistics are provided in the Table S8.
3 Robustness under incomplete sampling
We assessed the effect of incomplete sampling on the accuracy of trait evolution
estimates in two ways. First we ran 100 additional simulations for scenarios (1) and (2)
with a lower number of fossils (N= 5, 1, and 0) adjusting the q parameter. For simulations
with one fossil, we simulated fossils under a Poisson process with N = 20 and kept only the oldest occurrence in the trait analysis. These simulations show the effects of low preservation rates (i.e. poor fossil record).
Second, we assessed the effects of missing extant taxa. We designed these simulations to reflect the level of taxon sampling and the size of the empirical phylogeny of platyrrhines analyzed in this study. We therefore simulated complete phylogenies of 200 extant species and removed randomly 56% of them from the tree prior to the analysis. The mean number of fossils was kept to 20, which is actually fewer than the 34 fossils included in our empirical analysis. The traits were simulated under scenarios (1) and (2). We simulated 100 data sets and analyzed them under the same settings used for the other simulations.)
Performance tests
We compared the performance of our implementation using Gibbs sampling for ancestral states with that of an implementation using a Metropolis-Hasting algorithm to update the ancestral states, i.e. the method used in other software such as Geiger (Slater et al. 2012; Slater and Harmon 2013; Pennell et al. 2014) and phytools (Revell 2012) (note however that faster alternative implementations are available, e.g. (Ho and Ane 2014)). We implemented the latter option in our code by updating the ancestral states individually using sliding window proposals and acceptance probability based on the posterior ratio.
We ran the two implementations on trees with 50, 100, 500, and 1,000 tips for 100,000
4 MCMC iterations, sampling every 50 iterations and using the true parameter values as
starting values in the MCMC to avoid burnin. We simulated the data using a constant
2 Brownian rate (σ = 0.1) and no trend (µ0 = 0) and ran the analyses by constraining the MCMC to use the simplest Brownian model (i.e. no rate shifts and no trends). We then calculated for each simulation the run time and the effective sample size (ESS) of the posterior and use them to estimate the run time necessary to reach an ESS = 1000. These performance tests (summarized in Fig. S3a), show that the Gibbs sampler achieved the target ESS of 1000 in a much shorter time than the alternative Metropolis-Hastings algorithm. Importantly, the time required to reach ESS = 1000 scaled linearly with tree size using the Gibbs sampler and exponentially using the Metropolis-Hastings.
The joint estimation of model parameters (rates and trends) and ancestral states allow us to compute the likelihood as a product of normal densities (Fig. 1, main text) while avoiding the use of a variance-covariance (VCV) matrix (commonly used in comparative methods (Felsenstein 1988; O’Meara 2012)), which can be very expensive to compute especially for large trees. We ran simulations to assess the performance of the two approaches to calculate the likelihood of the data (product of normal densities against standard calculation using the vcv matrix). We simulated trees of 50, 100, 500, 1,000, and
5,000 tips under a pure birth process (only extant taxa) and trait data under a simple
2 constant Brownian model (σ = 0.1, µ0 = 0). On each data set we ran 100,000 MCMC iterations and recorded the run time. The results of these simulations are summarized in
Fig. S3b and show that the two approaches to calculate the model likelihood perform similarly for small trees (around 100 tips). However, the time to calculate the likelihood by inverting the VCV scales exponentially with tree size, while our implementation with the product of normal densities scales linearly with tree size. Thus, for instance, our implementation was 1.7 times faster than the alternative with 500 tips and 16.4 times faster with 5,000 tips.
5 Episodic Fossilized-Birth-Death Process
Our species-diversification model is based on the reconstructed process (Nee et al.
1994) and its extension, the fossilized-birth-death process (Stadler 2010; Heath et al. 2014).
Most importantly, we assume that rates —speciation, extinction, and fossilization— are constant within episodes, hence using a Episodic-Fossilized-Birth-Death (EFBD) Process
(see Fig. S10a). An episodic model has previously been applied to the birth-death process without fossilization (Stadler 2011; H¨ohna2015; May et al. 2016) and we will follow the same logic.
Let us consider an example, as shown in Fig. S10, where speciation (λ), extinction
(µ) and fossilization (φ) rates shift at given times s. In the example provided in Fig. S10a,
the rates shift at times = s0 =0, s1 =10, s2 =20, s3 =30 . Furthermore, in our model we S { } assume that the process starts with a single lineage at the time of the origin, tor million years in the past. Then, every living lineage at some time t speciates with rate λ(t), goes extinct with rate µ(t), and leaves a recovered fossil with rate φ(t). Our model is strictly bifurcating, that is, at a speciation event, the lineage is replaced by two daughter lineages
(see Fig. S10b). Also note that in our model fossilization does not require extinction, which means that fossil taxa do not have to be terminal taxa. We do not condition the process on survival because we could also observe a phylogeny without extant taxa.
To derive the probability density of observing a reconstructed fossil tree, we break the branches of the phylogeny into branch segments, where a segment begins at either a speciation, fossilization, or rate-shift event and terminates at either a speciation, fossilization, or rate-shift event. Within a segment there must not be any speciation, fossilization, or rate-shift event of the reconstructed fossil tree, thus, the branch segments connect any two consecutive events along a lineage (see Fig. S10.C). Let us denote the start time of a branch segment by to (i.e., older time) and the end time of a branch
6 segment by ty (i.e., younger time). First, we will define the differential equation for the probability of extinction in a
small interval ∆t as
h i E(t + ∆t) = µ(t)∆t + (1 µ(t)∆t) (1 φ(t)∆t) 1 λ(t)∆tE(t) + λ(t)∆tE(t)2 (3) | {z } | − {z } | − {z } | − {z } | {z } i ii iii iv v
which accounts for the three scenarios that could occur in the interval: (i) the process goes
extinct in the interval; (ii) no extinction and (iii) no fossil is sampled, and (iv) no
speciation event occurs but the process goes extinct later; or (v) a speciation event in that
interval and both daughter lineages eventually going extinct and are not sampled (neither
as a fossil nor as an extant tip). Rearranging the equation, eliminating terms of order ∆t2, subtracting E(t) from E(t + ∆t) and dividing by ∆t leads to the differential equation
dE = µ(t) E(t)(λ(t) + µ(t) + φ(t)) + λE(t)2 . (4) dt −
Assuming that rates are constant within an episode —our definition of the episodic
fossilized-birth-death process— and that si+1 > t > si we obtain the solution of the differential equation as
−A (t−s ) 1+Bi−e i i (1−Bi) λi + µi + φi Ai −A (t−s ) 1+Bi+e i i (1−Bi) Ei(t) = − (5) 2λi
where
q 2 Ai = λi µi φi + 4 λi φi (6) − − × ×