A Bayesian Approach for Evaluating the Impact of Historical Events on Rates of Diversification
Total Page:16
File Type:pdf, Size:1020Kb
A Bayesian approach for evaluating the impact of historical events on rates of diversification Brian R. Moorea,1 and Michael J. Donoghueb,1 aDepartment of Integrative Biology, University of California, Berkeley, CA 94720; and bDepartment of Ecology and Evolutionary Biology, Yale University, New Haven, CT 06520 Contributed by Michael J. Donoghue, November 29, 2008 (sent for review June 22, 2008) Evolutionary biologists often wish to explore the impact of a rates that relies on cross-validation predictive densities, which particular historical event (e.g., the origin of a novel morphological effectively asks ‘‘How diverse would the effected lineage be if the trait, an episode of biogeographic dispersal, or the onset of an inferred event had no impact on rates of diversification?’’ The ecological association) on rates of diversification (speciation minus cross-validation predictive densities approach provides a versatile extinction). We describe a Bayesian approach for evaluating the inference framework (under both hypothesis-testing and data- correlation between such events and differential rates of diversi- exploration scenarios) for investigating the influence of diverse fication that relies on cross-validation predictive densities. This types of historical events (whether unique or replicated), while approach exploits estimates of the marginal posterior probability simultaneously accommodating various sources of uncertainty. We for the rate of diversification (in the unaffected part of the tree) illustrate this approach with two empirical analyses, first exploring and the marginal probability for the timing of the event to the impact of fruit evolution on rates of diversification in the plant generate a predictive distribution of species diversity that would group Adoxaceae and then investigating the influence of biogeo- be expected had the event not occurred. The realized species graphic dispersal into South America on rates of diversification in diversity can then be compared to this predictive diversity distri- lupines. bution to assess whether rates of diversification associated with the event are significantly higher or lower than expected. Al- Cross-Validation Predictive Diversity Densities Cross-validation pre- though simple, this Bayesian approach provides a robust inference dictive densities (4, 5) are related to the more familiar technique framework that accommodates various sources of uncertainty, known as posterior predictive densities simulation, which has been including error associated with estimates of divergence times, applied to a number of problems in evolutionary biology, such as diversification-rate parameters, and event history. Furthermore, detecting positive selection on amino acid sites (6), evaluating the the proposed approach is relatively flexible, allowing exploration adequacy of nucleotide substitution models (7), and mapping of various types of events (including changes in discrete morpho- mutations (8) and traits on phylogenies (9). Both are sampling- logical traits, episodes of biogeographic movement, etc.) under based approaches that provide a means of evaluating the adequacy both hypothesis-testing and data-exploration inference scenarios. of a given model. Posterior predictive simulation involves drawing Importantly, the cross-validation predictive densities approach model parameter values from their respective marginal posterior EVOLUTION facilitates evaluation of both replicated and unique historical probability distributions (previously estimated from the original events. We demonstrate this approach with empirical examples data under the candidate model) to generate a distribution of concerning the impact of morphological and biogeographic events ‘‘future’’ (predictive) observations. If the model provides an ade- on rates of diversification in Adoxaceae and Lupinus, respectively. quate description of the original data, relevant aspects of the predictive and realized observations should be similar. Adoxaceae ͉ key innovations ͉ Lupinus ͉ speciation ͉ extinction Cross-validation predictive density simulation is similar, but includes the additional step of parsing the data into two comple- mentary subsets, referred to as the ‘‘training’’ and ‘‘testing’’ parti- ocumenting the patterns and understanding the causes of tions (10). This entails estimating the marginal posterior densities variation in diversification rates is a central objective of D for parameters of the model from a subset of the data (the training evolutionary biology. Rates of diversification may be influenced partition), which are then sampled to generate a posterior predic- both by the origin of intrinsic traits—‘‘key innovations,’’ such as tive distribution. This predictive distribution is compared to the morphological, behavioral, or physiological novelties—and the complementary subset of the data (the testing partition), and the incidence of extrinsic events—‘‘key opportunities,’’ such as episodes adequacy of the model is then assessed by its ability to predict of biogeographic or climatic change. Accordingly, a comprehensive relevant aspects of the excluded data. Importantly, by integrating understanding of the causes of differential diversification requires over their respective marginal posterior probability densities, this the ability to explore the impact of a diverse array of both intrinsic approach accommodates uncertainty associated with those param- and extrinsic factors. eter estimates. Several recent phylogeny-based methods have greatly en- More formally, suppose that X is a set of observations { xi; i ϭ hanced our ability to test key innovation hypotheses regarding 1,2,...,n}. The cross-validation predictive densities compose the the influence of intrinsic factors, principally discrete binary set {f(xi͉X(i)); i ϭ 1,2,...,n}, where X(i) denotes all elements except traits, on rates of diversification (1–3). Despite remarkable x . Conveniently, the density f(x ͉X ) predicts what values of x are progress in this area, we perceive the need to extend the i i (i) i phylogenetic study of diversification-rate correlates in 3 ways: to more fully accommodate inherent sources of uncertainty (asso- Author contributions: B.R.M. and M.J.D. designed research; B.R.M. performed research; ciated with estimated divergence times, diversification-rate pa- B.R.M. analyzed data; and B.R.M. and M.J.D. wrote the paper. rameters, event histories, etc.), to address a wider range of The authors declare no conflict of interest. historical events (associated with episodes of change in mor- Freely available online through the PNAS open access option. phology, biogeography, and ecology), and to expand the funda- 1To whom correspondence may be addressed. E-mail: [email protected] or mental mode of inference (to enable both hypothesis testing and [email protected]. data exploration). This article contains supporting information online at www.pnas.org/cgi/content/full/ With these considerations in mind, we describe a Bayesian 0807230106/DCSupplemental. approach for identifying correlates of differential diversification © 2009 by The National Academy of Sciences of the USA www.pnas.org͞cgi͞doi͞10.1073͞pnas.0807230106 PNAS ͉ March 17, 2009 ͉ vol. 106 ͉ no. 11 ͉ 4307–4312 Downloaded by guest on September 30, 2021 likely when the model is fitted to all of the observations except xi. predicted under the background diversification rate (k), which The actual value xi,obs can then be compared with this predictive therefore suggests that event k is correlated with a significantly density in various ways to evaluate whether xi,obs is likely under the increased rate of diversification. Of course, we can also evaluate model (4). These cross-validation predictive densities are calculated both tails of the cross-validation predictive diversity density if the as hypothesis predicts that the event may be correlated with either significant increases or decreases in diversification rate or if we are adopting an exploratory data analysis (EDA) perspective in which ͑ ͉ ͑ ͒͒ ϭ ͵ ͑ ͉ ͑ ͒͒⅐͉͑ ͑ ͒͒ [1] f xi X i f xi , X i X i d the direction of the effect is unspecified. We have so far implicitly assumed the availability of marginal posterior densities for the parameters required to generate the where the integrand is evaluated over the marginal density of the cross-validation predictive diversity density. Our approach lever- model parameters, . ages dedicated implementations of Bayesian MCMC methods to The application of cross-validation predictive densities to the estimate marginal posterior densities for these parameters. A current problem is straightforward. Suppose that the history of our number of existing programs can be used to approximate the joint study group includes an event k that is inferred to occur along an posterior probability density of phylogeny and absolute or relative internal branch v of the phylogeny at some time in the past t , which k k divergence times (under a strict or a relaxed molecular clock) and, subtends a subclade comprising n species. We may suspect that k importantly, to allow estimation of the marginal posterior densities event k is a key innovation/opportunity (hypothesis-testing sce- of the diversification-rate parameters [i.e., and/or , depending on nario), or we may wish to investigate whether this event coincides whether a birth–death or a Yule prior is used to model the with a shift in diversification rate (data-exploration scenario). branching process, respectively (14–17)]. Similarly, estimation of