A Variable-Rate Quantitative Trait Evolution Model Using
Total Page:16
File Type:pdf, Size:1020Kb
bioRxiv preprint doi: https://doi.org/10.1101/2021.04.17.440282; this version posted August 9, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license. 1 A variable-rate quantitative trait evolution 2 model using penalized-likelihood 1,2 3 Liam J. Revell 1 4 Department of Biology, University of Massachusetts Boston, Boston, Massachusetts 5 02125 2 6 Facultad de Ciencias, Universidad Catolica´ de la Sant´ısimaConcepcion,´ Concepcion,´ 7 Chile 8 Corresponding author: 1;2 9 Liam J. Revell 10 Email address: [email protected] 11 ABSTRACT 12 In recent years it has become increasingly popular to use phylogenetic comparative methods to investigate 13 heterogeneity in the rate or process of quantitative trait evolution across the branches or clades of a 14 phylogenetic tree. Here, I present a new method for modeling variability in the rate of evolution of a 15 continuously-valued character trait on a reconstructed phylogeny. The underlying model of evolution is 2 16 stochastic diffusion (Brownian motion), but in which the instantaneous diffusion rate (s ) also evolves 17 by Brownian motion on a logarithmic scale. Unfortunately, it’s not possible to simultaneously estimate 2 18 the rates of evolution along each edge of the tree and the rate of evolution of s itself using Maximum 19 Likelihood. As such, I propose a penalized-likelihood method in which the penalty term is equal to the 20 log-transformed probability density of the rates under a Brownian model, multiplied by a ‘smoothing’ 21 coefficient, l, selected by the user. l determines the magnitude of penalty that’s applied to rate variation 22 between edges. Lower values of l penalize rate variation relatively little; whereas larger l values result 23 in minimal rate variation among edges of the tree in the fitted model, eventually converging on a single 2 24 value of s for all of the branches of the tree. In addition to presenting this model here, I have also 25 implemented it as part of my phytools R package in the function multirateBM. Using different values of the 26 penalty coefficient, l, I fit the model to simulated data with: Brownian rate variation among edges (the 27 model assumption); uncorrelated rate variation; rate changes that occur in discrete places on the tree; 28 and no rate variation at all among the branches of the phylogeny. I then compare the estimated values 2 29 of s to their known true values. In addition, I use the method to analyze a simple empirical dataset of 30 body mass evolution in mammals. Finally, I discuss the relationship between the method of this article 31 and other models from the phylogenetic comparative methods and finance literature, as well as some 32 applications and limitations of the approach. 33 INTRODUCTION 34 As the quantity and quality of data for phylogenetic inference multiplies rapidly in the current genomic 35 age (Philippe et al., 2005), so too has grown the popularity of using estimated phylogenetic trees to make 36 inferences about the history of life on Earth (Harvey, 1996; Nunn, 2011; O'Meara, 2012). This endeavor, 37 usually referred to as phylogenetic comparative biology, involves taking an evolutionary tree or set of trees 38 obtained from phylogenetic inference, often combined with phenotypic trait data for the species in the 39 tree, and then making some kind of quantitative inference about evolution (Garamszegi, 2014; Harmon, 40 2019). A wide variety of different methodologies now comprise the field of phylogenetic comparative 41 biology. These are often collectively referred to as ‘phylogenetic comparative methods’ (Felsenstein, 42 1985; Nunn, 2011; O'Meara, 2012; Garamszegi, 2014; Harmon, 2019). 43 A significant advance that’s occurred over about the past 15 years has been the development and 44 implementation in software of statistical methodologies that allow the user to model, or explicitly take 45 into consideration, heterogeneity in the tempo and mode of evolution across the branches or clades of 46 our evolutionary tree (e.g., Butler and King, 2004; summarized in Harmon, 2019). For instance, in a 47 now classic paper, O'Meara et al. (2006; also see Thomas et al., 2006) published an innovative model bioRxiv preprint doi: https://doi.org/10.1101/2021.04.17.440282; this version posted August 9, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license. 2 48 in which the rate of evolutionary change for a trait (normally denominated s in this type of study) 49 was permitted to differ between different parts of the phylogeny, such as specific branches or clades, 50 as determined a priori by the investigator. In the years since the publication of that seminal paper, a 51 wide variety of different related approaches have been developed and implemented in software, such as 52 methods that permit the process or mode of evolution, and not just its tempo, to differ in different parts of 53 a reconstructed phylogeny (e.g., Revell and Harmon, 2008; Revell and Collar, 2009; Revell et al., 2012; 54 Beaulieu et al., 2012; Uyeda and Harmon, 2014; Caetano and Harmon, 2017). 55 The predominant model that’s been used to study quantitative trait evolution on phylogenetic trees 56 is called Brownian motion (Felsenstein, 1985; O'Meara et al., 2006; Revell et al., 2008; Harmon, 2019). 57 Brownian motion is a model of stochastic diffusion in which the trait changes randomly and continuously 58 through time. The expected value of the trait is constant and equal to the starting value, x0; and the 59 variance (either between two or more lineages diverging by the process, or between the current state 2 60 of a lineage and its ancestor) increases linearly with a rate of s (O'Meara et al., 2006; Harmon, 2019). 2 61 Because s is the rate of increase in variance with time under our evolutionary process, our estimate of 2 62 s is typically referred to as the rate of evolution of the trait (O'Meara et al., 2006). Brownian motion is 63 described in much greater detail in other articles and books, such as Felsenstein (2004), O'Meara et al. 64 (2006), Revell et al. (2008), and Harmon (2019). 65 Herein, I propose a new method for modeling variation in the rate of character evolution across the 66 branches and clades of a phylogeny. Under this model, we assume that Brownian evolution has a different 2 67 value (si ) on each of the branches of the tree, and that the log-values of these rates, in turn, have evolved 68 via a separate Brownian process, also called geometric Brownian motion, theoretically with a separate 2 69 rate, sBM – although I’ll describe later why it may not be possible to estimate this rate. I will show how to 70 fit this model to data using a penalized-likelihood approach (Sanderson, 2002) in which the penalty term 2 71 is proportional to the logarithm of the probability of the log rates (log(si )) across edges under our model. 72 I’ll also apply the model to a classic empirical dataset for mammalian body size evolution from Garland 73 et al. (1992), and discuss both applications and limitations of the approach. I envision the primary use of 74 this method to be in exploratory data analysis – although it may also be possible to employ it in examining 75 specific alternative hypotheses for evolutionary change through time. 76 METHODS AND RESULTS 77 In this section I’ll begin by introducing our model of evolution, then I’ll apply it to several different 78 simulated datasets, and, finally, I’ll use the method to analyze a simple empirical dataset and tree. 79 The model 80 The model of this article is as follows. We assume that the phenotypic trait of interest, x, evolves by a 81 process of Brownian motion (Felsenstein, 1985; O'Meara et al., 2006; Harmon, 2019). Under standard 82 Brownian motion evolution, changes are random and normally distributed, with a mean value of 0 and a 2 2 83 variance of s t, in which s is the instantaneous variance of the Brownian process and t is the elapsed 84 time over which change is presumed to have occurred (Felsenstein, 2004; Harmon, 2019). 85 In this scenario, we expect that the values for our trait at the tips of the phylogeny will have a 86 multivariate normal distribution with an expectation equal to the value of the trait at the start of the 2 87 process, x0, and a variance-covariance matrix given by s C (e.g., O'Meara et al., 2006; Revell and Collar, 88 2009). Here, C is an n × n matrix for n species containing the height above the root of the common 89 ancestor of each pair of terminal taxa, i and j, in the matrix position Ci; j (e.g., Rohlf, 2001; Revell et al., 2 90 2008). s is the rate of evolution, as previously defined. 91 With just data and our phylogeny, we can find Maximum Likelihood estimates of the two parameters 2 92 of the Brownian model, s and x0, simply by maximizing the following expression giving the likelihood 93 (O'Meara et al., 2006). This equation may be familiar to many readers because it’s based on the multivariate 94 normal probability density (O'Meara et al., 2006; Revell and Harmon, 2008).