

1-2016 Phylogenies, the Comparative Method, and the Conflation of Tempo and Mode Antigoni Kaliontzopoulou CIBIO/InBio, Vairão, Portugal

Dean C. Adams Iowa State University, [email protected]


Running head: CONFLATION OF TEMPO AND MODE

Phylogenies, the Comparative Method, and the Conflation of Tempo and Mode

ANTIGONI KALIONTZOPOULOU*¹,² AND DEAN C. ADAMS²,³

¹ CIBIO/InBio, Centro de Investigação em Biodiversidade e Recursos Genéticos, Campus Agrario de Vairão, 4485-661 Vairão, Portugal
² Department of Ecology, Evolution, and Organismal Biology, Iowa State University, Ames, Iowa 50011, USA
³ Department of Statistics, Iowa State University, Ames, Iowa 50011, USA

*Correspondence to be sent to: CIBIO/InBio, Centro de Investigação em Biodiversidade e Recursos Genéticos, Campus Agrario de Vairão, 4485-661 Vairão, Portugal; Email: [email protected]

This is a pre-copyedited, author-produced version of an article accepted for publication in Systematic Biology following peer review. The version of record Antigoni Kaliontzopoulou, Dean C. Adams; Phylogenies, the Comparative Method, and the Conflation of Tempo and Mode. Syst Biol 2016; 65 (1): 1-15 is available online at doi: 10.1093/sysbio/syv079

Abstract

The comparison of mathematical models that represent alternative hypotheses about the tempo and mode of evolutionary change is a common approach for assessing the evolutionary processes underlying phenotypic diversification. However, because model parameters are estimated simultaneously, they are inextricably linked, such that changes in tempo, the pace of evolution, and mode, the manner in which evolution occurs, may be difficult to assess separately. This may potentially complicate biological interpretation, but the extent to which this occurs has not yet been determined. In this study, we examined 160 phylogeny × trait empirical datasets, and conducted extensive numerical phylogenetic simulations, to investigate the efficacy of phylogenetic comparative methods to distinguish between models that represent different evolutionary processes in a phylogenetic context. We observed that, in some circumstances, a high uncertainty exists when attempting to distinguish between alternative evolutionary scenarios underlying phenotypic variation. When examining datasets simulated under known conditions, we found that evolutionary inference is straightforward when phenotypic patterns are generated by simple evolutionary processes that are represented by modifying a single model parameter at a time. However, inferring the exact nature of the evolutionary process that has yielded phenotypic variation when facing complex, potentially more realistic, mechanisms is more problematic. A detailed investigation of the influence of different model parameters showed that changes in evolutionary rates, marked changes in phylogenetic means, or the existence of a strong selective pull on the data, are all readily recovered by phenotypic model comparison. However, under evolutionary processes with a milder restraining pull acting on trait values, alternative models representing very different evolutionary processes may exhibit similar goodness-of-fit to the data, potentially leading to the conflation of interpretations that emphasize tempo and mode during empirical evolutionary inference. This is a mathematical and conceptual property of the considered models that, while not prohibitive for studying phenotypic evolution, should be taken into account and addressed when appropriate.

Keywords: phylogeny, comparative method, tempo, mode, phenotypic evolution, model fit

The phylogenetic comparative method, where trait values are examined in light of the phylogeny of the group to infer the evolutionary processes that have shaped phenotypic diversity, is a major framework in evolutionary biology (Harvey and Pagel 1991). In recent years, remarkable advances have been made by the development of new tools for investigating macroevolutionary phenotypic patterns and testing hypotheses about the biological mechanisms that shape them. Rooted in the approaches of phylogenetic independent contrasts (Felsenstein 1985, 1988) and phylogenetic generalized least squares (PGLS: Grafen 1989; Rohlf 2001), numerous methods have been developed to investigate how phenotypes diversify over evolutionary time. Testing for diversifying selection and adaptation (Butler and King 2004) or for adaptive radiation (Harvey and Rambaut 2000; Glor 2010; Harmon et al. 2010); understanding whether morphological disparity is coupled to cladogenesis (Harmon et al. 2003; Ricklefs 2004; Rabosky and Adams 2012) or species diversification (Bokma 2002; Adams et al. 2009; Rabosky and Adams 2012); identifying phenotypic convergence and parallelism (Harmon et al. 2005; Stayton 2006; Revell et al. 2007; Adams 2010); and examining the correlation among traits through evolutionary history (Martins and Garland 1991; Pagel 1998; Revell and Collar 2009) are only some examples of how the study of phenotypic traits on phylogenies has aided biologists in understanding the processes driving diversification.

Common to all these approaches is the use of mathematical models that aim at approximating the tempo and mode of evolutionary change (Simpson 1944; Fitch and Ayala 1994). These models are rooted in similar methods first developed in paleontology to explore how phenotypes evolve. Researchers in this field have long been concerned with evolutionary tempo and mode, which they study by using data from the fossil record to infer these evolutionary parameters (Gingerich 1976; Gould and Eldredge 1977; Gould 1980; Fitch and Ayala 1994). Paleontological studies were profoundly influenced by the hallmark contribution of George Gaylord Simpson (1944) in which he used the word “tempo” to define the pace at which phenotypic evolution proceeds. Likewise, he defined “mode” as “…the study of the way, manner, or pattern of evolution, a study in which tempo is a basic factor…” (Simpson, 1944). In his definitions, Simpson inextricably linked tempo and mode: the self-contained description of how fast evolutionary change occurs (tempo) was a basic component for describing the way in which these changes are attained (mode). Indeed, a recent investigation of the paleontological methods used to estimate and compare evolutionary rates shows that different rate metrics perform better depending on the mode of evolution (Hunt, 2012). Thus, in paleontological studies, it is clear that tempo and mode are intimately related and often cannot be accurately characterized independently (Hunt, 2012). This observation raises an important question: is this also the case when using phylogenetic comparative approaches to assess phenotypic evolution of extant taxa?

In modern phylogenetic comparative methods, the tempo and mode of evolution are approached through mathematical models that describe extant phenotypic variation given a phylogenetic hypothesis for the group of interest. The first breakthrough towards modeling how continuous phenotypic traits evolve on phylogenies was the introduction of a random-walk model (Brownian Motion, BM; Edwards and Cavalli-Sforza 1964; Felsenstein 1973, 1985, 1988; Harvey and Pagel 1991). Under BM, phenotypic variation accumulates linearly over time and the amount of change in the value of a phenotypic trait (X) over a small time interval (t) can be modeled as:

$$dX(t) = \sigma \, dB(t) \tag{1}$$

In equation (1), dB(t) represents independent, normally distributed, random perturbations and σ² is the evolutionary rate or variance. The maximum likelihood estimator of the evolutionary rate is given by:

$$\hat{\sigma}^{2} = \frac{[X - E(X)]^{T} \, C^{-1} \, [X - E(X)]}{N} \tag{2}$$

where C is the phylogenetic variance-covariance matrix, N is the number of taxa, X is the vector of phenotypic trait values at the tips of the phylogeny and E(X) is the expected value of X, or the phylogenetic mean, corresponding to the value at the root node of the phylogeny under BM (O’Meara et al. 2006). The evolutionary rate σ² is a central parameter of the BM model, as it captures how fast evolution proceeds. As such, it represents Simpson’s idea of evolutionary tempo.

Despite its enormous utility and wide application in evolutionary research, the BM model is sometimes too simple to represent complex evolutionary reality (Butler and King 2004; Beaulieu et al. 2012). Extensions to this model have thus been developed to allow assessing not only how fast, but also how evolution has generated the phenotypic patterns observed in nature. One family of these extended models aims at providing a solution for modeling the tempo of phenotypic evolution more accurately. For instance, the pace of phenotypic evolution may vary across single branches of the phylogeny (McPeek 1995; O’Meara et al. 2006; Revell 2008), between groups of taxa on a phylogeny (Garland 1992; O’Meara et al. 2006; Thomas et al. 2006, 2009; Adams 2014), across evolutionary time (Pagel 1999; Blomberg et al. 2003; Harmon et al. 2010), or among traits (Adams 2013). Such evolutionary hypotheses are tested by fitting models of evolution that encompass more than one evolutionary rate parameter across the phylogeny, and then comparing their fit to a single-rate BM.
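The rate estimator in equation (2) can be illustrated with a short numerical sketch. The code below is not taken from any of the cited packages: the function names, the toy three-taxon covariance matrix and the tip values are hypothetical, and the phylogenetic mean E(X) is assumed to be estimated by generalized least squares as (1ᵀC⁻¹X)/(1ᵀC⁻¹1).

```python
# Sketch of equation (2): ML estimate of the BM rate from tip data.
# All names and numbers below are illustrative assumptions.

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def bm_rate_mle(C, X):
    """Return (phylogenetic mean, sigma^2 estimate) for tip values X."""
    n = len(X)
    cinv_one = solve(C, [1.0] * n)          # C^-1 1
    cinv_x = solve(C, X)                    # C^-1 X
    mean = sum(cinv_x) / sum(cinv_one)      # assumed GLS estimate of E(X)
    resid = [x - mean for x in X]
    cinv_resid = solve(C, resid)            # C^-1 [X - E(X)]
    sigma2 = sum(r * c for r, c in zip(resid, cinv_resid)) / n
    return mean, sigma2

# Hypothetical tree of unit depth: taxa 1 and 2 share 0.6 of their history.
C = [[1.0, 0.6, 0.0],
     [0.6, 1.0, 0.0],
     [0.0, 0.0, 1.0]]
X = [2.0, 2.4, 1.0]
mean, sigma2 = bm_rate_mle(C, X)
print(mean, sigma2)
```

When C is the identity matrix (a star phylogeny of unit depth), the estimate reduces to the ordinary mean and the population variance of the tip values, which is a convenient sanity check.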

Another family of models includes an additional term, yielding an Ornstein-Uhlenbeck (OU) process, which describes an evolutionary ‘pull’ of trait mean value towards one or more optima through time:

$$dX(t) = \sigma \, dB(t) + \alpha \, [\beta - X(t)] \, dt \tag{3}$$

The first term of equation (3) corresponds to the random walk component, while the second term represents the strength of selection (α) towards a phenotypic optimum (β) (Butler and King 2004; Beaulieu et al. 2012). Notice that here we follow the notation of Beaulieu et al. (2012) and represent phenotypic optima as β, to avoid confusion with the notation θ, sometimes used for the relative rate parameter (e.g. Thomas et al. 2006; 2009). From the above mathematical formulation, the first term of equation (3) is governed by the evolutionary rate parameter σ. The second term represents a change in mean trait value, occurring towards an optimal state β at a pace proportional to α (Butler and King 2004). By varying the terms α and β of equation (3), one can represent evolutionary changes that vary in strength and direction, respectively (Butler and King 2004; Beaulieu et al. 2012). For α=0, equation (3) collapses back to a BM process. Variation in the relative influence of σ and α would then yield models that represent evolutionary processes that lie closer or further away from the simple BM model. In contrast to the first family of models, though, which focus on modifications of the speed by which evolution proceeds, these models represent a shift from a random-walk (BM) to an evolutionary process that also encompasses changes in trait mean value.

Recently, more complex models have been developed in an attempt to characterize the biological mechanisms underlying phenotypic evolution more accurately.
For instance, this can be done either by allowing all of σ², α and β in equation (3) to vary (Beaulieu et al. 2012); or by incorporating different phylogenetic means for different parts of the tree in the calculation of σ² (O’Meara et al. 2006; Thomas et al. 2006, 2009). In each case, model parameters are simultaneously estimated, typically in concert with maximizing the corresponding likelihood equation (but see also e.g. Revell et al. 2011; Eastman et al. 2011; Revell and Reynolds 2012 for Bayesian implementations). Some of these parameters contribute to modeling trait variance across taxa through a mean value (i.e. phylogenetic means E(X), optimal trait values β), while others model residual variance (i.e. evolutionary rates σ², strength of selection α). Alternative models are then compared by evaluating their fit to the data given the underlying phylogeny through likelihood comparison methods (e.g. likelihood ratio tests, information theoretic criteria, or Monte Carlo simulations; Boettiger et al. 2012). Through this procedure, evolutionary biologists attempt to obtain a reliable model of the historical events that underlie current phenotypic variation. As models become more complex, though, inference becomes more complicated. This is because each of the mathematical parameters used to characterize phenotypic evolution in a phylogenetic context is estimated with respect to the other parameters included in the underlying model. Therefore, it is of interest to determine whether changes in model parameters can be readily assessed when using phylogenies to study phenotypic evolution.

In this article we investigate the efficacy of comparative methods to distinguish between phylogenetic comparative models that emphasize changes in different evolutionary parameters. We restrict our study to those cases where evolutionary changes are found across groups on a phylogeny.
These encompass questions about how ecological, biogeographic, historical or other history factors have influenced trait diversification, and they are among the most frequently examined hypotheses in phenotypic evolution. Based on empirical data, we demonstrate that models representing very different evolutionary processes are frequently equally likely to describe the data given a phylogeny. A detailed examination of model parameters using simulations indicates that model performance is more strongly dependent on variations in mode than on variations in tempo. When complex evolutionary processes are considered and differences in means are not prominent, different models receive similar support, potentially leading to the conflation of radically different interpretations during evolutionary inference. This is a property of the considered models that, while not prohibitive for studying phenotypic evolution, should be taken into account and addressed when appropriate.

DATASETS AND MODELS CONSIDERED

In an effort to determine whether different model parameters could be unambiguously assessed using standard phylogenetic comparative techniques, we examined previously published datasets and conducted numerical simulations. Empirical datasets were used to investigate the degree of ambiguity encountered when using phylogenetic comparative models on real biological data. Next, we used simulations to build specific evolutionary scenarios, where the evolutionary process underlying phenotypic variation was considered known. This allowed us to address whether the inferred evolutionary process found by comparing the fit of alternative evolutionary models to the data matched the known process under which the phenotypes were actually generated.

For each phylogeny × trait dataset, empirical or simulated, we fit six evolutionary models, which represent different hypotheses about the process underlying evolutionary change. The simplest model examined, often considered as a null hypothesis, was a simple BM, with a single evolutionary rate across all branches (BM1). In a BM framework, we also examined two other models, which consider variation in evolutionary rates across groups of taxa. The first fits a different evolutionary rate for each group based on a single phylogenetic mean (BMS; i.e. the “noncensored” approach of O’Meara et al. 2006). The second fits a different evolutionary rate for each group, also considering different evolutionary means for each group (BMSG; i.e. the “censored” approach of O’Meara et al. 2006; see also Thomas et al. 2006, 2009). Note that the method proposed by Thomas et al. (2006) corresponds to the censored approach developed by O’Meara et al. (2006), when at least one of the examined groups is monophyletic (see online Appendix 3, doi: 10.5061/dryad.2ss46). In an OU framework, we fit three models: the first with a single rate and a single optimum for all taxa (OU1), the second with a single rate but different optima for each group of taxa (OUM) and the third with different optima and different rates for each group of taxa (OUMV) (Butler and King 2004; Beaulieu et al. 2012). We did not consider variation in the strength of selection (α) across groups, as this parameter has only recently been allowed to vary in evolutionary models and the inference of differences in α has been shown to bear a high uncertainty (Beaulieu et al. 2012; but see also below for the influence of α on model inference). Models were fit using the R packages OUwie (Beaulieu et al. 2012) and motmot (Thomas and Freckleton 2012). Before engaging in any model comparisons, we first used simulations to confirm that the two software packages provide comparable parameter estimates and likelihood scores (online Appendix 4). Models that did not reach convergence were excluded from model comparison procedures (see online Appendix 1 for details).
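To make the difference between the BM and OU families concrete, the sketch below simulates a single lineage under equation (3) with a simple Euler-Maruyama scheme. It is only an illustration under stated assumptions: the function names, step count and parameter values are hypothetical, and this is not how OUwie or motmot simulate or fit data. Setting alpha to 0 recovers the BM random walk of equation (1). The expected-value and half-life helpers use the standard OU results E[X(t)] = β + (X₀ − β)e^(−αt) and t½ = ln(2)/α.

```python
import math
import random

def simulate_ou(x0, sigma, alpha, beta, total_time=1.0, steps=1000, rng=None):
    """Euler-Maruyama endpoint of dX = sigma dB + alpha (beta - X) dt.

    With alpha = 0 this is a plain BM random walk.
    """
    rng = rng or random.Random()
    dt = total_time / steps
    x = x0
    for _ in range(steps):
        drift = alpha * (beta - x) * dt                        # pull towards the optimum
        noise = sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0)    # random-walk component
        x += drift + noise
    return x

def ou_expected_value(x0, alpha, beta, t):
    """Expected trait value after time t under the OU process."""
    return beta + (x0 - beta) * math.exp(-alpha * t)

def half_life(alpha):
    """Time needed to move halfway from the ancestral state to the optimum."""
    return math.log(2.0) / alpha

# Hypothetical values: ancestral state 0, optimum 4, unit tree depth.
rng = random.Random(42)
endpoint = simulate_ou(x0=0.0, sigma=1.0, alpha=1.0, beta=4.0, rng=rng)
print(endpoint, ou_expected_value(0.0, 1.0, 4.0, 1.0), half_life(1.0))
```

With a weak pull (small α relative to the tree depth) the expected displacement towards β is small compared with the random-walk variance, which gives some intuition for why such data can resemble BM output.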
Once all models were computed, we compared their fit to the data using likelihood approaches. This is a strategy with a long history in ecology and evolutionary biology, and several different measures may be used to compare the goodness-of-fit of different models (Burnham and Anderson 2002). One of the most commonly used in recent studies of phenotypic evolution is Akaike's information criterion (AIC). Model comparisons are typically performed by ranking the models based on their AIC. Models that lie less than four AIC units from the model with the lowest value (ΔAIC<4) are usually considered as indistinguishable in terms of their goodness-of-fit to the data (Burnham and Anderson 2002). Here we followed a two-step procedure to evaluate the goodness-of-fit of alternative models. First, for each phylogeny × trait dataset we ranked all models fitted based on their AICc and retained only those with ΔAICc<4. We used AICc, which implements a correction to AIC for finite sample size (Burnham and Anderson 2002). Second, we calculated pairwise ΔAICc values to evaluate similarity in fit between pairs of models lying in the ΔAICc<4 interval. Notice, however, that this approach can be somewhat problematic, because our comparisons involved several pairs of nested models. Under such circumstances, the log-likelihood of the simpler of two models can never exceed that of the more complex one (Hunt 2006). As such, if there is no difference in likelihood between the two alternative models, ΔAIC is driven exclusively by the difference in the number of parameters and it will be e.g. ΔAIC ≈ 2 if the models differ by only one in the number of estimated parameters (Burnham and Anderson 2002: p. 131). In such cases, one or more models may lie in the ΔAICc<4 interval, but if the best model is nested in others it should be preferred, and there is in reality no uncertainty in terms of model selection. Taking this property of AIC into account, we examine comparisons of nested and non-nested pairs of models separately when considering the results obtained in relation to variations of evolutionary model parameters.
EVOLUTIONARY INFERENCE ON EMPIRICAL DATASETS

We compiled a total of 160 phylogeny × trait empirical datasets from 21 published studies that tested for differences between groups in evolutionary tempo and mode across phylogenies. These encompass a wide array of continuous phenotypic traits for several plant and animal taxa, examined on phylogenies of different sizes (Table 1). In all these studies, the authors examined the effect of some biological factor of interest on phenotypic evolution, by comparing tempo and mode attributes of the evolutionary process across biologically identified groups of interest. We used the biological hypotheses examined by the authors in the original studies to allocate taxa into groups for our comparisons. However, to delimit our analyses, we only conducted comparisons of two groups, as an increase in the number of groups can only make inference more complex. In this context, when original studies included evolutionary models with multiple rates, evolutionary means, or adaptive optima, we broke down these hypotheses into pairwise group comparisons and compared one group to the pool of all other groups considered. Furthermore, for all empirical datasets examined, we closely followed preliminary data processing operations (e.g. logarithmic transformations and size-correction of phenotypic traits, tree-length standardizations) as described by the authors.

Results from Empirical Datasets

Pairwise comparison between models within 4 AICc units of the first-ranking model indicated that mean pairwise ΔAICc values are often below the established threshold of 4 units (Fig. 1; online Appendix 1).
This means that pairs of models representing different evolutionary hypotheses are frequently very similar in terms of their fit to the data, and thus the best estimate of the evolutionary process may not be easily identifiable. This lack of statistical distinction between models frequently occurred within the BM and within the OU families of models (Fig. 1). In these cases, the ambiguity in identifying the best model was among models encompassing variations between groups in the rate parameter or among models encompassing variations in the number of hypothesized optima. These sets, however, include models that are nested and therefore this lack of statistical distinctiveness does not hinder evolutionary inference.

However, ΔAICc was also frequently below 4 units when comparing BMS to OU1 or OUM, and BMSG to OUM or OUMV models (Fig. 1). In these cases, the lack of statistical distinctiveness in model fit also translates into evolutionary uncertainty. Indeed, these pairs of models are not nested and they differ in the model parameters they include, representing radically different evolutionary processes. For instance, a model with two rates under BM (BMS) is frequently indistinguishable in terms of fit to the data from an OU1 model where all species evolve towards a single optimum under a single rate. These results indicate that inferring the correct evolutionary model underlying phenotypic diversification between groups may hold a higher uncertainty than is generally appreciated.

SIMULATIONS UNDER DIFFERENT EVOLUTIONARY SCENARIOS

The empirical results above demonstrate that biologists interested in understanding phenotypic evolution may frequently face difficulties when testing hypotheses that encompass the simultaneous modification of several model parameters.
To examine the generality of this observation, we used simulations where a continuous trait was simulated on a phylogeny using a known evolutionary process, and the resulting phenotypic patterns were subsequently evaluated using several evolutionary models that differ in their included parameters. In this way, we could determine the extent to which the inferred evolutionary process found by comparing the fit of alternative evolutionary models to the data and the phylogeny matched the known process under which the phenotype actually evolved.

Briefly, our simulation protocol was as follows. First we simulated a continuous phenotypic trait evolving on 64-taxa, random phylogenies. We began by simulating a single set of 1000 pure-birth phylogenies (λ=1, μ=0) using the pbtree function of the phytools R package (Revell 2012). All phylogenetic trees were scaled to unit total length, and used as the underlying phylogenetic hypothesis for all simulations. We used the OUwie (Beaulieu et al. 2012) and motmot (Thomas and Freckleton 2012) R packages to simulate trait values for two groups under the six evolutionary models described above. Group membership, represented as a binary trait evolving on the phylogeny, was randomized by sampling a single shift in the binary trait in a node relatively deep in the tree, thus yielding at least one monophyletic group. We deliberately avoided examining groups randomly distributed across the phylogeny, because rate estimates in such circumstances are known to be inflated when both means and rates vary across groups, and rate comparison methods may present a high type I error (Thomas et al. 2009). For group distribution simulations, we used the ape (Paradis et al. 2004), geiger (Harmon et al. 2009) and phangorn (Schliep 2011) packages for R (R Development Core Team 2012).
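As a rough illustration of what simulating a trait on a tree involves, tip values under single-rate BM can be drawn from a multivariate normal distribution with mean equal to the root state and covariance σ²C, where C holds the shared branch lengths. The sketch below does this with a hand-written Cholesky factorization. It is an assumption-laden toy, not the procedure implemented in OUwie or motmot: the function names, the three-taxon matrix and the parameter values are all hypothetical.

```python
import math
import random

def cholesky(A):
    """Lower-triangular L with L L^T = A, for a positive-definite matrix A."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(A[i][i] - s)
            else:
                L[i][j] = (A[i][j] - s) / L[j][j]
    return L

def simulate_bm_tips(C, sigma2, root, rng):
    """Draw tip values X ~ N(root, sigma2 * C) for a single-rate BM model."""
    n = len(C)
    L = cholesky([[sigma2 * c for c in row] for row in C])
    z = [rng.gauss(0.0, 1.0) for _ in range(n)]   # independent standard normals
    return [root + sum(L[i][k] * z[k] for k in range(i + 1)) for i in range(n)]

# Hypothetical 3-taxon tree scaled to unit depth.
C = [[1.0, 0.6, 0.0],
     [0.6, 1.0, 0.0],
     [0.0, 0.0, 1.0]]
rng = random.Random(1)
tips = simulate_bm_tips(C, sigma2=1.0, root=0.0, rng=rng)
print(tips)
```

Group-specific rates or means would require a covariance matrix and mean vector built per group, which is beyond this single-rate sketch.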
For the continuous trait, we simulated datasets with or without differences in evolutionary rates (setting σ₁² = 1 and the relative rate parameter σ₂²/σ₁² to either 1 or 6), under both BM and OU; with or without differences in phylogenetic means under BM (setting the difference in means to either 0 or 3 standard deviations of the mean phenotype at the tips; see Thomas et al. 2006, 2009); and with or without differences in optima (β) under OU (setting β₁ = 1 and the relative optima parameter β₂/β₁ to either 1 or 5, which correspond to an absolute difference in optima of 0 or 4, respectively). These parameter settings were chosen to closely match those observed in empirical studies, while maximizing the potential for discriminating between different models (but see below for variation in simulated parameter values). For instance, the tree size used for simulations (64 taxa) approaches that of empirical datasets (mean tree size = 80), while facilitating computation for simulations. Similarly, the mean value of relative rate estimated by the BMS and BMSG models for empirical datasets was approximately σ₂²/σ₁² = 5 (depending on the model fit) and we set this parameter to σ₂²/σ₁² = 6. In OU-based simulations, the “rubber band” parameter (Butler & King, 2004) was set to α = 1, which translates to a moderate phylogenetic half-life of t₁/₂ = ln(2)/α ≈ 0.69. The half-life represents the time it takes for adaptation to a new optimum to become more influential than constraints from the ancestral state and, as such, it substantially influences the dynamics of OU models (Hansen, 1997; see also further on for the effect of variation in α). Combinations of the above parameter settings yield the six models examined for the empirical datasets (i.e. BM1, BMS, BMSG, OU1, OUM and OUMV). We simulated a total of 1000 datasets under each model.

We then fit the same set of six models to all simulated datasets and followed the same procedure as above to filter out models that did not reach convergence. In brief, we first excluded all models that failed to converge (giving package errors and failing to provide a solution), as well as those models in which the estimated α parameter equaled the upper bound of the optimizing algorithm.
Then, for each type of model fitted, we examined the distribution of 50 51 52 310 estimates for each of the model parameters across the 1000 datasets simulated for each 53 54 311 evolutionary process and filtered out models with relative sigma, difference in means (for the 55 56 312 BMSG model) or relative optima (for the OUM and OUMV models) parameter values that were 57 58 59 60 15 http://mc.manuscriptcentral.com/systbiol Systematic Biology Page 16 of 52

outside the 99% quantiles of the parameter distribution. Finally, we conducted comparisons using AICc to gain insight into potential convergence between models in terms of fit to the data.

Simulation Results

In accordance with the pattern observed in empirical datasets, model evaluation of data simulated under known evolutionary processes indicated that, under the simulation conditions used, several of the candidate models could not be efficiently distinguished with respect to their statistical fit to the data. Importantly, while the model used to simulate the data was very often the one with the lowest mean AICc score for that simulation condition, several other evolutionary models were within 4 AICc units of it or exhibited lower AICc values than it (Fig. 2). Since using simulations enables us to access the “true” underlying process, this indicates that alternative evolutionary models might be equally plausible explanations for phenotypic patterns generated under a specific evolutionary model. For the simplest model available (BM1; Fig. 2a), several other models exhibited ΔAICc < 4, but BM1 was globally the best-fit model. However, when model parameters varied between groups, inferential ambiguity increased. For instance, when all taxa evolved towards a single selective optimum (OU1; Fig. 2d), several other models were frequently in the ΔAICc < 4 interval.

Interestingly, we found that variation in evolutionary rates between groups of taxa was generally easy to detect. When data were simulated with θ = σ₂²/σ₁² ≠ 1 (e.g. under the BMS, BMSG and OUMV models), models that represented a process with a single rate across the phylogeny were visibly worse in terms of likelihood, and exhibited very high AICc scores (Fig.

2b, 2c, 2f). Thus, current models used in phylogenetic comparative methods are capable of diagnosing the presence of multiple evolutionary rates on a phylogeny, when evolutionary rate is known to vary across groups of taxa.

While differences in fit to the data may be reliably identified between models that contain distinct evolutionary rates, variation in parameters that describe differences in evolutionary means (i.e., E(X), α, and β in Eq. 2 and 3) was not nearly as straightforward to identify. Specifically, for datasets simulated with multiple rates (Fig. 2b, 2c, 2f), both BM and OU models which allow for variation in rates (e.g. BMS, BMSG and OUMV) showed similar fits to the data, irrespective of the model used to produce phenotypic variation. A similar pattern was observed for data simulated under an OU process with a single evolutionary rate and two optima (OUM; Fig. 2e). In this case, the differentiation in phylogenetic means was easily detected, as models not encompassing this parameter (e.g. BM1, BMS and OU1) generally exhibited high AICc scores. Thus, these results indicate that while model comparison promptly detects that multiple rates or multiple means are necessary for explaining the phenotypic data, it is not conclusive on the mode of evolution that has acted under the simulation conditions used here.

VARIATION IN MODEL PARAMETERS AND GOODNESS OF FIT

The examination of both empirical and simulated datasets suggests that inferring the evolutionary process underlying phenotypic diversity may be challenging.
Most importantly, the results obtained by fitting evolutionary models to data simulated under known evolutionary processes suggest that explanations corresponding to models that modify the way trait variance and trait mean value are modeled are often confounded. This may suggest that similar

phenotypic patterns may emerge by varying evolutionary tempo, mode, or both. Notice, however, that evolutionary inferences are made by comparing models that modify different parameters of Eq. 3 above, where the two parts of this equation are added together to model phenotypic change. In other words, when modeling phenotypic evolution based on present trait values, one considers a first component related to the mean structure of the data (represented by E(X) and β in Eq. 3), and a second component related to the residual variance (determined by σ and α in Eq. 3; see also above). As such, variation in the relative weight of these two pieces may be responsible for the convergence in model fit observed above. To investigate this hypothesis and to pinpoint the circumstances under which sets of models may be confounded, we ran additional simulations where we varied the relative magnitude of model parameters. For this, we focused on those simulation conditions and fitted models that pointed to a potential conflation of evolutionary model parameters. This way we could examine how different models perform when the relative influence of different parameters on phenotypic patterns is known.

Because statistical ambiguity was mainly encountered in models that involved variations in evolutionary rates and phylogenetic means or selective optima (i.e. BMS, BMSG, OUM and OUMV), we first conducted simulations in which we varied these model parameters. Specifically, we simulated 1000 datasets under each of these models, setting relative rate θ to either 3, 4, 5 or 6; difference in means under BMSG to either 1, 2, 4 or 6 standard deviations; and relative optima β₂/β₁ to either 2, 5, 10 or 20.
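As a rough guide to the size of this design, the parameter settings listed above can be enumerated. The sketch assumes that each parameter is varied only in the models that contain it (relative rate in BMS, BMSG and OUMV; difference in means in BMSG; relative optima in OUM and OUMV) and that settings are fully crossed within a model. The text does not state this explicitly, and all variable names are illustrative.

```python
from itertools import product

RELATIVE_RATES = [3, 4, 5, 6]      # theta = sigma2_2 / sigma2_1
MEAN_DIFFS_SD = [1, 2, 4, 6]       # BMSG difference in means, in standard deviations
RELATIVE_OPTIMA = [2, 5, 10, 20]   # beta_2 / beta_1
N_DATASETS = 1000                  # datasets simulated per condition

# Assumed mapping of varied parameters to simulating models (not stated in the text).
DESIGN = {
    "BMS": {"theta": RELATIVE_RATES},
    "BMSG": {"theta": RELATIVE_RATES, "mean_diff_sd": MEAN_DIFFS_SD},
    "OUM": {"rel_optima": RELATIVE_OPTIMA},
    "OUMV": {"theta": RELATIVE_RATES, "rel_optima": RELATIVE_OPTIMA},
}

def conditions(design):
    """Yield (model, settings) for every crossed parameter combination."""
    for model, grid in design.items():
        names = list(grid)
        for values in product(*(grid[name] for name in names)):
            yield model, dict(zip(names, values))

all_conditions = list(conditions(DESIGN))
print(len(all_conditions), "conditions,", len(all_conditions) * N_DATASETS, "datasets")
```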
We then fit the subset of models that exhibited a mean ΔAICc < 4 during full simulation runs (see Fig. 2) to each of the simulated datasets and used AICc among the reduced model set to examine model fit to the data.

Because neither variation in rates, nor differences in phylogenetic means or selective optima, could account for the patterns observed (Fig. 3; see below for details), we focused on the

strength of selection in relation to tree length, as expressed by the parameter α of Eq. 3 and the corresponding phylogenetic half-life (t₁/₂). Together with the evolutionary rate σ, the strength of the restraining force α determines the expected covariance structure of the data at the tips of the phylogeny and as such it has a profound influence on our capacity to distinguish statistically between different evolutionary models. To address this possibility, we simulated 1000 datasets under each of the OU models examined above, but in this case setting the phylogenetic half-life parameter t₁/₂ to 1, 0.67, 0.5, 0.4, or 0.33. Lower values of t₁/₂ correspond to higher values of α, a stronger restraining pull of the data towards one or more optimal trait values, and a more marked distinction in the expected covariance structure of the data from a BM model. All other simulation parameters were kept as described above. We then fit the six examined models to each of the simulated datasets and used AICc to examine model fit to the data.

Variation in evolutionary rates, means, and optima

From these simulations we found that for data simulated under a BM process, with increasing differences in either rates (BMS) or both rates and group means (BMSG), model parameters have little influence on relative model fit, at least within the parameter range examined here (Online Appendix 2). This was not the case for the OU-based simulations. For data simulated with a single rate and increasing relative difference in optima (OUM), the fit of a BMSG and an OUM model was increasingly similar (Fig. 3a).
That is, the higher the difference in optima between groups, the more difficult it is to distinguish between a model of random walk with different rates and means for each group (BMSG) and a model with a single evolutionary rate and different optima for each group (OUM). This suggests that when emphasis is put

towards the mean structure of the data (in this case by increasing differentiation in simulated optima), the residual structures provided by the three models become increasingly similar. Indeed, as the difference in optima (β₂/β₁) increases, the rate parameters (σ) estimated under each of the three models become more similar, such that the instances at which the BMSG and OUM models are indistinguishable are more and more frequent (Online Appendix 5).

For data simulated under OU with two rates and two optima (OUMV), increasing the relative optima parameter (β₂/β₁) eventually caused the BMS model to be visibly inappropriate for explaining the data (Fig. 3b). That is, when differences in selective optima between groups are very marked, a model that does not include any kind of mean differentiation is not sufficient for explaining phenotypic variation. This suggests that the relative fit of different models depends strongly on how they partition phenotypic covariation among taxa into the mean and residual components. Interestingly, an increase in relative optima under OUMV generally resulted in a slightly higher performance for the BMSG model, as compared to the OUMV model (horizontal direction in Fig. 3b). By contrast, an increase in relative evolutionary rate (θ) had the opposite effect, enhancing the fit of OUMV as compared to BMSG (perpendicular direction in Fig. 3b). Putting both sources of variation together, and depending on the relative magnitude of the relative rate (θ) and relative optima (β₂/β₁) parameters, simulations yield patterns of phenotypic variation where either BMSG or OUMV may be a better fit, regardless of the fact that the “real” model underlying the data is in fact OUMV.
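One way to see why a BM-type model can fit OU-generated data this well is to compare the covariance each process is expected to produce between two tips. The sketch below uses the standard expressions for an ultrametric tree with a fixed root state: σ²s under BM, and σ²/(2α) · exp(−2α(T − s)) · (1 − exp(−2αs)) under OU, where s is the branch length shared by the two tips and T is the tree height. The tree height of 1, the shared length of 0.5, the rate of 1 and the function names are assumptions made for illustration only.

```python
import math

def half_life(alpha):
    """Phylogenetic half-life, t_1/2 = ln(2) / alpha."""
    return math.log(2.0) / alpha

def bm_cov(sigma2, shared):
    """Expected covariance between two tips under BM: sigma^2 * shared time."""
    return sigma2 * shared

def ou_cov(sigma2, alpha, shared, height):
    """Expected covariance between two tips under OU with a fixed root state."""
    decay = math.exp(-2.0 * alpha * (height - shared))
    growth = -math.expm1(-2.0 * alpha * shared)  # 1 - exp(-2 * alpha * shared)
    return sigma2 / (2.0 * alpha) * decay * growth

SIGMA2 = 1.0   # assumed evolutionary rate
HEIGHT = 1.0   # assumed tree height
SHARED = 0.5   # assumed shared branch length of the two tips

for alpha in (0.1, 0.5, 1.0, 2.0):
    ratio = ou_cov(SIGMA2, alpha, SHARED, HEIGHT) / bm_cov(SIGMA2, SHARED)
    print(f"alpha={alpha:.1f}  half-life={half_life(alpha):.2f}  OU/BM covariance={ratio:.3f}")
```

As α shrinks (that is, as the half-life grows relative to tree height), the ratio approaches 1, so the two processes imply nearly the same residual covariance.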
Given that both models describe phenotypic variance with a mean structure that includes different means for each group of taxa, this signifies that both models in fact provide similar estimates for the residual components of Eq. 3 and converge towards similar covariance structures for the simulation conditions examined here. This suggests that the simulated selective pull (α in Eq. 3) may not be strong enough for the simulated

data to exhibit a covariance structure that is clearly different from what would occur under a BM evolutionary process.

Variation in phylogenetic half-life

Simulations under varying values of α, which represents the strength of selection, confirmed that this model parameter has a strong influence on the dynamics of OU models. This influence is directly reflected in our capacity to distinguish statistically between models that represent changes in trait mean structure. Generally, lower values of α, which translate into increased phylogenetic half-life values, make distinguishing the evolutionary models examined here more challenging (Fig. 4). By contrast, as the restraining force represented by α increases, BM models generally become less and less likely explanations for the data. Focusing on pairs of models that exhibited similar AICc scores in previous simulations, and which are not nested (i.e. OU1 vs. BMS; OUM vs. BMS; OUM vs. BMSG; and OUMV vs. BMSG), the results obtained here suggest that, in most cases, the real model underlying the data can be identified if the distinction between a BM and an OU process is sufficiently marked through a relatively strong selective influence.

DISCUSSION

Phylogenetic comparative models are a major tool used to investigate interspecific phenotypic patterns and enhance our understanding of the historical processes that have shaped patterns of phenotypic diversity. Our examination of empirical data indicates that in many circumstances uncertainties may emerge when attempting to distinguish between alternative evolutionary processes underlying phenotypic variation. When using simulations to examine

models under known conditions, we found that one can accurately identify some basic characteristics of the evolutionary process (e.g. variation in rates). However, inferring the exact nature of the evolutionary process that has yielded phenotypic variation when using phylogenetic comparative modeling to assess complex, potentially more realistic, mechanisms may be more problematic. In these cases, model inference bears a higher ambiguity, where non-nested candidate models may appear equally plausible for explaining phenotypic variation when their goodness of fit to the data is considered. This has important practical and theoretical implications.

For empirical biologists seeking to explain the patterns of phenotypic variation observed in nature, the results obtained here should serve as a cautionary tale when contrasting the fit of mathematical models on phylogenies. Indeed, by conducting comparisons of alternative models that represent hypotheses about the causes underlying phenotypic evolution across 160 empirical datasets, we found that several pairs of models may frequently receive similar support (Fig. 1). Through simulation experiments we showed that this is not due to some distinctive property of the empirical datasets examined here. Instead, the same tendency was observed when using simulations to produce patterns of phenotypic variation under known evolutionary processes. Thus, a first caution to be taken from these findings is related to the set of candidate models chosen.
Because, as shown here, model comparisons can frequently yield ambiguous results, researchers will ensure stronger evolutionary inference by a priori limiting the set of candidate models based on previous knowledge of the biological system under examination (Burnham and Anderson 2002). Another frequent recommendation has been that, when in doubt (in terms of model selection criteria), simpler evolutionary models should be preferred, as a reduced number of model parameters can be estimated more accurately (Butler and King 2004; Beaulieu et al.

2012; Ho and Ané 2014). Our findings support this recommendation for the case of examining pairs of nested models. In such circumstances, lack of statistical distinctiveness does not translate into an evolutionary ambiguity. Specifically, when nested pairs of models are contrasted, similarity in AIC scores, or a small difference between them, may actually be the result of very similar model parameters (e.g. Hunt 2006; see also Online Appendix 5) and the evolutionary interpretation of the data is straightforward.

In other circumstances, however, this recommendation does not hold. Among the models we compared, those encompassing different evolutionary rates under Brownian Motion (BMS and BMSG) and evolution towards different optima with a single (OUM) or multiple rates (OUMV) were particularly problematic. Indeed, pairs of models of these four types were frequently indistinguishable in terms of fit to the data, rendering the inference of evolutionary tempo and mode ambiguous (Fig. 1, 2). These pairs of models are not nested, and therefore the lack of statistical distinction between them is problematic in terms of evolutionary interpretation. Technically, this is particularly relevant for pairs of non-nested models that also have the same number of parameters. This is the case, for instance, with the BMS–OU1 and BMSG–OUM pairs of models, with three and four parameters respectively (Fig. 3, 4d, 4e). This occurs because these models modify different pieces of Eq. 3, containing either an additional rate parameter or an additional optimum parameter, resulting in the same total number of estimated parameters. In these circumstances the recommendation of choosing the simpler model is not applicable.
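The point about equal parameter counts can be made concrete with the usual small-sample AIC, AICc = −2 ln L + 2k + 2k(k + 1)/(n − k − 1). The sketch below is illustrative only: the log-likelihoods are made-up numbers rather than results from the study, n is taken to be the 64 simulated taxa, the parameter count of two for BM1 is an assumption, and the function names are hypothetical.

```python
def aicc(log_lik, k, n):
    """Small-sample corrected AIC for a model with k parameters and n observations."""
    return -2.0 * log_lik + 2.0 * k + (2.0 * k * (k + 1)) / (n - k - 1)

def delta_aicc(scores):
    """Difference between each model's AICc and the lowest AICc in the set."""
    best = min(scores.values())
    return {model: score - best for model, score in scores.items()}

N_TAXA = 64  # sample size assumed equal to the number of tips

# (log-likelihood, number of parameters); the log-likelihoods are invented for illustration.
fits = {
    "BM1": (-110.0, 2),
    "BMSG": (-101.3, 4),
    "OUM": (-100.9, 4),
}

scores = {model: aicc(ll, k, N_TAXA) for model, (ll, k) in fits.items()}
deltas = delta_aicc(scores)
for model in sorted(deltas, key=deltas.get):
    print(f"{model}: AICc={scores[model]:.2f}  dAICc={deltas[model]:.2f}")
```

With k equal, as for BMSG and OUM here, the penalty terms cancel and the AICc difference reduces to −2 times the difference in log-likelihood, so parsimony cannot break the tie.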
Even with models that do differ in the number of parameters, however, choosing the simpler model is not always straightforward, as the models compared here represent radically different evolutionary processes and would lead to very different biological interpretations. As such, it is important to understand why they may exhibit similar fits to the data. Simulations

under varying model parameters provide some insights in that respect. When phenotypic data were simulated under a diversifying evolutionary process with a single rate (OUM), an increase in simulated relative optima augmented the overlap, in terms of goodness of fit, between the simulating model and a Brownian model with two rates and different phylogenetic means (BMSG; Fig. 3a). This suggests that both models converge by allowing for different mean structures in each group and reach similar parameter estimates for the residual structure (Online Appendix 5). Similarly, for data simulated under a diversifying evolutionary process with different rates for the two groups (OUMV), a Brownian model with different phylogenetic means (BMSG) and a model of evolution towards two optima with two evolutionary rates (OUMV) exhibit similar fits to the data (Fig. 3b). These results, together with the similarity of the model parameters estimated under different models, suggest that the residual structure of OU-simulated data is not sufficiently different from a Brownian process, causing the conflation observed.

Indeed, simulations under varying values of phylogenetic half-life indicate that, when an OU process underlies the data, the strength of selection is critical for accurately assessing alternative models (Fig. 4). For progressively lower values of α relative to tree length, which translate into progressively higher phylogenetic half-life parameters, OU dynamics are not sufficiently different from a BM process, and the parameters estimated under both types of model are essentially the same (Online Appendix 5), resulting in similar expected covariance structures. Importantly, for relatively low values of α (Fig.
4), this may be interpreted as a conflation of tempo and mode, in the sense that a BM model with rate differentiation between groups (BMSG) and an OU model with moderate rate differentiation and a rubber-band component acting on trait mean value (OUMV) could be equally plausible for explaining

phenotypic patterns. In these circumstances, however, the selective pull driving phenotypic evolution is not very strong, and both models converge toward similar estimates, essentially representing the same evolutionary process. Two observations are of relevance here: first, empiricists should be able to judge, based on a good knowledge of their model system, how strong the estimated pull of selection on the examined phenotypes is. Phylogenetic half-life is known to vary extensively across different traits and study organisms (Hansen 2012), such that a close examination of dataset-specific estimated model parameters is necessary for understanding whether different candidate models would actually lead to different evolutionary interpretations. Second, it is important to note that the exact biological significance of the BMSG model remains somewhat obscure. Both O’Meara et al. (2006) and Thomas et al. (2006) have suggested, although in different ways, that this model encompasses a quick shift in mean trait value at the base of each subtree evolving under different rates, yielding the different phylogenetic means from which phenotypic variation is modeled under this type of BM model. This kind of quick shift could be further explored and confirmed by modeling a shift in evolutionary rates specifically on the branch leading to these differentiated phylogenetic means of each rate group (Revell 2008). In terms of covariance structure, however, the dynamics of this model are probably not those typically represented in BM models, which may be contributing to the frequent resemblance of BMSG and OU models.
This effect seems to be quite important for the simulation experiments conducted here under trees with 64 taxa, but it may be alleviated in larger datasets, which would provide more power for distinguishing the fit of different models.

On the other hand, the comparison of models under different evolutionary scenarios conducted here also allows us to track the circumstances under which model selection is a powerful tool for inferring how phenotypic evolution has proceeded. Specifically, when

phenotypic data are generated by an evolutionary process that encompasses variation across groups only in evolutionary rates or means, model selection promptly detects such differentiation. Simulations under a BM process with different rates for each group resulted in a markedly lower performance of models not encompassing rate differences (Fig. 2b). This pattern may be associated with the observation that evolutionary rate parameters are generally easier to estimate accurately (Boettiger et al. 2012; Ho and Ané 2014). Similarly, simulations under an OU process where the two groups were driven towards different optima showed that models that do not consider any type of mean differentiation are a poor fit to the data (Fig. 2e). These results indicate that, when testing hypotheses encompassing changes in single model parameters, the comparison of phylogenetic models is a useful tool for tracing phenotypic evolution. The same is the case when faced with evolutionary processes dominated by a strong selective influence, where phenotypes are promptly driven towards one or more optima. In such circumstances, evolutionary dynamics are determined by a visible dominance of selection (α) over drift (σ) and model comparison efficiently identifies the evolutionary tempo and mode underlying the data.

Another related aspect that has been recently emphasized by other authors (e.g. Beaulieu et al. 2012) and which is further reinforced by the results presented here is that model selection alone is rarely conclusive for answering a scientific question. Models can only provide the best hypothesis for explaining patterns in data, and as such they are only the first step in exploring what biological factors have acted and how (Losos 2011).
In the context of evolutionary inference examining phenotypic traits on phylogenies, a powerful toolkit exists today for testing many different hypotheses, which should be used to obtain multiple lines of evidence for the same question. The evaluation of OU models without a priori defining selective regimes (Ingram and Mahler 2013; Uyeda and Harmon 2014), the delimitation of the parameter space through the

definition of biologically informed priors (Uyeda and Harmon 2014), or the quantification of statistical model adequacy for explaining phenotypic variation (Pennell et al. 2015) are definitely promising steps in this direction. This also brings into focus the need for models that consider potential variation in mode across time, or in single branches of the phylogeny. While temporal variation in evolutionary tempo has been extensively considered (e.g. Pagel 1998; Blomberg et al. 2003; Harmon et al. 2010), implementations of evolutionary models that allow for variations in mode across time have only been introduced very recently (Slater 2013). Such variation would also be potentially relevant for the study of evolutionary tempo. Indeed, variation of the mode of evolution across phylogenetic time is known to entangle the estimation of evolutionary rate parameters in a paleontological framework (Hunt 2012). Given the conceptual similarities between the observations of Hunt (2012) regarding paleontological inference and the conclusions drawn here with respect to using phylogenies and the comparative method to explore evolutionary tempo and mode, we expect the same to occur in phylogenetic comparative models.

Finally, it is important to remark that the conflation of evolutionary tempo and mode observed under certain conditions is, in our view, not only a methodological issue associated with model selection. It is mainly a central element of how we perceive evolutionary processes through both paleontological and modern phylogenetic comparative methods. In the phylogenetic framework, model parameters such as the evolutionary rate σ² and the phylogenetic half-life t₁/₂ are estimated and compared.
Some of them are historically more associated with the words tempo, in the case of evolutionary rates, or mode, in the case of the α parameter of OU models. However, other models, not examined here, consider variations in evolutionary rates to actually test hypotheses about tempo. The Early Burst model of fast phenotypic evolution early during diversification followed by slower evolutionary rates used to test for adaptive

radiation (Harmon et al. 2010), or tests of gradualism vs. punctuated equilibria (Pagel 1999) nicely illustrate how evolutionary tempo and mode are conceptually inseparable. The links between these important evolutionary concepts and the mathematical models used to approach them are not, however, always straightforward. The association of model parameters with the terms “tempo” and “mode” bears some ambiguity, and more work is needed to clarify how phylogenetic comparative models can be adequately used to describe the evolutionary process.

SUPPLEMENTARY MATERIAL

Supplementary material, including simulation files and/or online-only appendices, can be found in the Dryad data repository at http://datadryad.org, doi: 10.5061/dryad.2ss46.

FUNDING

AK was supported by a post-doctoral grant (SFRH/BPD/68493/2010) and an IF investigator position by Fundação para a Ciência e Tecnologia (FCT, Portugal). This work was supported in part by NSF grant DEB-1257287 (to DCA) and by Project “Biodiversity, Ecology and Global Change” co-financed by North Portugal Regional Operational Programme 2007/2013 (ON.2 – O Novo Norte), under the National Strategic Reference Framework (NSRF), through the European Regional Development Fund (ERDF).

ACKNOWLEDGEMENTS

We thank L. Harmon, G. Hunt, F. Anderson, B. O’Meara, G. Thomas and three anonymous reviewers for useful comments on previous versions of the manuscript. G. Thomas kindly provided an updated version of the transformPhylo.sim function of the motmot R package. C. J. Stack provided valuable help with the R package RBrownie. The following people kindly provided data files from empirical studies: T. Barkman, D. Collar, A. Corl, W. Cooper, L. M. Dávalos, E. R. Dumont, E. Edwards, L. Harmon, D. Hulsey, C. Martin, D. Moen, S. Price, P. Raia, G. Slater, D. Swanson, J. Wiens.

REFERENCES

Adams D.C. 2010. Parallel evolution of character displacement driven by competitive selection in terrestrial salamanders. BMC Evol. Biol. 10(72): 1–10.

Adams D.C. 2013. Comparing evolutionary rates for different phenotypic traits on a phylogeny using likelihood. Syst. Biol. 62: 181–192.

Adams D.C. 2014. Quantifying and comparing phylogenetic evolutionary rates for shape and other high-dimensional phenotypic data. Syst. Biol. 63: 166–177.

Adams D.C., Berns C.M., Kozak K.H., Wiens J.J. 2009. Are rates of species diversification correlated with rates of morphological evolution? P. Roy. Soc. B – Biol. Sci. 276: 2729–2738.

Barkman T.J., Bendiksby M., Lim S.H., Salleh K.M., Nais J., Madulid D., Schumacher T. 2008. Accelerated rates of floral evolution at the upper size limit for flowers. Curr. Biol. 18: 1508–1513.

Beaulieu J.M., Jhwueng D.-C., Boettiger C., O’Meara B.C. 2012. Modeling stabilizing selection: expanding the Ornstein-Uhlenbeck model of adaptive evolution. Evolution 66: 2369–2383.

Blomberg S.P., Garland T.Jr., Ives A.R. 2003. Testing for phylogenetic signal in comparative data: behavioral traits are more labile. Evolution 57: 717–745.

Boettiger C., Coop G., Ralph P. 2012. Is your phylogeny informative? Measuring the power of comparative methods. Evolution 66: 2240–2251.

Bokma F. 2002. Detection of punctuated equilibrium from molecular phylogenies. J. Evol. Biol. 15: 1048–1056.

Burnham K.P., Anderson D.R. 2002. Model selection and multimodel inference: a practical information-theoretic approach. Springer-Verlag: New York.

Butler M.A., King A.A. 2004. Phylogenetic comparative analysis: a modeling approach for adaptive evolution. Am. Nat. 164: 683–695.

Chung K., Hipp A.L., Roalson E.H. 2012. Chromosome number evolves independently of genome size in a clade with nonlocalized centromeres (Carex: Cyperaceae). Evolution 66: 2708–2722.

Collar D.C., Wainwright P.C. 2006. Discordance between morphological and mechanical diversity in the feeding mechanism of centrarchid fishes. Evolution 60: 2575–2584.

Collar D.C., O’Meara B.C., Wainwright P.C., Near T.J. 2009. Piscivory limits diversification of feeding morphology in centrarchid fishes. Evolution 63: 1557–1573.

Collar D.C., Schulte J.A., Losos J.B. 2011. Evolution of extreme body size disparity in monitor lizards (Varanus). Evolution 65: 2664–2680.

Dumont E.R., Dávalos L.M., Goldberg A., Santana S.E., Rex K., Voigt C.C. 2011. Morphological innovation, diversification and invasion of a new adaptive zone. Proc. R. Soc. B – Biol. Sci. 279: 1797–1805.

Eastman J.M., Alfaro M.E., Joyce P., Hipp A.L., Harmon L.J. 2011. A novel comparative method for identifying shifts in the rate of character evolution on trees. Evolution 65: 3578–3589.

Edwards A.W.F., Cavalli-Sforza L.L. 1964. Reconstruction of evolutionary trees. In: Heywood V.H., McNeill J., eds. Phenetic and phylogenetic classification. Systematics Association: London. p. 67–76.

Edwards E.J., Still C.J. 2008. Climate, phylogeny and the ecological distribution of C4 grasses. Ecol. Lett. 11: 266–276.

Felsenstein J. 1973. Maximum likelihood estimation of evolutionary trees from continuous characters. Am. J. Hum. Gen. 25: 471–492.

Felsenstein J. 1985. Phylogenies and the comparative method. Am. Nat. 125: 1–15.

Felsenstein J. 1988. Phylogenies and quantitative characters. Ann. Rev. Ecol. Syst. 19: 445–471.

Fitch W.M., Ayala F.J. 1994. Tempo and mode in evolution. P. Natl. Acad. Sci. USA 91: 6717–6720.

Garland T.Jr. 1992. Rate tests for phenotypic evolution using phylogenetically independent contrasts. Am. Nat. 140: 509–519.

Gingerich P.D. 1976. Paleontology and phylogeny: patterns of evolution at the species level in early Tertiary mammals. Am. J. Sci. 276: 1–28.

Glor R.E. 2010. Phylogenetic insights on adaptive radiation. Ann. Rev. Ecol. Evol. Syst. 41: 251–270.

Gould S.J. 1980. G. G. Simpson, Paleontology and the Modern Synthesis. In: Mayr E., Provine W.B., eds. The Evolutionary Synthesis. Cambridge (MA): Harvard University Press. p. 153–172.

Gould S.J., Eldredge N. 1977. Punctuated equilibria: the tempo and mode of evolution reconsidered. Paleobiology 3: 115–151.

Grafen A. 1989. The phylogenetic regression. Philos. T. R. Soc. B 326: 119–157.

Hansen T.F. 2012. Adaptive landscapes and macroevolutionary dynamics. In: Svensson E., Calsbeek R., eds. The Adaptive Landscape in Evolutionary Biology. Oxford: Oxford University Press. p. 205–226.

Hansen T.F., Martins E.P. 1996. Translating between microevolutionary process and macroevolutionary patterns: the correlation structure of interspecific data. Evolution 50: 1404–1417.

Harmon L.J., Kolbe J.J., Cheverud J.M., Losos J.B. 2005. Convergence and the multidimensional niche. Evolution 59: 409–421.

Harmon L.J., Melville J., Larson A., Losos J.B. 2008. The role of geography and ecological opportunity in the diversification of day geckos (Phelsuma). Syst. Biol. 57: 562–573.

Harmon L.J., Schulte J.A., Larson A., Losos J.B. 2003. Tempo and mode of evolutionary radiation in iguanian lizards. Science 301: 961–964.

Harmon L.J., Weir J., Brock C., Glor R., Challenger W., Hunt G. 2009. geiger: Analysis of evolutionary diversification. R package version 1.3-1. http://CRAN.R-project.org/package=geiger

Harmon L.J., Losos J.B., Davies T.J., Gillespie R.G., Gittleman J.L., Jennings W.B., Kozak K.H., McPeek M.A., Moreno-Roark F., Near T.J., Purvis A., Ricklefs R.E., Schluter D., Schulte II J.A., Seehausen O., Sidlauskas B.L., Torres-Carvajal O., Weir J.T., Mooers A.Ø. 2010. Early bursts of body size and shape evolution are rare in comparative data. Evolution 64: 2385–2396.

Harvey P.H., Pagel M.D. 1991. The comparative method in evolutionary biology. Oxford University Press: Oxford.

Harvey P.H., Rambaut A. 2000. Comparative analyses for adaptive radiations. Philos. T. R. Soc. B 355: 1599–1605.

Hipp A.L. 2007. Nonuniform processes of chromosome evolution in sedges (Carex: Cyperaceae). Evolution 61: 2175–2194.

Ho L.S.T., Ané C. 2014. Intrinsic inference difficulties for trait evolution with Ornstein-Uhlenbeck models. Methods Ecol. Evol. 5: 1133–1146.

Hulsey C.D., Mims M.C., Parnell N.F., Streelman J.T. 2010. Comparative rates of lower jaw diversification in cichlid adaptive radiations. J. Evol. Biol. 23: 1456–1467.

Hunt G. 2006. Fitting and comparing models of phyletic evolution: random walks and beyond. Paleobiology 32: 578–601.

Hunt G. 2012. Measuring rates of phenotypic evolution and the inseparability of tempo and mode. Paleobiology 38: 351–373.

Ingram T., Mahler D.L. 2013. SURFACE: detecting convergent evolution from comparative data by fitting Ornstein-Uhlenbeck models with stepwise Akaike Information Criterion. Methods Ecol. Evol. 4: 416–425.

Martin C.H., Wainwright P.C. 2011. Trophic novelty is linked to exceptional rates of morphological diversification in two adaptive radiations of Cyprinodon pupfish. Evolution 65: 2197–2212.

Martins E.P., Garland T.Jr. 1991. Phylogenetic analyses of the correlated evolution of continuous characters: a simulation study. Evolution 45: 534–557.

McPeek M.A. 1995. Testing hypotheses about evolutionary change on single branches of a phylogeny using evolutionary contrasts. Am. Nat. 145: 686–703.

Moen D.S., Wiens J.J. 2009. Phylogenetic evidence for competitively driven divergence: body-size evolution in Caribbean treefrogs (Hylidae: Osteopilus). Evolution 63: 195–214.

O’Meara B.C., Ané C., Sanderson M.J., Wainwright P.C. 2006. Testing for different rates of continuous trait evolution using likelihood. Evolution 60: 922–933.

Pagel M. 1998. Inferring evolutionary processes from phylogenies. Zool. Scr. 26: 331–348.

Pagel M. 1999. Inferring the historical patterns of biological evolution. Nature 401: 877–884.

Paradis E., Claude J., Strimmer K. 2004. APE: analyses of phylogenetics and evolution in R language. Bioinformatics 20: 289–290.

Pennell M.W., FitzJohn R.G., Cornwell W.K., Harmon L.J. 2015. Model adequacy and the macroevolution of angiosperm functional traits. Am. Nat. in press.

Price S.A., Holzman R., Near T.J., Wainwright P.C. 2011. Coral reefs promote the evolution of morphological diversity and ecological novelty in labrid fishes. Ecol. Lett. 14: 462–469.

Price S.A., Wainwright P.C., Bellwood D.R., Kazancioglu E., Collar D.C., Near T.J. 2010. Functional innovations and morphological diversification in parrotfishes. Evolution 64: 3057–3068.

R Development Core Team. 2012. R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. http://www.R-project.org.

Rabosky D.L., Adams D.C. 2012. Rates of morphological evolution are correlated with species richness in salamanders. Evolution 66: 1807–1818.

Raia P., Meiri S. 2011. The tempo and mode of evolution: body sizes of island mammals. Evolution 65: 1927–1934.

Revell L.J. 2008. On the analysis of evolutionary change along single branches in a phylogeny. Am. Nat. 172: 140–147.

Revell L.J. 2012. phytools: An R package for phylogenetic comparative biology (and other things). Methods Ecol. Evol. 3: 217–223. doi:10.1111/j.2041-210X.2011.00169.x

Revell L.J., Collar D.C. 2009. Phylogenetic analysis of the evolutionary correlation using likelihood. Evolution 63: 1090–1100.

Revell L.J., Reynolds R.G. 2012. A new Bayesian method for fitting evolutionary models to comparative data with intraspecific variation. Evolution 66: 2697–2707.

Revell L.J., Harmon L.J., Collar D.C. 2008. Phylogenetic signal, evolutionary process, and rate. Syst. Biol. 57: 591–601.

Revell L.J., Johnson M.A., Schulte II J.A., Kolbe J.J., Losos J.B. 2007. A phylogenetic test for adaptive convergence in rock-dwelling lizards. Evolution 61: 2898–2972.

Revell L.J., Mahler D.L., Peres-Neto P.R., Redelings B.D. 2011. A new phylogenetic method for identifying exceptional phenotypic diversification. Evolution 66: 135–146.

Ricklefs R.E. 2004. Cladogenesis and morphological diversification in passerine birds. Nature 430: 338–341.

Rohlf F.J. 2001. Comparative methods for the analysis of continuous variables: geometric interpretations. Evolution 55: 2143–2160.

Schliep K.P. 2011. phangorn: phylogenetic analysis in R. Bioinformatics 27: 592–593.

Setiadi M.I., McGuire J.A., Brown R.M., Zubairi M., Iskandar D.T., Andayani N., Supriatna J., Evans B.J. 2011. Adaptive radiation and ecological opportunity in Sulawesi and Philippine fanged frog (Limnonectes) communities. Am. Nat. 178: 221–240.

Simpson G.G. 1944. Tempo and mode in evolution. Columbia University Press: New York.

Slater G.J. 2013. Phylogenetic evidence for a shift in the mode of mammalian body size evolution at the Cretaceous-Palaeogene boundary. Methods Ecol. Evol. 4: 734–744.

Slater G.J., Price S.A., Santini F., Alfaro M.E. 2010. Diversity versus disparity and the radiation of modern cetaceans. Proc. R. Soc. B – Biol. Sci. 277: 3097–3104.

Stayton C.T. 2006. Testing hypotheses of convergence with multivariate data: morphological and functional convergence among herbivorous lizards. Evolution 60: 824–841.

Swanson D.L., Bozinovic F. 2012. Metabolic capacity and the evolution of biogeographic patterns in oscine and suboscine passerine birds. Physiol. Biochem. Zool. 84: 185–194.

Swanson D.L., Garland T.Jr. 2009. The evolution of high summit metabolism and cold tolerance in birds and its impact on present-day distributions. Evolution 63: 184–194.

Thomas G.H., Freckleton R.P. 2012. MOTMOT: models of trait macroevolution on trees. Methods Ecol. Evol. 3: 145–151.

Thomas G.H., Freckleton R.P., Székely T. 2006. Comparative analyses of the influence of developmental mode on phenotypic diversification rates in shorebirds. Proc. R. Soc. B – Biol. Sci. 273: 1619–1624.

Thomas G.H., Meiri S., Phillimore A.B. 2009. Body size diversification in Anolis: novel environment and island effects. Evolution 63: 2017–2030.

Uyeda J.C., Harmon L.J. 2014. A novel Bayesian method for inferring and interpreting the dynamics of adaptive landscapes from phylogenetic comparative data. Syst. Biol. 63: 902–918.

Wiens J.J., Pyron R.A., Moen D.S. 2011. Phylogenetic origins of local-scale diversity patterns and the causes of Amazonian megadiversity. Ecol. Lett. 14: 643–652.

FIGURE CAPTIONS

Figure 1: Means and corresponding 95% confidence intervals of pairwise ∆AICc for all models fitted to empirical datasets (n=160). Dashed lines represent the frequently used threshold of ∆AICc=4. Pairs of models with the same number of parameters are shaded in grey.

Figure 2: Quantile boxplots of ∆AICc scores obtained by simulating 1000 datasets under each of six different models (a: BM1, b: BMS, c: BMSG, d: OU1, e: OUM, f: OUMV) and then fitting the same six models to each dataset. ∆AICc scores have been standardized relative to the simulating model (the “real” model underlying the data), which therefore always has ∆AICc=0. The dashed horizontal line represents the frequently used threshold of ∆AICc=4.

Figure 3: Quantile boxplots of ∆AICc scores obtained by fitting the models that showed ambiguous ∆AICc patterns in previous simulations (Fig. 2). In this case we simulated under OUM (a) and OUMV (b) models with varying relative rates (θ) and relative optima (β2/β1). ∆AICc scores have been standardized relative to the simulating model (the “real” model underlying the data), which therefore always has ∆AICc=0. The dashed horizontal line represents the frequently used threshold of ∆AICc=4. BMS models marked with a star exhibited mean ∆AICc scores above 4 and are drawn off-scale so that all graphs share the same scale.

Figure 4: Quantile boxplots of ∆AICc scores obtained by simulating 1000 datasets under the three examined OU models (sim.model), while varying the strength of selection α and consequently the value of the phylogenetic half-life. ∆AICc scores have been standardized relative to the simulating model (the “real” model underlying the data), which therefore always has ∆AICc=0. The dashed horizontal line represents the frequently used threshold of ∆AICc=4.
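The standardization described in these captions can be sketched as follows. This is a minimal illustration, assuming the usual small-sample AICc of Burnham and Anderson (2002); the function names and the example log-likelihoods are hypothetical and are not taken from the authors' R code or results.

```python
def aicc(log_lik: float, k: int, n: int) -> float:
    """Small-sample corrected AIC for a model with k parameters fitted to n tips."""
    if n - k - 1 <= 0:
        raise ValueError("AICc is undefined when n <= k + 1")
    return -2.0 * log_lik + 2.0 * k + (2.0 * k * (k + 1)) / (n - k - 1)


def delta_aicc_vs_reference(scores: dict, reference: str) -> dict:
    """Express every AICc score relative to the reference (simulating) model.

    The reference model therefore always gets a value of 0.
    """
    ref = scores[reference]
    return {model: score - ref for model, score in scores.items()}


def within_threshold(deltas: dict, reference: str, threshold: float = 4.0) -> list:
    """Models NOT separated from the reference by at least `threshold` AICc units."""
    return sorted(m for m, d in deltas.items() if m != reference and d < threshold)


# Hypothetical log-likelihoods for one simulated dataset on a 50-tip tree.
# Values are (log-likelihood, number of parameters k) and are illustrative only.
fits = {"BM1": (-61.0, 2), "OU1": (-60.2, 3), "OUM": (-59.9, 5)}
scores = {name: aicc(ll, k, n=50) for name, (ll, k) in fits.items()}
deltas = delta_aicc_vs_reference(scores, reference="BM1")
ambiguous = within_threshold(deltas, reference="BM1")
```

Under this reading, a fitted model whose standardized score falls below the dashed line at 4 is one that AICc does not clearly reject in favor of the simulating model.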

Table 1. Summary of tree size (n), group distribution across the phylogeny and number of phylogeny × trait comparisons for the different empirical datasets examined.

        BM1    BMS    BMSG   OU1    OUM    OUMV
k       2      3      5      3      5      6
BMS     T
BMSG    T+M    M
OU1     M      T+M    T+M
OUM     M      T+M    T+M    M
OUMV    T+M    M      M      T+M    T
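The parameter counts in the k row determine how strongly AICc penalizes each model. The sketch below, which assumes the standard AICc penalty term and uses hypothetical function names, computes how much log-likelihood a model must gain over a competitor to be preferred by a given ∆AICc margin. The tree size used in the example is illustrative.

```python
# Number of parameters per model, copied from the k row of the table.
K = {"BM1": 2, "BMS": 3, "BMSG": 5, "OU1": 3, "OUM": 5, "OUMV": 6}


def aicc_penalty(k: int, n: int) -> float:
    """Penalty part of AICc: 2k + 2k(k + 1) / (n - k - 1)."""
    if n - k - 1 <= 0:
        raise ValueError("AICc is undefined when n <= k + 1")
    return 2.0 * k + (2.0 * k * (k + 1)) / (n - k - 1)


def required_loglik_gain(base: str, candidate: str, n: int,
                         threshold: float = 4.0) -> float:
    """Log-likelihood improvement `candidate` needs over `base` so that
    AICc(base) - AICc(candidate) >= threshold on a tree with n tips."""
    extra_penalty = aicc_penalty(K[candidate], n) - aicc_penalty(K[base], n)
    return (threshold + extra_penalty) / 2.0


# Example: gains needed against BM1 on an illustrative 27-tip tree.
gains = {m: required_loglik_gain("BM1", m, n=27) for m in K if m != "BM1"}
```

For two models with the same k (for example BMS and OU1), the penalty terms cancel and the required gain is simply half the threshold.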

Table 1. Summary of tree size (n), group distribution across the phylogeny and number of phylogeny × trait comparisons included (Nphy×trt) for the different empirical datasets examined.

Group                 Trait                                   n        Groups     Nphy×trt  Source
Plants
  Grasses             Ecological niche                        141      Clustered  10        1
  Rafflesia           Flower size                             19       Clustered  1         2
  Sedges              Chromosome number, genome size          87       Clustered  4         3
  Sedges              Chromosome number                       53       Clustered  8         4
Fish
  Centrarchid fishes  Jaw morphology                          27       Clustered  12        5
  Centrarchid fishes  Jaw morphology                          29       Random     12        6
  Cichlid fishes      Jaw morphology                          79       Clustered  5         7
  Parrotfishes        Jaw morphology                          122      Clustered  18        8
  Parrotfishes        Jaw morphology                          118      Random     8         9
  Pupfish             Body size/shape                         48       Clustered  32        10
Anura
  Fanged frogs        Body size                               21       Random     4         11
  Hylid frogs         Body size                               220      Clustered  1         12
  Osteopilus frogs    Body size                               171      Clustered  1         13
Squamata
  Anolis lizards      Body size                               160      Random     12        14
  Phelsuma geckos     Body size/shape                         20       Clustered  15        15
  Monitor lizards     Body size                               37       Random     3         16
Birds
  Birds               Body size, metabolic rate, temperature  44       Clustered  3         17
  Passerine birds     Body size, metabolic rate               60       Clustered  2         18
Mammals
  All mammals         Body size                               842/539  Random     4         19
  Cetaceans           Body size                               68       Random     3         20
  Chiroptera          Skull morphology, trophic level         81       Clustered  2         21

Sources: 1 Edwards and Still 2008; 2 Barkman et al. 2008; 3 Chung et al. 2012; 4 Hipp 2007; 5 Collar and Wainwright 2006; 6 Collar et al. 2009; 7 Hulsey et al. 2010; 8 Price et al. 2010; 9 Price et al. 2011; 10 Martin and Wainwright 2011; 11 Setiadi et al. 2011; 12 Wiens et al. 2011; 13 Moen and Wiens 2009; 14 Thomas et al. 2009; 15 Harmon et al. 2008; 16 Collar et al. 2011; 17 Swanson and Garland 2009; 18 Swanson and Bozinovic 2012; 19 Raia and Meiri 2011; 20 Slater et al. 2010; 21 Dumont et al. 2011.


Figure 1: Means and corresponding 95% confidence intervals of pairwise ∆AICc for all models fitted to empirical datasets (n=160). Dashed lines represent the frequently used threshold of ∆AICc=4. Pairs of models with the same number of parameters are shaded in grey.

Figure 2: Quantile boxplots of ∆AICc scores obtained by simulating 1000 datasets under six different models (a: BM1, b: BMS, c: BMSG, d: OU1, e: OUM, f: OUMV) and then fitting the same six models to each of them. ∆AICc have been standardized relative to the simulating model (“real” model underlying the data), which therefore always has ∆AICc=0. The dashed horizontal line represents the frequently used threshold of ∆AICc=4.

Figure 3: Quantile boxplots of ∆AICc scores obtained by fitting the models that showed ambiguous ∆AICc patterns in previous simulations (Fig. 2). In this case we simulated under OUM (a) and OUMV (b) models with varying relative rates (θ) and relative optima (β2/β1). ∆AICc have been standardized relative to the simulating model (“real” model underlying the data), which therefore always has ∆AICc=0. The dashed horizontal line represents the frequently used threshold of ∆AICc=4. BMS models marked with a star exhibited mean ∆AICc scores above 4 and are off-scale in the presented graphs, to maintain the same scale in all graphs.

Figure 4: Quantile boxplots of ∆AICc scores obtained by simulating 1000 datasets under the three examined OU models (sim.model), while varying the strength of selection α and consequently the value of phylogenetic half-life. ∆AICc have been standardized relative to the simulating model (“real” model underlying the data), which therefore always has ∆AICc=0. The dashed horizontal line represents the frequently used threshold of ∆AICc=4.
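The link between the strength of selection α and the phylogenetic half-life mentioned in the Figure 4 caption can be written out explicitly. The expression below is the standard definition for an Ornstein-Uhlenbeck process and is given here as a reminder; it assumes α is measured in inverse units of tree time, and the numerical example is illustrative rather than a value used in the simulations.

```latex
% Phylogenetic half-life: time for the expected trait value to move
% halfway from its ancestral state toward the optimum under an OU process.
t_{1/2} = \frac{\ln 2}{\alpha}
% Illustrative example: \alpha = 0.5 gives t_{1/2} = \ln 2 / 0.5 \approx 1.39 time units.
```

Larger values of α therefore correspond to shorter half-lives, which is why varying α in the simulations also varies the half-life.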

Journal manuscript number: SystBiol-USYB-2015-079
Journal: Systematic Biology
Submitted by: Antigoni Kaliontzopoulou ([email protected])
Data file(s):
Online Appendix 1
Online Appendix 2
Online Appendix 3
Online Appendix 4
Online Appendix 5
R-code

Thank you for submitting your data package to Dryad for journal review. Please read the following information carefully, so you will know what to expect during the rest of the data archiving process.

YOUR DRYAD DOI

Your data package has been assigned a unique identifier, called a DOI. This DOI is provisional for now, but may be included in the article manuscript. It will be fully registered with the DOI system when your submission has been approved by Dryad curation staff.

doi:10.5061/dryad.2ss46

REVIEWER ACCESS TO YOUR DRYAD DATA

Journal editors and anonymous peer reviewers may view the submission for review purposes using the following url:
http://datadryad.org/review?doi=doi:10.5061/dryad.2ss46