"PRINCIPLES of PHYLOGENETICS: ECOLOGY and EVOLUTION" Integrative Biology 200B Spring 2011 University of California, Berkeley Ginger Jui
"PRINCIPLES OF PHYLOGENETICS: ECOLOGY AND EVOLUTION" Integrative Biology 200B Spring 2011 University of California, Berkeley Ginger Jui February 10, 2011 Adapted from David Ackerly’s lecture notes, Spring 2009. I. Why Analyze Trait Correlations? II. Ahistorical Comparisons Comparisons of closely related species have been used to detect whether species traits co‐vary, either with other traits or with the environment. For example, Salisbury (1942) compared the seed siZe of congeneric species found in open vs. closed habitats, to test the hypothesis that shaded habitats favored larger seeds. III. Phylogenetic Independent Contrasts Felsenstein (1985) developed phylogenetic independent contrasts (PICs) as a way of integrating the increasing availability of phylogenetic trees with comparative biology. Phylogenetic independent contrasts is a generaliZation of the paired comparisons method. Rather than only using paired tip values, contrasts are taken for each bifurcation (or node) in a phylogenetic tree. Assuming that traits evolve independently in each lineage following speciation, then the trait divergences that occur at one node are independent of divergences at other nodes. Information for all species in the phylogeny could be used as part of the analysis, instead of only the sister species comparisons. In contrast to the simple, ahistorical correlations for comparative analyses, the PICs method proposes a specific evolutionary scenario that would allow researchers to conduct statistically robust regression analyses for trait comparisons. Felsenstein (1985) chose to base independent contrasts on a Brownian Motion model of evolution because of its useful statistical properties. The phylogenetic generaliZed least squares (PGLS) regression is a further generaliZation of this approach and allows the specification of other models of trait evolution. For a review of PICs and PGLS, see Garland 2000. III.a. 
Brownian Motion ReCap A Brownian Motion model describes the evolution of a continuous valued trait or variable as a random walk. After a speciation event, this random walk continues independently for each of the daughter species. One important characteristic of Brownian Motion evolution is that it results in a normal distribution of trait values across the phylogeny. Random drift is NOT the only mechanism that could lead to Brownian Motion type character evolution. Other mechanisms include: ‐ continued change in multiple, independent additive factors of small effect ‐ randomly changing selective regimes III.b. Statistical Properties of Brownian Motion Evolution Phylogenetic Tree Trait simulated over the tree under Brownian Motion evolution. Traits simulated under Brownian Motion evolution are normally distributed with a mean centered at the root value and variance increasing over time. The degree of non‐independence in species values is a function of how much shared history species exhibit, in the context of the clade being studied. Shared history is simply measured by the ratio of shared branch lengths to the total length from root to tips. This can be visualiZed on the tree and represented in matrices of phylogenetic distances (D, distance down to MRCA and back up to another species) and phylogenetic covariance (shared history = C = 1 ‐ D/max(D)). 
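The relationship C = 1 - D/max(D) can be checked numerically. The following is a minimal sketch in Python, not part of the original notes, that applies the formula element by element to the distance matrix of the seven-tip example tree (t1 to t7) tabulated in these notes. The function name `covariance_from_distance` is an illustrative assumption.

```python
# Distance matrix D for tips t1..t7, copied from the example in the notes.
TIPS = ["t1", "t2", "t3", "t4", "t5", "t6", "t7"]
D = [
    [0, 10, 10, 10, 10, 10, 10],
    [10, 0, 2, 8, 8, 8, 8],
    [10, 2, 0, 8, 8, 8, 8],
    [10, 8, 8, 0, 4, 4, 6],
    [10, 8, 8, 4, 0, 2, 6],
    [10, 8, 8, 4, 2, 0, 6],
    [10, 8, 8, 6, 6, 6, 0],
]


def covariance_from_distance(dist):
    """Return C = 1 - D / max(D), computed element by element.

    Hypothetical helper for illustration; values are rounded to
    suppress floating-point noise such as 0.19999999999999996.
    """
    d_max = max(max(row) for row in dist)
    return [[round(1 - d / d_max, 10) for d in row] for row in dist]


C = covariance_from_distance(D)
for name, row in zip(TIPS, C):
    print(name, row)
```

Each species has a shared history of 1 with itself, t1 shares no history with any other tip, and sister tips such as t2 and t3 share 0.8 of the root-to-tip path.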
D - distance matrix

      t1   t2   t3   t4   t5   t6   t7
t1     0   10   10   10   10   10   10
t2    10    0    2    8    8    8    8
t3    10    2    0    8    8    8    8
t4    10    8    8    0    4    4    6
t5    10    8    8    4    0    2    6
t6    10    8    8    4    2    0    6
t7    10    8    8    6    6    6    0

C - phylogenetic covariance matrix

      t1   t2   t3   t4   t5   t6   t7
t1   1.0  0.0  0.0  0.0  0.0  0.0  0.0
t2   0.0  1.0  0.8  0.2  0.2  0.2  0.2
t3   0.0  0.8  1.0  0.2  0.2  0.2  0.2
t4   0.0  0.2  0.2  1.0  0.6  0.6  0.4
t5   0.0  0.2  0.2  0.6  1.0  0.8  0.4
t6   0.0  0.2  0.2  0.6  0.8  1.0  0.4
t7   0.0  0.2  0.2  0.4  0.4  0.4  1.0

Conventional statistical hypothesis testing requires that the residuals of the observations, after fitting a given statistical model, are independent and identically distributed (IID). The phylogenetic covariance resulting from shared history clearly violates the independence requirement.

Demonstration of phylogenetic nonindependence
1) Simulate the independent evolution of 2 traits on this tree.
2) Calculate the Pearson correlation coefficient between the two traits.
3) Repeat 1000 times and look at the distribution of correlation coefficients.

[Figure: one example of a simulated trait pair, and the distribution of correlation coefficients for traits evolved on the tree compared with traits evolved independently.]

In contrast, the expected distribution for random data sets with N = 10 is shown on the right. For N = 10, the critical value of the correlation coefficient at p <= 0.05 is 0.63. Judged against that threshold, the type I error rate for traits evolved on the tree is a whopping 60%!

III.c. Independent contrasts to the rescue!

The independent contrasts method is derived from the Brownian Motion model. Let's start with a single divergence. For two sister species with trait values x1 and x2 on branches of length v1 and v2, Brownian Motion implies that the difference x1 - x2 has an expected value of zero and a variance proportional to v1 + v2. Dividing the raw contrast by the square root of v1 + v2 gives a standardized contrast (Felsenstein 1985).

III.d. Correlations of independent contrasts

Before we conduct a statistical test with independent contrasts, note that they have one unusual property. Each contrast is based on subtraction of one value from another. The direction of subtraction is arbitrary, as long as it is kept the same for every trait in the study. As a result, each contrast has a mirror image of the opposite sign, so the expected value of the contrasts must be zero, since each one could be flipped around.
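The arbitrary sign of a contrast can be illustrated with a small sketch in Python, which is not part of the original notes. It assumes the standard recursion of Felsenstein (1985): each contrast is the difference between two daughter values divided by the square root of their summed branch lengths, the ancestral value is a branch-length-weighted average, and the ancestor's branch is lengthened by v1*v2/(v1 + v2). The four-tip tree, the trait values, and the names `contrasts` and `flip` are all hypothetical.

```python
import math

# Hypothetical tree ((A:1,B:1):1,(C:1,D:1):1).
# A tip is (name, branch_length); an internal node is (left, right, branch_length).
TREE = ((("A", 1.0), ("B", 1.0), 1.0), (("C", 1.0), ("D", 1.0), 1.0), 0.0)

# Hypothetical trait values for illustration only.
X = {"A": 1.0, "B": 3.0, "C": 6.0, "D": 10.0}
Y = {"A": 2.0, "B": 3.0, "C": 7.0, "D": 9.0}


def contrasts(node, trait, out):
    """Append standardized contrasts to `out`; return (node value, adjusted branch length)."""
    if len(node) == 2:  # tip: look up the observed trait value
        name, length = node
        return trait[name], length
    left, right, length = node
    x1, v1 = contrasts(left, trait, out)
    x2, v2 = contrasts(right, trait, out)
    out.append((x1 - x2) / math.sqrt(v1 + v2))  # standardized contrast
    value = (x1 / v1 + x2 / v2) / (1 / v1 + 1 / v2)  # weighted ancestral estimate
    return value, length + v1 * v2 / (v1 + v2)  # lengthen the ancestral branch


def flip(values, index):
    """Reverse the direction of subtraction for one contrast."""
    return [-v if i == index else v for i, v in enumerate(values)]


cx, cy = [], []
contrasts(TREE, X, cx)
contrasts(TREE, Y, cy)

cross = sum(a * b for a, b in zip(cx, cy))
mean_before = sum(cx) / len(cx)

# Flip the root contrast for BOTH traits, as the notes require.
cx_f, cy_f = flip(cx, 2), flip(cy, 2)
cross_flipped = sum(a * b for a, b in zip(cx_f, cy_f))
mean_flipped = sum(cx_f) / len(cx_f)

print("sum of cross-products:", cross, "after flip:", cross_flipped)
print("mean contrast:", mean_before, "after flip:", mean_flipped)
```

In this sketch the sum of cross-products is unchanged by the flip, while the mean of the contrasts shifts. The mean therefore carries no information, which is why statistics computed from contrasts should not estimate it.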
Because the sign of each contrast is arbitrary, all correlations and regression analyses must be calculated through the origin, and ANOVAs must be conducted without a grand mean term. The formula for the correlation coefficient through the origin is a bit different from the familiar one in a stats textbook:

$$r_{xy} = \frac{\sum C_x C_y}{\left\{\sum C_x^2 \sum C_y^2\right\}^{1/2}}$$

where C_x and C_y are the standardized contrasts for traits x and y. Because we are not estimating the mean, the degrees of freedom for independent contrasts is N_c - 1, where N_c is the number of contrasts. Remember that N_c = N_t - 1, where N_t is the number of taxa, assuming a fully bifurcating tree. See Garland et al. 1992 for additional discussion.

Returning to our example above with the 10 taxon tree, here is the distribution of correlations of independent contrasts under the null hypothesis. Type I error for 1000 reps is 0.051 (5.1%) - perfect!

[Figure: Ackerly and Reich, 1999, Amer J Bot. Open circles: angiosperms; closed circles: conifers; cross: basal contrast between angiosperms and conifers.]

III.e. Are Phylogenetic Independent Contrasts Appropriate for your Data?

Fig. III.e. Diagnostic plots for phylogenetic independent contrasts of mature plant height in Hawaiian lobelioids. These plots are used to detect deviations from Brownian Motion.

IV. Alternatives to Brownian motion

IV.a. Ornstein-Uhlenbeck

One very important alternative model introduces stabilizing selection, i.e. a model in which there is a 'pull' towards some central optimal value (which may differ along different lineages), and Brownian motion reflects the random processes and/or the excursions in the action of selection around this underlying optimum. The stabilizing selection model is known as an Ornstein-Uhlenbeck stochastic process, so you will see references to OU models (and there is an R package called 'ouch' for OU models). Many evolutionists believe that stabilizing selection is the overarching cause of evolutionary stasis and the maintenance of similarity among close relatives.
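The 'pull' towards an optimum can be sketched as a discretized random walk. The Python example below is not part of the original notes and is not how the 'ouch' package works. It uses a simple Euler-Maruyama step, x += alpha*(theta - x)*dt + sigma*sqrt(dt)*noise, where alpha (strength of the pull), theta (the optimum), sigma (the rate of random change) and all numerical values are assumptions chosen for illustration. Setting alpha = 0 recovers plain Brownian motion.

```python
import random
import statistics


def simulate_ou(x0, theta, alpha, sigma, total_time, dt, rng):
    """Return the trait value after `total_time`, starting from x0.

    Hypothetical helper: each step adds a deterministic pull towards the
    optimum theta plus a Gaussian random increment. alpha = 0 gives
    Brownian motion.
    """
    x = x0
    steps = int(round(total_time / dt))
    for _ in range(steps):
        pull = alpha * (theta - x) * dt
        noise = sigma * (dt ** 0.5) * rng.gauss(0.0, 1.0)
        x += pull + noise
    return x


def endpoint_variance(alpha, n_reps=1000, seed=1):
    """Variance of the final trait value across independent lineages."""
    rng = random.Random(seed)
    ends = [
        simulate_ou(0.0, 0.0, alpha, 1.0, 10.0, 0.05, rng)
        for _ in range(n_reps)
    ]
    return statistics.pvariance(ends)


var_bm = endpoint_variance(alpha=0.0)  # Brownian motion: variance grows with time
var_ou = endpoint_variance(alpha=1.0)  # OU: variance levels off near sigma^2 / (2 * alpha)

print("Brownian motion variance at t = 10:", var_bm)
print("OU variance at t = 10:", var_ou)
```

Under these assumed parameters the Brownian lineages keep spreading apart, while the OU lineages stay within a bounded range around the optimum.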
Paradoxically, an OU process with a single optimum generates traits with K < 1, so stabilizing selection reduces phylogenetic signal, which is not intuitive at first! Estes and Arnold (2007, American Naturalist) offer an important discussion of stabilizing selection and apply it to a large data set compiled by Gingerich (1983, Science) on rates of morphological evolution (mostly from fossils, not comparative data).

IV.b. Model Inference Using Akaike Weights

Robust model inference and model selection techniques must be used to determine which model out of a set of candidates best fits the data. Likelihood ratio tests are only suitable for nested models and for comparing pairs of models. Given the number of hypotheses about evolutionary processes, researchers may be interested in evaluating the weight of evidence for a host of models. The Akaike weights approach was taken by Harmon (2009) to compare models.

Akaike weights are a way of dividing up the weight of evidence over a set of models. For model i in a set of R models, with AIC difference \Delta_i = AIC_i - AIC_min, the weight is (Burnham and Anderson 2002):

$$w_i = \frac{\exp(-\Delta_i / 2)}{\sum_{r=1}^{R} \exp(-\Delta_r / 2)}$$

Notice that the sum of w_i for the R models will be 1. The table below shows the AIC differences, relative likelihoods and Akaike weights for a set of models.

The fitContinuous function in the R package GEIGER may be used to fit Brownian, Ornstein-Uhlenbeck and Early Burst models (and some others). The R package OUCH also fits Brownian and OU models. These usually return the maximum likelihood value of the fitted models, allowing us to calculate AIC and the Akaike weights.

References

Ackerly DD, Reich PB. 1999. Convergence and correlations among leaf size and function in seed plants: a comparative test using independent contrasts. Amer. J. Bot. 86:1272-128

Ackerly DD. 2009. Phylogenetic Methods in Ecology. In: Encyclopedia of Life Sciences (ELS). John Wiley & Sons, Ltd: Chichester.

Burnham K, Anderson D. 2002. Model Selection and Multimodel Inference: A Practical Information-Theoretic Approach. Second edition. Springer.

Butler M, King A. 2004. Phylogenetic comparative analysis: A modeling approach for adaptive evolution. American Naturalist 164(6):683-695.

Diaz-Uriarte R, Garland T Jr. 1996. Testing hypotheses of correlated evolution using phylogenetically independent contrasts: sensitivity to deviations from Brownian motion. Syst. Biol. 45:27-47

Estes S, Arnold SJ. 2007. Resolving the paradox of stasis: Models with stabilizing selection explain evolutionary divergence at all timescales.