Introduction to Phylogenetic Comparative Methods in R Pável Matos October 17, 2019

This tutorial captures the essence of comparative methods using phylogenies. We will use both simulated and real-world data, along with basic statistics implemented in R packages. The lecture is divided into three sections:

1) Handling and visualizing phylogenies and species trait data in the R environment;
2) Understanding the principles of Brownian motion and its use in evolutionary correlations among species traits;
3) Working with phylogenetically independent contrasts (PIC) and phylogenetic generalized least squares regression (PGLS) using R.

Package installations

This tutorial was created using R v.3.6.1. You can download the latest R version from the CRAN site. We require two of the most popular phylogenetic R packages, ape and phytools:

• ape is an essential R package for handling phylogenetic trees and running analyses of comparative data, including ancestral state reconstruction, diversification rate analyses, and DNA distance computation.
• phytools is also handy for manipulating phylogenetic trees and includes other comparative methods and functions not available in ape.

We begin this tutorial by installing them:

    if(!require(ape)) install.packages("ape")
    if(!require(phytools)) install.packages("phytools")
    # the nlme package will allow us to fit Gaussian linear mixed-effects models
    if(!require(nlme)) install.packages("nlme")
    # the dplyr package will help us handle data tables
    if(!require(dplyr)) install.packages("dplyr")

And loading the phylogenetic R packages into our R workspace:

    library(ape)
    library(phytools)
    library(nlme)

Other important phylogenetic packages include phylobase (manipulating trees and comparative data), geiger (methods for fitting evolutionary models to phylogenies), and caper (phylogenetic comparative analyses). You can visit the CRAN Task View on phylogenetics to see all R packages that can handle phylogenetic and comparative data.
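The install-if-missing pattern above can be generalized to a whole vector of package names. The following is only a minimal sketch under stated assumptions: the helper name missing_packages is hypothetical (it is not part of ape, phytools, or the original tutorial), and it merely reports which packages are absent, leaving the actual installation commented out.

```r
# Hypothetical helper (not from the original tutorial): return the names of
# the requested packages that are not installed, without attaching any of them.
missing_packages <- function(pkgs) {
  # requireNamespace() returns TRUE when the package namespace can be loaded
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# Packages used in this tutorial
needed <- c("ape", "phytools", "nlme", "dplyr")
to_install <- missing_packages(needed)
print(to_install)

# Uncomment to install whatever is missing (assumes a CRAN mirror is reachable):
# if (length(to_install) > 0) install.packages(to_install)
```

An empty character vector means everything is already installed, and the library() calls can then be run as usual.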
Once our R libraries are correctly loaded, we can set the working directory with the function setwd(). The working directory is the place where the data for this tutorial should be stored.

    setwd("C:/Users/pavel/Desktop") # replace with your own path
    getwd() # check where your current working directory is
    ## [1] "C:/Users/pavel/Desktop"

SECTION 1: Phylogenetic trees using R packages

There are several ways to represent phylogenetic trees. The two most common types of trees are phylograms and dendrograms.

Phylograms contain information on topology (phylogenetic relationships among species) and branch lengths that represent evolutionary change (e.g., number of nucleotide substitutions). Let’s simulate and visualize a phylogram with 20 species.

    phylo = rtree(n=20)
    plot(phylo)

[Figure: simulated phylogram with 20 tips, t1 to t20]

Dendrograms are a special kind of phylogram in which all tips of the tree are at the same distance from the root. Dendrograms are also called ultrametric trees, and their branch lengths usually depict times of divergence (e.g., millions of years). Let’s simulate another phylogeny, this time an ultrametric tree with 30 extant species.

    dendro = pbtree(n=30)
    plot(dendro); axisPhylo()

[Figure: simulated ultrametric tree with 30 tips, t1 to t30, and a time axis running from 3 to 0]

To add node labels to the phylogeny, simply call nodelabels().

    plot(dendro); axisPhylo(); nodelabels()

[Figure: the same ultrametric tree with internal nodes labeled 31 to 59]

HINT: To spread out the image, you can save the plot as a PDF with adjusted width and height. Running the command below will save the figure in your working directory.

    pdf(file = "mytree.pdf", width = 9, height = 12)
    plot(dendro); axisPhylo(); nodelabels()
    dev.off()
    # Now, look for the pdf file in your working directory

We can edit and manipulate phylogenetic trees using the R packages ape and phytools. For example, we can find and highlight species of interest in the phylogeny. In this case, we want to find species “t10” in the phylogeny.

    find.species = c("t10")
    plot(dendro); axisPhylo(); add.arrow(dendro, tip = find.species, hedl = 0.05, col = "red", offset = 0.1)

[Figure: the 30-tip ultrametric tree with a red arrow pointing at tip t10]

We can also find the most recent common ancestor (MRCA) of two species. For example, let’s find the MRCA of species “t10” and “t13” in the phylogeny.

    node = fastMRCA(dendro, "t10", "t13")
    plot(dendro); axisPhylo(); nodelabels(node = node)

[Figure: the 30-tip ultrametric tree with the MRCA of t10 and t13 labeled as node 32]

Additionally, if the MRCA falls inside the phylogeny, we can extract the subclade containing the MRCA and all of its descendants from the original phylogeny.

    sub.clade = extract.clade(dendro, node)
    # we can plot the extracted clade using
    # plot(sub.clade)

[Figure: two panels, "original tree" and "extracted clade", each with a time axis]

Alternatively, we can either prune a set of species from the tree or keep only that set. For example, we want to remove the following 10 species from our phylogeny: “t1”, “t2”, “t3”, “t4”, “t5”, “t6”, “t7”, “t8”, “t9”, and “t10”.

    par(mfrow = c(1,2))
    sp.set = c("t1", "t2", "t3", "t4", "t5", "t6", "t7", "t8", "t9", "t10")
    plot(dendro, main = "original tree"); axisPhylo()
    pr1.tree = drop.tip(dendro, sp.set)
    plot(pr1.tree, main = "all except 10 species"); axisPhylo()

[Figure: two panels, "original tree" and "all except 10 species", each with a time axis]

Or, we want to keep only the 10 taxa of interest in the phylogeny and remove all other species.

    par(mfrow = c(1,2))
    plot(dendro, main = "original tree"); axisPhylo()
    pr2.tree = drop.tip(dendro, setdiff(dendro$tip.label, sp.set))
    plot(pr2.tree, main = "only 10 species"); axisPhylo()

[Figure: two panels, "original tree" and "only 10 species", each with a time axis]

It is also possible to store many phylogenetic trees in a single R object (you can encounter a set of phylogenetic trees, for example, in the posterior distribution from a Bayesian phylogenetic inference).

Let’s store a copy of each of the 5 trees created above in a single R object called multi.phylo.

    multi.phylo = c(phylo, dendro, sub.clade, pr1.tree, pr2.tree)
    multi.phylo
    ## 5 phylogenetic trees

We can call any element of the multi.phylo object. For example, we want to plot the first simulated phylogram.

    plot(multi.phylo[[1]])

[Figure: the simulated phylogram with 20 tips]

Finally, we can save and load phylogenetic trees in several formats. The two most common tree formats are Newick and Nexus, which are read and written by the ape functions whose names end in .tree() and .nexus(), respectively. To save the phylogenies, type the following:

    write.tree(dendro, file="dendrogram.newick.tre")
    write.nexus(dendro, file="dendrogram.nex.tre")

Now, you can load the saved tree files.

    dendro.newick = read.tree(file="dendrogram.newick.tre")
    dendro.nexus = read.nexus(file="dendrogram.nex.tre")
    # plot the loaded dendrogram saved in newick format
    plot(dendro.newick)

[Figure: the 30-tip ultrametric tree loaded from the newick file]

Let’s have a look at how both tree files are structured.

    # newick format
    writeLines(readLines("dendrogram.newick.tre"))
    ## ((((t2:1.053143244,(t13:0.485929273,(t22:0.1143679921,t23:0.1143679921):0.3715612808):0.5672139707):0.8313475122,(((t29:0.004229910377,t30:0.004229910377):0.08356128786,t24:0.08779119824):0.5997932488,t6:0.687584447):1.196906309):0.8978588367,(((t27:0.05897751674,t28:0.05897751674):0.6279206004,t7:0.6868981171):1.822953508,((t3:0.8435732792,t4:0.8435732792):1.210003923,((((t10:0.5839565914,t11:0.5839565914):0.2271813406,t5:0.811137932):0.3769129703,(t12:0.4923276469,(t18:0.3339928798,t19:0.3339928798):0.158334767):0.6957232554):0.6220279902,((((t8:0.6140755837,t9:0.6140755837):0.4216540133,(t16:0.4729586968,t17:0.4729586968):0.5627709003):0.2148269302,(t14:0.4803316293,t15:0.4803316293):0.7702248979):0.332428687,t1:1.582985214):0.2270936782):0.2434983098):0.4562744233):0.2724979672):0.103205451,((t25:0.07774284778,t26:0.07774284778):0.4423811885,(t20:0.2324601897,t21:0.2324601897):0.2876638466):2.365431007);

    # nexus format
    writeLines(readLines("dendrogram.nex.tre"))
    ## #NEXUS
    ## [R-package APE, Thu Oct 17 12:10:07 2019]
    ##
    ## BEGIN TAXA;
    ## DIMENSIONS NTAX = 30;
    ## TAXLABELS
    ## t2
    ## t13
    ## t22
    ## t23
    ## t29
    ## t30
    ## t24
    ## t6
    ## t27
    ## t28
    ## t7
    ## t3
    ## t4
    ## t10
    ## t11
    ## t5
    ## t12
    ## t18
    ## t19
    ## t8
    ## t9
    ## t16
    ## t17
    ## t14
    ## t15
    ## t1
    ## t25
    ## t26
    ## t20
    ## t21
    ## ;
    ## END;
    ## BEGIN TREES;
    ## TRANSLATE
    ## 1 t2,
    ## 2 t13,
    ## 3 t22,
    ## 4 t23,
    ## 5 t29,
    ## 6 t30,
    ## 7 t24,
    ## 8 t6,
    ## 9 t27,
    ## 10 t28,
    ## 11 t7,
    ## 12 t3,
    ## 13 t4,
    ## 14 t10,
    ## 15 t11,
    ## 16 t5,
    ## 17 t12,
    ## 18 t18,
    ## 19 t19,
    ## 20 t8,
    ## 21 t9,
    ## 22 t16,
    ## 23 t17,
    ## 24 t14,
    ## 25 t15,
    ## 26 t1,
