Introduction to Phylogenetic Comparative Methods in R Pável Matos October 17, 2019


This is a tutorial that captures the essence of comparative methods using phylogenies. We will use both simulated and real-world data, along with basic statistics implemented in R packages. The lecture is divided into three sections:

1) Handling and visualizing phylogenies and species trait data in the R environment;
2) Understanding the principles of Brownian motion and its use in evolutionary correlations among species traits;
3) Working with phylogenetically independent contrasts (PIC) and phylogenetic generalized least squares regression (PGLS) using R.

Package installations

This tutorial was created using R v.3.6.1. You can download the latest R version from the CRAN site. We require two of the most popular phylogenetic R packages, ape and phytools:

• ape is an essential R package for handling phylogenetic trees and running analyses of comparative data, including ancestral state reconstruction, diversification rate analyses, and DNA distance computation.
• phytools is also handy for manipulating phylogenetic trees and includes other comparative methods and functions not available in ape.

We begin this tutorial by installing them:

    if(!require(ape)) install.packages("ape")
    if(!require(phytools)) install.packages("phytools")
    # the nlme package will allow us to fit Gaussian linear mixed-effects models
    if(!require(nlme)) install.packages("nlme")
    # the dplyr package will help us to handle data tables
    if(!require(dplyr)) install.packages("dplyr")

We then load the packages into our R workspace:

    library(ape)
    library(phytools)
    library(nlme)

Other important phylogenetic packages include phylobase (manipulating trees and comparative data), geiger (methods for fitting evolutionary models to phylogenies), and caper (phylogenetic comparative analyses).
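The installation checks above can also be written as a single pass over a vector of package names. The following is a minimal sketch, not part of the original tutorial: the object names needed, available, and not.installed are illustrative, and the install call is left commented out so that nothing is downloaded unless you choose to run it.

```r
# Packages this tutorial relies on (the same four as in the install commands)
needed = c("ape", "phytools", "nlme", "dplyr")

# requireNamespace() returns TRUE or FALSE without attaching the package
available = vapply(needed, requireNamespace, logical(1), quietly = TRUE)
not.installed = needed[!available]

if (length(not.installed) > 0) {
  message("Not installed: ", paste(not.installed, collapse = ", "))
  # install.packages(not.installed)  # uncomment to install them
} else {
  message("All required packages are installed.")
}
```

Unlike require(), requireNamespace() only checks whether a package can be loaded, so you still need the library() calls to attach the packages.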
You can visit the CRAN Task View for phylogenetics to see all R packages that can handle phylogenetic and comparative data.

Once our R libraries are correctly loaded, we can set the working directory using the function setwd(). The working directory is the place where the data for this tutorial should be stored.

    setwd("C:/Users/pavel/Desktop")
    getwd() # check where your current working directory is
    ## [1] "C:/Users/pavel/Desktop"

SECTION 1: Phylogenetic trees using R packages

There are several ways to represent phylogenetic trees. The two most common types of trees are phylograms and dendrograms.

Phylograms contain information on topology (phylogenetic relationships among species) and branch lengths that represent evolutionary change (e.g., number of nucleotide substitutions). Let’s simulate and visualize a phylogram with 20 species.

    phylo = rtree(n=20)
    plot(phylo)

[Figure: simulated phylogram with 20 tips, t1 to t20]

Dendrograms are a special kind of phylogram in which all tips are at the same distance from the root. Dendrograms are also called ultrametric trees, and their branch lengths usually depict times of divergence (e.g., millions of years). Let’s simulate another phylogeny, this time an ultrametric tree with 30 extant species.

    dendro = pbtree(n=30)
    plot(dendro); axisPhylo()

[Figure: simulated ultrametric tree with 30 tips, t1 to t30, and a time axis running from 3 to 0]

To add node labels to the phylogeny, simply type nodelabels()

    plot(dendro); axisPhylo(); nodelabels()

[Figure: the same ultrametric tree with internal nodes labeled 31 to 59]

HINT: To spread out the image, you can save the plot as a PDF, adjusting its width and height. Running the command below will save the figure in your working directory.
    pdf(file = "mytree.pdf", width = 9, height = 12)
    plot(dendro); axisPhylo(); nodelabels()
    dev.off()
    # Now, look for the pdf file in your working directory

We can edit and manipulate phylogenetic trees using the R packages ape and phytools. For example, we can find and highlight species of interest in the phylogeny. In this case, we want to find species “t10” in the phylogeny.

    find.species = c("t10")
    plot(dendro); axisPhylo();
    add.arrow(dendro, tip = find.species, hedl = 0.05, col = "red", offset = 0.1)

[Figure: ultrametric tree with a red arrow pointing to tip t10]

We can also find the most recent common ancestor (MRCA) of two species. For example, let’s find the MRCA of species “t10” and “t13” in the phylogeny.

    node = fastMRCA(dendro, "t10", "t13")
    plot(dendro); axisPhylo(); nodelabels(node = node)

[Figure: ultrametric tree with the MRCA node, 32, labeled]

Additionally, if the MRCA falls inside the phylogeny, we can extract the subclade containing the MRCA and all of its descendants from the original phylogeny.

    sub.clade = extract.clade(dendro, node)
    # we can plot the extracted clade using
    # plot(sub.clade)

[Figure: two panels, "original tree" and "extracted clade", with node 32 labeled in both]

Alternatively, we can either prune or keep a number of species from the tree. For example, we want to remove the following 10 species from our phylogeny: “t1”, “t2”, “t3”, “t4”, “t5”,
par(mfrow = c(1,2)) sp.set = c("t1", "t2", "t3", "t4", "t5", "t6", "t7", "t8", "t9", "t10") plot(dendro, main = "original tree"); axisPhylo() pr1.tree = drop.tip(dendro, sp.set) plot(pr1.tree, main = "all except 10 species"); axisPhylo(); 10 original tree all except 10 species t21 t21 t20 t20 t26 t25 t26 t1 t25 t15 t14 t15 t17 t16 t14 t9 t17 t8 t19 t16 t18 t19 t12 t5 t18 t11 t12 t10 t4 t11 t3 t28 t7 t28 t27 t27 t24 t6 t24 t30 t30 t29 t29 t23 t23 t22 t22 t13 t2 t13 3 2 1 0 3 2.5 2 1.5 1 0.5 0 Or, we want to keep all the 10 taxa of interest in the phylogeny, and remove all other species. 11 par(mfrow = c(1,2)) plot(dendro, main="original tree"); axisPhylo() pr2.tree = drop.tip(dendro, setdiff(dendro$tip.label, sp.set)) plot(pr2.tree, main = "only 10 species"); axisPhylo(); 12 original tree only 10 species t21 t1 t20 t26 t25 t9 t1 t15 t14 t8 t17 t16 t9 t8 t5 t19 t18 t12 t10 t5 t11 t10 t4 t4 t3 t7 t3 t28 t27 t6 t7 t24 t30 t29 t23 t6 t22 t13 t2 t2 3 2 1 0 2.5 2 1.5 1 0.5 0 It is also possible to store many phylogenetic trees in one single R object (You can encounter a set of phylogenetic trees, for example, in the posterior distribution from a Bayesian phylogenetic inference). 13 Let’s store a copy of each of the 5 simulated trees above in one single R object called multi.phylo. multi.phylo = c(phylo, dendro, sub.clade, pr1.tree, pr2.tree) multi.phylo ## 5 phylogenetic trees We can call any element in the multi.phylo object. For example, we want to plot the first simulated phylogram. plot(multi.phylo[[1]]) t5 t13 t20 t10 t19 t17 t1 t18 t9 t7 t12 t8 t14 t3 t2 t16 t15 t4 t6 t11 Finally, we can save and load phylogenetic trees in several formats. The two most common tree formats are newick and nexus, which can be recognized by R functions having .tree() and .nexus(), respectively. To save the phylogenies, type the following: write.tree(dendro, file="dendrogram.newick.tre") write.nexus(dendro, file="dendrogram.nex.tre") Now, you can load the saved tree files. 
    dendro.newick = read.tree(file="dendrogram.newick.tre")
    dendro.nexus = read.nexus(file="dendrogram.nex.tre")
    # plot the loaded dendrogram saved in newick format
    plot(dendro.newick)

[Figure: the ultrametric tree reloaded from the newick file]

Let’s have a look at how both tree files are structured.

    # newick format
    writeLines(readLines("dendrogram.newick.tre"))
    ## ((((t2:1.053143244,(t13:0.485929273,(t22:0.1143679921,t23:0.1143679921):0.3715612808):0.5672139707):0.8313475122,(((t29:0.004229910377,t30:0.004229910377):0.08356128786,t24:0.08779119824):0.5997932488,t6:0.687584447):1.196906309):0.8978588367,(((t27:0.05897751674,t28:0.05897751674):0.6279206004,t7:0.6868981171):1.822953508,((t3:0.8435732792,t4:0.8435732792):1.210003923,((((t10:0.5839565914,t11:0.5839565914):0.2271813406,t5:0.811137932):0.3769129703,(t12:0.4923276469,(t18:0.3339928798,t19:0.3339928798):0.158334767):0.6957232554):0.6220279902,((((t8:0.6140755837,t9:0.6140755837):0.4216540133,(t16:0.4729586968,t17:0.4729586968):0.5627709003):0.2148269302,(t14:0.4803316293,t15:0.4803316293):0.7702248979):0.332428687,t1:1.582985214):0.2270936782):0.2434983098):0.4562744233):0.2724979672):0.103205451,((t25:0.07774284778,t26:0.07774284778):0.4423811885,(t20:0.2324601897,t21:0.2324601897):0.2876638466):2.365431007);

    # nexus format
    writeLines(readLines("dendrogram.nex.tre"))
    ## #NEXUS
    ## [R-package APE, Thu Oct 17 12:10:07 2019]
    ##
    ## BEGIN TAXA;
    ## DIMENSIONS NTAX = 30;
    ## TAXLABELS
    ## t2
    ## t13
    ## t22
    ## t23
    ## t29
    ## t30
    ## t24
    ## t6
    ## t27
    ## t28
    ## t7
    ## t3
    ## t4
    ## t10
    ## t11
    ## t5
    ## t12
    ## t18
    ## t19
    ## t8
    ## t9
    ## t16
    ## t17
    ## t14
    ## t15
    ## t1
    ## t25
    ## t26
    ## t20
    ## t21
    ## ;
    ## END;
    ## BEGIN TREES;
    ## TRANSLATE
    ## 1 t2,
    ## 2 t13,
    ## 3 t22,
    ## 4 t23,
    ## 5 t29,
    ## 6 t30,
    ## 7 t24,
    ## 8 t6,
    ## 9 t27,
    ## 10 t28,
    ## 11 t7,
    ## 12 t3,
    ## 13 t4,
    ## 14 t10,
    ## 15 t11,
    ## 16 t5,
    ## 17 t12,
    ## 18 t18,
    ## 19 t19,
    ## 20 t8,
    ## 21 t9,
    ## 22 t16,
    ## 23 t17,
    ## 24 t14,
    ## 25 t15,
    ## 26 t1,
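The newick notation printed above is easier to read on a tree small enough to check by eye. The sketch below is an illustration rather than part of the original tutorial: the three taxa A, B, and C and the object names newick.text and toy.tree are hypothetical. It assumes the ape package is installed and uses the text argument of read.tree() to parse a string instead of a file.

```r
library(ape)

# A small, hand-written newick string (hypothetical taxa A, B, and C).
# Parentheses group sister lineages; the number after each colon is a branch length.
newick.text = "((A:1,B:1):1,C:2);"

# read.tree() can parse a string directly through its 'text' argument
toy.tree = read.tree(text = newick.text)

Ntip(toy.tree)            # number of tips
toy.tree$tip.label        # tip names, in the order they appear in the string
is.ultrametric(toy.tree)  # TRUE here: every tip is 2 units away from the root

# write.tree() without a file argument returns the newick string
write.tree(toy.tree)
```

Reading the string from the inside out, A and B are sisters joined by branches of length 1, their ancestor sits on a branch of length 1, and C attaches to the root with a branch of length 2. The long string saved for dendro follows the same rules with 30 tips.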