An R Package for Visualizing Bayesian Phylogenetic Analyses from RevBayes
bioRxiv preprint doi: https://doi.org/10.1101/2021.05.10.443470; this version posted May 11, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license.

RevGadgets: an R Package for visualizing Bayesian phylogenetic analyses from RevBayes

Carrie M. Tribble1,2,3,∗, William A. Freyman4, Michael J. Landis5, Jun Ying Lim6, Joëlle Barido-Sottani7, Bjørn Tore Kopperud8,9, Sebastian Höhna8,9, and Michael R. May1,2

1Department of Integrative Biology, University of California, Berkeley, CA 94709, USA
2University Herbarium, University of California, Berkeley, CA 94709, USA
3Current address: School of Life Sciences, University of Hawai‘i at Mānoa, Honolulu, HI, 96822, USA
423andMe, Inc., Sunnyvale, CA, 94086, USA
5Department of Biology, Washington University in St. Louis, MO 63130, USA
6School of Biological Sciences, Nanyang Technological University, Singapore 639798
7Department of Ecology, Evolution and Organismal Biology, Iowa State University, Ames, IA 50011, USA
8GeoBio-Center, Ludwig-Maximilians-Universität München, 80333 Munich, Germany
9Department of Earth and Environmental Sciences, Paleontology & Geobiology, Ludwig-Maximilians-Universität München, 80333 Munich, Germany
∗E-mail: [email protected]

Summary

1. Statistical phylogenetic methods are the foundation for a wide range of evolutionary and epidemiological studies. However, as these methods grow increasingly complex, users often encounter significant challenges with summarizing, visualizing, and communicating their key results.
2. We present RevGadgets, an R package for creating publication-quality figures from the results of a large variety of phylogenetic analyses performed in RevBayes (and other phylogenetic software packages).
3. We demonstrate how to use RevGadgets through a set of vignettes that cover the most common use cases that researchers will encounter.
4. RevGadgets is an open-source, extensible package that will continue to evolve in parallel with RevBayes, helping researchers to make sense of and communicate the results of a diverse array of analyses.

[Bayesian phylogenetics, data visualization, R, RevBayes]

Introduction

Beyond being a graphical representation of the Tree of Life, phylogenetic trees provide a rigorous basis for a wide range of evolutionary and epidemiological inferences. Phylogenetic methods allow researchers to understand how molecular and morphological traits evolve (Nei, 1987; Yang, 2014; Felsenstein, 1985; Harvey and Pagel, 1991), how lineages disperse over geographic space (Ronquist and Sanmartín, 2011), and how lineages diversify over time (Morlon, 2014), among other evolutionary phenomena. Additionally, phylogenetic methods can be used to inform conservation decisions (Faith, 1992) and are powerful epidemiological tools (Volz et al., 2013; Baele et al., 2017).

Phylogenetic methods are increasingly based on explicit probabilistic models with parameters that describe underlying evolutionary processes. As datasets grow and evolutionary hypotheses become more nuanced, these models necessarily become more complex. RevBayes (Höhna et al., 2016) is a Bayesian phylogenetic inference program that was developed to accommodate this increasing complexity and allows users to explore a vast space of phylogenetic models. Models in RevBayes are specified as probabilistic graphical models (Höhna et al., 2014), which are graphical representations of the underlying dependencies among parameters (and their corresponding prior distributions), similar to individual Legos being used to build a complex city. Using this graphical modeling framework, users can design customized models and tailor analyses to their particular datasets and research questions. However, this flexibility comes at a cost: because of the nearly infinite variety of possible models (and model combinations) that users can explore in RevBayes, the results of these analyses are often challenging to summarize and visualize using standard software. This is a significant limitation for RevBayes users because, in addition to being the primary method for reporting results of phylogenetic analyses, graphical summaries are a valuable tool for making sense of scientific results (Tufte, 2001), and for diagnosing modeling and analytical problems (Kerman et al., 2008).

Historically, RevBayes users have had to process and plot their results using ad hoc scripts written for each analysis, which imposed a significant barrier to entry for users not familiar with the structure of RevBayes output or comfortable with developing their own graphical summaries. To address these challenges, we developed RevGadgets. RevGadgets is an R package (R Core Team, 2020) that adds to the diverse ecosystem of phylogenetic visualization tools, including ape (Paradis and Schliep, 2019), Tracer (Rambaut et al., 2018), phytools (Revell, 2012), ggtree (Yu et al., 2017), FigTree (Rambaut, 2014), and IcyTree (Vaughan, 2017), among many others, but is specialized for output produced by RevBayes. RevGadgets serves as a bridge between RevBayes analyses and existing tools for phylogenetic data processing and plotting in R, especially the ggtree package suite, which includes the ggtree, tidytree, and treeio packages (Wang et al., 2020; Yu et al., 2017). RevGadgets provides tools for plotting summary trees (including summaries of parameters for each branch), ancestral-state estimates, and posterior distributions of parameters for a variety of models. Using the general framework of ggplot2, the tidyverse, and associated packages (Wickham, 2011; Wickham et al., 2019), plotting functions return plot objects with default, but customizable, aesthetics. Here, we present five vignettes demonstrating how to use RevGadgets to summarize results for a variety of phylogenetic analyses.

    # load RevGadgets
    library(RevGadgets)

    # specify the tree file
    file <- "bears.mcc.tre"

    # read the tree
    tree <- readTrees(paths = file)

    # plot the tree
    plotFBDTree(tree = tree,
                timeline = TRUE,
                geo_units = "epochs",
                tip_age_bars = TRUE,
                node_age_bars = TRUE,
                age_bars_colored_by = "posterior",
                label_sampled_ancs = TRUE) +
      ggplot2::theme(legend.position = c(0.05, 0.55))

[Figure 1 plot: x-axis, Age (Ma), with geological epochs from Eocene to Pleistocene; color legend, Posterior (0.0 to 1.0); sampled-ancestor legend: 1, Ursavus brevirhinus; 2, Ursavus primaevus; 3, Kretzoiarctos beatrix; 4, Ailurarctos lufengensis; 5, Indarctos punjabiensis.]

Figure 1: Plotting a time-calibrated phylogeny of extinct and extant taxa. Top) RevGadgets code for reading in and plotting a time-calibrated phylogeny of extant and extinct bears. We use the theme function from ggplot2 to add the posterior-probability legend. Bottom) The maximum sampled-ancestor clade-credibility (MSACC) tree for the bears. Sampled ancestors are indicated by numbers along the branches (legend, top left). Bars represent the 95% credible interval of the age of the node, tip, or sampled ancestor in millions of years (geological timescale, x-axis); the color of the bar corresponds to the posterior probability (legend, middle left) that a clade exists, the posterior probability that a fossil is a sampled ancestor, or the posterior probability that a tip is not a sampled ancestor. (Data from Abella et al., 2012; Heath et al., 2014.)

Phylogenies

Phylogenies are central to all analyses in RevBayes, so accurate and information-rich visualizations of evolutionary trees are critical. In this case study, we demonstrate the tree-plotting functionality of RevGadgets, with methods to visualize phylogenies and their associated posterior probabilities, divergence-time estimates, and branch-specific parameter estimates.

RevGadgets provides paired functions for (1) reading in and processing data, and (2) summarizing and visualizing results. For phylogenies, the function readTrees() loads trees (either individual trees or sets of trees) in either Newick or NEXUS (Maddison et al., 1997) format, then processes associated branch or node annotations, and finally stores the tree(s) as treedata object(s) (as defined by treeio; Wang et al., 2020). Users can then visualize the treedata object using either plotTree() or plotFBDTree(), as we demonstrate below.

    # load RevGadgets
    library(RevGadgets)

    # specify the annotated tree file
    file <- "relaxed_OU_MAP.tre"

    # read the tree
    tree <- readTrees(paths = file)

    # plot the tree
    plotTree(tree = tree,
             tip_labels_italics = FALSE,
             color_branch_by = "branch_thetas",
             line_width = 1.7) +
      ggplot2::theme(legend.position = c(0.1, 0.9))

RevGadgets can plot both unrooted and rooted trees, and creates plots that are compatible with plotting options from ggtree. Additionally, RevGadgets provides extensive functionality for plotting trees with non-contemporaneous tips, such as those produced by total-evidence analyses under the fossilized birth-death [FBD] process (Heath et al., 2014; Zhang et al., 2016).