Tutorial: Environment for Tree Exploration Release 2.3.6
Jaime Huerta-Cepas
August 11, 2015

Contents

1 Changelog history
  1.1 What’s new in ETE 2.3
  1.2 What’s new in ETE 2.2
  1.3 What’s new in ETE 2.1
2 The ETE tutorial
  2.1 Working With Tree Data Structures
  2.2 The Programmable Tree Drawing Engine
  2.3 Phylogenetic Trees
  2.4 Clustering Trees
  2.5 Phylogenetic XML standards
  2.6 Interactive web tree visualization
  2.7 Testing Evolutionary Hypothesis
  2.8 Dealing with the NCBI Taxonomy database
  2.9 SCRIPTS: orthoXML
3 ETE’s Reference Guide
  3.1 Master Tree class
  3.2 Treeview module
  3.3 PhyloTree class
  3.4 Clustering module
  3.5 Nexml module
  3.6 Phyloxml Module
  3.7 Seqgroup class
  3.8 WebTreeApplication object
  3.9 EvolTree class
  3.10 NCBITaxa class
Bibliography
Python Module Index
Index

CHAPTER 1
Changelog history

1.1 What’s new in ETE 2.3

1.1.1 Update 2.3.2

• added NCBITaxa.get_descendant_taxa()
• added NCBITaxa.get_common_names()
• ete ncbiquery: dump descendant taxa given a taxid or taxon name; new option --descendants; renamed --taxonomy to --tree
• fixes misaligned branches (https://github.com/jhcepas/ete/issues/113) in ultrametric tree images using vt_line_width > 0
• fixes Windows installation problem (https://github.com/jhcepas/ete/issues/114)

1.1.2 New Modules

tools

A collection of command line tools implementing common tree operations has been added to the ETE core package. All tools are wrapped by the ete command, which should become available in your path after installation.

• ete build: build phylogenetic trees using a number of predefined built-in gene-tree and species-tree workflows.
• ete view: visualize and generate tree images directly from the command line.
• ete compare: compare tree topologies based on any node feature (e.g. name, species name, etc.) using the Robinson-Foulds distance and edge compatibility scores.
• ete ncbiquery: query the NCBI taxonomy tree directly from the database.
• ete mod: modify tree topologies directly from the command line. Allows rooting, sorting leaves, pruning and more.
• ete annotate: add features to the tree nodes by combining newick and text files.
• ete generate: generate random trees, mostly for teaching and testing.

ncbi taxonomy

The new ncbi_taxonomy module provides the class NCBITaxa, which allows querying a locally parsed NCBI taxonomy database. It provides taxid-name translations, tree annotation tools and other handy functions.
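As a rough illustration of what taxid-name translation and descendant lookup involve, the sketch below models a tiny slice of a taxonomy in plain Python. It does not use ETE or the NCBI database: the class ToyTaxonomy, its method names and the hard-coded table are hypothetical stand-ins chosen for this example, not the NCBITaxa API, and the handful of taxids should be checked against a real NCBI dump before being relied on.

```python
# Toy, in-memory stand-in for a locally parsed taxonomy database.
# ToyTaxonomy and its methods are hypothetical; they are NOT the NCBITaxa API.

# taxid -> (scientific name, parent taxid); illustrative, abridged entries.
TOY_NODES = {
    9604: ("Hominidae", None),
    207598: ("Homininae", 9604),
    9605: ("Homo", 207598),
    9606: ("Homo sapiens", 9605),
    9596: ("Pan", 207598),
    9598: ("Pan troglodytes", 9596),
}


class ToyTaxonomy(object):
    def __init__(self, nodes):
        self.nodes = dict(nodes)
        # Reverse index used for name -> taxid translation.
        self.name2taxid = {name: taxid
                           for taxid, (name, _) in self.nodes.items()}

    def taxid_to_name(self, taxids):
        """Translate known taxids into names; unknown taxids are skipped."""
        return {t: self.nodes[t][0] for t in taxids if t in self.nodes}

    def name_to_taxid(self, names):
        """Translate known names into taxids; unknown names are skipped."""
        return {n: self.name2taxid[n] for n in names if n in self.name2taxid}

    def lineage(self, taxid):
        """Return taxids from the top of the toy table down to `taxid`."""
        path = []
        while taxid is not None:
            path.append(taxid)
            taxid = self.nodes[taxid][1]
        return path[::-1]

    def descendants(self, taxid):
        """Return the sorted taxids of all taxa below `taxid`."""
        found = []
        stack = [taxid]
        while stack:
            current = stack.pop()
            for child, (_, parent) in self.nodes.items():
                if parent == current:
                    found.append(child)
                    stack.append(child)
        return sorted(found)


taxa = ToyTaxonomy(TOY_NODES)
print(taxa.taxid_to_name([9606, 9598]))
print(taxa.lineage(9606))
print(taxa.descendants(207598))
```

With a real installation, the equivalent queries would go through NCBITaxa against the locally parsed database; the reference guide section on the NCBITaxa class documents the actual methods.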
A brief tutorial with examples of how to use NCBITaxa is available in the tutorial section "Dealing with the NCBI Taxonomy database".

1.1.3 New features

News in Tree instances

• added TreeNode.iter_edges() and TreeNode.get_edges()
• added TreeNode.compare() function
• added TreeNode.standardize() utility function to quickly get rid of multifurcations and single-child nodes in a tree.
• added TreeNode.get_topology_id() utility function to get a unique identifier of a tree based on its content and topology.
• added TreeNode.expand_polytomies()
• improved TreeNode.robinson_foulds() function to auto expand polytomies, filter by branch support, and auto prune.
• improved TreeNode.check_monophyly() function, which now accepts unrooted trees as input
• The default node name is set to blank instead of the “NoName” string, which saves memory in very large trees.
• The branch length distance of root nodes is set to 0.0 by default.
• newick export allows controlling the format of branch distance and support values.
• Tree and SeqGroup instances can now open gzipped files transparently.

News in the treeview module

• improved SVG tree rendering
• improved random_color() function (a list of colors can be fetched with a single call)
• improved SeqMotifFace
• Added RectFace
• Added StackedBarFace

1.1.4 Highlighted Bug Fixes

• Newick parser is now stricter when reading node names and branch distances, avoiding silent errors when parsing node names containing illegal symbols (i.e. ][)(,: )
• fixes several minor bugs when retrieving extra attributes in PhyloNode.get_speciation_trees().
• Tree viewer crashes when redrawing after changing node properties.
• fixed installation problem using pip.
• visualizing internal tree nodes as a circular tree produces crashes
• math domain error in SequencePlotFace.
• Fix likelihood calculation bug in EvolTree
• Fix BarChartFace problem with negative numbers
• Fix problem that caused TreeStyle attributes to be ignored in PhyloTree instances.

1.2 What’s new in ETE 2.2

1.2.1 BUGFIXES

• Fixes in NeXML parser and exporting functions
• Fixed ‘paste newick’ functionality on the GUI
• Fixed PhyloNode.is_monophyletic() and moved to TreeNode.check_monophyly().
• Fixed consistency issues in TreeNode.sort_descendants() function.

1.2.2 SCRIPTS

• Improvements in the standalone visualization script (a.k.a. ete2)
• Added the etree2orthoxml script, which provides conversion between phylogenetic trees and the orthoXML format

1.2.3 NEW MODULES

• New EvolNode tree object type is available as a part of the adaptation-test extension recently developed by François Serra (see Testing Evolutionary Hypothesis in the tutorial).

1.2.4 NEW FEATURES

• News in core Tree instances:
  – Added TreeNode.robinson_foulds() distance to compare the topology of two trees (e.g. tree.robinson_foulds(tree2)). It includes automatic pruning to compare trees of different sizes. See tutorial and examples
  – Added new options to the TreeNode.copy() function, allowing faster methods to duplicate tree node instances. See tutorial and examples
  – Added preserve_branch_length argument to TreeNode.prune() and TreeNode.delete(), which allows removing nodes from a tree while keeping the original branch length distances among the remaining nodes.
  – Added TreeNode.resolve_polytomy() function to convert multifurcated nodes into an arbitrary structure of binary split nodes with distance. See tutorial and examples
  – Added TreeNode.get_cached_content() function, which returns a dictionary linking each node instance with its leaf content. Such a dictionary might be used as a cache to speed up functions that require intensive use of node traversing.
    See tutorial and examples
  – Improved TreeNode.get_ascii() function for text-based visualization of trees. A new attributes argument can be passed to display node attributes within the ASCII tree representation.

        from ete2 import Tree
        t = Tree("((A, B)Internal_1:0.7, (C, D)Internal_2:0.5)root:1.3;", format=1)
        t.add_features(size=4)
        print t.get_ascii(attributes=["name", "dist", "size"])
        #
        #                            /-A, 0.0
        #             /Internal_1, 0.7
        #            |               \-B, 0.0
        # -root, 1.3, 4
        #            |               /-C, 0.0
        #             \Internal_2, 0.5
        #                            \-D, 0.0
        #

  – Random branch length and support value generation is now available for the TreeNode.populate() function.
  – A new argument is_leaf_fn is available for a number of traversing functions, allowing custom stopping criteria when browsing a tree. That is, any node matching the function provided through the is_leaf_fn argument will be temporarily considered a terminal/leaf node by the traversing function (the tree will look like a pruned version of itself). See tutorial and examples
  – Added TreeNode.iter_ancestors() and TreeNode.get_ancestors() functions.
  – Added TreeNode.iter_prepostorder() tree node iterator.
  – Newick parser now accepts the creation of single node trees. For example, a text string such as "node1;" will be parsed as a single tree node whose name is node1. By contrast, the newick string (node1); will be interpreted as an unnamed root node plus a single child named node1.
  – TreeNode.write() now accepts a format_root_node argument to export root node features as a part of the newick string.
  – The new TreeNode.check_monophyly() method allows checking whether a node is mono-, poly- or paraphyletic for a given attribute and values (e.g. grouped species). Although monophyly is actually a phylogenetic concept, the idea can be applied to any tree, so any topology could be queried for the monophyly of certain attribute values. If not monophyletic, the method also returns the type of relationship connecting the provided values (para- or polyphyletic).
    See tutorial and examples
  – New TreeNode.get_monophyletic() method that returns a list of nodes in a tree matching custom monophyly criteria.
• News in PhyloTree instances:
  – Added PhyloNode.get_speciation_trees() method, which returns all