Downloadable from the After Collapsing

Downloadable from the After Collapsing

Genomics Proteomics Bioinformatics 14 (2016) 94–102 HOSTED BY Genomics Proteomics Bioinformatics www.elsevier.com/locate/gpb www.sciencedirect.com APPLICATION NOTE LVTree Viewer: An Interactive Display for the All-Species Living Tree Incorporating Automatic Comparison with Prokaryotic Systematics Guanghong Zuo 1,a, Xiaoyang Zhi 2,b, Zhao Xu 3,c, Bailin Hao 1,*,d 1 T-Life Research Center, Department of Physics, Fudan University, Shanghai 200433, China 2 Yunnan Institute of Microbiology, Yunnan University, Kunming 650091, China 3 Thermo Fisher Scientific, South San Francisco, CA 94080, USA Received 28 October 2015; revised 4 December 2015; accepted 21 December 2015 Available online 25 March 2016 Handled by Jingfa Xiao KEYWORDS Abstract We describe an interactive viewer for the All-Species Living Tree (LVTree). The viewer Archaea and Bacteria; incorporates treeing and lineage information from the ARB-SILVA website. It allows collapsing the 16S rRNA phylogeny; tree branches at different taxonomic ranks and expanding the collapsed branches as well, keeping Monophyly; the overall topology of the tree unchanged. It also enables the user to observe the consequence of Collapsing and expanding a trial lineage modifications by re-collapsing the tree. The system reports taxon statistics at all ranks tree; automatically after each collapsing and re-collapsing. These features greatly facilitate the compar- Lineage modifications ison of the 16S rRNA sequence phylogeny with prokaryotic taxonomy in a taxon by taxon manner. In view of the fact that the present prokaryotic systematics is largely based on 16S rRNA sequence analysis, the current viewer may help reveal discrepancies between phylogeny and taxonomy. As an application, we show that in the latest release of LVTree, based on 11,939 rRNA sequences, as few as 24 lineage modifications are enough to bring all but two phyla (Proteobacteria and Firmicutes)to monophyletic clusters. Introduction The All-Species Living Tree project (www.arb-silva.de/ projects/living-tree) is an initiative of the Editorial Office of Sys- * Corresponding author. E-mail: [email protected] (Hao B). tematic and Applied Microbiology in collaboration with the a ORCID: 0000-0002-7822-5969. ARB-SILVA (www.arb-silva.de) and List of Prokaryotic b ORCID: 0000-0003-2897-888X. Names with Standing in Nomenclature (LPSN, www.bacterio. c ORCID: 0000-0002-2978-7025. net) projects to construct a single phylogenetic tree for all d ORCID: 0000-0003-3547-564X. available type strains of Archaea and Bacteria based on Peer review under responsibility of Beijing Institute of Genomics, high-quality 16S rRNA sequences [1–4]. The latest release Chinese Academy of Sciences and Genetics Society of China. http://dx.doi.org/10.1016/j.gpb.2015.12.002 1672-0229 Ó 2016 The Authors. Production and hosting by Elsevier B.V. on behalf of Beijing Institute of Genomics, Chinese Academy of Sciences and Genetics Society of China. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/). Zuo G et al / LVTree Viewer 95 LTPs123 (as of September 2015) contains 449 Archaea and A standard specifier ‘‘Unclassified” is used to indicate any 11,490 Bacteria sequences. However, it is a hard job to com- rank in classification lacking taxonomic assignment, e.g., prehend such a big tree by scrolling up and down the long <O>Unclassified<F>Unclassified<G>Vampirovibrio gives PDF, especially when one wishes to compare the phylogeny a genus name without order and family assignment. Lineage with taxonomy at all ranks from phylum down to species by information containing at least one specifier ‘‘Unclassified” is checking the monophyly of all the individual branches. considered incomplete. If there are ‘‘n” entries with complete A similar situation has been encountered while working on lineage information and ‘‘m” entries with incomplete or miss- the whole-genome-based Composition Vector Tree (CVTree) ing information for a given taxon, these numbers appear as web server [5–8] in order to enable the server to deal with many taxon{n+m}. Furthermore, the two lines <D>Archaea{422 thousands of prokaryotic genomes in one run. To address this +7} and <D>Bacteria{11156+334} are shown in red, issue, several distinctive features have been introduced in the because both are monophyletic, whereas non-monophyletic latest CVTree3 web server [8]. These features include (1) col- entries would appear in blue. An entry can be expanded by lapsing a monophyletic tree branch into a single leaf when clicking on the solid circle in front, whereas an entry with no the branch represents sequences from the same taxon accord- circle present contains only one single sequence and thus can- ing to a reference taxonomy; (2) expanding a collapsed leaf not be further expanded. In a screen with a partially-expanded to show the constituent leaves, while the overall topology of tree one may click on a blank circle to have the subordinate the tree is preserved during collapsing and expanding; (3) branches collapsed. reporting taxon statistics, i.e., the number of sequences repre- In the preamble line of a display screen, four boxes are sented by each taxon, monophyletic or non-monophyletic, in labeled as ‘‘Top Node”, ‘‘Lineage Modification”, ‘‘Monophyly the form of a hierarchical list; (4) enquiring a designated taxon List”, and ‘‘Search Query”. By clicking on ‘‘Monophyly List”, by entering its name in a ‘‘Search Query” box (this is the quick- a hierarchical summary of all taxa with statistics appears with est way to get to the point of interest; while the neighborhood ‘‘Yes” denoting a monophyletic taxon and ‘‘No” denoting a of the enquiry is properly expanded, all the other branches are non-monophyletic taxon, respectively. There are four options maximally collapsed); (5) making trial lineage modifications to present the summary: ‘‘Total” shows statistics of all taxa; and displaying the result of re-collapsing with renewed taxon ‘‘Monophyly” gives a list of all monophyletic taxa (since any statistics; and (6) outputting print-quality sub-tree figures in taxon represented by a single sequence is trivially ‘‘mono- different formats. phyletic”, one can hide such entries to get a more comprehen- All these features have been transplanted to the LVTree sible list); ‘‘None” lists the non-monophyletic taxa; and Viewer presented in this study. There are some technical points ‘‘Unclassified” shows entries with at least an ‘‘Unclassified” in the implementation of the Viewer. For more details, the specifier in the lineage information, i.e., those entries that cor- users are advised to consult the CVTree3 User’s Manual at respond to the augend ‘‘m” in the taxon{n+m} notation. The http://tlife.fudan.edu.cn/cvtree3/. quickest way of getting to a taxon of interest is using the ‘‘Search Query” function. For example, typing ‘‘Epsilonprote obacteria” to replace ‘‘Search Query” in the box would lead Description and usage of the LVTree Viewer to a collapsed tree with <C>Epsilonproteobacteria{111} dis- played in green, indicating that this class represented by 111 The LVTree Viewer is freely accessible at http://tlife.fudan. rRNA sequences has already been well defined, as shown in edu.cn/lvtree/ without login requirement. It takes the tree Figure 1. The ‘‘Top Node” pull down list is used to restore structure from LTPs123_SSU.tree.newick and lineage infor- the displayed tree to a designated node as the leftmost node mation from LPTs123_SSU.csv, both downloadable from the after collapsing. All-Species Living Tree project website. In fact, two previous The present prokaryotic taxonomy as collected in Bergey’s releases (LTPs121 and LTPs119) are also accessible on a rotat- Manual [9] and LPSN [10] is largely based on 16S rRNA ing basis. sequence analysis. Ideally speaking, one should expect full By clicking on ‘‘Show LVTree”, a maximally-collapsed tree agreement of the branching order in LVTree with the taxo- with two leaves, <D>Archaea{422+7} and <D>Bacteria nomic hierarchy. Given an input dataset and a well-tested {11156+334}, appears in the next screen. The taxon{n+m} method of phylogeny inference, the LVTree must be consid- notation is explained appropriately in what follows. Lineage ered as a fixed subject that cannot be adjusted or modified. information is available for each 16S rRNA sequence in To the contrary, prokaryotic taxonomy has always been a LVTree. When hovering the mouse over a taxon name, a small work in progress. Therefore, discrepancies revealed when com- popup window with lineage information appears for a few sec- paring LVTree with systematics most probably hint on prob- onds. For example, an entry may read as ‘‘<D>Bacteria lems in taxonomic assignments. However, there might be <P>Actinobacteria<C>Actinobacteria<O>Streptomycetales errors such as mislabeling of sequences in the underlying <F>Streptomycetaceae<G>Streptomyces<S>Streptomy LVTree dataset. The LVTree Viewer may help spot some of ces_longisporoflavus<T>Streptomyces_longisporoflavus__D the errors as well. Q442520...”, where <D>, <P>, <C>, <O>, <F>, The LVTree Viewer provides a mechanism to examine the <G>, <S>, and <T> stand for Domain, Phylum, Class, consequence of trial lineage modifications. A user may submit Order, Family, Genus, Species, and sTrain, respectively. The a Lineage Modification file, in which each entry describes a line above summarizes the correct lineage of a well-studied proposed modification: old_lineage<space>new_lineage. streptomycete species Streptomyces longisporoflavus, which When suggesting a lineage modification, the names under was initially described by Waksman and Henrici in 1953. It will <S> and <T> are always kept unchanged. In doing so, also be shown as an example of lineage modification in the no confusion would occur concerning the published or cited later part of this study. species names, which can always be used to do ‘‘Query 96 Genomics Proteomics Bioinformatics 14 (2016) 94–102 Figure 1 A collapsed All-Species Living Tree with the neighborhood of Epsilonproteobacteria expanded The tree shown here is the whole LVTree representing all 11,939 16S rRNA sequences.

View Full Text

Details

  • File Type
    pdf
  • Upload Time
    -
  • Content Languages
    English
  • Upload User
    Anonymous/Not logged-in
  • File Pages
    9 Page
  • File Size
    -

Download

Channel Download Status
Express Download Enable

Copyright

We respect the copyrights and intellectual property rights of all users. All uploaded documents are either original works of the uploader or authorized works of the rightful owners.

  • Not to be reproduced or distributed without explicit permission.
  • Not used for commercial purposes outside of approved use cases.
  • Not used to infringe on the rights of the original creators.
  • If you believe any content infringes your copyright, please contact us immediately.

Support

For help with questions, suggestions, or problems, please contact us