Comparative Analysis of Plant Genomes Through Data Integration


Michiel Van Bel

Promoter: Prof. Dr. Yves Van de Peer
Co-Promoter: Prof. Dr. Klaas Vandepoele

Ghent University
Faculty of Sciences
Department of Plant Biotechnology and Bioinformatics

VIB
Department of Plant Systems Biology
Bioinformatics and Systems Biology

Dissertation submitted in fulfillment of the requirements for the degree of Doctor (PhD) in Sciences, Bioinformatics.
Academic year: 2012-2013

Examination Committee

Prof. Dr. Geert De Jaeger (chair)
Faculty of Sciences, Department of Plant Biotechnology and Bioinformatics, Ghent University

Prof. Dr. Yves Van de Peer (promoter)
Faculty of Sciences, Department of Plant Biotechnology and Bioinformatics, Ghent University

Prof. Dr. Klaas Vandepoele (co-promoter)
Faculty of Sciences, Department of Plant Biotechnology and Bioinformatics, Ghent University

Prof. Dr. Jan Fostier
Faculty of Engineering, Department of Information Technology, Ghent University

Prof. Dr. Peter Dawyndt
Faculty of Science, Department of Applied Mathematics and Computer Science, Ghent University

Dr. Steven Robbens
Bayer Cropscience, Belgium

Dr. Matthieu Conte
Syngenta Seeds, France

Acknowledgements

While the cover of this book carries my name, this thesis did not come to fruition by my hand only. These past years have been a great experience, for which I would like to express my gratitude to several people.

First of all, I would like to thank Thomas Abeel, for getting me in touch with Yves’ research group, and encouraging me to start a PhD in bioinformatics. Without a chance encounter with him, I would never have dreamed that obtaining a PhD would be possible.

Secondly, I would like to thank my promoter and co-promoter, Yves Van de Peer and Klaas Vandepoele. The opportunity Yves has given me to pursue a PhD and the great research environment of Yves’ lab have proven to be invaluable. The constant support and patience of Klaas in guiding me form the foundation of this PhD. Our numerous discussions on how to proceed with our shared research were definitely instrumental in my growth as a researcher.

Thirdly, I would like to express my gratitude to the members of my PhD jury, for reading my thesis and evaluating my work.

Next up in line to be thanked is Sebastian Proost: a great colleague and flat-mate. Most of my research was done in collaboration with him, and the results need to be seen as such.

A big thank you as well to my fellow IT-knowledgeable colleagues at the lab: Sofie Van Landeghem, Marijn Vandevoorde, Thomas Van Parys, Frederik Delaere and Kenny Billeau. Their beacon of computer-related jokes, general fun and laughter, and overall support in the darkness of the biology department was definitely important in keeping me happy and working.

I also want to thank Yvan Saeys, for guiding me through the rough first year of my PhD. Though our shared research didn’t pan out, it was a good learning experience on what to do and not to do.

All Binari people, present and past, need to be thanked, as well as all the people within the BSB and biocomp group. The overall fun and interesting discussions we had will be remembered. Honorary mention goes to Lieven Sterck, whose constant presence provides a great atmosphere of collegiality within the lab.

Another group to be thanked consists of the people from the IT staff. Without their unwavering dedication in keeping our servers running and our hard drives spinning, together with their tacit approval of my development skills on the web server, this PhD would have been a lot more difficult.

Outside of the PSB building I would like to give a big thank you to all my friends, especially my former school mates. The efforts most of you put into obtaining a PhD really gave me that extra boost in confidence to continue my own research.

And last but definitely not least I would like to thank my family: my brother, sisters and mother. Their constant support, interest and love gave me the strength these past 6 years to carry on.

Table of Contents

Examination Committee
Acknowledgements

1 Research Purpose and Scope
  1.1 Overview
  1.2 Creation of a Platform for Comparative and Evolutionary Genomics
  1.3 Creation of a Platform for Transcriptome Analysis

2 Introduction
  2.1 Abstract: A history of genetics
  2.2 Comparative and Evolutionary Genomics in Plants
    2.2.1 Duplications in Plant Genomes
    2.2.2 Orthology
  2.3 Functional Genomics
    2.3.1 Gene Ontology
    2.3.2 Protein Domains
    2.3.3 Molecular Interactions
    2.3.4 Text Mining
  2.4 Bioinformatics Tools and Platforms
    2.4.1 Web Visualizations and Technologies
    2.4.2 Online Plant Genomics Platforms
  2.5 Author Contribution

3 PLAZA: a Comparative Genomics Resource to Study Gene and Genome Evolution in Plants
  3.1 Introduction
  3.2 Results
    3.2.1 Data Assembly
    3.2.2 Delineating Gene Families and Subfamilies
    3.2.3 Projection of Functional Annotation Using Orthology
    3.2.4 Exploring Genome Evolution in Plants
    3.2.5 Database Access, User Interface, and Documentation
  3.3 Methods
    3.3.1 Data Retrieval and Delineation of Gene Families
    3.3.2 Comparison of OrthoMCL with Phylogenetic Trees
    3.3.3 Alignments and Phylogenetic Trees
    3.3.4 Functional Annotation
    3.3.5 Detection of Collinearity
    3.3.6 Relative Dating using Synonymous Substitutions
  3.4 Summary and Future Prospects
  3.5 Author Contribution

4 Dissecting Plant Genomes with the PLAZA Comparative Genomics Platform
  4.1 Introduction
  4.2 Results and Discussion
    4.2.1 Gene Annotation and Gene Families
    4.2.2 Core Plant Gene Families and Detection of Clade-specific or Expanded Gene Families
    4.2.3 Integrative Orthology Viewer: an Ensemble Approach to Detect Orthology Relationships
    4.2.4 Clusters of Functionally Related Genes in Eukaryotic Genomes
    4.2.5 Colinearity-based Genome Analysis
    4.2.6 User Interactivity via Workbench and Bulk Downloads
  4.3 Material and Methods
    4.3.1 Gene Models and Gene Families
    4.3.2 Colinearity
    4.3.3 Functional Annotation
    4.3.4 Functional Gene Clusters
    4.3.5 Orthology Prediction and Evaluation
  4.4 Author Contribution

5 PLAZA Applications
  5.1 The Study of Gene Duplicates Using the PLAZA Platform
    5.1.1 Duplicated Resistance Genes in Arabidopsis and Poplar
    5.1.2 Tandem and Block Duplicates in Chlamydomonas reinhardtii
  5.2 Comparative Co-expression Analysis in Plants
    5.2.1 Construction of Co-expression Networks and Comparison Across Species of Co-expression
    5.2.2 Functional Annotation
    5.2.3 Studying Conserved Gene Functions Using Comparative Co-expression Analysis
  5.3 Studying Algal Genomics Using the pico-PLAZA Platform
    5.3.1 Gene Dynamics in Algal Genomes
    5.3.2 Functional Analysis of Large-scale Expression Data
    5.3.3 Environmental Genomics
  5.4 Author contribution

6 TRAPID, an Efficient Online Tool for the Functional and Comparative Analysis of De Novo RNA-Seq Transcriptomes
  6.1 Introduction
  6.2 Results and Discussion
    6.2.1 General Properties of the TRAPID Transcript Analysis Platform
    6.2.2 Evaluation of Homology Assignments
    6.2.3 Evaluation of the ORF Finding Routine
    6.2.4 Comparison of TRAPID with Blast2GO and KAAS
    6.2.5 Detection of Functional Biases in Transcriptome Subsets Using Enrichment Analysis
  6.3 Material and Methods
    6.3.1 Datasets, Construction Reference Protein Databases and Selection of Gene Family Representatives
    6.3.2 Similarity Search, Gene Family Assignment and Functional Transfer Using Homology
    6.3.3 Frame Assignment and Detection of Putative Frameshifts
    6.3.4 Meta-annotation
    6.3.5 Correction Using FrameDP
    6.3.6 Multiple Sequence Alignments and Phylogenetic Trees
    6.3.7 Implementation
  6.4 Conclusion
  6.5 Author Contribution

7 Technology and Development
  7.1 Data Processing
    7.1.1 Data Parsing
    7.1.2 Data Validation
  7.2 Visualizations
    7.2.1 Graphs and Charts
    7.2.2 Phylogenetic Trees
    7.2.3 WGDotplot
    7.2.4 CirclePlot
  ...