PANTHER Tutorial 2011.Pptx

PANTHER Classification System version 7
Huaiyu Mi
Department of Preventive Medicine, Keck School of Medicine, University of Southern California, USA
August 27, 2011, ICSB Tutorial, Heidelberg, Germany

Outline
• PANTHER Background
  – How is PANTHER built?
• PANTHER Website at a Glance
  – Brief overview of all PANTHER pages
• PANTHER Basic Functionalities
• PANTHER Tools
  – Tutorial on tool usage

PANTHER BACKGROUND

PANTHER Database

What’s new in PANTHER 7.0?
• Whole genome sequence coverage from 48 organisms.
• New tree building algorithm (GIGA) for improved phylogenetic relationships of genes and families.
• Improved hidden Markov models.
• Improved ortholog identification.
• Implemented GO slim and PANTHER protein class for classifying genes and families.
• Expanded sets of genomes and sequence identifiers for PANTHER tools.
• PANTHER Pathway diagrams in SBGN.

PANTHER PROTEIN LIBRARY

What is PANTHER?
PANTHER library (PANTHER/LIB)
• a family tree
• a multiple sequence alignment
• an HMM
PANTHER GO slim and Protein Class
• Molecular function
• Biological process
• Cellular component
• Protein class
[Diagram labels: Sequences; PANTHER subfamily HMM models; Statistical models (HMM); Phylogenetic trees; Multisequence alignments]

Building PANTHER Protein Family Library
[Flowchart steps: Select sequences; Build clusters; Build MSA; Build trees; Build HMMs. Other labels: Curation; PANTHER GO slim and Protein Class ontology; PANTHER Protein Library]

Complete Gene Sets
• 12 GO Reference Genomes
• 36 other genomes to help reconstruct evolutionary history
  – 14 bacterial genomes
  – 2 archaeal genomes
  – 2 fungal genomes
  – 2 plant genomes
  – 1 amoebozoan genome
  – 3 protist genomes
  – 2 protostome genomes
  – 10 deuterostome genomes

“Standard” set of protein coding genes and corresponding protein sequences
• 48 genomes
• Sources of genes
  – MOD
  – ENSEMBL
  – NCBI (Entrez)
• Sources of protein sequences
  – UniProt
  – NCBI (RefSeq)
  – ENSEMBL
• One protein is selected for each gene.
[Flowchart steps: Get list of genes in each genome; Get list of all protein products from given source; Get mapping of each protein product to UniProt; Select one “representative” protein for each gene]

Building Clusters and MSA
• Family and subfamily IDs (PTHRxxxxx:SFx) are tracked as much as possible.
• New IDs are assigned if necessary.
• In PANTHER 7.2 (release at the end of 2011), all clusters with at least one sequence from the 12 MOD will be included in the library.
• MSAs are built with MAFFT, a freely available multiple sequence alignment software package (Katoh, Nucleic Acids Res., 30:3059-3066).
[Flowchart steps: Score against PANTHER 6.1 HMM library; Hit an HMM? If no: InterPro for additional clusters. If yes: Family cluster; MSA by MAFFT]

GIGA
• An algorithm that makes phylogenetic inferences under the constraint of the species tree.
• Uses sequence-based distance from the multiple sequence alignment at each step.
  – Speciation
  – Duplication
  – Ortholog group (subfamily)
Thomas, 2010 BMC Bioinformatics, 11:312

Phylogenetic inferences based on species tree
[Diagram labels: speciation; speciation; “Fixed differences” between species]

Speciation event
[Two tree panels over the same species: human, chimpanzee, mouse, rat, cow, horse, chicken, frog, mosquito, fruit fly, worm, yeast]

Duplication event
[Two tree panels over the same species, with additional human and chimpanzee leaves introduced by the duplication: human, chimpanzee, mouse, rat, cow, horse, chicken, frog, mosquito, fruit fly, worm, yeast]

PANTHER Phylogenetic Tree
Tree from PTHR11537
• Green node: speciation
• Yellow node: duplication
• Blue diamond: subfamily

PANTHER Protein Library Building
[Diagram labels: 600,000 sequences from 48 organisms; 400,000 sequences in 6,594 family clusters; Curation; 62,972 subfamilies annotated with GO terms and PANTHER pathways]

Tree Representation of Subfamilies

MSA

PANTHER Ontology in Tree

PANTHER in InterPro

PANTHER in FlyBase

PANTHER and Gene Ontology Reference Genome Project

PANTHER PATHWAY

Goals
• Go beyond individual proteins.
• To understand how multiple proteins work together in a complex system.
• To build an integrated infrastructure with expert-curated pathways.
• To help to establish a standard that will enable the content to be used across a large number of software applications.
• The system should allow users to:
  – Predict gene and protein functions
  – Analyze research data
  – Navigate or browse the literature
  – Design new experiments

Biological process ontology vs. Pathway

Phylogenetic relationships help pathway building
[Diagram labels, built up over three slides: M p A A; >40,000 orthologous trees; A p X X]

Two approaches to build pathway databases
• Bottom-up
  – Start from individual protein/reaction
  – Build species-specific pathways (or partial pathways)
  – Infer to other organisms based on orthologue mapping
  – Generate a more comprehensive pathway map
  – Example databases: MetaCyc and Reactome
• Top-down
  – Start with pathways at the conceptual level, usually based on review papers or textbooks
  – Build a comprehensive pathway map
  – Assign protein sequences to the pathway

PANTHER Pathway Data Structure
• A pathway diagram
• Curate the pathway
• Display the pathway
• Unambiguous graphical representation of pathway data
• Structured data for pathway
• Link pathway classes to the sequence database
[Diagram labels: PANTHER Pathway; Pathway; Reaction; Pathway Molecule Classes; Cell type/Cellular location; Sequences; PANTHER subfamily HMM models; Statistical models (HMM); Phylogenetic tree; Multisequence alignment; PANTHER library]

PANTHER Pathway Data Structure
Reaction
• Catalysis
• Transition
• Transcription and translation activation/inhibition
• Activation / Inhibition
• Phosphorylation / dephosphorylation
• Complex formation
• Transportation
• Upstream / downstream
Cell type/Cellular location
• Nucleus
• Mitochondria
• Cytoplasm
• Nerve terminal
• Lymphocyte
• Astrocytes
Pathway Molecule Classes
• Proteins: receptor, kinase
• Genes: receptor gene, kinase gene
• Simple molecules: glucose, pyruvate
• Ions: calcium ion
• Phenotypes: stress, glucose deprivation
• This entity is also used to link out to other pathways.
[Diagram labels: PANTHER pathway; Pathway; Sequences; PANTHER subfamily HMM models; Statistical models (HMM); Phylogenetic tree; Multisequence alignment; PANTHER library]

CellDesigner

Pathway Curation Process
[Flowchart labels: Identify pathways to curate; Identify curators; CellDesigner; Pathway diagrams; SBML parser; Pathway index; PANTHER library; Pathway DB; Pathway curation; Web infrastructure; PANTHER database with library sequences associated to pathways; Pathway diagram applet; Web delivery]

Activity flow view

Standard view

SBGN-PD view

History of PANTHER
• 1998: Project was launched at Molecular Application Group.
• 1999: Acquired by Celera Genomics.
• 2000: PANTHER 1 released in Celera Discovery Systems (CDS).
• 2001: PANTHER 2 released, which was used in the annotation of the first published human genome (Celera).
• 2002: PANTHER 3 released. PANTHER annotations are integrated in FlyBase. Moved to ABI.
• 2003: PANTHER 4 released with the public release of the PANTHER Classification System.
• 2005: PANTHER 5 released with PANTHER Pathway and analysis tool. Established collaboration with InterPro.
• 2006: PANTHER 6 released. Moved to SRI.
• 2010: PANTHER 7 released.
• 2011: Moved to USC.

User Statistics
• 12,000 visits per month
• From over 90 countries and territories, with USA, India, UK, Germany, China, Japan, Canada, France, Australia and the Netherlands in the top 10.
• 130,000 page views per month
• Cited in 2280 scientific papers (up to August 2011)

PANTHER Statistics
• 48 organisms
• 400,000 genes
• 62,972 subfamilies
• 6,594 families
• 165 pathways

PANTHER WEBSITE AT A GLANCE
• Main menu tabs to access each subject's main page.
• PANTHER keyword search and HMM score.
• Quick links to popular PANTHER functionalities.
• PANTHER news and publications.

PANTHER Website Pages
• List page
  – Gene list page
  – Family/subfamily list page
  – Ontology or pathway list page
• Information detail page
  – Gene detail page
  – Family/subfamily detail page
  – Pathway description page
  – Pathway molecule class detail page
  – Ontology term detail page
• Graph and diagram page
  – Pie chart
  – Pathway diagram
  – Tree viewer

PANTHER Gene List Page
• Click to view the pie chart.
• Choose an organism to display your gene list.
• Sort the list by clicking the column name.
• Collapse the column(s) by clicking on the “x” icon.
• Convert the gene list to another list type.
• Export the list to
  – Workspace (need to register an account)
  – File on your computer
  – Text on the website

Gene Detail Page
• Information is divided into 3 sections
  – General information about the gene
    • Including IDs, names, gene symbol, alternative IDs, etc.
  – PANTHER classification of the gene
    • PANTHER family and subfamily information
    • Links to view the tree and MSA
    • PANTHER GO slim and protein class
  – Orthologs of the gene

Gene Detail Page
• Columns
  – ID: Unique gene identifiers in PANTHER.
  – Organism: The modern-day organisms in which the ortholog is found. For paralogs, the organism column gives the two speciation events between which the duplication occurred that generated the paralogous genes. “ND” means “not determined”. Thus different paralogs can be distinguished by how long ago the relevant duplications occurred.
  – Type
    • LDO: least diverged ortholog
    • O: other, more diverged orthologs (in case of gene duplication)
    • P: paralogs
• Orthologs are genes that can be traced to the same gene in the genome of their most recent common ancestor species.
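
The Type column above can be illustrated with a minimal Python sketch over a gene tree whose internal nodes are labelled as speciation or duplication events, as in the PANTHER tree viewer. It assumes the usual reading of the definition: two genes are orthologs when their most recent common ancestor node is a speciation, and paralogs when it is a duplication. The Node class, the function names, the example tree, and the rule that the ortholog with the shortest summed branch length in each species is the LDO are all illustrative assumptions, not PANTHER's actual implementation.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass(eq=False, repr=False)
class Node:
    """Hypothetical gene-tree node; `event` is set only on internal nodes."""
    name: str
    branch_length: float = 0.0
    event: Optional[str] = None  # "speciation" or "duplication"
    parent: Optional["Node"] = None
    children: List["Node"] = field(default_factory=list)

    def add(self, child: "Node") -> "Node":
        child.parent = self
        self.children.append(child)
        return child


def ancestors(node: Optional[Node]) -> List[Node]:
    """Path from a node up to the root, starting with the node itself."""
    path = []
    while node is not None:
        path.append(node)
        node = node.parent
    return path


def mrca(a: Node, b: Node) -> Node:
    """Most recent common ancestor node of two nodes in the same tree."""
    seen = {id(n) for n in ancestors(a)}
    for n in ancestors(b):
        if id(n) in seen:
            return n
    raise ValueError("nodes are not in the same tree")


def distance(a: Node, b: Node) -> float:
    """Sum of branch lengths on the path between two nodes."""
    top = mrca(a, b)
    total = 0.0
    for n in (a, b):
        while n is not top:
            total += n.branch_length
            n = n.parent
    return total


def classify(query: Node, leaves: List[Node],
             species_of: Dict[str, str]) -> Dict[str, str]:
    """Label every other leaf as LDO, O or P relative to the query gene.

    Assumed rule: within each species, the ortholog closest to the
    query by summed branch length is the LDO; the rest are O.
    """
    labels: Dict[str, str] = {}
    closest: Dict[str, tuple] = {}  # species -> (distance, gene name)
    for leaf in leaves:
        if leaf is query:
            continue
        if mrca(query, leaf).event == "duplication":
            labels[leaf.name] = "P"
            continue
        labels[leaf.name] = "O"
        d = distance(query, leaf)
        sp = species_of[leaf.name]
        if sp not in closest or d < closest[sp][0]:
            closest[sp] = (d, leaf.name)
    for _, name in closest.values():
        labels[name] = "LDO"
    return labels


# Made-up example: one fly gene, and a duplication in the
# human/mouse ancestor that produced copies A1 and A2.
root = Node("root", event="speciation")
fly = root.add(Node("fly_A", 1.0))
dup = root.add(Node("dup", 0.2, event="duplication"))
sp1 = dup.add(Node("sp1", 0.1, event="speciation"))
sp2 = dup.add(Node("sp2", 0.5, event="speciation"))
h1 = sp1.add(Node("human_A1", 0.1))
m1 = sp1.add(Node("mouse_A1", 0.1))
h2 = sp2.add(Node("human_A2", 0.3))
m2 = sp2.add(Node("mouse_A2", 0.3))
leaves = [fly, h1, m1, h2, m2]
species_of = {"fly_A": "fruit fly", "human_A1": "human",
              "mouse_A1": "mouse", "human_A2": "human",
              "mouse_A2": "mouse"}

print(classify(fly, leaves, species_of))
print(classify(h1, leaves, species_of))
```

With the fly gene as the query, both A1 copies come out as LDO and both A2 copies as O, because the A2 lineage has the longer branches. With human_A1 as the query, the A2 copies become paralogs (P), since the common ancestor node is the duplication.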