Rfam 14: expanded coverage of metagenomic, viral and microRNA families


Ioanna Kalvari 1, Eric P. Nawrocki 2, Nancy Ontiveros-Palacios 1, Joanna Argasinska 1, Kevin Lamkiewicz 3,4, Manja Marz 3,4, Sam Griffiths-Jones 5, Claire Toffano-Nioche 6, Daniel Gautheret 6, Zasha Weinberg 7, Elena Rivas 8, Sean R. Eddy 8,9,10, Robert D. Finn 1, Alex Bateman 1 and Anton I. Petrov 1,*

1 European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
2 National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
3 RNA Bioinformatics and High-Throughput Analysis, Friedrich Schiller University Jena, Leutragraben 1, 07743 Jena, Germany
4 European Virus Bioinformatics Center, Leutragraben 1, 07743 Jena, Germany
5 Faculty of Biology, Medicine and Health, University of Manchester, Oxford Road, Manchester, M13 9PT, UK
6 Université Paris-Saclay, CEA, CNRS, Institute for Integrative Biology of the Cell (I2BC), 91198, Gif-sur-Yvette, France
7 Bioinformatics Group, Department of Computer Science and Interdisciplinary Centre for Bioinformatics, Leipzig University, 04107 Leipzig, Germany
8 Department of Molecular and Cellular Biology, Harvard University, Cambridge, MA 02138, USA
9 Howard Hughes Medical Institute, Harvard University, Cambridge, MA 02138, USA
10 John A. Paulson School of Engineering and Applied Science, Harvard University, Cambridge, MA 02138, USA

* To whom correspondence should be addressed. Tel: +44 1223 492550; Fax: +44 1223 494468; Email: [email protected]
Present address: Joanna Argasinska, Stanford University; Department of Genetics; 3165 Porter Drive, Palo Alto, CA 94304, USA.

To cite this version: Ioanna Kalvari, Eric P. Nawrocki, Nancy Ontiveros-Palacios, Joanna Argasinska, Kevin Lamkiewicz, et al. Rfam 14: expanded coverage of metagenomic, viral and microRNA families. Nucleic Acids Research, 2020, 10.1093/nar/gkaa1047. hal-03031715

HAL Id: hal-03031715, https://hal.archives-ouvertes.fr/hal-03031715. Submitted on 14 Dec 2020.

This article has been accepted for publication in Nucleic Acids Research, published by Oxford University Press. DOI: 10.1093/nar/gkaa1047. PUBMED: 33211869.

Nucleic Acids Research, 2020. doi: 10.1093/nar/gkaa1047

Downloaded from https://academic.oup.com/nar/advance-article/doi/10.1093/nar/gkaa1047/5992291 by guest on 14 December 2020

Received September 15, 2020; Revised October 14, 2020; Editorial Decision October 15, 2020; Accepted October 21, 2020

© The Author(s) 2020. Published by Oxford University Press on behalf of Nucleic Acids Research. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.

ABSTRACT

Rfam is a database of RNA families where each of the 3444 families is represented by a multiple sequence alignment of known RNA sequences and a covariance model that can be used to search for additional members of the family. Recent developments have involved expert collaborations to improve the quality and coverage of Rfam data, focusing on microRNAs, viral and bacterial RNAs. We have completed the first phase of synchronising microRNA families in Rfam and miRBase, creating 356 new Rfam families and updating 40. We established a procedure for comprehensive annotation of viral RNA families starting with Flavivirus and Coronaviridae RNAs. We have also increased the coverage of bacterial and metagenome-based RNA families from the ZWD database. These developments have enabled a significant growth of the database, with the addition of 759 new families in Rfam 14. To facilitate further community contribution to Rfam, expert users are now able to build and submit new families using the newly developed Rfam Cloud family curation system. New Rfam website features include a new sequence similarity search powered by RNAcentral, as well as search and visualisation of families with pseudoknots. Rfam is freely available at https://rfam.org.

INTRODUCTION

Rfam is the database of non-coding RNA (ncRNA) families (1), each represented by a multiple sequence alignment (known as the seed), a consensus secondary structure, and a covariance model to annotate non-coding RNAs in nucleotide datasets using the Infernal software (2). Rfam and Infernal are commonly used to annotate ncRNAs in newly sequenced genomes (3,4), and are the core components of the genome annotation pipelines in Ensembl (5), Ensembl Genomes (6), NCBI Prokaryotic and Eukaryotic Gene Annotation (7,8) and other resources. For example, PDBe uses Rfam to enable searching for RNA chains, such as tRNA or rRNA, in 3D structures (9), while RNAcentral uses Rfam to detect incomplete sequences, RNA type annotations errors, and provide other quality controls (10). The manually curated multiple sequence alignments from Rfam are also used for training and benchmarking new software, such as secondary structure prediction algorithms (11,12).

Here we describe community-driven improvements and updates that are available in Rfam 14 (releases 14.0–14.3), including new RNA families, new features on the Rfam website, and new tools for expert users to contribute families to the database. Rfam 14.3 contains 3444 families, representing a 28% increase since version 13.0. The majority of the new families have been generated in collaboration with other RNA resources and experts, in particular ZWD, miRBase and European Virus Bioinformatics Center (EVBC), supplemented by in-house literature curation. Following the transition from annotating a subset of ENA (13) to annotation of a collection of non-redundant and complete genomes in Rfam 13.0 (1), we have expanded the set of genomes in the Rfam sequence database (Rfamseq) by 76% in Rfam 14.0 to reflect the increase in number of genomes. Rfamseq now includes 14 772 genomes from all domains of life. Here, we also describe the newly-developed Rfam Cloud curation pipeline, which marks a further major step towards ensuring that Rfam remains a core open and sustainable resource for the whole RNA community.

NEW FAMILIES FROM ZWD

Rfam releases 14.1 and 14.3 included 253 new families from the Zasha Weinberg Database (ZWD) that were discovered by a systematic computational analysis of intergenic regions in Bacteria and metagenomic samples (14), as well as newly identified ribosomal leaders (r-leaders) (15). ZWD stores the original manually curated multiple sequence alignments produced in the Breaker and Weinberg groups over the last decade. ZWD is a git-based resource that currently includes 417 alignments and is available at https://bitbucket.org/zashaw/zashaweinbergdata. Examples of the new and updated ZWD-based families are shown in Figure 1.

Prior to version 14, Rfam had incorporated 108 families from ZWD, but in Rfam 14 a new process was developed to automate the […] with Rfam covariance models. Ninety eight percent of ZWD alignments have now been imported into Rfam except for seven families lacking covariation support as determined by R-scape (16) and 43 families marked ‘Not for Rfam’ in ZWD. The mapping between ZWD alignments and Rfam accessions is provided in Supplementary Information.

We also used CaCoFold (17) to identify the pre-existing ZWD-based Rfam families requiring an update using the new RNAcentral import mechanism. For example, the manA alignment from Rfam (RF01745) had only 29 statistically significant basepairs while a CaCoFold structure based on the corresponding alignment in ZWD contained 48 significant basepairs. A comparison between the Rfam and ZWD versions of the alignment revealed a mistake in the Rfam secondary structure introduced during manual import. As a result, the manA Rfam alignment was updated to its original version from ZWD (Figure 1A, right). We are systematically reviewing other ZWD-based families that may require an update and will release them in future Rfam versions (17).

NEW WORKFLOW FOR VIRAL RNA FAMILIES

Despite the small size (3400–41 000 nucleotides) of RNA virus genomes, they contain several conserved RNA structures vital for protection against exonucleases, genome diversification, and play a crucial role in various stages of their viral life cycle (18). Many viruses rely on RNAs to infect and replicate inside a host: for example, the cis-acting element of coronaviruses is essential for replication (18), while in Dengue viruses replication depends on RNA […]