Exploring the Development and Maintenance Practices in the Gene Ontology
Total Page:16
File Type:pdf, Size:1020Kb
Load more
Recommended publications
-
The ELIXIR Core Data Resources: Fundamental Infrastructure for The
Supplementary Data: The ELIXIR Core Data Resources: fundamental infrastructure for the life sciences The “Supporting Material” referred to within this Supplementary Data can be found in the Supporting.Material.CDR.infrastructure file, DOI: 10.5281/zenodo.2625247 (https://zenodo.org/record/2625247). Figure 1. Scale of the Core Data Resources Table S1. Data from which Figure 1 is derived: Year 2013 2014 2015 2016 2017 Data entries 765881651 997794559 1726529931 1853429002 2715599247 Monthly user/IP addresses 1700660 2109586 2413724 2502617 2867265 FTEs 270 292.65 295.65 289.7 311.2 Figure 1 includes data from the following Core Data Resources: ArrayExpress, BRENDA, CATH, ChEBI, ChEMBL, EGA, ENA, Ensembl, Ensembl Genomes, EuropePMC, HPA, IntAct /MINT , InterPro, PDBe, PRIDE, SILVA, STRING, UniProt ● Note that Ensembl’s compute infrastructure physically relocated in 2016, so “Users/IP address” data are not available for that year. In this case, the 2015 numbers were rolled forward to 2016. ● Note that STRING makes only minor releases in 2014 and 2016, in that the interactions are re-computed, but the number of “Data entries” remains unchanged. The major releases that change the number of “Data entries” happened in 2013 and 2015. So, for “Data entries” , the number for 2013 was rolled forward to 2014, and the number for 2015 was rolled forward to 2016. The ELIXIR Core Data Resources: fundamental infrastructure for the life sciences 1 Figure 2: Usage of Core Data Resources in research The following steps were taken: 1. API calls were run on open access full text articles in Europe PMC to identify articles that mention Core Data Resource by name or include specific data record accession numbers. -
Maternally Inherited Pirnas Direct Transient Heterochromatin Formation
RESEARCH ARTICLE Maternally inherited piRNAs direct transient heterochromatin formation at active transposons during early Drosophila embryogenesis Martin H Fabry, Federica A Falconio, Fadwa Joud, Emily K Lythgoe, Benjamin Czech*, Gregory J Hannon* CRUK Cambridge Institute, University of Cambridge, Li Ka Shing Centre, Cambridge, United Kingdom Abstract The PIWI-interacting RNA (piRNA) pathway controls transposon expression in animal germ cells, thereby ensuring genome stability over generations. In Drosophila, piRNAs are intergenerationally inherited through the maternal lineage, and this has demonstrated importance in the specification of piRNA source loci and in silencing of I- and P-elements in the germ cells of daughters. Maternally inherited Piwi protein enters somatic nuclei in early embryos prior to zygotic genome activation and persists therein for roughly half of the time required to complete embryonic development. To investigate the role of the piRNA pathway in the embryonic soma, we created a conditionally unstable Piwi protein. This enabled maternally deposited Piwi to be cleared from newly laid embryos within 30 min and well ahead of the activation of zygotic transcription. Examination of RNA and protein profiles over time, and correlation with patterns of H3K9me3 deposition, suggests a role for maternally deposited Piwi in attenuating zygotic transposon expression in somatic cells of the developing embryo. In particular, robust deposition of piRNAs targeting roo, an element whose expression is mainly restricted to embryonic development, results in the deposition of transient heterochromatic marks at active roo insertions. We hypothesize that *For correspondence: roo, an extremely successful mobile element, may have adopted a lifestyle of expression in the [email protected] embryonic soma to evade silencing in germ cells. -
What Remains to Be Discovered in the Eukaryotic Proteome?
bioRxiv preprint doi: https://doi.org/10.1101/469569; this version posted November 16, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license. Hidden in plain sight: What remains to be discovered in the eukaryotic proteome? Valerie Wood∗1,2, Antonia Lock3, Midori A. Harris1,2, Kim Rutherford1,2, J¨urgB¨ahler3, and Stephen G. Oliver1,2 1Cambridge Systems Biology Centre, University of Cambridge, Cambridge, UK 2Department of Biochemistry, University of Cambridge, Cambridge, UK 3Department of Genetics, Evolution and Environment, University College London, London, UK November 12, 2018 Abstract The first decade of genome sequencing stimulated an explosion in the charac- terization of unknown proteins. More recently, the pace of functional discovery has slowed, leaving around 20% of the proteins even in well-studied model or- ganisms without informative descriptions of their biological roles. Remarkably, many uncharacterized proteins are conserved from yeasts to human, suggesting that they contribute to fundamental biological processes. To fully understand biological systems in health and disease, we need to account for every part of the system. Unstudied proteins thus represent a collective blind spot that limits the progress of both basic and applied biosciences. We use a simple yet powerful metric based on Gene Ontology (GO) bio- logical process terms to define characterized and uncharacterized proteins for human, budding yeast, and fission yeast. We then identify a set of conserved but unstudied proteins in S. -
NCBI Databases
Jon K. Lærdahl, Structural Bioinforma�cs NCBI databases Read this ar�cle! Nucleic Acids 43 Res. , D6 (2015) Jon K. Lærdahl, EMBL-‐EBI databases Structural Bioinforma�cs European Nucleo�de Archive (ENA) nucleo�de sequence database Ensembl -‐ automa�c and manually curated annota�on on selected eukaryo�c (vertebrate) genomes Ensembl Genomes – Ensembl for “all other organisms” UniProt – protein sequence and func�onal informa�on ChEMBL – database of bioac�ve compounds IntAct -‐ repository of molecular interac�ons, including protein-‐protein, protein-‐small molecule and protein-‐nucleic acid interac�ons CiteXplore – 25 million literature abstracts including PubMed, Agricola & patents Gene Ontology (GO) -‐ controlled vocabulary to describe gene and gene product a�ributes in any organism Gene Ontology Annota�on (GOA) – GO annota�ons for proteins in UniProt All data is publicly available Jon K. Lærdahl, Structural Bioinforma�cs GenBank a comprehensive public database of nucleo�de sequences and suppor�ng bibliographic and biological annota�on all publicly available DNA sequences submissions from authors – web-‐based BankIt – standalone program Sequin submissions from EST and other high-‐throughput sequencing projects daily exchange of data with ENA and DNA Data Bank of Japan (DDBJ) – all sequences submi�ed to DDBJ, ENA, or GenBank will end up in all 3 databases within few days Jon K. Lærdahl, Structural Bioinforma�cs INSDC Jon K. Lærdahl, Structural Bioinforma�cs Sequin – for submi�ng to GenBank BankIt is web-‐ based alterna�ve Link to “Create a submission” Jon K. Lærdahl, Structural Bioinforma�cs Entry in GenBank format Oct 2015: 202,237,081,559 bases in 188,372,017 sequence records in the tradi�onal GenBank divisions 1,222,635,267,498 bases in 309,198,943 sequence records in the WGS Jon K. -
The Biogrid Interaction Database
D470–D478 Nucleic Acids Research, 2015, Vol. 43, Database issue Published online 26 November 2014 doi: 10.1093/nar/gku1204 The BioGRID interaction database: 2015 update Andrew Chatr-aryamontri1, Bobby-Joe Breitkreutz2, Rose Oughtred3, Lorrie Boucher2, Sven Heinicke3, Daici Chen1, Chris Stark2, Ashton Breitkreutz2, Nadine Kolas2, Lara O’Donnell2, Teresa Reguly2, Julie Nixon4, Lindsay Ramage4, Andrew Winter4, Adnane Sellam5, Christie Chang3, Jodi Hirschman3, Chandra Theesfeld3, Jennifer Rust3, Michael S. Livstone3, Kara Dolinski3 and Mike Tyers1,2,4,* 1Institute for Research in Immunology and Cancer, Universite´ de Montreal,´ Montreal,´ Quebec H3C 3J7, Canada, 2The Lunenfeld-Tanenbaum Research Institute, Mount Sinai Hospital, Toronto, Ontario M5G 1X5, Canada, 3Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ 08544, USA, 4School of Biological Sciences, University of Edinburgh, Edinburgh EH9 3JR, UK and 5Centre Hospitalier de l’UniversiteLaval´ (CHUL), Quebec,´ Quebec´ G1V 4G2, Canada Received September 26, 2014; Revised November 4, 2014; Accepted November 5, 2014 ABSTRACT semi-automated text-mining approaches, and to en- hance curation quality control. The Biological General Repository for Interaction Datasets (BioGRID: http://thebiogrid.org) is an open access database that houses genetic and protein in- INTRODUCTION teractions curated from the primary biomedical lit- Massive increases in high-throughput DNA sequencing erature for all major model organism species and technologies (1) have enabled an unprecedented level of humans. As of September 2014, the BioGRID con- genome annotation for many hundreds of species (2–6), tains 749 912 interactions as drawn from 43 149 pub- which has led to tremendous progress in the understand- lications that represent 30 model organisms. -
Flybase: Updates to the Drosophila Melanogaster Knowledge Base Aoife Larkin 1,*, Steven J
Published online 21 November 2020 Nucleic Acids Research, 2021, Vol. 49, Database issue D899–D907 doi: 10.1093/nar/gkaa1026 FlyBase: updates to the Drosophila melanogaster knowledge base Aoife Larkin 1,*, Steven J. Marygold 1, Giulia Antonazzo1, Helen Attrill 1, Gilberto dos Santos2, Phani V. Garapati1, Joshua L. Goodman3,L.SianGramates 2, Gillian Millburn1, Victor B. Strelets3, Christopher J. Tabone2, Jim Thurmond3 and FlyBase Consortium† 1 Department of Physiology, Development and Neuroscience, University of Cambridge, Downing Street, Cambridge Downloaded from https://academic.oup.com/nar/article/49/D1/D899/5997437 by guest on 15 January 2021 CB2 3DY, UK, 2The Biological Laboratories, Harvard University, 16 Divinity Avenue, Cambridge, MA 02138, USA and 3Department of Biology, Indiana University, Bloomington, IN 47405, USA Received September 15, 2020; Revised October 13, 2020; Editorial Decision October 14, 2020; Accepted October 22, 2020 ABSTRACT In this paper, we highlight a variety of new features and improvements that have been integrated into FlyBase since FlyBase (flybase.org) is an essential online database our last review (3). We have introduced a number of new fea- for researchers using Drosophila melanogaster as a tures that allow users to more easily find and use the data in model organism, facilitating access to a diverse ar- which they are interested; these include customizable tables, ray of information that includes genetic, molecular, improved searching using expanded Experimental Tool Re- genomic and reagent resources. Here, we describe ports, more links to and from other resources, and bet- the introduction of several new features at FlyBase, ter provision of precomputed bulk files. We have upgraded including Pathway Reports, paralog information, dis- tools such as Fast-Track Your Paper and ID validator. -
Biocuration 2016 - Posters
Biocuration 2016 - Posters Source: http://www.sib.swiss/events/biocuration2016/posters 1 RAM: A standards-based database for extracting and analyzing disease-specified concepts from the multitude of biomedical resources Jinmeng Jia and Tieliu Shi Each year, millions of people around world suffer from the consequence of the misdiagnosis and ineffective treatment of various disease, especially those intractable diseases and rare diseases. Integration of various data related to human diseases help us not only for identifying drug targets, connecting genetic variations of phenotypes and understanding molecular pathways relevant to novel treatment, but also for coupling clinical care and biomedical researches. To this end, we built the Rare disease Annotation & Medicine (RAM) standards-based database which can provide reference to map and extract disease-specified information from multitude of biomedical resources such as free text articles in MEDLINE and Electronic Medical Records (EMRs). RAM integrates disease-specified concepts from ICD-9, ICD-10, SNOMED-CT and MeSH (http://www.nlm.nih.gov/mesh/MBrowser.html) extracted from the Unified Medical Language System (UMLS) based on the UMLS Concept Unique Identifiers for each Disease Term. We also integrated phenotypes from OMIM for each disease term, which link underlying mechanisms and clinical observation. Moreover, we used disease-manifestation (D-M) pairs from existing biomedical ontologies as prior knowledge to automatically recognize D-M-specific syntactic patterns from full text articles in MEDLINE. Considering that most of the record-based disease information in public databases are textual format, we extracted disease terms and their related biomedical descriptive phrases from Online Mendelian Inheritance in Man (OMIM), National Organization for Rare Disorders (NORD) and Orphanet using UMLS Thesaurus. -
Annual Scientific Report 2011 Annual Scientific Report 2011 Designed and Produced by Pickeringhutchins Ltd
European Bioinformatics Institute EMBL-EBI Annual Scientific Report 2011 Annual Scientific Report 2011 Designed and Produced by PickeringHutchins Ltd www.pickeringhutchins.com EMBL member states: Austria, Croatia, Denmark, Finland, France, Germany, Greece, Iceland, Ireland, Israel, Italy, Luxembourg, the Netherlands, Norway, Portugal, Spain, Sweden, Switzerland, United Kingdom. Associate member state: Australia EMBL-EBI is a part of the European Molecular Biology Laboratory (EMBL) EMBL-EBI EMBL-EBI EMBL-EBI EMBL-European Bioinformatics Institute Wellcome Trust Genome Campus, Hinxton Cambridge CB10 1SD United Kingdom Tel. +44 (0)1223 494 444, Fax +44 (0)1223 494 468 www.ebi.ac.uk EMBL Heidelberg Meyerhofstraße 1 69117 Heidelberg Germany Tel. +49 (0)6221 3870, Fax +49 (0)6221 387 8306 www.embl.org [email protected] EMBL Grenoble 6, rue Jules Horowitz, BP181 38042 Grenoble, Cedex 9 France Tel. +33 (0)476 20 7269, Fax +33 (0)476 20 2199 EMBL Hamburg c/o DESY Notkestraße 85 22603 Hamburg Germany Tel. +49 (0)4089 902 110, Fax +49 (0)4089 902 149 EMBL Monterotondo Adriano Buzzati-Traverso Campus Via Ramarini, 32 00015 Monterotondo (Rome) Italy Tel. +39 (0)6900 91402, Fax +39 (0)6900 91406 © 2012 EMBL-European Bioinformatics Institute All texts written by EBI-EMBL Group and Team Leaders. This publication was produced by the EBI’s Outreach and Training Programme. Contents Introduction Foreword 2 Major Achievements 2011 4 Services Rolf Apweiler and Ewan Birney: Protein and nucleotide data 10 Guy Cochrane: The European Nucleotide Archive 14 Paul Flicek: -
Uniprot.Ws: R Interface to Uniprot Web Services
Package ‘UniProt.ws’ September 26, 2021 Type Package Title R Interface to UniProt Web Services Version 2.33.0 Depends methods, utils, RSQLite, RCurl, BiocGenerics (>= 0.13.8) Imports AnnotationDbi, BiocFileCache, rappdirs Suggests RUnit, BiocStyle, knitr Description A collection of functions for retrieving, processing and repackaging the UniProt web services. Collate AllGenerics.R AllClasses.R getFunctions.R methods-select.R utilities.R License Artistic License 2.0 biocViews Annotation, Infrastructure, GO, KEGG, BioCarta VignetteBuilder knitr LazyLoad yes git_url https://git.bioconductor.org/packages/UniProt.ws git_branch master git_last_commit 5062003 git_last_commit_date 2021-05-19 Date/Publication 2021-09-26 Author Marc Carlson [aut], Csaba Ortutay [ctb], Bioconductor Package Maintainer [aut, cre] Maintainer Bioconductor Package Maintainer <[email protected]> R topics documented: UniProt.ws-objects . .2 UNIPROTKB . .4 utilities . .8 Index 11 1 2 UniProt.ws-objects UniProt.ws-objects UniProt.ws objects and their related methods and functions Description UniProt.ws is the base class for interacting with the Uniprot web services from Bioconductor. In much the same way as an AnnotationDb object allows acces to select for many other annotation packages, UniProt.ws is meant to allow usage of select methods and other supporting methods to enable the easy extraction of data from the Uniprot web services. select, columns and keys are used together to extract data via an UniProt.ws object. columns shows which kinds of data can be returned for the UniProt.ws object. keytypes allows the user to discover which keytypes can be passed in to select or keys via the keytype argument. keys returns keys for the database contained in the UniProt.ws object . -
The Drosophila Melanogaster Transcriptome by Paired-End RNA Sequencing
Downloaded from genome.cshlp.org on September 29, 2021 - Published by Cold Spring Harbor Laboratory Press Resource The Drosophila melanogaster transcriptome by paired-end RNA sequencing Bryce Daines,1,2,6 Hui Wang,2,6 Liguo Wang,3,6 Yumei Li,1,2 Yi Han,2 David Emmert,4 William Gelbart,4 Xia Wang,1,2 Wei Li,3,5 Richard Gibbs,1,2,7 and Rui Chen1,2,7 1Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas 77030, USA; 2Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas 77030, USA; 3Dan L. Duncan Cancer Center, Baylor College of Medicine, Houston, Texas 77030, USA; 4Department of Molecular and Cellular Biology, Harvard University, Cambridge, Massachusetts 02138, USA; 5Department of Molecular and Cellular Biology, Baylor College of Medicine, Houston, Texas 77030, USA RNA-seq was used to generate an extensive map of the Drosophila melanogaster transcriptome by broad sampling of 10 developmental stages. In total, 142.2 million uniquely mapped 64–100-bp paired-end reads were generated on the Illumina GA II yielding 3563 sequencing coverage. More than 95% of FlyBase genes and 90% of splicing junctions were observed. Modifications to 30% of FlyBase gene models were made by extension of untranslated regions, inclusion of novel exons, and identification of novel splicing events. A total of 319 novel transcripts were identified, representing a 2% increase over the current annotation. Alternate splicing was observed in 31% of D. melanogaster genes, a 38% increase over previous estimations, but significantly less than that observed in higher organisms. Much of this splicing is subtle such as tandem alternate splice sites. -
Rnacentral: Progress in the Development of the Non-Coding RNA Sequence Database
RNAcentral: Progress in the Development of the Non-coding RNA Sequence Database Other potential titles: ● RNAcentral: New Developments in the Non-coding RNA Sequence Database ● RNAcentral: Three Years of ncRNA Data Integration Anton I. Petrov1, Blake A. Sweeney1, Boris Burkov1*, Simon Kay1, Rob Finn1, Alex Bateman1, and The RNAcentral Consortium European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD *To whom correspondence should be addressed: [email protected] BACKGROUND More than fifty major specialised resources exist in the area of non-coding RNA (ncRNA) research. RNAcentral (http://rnacentral.org) provides a unified interface that aggregates information from over twenty of them, including: - larger databases (e.g. RefSeq, ENA, Ensembl!) - databases with specific content type (e.g. Rfam, PDBe) - RNA type-specific databases (e.g. miRBase, GtRNAdb, snoPY) - organism-specific databases (e.g. HGNC, FlyBase, PomBase). RESULTS Launched in 2014 [2], RNAcentral currently contains over 10 million ncRNA sequences from more than 20 RNA databases and Is updated several times a year. Recent updates include ncRNA data from: - HGNC - Ensembl - FlyBase - Rfam (most of sequences in RNAcentral are annotated with Rfam families, while the rest are candidates for producing new Rfam families) There are three main ways of browsing the data through the RNAcentral website. - The text search makes it easy to explore all ncRNA sequences, compare data across different resources, and discover what is known about each ncRNA. - Using the sequence similarity search one can search data from multiple RNA databases starting from a sequence. - Finally, one can explore ncRNAs in select species by genomic location using an integrated genome browser. -
Unexpected Insertion of Carrier DNA Sequences Into the Fission Yeast Genome During CRISPR–Cas9 Mediated Gene Deletion
Longmuir et al. BMC Res Notes (2019) 12:191 https://doi.org/10.1186/s13104-019-4228-x BMC Research Notes RESEARCH NOTE Open Access Unexpected insertion of carrier DNA sequences into the fssion yeast genome during CRISPR–Cas9 mediated gene deletion Sophie Longmuir, Nabihah Akhtar and Stuart A. MacNeill* Abstract Objectives: The fssion yeast Schizosaccharomyces pombe is predicted to encode ~ 200 proteins of < 100 amino acids, including a number of previously uncharacterised proteins that are found conserved in related Schizosaccharomyces species only. To begin an investigation of the function of four of these so-called microproteins (designated Smp1– Smp4), CRISPR–Cas9 genome editing technology was used to delete the corresponding genes in haploid fssion yeast cells. Results: None of the four microprotein-encoding genes was essential for viability, meiosis or sporulation, and the deletion cells were no more sensitive to a range of cell stressors than wild-type, leaving the function of the proteins unresolved. During CRISPR–Cas9 editing however, a number of strains were isolated in which additional sequences were inserted into the target loci at the Cas9 cut site. Sequencing of the inserts revealed these to be derived from the chum salmon Oncorhynchus keta, the source of the carrier DNA used in the S. pombe transformation. Keywords: Microprotein, Fission yeast, Schizosaccharomyces pombe, Oncorhynchus keta, CRISPR–Cas9 Introduction Te results presented here arose out of a project to Microproteins (also known as SEPs for smORF-encoded investigate the function of four unstudied S. pombe peptides) are small (generally < 100 amino acid) proteins microproteins, designated Smp1–Smp4 (see Table 1 for that are increasingly being implicated in a wide range of systematic IDs).