MODOMICS: a Database of RNA Modification Pathways. 2017 Update
Total Page:16
File Type:pdf, Size:1020Kb
Published online 2 November 2017 Nucleic Acids Research, 2018, Vol. 46, Database issue D303–D307 doi: 10.1093/nar/gkx1030 MODOMICS: a database of RNA modification pathways. 2017 update Pietro Boccaletto1,†, Magdalena A. Machnicka1,2,†, Elzbieta Purta1,Pawe-lPiatkowski˛ 1, B-lazej˙ Baginski´ 1, Tomasz K. Wirecki1,Valerie´ de Crecy-Lagard´ 3, Robert Ross4,Patrick A. Limbach4, Annika Kotter5, Mark Helm5 and Janusz M. Bujnicki1,6,* 1Laboratory of Bioinformatics and Protein Engineering, International Institute of Molecular and Cell Biology in Warsaw, ul. Ks. Trojdena 4, PL-02-109 Warsaw, Poland, 2Institute of Informatics, University of Warsaw, Banacha 2, PL-02-097 Warsaw, Poland, 3Microbiology and Cell Science Department, University of Florida, Gainesville, FL 32611, USA, 4Department of Chemistry, Rieveschl Laboratories for Mass Spectrometry, University of Cincinnati, Cincinnati, OH 45221, USA, 5Institut fur¨ Pharmazie und Biochemie, Johannes Gutenberg-Universitat,¨ Staudinger Weg 5, D-55128 Mainz, Germany and 6Faculty of Biology, Adam Mickiewicz University, ul. Umultowska 89, PL-61-614 Poznan, Poland Received September 15, 2017; Revised October 15, 2017; Editorial Decision October 16, 2017; Accepted October 18, 2017 ABSTRACT for basic biological processes [review: (1)]. Now we know that 163 post-transcriptional modifications of RNA intro- MODOMICS is a database of RNA modifications duce a functional diversity that allows the four basic ribonu- that provides comprehensive information concern- cleotide residues to gain diverse functions, akin to those of ing the chemical structures of modified ribonucleo- side chains of amino acid residues, which may be e.g., po- sides, their biosynthetic pathways, the location of lar, charged, aliphatic or aromatic. Modifications can di- modified residues in RNA sequences, and RNA- rectly influence RNA structure, by promoting or disrupt- modifying enzymes. In the current database version, ing certain intramolecular interactions; they can make the we included the following new features and data: RNA molecule more rigid or more flexible. They can also extended mass spectrometry and liquid chromatog- influence RNA interactions with other molecules, in par- raphy data for modified nucleosides; links between ticular proteins. Overall, they contribute strongly to the di- human tRNA sequences and MINTbase - a frame- versity of functions fulfilled by RNA molecules, especially within complex regulatory networks, where small subtle work for the interactive exploration of mitochondrial structural changes can bring about significant changes to and nuclear tRNA fragments; new, machine-friendly cellular metabolism (2). system of unified abbreviations for modified nucleo- In the last years, new types of RNA modifications have side names; sets of modified tRNA sequences for been found, and biochemical and physiological roles have two bacterial species, updated collection of mam- been elucidated for many known modified ribonucleosides malian tRNA modifications, 19 newly identified mod- (3–5). Some of these advances were driven by the use ified ribonucleosides and 66 functionally character- of liquid chromatography/mass spectrometry (LC/MS)- ized proteins involved in RNA modification. Data based methods, which provide highly precise quantifica- from MODOMICS have been linked to the RNAcentral tion of changes in the spectrum of modified ribonucleo- database of RNA sequences. MODOMICS is available sides in RNA from any organism, facilitating the study of at http://modomics.genesilico.pl. translational control of cellular responses and phenotypes (6). Moreover, a number of previously unknown RNA- modifying enzymes have been identified and characterized. INTRODUCTION New important roles for long-known RNA modifica- The presence of modified nucleosides in RNA, beyond the tions were also discovered. Prominent examples include basic A, U, C and G, has been recognized for more than half the involvement of N6-methyladenosine (m6A) in regulating a century. However, their importance to RNA biochem- gene expression by influencing transcript stability, splicing, istry and cell biology has been underappreciated, mainly translation efficiency and cap-independent translation, and because very few modifications (at defined positions inde- in promoting circular RNA translation [review: (7)]. Mu- fined RNA molecules) were found to be truly indispensable tations in many human genes encoding RNA modification *To whom correspondence should be addressed. Tel: +48 22 597 0750; Fax: +48 22 597 0715; Email: [email protected] †These authors contributed equally to this work as first authors. C The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact [email protected] D304 Nucleic Acids Research, 2018, Vol. 46, Database issue enzymes have been linked to diseases, such as cancer, car- in RNA modification, both functional enzymes and protein diovascular diseases, metabolic diseases, neurological disor- co-factors necessary for multi-protein enzymatic activities. ders, and mitochondria-related defects [review: (8)]. For each protein a set of detailed information is provided To adequately represent the recent accumulation of and includes: identifiers and accession numbers from rel- knowledge, we have added both to the variety and volume evant resources and databases such as: NCBI GI, UniProt of data in the MODOMICS database. The most significant ID, COG number, PDB ID of structure (if available); amino additions are: (i) extensively updated datasets: new modifi- acid sequence; corresponding ORF; information about cat- cations, new enzymes, and new RNA sequences with mod- alyzed reaction, the position of modification and modified ifications; (ii) a new category of LC/MS data for modifi- RNA(s) (if available). For proteins that are parts of enzy- cations (Figure 1); (iii) new naming/numbering convention matic complexes, the name of the complex is provided. for modified residues in RNA sequences; (iv) replacement MODOMICS also contains human and yeast snoRNAs, of Jmol by JSmol for 3D structure viewing. involved in RNA-guided RNA modification by the C/D box and H/ACA box ribonucleoproteins, linked to the cor- responding modification sites in human and yeast RNAs DATABASE CONTENT and the catalog of ‘building blocks’ for the chemical syn- MODOMICS has been developed to house and distribute thesis of naturally occurring modified nucleosides. collections of RNA modification pathways, chemical struc- Several options for database searching and querying tures of modified nucleosides, sequences of modified RNAs, are implemented in MODOMICS, including the BLAST enzymes responsible for individual reactions, a catalog of (12) search of protein sequences and the PARALIGN (13) ‘building blocks’ for chemical synthesis of modified RNA, search of nucleic acid sequences collected in MODOMICS, and to be expanded to include new data types. The database as well as a utility that sends a protein sequence from a was created as a single resource to organize and present MODOMICS entry to BLAST on the NCBI web server. all these data in a convenient and straightforward way and is currently the most comprehensive source of informa- Updated modifications section tion among all existing RNA modification databases. In- formation about modified residues is also available in the Since the previous release of MODOMICS (14), 19 RNAMDB database (9), while information about mod- new modifications were added to the database. Among ified nucleosides identified from high-throughput exper- those are four types of geranylated nucleosides dis- iments like Pseudo-seq, CeU-seq, m6A-seq, Aza-IP and coveredinbacterialtRNA(3), 5-cyanomethyluridine RiboMeth-seq is hosted by RMBase (10). Recently MOD- (cnm5U) (15), and 2-O-methyluridine 5-oxyacetic acid OMICS was linked to RNAcentral, a database of non- methyl ester (mcmo5Um) (5). LC/MS analyses of tRNAs coding RNA sequences (11), and serves as a source of mod- from Bacillus subtilis, plants, and Trypanosoma bru- ified tRNA and rRNA sequences. cei revealed the presence of 2-methylthio cyclic N6- At present, MODOMICS contains 163 different modifi- threonylcarbamoyladenosine (ms2ct6A), a derivative of cations that have been identified in RNA molecules. A typi- N6-threonylcarbamoyladenosine (t6A), at position 37 of cal entry for a modified ribonucleoside contains informa- tRNAs responsible for recognition of adenosine-starting tion about its fundamental chemical properties, chemical codons (16). structure, localization in known RNA molecule types, the 3D chemical structures of modified nucleosides are now phylogenetic distribution with respect to Domains of Life, displayed with JSmol (17). It is an open-source JS library and known enzymes responsible for its biosynthesis. Among and HTML5 viewer for 3D chemical structures which, in other available details information related to MS analyses of contrast to the previously used Jmol tool, does not require modified RNAs is provided (see the ‘LC/MS data for mod- the installation of the Java software package. JSmol can also ified nucleosides’ section). Many of the products of modi- be used on systems that no longer support Java applets due fication reactions