A Resource for RNA Subcellular Localizations
Nucleic Acids Research Advance Access published August 19, 2016
Nucleic Acids Research, 2016
doi: 10.1093/nar/gkw728

RNALocate: a resource for RNA subcellular localizations

Ting Zhang1,†, Puwen Tan1,†, Liqiang Wang1,†, Nana Jin1,†, Yana Li1, Lin Zhang1, Huan Yang2, Zhenyu Hu1, Lining Zhang1, Chunyu Hu1, Chunhua Li1, Kun Qian1, Changjian Zhang2, Yan Huang1, Kongning Li1,*, Hao Lin2,* and Dong Wang1,3,*

1College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, China, 2Key Laboratory for NeuroInformation of Ministry of Education, Center of Bioinformatics, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 610054, China and 3Department of Biochemistry and Molecular Biology, Shantou University Medical College, Shantou 515041, China

Received July 12, 2016; Accepted August 08, 2016

*To whom correspondence should be addressed. Tel: +86 451 86699584; Fax: +86 451 86699584; Email: [email protected]
Correspondence may also be addressed to Hao Lin. Tel: +86 28 83202351; Fax: +86 28 83208238; Email: [email protected]
Correspondence may also be addressed to Kongning Li. Tel: +86 451 86615922; Fax: +86 451 86615922; Email: [email protected]
†These authors contributed equally to the work as first authors.

© The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact [email protected]

ABSTRACT

Increasing evidence has revealed that RNA subcellular localization is a very important feature for deeply understanding RNA’s biological functions after being transported into intra- or extra-cellular regions. RNALocate is a web-accessible database that aims to provide a high-quality RNA subcellular localization resource and facilitate future research on RNA function or structure. The current version of RNALocate documents more than 37 700 manually curated RNA subcellular localization entries with experimental evidence, involving more than 21 800 RNAs with 42 subcellular localizations in 65 species, mainly including Homo sapiens, Mus musculus and Saccharomyces cerevisiae. In addition, RNA homology, sequence and interaction data have been integrated into RNALocate. Users can access these data through online search, browse, BLAST and visualization tools. In conclusion, RNALocate will be of help in elucidating the entirety of RNA subcellular localization and in developing new prediction methods. The database is available at http://www.rna-society.org/rnalocate/.

INTRODUCTION

The biological functions of RNAs, including translation of genetic information, cellular signal transduction and transcriptional regulation, are determined by their location in the cell (1,2). A cell is divided into different compartments that are related to different biological processes (3). For example, RNA localized in the nucleus usually participates in gene expression regulation or mitosis (4). Thus, the cellular role of an RNA after synthesis can be inferred from its subcellular localization. Subcellular localization therefore plays an important role in studying the biological functions of RNAs, and there is an urgent need for a database of RNA subcellular localization that integrates, analyzes and predicts RNA subcellular localization in order to speed up RNA structural and functional research.

To complement related research on RNA subcellular localization, we developed a web-accessible database (RNALocate, http://www.rna-society.org/rnalocate/) that aims to collect an expanding catalog of RNA subcellular localizations across diverse species and multiple biological processes by manually curating the literature. The first release of RNALocate contains more than 37 700 manually curated RNA subcellular localization entries with experimental evidence, involving 65 organisms (such as Homo sapiens, Mus musculus and Saccharomyces cerevisiae), 42 subcellular localizations (such as Cytoplasm, Nucleus, Endoplasmic reticulum and Ribosome) and 9 RNA categories (such as mRNA, miRNA and lncRNA). Hence, RNALocate provides a more specific subcellular localization resource in which to efficiently investigate, browse and analyze a particular RNA, and even provides insight into the functions of hypothetical or novel RNAs. The whole data set can be easily queried and downloaded through the webpage, and visualization tools for interactively browsing and analyzing the data set are provided. In addition, RNALocate allows researchers to submit new RNA subcellular localization data.

Figure 1. The overview of the RNALocate database.

DATA SOURCES AND IMPLEMENTATION

In order to collect all available RNAs, RNALocate integrates all types of RNA symbols, mainly including microRNA symbols from the miRBase database (5) and long non-coding RNA (lncRNA) and mRNA symbols from the NCBI Gene and Ensembl genome databases (6,7). Other ncRNA category names, such as transfer RNA and small nuclear RNA, are also included from the NCBI Gene and Ensembl genome databases (6,7). The list of subcellular localization names was collected according to the Gene Ontology (GO) (8). We wrote a simple script to screen all abstracts and articles in the PubMed database using the following keyword combinations: (each RNA symbol or RNA category name) and/or (each subcellular localization). The relevant hits were further inspected manually. In addition, we retrieved several thousand subcellular localization entries from the lncRNAdb (9), PomBase (10), FlyBase (11), TAIR (12) and DOT (13) databases (Figure 1).

The RNALocate database is implemented using HTML and PHP with a MySQL server. The interface component consists of web pages designed and implemented in HTML/CSS. It has been tested in the Google Chrome, Firefox and Internet Explorer web browsers.

DATABASE CONTENT

RNA subcellular localization information was manually obtained from articles published in the PubMed database before May 2016. In the current version, RNALocate documents 37 772 RNA subcellular localization entries with experimental evidence from 65 organisms, involving 42 subcellular localizations (Figure 2) and 9 RNA categories (csRNA, lncRNA, mRNA, miRNA, piRNA, snRNA, rRNA, snoRNA and tRNA) (Figure 3). Among these, more than 1400 entries were collected from the lncRNAdb, PomBase, FlyBase, TAIR and DOT databases. Each subcellular localization entry contains detailed information on RNA symbol, RNA category, alias, organism, sequence, homology, subcellular localization, tissue, validation method, PubMed ID, detailed description and network.

On the ‘Submit’ page, RNALocate invites users to upload novel RNA subcellular localization data, and on the ‘Blast’ page, sequence alignment can be performed after parameter selection. In addition, the whole data set can be downloaded through two approaches: ‘Basic Download’ and ‘API’ (application programming interface). On the ‘Basic Download’ page, the whole data set is available in Microsoft Excel and TXT formats, and users can obtain it by clicking the download button. On the ‘API’ page, users can access part of the RNALocate data by using a script. RNALocate also provides three options on the ‘Help’ page with instructions for use: ‘Statistics’ (detailed statistical tables), ‘Tutorial’ (procedure and illustrations of RNALocate) and ‘Sister Databases’.

Figure 2. The hierarchical organization and statistics of RNA subcellular localization.

Figure 3. The statistics of RNA category and organism. (A) The percentage of the 9 RNA categories in the RNALocate database. (B) The number of entries for the 65 organisms in the RNALocate database; only organisms with ≥100 entries are listed.

DATA QUERYING, SEARCHING AND BROWSING

RNALocate provides an interface for convenient retrieval of all RNA subcellular localizations. Users can query each entry through ‘Keyword Search’ on the ‘Search’ page. In ‘Keyword Search’, 5 paths and relevant examples are provided: ‘RNA Symbol’, ‘RNA Category’, ‘Subcellular localization’, ‘Organism’ and ‘Other ID (miRBase ID/Entrez ID)’.

RNALocate provides brief details of search results as a table on the ‘Search Result’ page, while more detailed descriptions [...] associated information of the RNA, such as organism, subcellular localization, alias, sequence, homology and validated method. More than 9200 RNAs with orthology/paralogy from the miRBase and HomoloGene databases have been provided in RNALocate for investigating the conservation of RNA subcellular localizations. To further understand the interactions between different RNAs in various types of subcellular localizations online, a ‘Network’ option has also been provided to visualize the RNA interaction network with subcellular localization and organism (14,15).

On the ‘Browse’ page, users can access RNALocate in three