Databases and Bioinformatics Tools for Rice Research
Current Plant Biology 7–8 (2016) 39–52

Priyanka Garg, Pankaj Jaiswal∗
Department of Botany and Plant Pathology, Oregon State University, 2082 Cordley Hall, Corvallis, OR 97331, USA

∗ Corresponding author. E-mail address: [email protected] (P. Jaiswal).
This article is part of a special issue entitled “Genomic resources and databases”, published in the journal Current Plant Biology 7–8, 2016.
http://dx.doi.org/10.1016/j.cpb.2016.12.006
2214-6628/© 2016 The Authors. Published by Elsevier B.V. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).

Keywords: Biological database; Rice; Gene expression; Biocuration; Ontology; Pathways

Abstract

Rice is one of the most important agricultural crops in the world and a widely studied model plant. The completion of the whole genome sequence of rice (Oryza sativa) and high-throughput experimental platforms have led to the generation of a tremendous amount of data, and to the development of specialized databases and bioinformatics tools for data processing, efficient organization, analysis, and visualization. In this article, we discuss a collection of biological databases that host genomics data on sequence, gene expression, genetic variation, gene-interactomes, and pathways, and facilitate data analysis and visualization.

1. Introduction

Over the last decades, an increasing number of genome-scale experimental data sets have become available, and several online, open-source biology databases have emerged. For instance, ∼1685 publicly available online databases are currently listed in the NAR online Molecular Biology Database Collection [1]. These databases can be categorized on the basis of data type, data curation methods, the scope of data coverage and accessibility of the database. Many such publicly funded resources host data (raw, annotated, analyzed) for various species including crops, model and non-model plants, whereas others are dedicated to a group of species from a taxonomic clade and may contain a certain type of data. In addition, an array of tools and web applications is available that facilitates formatting, analysis and visualization of various types of genomic data.

Data coverage decides the target user community for a database. Large-scale public repositories or international archives, usually developed and maintained by national and international projects, provide genomic data from several species. Table 1 lists some of the generic large-scale public repositories, archives and databases, for example, GenBank [2], EMBL [3], INSDC [4], and DDBJ [5] for sequences and annotation, PDB [6] for protein structures and UniProt [7] for protein information. These are long-term sustainable repositories for archiving valuable data from several organisms including rice. In contrast, community-specific databases cater to the needs of a specific research community. Examples among plant databases include PlantGDB [8], ENSEMBL Plants [9], Gramene [10], PLEXdb [11], Gene Expression Atlas [12], Planteome [13], etc. The species-specific databases for model and non-model organisms, for example, RAPdb [14], Beijing Genomics Institute-Rice Information System [15] and Rice SNP-Seek Database [16,17], provide in-depth coverage of the data sets and are more specifically tuned to the needs of a specialized small community of researchers (Table 2). Furthermore, each database can be assigned to one or more categories on the basis of its content, for example, gene expression databases, molecular interaction databases, genome annotation, nucleotide or protein databases, smallRNA databases, genomic variation, phenome and pathway databases.

Rice is an important crop and serves as a model for the monocotyledons. Recent advances in rice genome biology have generated a tremendous amount of data, including fully sequenced high-quality reference genomes [9,10], low-coverage sequencing data from 3010 rice accessions of the rice germplasm core collection with an average sequencing depth of 14× [16–19], genetic variation, transcriptomes, proteomes, metabolomes, etc. This necessitates the development of bioinformatics resources and databases for storage, processing, organization, analysis, and visualization of such data at the systems level.

For the benefit of rice researchers, we provide a comprehensive list of such generic and specialized genomic databases, resources, web applications and analysis tools (Tables 1 and 2). Some of the information provided here is also useful to the community of plant researchers who may not be engaged in rice research. It is possible that we have missed some resources, and we expect this list to grow in the future.

2. Resources for genes, genomes, genetic variations

The rice genome assembly, annotation, and associated information are mainly provided by the MSU Rice Genome Annotation Project [20,21], the International Rice Genome Sequencing Project's (IRGSP) rice RAPdb [14,22] and the Oryza Genome Evolution (OGE) project (http://oge.gramene.org). Gramene uses annotated rice genes and genome assemblies of O. sativa ssp. japonica cv Nipponbare from IRGSP [14,22], O. sativa ssp. indica cv 93-11 and several wild Oryza species sequenced by the OGE and the International Oryza Map Alignment Project (I-OMAP) (http://oge.gramene.org) [23–26]. These genomes are presented in an integrated web resource for rice that includes a rice species-specific genome browser, whole genome alignment, synteny, genetic and physical maps with genes, gene trees, ESTs and QTL locations, genetic diversity data including SNPs (from the 3000 rice genome sequence project) and their consequences on gene function and structure, descriptions of phenotypic traits, and plant pathway databases developed using the BioCyc platform, such as RiceCyc [27], and the Reactome platform-based Plant Reactome, which provides pathways from the reference rice and gene homology-based projections to all Oryza species with a sequenced genome or transcriptome [28]. The features and data available at the genome portal of Gramene and its collaborator Ensembl Plants [9] are similar.

In addition to Gramene, a number of databases provide genetic variation data (SNPs and indels), including PmiRKB [29], Rice Variation Database [30], RiceVarMap [31], SNP haplotype database [32], and Rice SNP-Seek Database [16,17]. The largest data set of rice genetic variants (∼29 million SNPs) comes from the 3000 rice genome sequencing project and is now hosted at IRIC in the Rice SNP-Seek Database, together with phenotype and variety information/passport data [16,17].

Table 1. Generic genomic databases and bioinformatics tools.
The description may be copied from the original source [2,5–7,9–11,29,30,33,34,38,39,49,50–52,58,63–91]. Y marks a data type available in the database. The data type columns are: Genome annotation; Gene/gene products including proteins and all RNA types; Expression (transcript/metabolite/protein); Genetic/genomic variation including SNP, indels, mutants, wild species, etc.; Pathway/network/interaction; Others.

| Database name and latest release date | Description | Species | URL | Data types (Y marks) |
|---|---|---|---|---|
| Gramene, Nov 2016 | An open source data resource for comparative functional genomics in cereals and other plant species [10]. | O. sativa and other plant species | http://www.gramene.org | Y Y Y Y Y Y |
| EXPath, Oct 21, 2012 | Provides information on metabolic pathways inferred from microarray-based transcriptomic data, gene annotation and orthologous genes [38]. | O. sativa and 2 plant species | http://expath.itps.ncku.edu.tw | Y Y Y |
| PLANEX (PLAnt co-EXpression) database | Contains publicly available GeneChip data obtained from the Gene Expression Omnibus [63]. | O. sativa and 7 plant species | http://planex.plantbioinformatics.org | Y Y Y |
| ATTED-II, Sep 1, 2015 | Provides information on co-expression gene networks supported by microarray and RNA sequencing-based transcriptomic data [64]. | Oryza and 8 plant species | http://atted.jp | Y Y Y |
| PODC (Plant Omics Data Center), Mar 16, 2016 | A repository of annotated gene expression data and omics data analysis tools [65]. | O. sativa and 7 plant species | http://bioinf.mind.meiji.ac.jp/podc | Y Y Y Y |
| PMRD (Plant MicroRNA Database), Nov 17, 2014 | A plant miRNA data repository containing associated information on sequence, secondary structure, target genes, expression profiles of miRNAs and their mapping to the species-specific genome browser [66]. | O. sativa and 120 plant species | http://bioinformatics.cau.edu.cn/PMRD | Y Y Y Y |
| NIAS GBdb (National Institute of Agrobiological Sciences planttfdb database) | A database containing information on simple sequence repeat (SSR) polymorphisms in plant genomes [67]. | O. sativa and other plant species | http://www.gene.affrc.go.jp/databases_en.php | Y Y Y |
| OryGenesDB | A database for rice reverse genetics, built with flanking sequence tags of various mutagens and functional genomics data [68]. | O. sativa ssp. indica and japonica, and 2 other plant species | http://orygenesdb.cirad.fr/index.html | Y Y Y |
| FamNet | Allows the user to retrieve data related to conserved structural-functional domains within proteins from one or more plant species [69]. | O. sativa and 7 plant species | http://www.gene2function.de/famnet.html | Y Y Y |

A comparative hub for annotated plant genome and gene family data.