A Web-Based Database for Functional Annotation of Triticum Aestivum
Total Page:16
File Type:pdf, Size:1020Kb
dbWFA: a web-based database for functional annotation of Triticum aestivum transcripts Jonathan Vincent, Zhanwu Dai, Catherine Ravel, Frédéric Choulet, Saïd Mouzeyar, Mohamed-Fouad Bouzidi, Marie Agier, Pierre Martre To cite this version: Jonathan Vincent, Zhanwu Dai, Catherine Ravel, Frédéric Choulet, Saïd Mouzeyar, et al.. db- WFA: a web-based database for functional annotation of Triticum aestivum transcripts. Database - The journal of Biological Databases and Curation, Oxford University Press, 2013, 2013, 12 p. 10.1093/database/bat014. hal-00964169 HAL Id: hal-00964169 https://hal.archives-ouvertes.fr/hal-00964169 Submitted on 29 May 2020 HAL is a multi-disciplinary open access L’archive ouverte pluridisciplinaire HAL, est archive for the deposit and dissemination of sci- destinée au dépôt et à la diffusion de documents entific research documents, whether they are pub- scientifiques de niveau recherche, publiés ou non, lished or not. The documents may come from émanant des établissements d’enseignement et de teaching and research institutions in France or recherche français ou étrangers, des laboratoires abroad, or from public or private research centers. publics ou privés. Distributed under a Creative Commons Attribution| 4.0 International License Database, Vol. 2013, Article ID bat014, doi:10.1093/database/bat014 ............................................................................................................................................................................................................................................................................................. Database tool dbWFA: a web-based database for functional annotation of Triticum aestivum transcripts Jonathan Vincent1,2,3, Zhanwu Dai1,2, Catherine Ravel1,2, Fre´ de´ ric Choulet1,2, Said Mouzeyar1,2, M. Fouad Bouzidi1,2, Marie Agier3 and Pierre Martre1,2,* 1INRA, UMR1095 Genetics, Diversity and Ecophysiology of Cereals, 5 Chemin de Beaulieu, Clermont-Ferrand, F-63 039 Cedex 2, France, 2Blaise Pascal University, UMR1095 Genetics, Diversity and Ecophysiology of Cereals, Aubie` re F-63 177, France and 3Blaise Pascal University, UMR6158 CNRS Downloaded from LIMOS Laboratoire d’Informatique, de Mode´ lisation et d’Optimisation des Syste` mes, Aubie` re F-63 173, France Present address: Zhanwu Dai, INRA, ISVV, UMR1287 E´ cophysiologie et Ge´ nomique Fonctionnelle de la Vigne (EGFV), F-33 882 Villenave d’Ornon, France *Corresponding author: Tel: +33 473 624 351; Fax: +33 473 624 457; Email: [email protected] Submitted 11 July 2012; Revised 19 February 2013; Accepted 20 February 2013 http://database.oxfordjournals.org/ Citation details: Vincent,J., Dai,Z.W., Ravel,C. et al. dbWFA: a web-based database for functional annotation of Triticum aestivum transcripts. Database (2013) Vol. 2013: article ID bat014; doi:10.1093/database/bat014 ............................................................................................................................................................................................................................................................................................. The functional annotation of genes based on sequence homology with genes from model species genomes is time-consuming because it is necessary to mine several unrelated databases. The aim of the present work was to develop a functional annotation database for common wheat Triticum aestivum (L.). The database, named dbWFA, is based on the reference NCBI UniGene set, an expressed gene catalogue built by expressed sequence tag clustering, and on full-length coding sequences retrieved from the TriFLDB database. Information from good-quality heterogeneous sources, including at INRA Avignon on May 13, 2013 annotations for model plant species Arabidopsis thaliana (L.) Heynh. and Oryza sativa L., was gathered and linked to T. aestivum sequences through BLAST-based homology searches. Even though the complexity of the transcriptome cannot yet be fully appreciated, we developed a tool to easily and promptly obtain information from multiple functional annotation systems (Gene Ontology, MapMan bin codes, MIPS Functional Categories, PlantCyc pathway reactions and TAIR gene families). The use of dbWFA is illustrated here with several query examples. We were able to assign a putative function to 45% of the UniGenes and 81% of the full-length coding sequences from TriFLDB. Moreover, comparison of the annotation of the whole T. aestivum UniGene set along with curated annotations of the two model species assessed the accuracy of the annotation provided by dbWFA. To further illustrate the use of dbWFA, genes specifically expressed during the early cell division or late storage polymer accumulation phases of T. aestivum grain development were identified using a clustering analysis and then annotated using dbWFA. The annotation of these two sets of genes was consistent with previous analyses of T. aestivum grain transcriptomes and proteomes. Database URL: urgi.versailles.inra.fr/dbWFA/ ............................................................................................................................................................................................................................................................................................. Introduction produced important genomic resources (1–4), the complete sequencing and annotation of the hexaploid (2n = 6Â = 42, Triticum aestivum (L.), common wheat or bread wheat, is AABBDD) T. aestivum genome has yet to be achieved. one of the most important staple crops in the world. It is A first version of the genome of the bread wheat cv. cultivated worldwide and provides >20% of the calories Chinese Spring has recently been published (4), providing and proteins in the human diet (http://faostat.fao.org). the scientific community with highly valuable genomic Although ongoing sequencing efforts have already and evolutionary information, which will facilitate ............................................................................................................................................................................................................................................................................................. ß The Author(s) 2013. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http:// creativecommons.org/licenses/by-nc/3.0/), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited. Page 1 of 12 (page number not for citation purposes) Database tool Database, Vol. 2013, Article ID bat014, doi:10.1093/database/bat014 ............................................................................................................................................................................................................................................................................................. genome-wide analysis of bread wheat. However, because make it the most comprehensive coding DNA (cDNA) of the low-coverage (5-fold) shotgun sequencing method assembly available to date, and UniGene assemblies have used in this project, this resource does not represent a high- been used as a reference set of sequences for many species. quality draft of the wheat genome in terms of sequence An additional effort was made to construct full-length completion, quality and annotation. Genome-wide analysis cDNA sequences that were included in the TriFLDB data- of gene expression by means of expression microarrays or base (19). Full-length cDNA sequences are most commonly transcriptome sequencing (RNA-Seq) is now being adopted used in genome annotation as a resource for cross-species in T. aestivum (5, 6), but analysing such large datasets comparative analyses. Currently, TriFLDB is the most reli- requires extensive annotation efforts. Data fragmentation able source of full-length cDNA sequences in T. aestivum. and technical and semantic heterogeneity can severely limit TriFLDB includes annotations based on homologies found the efficient extraction and interpretation of biological by searching protein databases, extensive Gene Ontology data (7, 8). (GO) annotations and InterProScan results. Recently, a new More and more genomic information is becoming avail- collection of nearly 1 million ESTs, assembled into contigs able for T. aestivum research. Various resources and asso- and singlets, was annotated with GO terms (20), but mean- ciated tools grant the user a structural overview of ingful prediction of gene function requires more than one expressed sequence tag (EST) (ITEC, http://avena.pw.usda. system of annotation. Downloaded from gov/genome/) (9, 10) or bacterial artificial chromosome After the sequencing of the first plant genome, clone libraries (11, 12) for instance (13–15). Important ini- Arabidopsis thaliana in 2000 (21), several plant sequencing tiatives are underway to facilitate the breeding of im- projects have been successful. The sequenced genomes proved Triticeae varieties. The TriticeaeGenome project most closely related to the T. aestivum genome are those (www.triticeaegenome.eu) grants access to comprehensive of Oryza sativa ssp. indica (22), Oryza sativa ssp. japonica http://database.oxfordjournals.org/ information extracted from experimental data to provide a (23), Zea mays (24), Glycine max (25), Sorghum bicolor (26), better understanding of