BIOINFORMATICS ORIGINAL PAPER doi:10.1093/bioinformatics/btt266
Vol. 29 no. 13 2013, pages 1671–1678
Databases and ontologies
Advance Access publication May 8, 2013

FYPO: the fission yeast phenotype ontology

Midori A. Harris1,*, Antonia Lock2, Jürg Bähler2, Stephen G. Oliver1 and Valerie Wood1,*

1Cambridge Systems Biology Centre and Department of Biochemistry, University of Cambridge, Sanger Building, 80 Tennis Court Road, Cambridge CB2 1GA, UK and 2Department of Genetics, Evolution & Environment and UCL Genetics Institute, University College London, Darwin Building, Gower Street, London WC1E 6BT, UK

Associate Editor: Janet Kelso

ABSTRACT

Motivation: To provide consistent computable descriptions of phenotype data, PomBase is developing a formal ontology of phenotypes observed in fission yeast.

Results: The fission yeast phenotype ontology (FYPO) is a modular ontology that uses several existing ontologies from the open biological and biomedical ontologies (OBO) collection as building blocks, including the phenotypic quality ontology PATO, the Gene Ontology and Chemical Entities of Biological Interest. Modular ontology development facilitates partially automated effective organization of detailed phenotype descriptions with complex relationships to each other and to underlying biological phenomena. As a result, FYPO supports sophisticated querying, computational analysis and comparison between different experiments and even between species.

Availability: FYPO releases are available from the Subversion repository at the PomBase SourceForge project page (https://sourceforge.net/p/pombase/code/HEAD/tree/phenotype_ontology/). The current version of FYPO is also available on the OBO Foundry Web site (http://obofoundry.org/).

Contact: [email protected] or [email protected]

Received on December 12, 2012; revised on April 26, 2013; accepted on May 3, 2013

*To whom correspondence should be addressed.

© The Author 2013. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.

1 INTRODUCTION

The fission yeast Schizosaccharomyces pombe is a eukaryotic model organism that has been used since the 1950s to study diverse biological processes including the cell division cycle, genome organization and maintenance, cell morphology and cytokinesis, signaling and stress responses, chromatin, gene regulation and meiotic differentiation (Egel, 2004). A large and active research community uses a wide variety of molecular genetic, cell biological and biochemical techniques to study S.pombe. With the completion of its genome sequence in 2002 (Wood et al., 2002), fission yeast has also become amenable to genome-scale experimentation, and has emerged as a reliable model for studying processes involved in human disease and cell biology.

PomBase (http://www.pombase.org) has recently been established as a comprehensive model organism database that provides centralized access to information relevant to S.pombe (Wood et al., 2012). PomBase encompasses a core of manual literature curation that provides detailed accurate curation of phenotypes, Gene Ontology (GO) annotations, genetic and physical interactions, protein modifications and many other types of data describing genes and their products. Manually curated data are supplemented by automatic gene annotation and large-scale datasets, and information about additional sequence feature types.

We define a phenotype as an observable characteristic, or set of characteristics, of an organism that results from the interaction of its genotype with a given environment. Extensive genetics research has been carried out using S.pombe over several decades, and a comprehensive set of high-quality curated phenotype data is in high demand in the S.pombe research community. A survey of S.pombe researchers conducted in 2007 identified phenotype annotation as the most requested feature not then available in a fission yeast database.

In response to community demand, we have developed the Fission Yeast Phenotype Ontology (FYPO), a formal ontology of phenotypes observed in fission yeast that will allow PomBase to provide consistent computable descriptions of phenotype data. Using FYPO, we have begun to curate accurate and detailed annotations of mutant allele phenotypes, with the aim of providing comprehensive coverage of phenotypes reported in the fission yeast literature. FYPO annotations are available on PomBase gene pages, and we envisage that the availability of genome-scale phenotype datasets will make new types of data analysis possible. Facilitated by the formal structure of FYPO, phenotype annotations can be shared and integrated with additional data, including other types of data obtained in fission yeast as well as phenotype data from other species.

2 APPROACH

The application of ontologies to biological curation has become widespread, and is best illustrated by the GO project (http://www.geneontology.org) (The Gene Ontology Consortium 2000, 2012), a collaborative effort to construct and use controlled vocabularies to support functional annotation of genes and their products in a wide variety of organisms. Ontologies facilitate consistent unambiguous descriptions of biological concepts, and can accommodate content at different levels of taxon specificity. Ontologies allow annotations at different levels of granularity, depending on what is known or what can be inferred, and provide mechanisms for quality control, consistency checking and error correction using collected data (both within and between ontologies). We sought to make these advantages available to curators and database users of PomBase phenotype annotations.

GeneDB S.pombe (http://old.genedb.org/genedb/pombe/), the predecessor database to PomBase, offered an extensive set of GO annotations, but did not use ontologies to capture other data types. GeneDB provided minimal phenotype annotation using a small, manually constructed, controlled vocabulary. The phenotype vocabulary was a flat list of 200 text descriptions, with no connections between the different descriptions. Furthermore, because the GeneDB phenotype vocabulary was designed and used exclusively for S.pombe annotations, it did not support any data sharing or integration between species or databases.

The launch of PomBase presented an opportunity to create an improved system for phenotype description, starting with a ‘blank slate’ and unconstrained by the limitations of the GeneDB vocabulary and annotation system.

2.1 Ontology design considerations

2.1.1 The entity–quality model

We have constructed FYPO as a modular ontology that uses several existing ontologies from the open biological and biomedical ontologies (OBO) collection (Smith et al., 2007) as building blocks to support the creation and maintenance of an extensive set of pre-coordinated phenotype descriptors. Terms from OBO ontologies, including the phenotypic quality ontology PATO (Gkoutos et al., 2009), the GO, the Cell Ontology (CL; Meehan et al., 2011) and Chemical Entities of Biological Interest (ChEBI) (de Matos et al., 2010), are used to construct logical definitions for FYPO terms. For a phenotype, a logical definition follows the entity–quality (EQ) model (Mabee et al., 2007): the entity is what is affected, and can be the whole cell, a population of cells, a part of a cell (corresponding to a GO cellular component) or an event such as a molecular [...]

Phenotype annotation is one of the key features of PomBase’s newly developed community curation system (Rutherford et al., manuscript in preparation), which allows researchers to contribute annotations from their publications directly to the database. For use by bench biologists, the simpler procedure of annotating to a single pre-composed term is more intuitive than the parallel annotation process required with post-composition. We also anticipate that biologists will wish to annotate to highly specific terms, making the reasoning supported by logical EQ definitions essential for ontology maintenance.

Annotations using APO terms, however, fall into the post-composed category: terms representing qualities and ‘observables’ are combined by curators as part of the annotation procedure. Thus, although phenotype descriptions using APO are conceptually compatible with the EQ model, entity–quality combinations are not incorporated into APO itself, nor do APO terms include logical definitions. Finally, curators using APO often add details drawn from other sources, including separate controlled vocabularies, meaning that much of the specific information captured in APO annotations is not incorporated into the ontology.

An additional ontology design consideration reflects distinctive features of fission yeast biology. S.pombe represents an early-diverging lineage within the Ascomycota (Taphrinomycota, formerly also known as Archiascomycetes) (James et al., 2006). To accurately and consistently describe fission yeast phenotypes, we could reasonably expect to need specific terms that would not apply to the other ascomycete fungi (Saccharomycotina and Pezizomycotina) that have been annotated using APO. A new ontology offers maximal freedom to fit fission yeast-specific terms into a more general framework, without extensively restructuring an already-deployed vocabulary.
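The contrast drawn above between annotating to a single pre-composed term and combining an entity and a quality at annotation time (post-composition) can be illustrated with a small data-structure sketch. This is only an illustration under stated assumptions, not the FYPO or PomBase implementation: the class names, the allele name and all term identifiers (written as `...:EXAMPLE`) are hypothetical placeholders rather than real accessions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Term:
    """A term borrowed from a building-block ontology (e.g. GO or PATO)."""
    ontology: str
    term_id: str  # placeholder identifier, not a real accession
    label: str


@dataclass(frozen=True)
class EQ:
    """Entity-quality pair: what is affected, and how it is affected."""
    entity: Term
    quality: Term


@dataclass(frozen=True)
class PrecomposedTerm:
    """A phenotype term whose EQ logical definition is stored in the ontology."""
    term_id: str
    label: str
    definition: EQ


@dataclass(frozen=True)
class Annotation:
    """A phenotype annotation for one allele."""
    allele: str
    eq: EQ
    phenotype_term_id: Optional[str] = None  # set only for pre-composed terms


def annotate_precomposed(allele: str, term: PrecomposedTerm) -> Annotation:
    # The curator picks one term; the EQ pair comes from the ontology.
    return Annotation(allele, term.definition, term.term_id)


def annotate_postcomposed(allele: str, entity: Term, quality: Term) -> Annotation:
    # The curator combines entity and quality during annotation.
    return Annotation(allele, EQ(entity, quality))


def same_phenotype(a: Annotation, b: Annotation) -> bool:
    # Two annotations describe the same phenotype if their EQ pairs match.
    return a.eq == b.eq


# Hypothetical example data (identifiers are placeholders).
nucleus = Term("GO", "GO:EXAMPLE", "nucleus")
small = Term("PATO", "PATO:EXAMPLE", "decreased size")
small_nucleus = PrecomposedTerm("FYPO:EXAMPLE", "small nucleus", EQ(nucleus, small))

pre = annotate_precomposed("abc1-1", small_nucleus)
post = annotate_postcomposed("abc1-1", nucleus, small)
print(same_phenotype(pre, post))  # prints True
```

In this sketch the pre-composed annotation carries a single phenotype term identifier, while the post-composed one carries only the entity and quality chosen by the curator. Because both resolve to the same EQ pair, they can still be compared, which is the kind of compatibility the text describes between EQ-based descriptions.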