Survey-Based Naming Conventions for Use in OBO Foundry Ontology Development
BMC Bioinformatics
BioMed Central
Correspondence
Open Access

Survey-based naming conventions for use in OBO Foundry ontology development

Daniel Schober1,2, Barry Smith3, Suzanna E Lewis4, Waclaw Kusnierczyk5, Jane Lomax1, Chris Mungall4, Chris F Taylor1,6, Philippe Rocca-Serra1 and Susanna-Assunta Sansone*1

Address: 1EMBL-EBI, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK, 2Institute of Medical Biometry and Medical Informatics (IMBI), University Medical Center, 79104 Freiburg, Germany, 3Center of Excellence in Bioinformatics and Life Sciences, and Department of Philosophy, University at Buffalo, NY, USA, 4Berkeley Bioinformatics and Ontologies Project, Lawrence Berkeley National Labs, Berkeley, CA 94720 USA, 5Department of Information and Computer Science, Norwegian University of Science and Technology (NTNU), Trondheim, Norway and 6NERC Environmental Bioinformatics Centre (NEBC), Mansfield Road, Oxford, OX1 3SR, UK

* Corresponding author

Published: 27 April 2009
Received: 30 April 2008
Accepted: 27 April 2009

BMC Bioinformatics 2009, 10:125 doi:10.1186/1471-2105-10-125
This article is available from: http://www.biomedcentral.com/1471-2105/10/125

© 2009 Schober et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Abstract

Background: A wide variety of ontologies relevant to the biological and medical domains are available through the OBO Foundry portal, and their number is growing rapidly. Integration of these ontologies, while requiring considerable effort, is extremely desirable. However, heterogeneities in format and style pose serious obstacles to such integration. In particular, inconsistencies in naming conventions can impair the readability and navigability of ontology class hierarchies, and hinder their alignment and integration. While other sources of diversity are tremendously complex and challenging, agreeing a set of common naming conventions is an achievable goal, particularly if those conventions are based on lessons drawn from pooled practical experience and surveys of community opinion.

Results: We summarize a review of existing naming conventions and highlight certain disadvantages with respect to general applicability in the biological domain. We also present the results of a survey carried out to establish which naming conventions are currently employed by OBO Foundry ontologies and to determine what their special requirements regarding the naming of entities might be. Lastly, we propose an initial set of typographic, syntactic and semantic conventions for labelling classes in OBO Foundry ontologies.

Conclusion: Adherence to common naming conventions is more than just a matter of aesthetics. Such conventions provide guidance to ontology creators, help developers avoid flaws and inaccuracies when editing, and especially when interlinking, ontologies. Common naming conventions will also assist consumers of ontologies to more readily understand what meanings were intended by the authors of ontologies used in annotating bodies of data.
Background

A wide variety of ontologies, controlled vocabularies, and other terminological artifacts relevant to the biological or medical domains are available through open access portals such as the Ontology Lookup Service (OLS) [1], and the number of such artifacts is growing rapidly. One of the goals of the Open Biomedical Ontologies (OBO) Foundry initiative [2] is to facilitate integration among these diverse ontologies. However, such integration demands considerable effort and differences in format and style can only add obstacles to the execution of this task [3]. The heterogeneity within the set of existing ontologies derives from the use of diverse ontology engineering methodologies and is manifest in the adoption by different communities of Description Logic, Common Logic, or other formalisms. The spectrum of syntaxes used to express these formalisms, such as the Web Ontology Language (OWL) or the OBO format, and the commitment of individual communities to conceptualist or realism-based philosophical approaches are also contributing factors.

Here we focus on issues of nomenclature [4], and specifically on the naming conventions used for labeling classes in ontologies, which are an additional contributing factor to the problem of heterogeneity. Even in this relatively straightforward area, no conventions have achieved broad acceptance (see survey section below).

The lack of naming conventions or their inconsistent usage can impair readability and navigation when viewing ontology class hierarchies. We believe that clear and explicit naming becomes of even greater importance when interlinking ontologies (for example via owl:import, obo dbxref and other referencing and mapping statements [5], or when ontology engineers need to collaborate with external groups to align their ontologies and to ensure effective maintenance of modularity).

While other sources of diversity are tremendously complex and challenging, it is our belief that establishing a set of naming conventions for the OBO Foundry is a tractable goal, particularly if those conventions are based on lessons drawn from pooled practical experience and targeted surveying.

There is of course no shortage of initiatives for the development of specifications and standards tackling naming [6-9]. However, where naming conventions have been developed, widespread application has been hampered by several factors, most notably domain specificity, document inaccessibility and format dependency. A comprehensive survey of existing naming convention documents can be found at the dedicated OBO Foundry naming conventions website [10].

Domain specificity

One significant obstacle to common adoption is that many of the proposed conventions are domain-specific and not generally extendible to other fields; for example, the Human Genome Organization (HUGO) nomenclature [11] is restricted to gene names. Other conventions refer only to entities occurring within programming languages [12] or to the naming of natural language documents [13].

Document inaccessibility

A second obstacle relates to poor documentation. A naming convention whose documentation is unclear, or is dispersed in multiple documents or document sections, artificially constrains its own chances of acceptance. This is the case with the BioPAX manual [14], which is in addition overly tool-centric in that it addresses only Protégé-OWL issues. Another deficiency is the commercial or semi-proprietary nature of conventions such as the International Organization for Standardization (ISO) standards [15]. Many of these proposed conventions also impair access through information overload, there being around forty ISO documents addressing naming issues alone. Other naming conventions are described only implicitly and via unintuitive search attributes, or are not available on-line, making access difficult.

Format and implementation dependency

Sometimes only certain naming issues are tackled by a naming convention – usually those most germane to a particular format. The Gene Ontology (GO) Editorial Style Guide [16] for example, is of limited coverage and applicability, as it is embedded in an OBO-format specific document. The ANSI/ISO Z39.19-2005 Standard [8] is applicable only to terms organized in an is-a hierarchy without relations and therefore lacks proper conventions for representing ontological classes and properties in semantically complex ontologies.

In the case of the Ontology Engineering and Patterns Task Force of the Semantic Web Best Practices and Deployment working group [17], the guidelines are restricted to the OWL format and are dispersed throughout many documents and document sections.

To overcome this diversity and fragmentation members of the OBO Foundry and of the Metabolomics Standards Initiative (MSI) ontology working group [18] have set up an infrastructure group that is attempting to:

• collect, review and compare existing naming conventions

• distill universally valid conventions that can be implemented in both the OWL and OBO formats, and conceivably also in other formats

• engage in discussion with other groups concerned with nomenclature standardization in order to establish a forum for coordinated advance

• create a single common guideline document to serve

The full questionnaire, the complete set of answers and the consolidated results are available from the OBO Foundry wiki [10]. For more information on the survey results and list of participants see the Additional file 1: SurveyResults.zip.

Naming Conventions

Our proposed set of naming conventions, founded on the survey results, is summarized in Table 1.
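
To make concrete why inconsistent label styles hinder alignment, as argued in the Background, the following Python sketch groups class labels that differ only in typographic style. It is an illustration under stated assumptions and not one of the conventions in Table 1: the function names `normalize_label` and `group_by_normal_form` and all example labels are hypothetical, and the normalization rule (split CamelCase, treat underscores as spaces, lowercase) is just one simple choice among many.

```python
import re


def normalize_label(label: str) -> str:
    """Reduce a class label to lowercase, space-separated words.

    Hypothetical helper for illustration only; not a rule from the paper.
    """
    # Insert a space at lower-to-upper boundaries: "CellNucleus" -> "Cell Nucleus"
    spaced = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", " ", label)
    # Treat underscores as word separators
    spaced = spaced.replace("_", " ")
    # Collapse repeated whitespace and lowercase the result
    return " ".join(spaced.split()).lower()


def group_by_normal_form(labels):
    """Map each normalized form to the original labels that share it."""
    groups = {}
    for label in labels:
        groups.setdefault(normalize_label(label), []).append(label)
    return groups


# Hypothetical labels, as might come from ontologies using different styles
labels = ["CellNucleus", "cell_nucleus", "cell nucleus", "Cell Nucleus",
          "nuclear membrane"]
groups = group_by_normal_form(labels)

for normal_form, variants in groups.items():
    print(f"{normal_form!r}: {variants}")
```

A plain string comparison would treat the four "cell nucleus" variants as four distinct classes, whereas a shared convention makes such a normalization step unnecessary. The sketch does not split runs of capitals, so acronyms such as "DNA" would need separate handling.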