Ontology (Science)

Ontology (Science)

Draft version to be presented at FOIS 2008. Comments welcome. Ontology (Science) Barry SMITH Department of Philosophy, Center of Excellence in Bioinformatics and Life Sciences, and National Center for Biomedical Ontology, University at Buffalo, USA Abstract. Increasingly, in data-intensive areas of the life sciences, experimental results are being described in algorithmically useful ways with the help of ontologies. Such ontologies are authored and maintained by scientists to support the retrieval, integration and analysis of their data. The proposition to be defended here is that ontologies of this type – the Gene Ontology (GO) being the most conspicuous example – are a part of science. Initial evidence for the truth of this proposition (which some will find self-evident) is the increasing recognition of the importance of empirically-based methods of evaluation to the ontology development work being undertaken in support of scientific research. The ontologies created by scientists must, of course, be associated with implementations satisfying the requirements of software engineering. But these ontologies are not themselves engineering artifacts, and to conceive them as such brings grievous consequences. Rather, we shall argue, ontologies such as the GO are comparable to scientific theories, to scientific databases, or to scientific journal publications. Such a view implies a radically new conception of what is involved in the authoring, maintenance and application of ontologies in scientific contexts, and therewith also a radically new approach to the evaluation of ontologies and to the training of ontologists. Keywords: scientific method, expert peer review, ontology engineering, biomedical informatics, Gene Ontology, OBO Foundry Nature Precedings : hdl:10101/npre.2008.2027.2 Posted 15 Jul 2008 1 Introduction For some time now the Gene Ontology (GO) [1] has enjoyed the status of a de facto standard vocabulary for the annotation of experimental data pertaining to the attributes of gene products. The GO has been widely applied to data drawn from experiments involving organisms and biological processes of many different types. It has also been subject to a series of logical reforms, which have enhanced the degree to which it can it be exploited for algorithmic purposes. The GO is now routinely used in gene expression analyses of a wide range of biological phenomena, including phenomena relevant to our understanding of human health and disease. The thesis to be defended here is that the GO and its sister ontologies are a part of science. This means (i) that these ontologies themselves are properly to be understood as results of scientific activity, analogous to textbooks, databases, or journal publications, and (ii) that the processes involved in authoring, maintaining and evaluating them are a part and parcel of the activity of science. In what follows I shall draw out some implications of this thesis, focusing my attentions on the GO and on the other biomedical ontologies participating in the OBO Foundry initiative [2,3]. These provide the most conspicuous examples of ontology (science) in the sense here intended. The views expressed will appear to many to be self-evident; in their detail, however, they are still exploratory in nature (and thus they do not represent any settled policy of the Foundry initiative). 2 The OBO Foundry The Open Biomedical Ontologies (OBO) repository was created in 2001 by Michael Ashburner and Suzanna Lewis as a means of providing convenient access to the GO and its sister ontologies at a time when resources such as the NCBO BioPortal [4,5] did not yet exist. The OBO Foundry was initiated by Ashburner, Lewis and Smith in 2005 as a collaborative experiment designed to enhance the quality and interoperability of life science ontologies from the point of view of both biological content and logical structure [3]. The Foundry initiative is based on the voluntary acceptance by its participants of an evolving set of principles designed to maximize the degree to which ontologies can support the needs of working scientists. The developers of nearly all of the ontologies within the OBO repository have committed themselves to participate in this initiative, which has spawned also the establishment of a number of new ontology projects within the Foundry framework, including most notably the Ontology for Biomedical Investigations, which brings together some two dozen disciplinary communities from different domains of high-throughput biological and biomedical Nature Precedings : hdl:10101/npre.2008.2027.2 Posted 15 Jul 2008 experimentation [6]. 2.1.OBO Foundry Principles The principles of the OBO Foundry can be summarized in their current version as follows. First, are syntactic principles to the effect that an ontology submitted to the Foundry must employ one or another common shared syntax, possess a unique identifier space, and have procedures for identifying distinct successive versions. Second, are principles involving definitions: the Foundry requires that textual definitions (and, by degrees, equivalent formal definitions) be provided for all terms; that terms and definitions be composed using the methodology of cross-products (see below); and that ontologies use relations that are unambiguously defined according to the pattern set forth in the OBO Relation Ontology (RO) [7]. Third, ontologies are required to be open (available to be used by all without any constraint), to have a clearly specified and clearly delineated content, to have a plurality of independent users, and to be subject to a collaborative development process involving the developers of other Foundry ontologies covering neighboring domains. Finally, the Foundry embraces a principle of orthogonality. This asserts that for each domain there should be convergence upon a single ontology that is recommended for use by those who wish to become involved with the Foundry initiative. If an ontology is submitted which overlaps substantially with an existing Foundry ontology, then the two sets of developers are invited to collaborate in the creation of a common, improved resource, through application of the sorts of strategies applied as a matter of course in other parts of science wherever alternative theories of a single phenomenon are advanced by competing groups. 2.2.The Problem of Data Silos The primary rationale for our insistence upon the principle of orthogonality is that we believe that it offers a potential solution to a pressing problem facing researchers in information-driven areas of biology and biomedicine, namely the problem of data silos. Currently, the many groups involved in biomedical research find that they have little choice but to create their own local schemes for description of their data. Ideally, their efforts would be invested in the shared development of common schemes that would make their data interoperable with those of their colleagues, and there is now widespread recognition that it would be advantageous to constrain terminologies and data schemes so that they converge on commonly accepted standards [8]. Unfortunately, however, there is normally still no clear answer to the question as what, in any given case, should serve as basis for such constraint. Nature Precedings : hdl:10101/npre.2008.2027.2 Posted 15 Jul 2008 The OBO Foundry proposes a solution to this problem that is incremental, modular, empirically based, and such as to embody a coherent strategy for resolving the problem of motivating potential developers and users. Briefly, it is a solution which requires that ontologies be built as orthogonal, interoperable modules within an incrementally evolving network. Each constituent module within this network is authored and maintained by scientists who need to draw on its resources for their work in annotating (describing in algorithmically useful ways) both their own experimental results and also related results captured in scientific databases and in the journal literature. It is then the need for suitable resources to annotate such results which progressively dictates the ontologies’ evolving content. Domain experts are hereby rewarded for their participation through the fact that they play a direct role in shaping the resources that they will need in their own work in the future. The whole endeavor is entirely voluntary. Yet experience so far suggests that, in at least a significant portion of life science domains, adoption of the Foundry strategy will be considerable. 2.3.Benefits of Orthogonality In addition to providing a strategy for resolving the silo problem, some further benefits of the principle of orthogonality include: First, that it provides assistance to those new users of ontologies who need to know where to look in finding an ontology relating to their subject-matter for which they can have reasonable assurance that it has been validated and will be used and maintained in consistent fashion by fellow subject-matter experts; that it will work well with other established ontologies; and that the expertise acquired in adapting it to specific local and immediate needs will potentially be of more general and lasting utility. Second, the requirement of orthogonality obviates the need for ‘mappings’ between ontologies, which have proved not only difficult to create and use, but also error-prone and hard to keep up-to-date when mapped ontologies change. Third, orthogonality ensures the mutual consistency of ontologies, and thereby also the additivity of the annotations created with their aid by

View Full Text

Details

  • File Type
    pdf
  • Upload Time
    -
  • Content Languages
    English
  • Upload User
    Anonymous/Not logged-in
  • File Pages
    17 Page
  • File Size
    -

Download

Channel Download Status
Express Download Enable

Copyright

We respect the copyrights and intellectual property rights of all users. All uploaded documents are either original works of the uploader or authorized works of the rightful owners.

  • Not to be reproduced or distributed without explicit permission.
  • Not used for commercial purposes outside of approved use cases.
  • Not used to infringe on the rights of the original creators.
  • If you believe any content infringes your copyright, please contact us immediately.

Support

For help with questions, suggestions, or problems, please contact us