Metadata Annotations for Representational Units and Representational Artifacts
Metadata Annotations for Ontology Engineering Draft v2 MSI Ontology, PSI Ontology and OBI WGs 08.11.2006
MSI Ontology WG: http://msi-ontology.sourceforge.net/ PSI Ontology WGs: http://psidev.sourceforge.net/ OBI Ontology WG: http://obi.sourceforge.net/
An example owl-implementation of these recommendations can be found at: https://svn.sourceforge.net/svnroot/msi-workgroups/ontology/RU_metadata.owl https://svn.sourceforge.net/svnroot/msi-workgroups/ontology/RA_metadata.owl
Working draft by: Daniel Schober, MSI Ontology, PSI Ontology and OBI WGs members

Contents
1 Rationale for this document
2 (Meta-) Reference Terminology
3 What are Metadata
3.1 Metadata Definitions (derived from Wiki)
3.2 Metadata categorisations (this chapter is DRAFT)
4 Why annotating with Metadata
5 Where and how to store Metadata
6 General recommendations
7 Sources of metadata
7.1 Using established metadata standards
7.1.1 RDFS / OWL Comments
7.1.2 Protégé Metadata ontology
7.1.3 SKOS (Simple Knowledge Organization System)
7.1.4 Dublin Core (recommended for re-use!)
7.1.5 ISO Standards
7.1.6 Friend of a friend (FOAF)
7.1.6.1 What's FOAF for?
7.1.7 Willpower Glossary
7.1.8 OntoClean: Metadata for ontology evaluation
7.1.9 Annotea
7.1.10 Atom
7.1.11 RSS
7.2 Self-made metadata annotation properties
7.2.1 Creating own owl:AnnotationProperties (recommended!)
7.2.2 Creating metaclasses with new properties
7.2.3 Search flags within the rdfs:comment field (intermediate solution!)
7.2.4 Administrative Information within class names (here rdfs:label)
7.2.5 Proposed format independent metadata recommendation
7.2.5.1 Metadata for representational units (RUs)
7.2.5.2 Metadata for representational artifacts (RA)
7.2.5.2.1 OMV (Ontology Metadata Vocabulary) and DEMO
7.2.5.2.2 Administrative Information within the ontology file name
7.2.5.3 Owl Implementation of this metadata recommendation
8 Using Annotation Metadata Properties
8.1 Using the Protégé QueryTab
8.2 On term obsoletions (taken from the psi recommendations by Luisa…)
9 Annotating Instantiations (Annotation of Annotations)
9.1 GO evidence codes
9.2 Evidence codes within GOA
9.3 An ‘evidence code’ classification within DAS-BioSapiens
10 Contributions
11 References
1 Rationale for this document
This document defines a set of metadata elements used for the formal annotation of CVs, ontologies and their representational units (RU). Naming conventions are not covered here, but addressed in the <
Sections in brackets […] are notes for the editor only. Please ignore these.
2 (Meta-) Reference Terminology
Knowledge representations (KR, also called representational models) are referred to with the term ‘representational artifact’ (RA). A representational artifact is made of related ‘representational units’ (RU, also known as KR-idioms), in most cases classes and properties. We recommend using the term ‘class’ to refer to the representational unit that models a ‘universal’ in an ontological representational artifact. Each class has a ‘class name’, a term (string) to designate the class. An ‘instance’ is the representation of a ‘particular’ in reality. A particular instantiates a universal, and an instance (called an individual in OWL) instantiates a class. Properties of universals are represented through representational units called ‘properties’. Properties which have fillers of simple datatypes (e.g. integer, string, boolean, ...) are called ‘attributes’ or ‘datatype properties’. Properties which have classes or instances as their fillers (also called ‘range’) are called ‘relations’ or ‘object properties’. Confusingly, other formats use the word "property" for restrictions. The word ‘domain’ can mean the group of classes that a property is asserted on (in OWL), but also describes the area of interest of a representational artifact. For a detailed recommendation have a look at the full paper: http://ontology.buffalo.edu/bfo/Terminology_for_Ontologies.pdf
The following key words “MUST,” “MUST NOT,” “REQUIRED,” “SHALL,” “SHALL NOT,” “SHOULD,” “SHOULD NOT,” “RECOMMENDED,” “MAY,” and “OPTIONAL” are to be interpreted as described in RFC-2119 document [6].
3 What are Metadata
3.1 Metadata Definitions (derived from Wiki)
The term Metadata (Greek meta "after" and Latin data "information") was introduced intuitively, i.e. without exact definition. As a result, there is today a whole variety of definitions. Metadata is often defined as data that describes other data or any class of objects whose descriptions are required for some purpose. In this sense the word 'metadata' describes a role that certain data can play with respect to other data. A term which is often used as a synonym of metadata is annotation. RDF, for example, has been introduced as a simple KR language for the assignment of semantic descriptions to information resources on the web. Therefore an RDF description of a web page represents metadata. However, an RDF description of a person, independent from any particular documents (e.g. as part of an RDF(S)-encoded dataset), is not metadata: this is data about a person, not about other data. In the latter case, RDF(S) is used as a regular KR language.
Example: 12345 is data, and with no additional context it is meaningless. With the additional "metadata" of a meaningful name, "Zip Code", the value becomes interpretable. Metadata are data themselves, and data become metadata when they are used in this way. This happens under particular circumstances, for particular purposes, and with certain perspectives, as no data are always metadata. The set of circumstances, purposes, or perspectives for which some data are used as metadata is called the context. So, metadata are data about a resource in some context. Since metadata are data, they are stored and managed as data themselves. Metadata can therefore be organised as models in some representation language, creating an RA.
Other definitions:
"Metadata is information about data"
"Metadata is information about information"
"Metadata is structured, encoded data that describe characteristics of information-bearing entities to aid in the identification, discovery, assessment, and management of the described entities." [from William R. Durrell, Data Administration: A Practical Guide to Data Administration, McGraw-Hill, 1985]
"[Metadata is a set of] optional structured descriptions that are publicly available to explicitly assist in locating objects." [from Ralph Kimball, The Data Warehouse Lifecycle Toolkit, Wiley, 1998, ISBN 0-471-25547-5]
As, according to the common definition, metadata themselves are data, it is possible to create metadata about metadata, metadata about metadata about metadata and so on. It can be essential to archive metadata about metadata, e.g. to keep track of where the metadata came from when merging two documents. Of course, one has to avoid falling into ‘analysis paralysis’ here and has to stop capturing metadata at a sensible (meta-)level.
3.2 Metadata categorisations (this chapter is DRAFT)
Metadata annotation properties can be structured, classified, sorted or grouped according to many different principles. The following categorisation refers to metadata on RAs and was introduced by the National Information Standards Organization [Understanding metadata. NISO Press, 2004]:
– Structural metadata relates to statistical measures on the graph structure underlying an RA, e.g. the number of specific representational units such as classes and individuals. Its availability influences the usability of an RA in a concrete application scenario, as size and structure parameters constrain the type and performance of the tools and methods that are applied to aid the reuse process. Structural metadata are provided by most OE tools, e.g. ‘metrics’ in Protégé.
– Descriptive metadata relates to the domain modelled in the ontology, in the form of class definitions or examples.
– Administrative metadata provides information that helps manage ontologies during ontology engineering, such as when and how an ontology was created, rights management, file format and other technical information.
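Structural metadata of the kind described above can be computed mechanically. The following Python sketch is an illustration only: it assumes a small, hypothetical RDF/XML ontology under http://example.org/demo and a deliberately simplified notion of "individual" (any top-level typed node outside the OWL namespace). It is not how Protégé computes its metrics.

```python
import xml.etree.ElementTree as ET
from collections import Counter

OWL = "http://www.w3.org/2002/07/owl#"

# Hypothetical toy ontology, used only to have something to count.
ONTOLOGY = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:owl="http://www.w3.org/2002/07/owl#"
    xmlns:ex="http://example.org/demo#"
    xml:base="http://example.org/demo">
  <owl:Class rdf:ID="Instrument"/>
  <owl:Class rdf:ID="MassSpectrometer"/>
  <owl:ObjectProperty rdf:ID="hasPart"/>
  <owl:DatatypeProperty rdf:ID="hasSerialNumber"/>
  <ex:MassSpectrometer rdf:ID="instrument_001"/>
</rdf:RDF>"""


def structural_metrics(rdf_xml: str) -> Counter:
    """Count classes, properties and individuals among the top-level nodes."""
    counts = Counter()
    for node in ET.fromstring(rdf_xml):
        if node.tag == f"{{{OWL}}}Class":
            counts["classes"] += 1
        elif node.tag in (f"{{{OWL}}}ObjectProperty", f"{{{OWL}}}DatatypeProperty"):
            counts["properties"] += 1
        elif not node.tag.startswith(f"{{{OWL}}}"):
            # Simplification: a typed node outside the OWL namespace
            # is treated as an individual.
            counts["individuals"] += 1
    return counts


metrics = structural_metrics(ONTOLOGY)
print(dict(metrics))
```

A real RA would need a proper RDF parser, since classes and individuals can also be declared through rdf:Description elements and nested nodes, which this sketch ignores.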
Another way is the following, adapted from Wikipedia (http://en.wikipedia.org/wiki/Metadata):
Content: Metadata can either describe the resource itself, e.g. the name and scope of a whole ontology, or the content of the ontology, e.g. class names and class definitions. (This corresponds to our distinction between metadata for RAs and RUs.)
Mutability: With respect to the whole resource, metadata can be either immutable, e.g. the title of an ontology usually does not change, no matter what part of the ontology is being considered, or mutable, e.g. the size and granularity of an ontology.
Logical function: There are three layers of logical function lying on top of each other: at the bottom is the subsymbolic layer that contains the raw data themselves; on the symbolic layer are metadata describing the content of the raw data; and the topmost logical layer contains metadata that allow logical reasoning using the symbolic layer. (same as first...)
I would add:
Storage position: Internal storage allows transferring metadata together with their data; thus they are always at hand and can be manipulated easily. External storage allows bundling metadata, e.g. in a database, for more efficient searching. The Protégé editor allows the selection of different import positions/methods: from a URL, from a local repository or from a relative path.
Others:
Metadata on data (instances): RAs and RUs
metadata on RUs
  administrative metadata
  security, audit and accessibility (privileges, censorship)
  authoring issues
  status
  provenance data
  curation
  edit and maintenance
  inferred extensions
  versioning
  semantics metadata
  content specific metadata, e.g. formal definitions, axioms
  lexical metadata, e.g. synonyms
metadata on RAs
  maintenance
  extension
  administrative metadata, email contact
  versioning
  date
  provenance
  collaborators
  licensing
  structured data (A-box, instances, knowledge base)
  audit (proof/check sequence tracking)
  checking and refinement
  design, implementation, documentation, debugging, testing support
Metadata elements can be categorised according to their targeted agents, i.e. human users, application software, search engines or OE developer roles. Some OE developer roles are the following:
Content Publisher, RU Publisher, RA Publisher
Content Curator, Content Editor
Content Collaborator
OE-Tool Developer
Analyser
Application Tool Developer
Content Contributor, Content Committer
Content Evaluator
Content User
Mapper
Other possible roles, as taken from http://www.loc.gov/marc/sourcecode/relator/relatorlist.html: Adaptor, Annotator, Arranger, Author, Bibliographic, Censor, Client, Commentator, Compiler, Conductor, Consultant, Contractor, Corrector, Correspondent, Depositor, Director, Distributor, Expert, Funder, Interviewer, Licensee, Licensor, Moderator, Monitor, Observer, Originator, Owner, Patent holder, Principal Investigator, Producer, Programmer, Proofreader, Research Team Head, Respondent, Reviewer, Sponsor, Standards Body, Transcriber
[refine, add].
The NCI Thesaurus, NCIT (http://www.mindswap.org/2003/CancerOntology/) uses the following metadata categorisation:
For complex properties, e.g. FULL_SYN, NCIT uses XML syntax within OWL (as long as annotations on annotations are not supported by OWL).
3.3 Separation between Knowledge and Implementation Levels
The description of any metadata standard should distinguish between the ontology conceptualization and the ontology implementation level, the latter being the concrete realization of an ontology in a particular representation language (in various languages, syntaxes, versions etc.). This separation is based on the observation that any ontology rests on a language-independent conceptual model. The conceptualization represents the view of the engineering team upon the application domain, which is then implemented using an ontology editor and stored in a specific format. The same conceptualization might result in several implementations, with various classes, properties and axioms, depending on the concrete representation paradigm, language and syntax. An Ontology Conceptualization (OC) represents the abstract or core idea of an ontology. It describes the core properties of an ontology, independent of any implementation details. An Ontology Implementation (OI) represents a specific implementation of a conceptualization. It therefore describes implementation-specific properties of an ontology.
4 Why annotating with Metadata
Metadata is used to speed up and enrich searching for resources. In general, search queries using metadata can save users from performing more complex filter operations manually. Formalising metadata also generally increases data quality. From: Developing High Quality Data Models, 2-Sep-03:
Some important properties of data for which requirements need to be met are:
Definition-related properties:
– relevance: the usefulness of the data in the context of your business.
– clarity: the availability of a clear and shared definition for the data.
– consistency: the compatibility of the same type of data from different sources.
Content-related properties:
– timeliness: the availability of data at the time required and how up to date that data is.
– accuracy: how close to the truth the data is.
Related to both:
– completeness: how much of the required data is available.
– accessibility: where, how, and to whom the data is available or not available (e.g. security).
– cost: the cost incurred in obtaining the data, and making it available for use.
The advantages of using metadata for annotating RAs and RUs are different but overlapping. The availability of metadata on RAs is a fundamental dimension of RA access, i.e. ontology reusability (see also "DEMO - Design Environment for Metadata Ontologies", Jens Hartmann, Elena Paslaru Bontas, Raul Palma, Asuncion Gomez-Perez). This facilitates sharing and reusing existing ontologies, which in turn increases their quality, as they are continuously accessed, used and revised, and the quality of their applications, since these applications become (more) interoperable and are provided with a deeper, machine-processable understanding of the underlying domain. Reusing existing ontologies reduces the cost of ontology development, since it avoids re-implementation. Consistent metadata help unify the divergent and isolated efforts in OE under one coherent OE process.
The availability of metadata on RUs is a fundamental dimension of RU access, i.e. of ontology engineering as well as maintenance and versioning. RU metadata provide a consistent basis for ontology comparison, evaluation, alignment and mapping, e.g. they will enable a more robust application of the PROMPT tools. Metadata ease the evaluation and adaptation of existing ontologies in new application settings. The usage of standardised metadata on RAs enables the creation of standardised ontology search engines like Swoogle. Applying metadata can also ease integrated access, e.g. to the OBO ontologies through meta-tools like LexGrid. The same holds for APIs for terminological services that describe the basic functionality needed by applications to access and query terminological content, like the OMG Terminology Query Service or HL7's Common Terminology Services (CTS), and of course for further tools currently being developed by the NCBO BioPortal.
Another one: Data, Information, and Process Integration with Semantic Web Services (DIP), http://dip.semanticweb.org and https://bscw.dip.deri.ie/bscw/bscw.cgi/0/3016 Daniele Rizzi, A framework for representing ontologies consisting of several thousand concepts, An Ontology Representation and Data Integration (ORDI) Framework. A specification for an ontology representation and data integration framework.
Metadata can here provide mechanisms for establishing proof and trust in automated and semantic web systems. The advantages of metadata on RUs are roughly the same as the ones listed in the Naming Convention document.
5 Where and how to store Metadata
Metadata can be stored either internally, i.e. in the same file as the data, or externally, i.e. in a separate file. Both possibilities have advantages and disadvantages. Internal storage allows transferring metadata together with their data; thus they are always at hand and can be manipulated easily. External storage allows bundling metadata, e.g. in a database, for more efficient searching. There is no redundancy, and metadata can be transferred simultaneously when using streaming. However, as most formats use URIs for that purpose, the way in which the metadata are linked to their data must be treated with care:
– What if a resource does not have a URI, e.g. resources on a local hard disk or web pages that are created on the fly by a content management system?
– What if metadata can only be evaluated when there is a connection to the WWW, especially when using RDF/OWL?
– How can one detect that a resource has been replaced by another with the same name but different content?
Analogously, the Protégé editor allows the selection of different import positions/methods: from a URL, from a local repository or from a relative path. Another question is on which level metadata are asserted and on which level a metadata ontology should be imported, e.g. if BFO imports it, all OBO ontologies will profit from the inherited metadata. This would ensure at least OBO-wide interoperability and would ease comparison and mapping.
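The difference between internal and external storage, and the URI problem raised above, can be made concrete with a small Python sketch. All names, URIs and the file path below are hypothetical and chosen only for illustration; a real system would use a database or an RDF store rather than dictionaries.

```python
# External store: metadata bundled separately and keyed by the resource URI.
EXTERNAL_METADATA = {
    "http://example.org/demo#Instrument": {"creator": "curator A", "status": "draft"},
}


def metadata_for(resource: dict, external: dict) -> dict:
    """Return internal metadata if present, else look them up externally by URI."""
    if "metadata" in resource:
        # Internal storage: the metadata travel with the data.
        return resource["metadata"]
    uri = resource.get("uri")
    if uri is None:
        # The case discussed above: without a URI there is no link
        # between the resource and its externally stored metadata.
        raise LookupError("resource has no URI, so external metadata cannot be linked")
    return external.get(uri, {})


internal_resource = {
    "uri": "http://example.org/demo#Sample",
    "metadata": {"creator": "curator B"},
}
external_resource = {"uri": "http://example.org/demo#Instrument"}
local_file = {"path": "/tmp/notes.owl"}  # a resource on a local disk, no URI

print(metadata_for(internal_resource, EXTERNAL_METADATA))
print(metadata_for(external_resource, EXTERNAL_METADATA))
```

Calling `metadata_for(local_file, EXTERNAL_METADATA)` raises `LookupError`, which mirrors the first question in the list above.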
6 General recommendations
Semantics metadata (e.g. formal and non-formal definitions, non-formal axioms, synonyms, synonym types, …) and administrative metadata on RUs (e.g. editorial and authoring issues, status and versioning, provenance data, rights, …) should be captured separately, in a sufficiently granular and formal way to be tractable and to allow for querying them in a structured way. Metadata formalised as OWL annotation properties should not distort description logics based reasoning, at least not reasoning about the actual ontology content.
Documenting RUs and RAs does not come for free. It takes about one man month to document 50 RUs to a sufficient standard.
7 Sources of metadata
For a general overview on standardisation resources, look at http://www.taxonomywarehouse.com/twstandards_inc.asp
7.1 Using established metadata standards
Unfortunately, valuable metadata elements are dispersed over different metadata standards bodies, and no single standard provides integrated, acceptable access to all desired metadata elements. Furthermore, the available standards are not orthogonal. In addition, more and more standards initiatives want to make a profit from their standards, so access to them is restricted. The most important resources for metadata elements in our context are described briefly in this section.
7.1.1 RDFS / OWL Comments
RDF(S) and OWL provide a few elements suitable for capturing annotation metadata. These are mostly used for auditing and editorial information. The predefined annotation properties are owl:versionInfo, rdfs:label, rdfs:comment, rdfs:seeAlso and rdfs:isDefinedBy. Note that owl:equivalentClass, owl:sameAs and owl:differentFrom are logical constructs rather than annotation properties (the latter two refer to instances). We currently use the rdfs:comment field to capture metadata through Search Flags described below.
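To illustrate how such annotations can be read back in a structured way, the following Python sketch extracts the predefined annotation properties from a single class. The class, its URI and its annotation values are hypothetical; only the namespace URIs and property names are the standard RDF, RDFS and OWL ones.

```python
import xml.etree.ElementTree as ET

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "owl": "http://www.w3.org/2002/07/owl#",
}
ANNOTATION_PROPERTIES = (
    "rdfs:label", "rdfs:comment", "rdfs:seeAlso", "rdfs:isDefinedBy", "owl:versionInfo",
)

# Hypothetical annotated class, for illustration only.
CLASS_XML = """<owl:Class
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xmlns:owl="http://www.w3.org/2002/07/owl#"
    rdf:about="http://example.org/demo#MassSpectrometer">
  <rdfs:label>mass spectrometer</rdfs:label>
  <rdfs:comment>An instrument that measures mass-to-charge ratios.</rdfs:comment>
  <rdfs:seeAlso rdf:resource="http://example.org/docs/mass_spectrometer"/>
  <owl:versionInfo>0.1</owl:versionInfo>
</owl:Class>"""


def annotations(class_xml: str) -> dict:
    """Map each annotation property found on the class to its list of values."""
    cls = ET.fromstring(class_xml)
    result = {}
    for prop in ANNOTATION_PROPERTIES:
        prefix, local = prop.split(":")
        for element in cls.findall(f"{{{NS[prefix]}}}{local}"):
            # Literal values sit in the element text, resource values
            # in the rdf:resource attribute.
            value = element.text or element.get(f"{{{NS['rdf']}}}resource")
            result.setdefault(prop, []).append(value)
    return result


for prop, values in annotations(CLASS_XML).items():
    print(prop, values)
```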
7.1.2 Protégé Metadata ontology
The Protégé metadata ontology (http://protege.stanford.edu/plugins/owl/protege) is an OWL-Full ontology, with annotation properties that have range and domain restrictions. However, the "official" online release of this file is OWL-DL, so that ontologies that use Protégé metadata annotations can still be shared as OWL-DL. It provides the following metadata annotations:
– protege:isCommentedOut: This property can be used in the Protégé-OWL UI to comment out restrictions. The Protégé-OWL reasoning API does not send restrictions that have this annotation to the reasoner.
– protege:todoPrefix: The prefix that is used to determine whether a property value is a "todo" item.
– protege:todoProperty: A reference to the property that shall be used for "todo" annotations. The default value of this is owl:versionInfo.
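The interplay of the todo prefix and the todo property can be sketched in a few lines of Python. This is not Protégé code: the function name, the dictionary layout and the prefix value "TODO" are assumptions made for illustration; only the default todo property owl:versionInfo is taken from the description above.

```python
def todo_items(annotations: dict,
               todo_property: str = "owl:versionInfo",
               todo_prefix: str = "TODO") -> list:
    """Return the values of the todo property that start with the todo prefix."""
    values = annotations.get(todo_property, [])
    return [value for value in values if value.startswith(todo_prefix)]


# Hypothetical annotations of one class, keyed by annotation property.
class_annotations = {
    "rdfs:label": ["mass spectrometer"],
    "owl:versionInfo": ["TODO: add textual definition", "0.1"],
}

print(todo_items(class_annotations))
```

Only the value carrying the prefix is reported; the plain version string "0.1" on the same property is ignored.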
[refine, add].
7.1.3 SKOS (Simple Knowledge Organization System)
– W3C recommendation track
– RDF representation of controlled structured vocabularies used for indexing
– Applications: search/retrieval exploiting metadata; structure of CVs
– Has OWL interoperability in mind: Skos+OWL (swed.org.uk) and Skosowl
SKOS (see also the section on different sorts of synonyms) stands for Simple Knowledge Organisation System (http://www.w3.org/2004/02/skos/). The W3C standard SKOS provides a simple yet powerful framework for expressing knowledge structures in a machine-understandable way, for use on the semantic web. The SKOS Core Vocabulary is an RDF (Resource Description Framework) application. Using RDF allows data to be linked and merged with other RDF data by Semantic Web applications. SKOS Core provides a model for expressing the basic structure and content of thesauri, classification schemes, subject heading lists, taxonomies, terminologies, glossaries, and other types of controlled vocabulary. The name SKOS emphasises that:
– The scope of SKOS can be extended beyond thesauri to other types of representational artifacts, such as classification schemes, subject heading systems, taxonomies, glossaries, controlled vocabularies etc.
– The semantic web is not just about interchange of data, but also about the organisation of data in a distributed, decentralised way. RDF is not a file format, but a data formalism designed to support distributed data management in a web environment.
The properties skos:prefLabel and skos:altLabel allow the assignment of preferred and alternative lexical labels to a concept. The property skos:scopeNote is one of a family of 'documentation properties' that also includes skos:definition (seen in the glossary example).
The property skos:related is a semantic relation property that allows the assertion of associative semantic relationships between two concepts. Using these properties, an RDF graph expressing the above extract can be generated.
In the image above, each blue circle represents a resource of type skos:Concept (the rdf:type assertions are left out for readability). As will be seen below, each of these concepts also has an assigned URI; these have likewise been left out of the image to improve readability. Below is the RDF/XML serialisation of the RDF description of the above 'economic co-operation' concept:
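The original listing is not reproduced in this text. As a stand-in, the following Python sketch embeds a small RDF/XML description that uses the SKOS properties discussed above and reads it back. The concept URIs, the label spellings, the scope note and the related concept are illustrative assumptions, not the content of the original listing; only the namespace URIs and property names are the standard ones.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
SKOS = "http://www.w3.org/2004/02/skos/core#"

# Illustrative concept description; all URIs and literal values are hypothetical.
CONCEPT_XML = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:skos="http://www.w3.org/2004/02/skos/core#">
  <skos:Concept rdf:about="http://www.example.com/concepts#economicCooperation">
    <skos:prefLabel>Economic cooperation</skos:prefLabel>
    <skos:altLabel>Economic co-operation</skos:altLabel>
    <skos:scopeNote>Illustrative scope note for this concept.</skos:scopeNote>
    <skos:related rdf:resource="http://www.example.com/concepts#interdependence"/>
  </skos:Concept>
</rdf:RDF>"""


def read_concept(rdf_xml: str) -> dict:
    """Collect the labels, scope note and related concepts of the first skos:Concept."""
    concept = ET.fromstring(rdf_xml).find(f"{{{SKOS}}}Concept")

    def texts(name: str) -> list:
        return [el.text for el in concept.findall(f"{{{SKOS}}}{name}")]

    return {
        "uri": concept.get(f"{{{RDF}}}about"),
        "prefLabel": texts("prefLabel"),
        "altLabel": texts("altLabel"),
        "scopeNote": texts("scopeNote"),
        "related": [el.get(f"{{{RDF}}}resource")
                    for el in concept.findall(f"{{{SKOS}}}related")],
    }


print(read_concept(CONCEPT_XML))
```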
dc:creator: A person, an organisation, or a service responsible for making the content.
dc:date: A date of creation or availability of the resource. The date value follows ISO 8601 [W3CDTF], e.g. YYYY-MM-DD.
dc:description: An abstract, table of contents, reference to a graphical representation of content or a free-text account of the content.
dc:format: The physical or digital manifestation of the resource. The media type or dimensions of the resource, used to determine the software, hardware or other equipment needed to display or operate the resource. Dimensions include size and duration. Select a value from a controlled vocabulary (e.g. MIME, defining computer media formats).
dc:identifier: Unambiguously identifies the resource in a context by means of a string or number conforming to a formal identification system. Example formal identification systems include URI, URL, DOI and ISBN. [clsprov ?]
dc:language: Best practice is to use RFC 3066 [RFC3066], which, in conjunction with ISO 639 [ISO639], defines two- and three-letter primary language tags with optional subtags. Examples include "en" for English.
dc:publisher: A person, an organisation, or a service responsible for making the resource available. [clsprov ?]
dc:relation: A reference to a related resource, by means of a string or number conforming to a formal identification system. [clsprov, defprov ?]
dc:rights: A rights management statement for the resource, or a reference to a service providing such information. Rights information often encompasses Intellectual Property Rights (IPR), Copyright, and various Property Rights. If the rights element is absent, no assumptions can be made about the status of these and other rights with respect to the resource.
dc:source: The present resource may be derived from the source resource in whole or in part. Reference it by means of a string or number conforming to a formal identification system. [clsprov, defprov ?]
dc:subject: Keywords, key phrases or classification codes that describe a topic/content of the resource. Select a value from a controlled vocabulary or formal classification scheme. [altsprcls, synonyms ?]
dc:title: A name by which the resource is formally known. (Human readable) [synonym ?]
dc:type: The nature or genre of the content of the resource, describing general categories, functions, genres, or aggregation levels for content. Select a value from a controlled vocabulary (e.g. [DCMITYPE]). [altsprcls, synonyms ?]
Other DC elements and element refinements as taken from http://purl.org/dc/terms/:
A Dublin Core annotated owl-class looks like this:
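The original listing is not reproduced in this text. As a hedged stand-in, the following Python sketch generates an OWL class carrying three Dublin Core annotations and checks that the date is given in the YYYY-MM-DD form mentioned above. The class URI and the annotation values are hypothetical; the namespace URIs are the standard RDF, OWL and Dublin Core 1.1 ones.

```python
import datetime
import re
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
OWL = "http://www.w3.org/2002/07/owl#"
DC = "http://purl.org/dc/elements/1.1/"

# Use the conventional prefixes when serialising.
for prefix, uri in (("rdf", RDF), ("owl", OWL), ("dc", DC)):
    ET.register_namespace(prefix, uri)


def check_date(value: str) -> str:
    """Accept only calendar dates written as YYYY-MM-DD."""
    if not re.fullmatch(r"\d{4}-\d{2}-\d{2}", value):
        raise ValueError(f"not in YYYY-MM-DD form: {value!r}")
    datetime.date.fromisoformat(value)  # rejects impossible dates such as 2006-02-30
    return value


def dc_annotated_class(class_uri: str, creator: str, date: str, language: str) -> str:
    """Serialise an owl:Class with dc:creator, dc:date and dc:language annotations."""
    cls = ET.Element(f"{{{OWL}}}Class", {f"{{{RDF}}}about": class_uri})
    annotations = (("creator", creator), ("date", check_date(date)), ("language", language))
    for name, value in annotations:
        ET.SubElement(cls, f"{{{DC}}}{name}").text = value
    return ET.tostring(cls, encoding="unicode")


print(dc_annotated_class(
    "http://example.org/demo#MassSpectrometer",  # hypothetical class URI
    "Example Curator",
    "2006-11-08",
    "en",
))
```

In a complete ontology file the Dublin Core elements would additionally have to be declared as owl:AnnotationProperty for the file to remain OWL-DL; that declaration is omitted here for brevity.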
The International Standard "ISO/IEC 24706 - Metadata for technical standards and specification documents" addresses the metadata needed to describe standards and other technical documents. ISO/IEC 24706 recommends relevant data elements for describing standards and other specification documents and specifies data elements that include the scope, normative references, terms and definitions in technical documents. The data elements are divided into those needed to describe the contents of a standard or document and those needed to register a standard or document in a Standards Registry. The data elements are described in an implementation-independent way via provisions rather than in some Information Technology modelling paradigm. There are some other ISO standards (http://www.iso.org/iso/en/ISOOnline.frontpage) that tackle (meta-)terminological standardisation issues, e.g.:
ISO 704:2000 Terminology work – Principles and methods
ISO 860:1996 Terminology work – Harmonization of concepts and terms
ISO 1087-1:2000 Terminology work – Vocabulary – Part 1: Theory and application
ISO 1087-2:2000 Terminology work – Vocabulary – Part 2: Computer applications
ISO/IEC 11179 (all parts), Information technology – Metadata registries (MDR)
ISO/IEC CD 19773-11:2005, Information technology – Metadata modules (MM), Part 11: Contact information
ISO 15188:2001 Project management guidelines for terminology standardization
ISO 12620:1999 Computer applications in terminology – Data categories
ISO 16642:2003 Computer applications in terminology – Terminological
ISO 1951:1997 Lexicographical symbols particularly for use in classified defining vocabularies
ISO 12200:1999 Computer applications in terminology – Machine-readable terminology interchange format (MARTIF) – Negotiated interchange
ISO/TR 12618:1994 Computer aids in terminology – Creation and use of terminological databases and text corpora
For a review of these standards have a look at the following two papers by Barry Smith et al.: http://ontology.buffalo.edu/medo/NCIT.pdf and http://ontology.buffalo.edu/medo/Wuesteria.pdf
7.1.6 Friend of a friend (FOAF)
From: http://xmlns.com/foaf/0.1/
The FOAF project is based around the use of machine readable Web homepages for people, groups, companies and other kinds of thing. To achieve this we use the "FOAF vocabulary" to provide a collection of basic terms that can be used in these Web pages. At the heart of the FOAF project is a set of definitions designed to serve as a dictionary of terms that can be used to express claims about the world. The initial focus of FOAF has been on the description of people, since people are the things that link together most of the other kinds of things we describe in the Web: they make documents, attend meetings, are depicted in photos, and so on. The FOAF Vocabulary definitions presented here are written using a computer language (RDF/OWL) that makes it easy for software to process some basic facts about the terms in the FOAF vocabulary, and consequently about the things described in FOAF documents. A FOAF document, unlike a traditional Web page, can be combined with other FOAF documents to create a unified database of information.
Vocabulary Overview:
Classes: | Agent | Document | Group | Image | OnlineAccount | OnlineChatAccount | OnlineEcommerceAccount | OnlineGamingAccount | Organization | Person | PersonalProfileDocument | Project |
Properties: | accountName | accountServiceHomepage | aimChatID | based_near | birthday | currentProject | depiction | depicts | dnaChecksum | family_name | firstName | fundedBy | geekcode | gender | givenname | holdsAccount | homepage | icqChatID | img | interest | isPrimaryTopicOf | jabberID | knows | logo | made | maker | mbox | mbox_sha1sum | member | membershipClass | msnChatID | myersBriggs | name | nick | page | pastProject | phone | plan | primaryTopic | publications | schoolHomepage | sha1 | surname | theme | thumbnail | tipjar | title | topic | topic_interest | weblog | workInfoHomepage | workplaceHomepage | yahooChatID |
7.1.6.1 What's FOAF for?
For a good general introduction to FOAF, see Edd Dumbill's article, XML Watch: Finding friends with XML and RDF (June 2002, IBM developerWorks). Information about the use of FOAF with image metadata is also available. The co-depiction experiment shows a fun use of the vocabulary. Jim Ley's SVG image annotation tool shows the use of FOAF with detailed image metadata, and provides tools for labelling image regions within a Web browser. To create a FOAF document, you can use Leigh Dodds's FOAF-a-matic JavaScript tool. To query a FOAF dataset via IRC, you can use Edd Dumbill's FOAFbot tool, an IRC 'community support agent'. For more information on FOAF and related projects, see the FOAF project home page at rdfweb.org.
7.1.7 Willpower Glossary
The Willpower Glossary is used by the current SKOS documentation. It is a glossary of terms relating to thesauri and other forms of structured vocabulary for information retrieval. (See http://www.willpowerinfo.co.uk/glossary.htm)
7.1.8 OntoClean: Metadata for ontology evaluation
OntoClean provides metaclasses and meta-properties used for ontology evaluation in terms of semantic rigidity: http://protege.stanford.edu/ontologies/ontoClean/ontoCleanOntology.html
If you include this Protégé ontology in your ontology, you can annotate your classes with meta-properties of identity, unity, essence, and dependence. The OntoClean ontology in Protégé also contains constraints in the Protégé Axiom Language (PAL), enabling you to verify whether the ontology is "clean", i.e. does not violate any of the constraints based on these properties.
7.1.9 Annotea
Annotea for OWL: The Annotea framework provides an infrastructure for Web-based creation and sharing of out-of-band, fine-grained, extensible annotations. The Annotea framework consists of two fundamental aspects: an RDF-based extensible annotation format and a protocol for sharing, publishing, and retrieving those annotations.
7.1.10 Atom
In reaction to recognised issues with RSS (and because RSS 2.0 is frozen), a third group began a new syndication specification, Atom, in June 2003, and their work was later adopted by the Internet Engineering Task Force (IETF). The relative benefits of Atom and the two RSS branches are a matter of debate within the Web-syndication community. Supporters of Atom claim that it improves on RSS by relying on standard XML features, by specifying a payload container that can handle many different kinds of content unambiguously, and by having a specification maintained by a recognised standards organisation. Supporters of RSS claim that Atom unnecessarily introduces a third branch of syndication specifications, further confusing the marketplace.
For a comparison of Atom 1.0 to RSS 2.0 see Atom Compared to RSS 2.0.
7.1.11 RSS
RSS 1.0 etc. can also be mixed in with FOAF terms, as can local extensions.
RSS: http://en.wikipedia.org/wiki/RSS_%28file_format%29
7.2 Self-made metadata annotation properties
7.2.1 Creating own owl:AnnotationProperties (recommended!)
One possibility is to create new OWL annotation properties. See: http://protege.stanford.edu/plugins/owl/publications/DL2004-protege-owl.pdf#search=%22Protege%20owl%20scalability%20large%22
OWL supports this through annotation properties. The OWL Plugin allows users to attach annotations to ontologies, properties, individuals and classes. Annotation properties can be edited by means of a specific table widget. The OWL Plugin allows the user to put arbitrary values into annotations, including complex objects. These are currently being optimized for the Dublin Core metadata so that, for example, annotation properties with change dates and authors can be filled in automatically.
OWL DL supports high expressiveness without losing computational completeness and decidability for reasoning systems, but unlike OWL Full it places some restrictions on the use of annotation properties. The sets of object properties, datatype properties and annotation properties must be pairwise disjoint, i.e. owl:versionInfo cannot be both a datatype property and an annotation property at the same time. Stating domain or range constraints and sub-properties for annotation properties is not allowed, because annotation properties must not be used in property axioms. An owl:AnnotationProperty is a commenting assertion which can be uniformly applied to all sorts of OWL entities. All annotation properties are ignored by the reasoner, and they may not themselves be structured by further axioms. owl:AnnotationProperty assertions can have as objects either individuals or data values, including rdf:XMLLiterals, and thus can embed arbitrary XML, including RDF/XML (e.g., Annotea comments), XHTML, or SVG. The built-in annotation properties rdfs:label and rdfs:comment are already extensively used in user interfaces (e.g. tool tips) and in end user renderings of ontologies. [refine, add]
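To make the restrictions above concrete, here is a minimal sketch of what declaring and using a self-made annotation property could look like in RDF/XML. It is generated with Python's standard xml.etree.ElementTree module only to keep the example self-contained. The namespace http://example.org/RU_metadata# and the function name build_annotated_ontology are assumptions made for this illustration; def_prov is one of the descriptors proposed in 7.2.5.1.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
OWL = "http://www.w3.org/2002/07/owl#"
# Hypothetical namespace, chosen only for this illustration.
RU = "http://example.org/RU_metadata#"

for prefix, uri in (("rdf", RDF), ("owl", OWL), ("ru", RU)):
    ET.register_namespace(prefix, uri)


def build_annotated_ontology(class_name: str, provenance: str) -> ET.Element:
    """Return an rdf:RDF element declaring def_prov and using it on one class."""
    root = ET.Element(f"{{{RDF}}}RDF")
    # Declare def_prov as an annotation property. No domain, range or
    # sub-property axioms are attached, in line with the OWL DL restrictions.
    ET.SubElement(root, f"{{{OWL}}}AnnotationProperty",
                  {f"{{{RDF}}}about": RU + "def_prov"})
    # Attach the annotation to a class as a plain literal value.
    cls = ET.SubElement(root, f"{{{OWL}}}Class",
                        {f"{{{RDF}}}about": RU + class_name})
    ET.SubElement(cls, f"{{{RU}}}def_prov").text = provenance
    return root


if __name__ == "__main__":
    doc = build_annotated_ontology("organism", "PMID:14755292")
    print(ET.tostring(doc, encoding="unicode"))
```

Because the property carries no further axioms, a reasoner can ignore it, while editors and query tools can still search on it.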
7.2.2 Creating metaclasses with new properties
One could add fields for metadata to the metadata schema (the KR language itself), but this can force the ontology into OWL Full, and many useful Protégé tools can then no longer be used. In general, one should avoid capturing metadata by creating a metaclass, attaching a new meta-property (slot) to this metaclass, and subclassing all usable classes from it (as one would do in the traditional OKBC CLIPS Protégé format). The OWL plugin is currently undergoing massive changes and will be rebuilt in a way that is more independent of the Protégé meta-architecture.
7.2.3 Search flags within the rdfs:comment field (intermediate solution !)
As long as we are still working on the final recommendations to capture metadata formally, we simply use defined search flags within the rdfs:comment and owl:versionInfo fields. To allow fast editing we make use of certain search strings / markers which flag the rdfs:comment field with annotations for certain semantic and administrative issues. In general each marker starts on a new line and therefore starts with a capital letter. When it has multiple values, these are separated with a comma (has to be discussed; alternatively they are added one per line, each time after the marker and a number). Do not use multiple rdfs:comment fields for one class to make more than one comment. Since there is no semantics which could disambiguate between these, there is no advantage in dispersing annotations over multiple comments.
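The marker convention described above can be read back mechanically. The following is a minimal sketch, assuming one "marker: value" entry per line, a small set of known markers taken from the table in 7.2.5.1, and comma separation only for markers declared as multi-valued. The function name parse_comment_flags and the two marker sets are assumptions for this illustration, since the exact marker syntax is still under discussion.

```python
# Markers recognised by this sketch (a subset of the descriptors in 7.2.5.1).
KNOWN_MARKERS = {"def", "def_prov", "synonym", "cls_expl"}
# Markers whose values are split on commas; definitions keep their commas.
MULTI_VALUED = {"synonym", "cls_expl"}


def parse_comment_flags(comment: str) -> dict[str, list[str]]:
    """Split one rdfs:comment string into {marker: [values]}."""
    flags: dict[str, list[str]] = {}
    current = None
    for raw_line in comment.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        marker, sep, rest = line.partition(":")
        if sep and marker in KNOWN_MARKERS:
            # A new marker starts on this line.
            current = marker
            text = rest.strip()
            if marker in MULTI_VALUED:
                values = [v.strip() for v in text.split(",") if v.strip()]
            else:
                values = [text]
            flags.setdefault(marker, []).extend(values)
        elif current is not None and flags[current]:
            # No known marker: treat the line as a continuation of the
            # previous value.
            flags[current][-1] += " " + line
    return flags


if __name__ == "__main__":
    example = (
        "def: A living (or once living) entity that has (or can develop)\n"
        "the ability to act or function independently\n"
        "def_prov: PMID:14755292\n"
        "cls_expl: Archaea, Bacteria, Eukaryota"
    )
    print(parse_comment_flags(example))
```

Restricting the parser to known markers is a deliberate choice here: values such as PMID:14755292 contain a colon themselves, so splitting every line on its first colon would misread continuation lines.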
7.2.4 Administrative Information within class names (here rdfs:label)
This is not a recommendation, rather a possibility: helper classes, sometimes called residual categories, that are designated through specific prefixes or postfixes can be introduced. Helper classes such as "_unknown" or “_obsolete” do not designate biological universals, but they can nevertheless be useful as temporary containers while developing the ontological structure. The ”_” prefix can be used to mark helper classes of an administrative nature. Following the GO editing guidelines (at http://www.geneontology.org/GO.usage.shtml), it is recommended not to delete classes, but to transfer them into an ‘_obsolete’ class container, as we might want to use them later. When an RU is deleted, state in the comment field why, and add in the definition a remapping to a corresponding RU in use.
Administrative markers within the rdfs:label class name will be seen in the hierarchy and therefore allow for easy recognition when browsing the hierarchy. E.g. as with the “_”, a "?" can be added directly in front of the class name (no delimiter) when its position is not clear or it needs to be deleted or refined.
So-called residual categories (‘other’, ‘NOS’, etc.) exist in many biomedical terminologies, though their inclusion has been subjected to criticism [e.g. College of American Pathologists, SNOMED Clinical Terms Consultation Document; Requirements Analysis. Version 10, 2000 Oct 12.]. Often, a residual class is interpreted as the complement of the union of all the non-residual siblings listed, but this interpretation causes problems when a terminology is expanded to include more such siblings.
It would be nice if terms could be marked obsolete and yet remain "in the ontology". Dropping them into an obsolete container loses positional information on their previous classification. We do not want to recapitulate the whole structure in the Obsolete container. Maybe we can import an administrative ontology version which has residual categories. Or maybe there is a possibility to just mark classes 'deleted', let them persist in the ontology, and just render them invisible for the end user…
Some systems use suffixes to encode the KR-formalism entities the name belongs to, for example “has_motherPr” (relation/predicate), “moleculeCFn” (function - returns a class), “binary_predicatePMC” (metaclass predicate). This encoding system can be advantageous in the early process of semantic enrichment, for example when "ontologizing" taxonomies. However, this is not always possible, because it requires a priori knowledge of formalism entities (RUs of the KR meta-language) and their semantics during the initial phase of the collection of the CV.
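The prefix and suffix conventions mentioned in this section can be decoded with a few lines of code. The sketch below assumes exactly the markers named in the text ("_" and "?" as prefixes; "Pr", "CFn" and "PMC" as suffixes). The function name describe_name and the wording of the returned notes are hypothetical.

```python
# Suffix codes quoted in the text, longest first so "PMC" wins over "Pr".
SUFFIXES = {
    "PMC": "metaclass predicate",
    "CFn": "function returning a class",
    "Pr": "relation/predicate",
}


def describe_name(name: str) -> tuple[str, list[str]]:
    """Strip administrative markers from a name and report what they mean."""
    notes: list[str] = []
    if name.startswith("_"):
        notes.append("administrative helper class")
        name = name[1:]
    elif name.startswith("?"):
        notes.append("position unclear, or needs deletion or refinement")
        name = name[1:]
    for suffix, meaning in SUFFIXES.items():
        if name.endswith(suffix):
            notes.append(meaning)
            name = name[: -len(suffix)]
            break
    return name, notes


if __name__ == "__main__":
    for label in ("_obsolete", "?organism", "has_motherPr", "moleculeCFn"):
        print(label, "->", describe_name(label))
```

A plain suffix test like this can misfire on ordinary names that happen to end in one of the codes, which is one more reason to prefer explicit annotation properties over markers embedded in names.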
7.2.5 Proposed format independent metadata recommendation
The following table gives a recommendation for a non-redundant set of metadata elements, their descriptions, cardinalities, usage obligations and a mapping to corresponding elements from other metadata approaches (dc:core, skos, NCIT, birnlex). We would like to point out that the recommendation in its tabular form is in principle independent of its implementation in a certain representation formalism. Nevertheless, an implementation has been added in front of the tables. So far, some of these metadata elements have been kept as simple markers within the rdfs:comment fields in the OWL formalism (see above). This was an intermediate solution and will change in the near future, when a concrete stable formalism/implementation has been agreed upon by the group. Currently, for the OWL formalism, creating annotation properties is the most plausible way. Implementation details, e.g. how to capture synonym provenances, are currently being worked out.
7.2.5.1 Metadata for representational units (RUs)
Find an implementation of these recommendations in OWL format at https://svn.sourceforge.net/svnroot/msi-workgroups/ontology/RU_metadata.owl
Import this ontology and you will also be provided with some residual categories (e.g. _deleted, _inclusion_list and _temp_orphan) which can help administering classes (see Chapter 7.2.5.3 Owl implementation …).
Metadata descriptor | Definition | Type (Editorial, Administrative, Usage) | RU, not actually an annotation property | Usage category / Obligation (Required, Optional, Extensional) | Cardinality | Example for class “organism”
def_need (will be obligatory when a formal def capturing mechanism exists) | To mark where no proper definition has been assigned yet | A | | o | 0:1 | -
def | The formalized and normalized class definition according to OBO best practice, explaining the meaning of a class. | U,A | * | r | 1 | def: A living (or once living) entity that has (or can develop) the ability to act or function independently
alt_def | An alternative definition. Usually the natural language definition initially provided by the domain specialists | U,A | | o | 1:* | alt_def: An organism is a biomaterial entity which is self-contained
temp_def | When a def needs to be refined, it is indicated by this marker (was: “???” at the beginning of the definition) | A | | o | 0:1 | temp_def: An organism is a selfsustaining system that gains energy through decomposition of nutrients. This energy is used to grow and propagate.
def_prov [source: id] | The definition provenance* (was: Defsource) is put after the definition. It can be a source publication, a database or ontology entry, a group or person name or a URL (dbxref in obo) | E | | r | 1 | def_prov: PMID:14755292; GO:GO:0018995; OBI: Marc Mustermann; uniprot:; www.websters-online-dictionary.org; ISBN:0198506732
cls_prov | The class provenance. A database cross reference | E | | o | 1 | cls_prov: GO:GO:0018995
ont_imp [ontology name: class-name and class id] | To mark from where on in the class hierarchy we want to import / refer to a complete subclass hierarchy from other ontologies (was: refer to) | E | | o | 0:* | ont_imp: NCBI Taxonomy:organism; ont_imp: ZDB:ZFA:0001094; ont_imp: GO:GO:0018995
refact | When a class has been refactored into more atomic classes, the compound class is made obsolete and this deleted source class is mentioned after the refact statement in the new atomic classes | A, E | | o | 0:1 | refact: selfstructuring_entity
alt_supr_cls | An alternative superclass assertion | A | | o | 0:* | alt_supr_cls: material_entity
cls_name | The human readable class name | | * | r | |
cls_ID | A unique identifier for the class, consisting of group prefix, underscore and number | | * | r | | OBI_001563
prpty_ID | The property unique identifier | | * | r | | -
pref_propty_name | The preferred name for a class, usually the one used to display in the hierarchy browser. | | | o | | -
pref_cls_name | The preferred name for a class | | | o | | organism
short_cls_name | A short class name suitable for graph visualisations etc. | | | | |
synonym | An alternative class name used as synonym | A,U | | o | 0:* | synonym: living thing
acronym | An acronym for the class name | | | e | |
abbrev | An abbreviation for the class name | | | e | |
cls_del | When a class is deleted, state why it was deleted | A | | o | 0:1 | cls_del: redundant class
axiom | General axioms to be fulfilled by instances of the class can be captured formlessly in natural language here | U | * | o | 0:* | axiom: An organism often plays the role Investigation object.
cls_expl | An example subclass for the class, or a database entry which will be annotated through this | A,U | | o | 0:* | cls_expl: Archaea, Bacteria, Eukaryota, Viroids, Viruses
inst_expl | An example value or instance for the class, or a database entry which will be annotated through this | A,U | | o | 0:* | inst_expl: my dog “Lassy”, patient “Herbert Schmitt”
curation_status | The status (stability level) of the class. Specifies tracking information. Each term in the vocabulary has a term status level assigned. The status of a term indicates its level of stability, i.e. how much it may be expected to change in the future. The following status values are allowed: unstable: The term is unstable, and feedback is welcomed on its current form and utility (analogous to 'alpha' release in software development). It may currently be poorly defined. Its meaning and/or form may be expected to change at any time. Do not implement mission critical systems that depend on this term persisting in its current form. testing: The term has gone beyond the raw proposal stage, and is undergoing testing (analogous to 'beta' release in software development). This term may still change in response to feedback from testing, although it may be expected not to undergo any radical change. The cost to early implementors of changing the term will be considered, however the goal of achieving wider interoperability and long-term stability may override those considerations. stable: [...] (meaning changing) alterations will take place. Implementors can expect the term to persist in its current form indefinitely. | A | | o | 1 | From: http://www.w3.org/2004/02/skos/mapping/spec/
unresolved_issue | An important problematic issue that has to be tackled by the editors | | | | |
editor | A specific editor/curator who is responsible for and edits this RU | | | o | |
scope_note | Any general formless remark or note about the class (was: rem, note) | A,U | | o | 0:* | scope_note: OBI should link to an external resource here (e.g. NCBI Taxonomy)
change_note | A note that indicates what was modified or changed concerning the RU. | | | o | |
replace | Deleted / deprecated terms which the given term has replaced in recommended usage. | | | o | |
edit_note | A note related to the RU intended for its editor. | | | o | |
source | | | | o | | OBI, FMA, Chris Mungall, …?
creation_date | Date on which the class or property was first issued. | A | | o | | 2006-04-20
modified_date | Date on which the class or property was last modified. | A | | o | | 2006-04-20
action_item | A description of a task / action for the RU editor to solve an issue related to the RU | | | | |
context_keyword | The main usage contexts can be stated, e.g. for text mining purposes or translation purposes. | | | | |
formal_cls_name | A name for the class that is formally controlled through linguistic rules and axioms, e.g. OBOL normalized ones that adhere to defined principles of word/morpheme/affix order and form. ??? | | | | |
old_cls_state | State whether a class was defined or primitive when deleted. | | | | |
old_sub_cls | For deleted classes, state their last position within the ontology and state the old subclasses. | | | | |
rights | Indicate access rights for an RU. The security policy should be compliant with the rule-based access control standards: INCITS, InterNational Committee for Information Technology Standards (formerly NCITS). (2003). Role Based Access Control. INCITS 359 DRAFT, 4/4/2003. http://csrc.nist.gov/rbac/ These offer, at the same time, a consistent layered approach for security policy definition and management and for compliance with a growing set of supporting tools. | | | | |
UMLS_connect | A link to a UMLS semantic Type CUI | | | | |
[this table is work in progress! Not all fields are filled with values!]
* Usage categories (Obligation):
– Required: These metadata facts are mandatory. Missing elements lead to incomplete metadata descriptions of ontologies and are handled accordingly by metadata management tools.
– Optional: The specification of optional metadata elements, though not mandatory, increases the reusability of the corresponding ontology.
– Extensional: This class of metadata elements is not represented in detail in the core model, but can be further elaborated in extension modules.
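A metadata management tool could use the obligation and cardinality columns to check an RU automatically. The following is a minimal sketch under stated assumptions: it covers only a handful of descriptors from the table, reads "1" as exactly one, "0:1" as at most one and "0:*" as unbounded, and checks the cls_ID pattern "group prefix, underscore and number". The names RULES, CLS_ID_PATTERN and validate_ru are hypothetical and not part of RU_metadata.owl.

```python
import re

# descriptor -> (obligation, maximum number of values or None for unbounded)
# A small subset of the table, for illustration only.
RULES = {
    "def": ("r", 1),
    "def_prov": ("r", 1),
    "def_need": ("o", 1),
    "synonym": ("o", None),
    "cls_del": ("o", 1),
}

# Group prefix, underscore and number, e.g. OBI_001563.
CLS_ID_PATTERN = re.compile(r"^[A-Za-z]+_\d+$")


def validate_ru(cls_id: str, annotations: dict[str, list[str]]) -> list[str]:
    """Return a list of human-readable problems; empty if the RU passes."""
    problems = []
    if not CLS_ID_PATTERN.match(cls_id):
        problems.append(f"malformed cls_ID: {cls_id}")
    for name, (obligation, max_count) in RULES.items():
        count = len(annotations.get(name, []))
        if obligation == "r" and count == 0:
            problems.append(f"missing required element: {name}")
        if max_count is not None and count > max_count:
            problems.append(f"too many values for {name}: {count} > {max_count}")
    return problems


if __name__ == "__main__":
    organism = {
        "def": ["A living (or once living) entity that has (or can develop) "
                "the ability to act or function independently"],
        "def_prov": ["PMID:14755292"],
        "synonym": ["living thing"],
    }
    print(validate_ru("OBI_001563", organism))
```

Returning a list of problems rather than raising on the first one lets an editor see every missing or surplus element of an RU in a single pass.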
* Within the def_prov, specify the sources as follows:
– Person source: State the person's name, or use their initials when these are unique and the full name is available in the dc:creator value names list.
– Database source: Database source identifiers have two parts, separated by a colon: an abbreviation for the source database and the identifier of the item in that database. For example, the definition source for the protein GTR1_MOUSE in the Uniprot database would be represented as “uniprot:P17809”.
– Book source: If the definition comes from a book, use the ISBN. For example, a definition source for the Oxford Dictionary of Molecular Biology would be ISBN:0198506732. Hyphens should be removed from the ISBN.
– Journal publication source: If the definition of a term comes from a paper, use the PubMed ID, e.g. PMID:11910864.
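The four source kinds listed above can be told apart by their prefix. Here is a minimal sketch, assuming that anything without a colon is a person name, that ISBNs are 10 or 13 characters without hyphens, and that PubMed IDs are purely numeric. The function name classify_source and the returned tuple layout are assumptions for this illustration; URLs, which the table also allows as provenance, are not handled.

```python
import re


def classify_source(source: str) -> tuple[str, str, str]:
    """Return (kind, prefix, identifier) for one def_prov source string."""
    source = source.strip()
    prefix, sep, identifier = source.partition(":")
    if not sep:
        # No colon at all: treat the whole string as a person name or initials.
        return ("person", "", source)
    if not identifier:
        raise ValueError(f"missing identifier after '{prefix}:'")
    if prefix == "ISBN":
        # Hyphens must already be removed from the ISBN.
        if not re.fullmatch(r"\d{9}[\dX]|\d{13}", identifier):
            raise ValueError(f"ISBN must be 10 or 13 characters without hyphens: {identifier}")
        return ("book", prefix, identifier)
    if prefix == "PMID":
        if not identifier.isdigit():
            raise ValueError(f"PubMed ID must be numeric: {identifier}")
        return ("journal", prefix, identifier)
    # Any other prefix is read as a database abbreviation.
    return ("database", prefix, identifier)


if __name__ == "__main__":
    for src in ("uniprot:P17809", "ISBN:0198506732", "PMID:11910864", "Marc Mustermann"):
        print(src, "->", classify_source(src))
```

Only the first colon is used for splitting, so nested identifiers such as GO:GO:0018995 keep their inner prefix in the identifier part.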
There are some more metadata we might want to capture (this is a draft, contributions are welcome), e.g. synonym types, which could be captured as done in obo (see also SKOS section).
I would like to be able to capture all comments in an RU-centric manner, meaning that all comments on an RU should be accessible from within the ontology. Possibilities are:
– link to an external website, e.g. an email thread on a discussion list or the SF term tracker,
– capture comments directly as annotation properties
…. [add]
In general we could discuss if we want to add annotations helpful for format conversions, analogous to the OBO, e.g.: Builtin: Def: Whether or not this term or relation is builtin to the obo format. Allowable values are "true" and "false" (false assumed as default). Rarely used. One example of where this is used is the Obo relations ontology, which provides a stanza for the "is_a" relation, even though this relation is axiomatic to the language.
We have to decide which properties we want to provide with the ontology when it is published (properties useful to the user) and which are ‘for our eyes only’ (purely administrative and editorial). This of course depends on the general OE approach, i.e. one developer, a small core set of developers or a direct-access distributed development approach….
Mappings: [add, refine]
About BIRN Property integration in the table above: For integration and mapping in the table above I left out the misspelling property. Acronym and abbreviation could be modelled as Synonym sub properties. About NCIT Property integration in the table above: Is there a formal semantics behind the different ways to name properties? It has ALL_CAPITALS, CamelCase, and space and underscore separators… For integration and mapping in the table above I left out the domain specific properties. Of the more general ones I left out the following: LONG_DEFINITION, ALT_LONG_DEFINITION, Semantic Type, Display_Name, ImageLink and NCI_META_CUI. ...
7.2.5.2 Metadata for representational artifacts (RA)
Currently edited through the metadata tab in Protégé, using different metadata ontologies. Find an implementation of these recommendations in OWL format at https://svn.sourceforge.net/svnroot/msi-workgroups/ontology/RA_metadata.owl [here first draft].
Metadata descriptor | Definition | Mapping
ontology_title | The name of the ontology. | dc:title
abbreviation | |
subject | The knowledge domain which the ontology allows to represent | dc:subject
release_date | Date of the last release of the file | dc:date, created, issued
usage | Short description of the usage of the CV file | dc:description
editor | Name of the person and/or the working group where the last release was edited | dc:creator, contributor
publisher | The organisation or the person who publishes the ontology. | dc:publisher
version | Version number of the released CV file | owl:versionInfo, dc:hasVersion, dc:isVersionOf
status | Stability |
applied_by | Declares where the ontology has been applied. |
tools_used | The name of the tool used to create the ontology |
ont_type | e.g.: Generic, Upper-Level, Domain, Application, Task, Foundational, Linguistic |
OE_method/approach | The name of the OE principle, method or model used to create the ontology, e.g. OBO. |
Further descriptors, not yet defined: repr_lang, home URL, namespace, metrics, founder, accessability, rights, License_model, publication, documentation, domain, target_user, keywords, main_top_level classes, obo-foundry, Used_top_level_ontology, reviewer, Main_top_level_relations, Num_of_classes, Num_of_properties, Num_of_instances?, editing_mechanism, versioning_mechanism, development_period, searchability, scalability, Reasoning_services, max_hierarchy_level, maintance, known_bugs
[Also evaluate the following....]
7.2.5.2.1 OMV (Ontology Metadata Vocabulary) and DEMO
http://omv.ontoware.org/
OMV is a metadata scheme describing ontologies and ontological content. Ontologies have seen enormous development and application in many domains in recent years, especially in the context of the next web generation, the Semantic Web. Besides the work of countless researchers across the world, industry has started developing ontologies to support its daily operative business. Currently, most ontologies exist in pure form without any additional information, e.g. authorship information such as that provided by Dublin Core for text documents. This makes it difficult for academia and industry to identify, find and apply (basically, to reuse) ontologies effectively and efficiently. One proposal for a metadata standard is the Ontology Metadata Vocabulary (OMV).
Tools:
..- Oyster: Decentralized ontology repository for managing, searching and exchanging metadata about ontologies in a P2P network. (http://oyster.ontoware.org)
..- Onthology.org: Centralized ontology repository that uses the OMV metadata schema to describe, classify, access, evaluate and store ontologies. (http://www.onthology.org)
OMV is a proposed core metadata vocabulary used by the DEMO (Design Environment for Metadata Ontologies) project. OMV provides organizational, methodological and technological level metadata. OMV is accessible as an OWL file from http://omv.ontoware.org
The following annotation properties still have to be added to the figure above: RepresentationParadigm, OntologyRepresentationLanguage, OntologyTask and more specialized sub-concepts such as SemanticSearch or SemanticAnnotation.
DEMO provides the technical means required for metadata management and maintenance for the Semantic Web in the form of the semantic engineering platform OntoWare, which provides a scalable, collaborative software and ontology engineering environment for the collaborating partners. The mission of the DEMO framework can be categorized as follows:
– Provision of an organizational infrastructure for the development and maintenance of a commonly agreed metadata vocabulary for ontologies, including equitable participation mechanisms for organizations involved in DEMO activities.
– Identification and application of suitable methodologies and technologies to support the complete life cycle of the OMV Core.
– Development and maintenance of the OMV Core.
– Promotion of OMV extensions relying on the OMV Core.
– Provision of an appropriate technical infrastructure for the enumerated activities.
DEMO activities are driven and supervised by a Management Board (MB), consisting of representatives from the OMV Consortium, which includes all active OMV contributors. A central organizational objective of DEMO is to keep the barrier low for participants to join the OMV Consortium and to get involved in the development and recommendation process. Complementary to this distinction, DEMO foresees several Working Groups (WG) corresponding to the aforementioned components (WG Engineering, WG Evolution, WGs Extensions, WGs Applications).
7.2.5.2.2 Administrative Information within the ontology file name
Ignore the following if you use cvs or svn change tracking systems. Administrative information can be stored within the file and/or captured with a file naming convention similar to the one proposed by GO (decision should be taken):
Working draft by: Daniel Schober MSI Ontology, PSI Ontology and OBI WGs members Metadata Annotations for Ontology Engineering Draft v1 MSI Ontology, PSI Ontology and OBI WGs 08.11.2006
ShortDescriptiveFilename_Author_Version_Date.ext

where:

- "ShortDescriptiveFilename" is a short descriptive filename that may contain upper and lower case text, numerals, "-" (dash) and "_" (underscore). Use the upper CamelCase convention, with underscores as separators. Spaces and other symbols are not allowed. The name should include the PSI WG acronym with the suffix -CV. For the PSI molecular interaction controlled vocabulary it will be "PSI-MI-CV".
- "Author" is the name of the author and/or the organization where the file is authored. Separate author and organization with a dash if both are featured. Again, spaces and other symbols are not allowed here. This part should include the PSI WG acronym, followed by the initials of the WG CV chairperson. For the PSI molecular interaction controlled vocabulary with a CV chairperson called Luisa Montecchi-Palazzi it will be "PSI-MI-LMP".
- "Version_Date" comprises the version number and/or the date the file is released. Start the version number with a "v"; use "-" instead of "." in the version numbering (like "v2-15" instead of "v2.15"). Separate version and date with an underscore if both are featured. For the date, the more significant parts should come first: use "yyyymmdd". The advantage is that files sorted alphabetically are then also sorted by date. Add an "a", "b", "c", ... suffix if multiple versions may occur with the same date. Again, spaces and other symbols are not allowed here.
- "." (a dot) follows. There should be only one dot in the entire filename, and it should come right before the file extension.
- "ext" is the standard file extension by which the file can be associated with an appropriate application that will handle it. It generally consists of 2 to 4 lower case alphanumeric characters.

E.g.: PSI-MI-CV_PSI-MI-LMP_v1-9_20060420.obo

A similar convention is being used by the W3C for their published work (e.g. note their page header information at http://www.w3.org/TR/2004/REC-webont-req-20040210/).
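The naming convention above can be checked mechanically. The following is a minimal sketch, assuming the component order shown in the example filename (name, author, version, date); the function name parse_cv_filename and the regular expression are illustrative and not part of the PSI recommendation.

```python
import re
from datetime import datetime

# Illustrative pattern for name_author[_version][_date].ext (an assumption
# derived from the example PSI-MI-CV_PSI-MI-LMP_v1-9_20060420.obo).
FILENAME_RE = re.compile(
    r"(?P<name>[A-Za-z0-9_-]+?)"                # short descriptive name
    r"_(?P<author>[A-Za-z0-9-]+)"               # author and/or organization
    r"(?:_(?P<version>v\d+(?:-\d+)*))?"         # optional version, e.g. v2-15
    r"(?:_(?P<date>\d{8})(?P<suffix>[a-z])?)?"  # optional yyyymmdd plus a/b/c
    r"\.(?P<ext>[a-z0-9]{2,4})"                 # single dot, 2-4 char extension
)


def parse_cv_filename(filename):
    """Split a filename into its parts, or raise ValueError if it is invalid."""
    match = FILENAME_RE.fullmatch(filename)
    if match is None:
        raise ValueError(f"not a valid filename: {filename!r}")
    parts = match.groupdict()
    if parts["version"] is None and parts["date"] is None:
        raise ValueError("a version number and/or a date is required")
    if parts["date"] is not None:
        # Raises ValueError for impossible dates such as 20061340.
        datetime.strptime(parts["date"], "%Y%m%d")
    return parts


print(parse_cv_filename("PSI-MI-CV_PSI-MI-LMP_v1-9_20060420.obo"))
```

Because underscores are allowed inside the descriptive name, the split between name and author is ambiguous in general; the sketch resolves it by taking the last dash-only segment before the version as the author.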
7.2.5.3 Owl Implementation of this metadata recommendation
An initial implementation of these recommendations as simple non-hierarchical owl annotation properties has been created: RA_metadata.owl for Representational Artifacts (RA) and RU_metadata.owl for Representational Units (RU).
7.2.5.3.1 RA_metadata.owl
RA_metadata.owl formalizes recommendations for annotation properties to describe the ontology as a whole. Metadata properties required for ontology submission to the OBO and BioPortal repositories have been integrated as well.
7.2.5.3.2 RU_metadata.owl
RU_metadata.owl formalizes recommendations for annotation properties to describe the constituent parts (representational units or 'KR idioms') of the ontology. They should be sufficiently rich to support the main domain-independent ontology engineering and administration processes. These metadata elements provide tractable search tags to query for administrative and editorial metadata on classes and properties.
7.2.5.3.3 Implementation principles
These implementations were built to be modular (the RU and RA metadata can be imported separately) and simple to use. They form a lightweight set of metadata descriptors for people who need more ontology-centric coverage than the Dublin Core or SKOS provide. The annotation property names were chosen to be maximally intuitive to a majority of developers and yet as short as possible, so that their XML element representation does not take too much memory. No property hierarchies have been used so far, but they can be added easily. Besides the Dublin Core, many of the more domain-independent NCIT and birnlex metadata elements were also evaluated to build this RU metadata recommendation. RU_metadata.owl also provides some residual categories (e.g. _deleted, _inclusion_list and _temp_orphan) which can help in administering classes; e.g. the _deleted class can substitute the current '_deleted_classes' in OBI. Such 'administrative only' classes should be separated from the domain ontology and are hence better placed in an imported metadata ontology.
7.2.5.3.4 Usage of the owl implementation and example annotation
Import these two ontologies into the ontology you want to annotate. Use the owl:imports statement and load them over the web from these URLs:

https://svn.sourceforge.net/svnroot/msi-workgroups/ontology/RA_metadata.owl
https://svn.sourceforge.net/svnroot/msi-workgroups/ontology/RU_metadata.owl
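For readers who assemble ontology files programmatically, the import step can be sketched as follows. This is a minimal illustration using only the Python standard library; the function name imports_header and the ontology IRI http://example.org/my_ontology.owl are hypothetical, and the output is only the RDF/XML ontology header, not a complete ontology.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
OWL = "http://www.w3.org/2002/07/owl#"
ET.register_namespace("rdf", RDF)
ET.register_namespace("owl", OWL)

# The two metadata ontologies named in this recommendation.
METADATA_URLS = [
    "https://svn.sourceforge.net/svnroot/msi-workgroups/ontology/RA_metadata.owl",
    "https://svn.sourceforge.net/svnroot/msi-workgroups/ontology/RU_metadata.owl",
]


def imports_header(ontology_iri, import_urls):
    """Return an RDF/XML header with one owl:imports statement per URL."""
    root = ET.Element(f"{{{RDF}}}RDF")
    ontology = ET.SubElement(
        root, f"{{{OWL}}}Ontology", {f"{{{RDF}}}about": ontology_iri}
    )
    for url in import_urls:
        ET.SubElement(ontology, f"{{{OWL}}}imports", {f"{{{RDF}}}resource": url})
    return ET.tostring(root, encoding="unicode")


# The ontology IRI below is a placeholder, not a real resource.
print(imports_header("http://example.org/my_ontology.owl", METADATA_URLS))
```

In practice most editors, Protégé included, write these statements themselves when an import is added through the user interface.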
As an example, we show what an annotated class (here the ‘instrument_configuration’ class from nmr.owl) looks like:
Here is the owl code for the class shown above:

[OWL markup not recoverable; only the following annotation values survive]
>machine configuration instrument setting
>This class should capture different instrument settings on the machine used in response to diverse analytical approaches.
8 Using Annotation Metadata Properties
8.1 Using the Protégé QueryTab
[Explain the benefit of search flags and of building a query library] [also look at what FuGE:audit captures…]
8.2 On term obsoletions (taken from the psi recommendations by Luisa…)
Following the GO editing guidelines (at http://www.geneontology.org/GO.usage.shtml), a term that is no longer used MUST NOT be deleted, but tagged as 'obsolete'. A unique identifier MUST NOT be deleted once used. IDs should be conserved at all times so that, even if a term is defunct or has a new ID, someone searching using the old ID can find it.

A term can become obsolete when it is removed or redefined, but a term MUST NOT be made obsolete due to changes in wording that do not alter the meaning of the term. When a term's definition changes meaning, the term should also be assigned a new ID, and the old ID considered obsolete. To make a term obsolete, the CV update procedure should be followed (see below section 13.).

When you make a term obsolete, insert the word 'OBSOLETE.' at the beginning of the term definition and add a comment that explains why the term has become obsolete and suggests alternative terms for annotators to use. Use the following syntax for the reason for obsoletion:

comment: This term was made obsolete because [reason].

Alternative terms for obsoleted terms

To suggest alternative terms to be used instead of, or as a replacement for, obsolete terms, use one of the following:

Exact replacement(s)

When exact replacement is possible (i.e. it is safe to move all existing annotations, keyword mappings, etc. to one term), precede the suggested term with the verb 'use':

To update annotations, use the [PSI:XXX namespace] term '[term] ; XXX:[id]'.
Example, using a use case from the Gene Ontology:

term: transfer RNA
goid: GO:0005563
comment: This term was made obsolete because it represents a gene product. To update annotations, use the molecular function term 'triplet codon-amino acid adaptor ; GO:0030533'.

No exact replacement(s)

In cases where all existing annotations and mappings cannot necessarily be transferred to one term, put 'consider' in front of the suggested terms. Syntax for different situations:

1. There is only one suggestion, but it may not work for all annotations:
To update annotations, consider the [PSI:XXX namespace] term '[term] ; XXX:[id]'.

2. To make more than one specific suggestion:
a) from a single ontology, separate terms with commas:
To update annotations, consider the [PSI:XXX namespace] terms '[term1] ; XXX:[id1]', '[term2] ; XXX:[id2]', '[term3] ; XXX:[id3]'.
b) from more than one ontology, separate terms from one ontology with commas, and use 'and' between ontology names:
To update annotations, consider the [PSI:XXX namespace] terms '[term1] ; XXX:[id1]' and the [PSI:YYY namespace] term '[term2] ; YYY:[id2]'.

Examples, using use cases from the Gene Ontology:

term: expansin
goid: GO:0009936
comment: This term was made obsolete because it represents a gene product. To update annotations, consider the cellular component term 'cell wall (sensu Magnoliophyta) ; GO:0009505' and the biological process term 'cell growth ; GO:0016049'.

term: blue-sensitive opsin
goid: GO:0015059
comment: This term was made obsolete because it refers to a class of proteins. To update annotations, consider the molecular function terms 'photoreceptor ; GO:0009881', '3,4-didehydroretinal binding ; GO:0046876' and 'retinal binding ; GO:0016918' and its children, the cellular component term 'integral to membrane ; GO:0016021' and the biological process terms 'phototransduction, visible light ; GO:0007603' and 'UV-A, blue light phototransduction ; GO:0009588'.

To suggest a term and all its children, use the syntax 'consider the [ontology name] term '[term] ; GO:[id]' and its children' (as in the example above).

Restoring obsolete terms

If you need to reinstate an obsolete term back into the CV, use the following:

comment: Note that this term was reinstated from obsolete.
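The comment syntax of this section is regular enough to be generated rather than typed by hand. The sketch below is illustrative only: the function names mark_obsolete and obsoletion_comment and the layout of the suggestions argument are assumptions, and the 'and its children' variant is not covered.

```python
def mark_obsolete(definition):
    """Prefix a term definition with the word 'OBSOLETE.'."""
    return f"OBSOLETE. {definition}"


def _suggestion_group(namespace, terms):
    # One ontology: "the <namespace> term(s) '<label> ; <id>', ..."
    noun = "term" if len(terms) == 1 else "terms"
    quoted = ", ".join(f"'{label} ; {term_id}'" for label, term_id in terms)
    return f"the {namespace} {noun} {quoted}"


def obsoletion_comment(reason, suggestions=(), exact=False):
    """Build the comment text for an obsoleted term.

    suggestions is a list of (namespace, [(label, id), ...]) pairs, one per
    ontology; exact=True selects the verb 'use', otherwise 'consider'.
    """
    comment = f"This term was made obsolete because {reason}."
    if suggestions:
        verb = "use" if exact else "consider"
        groups = " and ".join(_suggestion_group(ns, ts) for ns, ts in suggestions)
        comment += f" To update annotations, {verb} {groups}."
    return comment


# Reproduces the comment of the 'expansin' example from the Gene Ontology.
print(obsoletion_comment(
    "it represents a gene product",
    [
        ("cellular component", [("cell wall (sensu Magnoliophyta)", "GO:0009505")]),
        ("biological process", [("cell growth", "GO:0016049")]),
    ],
))
```

Generating the comment from structured input keeps the wording uniform, which matters if the comments are later searched or parsed to find replacement terms.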
9 Annotating Instantiations (Annotation of Annotations)
One can see an annotation of data to TBox idioms as instantiation. The result is an assertional box (ABox) or knowledgebase (KB). In many systems in use today, ontology class-instance (data) associations derived from manual curation or experimental evidence can carry the same weight as annotations which have been predicted. A distinction between these different statuses should be made, because biologists using the knowledgebase must be aware of the different levels of confidence associated with the different methods of annotation. In this context, metadata on annotation confidence is critical for working with data from KB repositories of ontologically annotated data, e.g. the Open Biomedical Data (OBD) repositories currently under development by NCBO. The first step in this process is to establish the ‘status’ of all annotation types. Could GO evidence codes be expanded and applied to our ontologies and the OBO ontologies here? What is the ontology of 'annotation'? Can we use lessons learned from the corrections made in successive versions for the improvement of ontologies? How can version tracking be exploited for annotating annotation evidence?
9.1 GO evidence codes
http://www.geneontology.org/GO.evidence.shtml
Every GO annotation must indicate the type of evidence that supports it; these evidence codes correspond to broad categories of experimental or other support. An evidence code indicates how annotation to a particular term is supported, and is not necessarily a classification of an experiment. For every evidence code, there is room for curators to exercise judgement about the quality of the evidence, and how well it supports annotation to a node within each ontology.

Evidence Codes

- IC: Inferred by Curator
- IDA: Inferred from Direct Assay
- IEA: Inferred from Electronic Annotation
- IEP: Inferred from Expression Pattern
- IGI: Inferred from Genetic Interaction
- IMP: Inferred from Mutant Phenotype
- IPI: Inferred from Physical Interaction
- ISS: Inferred from Sequence or Structural Similarity
- NAS: Non-traceable Author Statement
- ND: No biological Data available
- RCA: inferred from Reviewed Computational Analysis
- TAS: Traceable Author Statement
- NR: Not Recorded

The distinction between TAS and NAS is particularly sensitive to interpretation.
The evidence fields can be thought of in a loose hierarchy of reliability:
- TAS / IDA
  - IMP / IGI / IPI
    - ISS / IEP
      - NAS
        - IEA
This hierarchy should not be interpreted as a rigid ranking of evidence types; users can and should form their own conclusions as to the reliability of each type of evidence and each individual annotation. It is a loose hierarchy partly because the strength of the evidence also depends on the resolution at which you are annotating, and partly because there is a range of reliability within each evidence category (e.g. 90% versus 20% identity for "sequence similarity", or a two-hybrid result versus co-purification over several columns for "physical interaction").

There may be different kinds of evidence available to support annotating a gene product to different levels within each ontology. For example, there might be a direct assay showing that a protein localizes to the mitochondrion, and a physical interaction suggesting localization to the mitochondrial matrix (more specific node, but less reliable evidence). Curators can annotate genes to both a parent and a child, and cite the same or different kinds of evidence for the annotations as appropriate.

Added 2000-11-08: Heather has seen cases where a paper presents several lines of evidence supporting a conclusion, of which each line of evidence alone is sufficient to annotate to a higher-level (more generic) node, but combining the lines of evidence gives the author (or curator) enough data to support annotating to a lower-level (more specific) node. We've decided to annotate each line of evidence singly, with the appropriate evidence code, for the higher node (e.g. have a line for IMP, another line for IPI, for one GO ID). The annotation to the lower node can then be included with TAS as the evidence; cite the paper if the author draws the conclusion. If the curator draws the conclusion, keep some record of what went into the decision.
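A tool that displays annotations may want to order them by the loose reliability hierarchy given above. The following is a minimal sketch under that assumption; the names RELIABILITY_TIERS, reliability_tier and sort_by_reliability are illustrative, annotations are modelled as simple (term, evidence code) pairs, and codes that the hierarchy does not mention (IC, ND, RCA, NR) are placed last purely as a convention of this sketch.

```python
# Tiers from most to least reliable, as listed in the loose hierarchy above.
RELIABILITY_TIERS = [
    ("TAS", "IDA"),
    ("IMP", "IGI", "IPI"),
    ("ISS", "IEP"),
    ("NAS",),
    ("IEA",),
]

TIER_OF = {
    code: tier
    for tier, codes in enumerate(RELIABILITY_TIERS)
    for code in codes
}


def reliability_tier(evidence_code):
    """Return the tier index (0 = most reliable); unranked codes come last."""
    return TIER_OF.get(evidence_code, len(RELIABILITY_TIERS))


def sort_by_reliability(annotations):
    """Sort (term, evidence_code) pairs by tier, keeping ties in input order."""
    return sorted(annotations, key=lambda pair: reliability_tier(pair[1]))


annotations = [
    ("mitochondrial matrix", "IPI"),
    ("mitochondrion", "IDA"),
]
print(sort_by_reliability(annotations))
```

As the text stresses, this ordering is only a rough guide: reliability varies within each evidence category, so the tier should not be used as a hard filter.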
9.2 Evidence codes within GOA
GOA uses the following:

Use of 'Qualifiers'

A curator can choose to alter the meaning of an annotation by using a ‘qualifier’. There are three qualifiers: NOT, colocalizes_with and contributes_to. If used, they are present in column 4 of the gene association file. Special attention must be paid to the NOT qualifier, as it completely reverses the meaning of the annotation.

- NOT is used to make an explicit note that the gene product is not associated with the GO term. For example, if a protein has sequence similarity to an enzyme (whose activity is GO:nnnnnnn), but has been shown experimentally not to have the enzymatic activity, it can be annotated as NOT GO:nnnnnnn.
- colocalizes_with is used only with terms in the Cellular Component ontology and is given to gene products that are transiently or peripherally associated with an organelle or complex.
- contributes_to is used only with terms in the Molecular Function ontology and is given to a gene product that is a member of a complex which has an activity that the individual gene product does not have. All gene products annotated using 'contributes_to' must also be annotated to a cellular component term representing the complex that possesses the activity.

[to be refined]
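The restrictions on qualifiers described above lend themselves to a simple consistency check. The sketch below is illustrative: the names QUALIFIER_ONTOLOGY, qualifier_is_allowed and is_negated are assumptions, ontologies are identified by plain strings, and the additional rule that a contributes_to annotation needs a companion cellular component annotation is not checked.

```python
# Ontology in which each qualifier may be used; None means no restriction.
QUALIFIER_ONTOLOGY = {
    "NOT": None,
    "colocalizes_with": "Cellular Component",
    "contributes_to": "Molecular Function",
}


def qualifier_is_allowed(qualifier, ontology):
    """Return True if the qualifier may be used with a term of this ontology."""
    if qualifier not in QUALIFIER_ONTOLOGY:
        raise ValueError(f"unknown qualifier: {qualifier!r}")
    required = QUALIFIER_ONTOLOGY[qualifier]
    return required is None or required == ontology


def is_negated(qualifier):
    """NOT reverses the meaning of the annotation and needs special handling."""
    return qualifier == "NOT"


print(qualifier_is_allowed("colocalizes_with", "Cellular Component"))
print(qualifier_is_allowed("contributes_to", "Biological Process"))
```

Any consumer of a gene association file should test is_negated before counting an annotation as positive evidence for a term.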
9.3 An ‘evidence code’ classification within DAS-BioSapiens
http://www.ebi.ac.uk/~gabby/classification.html
The DAS protocol as used by the BioSapiens project will use a mini-ontology derived from the evidence codes to make annotations of annotations possible. The DAS group proposes to cluster the evidence codes into four broader super-categories: manually curated (super-code MC), experimentally verified (EV), computationally predicted (CP) and those where no assignment is possible (NA). Technically, these evidence codes will be handled by adding an ‘evidence’ attribute to the response from DAS ‘types’ commands. The value of this ‘evidence’ attribute will be the two or three uppercase letter abbreviation given in the following table.
GO Evidence codes clustered into four categories:
Evidence Code   Description                                      BioSapiens Supercode

Manually curated
IC              Inferred by Curator                              MC
NAS             Non-traceable Author Statement                   MC
RCA             Comput. Analysis                                 MC
TAS             Traceable Author Statement                       MC

Experimentally verified
IDA             Inferred from Direct Assay                       EV
IEP             Inferred from Expression Pattern                 EV
IGI             Inferred from Genetic Interaction                EV
IMP             Inferred from Mutant Phenotype                   EV
IPI             Inferred from Physical Interaction               EV

Computationally predicted
ISS             Inferred From Sequence or Struct. Similarity     CP
IEA             Inferred from Electronic Annotation              CP

No assignment possible
ND              No biological Data available                     ND
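The clustering in the table can be expressed as a lookup from GO evidence code to supercode. This is a minimal sketch, not the BioSapiens implementation: the names SUPERCODE_OF and supercode are illustrative. Note one assumption in particular: the prose names the fourth category NA while the table lists ND for it, and the sketch follows the prose.

```python
# GO evidence code -> BioSapiens supercode, following the table above.
SUPERCODE_OF = {
    # Manually curated
    "IC": "MC", "NAS": "MC", "RCA": "MC", "TAS": "MC",
    # Experimentally verified
    "IDA": "EV", "IEP": "EV", "IGI": "EV", "IMP": "EV", "IPI": "EV",
    # Computationally predicted
    "ISS": "CP", "IEA": "CP",
    # No assignment possible (assumption: 'NA' as in the prose; table says 'ND')
    "ND": "NA",
}


def supercode(go_evidence_code):
    """Return the supercode, or None for codes the table omits (e.g. NR)."""
    return SUPERCODE_OF.get(go_evidence_code)


for code in ("TAS", "IDA", "IEA", "ND", "NR"):
    print(code, supercode(code))
```

Returning None for unlisted codes, rather than defaulting to the 'no assignment' category, keeps the gap in the table visible to the caller.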
10 Contributions
This document has been drafted by Daniel Schober and has received input from members of the MSI Ontology WG, PSI Ontology WGs, OBO WG and OBI WGs, in particular from:
- Gilberto Fragoso (OBI)
- Luisa Montecchi-Palazzi, Frank Gibson (PSI)
- Chris Mungall (OBO)
- Barry Smith (cBIO, OBO)
- William Bug (BIRN-Lex)
- Philippe Rocca-Serra and Susanna-Assunta Sansone (MSI)
***** NOTE: This draft document is a work in progress ***** Comments and ideas are welcomed and should be sent to: [email protected]