Representing Knowledge in Oral Medicine – Remodeling Clinical Examinations Using OWL∗
Technical Report HS-IKI-TR-06-009 School of Humanities and Informatics, University of Skövde
Marie Gustafsson
School of Humanities and Informatics University of Skövde, Box 408, SE-541 28 Skövde, Sweden
Department of Computer Science and Engineering Chalmers University of Technology, SE-412 96 Göteborg, Sweden
Abstract This report describes the remodeling of the representation of clinical examinations in oral medicine, from the previous proprietary format used by the MedView project, to using the World Wide Web Consortium’s recommendations Web Ontology Language (OWL) and Resource Description Framework (RDF). This includes the representation of (1) examination templates, (2) lists of values that can be included in individual examination records, and (3) aggregates of such values used for, e.g., analyzing and visualizing data. It also includes the representation of (4) individual examination records. We describe how OWL and RDF are used to represent these different knowledge components of MedView, along with the design decisions made in the remodeling process. These design decisions are related to, among other things, whether or not to use the constructs of domain and range, appropriate naming in URIs, the level of detail to initially aim for, and appropriate use of classes and individuals. A description of how these new representations are used in the previous applications and code base is also given, as well as their use in the Swedish Oral Medicine Web (SOMWeb) online community. We found that OWL and RDF can be used to address most, but not all, of the requirements we compiled based on the limitations of the MedView knowledge model. Our experience in using OWL and RDF is that, while there is much useful support material available, there is some lack of support for important design decisions and best practice guidelines are still under development. At the same time, using OWL gives us access to a potentially beneficial array of externally developed tools and the ability to come back and refine the knowledge model after initial deployment.
∗The work presented in this report was supported by the Swedish Agency for Innovation Systems.
Contents
1 Introduction
  1.1 Overview
2 Knowledge Representation in Oral Medicine
  2.1 MedView
    2.1.1 The Definitional Approach
    2.1.2 Storing Templates and Values
    2.1.3 Tree Files
    2.1.4 Value Aggregates
  2.2 Requirements for an Ontology for Oral Medicine
3 Ontologies, RDF, and OWL
  3.1 Ontologies
  3.2 W3C Recommendations for the Semantic Web
    3.2.1 RDF
    3.2.2 OWL
    3.2.3 Trade-offs in Making OWL
  3.3 Tools for Working with OWL and RDF
    3.3.1 Editors
    3.3.2 Application Programming Interfaces
    3.3.3 Visualizers
    3.3.4 Reasoners
    3.3.5 Validators
  3.4 Reported Experiences in Using OWL and RDF
    3.4.1 Open World Assumption
    3.4.2 No Unique Names Assumption
    3.4.3 Validation
    3.4.4 No Support for Default Reasoning
    3.4.5 Value Ranges
    3.4.6 Reusing Other Ontologies
    3.4.7 Imports
    3.4.8 Using Instances
    3.4.9 The XML Syntax
    3.4.10 Use of Domain and Range
    3.4.11 OWL’s Sublanguages
    3.4.12 Problems for Developers New to OWL
4 Design and Development of the SOMWeb Ontologies
  4.1 Relations between Structures of MedView and SOMWeb
  4.2 Development Process
  4.3 Designing the Examination Template Ontologies
    4.3.1 The Structure of the Examination Ontologies
    4.3.2 Design Choices
  4.4 Designing the Value List Ontology
    4.4.1 Structure of the Value List Ontology
    4.4.2 Design Choices
  4.5 Representing Individual Examinations
    4.5.1 Validation
  4.6 Representing Aggregates
  4.7 End-user Input
5 Using the Ontologies
  5.1 Constructing Input Forms from OWL Examination Templates
  5.2 MedView Datahandling
    5.2.1 Handling Examinations
    5.2.2 Handling Terms
6 Discussion
  6.1 Results in Relation to the Requirements for an Oral Medicine Ontology
  6.2 Our Experiences in Using OWL
  6.3 Benefits and Constraints of Starting from an Existing Model
  6.4 End-User Control and Standardizations
  6.5 Standards in Medicine
    6.5.1 Comparison with the openEHR Approach
    6.5.2 External Classifications
7 Conclusions
8 Future Work
A MedView XML Examination Template for Meeting Consultation
B SOMWeb OWL Examination Template for Meeting Consultation
C Part of the SOMWeb Value List
D Example Examination Instance
1 Introduction
Basing clinical decisions on finding, evaluating, and using the latest research results is an essential premise of evidence-based medicine (EBM) [1]. A crucial part of the practice of EBM is the integration of the expertise of the individual clinicians with the best clinical evidence obtainable from external sources [2]. Processes necessary for EBM, such as the collection, analysis, validation, sharing, and harmonization of clinical knowledge, can in part be supported by information technology (IT). The MedView project [3] has aimed to provide IT-support for evidence-based oral medicine. This has been done by equipping the clinicians with a wide range of software tools, assisting in the various processes of EBM, and by providing a formal knowledge model on which to base these tools. However, as this model is only used within the MedView project, it is difficult to reuse external knowledge sources and to share the data collected by MedView tools with others. There is also a need to expand the current model and to reexamine how to best conceptualize examination data in oral medicine. Such an undertaking is also relevant for those areas of medicine that overlap with oral medicine. In knowledge representation, the term ontology is used to denote the definition of concepts and relations between them, for a given domain of interest. The Web Ontology Language1 (OWL) is a recommendation of the World Wide Web Consortium (W3C), along with the related Resource Description Framework2 (RDF). We investigate the development of ontologies in oral medicine using these recommendations, taking the previous MedView representation as a starting point.
The knowledge model of MedView includes the representation of (1) individual examination records, (2) examination templates describing the pattern from which the individual records are created and which are used in constructing user input forms, (3) value lists from which values can be chosen when filling out these forms, and (4) aggregates of values created and used when analyzing data from the examination records. Also, parts of the MedView applications will be adapted to handling the new OWL and RDF representations, which will also be used in the SOMWeb (Swedish Oral Medicine Web) online community. This online community serves as support for the discussion of interesting and difficult cases in oral medicine among geographically dispersed clinics in Sweden. The community is further described in [4]. In addition to the contributions of the developed ontologies, and the use of these in the online community, this work also serves as an experience report of using the RDF and OWL recommendations. In this report, we will refer to the original, definitional approach described in Sec. 2.1 as the MedView representation. The new OWL- and RDF-based representation, to be described in Sec. 4, will be denoted the SOMWeb representation.
1 http://www.w3.org/2004/OWL
2 http://www.w3.org/RDF
1.1 Overview
We begin by describing features of the MedView representations in Sec. 2.1, followed by requirements for an ontology of oral medicine in Sec. 2.2. In Sec. 3, brief introductions to ontologies, RDF, and OWL are given, followed by a longer account of others’ experiences in working with OWL. Section 4 gives details of the remodeling of the MedView knowledge model using OWL and RDF, covering both the developed ontologies and the design decisions made. A description of how the ontologies are used in the data handling of MedView applications is given in Sec. 5. In Sec. 6, the discussion, we compare the developed ontologies to the requirements of Sec. 2.2, as well as to standards for representing patient records and medical classifications. We also discuss our experiences in using OWL, the constraints and benefits of starting with an existing knowledge model and code base, and the trade-offs in maintaining user control while aspiring to standardization, reuse, and formal knowledge representation. Finally, in Sec. 7 we provide conclusions of this work and in Sec. 8 give suggestions for future work.
2 Knowledge Representation in Oral Medicine
2.1 MedView
The main goal of MedView, since its inception in 1995, has been to support evidence-based oral medicine. This includes developing models, methods, and tools to aid clinicians in their daily work and research. At the heart of the work is how computer technology can be used to aid clinicians in systematically learning from the gathered clinical data. This learning is supported by a suite of tools. The clinicians specify what data to gather in a clinical examination by defining an examination template (FormEdit), along with lists of values that can be used in examination records (TermEdit). These templates are then used to gather data in, e.g., an application for creating and viewing examination records (MedRecords) and in an online tool for collecting data (mForm). The clinicians can then visualize and analyze the collected patient data (mVisualizer). Natural language summaries of examination records can also be generated (MedSummary). The knowledge base built in the MedView project currently contains data from over 15600 examination records, covering more than 6200 different patients. The main knowledge base is located at the Clinic of Oral Medicine, Faculty of Odontology, Göteborg University. The various clinics within the Swedish Oral Medicine Network (SOMNet) have local knowledge bases containing the examination records collected at each clinic. The contents of these local knowledge bases are added regularly to the knowledge base in Göteborg so that the entire amount of data collected can be accessed through one common knowledge base. At present, the clinical knowledge used in MedView is divided into examination templates, value lists, and value classes. In addition to these basic knowledge structures, there are also definitions for the layout of text and slot-fillers used in the generation of summaries of examination records in the MedSummary application [5], and definitions for the structure and layout of templates used in the acquisition of examination data.
Term            Definition
Examination   = {Patient-data, Anamnesis, Diagnosis, ...}
Patient-data  = {Patient-code, Age, Born, ...}
Anamnesis     = {Medication, Allergies, Smoke, Alcohol, ...}
Patient-code  = {"1234567890"}
Age           = {"37"}
Born          = {"Sweden"}
...
Figure 1: A sample examination in MedView, using the definitional approach. The Examination term is defined by the set of terms that includes Patient-data, Anamnesis, and Diagnosis. The terms Patient-data and Anamnesis are then defined by sets of other terms. Among the terms used in defining Patient-data is the term Age, which in this examination is defined by the value 37. The quotation marks indicate values which have been added to the template to make an examination record.
MedView has a declarative model which is based on the assumption that definitions are central tools in all attempts to provide a precise and formalized representation of knowledge [6]. We begin by explaining how this definitional approach has been realized in MedView, followed by a description of how templates and value lists are stored. As the knowledge model in MedView has become more complex, additional concepts, such as aggregates of values, have been introduced to structure related values. After a short description of such aggregates, we move on to the conclusion that it would be useful to consider other forms of knowledge representation for MedView, such as ontologies, and put forth a list of requirements for such an ontology.
2.1.1 The Definitional Approach
Clinical data in MedView has thus far been seen as definitions of clinical terms [7], where a definition is seen as a collection of equations in which the left-hand sides (atoms) are defined in terms of the right-hand sides (conditions). In the case of MedView examination templates, the atomic data unit is an examination. Each examination is a set of terms, and a term is defined either as a set of other terms or as a set of values. In this way, abstract clinical concepts, e.g., examination, diagnosis, and patient data, are given by definitions of collections of specific clinical terms. An example of this is given in Fig. 1. Going from an examination template to an individual patient record, the definitions provided by the template are elaborated on by further filling in values for terms. For example, the terms status, direct, mucos and palpation are all part of the general template that defines a particular clinical examination protocol. A concrete instance of an examination template, an examination record, is given by defining terms like Mucos-site and Mucos-col in terms of observed values, e.g., {l12} and {white, brown} respectively. The knowledge base (KB) also contains knowledge structures describing general domain knowledge. Values for the terms defined in templates are taken from formalized lists of valid values. These value lists are given as value definitions, which are stored in the knowledge base along with the examination records and templates.
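The step from template to record can be illustrated with a small sketch. The Python below is not MedView code; it assumes that a template can be modelled as a dictionary from terms to the terms that define them, and that a record is the same structure with value sets added for the remaining terms. The names `leaf_terms` and `make_record` and the contents of `VALUE_LISTS` are hypothetical; the term names and the values "37" and "Sweden" follow Fig. 1.

```python
# Illustrative sketch only; not the MedView implementation.
# A template maps a term to the set of terms that define it.
TEMPLATE = {
    "Examination": {"Patient-data"},
    "Patient-data": {"Age", "Born"},
}

# Hypothetical value list: the valid values for the term Born.
VALUE_LISTS = {"Born": {"Sweden", "Chile", "Bolivia"}}


def leaf_terms(template, root):
    """Return the terms under root that are not defined by other terms."""
    found, stack = set(), [root]
    while stack:
        term = stack.pop()
        children = template.get(term)
        if children:
            stack.extend(children)
        else:
            found.add(term)
    return found


def make_record(template, observations, root="Examination"):
    """Elaborate a template into a record by defining leaf terms by values."""
    leaves = leaf_terms(template, root)
    record = {term: set(defn) for term, defn in template.items()}
    for term, values in observations.items():
        if term not in leaves:
            raise KeyError(f"{term!r} is not a value-defined term")
        allowed = VALUE_LISTS.get(term)
        # Terms without a value list (here Age) accept any value.
        if allowed is not None and not set(values) <= allowed:
            raise ValueError(f"invalid value(s) for {term!r}")
        record[term] = set(values)
    return record


record = make_record(TEMPLATE, {"Age": {"37"}, "Born": {"Sweden"}})
print(sorted(record["Age"]), sorted(record["Born"]))
```

The check against `VALUE_LISTS` mirrors the statement that values are taken from formalized lists of valid values; how MedView itself enforces this is not shown here.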
2.1.2 Storing Templates and Values
The structure of the examination template is stored as XML. The general structure of the form is:

EXAMINATION
  FORMINFO
    AUTHOR
    TITLE
    ...
  CATEGORY
    INPUT
    INPUT
    ...
  CATEGORY
    INPUT
    ...
  ...

An example XML template is given in App. A. The example template is for a meeting consultation, which is used for recording data from teleconference meetings. In terms of XML, the root of the template is an examination element, which contains several category elements. Each of these category elements contains a name, a description, and several inputs. Each input has several attributes, such as its type, whether it is required, its name, its description, and an instruction to be displayed to the clinician entering patient data. An input in the template could be:
3 Visual Analog Scale (VAS) is a method to measure pain intensity, where the patient is shown a 10 cm line, with “no pain” on one end and “worst possible pain” on the other, and is asked to put a mark on the line signifying their experience.
term Born). An entry in the termDefinitions file corresponding to the input example above would be:

$Born single

The term’s entry in the termValues file would be:

$Born Australien Bolivia Bosnien Bulgarien Chile Danmark England Eritrea ...
2.1.3 Tree Files
Individual examinations, created as a result of a patient encounter, are stored in a format known as tree files. The individual examination is created by filling in values in the definition given by the template. This can be described as a tree, where the defining concepts and values are seen as children, so that for the example in Fig. 1, the root node Examination has as its children Patient-data, Anamnesis, and Diagnosis. The specific values entered at a patient encounter become leaf nodes.
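The tree view of an examination can be sketched as follows. This is a minimal illustration in Python and does not reproduce the tree-file syntax, which is not shown here; the `Node` class, the `value` helper, and `values_with_paths` are hypothetical names, while the terms and values follow Fig. 1.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple


@dataclass
class Node:
    """A term or a value in an examination tree (hypothetical structure)."""
    label: str
    children: List["Node"] = field(default_factory=list)
    is_value: bool = False


def value(label: str) -> Node:
    """Create a leaf node holding a value entered at a patient encounter."""
    return Node(label, is_value=True)


def values_with_paths(node: Node, path: Tuple[str, ...] = ()) -> Iterator[Tuple[str, ...]]:
    """Yield the path from the root to every value leaf, depth first."""
    path = path + (node.label,)
    if node.is_value:
        yield path
    for child in node.children:
        yield from values_with_paths(child, path)


tree = Node("Examination", [
    Node("Patient-data", [
        Node("Patient-code", [value("1234567890")]),
        Node("Age", [value("37")]),
        Node("Born", [value("Sweden")]),
    ]),
    Node("Anamnesis"),   # sub-terms omitted in this sketch
    Node("Diagnosis"),   # sub-terms omitted in this sketch
])

for p in values_with_paths(tree):
    print(" / ".join(p))
```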
2.1.4 Value Aggregates
As the KB grows, it becomes increasingly important to be able to group related values into classes in a hierarchical manner. For example, diseases such as Herpes labialis, Herpetic gingivostomatitis, and Shingles can be classified as viral diseases. The ability to categorize values into different classes (or groups) has proven very useful in data analysis, as it reduces the complexity of the data set and facilitates the detection of interesting patterns in the data. Value classes can also be useful for concept formation, e.g., for differentiating between two different forms of a diagnosis. Value classes are constructed using class definitions, which are stored in the knowledge base for future use. As an example, the following class definition S groups smoking habits into three classes:
S:
  1 cigarette without filter/day      = < 10 cigarettes/day
  < 5 cigarettes without filter/day   = < 10 cigarettes/day
  10–15 filter cigarettes/day         = > 10 cigarettes/day
  20 filter cigarettes/day            = > 10 cigarettes/day
  Occasionally                        = Non-smoking
  No                                  = Non-smoking
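How such a class definition reduces the complexity of a data set can be sketched in a few lines of Python. This is an illustration, not the MedView implementation: the dictionary mirrors the class definition S, with the second row assumed to map to "< 10 cigarettes/day" so that exactly three classes result, and the function `aggregate` and the list `observed` are hypothetical.

```python
from collections import Counter

# Class definition S as a mapping from recorded value to value class.
SMOKING_CLASSES = {
    "1 cigarette without filter/day": "< 10 cigarettes/day",
    "< 5 cigarettes without filter/day": "< 10 cigarettes/day",  # assumed class
    "10–15 filter cigarettes/day": "> 10 cigarettes/day",
    "20 filter cigarettes/day": "> 10 cigarettes/day",
    "Occasionally": "Non-smoking",
    "No": "Non-smoking",
}


def aggregate(values, class_definition):
    """Count observations per value class; unmapped values keep their label."""
    return Counter(class_definition.get(v, v) for v in values)


# Hypothetical values taken from five examination records.
observed = [
    "No",
    "Occasionally",
    "20 filter cigarettes/day",
    "1 cigarette without filter/day",
    "No",
]

counts = aggregate(observed, SMOKING_CLASSES)
for value_class, n in sorted(counts.items()):
    print(value_class, n)
```

Six distinct recorded values collapse into three classes, which is the kind of reduction that makes patterns easier to detect in analysis and visualization.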
2.2 Requirements for an Ontology for Oral Medicine
Despite the value found in the definitional approach used in MedView, there are several limitations, which lead us to consider using other approaches. One such approach would be to construct an ontology for oral medicine. Requirements for an ontology for oral medicine, based on experience with the MedView system and interviews with domain experts and developers, have been described in [8]. To summarize:
• We need the possibility and ability to utilize external sources of knowledge.
• The relation between the conceptual models of fundamental clinical concepts in use, e.g., examination templates, lists of approved values for terms, and groups of related terms, and their corresponding concrete entities must be formally examined.
• Relations and interactions between different entities of the ontology must be captured, e.g., that a certain answer to a specific question in a given examination template triggers another question.
• A stronger typing of elements is needed. We must be able to enforce that a certain term only has numeric values, dates as values, or a certain enumerated domain.
• We need to be able to capture different kinds of meta-data, e.g., who is the creator of a specific examination template and what its purpose (scientific or clinical) is.
• The localization of data has to be addressed rigorously: How to provide different language-based versions of the defined concepts, definitions and terms?
• We need to differentiate between different ‘views’ of the underlying data, to be utilized for, e.g., information visualization and intelligent user interfaces, such as a patient-, time-, or quantity-oriented view.
3 Ontologies, RDF, and OWL
The bulk of this background section reports on experiences that others have had in using OWL. Before delving into aspects of these observations, regarding for example working with an open world assumption, lack of support for some sought after constructs, and developers’ and users’ unfamiliarity with Description Logics, we first discuss what is meant by ontologies and give a short introduction to RDF and OWL. It is not possible to give a comprehensive presentation of these recommendations here, and we refer to [9], [10], and [11] for more background and details.
3.1 Ontologies
The word ontology has come to be used in many different contexts, and thus has several different meanings. It originates in philosophy, where ontology is the science of describing the kinds of entities in the world and how they are related. A key aim of ontologies in the philosophical sense is a definitive and exhaustive classification of all entities. There are different ways of relating the content of ontologies to the world, and this is rooted in
philosophical debates going back to Medieval interpretations of Greek philosophy, on whether or not universals4 exist. In the realist stance, reality is taken to exist independently of human perception, and ontological quality is related to the degree to which the ontology is true of a certain portion of reality [12]. If you instead adopt a cognitive (or conceptualist) bias, you consider categories as cognitive artifacts which are dependent on human perception [13]. Further along on this scale we find nominalism, where it is held that abstract concepts exist only as names, having no independent existence. According to Gruber [14], an ontology is an “explicit specification of a conceptualization.” This definition was modified slightly by Borst [15]: “Ontologies are defined as a formal specification of a shared conceptualization.” From these definitions we glean that ontologies are formal in order to be machine-processable. Further, ontologies define concepts, properties, and relations explicitly, and are thus explicit specifications. They are shared in that they capture knowledge agreed-upon by a group and in that they can be communicated between machines. Finally, ontologies are conceptualizations in that they are an abstract model of some phenomenon in the world.
3.2 W3C Recommendations for the Semantic Web
The Web Ontology Language (OWL) and Resource Description Framework (RDF) are recommendations of the W3C. In addition to the short introduction to these, some trade-offs made in making OWL are described, as these provide some background and framing to the issues brought up in Sec. 3.4 on others’ experiences in using OWL.
3.2.1 RDF
RDF is essentially a data model. Its basic building block is a subject-attribute-object triple, called a statement. The statements form graphs, where subjects and objects are the nodes, connected by attributes as the arcs. An example of a triple is:

PeanutAllergy rdf:type somwebOntology#Allergy

Here PeanutAllergy (subject) is described as being of rdf:type (attribute) Allergy (object). The rdf in rdf:type should be interpreted as a namespace which would have been defined earlier, and this is where we would find an ontology defining a meaning of type. Fundamental concepts of RDF are resources, properties, and statements. Resources are the things we want to talk about, such as diagnoses, medications, and allergies. Every resource has a URI (Uniform Resource Identifier), which can be a URL (Uniform Resource Locator) or some other kind of unique identifier. Properties are a special kind of resource, describing relations between resources. In RDF, properties are identified by URIs. The notion of using URIs to identify things and relations is central in giving a global naming scheme [9]. There are several ways to represent the abstract data model more concretely, and RDF is most commonly described in an XML (eXtensible Markup Language) syntax5. The example above would be represented as follows in RDF/XML:

4 Universals are terms or properties that can be applied to many things, such as blue, three, or horse.
5 There are many who object to the RDF/XML serialization, viewing it as too verbose and just plain ugly, and propose that other serializations, such as N3 and N-Triples, should be used instead. However, most RDF
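To make the triple model itself concrete, the following is a minimal sketch in Python that stores statements as plain tuples. It does not use an RDF library and is not the RDF/XML serialization referred to above. The `rdf:type` URI is the one defined by the RDF namespace; the `SOMWEB` namespace URI, the Penicillin statement, and the helper names `subjects` and `to_ntriples` are assumptions made for illustration.

```python
# rdf:type as defined in the RDF namespace.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
# Hypothetical namespace standing in for somwebOntology.
SOMWEB = "http://example.org/somwebOntology#"

# A graph is a set of (subject, attribute, object) statements.
triples = {
    (SOMWEB + "PeanutAllergy", RDF_TYPE, SOMWEB + "Allergy"),
    (SOMWEB + "Penicillin", RDF_TYPE, SOMWEB + "Medication"),  # hypothetical
}


def subjects(graph, attribute, obj):
    """Return all subjects that have the given attribute and object."""
    return sorted(s for s, p, o in graph if p == attribute and o == obj)


def to_ntriples(graph):
    """Serialize URI-only statements in the line-based N-Triples form."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in sorted(graph))


print(subjects(triples, RDF_TYPE, SOMWEB + "Allergy"))
print(to_ntriples(triples))
```

Because every name is a full URI, two graphs that use the same URIs can be merged by a plain set union, which is the practical meaning of a global naming scheme.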
3.2.2 OWL
RDF and RDF Schema have very limited expressivity. RDF is more or less limited to binary ground predicates, while RDFS is more or less limited to subclass and property hierarchies, with domain and range definitions for properties. The need for a more expressive ontology modeling language led to a European effort called Ontology Inference Layer (OIL)6 and an American effort called DAML-ONT7. These initiatives were combined into DAML+OIL8, which laid the foundation for the W3C Web Ontology Working Group in defining OWL. An OWL ontology can include descriptions of classes, properties, and their instances. Given such an ontology, the OWL formal semantics specifies how to derive its logical consequences, i.e., facts not literally present in the ontology, but entailed by the semantics. With OWL we get vocabulary for describing properties and classes, including relations between classes (e.g., disjointness), cardinality (e.g., ‘exactly one’), equality, richer typing of properties, characteristics of properties (e.g., symmetry and transitivity), and enumerated classes [11]. An extension over RDFS is that in OWL you can provide restrictions on how properties behave that are local to a class [16].

5 (continued) is written using RDF/XML.
6 http://www.ontoknowledge.org/oil/
7 http://www.daml.org/2000/10/daml-ont.html
8 http://www.daml.org/2001/03/daml+oil-index.html

OWL is designed to be the standardized and broadly accepted language for describing ontologies, allowing users to write explicit, formal conceptualizations of domain models. OWL builds on RDF and RDFS9, and uses RDF’s XML-based syntax. There are three increasingly expressive sublanguages of OWL:
• OWL Lite supports those who primarily need a classification hierarchy and simple constraint features.
• OWL DL (Description Logics) supports those who want the maximum expressiveness without losing computational completeness.
• OWL Full is for users who want maximum expressiveness with no computational guarantees.
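The idea of deriving facts that are entailed but not literally present can be illustrated with a toy forward-chaining loop. The Python sketch below is far from a complete OWL reasoner: it assumes only three rules (subclass transitivity, propagation of class membership along the subclass hierarchy, and symmetric properties), and the short names, the class hierarchy, and the `collaboratesWith` property are hypothetical.

```python
def entail(triples, symmetric=frozenset()):
    """Apply three simple rules until no new statements are produced."""
    inferred = set(triples)
    while True:
        new = set()
        for s, p, o in inferred:
            if p in symmetric:
                # A symmetric property also holds in the other direction.
                new.add((o, p, s))
            if p in ("subClassOf", "type"):
                # Follow the subclass hierarchy upward from o.
                for s2, p2, o2 in inferred:
                    if p2 == "subClassOf" and s2 == o:
                        new.add((s, p, o2))
        if new <= inferred:
            return inferred
        inferred |= new


facts = {
    ("PeanutAllergy", "type", "FoodAllergy"),
    ("FoodAllergy", "subClassOf", "Allergy"),
    ("Allergy", "subClassOf", "Disorder"),
    ("ClinicA", "collaboratesWith", "ClinicB"),
}

derived = entail(facts, symmetric={"collaboratesWith"})
for statement in sorted(derived - facts):
    print(statement)
```

None of the printed statements appear in `facts`; they follow only from the meaning given to `subClassOf`, `type`, and symmetry, which is the sense in which a formal semantics determines logical consequences.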
3.2.3 Trade-offs in Making OWL
The various efforts that preceded and influenced OWL meant that a number of trade-offs had to be made in devising OWL in a way that it could both have various desirable features and keep enough compatibility with its roots. This section describes a number of these trade-offs, and is based on the article “From SHIQ and RDF to OWL: The Making of a Web Ontology Language” [16], by three of the members of the W3C Web Ontology Working Group, which developed OWL. The authors point out that their views might not be shared by all of the members of the working group. The formal specification of OWL was influenced by Description Logics, the language’s surface structure was influenced by the frames paradigm [18], and the RDF/XML exchange syntax was influenced by requirements of compatibility with RDF. Drawing on experience from Description Logic research on the complexity-tractability landscape10, the set of constructors and axioms supported by OWL was chosen to balance the typical application’s expressive requirements with a requirement for reliable and efficient reasoning support. This led to the choice of basing the design of OWL on the SH family of Description Logics. The SH family of Description Logics [20] includes support for boolean connectives (intersection, union, and complement), restrictions on properties, transitive properties, and a property hierarchy. Description Logics research has also shown that including the use of datatypes can lead to complexity and undecidability issues. This is dealt with by strictly separating the interpretation of datatypes and values from the interpretation of classes and individuals, which is why OWL has separate datatype and object properties.
9 It would have been preferable that OWL was an extension of RDF and RDFS, but such a layering cannot be realized in a straightforward manner [17].
10 Given the trade-off between the expressiveness of the representation language and the tractability of the associated reasoning task, there has been much work seeing how a given restriction in expressiveness affects reasoning procedures. Finding these interesting points in the tradeoff between tractability and expressiveness gives rise to a sort of complexity-tractability landscape [19].
To increase readability and general ease of use, a surface syntax based on the frames paradigm is provided. In frames, information about each class is grouped together, making ontologies easier to read and understand, especially for those not familiar with Description Logics. The abstract syntax of OWL is influenced by frames in general and by the design of OIL in particular. A class axiom in OIL consists of a compound construction of the name of the class, whether it is a ‘partial’ (indicating that the axiom is asserting a subclass) or ‘complete’ (indicating that we are dealing with an equivalence relation) description, and a sequence of property restrictions and names of more general classes. Given the many requirements, three viable solutions were found, each of which satisfies almost all of the requirements, and these are the three versions of OWL briefly described above: OWL DL, OWL Lite, and OWL Full. The improvement that OWL Lite gives in tractability over OWL DL11 comes with relatively little loss in expressive power, but the syntax is more restricted. However, this restricted syntax can be worked around, so that all OWL DL descriptions can be captured in OWL Lite, except those which contain individual names or cardinalities greater than one.
3.3 Tools for Working with OWL and RDF
There exist various tools for constructing ontologies and developing software based on these, with varying levels of functionality and stability. A few of the most commonly used tools for editing, Application Programming Interfaces (APIs), visualization, reasoning, and validation are presented here.
3.3.1 Editors
For creating an ontology, a text or graphical ontology editor can be used. Such tools can also be used for creating instances of an ontology. Of the graphical editors, Protégé12 is one of the more popular. Protégé is an open-source knowledge-base program developed at Stanford Medical. The application’s history dates back to the 1980s, though the system’s capabilities have changed over time. Protégé has an OWL plugin, which can be used to create OWL ontologies as well as to add instance data. A practical guide to using this is the Protégé-OWL tutorial by Horridge et al. [21]. Figure 2 shows a screenshot of the application, with an earlier version of the SOMWeb ontology loaded. In Protégé, user interfaces, such as input forms, can be generated automatically from the ontological structure.
3.3.2 Application Programming Interfaces
There also exist several APIs for writing programs to interact with OWL and RDF content. Jena13 is a Java framework providing a programmatic environment for RDF, RDFS, and OWL. It is open source and has evolved from the work of the Hewlett Packard Semantic Web Programme. It can be used for reading and writing RDF in its RDF/XML, N3, and N-Triples serializations, has classes for manipulating RDF models and OWL ontologies, and supports in-memory and persistent storage.

Figure 2: The Protégé application with an earlier version of the SOMWeb ontology loaded and the instance view open. The columns are for, from left to right: browsing classes, browsing individuals, and editing individuals.

11 Key inferences can be computed in worst case exponential time in OWL Lite while for OWL DL this is NExpTime.
12 http://protege.stanford.edu/
13 http://jena.sourceforge.net/
3.3.3 Visualizers
Visualizing ontologies is an active research area (see for example [22, 23]). Many of the existing applications are based on AT&T’s GraphViz graph visualization program. One of the more common is IsaViz,14 which is a visual environment for browsing and authoring RDF models represented as graphs.
3.3.4 Reasoners
There are several different inference engines, aiming to support different OWL dialects. Two of these are Jess (Java Expert System Shell)15 and RACER (Renamed ABox and Concept Expression Reasoner) [24], both of which can be used with Protégé.

14 http://www.w3.org/2001/11/IsaViz/
15 http://herzberg.ca.sandia.gov/
3.3.5 Validators
As will be discussed in Sec. 3.4.3, there are many issues making validation difficult on the Semantic Web. There are several validators that check the well-formedness of RDF and OWL files, such as the W3C RDF Validation Service16 and the WonderWeb OWL Ontology Validator.17 Eyeball18 is a library and command-line tool for checking RDF models for common problems, which often result in technically correct but implausible RDF. Eyeball uses user-provided schema files and makes various closed-world assumptions. It can check for, among other things, properties and classes which are unknown with respect to the schemas, untyped resources, and subjects having a different number of values than you’d expect from the cardinality restriction on the property.
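The kinds of closed-world checks listed above can be sketched over plain triples. The Python below is only an illustration of the idea and does not use or imitate Eyeball's actual interface; the function `check`, the schema arguments, and the sample data are all hypothetical.

```python
from collections import defaultdict


def check(triples, known_properties, max_cardinality):
    """Report unknown properties, untyped subjects, and cardinality excesses."""
    problems = []
    typed = {s for s, p, o in triples if p == "rdf:type"}
    counts = defaultdict(int)
    for s, p, o in triples:
        # Closed-world reading: a property absent from the schema is suspect.
        if p != "rdf:type" and p not in known_properties:
            problems.append(f"unknown property {p} on {s}")
        counts[(s, p)] += 1
    for s in sorted({s for s, _, _ in triples} - typed):
        problems.append(f"untyped resource {s}")
    for (s, p), n in sorted(counts.items()):
        limit = max_cardinality.get(p)
        if limit is not None and n > limit:
            problems.append(f"{s} has {n} values for {p}, expected at most {limit}")
    return problems


data = [
    ("exam1", "rdf:type", "Examination"),
    ("exam1", "hasAge", "37"),
    ("exam1", "hasAge", "38"),
    ("exam1", "hasColour", "white"),
    ("patient7", "hasAge", "50"),
]

problems = check(data, known_properties={"hasAge"}, max_cardinality={"hasAge": 1})
for line in problems:
    print(line)
```

Each reported item is legal RDF; the checks flag it only because the schema is treated as a complete description of what is expected.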
3.4 Reported Experiences in Using OWL and RDF
Given that OWL and RDF are quite recent recommendations, experience reports on how people have used them and what benefits they have had, along with what difficulties have been found, are of interest. Quite a few such experience reports were published in association with the “OWL: Experiences and Directions” Workshop held in conjunction with the International Semantic Web Conference in Galway, Ireland, in 2005. The goal of this workshop is to form a meeting place for practitioners in academia and industry, as well as tool developers and other interested parties, to “describe real and potential applications, to share experience and to discuss requirements for language extensions/modifications.”19 Many of the problems that have been indicated in experience reports are ones that the creators of OWL were aware of when the recommendation was published, some of which stem from the trade-offs necessary in constructing OWL (see Sec. 3.2.3). Also, future extensions were suggested then, such as better support for modules and imports, defaults, closed world assumption, unique names assumption, procedural attachment20, and support for rules [16]. The issues that come up in the experience reports and which are discussed here are the open world assumption, the no unique names assumption, validation, no support for default reasoning, value ranges, reusing other ontologies, imports, use of instances, OWL’s XML syntax, use of domain and range, OWL’s sublanguages, and problems for developers new to OWL.
3.4.1 Open World Assumption
Under a closed world assumption (CWA), any ground atomic sentence not asserted true is assumed to be false. This manner of treating information provided as complete is common in databases and is also the way people reason in many situations [25]. However, the Semantic
16 http://www.w3.org/RDF/Validator/
17 http://phoebus.cs.man.ac.uk:9999/OWL/Validator
18 http://jena.sourceforge.net/Eyeball/
19 http://www.mindswap.org/2005/OWLWorkshop/
20 Defining meaning by attaching a piece of code that, when executed, computes the meaning of the term.
Web has an open world assumption (OWA), meaning that you cannot assume that the absence of a statement means that it is false. As a result of new information, something that we previously had no information about might become either true or false. Having an OWA on the Semantic Web seems natural, since there can always be resources which we have not yet found. Also, the OWA seems particularly fitting in a domain “characterized by information that is incomplete either because of limits in the state of knowledge or omissions inherent in curation processes” [26], as is often the case in biomedicine. However, there are several problems associated with the open world assumption. One such problem is that there is no way to require that information be supplied. It would be desirable to have the ability to “express that within a given scope, certain restrictions must be verifiable with the assertions expressed” [26]. This given scope could for example be assertions in a single file or at a single URL. Ruttenberg et al. note that while this means “closing the world” over a certain scope, it does not have to stay closed and does not affect the semantics of the document outside of the scope. Another problem is the lack of a convenient manner to assert that information is complete [26].
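The difference between the two assumptions can be illustrated with a toy query over a handful of statements. The following is only a sketch in Python: the triples, the `negated_facts` set, and the function names are assumptions made for this illustration, and explicit negation is modeled far more simply here than it is in OWL.

```python
from enum import Enum


class Answer(Enum):
    TRUE = "true"
    FALSE = "false"
    UNKNOWN = "unknown"


# Hypothetical statements; the names are invented for this example.
facts = {("patient1", "hasDiagnosis", "lichenPlanus")}
# Statements that have been explicitly asserted to be false.
negated_facts = {("patient1", "hasDiagnosis", "leukoplakia")}


def ask_closed_world(triple):
    """CWA: anything not asserted to be true is taken to be false."""
    return Answer.TRUE if triple in facts else Answer.FALSE


def ask_open_world(triple):
    """OWA: a missing statement only means that we do not know."""
    if triple in facts:
        return Answer.TRUE
    if triple in negated_facts:
        return Answer.FALSE
    return Answer.UNKNOWN


# Nothing has been said about allergies, so the two readings disagree.
query = ("patient1", "hasAllergy", "penicillin")
closed_answer = ask_closed_world(query)
open_answer = ask_open_world(query)
```

Under the closed reading the unmentioned allergy is taken to be absent, while under the open reading the question is simply left unanswered until more information arrives.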
3.4.2 No Unique Names Assumption
The no unique names assumption means that we cannot assume that resources refer to different things just because they are named differently. As with the CWA, there is a unique names assumption (UNA) for most databases, and most people make the assumption that when things have different names they refer to different things. While having a no unique names assumption is advantageous on the (Semantic) Web as a whole, where it is likely that the same concept can be named differently by different people, it is less useful within a single source of information. To ensure that different individuals and classes are recognized as such, their inequality has to be asserted explicitly using owl:differentFrom:
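A minimal sketch, in Python, of what such explicit assertions amount to. The namespace `http://example.org/values#` and the individual names are hypothetical; only the owl:differentFrom URI is taken from the OWL vocabulary. The sketch also shows that asserting pairwise difference for n individuals takes n(n-1)/2 statements.

```python
from itertools import combinations

OWL_DIFFERENT_FROM = "http://www.w3.org/2002/07/owl#differentFrom"


def pairwise_different_from(individuals):
    """Return one owl:differentFrom triple per unordered pair of individuals."""
    return [(a, OWL_DIFFERENT_FROM, b) for a, b in combinations(individuals, 2)]


def known_distinct(a, b, triples):
    """Without a unique names assumption, two names are only known to denote
    different things if this has been asserted (in either direction)."""
    return ((a, OWL_DIFFERENT_FROM, b) in triples
            or (b, OWL_DIFFERENT_FROM, a) in triples)


# Hypothetical individuals in a hypothetical namespace.
base = "http://example.org/values#"
names = [base + n for n in ("Tongue", "Palate", "Gingiva", "Lip")]

# Assert difference among the first three individuals only.
asserted = set(pairwise_different_from(names[:3]))

all_pairs = pairwise_different_from(names)  # 4 individuals give 6 statements
```

Here `Lip` has a name unlike all the others, yet nothing allows a reasoner to conclude that it differs from `Tongue`, since no statement says so.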
It has been proposed [26] that this is another situation where it would be useful to have a concept of scope, with the ability to assert that all names within a scope represent different things. While the OWL DL construct owl:AllDifferent can be used to make a set of individuals mutually distinct, there is no such construct to make a set of classes mutually disjoint from each other. As the number of classes can become quite large, the number of disjointness axioms becomes problematic. Knublauch et al. [27] therefore recommend that an owl:AllDisjoint construct be added to the OWL specification.
3.4.3 Validation
Fundamental features of the Semantic Web, such as the open world assumption, the no unique names assumption, multiple typing, and support for inference, mean that it is difficult to provide the sort of validation that a user of a schema language might expect. The ‘S’ in RDF Schema can be misleading, since RDFS is not a schema language in the traditional sense, where you can define when input data is complete and correct enough to be processed, and neither is OWL. For example, we may want to express a constraint like “every examination must have a date”, and say something like:
That a general OWL processor behaves in this way doesn’t mean that a specialist validator can’t be created, one which treats a document as a complete closed description, assumes unique names, and warns when an object is inferred to have a type that is not among its declared types or their supertypes. These additional assumptions would be useful for input validation but not for the general case.
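A specialist check of this kind can be sketched in a few lines of Python. The sketch assumes that the document is available as a list of triples and uses the hypothetical names `ex:Examination` and `ex:hasDate`; it is not based on any particular validator. It treats the triples as a complete description and reports every examination for which no date is stated, whereas a general OWL reasoner given a minimum-cardinality restriction would instead conclude that such an examination has a date that is simply not known.

```python
RDF_TYPE = "rdf:type"


def missing_dates(triples, exam_class="ex:Examination", date_property="ex:hasDate"):
    """Closed-world check: return, sorted, every subject typed as `exam_class`
    for which the document states no value of `date_property`."""
    exams = {s for s, p, o in triples if p == RDF_TYPE and o == exam_class}
    dated = {s for s, p, o in triples if p == date_property}
    return sorted(exams - dated)


# Hypothetical document: exam2 has no date.
document = [
    ("ex:exam1", RDF_TYPE, "ex:Examination"),
    ("ex:exam1", "ex:hasDate", "2006-03-14"),
    ("ex:exam2", RDF_TYPE, "ex:Examination"),
]

problems = missing_dates(document)
```

The closing of the world is confined to the function call: the same triples can still be handed to an open-world reasoner without any change in their meaning.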
3.4.4 No Support for Default Reasoning
We can distinguish between universals, properties which are true for all instances, and generics, properties which hold “in general”. Universals are easily expressible in first-order logic, but for generics, which can capture much of our commonsense knowledge, we need to go beyond first-order logic. Default reasoning means that some general but not universal fact is applied to a particular individual. While regular deductive reasoning is monotonic, meaning that adding a new fact to a knowledge base will only produce additional beliefs, default reasoning is nonmonotonic, meaning that new facts may invalidate previous beliefs [19]. The simplest formalization of default reasoning is closed-world reasoning, where anything unmentioned is assumed false. It is nonmonotonic as a sentence assumed false could later be determined to be true. Other ways of handling default reasoning are circumscription, where the abnormality predicates (which tell when a default is not applicable) are minimized, and default logic. In default logic a default theory is defined, consisting of a set of first-order sentences and a set of default rules, which specify which assumptions can be made and when [19]. As already stated, OWL does not have a closed-world assumption. OWL gives no mechanism for default reasoning; it has no built-in support for reasoning about that which is ‘typically’ or ‘generally’ true. When there is a limited number of exceptions, this can be handled by making logical statements more specific. But when patterns get more complex this approach leads to combinatorial explosions [28]. Being able to say that something “may occur” is also needed by many users, e.g. a drug “may have side effects” [27].
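The nonmonotonic character of default reasoning can be shown with a very small Python sketch. The rule, the `abnormal` predicate, and the individual `tweety` follow the familiar textbook bird example and are assumptions made for this illustration; none of this is expressible in OWL itself.

```python
def believes_flies(bird, facts):
    """Default rule: a bird is assumed to fly unless it is known to be abnormal."""
    return ("bird", bird) in facts and ("abnormal", bird) not in facts


facts = {("bird", "tweety")}

# With only the general fact available, the default applies.
before = believes_flies("tweety", facts)

# New information arrives (for instance, tweety turns out to be a penguin).
facts.add(("abnormal", "tweety"))

# The earlier belief is withdrawn: adding a fact removed a conclusion.
after = believes_flies("tweety", facts)
```

In a monotonic logic the second query could never yield less than the first, which is why a default of this kind has to be handled outside an OWL reasoner.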
3.4.5 Value Ranges
While OWL DL is based on what is seen as a highly expressive description logic, there are a number of areas where the OWL language lacks the expressive power required by its users [27]. One of the things left out, which mailing lists such as the one maintained by the Protégé team see a lot of complaints about, is the poor representation of numeric expressions. Typical examples are needing to express a length between 2 mm and 5 mm or an age greater than 18. Being able to declare ranges in this way is needed to classify individuals and to express class definitions such as “Adult”. Knublauch et al. [27] argue that even if there cannot be full support for reasoning with user-defined datatypes within existing tools, there should at least be provided a standard mechanism for expressing such constraints in the OWL specification. This could, for example, be used to validate user input on forms.
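The kind of constraint that is missing is trivial to state in a programming language, which is part of why its absence is felt. A sketch in Python, using the two examples from the text; the function names and the choice of inclusive bounds for the length range are assumptions.

```python
def is_adult(age_years):
    """'Adult' as a defined class: an age greater than 18."""
    return age_years > 18


def in_length_range(length_mm, low_mm=2.0, high_mm=5.0):
    """A length between 2 mm and 5 mm (bounds assumed to be inclusive)."""
    return low_mm <= length_mm <= high_mm


def validate_form_input(age_years, lesion_length_mm):
    """Return a list of messages for values falling outside the declared ranges."""
    messages = []
    if not is_adult(age_years):
        messages.append("age must be greater than 18")
    if not in_length_range(lesion_length_mm):
        messages.append("length must be between 2 mm and 5 mm")
    return messages
```

A standard way of writing such ranges in the ontology would let a form generator derive checks like `validate_form_input` rather than having them coded by hand.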
3.4.6 Reusing Other Ontologies
Reuse of ontologies developed by others is often held as a goal and advantage of the Semantic Web. The working group for designing the biopathways ontology [26], which was developed for exchanging biological pathway information, had several ontologies that would be of interest for reuse. However, few of these are provided in OWL DL, which the developers had chosen to use. One option for getting around this is to represent terms from such ontologies as values of two properties, where one property gives the name of the vocabulary from which the term was taken and the other identifies the term in the vocabulary. An alternative method discussed is translating the ontologies needed into OWL and then importing them. There is also the issue of how to treat changes in the external ontology. In the case of having references to terms in the external ontology, the identifiers may become incorrect as terms in the external ontology are deleted or deprecated. On the other hand, if the translation approach is used, new and changed terms are not available until the translation is updated.
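The two-property approach, and the way it can go stale, can be sketched as follows in Python. The class `ExternalTerm`, the vocabulary name `ExampleVocabulary`, and the identifiers are hypothetical and stand in for whatever external source is actually referenced.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExternalTerm:
    vocabulary: str  # value of the property naming the source vocabulary
    term_id: str     # value of the property identifying the term within it


def stale_references(references, current_ids_by_vocabulary):
    """Return the references whose identifier no longer exists in the
    current release of the vocabulary they point into."""
    return [
        ref for ref in references
        if ref.term_id not in current_ids_by_vocabulary.get(ref.vocabulary, set())
    ]


# Hypothetical references held in our own ontology.
references = [
    ExternalTerm("ExampleVocabulary", "T001"),
    ExternalTerm("ExampleVocabulary", "T002"),
]

# Hypothetical later release of the vocabulary in which T002 was deleted.
current = {"ExampleVocabulary": {"T001", "T003"}}

stale = stale_references(references, current)
```

With the translation approach the trade-off is reversed: nothing dangles, but `T003` stays invisible until the translation is run again.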
3.4.7 Imports
In RDFS ontologies it is common practice to establish references between models by simply declaring namespaces. In OWL, just declaring a namespace for an external ontology is insufficient to import it [27]. When importing an OWL ontology, all statements of the imported ontology are included in the importing ontology. This can cause problems both for performance, if the imported ontology is large, and for ontology editors, which have to differentiate during editing between the parts of the importing and the imported ontology [29]. Knublauch et al. [27] also bring up that with OWL and RDF it is not currently clear how to use namespace and import mechanisms for structuring ontologies into public and private parts, or interface and implementing ontologies. As these are sought-after functions, they believe that stronger guidelines are needed for building modular ontologies.
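The effect of importing, as opposed to merely mentioning a namespace, can be sketched in Python by collecting the statements of an ontology together with those of everything it imports, directly or indirectly. The dictionary layout and all ontology names and statements below are assumptions made for this illustration.

```python
def imports_closure(name, ontologies):
    """Return the statements of ontology `name` together with the statements
    of every ontology it imports, following imports transitively."""
    seen, stack, statements = set(), [name], set()
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # also guards against cyclic imports
        seen.add(current)
        statements |= ontologies[current]["statements"]
        stack.extend(ontologies[current]["imports"])
    return statements


# Hypothetical ontologies. "template" refers to terms from "values" by
# namespace only, without importing it.
ontologies = {
    "template": {"imports": ["general"],
                 "statements": {("t:Visit", "rdfs:subClassOf", "g:Examination"),
                                ("t:Visit", "t:usesValue", "v:Tongue")}},
    "general": {"imports": ["base"],
                "statements": {("g:Examination", "rdf:type", "owl:Class")}},
    "base": {"imports": [],
             "statements": {("b:Record", "rdf:type", "owl:Class")}},
    "values": {"imports": [],
               "statements": {("v:Tongue", "rdf:type", "v:Site")}},
}

loaded = imports_closure("template", ontologies)
```

Everything from `general` and `base` ends up in the loaded model, which is where the performance cost comes from, while the statements about `v:Tongue` in `values` are not available at all.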
3.4.8 Using Instances
When first encountering OWL and RDF, it seems intuitive to let RDF individuals assume the role of records and the ontology play the role of schema, specifying what kinds of data can be entered for the records, and so on. As some of the members of the working group for the biopathways ontology put it [26]: “Database designers don’t generally spend much time thinking about denotation and truth, but RDF and OWL impose a sort of moral imperative to address these issues.” They continue by reflecting on the challenge of figuring out what the correspondence is between their model and the world, and on the importance of this when designing an exchange language. If the mapping of classes and instances to biological phenomena is not defined carefully, each information provider will have their own mapping. It would then be up to each client using several sources to determine how to relate these, thus defeating the purpose of creating an exchange format. This defining of the correspondence between classes and instances, and their objects in the world, was not something they had anticipated. In the development of the biopathways ontology, Ruttenberg et al. [26] found that the issue
was first raised in trying to understand what it meant to make reference to a particular physical entity instance in more than one reaction. There is an inclination to reuse instances, as they are rather large and include information such as synonyms and chemical structure. There is, on the other hand, a feeling that when you refer to the same instance, that means that you are referring to the same thing in the world. If an instance does not designate a single thing, would it not be more appropriate to use classes to represent them? But in OWL DL there are limits on the ways classes can be related to one another. Ruttenberg et al. conclude that they have had trouble deciding where to draw the line and that more guidance on this topic is needed.
3.4.9 The XML Syntax
The default serialization of RDF, and thus also of OWL, is XML. RDF/XML is very verbose, as can be seen in this example (from [16]). A class as it would be described in a Description Logic syntax
Student = Person ⊓ ≥ 1 enrolledIn
(a Student is a Person who is enrolledIn at least 1 thing), would most canonically be written in the following way in the OWL RDF/XML syntax:
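A sketch of what such a definition typically looks like in RDF/XML is given below, wrapped in a short Python script so that it can be checked for well-formedness. It is an illustration written for this purpose, using the standard OWL vocabulary (owl:Class, owl:intersectionOf, owl:Restriction, owl:onProperty, owl:minCardinality), and is not claimed to be identical to the listing in [16]. The variable names are assumptions.

```python
import xml.etree.ElementTree as ET

# The one-line Description Logic definition written out in RDF/XML.
STUDENT_RDF_XML = """\
<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:owl="http://www.w3.org/2002/07/owl#">
  <owl:Class rdf:ID="Student">
    <owl:intersectionOf rdf:parseType="Collection">
      <owl:Class rdf:about="#Person"/>
      <owl:Restriction>
        <owl:onProperty rdf:resource="#enrolledIn"/>
        <owl:minCardinality
            rdf:datatype="http://www.w3.org/2001/XMLSchema#nonNegativeInteger">1</owl:minCardinality>
      </owl:Restriction>
    </owl:intersectionOf>
  </owl:Class>
</rdf:RDF>
"""

OWL = "{http://www.w3.org/2002/07/owl#}"

# Parsing only checks XML well-formedness; it does not interpret the RDF.
root = ET.fromstring(STUDENT_RDF_XML)

# Count every element, including the rdf:RDF wrapper.
element_count = sum(1 for _ in root.iter())
min_cardinality = root.find(f".//{OWL}minCardinality").text
```

Seven XML elements and two namespace declarations are needed to say what the Description Logic syntax says in a single short line.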
3.4.10 Use of Domain and Range
Two constructs for property axioms that OWL supports are rdfs:domain and rdfs:range. Syntactically, rdfs:domain links a property to a class description, and an rdfs:domain axiom asserts that subjects of such property statements must belong to the class extension of the indicated class description. Likewise, rdfs:range links a property to a class description or data range, and an rdfs:range axiom asserts that the values of this property must be a part
of the class extension of the class description or of the data values in the specified data range [30]. For example, we might want to say that the property hasTopping has as its domain instances of the class Pizza:
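What such a domain statement does in practice can be sketched in Python. The sketch applies the RDFS domain rule (if P has domain C and a statement S P O exists, then S is of type C) to a small set of triples; the individuals `iceCream1` and `chocolateSauce` are hypothetical, and prefixed names are used as plain strings for brevity.

```python
RDF_TYPE = "rdf:type"
RDFS_DOMAIN = "rdfs:domain"


def infer_types_from_domain(triples):
    """Apply the RDFS domain rule and return only the newly inferred triples:
    if (P rdfs:domain C) and (S P O) are present, then (S rdf:type C)."""
    domains = [(s, o) for s, p, o in triples if p == RDFS_DOMAIN]
    inferred = set()
    for prop, cls in domains:
        for s, p, o in triples:
            if p == prop:
                inferred.add((s, RDF_TYPE, cls))
    return inferred - set(triples)


triples = {
    ("hasTopping", RDFS_DOMAIN, "Pizza"),
    # A hypothetical individual that was never declared to be a pizza.
    ("iceCream1", "hasTopping", "chocolateSauce"),
}

new_triples = infer_types_from_domain(triples)
```

No error is raised for `iceCream1`. The domain statement is used to infer that it is a Pizza, which is the opposite of what someone expecting a constraint check would anticipate.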
3.4.11 OWL’s Sublanguages
Knublauch et al. [27] find that most users see OWL mainly as a more expressive variant of RDFS. That is, they use it to define classes, properties, and individuals for sharing on the Web. However, the expressivity of RDFS is greatly extended by OWL. Many users use restrictions to “express what they see as necessary conditions of a class, and use the owl:imports mechanism to link ontologies to each other. Such ontologies carry little semantics that could be exploited by reasoners.” Many ontology designers also ignore the open-world semantics and the lack of the unique names assumption. There are OWL DL supporters who hold that without a clean logical foundation, the Semantic Web will not make sense. Knublauch et al. [27] argue that there are valid use cases for utilizing only subsets of OWL. Which OWL dialect is chosen is decided by whether users build primarily taxonomies, data structures, or rich knowledge models. As an example, an ontology for e-commerce might only contain classes to describe customers and their addresses and phone numbers, and an initial version of the ontology does not need advanced OWL constructs beyond range and domain statements. Semantically simple ontologies such as this are enough to enable a Web application to generate user interface forms from class definitions and to describe schemas useful for integration. Then, later in the ontology’s life cycle, additional expressivity
can be added as developers find they need it. Knublauch et al. [27] see this as a major selling point for OWL, one often ignored by proponents of DL: “The breadth of the OWL language offers a migration route from entry level, hand-crafted taxonomies of terms, to well defined, normalized ontologies capable of supporting reasoning.”
3.4.12 Problems for Developers New to OWL
There are several topics which are especially difficult for computer professionals not familiar with OWL, such as rdfs:domain, the open-world assumption, and the lack of the unique names assumption. As mentioned above, Ruttenberg et al. [26] tell of how OWL DL was used for exchanging biological pathway information. In the conclusion, they state that: “However, in spite of the group’s experience in biological knowledge representation, bioinformatics, software engineering, and database design, it encountered some challenging problems.” They also state that they believe that problems similar to those described by them (e.g., not being used to the OWA, issues with reuse, insufficient validation, and whether to use instances or classes) will be common as more groups choose to use Semantic Web technologies. Knublauch et al. [27] argue that it is important that the OWL community be clear about the differences between object-oriented approaches and DL, especially since computer professionals trying out OWL will often have experience of object-oriented languages. Persons from the field of knowledge modeling are often familiar with frame-based systems. One difference is in the rdfs:domain construct. While domains are in effect mandatory in many frame-based systems, in OWL domain constraints are “axioms from which inferences may be drawn”. Also, object-oriented attributes must belong to certain classes, but OWL properties often have no domain statements at all.
4 Design and Development of the SOMWeb Ontologies
The design of the SOMWeb ontologies takes the MedView knowledge representation and content as a starting point. The knowledge model of MedView, as described in Sec. 2.1, includes the representation of (1) individual examination records, (2) examination templates describing the pattern from which the individual records are created and which are used in constructing graphic input forms, (3) value lists from which values can be chosen when filling out these forms, and (4) aggregates of values created and used when analyzing data from the examination records. Much of the focus of this work has been on representing the second of these, the examination templates. The models for the examination templates were also central in MedView. In the following subsections we describe how the examination templates can be represented in OWL (Sec. 4.3), followed by a description of how the value lists are remodeled (Sec. 4.4). For each of these, we begin by describing the general structure, followed by some of the design decisions made. We then give short descriptions of how examination records (Sec. 4.5) and aggregates
(Sec. 4.6) are represented in SOMWeb. Finally, we discuss some matters related to end-user input (Sec. 4.7). But first we further explain the correspondence between the old MedView model and the new SOMWeb model (Sec. 4.1), and give a brief account of considerations in translating the actual content of MedView (Sec. 4.2).
4.1 Relations between Structures of MedView and SOMWeb
Figure 3 shows how the new structures of the SOMWeb representation can be mapped to the old ones of the MedView representation. The examination templates previously described using XML are now described using OWL. Further, there is an OWL file describing general examination structures, which can be compared to the DTD used to describe the XML examination files. That which was previously described by the term definitions and term values files is now described using one OWL file (though there could be several such files, representing multiple sets of term definition and term value files). Aggregates are, just as before, stored separately, but in OWL. Finally, the individual examinations previously in tree files are now in RDF files.
4.2 Development Process
It was decided early on that the current representations should be used as a starting point. To begin with, these were used as inspiration for constructing the examination template more or less by hand using Protégé. However, given the number of terms and term values in the most commonly used template and term-value file, this turned out to involve a great deal of manual work. Because of this, we decided to write a program that uses the Java classes in MedView that read templates, term values, and term definitions. From the internal Java representation of these, we create the appropriate OWL constructs, based on the proposed structure and design decisions described below. This conversion program uses the Jena API. The advantage of the more manual approach was more control over the process, especially of the parts of the MedView representation believed not to be conveniently translatable into OWL. However, once a more automatic approach was taken, it turned out that these cases were quite few.
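The general shape of such a conversion step can be sketched as follows. This is not the conversion program itself, which is written in Java against the Jena API; it is a Python illustration that assumes each term becomes an OWL class and each of its values an individual of that class, which is only one possible mapping. The namespace, the helper names, and the sample term and values are hypothetical.

```python
from urllib.parse import quote

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
OWL_CLASS = "http://www.w3.org/2002/07/owl#Class"
NAMESPACE = "http://example.org/valuelist#"  # hypothetical namespace


def to_uri(name):
    """Build a URI from a term or value name (spaces become underscores,
    remaining unsafe characters are percent-encoded)."""
    return NAMESPACE + quote(name.strip().replace(" ", "_"))


def term_values_to_triples(term_values):
    """Assumed mapping: one class per term, one individual per value."""
    triples = []
    for term, values in term_values.items():
        class_uri = to_uri(term)
        triples.append((class_uri, RDF_TYPE, OWL_CLASS))
        for value in values:
            triples.append((to_uri(value), RDF_TYPE, class_uri))
    return triples


# Hypothetical term with two hypothetical values.
sample = {"Diagnosis": ["Lichen planus", "Leukoplakia"]}
triples = term_values_to_triples(sample)
```

Even in this reduced form, the sketch shows why naming matters early: every term and value needs a stable URI before anything else can refer to it.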
4.3 Designing the Examination Template Ontologies
An examination template describes what should be included in an examination record. It is used both to construct the form-based user interfaces and to structure the actual record. As mentioned above, in MedView, examination templates were stored as XML documents, and there is one DTD to describe general features of such templates. In SOMWeb, the examination templates are represented using OWL, and there is also one OWL document giving a general description of what is included in an examination template. All named entities in RDF (and thus in OWL, which is based on RDF) are referred to by a URI, so all classes, properties, and instances in our examination template and value list ontologies are assigned URIs. The general examination description OWL file has its own namespace, referred to by each MedView examination-template, which are separate OWL
[Figure 3 diagram, MedView structure → SOMWeb structure:
General examination description (DTD) → General examination description (OWL)
Examination template (XML) → Examination template (OWL)
termValues and termDefinitions (txt) → Value list (OWL)
Aggregates (txt) → Aggregates (OWL)
Examination records (tree files) → Examination records (RDF files)]
Figure 3: A comparison between the MedView and SOMWeb representations, with the previous MedView structures to the left and the new SOMWeb structures to the right. The most general aspects of examination templates are described in a DTD in the MedView version, and in OWL in SOMWeb. There is only one such general description. The examination templates were stored in XML files in MedView, and are now stored in OWL files. There can be many different examination templates, corresponding to different examination situations (such as one for regular visits and one for those remitted for fear of dentists). The terms and values that are used by the examination templates and in the individual examinations, are in the MedView representation kept in a termValue and corresponding termDefinition file, which are stored in text files of a certain format (see Sec. 2.1). These are now stored as classes and individuals in an OWL file. Just as there could be different sets of termValue and termDefinition files, it is possible to have different value list OWL files. Aggregates were previously stored in a specific format for aggregate definitions, in separate files. They are now represented in OWL, in separate files. Finally, the examination records are stored as tree files in the old version (see Sec. 2.1.3), and are now stored as RDF files.
Figure labels: Examination; hasExaminationCategory; ExaminationCategory; allValuesFrom; DatatypeInputProperty.
Associated with both DatatypeInputProperties and ObjectInputProperties are: instructionProperty