
Representing Knowledge in Oral Medicine – Remodeling Clinical Examinations Using OWL∗

Technical Report HS-IKI-TR-06-009 School of Humanities and Informatics, University of Skövde

Marie Gustafsson [email protected]

School of Humanities and Informatics, University of Skövde, Box 408, SE-541 28 Skövde, Sweden

Department of Computer Science and Engineering, Chalmers University of Technology, SE-412 96 Göteborg, Sweden

Abstract This report describes the remodeling of the representation of clinical examinations in oral medicine, from the previous proprietary format used by the MedView project, to using the World Wide Web Consortium’s recommendations Web Ontology Language (OWL) and Resource Description Framework (RDF). This includes the representation of (1) examination templates, (2) lists of values that can be included in individual examination records, and (3) aggregates of such values used for, e.g., analyzing and visualizing data. It also includes the representation of (4) individual examination records. We describe how OWL and RDF are used to represent these different knowledge components of MedView, along with the design decisions made in the remodeling process. These design decisions are related to, among other things, whether or not to use the constructs of domain and range, appropriate naming in URIs, the level of detail to initially aim for, and appropriate use of classes and individuals. A description of how these new representations are used in the previous applications and code base is also given, as well as their use in the Swedish Oral Medicine Web (SOMWeb) online community. We found that OWL and RDF can be used to address most, but not all, of the requirements we compiled based on the limitations of the MedView knowledge model. Our experience in using OWL and RDF is that, while there is much useful support material available, there is some lack of support for important design decisions and best practice guidelines are still under development. At the same time, using OWL gives us access to a potentially beneficial array of externally developed tools and the ability to come back and refine the knowledge model after initial deployment.

∗The work presented in this report was supported by the Swedish Agency for Innovation Systems.

Contents

1 Introduction
  1.1 Overview
2 Knowledge Representation in Oral Medicine
  2.1 MedView
    2.1.1 The Definitional Approach
    2.1.2 Storing Templates and Values
    2.1.3 Tree Files
    2.1.4 Value Aggregates
  2.2 Requirements for an Ontology for Oral Medicine
3 Ontologies, RDF, and OWL
  3.1 Ontologies
  3.2 W3C Recommendations for the Semantic Web
    3.2.1 RDF
    3.2.2 OWL
    3.2.3 Trade-offs in Making OWL
  3.3 Tools for Working with OWL and RDF
    3.3.1 Editors
    3.3.2 Application Programming Interfaces
    3.3.3 Visualizers
    3.3.4 Reasoners
    3.3.5 Validators
  3.4 Reported Experiences in Using OWL and RDF
    3.4.1 Open World Assumption
    3.4.2 No Unique Names Assumption
    3.4.3 Validation
    3.4.4 No Support for Default Reasoning
    3.4.5 Value Ranges
    3.4.6 Reusing Other Ontologies
    3.4.7 Imports
    3.4.8 Using Instances
    3.4.9 The XML Syntax
    3.4.10 Use of Domain and Range
    3.4.11 OWL’s Sublanguages
    3.4.12 Problems for Developers New to OWL
4 Design and Development of the SOMWeb Ontologies
  4.1 Relations between Structures of MedView and SOMWeb
  4.2 Development Process
  4.3 Designing the Examination Template Ontologies
    4.3.1 The Structure of the Examination Ontologies
    4.3.2 Design Choices
  4.4 Designing the Value List Ontology
    4.4.1 Structure of the Value List Ontology
    4.4.2 Design Choices
  4.5 Representing Individual Examinations
    4.5.1 Validation
  4.6 Representing Aggregates
  4.7 End-user Input
5 Using the Ontologies
  5.1 Constructing Input Forms from OWL Examination Templates
  5.2 MedView Datahandling
    5.2.1 Handling Examinations
    5.2.2 Handling Terms
6 Discussion
  6.1 Results in Relation to the Requirements for an Oral Medicine Ontology
  6.2 Our Experiences in Using OWL
  6.3 Benefits and Constraints of Starting from an Existing Model
  6.4 End-User Control and Standardizations
  6.5 Standards in Medicine
    6.5.1 Comparison with the openEHR Approach
    6.5.2 External Classifications
7 Conclusions
8 Future Work
A MedView XML Examination Template for Meeting Consultation
B SOMWeb OWL Examination Template for Meeting Consultation
C Part of the SOMWeb Value List
D Example Examination Instance

1 Introduction

Basing clinical decisions on finding, evaluating, and using the latest research results is an essential premise of evidence-based medicine (EBM) [1]. A crucial part of the practice of EBM is the integration of the expertise of the individual clinicians with the best clinical evidence obtainable from external sources [2]. Processes necessary for EBM, such as the collection, analysis, validation, sharing, and harmonization of clinical knowledge, can in part be supported by information technology (IT). The MedView project [3] has aimed to provide IT-support for evidence-based oral medicine. This has been done by equipping the clinicians with a wide range of software tools, assisting in the various processes of EBM, and by providing a formal knowledge model on which to base these tools. However, as this model is only used within the MedView project, it is difficult to reuse external knowledge sources and to share the data collected by MedView tools with others. There is also a need to expand the current model and to reexamine how to best conceptualize examination data in oral medicine. Such an undertaking is also relevant for those areas of medicine that overlap with oral medicine. In knowledge representation, the term ontology is used to denote the definition of concepts and relations between them, for a given domain of interest. The Web Ontology Language1 (OWL) is a recommendation of the World Wide Web Consortium (W3C), along with the related Resource Description Framework2 (RDF). We want to investigate the development of ontologies in oral medicine using these recommendations, which will be studied by taking the previous representation of MedView as a starting point. 
The knowledge model of MedView includes the representation of (1) individual examination records, (2) examination templates describing the pattern from which the individual records are created and which are used in constructing user input forms, (3) value lists from which values can be chosen when filling out these forms, and (4) aggregates of values created and used when analyzing data from the examination records. Also, parts of the MedView applications will be adapted to handling the new OWL and RDF representations, which will also be used in the SOMWeb (Swedish Oral Medicine Web) online community. This online community serves as support for the discussion of interesting and difficult cases in oral medicine among geographically dispersed clinics in Sweden. The community is further described in [4]. In addition to the contributions of the developed ontologies, and the use of these in the online community, this work also serves as an experience report of using the RDF and OWL recommendations. In this report, we will refer to the original, definitional approach described in Sec. 2.1 as the MedView representation. The new OWL- and RDF-based representation, to be described in Sec. 4, will be denoted the SOMWeb representation.

1 http://www.w3.org/2004/OWL
2 http://www.w3.org/RDF

1.1 Overview

We begin by describing features of the MedView representations in Sec. 2.1, followed by requirements for an ontology of oral medicine in Sec. 2.2. In Sec. 3, brief introductions to ontologies, RDF, and OWL are given, followed by a longer account of others’ experiences in working with OWL. Section 4 gives details of the remodeling of the MedView knowledge model using OWL and RDF, covering both the developed ontologies and the design decisions made. A description of how the ontologies are used in the datahandling of MedView applications is given in Sec. 5. In Sec. 6, the discussion, we compare the developed ontologies to the requirements of Sec. 2.2, as well as to standards for representing patient records and medical classifications. We also discuss our experiences in using OWL, the constraints and benefits of starting with an existing knowledge model and code base, and the trade-offs in maintaining user control while aspiring to standardization, reuse, and formal knowledge representation. Finally, in Sec. 7 we provide conclusions of this work and in Sec. 8 give suggestions for future work.

2 Knowledge Representation in Oral Medicine

2.1 MedView

The main goal of MedView, since its inception in 1995, has been to support evidence-based oral medicine. This includes developing models, methods, and tools to aid clinicians in their daily work and research. At the heart of the work is how computer technology can be used to aid clinicians in systematically learning from the gathered clinical data. This learning is supported by a suite of tools. The clinicians specify what data to gather in a clinical examination by defining an examination template (FormEdit), along with lists of values that can be used in examination records (TermEdit). These templates are then used to gather data in, e.g., an application for creating and viewing examination records (MedRecords) and in an online tool for collecting data (mForm). The clinicians can then visualize and analyze the collected patient data (mVisualizer). Natural language summaries of examination records can also be generated (MedSummary). The knowledge base built in the MedView project currently contains data from over 15600 examination records, covering more than 6200 different patients. The main knowledge base is located at the clinic of Oral Medicine, faculty of Odontology, Göteborg University. The various clinics within the Swedish Oral Medicine Network (SOMNet) have local knowledge bases containing the examination records collected at each clinic. The contents of these local knowledge bases are added regularly to the knowledge base in Göteborg so that the entire amount of data collected can be accessed through one common knowledge base. At present, the clinical knowledge used in MedView is divided into examination templates, value lists, and value classes. In addition to these basic knowledge structures, there are also definitions for the layout of text and slot-fillers used in the generation of summaries of examination records in the MedSummary application [5], and definitions for the structure and layout of templates used in the acquisition of examination data.

Term           Definition
Examination  = {Patient-data, Anamnesis, Diagnosis, ...}
Patient-data = {Patient-code, Age, Born, ...}
Anamnesis    = {Medication, Allergies, Smoke, Alcohol, ...}
Patient-code = {”1234567890”}
Age          = {”37”}
Born         = {”Sweden”}
...            ...

Figure 1: A sample examination in MedView, using the definitional approach. The Examination term is defined by the set of terms that includes Patient-data, Anamnesis, and Diagnosis. The terms Patient-data and Anamnesis are then defined by sets of other terms. Among the terms used in defining Patient-data is the term Age, which in this examination is defined by the value 37. The quotation marks indicate values which have been added to the template to make an examination record.

MedView has a declarative model which is based on the assumption that definitions are central tools in all attempts to provide a precise and formalized representation of knowledge [6]. We begin by explaining how this definitional approach has been realized in MedView, followed by a description of how templates and value lists are stored. As the knowledge model in MedView has become more complex, additional concepts, such as aggregates of values, have been provided to be able to structure related values. After a short description of such aggregates, we move onto the conclusion that it would be useful to consider other forms of knowledge representation for MedView, such as ontologies, and a list of requirements for such an ontology is put forth.

2.1.1 The Definitional Approach

Clinical data in MedView has thus far been seen as definitions of clinical terms [7]. A definition is a collection of equations in which the left-hand sides (atoms) are defined in terms of the right-hand sides (conditions). In the case of MedView examination templates, the atomic data unit is an examination. Each examination is a set of terms, and a term is defined either as a set of other terms or as a set of values. In this way, abstract clinical concepts, e.g., examination, diagnosis, and patient data, are given by definitions of collections of specific clinical terms. An example of this is given in Fig. 1. Going from an examination template to an individual patient record, the definitions provided by the template are elaborated on by further filling in values for terms. For example, the terms status, direct, mucos and palpation are all part of the general template that defines a particular clinical examination protocol. A concrete instance of an examination template, an examination record, is given by defining terms like Mucos-site and Mucos-col in terms of observed values, e.g., {l12} and {white, brown} respectively. The knowledge base (KB) also contains knowledge structures describing general domain knowledge. Values for the terms defined in templates are taken from formalized lists of valid values. These value lists are given as value definitions, which are stored in the knowledge base along with the examination records and templates.
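To make the definitional approach concrete, the following is a minimal Python sketch of how a template and a record could be held in memory. It is an illustration only, not the MedView implementation: the dictionary layout and the function names make_record and is_value_definition are assumptions, while the term names and values are taken from Fig. 1.

```python
# Hypothetical in-memory form of the definitions in Fig. 1.
# A term defined by other terms maps to a list of term names;
# a term defined by values maps to a set of strings.
template = {
    "Examination": ["Patient-data", "Anamnesis", "Diagnosis"],
    "Patient-data": ["Patient-code", "Age", "Born"],
    "Anamnesis": ["Medication", "Allergies", "Smoke", "Alcohol"],
}


def make_record(template, observed):
    """Copy the template's definitions and add value definitions for observed terms."""
    record = {term: list(parts) for term, parts in template.items()}
    for term, values in observed.items():
        record[term] = set(values)
    return record


def is_value_definition(record, term):
    """True if the term is defined by a set of values rather than by other terms."""
    return isinstance(record.get(term), set)


record = make_record(
    template,
    {"Patient-code": {"1234567890"}, "Age": {"37"}, "Born": {"Sweden"}},
)
```

In this sketch the template stays unchanged, and a record is simply the template plus the value definitions entered for one examination.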

2.1.2 Storing Templates and Values

The structure of the examination template is stored as XML. The general structure of the form is:

EXAMINATION
  FORMINFO
    AUTHOR
    TITLE
    ...
  CATEGORY
    INPUT
    INPUT
    ...
  CATEGORY
    INPUT
    ...
  ...

An example XML template is given in App. A. The example template is for a meeting consultation, which is for recording data from teleconference meetings. In terms of XML, the root of the template is an examination element, which contains several category elements. Each of these category elements contains a name, a description, and several inputs. Each input has several attributes, such as type and whether it is required, its name, description, and an instruction to be displayed to the clinician entering patient data. An input in the template could be:

Treat-sugg Suggested action/treatment

Type here indicates how the values can be chosen for the input. These types can be:
• single – only one value can be chosen (e.g., country of birth)
• multiple – more than one value can be chosen (e.g., medications)
• text – free text (e.g., a note field)
• question – a composite answer, consisting of a number and for example an amount and/or unit (e.g., smoking habits)
• VAS – a number between zero and ten (e.g., indicating pain or treatment success)3

3 Visual Analog Scale (VAS) is a method to measure pain intensity, where the patient is shown a 10 cm line, with “no pain” on one end and “worst possible pain” on the other, and is asked to put a mark on the line signifying their experience.

Values are described using two files; termDefinitions gives the possible types of values for each term, while termValues gives the actual possible values (e.g., a list of countries for the term Born). An entry in the termDefinitions file corresponding to the input example above would be:

$Born single

The term’s entry in the termValues file would be:

$Born Australien Bolivia Bosnien Bulgarien Chile Danmark England Eritrea ...
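The input types listed above suggest a simple check of an entered answer against its term's type and value list. The sketch below is an assumption about how such a check could look, not code from MedView: the function name validate_input and its signature are hypothetical, and the question type is left out because its composite structure is not specified here.

```python
def validate_input(input_type, answer, allowed=()):
    """Check an answer against one of the MedView input types (illustrative only)."""
    if input_type == "single":
        # Exactly one value, taken from the term's value list.
        return isinstance(answer, str) and answer in allowed
    if input_type == "multiple":
        # Any number of values, each taken from the term's value list.
        return isinstance(answer, (list, tuple, set, frozenset)) and all(
            value in allowed for value in answer
        )
    if input_type == "text":
        # Free text is accepted as-is.
        return isinstance(answer, str)
    if input_type == "VAS":
        # A number between zero and ten.
        is_number = isinstance(answer, (int, float)) and not isinstance(answer, bool)
        return is_number and 0 <= answer <= 10
    raise ValueError(f"unsupported input type: {input_type}")


# A few of the values listed for the term Born in termValues.
born_values = ["Australien", "Bolivia", "Bosnien", "Bulgarien", "Chile"]
print(validate_input("single", "Chile", born_values))
print(validate_input("VAS", 11))
```

Whether an input is required is a separate attribute of the template and is deliberately not handled by this check.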

2.1.3 Tree Files

Individual examinations, created as a result of a patient encounter, are stored in a format known as tree files. The individual examination is created by filling in values in the definition given by the template. This can be described as a tree, where the defining concepts and values are seen as children, so that for the example in Fig. 1, the root node Examination has as its children Patient-data, Anamnesis, and Diagnosis. The specific values entered at a patient encounter become leaf nodes.
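The tree view of an examination can be sketched as follows. This is a hedged illustration of the idea only; the on-disk tree file format is not described here, and the names Node, build_tree, and leaf_values are hypothetical. The data follows Fig. 1.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    children: list = field(default_factory=list)
    is_value: bool = False  # True for values entered at a patient encounter


def build_tree(term, record):
    """Build the tree for a term from its definitions (lists of terms or sets of values)."""
    definition = record.get(term, [])
    if isinstance(definition, set):
        # A value definition: each value becomes a leaf node.
        children = [Node(value, is_value=True) for value in sorted(definition)]
    else:
        children = [build_tree(part, record) for part in definition]
    return Node(term, children)


def leaf_values(node):
    """Collect the entered values below a node, left to right."""
    if node.is_value:
        return [node.name]
    return [value for child in node.children for value in leaf_values(child)]


record = {
    "Examination": ["Patient-data", "Anamnesis", "Diagnosis"],
    "Patient-data": ["Patient-code", "Age", "Born"],
    "Patient-code": {"1234567890"},
    "Age": {"37"},
    "Born": {"Sweden"},
}
root = build_tree("Examination", record)
print([child.name for child in root.children])
print(leaf_values(root))
```

Terms that have no values yet, such as Anamnesis in this record, remain childless nodes and contribute no leaf values.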

2.1.4 Value Aggregates

As the KB grows, it becomes increasingly important to be able to group related values into classes in a hierarchical manner. For example, values such as Herpes labialis, Herpetic gingivostomatitis, and Shingles can be classified as viral diseases. The ability to categorize values into different classes (or groups) has proven very useful in data analysis, in that such classes reduce the complexity of the data set and facilitate the detection of interesting patterns in the data. Value classes can also be useful for concept formation, e.g., for differentiating between two different forms of a diagnosis. Value classes are constructed using class definitions, which are stored in the knowledge base for future use. As an example, the following class definition S groups smoking habits into three classes:

S = {
  1 cigarette without filter/day   = < 10 cigarettes/day
  5 cigarettes without filter/day  = < 10 cigarettes/day
  10–15 filter cigarettes/day      = > 10 cigarettes/day
  20 filter cigarettes/day         = > 10 cigarettes/day
  Occasionally                     = Non-smoking
  No                               = Non-smoking
}
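A class definition of this kind can be read as a mapping from values to classes, which is then applied to the values found in examination records before analysis. The following sketch assumes that reading; the names SMOKING_CLASSES and aggregate are hypothetical, and keeping unmapped values unchanged is a choice made for the illustration.

```python
from collections import Counter

# The class definition S, transcribed as value -> class.
SMOKING_CLASSES = {
    "1 cigarette without filter/day": "< 10 cigarettes/day",
    "5 cigarettes without filter/day": "< 10 cigarettes/day",
    "10–15 filter cigarettes/day": "> 10 cigarettes/day",
    "20 filter cigarettes/day": "> 10 cigarettes/day",
    "Occasionally": "Non-smoking",
    "No": "Non-smoking",
}


def aggregate(values, class_definition):
    """Count values per class; values without a class are counted under their own name."""
    return Counter(class_definition.get(value, value) for value in values)


observed = ["No", "Occasionally", "20 filter cigarettes/day", "No"]
counts = aggregate(observed, SMOKING_CLASSES)
print(counts)
```

Six distinct values collapse into three classes here, which is the reduction in complexity that makes patterns easier to detect.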

2.2 Requirements for an Ontology for Oral Medicine

Despite the value found in the definitional approach used in MedView, there are several limitations, which lead us to consider using other approaches. One such approach would be to construct an ontology for oral medicine. Requirements for an ontology for oral medicine, based on experience with the MedView system and interviews with domain experts and developers, have been described in [8]. To summarize:
• We need the possibility and ability to utilize external sources of knowledge.
• The relation between the conceptual models of fundamental clinical concepts in use, e.g., examination templates, lists of approved values for terms, and groups of related terms, and their corresponding concrete entities must be formally examined.
• Relations and interactions between different entities of the ontology must be captured, e.g., that a certain answer to a specific question in a given examination template triggers another question.
• A stronger typing of elements is needed. We must be able to enforce that a certain term only has numeric values, dates as values, or a certain enumerated domain.
• We need to be able to capture different kinds of meta-data, e.g., who is the creator of a specific examination template and what its purpose (scientific or clinical) is.
• The localization of data has to be addressed rigorously: How to provide different language-based versions of the defined concepts, definitions and terms?
• We need to differentiate between different ‘views’ of the underlying data, to be utilized for, e.g., information visualization and intelligent user interfaces, such as a patient, time or quantitative oriented view.

3 Ontologies, RDF, and OWL

The bulk of this background section reports on experiences that others have had in using OWL. Before delving into aspects of these observations, regarding for example working with an open world assumption, lack of support for some sought after constructs, and developers’ and users’ unfamiliarity with Description Logics, we first discuss what is meant by ontologies and give a short introduction to RDF and OWL. It is not possible to give a comprehensive presentation of these recommendations here, and we refer to [9], [10], and [11] for more background and details.

3.1 Ontologies

The word ontology has come to be used in many different contexts, and thus has several different meanings. It originates in philosophy, where ontology is the science of describing the kinds of entities in the world and how they are related. A key aim of ontologies in the philosophical sense is a definitive and exhaustive classification of all entities. There are different ways of relating the content of ontologies to the world, and this is rooted in philosophical debates going back to medieval interpretations of Greek philosophy, on whether or not universals4 exist. In the realist stance, reality is taken to exist independently of human perception, and ontological quality is related to the degree to which the ontology is true of a certain portion of reality [12]. If you instead adopt a cognitive (or conceptualist) bias, you consider categories as cognitive artifacts which are dependent on human perception [13]. Further along on this scale we find nominalism, where it is held that abstract concepts exist only as names, having no independent existence.

According to Gruber [14], an ontology is an “explicit specification of a conceptualization.” This definition was modified slightly by Borst [15]: “Ontologies are defined as a formal specification of a shared conceptualization.” From these definitions we glean that ontologies are formal in order to be machine-processable. Further, ontologies define concepts, properties, and relations explicitly, and are thus explicit specifications. They are shared in that they capture knowledge agreed-upon by a group and in that they can be communicated between machines. Finally, ontologies are conceptualizations in that they are an abstract model of some phenomenon in the world.

3.2 W3C Recommendations for the Semantic Web

The Web Ontology Language (OWL) and Resource Description Framework (RDF) are recommendations of the W3C. In addition to the short introduction to these, some trade-offs made in making OWL are described, as these provide some background and framing to the issues brought up in Sec. 3.4 on others’ experiences in using OWL.

3.2.1 RDF

RDF is essentially a data model. Its basic building block is a subject-attribute-object triple, called a statement. The statements form graphs, where subjects and objects are the nodes, connected by attributes as the arcs. An example of a triple is:

PeanutAllergy rdf:type somwebOntology#Allergy

Here PeanutAllergy (subject) is described as being of rdf:type (attribute) Allergy (value). The rdf in rdf:type should be interpreted as a namespace which would have been defined earlier, and this is where we would find an ontology defining a meaning of type.

Fundamental concepts of RDF are resources, properties, and statements. Resources are the things we want to talk about, such as diagnoses, medications, and allergies. Every resource has a URI (Uniform Resource Identifier), which can be a URL (Uniform Resource Locator) or some other kind of unique identifier. Properties are a special kind of resource, describing relations between resources. In RDF, properties are identified by URIs. The notion of using URIs to identify things and relations is central in giving a global naming scheme [9].

There are several ways to represent the abstract data model more concretely, and RDF is most commonly described in an XML (eXtensible Markup Language) syntax5. The example above would be represented as follows in RDF/XML:

"somwebOntology#Allergy"

RDF/XML documents are made up of a number of descriptions, indicated by rdf:Description. Each description makes a statement about a resource, which can be identified in three different ways. One is as above, with rdf:ID, which indicates the creation of a new resource. Another way is using rdf:about, whereby an existing resource is referenced. Finally, a description can be used without a name, creating an anonymous resource. Within the description above, the property rdf:type is used as a tag, and the value of the property is its content.

In XML, namespaces are used to prevent name clashes when an XML document uses more than one DTD or schema. Disambiguation is achieved by using a different prefix for each DTD or schema. A prefix is declared for the location of each DTD used, and when a term is used the prefix is separated from the local name by a colon (prefix:name). RDF/XML uses the namespace mechanism of XML, but in an expanded manner. Namespaces in XML are used only for purposes of disambiguation, while in RDF it is expected that external namespaces are RDF documents which define resources. This lets people reuse resources defined by others, for example by adding more information about the reused resource. Ideally, the result of this is that vast, distributed collections of knowledge bases emerge.

The vocabulary of RDF models can be described using RDF Schema (RDFS), where we get the ability to define classes using rdfs:Class, of which we can create RDF instances. We can also describe subclass hierarchies using rdfs:subClassOf. Likewise, we can define properties using rdf:Property, which we can link using rdfs:subPropertyOf. Properties can be restricted using rdfs:domain and rdfs:range. Using rdfs:domain you can specify for a property the class of resources that may appear as subjects in a triple using the property, and if no domain is specified, any resource can be used as subject. Using rdfs:range you can specify the class of resources that may appear as values in a triple using the property.

4 Universals are terms or properties that can be applied to many things, such as blue, three, or horse.
5 There are many who object to the RDF/XML serialization, viewing it as too verbose and just plain ugly, and propose that other serializations, such as N3 and N-Triples, should be used instead. However, most RDF is written using RDF/XML.

3.2.2 OWL

RDF and RDF Schema have very limited expressivity. RDF is more or less limited to binary ground predicates, while RDFS is more or less limited to subclass and property hierarchies, with domain and range definitions for properties. The need for a more expressive ontology modeling language led to a European effort called Ontology Inference Layer (OIL)6 and an American effort called DAML-ONT7. These initiatives were combined into DAML+OIL8, which laid the foundation for the W3C Web Ontology Working Group in defining OWL. An OWL ontology can include descriptions of classes, properties, and their instances. Given such an ontology, the OWL formal semantics specifies how to derive its logical consequences, i.e., facts not literally present in the ontology, but entailed by the semantics. With OWL we get vocabulary for describing properties and classes, including relations between classes (e.g., disjointness), cardinality (e.g., ‘exactly one’), equality, richer typing of properties, characteristics of properties (e.g., symmetry and transitivity), and enumerated classes [11]. An extension over RDFS is that in OWL you can provide restrictions on how properties behave that are local to a class [16].

6 http://www.ontoknowledge.org/oil/
7 http://www.daml.org/2000/10/daml-ont.html
8 http://www.daml.org/2001/03/daml+oil-index.html

OWL is designed to be the standardized and broadly accepted language for describing ontologies, allowing users to write explicit, formal conceptualizations of domain models. OWL builds on RDF and RDFS9, and uses RDF’s XML-based syntax. There are three increasingly expressive sublanguages of OWL:
• OWL Lite supports those who primarily need a classification hierarchy and simple constraint features.
• OWL DL (Description Logics) supports those who want the maximum expressiveness without losing computational completeness.
• OWL Full is for users who want maximum expressiveness with no computational guarantees.
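Since OWL builds on the RDF triple model and on RDFS, a small sketch of that common base may help. The Python below stores statements as plain tuples and applies the RDFS reading of rdfs:domain and rdfs:range, under which a domain or range declaration is used to infer the type of a subject or object rather than to reject data. The PeanutAllergy triple is the example from Sec. 3.2.1; the property hasAllergy, the class Patient, the resource patient1, and the function infer_types are hypothetical names introduced for the illustration.

```python
RDF_TYPE = "rdf:type"
RDFS_DOMAIN = "rdfs:domain"
RDFS_RANGE = "rdfs:range"


def infer_types(triples):
    """Return the rdf:type statements entailed by domain and range declarations.

    Only the two RDFS rules for rdfs:domain and rdfs:range are applied, and
    statements already present in the input are not repeated in the result.
    """
    triples = set(triples)
    inferred = set()
    for prop, constraint, cls in triples:
        if constraint not in (RDFS_DOMAIN, RDFS_RANGE):
            continue
        for subject, predicate, obj in triples:
            if predicate == prop:
                resource = subject if constraint == RDFS_DOMAIN else obj
                inferred.add((resource, RDF_TYPE, cls))
    return inferred - triples


graph = {
    ("PeanutAllergy", RDF_TYPE, "somwebOntology#Allergy"),
    ("hasAllergy", RDFS_DOMAIN, "Patient"),                 # hypothetical schema
    ("hasAllergy", RDFS_RANGE, "somwebOntology#Allergy"),   # hypothetical schema
    ("patient1", "hasAllergy", "PeanutAllergy"),            # hypothetical data
}
print(infer_types(graph))
```

The only new statement is that patient1 is a Patient. A triple that uses hasAllergy with an unexpected subject is not flagged as an error; the subject is simply inferred to be a Patient as well.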

3.2.3 Trade-offs in Making OWL

The various efforts that preceded and influenced OWL meant that a number of trade-offs had to be made in devising OWL in a way that it could both have various desirable features and keep enough compatibility with its roots. This section describes a number of these trade-offs, and is based on the article “From SHIQ and RDF to OWL: The Making of a Web Ontology Language” [16], by three of the members of the W3C Web Ontology Working Group, which developed OWL. The authors point out that their views might not be shared by all of the members of the working group.

The formal specification of OWL was influenced by Description Logics, the language’s surface structure was influenced by the frames paradigm [18], and the RDF/XML exchange syntax was influenced by requirements of compatibility with RDF. Drawing on experience from Description Logic research on the complexity-tractability landscape10, the set of constructors and axioms supported by OWL was chosen to balance the typical application’s expressive requirements with a requirement for reliable and efficient reasoning support. This led to the choice of basing the design of OWL on the SH family of Description Logics. The SH family of Description Logics [20] includes support for boolean connectives (intersection, union, and complement), restrictions on properties, transitive properties, and a property hierarchy. Description Logics research has also shown that including the use of datatypes can lead to complexity and undecidability issues. This is dealt with by strictly separating the interpretation of datatypes and values from the interpretation of classes and individuals, which is why OWL has separate datatype and object properties.

9 It would have been preferable that OWL was an extension of RDF and RDFS, but such a layering cannot be realized in a straightforward manner [17].
10 Given the trade-off between the expressiveness of the representation language and the tractability of the associated reasoning task, there has been much work seeing how a given restriction in expressiveness affects reasoning procedures. Finding these interesting points in the tradeoff between tractability and expressiveness gives rise to a sort of complexity-tractability landscape [19].

To increase readability and general ease of use, a surface syntax based on the frames paradigm is provided. In frames, information about each class is grouped together, making ontologies easier to read and understand, especially for those not familiar with Description Logics. The abstract syntax of OWL is influenced by frames in general and by the design of OIL in particular. A class axiom in OIL consists of a compound construction of the name of the class, whether it is a ‘partial’ (indicating that the axiom is asserting a subclass) or ‘complete’ (indicating that we are dealing with an equivalence relation) description, and a sequence of property restrictions and names of more general classes.

Given the many requirements, three viable solutions were found, each of which satisfies almost all of the requirements, and these are the three versions of OWL briefly described above: OWL DL, OWL Lite, and OWL Full. The improvement that OWL Lite gives in tractability over OWL DL11 comes with relatively little loss in expressive power, but the syntax is more restricted. However, this restricted syntax can be worked around, so that all OWL DL descriptions can be captured in OWL Lite, except those which contain individual names or cardinalities greater than one.

3.3 Tools for Working with OWL and RDF

There exist various tools for constructing ontologies and developing software based on these, with varying levels of functionality and stability. A few of the most commonly used tools for editing, Application Programming Interfaces (APIs), visualization, reasoning, and validation are presented here.

3.3.1 Editors

For creating an ontology, a text or graphical ontology editor can be used. Such tools can also be used for creating instances of an ontology. Of the graphical editors, Protégé12 is one of the more popular. Protégé is an open-source knowledge-base program developed at Stanford Medical. The application’s history dates back to the 1980s, though the system’s capabilities have changed over time. Protégé has an OWL plugin, which can be used to create OWL ontologies as well as to add instance data. A practical guide to using this is the Protégé-OWL tutorial by Horridge et al. [21]. Figure 2 shows a screenshot of the application, with an earlier version of the SOMWeb ontology loaded. In Protégé, user interfaces, such as input forms, can be generated automatically from the ontological structure.

3.3.2 Application Programming Interfaces

There also exist several APIs for writing programs to interact with OWL and RDF content. Jena¹³ is a Java framework providing a programmatic environment for RDF, RDFS, and OWL. It is open source and has evolved from the work of the Hewlett Packard Semantic Web Programme. It can be used for reading and writing RDF in its RDF/XML, N3, and N-Triples serializations, has classes for manipulating RDF models and OWL ontologies, and provides in-memory and persistent storage.

¹¹ Key inferences can be computed in worst-case exponential time in OWL Lite, while for OWL DL this is NExpTime.
¹² http://protege.stanford.edu/
¹³ http://jena.sourceforge.net/

Figure 2: The Protégé application with an earlier version of the SOMWeb ontology loaded and the instance view open. The columns are for, from left to right: browsing classes, browsing individuals, and editing individuals.

3.3.3 Visualizers

Visualizing ontologies is an active research area (see for example [22, 23]). Many of the existing applications are based on AT&T’s GraphViz graph visualization program. One of the more common is IsaViz,¹⁴ which is a visual environment for browsing and authoring RDF models represented as graphs.

3.3.4 Reasoners

There are several different inference engines, aiming to support different OWL dialects. Two of these are Jess (Java Expert System Shell)¹⁵ and RACER (Renamed ABox and Concept Expression Reasoner) [24], both of which can be used with Protégé.

¹⁴ http://www.w3.org/2001/11/IsaViz/
¹⁵ http://herzberg.ca.sandia.gov/

3.3.5 Validators

As will be discussed in Sec. 3.4.3, there are many issues making validation difficult on the Semantic Web. There are several validators that check the well-formedness of RDF and OWL files, such as the W3C RDF Validation Service¹⁶ and the WonderWeb OWL Ontology Validator.¹⁷ Eyeball¹⁸ is a library and command-line tool for checking RDF models for common problems, which often result in technically correct but implausible RDF. Eyeball uses user-provided schema files and makes various closed-world assumptions. It can check for, among other things, properties and classes which are unknown with respect to the schemas, untyped resources, and subjects having a different number of values than would be expected from the cardinality restriction on the property.

3.4 Reported Experiences in Using OWL and RDF

Given that OWL and RDF are quite recent recommendations, experience reports on how people have used them and what benefits they have had, along with what difficulties have been found, are of interest. Quite a few such experience reports were published in association with the “OWL: Experiences and Directions” Workshop held in conjunction with the International Semantic Web Conference in Galway, Ireland, in 2005. The goal of this workshop is to form a meeting place for practitioners in academia and industry, as well as tool developers and other interested parties, to “describe real and potential applications, to share experience and to discuss requirements for language extensions/modifications.”¹⁹

Many of the problems that have been indicated in experience reports are ones that the creators of OWL were aware of when the recommendation was published, some of which stem from the trade-offs necessary in constructing OWL (see Sec. 3.2.3). Future extensions were also suggested at that time, such as better support for modules and imports, defaults, closed world assumption, unique names assumption, procedural attachment,²⁰ and support for rules [16].

The issues that come up in the experience reports and which are discussed here are the open world assumption, the no unique names assumption, validation, no support for default reasoning, value ranges, reusing other ontologies, imports, use of instances, OWL’s XML syntax, use of domain and range, OWL’s sublanguages, and problems for developers new to OWL.

3.4.1 Open World Assumption

Under a closed world assumption (CWA), any ground atomic sentence not asserted true is assumed to be false. This manner of treating information provided as complete is common in databases and is also the way people reason in many situations [25]. However, the Semantic Web has an open world assumption (OWA), meaning that you cannot assume that the absence of a statement means that it is false. As a result of new information, something that we previously had no information about might become either true or false.

¹⁶ http://www.w3.org/RDF/Validator/
¹⁷ http://phoebus.cs.man.ac.uk:9999/OWL/Validator
¹⁸ http://jena.sourceforge.net/Eyeball/
¹⁹ http://www.mindswap.org/2005/OWLWorkshop/
²⁰ Defining meaning by attaching a piece of code that when executed computes the meaning of the term.

Having an OWA on the Semantic Web seems natural, since there can always be resources which we have not yet found. Also, the OWA seems particularly fitting in a domain “characterized by information that is incomplete either because of limits in the state of knowledge or omissions inherent in curation processes” [26], as is often the case in biomedicine.

However, there are several problems associated with the open world assumption. One such problem is that there is no way to require that information be supplied. It would be desirable to have the ability to “express that within a given scope, certain restrictions must be verifiable with the assertions expressed” [26]. This given scope could for example be assertions in a single file or at a single URL. Ruttenberg et al. note that while this means “closing the world” over a certain scope, it does not have to stay closed and does not affect the semantics of the document outside of the scope. Another problem is the lack of a convenient manner to assert that information is complete [26].
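The difference between the two assumptions can be illustrated with a minimal Python sketch. It is not part of MedView or SOMWeb; the fact set and the names exam1, hasDiagnosis, and hasAllergy are hypothetical, and the three-valued answer (True, False, or None for "unknown") is only one simple way to model the OWA.

```python
# Hypothetical mini knowledge base of asserted (subject, property, value) facts.
ASSERTED = {
    ("exam1", "hasDiagnosis", "Lichen"),
}


def ask_closed_world(fact):
    """CWA: anything that is not asserted is taken to be false."""
    return fact in ASSERTED


def ask_open_world(fact, known_false=frozenset()):
    """OWA: True if asserted, False only if explicitly known to be false,
    and None (unknown) otherwise."""
    if fact in ASSERTED:
        return True
    if fact in known_false:
        return False
    return None


query = ("exam1", "hasAllergy", "Peanut")
cwa_answer = ask_closed_world(query)  # False: the absence of the fact counts as falsity
owa_answer = ask_open_world(query)    # None: the absence of the fact says nothing
print(cwa_answer, owa_answer)
```

Under the closed-world reading the missing statement is answered with "no", while the open-world reading can only answer "unknown" until further information arrives.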

3.4.2 No Unique Names Assumption

The no unique names assumption means that we cannot assume that resources refer to different things just because they are named differently. As with the CWA, there is a unique name assumption (UNA) for most databases, and most people make the assumption that when things have different names they refer to different things. While having a no unique names assumption is advantageous on the (Semantic) Web as a whole, where it is likely that the same concept can be named differently by different people, it is less useful within a single source of information.

To ensure that different individuals and classes are recognized as such, their inequality has to be asserted explicitly using owl:differentFrom. We need to declare inequality quite often given the no unique names assumption, and we would get very many such statements if we wanted to assert inequality for a lot of individuals. A shorthand notation, owl:AllDifferent, is provided by OWL to assert pairwise inequality of all individuals in a given list. Of course, even with this construct, the list would get quite long, and within a single source we normally know that different names name different objects. Maintaining all the owl:differentFrom assertions is inconvenient and difficult, especially as the document evolves.
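How quickly the pairwise statements accumulate can be seen in a small Python sketch. The tuples below only mimic RDF statements and are an illustrative assumption, as are the example individuals and the blank-node label _:group; owl:differentFrom, owl:AllDifferent, and owl:distinctMembers are the OWL vocabulary terms involved.

```python
from itertools import combinations


def pairwise_different_from(individuals):
    """One (a, owl:differentFrom, b) statement per unordered pair of individuals."""
    return [(a, "owl:differentFrom", b) for a, b in combinations(individuals, 2)]


def all_different(individuals):
    """A single owl:AllDifferent-style statement that lists all members at once."""
    return [("_:group", "owl:distinctMembers", tuple(individuals))]


names = ["Denmark", "Sweden", "Norway", "Finland"]
pairwise = pairwise_different_from(names)
grouped = all_different(names)
print(len(pairwise), len(grouped))
```

For n individuals the pairwise form needs n(n-1)/2 statements, whereas the grouped form needs one statement whose member list still grows with n and still has to be maintained by hand.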

It has been proposed [26] that this is another situation where it would be useful to have a concept of scope, with the ability to assert that all names within a scope represent different things. While the OWL DL construct owl:AllDifferent can be used to make a set of individuals mutually distinct, there is no such construct to make a set of classes mutually disjoint from each other. As the number of classes can become quite large, the number of disjoint axioms becomes problematic. Knublauch et al. [27] therefore recommend an owl:AllDisjoint construct be added to the OWL specification.

3.4.3 Validation

Fundamental features of the Semantic Web, such as the open world assumption, no unique names assumption, multiple typing, and support for inference, mean that there are problems in providing the sort of validation that a user of a schema language might be expecting. The ‘S’ in RDF Schema can be misleading, since RDFS is not a schema language in the traditional sense, where you can define when input data is complete and correct enough to be processed, and neither is OWL.

For example, we may want to express a constraint like “every examination must have a date”, and state it as an OWL restriction on the class Examination. We then describe a resource of type Examination, mariesFakeExamination, but include no date. What we might expect if we asked a general OWL processor to validate this is that it would say “invalid”, as mariesFakeExamination does not have a date. This is not the case, as the OWL restriction says something that is ‘true of the world’ rather than of a given document of data. Thus, when an OWL processor working under an OWA sees an instance of Examination it concludes that the given Examination must have a date, but that we just don’t know about it yet.

We might also want to express that “every person is born in at most one country”, using an owl:maxCardinality restriction of one on the property countryOfBirth. We then describe a resource of type Person with two countryOfBirth properties, with the values Denmark and Sweden. Again, if this is sent to an OWL validator, we expect some complaint that there are two values for countryOfBirth when an owl:maxCardinality of one has been declared. Because of the no UNA, this is not the case. If there is no explicit declaration that Denmark and Sweden are owl:differentFrom each other, there is no violation. In fact, the above assertions will lead an OWL reasoner working under a no UNA to conclude that Denmark and Sweden refer to the same thing.

Another example shows the problems of checking type constraints. Suppose we want to say that the class AnamnesisGeneral can only take values from the class Disease for the property diseasesPast. We then describe a resource of type GeneralAnamnesis which takes as its value for the diseasesPast property PeanutAllergy, an instance of the class Allergy. This example will not cause a constraint violation, since an RDF instance can be a member of many classes. If we have not explicitly said that the classes Disease and Allergy are disjoint, then what is inferred is that PeanutAllergy must be a Disease as well.

That a general OWL processor behaves in this way doesn’t mean that a specialist validator can’t be created, which treats a document as a complete closed description, assumes unique names, and warns when an object is inferred to have a type not known through its declared types or their supertypes. These additional assumptions would be useful for input validation but not for the general case.
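The contrast between a general OWL processor and such a specialist validator can be sketched in Python for the countryOfBirth example. This is a deliberately simplified model that assumes a maximum cardinality of one; the function names are hypothetical, and no real reasoner or validator API is being reproduced.

```python
from itertools import combinations


def validate_closed_world(values):
    """Specialist validator: unique names are assumed, so more than one
    distinct name violates a maximum cardinality of one."""
    return len(set(values)) <= 1


def reason_open_world(values, different_from=()):
    """General reasoner without unique names: the values of a property with
    maximum cardinality one are inferred to denote the same individual,
    unless two of them are explicitly declared different."""
    declared = {frozenset(pair) for pair in different_from}
    pairs = [frozenset(p) for p in combinations(sorted(set(values)), 2)]
    if any(p in declared for p in pairs):
        return "inconsistent", []
    return "consistent", [tuple(sorted(p)) for p in pairs]


birth_countries = ["Denmark", "Sweden"]
print(validate_closed_world(birth_countries))
# -> False
print(reason_open_world(birth_countries))
# -> ('consistent', [('Denmark', 'Sweden')])
print(reason_open_world(birth_countries, [("Denmark", "Sweden")]))
# -> ('inconsistent', [])
```

Only once Denmark and Sweden are declared different does the open-world reading report a problem; before that, it silently concludes that the two names are aliases.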

3.4.4 No Support for Default Reasoning

We can distinguish between universals, properties which are true for all instances, and generics, properties which hold “in general”. Universals are easily expressible in first-order logic, but for generics, which can capture much of our commonsense knowledge, we need to go beyond first-order logic. Default reasoning means that some general but not universal fact is applied to a particular individual. While regular deductive reasoning is monotonic, meaning that adding a new fact to a knowledge base will only produce additional beliefs, default reasoning is nonmonotonic, meaning that new facts may invalidate previous beliefs [19].

The simplest formalization of default reasoning is closed-world reasoning, where anything unmentioned is assumed false. It is nonmonotonic, as a sentence assumed false could later be determined to be true. Other ways of handling default reasoning are circumscription, where the abnormality predicates (which tell when a default is not applicable) are minimized, and default logic. In default logic a default theory is defined, consisting of a set of first-order sentences and a set of default rules, which specify which assumptions can be made and when [19].

As already stated, OWL does not have a closed-world assumption. OWL gives no mechanism for default reasoning; it has no built-in support for reasoning about that which is ‘typically’ or ‘generally’ true. When there is a limited number of exceptions, this can be handled by making logical statements more specific. But when patterns get more complex this approach leads to combinatorial explosions [28]. Being able to say that something “may occur” is also needed by many users, e.g. a drug “may have side effects” [27].
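Nonmonotonicity of closed-world default reasoning can be shown in a few lines of Python. The patient, the properties, and the default itself ("no allergy unless one is recorded") are hypothetical and serve only as an illustration; OWL itself offers no such mechanism.

```python
def believes_no_allergy(patient, facts):
    """Closed-world default: assume no allergy unless one is recorded."""
    return not any(s == patient and p == "hasAllergy" for s, p, _ in facts)


facts = {("patient1", "smokes", "yes")}
before = believes_no_allergy("patient1", facts)  # the default applies
facts.add(("patient1", "hasAllergy", "Peanut"))  # a new fact arrives
after = believes_no_allergy("patient1", facts)   # the earlier belief is withdrawn
print(before, after)
# -> True False
```

Adding a fact removed a previously held belief, which cannot happen under monotonic deductive reasoning.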

3.4.5 Value Ranges

While OWL DL is based on what is seen as a highly expressive description logic, there are a number of areas where the OWL language lacks the expressive power required by its users [27]. One of the things left out, about which mailing lists such as the one maintained by the Protégé team see a lot of complaints, is a good representation of numeric expressions. Typical examples are needing to express a length between 2 mm and 5 mm or an age greater than 18. Being able to declare ranges in this way is needed to classify individuals and to express class definitions such as “Adult”. Knublauch et al. [27] argue that even if there cannot be full support for reasoning with user-defined datatypes within existing tools, there should at least be provided a standard mechanism for expressing such constraints in the OWL specification. This could, for example, be used to validate user input on forms.
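The kind of range constraint that users ask for is simple to state outside OWL. The following Python sketch, using the two examples from the text, shows what a form validator might check; the function names are hypothetical, and treating the length interval as inclusive is an assumption.

```python
def is_adult(age):
    """Class-definition-style test: an age greater than 18."""
    return age > 18


def length_in_range(length_mm, low=2.0, high=5.0):
    """Range test for a length between 2 mm and 5 mm (bounds assumed inclusive)."""
    return low <= length_mm <= high


print(is_adult(19), is_adult(18))
# -> True False
print(length_in_range(3.5), length_in_range(5.1))
# -> True False
```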

3.4.6 Reusing Other Ontologies

Reuse of ontologies developed by others is often held as a goal and advantage of the Semantic Web. The working group for designing the biopathways ontology [26], which was developed for exchanging biological pathway information, had several ontologies that would be of interest for reuse. However, few of these are provided in OWL DL, which the developers had chosen to use. One option for getting around this is to represent terms from such ontologies as values of two properties, where one property gives the name of the vocabulary from which the term was taken and the other identifies the term in that vocabulary. An alternative method discussed is translating the ontologies needed into OWL, and then importing them.

There is also the issue of how to treat changes in the external ontology. In the case of having references to terms in the external ontology, the identifiers may become incorrect as terms in the external ontology are deleted or deprecated. On the other hand, if the translation approach is used, new and changed terms are not available until the translation is updated.

3.4.7 Imports

In RDFS ontologies it is common practice to establish references between models by simply declaring namespaces. In OWL, just declaring a namespace for an external ontology is insufficient to import it [27]. When importing an OWL ontology, all statements of the imported ontology are included in the importing ontology. This can cause problems both in performance, if the imported ontology is large, and in editing, since an ontology editor has to differentiate between the parts of the importing and the imported ontology [29].

Knublauch et al. [27] also bring up that with OWL and RDF it is not currently clear how to use namespace and import mechanisms for structuring ontologies into public and private parts, or interface and implementing ontologies. As these are sought-after functions, they believe that stronger guidelines are needed for building modular ontologies.

3.4.8 Using Instances

When first encountering OWL and RDF, it seems intuitive to let RDF individuals assume the role of records and the ontology play the role of schema, specifying what kinds of data can be entered for the records, and so on. As some of the members of the working group for the biopathways ontology put it [26]: “Database designers don’t generally spend much time thinking about denotation and truth, but RDF and OWL impose a sort of moral imperative to address these issues.” They continue by reflecting on the challenge of figuring out what the correspondence is between their model and the world, and on the importance of this when designing an exchange language. If the mapping of classes and instances to biological phenomena is not defined carefully, each information provider will have their own mapping. It would then be up to each client using several sources to determine how to relate these, thus defeating the purpose of creating an exchange format.

This defining of the correspondence between classes and instances, and their objects in the world, was not something they had anticipated. In the development of the biopathways ontology, Ruttenberg et al. [26] found that the issue was first raised in trying to understand what it meant to make reference to a particular physical entity instance in more than one reaction. There is an inclination to reuse instances, as they are rather large and include information such as synonyms and chemical structure. There is, on the other hand, a feeling that when you refer to the same instance, that means that you are referring to the same thing in the world. If an instance does not designate a single thing, would it not be more appropriate to use classes to represent them? But in OWL DL there are limits on the ways classes can be related to one another. Ruttenberg et al. conclude that they have had trouble deciding where to draw the line and that more guidance on this topic is needed.

3.4.9 The XML Syntax

The default serialization of RDF, and thus also of OWL, is XML. RDF/XML is very verbose, as can be seen in this example (from [16]). A class as it would be described in a Description Logic syntax

Student = Person ⊓ ≥1 enrolledIn

(a Student is a Person who is enrolledIn at least 1 thing), would most canonically be written in the OWL RDF/XML syntax as a considerably longer, nested XML structure. Many find this hard to read, to which some respond that RDF/XML isn’t supposed to be read by people without using an editor. Also, users are tempted to hand-edit these OWL files, with the result that they have trouble loading them using the parsers of, for example, the Jena API. Knublauch et al. [27] argue that the XML serialization of OWL should be considered as a binary format that should not be edited outside of specialized tools.
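To make the verbosity concrete, the following Python sketch builds one plausible RDF/XML rendering of the Student class with the standard xml.etree.ElementTree module. The element layout is an assumption reconstructed from the OWL vocabulary (owl:intersectionOf, owl:Restriction, owl:minCardinality) and is not the listing from [16]; the relative references #Person and #enrolledIn are likewise illustrative.

```python
import xml.etree.ElementTree as ET

OWL = "http://www.w3.org/2002/07/owl#"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
XSD = "http://www.w3.org/2001/XMLSchema#"

# Use the customary prefixes when serializing.
ET.register_namespace("owl", OWL)
ET.register_namespace("rdf", RDF)


def q(namespace, local_name):
    """Qualified name in the {namespace}local form that ElementTree expects."""
    return "{%s}%s" % (namespace, local_name)


def student_class_rdfxml():
    """Assumed rendering of: Student = Person AND (at least 1 enrolledIn)."""
    root = ET.Element(q(RDF, "RDF"))
    cls = ET.SubElement(root, q(OWL, "Class"), {q(RDF, "ID"): "Student"})
    inter = ET.SubElement(cls, q(OWL, "intersectionOf"),
                          {q(RDF, "parseType"): "Collection"})
    ET.SubElement(inter, q(OWL, "Class"), {q(RDF, "about"): "#Person"})
    restriction = ET.SubElement(inter, q(OWL, "Restriction"))
    ET.SubElement(restriction, q(OWL, "onProperty"),
                  {q(RDF, "resource"): "#enrolledIn"})
    card = ET.SubElement(restriction, q(OWL, "minCardinality"),
                         {q(RDF, "datatype"): XSD + "nonNegativeInteger"})
    card.text = "1"
    return ET.tostring(root, encoding="unicode")


dl_text = "Student = Person ⊓ ≥1 enrolledIn"
xml_text = student_class_rdfxml()
print(len(dl_text), len(xml_text))
```

Even for this one-line class definition the XML form is many times longer than the Description Logic form, which is one reason why generating such files programmatically is less error-prone than editing them by hand.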

3.4.10 Use of Domain and Range

Two constructs for property axioms that OWL supports are rdfs:domain and rdfs:range. Syntactically, rdfs:domain links a property to a class description, and an rdfs:domain axiom asserts that subjects of such property statements must belong to the class extension of the indicated class description. Likewise, rdfs:range links a property to a class description or data range, and an rdfs:range axiom asserts that the values of this property must be a part of the class extension of the class description or belong to the data values in the specified data range [30].

For example, we might want to say that the property hasTopping has as its domain instances of the class Pizza. In most languages, using constraints on domain and range means that these are checked and errors are generated if they are violated. In OWL they are used for reasoning, which means potentially far-reaching and unexpected effects. In the above example, if we expect some sort of constraint-checking, we would think that an error message would be generated if instances of some class other than Pizza took a hasTopping property. Suppose, for example, that we had a vanilla ice-cream, vanillaIceCream of type IceCream, with a mint topping. We would expect to be told that hasTopping cannot be used on instances of type IceCream. What happens when using a general OWL reasoner is that any instance to which hasTopping is applied is classified as a kind of Pizza, i.e., that vanillaIceCream is a type of Pizza. The same train of thought can be applied to rdfs:range. According to [31], for new users difficulties with domain and range constraints are the largest single source of errors after problems with open world reasoning.
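The two readings of a domain axiom can be contrasted in a short Python sketch. It is a simplified model that ignores subclass reasoning; the dictionary DOMAIN, the function names, and the individual mintTopping are hypothetical, while hasTopping, Pizza, IceCream, and vanillaIceCream come from the example above.

```python
# rdfs:domain axioms, modelled as a mapping from property to class.
DOMAIN = {"hasTopping": "Pizza"}


def infer_types(triples, declared_types):
    """OWL/RDFS reading: the subject of a property gains the domain class as a type."""
    inferred = {s: set(types) for s, types in declared_types.items()}
    for subject, prop, _ in triples:
        if prop in DOMAIN:
            inferred.setdefault(subject, set()).add(DOMAIN[prop])
    return inferred


def check_constraints(triples, declared_types):
    """Constraint-checking reading: report subjects that lack the domain class."""
    return [(subject, prop) for subject, prop, _ in triples
            if prop in DOMAIN and DOMAIN[prop] not in declared_types.get(subject, set())]


triples = [("vanillaIceCream", "hasTopping", "mintTopping")]
declared = {"vanillaIceCream": {"IceCream"}}
print(sorted(infer_types(triples, declared)["vanillaIceCream"]))
# -> ['IceCream', 'Pizza']
print(check_constraints(triples, declared))
# -> [('vanillaIceCream', 'hasTopping')]
```

The inference reading quietly makes the ice cream a Pizza, while only the constraint-checking reading, which OWL does not provide, flags the statement.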

3.4.11 OWL’s Sublanguages

Knublauch et al. [27] find that most users mostly see OWL as a more expressive variant of RDFS. That is, they use it to define classes, properties, and individuals for sharing on the Web. However, the expressivity of RDFS is greatly extended by OWL. Many users use restrictions to “express what they see as necessary conditions of a class, and use the owl:imports mechanism to link ontologies to each other. Such ontologies carry little semantics that could be exploited by reasoners.” Many ontology designers also ignore the open-world semantics and the lack of the unique names assumption.

There are OWL DL supporters who hold that without a clean logical foundation, the Semantic Web will not make sense. Knublauch et al. [27] argue that there are valid use cases for utilizing only subsets of OWL. Which OWL dialect is chosen is decided by whether users build primarily taxonomies, data structures, or rich knowledge models. As an example, an ontology for e-commerce might only contain classes to describe customers and their address and phone number, and an initial version of the ontology does not need advanced OWL constructs beyond range and domain statements. Semantically simple ontologies such as this are enough to make a Web application able to generate user interface forms from class definitions and to describe schemas useful for integration. Then, later in the ontology’s life cycle, additional expressivity can be added as developers find they need it. Knublauch et al. [27] see this as a major selling point for OWL, one often ignored by proponents of DL: “The breadth of the OWL language offers a migration route from entry level, hand-crafted taxonomies of terms, to well defined, normalized ontologies capable of supporting reasoning.”

3.4.12 Problems for Developers New to OWL

There are several topics which are especially difficult for computer professionals not familiar with OWL, such as rdfs:domain, the open-world assumption, and the lack of the unique names assumption. As mentioned above, Ruttenberg et al. [26] tell of how OWL DL was used for exchanging biological pathway information. In the conclusion, they state that: “However, in spite of the group’s experience in biological knowledge representation, bioinformatics, software engineering, and database design, it encountered some challenging problems.” They also state that they believe that problems similar to those described by them (e.g., not being used to the OWA, issues with reuse, insufficient validation, and whether to use instances or classes) will be common as more groups choose to use Semantic Web technologies.

Knublauch et al. [27] argue that it is important that the OWL community be clear about the differences between object-oriented approaches and DL, especially since computer professionals trying out OWL will often have experience of object-oriented languages. Persons from the field of knowledge modeling are often familiar with frame-based systems. One difference is in the rdfs:domain construct. While domains are in effect mandatory in many frame-based systems, in OWL domain constraints are “axioms from which inferences may be drawn”. Also, object-oriented attributes must belong to certain classes, but OWL properties often have no domain statements at all.

4 Design and Development of the SOMWeb Ontologies

The design of the SOMWeb ontologies takes the MedView knowledge representation and content as a starting point. The knowledge model of MedView, as described in Sec. 2.1, includes the representation of (1) individual examination records, (2) examination templates describing the pattern from which the individual records are created and which are used in constructing graphic input forms, (3) value lists from which values can be chosen when filling out these forms, and (4) aggregates of values created and used when analyzing data from the examination records. Much of the focus of this work has been on representing the second of these, the examination templates. The models for the examination templates were also central in MedView.

In the following subsections we describe how the examination templates can be represented in OWL (Sec. 4.3), followed by a description of how the value lists are remodeled (Sec. 4.4). For each of these, we begin by describing the general structure, followed by some of the design decisions made. We then give short descriptions of how examination records (Sec. 4.5) and aggregates (Sec. 4.6) are represented in SOMWeb. Finally, we discuss some matters related to end-user input (Sec. 4.7). But first we further explain the correspondence between the old MedView model and the new SOMWeb model (Sec. 4.1), and give a brief account of considerations in the translation of the actual content of MedView (Sec. 4.2).

4.1 Relations between Structures of MedView and SOMWeb

Figure 3 shows how the new structures of the SOMWeb representation can be mapped to the old ones of the MedView representation. The examination templates previously described using XML are now described using OWL. Further, there is an OWL file describing general examination structures, which can be compared to the DTD used to describe the XML examination files. That which was previously described by the term definitions and term values files is now described using one OWL file (though there could be several such files, representing multiple sets of term definition and term value files). Aggregates are, just as before, stored separately, but in OWL. Finally, the individual examinations previously in tree files are now in RDF files.

4.2 Development Process

It was decided early on that the current representations should be used as a starting point. To begin with, these were used as inspiration for more or less constructing the examination template by hand using Protégé. However, given the number of terms and term values in the most commonly used template and term-value file, this turned out to be a lot of more or less manual work. Because of this, we decided to write a program that uses the Java classes in MedView that read templates, term values, and term definitions. From the internal Java representation of these, we create the appropriate OWL constructs, based on the proposed structure and design decisions described below. This conversion program uses the Jena API. The advantage of the more manual approach was more control over the process, especially of the parts of the MedView representation believed not to be conveniently translatable into OWL. However, once a more automatic approach was taken, it turned out that these cases were quite few.

4.3 Designing the Examination Template Ontologies

An examination template describes what should be included in an examination record. It is used both to construct the form-based user interfaces and to structure the actual record. As mentioned above, in MedView, examination templates were stored as XML documents, and there is one DTD to describe general features of such templates. In SOMWeb, the examination templates are represented using OWL, and there is also one OWL document giving a general description of what is included in an examination template.

All named entities in RDF (and thus in OWL, which is based on RDF) are referred to by a URI, so all classes, properties, and instances in our examination template and value list ontologies are assigned URIs. The general examination description OWL file has its own namespace, referred to by each MedView examination-template, which are separate OWL files. We will now describe the structure of the ontologies describing examinations, followed by some design decisions related to this structure and other features of the examination ontologies.

Figure 3: A comparison between the MedView and SOMWeb representations, with the previous MedView structures to the left and the new SOMWeb structures to the right. The most general aspects of examination templates are described in a DTD in the MedView version, and in OWL in SOMWeb. There is only one such general description. The examination templates were stored in XML files in MedView, and are now stored in OWL files. There can be many different examination templates, corresponding to different examination situations (such as one for regular visits and one for those remitted for fear of dentists). The terms and values that are used by the examination templates and in the individual examinations, are in the MedView representation kept in a termValue and corresponding termDefinition file, which are stored in text files of a certain format (see Sec. 2.1). These are now stored as classes and individuals in an OWL file. Just as there could be different sets of termValue and termDefinition files, it is possible to have different value list OWL files. Aggregates were previously stored in a specific format for aggregate definitions, in separate files. They are now represented in OWL, in separate files. Finally, the examination records are stored as tree files in the old version (see Sec. 2.1.3), and are now stored as RDF files.

Figure 4: This figure shows the structure of the general examination ontology. It is used by all individual examination templates. Classes are Examination and ExaminationCategory. The property hasExaminationCategory is used to connect instances of these classes. The inputs of the old MedView templates are represented as properties, which are either object properties or datatype properties. To allow for easier integration with the existing code base, the different kinds of input are explicitly represented, such as single, multiple, and VAS. There are also various properties associated with the inputs, such as instructionProperty and descriptionProperty.

4.3.1 The Structure of the Examination Ontologies

As introduced above, the general features of an examination template in SOMWeb are described in an OWL file referred to by all other examination templates. Features of this are depicted in Fig. 4. They include classes such as Examination, ExaminationCategory, and properties corresponding to the different input-types of MedView, such as SingleObjectInput, MultipleDatatypeInput, and VASExaminationInput. Also, hasExaminationCategory is a property for connecting an Examination instance to ExaminationCategory instances, and we state that an Examination must have at least one ExaminationCategory instance as the object of hasExaminationCategory.

Each examination template is described in a separate OWL file. An examination template in OWL is given in App. B, which corresponds to the one in the original format given in App. A. They describe the form for a meeting consultation, which is a record of what is decided at the teleconference meetings of SOMNet. An examination template OWL file contains definitions of the categories that can or need to be included in an examination constructed from that template. Examples of subclasses of ExaminationCategory in current use are PatientData, GeneralAnamnesis, and MucosAnamnesis. In each examination template we also describe properties (or inputs) of the template, as subproperties of the properties, such as SingleExaminationInput, described in the general examination description OWL file. For each property, there are also properties pertaining to descriptions and instructions, as in the XML template. The ordering of the categories and the properties within the categories is described, as well as any cardinality constraints. Figure 5 shows the structure of an example ExaminationCategory, GeneralAnamnesis, and how it relates to the general description and the value list ontologies.

Note that in OWL, properties can be either object properties or datatype properties. Object properties take individuals (instances of classes) as values, while datatype properties take simple XML Schema datatype values. These properties correspond to ‘terms’ or ‘inputs’ in the old MedView nomenclature. For each input we need to define what ExaminationCategory it can belong to and what types of values it can take. This can be done using rdfs:domain and rdfs:range or using restrictions such as owl:allValuesFrom and owl:someValuesFrom. In the first case, the properties are connected with their ExaminationCategory by rdfs:domain. Classes from which values of object properties can be taken are specified by rdfs:range. In the second alternative this is specified as a restriction on the ExaminationCategory, stating that for a given property all values have to be from a certain class. This issue is discussed further in Sec. 3.4.10.

4.3.2 Design Choices

Now that the general structure of the SOMWeb examination ontologies has been presented, we go into detail on a few of the design choices made. We begin with the issue of whether the constructs of domain and range should be used in the way that intuitively seems appropriate. We then consider how and whether the order of categories and properties should be included in the examination template description. Namespaces and URIs are then discussed, followed by some issues pertaining to the subclassing and structuring of examination categories and properties. Finally, the representation of intervals and n-ary relations is treated.

Domain and range In OWL you can use the RDFS constructs of domain and range for properties, as introduced in Sec. 3.4.10. In Fig. 6 we exemplify this for the case of the SOMWeb ontology. Here, we define an object property named diseasesPastInput, say that its domain is instances of the class AnamnesisGeneral, and that its range is instances of the class DiseasesPast. What one might expect is that this would provide some sort of constraint checking: if instances of the class DiseasesPast are specified as the range of diseasesPastInput, then having an instance of a class other than DiseasesPast as the value of a diseasesPastInput would result in an invalid statement. However, this is not the case. Instead, it will be inferred that any instance taken as a value of diseasesPastInput is of class DiseasesPast. Likewise, we might expect to get some sort of error if an instance of a class other than AnamnesisGeneral had a diseasesPastInput, but such an instance would be inferred to be of class AnamnesisGeneral in addition to any other classes it has been declared an instance of.

To achieve the intended effect, we need to use the owl:allValuesFrom construct, as shown in Fig. 7. Here owl:allValuesFrom is used to specify the class of possible values that the property specified by owl:onProperty can take. All values of the property must come from this class. In this example, for instances of the class AnamnesisGeneral, only members of the class DiseasesPast are allowed as values of the diseasesPastInput.

Even though using domain and range in the above way does not always lead to the expected result, this usage is still common in the ontologies found in ontology repositories on the Web. Since we programmatically generate the examination template from the previous templates, we have included the option of using either method, though in the future domain and range will probably not be used.

[Figure 5 diagram labels: ExaminationCategory, Anamn-gen, subClassOf; inputs dis-nowInput, dis-pastInput, drugInput, smokeInput, each with an allValuesFrom arrow; value classes Dis-now, Dis-past, Drug, Smoke_Relation, Alcohol_Relation; files generalExamination.owl, caseExam.owl, valueList.owl.]

Figure 5: Illustration of the structure of an ExaminationCategory, Anamn-gen. The ovals indicate classes and the arrows properties. The boxes in the background indicate which OWL files the classes and properties are located in. The inputs/properties used by Anamn-gen are ‘declared’ in the caseExam.owl file. This file also contains restrictions that, for example, an instance of Anamn-gen has only instances of the class Dis-now as values of the property dis-nowInput, as indicated by the allValuesFrom on the arrow.


Figure 6: An example of using domain and range to link properties to classes. We define an object property named diseasesPastInput, and say that its domain is instances of the class AnamnesisGeneral (we restrict what objects the property can be applied to), and that its range is instances of the class DiseasesPast (we restrict what values the property can take).

Figure 7: Another way of representing what we wanted to say in Fig. 6. We use owl:allValuesFrom to specify the class of possible values that the property specified by owl:onProperty can take. All values of the property must come from this class. In this example, only members of the class DiseasesPast are allowed as values of the diseasesPastInput property.
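The difference between the two readings can be made concrete with a minimal sketch. It is written in Python rather than with an OWL reasoner, and the triple encoding, the individual names (anamn_1, pepcid), and both helper functions are assumptions made for illustration only. The first function applies the RDFS domain and range rules, which add type statements instead of rejecting data. The second performs the kind of closed-world check that one might intuitively have expected.

```python
TYPE, DOMAIN, RANGE = "rdf:type", "rdfs:domain", "rdfs:range"

def rdfs_closure(triples):
    """Apply the RDFS domain and range rules until no new triples appear."""
    inferred = set(triples)
    while True:
        new = set()
        for prop, kind, cls in [t for t in inferred if t[1] in (DOMAIN, RANGE)]:
            for s, p, o in inferred:
                if p == prop:
                    # Domain types the subject, range types the value.
                    new.add((s if kind == DOMAIN else o, TYPE, cls))
        if new <= inferred:
            return inferred
        inferred |= new

def restriction_violations(triples, cls, prop, allowed):
    """Closed-world check: list (subject, value) pairs where a declared
    instance of cls has a value for prop that is not declared as allowed."""
    declared = {(s, o) for s, p, o in triples if p == TYPE}
    return sorted((s, o) for s, p, o in triples
                  if p == prop and (s, cls) in declared
                  and (o, allowed) not in declared)

schema = {
    ("diseasesPastInput", DOMAIN, "AnamnesisGeneral"),
    ("diseasesPastInput", RANGE, "DiseasesPast"),
}
data = {
    ("anamn_1", TYPE, "AnamnesisGeneral"),
    ("pepcid", TYPE, "Drug"),  # hypothetical value that is not a DiseasesPast
    ("anamn_1", "diseasesPastInput", "pepcid"),
}

closure = rdfs_closure(schema | data)
# Range does not reject the value; it gives the value an additional type.
print(("pepcid", TYPE, "DiseasesPast") in closure)
# The closed-world check over the asserted data reports the mismatch.
print(restriction_violations(data, "AnamnesisGeneral",
                             "diseasesPastInput", "DiseasesPast"))
```

The sketch does not model how an OWL reasoner treats restrictions; it only separates "infer a type" from "check a declared type", which is the distinction the text relies on.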

There is another consequence of this choice. When you use domain and range, the information about which ExaminationCategory a property/input belongs to is associated with the property. When you use restrictions, this information is associated with the ExaminationCategory. If we return to the examples in Fig. 6 and Fig. 7, we have an example of an ExaminationCategory: AnamnesisGeneral. In Fig. 6, the domain and range example, the association between the property and the ExaminationCategory is declared with the property, here the diseasesPastInput. In Fig. 7, the allValuesFrom example, this association is given by the AnamnesisGeneral class, an ExaminationCategory. It seems in some sense more intuitive to associate this information locally with the category to which it belongs than globally with the property.

Representing order An examination template has an implicit order. A general anamnesis is normally defined to come before a diagnosis. However, OWL and RDF, being graphs, have no order. We need order both on the level of examination categories, and on the level of the questions/properties of those categories. The approach we take is to use rdf:List to represent this. In the OWL examination template, this could look like:

However, using the rdf:List construct brings us into OWL Full. For now we can argue that this is not much of a problem, as we do not need to reason over the order of ExaminationCategories. However, one can think of cases where we will need to reason over the order (for example in dependency relations between different categories). The W3C Working Group Note on N-ary relations [32] has suggestions for representing lists as N-ary relations, which could be used to avoid rdf:List. See also recent work on representing lists in OWL [33].

One could of course discuss whether this order is intrinsic to the examination or part of the presentation of the examination. Whether it is appropriate or useful to separate what is part of an examination from the order in which it should be presented remains an open question.
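As a minimal sketch of how an RDF collection carries order in an otherwise unordered graph, the following Python fragment encodes a sequence of categories as rdf:first/rdf:rest cells ending in rdf:nil and then recovers the sequence from the triples. The blank node names and both function names are assumptions for illustration; the category names are the ones mentioned in Sec. 4.3.1.

```python
FIRST, REST, NIL = "rdf:first", "rdf:rest", "rdf:nil"

def to_rdf_list(items, prefix="_:n"):
    """Return (head, triples) encoding items as a linked list of cells."""
    triples = []
    head = NIL
    # Build from the end so that each cell can point at the rest of the list.
    for i, item in reversed(list(enumerate(items))):
        node = f"{prefix}{i}"
        triples.append((node, FIRST, item))
        triples.append((node, REST, head))
        head = node
    return head, triples

def from_rdf_list(head, triples):
    """Follow rdf:first/rdf:rest links from head to recover the order."""
    first = {s: o for s, p, o in triples if p == FIRST}
    rest = {s: o for s, p, o in triples if p == REST}
    items = []
    while head != NIL:
        items.append(first[head])
        head = rest[head]
    return items

order = ["PatientData", "GeneralAnamnesis", "MucosAnamnesis"]
head, triples = to_rdf_list(order)
# The triples themselves can be stored in any order; the links keep the sequence.
shuffled = sorted(triples, reverse=True)
print(from_rdf_list(head, shuffled))
```

The order lives entirely in the links between cells, which is why the statements in the file can appear in any sequence without losing the template order.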

‘Reusing’ Property URIs All resources in RDF, including properties, are identified by URIs. In SOMWeb, for example, we could have a property identified by the following URI: http://www.cs.chalmers.se/proj/medview/somweb/somwebExamination.owl#diseasesPastInput

This indicates that the diseasesPastInput is defined in the somwebExamination.owl file located at http://www.cs.chalmers.se/proj/medview/somweb/. Before it was decided to have an ontology describing the general structure, we had considered using the same namespace for all examination templates. Now that each template is defined in a separate OWL file, they also have separate namespaces, distinguished by ending in different file names.

If we had stayed in one namespace, we would have had to decide what to do if two different examination templates wanted to use the same input name. If the same names are used, an examination input with a certain URI could be used in different ways, such as taking different classes as values or belonging to different examination categories. In MedView it is currently assumed that a given term is used in only one context and takes only one class of terms as values. Therefore, in SOMWeb, this is possibly not a problem either: the inputs can have the same name, and it is enforced at a ‘user’ level that they can only be used in one context.

With the current approach, where separate namespaces are used, we have the opposite problem. We could have two properties in separate OWL examination descriptions which are equivalent. There is a built-in OWL property owl:sameAs, which links an individual to another individual, indicating that two different URIs refer to the same thing. It might be used to solve our problem here:

However, this requires that properties be treated as individuals, which brings us into OWL Full (footnote 21). There is also the possibility of using owl:equivalentProperty to link properties, which means that the two properties have the same extension: equivalent properties have the same ‘values’, i.e., the same property extension, but can have different intensional meaning, i.e., denote different concepts.
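The namespace issue can be illustrated with a small Python sketch. The second template file name (otherExamination.owl), the example subject and values, and the helper functions are hypothetical; only the base URI and somwebExamination.owl come from the text. The sketch shows that the same local name in two files yields two distinct URIs, and that a declared equivalence between the properties lets a lookup treat them as sharing one extension.

```python
BASE = "http://www.cs.chalmers.se/proj/medview/somweb/"

def prop_uri(filename, local_name):
    """Build a property URI from the template file and the input name."""
    return f"{BASE}{filename}#{local_name}"

def equivalent_to(prop, pairs):
    """All properties linked to prop by symmetric, transitive equivalence."""
    group = {prop}
    changed = True
    while changed:
        changed = False
        for x, y in pairs:
            if (x in group) != (y in group):
                group |= {x, y}
                changed = True
    return group

def values_of(subject, prop, triples, pairs):
    """Values of prop for subject, counting equivalent properties as the same."""
    group = equivalent_to(prop, pairs)
    return {o for s, p, o in triples if s == subject and p in group}

a = prop_uri("somwebExamination.owl", "diseasesPastInput")
b = prop_uri("otherExamination.owl", "diseasesPastInput")  # hypothetical file

triples = [("exam_1", a, "Astma"), ("exam_1", b, "Diabetes")]
print(a == b)                                     # same local name, different URIs
print(values_of("exam_1", a, triples, []))        # without any equivalence
print(values_of("exam_1", a, triples, [(a, b)]))  # with a declared equivalence
```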

Properties Reflecting ExaminationCategory As a means to avoid the decision of whether or not to use domain to describe what category a property belongs to, we considered having properties for each examination category, e.g. GeneralAnamnesisInput, of which all properties that should belong to the GeneralAnamnesis ExaminationCategory are subproperties. However, there might be cases where an input can belong to several examination categories. Further, it means that new super properties have to be created each time we create a new category.

Properties Reflecting MedView Term Types We decided that it would be useful to have properties for each term grouping that exists in MedView today (such as single, multi, and VAS), which suggest properties that can take only one value, many values, or values on a VAS scale. Corresponding OWL properties could be SingleDatatypeProperty, MultiObjectProperty, and VASProperty. This makes the model easier to understand for those used to the old representation, and makes it easier to use old MedView Java code to read and write SOMWeb examinations. See Fig. 8 for examples of how the inputs of the examination are subproperties of the more general MedView groupings.

However, the old groupings do not always apply for the new SOMWeb model, and need to be subgrouped further. For example, each of multi and single needs to be further subgrouped into object property and datatype property. Further, some of these groupings, such as single and multi, can easily be represented using cardinality constraints in OWL. It would be easy to determine from the cardinality constraints what grouping the MedView program should use for these. However, for others, such as the VASProperty, it might be more complicated. Rather than using this subproperty construct for only some of the properties, we decided to use it for all, even though some information might be redundant.
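To illustrate why the single and multi groupings are redundant with cardinality constraints, here is a minimal Python sketch that derives a grouping name from a maximum cardinality and the kind of property. The function and its parameters are assumptions; the naming pattern follows the examples SingleDatatypeProperty and MultiObjectProperty given above. A VAS grouping cannot be derived this way, which is one reason the explicit subproperty construct is kept.

```python
def medview_grouping(max_cardinality, is_object_property):
    """Derive a MedView-style grouping name from OWL-style constraints.

    max_cardinality is an int, or None when no upper bound is stated.
    """
    arity = "Single" if max_cardinality == 1 else "Multi"
    kind = "Object" if is_object_property else "Datatype"
    return f"{arity}{kind}Property"

print(medview_grouping(1, False))    # an input holding one literal value
print(medview_grouping(None, True))  # an input holding many value-list instances
```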

Figure 8: Properties in examination templates in OWL are subproperties of more general properties, defined in the general examination template description.

MedView’s Question Inputs MedView has an input category called ‘question’ or ‘default placeholder’. This is used, for example, for specifying how long a patient has had a certain problem (value + time unit), or smoking habits (value + tobacco category + time unit). In the term values file this has been represented as:

    $Smoke
    ? cigarettes no filter/day
    ? cigarilles/day
    ? cigarrs/day
    ? filter cigarettes/day
    ? piptobakspaket/week
    Not daily
    No

This is quite unsatisfactory, since there is, for example, no relation between the time units. These ‘questions’ could be represented using the W3C OWL Best Practices Proposal for Representing N-ary Relations [32]. In the case of examinations in MedView, the ‘n’ will most often be 2 or 3. In the case of 2, we will first have a value and then a unit. In the case of 3, we will first have a value, then a ‘sort’, then a time unit. Figure 9 depicts how this pattern can be used at the instance level for the case of smoking. The corresponding RDF is shown in Fig. 10, and the description of such a relation in the examination template is given in OWL in Fig. 11.

21 In OWL Full, where there is no strict separation of classes, properties, individuals, and data values, we can apply properties that apply to individuals to classes.

[Figure 9 diagram: Anamn-gen_123 --smokeInput--> Smoking_Relation_1; Smoking_Relation_1 --smokeAmount--> 10, --smokeType--> filterCigarettes, --timePeriod--> day.]

Figure 9: Depicted is the use of a separate class for representing n-ary relations, here the Smoking_Relation, at the level of individuals. The Smoking_Relation_1 connects the Anamn-gen instance with appropriate properties and values for describing the smoking habit of a patient: the number of tobacco products used, the type of tobacco product, and the time period during which this usage takes place.
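The pattern in Fig. 9 can be sketched as plain triples. The Python fragment below is an illustration under stated assumptions: the property and individual names are taken from the figure, while the legacy input format ('10 filter cigarettes/day', i.e. the '?' placeholder replaced by a number) and both helper functions are assumed for the example.

```python
TYPE = "rdf:type"

def parse_question_value(text):
    """Split a legacy value such as '10 filter cigarettes/day' into
    (amount, sort, time unit). The input format is an assumption."""
    amount, rest = text.split(" ", 1)
    sort, period = rest.rsplit("/", 1)
    return int(amount), sort, period

def smoking_relation(subject, relation_id, amount, sort, period):
    """Triples for one n-ary relation individual, following Fig. 9."""
    return [
        (subject, "smokeInput", relation_id),
        (relation_id, TYPE, "Smoking_Relation"),
        (relation_id, "smokeAmount", amount),
        (relation_id, "smokeType", sort),
        (relation_id, "timePeriod", period),
    ]

amount, sort, period = parse_question_value("10 filter cigarettes/day")
triples = smoking_relation("Anamn-gen_123", "Smoking_Relation_1",
                           amount, sort, period)
for triple in triples:
    print(triple)
```

Because the time unit is now a value of its own property, records using 'day' can be compared with each other regardless of tobacco type, which the concatenated strings in the term values file did not allow.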

Figure 10: RDF describing the relation depicted in Fig. 9.

4.4 Designing the Value List Ontology

In Sec. 2.1.2 we explained that, apart from the templates describing the examinations, MedView also needs files with term definitions and term values. The term definition file lists the terms and contains information about what the type of each term is, e.g., single, multiple, or interval. This is done in a text file, with each line listing a term and its type. The term value file lists the values that each term can take. It is also a text file, with the term name followed by one possible value per line.

When considering how to translate this information into OWL, the idea has been to represent values as instances. BirchPollenAllergy would be an instance of the class Allergy. Initially, there were attempts to categorize and clean up the instances at this first remodeling, so that for example Allergy would have a subclass such as PollenAllergy, of which BirchPollenAllergy would be an instance. After several attempts at doing this, using Protégé and some of its wizards, it was decided that too much manual work would be needed to complete the translation, and that such work should be carried out by the clinicians (or better yet, achieved through reuse of other ontologies). Ideally, there would also be a connection to a more complete knowledge structure of, e.g., how the allergic reaction is connected to the allergen, which could be used in reasoning about allergic reactions.

It was instead decided to convert the MedView term value lists programmatically, even though many of the problems of these lists will be carried over into the SOMWeb version. Apart from things that are ‘irregular’ in some sense, such as misspellings and values appearing several times with small variations, there is also the use of Swedish letters and spaces, which makes the values bad URIs. The naming issue is discussed further below. An approach of an initial automatic translation followed by manual fine tuning seems reasonable. Also, for the translation of old cases to work, we still need to include the ‘irregular’ values used in these cases.

Figure 11: How the n-ary relation pattern can be used for representing a smoking habit in the examination template.

4.4.1 Structure of the Value List Ontology

All of the terms of MedView will be represented as OWL classes, and their values will be instances of these classes. So the term Allergy will become the class Allergy, and the values of the Allergy term will become instances of the class Allergy. Each such class will also have a property indicating what type it has, even though this seems like a slight duplication of this information, as it is also contained in the examination description. However, to be able to use the existing MedView datahandling code, we need to have easy access to this information. While this information could be accessed from the examination template, the term values and the templates are not read by the same parts of the system.

Since the structure of the terms in MedView is flat, no subclasses are created. Each class has a property medviewType corresponding to the type given in the term definition file. It is also possible to add metadata about who has created the term. Here is an example of the class Drug, which has as its creator “termValues”, to indicate that it is a translation from the older format, and medviewType multiple:

    Drug: creator termValues; medviewType multiple

Individuals of these classes are created for each of the values of a term in the term values file. For example:

    label Pepcid; creator termValues

Again, we have termValues as the creator. There is also an rdfs:label for the Swedish language. The somwebInstances.owl file, created from the term definitions and term values currently used in the SOMWeb community, basically contains only these two kinds of statements. They are in no particular order, since RDF has no order, which makes the file difficult for humans to read. An excerpt is shown in App. C. For brevity, no owl:AllDifferent or owl:differentFrom statements are included in the excerpt.

If the name of the value contains symbols not allowed in URIs, such as spaces and Swedish letters, the URI is made by removing these. The rdfs:label is used for representing the original name of the value. Because we can attach language information, we can have labels in many languages here:

    label (Swedish) Kemikalier; label (English) Chemicals; creator termValues

Though it would have been appealing to have the URIs of the instances use English, this would require much manual work. Though the Swedish word ‘kemikalier’ is easy enough to translate using a common dictionary, the term values contain many words which are not. We can begin by having labels for the Swedish words, and the labels for English can be added later. Should we at some later time want to switch to URIs using English, we can declare that the old URI using Swedish refers to the same thing as the new one using English.

We can also add meta-information for each term value, about who has added the term. In this case, it was created from the termValues list, but in future use the name of the clinician adding it can be automatically added. We have used the Dublin Core (footnote 22) property creator for including this information.

Some of the values in the current value lists contain extra information in the value name. For example, the diagnosis values have an International Classification of Diseases (ICD) code concatenated to them: Gingivit - plackinducerad K051 (gingivitis - plaque induced). In the value list ontology such extra information is separated from the name:

    label Gingivit - plackinducerad; ICD code K051; creator termValues
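The programmatic conversion described in this section can be sketched as follows. This is a hedged Python illustration, not the converter used in the project: the exact character filter, the pattern used to recognise a trailing ICD code, the base URI valueList.owl#, and the property name icdCode are all assumptions. The code pattern would also match names that merely end in a letter and digits, so it should only be applied to diagnosis values.

```python
import re

def local_name(value):
    """Remove every character other than ASCII letters, digits, '_' and '-'."""
    return re.sub(r"[^A-Za-z0-9_-]", "", value)

def split_icd(value):
    """Separate a trailing code such as 'K051' from the value name, if present."""
    match = re.match(r"^(.*\S)\s+([A-Z]\d{2,4})$", value)
    return (match.group(1), match.group(2)) if match else (value, None)

def describe_value(term, value, base="valueList.owl#", lang="sv"):
    """Triples for one term value: type, original name as label, creator."""
    name, icd = split_icd(value)
    uri = base + local_name(name)
    triples = [
        (uri, "rdf:type", base + local_name(term)),
        (uri, "rdfs:label", (name, lang)),  # original name kept as a label
        (uri, "dc:creator", "termValues"),
    ]
    if icd is not None:
        triples.append((uri, "icdCode", icd))  # hypothetical property name
    return triples

print(local_name("Besk smak"))
print(split_icd("Gingivit - plackinducerad K051"))
for triple in describe_value("Drug", "Pepcid"):
    print(triple)
```

Keeping the original string in rdfs:label means that nothing is lost when characters are dropped from the URI, and further labels in other languages can be attached to the same URI later.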

22 http://dublincore.org/

4.4.2 Design Choices

Apart from the more general structure decisions presented above, we will now discuss some more specific questions that have been under consideration during development. These are using instances in the first place, the naming of instances, as well as reuse and import of ontologies developed by others.

Using Instances We have chosen to use instances, rather than classes, for the term values. This can be seen as problematic if we consider, for example, a property hasAllergy pointing to the instance of the class Allergy representing a dog allergy. If we use this instance in two different examinations, for two different people, are we somehow saying that they have the same allergy against dogs, in the sense that the chemical reaction in their bodies resulting from an exposure to dogs is identical, or at least the same kind of reaction? If this is considered a problem, one way around it would be to have each kind of allergy as a subclass of the class Allergy, and then to create instances of these subclasses for each patient’s allergy. But this would mean creating a large number of instances, thereby adding a lot of complexity.

Such an approach would be similar to the advanced system for referent tracking of such entities suggested by Ceusters and Smith [34]. In the referent tracking paradigm, all concrete individual entities relevant to the correct description of a patient’s condition, therapies, and outcomes should be referable explicitly by use of unique identifiers. Not only does the patient receive such an identifier, but also the patient’s particular fracture, the particular bone that is fractured, and so on.

When seeking advice on how to make this design decision, it becomes apparent that the choice is somewhat a matter of taste, and also that the W3C intends to provide guidance on this issue. In the W3C Best Practice Working Group note on N-ary relations [32], the authors state in passing, when discussing how to represent a diagnosis: “For simplicity, we represent each disease as an individual. This decision may not always be appropriate, and we refer the reader to a different note (to be written).” Such a note has yet to be made available.

Naming Each value needs to have a URI. In deciding what this URI should be, the most straightforward option is perhaps to create one based on the current term value, which is in Swedish. This is, as described above, what was decided upon. But at first it seemed more natural to have names in English in the URIs, which created problems as many values are in Swedish. We then considered using URIs consisting of IDs generated by concatenating the class name and some numbers, and having the Swedish name only in the label. However, one problem with this approach is that the RDF file describing the examination becomes less readable and contains less information, in that you would need to look in the description of the value to find its Swedish name. There is also the issue of how to assign the numbers. Yet another problem is keeping track of this mapping.

Related to the question of URIs and naming are Life Science Identifiers (LSID) (footnote 23), designed to be location-independent, stable, and resolvable identifiers for entities on the Semantic Web for the life sciences [35]. LSIDs take the form of Uniform Resource Names (URNs), guaranteed to be globally distinct from all other named entities, meaning that two different entities are guaranteed not to have the same LSID. The reason for using URNs rather than URLs is that URLs were originally intended to be used for documents rather than individual conceptual units, and offer no assumption of stability over time. The LSID of an entity is thus independent of its location [36]. While using LSIDs may be appropriate for SOMWeb, it was not within the scope of the project.

23 http://lsid.sourceforge.net/

Reuse and Imports Creating the SOMWeb ontology leads us to conclude that reuse is perhaps more difficult than creating an ontology from scratch. Further, finding a good ontology in OWL that suits our needs is difficult, although it may have become easier over the two years of this project. The ontology to reuse has to have concepts of interest, be of ‘good’ quality, and so on. In many cases we also need a means to add a Swedish translation. Also, as discussed above in Sec. 3.4.7, the import support in OWL needs to be improved, so that you do not need to import all concepts of the imported ontology at runtime. Although earlier versions of the SOMWeb value list ontology did reuse ontologies for countries, the lack of a Swedish translation led us to just use the previous value list.

One feature of the MedView applications is that it has been easy for users to add values to the value lists, as they saw fit. If we reuse instances of another ontology, this would mean that we would add new instances to the classes of that ontology ‘locally’, kept separate from the ‘original’ instance list. This might in itself not be a problem, and can perhaps rather be seen as a feature of using Semantic Web technology, but it does add complexity to the management of the value lists. It thus appears that if we want to attain the benefit of interoperability from reusing other ontologies, we will have to add it later, when and if usable ontologies appear. This can be done either by declaring some SOMWeb class as a subclass of some external class, or by using owl:sameAs.

4.5 Representing Individual Examinations

An examination generated by the SOMWeb community can be found in App. D. Another example of an examination is visualized in Fig. 12. It shows an instance (Examination_123) of the Examination subclass defined in an examination template. Each of the PatientData_123, Anamn-gen_123, and so on, are instances of subclasses of ExaminationCategory, which are also defined in the examination template, along with the properties/inputs “going out” from these categories. These properties/categories point to instances found in the value list ontology, valueList.owl, except for ‘39’, which is an int, since age is a datatype property.

4.5.1 Validation

One problem we have been faced with is how to validate a given examination, given the open world assumption and the no unique names assumption. That is, how do we take an RDF description of an examination instance and check that it fulfills all the requirements of the corresponding OWL examination template description? One tool that could be of use is Eyeball (see Sec. 3.3), which makes closed world assumptions for checking RDF models for common problems.

[Figure 12 diagram: Examination_123 is linked to the category instances PatientData_123, Anamn-gen_123, Anamn-loc_123, Diag_123, and Treat_123. Their properties (for example age, drug, sympNow, diagTent, and treatSugg) point to values such as 39 and valueList.owl:Kestine.]

Figure 12: A fictive example of an examination instance. Examination_123 is an instance of an Examination subclass defined in an examination template. Each of the PatientData_123, Anamn-gen_123, and so on, are instances of subclasses of ExaminationCategory, which are also defined in the examination template, along with the properties/inputs “going out” from these categories. These properties/categories point to instances found in the value list ontology, valueList.owl, except for ‘39’, which is an int, since age is a datatype property.
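What a closed-world validation of an examination record involves can be sketched in a few lines of Python. This is not Eyeball and does not use its API; the template encoding (a dictionary of allowed value class and cardinality bounds per input), the cardinalities chosen, and the validate function are assumptions for illustration, with class and input names borrowed from Fig. 5 and Fig. 12.

```python
TYPE = "rdf:type"

def validate(category, template, triples):
    """Check one category instance against template rules, treating the
    given triples as complete (closed world). Returns a list of problems."""
    types = {}
    for s, p, o in triples:
        if p == TYPE:
            types.setdefault(s, set()).add(o)
    problems = []
    for cls in sorted(types.get(category, ())):
        for prop, rule in template.get(cls, {}).items():
            values = [o for s, p, o in triples if s == category and p == prop]
            if len(values) < rule.get("min", 0):
                problems.append(f"{category}: too few values for {prop}")
            if "max" in rule and len(values) > rule["max"]:
                problems.append(f"{category}: too many values for {prop}")
            for value in values:
                if rule["allowed"] not in types.get(value, set()):
                    problems.append(
                        f"{category}: {value} is not a declared {rule['allowed']}")
    return problems

# Hypothetical template rules for one examination category.
template = {
    "Anamn-gen": {
        "drugInput": {"allowed": "Drug", "min": 0},
        "smokeInput": {"allowed": "Smoke_Relation", "min": 1, "max": 1},
    }
}
exam = [
    ("Anamn-gen_123", TYPE, "Anamn-gen"),
    ("Anamn-gen_123", "drugInput", "Kestine"),
    ("Kestine", TYPE, "Drug"),
]
print(validate("Anamn-gen_123", template, exam))
```

Under the open world assumption the missing smokeInput value would not count as an error, since it might simply not have been stated yet; the check above reports it because it treats the record as complete.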

4.6 Representing Aggregates

In keeping with the allergy example, it may be of value to group the different allergies into different categories to see if there is any relation between these categories and certain mucous membrane changes in the mouth. In the MedView representation such value classes were seen as an abstraction of values, but were in themselves values. In the SOMWeb representation, this is mainly done by subclassing the values in the value list ontology, and making the appropriate individual values instances of this subclass.

Above, the PeanutAllergy instance of the class Allergy has been mentioned as an example of how instances are used to represent values. In the current version of the Value List Ontology, there is no extra information about the PeanutAllergy; we just know that it is an instance of Allergy. We can make the value list more specific by stating that PeanutAllergy is an instance of the class FoodAllergy. Additionally, we might want to declare that PeanutAllergy is related to the class Peanut (which we currently have no representation of). Relations to, e.g., SNOMED could be useful here (footnote 24).
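A minimal Python sketch of aggregation through subclassing is given below. The class hierarchy, the instance assignments, and the recorded values are hypothetical examples built from the names used in the text (Allergy, FoodAllergy, PollenAllergy, PeanutAllergy, BirchPollenAllergy); the two functions are assumptions for illustration.

```python
from collections import Counter

# Hypothetical refinement of the flat value list.
SUBCLASS_OF = {"FoodAllergy": "Allergy", "PollenAllergy": "Allergy"}
INSTANCE_OF = {
    "PeanutAllergy": "FoodAllergy",
    "BirchPollenAllergy": "PollenAllergy",
    "DogAllergy": "Allergy",  # not yet assigned to a more specific class
}

def classes_of(value):
    """The declared class of a value followed by all of its superclasses."""
    chain = []
    cls = INSTANCE_OF.get(value)
    while cls is not None:
        chain.append(cls)
        cls = SUBCLASS_OF.get(cls)
    return chain

def aggregate(values, level):
    """Count how many recorded values fall under each class in level."""
    return Counter(cls for value in values
                   for cls in classes_of(value) if cls in level)

recorded = ["PeanutAllergy", "BirchPollenAllergy", "PeanutAllergy", "DogAllergy"]
print(classes_of("PeanutAllergy"))
print(aggregate(recorded, {"FoodAllergy", "PollenAllergy"}))
print(aggregate(recorded, {"Allergy"}))
```

Values that have not yet been placed in a subclass still count at the level of Allergy, so the refinement can be introduced gradually without invalidating existing records.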

4.7 End-user Input

As described in Sec. 4.4, the initial attempts at remodeling aimed at adding subclass structure to the term values right away. However, given the amount of manual work needed to do so, it was decided that this would have to be done later, preferably by giving the domain experts tools to do this. With OWL we have the possibility of adding such structure in a later version of the value list ontology, and this ability to add structure after the fact can be seen as an argument for using this representation. We could then first state that PeanutAllergy is of class Allergy, and later state that it is of class FoodAllergy. Such subclassing of term classes could be very useful for reasoning, querying, and visualizing.

An appealing reason for using an RDF-based representation is the built-in ability to add support for multiple languages. While this is an enticing possibility, it requires that a translation is made. Parts of this can be done using a dictionary, but there will be terms which are not in a dictionary and someone will still have to check that an appropriate translation has been made.

Since the approach of translating the old term values to OWL was chosen, we still have many of the inconsistencies from the old value lists. These oddities include the existence of two separate lists of diseases, one for the term Dis-now and one for the term Dis-past. It would seem appropriate to use the same list in both cases. However, for handling this, an interface for the end-users must be constructed.

5 Using the Ontologies

This section describes how the existing MedView code has been adapted to using the OWL examination templates, the OWL value lists, and to outputting examination records in RDF. The OWL examination templates and value lists are used for generating input forms for examination data from the SOMWeb online community. Screenshots of the community are shown in Fig. 13. The adaptations using OWL examination templates were made in the model package of the SOMWeb Java code (described in Sec. 5.1), while the adaptations to using OWL value lists and writing RDF examination records were made to the MedView datahandling package (described in Sec. 5.2). For accessing and manipulating OWL and RDF in Java, the Jena API was used.

24 SNOMED stands for Systematized Nomenclature of Medicine. It is a standardized vocabulary system for medical databases.

Figure 13: The figure shows screenshots of some key parts of the SOMWeb community: overview of cases at a meeting (top), case presentation with pictures and text description generated from examination data (left), and part of an examination data entry form (right). All text is in Swedish.

5.1 Constructing Input Forms from OWL Examination Templates

The SOMWeb community is built on Java Enterprise technology, using Apache Tomcat (footnote 25) as the core web container. The system is an extension of the Apache Struts Model-2 web application framework (footnote 26). Model-2 frameworks (a variation of the classic Model-View-Controller (MVC) design paradigm) are based on the idea that Java Servlets execute business logic while presentation resides mainly in server pages.

The SOMWeb system is a layered architecture, conceptually divided into four main layers: the view layer, the session layer, the model layer, and the foundation layer, as depicted in Fig. 14. The view layer consists of Java Server Pages (JSP) using Expression Language (EL) constructs, with custom tags and functions in addition to tags from the JavaServer Pages Standard Tag Library (JSTL) and the various Apache Struts tag libraries. Styling and layout of content is done using Cascading Style Sheets (CSS). The session layer has components dealing with the current user session, and is responsible for transforming the application’s internal state into the presentation JavaBeans used by the server pages. The model layer has components making up the application’s internal state, and is roughly divided into the major functional areas provided by the system. Here we also have persistence classes, which read the RDF files for users, meetings, cases, and news and create objects of the corresponding Java classes used by the system.

25 http://tomcat.apache.org
26 http://struts.apache.org/

Figure 14: Overview of the SOMWeb system architecture. The model layer contains persistence classes that read RDF files for users, meetings, cases, and news and construct objects of the corresponding Java classes used by the system. These classes are also used for making changes to the RDF model and writing to file.

The SOMWeb online community allows its users to select different examination templates to use for entering individual examinations. Such examination templates could originally only be in the MedForm XML format, but adaptations have been made so that OWL examination templates can also be used. These adaptations are made mainly by creating implementations of the FormPersistence and FormReader interfaces which read OWL templates: FormPersistenceOWL and OWLFormReader. Also, the FormUtilities class has an added createRdfExaminationModelFromForm method, complementing its existing createExaminationTreeFromForm method.

5.2 MedView Datahandling

In the MedView application suite, almost all datahandling functions have been pulled out into a common package called Medview datahandling. The naming of the package is historical,

and it has grown to handle more than just the data of examinations. It now also includes the handling of terms and values, the templates and translators used in automatically generating summaries of examinations, the parsing of patient identifiers, and language handling. This has led to the package being further subdivided into the subpackages aggregation, examination, images, queries, and termValues. The MedViewDataHandler class, a singleton-accessed facade in terms of design patterns, is the single access point for outside packages. There are separate datahandlers for examinations (with a corresponding interface ExaminationDataHandler) and for terms (with a corresponding interface TermDataHandler).
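The facade arrangement described above can be sketched roughly as follows. This is a minimal illustration only: the class name DataHandlerFacade, the method names, and the simplified signatures are hypothetical and do not reproduce the actual MedViewDataHandler API.

```java
import java.util.List;

// Hypothetical, simplified stand-ins for the two interfaces named in the text.
interface ExaminationDataHandler {
    List<String> getExaminationIds(String patientId);
}

interface TermDataHandler {
    List<String> getValues(String term);
}

// Singleton facade: outside packages talk only to this class.
final class DataHandlerFacade {
    private static final DataHandlerFacade INSTANCE = new DataHandlerFacade();

    private ExaminationDataHandler examinations;
    private TermDataHandler terms;

    private DataHandlerFacade() { }

    static DataHandlerFacade instance() {
        return INSTANCE;
    }

    // The concrete handlers can be exchanged (for example a tree-file
    // handler for an RDF-backed one) without callers being affected.
    void setExaminationDataHandler(ExaminationDataHandler handler) {
        examinations = handler;
    }

    void setTermDataHandler(TermDataHandler handler) {
        terms = handler;
    }

    // Calls are simply delegated to whichever handler is installed.
    List<String> getExaminationIds(String patientId) {
        return examinations.getExaminationIds(patientId);
    }

    List<String> getValues(String term) {
        return terms.getValues(term);
    }
}
```

The point of the design is that switching from the tree-file handler to an RDF-backed handler only requires installing a different implementation behind the facade.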

5.2.1 Handling Examinations

The ExaminationDataHandler interface defines methods for getting patient and examination objects, and for saving examinations. The class implementing this interface that was hitherto mostly used is named MVDHandler. A corresponding implementation was created, called SWDHandler (SOMWebDataHandler, or Semantic Web DataHandler if one wants). Quite a few of its methods are taken directly from the MVDHandler, except those that deal directly with the RDF files. The Tree has been an important data structure previously, with subclasses of TreeNode and TreeBranch. The new class for representing examinations is RdfExaminationModel. To be able to pass objects of both class Tree and class RdfExaminationModel from the SOMWeb community to the MedViewDataHandler class, an interface MedViewExamModel is defined, which both the Tree class and the RdfExaminationModel class implement. Many of the methods in the MVDHandler used a class called MedViewUtilities, located in medview.common.data. Methods for constructing ExaminationValueContainers (which are basically a hashmap of terms and their values for a given examination of a given patient) from an RdfExaminationModel, for loading RDF models, and for saving them are found here, with their corresponding methods for handling Trees. Since both Trees and RdfExaminationModels use ExaminationValueContainers, these can be used to convert examinations in the tree file format to the RDF file format.
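The conversion route via a shared value container can be illustrated with a small sketch. All names below (ValueContainer, ExamModel, SimpleTree, SimpleRdfModel) are hypothetical stand-ins, and the "term=value" line format is an assumption made only for this example; it is not the actual tree file format.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Term -> values for one examination of one patient (simplified).
final class ValueContainer {
    final Map<String, List<String>> values = new LinkedHashMap<>();

    void add(String term, String value) {
        values.computeIfAbsent(term, k -> new ArrayList<>()).add(value);
    }
}

// Common interface, so callers need not know the storage format.
interface ExamModel {
    ValueContainer toValueContainer();
}

// Stand-in for the tree representation: assumed "term=value" lines.
final class SimpleTree implements ExamModel {
    private final List<String> lines;

    SimpleTree(List<String> lines) {
        this.lines = lines;
    }

    public ValueContainer toValueContainer() {
        ValueContainer c = new ValueContainer();
        for (String line : lines) {
            String[] parts = line.split("=", 2);
            if (parts.length == 2) {
                c.add(parts[0], parts[1]);
            }
        }
        return c;
    }
}

// Stand-in for the RDF model: (property, object) pairs about one examination.
final class SimpleRdfModel implements ExamModel {
    private final List<String[]> statements = new ArrayList<>();

    void addStatement(String property, String object) {
        statements.add(new String[] { property, object });
    }

    // Builds the RDF-like model from the shared container.
    static SimpleRdfModel fromValueContainer(ValueContainer c) {
        SimpleRdfModel m = new SimpleRdfModel();
        c.values.forEach((term, vs) -> vs.forEach(v -> m.addStatement(term, v)));
        return m;
    }

    public ValueContainer toValueContainer() {
        ValueContainer c = new ValueContainer();
        for (String[] s : statements) {
            c.add(s[0], s[1]);
        }
        return c;
    }
}
```

Converting a tree examination then amounts to SimpleRdfModel.fromValueContainer(tree.toValueContainer()), with the container acting as the format-neutral intermediate.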

5.2.2 Handling Terms

The TermDataHandler interface is implemented by the abstract class AbstractTermDataHandler. In addition to the previous implementation, ParsedTermDataHandler, we now have the RdfTermDataHandler. These classes read the value list files into hashmaps, and allow for accessing terms and values. The values are stored in a Java Hashtable where the key is the term name, and the value is a vector of values for the corresponding term. In RdfTermDataHandler, the terms are found by getting all the named OWL classes in the value list OWL file, which is done by using a Jena method for listing the named classes of an ontology model. For each of these classes, we then use another Jena method, for listing the individuals of a given class. The Swedish rdfs:label values of these individuals are then added to the vector of values for the given term. Since RDF is unordered, these vectors are sorted alphabetically before being added to the Hashtable for the term.
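The final step, building the sorted Hashtable of values per term, might look roughly like the sketch below. The ontology query is replaced by a plain map so that the example stays self-contained; Jena's ontology API offers calls of this kind (for example OntModel.listNamedClasses and OntClass.listInstances), though the report does not say which ones were used. The class name TermValueIndex is hypothetical, and the use of a Swedish Collator is an assumption: the text only says that the values are sorted alphabetically.

```java
import java.text.Collator;
import java.util.Collections;
import java.util.Hashtable;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Vector;

final class TermValueIndex {
    // classToLabels stands in for the result of querying the ontology:
    // named class (term) -> Swedish labels of its individuals, in no
    // particular order, since RDF does not define one.
    static Hashtable<String, Vector<String>> build(Map<String, List<String>> classToLabels) {
        // Assumption: locale-aware ordering, so that Swedish letters sort
        // as a Swedish reader would expect.
        Collator swedish = Collator.getInstance(Locale.forLanguageTag("sv-SE"));
        Hashtable<String, Vector<String>> index = new Hashtable<>();
        for (Map.Entry<String, List<String>> e : classToLabels.entrySet()) {
            Vector<String> values = new Vector<>(e.getValue());
            Collections.sort(values, swedish);
            index.put(e.getKey(), values);
        }
        return index;
    }
}
```

Sorting once at load time keeps the value lists stable between sessions, regardless of the order in which the statements happen to be read from the file.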

6 Discussion

In deciding to remodel the clinical knowledge of MedView, several requirements for an ontology of clinical knowledge in oral medicine were identified (Sec. 2.2). We begin by re-examining these points in light of the described development of the SOMWeb ontologies (Sec. 4), followed by a discussion of our experiences in using OWL. Benefits and constraints of starting from an existing model are then considered, as well as trade-offs between end-user control on the one hand, and reuse and standardization on the other. Finally, we relate this work to standards in medical informatics.

6.1 Results in Relation to the Requirements for an Oral Medicine Ontology

Utilization of External Sources. While the possibility of reuse is attractive, we found it quite difficult to locate relevant ontologies to reuse. This was partly due to a lack of appropriate services for finding ontologies27, and partly due to a lack of ontologies fitting our needs. The approach taken was to search for ontologies that included the concepts we wanted to use. Naming may have been a problem: there could exist ontologies that fit our needs but name the important concepts differently. Once a candidate ontology for reuse is located, it should be evaluated according to our needs. Since not many relevant ontologies were found, there was never much need for such an evaluation. One problem with this approach is that a relevant ontology may become available after the initial search is done; an alternative to repeating the search is considered in Sec. 8. The problems of reuse are widely known (e.g., [37]), and we have indeed found that reuse is difficult, both on the level of finding and on the level of evaluating for use. Further, in the case of an application such as the SOMWeb community, where the users want to have a high level of control over what kind of data they collect, how can this need be balanced with the possibilities of interoperability which may be gained from reuse? While our problems stemmed partly from not finding relevant ontologies in OWL to reuse, they might not be entirely alleviated by more ontologies being published. It has been noted [38] that “As more ontologies become available, it becomes harder, rather than easier to find an ontology for reuse.” This relates to the time and effort needed to evaluate ontologies, which may vary greatly with regard to level of detail and quality, and there are few available objective measures to determine ontology quality.

Relations Between Conceptual Models of Clinical Concepts. The remodeling work has value in itself, as it meant that many of the structures of the knowledge model had to be thought through further. Indeed, one of the reasons often stated for developing an ontology is to elaborate on a common conceptual model [39].

Capturing Interactions Between Different Parts of the Ontology. This requirement concerns representing that, e.g., when a clinician is entering values in an examination form, a certain answer to a specific question triggers another question. It has not been studied within this work, but it would probably be most appropriate to use

27 Though Swoogle (http://swoogle.umbc.edu/) is a good resource for this.

the Semantic Web Rule Language (SWRL)28 for this, if one wants to stay within the sphere of Semantic Web technologies.

Stronger Typing of Elements. Here we get the benefit of working within a larger framework, where structures for this are already in place. However, the consequences of the open world assumption (OWA) for validation mean that we do not get constraints in the way that we had expected.

Capturing Examination Template Meta-data. Using OWL provides good possibilities for capturing meta-data regarding creators and purpose of different examination templates.

Localization of Data. It is possible to provide different language-based versions of the parts of the examination template and the values that can be used, by utilizing the xml:lang of rdfs:label. However, such translations have not been provided, as much manual work is necessary to produce them, though dictionaries could be used to ease the effort. The issue of localization and translation can also be related to the topic of reuse. It would be convenient to be able to reuse value lists, but an appropriate translation to Swedish would be needed.

Differentiating Between Different ‘Views’ of the Underlying Data. While this has not been rigorously addressed in this work, it seems feasible that the RDF graph is a good starting point for such work, where we can ‘pull’ the graph from different nodes (thus making them starting points), thereby providing different perspectives.

Representing Data Ranges. Though not listed as a requirement, the possibility to define data ranges would have been useful for, e.g., representing VAS values. However, as discussed in Sec. 3.4.5, there is currently no support for this in OWL. This is not something that we had anticipated when we considered using OWL.
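For the localization point above, selecting a label by its language tag can be sketched as follows. This is an illustration under stated assumptions: the class LocalizedLabels and its fallback policy are hypothetical, and the labels are held in a plain map rather than read from an RDF model.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Holds the rdfs:label values of one resource, keyed by language tag
// (the value that xml:lang would carry in the RDF/XML file).
final class LocalizedLabels {
    private final Map<String, String> byLang = new LinkedHashMap<>();

    void add(String lang, String text) {
        byLang.put(lang, text);
    }

    // Returns the label in the preferred language if present, otherwise
    // the label in the fallback language, otherwise the first label that
    // was added, otherwise null when there are no labels at all.
    String pick(String preferred, String fallback) {
        if (byLang.containsKey(preferred)) {
            return byLang.get(preferred);
        }
        if (byLang.containsKey(fallback)) {
            return byLang.get(fallback);
        }
        return byLang.isEmpty() ? null : byLang.values().iterator().next();
    }
}
```

With only Swedish labels present, as is currently the case, a request for English would fall back to the Swedish text until translations are supplied.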

6.2 Our Experiences in Using OWL

In deciding to work with a standard (or in W3C terms, a recommendation) for knowledge representation, we increase our possibilities of knowledge sharing and get the option of using general tools developed by others. In using a Semantic Web technology, we also get the prospect of being part of a distributed knowledge base. However, using OWL has not been without issues. This work on the remodeling of the MedView knowledge model of examination templates and value lists has taken place over the course of approximately two years (fall of 2004 to fall of 2006). At the beginning of this process, OWL was a fairly recent recommendation (it was issued in February of 2004). Though ontology research has been an active research field since the 1990s and the precursors of OWL (i.e., OIL, DAML, and DAML+OIL) provided initial expertise, we still experienced a lack of guidance on the appropriate use of many constructs. Some of these problems could be said to hold for knowledge modeling in general. However, in the case of OWL, much of the more practical guidance was to be found in mailing lists,

28 http://www.w3.org/Submission/SWRL/

wikis, and W3C working group notes. While these are public, their existence and use might not be obvious to the novice developer. It was therefore edifying to read the proceedings of the OWL: Experiences and Directions workshop (from which much of the background in Sec. 3.4 is gathered), to gain insight into both how others have used OWL and the problems they have had. An example of the latter is the remarks of the biopathways ontology working group, who found that despite having a thorough background in biological knowledge representation, bioinformatics, software engineering, and database design, they faced many challenges [26].

It is apparent when working with OWL that the language, and especially the guidance on how it should best be used, is still under development. This is also reflected in the W3C Working Group Notes, which are by their very nature works in progress. One such note that has been very useful for this work is “Defining N-ary Relations on the Semantic Web” [32]. In the current draft of this document, dating from April 12, 2006, we find the following note on how to represent units: “For a discussion on how to represent units and quantities in OWL, please refer to a different note (to be written)”. The Best Practice Working Group from which this note resulted concluded on September 29, 2006.

When we first started looking at RDFS and OWL, we expected to be able to specify a schema, which would be used to validate examination records, in addition to being able to give more elaborate definitions of what to include in an examination. As it turns out, this kind of validation is not available without making several assumptions, as described in Sec. 3.4.3. This is something that should be made clearer to newcomers to OWL. Further, there is a need to be able to provide schema functionality within the Semantic Web framework.

The issue of whether or not to use domain and range took quite some time to decide upon. Some of this comes down to whether or not there is one correct way to use OWL, and how important it is to follow this way. Connected with this is that it is difficult, when in the process of creating an ontology, to decide when the ontology is elaborate enough to be used. It is tempting to want to create the ‘perfect’ ontology, one which matches the world it seeks to describe completely. Combined with the lack of support for some design choices in OWL, it becomes very difficult to determine when the ontology is ready to be deployed.

In Sec. 3.4.11, we consider, among other things, whether or not there are valid use cases for using only subsets of OWL. Knublauch et al. [27] see a major benefit in OWL’s breadth, in that it offers a “migration route from entry level, hand-crafted taxonomies of terms, to well defined, normalized ontologies capable of supporting reasoning.” We concur with this, and there is a great need for support for such a route, where the developer is given guidelines on the appropriate way of representing these ‘entry-level’ ontologies, so that they are coherent and so that the path to more well-defined ontologies is clear, or even possible. This will probably be necessary for widespread use of OWL, since it cannot be expected that all Semantic Web developers are, or want to become, Description Logics experts. An assessment of RDF and OWL modeling [40] finds that one of the largest weaknesses of RDF and OWL is their relative unfamiliarity, and that it is unclear in the existing literature how much of the background theory of, for example, DL is needed to develop successful applications.
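The kind of schema-style validation we had initially expected can be approximated in application code by making the closed-world assumption explicit. Under the open world assumption, a value that is not asserted to belong to a class is merely unknown rather than an error, so the check below is not something an OWL reasoner would perform as written. The sketch is an illustration only; the class ClosedWorldCheck and its message format are hypothetical, and it is not part of the MedView code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class ClosedWorldCheck {
    // allowed: term -> permitted values (for example taken from a value list).
    // record:  term -> values entered in one examination record.
    // Returns a description of every violation found; empty if none.
    static List<String> violations(Map<String, Set<String>> allowed,
                                   Map<String, List<String>> record) {
        List<String> problems = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : record.entrySet()) {
            Set<String> permitted = allowed.get(e.getKey());
            if (permitted == null) {
                // Closed world: a term we do not know about is an error.
                problems.add("unknown term: " + e.getKey());
                continue;
            }
            for (String v : e.getValue()) {
                if (!permitted.contains(v)) {
                    problems.add("value not in list for " + e.getKey() + ": " + v);
                }
            }
        }
        return problems;
    }
}
```

Such a check treats the value lists as complete, which is exactly the assumption that OWL itself does not make.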

6.3 Benefits and Constraints of Starting from an Existing Model

The development of the SOMWeb ontologies was strongly based on the knowledge model of MedView. This includes the structure of the examination templates, the structure and content of the value lists, as well as the naming of these structures. In addition to being influenced by the previous representation, the created ontologies were also affected by the organization of the existing code base, such as how and where the templates, term definitions, and term values are processed. For example, the different input types were represented explicitly in the examination template ontologies, as this greatly simplified the processing of the templates within the MedView code.

Taking a previous representation as a starting point for ontology creation simplifies the development, as parts of the knowledge acquisition have already been carried out, and consideration has already been given to the structuring of this knowledge. At the same time, this means that the ontology is developed within certain constraints, and cannot adhere entirely to the new representation paradigm. Further, by developing within an existing code base, we get access to software functions in which the newly developed ontologies can be used, and to which users are accustomed. At the same time, this leads to compromises in the developed ontology that stem from previous code design decisions.

6.4 End-User Control and Standardizations

A general tenet of MedView has been to provide a means for the end users themselves to formulate what to include in an examination template and what values should be contained in the value lists. This user-created knowledge content was taken as a starting point for the SOMWeb ontologies. However, while the value lists have had the ambition of harmonizing the values used in examination records, the users have been able to add values to the lists as they see fit. This ability gives the clinicians control over the value lists, and this control can be seen as a factor motivating use, since it makes it easier for the clinician to fill out the examination forms. As a result, the value lists contain duplicates (different spellings and different names for the same thing), and values that were mistakenly entered. When beginning to remodel the value lists, initial attempts were made by the author to structure and ‘clean up’ these value lists. However, it was realized that, given the size of the value lists, this would mean spending much time on more or less manual work, by a person who is not a domain expert. We therefore decided that all previous values would be included as is, and that the clinicians should be given tools to carry out such ‘clean up’ and structuring by themselves. Such a tool has to be presented to the users in such a way that they see a value in structuring, as it might not seem a natural part of their work tasks. One way to achieve this is for the structuring to be done as a part of the analysis of the collected data, where the clinicians may be most motivated to do it; this is the approach currently used in MedView.

Another way to get more structured value lists would be to reuse external resources. For example, SOMWeb could reuse an externally developed allergens ontology. One obstacle to reuse is, as mentioned, locating appropriate ontologies in the desired formalism. A second obstacle is that the most appropriate ontology might not be at the right level of detail for

our users. This is a general problem of using medical classifications, and an aspect of knowledge structuring which can probably never be avoided completely. At the same time, technological solutions could simplify this. A third obstacle to reuse in our case is that it is slightly at odds with the possibilities of end-user control.

6.5 Standards in Medicine

There are various standardization efforts in medicine, both for terminologies and for representing patient records, to the extent that it is difficult to properly navigate this landscape. We now look more closely at initiatives that are related to modeling clinical knowledge, such as patient records and terminologies, which are relevant to the work described.

6.5.1 Comparison with the openEHR Approach

As an initiative for representing and structuring EHRs, the openEHR project [41] has its origins in the Good European Health Record project, which identified the important requirements for an electronic health record as clinical comprehensiveness, semantic sophistication, sharing, computability and implementability, and open standards. Key among openEHR’s ideas is that domain and technical concerns should be separated, which means that domain concepts should be removed from concrete software and database models and put in independently managed and standardized vocabularies and libraries of domain concept models. Further, the software and database should use a generic reference model system architecture, made to process information from externally provided domain definitions. For defining domain concepts, archetypes are used, since the term connotes an original model or typical specimen. The archetypes are constraint-based definitions of business concepts particular to a certain domain. The openEHR foundation publishes specifications, as well as builds reference implementations of them, as open source software. Its specifications include information and service models for the EHR, demographics, clinical workflow, and archetypes. These are designed to be the basis of a medico-legally sound, distributed, versioned EHR infrastructure.

If we had chosen to use openEHR for remodeling the MedView knowledge model, we would have gained access to a more elaborate framework for handling health records. However, in practice this framework is still under development. For example, the 1.0 version of the specification was released in February, 2006 [41]. Further, openEHR intends only to provide specifications for implementations, not the implementations themselves. The project links to some implementations of different parts of the framework, but there is no obvious package to download to get started. However, given the release of the 1.0 specification and the continued development by the openEHR community, were we to make the choice now, using openEHR would have been given strong consideration.

In using OWL and RDF for our examination records, in addition to using them for representing the community data, we get a community that is grounded in Semantic Web technologies. While the remodeling described above has not provided a format which adheres to the proposed standards of patient records, there is improvement in that the RDF files are not in a proprietary format as the tree files were. They can thus be processed by any Semantic Web

application, leading to more opportunities for information exchange and to leverage externally developed tools. While there have been initiatives to represent the archetypes of openEHR in OWL, e.g., [42], it is not the recommended approach, as there are, for example, constraints that can be represented in the Archetype Definition Language which cannot be represented in the current version of OWL.29

6.5.2 External Classifications

In addition to considering the structure of examination templates and examination records, there are also classifications and terminologies which could have been used as value lists. One commonly used classification is the International Classification of Diseases (ICD). It is published by the World Health Organization and has a clinical adaptation (ICD-9-CM). Diseases are divided into categories with a common characteristic, and each category is sub-divided into hierarchical levels that enable a precise diagnosis. Another classification is SNOMED (Systematized Nomenclature of Medicine), where terms from different categories (there are eleven main axes, among them topography, morphology, social context, and disease) are juxtaposed.

In the dental domain, the American Dental Association provides both a Current Dental Terminology (CDT), which is updated regularly but limited to treatments and procedures, and a Systematized Nomenclature of Dentistry (SNODENT), which is an effort to create a comprehensive dental vocabulary. However, it has been found that SNODENT needs improvements in content, quality of coding, and quality of ontological structure [43].

In the MedView representation, ICD codes are included in some of the names of diagnosis values. In the SOMWeb representation, such codes can be included as properties of instances of the Diagnosis class (see example in Sec. 4.4.1). Preferably, ICD as a whole could have been reused, but it is not available in OWL. Our approach has the advantage of only providing the subset of values that our users are interested in. SNOMED lacks many of the concepts relevant to oral medicine, and unfortunately SNODENT is also lacking in content, both according to [43] and our domain experts. Reusing classifications presents difficulties in that some medical entities will seem insufficiently detailed to some specialists, and too complex to use for others. Further, it is in conflict with requirements of end-user control, as discussed in Sec. 6.4.

7 Conclusions

This report describes how the W3C recommendations OWL and RDF can be used for representing clinical knowledge in oral medicine. Contributions of this work are: the remodeling of the knowledge content of MedView, previously in a proprietary format, using OWL and RDF;

29 See for example the following note from an openEHR mailing list: http://www.openehr.org/advice/openehr-technical/msg00966.html

the use of these ontologies in an online community for distributed knowledge management in oral medicine; and an experience report of using these recommendations.

From the limitations of MedView’s original knowledge model, requirements for a new model were identified. Of these requirements, OWL and RDF have proven useful for capturing meta-data, different language versions, and providing a stronger typing of elements. While using these recommendations gives better opportunities for knowledge reuse in theory, it was found that in practice it was difficult to find ontologies to reuse that were at an appropriate level of conceptualization and were available in OWL. For accommodating the requirement of supplying different views on the data, the graph structure of RDF can be used, though this has not yet been exploited. Further, the process of remodeling has given opportunities for elaborating the conceptual models of MedView, which was another requirement. Of the requirements identified, the only one which cannot be readily handled is that of capturing interactions between different parts of the examination template, for which using SWRL would be more appropriate.

We examined how clinical software can be supported by ontologies by adapting existing MedView and SOMWeb classes to handling examination templates in OWL and writing examination records in RDF. This led to some reworking of the ontology to better fit the way that MedView handles data, while some abstractions had to be made in the old code to handle the new concepts.

In using OWL and RDF it was found that, while there is much useful support material available, there is some lack of support for important design decisions, and best practice guidelines are still under development. Much practical information is contained in informal knowledge sources, such as mailing lists, of which new developers may not be aware. Finally, there is a lack of recommendations on what is necessary for developing ontologies with different levels of sophistication, and of guiding materials at an intermediary level. At the same time, using OWL gives us access to a potentially beneficial array of externally developed tools, the ability to come back and refine the knowledge model after initial deployment, as well as the possibility of, after development and use, aligning the developed ontology with that of potential collaborators.

8 Future Work

In continuing and expanding on the remodeling of MedView’s knowledge model, we need to provide tools for the clinicians to be able to add detail to and refine the ontologies described above. The interface for allowing users to define examination templates and terms has not been adapted to the new representation. Parts of this should be fairly straightforward, since in the base case we can just use a MedForm XML template and translate it into the OWL format, as we have described above. However, since the new OWL representation supports some new constructs, the interface will have to be adapted to reflect this. Since we are now able to add more detail and structure to the value lists and aggregates, a whole new interface will probably need to be built, where one of the main functions would be to provide subclass relations. There has yet to be a thorough evaluation of the developed ontologies, either from a technical

perspective or from a user perspective. However, in some ways they have gone through an iteration of evaluation and adjustments by being used in the MedView datahandling and the SOMWeb community. To demonstrate the added value of using Semantic Web representations, it would be interesting to look at how available tools for browsing RDF can be used with the content of SOMWeb. As stated above, the requirement of representing interrelations between parts of examination templates could be handled using SWRL, which has yet to be done.

In constructing knowledge management tools, we have to keep in mind that different people will have different views of the same concept. It has been suggested that a Pragmatic Web [44] be added to the Semantic Web, to address the problems associated with trying to produce ‘the’ description of a joint reality, by using a pragmatic approach which allows for contradictions and different views on the same phenomenon. Proponents of the Pragmatic Web propose that this be done by adding a pragmatic context, which consists of both a common context and a set of individual contexts, to the semantic resources. We believe that pragmatic contexts could be useful for distributed knowledge management, and suggest that the SOMWeb community provides a good backdrop for investigating such contexts.

Our experiences in using OWL and RDF have shown that more guidelines for using them are needed, especially at an intermediary level, where the developer has moved beyond the introductory materials but is not yet prepared for the more advanced nuances of using OWL. Further, methodologies for ontology development should cater to different levels of ontology sophistication, so that the developer knows what is necessary for reaching an intended result. More cognitive support for making decisions in the modeling process would be beneficial for this.

Finally, one rather trivial problem related to ontology reuse is that a relevant ontology may become available after the initial search is completed. An alternative to having to go back and repeat the search at certain intervals is to have an alerting tool, where the user could enter concept or domain names, and be notified if a relevant ontology is published.

References

[1] Rosenberg, W.M.C., Donald, A.: Evidence based medicine: An approach to clinical problem solving. Brit. Med. J. 310(6987) (1995) 1122–1126 [2] Sackett, D.L., Rosenberg, W.M.C., Gray, J.A.M., Haynes, R.B., Richardson, W.S.: Ev- idence based medicine: What it is and what it isn’t. Brit. Med. J. 312(7023) (1996) 71–72 [3] Jontell, M., Mattsson, U., Torgersson, O.: MedView: An instrument for clinical research and education in oral medicine. Oral Surg. Oral Med. Oral Pathol. Oral Radiol. Endod. 99 (2005) 55–63 [4] Gustafsson, M., Lindahl, F., Falkman, G., Torgersson, O.: Enabling an online community for sharing oral medicine cases using Semantic Web technologies. In: Proc. Int. Semantic Web Conf. (ISWC-06). (2006) 820–832 [5] Lindahl, F., Torgersson, O.: mGen – An open source framework for generating clinical documents. In: Proc. Medical Informatics Europe (MIE-05). (2005) 107–112 [6] Halln¨as, L.: Partial inductive definitions. Theoretical Computer Science 87(1) (1991) 115–142 [7] Falkman, G., Torgersson, O.: MedView: a declarative approach to evidence-based medicine. In: Proc. Medical Informatics Europe (MIE-02). (2002) 577–581 [8] Gustafsson, M., Falkman, G.: Representing clinical knowledge in oral medicine using ontologies. In: Proc. Medical Informatics Europe (MIE-05). (2005) 743–748 [9] Antoniou, G., Van Harmelen, F.: A Semantic Web Primer. MIT Press (2004) [10] Manola, F., Miller, E.: RDF Primer. W3C Recommendation (2004) Available at: http://www.w3.org/TR/2004/REC-rdf-primer-20040210/. [11] McGuinness, D., van Harmelen, F.: OWL Web Ontology Language: Overview. W3C Recommendation (2004) Available at: http://www.w3.org/TR/2004/ REC-owl-features-20040210/. [12] Schneider, L.: Foundational ontologies and the realist bias. In: Proc. Workshop on Ref- erence Ontologies and Application Ontologies held in conjunction with the 26th German Conf. on AI. (2003) [13] Masolo, C., Borgo, S., Gangemi, A., Guarino, N., Oltramari, A.: Ontology library. 
Technical report, WonderWeb Deliverable D18 (2003) Available at: http://wonderweb. semanticweb.org/deliverables/documents/D18.pdf. [14] Gruber, T.: A translation approach to portable ontologies. Knowledge Acquisition 5(2) (1993) 199–220 [15] Borst, W.: Construction of Engineering Ontologies. PhD thesis, University of Twente, Enschede (1997) [16] Horrocks, I., Patel-Schneider, P.F., van Harmelen, F.: From SHIQ and RDF to OWL: The making of a web ontology language. Journal of Web Semantics 1(1) (2003) 7–26

[17] Patel-Schneider, P.F., Fensel, D.: Layering the Semantic Web: Problems and directions. In: Proc. 1st Int. Semantic Web Conf. (ISWC-02). (2002) 16–29
[18] Minsky, M.: A Framework for Representing Knowledge. In: The Psychology of Computer Vision. McGraw-Hill (1975)
[19] Brachman, R.J., Levesque, H.J.: Knowledge Representation and Reasoning. Elsevier/Morgan Kaufmann Publishers (2004)
[20] Horrocks, I., Sattler, U., Tobies, S.: Practical reasoning for very expressive description logics. J. Interest Group in Pure and Applied Logic 8(3) (2000) 239–264
[21] Horridge, M., Knublauch, H., Rector, A., Stevens, R., Wroe, C.: A practical guide to building OWL ontologies using the Protégé-OWL Plugin and CO-ODE tools. Technical report, University of Manchester (2004) Available at: http://www.co-ode.org/resources/tutorials/ProtegeOWLTutorial.pdf.
[22] Mutton, P., Golbeck, J.: Visualization of semantic metadata and ontologies. In: Proc. 7th Int. Conf. on Information Visualization (IV-03). (2003) 300–305
[23] Geroimenko, V., Chen, C.: Visualising the Semantic Web. Springer Verlag (2002)
[24] Haarslev, V., Möller, R.: Racer: A Core Inference Engine for the Semantic Web. In: Proc. 2nd Int. Workshop on Evaluation of Ontology-based Tools in conjunction with 2nd Int. Semantic Web Conf. (2003) Available at: http://sunsite.informatik.rwth-aachen.de/Publications/CEUR-WS//Vol-87/EON2003_Haarslev.pdf.
[25] Russell, S., Norvig, P.: Artificial Intelligence: A Modern Approach. Prentice-Hall, Englewood Cliffs, NJ (1995)
[26] Ruttenberg, A., Rees, J., Luciano, J.: Experience using OWL DL for the exchange of biological pathway information. In: Proc. OWL: Experiences and Directions Workshop 2005. (2005) Available at: http://www.mindswap.org/OWLWorkshop/sub37.pdf.
[27] Knublauch, H., Horridge, M., Musen, M., Rector, A., Stevens, R., Drummond, N., Lord, P., Noy, N., Seidenberg, J., Wang, H.: The Protégé OWL experience. In: Proc. OWL: Experiences and Directions Workshop 2005. (2005) Available at: http://www.mindswap.org/OWLWorkshop/sub14.pdf.
[28] Rector, A.: Defaults, context, and knowledge: Alternatives for OWL-indexed knowledge bases. In: Proc. Pacific Symposium on Biocomputing, World Scientific (2004) 226–237
[29] Liebig, T., Luther, M., Noppens, O., Paolucci, M., Wagner, M., von Henke, F.: Building Applications and Tools for OWL – Experiences and Suggestions. In: Proc. OWL: Experiences and Directions Workshop 2005. (2005) Available at: http://www.mindswap.org/OWLWorkshop/sub28.pdf.
[30] Dean, M., Schreiber, G.: OWL Web Ontology Language: Reference. W3C Recommendation (2004) Available at: http://www.w3.org/TR/2004/REC-owl-ref-20040210/.
[31] Rector, A., Drummond, N., Horridge, M., Rogers, J., Knublauch, H., Stevens, R., Wang, H., Wroe, C.: OWL pizzas: Practical experience of teaching OWL-DL: Common errors and common patterns. In: Proc. Int. Conf. on Knowledge Engineering and Knowledge Management (EKAW-04). (2004) 63–81

[32] Noy, N., Rector, A.: Defining n-ary relations on the Semantic Web. W3C Working Group Note (2006) Available at: http://www.w3.org/TR/swbp-n-aryRelations/.
[33] Drummond, N., Rector, A., Stevens, R., Moulton, G., Horridge, M., Wang, H., Seidenberg, J.: Putting OWL in order: Patterns for sequences in OWL. In: Proc. 2nd Workshop on OWL: Experiences and Directions. (2006) Available at: http://owl-workshop.man.ac.uk/acceptedLong/submission_12.pdf.
[34] Ceusters, W., Smith, B.: Strategies for referent tracking in electronic health records. J. Biomedical Informatics 39 (2006) 362–378
[35] Clark, T., Martin, S., Liefeld, T.: Globally distributed object identification for biological knowledgebases. Briefings in Bioinformatics 5 (2004) 59–70
[36] Good, B., Wilkinson, M.: The life sciences Semantic Web is full of creeps! Briefings in Bioinformatics 7(3) (2006) 275–286
[37] Paslaru Bontas, E., Mochol, M., Tolksdorf, R.: Case studies on ontology reuse. In: Proc. 5th Int. Conf. on Knowledge Management (I-Know-05). (2005)
[38] Noy, N.: Order from chaos. Queue 3(8) (2005) 42–49
[39] Noy, N., McGuinness, D.: Ontology development 101: A guide to creating your first ontology. Stanford Knowledge Systems Laboratory Technical Report (2001) Available at: http://ksl.stanford.edu/people/dlm/papers/ontology-tutorial-noy-mcguinness.pdf.
[40] Reynolds, D., Thompson, C., Mukerji, J., Coleman, D.: An assessment of RDF/OWL modelling. Technical report, Hewlett Packard Labs (2005) Available at: http://www.hpl.hp.com/techreports/2005/HPL-2005-189.html.
[41] OpenEHR foundation: Introducing openEHR (2005) Available at: http://svn.openehr.org/specification/TRUNK/publishing/openEHR/introducing_openEHR.pdf.
[42] Bicer, V., Kilic, O., Dogac, A., Laleci, G.: Archetype-based semantic interoperability of web service messages in the health care domain. Int. J. on Semantic Web and Information Systems 1(4) (2005) 1–23
[43] Goldberg, L.J., Ceusters, W., Eisner, J., Smith, B.: The significance of SNODENT. In: Proc. Medical Informatics Europe (MIE-05). (2005) 737–742
[44] de Moor, A.: Patterns for the Pragmatic Web. In: Proc. Int. Conference on Conceptual Structures. (2005) 1–18

A MedView XML Examination Template for Meeting Consultation

Fredrik Lindahl/Marie Gustafsson (translation)
2005-11-23
SOMWeb form for meeting consultations
Admin General information
CID Case ID Automatically generated by the system
DID Consultation ID Automatically generated by the system
Reg-person Registering care-giver
Reg-clinic Registering clinic
Meeting Meeting consultation
Treat-sugg Suggested action/treatment
Note-epi Epikris Collected judgement of the case

B SOMWeb OWL Examination Template for Meeting Consultation

A meeting consultation template for the SOMWeb online community. General information 1 1

1 1

Meeting consultation true true false true Suggested action/treatment false false true false Case ID Automatically generated by the system true true

false false Epikris Collected judgement of the case false true true false Registering clinic false true true false Registering care-giver false false true false Consultation ID Automatically generated by the system

C Part of the SOMWeb Value List

termValues multiple termValues regular termValues regular termValues multiple Eucardic termValues Inredningssnickare termValues Doktacillin termValues

D Example Examination Instance

somwebExam#CaseFormExam
<somweb:hasExaminationCategory rdf:resource="somwebExam#PatientSMW0000729771_060818235618"/>
InterestingCase
Mats Jontell
26
Ingen storre forandring
