Semantic Issues in Integrating Data from Different Models to Achieve Data Interoperability

MEDINFO 2007
K. Kuhn et al. (Eds)
IOS Press, 2007
© 2007 The authors. All rights reserved.

Rahil Qamar, Alan Rector
Medical Informatics Group, University of Manchester, Manchester, U.K.

Abstract

Matching clinical data to codes in controlled terminologies is the first step towards achieving standardisation of data for safe and accurate data interoperability. The MoST automated system was used to generate a list of candidate SNOMED CT code mappings. The paper discusses the semantic issues which arose when generating lexical and semantic matches of terms from the archetype model to relevant SNOMED codes. It also discusses some of the solutions that were developed to address the issues. The aim of the paper is to highlight the need to be flexible when integrating data from two separate models. However, the paper also stresses that the context and semantics of the data in either model should be taken into consideration at all times to increase the chances of true positives and reduce the occurrence of false negatives.

Keywords:
medical informatics computing, semantic mapping, term binding, data interoperability

Introduction

The field of medical informatics is growing rapidly, and with it informatics solutions to improve the health care process. One of the more important issues gaining the attention of clinical and informatics experts is the need to control the vocabulary used to record patient data. Terminologies covering various aspects of medicine are being developed to bring about some standardisation of data. However, these terminologies are seldom integrated into the information systems which are used to capture data at various stages of the health care process.

The paper briefly discusses a middleware application developed to map back-end data model terms to clinical terminologies. However, the main focus of the paper is to highlight the issues encountered when integrating the terminology and data models to achieve the common objective of interoperable patient data through the use of controlled vocabularies.

Archetype models, commonly referred to as archetypes, and the SNOMED CT terminology system were used to test the mapping process. Therefore, the issues discussed are focussed on these two modeling techniques.

Background

Archetype models

In this paper, archetypes refer to the openEHR archetypes. Archetypes are being put forward by the openEHR organisation as a method for modeling clinical concepts. The models conform to the openEHR Reference Model (RM). Archetypes are computable expressions of a domain content model of medical records. The expression is in the form of structured constraint statements, inherited from the RM [1]. The intended purpose of archetypes is to empower clinicians to define the content, semantics and data-entry interfaces of systems independently from the information systems [1]. Archetypes were selected because they separate the internal model data from formal terminologies. The internal data is assigned local names which can later be bound or mapped to external terminology codes. This feature eliminates the need to make changes to the model whenever the terminology changes.

SNOMED CT terminology

SNOMED CT, also referred to as SNOMED in the paper, aims to be a comprehensive terminology that provides clinical content and expressivity for clinical documentation and reporting [3]. SNOMED has been developed using the description logic (DL) Ontylog [4] to allow formal representation of the meanings of concepts and their inter-relationships [5]. SNOMED concepts are placed in a subsumption, i.e. ‘is_a’, hierarchy. Two concepts may also be linked to each other in terms of role value maps, and defining or primitive concepts7. The SNOMED hierarchy is easy to compute, which was the primary reason for selecting the terminology for the research. The July 2006 release of SNOMED was used for testing the mapping approach. It has approximately 370,000 concepts and 1.5 million triples, i.e. relationships of one concept with another in the terminology.

Data integration issues

The aim of the research is to enable health care professionals to capture information in a precise, standardised, and reproducible manner to achieve data interoperability. Data interoperability can be defined as the ability to transfer data to and use data in any conforming system such that the original semantics of the data are retained irrespective of its point of access. Standardised data is critical to exchanging information accurately among widely distributed and differing users.

Matching clinical data to codes in controlled terminologies is the first step towards achieving standardisation of data. The research focuses on performing automated lexical and semantic matches of clinical model data to standard terminology codes using the Model Standardisation using Terminology (MoST) system. A list of candidate SNOMED codes generated by MoST is presented to the modeler, who chooses from the list the most relevant codes to bind the archetype term to. However, the matching task is made difficult for two main reasons. First, the size and complexity of the terminologies make it difficult to search for semantic matches. Second, the ambiguity of the intended meaning of data in both the data models and terminology systems results in inconsistent matches with terminology codes. The paper will focus on the second issue and suggest some of the solutions that were developed to resolve ambiguity of purpose and use of archetype terms with respect to SNOMED.

In this paper, the term ‘modeler’ is used to denote a person with clinical knowledge who is engaged in the task of modeling clinical data for use in information systems. Four modelers took part in the study and provided their feedback on the results of the mapping. The second author was also involved in the evaluation process.

Issues with archetype models

Archetypes can be regarded as models of use. Archetypes based on the openEHR specification have four main ENTRY types, i.e. Evaluation, Instruction, Action, and Observation [12]. For the research, only Observation archetypes were considered as they are the most commonly used model type with the maximum number of examples. However, there is no strict guideline used to categorise archetypes. Also, not all archetype terms contained in a particular archetype model necessarily belong to the same archetype model category. For example, the ‘autopsy’ archetype (see footnote 1) belongs to an Observation type. However, the ‘cardiovascular system’ term contained in the model could belong to either the SNOMED category ‘body structure’ or ‘finding’ based on the intended meaning of the modeler. The SNOMED category ‘clinical finding’ is referred to as ‘finding’ and ‘observable entity’ is referred to as ‘observable’ in the paper.

Footnote 1: openEHR archetypes obtained from http://svn.openehr.org/knowledge/archetypes/dev/html/index_en.html

1) Determining the semantics of use: The main issue encountered when looking up matches for archetype terms in SNOMED was the semantics of use. As stated earlier, there are no strict guidelines to categorise either the archetype models or the terms contained in them. The main reason is that archetype models are intended to be used as archetypical representations of a particular clinical scenario. In addition, archetype modelers do not always have SNOMED in mind when modeling clinical scenarios. For example, the recording of the Apgar score of a neonate at 1, 5 and 10 minutes from birth will not only require assessment of the breathing, color, reflexes, heart rate, and muscle tone but also the total score calculated at the end of the assessment. Using SNOMED, the assessment terms can be categorised as observables or findings, among other intended semantics, while the ‘total’ could be a qualifier value or an observable with a value. Therefore, it is important when working with two different models that loose semantics are maintained initially unless stated otherwise. Too much reliance on the categorisation of terms in a particular model will result in fewer matches with terms categorised differently in another model. Strict adherence to categories can only be maintained when working within the same modeling environment, as it can be assumed that the hierarchies are logically sound and classifiable.

2) Determining the source of semantics: Another issue that was commonly observed was determining the main source of the semantics of an archetype term. For example, in the ‘blood film’ archetype, the term ‘haemoglobin’ had a local meaning of ‘the mass concentration of haemoglobin’, shown in Figure 1. The modeler was asked whether he would prefer a match for ‘haemoglobin’ or ‘haemoglobin concentration’. The suggestion was that a match for the latter term would be closer to its intended meaning. However, the assumption that followed from this suggestion, namely that more weight should be assigned to the term definition, proved incorrect in the ‘autopsy’ archetype. Under that weighting, the term ‘cardiovascular system’, defined in the model as ‘findings of the pericardium, heart and large vessels’, should have resulted in ‘findings’ of the cardiovascular system instead of the ‘body structure’ itself. However, in this case the modeler stated that the term was intended to serve as a label, i.e. ‘body structure’, rather than as the actual values, i.e. ‘findings’, to be entered during data-entry. Therefore, it is advisable not to depend on any single semantic source when looking up matches for an archetype term.

Figure 1 - Haemoglobin-related issues in blood film archetype

3) Spelling errors leading to incorrect or no matches: Finally, there was the issue of resolving spelling errors in the model. For instance, ‘haemoglobin’ using U.K. English was spelt as ‘haemaglobin’ in the ‘blood film’ archetype. Such spelling errors can give rise to incorrect or no results.
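The three issues above can be illustrated with a minimal sketch. It is not the MoST implementation, whose internals are not described here. It only assumes that candidates are collected from more than one semantic source (the term, its local definition, and both combined), that no category filter is applied, and that small spelling errors are tolerated through approximate string matching. The mini-terminology, the placeholder ids T1 to T4, the function names and the 0.85 cutoff are all hypothetical; the ids are not SNOMED CT codes.

```python
from difflib import SequenceMatcher

# Hypothetical mini-terminology: (placeholder id, description, category).
# The ids are invented for illustration and are NOT SNOMED CT codes.
TERMINOLOGY = [
    ("T1", "haemoglobin", "substance"),
    ("T2", "haemoglobin concentration", "observable"),
    ("T3", "cardiovascular system", "body structure"),
    ("T4", "finding of cardiovascular system", "finding"),
]


def close(a, b, cutoff=0.85):
    """True if two words are equal or differ only by a small spelling error."""
    return SequenceMatcher(None, a, b).ratio() >= cutoff


def coverage(description, source_text, cutoff=0.85):
    """Fraction of the description's words found (approximately) in source_text."""
    desc_words = description.lower().split()
    src_words = source_text.lower().replace(",", " ").split()
    hits = sum(any(close(d, s, cutoff) for s in src_words) for d in desc_words)
    return hits / len(desc_words)


def candidates(term, definition, threshold=1.0):
    """Return (id, description, category, source) for every entry whose words
    are fully covered by the term, the definition, or both combined.
    No category filter is applied, so the semantics stay loose."""
    sources = (
        ("term", term),
        ("definition", definition),
        ("term+definition", f"{term} {definition}"),
    )
    found = []
    for code, description, category in TERMINOLOGY:
        for source, text in sources:
            if coverage(description, text) >= threshold:
                found.append((code, description, category, source))
                break  # report each entry once, under the first source that covers it
    return found


blood_film = candidates("haemaglobin", "the mass concentration of haemoglobin")
autopsy = candidates(
    "cardiovascular system",
    "findings of the pericardium, heart and large vessels",
)
```

In this sketch the misspelt term ‘haemaglobin’ still yields the ‘haemoglobin’ entry through the term and the ‘haemoglobin concentration’ entry through the definition, while ‘cardiovascular system’ yields both a ‘body structure’ and a ‘finding’ entry. The choice between them is left to the modeler, as in the study.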
