Semantic Issues in Integrating Data from Different Models to Achieve Data Interoperability

Total Page:16

File Type:pdf, Size:1020Kb

Semantic Issues in Integrating Data from Different Models to Achieve Data Interoperability MEDINFO 2007 K. Kuhn et al. (Eds) IOS Press, 2007 © 2007 The authors. All rights reserved. Semantic Issues in Integrating Data from Different Models to Achieve Data Interoperability Rahil Qamara, Alan Rectora a Medical Informatics Group, University of Manchester, Manchester, U.K. Abstract Background Matching clinical data to codes in controlled terminolo- Archetype models gies is the first step towards achieving standardisation of In this paper, archetypes refer to the openEHR archetypes. data for safe and accurate data interoperability. The MoST Archetypes are being put forward by the openEHR organi- automated system was used to generate a list of candidate sation as a method for modeling clinical concepts. The SNOMED CT code mappings. The paper discusses the models conform to the openEHR Reference Model (RM). semantic issues which arose when generating lexical and Archetypes are computable expressions of a domain content semantic matches of terms from the archetype model to rel- model of medical records. The expression is in the form of evant SNOMED codes. It also discusses some of the structured constraint statements, inherited from the RM [1]. solutions that were developed to address the issues. The The intended purpose of archetypes is to empower clinicians aim of the paper is to highlight the need to be flexible when to define the content, semantics and data-entry interfaces of integrating data from two separate models. However, the systems independently from the information systems [1]. paper also stresses that the context and semantics of the Archetypes were selected because of their feature to sepa- data in either model should be taken into consideration at rate the internal model data from formal terminologies. The all times to increase the chances of true positives and internal data is assigned local names which can later be reduce the occurrence of false negatives. bound or mapped to external terminology codes. This fea- ture eliminates the need to make changes to the model Keywords: whenever the terminology changes. medical informatics computing, semantic mapping, term SNOMED CT terminology binding, data interoperability SNOMED-CT, also referred to as SNOMED in the paper, Introduction aims to be a comprehensive terminology that provides clinical content and expressivity for clinical documenta- The field of medical informatics is growing rapidly and tion and reporting [3]. SNOMED has been developed with it informatics solutions to improve the health care using the description logic (DL) Ontylog [4] to allow for- process. One of the more important issues gaining the mal representation of the meanings of concepts and their attention of clinical and informatics experts is the need to inter-relationship [5]. SNOMED concepts are placed in a control the vocabulary used to record patient data. Termi- subsumption i.e., ‘is_a’ hierarchy. Two concepts may also nologies covering various aspects of medicine are being be linked to each other in terms of role value maps, and developed to bring about some standardisation of data. defining or primitive concepts7. The SNOMED hierarchy However, these terminologies are seldom integrated into is easy to compute, which was the primary reason for information systems which are used to capture data at var- selecting the terminology for the research. The July 2006 ious stages of the health care process. release of SNOMED was used for testing the mapping approach. It has approximately 370,000 concepts and 1.5 The paper briefly discusses a middleware application million triples i.e. relationships of one concept with developed to map back-end data model terms to clinical another in the terminology. terminologies. However, the main focus of the paper is to highlight the issues encountered when integrating the ter- minology and data models to achieve the common Data integration issues objective of interoperable patient data through the use of The aim of the research is to enable health care profession- controlled vocabularies. als to capture information in a precise, standardised, and Archetype models, commonly referred to as archetypes, reproducible manner to achieve data interoperability. Data and the SNOMED CT terminology system were used to interoperability can be defined as the ability to transfer test the mapping process. Therefore, the issues discussed data to and use data in any conforming system such that are focussed on these two modeling techniques. the original semantics of the data are retained irrespective of its point of access. Standardised data is critical to 674 R. Qamar et al. / Semantic Issues in Integrating Data from Different Models to Achieve Data Interoperability exchanging information accurately among widely distrib- can be categorised as observables or findings, among other uted and differing users. intended semantics, while the ‘total’ could be a qualifier value or an observable with a value. Therefore, it is impor- Matching clinical data to codes in controlled terminologies tant when working with two different models that loose is the first step towards achieving standardisation of data. semantics are maintained initially unless stated otherwise. The research focuses on accomplishing the task of perform- Too much reliance on the categorisation of terms in a par- ing automated lexical and semantic matches of clinical ticular model will result in fewer matches with terms model data to standard terminology codes using the Model categorised differently in another model. Strict adherence Standardisation using Terminology (MoST) system. A list to categories can only be maintained when working within of candidate SNOMED codes generated by MoST are pre- the same modeling environment as it can be assumed that sented to the modeler who chooses from the list the most the hierarchies are logically sound and classifiable. relevant codes to bind the archetype term to. However, the matching task is made difficult because of two main rea- 2) Determining the source of semantics: Another issue that sons. First, the size and complexity of the terminologies was commonly observed was determining the main source makes it difficult to search for semantic matches. Second, of the semantics of an archetype term. For example, in the the ambiguity of the intended meaning of data in both the ‘blood film’ archetype, the term ‘haemoglobin’ had a local data models and terminology systems results in inconsistent meaning of ‘the mass concentration of haemoglobin’, matches with terminology codes. The paper will focus on shown in Figure 1. The modeler was questioned whether the second issue and suggest some of the solutions that were he would prefer a match for ‘haemoglobin’ or ‘haemoglo- developed to resolve ambiguity of purpose and use of arche- bin concentration’. The suggestion was that a match for the type terms with respect to SNOMED. later term would be closer to its intended meaning. How- ever, based on this suggestion, the assumption of assigning In this paper, the term ‘modeler’ is used to denote a person more weightage to the term definition proved incorrect in with clinical knowledge who is engaged in the task of the ‘autopsy’ archetype. Conversely based on the above modeling clinical data for use in information systems. Four weightage, the term ‘cardiovascular system’ defined in the modelers were used to conduct the study and provide their model as 'findings of the pericardium, heart and large ves- feedback on the results of the mapping. The second author sels' should have resulted in ‘findings’ of the was also involved in the evaluation process. cardiovascular system instead of the ‘body structure’ Issues with archetype models itself. However, in this case the modeler stated that the use of the term was intended to serve as a label i.e. ‘body struc- Archetypes can be regarded as models of use. Archetypes ture’ rather than the actual values i.e. ‘findings’ to be based on the openEHR specification have four main entered during data-entry. Therefore, it is advisable not to ENTRY types i.e. Evaluation, Instruction, Action, and depend on any single semantic source when looking up Observation [12]. For the research, only Observation matches for an archetype term. archetypes were considered as they are the most com- monly used model type with the maximum number of 3) Spelling errors leading to incorrect or no matches: examples. However, there is no strict guideline used to cat- Finally, there was the issue of resolving spelling errors in egorise archetypes. Also all archetype terms contained in a the model. For instance, ‘haemoglobin’ using U.K. particular archetype model do not necessarily belong to the English was spelt as ‘haemaglobin’ in the ‘blood film’ same archetype model category. For example, the archetype. Such spelling errors can give rise to incorrect or ‘autopsy’ archetype1 belongs to an Observation type. no results. However, the ‘cardiovascular system’ term contained in the model could belong to either the SNOMED category ‘body structure’ or ‘finding’ based on the intended mean- ing of the modeler. The SNOMED category ‘clinical finding’ is referred to as ‘finding’ and ‘observable entity’ is referred to as ‘observable’ in the paper. 1) Determining the semantics of use: The main issue encountered when looking up matches for archetype terms in SNOMED was the semantics of use. As stated earlier, there are no strict guidelines to categorise either the arche- type models or the terms contained in them. The main reason being that archetype models are intended to be used as archetypical representations of a particular clinical sce- nario. In addition, archetype modelers do not always have SNOMED in mind when modeling clinical scenarios. For example, the recording of the apgar score of a neonate at 1, 5 and 10 minutes from birth will not only require assess- ment of the breathing, color, reflexes, heart rate, and muscle tone but also the total score calculated at the end of the assessment. Using SNOMED, the assessment terms Figure 1 - Haemoglobin-related issues in blood film archetype 1 openEHR archetypes obtained from http://svn.openehr.org/ knowledge/archetypes/dev/html/index_en.html 675 R.
Recommended publications
  • Ontology and Information Systems
    Ontology and Information Systems 1 Barry Smith Philosophical Ontology Ontology as a branch of philosophy is the science of what is, of the kinds and structures of objects, properties, events, processes and relations in every area of reality. ‘Ontology’ is often used by philosophers as a synonym for ‘metaphysics’ (literally: ‘what comes after the Physics’), a term which was used by early students of Aristotle to refer to what Aristotle himself called ‘first philosophy’.2 The term ‘ontology’ (or ontologia) was itself coined in 1613, independently, by two philosophers, Rudolf Göckel (Goclenius), in his Lexicon philosophicum and Jacob Lorhard (Lorhardus), in his Theatrum philosophicum. The first occurrence in English recorded by the OED appears in Bailey’s dictionary of 1721, which defines ontology as ‘an Account of being in the Abstract’. Methods and Goals of Philosophical Ontology The methods of philosophical ontology are the methods of philosophy in general. They include the development of theories of wider or narrower scope and the testing and refinement of such theories by measuring them up, either against difficult 1 This paper is based upon work supported by the National Science Foundation under Grant No. BCS-9975557 (“Ontology and Geographic Categories”) and by the Alexander von Humboldt Foundation under the auspices of its Wolfgang Paul Program. Thanks go to Thomas Bittner, Olivier Bodenreider, Anita Burgun, Charles Dement, Andrew Frank, Angelika Franzke, Wolfgang Grassl, Pierre Grenon, Nicola Guarino, Patrick Hayes, Kathleen Hornsby, Ingvar Johansson, Fritz Lehmann, Chris Menzel, Kevin Mulligan, Chris Partridge, David W. Smith, William Rapaport, Daniel von Wachter, Chris Welty and Graham White for helpful comments.
    [Show full text]
  • Word Senses and Wordnet Lady Bracknell
    Speech and Language Processing. Daniel Jurafsky & James H. Martin. Copyright © 2021. All rights reserved. Draft of September 21, 2021. CHAPTER 18 Word Senses and WordNet Lady Bracknell. Are your parents living? Jack. I have lost both my parents. Lady Bracknell. To lose one parent, Mr. Worthing, may be regarded as a misfortune; to lose both looks like carelessness. Oscar Wilde, The Importance of Being Earnest ambiguous Words are ambiguous: the same word can be used to mean different things. In Chapter 6 we saw that the word “mouse” has (at least) two meanings: (1) a small rodent, or (2) a hand-operated device to control a cursor. The word “bank” can mean: (1) a financial institution or (2) a sloping mound. In the quote above from his play The Importance of Being Earnest, Oscar Wilde plays with two meanings of “lose” (to misplace an object, and to suffer the death of a close person). We say that the words ‘mouse’ or ‘bank’ are polysemous (from Greek ‘having word sense many senses’, poly- ‘many’ + sema, ‘sign, mark’).1 A sense (or word sense) is a discrete representation of one aspect of the meaning of a word. In this chapter WordNet we discuss word senses in more detail and introduce WordNet, a large online the- saurus —a database that represents word senses—with versions in many languages. WordNet also represents relations between senses. For example, there is an IS-A relation between dog and mammal (a dog is a kind of mammal) and a part-whole relation between engine and car (an engine is a part of a car).
    [Show full text]
  • Using Latent Semantic Analysis to Identify Research Trends in Openstreetmap
    Article Using Latent Semantic Analysis to Identify Research Trends in OpenStreetMap Sukhjit Singh Sehra 1,2,*, Jaiteg Singh 3 and Hardeep Singh Rai 4 1 Department of Research, Innovation & Consultancy, I.K. Gujral Punjab Technical University, Kapurthala, Punjab 144603, India 2 Department of Computer Science & Engineering, Guru Nanak Dev Engineering College, Ludhiana, Punjab 141006, India 3 School of Computer Sciences, Chitkara University, Patiala, Punjab 140401, India; [email protected] 4 Department of Civil Engineering, Guru Nanak Dev Engineering College, Ludhiana, Punjab 141006, India; [email protected] * Correspondence: [email protected]; Tel.: +91-985-595-9200 Academic Editors: Bernd Resch, Linda See and Wolfgang Kainz Received: 13 May 2017; Accepted: 28 June 2017; Published: 1 July 2017 Abstract: OpenStreetMap (OSM), based on collaborative mapping, has become a subject of great interest to the academic community, resulting in a considerable body of literature produced by many researchers. In this paper, we use Latent Semantic Analysis (LSA) to help identify the emerging research trends in OSM. An extensive corpus of 485 academic abstracts of papers published during the period 2007–2016 was used. Five core research areas and fifty research trends were identified in this study. In addition, potential future research directions have been provided to aid geospatial information scientists, technologists and researchers in undertaking future OSM research. Keywords: OpenStreetMap; research trends; latent semantic analysis; volunteered geographic information 1. Introduction OpenStreetMap (OSM), founded in 2004, provides a free editable guide to the world, available under an Open Database License (ODbL). The project is supported by Web 2.0 technologies, which enable more coordinated efforts among web clients, different users and content suppliers [1].
    [Show full text]
  • SEMANTIC ANALYSIS of NATURAL LANGUAGE and DEFINITE CLAUSE GRAMMAR USING STATISTICAL PARSING and THESAURI By
    SEMANTIC ANALYSIS OF NATURAL LANGUAGE AND DEFINITE CLAUSE GRAMMAR USING STATISTICAL PARSING AND THESAURI by Björn Dagerman A thesis submitted in partial fulfillment of the requirements for the degree of BACHELOR OF SCIENCE in Computer Science Examiner: Baran Çürüklü Supervisor: Batu Akan MÄLARDALEN UNIVERSITY School of Innovation, Design and Engineering 2013 i ABSTRACT Services that rely on the semantic computations of users’ natural linguistic inputs are becoming more frequent. Computing semantic relatedness between texts is problematic due to the inherit ambiguity of natural language. The purpose of this thesis was to show how a sentence could be compared to a predefined semantic Definite Clause Grammar (DCG). Furthermore, it should show how a DCG-based system could benefit from such capabilities. Our approach combines openly available specialized NLP frameworks for statistical parsing, part-of-speech tagging and word-sense disambiguation. We compute the seman- tic relatedness using a large lexical and conceptual-semantic thesaurus. Also, we extend an existing programming language for multimodal interfaces, which uses static predefined DCGs: COactive Language Definition (COLD). That is, every word that should be accept- able by COLD needs to be explicitly defined. By applying our solution, we show how our approach can remove dependencies on word definitions and improve grammar definitions in DCG-based systems. (27 pages) ii SAMMANFATTNING Tjänster som beror på semantiska beräkningar av användares naturliga tal blir allt van- ligare. Beräkning av semantisk likhet mellan texter är problematiskt på grund av naturligt tals medförda tvetydighet. Syftet med det här examensarbetet är att visa hur en mening can jämföras med en fördefinierad semantisk Definite Clause Grammar (DCG).
    [Show full text]
  • Ontology for Information Systems (O4IS) Design Methodology Conceptualizing, Designing and Representing Domain Ontologies
    Ontology for Information Systems (O4IS) Design Methodology Conceptualizing, Designing and Representing Domain Ontologies Vandana Kabilan October 2007. A Dissertation submitted to The Royal Institute of Technology in partial fullfillment of the requirements for the degree of Doctor of Technology . The Royal Institute of Technology School of Information and Communication Technology Department of Computer and Systems Sciences IV DSV Report Series No. 07–013 ISBN 978–91–7178–752–1 ISSN 1101–8526 ISRN SU–KTH/DSV/R– –07/13– –SE V All knowledge that the world has ever received comes from the mind; the infinite library of the universe is in our own mind. – Swami Vivekananda. (1863 – 1902) Indian spiritual philosopher. The whole of science is nothing more than a refinement of everyday thinking. – Albert Einstein (1879 – 1955) German-Swiss-U.S. scientist. Science is a mechanism, a way of trying to improve your knowledge of na- ture. It’s a system for testing your thoughts against the universe, and seeing whether they match. – Isaac Asimov. (1920 – 1992) Russian-U.S. science-fiction author. VII Dedicated to the three KAs of my life: Kabilan, Rithika and Kavin. IX Abstract. Globalization has opened new frontiers for business enterprises and human com- munication. There is an information explosion that necessitates huge amounts of informa- tion to be speedily processed and acted upon. Information Systems aim to facilitate human decision-making by retrieving context-sensitive information, making implicit knowledge ex- plicit and to reuse the knowledge that has already been discovered. A possible answer to meet these goals is the use of Ontology.
    [Show full text]
  • Arguments About the Nature of Concepts: Symbols, Embodiment, and Beyond
    Psychon Bull Rev (2016) 23:941–958 DOI 10.3758/s13423-016-1045-2 THEORETICAL REVIEW Arguments about the nature of concepts: Symbols, embodiment, and beyond Bradford Z. Mahon1,2,3,4 & Gregory Hickok5 Published online: 9 June 2016 # Psychonomic Society, Inc. 2016 Abstract How are the meanings of words, events, and ob- formed; for those areas in which opinion is divided, we seek jects represented and organized in the brain? This question, to clarify the relation of theory and evidence and to set in relief perhaps more than any other in the field, probes some of the the bridging assumptions that undergird current discussions. deepest and most foundational puzzles regarding the structure of the mind and brain. Accordingly, it has spawned a field of Keywords Concept representation . Concept organization . inquiry that is diverse and multidisciplinary, has led to the Embodied cognition . Grounded cognition . Neural basis of discovery of numerous empirical phenomena, and has spurred concepts . Symbols . Semantic categories . Word meaning . the development of a wide range of theoretical positions. This Actions and objects special issue brings together the most recent theoretical devel- opments from the leaders in the field, representing a range of viewpoints on issues of fundamental significance to a theory Scope of the special issue of meaning representation. Here we introduce the special issue by way of pulling out some key themes that cut across the The contributions that form this issue of Psychonomic Bulletin contributions that form this issue and situating those themes in & Review are unified around the objective of developing and the broader literature.
    [Show full text]
  • Semantics and Computational Semantics∗
    Semantics and Computational Semantics∗ Matthew Stone Rutgers University Abstract Computational semanticists typically devise applied computer systems simply to work as well as possible, but their work enriches the study of linguistic meaning with important tools, resources and methods which I survey here. To get useful results requires systematizing models from semantics in computational terms. Key computational challenges involve the compositional analysis of sentence meaning; the formalization of lexical meaning and related world knowledge; and modeling context and conver- sation to connect semantics and pragmatics. Underpinning computational semanticists’ successes in all these areas, we find the distinctive roles of representation, problem solving and learning in computation. 1 Introduction Interdisciplinary investigations marry the methods and concerns of differ- ent fields. Computer science is the study of precise descriptions of finite processes; semantics is the study of meaning in language. Thus, computa- tional semantics embraces any project that approaches the phenomenon of meaning by way of tasks that can be performed by following definite sets of mechanical instructions. So understood, computational semantics revels in applying semantics, by creating intelligent devices whose broader behavior fits the meanings of utterances, and not just their form. IBM’s Watson (Fer- rucci, Brown, Chu-Carroll, Fan, Gondek, Kalyanpur, Lally, Murdock, Nyberg, Prager, Schlaefer & Welty 2010) is a harbinger of the excitement and potential of this technology. In applications, the key questions of meaning are the questions engineers must answer to make things work. How can we build a system that copes robustly with the richness and variety of meaning in language, and with its ambiguities and underspecification? The focus is on methods that give us clear problems we can state and solve.
    [Show full text]
  • Low-Dimensional Embodied Semantics for Music and Language
    Low-dimensional Embodied Semantics for Music and Language Francisco Afonso Raposoa,c, David Martins de Matosa,c, Ricardo Ribeirob,c aInstituto Superior T´ecnico, Universidade de Lisboa, Av. Rovisco Pais, 1049-001 Lisboa, Portugal bInstituto Universit´ariode Lisboa (ISCTE-IUL), Av. das For¸casArmadas, 1649-026 Lisboa, Portugal cINESC-ID Lisboa, R. Alves Redol 9, 1000-029 Lisboa, Portugal Abstract Embodied cognition states that semantics is encoded in the brain as firing patterns of neural circuits, which are learned according to the statistical structure of human multimodal experience. However, each human brain is idiosyncratically biased, according to its subjective experience history, making this biological semantic machinery noisy with respect to the overall semantics inherent to media artifacts, such as music and language excerpts. We propose to represent shared semantics using low-dimensional vector embeddings by jointly modeling several brains from human subjects. We show these unsupervised efficient representations outperform the original high-dimensional fMRI voxel spaces in proxy music genre and language topic classification tasks. We further show that joint modeling of several subjects increases the semantic richness of the learned latent vector spaces. Keywords: Semantics, Embodied Cognition, fMRI, Music, Natural Language, Machine Learning, Cross-modal Retrieval, Classification 1. Introduction scriptions can be captured by jointly modeling the statis- tics of neural firing of several similarly stimulated brains. The current consensus in cognitive science defends that In this work, we propose modeling shared semantics conceptual knowledge is encoded in the brain as firing pat- as low-dimensional encodings via statistical inter-subject terns of neural populations [1]. Correlated patterns of ex- brain agreement using Generalized Canonical Correlation perience trigger the firing of neural populations, creating Analysis (GCCA).
    [Show full text]
  • 6.863J Natural Language Processing Lecture 15: Word Semantics – Working with Wordnet
    6.863J Natural Language Processing Lecture 15: Word semantics – Working with Wordnet Robert C. Berwick [email protected] The Menu Bar • Administrivia: • Lab 4 due April 9 (Weds.); • Start w/ final projects, unless there are objections • Agenda: • Working with Wordnet • What’s Wordnet • What can we do with it? • Solving some reasoning problems: • Mending a torn dress • Enjoying a movie; What’s a shelf? • Implementing EVCA and Wordnet together 6.863J/9.611J Lecture 15 Sp03 Wordnet motivation But people have persistent problem. When they look up a word, especially a commonly used word, they often find a dozen or more different meanings. What the dictionary does not make clear are the contexts in which each of these different meanings would be understood. So we know what kind of information is required, but we have not yet learned how to provide it to a computer. (G. Miller, U.S./Japan Joint Workshop on Electronic Dictionaries and Language Technologies January 23--25, 1993.) 6.863J/9.611J Lecture 15 Sp03 What’s Wordnet? • Psychological motivation • Nouns, verbs, adjectives organized into (fairly) distinct networks of • Synonym Sets (synsets) • Each synset = 1 concept • Supposedly intersubstitutable within synset (“synonomy”) 6.863J/9.611J Lecture 15 Sp03 Practical motivation • What’s not in a dictionary? • Take example, like tree – “large, woody perennial plant with a distinct trunk” • What info is missing? 6.863J/9.611J Lecture 15 Sp03 Psychological motivation • Why these categories? • Words association: first word thought of drawn from
    [Show full text]
  • Semantic Networks John F
    Semantic Networks John F. Sowa This is an updated version of an article in the Encyclopedia of Artificial Intelligence, Wiley, 1987; second edition, 1992. Most of the text from 1992 is unchanged, but more recent material and references have been added. A semantic network or net is a graph structure for representing knowledge in patterns of interconnected nodes and arcs. Computer implementations of semantic networks were first developed for artificial intelligence and machine translation, but earlier versions have long been used in philosophy, psychology, and linguistics. The Giant Global Graph of the Semantic Web is a large semantic network (Berners-Lee et al. 2001; Hendler & van Harmelen 2008). What is common to all semantic networks is a declarative graphic representation that can be used to represent knowledge and support automated systems for reasoning about the knowledge. Some versions are highly informal, but others are formally defined systems of logic. Following are six of the most common kinds of semantic networks: 1. Definitional networks emphasize the subtype or is-a relation between a concept type and a newly defined subtype. The resulting network, also called a generalization or subsumption hierarchy, supports the rule of inheritance for copying properties defined for a supertype to all of its subtypes. Since definitions are true by definition, the information in these networks is often assumed to be necessarily true. 2. Assertional networks are designed to assert propositions. Unlike definitional networks, the information in an assertional network is assumed to be contingently true, unless it is explicitly marked with a modal operator. Some assertional networks have been proposed as models of the conceptual structures underlying natural language semantics.
    [Show full text]