Semantic Mapping of Clinical Model Data to Biomedical Terminologies to Facilitate Interoperability
Total Page:16
File Type:pdf, Size:1020Kb
SEMANTIC MAPPING OF CLINICAL MODEL DATA TO BIOMEDICAL TERMINOLOGIES TO FACILITATE INTEROPERABILITY A thesis submitted to the University of Manchester for the degree of Doctor of Philosophy in the Faculty of Engineering and Physical Sciences 2008 By Rahil Qamar School of Computer Science Contents Abstract 17 Declaration 19 Copyright 20 Dedication 21 Acknowledgments 22 1 Introduction 24 1.1 The task of semantic interoperability . 25 1.2 The Research Agenda . 26 1.2.1 Research Objective . 27 1.2.2 Research Aim . 29 1.2.3 Research Question . 30 1.2.4 Research Contributions . 30 1.2.5 Significance of the Research . 31 1.2.6 The Research Programme . 32 1.3 Vocabulary and Notations . 35 1.4 Thesis Outline . 36 1.5 List of Publications . 37 2 Clinical Data Models and Terminologies 39 2.1 Why is the medical domain particularly complex? . 40 2.2 Problems facing interoperability standards . 42 2.3 Benefits of standardised data in health care . 43 2.4 Clinical Data Models . 44 2.4.1 Need for Structured Clinical Data Models . 45 2 2.4.2 Examples of Data Models . 46 2.4.2.1 The openEHR Archetype Models . 46 2.4.2.2 HL7 CDA documents . 47 2.4.3 History of openEHR Archetypes . 49 2.4.3.1 Archetype Modeling Features . 50 2.4.3.2 Purpose of Archetypes . 51 2.4.3.3 Reasons for selecting Archetypes . 52 2.4.3.4 Term Binding in Archetypes . 53 2.5 The openEHR Information Model . 54 2.5.1 The openEHR ENTRY submodel . 55 2.5.1.1 The Observation subclass . 57 2.5.1.2 Structure and Syntax of Archetype Models . 58 2.5.1.3 Archetype Example . 60 2.6 Clinical Terminologies . 63 2.6.1 Need for Clinical Terminologies . 64 2.6.2 Problems with clinical terminologies and their use . 65 2.6.3 Examples of Terminologies . 66 2.6.3.1 LOINC . 66 2.6.3.2 GALEN . 67 2.6.3.3 SNOMED CT . 68 2.6.4 History of SNOMED CT . 69 2.6.4.1 Reasons for selecting SNOMED CT . 71 2.6.4.2 The SNOMED CT Concept Hierarchies . 72 2.6.4.3 SNOMED CT Core Model . 74 2.6.4.4 SNOMED Pre and Post Coordination . 76 2.7 Critical Comparison with Comparable Models . 77 2.7.1 Terminology Comparison: SNOMED versus GALEN . 77 2.7.2 Data Model Comparison: Archetypes versus HL7 . 79 2.7.3 Additional Vocabulary . 82 2.8 Summary . 83 3 Related Work 84 3.1 Mapping from Data Source to Terminologies . 85 3.1.1 Projects using Medical narratives and free text . 85 3.1.1.1 RELMA . 85 3.1.1.2 MedLEE . 85 3 3.1.1.3 Comparative Overview . 87 3.1.2 Projects using Databases . 88 3.1.2.1 MediClass . 88 3.1.2.2 Comparative Overview . 88 3.1.3 Projects using Ontologies . 89 3.1.3.1 BioPAX . 89 3.1.3.2 Anchor-PROMPT . 89 3.1.3.3 Comparative Overview . 89 3.1.4 Projects using Clinical Data Models . 90 3.1.4.1 Ocean Terminology Service . 90 3.1.4.2 Comparative Overview of OTS . 90 3.1.4.3 HL7 Terminfo . 92 3.1.4.4 Critical Overview of Terminfo . 94 3.2 Other Recent Efforts . 94 3.2.1 SNOMED CT Subsets . 95 3.2.2 Code Binding Interface . 95 3.3 Summary . 97 4 The MoST Methodology and System 98 4.1 Background . 99 4.2 The MoST Methodology . 100 4.3 The MoST System: Semi-automated Term Finding and Mapping 101 4.3.1 The Four Archetypes used in Development . 103 4.3.2 The Preparation Step . 103 4.3.3 Tier 1: Automated Term Finding Process . 104 4.3.3.1 Step 1: Context Independent Methods . 105 4.3.3.2 Step 2: Context Dependent Methods . 108 4.3.3.3 Step 3: Filtering Process . 110 4.3.3.4 Step 3a: Filtering Rules . 111 4.3.4 Tier 2: Assisted Manual Data Mapping Process . 117 4.4 Summary . 119 5 Evaluation I: Quantitative Analysis 120 5.1 Evaluation Criteria . 121 5.2 Evaluation Methodology . 122 5.2.1 The Four Evaluation Archetypes . 123 4 5.3 Statistical Measures for Evaluation . 125 5.3.1 Sensitivity and Specificity . 126 5.3.2 Mapping Coverage . 129 5.3.3 Trust Score . 131 5.3.4 Inter-rater Reliability . 132 5.4 Summary of Evaluation Data Results . 133 5.5 Evaluation Phase 1: Comparative Quantitative Analysis . 134 5.5.1 Comparison of Sensitivity and Specificity . 135 5.5.2 Comparison of Coverage Rating . 136 5.5.3 Comparison of Trust Score . 138 5.5.4 Comparison of Inter-rater reliability . 140 5.6 Evaluation Phase 2: Comparative Data Analysis . 141 5.7 Summary . 141 6 Evaluation II: Qualitative Analysis 143 6.1 Issues with the Evaluation Archetypes and SNOMED . 144 6.1.1 The Histology Pap Archetype . 144 6.1.1.1 Issues with the Histology Pap archetype . 145 6.1.1.2 SNOMED Issues with Histology Pap Archetype . 149 6.1.2 The Tendon Babinski Archetype . 150 6.1.2.1 Issues with the Tendon Babinski archetype . 151 6.1.2.2 SNOMED Issues with Tendon babinski archetype 156 6.1.3 The Visual Acuity Archetype . 164 6.1.3.1 Issues with the Visual acuity archetype . 165 6.1.3.2 SNOMED Issues with Visual acuity archetype . 168 6.1.4 The Body Weight Archetype . 170 6.1.4.1 Issues with the Body weight archetype . 171 6.1.4.2 SNOMED Issues with Body weight archetype . 172 6.2 Revising the Histology pap archetype as a gold standard . 173 6.2.1 Summary of changes to Histology Pap archetype . 175 6.2.2 Evaluation of revised Histology Pap Archetype . 175 6.3 Summary . 177 7 General Issues, Criteria and Guidelines 178 7.1 Integration Issues and Modeling Criteria . 179 7.1.1 Issues Integrating the Archetype and SNOMED Model . 179 5 7.1.1.1 Different Modeling Techniques . 180 7.1.1.2 Data categorisation to reflect intended semantics 180 7.1.1.3 Semantic results as against lexical results . 181 7.1.1.4 Context information . 182 7.1.2 Issues with Archetype Models . 184 7.1.2.1 Categorisation of Archetypes . 184 7.1.2.2 Specific and General Content . 187 7.1.2.3 Context Information . 188 7.1.2.4 Naming Conventions . 190 7.1.3 Issues with the SNOMED CT Terminology Model . 191 7.1.3.1 Concept Categorisation . 192 7.1.3.2 Internal use of category names . 194 7.1.3.3 Disjoint Axioms . 194 7.1.3.4 Post-coordinated concepts . 197 7.2 Data Modeling Guidelines . 198 7.2.1 General Archetype Modeling Guidelines . 199 7.2.2 Observation Archetype Model Guidelines . 200 7.2.3 Procedure Archetype Model Guidelines . 201 7.3 Terminology Guidelines . 202 7.3.1 General Terminology Modeling Guidelines . 203 7.3.2 Concepts inclusion in SNOMED . 204 7.4 Summary . 206 8 Future Work and Conclusion 207 8.1 Summary of Research Work . 207 8.2 Implications for Archetypes and SNOMED CT . 209 8.3 Future Work . 211 8.3.1 Specific-to-general approach . 211 8.3.2 Inclusion of ancestor archetype fragments . 211 8.3.3 Inclusion of SNOMED grandchildren concepts . 213 8.3.4 Filtering based on SNOMED Concept Model . 213 8.3.5 Post-coordinated results . 214 8.3.6 Extending testing to HL7 CDA . 216 8.4 Conclusion . 216 Bibliography 218 6 A EXTRA TABLES AND FIGURES 228 A.1 HL7 Reference Information Model . 228 A.2 HL7 Clinical Document Architecture . 228 B EVALUATION DETAILS 231 B.1 Description of Evaluators . 231 B.2 First Phase Evaluation Process . 232 B.3 Phase 1 of Evaluation: Fleiss’ Kappa Inter-rater Reliability . 237 B.4 Phase 2 of Evaluation: Comparative Data Analysis . 243 B.5 Evaluation Results Sheet for the Archetypes . 253 7 List of Tables 2.1 Characteristics of Conventional versus Clinical Data Warehouses (Taken from [72]). ............................... 41 2.2 The nineteen SNOMED CT Concept hierarchies. 72 2.3 The eight most relevant SNOMED CT Concept Hierarchies. Note: The Context dependent category has been changed to Situation with explicit context since the July 2006 release. 73 3.1 Value set constraints using SNOMED CT for some of the major classes of the Act and Entity RIM classes. NOTE: ≤ implies ‘the class and any of its subtypes’ (Taken from [64]). 93 4.1 Total number of SNOMED (SCT) codes that were returned by MoST before and after the filtering process. 112 4.2 Total number of SNOMED (SCT) codes that were retained at the end of application of Rule 2 for the Autopsy archetype. 115 4.3 Total number of SNOMED (SCT) codes that were retained at the end of application of Rule 3 for the Autopsy archetype. 116 5.1 Symbols used to measure degree of reflexes in the tendon babinski archetype and the corresponding lexical annotations. 125 5.2 Example scores of the two SNOMED codes for the archetype fragment ‘cervical smear’, to illustrate the application of Criteria 1 and 2. Scores are between 0 and 10, and have been assigned by 5 experts. 130 5.3 Summary of.