Integrated Representation of Clinical Data and Medical Knowledge: An
Total Page:16
File Type:pdf, Size:1020Kb
INTEGRATED REPRESENTATION OF CLINICAL DATA AND MEDICAL KNOWLEDGE AN ONTOLOGY-BASED APPROACH FOR THE RADIOLOGY DOMAIN Heiner Oberkampf DISSERTATION for the degree of Doctor of Natural Sciences (Dr. rer. nat.) University of Augsburg Department of Computer Science Software Methodologies for Distributed Systems Software Methodologies for distributed systems October 2015 Integrated Representation of Clinical Data and Medical Knowledge An Ontology-based Approach Supervisor: Prof. Dr. Bernhard L. Bauer, Department of Computer Science, University of Augsburg, Germany Prof. Dr. Elisabeth André, Department of Computer Science, University of Augsburg, Germany Advisor: Prof. Dr. Sonja Zillner, Siemens AG Corporate Technology, Munich, Germany Date of defence of dissertation (oral examination): March 17th, 2016 Copyright © Heiner Oberkampf, Augsburg, October 2015 Contents I. INTRODUCTION, BASICS, RELATED WORK1 1. Introduction3 1.1. Problems and Challenges...........................6 1.2. Objectives and Approach........................... 11 1.3. Publications................................... 16 1.4. Outline...................................... 19 1.5. Acknowledgements.............................. 21 2. Basics 23 2.1. Knowledge Representation.......................... 24 2.1.1. Terminologies, Classification Systems and Ontologies.... 24 2.1.2. Description Logic and Reasoning................. 27 2.1.3. The Semantic Web Technology Stack............... 28 2.2. Biomedical Ontologies............................ 32 2.2.1. Unified Medical Language System................ 33 2.2.2. Open Biological and Biomedical Ontologies.......... 34 2.2.3. Other Ontologies, Classification and Coding Systems.... 38 2.3. Semantic Annotations............................. 40 3. Related Work 41 3.1. The Relation between Data Models and Reference Ontologies.... 42 3.1.1. HL7 Reference Information Model Version 3.......... 44 3.1.2. OpenEHR................................ 46 3.1.3. DICOM-Structured Reporting................... 48 3.1.4. OBO-based Models.......................... 48 3.1.5. Quantitative Imaging Biomarker Ontology........... 50 3.1.6. Other Data Models.......................... 51 3.2. Related Work Regarding the Case Studies................ 52 3.2.1. Measurement Extraction and Classification........... 52 3.2.2. Disease Ranking............................ 53 II. THE MODEL FOR CLINICAL INFORMATION 55 4. Model Foundation 57 4.1. Requirements of the Model Foundation.................. 58 4.2. Reused Ontologies............................... 60 4.2.1. Basic Formal Ontology........................ 60 4.2.2. Relation Ontology........................... 62 4.2.3. Phenotypic Quality Ontology................... 62 4.2.4. Units Ontology............................. 64 iii Contents 4.2.5. Information Artefact Ontology................... 66 4.2.6. Ontology for General Medical Science.............. 66 4.2.7. Ontology for Biomedical Investigations............. 67 4.2.8. Foundational Model of Anatomy................. 69 4.2.9. Ontology of Medically Related Social Entities......... 70 4.2.10. Other Reused Ontologies...................... 71 4.2.11. Summary of Reused Ontologies.................. 71 4.3. MIREOTing................................... 73 5. The Model for Clinical Information 75 5.1. General Modelling Decisions........................ 77 5.1.1. Definitions............................... 77 5.1.2. Don’t Repeat Yourself Principle.................. 77 5.1.3. Focus on Class Hierarchies..................... 78 5.2. Reference to Terminologies.......................... 80 5.2.1. Post-Coordination........................... 80 5.2.2. Bindings to Terminologies...................... 81 5.3. Classes and Properties............................. 83 5.3.1. General Properties.......................... 83 5.3.2. Roles................................... 84 5.3.3. Diagnosis................................ 85 5.3.4. Examinations.............................. 86 5.3.5. Reports................................. 87 5.3.6. Images and Image Regions..................... 90 5.3.7. Qualities and Units.......................... 91 5.3.8. Anatomical structures........................ 92 5.3.9. Scalar Range Specifications..................... 93 5.3.10. Measurements............................. 94 5.3.11. Clinical Findings........................... 98 5.3.12. RECIST Target Lesions........................ 108 6. Knowledge Models 111 6.1. Range Specifications.............................. 112 6.1.1. Range Specifications About Normal and Abnormal Anatom- ical Entities............................... 112 6.1.2. Range Specifications of Clinical Findings............ 115 6.1.3. Patient-Specific Range Specifications............... 115 6.1.4. Coverage................................ 116 6.2. Disease-Symptom-Examination Ontology................ 118 6.2.1. Diseases................................. 119 6.2.2. Clinical Findings and Symptoms................. 120 6.2.3. Examinations.............................. 123 6.2.4. Relations................................. 123 iv Contents III. USING MCI: STORAGE, INFERENCE AND ACCESS 127 7. Architecture 129 7.1. Overview..................................... 130 7.2. Annotator.................................... 132 7.2.1. Image Annotation........................... 132 7.2.2. Text Annotation............................ 133 7.3. Mapper...................................... 136 7.3.1. Mapping of Image and Text Annotations............ 136 7.3.2. Mapping Data from Relational Data Bases........... 138 7.4. Triple Store.................................... 140 7.5. Inference Component............................. 142 8. Inference 143 8.1. Overview..................................... 144 8.1.1. Inference with RDFS and OWL.................. 144 8.2. Resolution of Measurement-Entity Relations.............. 146 8.2.1. Overview of Resolution Algorithm................ 147 8.2.2. Knowledge-based Resolution Algorithm............ 148 8.3. Normality Classification........................... 151 8.3.1. Patient-specific Normality Classification............. 152 8.4. Linking Findings................................ 153 8.4.1. Unique Anatomical Entities..................... 153 8.4.2. Lymph Nodes and Lesions..................... 154 8.5. Propagation of Finding Information.................... 155 8.5.1. Mapping................................. 155 8.5.2. Łukasiewicz Three Value Logic.................. 156 8.5.3. Getting to a Consistent State of the Patient’s Findings Status 157 8.5.4. Determining the Status of Findings................ 159 8.5.5. Changing the Finding Status.................... 159 8.6. Ranking Likely Diseases........................... 161 9. Integrated Data Views 163 9.1. Disease-centric Snapshot View....................... 164 9.1.1. Findings-Status for one Disease.................. 164 9.1.2. Ranking of Likely Diseases..................... 164 9.1.3. Examination Planning........................ 165 9.2. Longitudinal Finding-centric View..................... 167 9.2.1. Type of Clinical Finding....................... 167 9.2.2. Anatomical Entity........................... 168 9.2.3. Quality.................................. 168 9.2.4. Location................................. 169 v Contents IV. CASE STUDIES 171 10.Data Sets 173 10.1. Baseline Data Set on Melanoma Patients................. 174 10.1.1. Structured Data............................ 175 10.1.2. Unstructured Data.......................... 178 10.2. Corpora of Radiology Reports........................ 182 10.2.1. German Radiology Reports on Lymphoma Patients..... 182 10.2.2. German Radiology Reports on Internistic Patients...... 182 10.2.3. English Radiology Reports..................... 182 10.3. Patients with Structured Finding Information.............. 184 11.Evaluation 185 11.1. Knowledge-based Extraction of Measurement-Entity Relations... 186 11.1.1. Evaluation Schema.......................... 186 11.1.2. Summary of Results......................... 187 11.1.3. Detailed Evaluation for German Radiology Reports..... 189 11.1.4. Detailed Evaluation for Data Set of English Reports..... 194 11.2. Normality Classification........................... 198 11.2.1. Evaluation Schema.......................... 198 11.2.2. Results.................................. 198 11.3. Disease-centric Data Access......................... 200 V. CONCLUSION 203 12.Conclusion 205 12.1. Contribution................................... 206 12.2. Outlook...................................... 209 VI. Annex 211 Bibliography 213 Printed Sources 215 Online Sources 225 Glossary 229 Acronyms 233 List of Figures 237 List of Tables 241 vi Contents A. Appendix 243 A.1. Big Picture of the Model for Clinical Information............ 244 A.2. Example Radiology Reports......................... 245 A.3. Notation..................................... 249 A.4. The Biological Spatial Ontology....................... 250 vii Part I. INTRODUCTION, BASICS, RELATED WORK 1 1 Introduction Within the healthcare sector there is a tremendous increase of available data. Firstly, all data about the patient which is documented in some electronic health record. This encompasses all data about the health status of a particular patient such as finding descriptions and reports or images, demographic information, medica- tions etc. This data is referred to as clinical data. Secondly, there is an increase of scientifically proved medical knowledge, i.e. knowledge about the structure and functioning of the human organism in general. For example knowledge about diseases and their manifestations