Medical Knowledge Acquisition Using Biomedical
MEDICAL KNOWLEDGE ACQUISITION USING BIOMEDICAL KNOWLEDGE RESOURCES FOR DISEASE-SPECIFIC ONTOLOGIES

by Liqin Wang

A dissertation submitted to the faculty of The University of Utah in partial fulfillment of the requirements for the degree of Doctor of Philosophy

Department of Biomedical Informatics
The University of Utah
May 2017

Copyright © Liqin Wang 2017
All Rights Reserved

The University of Utah Graduate School

STATEMENT OF DISSERTATION APPROVAL

The dissertation of Liqin Wang has been approved by the following supervisory committee members:

Peter J. Haug, Chair (date approved: 03/14/2017)
Bruce E. Bray, Member (date approved: 03/14/2017)
Guilherme Del Fiol, Member (date approved: 03/13/2017)
Olivier Bodenreider, Member (date approved: 03/07/2017)
Wendy W. Chapman, Member (date approved: 03/13/2017)

and by Wendy W. Chapman, Chair/Dean of the Department/College/School of Biomedical Informatics, and by David B. Kieda, Dean of The Graduate School.

ABSTRACT

Disease-specific ontologies, designed to structure and represent the medical knowledge about disease etiology, diagnosis, treatment, and prognosis, are essential for many advanced applications, such as predictive modeling, cohort identification, and clinical decision support. However, manually building disease-specific ontologies is very labor-intensive, especially in the process of knowledge acquisition. On the other hand, medical knowledge has been documented in a variety of biomedical knowledge resources, such as textbooks, clinical guidelines, research articles, and clinical data repositories, which offer a great opportunity for automated knowledge acquisition. In this dissertation, we aim to facilitate the large-scale development of disease-specific ontologies through automated extraction of disease-specific vocabularies from existing biomedical knowledge resources. Three separate studies presented in this dissertation explored both manual and automated vocabulary extraction.
The first study addresses the question of whether disease-specific reference vocabularies derived from manual concept acquisition can achieve near-saturated coverage (that is, close to the greatest possible number of disease-pertinent concepts) by using a small number of literature sources. Using a general-purpose, manual acquisition approach we developed, this study concludes that a small number of expert-curated biomedical literature resources can prove sufficient for acquiring near-saturated disease-specific vocabularies.

The second and third studies introduce automated techniques for extracting disease-specific vocabularies from both MEDLINE citations (title and abstract) and a clinical data repository. In the second study, we developed and assessed a pipeline-based system that extracts disease-specific treatments from PubMed citations. The system achieved a mean precision of 0.8 for the top 100 extracted treatment concepts. In the third study, we applied classification models to reduce irrelevant disease-concept associations extracted from MEDLINE citations and electronic medical records. This study suggested combining measures of relevance from disparate sources to improve the identification of true-relevant concepts through classification, and it also demonstrated the generalizability of the studied classification model to new diseases.

From these studies, we conclude that existing biomedical knowledge resources are valuable sources for extracting disease-concept associations, from which classification based on statistical measures of relevance could assist the semi-automated generation of disease-specific vocabularies.

To Jinsong, Siyu, and Yiran.

TABLE OF CONTENTS

ABSTRACT
LIST OF FIGURES
LIST OF TABLES
ACKNOWLEDGEMENTS

Chapters

1 INTRODUCTION
    1.1 The Need of Disease-Specific Medical Knowledge
    1.2 Objectives and Hypothesis
    1.3 Rationale for Analysis
    1.4 Overview of the Dissertation
    1.5 References

2 BACKGROUND
    2.1 Disease-Specific Ontologies
    2.2 Disease-Pertinent Knowledge Acquisition
    2.3 References

3 A METHOD FOR THE DEVELOPMENT OF DISEASE-SPECIFIC REFERENCE STANDARDS VOCABULARIES FROM TEXTUAL BIOMEDICAL LITERATURE RESOURCES
    3.1 Abstract
    3.2 Introduction
    3.3 Methods
    3.4 Results
    3.5 Discussion
    3.6 Conclusions
    3.7 Acknowledgements
    3.8 Appendix A. Supplementary Data
    3.9 References

4 GENERATING DISEASE-PERTINENT TREATMENT VOCABULARIES FROM MEDLINE CITATIONS
    4.1 Abstract
    4.2 Introduction
    4.3 Background
    4.4 Materials and Methods
    4.5 Results
    4.6 Discussion
    4.7 Conclusions
    4.8 Acknowledgements
    4.9 References

5 USING CLASSIFICATION MODELS FOR THE GENERATION OF DISEASE-SPECIFIC MEDICATIONS FROM BIOMEDICAL LITERATURE AND CLINICAL DATA REPOSITORY
    5.1 Abstract
    5.2 Introduction
    5.3 Background and Significance
    5.4 Materials and Methods
    5.5 Results
    5.6 Discussion
    5.7 Conclusion
    5.8 Acknowledgements
    5.9 References

6 DISCUSSION
    6.1 Summary
    6.2 Significance of Contributions
    6.3 Limitations