Development and Evaluation of Leveraging Biomedical
Total Page:16
File Type:pdf, Size:1020Kb
DEVELOPMENT AND EVALUATION OF LEVERAGING BIOMEDICAL INFORMATICS TECHNIQUES TO ENHANCE PUBLIC HEALTH SURVEILLANCE by Kailah T. Davis A dissertation submitted to the faculty of The University of Utah in partial fulfillment of the requirements for the degree of Doctor of Philosophy Department of Biomedical Informatics The University of Utah August 2014 Copyright © Kailah T. Davis 2014 All Rights Reserved The University of Utah Graduate School STATEMENT OF DISSERTATION APPROVAL The dissertation of ________________ Kailah T. Davis ________________________ has been approved by the following supervisory committee members: ___ Julio Facelli_________________, Chair ____ 1/27/2014 __ Date Approved ____________Catherine Staes_____________, Member _____2/06/2014____ Date Approved ____________Leslie Lenert________________, Member _____2/03/2014____ Date Approved ____________Per Gesteland _______________, Member _____2/12/2014____ Date Approved ____________Robert Kessler_______________, Member _____6/10/2014____ Date Approved And by _____________ Wendy Chapman_____________________, Chair of the Department _________________ Biomedical Informatics_________________ __ and by David B. Kieda, Dean of The Graduate School. ABSTRACT Public health surveillance systems are crucial for the timely detection and response to public health threats. Since the terrorist attacks of September 11, 2001, and the release of anthrax in the following month, there has been a heightened interest in public health surveillance. The years immediately following these attacks were met with increased awareness and funding from the federal government which has significantly strengthened the United States surveillance capabilities; however, despite these improvements, there are substantial challenges faced by today’s public health surveillance systems. Problems with the current surveillance systems include: a) lack of leveraging unstructured public health data for surveillance purposes; and b) lack of information integration and the ability to leverage resources, applications or other surveillance efforts due to systems being built on a centralized model. This research addresses these problems by focusing on the development and evaluation of new informatics methods to improve the public health surveillance. To address the problems above, we first identified a current public surveillance workflow which is affected by the problems described and has the opportunity for enhancement through current informatics techniques. The 122 Mortality Surveillance for Pneumonia and Influenza was chosen as the primary use case for this dissertation work. The second step involved demonstrating the feasibility of using unstructured public health data, in this case death certificates. For this we created and evaluated a pipeline composed of a detection rule and natural language processor, for the coding of death certificates and the identification of pneumonia and influenza cases. The second problem was addressed by presenting the rationale of creating a federated model by leveraging grid technology concepts and tools for the sharing and epidemiological analyses of public health data. As a case study of this approach, a secured virtual organization was created where users are able to access two grid data services, using death certificates from the Utah Department of Health, and two analytical grid services, MetaMap and R. A scientific workflow was created using the published services to replicate the mortality surveillance workflow. To validate these approaches, and provide proofs-of-concepts, a series of real-world scenarios were conducted. iv To my loving family. TABLE OF CONTENTS ABSTRACT ................................................................................................................. iii LIST OF TABLES ....................................................................................................... ix LIST OF FIGURES ...................................................................................................... x LIST OF ACRONYMS AND ABBREVIATIONS ..................................................... xii ACKNOWLEDGMENTS ......................................................................................... xiv Chapters 1. INTRODUCTION ........................................................................................................ 1 Problem Statement .................................................................................................. 1 Problem Definition.................................................................................................. 2 Study Justification ................................................................................................... 7 Main Objectives ...................................................................................................... 9 Study Aims .............................................................................................................. 9 Related Work ..........................................................................................................11 Division of Chapters ............................................................................................. 13 2. PUBLIC HEALTH SURVEILLANCE SYSTEMS AND INFORMATICS’ IMPACT ON PUBLIC HEALTH SURVEILLANCE ................................................................ 15 Introduction ........................................................................................................... 15 Brief History ......................................................................................................... 15 Mortality Surveillance .......................................................................................... 19 Public Health Informatics Influence on Public Health Surveillance Systems ...... 21 Existing Public Health Surveillance Systems ....................................................... 26 Other Systems and Systems Proposals ................................................................. 28 Limitations of Current Systems ............................................................................ 31 3. TECHNOLOGY BACKGROUND ............................................................................ 33 Natural Language Processing ............................................................................... 33 Grid Technology ................................................................................................... 37 Overview of Proposed Solution ............................................................................ 44 4. METHODS ................................................................................................................. 46 Introduction ........................................................................................................... 46 Research Philosophy and Approaches .................................................................. 47 Methods for Phase One ......................................................................................... 49 Data Collection ..................................................................................................... 50 Pneumonia and Influenza (P-I) Deaths Case Definition ....................................... 51 Study Procedures .................................................................................................. 53 Comparison of Methodologies .............................................................................. 58 Statistical Analysis ................................................................................................ 58 Methods for Phase Two ......................................................................................... 60 Defining Elements for the Feasibility Study ......................................................... 60 Creating caGrid Data and Analytical Services...................................................... 64 Workflows ............................................................................................................. 66 Evaluation of Phase Two ....................................................................................... 68 Summary ............................................................................................................... 68 5. RESULTS .................................................................................................................... 70 Introduction ........................................................................................................... 70 Results for Phase One ........................................................................................... 70 Statistical Analysis ................................................................................................ 71 Analysis of DCP Failures ...................................................................................... 73 Results for Phase Two ........................................................................................... 74 Validation through Proof-of-Concept Studies ....................................................... 74 Discussion of Outcomes ....................................................................................... 95 Summary ............................................................................................................... 96 6. DISCUSSION ............................................................................................................. 97 General Observations Phase 1 .............................................................................