© Copyright 2018 Nicholas Robison
Total Page:16
File Type:pdf, Size:1020Kb
© Copyright 2018 Nicholas Robison The Problem of Time: Addressing challenges in spatio-temporal data integration Nicholas Robison A dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy University of Washington 2018 Reading Committee: Neil F. Abernethy, Chair Abraham Flaxman Ian Painter Program Authorized to Offer Degree: Biomedical and Health Informatics Abstract The Problem of Time: Addressing challenges in spatio-temporal data integration Nicholas Robison Chair of the Supervisory Committee: Neil F. Abernethy Department of Biomedical Informatics and Medical Education Across scientific disciplines, an ever-growing proportion of data can be effectively described in spatial terms. As researchers have become comfortable with techniques for dealing with spatial data, the next progression is to not only model the data itself, but also the complexities of the dynamic environment it represents. This has led to the rise of spatio-temporal modeling and the development of robust statistical methods for effectively modeling and understanding interactions between complex and dynamic systems. Unfortunately, many of these techniques are an extension to existing spatial analysis methods and struggle to account for the data complexity introduced by the added temporal dimension; this has limited many researchers to developing statistical and visual models that assume either a static state of the world, or one modeled by a set of specific temporal snapshots. This challenge is especially acute in the world of public health where researchers attempting to visualize historical, spatial data, often find themselves forced to ignore shifting geographic features because both the tooling and the existing data sources are insufficient. Consider, as an example, a model of vaccine coverage for the administrative regions of Sudan over the past 30 years. In wake of civil war, Sudan was partitioned into two countries, with South Sudan emerging as an independent nation in 2011. This has an immediate impact on both the visual accuracy as well as the quantitative usefulness of any data generated from aggregate spatial statistics. Or, consider epidemiological case reports that are issued from local medical facilities, how does one account for the fact that their locations may change, or that new facilities may spring up or close down as time progresses. These are real-world problems that existing GIS platforms struggle to account for. While there have been prior attempts to develop data models and applications for managing spatio-temporal data, the growing depth and complexity of scientific research has left room for improved systems which can take advantage of the highly interconnected datasets and spatial objects, which are common in this type of research. To that end, we have developed the Trestle data model and application, which leverage graph-based techniques for efficiently storing and querying complex spatio-temporal data. This system simple interface to allow users to perform query operations over time-varying spatial data and return logically valid information based on specific spatial and temporal constraints. This system is applicable to a number of GIS related projects, specifically those attempting to visualize historical public health indicators such as vaccination rates, or develop complex spatio-temporal models, such as malaria risk maps. TABLE OF CONTENTS Table of Figures ............................................................................................................................. iii Table of Tables .............................................................................................................................. vi Chapter 1. Introduction ................................................................................................................. 12 1.1 Space and Time Representations in GIScience ............................................................ 14 1.2 The Relevance of GIScience in Public Health .............................................................. 15 1.3 Dissertation Goals ......................................................................................................... 16 1.4 Outline and Contributions ............................................................................................. 17 1.5 Audience ....................................................................................................................... 19 Chapter 2. Temporal Challenges in GIS ....................................................................................... 20 2.1 Uncertain Geographic Context ..................................................................................... 20 2.2 Uncertain Temporal Context ......................................................................................... 21 2.2.1 Disease Surveillance ................................................................................................. 21 2.2.2 Population Movements During Disasters ................................................................. 29 2.3 Desiderata of a Temporal GIS ...................................................................................... 31 2.3.1 Enable spatio-temporal object modeling .................................................................. 31 2.3.2 Support spatial and temporal relationships between spatio-temporal objects .......... 31 2.3.3 Annotate events that occur in an object’s lifetime .................................................... 32 2.3.4 Support multiple time states ...................................................................................... 32 2.4 Conclusion .................................................................................................................... 32 Chapter 3. Prior Data Modeling Approaches ................................................................................ 33 3.1 Terminology .................................................................................................................. 33 3.1.1 Objects and Records ................................................................................................. 33 3.1.2 Object Identity .......................................................................................................... 35 3.2 Previous Modeling Approaches .................................................................................... 35 3.2.1 Evaluation Approach and Criteria ............................................................................ 36 3.2.2 Previous Approach 1: Time slicing ........................................................................... 39 3.2.3 Previous Approach 2: ST-Composite ....................................................................... 45 3.2.4 Previous Approach 3: Spatio-temporal Object Model .............................................. 49 3.2.5 Previous Approach 4: 3-domain Model .................................................................... 52 3.2.6 Evaluation Summary ................................................................................................. 56 3.3 Alternative Data Storage Approaches ........................................................................... 57 3.4 Application-specific implementations .......................................................................... 58 3.5 Conclusion .................................................................................................................... 59 Chapter 4. Design and Architecture of Trestle ............................................................................. 60 4.1 Data Integration and Data Curation .............................................................................. 60 4.2 Trestle ........................................................................................................................... 61 4.2.1 Organizing data with ontologies ............................................................................... 63 4.2.2 Trestle ontology design ............................................................................................. 67 4.2.3 Trestle Application implementation ......................................................................... 73 i 4.3 Fulfillment of Desiderata and Comparison with Previous Methods ............................. 86 4.3.1 Fulfillment of Desiderata .......................................................................................... 86 4.3.2 Comparison with Previous Methods ......................................................................... 89 4.4 Conclusion .................................................................................................................... 92 Chapter 5. Evalution 1: Dataset Integration .................................................................................. 94 5.1 Introduction ................................................................................................................... 94 5.2 Tracking Changes in Administrative Boundaries ......................................................... 94 5.3 Split/Merge Algorithm Design and Implementation .................................................. 102 5.3.1 Algorithm Design .................................................................................................... 102 5.3.2 GAUL Data selection .............................................................................................. 103 5.3.3 Algorithm Implementation .....................................................................................