<<

CONTEXT IN GEOGRAPHIC DATA: HOW TO EXPLORE, EXTRACT, AND ANALYZE DATA FROM SPATIAL VIDEO AND SPATIAL VIDEO GEONARRATIVES

A dissertation submitted to Kent State University in partial fulfillment of the requirements for the degree of Doctor of Philosophy

by

Jayakrishnan Ajayakumar

August 2019

© Copyright
All rights reserved
Except for previously published materials

Dissertation written by
Jayakrishnan Ajayakumar
B.Tech., Cochin University of Science and Technology, 2009
M.S., Kent State University, 2015
Ph.D., Kent State University, 2019

Approved by
Dr. Andrew Curtis, Chair, Doctoral Dissertation Committee
Dr. Jacqueline W. Curtis, Dr. Eric Jefferis, Dr. Michael Leitner, Dr. Ye Zhao, Members, Doctoral Dissertation Committee

Accepted by
Dr. Scott C. Sheridan, Chair, Department of Geography
Dr. James L. Blank, Dean, College of Arts and Sciences

TABLE OF CONTENTS

TABLE OF CONTENTS ...... iii

LIST OF FIGURES ...... v

LIST OF TABLES ...... xi

ACKNOWLEDGEMENTS ...... xii

CHAPTERS

I. Introduction ...... 1

II. Background ...... 6

III. Exploring Spatial Videos ...... 14

Camera Player ...... 15

Spatial Video Player (SVP) ...... 25

Discussions ...... 35

IV. Spatio-Temporal Exploration of Spatial Videos ...... 37

Spatial Video Library (SVL) ...... 38

Illustrative Examples ...... 48

Discussions ...... 53

V. The Use of Geonarratives to Add Context to Fine Scale Geospatial Research ...... 55

Design and Components ...... 60

An Empirical Illustration: A Spatial Video Geonarrative in Joplin, Missouri ...... 67

Discussions ...... 76


VI. GeoCluster: A Software for Extracting Spatial Patterns From Spatial Video Geonarratives ...... 80

Spatial Filter ...... 81

Workflow and Technical Implementation ...... 92

Empirical Illustration ...... 94

Discussions ...... 98

VII. Addressing the Data Guardian and Geospatial Scientist Collaborator Dilemma: How to share health records for spatial analysis while maintaining patient confidentiality ...... 100

Design and Algorithms ...... 105

Workflow and Technical Implementation ...... 109

Experiments ...... 111

Results ...... 113

Discussions ...... 119

VIII. CONCLUSION AND FUTURE WORK ...... 123

REFERENCES ...... 129


LIST OF FIGURES

Figure 1. GPX file example with a single-track point’s latitude and longitude represented as an attribute and the media time as an extension tag ...... 20

Figure 2. Synchronized video and spatial data. The synchronized data is represented as a set of tuples used for spatio-temporal querying...... 20

Figure 3. GPS path for the synchronized video. The orange lines indicate missing path information...... 21

Figure 4. Category creation for mapping. A) create a category B) edit category for a mapped spatial object C) create a spatial object and assign category D) spatial video for mapping ...... 22

Figure 5. Complete workflow for Camera Player ...... 24

Figure 6. Complete workflow for the synchronizer module ...... 28

Figure 7. User interface for the synchronizer module with map for the GPS path, and embedded YouTube player to display the spatial videos ...... 28

Figure 8. Visualization module showing three videos synchronized with a single GPS path...... 30

Figure 9. GPS Creator. The markers on the map indicate the nodes of the new path, while the polyline shows the original GPS reference path...... 32

Figure 10. Digitizer module used for mapping objects seen in the spatial videos...... 34


Figure 11. Adding categories in the Digitizer. A) creating a new category B) the green marker represents a categorized mapped point C) spatial video display to enhance mapping...... 35

Figure 12. Workflow diagram for the crawler. The input is a folder and the output is an index file...... 40

Figure 13. Index JSON file containing the video creation time, the media time, the corresponding GPS coordinates, and the GPS data bounds...... 40

Figure 14. R-tree for spatial indexing. A, B, and C denote nodes (bounding boxes), and d, e, f, g, and h are the leaves...... 42

Figure 15. User interface for Explorer...... 43

Figure 16. A) Grid based heatmap showing spatial distribution of spatial videos. B) Study area selection. C) Spatial query with a circular region as the query object...... 44

Figure 17. Results of a query with videos and sections...... 45

Figure 18. Results of a sample video section selection. The video is positioned at the selected video section time and the red dot on the map indicates the corresponding GPS position...... 45

Figure 19. A dialog box to add information to a geotagged image...... 47

Figure 20. An image cart with geotagged images...... 47

Figure 21. Geotagged images from the spatial video library opened in Google Earth as a KML...... 48

Figure 22. A spatial point query with a buffer of 15 meters...... 49


Figure 23. Frames for a spatial point query for A) 2011, B) 2012, C) 2013, D) 2015, E) 2016...... 50

Figure 24. Results of a frame-based query. A) The source frames are 2011, B) 2012, C) 2013, D) 2014, E) 2015, F) 2016...... 51

Figure 25. Geonarrative sentences added to the spatial video library as an external dataset...... 52

Figure 26. A) A geonarrative sentence location, B) Geonarrative sentence in textual form for a location...... 52

Figure 27. Results from a spatial query using a geonarrative as an external dataset. The location is a Conoco gas station at Joplin, MO. The images are from A) 2011, B) 2012, C) 2013, D) 2014, E) 2015...... 53

Figure 28. Conceptual diagram for Wordmapper with the six different modules...... 60

Figure 29. Illustration of the GPS record to narrative mapping...... 62

Figure 30. Interpolation algorithm for geotagged words...... 63

Figure 31. Wordmapper interface diagram A) Input module B) Map in visualization module C) Narrative sentence table D) Wordcloud in visualization module E) Query module F) Category module G) Output module...... 66

Figure 32. Wordcloud diagram for A) an entire geonarrative B) the keyword ‘recover*’ C) the keyword ‘rebuilding’...... 69

Figure 33. Visualization module showing the narratives containing the searched-for keyword ‘recover*’ as yellow pins. All other narrative sentences are shown as red pins...... 69


Figure 34. Wordmapper visualization module showing the A) narrative sentence for the keyword ‘recover*’ and B) response by the interviewee about recovery without using the word ‘recover’...... 71

Figure 35. A) Narrative sentence about a convenience store that was completely destroyed in the tornado B) the corresponding tornado article from the Daily News...... 72

Figure 36. A combined wordcloud for ‘Location specific’ and ‘Fuzzy space’...... 73

Figure 37. Narrative text for matching keyword ‘damage’ and category ‘Fuzzy space’...... 74

Figure 38. Heat map for ‘damage’ category containing spatial references...... 75

Figure 39. A) Uniform grid used for spatial filter analysis B) Case (red) and Control (yellow) data along with the uniform grid...... 83

Figure 40. Geometrical representation of the spatial filter (radius of circle centered at each grid point), with cases (red points), and controls (yellow points)...... 84

Figure 41. Random relabeling technique. For every simulation, controls (yellow points) are randomly re-labeled as cases...... 86

Figure 42. Spatial Domain Decomposition. Rectangular divisions of grid points are assigned to different processors...... 89

Figure 43. Point in polygon operation for assigning cases and controls to grids. The grid boundaries are extended to accommodate the boundary calculations...... 89

Figure 44. p-value calculations. The grids from all the processors are combined together to generate a rank grid which is further used to generate p-values. ‘obs’ represents observed value while ‘sim’ represents simulation...... 90


Figure 45. K-D tree for space partitioning...... 92

Figure 46. Complete workflow for the spatial filter...... 93

Figure 47. Spatial grid with statistically significant (α=.05) rates for the word ‘Heroin’...... 96

Figure 48. Spatial grid with statistically significant (α=.05) rates for the word ‘Coke’...... 97

Figure 49. Spatial grid with statistically significant (α=.05) rates for the words ‘Coke’ and ‘Heroin’...... 98

Figure 50. Obfuscation by point data translation and rotation. An offset generated from a random number is used for translation and the rotation is performed using a rotation matrix...... 107

Figure 51. Re-transformation of obfuscated point data through rotation and re-translation. The point data are rotated in space using the rotation matrix and re-translation is performed using the offset generated from the random number...... 107

Figure 52. Re-transformation of raster through rotation and re-translation...... 109

Figure 53. User interface for Privy...... 110

Figure 54. Privy workflow for data obfuscation and re-transformation...... 111

Figure 55. A) Unmasked yellow fever death data B) obfuscated data C) re-transformed data... 114

Figure 56. Local Moran’s I clusters for A) unmasked B) obfuscated C) re-transformed snow depth data...... 116

Figure 57. KDE results for snow depth data A) raster from real data B) raster from obfuscated data C) re-transformed raster...... 117


Figure 58. IDW results for snow depth data: A) raster from real data B) raster from obfuscated data C) re-transformed raster...... 117

Figure 59. Kriging results for snow depth data; A) raster from real data B) raster from obfuscated data C) re-transformed raster...... 118

Figure 60. Trend analysis results for snow depth data; A) raster from real data B) raster from obfuscated data C) re-transformed raster...... 119


LIST OF TABLES

Table 1. Different Spatial Video camera types, with the format of representation and sample GPS snippet ...... 16

Table 2. Average nearest neighbor results for yellow fever unmasked and obfuscated data. OMD represents observed mean distance, EMD represents expected mean distance, and NNR represents the nearest neighbor ratio ...... 114

Table 3. Ripley’s-K function results for unmasked and obfuscated data. L(d) represents the transform value and Diff represents the difference between the expected and observed values. The subscripts R and O represent real and obfuscated results ...... 115

Table 4. Global Moran’s I results for snow depth unmasked and obfuscated data. I represents Moran’s I and EI represents the expected value of I ...... 115

Table 5. Local Moran’s I results for unmasked and obfuscated data. The subscripts R and O represent real and obfuscated results...... 116

Table 6. Mean of the difference between the raster generated from unmasked data and the re-transformed raster ...... 119


ACKNOWLEDGEMENTS

This work was supported mainly by the GIS Health and Hazards Lab at Kent State University, and partially by a University Fellowship. I strongly believe, however, that this dissertation was only possible with the timely support and guidance of my advisor, Dr. Andrew Curtis. The seeds for many of the ideas that have been incorporated in this dissertation were developed during the long whiteboard sessions that I had with Dr. Curtis. He was always ready with suggestions and always urged me to think about how society could benefit from my dissertation. His advice on how to include a “hook” in publications has helped me a great deal as a young research scholar.

With Dr. Curtis, I got a chance to meet police officers, health experts, and county officials, which helped me gain much-needed community experience and got me thinking about solutions for real-world problems. In short, I could say he has helped me evolve from a computer programmer or software developer into a research scientist who thinks about society. Similarly, I had a great time working with Dr. Jacqueline Curtis on multiple scholarly ventures, and she has always been a source of inspiration for her passion and diligence towards work. I am always thankful to my committee members Dr. Eric Jefferis and Dr. Michael Leitner for asking thought-provoking questions which helped me to improve my work immensely. I would also like to thank Dr. Eric Shook and Dr. Kelly Turner, who worked with me on multiple projects and provided their guidance and support. I would also like to thank all the faculty members of the Geography Department at Kent State University, who were always ready to share their wealth of wisdom.

I am always thankful to my friends Sandra Bempah, Bharat Chaturvedi, Rob Squires, and Ortis Yankey at the GIS Health & Hazards Lab, who have been instrumental in making our lab a great place to work. I would also like to thank my colleagues Adiyana Sharag-Eldin, Xin Hong, and Chris Willer, with whom I always had interesting conversations. Finally, I want to thank my family, who were always there to support me unconditionally. I would like to thank my brother-in-law Gnanasekar Thiyagarajan, who has shown me that there is no substitute for hard work and dedication. Likewise, my sister Sree Renjini Ajayakumar always provided me with the much-needed warmth and affection, which was crucial during the stressful graduate days. Regular conversations with my “pearls” Surabhi (niece) and Vishnu Prayag (nephew) were a great stress buster during the tough times of doctoral studies. However, the two most important persons that I am always indebted to, and to whom I want to dedicate my work, are my father Ajayakumar K and my mother Premakumari K. Apart from providing me the much-needed emotional and motivational support, they also imparted their wealth of wisdom to me, which has always urged me to move forward. Their daily enquiries about my dissertation work made sure that I always finished on time, and I am always indebted to them for that.

Chapter I. Introduction

Social and spatial data can vary over relatively small spaces. In addition, there is considerable complexity involved in interpreting these spaces. In the geographical literature, new work has attempted to address this situation by developing novel methods to understand social and spatial context. This dissertation advances this literature by showcasing a new approach that can be used to address the topic of context.

Advances in spatial mobile technologies (Kar, Sieber, Haklay, & Ghose, 2016; Kitchin, Lauriault, & Wilson, 2016; Jones & Evans, 2012; Richardson et al., 2013) have facilitated a movement in primary data collection, moving away from more aggregated traditional sources such as the census, and instead collecting data at a geographic scale that matters, for the right time period, which allows for research into process and not just pattern (Lee et al., 2016). In the study of human agency, these technologies can provide “context” to more traditional geospatial analysis. We can now study the complexity of an individual’s “map” by studying his/her activity space (Kwan, 2012; Perchoux, Chaix, Cummins, & Kestens, 2013). While there have been different mobile spatial applications developed for various forms of health research, from capturing gang activities and identifying crime boundaries (Curtis, Curtis, Porter, Jefferis, & Shook, 2016) to supporting mosquito control (Curtis et al., 2015), in this dissertation we advance two particular methods, Spatial Videos (SV), and Spatial Video Geonarratives (SVG).

SV uses GPS and video technology to generate spatially embedded videos, which can be used to collect data in challenging environments, with the visual frames becoming a source for digitizing spatial features. SV has been used across various research domains such as public health (Curtis, Blackburn, Widmer, & Morris Jr, 2013; Curtis & Fagan, 2013), disaster recovery (Curtis & Mills, 2012; Lue, Wilson, & Curtis, 2014; Mills, Curtis, Kennedy, Kennedy, & Edwards, 2010; Montoya, 2003; Curtis, Duval-Diop, & Novak, 2010), and crime (Curtis & Mills, 2011; Leitner, 2013). An evolution of SV is the SVG, which is an environmentally-cued narrative used to capture the fine-scale geographic characteristics of an area and the context of their occurrence by recording the insights of those who know that environment (Bell, Phoenix, Lovell, & Wheeler, 2015; Curtis et al., 2015; Curtis, Curtis, Porter, Jefferis, & Shook, 2016). Even though SV and SVG have been championed as innovative technologies to generate fine-scale spatial data in data-poor environments, these technologies have been largely applied in ad-hoc ways, experiencing various technical challenges, with the data collection-to-analysis pathway often evolving with each project.

This dissertation advances this body of work by providing a standardized core to the method. Here we develop a software ensemble to support exploratory geo-spatial analysis using SV and SVG. In so doing, this dissertation moves this method into the mainstream, and introduces it as a ubiquitous approach for the geospatial sciences and its various collaborative disciplines.

One of the main advantages of SV as a data source is the ability to provide a synchronous video and map visualization. This can help extract physical contextual information, such as the quality of an environment around a water point in a slum. While camera-specific software exists to visualize SV, it is often limited in functionality, and the companies behind it often go out of business.

In this dissertation, a new software will be presented that can visualize SV, while also providing functionalities such as adding basic digitizing and mapping features, and being able to export GPS coordinates and other map layers. The first part of Chapter III describes “Camera Player”, which has been developed to achieve these goals. Another major challenge associated with using SV as a data source is the issue of data transferability. High quality SV videos occupy large volumes of disk space, which can be an impediment to collaborative work. As a solution to this problem, the second part of Chapter III presents ‘Spatial Video Player’ (SVP), which utilizes a cloud-based storage system to archive and access large volumes of SV data. This software also helps in synchronizing multiple SVs to a common GPS path. Apart from visualization, the spatial video player also addresses the issue of GPS signal distortion caused by several different factors, including faulty GPS receivers, latitude, the nature of the built environment, and security issues (Curtis, Bempah, Ajayakumar, Mofleh, & Odhiambo, 2019). As the SV archive expands, either for a single location where temporal change is monitored, or through multiple projects sometimes overlapping the same space, a need arises whereby a user would want to mine this resource based on location. However, performing spatio-temporal queries on such a large set of SV data requires novel algorithms and indexing strategies. Currently there is a lack of software that can achieve this goal. Chapter IV describes the ‘Spatial Video Library’ (SVL), which has been developed to explore large archives of SV data using the map as an index.

Adding a narrative to a SV data collection ride has opened up an additional and even more detailed perspective on an environment. Even though including an audio recorder is a simple technical addition, there is no software available that can turn these recordings into useful qualitative or quantitative data. Advancements in text processing and geo-spatial analysis, along with the influx of new spatio-temporal data sources such as Volunteered Geographic Information (VGI) (Goodchild, 2007; Zook, Graham, Shelton, & Gorman, 2010) and geo-spatial social media sources (Stefanidis, Crooks, & Radzikowski, 2013), provide potential opportunities for leveraging, analyzing, and visualizing contextual data from SVG. More specifically, the challenge is how to turn an audio conversation into a mapped output. Previously this task was cobbled together using computer code, spreadsheet manipulations, and GIS skills. The complexity of these actions limited the use of SVG outside of the research team who had developed the approach. Chapter V describes a solution, which helps to develop SVG as a ubiquitous tool. A new standalone software, ‘Wordmapper’, can combine text (transcribed narratives) and GPS coordinates to generate a geotagged narrative. Wordmapper also supports exploratory analysis of SVG using word searches, theme classifications of narrative sentences by topic or spatiality, and visual support through mapping and wordclouds.

Typical exploratory visual analysis of point data often employs techniques such as the Kernel Density Estimate (KDE) and contour mapping, and these have been previously used to explore maps of spatialized words generated from SVG. While these types of analysis are still useful, there is a need to incorporate more traditional quantitative spatial analysis into SVG research to provide the necessary statistical rigor. Chapter VI describes “GeoCluster”, which incorporates a variant of the spatial filtering methodology to identify significant clusters of point data (queried words from Wordmapper). The ability to assess the statistical significance of clusters through Monte Carlo simulations is one of the key highlights of the software. In this way “hotspots” of a common theme, such as overdoses, or a drug type, can be analyzed from the SVG archive.

As with any health data, the textual and spatial data obtained from SVG can cause privacy concerns. Scholarly works by Curtis et al. (2006) and Brownstein et al. (2005) have shown that different types of maps containing confidential data can be re-engineered to identify confidential information such as the residence of a patient. It is important to address these spatial privacy concerns when working with SVG. While not directly related to typical SVG output, situations with collaborators regarding confidential data transfer led to the discussion about how to safely share spatial data. Chapter VII describes a new software, “Privy”, which utilizes random perturbation-based geo-masking to obfuscate spatial point data. This tool now allows a health agency and a collaborating researcher to share spatial data at the finest resolution, conduct research, and then re-share back (and discuss) findings.

A common criticism of using a GIS, especially for more non-traditional data such as narratives, is that the inflexible data structure and “black box” nature of the software limit the potential of true spatial mixed-method approaches. By focusing on developing a new set of software for leveraging contextual fine-grained spatial data from SV and SVG, this dissertation addresses this problem while also catering to researchers, professionals, and society in general. As this new set of software abstracts away the complexities related to SV and SVG data collection and processing, researchers with minimal geospatial skill can now utilize the method.

Chapter II provides the necessary background for this study, which includes situating SV and SVG in the realm of mixed methods and the existing software methodologies used to process such data sources. The five chapters from III to VII include the core research sections for this thesis and contain descriptive details about “Camera Player and Spatial Video Player”, “Spatial Video Library”, “Wordmapper”, “GeoCluster”, and “Privy” respectively. The last chapter provides a conclusion to the work and a roadmap for future work.


Chapter II. Background

This dissertation includes different methodological approaches that can be used to contextualize spatial data. While there are several components to this type of work, including how to collect data in challenging environments, or how to improve collaboration among researchers, much of the work described here originated from a mixed-methods beginning. Simply put, the question is how environment-inspired commentary, when combined with imagery and map layers, can be used to provide the quantitative and qualitative data layers needed to gain a more complete contextualized understanding of fine scale environments. In so doing, this work also falls into the broader body of work using qualitative GIS.

Qualitative GIS and mixed method approaches

Feminist and critical geographers challenged the labelling of GIS as a tool for the storage and analysis of quantitative data and argued that use of the software did not necessarily have to be quantitative because of the capability of a GIS to include qualitative materials such as photos, videos and narratives (Jung & Elwood, 2010). One of the earlier works by Kwan (Kwan & Ding, 2008) suggests that considering GIS as a quantitative method alone is unnecessarily limiting. A typical qualitative GIS approach is to understand how individuals understand space and the subsequent production of socio-spatial relations (McCall, 2003). Some of the earlier approaches in integrating qualitative data into a GIS included geotagging photos, text, and videos to locations and attaching them to the underlying spatial database (Jung, 2009). Pavlovskaya (2006) argues that even though GIS depends on complex mathematical algorithms and statistical analysis, the mode of analysis and data production are essentially qualitative. She also points to the qualitative nature of the visualization capabilities in GIS. Seeing GIS as a natural extension of positivism, some have criticized the “reductionist” nature of data analysis and the associated research methods. Simply put, whatever the data source, it must somehow fit into the classic architecture of the GIS, which includes attribute columns and character-limited cells. Nuanced data such as perception or emotion are traditionally hard to capture, store and visualize within the GIS (Pickles, 2008). Putting aside this limitation for the moment, research has shown how qualitative data in a GIS can complement more traditional quantitative analysis, for example the ‘grounded visualization’ of Knigge and Cope (2006). Their work discusses the importance of integrating qualitative and quantitative data to improve the contextual information gained from qualitative data. Another qualitative approach utilizing GIS has been ‘geo-ethnography’ (Matthews, Detwiler, & Burton, 2005). The integration of GIS and ethnography helped researchers to understand the lives of people and the strategies they adopt through neighborhood walkthroughs.

The recent development of mobile technologies and easy data creation and consumption with advances in Web 2.0 (Haklay, Singleton, & Parker, 2008) provide opportunities for researchers to make significant progress in incorporating qualitative data into a GIS for contextual analysis. However, even as technology costs fall, researchers still face the challenge of how to incorporate such a varied set of outputs into the restrictive architecture of the GIS.

Mobile Methodologies and Technologies

Mobile methodologies and technologies provide an exciting opportunity to capture fine-scale data and interpret micro spaces of activity (Kwan, 2002; Miller, 2006; Phillips, Hall, Esmen, Lynch, & Johnson, 2001; Rodríguez, Brown, & Troped, 2005; Rainham, Krewski, McDowell, Sawada, & Liekens, 2008; Bell, Phoenix, Lovell, & Wheeler, 2015).

As previously mentioned, geo-ethnography has started to investigate the potential of mobile technologies, methodologies and the ride-along or “walk along” interview. Kusenbach (2003) refers to the practice of accompanying participants on their daily walks as “walk along” interviews, although there are several different approaches to walking with participants utilizing greater or lesser degrees of freedom in the routes chosen (Carpiano, 2009). “Walk along” interviews can be adapted to include more open-ended and spontaneous exchanges as the environment and circumstances reveal the place-based and practice-based insights of participant observation. This approach helps to contextualize spatial practices in place while facilitating discussion, for example, of the impact of graffiti on a community’s activity. Other forms of go-along include car rides with police (Beckett & Herbert, 2009), commutes in a car (Laurier et al., 2008), train (Bissell, 2009), and ferry (Vannini, 2011). In their work on understanding the differences between sedentary and walk along interviews, Evans and Jones (2011) suggest that walking with interviewees generates more place-specific data than sedentary interviews. They also note that walking interviews tend to be longer and more spatially focused, engaging more with features in the area under study than with the autobiographical narrative of interviewees. In another work, Haines, Evans, and Jones (2008) used a cheap digital Dictaphone combined with a wearable sports-style GPS for recording and transcribing data to create spatial transcripts, which could be integrated into a GIS to understand and analyze mobility temporally and spatially.

Perhaps one of the most exciting methodologies that integrate qualitative data within a GIS is geonarratives. Kwan and Ding (2008) developed geonarratives based on narrative analysis and 3D visualization of time geography (Hägerstrand, 1975). Earlier works (Kwan, 2002a; Kwan, 2007; Kwan, 2002b; Kwan, 2002c), geo-ethnography (Matthews, Detwiler, & Burton, 2005), and grounded visualization (Knigge & Cope, 2006) laid the foundation for developing geonarratives. In their initial work, Kwan and Ding (2008) used geonarratives to illustrate the spatio-temporal changes in mobility of Muslim women in Columbus, Ohio, after 11 September 2001. In his work on mapping narratives for the 1992 Los Angeles riots, Watts (2010) used a narrative-based approach to investigate human action during the riots. In this study, the author used published narratives describing individual observations and experiences to extract and map locations from place-based mentions such as a street intersection. As mobile technology improved and became cheaper, the number of studies integrating mobile devices and geonarratives has increased. Bell et al. (2015) utilized in-depth geonarrative interviews along with accelerometer and Global Positioning System (GPS) data to understand the emotional attachment of people to the coast with reference to coastal management policy and practice. They noted that geonarratives are particularly useful for understanding place attachment, place identity, place dependence, and social bonding. Through this work, they have also shown that a combination of mobile technologies and geonarrative-based interviews could help to (1) engage participants, (2) improve visualization of routine movement and interactions, and (3) generate in-depth detail about everyday life experiences. Jones and Evans (2012) utilized a GPS device, along with a microphone and audio recorder, to develop what they called “Spatial Transcripts”. They conducted walk-along and ride-along interviews on bicycles to understand personal or professional interest in an urban landscape.

Another major development in the area of mixed methods research and qualitative GIS was the introduction of geospatial video, or spatial video, which combines GPS with a video recording (Lewis, Fotheringham, & Winstanley, 2011; Curtis, Mills, Kennedy, Fotheringham, & McCarthy, 2007; Curtis, Mills, McCarthy, Fotheringham, & Fagan, 2009). Spatial video enables the walking or driving path to be captured visually, along with observable features being geolocated. This geospatial technology has been employed in a wide variety of settings and for different studies, from understanding socio-environmental variables related to crime (Leitner, 2013; Curtis & Mills, 2011), patterns of post-disaster damage and recovery (Curtis & Mills, 2012), to physical disorder and environmental health risks (Curtis, Mills, Kennedy, Fotheringham, & McCarthy, 2007; Curtis, Blackburn, Widmer, & Morris Jr, 2013). The spatial video approach improves data collection efficiency by providing the capability to survey locations over multiple periods to analyze spatio-temporal phenomena (Mills, Curtis, Kennedy, Kennedy, & Edwards, 2010). These SVs can then be archived to provide a resource for different follow-on, or sometimes completely different, research, as well as to assess changes in the environment over time. Indeed, the approach can provide an excellent source of data for dynamically changing landscapes where little or no other data exists, such as for disaster response and recovery, the way a neighborhood changes with regard to physical or social externalities, or for simple unrecorded daily activities like mapping the journey to water or school (Curtis et al., 2016). The spatial video system essentially consists of a video recording device synchronized to a GPS receiver. This “system” can be hand carried or mounted on a wide array of vehicles. Typically, as the video is linked to a GPS, coordinates can be extracted and re-merged with the video stream for further analysis.

Spatial video can also be enriched with narrative insights in the form of a spatial video geonarrative (SVG) (Curtis, Curtis, Porter, Jefferis, & Shook, 2016). The benefit of integrating spatial videos and geonarratives is the ability to capture the interactions between environment, interviewee, and interviewer (Curtis et al., 2015). SVG also adds context to finer scale environments, or microspaces, as the narratives contain references to specific locations, as well as general impressions and beliefs from the people who live in or interact with that environment. Apart from improving the contextual information, the narratives can be stored as a spatial archive that facilitates subsequent reinvestigation, which is important for the process of triangulation (the insights can be validated or questioned using multiple SVGs covering the same space). In their work on exploring the impacts of the crime-health nexus on neighborhoods, Curtis et al. (2016) used SVG to contextualize traditional hotspot maps of crime rates. Their findings suggest that SVG could enrich typical hotspot approaches with more on-the-ground context. Using SVG, the authors were able to compare the perceptions of experts and residents about the experiences of the places where they live. However, SVG can also be used to generate point level data, based around specific spatial mentions, which can be used as a data source for more traditional spatial analysis. These new “hotspots” can either be compared with other traditional or non-traditional data layers to validate or identify data gaps (Curtis, Curtis, Ajayakumar, Jefferis, & Mitchell, 2018). This approach has been extended to other topical areas to analyze post-disaster recovery, crime, vector borne diseases, and health patterns in the homeless community (Curtis et al., 2015; Curtis, Curtis, Porter, Jefferis, & Shook, 2016; Curtis, Quinn, Obenauer, & Renk, 2017; Curtis, Felix, Mitchell, Ajayakumar, & Kerndt, 2018).

Turning Spatial Mixed Method Approaches into a Replicable, Ubiquitous Technique

One of the challenges with many of the previously identified research approaches is how to turn a good idea, with considerable research potential, into a tool that could be widely used. Various software approaches have been developed to incorporate qualitative data with GIS (Jung, 2009).


Most efforts involve supporting the analysis of individuals’ movements, activities, and emotional attachments to places. For example, Matthews et al. (2005) developed contextual databases to store demographic data as well as ethnographic data including stories and other textual narratives. Kwan and Ding (2008), in their work on geonarratives, developed 3D-VQGIS, a GIS-based computer-aided qualitative data analysis component for space-time visualizations and textual analysis. They developed 3D-VQGIS as a GIS-based support tool for qualitative data analysis by incorporating analysis capabilities from NVivo, which is a sophisticated Computer-Aided Qualitative Data Analysis (CAQDAS) tool. McIntosh et al. (2011) offered a formal approach for extracting and spatio-temporally referencing narrative database ‘objects’ from electronic text sources, such as newspapers, with spatial, temporal, and semantic content being mined to represent and encode a series of chronological events. Kraak and He (2009) provide another example, where visualization of an individual’s space-time path within a space-time cube is illustrated in the context of collections of photographs, videos, and other geotagged media that are often uploaded to the web for sharing. Mennis et al. (2013) developed an integrated software environment for the visual exploration of qualitative activity space data encoded as georeferenced narrative text. They employed a novel set of cartographic symbolization strategies to visually represent a variety of characteristics that may be attached to activity space locations and paths, including emotional attachments, the frequency and temporal character of visitation, as well as conventional quantitative GIS data that captures neighborhood socioeconomic status and related variables. They also showed that visualization could be employed in the context of different inductive and deductive analytical tasks, such as data exploration, hypothesis generation, and the evaluation of statistical modeling results. Hawthorne et al. (2015) used a wordcloud approach in comparing the qualitative reflections of different sites. This approach allowed the textual information to be understood by specific spatial contexts. Dunkel (2015) also used wordclouds as a way of representing photo-tags of Flickr data on maps with spatial continuity.

Even though SV and SVG have matured into fine scale data collection and exploration tools across many scientific domains, the software methodologies necessary to completely support a full cycle of data collection, storage, retrieval, visualization, and mapping are yet to be developed. Most of the initial works using SV as a spatial data source used a combination of Google Earth and a standard video player such as Windows Media Player to generate layers of spatial information through mapping. While this could be a technique to generate new layers of local knowledge, the absence of synchronization between the spatial data extracted as a GPS path and the video could be a potential hindrance to knowledge generation through mapping. With the large volume of video and spatial data generated through SV data collection, the storage challenge, including indexing and access, needs to be more fully addressed. There is also a lack of software methodologies that support spatio-temporal querying on large volumes of SV data. Other technical aspects, such as GPS dropouts and the absence of a video stream, need to be considered. Prior to the work covered in this dissertation, all SV and SVG work consisted of simple bespoke code, ‘G-code’ (Curtis et al., 2015), a custom web-based software developed to extract narrative comments and words from SVG, and the use of various off-the-shelf software. By developing new software methodologies for processing, visualizing, and analyzing data obtained from SV and SVG, this dissertation aims to maximize the local contextual potential of this new methodology.


Chapter III. Exploring Spatial Videos

Even though the Spatial Video (SV) data collection methodology is being more widely used across different disciplines, several methodological challenges have emerged that limit various aspects of its utility. One such challenge is the variety of different video-GPS combinations that exist. While spatially encoded video cameras are becoming more commonplace, there is no standard model for merging video and coordinate data. Various formats of Global Positioning System (GPS) data embedded in videos include EXIF (Exchangeable Image File Format) tags, shapefiles, or external text files. For researchers and field collaborators, extracting the embedded GPS and synchronizing it with video frames for exploratory analysis can be technically challenging, especially if they are reliant on camera-specific software, as these companies often go out of business. Another important challenge for using SV as a technology is the amount of disk storage that is required to explore and analyze high quality spatial video. As an example, if three video cameras are used to capture the left, front, and right view for a SV ride, then this would generate a total of 12 gigabytes (GB) of spatial video data (assuming a forty-minute video is around 4 GB). As a result, transferring large volumes of SV data to multiple collaborators can be challenging, especially with limited internet capability. A further challenge, especially in more challenging environments, is GPS error. Unfortunately, the types of environments where SV are most needed are often those where signal error is the greatest. There are several reasons for this GPS error, including latitude, closely packed passageways with overhanging roofs, multiple reflective surfaces, perceived danger resulting in frequent camera concealment, and the generally poor quality of the GPS receiver found in many SV cameras (Curtis, Bempah, Ajayakumar, Mofleh, & Odhiambo, 2019). Output error includes displacement of the GPS path, erroneous coordinates leading to large coordinate spurs, and signal dropouts resulting in sections with no coordinate path. If SVs are to be used for crime and health interventions, it is essential that the GPS signals are of good quality.

Accepting these challenges, SV still provides an excellent opportunity for generating fine scale contextual data. The simultaneous map and video visualization provide ample opportunities for researchers to identify and classify spatial objects such as houses, stores, bridges, drainage channels, and open toilets, which can be used as data points themselves in an analysis, or as additional explanatory layers for more traditional data collection and modeling.

In order to address some of these challenges associated with SV data collection, while also providing a tool that enables researchers to map from these data sources, Camera Player and Spatial Video Player (SVP) have been developed. While Camera Player supports SV processing and exploratory analysis of the data, SVP utilizes a cloud-based storage system for the sharing and mapping of SV data across collaborative teams irrespective of separating distance. The following sections provide an overview, technical details, workflow, and software implementation for both tools.

Camera Player

Different types of spatial video cameras have been utilized in research since approximately 2005. Older versions of GPS-enabled video cameras, such as the system developed by NCGA or Red Hen Systems (Lewis, Fotheringham, & Winstanley, 2011), have GPS embedded in the video through audio signals (with modem-like frequencies), while more recent models use EXIF tags. Even though cameras that have GPS receivers tend to have accompanying software that can extract GPS, they are prone to technical glitches and licensing issues, and in several cases the closing of the company leaves archived data useless. With cameras having various ways of embedding GPS, and the embedded GPS itself having various formats, it becomes essential to develop ubiquitous software that can extract and process GPS data from any SV. Once extracted, or if combined back with the video in a single display, these spatially encoded videos then become the digitizing source for mapping within the same system. Camera Player achieves these tasks by accessing SVs stored on a local drive.

GPS Embedding in Spatial Videos

The first step in the Camera Player process is to extract GPS from the SV video. The research team at the GIS Health & Hazards Lab has utilized five different SV systems for their work. The four most frequently used camera and data types can be seen in Table 1.

Table 1. Different Spatial Video camera types, with the format of representation and a sample GPS snippet

Type         GPS Format   Sample Format
Red Hen      Shapefile    LON,LAT,ALT,UTC,VTR_TIME
PatrolEyes   Text         10.05.17 10:28:09 A 18.534912,N,72.372948,W
Miufly       EXIF         $G:2017-12-15 16:43:06-S1.271928-E36.833693
Contour      EXIF         $GPRMC,192009.00,A,2958.12922,N,09001.25247,W,10.775,281.29,100418

Video cameras from Red Hen Systems were mainly used in the Hurricane Katrina, California Wildfires, and 2011 Tornado projects (Curtis, Mills, Kennedy, Fotheringham, & McCarthy, 2007; Burkett & Curtis, 2011; Curtis & Mills, 2012). MPEG-2, which is almost obsolete now, is used for video encoding, while GPS signals are embedded as audio signals, which can only be extracted through demodulation of the audio. The demodulation process is carried out through separate hardware, which, when used in tandem with custom software, can generate point shapefiles containing attributes such as latitude, longitude, altitude, and media time (Table 1). The video file and the shapefile have the same file name, with .mpg and .shp extensions respectively. When a video file from a Red Hen system is uploaded to the Camera Player, the software checks for a shapefile (.shp) with the same filename as the video within the file system. If a shapefile is found, the software parses the shapefile and generates the corresponding GPX tags for further processing. This system was discarded because of the excessive costs involved and the hardware and software complexity, which limited use with collaborators.

Beginning around 2011, the research team began using Contour cameras, which remain among the most commonly used video cameras for spatial video collection (Lue, Wilson, & Curtis, 2014; Paulikas, Curtis, & Veldman, 2014; Rahman, Schmidlin, Munro-Stasiuk, & Curtis, 2017; Curtis, Blackburn, Widmer, & Morris Jr, 2013). GPS coordinates are embedded into the video as EXIF tags. EXIF tags are formatted according to the Tagged Image File Format (TIFF) specification and can contain a variety of information such as the video creation date and the audio and video encodings, along with GPS data and media time. The EXIF tag for the Contour camera includes packets of GPS information in the form of sample time (the media time equivalent), sample duration (the duration of a single pulse), and coordinates in National Marine Electronics Association (NMEA 0183) format (Table 1). For this camera type, Camera Player extracts latitude and longitude from the GPRMC (GPS specific information) or GPGGA (GPS fix data and undulation) tag and generates corresponding GPS Exchange Format (GPX) tags.
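To make the pattern-extraction step concrete, the following is a minimal sketch (not the actual Camera Player code) of how the latitude and longitude fields of the $GPRMC sample in Table 1 could be pulled out with a regular expression and converted from NMEA ddmm.mmmm notation to decimal degrees; the function and variable names are illustrative.

    import re

    def nmea_to_decimal(value, hemisphere):
        """Convert an NMEA ddmm.mmmm (or dddmm.mmmm) field to signed decimal degrees."""
        degrees = int(float(value) / 100)        # leading digits are whole degrees
        minutes = float(value) - degrees * 100   # remainder is decimal minutes
        decimal = degrees + minutes / 60.0
        return -decimal if hemisphere in ("S", "W") else decimal

    GPRMC = re.compile(r"\$GPRMC,(?P<utc>[\d.]+),A,"
                       r"(?P<lat>[\d.]+),(?P<ns>[NS]),"
                       r"(?P<lon>[\d.]+),(?P<ew>[EW])")

    sentence = "$GPRMC,192009.00,A,2958.12922,N,09001.25247,W,10.775,281.29,100418"
    m = GPRMC.search(sentence)
    if m:
        lat = nmea_to_decimal(m.group("lat"), m.group("ns"))
        lon = nmea_to_decimal(m.group("lon"), m.group("ew"))
        print(round(lat, 5), round(lon, 5))      # 29.96882 -90.02087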


The Miufly camera, which is based on a police body-worn camera, has been used by the team since 2018. Like the Contour camera, its GPS information is embedded as EXIF tags, and the GPS data take the form of a packet containing the sample time, sample duration, and coordinate data in a custom format (Table 1). For this camera, Camera Player parses the GPS data using a pattern extraction method to generate GPX tags.

Unlike the Contour and Miufly cameras, the Patrol Eyes camera, which is an alternative body-worn camera type, generates its GPS data as a separate text file (Table 1). The video file and the textual GPS data from the camera bear the same name, and Camera Player utilizes this information to find the text file containing the GPS data. A pattern matching method is used to extract coordinates, media time, and Greenwich Mean Time (GMT) from the text file with the GPS data.

After GPS extraction, the next step is to merge the video with the coordinate stream.

Time Synchronization

Along with coordinates, the GPX file extracted from the video also contains a media time in timestamp format (Figure 1). The media time in a GPX file represents the video frame locations where there is corresponding locational information. Each time and location pair is joined to form a tuple of the form <media time, latitude, longitude>, and the set of tuples is used to display the video and the supporting map simultaneously. While each frame of the video is being displayed, the video player generates an event containing the current video frame time information. The Camera Player captures this frame time and checks the set of tuples for matching instances. If a matching instance is found, the corresponding coordinate values are retrieved and displayed on the map as a marker. As an example, if an event with a frame time of 50 seconds is generated, and the set of tuples has a record of the form <00:00:50,42.4567,-81.345>, then the video player will extract the coordinates (42.4567,-81.345) and display them as a marker on the map (Figure 2). There can be instances where a video exists, but the corresponding GPS is missing. The Camera Player has an option to synchronize a video with external GPS. The user has an option to find a common synchronization point in the video and the GPS by using the map (for example, a known coordinate at a visual reference such as a bridge), and can time synchronize the video and GPS manually. The entire GPS path is represented as a polyline with coordinates as nodes (Figure 3). If nodes are missing for a section of the GPS path, a different color-coded line is used to represent the missing data (Figure 3). After this time synchronization, the video and map are interlinked. The user can click on any point on the path and, correspondingly, the video for the respective GPS timestamp is displayed. A video-to-map lookup can also be made by moving the seek bar of the video player to different segments. Along with the change in frame time, the marker location on the map also changes (Figure 2).


Figure 1. GPX file example with a single-track point’s latitude and longitude represented as an attribute and the media time as an extension tag

Figure 2. Synchronized video and spatial data. The synchronized data is represented as a set of tuples used for spatio-temporal querying.


Figure 3. GPS path for the synchronized video. The orange lines indicate missing path information for video time.
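The frame-time lookup described above can be sketched with a plain Python dictionary keyed on the media timestamp. The tuple for 00:00:50 mirrors the <00:00:50, 42.4567, -81.345> example in the text; the neighbouring fixes are invented for illustration, and the function names are not those of the actual software.

    # Synchronized tuples of the form <media time, latitude, longitude>,
    # stored in a dictionary keyed on the media timestamp.
    sync_table = {
        "00:00:49": (42.4566, -81.3449),
        "00:00:50": (42.4567, -81.3450),
        "00:00:51": (42.4568, -81.3451),
    }

    def to_timestamp(frame_seconds):
        """Format the player's frame time (in seconds) as HH:MM:SS."""
        hours, rest = divmod(int(frame_seconds), 3600)
        minutes, seconds = divmod(rest, 60)
        return f"{hours:02d}:{minutes:02d}:{seconds:02d}"

    def marker_for_frame(frame_seconds):
        """Return the (lat, lon) to display as a map marker, or None if no fix exists."""
        return sync_table.get(to_timestamp(frame_seconds))

    print(marker_for_frame(50))   # (42.4567, -81.345), matching the example above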

Mapping

The synchronized SV can be utilized for mapping by digitizing objects from the (now spatially linked) video frames. Camera Player supports mapping as well as classifying mapped spatial objects based on categories. Camera Player supports three different types of geometries: point, polyline, and polygon. At first, the user creates a new category using the category creator section (Figure 4A). Every category should have a unique name, a set of category values separated by commas, the spatial object type (point, polygon, or line), and an optional marker icon for point spatial objects. An example would be a point-type category named ‘blight’, having the values ‘low’, ‘medium’, and ‘high’ (Figure 4A). Spatial objects can be added to the map using mouse clicks, and their locations can be verified through the video frames from the SV (Figure 4D). The mapped points (Figure 4C) can be assigned to different categories by clicking on the point and editing the required fields (Figure 4B). These map layers can then be downloaded from the software as a Keyhole Markup Language (KML) file or as a shapefile, which enables broad access (for example, through Google Earth) or more advanced spatial analysis (using a Geographical Information System (GIS)).

Figure 4. Category creation for mapping. A) create a category B) edit category for a mapped spatial object C) create a spatial object and assign category D) spatial video for mapping
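The dissertation does not name the library used for the KML export; as one possible sketch, the third-party simplekml package can write categorized points (the ‘blight’ category from the text) to a KML file that opens in Google Earth. The coordinates below are hypothetical.

    import simplekml  # third-party package: pip install simplekml

    # Hypothetical digitized points: (longitude, latitude, category value).
    mapped_points = [(-94.5133, 37.0842, "high"),
                     (-94.5101, 37.0855, "low")]

    kml = simplekml.Kml()
    for lon, lat, value in mapped_points:
        point = kml.newpoint(name="blight", coords=[(lon, lat)])  # KML expects (lon, lat)
        point.description = f"category value: {value}"            # e.g. low / medium / high
    kml.save("blight_points.kml")  # the file can be opened in Google Earth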


Workflow and Technical Implementation

The initial step in the Camera Player workflow (Figure 5) is the video upload. After the video file is uploaded, the GPS extractor module in the software executes a set of conditions to verify the format of GPS embedding. Initially, the GPS extractor checks for the EXIF tag metadata in the video. The EXIF tag extraction is facilitated by ExifTool, which is a free and open-source software program for manipulating video and image metadata (Harvey, 2013). String pattern matching is used to check for the occurrences of ‘$GPGGA’ (Table 1) or ‘$G:’ (Table 1) in the EXIF tag document. If matching tags are found, pattern matching is further used to extract coordinates as well as other relevant information such as media time, GMT time, and altitude. If the pattern matching algorithm is not able to find the required EXIF tags, the GPS extractor checks for a shapefile (for the Red Hen camera) or a text file (for the Patrol Eyes camera) with the same name as the video file. If a shapefile is found, the spatial library GDAL (Warmerdam, 2008) is used to extract the coordinates as well as other attributes. If the GPS extractor is not able to retrieve the coordinates, the user is notified with a missing spatial data notification. If there is no GPS data associated with a video, the user has an option to upload external GPS data. The synchronizer accepts GPS coordinates and media time to generate a data structure that contains spatio-temporal data tuples of the form <media time, latitude, longitude>. For time-based searches, a Python dictionary is used with the timestamp as the key and the coordinate pair as the value. For supporting spatial searches, a Python-based k-d tree is utilized. Each leaf of the k-d tree is associated with a coordinate pair, and each coordinate pair is linked to a timestamp, which helps to retrieve the timestamps associated with a coordinate during spatial searches.


Figure 5. Complete workflow for Camera Player

For mapping and spatial visualizations, the Google Maps JavaScript API is utilized in the software. The GPS coordinates are joined together and represented as a polyline on the map. The drawing library in Google Maps is used to add spatial objects in the form of lines, points, and polygons. Mouse clicks on the GPS path polyline are captured as events, and the event location is used to search for the nearest GPS coordinate from the k-d tree. The timestamp information associated with the retrieved coordinates from the k-d tree is used to move the video to the corresponding section (Figure 2). The entire software is developed using PyQt, which is a Python-based framework for developing standalone software. PyQt has an inbuilt web browser, which supports web technologies such as HTML5 and JavaScript. Camera Player utilizes the Chromium web browser for web-based visualizations, while all the algorithms for processing SV are written in Python.
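A rough sketch of the spatial lookup described in this subsection, with SciPy's cKDTree standing in for the Python k-d tree (the coordinates and timestamps are illustrative, and planar distance is assumed to be adequate at street scale):

    from scipy.spatial import cKDTree

    # Coordinate stream from the GPX, with the media time of each fix (illustrative).
    coords = [(42.4566, -81.3449), (42.4567, -81.3450), (42.4568, -81.3451)]
    timestamps = ["00:00:49", "00:00:50", "00:00:51"]

    tree = cKDTree(coords)

    def media_time_near(lat, lon):
        """Media time of the GPS fix nearest to a clicked map location."""
        _, index = tree.query((lat, lon))   # nearest-neighbour search on the k-d tree
        return timestamps[index]

    # A click near the middle of the path seeks the video to that media time.
    print(media_time_near(42.45672, -81.34502))   # "00:00:50"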


Spatial Video Player (SVP)

Even though the Camera Player supports functionalities such as mapping and visualization of SVs, the problem of transferability of videos remains a major challenge, especially in collaborative projects. Cloud-based storage solutions provide a better option for storing large-volume data (Wang, Ren, Lou, & Li, 2010). As a cloud service for video storage, YouTube can store an unlimited number of videos from a single account with a limit of 20 GB per video. Though cloud-based video storage platforms such as YouTube (using secure non-public channels) could be a solution for SV storage and transferability, issues such as the GPS being out of sync with the video (the GPS receiver might only start to receive a signal after considerable video time), synchronizing multiple videos to a single GPS, and GPS signal errors need to be addressed.

SVP was developed to tackle some of the important challenges associated with SV analysis and in the process generate a software suite that could aid fine scale geo-spatial analysis with GPS-enabled video technology. There are four core sections in SVP, each with the task of addressing various challenges pertaining to SV data collection and analysis. The synchronizer module helps to combine cloud-based videos and GPS extracted from SVs. The visualization module has an inbuilt video player that can play multiple SVs synchronously, along with a dynamically changing GPS path interlinked to the video. The GPS Creator module helps to generate new GPS paths, and the digitizer module helps to generate new spatial data layers from the SVs. The following sections discuss the four modules and their technical implementation in detail.


Synchronizer Module

A major challenge associated with synchronization is when there are multiple videos to be combined with a single GPS path. An example would be to synchronize the left, front, and rear view videos of a SV ride to a single GPS path, so that all videos could be visualized together along with the map visualization. The synchronizer module helps to combine multiple SVs together by using the video time and GPS time. A combined set of videos can provide a panoramic view of the study area, which is helpful when using the SV approach as a communication tool.

The inputs for the synchronizer module include the SV video uploaded to YouTube and

GPS data extracted as a GPX file using Camera Player (Figure 6). Any user with a YouTube or

Google account can upload a video to the cloud platform. To maintain privacy, the video is uploaded in an “unlisted” category, as this would make sure the video is not directly searchable without a Uniform Resource Locator (URL). The GPS from the video can be extracted in a GPX file format using the Camera Player software. The synchronizer module accepts a GPX file, and displays the traversed path as a polyline in Google Maps (Figure 7). There are control options available to move along the GPS path (Figure 7), and the location for a particular instant of time is displayed as a marker. When the “play” button is clicked, the marker shifts position with each time-step indicating movement along the path. The user has an option to move to different locations in the GPS path by mouse clicks on the path polyline, and the GPS time shifts to the corresponding clicked location. After a GPS has been uploaded, the user can select a video through the embedded video player (Figure 7). The YouTube IFrame API, which supports

JavaScript commands, is used to embed the YouTube player into the software. To upload a new video, the URL for the video is added to the software. The software retrieves the "video-id" from the URL and acquires the respective video from YouTube cloud storage. Once the video is

retrieved, the user can use all the standard controls of the embedded video player to control the video playback (Figure 7). The IFrame API provides functionality to continuously retrieve important information related to the video, such as the current playback time (in seconds) and the current video status (playing, paused, stopped, or seeking). In order to synchronize a video and GPS path, a common location is identified in the video as well as in the GPS path. The

Google Maps view used for the GPS path visualization can be switched to "satellite" mode to provide high-resolution satellite imagery, helping to find synchronization points such as landmarks and intersections. Once a common location is identified on the map and the corresponding section is found in the video, the "add" option is used to add the "video-id" and the offset between the video time and GPS time to a video cart. As an example, if the GPS time associated with a synchronization point was 45 seconds and the video time was 150 seconds, the offset would be set as +105 seconds, while if the GPS time was 150 seconds and the corresponding video time was 45 seconds, the offset would be -105 seconds. If the user wants to add another video for synchronization, the same set of steps is repeated. A maximum of four videos can be added to the cart through this synchronization procedure. After all the required videos, along with their offsets, are added to the cart, the user can download the cart as a single

JSON (JavaScript Object Notation) file (Figure 6). The JSON file is the key input for the visualization and digitizer modules. The file contains a set of "video-ids", a corresponding set of

"offsets", and the complete GPS data as coordinates (Figure 6). The file typically ranges from 100 to 140 kilobytes (KB) in size, which is minimal compared to a set of videos, which typically range from 4 to 12 GB in size. This small data footprint means the JSON file can easily be transferred to a potential collaborator via email or other internet-based technologies.
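As a minimal sketch of the offset bookkeeping described above (the helper names and JSON fields are illustrative assumptions rather than the actual SVP schema), the cart could be assembled as follows:

import json

def compute_offset(gps_time_s, video_time_s):
    # Offset is video time minus GPS time at a shared synchronization point,
    # e.g. GPS 45 s / video 150 s -> +105, and GPS 150 s / video 45 s -> -105.
    return video_time_s - gps_time_s

def build_cart(entries, gps_track, out_path="cart.json"):
    # entries: (video_id, gps_time_s, video_time_s) for each synchronized video.
    # gps_track: (lat, lon, gps_time_s) tuples extracted from the GPX file.
    cart = {
        "videos": [{"video_id": vid, "offset": compute_offset(g, v)}
                   for vid, g, v in entries],
        "gps": [{"lat": lat, "lon": lon, "t": t} for lat, lon, t in gps_track],
    }
    with open(out_path, "w") as f:
        json.dump(cart, f)
    return cart

# Two hypothetical cameras synchronized to the same GPS path.
build_cart([("VIDEO_ID_LEFT", 45, 150), ("VIDEO_ID_RIGHT", 150, 45)],
           [(37.0700, -94.5100, 0), (37.0701, -94.5101, 1)])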


Figure 6. Complete workflow for the synchronizer module

Figure 7. User interface for the synchronizer module with map for the GPS path, and embedded YouTube player to display the spatial videos


Visualization Module

As the name suggests, the visualization module utilizes the JSON file generated as an output from the synchronizer module to generate synchronized video and spatial visualization (Figure

8). The visualization module uses a combination of YouTube IFrame API and Google Maps

JavaScript API to manipulate video and spatial data. In the first stage, the JSON file containing the synchronized video information is uploaded to the software. Details about all the videos, the required offsets, and the spatial data are extracted from the JSON file by the module. Based on the number of videos, the visualization screen is divided into sections, with each section having an embedded YouTube video player (Figure 8). The coordinate information stored in the JSON file is used to generate a polyline, which is displayed in a Google Map as the GPS path. A single set of controls is used to manipulate the video and GPS data (Figure 8). On clicking the play button, a timer instance is generated, which increments the GPS time every second. The current

GPS time instance is used to progress the video playback. As an example, if the current GPS time is 50 seconds and there are two synchronized videos A and B with 0 and 10 second offsets, the video time for A will be moved to 50 seconds and the video time for B will be moved to 60 seconds. The GPS time can be changed by clicking on different locations of the GPS path or by moving the search bar available with the controls. The video time is calculated on the fly and changed based on the GPS time, using the offsets. The stopping time of each video is identified while loading the video. This step is important, as there could be instances where one of the videos ends much earlier than the GPS. To verify that a video has not ended during the current time-step, the current video time and the total video time are used to check whether there are any video frames left to be displayed. To facilitate nearest-location lookup along the GPS path, a k-d tree is utilized. The k-d tree helps to identify the closest GPS location associated with a

mouse click and retrieve the corresponding time associated with that location. When the search bar is moved, a reverse lookup with time in seconds as the key is issued, which retrieves the corresponding location associated with the GPS time. Once the corresponding location is retrieved, the marker is moved to the matching location, and the videos are changed according to the difference in GPS time.
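The following sketch illustrates the two lookups described above, using SciPy's cKDTree for the nearest-location query; the path, offsets, and video lengths are hypothetical, and the actual module performs these operations in JavaScript against the YouTube IFrame API.

from scipy.spatial import cKDTree

# One GPS fix per second, so a point's index doubles as its GPS time in seconds.
gps_path = [(37.0700, -94.5100), (37.0701, -94.5101), (37.0702, -94.5102)]
offsets = {"video_A": 0, "video_B": 10}      # video_time = gps_time + offset
tree = cKDTree(gps_path)                      # k-d tree over the GPS coordinates

def gps_time_for_click(lat, lon):
    # Nearest path point to a mouse click; its index is the GPS time.
    _, idx = tree.query((lat, lon))
    return int(idx)

def seek_times(gps_time, video_lengths):
    # Seek position for each video, skipping any video that has already ended.
    return {vid: gps_time + off for vid, off in offsets.items()
            if gps_time + off < video_lengths[vid]}

print(gps_time_for_click(37.0701, -94.5101))              # -> 1
print(seek_times(50, {"video_A": 2400, "video_B": 55}))   # A seeks to 50 s; B has ended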

Figure 8. Visualization module showing three videos synchronized with a single GPS path.

GPS Creator Module

One of the major problems encountered with SV usage is GPS error caused by factors such as the built environment, technical failures, and human-induced errors. As SV is being used as a tool for fine scale spatial analysis, if the digitized map has errors, so too will the results from spatial analysis. GPS errors can be classified into minor (with small deviations) and major (with large deviations or even missing path sections) types. The GPS Creator module was developed to resolve

major types of GPS errors such as missing paths and large distortions. The software utilizes a SV that has been uploaded to YouTube as the reference video to generate a new GPS path.

The initial step in the workflow is to add a reference to the YouTube video, which will be used for digitizing a new path (Figure 9). The embedded player along with the YouTube IFrame

API is used to display the video. In order to support node addition to create the GPS path,

Google Maps along with its drawing library is utilized. The user has an option to upload a reference GPS path as a GPX file through the software (Figure 9). Adding external reference GPS data is particularly useful when the existing GPS data contains large sections with low error. The node addition process is a straightforward task, with the user adding marker pins based on the different insights gained from the videos. Some of the important spatial objects that can help in node addition include road intersections, buildings, and other landmarks such as churches, bridges, and stores. One of the important aspects of the software is the ability to add video time detail to every node. As an example, if there is a matching landmark in the map as well as in a video frame at video time instance t1, a node is added at that location, and along with the spatial coordinates, the node is also assigned the time information (t1). The time information is particularly important when generating new GPS data, as the media time attribute in the GPX file is used for synchronizing the video and GPS data.


Figure 9. GPS Creator. The markers on the map indicate the nodes of the new path, while the polyline shows the original GPS reference path.

The path generation between nodes uses a linear interpolation technique. Each pin (called a track point) is encoded with a location attribute and an associated time stamp from the video.

For example, location (x, y) at video time t seconds is stored as a triplet of the form <x, y, t>. The linear interpolation algorithm is used to generate spatio-temporal triplets between two track points that are separated in space and time. For two spatio-temporal triplets of the form <x1, y1, t1> and <x2, y2, t2>, the linear interpolation algorithm generates the series <x(t1+1), y(t1+1), t1+1>, <x(t1+2), y(t1+2), t1+2>, ..., <x(t2-1), y(t2-1), t2-1>, <x2, y2, t2>. Therefore, if we have two track points with data tuples of the form <x1, y1, t1> and <x2, y2, t2> and there is a temporal difference of 20 seconds, the algorithm calculates the slope of the line between (x1, y1) and (x2, y2) and uses this information to generate 20 points between <x1, y1, t1> and <x2, y2, t2>. The assumption of a constant slope works for straight segments but tends to be erroneous when there are sharp turns. To tackle this problem, the user needs to provide additional reference points, especially at turns and corners.
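A minimal sketch of this constant-speed interpolation, with hypothetical coordinates, might look as follows:

def interpolate_track(p1, p2):
    # Generate one (x, y, t) triplet per second between track points
    # p1 = (x1, y1, t1) and p2 = (x2, y2, t2), assuming constant speed.
    x1, y1, t1 = p1
    x2, y2, t2 = p2
    dt = t2 - t1
    return [(x1 + (s / dt) * (x2 - x1), y1 + (s / dt) * (y2 - y1), t1 + s)
            for s in range(1, dt + 1)]

# Two digitized nodes 20 seconds apart yield 20 interpolated fixes,
# the last of which coincides with the second node.
print(len(interpolate_track((-94.5100, 37.0700, 100), (-94.5080, 37.0705, 120))))  # 20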

After GPS correction, the new path is downloaded as a GPX file. The video can then be synced back to the GPS using the synchronizer module, in effect creating a new SV.

There are some limitations to the implementation of the coordinate correction software. The video needs to be uploaded to YouTube, which can be a potential problem for collaborators with slow internet connections. In addition, the accuracy of the corrected path depends on three factors: the user effort, the overhead image quality, and the type of environment. It should be noted that a variation of GPS Creator has appeared in the publication by Curtis et al. (2019).

Digitizer

The digitizer module in SVP provides a set of tools for basic mapping. The JSON file generated as output by the synchronizer module is used as the input for the digitizer module. The SV used for mapping is retrieved from the YouTube cloud platform using the YouTube IFrame API, and the video is displayed using the embedded video player (Figure 10). Google Maps is used extensively to display path visualizations as well as for mapping.


Figure 10. Digitizer module used for mapping objects seen in the spatial videos.

Initially, the synchronized JSON file is uploaded to the digitizer (Figure 10). The digitizer parses the JSON file and retrieves the GPS path and the set of videos. The user can play a video, and based on the corresponding video time, a marker showing the current GPS location shifts accordingly. If the user wants to switch between videos, the required video can be played from the list of videos available in the table of contents (Figure 10). Since all the videos are coordinated with each other through GPS time, the corresponding timestamp offsets are maintained while switching between videos. As an example, if the camera on the left side of a car is being used for mapping and the current video time is at t1 seconds, when the user switches to another video

(possibly the right-side video), the video time is automatically shifted to match the corresponding time instance t1. This feature is particularly useful for mapping both sides of a street.


Apart from mapping spatial objects, the digitizer module also supports category creation.

Categories can be useful for classification of digitized objects as well as for spatial analysis.

Users have an option to create new categories, and spatial objects, including points, polygons, and polylines, can be assigned to any of these categories along with values and descriptions through the software (Figure 11). Further, all the spatial objects along with their categories and other attributes can be downloaded from the software as a KML file or shapefile.

Figure 11. Adding categories in the Digitizer. A) creating a new category B) the green marker represents a categorized mapped point C) spatial video display to enhance mapping

Discussions

SV is rapidly growing as a mobile data collection methodology that can complement official data sources as well as generate new contextual insights on places where there is a paucity of spatial data. Even though SV as a spatial data collection methodology is growing in popularity, there are still technical challenges that need to be addressed. By developing Camera Player and SVP, two

major challenges, related to the variability in GPS formats and the quality of GPS data, are addressed. Further, the capability to offload video storage to the cloud and the ability to synchronize online video and GPS data are vital for collaborators in challenging environments or with limited data storage capacity. The improved transferability afforded by cloud-based storage also helps researchers from multiple research domains to easily share SVs. The mapping option available with SVP and Camera Player can be used to generate new layers of spatial data, which can be overlaid with more traditional spatial data. The new GPS paths created using the GPS Creator module can further be used to identify and assess the various factors that affect

GPS signal quality. Next steps in this area could include automating the process of GPS correction through advanced techniques such as Kalman filtering (Hide, Moore, & Smith,

2003). Even though this study has mainly focused on extracting GPS data from four different cameras, there are other SV sources that generate GPS data in various formats.

Therefore, it is imperative that the Camera Player software is flexible enough (and the code open enough) to allow for updates and additions.


Chapter IV. Spatio-Temporal Exploration of Spatial Videos

Spatial Video (SV) data collection provides an elegant way to capture space-time variations in microenvironments, which would otherwise be difficult to capture (Mills, Curtis, Kennedy,

Kennedy, & Edwards, 2010). The ability to capture spatio-temporal changes and the inherent archival nature makes SV particularly suitable to study events where change occurs frequently and over relatively short periods, such as a community recovering after a disaster (Curtis &

Mills, 2012). This is because a researcher can use the video archive to revisit a single location to see what type of change has occurred and how quickly. Alternatively, researchers could mine the video archive to perform a visual longitudinal analysis, for example along a single street of a declining community. This could provide a spatial layer of change that can be associated with other events, such as temporal variations in crime patterns. Indeed, this ability to capture spatio-temporal change makes SV a suitable data source to create models of processes that are non-linear in nature. Areas of study that have shown SV's utility over multiple time periods include blight monitoring (Porter, De Biasi, Mitchell, Curtis, & Jefferis, 2018), mapping support for community groups and disaster managers (Duval-Diop, Curtis, & Clark, 2010), crime analysis

(Curtis, Curtis, Ajayakumar, Jefferis, & Mitchell, 2019), and even water point environment assessments (Curtis et al., 2019). When the same path for data collection is repeated, variations in an attribute for each period can be collected, analyzed, and used as a metric of change. Such an approach was used by Curtis et al. (2010) to generate a Recovery Score (RS) to capture the improvement, stagnation, and decline of disaster-impacted

neighborhoods in New Orleans after Hurricane Katrina. Further, these results were used to generate animations that gave a holistic picture of the recovery process across many neighborhoods. Another study by Mills et al. (2010) used longitudinal analysis of SVs to assess the variations in recovery after a large-scale wildfire incident in San Diego County,

California.

However, even though this approach is conceptually useful, and while different studies have manually attempted such a temporal assessment, there is currently no automated means to mine such a complex visual and spatial data resource. High quality SVs are data intensive (typically 4 GB for 40 minutes of video), and a single SV collection might entail five cameras, each collecting up to two hours of data. Performing spatial queries on such large datasets can be challenging due to the high volume of spatial data associated with each video (a 40-minute video could contain around 2,400 spatial coordinates) and the high memory requirement needed to load all the videos. As a result, this chapter presents a new approach that can mine these data and, in so doing, open up opportunities for fine scale space-time change assessment. In addition, the video frames are also vital for comparative analysis, as it is the spatial object, such as a building, and not just the location that is the focus of our interest. In this way, it would be relatively easy to compare the rebuilding process across multiple streets for a particular period. This type of spatial querying of the SV archive could also support data linking with other external sources, such as American

Community Survey (ACS) data, hospital medical records, or police calls for service.

Spatial Video Library (SVL)

The conceptual mining of the SV archive is now presented as a Spatial Video Library (SVL) method. In general, this has been developed to 1) generate efficient spatial indexes for large collections of SVs and 2) perform exploratory space-time analysis on large sets of SV data.


To do this, SVL requires a crawler, for extracting and indexing SV data from a file system, and an explorer, to read the indexed files and allow for spatio-temporal explorative analysis.

Practically, this concept has been programmed as a stand-alone piece of software. The crawler utilizes an r-tree (Beckmann, Kriegel, Schneider, & Seeger, 1990) based implementation to efficiently index large sets of SV data, and the explorer, with the help of bounding-box queries on the r-tree index, helps to retrieve SV datasets that can be explored through a video player and map. The following sections describe the technical implementation, features, and workflow for each of the components in the SVL.

Crawler

The crawler in SVL helps to spatially index large sets of SVs, which can subsequently be used to perform spatial queries on SVs. The crawler takes a folder as input and recursively searches for all SVs inside it (Figure 12). For each of the extracted videos, the crawler uses an in-built parser to extract the Global Positioning System (GPS) data from the video. As project teams have used different types of SV cameras, slight modifications were required as to how GPS extraction occurs. For SVs generated from Contour or Miufly cameras, an EXIF (Exchangeable Image

File format) parser is used to extract the embedded GPS data from EXIF tags attached to the video as metadata. For the PatrolEyes camera, the GPS data is attached as a separate text file, which is parsed using a string-matching technique. The extracted spatial coordinates, along with the media time for each coordinate, are used to generate a spatio-temporal data array.

From the spatial extent of the coordinates, a bounding box object is created using the bottom-left and top-right coordinates. The spatio-temporal data array and bounding box are added to an index JavaScript Object Notation (JSON) file (Figure 13), whose name is the video file name with the suffix '_index' appended. As an example, if the SV file name is 'video.mp4', the JSON file name will be 'video_index.json'. The index JSON file is saved to the same folder containing the video file. The bounding boxes for all the videos are combined

to form an r-tree, which is a type of tree data structure used for spatial access methods

(Figure 14).
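As a sketch of the per-video index described above (the JSON field names here are illustrative rather than the crawler's exact schema):

import json
import os

def write_index(video_path, track):
    # track: (media_time_s, lat, lon) tuples extracted from the video's GPS data.
    lats = [lat for _, lat, _ in track]
    lons = [lon for _, _, lon in track]
    index = {
        "video": os.path.basename(video_path),
        "track": [{"t": t, "lat": lat, "lon": lon} for t, lat, lon in track],
        # Bounding box from the bottom-left to the top-right corner.
        "bounds": [min(lons), min(lats), max(lons), max(lats)],
    }
    out = os.path.splitext(video_path)[0] + "_index.json"   # video.mp4 -> video_index.json
    with open(out, "w") as f:
        json.dump(index, f)
    return out

write_index("video.mp4", [(0, 37.0700, -94.5100), (1, 37.0701, -94.5101)])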

Figure 12. Workflow diagram for the crawler. The input is a folder and output is an index file

Figure 13. Index JSON file containing when the video was created, the media time and the corresponding GPS coordinates and the GPS data bounds

The r-tree is particularly useful for indexing multi-dimensional information such as geographical coordinates, rectangles, or polygons. The key idea of the r-tree data structure is to group spatial objects based on their minimum bounding rectangles. The bounding rectangle allows fast checks to identify whether a spatial object is present, as a query that does not intersect with this rectangle cannot intersect with the searched-for objects. The data in r-trees

are organized as pages, each having a different number of entries. The leaf nodes store the spatial objects, while the non-leaf nodes store how to access the child nodes and the bounding box of all entries within each child node. The average time complexity for an r-tree search is

O(log n), where n is the number of minimum bounding rectangles. For the crawler, the Python implementation of the r-tree from the library libspatialindex (Hadjieleftheriou, 2015) is used. The bounding box for each video file, generated from its GPS coordinates, is added to the r-tree one by one. Along with the bounding box, a string containing the video file location and the

JSON file location is also added. After all nodes are inserted into the tree, the entire tree is saved to an index file (with the extension .idx) in the root folder, having the same name as the root folder. As an example, if the root folder was 'Joplin', then the index file will be created in the folder 'Joplin' with the name 'Joplin.idx'. This index file can be reused later to perform bounding box intersection queries.
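A sketch of the insertion and bounding-box query pattern, using the rtree package (a common Python binding to the libspatialindex library cited above), is shown below; note that this package persists the index as a pair of files (.idx and .dat), and the payload strings are hypothetical:

from rtree import index

idx = index.Index("Joplin")   # persists Joplin.idx (and Joplin.dat) to disk

videos = [
    # (numeric id, bounding box as (min lon, min lat, max lon, max lat), payload string)
    (0, (-94.52, 37.06, -94.50, 37.08), "2011/left.mp4|2011/left_index.json"),
    (1, (-94.51, 37.05, -94.49, 37.07), "2012/left.mp4|2012/left_index.json"),
]
for vid, bbox, payload in videos:
    idx.insert(vid, bbox, obj=payload)

# Bounding-box intersection query, e.g. for a study area drawn in the explorer.
study_area = (-94.515, 37.065, -94.505, 37.075)
print([item.object for item in idx.intersection(study_area, objects=True)])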


Figure 14. R-tree for spatial indexing. A, B, and C denote nodes (bounding boxes), and d, e, f, g, and h are the leaves

Explorer

The explorer provides a platform for users to perform exploratory spatio-temporal analysis on a large set of SV data using the r-tree generated as output by the crawler (Figure 15). As the initial step of using the explorer, the index (idx) file is uploaded to the system. On uploading the index file, all the video file details, along with the total video count, are displayed on the left pane (Figure 15). A grid-based heatmap (Figure 16A) is used to visualize the spatial distribution of all SVs. The underlying r-tree helps facilitate the fast intersection operations needed to generate the spatial distribution of the SVs.


Figure 15. User interface for Explorer

Loading all the SVs together in the explorer is neither feasible nor scalable on home desktops due to the extensive memory requirement in terms of RAM (random access memory). In order to tackle this problem, the explorer has an option for the user to select a study area (Figure 16B). The user can draw a rectangular bounding box on the map, which is treated as the study area (Figure 16B). On receiving the bounding box coordinates, the explorer searches the r-tree to retrieve all the bounding boxes that intersect the study area. The JSON files from the matching bounding boxes are used to load all the GPS points and to generate a temporary in-memory r-tree. The video file names, along with the date on which each video was created (obtained from the JSON file), are used to generate video metadata information.


Figure 16. A) Grid based heatmap showing spatial distribution of spatial videos. B) Study area selection. C) Spatial query with circular region as query object

Spatial queries can be performed by creating spatial objects using Google Maps. The example in Figure 16C shows a spatial query (in this case a circle) focused at E 20th Street in

Joplin, MO. When the explorer receives a query with a spatial object, it first extracts the bounding box for the spatial object. The bounding box is then used to generate an intersection query against the temporary r-tree, which holds the SV metadata, including details about the JSON and video files. This intersection test helps to reduce the number of point-in-polygon operations required to find the matching videos for a spatial query. Once the matching bounding boxes are found, the corresponding JSON files are retrieved and all the spatial coordinates are extracted. Point-in-polygon operations are conducted with all the spatial coordinates as points and the spatial object as the polygon. If at least a single point falls inside the spatial object, the corresponding video file is retrieved. Further, all the points that fall within the spatial object polygon are used to retrieve the corresponding media times, which are displayed as the time sections of the video where the spatial coordinates fall within the query polygon object (Figure 17). On selecting the video name from the drop-down, the corresponding video is displayed in the video player along with its associated GPS path

(Figure 18). A marker, which represents the current GPS location, dynamically changes its


position based on video time. When a particular section is picked from the selection panel, the video shifts to the corresponding time and the marker changes its position accordingly

(Figure 18).
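The bounding-box-then-point-in-polygon sequence described above could be sketched with shapely standing in for the Google Maps geometry; the candidate tracks and the circular query are hypothetical.

from shapely.geometry import Point

def query_videos(query_polygon, candidates):
    # candidates: {video_name: [(media_time_s, lat, lon), ...]} already narrowed down
    # by the r-tree bounding-box intersection test.
    results = {}
    for name, track in candidates.items():
        times = [t for t, lat, lon in track if query_polygon.contains(Point(lon, lat))]
        if times:                      # at least one fix inside the query object
            results[name] = times      # media times to display as matching video sections
    return results

# A circular query region approximated by buffering a point (in degrees, for illustration).
circle = Point(-94.508, 37.070).buffer(0.0005)
tracks = {"2011_left.mp4": [(10, 37.0700, -94.5080), (11, 37.1000, -94.6000)]}
print(query_videos(circle, tracks))    # {'2011_left.mp4': [10]}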

Figure 17. Results of a query with year, videos and sections.

Figure 18. Results of a sample video section selection. The video is at the selected video section time and the red dot on the map indicates the corresponding GPS position


External datasets can be uploaded to explorer as a KML (Keyhole Markup Language) file. The spatial objects from the KML file are extracted by the explorer and added to Google

Maps. After the spatial objects are added to the map, each of them could be used to perform spatial queries to retrieve SVs that are in close proximity to the uploaded spatial coordinates.

Apart from performing spatial queries, the explorer also has an option to perform a video frame-based search. As the video and GPS are synced together, the time corresponding to a video frame can be used to perform spatial queries. As an example, if a user wants to check whether there are any matching images in other videos, then, with the video paused at the required frame, the user can click on the video search option. The video search option retrieves the GPS location associated with the video frame and, based on the spatial coordinates and a user-defined buffer (in meters), generates a new spatial query. All the videos that have coordinates falling within the buffer of the searched video are retrieved and used for further exploration. This is particularly useful for longitudinal analysis, as the same geography can be compared across all periods.
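As a sketch of the frame-based search, under the assumption that the buffer test reduces to a great-circle distance check (the helper names and tracks are hypothetical):

import math

def haversine_m(lat1, lon1, lat2, lon2):
    # Great-circle distance in meters between two GPS fixes.
    r = 6371000.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = math.radians(lat2 - lat1), math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def frame_search(frame_lat, frame_lon, buffer_m, candidates):
    # Videos with at least one GPS fix within buffer_m of the paused frame's location.
    return [name for name, track in candidates.items()
            if any(haversine_m(frame_lat, frame_lon, lat, lon) <= buffer_m
                   for _, lat, lon in track)]

tracks = {"2011_left.mp4": [(10, 37.07000, -94.50800)],
          "2016_left.mp4": [(42, 37.07005, -94.50805)]}
print(frame_search(37.07000, -94.50800, 10, tracks))   # both rides fall within the 10 m buffer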

Another important feature available with the explorer is the ability to download video frames as geo-tagged images. There is an option in the explorer to take snaps while viewing a video. As an example, if the user wants to download a video frame as a geo-tagged image, they can click on the take snaps option available in the selection toolbar. When the user clicks on the take snaps option, a pop-up is displayed with details about the current location (longitude and latitude) as well as fields for the frame description and frame file name

(Figure 19). The user can edit the fields and save the frame as a snapshot to a list of images.

Once all the snapshots are added (Figure 20), the user can download the data as a folder with coordinates (in a KML file) and pictures linked together (Figure 21).
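One way such a geotagged snapshot list could be written out is sketched here with the simplekml package; the explorer's own implementation is not shown, and the snapshot data are hypothetical.

import simplekml

snapshots = [
    # (image file name, description, longitude, latitude)
    ("frame_2011_120s.jpg", "Heavily damaged house, E 17th Street", -94.5080, 37.0700),
    ("frame_2016_98s.jpg", "Rebuilt structure at the same location", -94.5080, 37.0700),
]

kml = simplekml.Kml()
for name, description, lon, lat in snapshots:
    point = kml.newpoint(name=name, coords=[(lon, lat)])   # KML uses (lon, lat) order
    point.description = description
kml.save("snapshots.kml")   # saved alongside the downloaded image folder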


Figure 19. A dialog box to add information to a geotagged image.

Figure 20. An image cart with geotagged images.


Figure 21. Geotagged images from the spatial video library opened in Google Earth as a KML file

Illustrative Examples

In order to demonstrate the utility of the spatial video library, an SV dataset containing 254 videos collected from Joplin, MO, between 2011 and 2016 was utilized. Joplin was struck by a

tornado on 22 May 2011 in which more than 160 people died. The SVs were collected right after the tornado to gauge the damage caused (Curtis & Fagan, 2013). Then, for each year up to 2016, the same area was re-surveyed using the SV methodology. The main idea behind re-surveying the same area was to assess the rate of recovery in space and time.

For most of the SV rides, two "Contour+2" cameras were mounted on a vehicle to capture the left and right sides of the street. The total disk space required to store the resulting set of 234 videos was approximately 800 GB. As the first step, the crawler was run on the root folder containing all 234 videos. This generated an index file (Joplin.idx), which was used as the input for the explorer. To illustrate a spatial query example, a single data point was added to the map at a location on E 17th Street, Joplin (Figure 22). A spatial query using a 15-meter


buffer was then extended around this point to identify all SVs intersecting the polygon. In order to demonstrate the potential for longitudinal analysis, one of the houses that were severely damaged during the 2011 tornado was selected (Figure 23A). From the 2012 video frame

(Figure 23B), it was evident that the house had been completely renovated. The presence of vehicles at the car porch (Figure 23B) indicates that the property was occupied at that time (2012). From the video frames for the SVs in 2013 (Figure 23C), 2015 (Figure 23D), and 2016 (Figure 23E), it can be seen that the house had completely recovered from the previous damage. For 2014, only one camera recorded useable video, and it was positioned on the opposite side of the property.

Figure 22. A spatial point query with a buffer of 15 meters


Figure 23. Frames for a spatial point query for years A) 2011, B) 2012, C) 2013, D) 2015, E) 2016

To illustrate the usage of the video search, a video frame was randomly selected from one of the 2011 videos containing the image of a heavily damaged house (Figure 24A). A 10-meter buffer was then used for the subsequent video search. The 2012 video for the same location (Figure 24B) reveals that the house had been completely removed along with the neighboring structures. No new construction can be found for the years 2013 (Figure

24C), 2014 (Figure 24D), and 2015 (Figure 24E). However, from the 2016 video (Figure

24F) it can be gleaned that new construction had begun in the same location. This "recovery" occurred years after the normally described period for such events, and as such, this type of tool could provide valuable data insights required for re-theorizing post-disaster recovery.


Figure 24. Results of a frame-based query. A) The source frames are 2011, B) 2012, C) 2013, D) 2014, E) 2015, F) 2016

To demonstrate how SVL can also be used in conjunction with external data, Spatial

Video Geonarratives (SVG) (Curtis et al., 2015), which are used to contextualize spatial data, are utilized. Using Wordmapper (Ajayakumar, Curtis, Smith, & Curtis, 2019), the SVGs are converted to geo-tagged sentences with a coordinate location for each sentence. Only sentences that include the word 'damage' are extracted from the SVGs and downloaded as a

KML file, which is then input into the SVL (Figure 25). One of the geotagged sentences

(Figure 26A), which contains details about how businesses recovered after the tornado (Figure

26B), is used to generate a spatial query using a 10 m search buffer. Five videos for the period between 2011 and 2015 intersected the buffer and were retrieved. From the 2011 video frame

(Figure 27A), it is evident that a Conoco gas station was completely destroyed by the tornado. By 2012 (Figure 27B), the gas station was back in business. The subsequent years show that the gas station continued to thrive (Figure 27C-E). The narrative from 2014 (Figure

26B) implies that the interviewee was impressed by the pace of recovery for that particular location. In this way, the SVL can be used to validate recent SVGs, or if re-analyzing older

SVGs, establish both pre and post visual assessments of what had been discussed.

Figure 25. Geonarrative sentences added to the spatial video library as an external dataset

Figure 26. A) A geonarrative sentence location, B) Geonarrative sentence in textual form for a location


Figure 27. Results from a spatial query using a geonarrative as an external dataset. The location is a Conoco gas station at Joplin, MO. The images are from A) 2011, B) 2012, C) 2013, D) 2014, E) 2015

Discussions

With rapidly improving sensor development and declining costs of storage systems and video technology, SV promises to be an efficient and cost-effective data collection methodology for obtaining, and retaining, a large volume of fine scale spatial data. In order to turn these data into insights, for input into spatial analysis and to develop new theory, efficient algorithms and software methodologies are required. SVL was developed as a software methodology for spatially indexing a large number of spatial videos as well as a tool to perform both spatial and longitudinal analysis using SVs. The crawler module in the SVL helps to generate efficient, scalable indexes, which can be reused for spatial searches. This idea of spatial indexing for SVs could be further extended to web-based large-scale storage systems for the efficient retrieval of SVs. Such index-based systems are essential if SV is to be used as an important large-scale spatial data source, as slow spatial queries can be a major hindrance to research. Further, such spatially indexed SV data can be used to build a large-scale monitoring system, which could help health officials, disaster recovery managers, and urban planners understand the longitudinal variations in an environment.


The video frame-based search can be further extended to other advanced techniques such as image recognition using machine-learning approaches such as neural networks. As an example, if the longitudinal changes for a blighted property need to be assessed, then the frame-based search for that particular property can retrieve all the video sections that are spatially proximate.

Further, these frames could be analyzed using object recognition techniques to identify the searched-for property. The frame-based search in this case helps to reduce the number of candidate frames, which could be particularly useful in reducing the computational overhead of image analysis. The capability to add external data to the library could be useful for linking SV data with more nuanced spatial data sources such as geotagged social media data. An interesting experiment would be to extract opinions about places from social media data and to use SVL to extract the corresponding video frames. Even though the illustrative examples demonstrated in this study only show videos segregated by year, the approach could be extended to support aggregation at finer temporal granularity. Such fine scale temporal aggregation could be used to study rapidly changing environments such as homeless or refugee camps.


Chapter V: The Use of Geonarratives to Add Context to Fine Scale Geospatial Research

A version of this chapter appears in Ajayakumar et al. (2019).

There are many examples where spatial and social information combine to create a nuanced landscape of places, context and emotion. Sometimes this landscape is also dynamic, with events changing monthly, daily, or even by time of day. To fully understand these processes at work, new types of spatial data need to be collected and analyzed, data that can complement more traditional sources of information, but also data that can provide context and fill the data gaps. One topical area that can be used to illustrate the importance of spatial context is the complex and dynamic landscape associated with a disaster, especially during recovery.

Recovery after a disaster arguably begins at the completion of the initial search and rescue, and then can extend through multiple years of physical and social rebuilding. Along that continuum, the human toll can be large, from the trauma of the initial experience through various psychopathological manifestations during recovery. These events are spatial, temporal and human. To fully address such dynamic landscapes, we need nuanced data that combine more traditional surveillance with ground-up insights. Indeed, the addition of context into geospatial health research is imperative if researchers are to understand the


process, as well as the outcome (Curtis et al., 2016), addressing what Kwan describes as "the uncertain geographic context problem" (Kwan, 2012a; Kwan, 2012b). Environmentally inspired narratives, or geonarratives, provide a new means to collect spatially anchored insight (Curtis,

Curtis, Ajayakumar, Jefferis, & Mitchell, 2019; Evans & Jones, 2011; Kwan & Ding, 2008;

Mennis, Mason, & Cao, 2013). The rationale is compelling: have someone who knows about/is impacted by an event talk about what has happened/is happening. These insights can then be used to enrich other more traditional data layers. In this way, data gaps can be identified, while more traditional spatial analytical output can be contextualized. For most geonarratives, a commentary is recorded while moving through an environment. The addition of video, as is used in a spatial video geonarrative (SVG), captures even more context, in effect allowing the researcher to virtually recreate the narrative to see where comments were made and what is being described (Curtis, Blackburn, Widmer, & Morris, 2013).

There are many advantages to this form of data collection, especially for challenging environments. While the video provides a visual resource that can be used for digitizing map layers, the simultaneously collected global positioning system (GPS) path anchors each frame to an exact location. On these mapped layers, conceptually, are the comments that provide "depth" to the map. Again, each comment is matched to both location and video frame, providing a rich understanding of the outcomes and processes at work. This same approach can then be applied across multiple interviews collected for the same time, or across different periods, to gain multiple perspectives of a location, or to see how that location changes. In this way, more traditional data, such as overdose locations, can be enriched with an understanding that reaches beyond a point on the map. However, what has been missing to make SVG a truly ubiquitous tool is an easy means to combine video, narrative and coordinates in such a way that a non-


GIScientist can fully utilize this method. In order to address this deficiency, a research support tool called Wordmapper was developed.

Mobile technologies provide exciting new avenues of spatially focused research across a variety of disciplines. These techniques provide a novel means to capture data for micro spaces of activity (Bell, Phoenix, Lovell, & Wheeler, 2015; Kwan, 2002; Miller, 2007; Phillips, Hall,

Esmen, Lynch, & Johnson, 2001; Rainham, Krewski, McDowell, Sawada, & Liekens, 2008;

Rodríguez, Brown, & Troped, 2005). In the study of human agency, these technologies can also provide a means to add “context” to more traditional geospatial analysis (De Smith, Goodchild,

& Longley, 2007). While the importance of these technologies and approaches is widely appreciated (Mooney, Corcoran, & Ciepluch, 2013), an impediment to their more common application remains the ease and utility of transforming textual data with spatial references into mapped formats. Thus, some researchers who might utilize this approach remain unconvinced because of their lack of geospatial and programming skills. It is therefore important to provide a bridge from mobile spatial technologies and their exported data to a platform that qualitative researchers with diverse methodological backgrounds can easily use. This study shows how this can be achieved by merging commentary and geography into simple, mapped outputs. To illustrate the potential of this approach, a case study of geonarratives collected after the tornado that devastated Joplin, Missouri in 2011 is utilized. The purpose of those narratives was to consider the human consequences along the devastation-to-recovery continuum.

Health researchers incorporating spatial data have argued that we need to consider "the local" because of its importance in designing effective interventions. Here we mean "local" as a subsection of a neighborhood, along a street, or even around a building. While new geospatial technologies can provide data layers to inform investigations at this scale, an important


consideration is the addition of context: not just where a hotspot occurs, but why. One approach to a more nuanced consideration of spatial data is a qualitative geographic information systems

(GIS) approach (Jung & Elwood, 2010; Matthews, Detwiler, & Burton, 2006; Pavlovskaya,

2006), which can include mixed methods (Knigge & Cope, 2006), geovisualizations (Elwood,

2006; Teixeira, 2018) and narratives (Jung & Elwood, 2010; Bell, Phoenix, Lovell, & Wheeler,

2015; Hawthorne & Kwan, 2013). One of the most exciting methods in terms of enriching context is "geonarratives" (Kwan & Ding, 2008), which use the functionality of the GIS to analyze and interpret narrative materials, such as interviews, oral histories, life histories and biographies. Madden and Ross (Hawthorne & Kwan, 2013) demonstrate how geospatial technologies can be used to support narrative testimonials of individuals related to human rights issues, while Kwan illustrated how emotional geographies can be incorporated within geospatial technologies to provide a richer representation of the lived experience (Kwan, 2007). Curtis and colleagues extend this work by combining geonarratives with simultaneously collected video to provide visually and spatially supported context across a variety of different sub-neighborhood spaces (Curtis, Curtis, Porter, Jefferis, & Shook, 2016; Curtis et al., 2015). In this last example, spatial video (SV), which is video encoded with, or combined with, a coordinate stream, provides a valuable addition to the narrative, as the researcher can return to the image to see what and where was being discussed. SV is in itself a novel geospatial approach that can be used to create spatial layers for different data-poor environments (Mills, Curtis, Kennedy, Kennedy, &

Edwards, 2010), with examples, including disaster science (Curtis & Fagan, 2013; Curtis &

Mills, 2011; Montoya, 2003), medical and (Curtis, Blackburn, Widmer, &

Morris, 2013; Curtis, Curtis, Porter, Jefferis, & Shook, 2016), environmental and social justice

(Duval-Diop, Curtis, & Clark, 2010), (Crump et al., 2008), and crime (Curtis &


Mills, 2011; Doran & Lees, 2005; Pain, MacFarlane, Turner, & Gill, 2006). In these examples, the addition of a spatial video geonarrative (SVG) provides associated context through the day-to-day experiences and insights of those who occupy that same space.

While SVG offers an advantage to qualitative researchers with an interest in more explicitly spatial data, for example, why one corner of a city park is associated with violence, an obstacle has been the lack of any easy-to-use software. There are examples of customized software developed for research needs; Kwan and Ding's (2008) 3D-VQGIS could be used for the analysis of textual data in a GIS, while Mennis, Mason and Cao (2013) created a visualization system for the exploration of narrative activity data using ArcGIS and Visual Basic (VB). For

SVG, Curtis et al. (2015) utilized a customized web program called G-Code to create geo-tagged words that could then be mapped in a GIS. However, the implementation of all these approaches has been far from ubiquitous. With GIS becoming popular for application-focused research in various domains, such as public health, archaeology, psychology, economics, and political science (Sinton & Lund, 2007), developing user-friendly tools for cross-domain research becomes more important (Aburizaiza & Ames, 2009).

Wordmapper was developed to bridge this gap between the research potential identified by geospatial scientists and the applied use of geonarratives by users with a limited geospatial skill set. This study opens this field not only to a broader set of topical domains, but also to the type of research needed on how variations in input, data collection, and method selection can affect geonarrative research.


Design and Components

To address this research gap and enable a ubiquitous use of SVG, a conceptual frame for how narratives can be mapped was developed. This can be seen in Figure 28, which comprises six modules used to address the geonarrative problem. This means that transcribed text in the form of a narrative and associated GPS data need to be combined to create a geonarrative dataset where comments and words become spatial objects. By doing this, a researcher can generate queries based on textual attributes, such as keywords, and spatial queries based on the geotagged content.

Figure 28. Conceptual diagram for Wordmapper with the six different modules.

To achieve this goal, a working conceptual model based around six modules was developed. This schema provides the inspiration for Wordmapper. There now follows a more detailed conceptual explanation of these modules, while also linking to the practical application as experienced when using Wordmapper.


Preprocessing Module

Once a narrative has been collected, it is then transcribed. In order to be able to match the text to a spatial location, the time when each comment or word occurs in the narrative must also be recorded. The preprocessing module accepts a narrative in the form of a text file and GPS data in the form of a Comma Separated Value (CSV) file (see the upcoming Figure 31A). The narrative is a text file with each sentence having the timestamp on the audio recording inserted at the beginning. The GPS data is a CSV file with latitude, longitude, and time in Coordinated Universal Time (UTC) format. The offset time between these two data inputs is the difference between the media time of the audio recording and the GPS time. Simply put, if the audio starts at 0:01 seconds, and we know the Greenwich Mean Time (GMT) of the GPS when the first word is spoken, we can sync both data streams together.

Apart from accepting these inputs, the preprocessing module also validates the data inputs, especially out-of-sync time stamps in the transcribed narrative. A custom GPS correction software (Curtis, Bempah, Ajayakumar, Mofleh, & Odhiambo, 2019), developed by the GIS

Health & Hazards lab at Kent State University (Kent, OH, USA), is used to correct positional errors in the GPS data. Only the corrected GPS data is used for further analysis.
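A minimal parsing sketch, assuming transcription lines of the form "[hh:mm:ss] sentence" and CSV columns named latitude, longitude, and time (the column names are assumptions, not Wordmapper's required headers):

import csv
import re

TIMESTAMP = re.compile(r"^\[(\d{2}):(\d{2}):(\d{2})\]\s*(.*)$")

def read_narrative(path):
    # Parse '[hh:mm:ss] sentence' lines into (audio_time_in_seconds, sentence) tuples,
    # skipping lines whose timestamps are malformed.
    records = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            match = TIMESTAMP.match(line.strip())
            if match:
                h, m, s, text = match.groups()
                records.append((int(h) * 3600 + int(m) * 60 + int(s), text))
    return records

def read_gps(path):
    # Parse the GPS CSV into (utc_time, latitude, longitude) tuples.
    with open(path, newline="", encoding="utf-8") as f:
        return [(row["time"], float(row["latitude"]), float(row["longitude"]))
                for row in csv.DictReader(f)]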

Combiner Module

The combiner module synchronizes the narratives and GPS data to create geonarratives. To do this, all narrative sentences are initially of the tuple form <timestamp, sentence>, which, along with the offset time, is used to match the starting index in the GPS data (Figure 29). The starting index in the GPS is utilized to generate geonarrative sentences of the tuple form <timestamp, sentence, location>. The mapped sentences are further utilized to generate words with spatial coordinates through a word interpolation algorithm.
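A sketch of this matching step, assuming one GPS fix per second and integer-second timestamps (a simplification of the actual combiner):

def combine(narrative, gps, offset_s):
    # narrative: (audio_time_s, sentence) tuples; gps: (gps_time_s, lat, lon) tuples.
    # offset_s is the difference between the audio media time and the GPS time.
    gps_by_time = {t: (lat, lon) for t, lat, lon in gps}
    geonarrative = []
    for t, sentence in narrative:
        fix = gps_by_time.get(t + offset_s)
        if fix is not None:                          # skip sentences with no matching fix
            geonarrative.append((t, sentence, fix))  # <timestamp, sentence, location>
    return geonarrative

narrative = [(1, "We are now entering the tornado path."),
             (12, "This block was rebuilt quickly.")]
gps = [(t, 37.07 + t * 1e-5, -94.51) for t in range(60)]
print(combine(narrative, gps, offset_s=5)[0])        # anchored at the fix for second 6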

Figure 29. Illustration of the GPS record to narrative mapping.

The first step in generating spatial words is the tokenization of the sentence. The Natural Language Toolkit (NLTK) (Loper & Bird, 2002), a Python-based text processing library, is used for a more sophisticated tokenization strategy. This is in preference to the commonly used white-space-based tokenization approach, which can create meaningless word tokens that may degrade the quality of the geotagged words. An illustrative example is the tokenization of the sentence "I…am here." A simple white-space-based tokenization would split the sentence into two words, 'I…am' and 'here.', which are not the intended word tokens. An NLTK-based tokenizer would instead output five tokens, 'I', '…', 'am', 'here', '.', from which the three meaningful words can easily be extracted. The timestamp information from the narrative record is used to assign a timestamp to each word beginning a section (the narrative that follows the time stamp in the transcription) as a tuple of the form <word, timestamp>.
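A small sketch contrasting plain white-space splitting with NLTK's word tokenizer; the exact token set depends on the tokenizer and NLTK version, and the Punkt models must already be available (e.g. via nltk.download).

import nltk

sentence = "I…am here."
print(sentence.split())              # white-space tokenization: ['I…am', 'here.']
print(nltk.word_tokenize(sentence))  # NLTK separates the trailing period into its own token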


The GPS locations for the start and end of a narrative sentence provide coordinates for the first and last word, respectively. From the coordinates of the first word (X1, Y1) and the last word (X2, Y2), the angle of incidence of the path (α) can be calculated using the point-slope equation. By assuming a constant speed of travel and using the known time stamps, the distance between an unknown word and a reference location, z, can be found using the average speed formula. By utilizing the distance (z), the angle of incidence (α), and a reference word location (X1, Y1), the coordinates of the new word (X, Y) can be calculated using trigonometry (Figure 30). Once the location information is deduced, the word can be represented as a triplet of the form <word, location, timestamp>. A maximum interpolation time can also be set in case there is too much "dead air" between comments. The narratives that are successfully processed and mapped are added to a narrative list for the user to conduct further exploratory analysis.
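A simplified sketch of this interpolation, treating the sentence path as a straight segment in a projected coordinate space (the function and its inputs are illustrative):

import math

def interpolate_words(words, start, end, max_gap_s=None):
    # words: (word, time_s) tuples for one sentence, in spoken order.
    # start, end: ((x, y), time_s) anchors for the first and last word.
    (x1, y1), t1 = start
    (x2, y2), t2 = end
    angle = math.atan2(y2 - y1, x2 - x1)                    # direction of the path segment
    speed = math.hypot(x2 - x1, y2 - y1) / max(t2 - t1, 1)  # assumed constant travel speed
    placed = []
    for word, t in words:
        if max_gap_s is not None and t - t1 > max_gap_s:
            break                                           # too much "dead air": stop placing words
        z = speed * (t - t1)                                # distance travelled since the first word
        placed.append((word, (x1 + z * math.cos(angle), y1 + z * math.sin(angle)), t))
    return placed

words = [("the", 0), ("block", 1), ("was", 2), ("rebuilt", 3)]
print(interpolate_words(words, ((0.0, 0.0), 0), ((30.0, 40.0), 3)))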

Figure 30. Interpolation algorithm for geotagged words.

Visualization Module

The visualization module allows for an interactive dynamic visualization of geonarrative. The

JavaScript-based Google Maps API is used to display the narrative path, as well as narrative sentences with coordinates as markers. Each marker also displays as a Google Maps Info


window with the narrative text (Figure 31B). Apart from the interactive map, Wordmapper also has a table with each row displaying the narrative text, which dynamically animates (Figure 31C) the corresponding marker, zooming into that particular location of the sentence.

To further aid the visualization of the spatial words and to help evolve the search through co-occurring words and phrases, a dynamic and interactive wordcloud (Figure 31D), which is a graphical representation of word frequency, is added. A further advantage of the wordcloud is that the searches (and visual representation) are matched with mapped output, which can help to identify where the mentions are made. In other words, this is a form of a spatialized wordcloud, which again promotes iterative searches.

Query Module

The query module supports extensive keyword-based searches (Figure 31E). Apart from supporting strict keyword matches, the query module also supports wildcard-based searches. For example, a search with the keyword ‘recover*’ can match geonarrative sentences with the word

‘recover’, ‘recovering’, ‘recovered’, and ‘recovers’. Again, as mentioned in the previous section, the wordcloud can also be used to generate keyword-based search through click interactions.
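A sketch of how such a wildcard search could be implemented with a regular expression (the helper is illustrative, not Wordmapper's internal code):

import re

def keyword_matches(sentences, pattern):
    # '*' acts as a wildcard, so 'recover*' matches recover, recovering, recovered, recovers.
    regex = re.compile(r"\b" + re.escape(pattern).replace(r"\*", r"\w*") + r"\b",
                       re.IGNORECASE)
    return [s for s in sentences if regex.search(s)]

sentences = ["The town is recovering quickly.", "The gas station was destroyed."]
print(keyword_matches(sentences, "recover*"))   # only the first sentence matches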

Category Module

Apart from spatio-temporal and contextual information, geonarratives are also an excellent source for extracting latent knowledge (Figure 31F). Therefore, having the ability to create category types, and then being able to assign each comment to those categories, can help with both spatial and thematic investigations. For example, comments in a narrative could be assigned to different themes (health, violence, recovery, etc.), by spatial content (spatially specific, fuzzy, or inspired mentions), or even by time (far past, recent past, current).


Output Module

Even though the preceding modules are important in creating a data set that can be investigated, manipulated and even mapped within the same software, it is also important to have the flexibility to transition these data to other software. For example, for those with GIS experience, the spatial narrative data and the geotagged words are output as two types of ESRI shapefiles (Figure 31G). These include a point file where every word in the narrative has a location, and secondly a point file with a coordinate for the beginning of every transcribed section of the narrative. These GIS outputs allow the spatialized narratives to be combined or compared with other relevant datasets. For example, if the user had entered the search "injury", then the corresponding point shapefile has a column in which every comment, and every word, matching the query is identified. These can then be used as input to a kernel density analysis to create a hotspot map of injury perception (Curtis, Curtis, Porter, Jefferis, & Shook, 2016). The corresponding SVG comment shapefiles that fall inside these hotspots can then be read for additional context about what was said regarding various injuries at the time of the disaster and in the recovery period after. The same approach works with hotspot analyses of more traditional data, such as 911 calls for service, overlaid on the same damaged and recovering areas in the period after the disaster (Curtis, Curtis, Porter, Jefferis, & Shook, 2016). For researchers not familiar with a GIS, Wordmapper provides an alternative output in the form of Google Earth's spatial data format, Keyhole Markup Language (KML). The benefits of Google Earth are many, including ease of use and free satellite imagery and aerial photos of most of the earth's land surface, making it a popular research tool for mapping in the health domain (Kamel Boulos et al.,

2011; Cinnamon & Schuurman, 2010). The KML output mirrors that for the GIS; a path of points with a visible word attached, and points showing the start of each narrative section. While


not having the power of manipulation and analysis of a GIS, the user can still see exactly where words are located, can zoom into the map for a particular location, or use the text in the side bar content field to go to a key part of the narrative, and then read the longer text stream at and around that location. This makes these data spatially accessible and interpretable for any researcher. Of note is that any search word is immediately identified as a yellow pin on both the map and in the table of contents. This means a researcher with no geospatial technology training can still gain a geographic perspective on the narrative. Wordmapper also provides output for non-spatial textual analysis for specialist qualitative analysis software, such as NVivo.
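A sketch of the shapefile export using geopandas, one possible library for this task; the attribute names are illustrative, and the KML output could be produced in a similar fashion.

import geopandas as gpd
from shapely.geometry import Point

# Hypothetical geotagged words: (word, longitude, latitude, matches_current_query).
words = [("rebuilt", -94.5080, 37.0700, True),
         ("quickly", -94.5081, 37.0701, False)]

gdf = gpd.GeoDataFrame(
    {"word": [w for w, _, _, _ in words],
     "match": [int(m) for _, _, _, m in words]},   # flag words matching the search
    geometry=[Point(lon, lat) for _, lon, lat, _ in words],
    crs="EPSG:4326",
)
gdf.to_file("geotagged_words.shp")                 # point shapefile for GIS users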

Figure 31. Wordmapper interface diagram A) Input module B) Map in visualization module C) Narrative sentence table D) Wordcloud in visualization module E) Query module F) Category module G) Output module

To illustrate the potential of collecting SVG data and then analyzing it using

Wordmapper, a small case study is presented based around the devastation of the 2011 Tornado in Joplin, Missouri. This has been chosen as events in the United States during 2017 and 2018 have left multiple communities devastated by flood, fire and hurricane, and the health impacts of those events will extend across a multiyear period in terms of direct and indirect disaster related


illness. The SVG provides a way to capture this complex interaction of spatial, social and temporal outcomes.

An Empirical Illustration: A Spatial Video Geonarrative in Joplin, Missouri

SVGs have been collected for a variety of different environments and topics, including the health needs of a community, predicting crime, homelessness, overdoses, and risks associated with infectious disease. However, the first applications of both SV and SVG were on the physical damage and subsequent health consequences of post-disaster landscapes (Duval-Diop, Curtis, &

Clark, 2010; Curtis & Mills, 2012; Curtis, Duval-Diop, & Novak, 2010; Curtis, Mills, Kennedy,

Fotheringham, & McCarthy, 2007). The SVGs collected from Joplin, Missouri, after the tornado of 2011, which was the deadliest single path tornado in the United States since 1947 with over

160 deaths, are used to showcase how SVG can be used to contextualize a dynamic landscape.

The tornado hit in the early evening of Sunday 22 May leaving a 6-mile path of destruction with widths of up to 1/2 mile. Multiple land use areas were destroyed, including several residential neighborhoods. Soon after the tornado, a SV team mapped the damage (Curtis

& Fagan, 2013), and then repeated this survey at regular intervals to monitor recovery. As part of this recovery research, several SVGs were also collected in 2014 to record different perspectives of the storm, the complexity of recovery, and especially the human perspective and challenges faced. An initial mapping of these SVG used a precursor to Wordmapper called G-Code that interpolated words between the beginning points of transcribed narrative sections (Curtis et al.,

2015). While this was an important tool to begin the mapping of a SVG, the output consisted of an online text screen from which three columns of data (ID, time and word) had to be copied and pasted into Excel, and then manipulated in a GIS for mapping purposes. While some


collaborators could work through these steps, for many it was a problem. A further limitation was that G-Code resided on a server and connectivity problems sometimes prevented access.

Given these limitations, and with a growing desire to expand collaborations, the previously described way to standardize the mapping of a SVG was conceptualized, and Wordmapper software was developed.

Five SVGs were collected during 2014. For each ride, two "Contour+2" cameras were mounted on a vehicle, while an audio recorder captured the commentary of a test subject as he/she navigated through what had been the tornado path. After collection, all narratives were transcribed, with each subject comment having a media time stamp in the form of [hh:mm:ss] added before the first word. The GPS path was extracted from the video (using Contour

Storyteller software) as a CSV file, and the Greenwich Mean Time (GMT) of the video media time that matched the first word of the transcription was used as the offset time for Wordmapper.
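
To make this combination step concrete, the following is a minimal sketch, in Python, of how a timestamped transcript could be joined to a GPS track using such an offset. It is an illustration rather than the Wordmapper implementation; the transcript layout follows the [hh:mm:ss] convention described above, while the CSV column names ("gmt_seconds", "lat", "lon") are assumptions made for this example only.

```python
# Minimal sketch (not the Wordmapper source code) of joining a transcribed
# narrative to a GPS track using a media-time offset. Column names are assumed.
import csv
import re

def media_time_to_seconds(stamp):
    """Convert an [hh:mm:ss] media time stamp to seconds."""
    h, m, s = (int(p) for p in stamp.strip("[]").split(":"))
    return h * 3600 + m * 60 + s

def load_gps(csv_path):
    """Read the GPS track as a list of (seconds, lat, lon) tuples."""
    with open(csv_path, newline="") as f:
        return [(float(r["gmt_seconds"]), float(r["lat"]), float(r["lon"]))
                for r in csv.DictReader(f)]

def combine(transcript_path, gps_path, offset_seconds):
    """Attach the nearest GPS fix (by time) to every transcribed comment."""
    track = load_gps(gps_path)
    pattern = re.compile(r"\[(\d{2}:\d{2}:\d{2})\]\s*(.*)")
    geonarrative = []
    with open(transcript_path) as f:
        for line in f:
            match = pattern.match(line.strip())
            if not match:
                continue
            t = media_time_to_seconds(f"[{match.group(1)}]") + offset_seconds
            sec, lat, lon = min(track, key=lambda fix: abs(fix[0] - t))
            geonarrative.append({"time": t, "lat": lat, "lon": lon,
                                 "text": match.group(2)})
    return geonarrative
```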

The preprocessing module in Wordmapper reads in the narrative file, the CSV file with the location data, and the offset time. The combiner module combines the narrative and location data to create geonarratives, which are added to the Wordmapper geonarrative dataset queue. The dynamic wordcloud generated by Wordmapper provides an initial comparison of the narratives. Figure 32A displays a general wordcloud for one of the narratives without any keyword search. The wordcloud suggests that words such as “tornado”, “apartment”, and “rebuilding” are frequently used in the narrative. A second query uses “recover*”, which matches “recover”, “recovers”, “recovering”, and “recovered”. The corresponding wordcloud (Figure 32B) indicates that ‘rebuilding’ is a high frequency word associated with ‘recover’. These results provide an initial insight into what might be an interesting second search, such as “rebuilding” (Figure 32C).
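
As an aside, the wildcard behaviour described above (where “recover*” matches “recover”, “recovers”, “recovering”, and “recovered”) can be illustrated with a simple regular expression; the snippet below is a sketch of that idea, not the Wordmapper search code.

```python
# Sketch of a trailing-asterisk wildcard search; not the Wordmapper code.
import re

def wildcard_to_regex(term):
    """Translate a trailing-asterisk wildcard into a word-boundary regex."""
    if term.endswith("*"):
        return re.compile(r"\b" + re.escape(term[:-1]) + r"\w*", re.IGNORECASE)
    return re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)

sentences = [
    "The community was quick to respond and rebuild.",
    "Recovering from the tornado took years.",
    "The recovery process was slightly more than three years on.",
]
pattern = wildcard_to_regex("recover*")
matches = [s for s in sentences if pattern.search(s)]
print(matches)  # the second and third sentences match
```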


Figure 32. Wordcloud diagram for A) an entire geonarrative, B) the keyword ‘recover*’, and C) the keyword ‘rebuilding’.

The visualization window in Figure 33 displays the path associated with the keyword ‘recover*’. The red pins show the locations of all the narrative sentences, while the yellow pins identify those containing the searched-for keyword.

Figure 33. Visualization module showing the narratives containing the searched-for keyword ‘recover*’ as yellow pins. All other narrative sentences are shown by red pins.


Figure 34A shows one of the search results matching the word ‘recover*’ and indicates a question by the interviewer. The text box in Figure 34A reads, “S: Okay. One of the things that we’re really interested in are impressions of the pace of rebuilding and recovering. Just your overall impression of the rebuilding and recovery process is slightly more than three years on. Would you say that it’s been a good process, a poor process, do you notice anything striking about the process.” By examining other matching sentences, it was found that the word ‘recover’ is mainly used by the interviewer to ask the subject about the recovery process after the tornado. By examining the consecutive narrative sentences (Figure 34B), it was found that the responses from the subject contain valuable insights about recovery without mentioning the word “recover” (rather, words such as ‘rebuild’ were used). The text box in Figure 34B reads, “W. Excuse me, I think it’s actually been a very good process. It turns out, for example, the place where I lived during the tornado, it was completely destroyed, but it’s been subsequently rebuilt, and apparently and is much safe. So at least in that personal area it seemed very quick and good rebuilding. But generally, in the areas affected by the tornado. The community was quick to respond and rebuild as far as I can tell, even local businesses and national businesses that had stores here in Joplin were also quick to rebuild.”

Two things can be gleaned from this. Firstly, everyday conversation about recovery may not actually include the term “recover”; instead, the activities and outcomes of “recovery” are described. Recovery may be an academic rather than a colloquial term. Thus, it would be useful to return to the comments proximate to the “recovery” cues to identify more appropriate search words. Secondly, this may not have been so clear without the mapping approach. Indeed, this iterative exploratory approach is one of the most compelling features of Wordmapper.


Figure 34. Wordmapper visualization module showing the A) narrative sentence for the keyword ‘recover*’ and B) response by the interviewee about recovery without using the word ‘recover’.

In another example (Figure 35), an interviewee mentions a convenience store that was totally demolished by the tornado. The text box in Figure 35A reads, “Here – that convenience store there was one of the places that was totally demolished, and I guess there was a recording of a cell phone call that people hiding in the – people hiding in the freezer. [unclear] got, uh, widely distributed nationally.” The corresponding news article extracted from the New York Daily News (Sheridan, 2011) (Figure 35B) confirms the validity of the statement. This example shows how external sources can be used to triangulate the findings.


Figure 35. A) Narrative sentence about a convenience store that was completely destroyed in the tornado B) the corresponding tornado article from the Daily News.

It is also important to go beyond simply mapping text, and instead mine the content, and the structure of that content, further. To do this, Wordmapper allows for the creation of different categories, which can then be used to create comment subsets. For example, while these categories could be thematic (recovery, for example), they could also be spatial. Two categories, ‘Location Specific’ and ‘Fuzzy Space’, were created to classify comments that explicitly mention a location, such as ‘this house’ or ‘that church over there’, and comments that have vague place mentions, such as ‘this area’ and ‘that street’, respectively. A combined wordcloud (Figure 36) from both categories indicates the use of deictic words, such as ‘here’, ‘this’, ‘there’, and ‘that’. The spatial deictic words could further be used to aid machine learning based approaches that would automatically classify spatially cued sentences.
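
A minimal sketch of that idea is shown below: a simple rule that flags sentences containing deictic cues. The word list and threshold are illustrative assumptions, and this stands in for, rather than replaces, a trained machine learning classifier.

```python
# Illustrative rule-based flagging of spatially cued sentences using deictic
# words; the word list and threshold are assumptions, not a trained model.
DEICTIC_WORDS = {"here", "there", "this", "that", "these", "those"}

def is_spatially_cued(sentence, min_hits=1):
    """Flag a sentence when it contains at least min_hits deictic words."""
    tokens = sentence.lower().replace(",", " ").replace(".", " ").split()
    return sum(token in DEICTIC_WORDS for token in tokens) >= min_hits

print(is_spatially_cued("That church over there was providing shelter."))      # True
print(is_spatially_cued("I've heard a lot of people complaining about dust."))  # False
```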


Figure 36. A combined wordcloud for ‘Location specific’ and ‘Fuzzy Space’.

Additional investigative power occurs when these tools are combined. For example, a search based on the keyword ‘damage’ and the category ‘Fuzzy Space’ helps to identify locations where mentions of ‘damage’ were made. From the example (Figure 37), it seems that the word ‘damage’ is more associated with an area than with a particular location. A more specific location tends to have more detailed descriptions. The text box in Figure 37 reads, “Now this area I remember something because as I mentioned—we stored a number of our possessions with a friend [unclear] who lived at the very eastern end of the tornado zone. The house suffered a little damage, but not much. But because there was an awful lot of work going on—at the site of the east middle school—we very often had to detour through this road—the”.


Figure 37. Narrative text for the matching keyword ‘damage’ and category ‘Fuzzy Space’.

However, keywords can also be used as a point source for analysis. To illustrate this, three different damage-related categories, ‘Low Damage’, ‘Medium Damage’, and ‘High Damage’, were created using Wordmapper. Only narrative sentences that contain spatial information along with content related to damage were classified. A sentence such as “And this whole area, maybe a little wind damage, but that’s it” was classified as ‘Low Damage’ and given the value 1; a sentence such as “Moderate to light damage. I know that that particular church for a while was providing shelter to some people.” was classified as ‘Medium Damage’ and given a value of 5; and a sentence such as “Here – that convenience store there was one of the places that was totally demolished, and I guess there was a recording of a cell phone call that people hiding in the – people hiding in the freezer. [unclear] got, uh, widely distributed nationally.” was classified as ‘High Damage’ and given a value of 10. Kernel Density Estimation, a technique for calculating the density of point and line features, was used to create a density surface from the classified damage values. The map (Figure 38) shows a high concentration of damage mentions near the intersection of South Rangeline Road and E 20th Street, and along Rex Avenue. From newspaper articles, it could be identified that Walmart, Home Depot, and Academy Sports and Outdoors, which were located near the intersection of South Rangeline Road and E 20th Street, were completely destroyed in the tornado (Rogers, 2011). The second main hotspot, on Rex Avenue, is near the Plaza apartment, which was completely destroyed in the tornado (Kennedy, 2011).
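
The sketch below illustrates the general idea of such a weighted kernel density surface, with damage comments scored 1, 5, or 10. The coordinates, weights, and bandwidth are invented for illustration only; the surface in Figure 38 was produced with standard GIS KDE tooling rather than this snippet.

```python
# Illustrative weighted Gaussian kernel density surface from scored comments.
# All coordinates, weights, and the bandwidth are made-up example values.
import numpy as np

points = np.array([[0.0, 0.0], [50.0, 20.0], [55.0, 25.0], [200.0, 180.0]])
weights = np.array([1.0, 10.0, 5.0, 10.0])   # Low=1, Medium=5, High=10
bandwidth = 30.0                              # analogous to the KDE search radius

# evaluate the weighted kernel density on a regular grid
xs = np.linspace(-50, 250, 100)
ys = np.linspace(-50, 250, 100)
gx, gy = np.meshgrid(xs, ys)
density = np.zeros_like(gx)
for (px, py), w in zip(points, weights):
    d2 = (gx - px) ** 2 + (gy - py) ** 2
    density += w * np.exp(-d2 / (2.0 * bandwidth ** 2))
density /= 2.0 * np.pi * bandwidth ** 2

print(density.max(), density.shape)  # peak intensity and grid size
```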

Figure 38. Heat map for ‘damage’ category containing spatial references.

While this type of categorical heat map may not replace actual damage estimates (in this example), it does provide the basis for comparative mapping to see how this approach could be used in areas where little or no data exist. For example, the above heat map could be repeated for emotional content, or for mentions of recovery-related illnesses or other health conditions. In both examples, maps made from more traditional sources are likely to be incomplete.

Discussions

Any spatial data, whether physical (locations of damage) or health related (suicide locations), only tell a partial story unless contextualized. One way to capture these insights is through an SVG. Now, using Wordmapper, we also have the means to fully leverage these data. More specifically, Wordmapper allows the transcribed narrative to be combined with coordinates in an easy-to-use interface that supports both textual and spatial investigations. These queries are not linear in the sense of a more typical qualitative GIS approach, but rather iterative, supported by images and text. The addition of categories that can include both content themes and spatial structure, along with supporting visuals such as maps and wordclouds, encourages the user to evolve their search in a nonlinear way. Wordmapper has been designed for users who are neither spatial science nor computer science savvy. In this way a local mayor’s office, a non-profit, or a public health official could benefit from SVG use. This has the additional benefit of also making these groups more willing collaborators, as the benefits are tangible.

Wordmapper is designed to be standalone software, which both maximizes the security of working with potentially sensitive data (Johari & Sharma, 2012; Josselson, 1996) and limits the reliance on access to an external server, which may be prone to network outages and server downtimes. These are important considerations for collaborators working in the field, especially in locations where internet connectivity has questionable security, for example, international work in data-challenged locations. Even though the software can provide almost all functionalities, including search, download, and qualitative coding, without an internet connection, the map display based on the Google Maps API requires an internet connection. In future revisions of the software there is a plan to include Data-Driven Documents (D3) maps with minimal mapping features as an alternative to Google Maps during internet connection disruptions. An important security concern regarding spatial confidentiality is the use of the Google Maps API for narrative location display. As textual data are not transferred across to Google servers, the confidentiality issue is minimal, though the plan is to incorporate more privacy-preserving mapping approaches, such as offline mapping (loading map tiles), in future revisions of the software. Indeed, while the illustration used here involved a post-disaster landscape, the topical application of SVG is broad. Current SVG collaborations involve multiple overseas environments associated with infectious disease, multiple cities in the United States addressing problems of homelessness or child injury, and even more rural communities fighting opioid addiction. The point is that SVG is a ubiquitous method that now has an equally accessible means to fully leverage these data.

Developing Wordmapper, however, is only the opening to many other strands of associated research on the technique itself. It has previously been suggested by Curtis and colleagues that comments are either spatially precise (for example, “people were injured in this building here”), spatially fuzzy (“all around this area the sirens couldn’t be heard”), or spatially inspired (“I’ve heard a lot of people complaining about dust”) (Curtis, Curtis, Ajayakumar, Jefferis, & Mitchell, 2019). It would be interesting to see the proportion of comments falling into these categories, and then how each was cued in the narrative. Could, for example, certain key words, such as ‘deictic’ words that provide contextual information about a person, place, and time (Fillmore, 1966), be used to automatically extract only the spatially precise comments? It would also be interesting to see how close each word describing a specific place is to that place, which would then have knock-on implications for different methods of analysis. Questions could also be investigated with regards to how such findings vary based on location, cohort, or the framing influence of the interviewer. While there is a rich body of research into interview techniques, the SVG is a sufficiently different method to warrant further investigation, and Wordmapper can help with that analysis.

With advances in the area of Natural Language Processing (NLP), researchers are using approaches like sentiment analysis to identify and extract emotional context from sources such as microblogs (Pang, Lee, & others, 2008), web forums, and narratives (Mihalcea & Liu, 2006). With the help of open source tools such as the Natural Language Toolkit (NLTK) for Python, which already supports basic sentiment analysis, textual data in the form of narratives could be used for the systematic empirical investigation of emotions (Kleres, 2011). It could be argued that SVGs offer a far richer data set than these other forms of text, and as they also contain location, there is the possibility of conducting a place-based sentiment analysis.
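
As a brief illustration of what such a place-based sentiment analysis might look like, the sketch below scores located geonarrative sentences with NLTK's VADER analyzer. The sentences and coordinates are illustrative, and the approach is a suggestion rather than an implemented part of Wordmapper.

```python
# Illustrative place-based sentiment scoring with NLTK's VADER analyzer.
# Requires the vader_lexicon resource; sentences and coordinates are invented.
import nltk
from nltk.sentiment.vader import SentimentIntensityAnalyzer

nltk.download("vader_lexicon", quiet=True)
sia = SentimentIntensityAnalyzer()

geonarrative = [
    {"lat": 37.05, "lon": -94.51, "text": "It's actually been a very good process."},
    {"lat": 37.06, "lon": -94.50, "text": "That convenience store was totally demolished."},
]

# attach a compound sentiment score (-1 to 1) to every located sentence
for sentence in geonarrative:
    sentence["sentiment"] = sia.polarity_scores(sentence["text"])["compound"]
    print(sentence["lat"], sentence["lon"], sentence["sentiment"])
```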

Spatial video geonarratives are an exciting new approach for data collection. They can capture institutional knowledge, record a variety of different perspectives for the same space, be used for historical and contemporary investigations, reveal where overdoses are likely to happen or where gang violence may erupt, or provide context to stop an epidemic. This approach is potentially transformative, but two areas currently limiting more widespread use are the need for unwieldy data manipulations in a variety of software packages and a reliance on internet connectivity. Both of these deficiencies limit the uptake of the method in non-spatial disciplines and with partners working in challenging environments. This study has addressed these needs in the form of Wordmapper, which allows researchers to effectively process and analyze data being generated in various formats, while also having access to a variety of tools suitable for different spatial skill levels. In so doing, this study has opened the way for SVGs to become a more widely used tool to add context into spatial-social research.


Chapter VI. GeoCluster: A Software for Extracting Spatial Patterns From Spatial Video Geonarratives

Previous chapters have described the utility of using the spatial video geonarrative (SVG) to supplement or even replace more traditional spatial data. While these data provide contextual (qualitative) insight into the processes at work at sub-neighborhood scales, there is also a need for more traditional quantitative analysis of these data. Most studies using the SV or SVG approach have relied on exploratory visual analytic techniques such as Kernel Density Estimation (KDE) (Curtis, Curtis, Porter, Jefferis, & Shook, 2016), followed by contour mapping (Curtis et al., 2015), to explore the underlying patterns of either digitized objects or key words extracted from the narratives. For SVG, Wordmapper is used to extract numerator sub-groups (keywords) while all words are used as the denominator distribution. In this way, thinking of words as spatial data points, more statistically rigorous spatial analysis techniques can be used to identify, extract, and assess patterns within the SVG.

There are many techniques available to identify and determine the statistical significance of spatial clusters in point data sources (Kulldorff & Nagarwalla, 1995; Ester, Kriegel, Sander, Xu, & others, 1996). While a technique such as SaTScan (Kulldorff, Rand, Gherman, Williams, & DeFrancesco, 1998) uses a likelihood ratio test to identify clusters, other techniques such as the Geographical Analysis Machine (GAM) (Openshaw, Charlton, Wymer, & Craft, 1987) and the spatial filter (Rushton & Lolonis, 1996) use case-control rates to identify clusters and Monte Carlo simulation to assess statistical significance. If the set of all words in a narrative can be considered as controls, and the set of instances of a particular word under study can be considered as cases, then techniques such as a spatial filter combined with a Monte Carlo simulation can be used to identify significant clusters of the word or theme under consideration. However, the process of neighbor lookup and rate calculation, along with the Monte Carlo simulations, can be computationally expensive when a fine-scale grid is used and the total number of control points is large, both of which are typical for SVG research.

This chapter introduces a new extension to the Wordmapper software, GeoCluster, that utilizes spatial filtering to extract spatial clusters of words, first as rates and then, through Monte Carlo simulation, as statistically significant clusters. The following sections will provide an overview of the spatial filter, the workflow, the technical details and software implementation, and a set of illustrative examples.

Spatial Filter

The spatial filter extension for Wordmapper utilizes a variant of the spatial filtering technique developed by Rushton and Lolonis (1996) to analyze the significance of disease rates across space. Previous studies related to understanding the spatial characteristics of diseases in relation to contaminant exposures have shown that the results of spatial analysis can vary based on the geographic scale used for analysis. In Geography, the issue of deviation in results based on the scale of analysis is generally termed the Modifiable Areal Unit Problem (MAUP) (Fotheringham & Wong, 1991). For example, the results of any rate-based analysis conducted by area aggregation, such as by census tracts, block groups, or zip codes, will vary according to both the scale and the configuration. Aggregation bias is another potential source of error common in rate-based studies. The assigning and aggregation of point data to geographical organization units based on strategies such as a point-in-polygon operation (commonly available in a GIS) can cause problems such as boundary issues (artificial separation between two aggregations, or even how a point is handled if it falls on the boundary). To counter such aggregation issues, including the imposition of artificial boundaries, spatial filtering is used. This technique divides the study area into a uniform grid, and disease rates are calculated for each grid point based on the number of disease cases that fall within a predefined filter radius. The overlapping nature of the filters means that boundary effects are lessened. The rates calculated at each grid point are then typically interpolated and contoured to create “hotspot” maps. Even though the spatial filtering technique has mainly been used in epidemiological studies, the concept is transferable to other spatial data types. In this study, we develop a variant of the spatial filtering technique to identify spatial word clusters, or theme hotspots.

To identify spatial patterns in keyword extractions, GeoCluster has been developed. GeoCluster generates a uniformly spaced grid from a grid spacing in meters and a grid extent (bottom-left and top-right coordinates), which are supplied as user input parameters (Figure 39A). To calculate the rate for each grid point, the spatial word dataset is divided into cases and controls (Curtis, Curtis, Ajayakumar, Jefferis, & Mitchell, 2018). As an example, to identify spatial patterns of ‘heroin’ use, the cases will be the set containing the word ‘heroin’, and the set containing all the words will be used as the controls. Both cases and controls are added to GeoCluster as separate shapefiles (Figure 39B). GeoCluster calculates the word rate across all the grid points using a user-defined bandwidth (r). With the grid point as center, a circle of radius r is drawn. Typically, this radius is greater than half the distance between grid nodes (and more typically between 2 to 3 times the distance between nodes) so that overlapping rates are calculated. Words that fall within the circle are assigned to the grid point. The total numbers of cases and controls that fall within the circle (Figure 40) are assigned as the numerator and denominator respectively, and the rate is calculated by dividing the numerator by the denominator. The bandwidth, or filter size, plays a major role in the smoothing of rates (Rushton & Lolonis, 1996). Large bandwidths overly smooth rates, making it difficult to identify patterns that match up with the underlying built environment, while a small bandwidth creates artificial hotspots due to the instability in rates created by the problem of small numbers (Initiative, 2004). Even though techniques have been developed to identify the best bandwidths (Tiwari & Rushton, 2005), results are usually presented for multiple filter sizes so as not to focus on one potentially spurious bandwidth (Tiwari & Rushton, 2005). Though the spatial filtering technique helps to minimize the effect of MAUP, the rates generated at each grid point can still occur by chance alone. Therefore, in order to identify the likelihood of such events occurring, the statistical significance of the rate for each grid point is calculated using a Monte Carlo simulation approach.
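
The following is a minimal sketch of the rate calculation just described: for each grid point, the cases and controls falling within the bandwidth are counted and divided. The grid spacing, bandwidth, and point locations are illustrative values, and the code is a simplification of, not a substitute for, the GeoCluster implementation.

```python
# Illustrative spatial filter rate calculation: count cases and controls within
# the bandwidth of each grid point and divide. All data here are synthetic.
import numpy as np

def filter_rates(grid_xy, cases_xy, controls_xy, bandwidth):
    """Return case/control rates for every grid point (NaN when no controls)."""
    rates = np.full(len(grid_xy), np.nan)
    for i, (gx, gy) in enumerate(grid_xy):
        in_case = np.hypot(cases_xy[:, 0] - gx, cases_xy[:, 1] - gy) <= bandwidth
        in_ctrl = np.hypot(controls_xy[:, 0] - gx, controls_xy[:, 1] - gy) <= bandwidth
        if in_ctrl.sum() > 0:
            rates[i] = in_case.sum() / in_ctrl.sum()
    return rates

# a 10 m grid with a 50 m (overlapping) filter, mirroring the chapter's setup
xs, ys = np.meshgrid(np.arange(0, 100, 10), np.arange(0, 100, 10))
grid = np.column_stack([xs.ravel(), ys.ravel()])
rng = np.random.default_rng(0)
controls = rng.uniform(0, 100, size=(500, 2))          # all words
cases = controls[rng.choice(500, 25, replace=False)]   # e.g., keyword mentions
print(filter_rates(grid, cases, controls, bandwidth=50.0)[:5])
```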

Figure 39. A) Uniform grid used for spatial filter analysis B) Case (red) and Control (yellow) data along with the uniform grid


Figure 40. Geometrical representation of the spatial filter (radius of circle centered at each grid point), with cases (red points), and controls (yellow points)

Monte Carlo simulation based Statistical significance test

The Monte Carlo procedure for statistical significance testing was developed by Hope (1968) and is widely used in research domains such as epidemiology (Besag & Diggle, 1977) and crime analysis (Ratcliffe, 2004). For a test statistic t, the procedure involves ranking the observed value, t1, among a set of n−1 values generated by sampling from the null distribution of t. When the statistic is a single real number, the rank of the observed value t1 among the complete set of n values of the test statistic (t1, t2, …, tn) can be used to determine the exact significance level (Equation 1). The underlying assumption for the Monte Carlo procedure is that, under the null hypothesis, each of the n possible rankings of t1 is equally likely (Besag & Diggle, 1977). Previous studies (Openshaw, Charlton, Wymer, & Craft, 1987) have shown that the number of simulated samples (n) can be quite small (typically n = 100), as it is not necessary to obtain a precise estimate of the null distribution function for the test statistic. The significance level (α) is determined by the number of simulations, and a lower significance level can be obtained by increasing the number of simulations (n). The advantage of the Monte Carlo procedure is that it can be used with any test statistic and any null hypothesis without constraints.

$$p = \frac{n - \mathrm{rank} + 1}{n + 1} \qquad (1)$$

The Monte Carlo procedure for statistical significance is implemented in GeoCluster using a random-relabeling technique (Figure 41). First, the rates are calculated for each grid point using the cases and controls. These first-iteration rate values are stored as the observed rate for each grid point. After the observed values are calculated, for each simulation a new set of cases is generated by randomly relabeling the controls as cases. As an example, if the number of controls is ncontrol and the number of cases is ncases, then for every simulation, ncases points are randomly selected from the set of ncontrol points and relabeled as cases (Figure 41). After relabeling, the rates are recalculated and stored for the respective grid points. After all simulations are completed, each of the grid points will have a list of N+1 test statistic values if the number of simulations is N. The first value of the list, which is the observed value, is compared to the remaining N values to assign the rank of the observed value. The significance level for the grid point is calculated by dividing the rank of the observed value by the total number of test statistic values (N+1). As an example, if the observed value for a grid point has rank 6 and the total number of simulations is 99, then the significance level for that grid point will be .06, which means the null hypothesis cannot be rejected at the 95% confidence level (α = .05). The final output generated from GeoCluster contains all the grid points along with the observed rates and the statistical significance level, which can be used to generate maps showing statistically significant rate variations over space. Two major computational challenges for the software implementation of GeoCluster are the increase in computational cost when running a larger number of Monte Carlo simulations and the neighbor lookup required for each grid point to compute rates.
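
A minimal sketch of this random-relabeling procedure is given below: observed rates are ranked against rates computed from simulations in which case labels are randomly reassigned among the control points. All coordinates and parameters are illustrative, and the brute-force distance search here ignores the performance optimizations discussed in the following sections.

```python
# Illustrative random-relabeling Monte Carlo test for spatial filter rates.
# Synthetic data; cases are expressed as indices into the control set.
import numpy as np

def rates_for(grid, case_idx, controls, bandwidth):
    """Case/control rate per grid point, with cases given as control indices."""
    out = np.zeros(len(grid))
    for i, (gx, gy) in enumerate(grid):
        in_filter = np.hypot(controls[:, 0] - gx, controls[:, 1] - gy) <= bandwidth
        n_ctrl = in_filter.sum()
        out[i] = in_filter[case_idx].sum() / n_ctrl if n_ctrl else 0.0
    return out

rng = np.random.default_rng(1)
controls = rng.uniform(0, 100, size=(400, 2))          # all words
case_idx = rng.choice(400, size=20, replace=False)     # observed keyword hits
grid = np.column_stack([g.ravel() for g in np.meshgrid(
    np.arange(0, 100, 20), np.arange(0, 100, 20))])

n_sim = 99
observed = rates_for(grid, case_idx, controls, bandwidth=40.0)
sims = np.array([rates_for(grid,
                           rng.choice(400, size=20, replace=False),
                           controls, bandwidth=40.0)
                 for _ in range(n_sim)])

# rank the observed rate among all N+1 values (1 = highest), then p = rank/(N+1)
rank = 1 + (sims >= observed).sum(axis=0)
p_values = rank / (n_sim + 1)
print(p_values.min())  # grid points with p <= .05 would be mapped as significant
```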

Figure 41. Random relabeling technique. For every simulation, controls (yellow points) are randomly re-labeled as cases.

Improving Spatial Filter technique with Parallel Computation

In order to improve computing efficiency, parallel computing techniques can be utilized because each grid point’s rate calculation is independent of all other calculations. Two of the most commonly used parallel programming paradigms are shared memory parallelism and distributed parallelism. For shared memory parallelism, a common memory resource is allocated to multiple processors, while for distributed parallelism, each processor has its own memory and data are passed between processors using message passing. Shared memory parallelism is typically easier to implement as the overhead of passing data to different processors is minimized. All modern desktops have processors with multiple cores, which can be used for parallel computation. As desktops usually have a single memory source in the form of random access memory (RAM), the shared memory parallel programming model is ideal for desktops. Parallelism can be further classified into task-based parallelism and data parallelism. For task-based parallelism, a master-slave approach is followed, with the master processor distributing tasks to all other processors. For data parallelism, the entire dataset is partitioned and portions of the data are assigned to each of the processors. This process of assigning different data partitions to different processors is called data domain decomposition (Wang & Armstrong, 2003). The strategy for decomposition is crucial and is dependent on multiple factors such as the number of processors, the dependency among data, and the nature of the algorithm (Wang & Armstrong, 2003). With spatial data, the issue of data dependency is compounded due to spatial autocorrelation (Tobler’s first law (Miller, 2004)). Some of the decomposition strategies used to logically divide spatial data among multiple processors include row, column, tree, and box-based decomposition. There is no single best strategy for domain decomposition, and the efficiency varies with respect to the algorithm and the spatial distribution of the data. The spatial filter variant developed here is mainly intended to be used in a desktop environment with multiple cores and hence uses the shared memory programming model. A box-based domain decomposition strategy is used to divide the spatial grid for processing across multiple processors.

After GeoCluster generates the spatial grid using the user-specified extent and the separating distance between grid points, a spatial decomposition module divides the entire grid into rectangular chunks (Figure 42). Currently a pre-defined number of chunks is generated (128, 256, or 512), though this could later be modified through a user-defined option. The chunks of spatial grid points are assigned to processors in a sequential fashion. Suppose there are eight processors and nine chunks; the first eight chunks will be assigned equally to the eight processors and the ninth chunk will be assigned to the first processor. After domain decomposition, a point-in-polygon operation is used to assign the cases and controls to each rectangular grid (Figure 43). While performing the point-in-polygon operation, the extent of each rectangular spatial grid is extended by a distance equal to the chosen bandwidth so that grid points near the boundaries will have the required data within their spatial neighborhood (Figure 43). Each processor runs the rate calculation across all the rectangular grid sets assigned to it and generates a two-dimensional matrix of rates (Figure 44). After each simulation, all the processors wait for the central processor to update a common data structure, which holds the information for the new set of simulated cases. With each simulation, the matrix that holds the rates for each rectangular grid grows in depth by one. As an example, if a rectangular grid has points arranged as five rows and four columns, then the initial two-dimensional matrix that holds the observed rate values will be a 5x4 matrix. After a single simulation, the two-dimensional rate matrix will be transformed into a three-dimensional matrix (5x4x2, stacking one two-dimensional array on top of another), with the observed rates in the base layer and the simulated rates in the top layer. After all the simulations are completed, each processor will have a set of rate matrices with depth N+1 (N = number of simulations) (Figure 44). To calculate the rank of the observed rate, each element in the base layer of the rate matrix is compared to the corresponding element in all other layers. The layer comparison operation generates a two-dimensional matrix with each element holding the rank of the corresponding observed rate (Figure 44). After the ranks are calculated, each element in the rank matrix is divided by N+1 to calculate the significance level for each grid point. On completion of all simulations, the central processor collates the results from all the processors to generate the complete grid, with each node having two attributes representing the observed rate and the significance level. The shapefile generated as output from GeoCluster can be imported into a GIS such as ArcGIS or QGIS for further visualization of the nodes by rate or statistical significance (for example α=.01 or α=.05).
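
The sketch below illustrates the general pattern with Python's multiprocessing module: grid points are split into chunks and the rate calculations are farmed out to a pool of workers. The simple array split stands in for GeoCluster's box-based decomposition, and the data, worker count, and bandwidth are illustrative assumptions.

```python
# Illustrative shared-memory parallelism over chunks of grid points; a plain
# array split stands in for the box-based decomposition. Synthetic data.
import numpy as np
from multiprocessing import Pool

CONTROLS = np.random.default_rng(2).uniform(0, 100, size=(1000, 2))
CASES = CONTROLS[:50]
BANDWIDTH = 50.0

def chunk_rates(chunk):
    """Compute case/control rates for one chunk of grid points."""
    out = []
    for gx, gy in chunk:
        ctrl = np.hypot(CONTROLS[:, 0] - gx, CONTROLS[:, 1] - gy) <= BANDWIDTH
        case = np.hypot(CASES[:, 0] - gx, CASES[:, 1] - gy) <= BANDWIDTH
        out.append(case.sum() / ctrl.sum() if ctrl.sum() else 0.0)
    return out

if __name__ == "__main__":
    xs, ys = np.meshgrid(np.arange(0, 100, 5), np.arange(0, 100, 5))
    grid = np.column_stack([xs.ravel(), ys.ravel()])
    chunks = np.array_split(grid, 8)        # stand-in for box decomposition
    with Pool(processes=4) as pool:         # worker count is illustrative
        rates = np.concatenate(pool.map(chunk_rates, chunks))
    print(rates.shape)
```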

Figure 42. Spatial Domain Decomposition. Rectangular division of grid points are assigned to different processors.

Figure 43. Point in polygon operation for assigning cases and controls to grids. The grid boundaries are extended to accommodate the boundary calculations.


Figure 44. p-value calculations. The grids from all the processors are combined together to generate a rank grid which is further used to generate p-values. ‘obs’ represents observed value while ‘sim’ represents simulation

Neighbor Lookups with K-D Tree

With a large number of control points, the number of neighbor lookups can increase dramatically. Suppose the total number of grid points is N and the total number of control points is M; then for each grid point we have to perform M distance calculations to identify the number of controls that fall within the bandwidth of that grid point. The computational complexity for such an algorithm is O(MN), and when M >> N (when the number of controls is much higher than the number of grid points), the computational complexity effectively becomes quadratic (O(M²)), which tends to create performance bottlenecks with large datasets. In order to tackle this challenge in GeoCluster, a k-d tree is utilized to store the control points.


The k-d tree (Ooi, 1987) is a data structure developed to organize a set of points in k-dimensional space. Apart from the constraints applied to split the k-dimensional space, a k-d tree is just a binary search tree, with every leaf node representing a k-dimensional point. Every non-leaf node generates a hyperplane that divides the space into half-spaces. The points that fall to the left of the hyperplane are added to a left sub-tree, while points on the right of the hyperplane are added to the right sub-tree (Figure 45). The direction of the hyperplane is perpendicular to a chosen dimension’s axis. As an example, if the x axis is taken as the dimension axis, then a hyperplane will vertically split the x axis, and all the points with smaller x values will move to the left of the hyperplane while all points with larger x values will move to the right. The space complexity for the k-d tree algorithm is O(n) (where n is the total number of points), and the time complexity for building a k-d tree is O(kn log n), where k is the number of dimensions. The real advantage of using a k-d tree is the ability to perform efficient neighborhood lookups. The average time complexity for a neighbor search in a k-d tree is O(log n), and the worst-case complexity is O(n), compared to the traditional distance-based algorithm with quadratic time complexity O(n²). For GeoCluster, the control point data are stored in a k-d tree. Then, for each grid point, a nearest neighbor query is executed with the filter bandwidth as the radius and the grid coordinates as the query parameter. If the total number of control points in the dataset is M and the total number of grid points is N, then the k-d tree can guarantee that, on average, the nearest neighbor search will have a time complexity of O(N log M). Even if M >> N, the k-d tree nearest neighbor search will perform better than the traditional distance-based algorithm.
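
A minimal sketch of this lookup using the SciPy k-d tree is shown below; the coordinates and the 50-meter bandwidth are illustrative values rather than data from the study.

```python
# Illustrative range queries with SciPy's k-d tree: one query per grid point
# instead of a full grid-to-point distance matrix. Synthetic coordinates.
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(3)
controls = rng.uniform(0, 1000, size=(50_000, 2))   # all word locations
cases = controls[rng.choice(50_000, 200, replace=False)]
grid = np.column_stack([g.ravel() for g in np.meshgrid(
    np.arange(0, 1000, 10), np.arange(0, 1000, 10))])

control_tree = cKDTree(controls)   # built once: O(n log n)
case_tree = cKDTree(cases)

# neighbours within the filter bandwidth for every grid point
n_controls = [len(idx) for idx in control_tree.query_ball_point(grid, r=50.0)]
n_cases = [len(idx) for idx in case_tree.query_ball_point(grid, r=50.0)]
rates = [c / n if n else 0.0 for c, n in zip(n_cases, n_controls)]
print(max(rates))
```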


Figure 45. K-D tree for space partitioning.

Workflow and Technical Implementation

GeoCluster is developed as easy-to-use software with a simple interface to perform spatial filtering operations on point data. The interface consists of text boxes for user inputs. First, the user enters the numerators (cases) and denominators (controls) as shapefiles (Figure 46). Even though the point data can be in geographic or projected coordinates, internally they are transformed into the Web Mercator projection (EPSG 3857) for distance calculations. GeoCluster accepts the grid distance and filter radius (bandwidth) in meters (Figure 46). If the user wants to run the experiments for a set of bandwidths, they can enter different bandwidths separated by commas. The spatial extent for the filter is accepted as bottom-left and top-right coordinates, which, along with the grid distance, are used to generate the spatial filter grid. The simulation count is accepted as a number, and the output folder and file names are accepted as user input parameters (Figure 46). After all the parameters are added, GeoCluster generates rectangular chunks of grid point data through the domain decomposition process. The grid partitions are distributed across multiple processors, and a random relabeling algorithm is used to randomly assign controls as cases for every simulation. The rate and significance level matrices that are generated as part of the simulation are finally combined to generate a single grid with the word rate and significance level as attributes for every grid point. The output grid data are generated as a shapefile in the Web Mercator projection.

Figure 46. Complete workflow for the spatial filter

GeoCluster is written in Python, and Tkinter is used to develop the user interface. GDAL (the Geospatial Data Abstraction Library) is used to read, write, and transform shapefiles. Numpy (Walt, Colbert, & Varoquaux, 2011), a package for scientific computing in Python, is used for performing the matrix mathematical operations. Storing the rates as a three-dimensional matrix helps to perform group operations, such as division by a scalar and rank calculation, efficiently. Libraries such as Numpy and Scipy are optimized to perform vectorized operations on matrices. To enable parallel processing, the multiprocessing module in Python is utilized. The multiprocessing module has a set of application programming interface (API) methods that can be used to control the number of processors participating in a parallel programming task. For fast neighbor lookups, a k-d tree implementation in Python from the Scipy (Bressert, 2012) package is used.
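
To illustrate how such vectorized operations replace explicit loops, the sketch below computes ranks and significance levels over a rows x columns x (N+1) rate stack with NumPy. The array contents are random numbers standing in for real rates.

```python
# Illustrative vectorized rank and p-value calculation over a 3D rate stack
# (rows x columns x N+1 layers). The values are random stand-ins for rates.
import numpy as np

rows, cols, n_sim = 5, 4, 99
rng = np.random.default_rng(4)

# layer 0 holds observed rates; layers 1..N hold rates from the simulations
rate_stack = rng.random((rows, cols, n_sim + 1))
observed = rate_stack[:, :, 0]
simulated = rate_stack[:, :, 1:]

# rank each observed rate against its own simulated values, then divide by N+1
rank = 1 + (simulated >= observed[:, :, np.newaxis]).sum(axis=2)
p_values = rank / (n_sim + 1)
print(p_values.shape)            # (5, 4) grid of significance levels
print((p_values <= 0.05).sum())  # count of "significant" grid points
```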

Empirical Illustration

SVGs have been collected for a variety of different topics, including the health needs of a community (Curtis et al., 2015), predicting crime (Curtis, Curtis, Porter, Jefferis, & Shook, 2016), homelessness, overdoses (Curtis, Felix, Mitchell, Ajayakumar, & Kerndt, 2018), and risks associated with infectious diseases (Krystosik et al., 2017). SVGs collected as part of an investigation into the spatial patterns of homelessness in Los Angeles’s Skid Row between 2014 and 2016 (Curtis, Felix, Mitchell, Ajayakumar, & Kerndt, 2018) are used for this study. Seventeen SVG rides were collected using the “Contour+2” camera, which has an in-built GPS receiver. The narratives were recorded and later transcribed manually to generate sentences with timestamps. After the narratives were transcribed, Wordmapper (Ajayakumar, Curtis, Smith, & Curtis, 2019), software used to extract contextual information from an SVG, was used to combine GPS data from the camera and textual data from the narrative to create word maps. For this study, the search option in Wordmapper was used to extract narrative sentences containing the words ‘Heroin’ and ‘Coke’. The download option in Wordmapper was used to generate three different shapefiles: all ‘Heroin’ locations, all ‘Coke’ locations, and all words. The ‘Coke’ and ‘Heroin’ word datasets contained 122 and 78 word instances respectively, while the denominator dataset of all words contained 55,535 word instances.


The shapefiles containing the word instances of ‘Heroin’ and ‘Coke’ were used as the numerator inputs to GeoCluster, while the shapefile containing all the words was used as the denominator input. The spatial extent of the study area was selected as (-118.255, 34.0347) for the bottom left and (-118.23, 34.051) for the top right, which covers the entire “classic” Skid Row area. A distance of 10 meters was selected as the inter-grid distance. The bandwidth for the spatial filter was 50 meters, meaning there would be considerable overlap in the rates calculated for neighboring grid points. The simulation count was set to 99, so that the level of significance (α) would be .01 for the 99% confidence interval and .05 for the 95% confidence interval. A spatial filter run was made for both the ‘Heroin’ and ‘Coke’ word sets, with the same grid being used for both sets of calculations. The output shapefile contained grid points with the word rate and significance level as attributes. After being opened in ArcGIS, grid points with a p-value of .05 or less were queried from the dataset. Using the word rate as the value, three different proportional symbol maps were generated for these significant locations: one for ‘Heroin’, one for ‘Coke’, and one combining the two words.


Figure 47. Spatial grid with statistically significant (α=.05) rates for the word ‘Heroin’

From the ‘Heroin’ map (Figure 47), a cluster of high and statistically significant rates can be seen near the outskirts of Skid Row (closer to the Little Tokyo area). There are also some grid points with higher word rates near Crocker Street. The map for ‘Coke’ (Figure 48) shows a major cluster near Palmetto Street. A second major cluster can be found near the intersection of South Alameda Street and Produce Street. A small cluster also occurs near Gladys Street and Ceres Avenue. The combination map (Figure 49) shows a clear picture of the spatial variation in the usage of the words ‘Coke’ and ‘Heroin’. From the combination map (Figure 49), it appears as though the word ‘Coke’ is mainly used in areas toward the south of Skid Row, while the word ‘Heroin’ tends to be used near the bordering areas of Skid Row and Little Tokyo. There are also areas where the grids have overlapping sections of both rates in the same space.

Figure 48. Spatial grid with statistically significant (α=.05) rates for the word ‘Coke’


Figure 49. Spatial grid with statistically significant (α=.05) rates for the words ‘Coke’ and ‘Heroin’

Typically, these analyses would lead to a reinvestigation of the SVG for these hotspot locations to identify whether mentions are spatially specific (to that location) and then what is the context around these places.

Discussions

While the use of SVG can add context to more typical spatial data, and while it has been championed as a true “mixed method” approach, previous spatial output has been largely descriptive and visual. In this chapter, we have evolved a classic spatial analysis technique, the spatial filter, to also provide statistical rigor to these outputs. GeoCluster, especially when used in combination with the Wordmapper software, can be used to identify clusters of spatial context: here captured as verbal descriptions, initially as recorded commentaries, then transcribed, and finally queried for key terms or themes. Even though GeoCluster has been developed to handle spatial data in an SVG, any form of point data source can be imported for the analysis. Indeed, more traditional data layers could be used as the denominator input, while key words are entered as the numerator. The parallelization capabilities added to GeoCluster are valuable for tackling the computational challenges that arise when large fine-scale grids are used, especially with the increasing availability of multiple processing cores in typical desktop computers. The illustrative example used in this chapter demonstrates a limited application of an SVG. Further analysis could change the filter size, while other drug terms, or even co-occurring words (easily identified in Wordmapper), could be used for thematic rather than keyword analysis. The narrative could also be re-read for only those spatially specific comments so that the analysis is restricted to described places (Curtis, Curtis, Ajayakumar, Jefferis, & Mitchell, 2018). Output maps can then be combined with traditional datasets, such as 911 call records, to both enhance traditional analysis and identify where spatial differences exist.

This is a new approach, and further investigation is required on a number of topics. For example, the problem of “small numbers” can occur when rates are calculated for a grid point that has only a few control points surrounding it. As an example, if the number of control points around a grid point is one and the number of case points is one, then the rate for the grid point will be calculated as 100%. Such results are unreliable due to the data paucity. This is typically dealt with in spatial filtering by requiring a minimum denominator. One solution could be the use of adaptive filtering to incorporate at least a minimum threshold number of control points, or even discarding grid points that do not have this minimum threshold number of control points.

But what is that right threshold for an SVG?


Chapter VII. Addressing the Data Guardian and Geospatial Scientist Collaborator Dilemma: How to share health records for spatial analysis while maintaining patient confidentiality

This chapter addresses the confidentiality issue related to spatial point data generated from Spatial Video Geonarratives (SVG). Even though the software described here was developed to protect confidential spatial point data extracted from SVG, it can generally be used with any source of point data. This work looks into implementing a spatial privacy-preserving technique, particularly for preserving confidentiality in health datasets.

A spatial appreciation continues to grow within the health sector, ranging from the addition of geographic components to needs assessments (e.g., Health Impact Assessments, Community Health Improvement Plans, etc.), to the potential of neighbourhood-focused guidance to clinicians as found with precision medicine (Bush, Crawford, Briggs, Freedman, & Sloan, 2018). However, for many institutions, especially smaller units (health authorities or hospitals), in-house geospatial expertise remains limited. Therefore, while a county health department may see the benefit of creating fine-scale maps of opioid overdoses (Stopka, Donahue, Hutcheson, & Green, 2017), or a children’s hospital might wish to understand its neighbourhood child injury risk pattern (Rothman, Buliung, Macarthur, To, & Howard, 2014) by overlaying hotspots onto built environment surveys, these tasks often remain unachievable due to a lack of geospatial skill. Even though basic “map making” has become ubiquitous, using either a Geographical Information System (GIS) or Google Earth, the training needed to conduct sophisticated spatial analysis (Chrisman, 1991) is less commonplace.

Adding both excitement and frustration to this situation is the increasing availability of fine-scale spatial data, including electronic medical records (Richardson, Kwan, Alter, & McKendry, 2015) and police reports (Piza & Gilchrist, 2018), to name but two. The utility of these fine-scale health data is broad (Harrington, McLafferty, & Elliott, 2016), including disease mapping and analysis, health risk surveillance (Boulos, 2004), outbreak response (Cromley & McLafferty, 2011), healthcare delivery studies (Croner, Sperling, & Broome, 1996), identifying sub-neighborhood level health patterns (Curtis, Mills, Agustin, & Cockburn, 2011), and clinical support. However, while Google Maps, Google Earth, and even more user-friendly GIS have opened the potential for the ubiquitous visualization and analysis of health data, a counter-narrative raises a justifiable concern about spatial privacy and confidentiality (Andrienko & Andrienko, 2012; Boulos, Curtis, & AbdelMalik, 2009; Curtis, Mills, & Leitner, 2006).

Arguably, the confidentiality conversation can be thought of in two ways: “in-house” map making revealing locations that can be reengineered to an unacceptably precise level, and the ability to share data “out-of-house” to allow for expert analysis that exceeds the capabilities of the institution. These two problems are linked, because violations of the former through reengineering could occur either within the institution or by the out-of-house collaborator. Both are unacceptable, so any data sharing requires either Institutional Review Board (IRB) oversight or a means to share data that is spatially safe.


To illustrate the risk of re-engineering, Curtis et al. (2006) were able to identify mortality locations in the real world from published maps with only limited geographic features and boundaries, through digitally scanning, geo-referencing, and digitizing before uploading the resulting coordinates into a GPS unit. Concurrently, Brownstein et al. (Brownstein, Cassa, Kohane, & Mandl, 2005) used reverse geocoding and geo-referencing techniques to identify patient locations from a prototypical map of randomly selected patients. They were able to successfully identify 26%, 51.6%, 70.7%, and 93% of addresses within one, five, ten, and twenty buildings respectively. Further, they extended the results to create an unsupervised learning algorithm (Brownstein, Cassa, Kohane, & Mandl, 2006) that could automatically classify patient locations with an accuracy of 79%, revealing the vulnerability of point maps. To show how such vulnerability extends across academic publications, Kounadi and Leitner (2014) showed that, over an eight-year period, more than 68,000 home addresses were disclosed in a set of forty-one articles. Worryingly, their study revealed that location privacy and disclosure risk were still an ongoing concern in academically published maps.

Attempts to reduce the likelihood of reengineering fall into three main categories: anonymity-, policy-, and obfuscation-based privacy-preserving methodologies (Ardagna, Cremonini, Vimercati, & Samarati, 2008). Anonymity is mainly concerned with the disassociation of information about an individual, including the location of the individual (Duckham, Kulik, & Kulik, 2006). One of the commonly used metrics for anonymity is k-anonymity, which is defined as the imprecision in location information required for making an individual indistinguishable from k other individuals (Gruteser & Grunwald, 2003). Privacy policies define restrictions for the release of individual location data to third parties (Golden, Downs, & Davis-Packard, 2005). For example, the Health Insurance Portability and Accountability Act (HIPAA) requires that health data visualized by zip code have a denominator population of at least 20,000. The most commonly used technique for spatial privacy protection, and the most relevant to the aims of this paper, is obfuscation. Obfuscation can be considered a combination of statistical and epidemiological techniques to mask location information in a way that still enables meaningful analysis (Armstrong, Rushton, & Zimmerman, 1999; Duckham, Kulik, & Kulik, 2006; Zandbergen, 2014). The two main goals of spatial data obfuscation are to protect personal location information and to extract maximum information from fine-scale spatial data (Duckham, Kulik, & Kulik, 2006). Unfortunately, these two goals are inversely related, i.e. the finer the spatial location involved (often preferred for intervention-style analysis), the greater the risk of reengineering data back to an actual person. Many obfuscation methods such as geomasks

(Allshouse et al., 2010; Armstrong, Rushton, & Zimmerman, 1999; Duckham, Kulik, & Kulik,

2006; Hampton et al., 2010; Seidl, Paulus, Jankowski, & Regenfelder, 2015; Wieland, Cassa,

Mandl, & Berger, 2008; Zimmerman & Pavlik, 2008), grid masks (Curtis, Mills, Agustin, &

Cockburn, 2011), and software agents (Kamel Boulos, Cai, Padget, & Rushton, 2006) have been suggested to achieve a balance between confidentiality and data utility. Armstrong et al. (Armstrong, Rushton, & Zimmerman, 1999) provided a comprehensive summary of various geomasking techniques, which can be generally categorized into affine, aggregation, and random perturbation masking. The affine geomasking approach includes translation, rotation, or a combination of both, known as isomasks. Further, a stochastic component can be added to the affine mask by introducing randomization to the translation, the rotation, or both. For random perturbation, each spatial point is displaced randomly in terms of distance and direction from the original location. For areal aggregation, individual point data are assigned to administrative boundaries such as zip codes, census tracts, and counties. While areal aggregation masks are best at preserving spatial confidentiality, the data loss due to aggregation is the greatest, which also increases the likelihood of the ecological fallacy. A random perturbation mask does not preserve spatial structure, and subsequently applied techniques such as point pattern analysis can give erroneous results when applied to the masked data. The affine mask protects spatial structure and can be useful for spatial statistical analysis, though the caveat is that if multiple locations are revealed accidentally, the real locations of all other points can be reengineered instantly, as the underlying spatial structure of the points is always preserved. Leitner and Curtis (2006) developed the “flipping methodology”, which inverts original locations about a horizontal and vertical axis of the map, while Curtis et al. (2011) developed a grid-based approach utilizing a combination of randomization and Monte Carlo simulation to assign masked point locations. Finally, donut masking (Hampton et al., 2010) extends random perturbation masks by ensuring a user-defined minimum level of geo-privacy; the randomly perturbed points are guaranteed to fall outside a buffer distance from the original location.

While many of these approaches have merit, there remains a disconnect between the academic exercise and real-world utility. Simply put, spatial data sharing, the creation of “safe” maps, and the preservation of confidentiality remain a confusing and often unobtainable task for many health organizations. The number of software solutions remains relatively limited given the scale of the need. For example, Zhang et al. (2017) developed a location swapping method for geomasking, which was implemented as a custom toolset in ArcGIS. Hampton et al. (2010) created an open source donut method for geomasking, which was implemented using MATLAB. Chen et al. (2017) developed GeoMasker, a Python-based tool to mask the residential locations of patients or cases in a GIS. While these authors have advanced the practicality of maintaining spatial confidentiality within a health data set, there remains a lack of a ubiquitous approach that can be used to link health organizations with virtually no spatial skill sets to collaborating geospatial researchers. With this goal, we conceptualized and then built Privy, a utility based on the principles of isomasks (Armstrong, Rushton, & Zimmerman, 1999) that can be immediately applied by health organizations. Geocoded health data, such as the addresses of cancer patients, are masked in such a way that the recipient researcher has no information about the original coordinate locations. Yet the spatial configuration of the coordinates is maintained, which is vital for point-based hotspot analyses and even regression approaches (using the attribute columns of the health record as dependent variables). After the spatial science collaborator has performed the analysis, the resulting output can be shared back with the health organization and re-transformed using a unique set of codes stored from the initial transformation, allowing for the overlay mapping of results onto the “real” geography. The data providers and the researchers can then discuss the results simultaneously, both viewing the same map output, though with a different geographic underlay.

In the next section, we will detail the conceptual underpinning and algorithmic development for the Privy approach to the obfuscation and re-transformation of point and raster data. This will include a comparison of analytical mapped outputs resulting from the original and transformed data.

Design and Algorithms

Point data transformation and re-transformation

The Privy approach, which belongs to the family of isomasks (Armstrong, Rushton, & Zimmerman, 1999), involves a random spatial translation and rotation of an original spatial point dataset. The spatial translation involves a distance offset generated from a random number, which is later reused to re-transform the obfuscated data back to the original locations. More specifically, the transformation of the point data involves two steps: a random spatial translation and a 180-degree rotation. For the translation step, an offset is defined to ensure that the newly transformed points exceed a minimum distance from the original points. This procedure is closely related to donut masking (Hampton et al., 2010), where an inner radius is defined to prevent the transformed points from being accidentally too close to the original points. Suppose the offset intervals are {X1, X2} for the x coordinates and {Y1, Y2} for the y coordinates; then the new coordinates for a location (x, y) are displaced by at least (X1 − x) along the x axis and (Y1 − y) along the y axis. The distance of translation from the original location (x, y) is made random by multiplying the offset intervals (X2 − X1) and (Y2 − Y1) by a random number (r) (Equation (2)). As translation maintains the original pattern of the spatial data, the obfuscated points could potentially be vulnerable to identification. In order to tackle this challenge, we perform a 0 to 180-degree anti-clockwise random rotation on the translated coordinates. For this study, a 180-degree rotation is performed by matrix multiplication of the translated coordinates with a rotation matrix (Figure 50), which maintains the structural equivalence between the real and transformed coordinates; this is essential when re-transforming surfaces generated from the obfuscated spatial data. The random number generated for the translation phase is saved to a local database as a key-value pair, with the key being a user-provided parameter and the random number being the corresponding value. Along with the random number, the geographical extents of the transformed points are also saved into the database for the raster re-transformation procedure.

X = x + r(X_2 - X_1) + X_1        (2)

Y = y + r(Y_2 - Y_1) + Y_1
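
For illustration, the two-step obfuscation can be sketched in a few lines of Python/NumPy. The function and variable names below are illustrative rather than taken from the Privy source code, and the rotation is assumed to act about the origin of the translated coordinate space.

```python
import numpy as np

# 180-degree anti-clockwise rotation matrix: [[cos t, -sin t], [sin t, cos t]] with t = pi
ROT_180 = np.array([[-1.0, 0.0],
                    [0.0, -1.0]])

def obfuscate_points(xy, x_interval, y_interval, r):
    """Translate points by a random offset (Equation 2), then rotate them 180 degrees.

    xy         : (n, 2) array of original coordinates
    x_interval : (X1, X2) offset interval for the x coordinates
    y_interval : (Y1, Y2) offset interval for the y coordinates
    r          : random number in [0, 1), saved for later re-transformation
    """
    x1, x2 = x_interval
    y1, y2 = y_interval
    translated = np.column_stack((
        xy[:, 0] + r * (x2 - x1) + x1,   # X = x + r(X2 - X1) + X1
        xy[:, 1] + r * (y2 - y1) + y1))  # Y = y + r(Y2 - Y1) + Y1
    return translated @ ROT_180.T        # 180-degree rotation about the origin

# Usage: r is generated once and stored (with a user key) in the local database.
r = np.random.default_rng().random()
masked = obfuscate_points(np.array([[380.0, 3100.0], [410.0, 3150.0]]),
                          (500.0, 1500.0), (500.0, 1500.0), r)
```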

Figure 50. Obfuscation by point data translation and rotation. An offset generated from a random number is used for the translation, and the rotation is performed using a rotation matrix.

The re-transformation procedure utilizes the random number saved to the local database.

First, a 180-degree anti-clockwise re-rotation is applied, which brings the transformed coordinates back into the same orientation as the real data. Then the user-supplied key is used to retrieve the random number used for the translation, and all coordinates are re-transformed to their original locations (Figure 51) (Equation (3)).

x = X - r(X_2 - X_1) - X_1        (3)

y = Y - r(Y_2 - Y_1) - Y_1
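
The inverse operation follows the same pattern: because a 180-degree rotation is its own inverse, re-applying the rotation matrix and then subtracting the stored offset (Equation (3)) recovers the original coordinates. Again, the names below are illustrative, and r is assumed to be the value retrieved from the local database with the user-supplied key.

```python
import numpy as np

ROT_180 = np.array([[-1.0, 0.0],
                    [0.0, -1.0]])   # same 180-degree rotation matrix as in the masking step

def retransform_points(xy_masked, x_interval, y_interval, r):
    """Undo the obfuscation: re-rotate 180 degrees, then reverse the translation (Equation 3)."""
    x1, x2 = x_interval
    y1, y2 = y_interval
    rerotated = xy_masked @ ROT_180.T           # a 180-degree rotation is its own inverse
    return np.column_stack((
        rerotated[:, 0] - r * (x2 - x1) - x1,   # x = X - r(X2 - X1) - X1
        rerotated[:, 1] - r * (y2 - y1) - y1))  # y = Y - r(Y2 - Y1) - Y1

# Applying this to the output of obfuscate_points (previous sketch), with the same
# intervals and r, recovers the original coordinates exactly.
```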

Figure 51. Re-transformation of obfuscated point data through rotation and re-translation. The

point data are rotated in space using the rotation matrix and re-translation is performed using the offset generated from the random number.


Raster re-transformation

While the successful transformation and re-transformation of a point (patient address) data set is a useful academic exercise, the reality behind wanting to perform such a procedure is that the outgoing point data will be analysed by a third party, with (probably) a continuous surface output, most likely a raster image, being returned. For re-transformation of the raster generated from the obfuscated points, the bottom right coordinate of the raster is again rotated 180 degrees anti-clockwise. This rotated coordinate is the unadjusted top left coordinate of the re-transformed raster (X'''_left, Y'''_top). A 180-degree matrix rotation is then performed to accommodate the data changes due to the orientation of the raster. The re-translation procedure (Equation (3)) is applied to the unadjusted top left coordinate (X'''_left, Y'''_top) of the re-transformed raster using the random number used in the obfuscation (again retrieved from the local database).

Even though the raster has been transformed back into the original space, an alignment issue due to the rotation of the points needs to be addressed (Figure 52). The spatial extent of the obfuscated points (X'_left, Y'_bottom, X'_right, Y'_top), retrieved from the local database, and the spatial extent of the raster created from the obfuscated points (X''_left, Y''_bottom, X''_right, Y''_top) can be used to calculate the adjusted top left coordinate of the re-transformed raster (X_left, Y_top). First, the differences in spatial extent between the obfuscated point data and the corresponding raster generated from the obfuscated data are calculated for the left, right, top, and bottom (xl_diff, xr_diff, yt_diff, yb_diff) (Equation (4)).

xl_diff = X'_left - X''_left

xr_diff = X'_right - X''_right        (4)

yt_diff = Y'_top - Y''_top

yb_diff = Y'_bottom - Y''_bottom


The differences for the top and bottom, as well as for the left and right, are added to calculate the adjusted values (x_adj, y_adj) (Equation (5)).

x_adj = xl_diff + xr_diff        (5)

y_adj = yt_diff + yb_diff

Based on Equation (6), the final adjusted top left coordinate of the re-transformed raster (X_left, Y_top) can be calculated.

X_left = X'''_left - x_adj        (6)

Y_top = Y'''_top + y_adj,  if y_adj < 0;    Y_top = Y'''_top - y_adj,  if y_adj >= 0
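
Equations (4) through (6) can be condensed into a short helper, sketched below under the assumption that extents are passed as (left, bottom, right, top) tuples; the function name and argument ordering are illustrative rather than taken from the Privy code.

```python
def adjust_topleft(point_extent, raster_extent, unadjusted_topleft):
    """Adjusted top-left corner of the re-transformed raster (Equations 4-6).

    point_extent       : (X'_left, Y'_bottom, X'_right, Y'_top) of the obfuscated points
    raster_extent      : (X''_left, Y''_bottom, X''_right, Y''_top) of the obfuscated raster
    unadjusted_topleft : (X'''_left, Y'''_top) after rotation and re-translation
    """
    xl_diff = point_extent[0] - raster_extent[0]
    xr_diff = point_extent[2] - raster_extent[2]
    yt_diff = point_extent[3] - raster_extent[3]
    yb_diff = point_extent[1] - raster_extent[1]

    x_adj = xl_diff + xr_diff                       # Equation (5)
    y_adj = yt_diff + yb_diff

    x_left = unadjusted_topleft[0] - x_adj          # Equation (6)
    y_top = unadjusted_topleft[1] + y_adj if y_adj < 0 else unadjusted_topleft[1] - y_adj
    return x_left, y_top
```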

Figure 52. Re-transformation of raster through rotation and re-translation.

Workflow and Technical Implementation

Unlike other academic approaches to obfuscating data, Privy was conceptualized while simultaneously being developed as a ubiquitous tool. It is important to emphasize that the driving factor behind developing Privy was that it could immediately serve as a health organization / spatial science collaboration framework. To achieve this goal, a simple, user-friendly interface was developed using HTML5 and JavaScript (Figure 53), while the Google Maps API, a JavaScript-based map framework from Google, was used to visualize the obfuscated data. All the algorithms for obfuscation and re-transformation were written in Python, using the NumPy numerical library for operations such as matrix rotation. SQLite3 was used for saving parameters such as the random values and the extent of the transformed coordinates. PyQt, a Python framework with an in-built browser that can support both web components and Python-based core components, was used to connect the core algorithms with the web interface.
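
As an illustration of how the masking parameters might be persisted with Python's built-in sqlite3 module, a minimal sketch is given below; the database file, table name, and columns are assumptions rather than the actual Privy schema.

```python
import sqlite3

# Minimal sketch of persisting the masking parameters locally; the table name and
# columns are illustrative, not the actual Privy schema.
conn = sqlite3.connect("privy_local.db")
conn.execute("""CREATE TABLE IF NOT EXISTS mask_keys (
                    user_key TEXT PRIMARY KEY,
                    r        REAL,  -- random number used for the translation
                    xmin REAL, ymin REAL, xmax REAL, ymax REAL  -- extent of transformed points
                )""")

def save_key(user_key, r, extent):
    """Store the random number and the transformed-point extent under a user-provided key."""
    conn.execute("INSERT OR REPLACE INTO mask_keys VALUES (?, ?, ?, ?, ?, ?)",
                 (user_key, r, *extent))
    conn.commit()

def load_key(user_key):
    """Retrieve (r, xmin, ymin, xmax, ymax) for re-transformation, or None if absent."""
    return conn.execute("SELECT r, xmin, ymin, xmax, ymax FROM mask_keys WHERE user_key = ?",
                        (user_key,)).fetchone()
```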

Figure 53. User interface for Privy

As a first step in the coordinate transformation, confidential point data, such as patient addresses, are uploaded as an ESRI shapefile. These data are then transformed as previously described using Privy, with the new data also being output as a shapefile. The transformation key is stored for use in the eventual re-transformation, and the health organization waits for its collaborator to perform an analysis and return the output. The re-transformation procedure is then applied to the returned analytical output, and both parties can interpret the findings on the same output map, though overlaid on a different geography (Figure 54).
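
A hedged sketch of this upload-and-mask step using GeoPandas (one of several Python libraries able to read and write ESRI shapefiles) is shown below; the file names, offset intervals, and key handling are illustrative and follow the earlier sketches rather than the Privy implementation.

```python
import numpy as np
import geopandas as gpd

# Hypothetical file names; offsets and the random number follow the earlier sketches.
gdf = gpd.read_file("patient_addresses.shp")              # confidential geocoded points
xy = np.column_stack((gdf.geometry.x, gdf.geometry.y))

r = np.random.default_rng().random()                      # kept locally for re-transformation
x1, x2, y1, y2 = 500.0, 1500.0, 500.0, 1500.0             # illustrative offset intervals
offset = np.array([r * (x2 - x1) + x1, r * (y2 - y1) + y1])
masked = -(xy + offset)                                    # Equation (2), then the 180-degree rotation

out = gdf.copy()
out["geometry"] = gpd.points_from_xy(masked[:, 0], masked[:, 1], crs=gdf.crs)
out.to_file("obfuscated_points.shp")                       # shared with the collaborator
# r and the extent of `masked` would be stored in the local database (see previous sketch).
```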


Figure 54. Privy workflow for data obfuscation and re-transformation

Experiments

In order to show the utility and effectiveness of Privy as a methodological approach that could act as a conduit between health data guardians and collaborating researchers, a series of experiments were conducted, the first on the 1878 yellow fever epidemic of New Orleans,

Louisiana (Curtis, 2008; Curtis, Mills, & Blackburn, 2007). This dataset, using mortality locations recorded in the Official Report of the Deaths from Yellow Fever as Reported by the

New Orleans Board of Health (1879), illustrates a more typical health application, as attributes such as the date of death and nativity are linked to a residential address. Indeed, it has previously been suggested that these data provide an excellent test set for confidentiality work as they are at address level and are “real” epidemic data, but carry no consequence if reengineered (Curtis, 2008; Curtis, Mills, & Blackburn, 2007). The case locations were obfuscated using Privy, and then re-transformed back into the original space for comparison. In


order to test the correctness of the re-transformation procedure, a custom Python script calculated the point-by-point distance comparison between the original and re-transformed dataset.
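
This check reduces to a few lines of NumPy, assuming the original and re-transformed coordinates are (n, 2) arrays in matching row order (for example, joined on a unique ID):

```python
import numpy as np

def max_point_distance(original, recovered):
    """Largest Euclidean distance between matched original and re-transformed points."""
    return float(np.max(np.linalg.norm(np.asarray(original) - np.asarray(recovered), axis=1)))

# A value of 0.0 indicates that every point was re-transformed exactly.
```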

Spatial Point Pattern Analysis

A set of spatial point pattern analysis techniques was performed on both the original and obfuscated datasets. As part of understanding the sensitivity of obfuscation to global and local spatial point pattern analysis, four different techniques were utilized. Average Nearest Neighbor, a common clustering technique for point data (Taylor, 1977), was run on both the real and obfuscated yellow fever datasets, with Euclidean distance used as the distance relationship between the points. To investigate the clustering structure in the yellow fever epidemic at a range of distances, Ripley’s-K function (Ripley, 1976) was applied.
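
For reference, the nearest neighbor ratio reported in Table 2 can be approximated as sketched below with SciPy; this is a generic version of the standard ANN statistic, with the study area assumed to be supplied by the analyst (for example, the area of the minimum bounding rectangle), and it is not the exact implementation used in the study.

```python
import numpy as np
from scipy.spatial import cKDTree

def nearest_neighbor_ratio(xy, area):
    """Observed mean nearest-neighbor distance divided by the expected mean
    distance under complete spatial randomness (0.5 / sqrt(n / A))."""
    xy = np.asarray(xy, float)
    dist, _ = cKDTree(xy).query(xy, k=2)    # k=2: the nearest neighbor after the point itself
    observed = dist[:, 1].mean()
    expected = 0.5 / np.sqrt(len(xy) / area)
    return observed / expected              # < 1 suggests clustering, > 1 dispersion
```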

A second dataset, snow depth data for 10 January 2019 at 115 stations across the state of Colorado, acquired from the United States Department of Agriculture, was utilized to assess spatial autocorrelation in snow depth using Moran’s I (Moran, 1950). Local Moran’s I (Anselin, 1995) was then used to identify statistically significant local clusters, with the snow depth value as the z-field and inverse distance weighting as the spatial relationship parameter.
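
Global Moran’s I with inverse-distance weights can be written directly in NumPy, as in the following generic sketch; the weighting and standardization options used in the study may differ in detail.

```python
import numpy as np

def morans_i(xy, values):
    """Global Moran's I with inverse-distance weights (generic, unstandardized formulation)."""
    xy, values = np.asarray(xy, float), np.asarray(values, float)
    n = len(values)
    d = np.linalg.norm(xy[:, None, :] - xy[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)             # exclude self-pairs (weight 0)
    w = 1.0 / d                             # inverse distance weights
    z = values - values.mean()
    return (n / w.sum()) * np.sum(w * np.outer(z, z)) / np.sum(z ** 2)
```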

It should be noted that in all these examples, the purpose was not to reveal insightful patterns in these data but to evaluate the success and spatial precision of the resulting output for both the real and obfuscated data.

Geostatistical and Interpolation Analysis

Further tests included geostatistical analysis, which generates optimal surfaces from sample data and can be used in prediction and exploratory analysis, and the more common task of generating a surface through interpolation. Again using the snow depth data in these tests, the raster generated from the obfuscated data is re-transformed back and compared to the same analysis without transformation. Comparison occurs by matching the spatial coordinates of the raster extent, the cell size, and the total number of rows and columns. A raster calculator (Tomlin, 1994) was used to calculate differences between the corresponding cells in the real and re-transformed rasters, with the mean of the differences being used to quantify the success of the overall transformation.
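
After the extents, cell size, and dimensions are confirmed to match, the cell-by-cell comparison reduces to an array difference, as in the minimal sketch below (whether the original comparison used signed or absolute differences is not stated, so absolute differences are assumed):

```python
import numpy as np

def mean_cell_difference(real_raster, retransformed_raster):
    """Mean absolute cell-by-cell difference between two aligned rasters.

    Both inputs are assumed to be 2-D arrays with identical extent, cell size,
    and number of rows and columns.
    """
    a, b = np.asarray(real_raster, float), np.asarray(retransformed_raster, float)
    if a.shape != b.shape:
        raise ValueError("rasters must have the same number of rows and columns")
    return float(np.mean(np.abs(a - b)))
```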

For the purposes of comparison, four commonly utilized techniques in the spatial investigation of health data were employed: Kernel density estimation (KDE), inverse distance weighted interpolation (IDW), kriging, and trend analysis. KDE (Silverman, 2018) is often used to calculate the density of health events (Yang, Goerge, & Mullner, 2006), though snow depth data are used here. IDW (Myers, 1994) is a commonly used interpolation method that uses a spatial average of sample data points to estimate cell values in a raster of continuous values (here with inputs of snow depth and a bandwidth of 10 meters). Kriging (Stein, 2012) generates an estimated surface from a set of points with z-values (snow depth). Finally, in order to understand coarse-scale variations in the snow depth data, and to assess their sensitivity to obfuscation, trend analysis was conducted. Trend analysis (Watson, 1969) utilizes a global polynomial interpolation technique that fits a smooth surface to the input sample points.
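
Of these, IDW is the simplest to sketch; the minimal implementation below estimates each grid cell as the inverse-distance-weighted average of all sample points, which simplifies the power parameter and neighborhood handling of standard GIS implementations:

```python
import numpy as np

def idw_grid(xy, values, grid_x, grid_y, power=2.0):
    """Inverse distance weighted interpolation of sample `values` onto a regular grid."""
    xy, values = np.asarray(xy, float), np.asarray(values, float)
    gx, gy = np.meshgrid(np.asarray(grid_x, float), np.asarray(grid_y, float))
    cells = np.column_stack((gx.ravel(), gy.ravel()))
    d = np.linalg.norm(cells[:, None, :] - xy[None, :, :], axis=2)
    d = np.maximum(d, 1e-12)                # avoid division by zero at sample locations
    w = 1.0 / d ** power
    return ((w * values).sum(axis=1) / w.sum(axis=1)).reshape(gy.shape)
```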

Results

The three point-data maps show the real locations of yellow fever deaths (Figure 55A), the obfuscated locations (Figure 55B), and the re-transformed locations (Figure 55C), respectively. By visual examination alone, the re-transformed data locations and the real data locations are similar. The unique IDs for each coordinate were used to facilitate a one-to-one comparison between the real and re-transformed data. The output of the point-to-point distance calculation for each pair of coordinates was zero, which indicates an exact re-transformation of the obfuscated spatial dataset.

Figure 55. A) Unmasked yellow fever death data B) obfuscated data C) re-transformed data

The results from the global spatial analysis techniques show close similarity for real and obfuscated data. The average nearest neighbor analysis for the yellow fever data reveals clustering (nearest neighbor ratio = 0.621659) with statistical significance (p-value = .00). The results from the nearest neighbor analysis on the obfuscated data mirror the results from the analysis on real data (Table 2).

Table 2. Average nearest neighbor results for yellow fever unmasked and obfuscated data. OMD represents the observed mean distance, EMD the expected mean distance, and NNR the nearest neighbor ratio.

             OMD     EMD     NNR    z-score   p-value
Real         19.24   30.96   0.62   -16.23    0.00
Obfuscated   19.24   30.96   0.62   -16.23    0.00

The Ripley’s-K results for the yellow fever death data (Table 3) display a high level of clustering for small distance bands and a subsequent reduction in clustering at larger distances. The difference between the observed (L(d) transform) and expected (the distance of the band itself) values, Diff, increases up to band four (188.86 m) and then decreases through band ten (472.15 m). A comparison of the transformed values and differences for the masked and unmasked data reveals exact matches for all distance bands.

Table 3. Ripley’s-K function results for unmasked and obfuscated data. L(d) represents the transform value and Diff represents the difference between the expected and observed values. The subscripts R and O represent the real and obfuscated results.

Distance   L(d)_R    L(d)_O    Diff_R   Diff_O
47.21      73.24     73.24     26.02    26.02
94.43      124.45    124.45    30.02    30.02
141.64     176.63    176.63    34.99    34.99
188.86     227.46    227.46    38.61    38.60
236.07     273.51    273.51    37.44    37.44
283.29     318.81    318.81    35.52    35.52
330.50     359.59    359.59    29.08    29.08
377.72     398.78    398.78    21.06    21.06
424.93     437.85    437.85    12.91    12.91
472.15     473.88    473.88    1.73     1.73

For the Moran’s I analysis, the input data were switched to snow depth. The result for the non-masked data (Table 4) shows no sign of spatial autocorrelation (I = 0.074776) at the global level, and the result is not statistically significant (p-value = .08) at the 95% confidence level. For the obfuscated data (Table 4), all values match the real data exactly, to full decimal precision, indicating the spatial-structure-preserving nature of the data transformation.

Table 4. Global Moran’s I results for snow depth unmasked and obfuscated data. I represents Moran’s I and EI represents expected value of I

             I        EI       Variance   z-score   p-value
Real         0.0747   -0.008   0.002      1.742     0.08
Obfuscated   0.0747   -0.008   0.002      1.742     0.08


The results from the local Moran’s I analysis (Figure 56A) show high-high clusters of snow depth in the northern part of Colorado. For the obfuscated data, the map (Figure 56B) shows an inverted pattern of the real clusters (Figure 56A). The re-transformed cluster map (Figure 56C) shows the exact same pattern as the map for the real data (Figure 56A). The local I, z-score, and p-value for the twelve significant clusters (Table 5) are identical for the real and obfuscated data.

Table 5. Local Moran’s I results for unmasked and obfuscated data. The subscripts R and O represent the real and obfuscated results.

ind_R      ind_O      z-score_R   z-score_O   p-value_R   p-value_O
0.00019    0.00019    2.36        2.36        0.017       0.017
0.00016    0.00016    2.80        2.80        0.005       0.005
0.00033    0.00033    3.09        3.09        0.001       0.001
0.00042    0.00042    3.81        3.81        0.000       0.000
0.00015    0.00015    2.06        2.06        0.038       0.038
0.00002    0.00002    3.26        3.26        0.001       0.001
0.00028    0.00028    3.29        3.29        0.000       0.000
0.00029    0.00029    2.84        2.84        0.004       0.004
0.00094    0.00094    8.08        8.08        0           0
-0.0002    -0.0002    -2.96       -2.96       0.003       0.003
0.00055    0.00055    2.08        2.08        0.037       0.037
0.00038    0.00038    4.74        4.74        2.06        2.067

Figure 56. Local Moran’s I clusters for A) unmasked B) obfuscated C) re-transformed snow

depth data

The KDE results for the unmasked data raster (Figure 57A) show patterns of higher snow depth concentrations in the North and South-West regions. The obfuscated data raster (Figure 57B) shows an inverted pattern but with similar values in the transformed space. The re-transformed data raster (Figure 57C) reveals the same trends as the real data raster (Figure 57A). The mean of raster differences (Table 6) between the real data raster (Figure 57A) and the re-transformed raster (Figure 57C) reveals an average difference of 1.8 units.

Figure 57. KDE results for snow depth data A) raster from real data B) raster from obfuscated

data C) re-transformed raster

The IDW results for the unmasked data (Figure 58A) display a pattern of high snow depth in the

North, North-West and South-West regions. For the obfuscated data, the IDW results (Figure

58B) indicate an exact inverted pattern of the unmasked data (Figure 58A), with high snow depth in the South and East regions. The mean of raster differences (Table 6) between the unmasked

(Figure 58A) and the re-transformed raster (Figure 58C) is zero.

Figure 58. IDW results for snow depth data: A) raster from real data B) raster from

obfuscated data C) re-transformed raster


The predictive surface of the snow depth data generated from the Kriging analysis shows a fuzzy boundary between the North-East and South-West regions (Figure 59A). Similar to other geostatistical analysis results, the predictive surface generated from the obfuscated data (Figure

59B) is an exact inversion of the real data raster (Figure 59A), and the re-transformed data raster

(Figure 59C) maintains the exact same pattern as the original unmasked data raster (Figure 59A). The mean difference between the real (Figure 59A) and re-transformed raster (Figure 59C) is almost zero (.04) (Table 6).

Figure 59. Kriging results for snow depth data; A) raster from real data B) raster from obfuscated data C) re-transformed raster

Finally, the trend analysis reveals a clear boundary between the North-East and South-West regions in terms of snow accumulation (Figure 60A). The rasters for the obfuscated (Figure 60B) and re-transformed (Figure 60C) data reveal, respectively, an inverted pattern and an exact match with respect to the unmasked data raster (Figure 60A). The mean difference between the raster cells for the unmasked (Figure 60A) and re-transformed data raster (Figure 60C) is zero (Table 6). For all the experiments, raster properties such as cell size, number of rows and columns, and raster extents were compared between the unmasked data raster and the re-transformed raster and were verified to be equal.


Figure 60. Trend analysis results for snow depth data; A) raster from real data B) raster from

obfuscated data C) re-transformed raster

The mean difference for the unmasked (Figure 57A) and re-transformed (Figure 57C) KDE rasters is comparatively high (Table 6), which warrants further investigation. This could be because image rotation tends to create differences at the borders, which can be reduced by approximation techniques such as spline interpolation. Raster re-transformation also provides opportunities for researchers to apply advanced geostatistical techniques, such as probability surface generation (kriging) and trend analysis, to obfuscated data.

Table 6. Mean difference between the raster generated from the unmasked data and the re-transformed raster

Method    Mean Cell Difference
IDW       0.00
KDE       1.82
Kriging   0.04
Trend     0.00

Discussions

One of the biggest challenges facing health-focused geospatial scientists is how to effectively work as part of collaborative teams. The challenge is that intervention strategies, and theoretical advances such as the incorporation of context, require fine scale data, and it is exactly these data that cause the most consternation from a health guardian’s perspective. Simply put, address level data can reveal causative processes occurring at the sub-neighborhood scale, and these can be teased out by different forms of geospatial analysis. While making data available at coarser aggregations such as census tracts or zip codes might satisfy the creation of health atlases or public presentations, real advances require fine scale data.

At the same time, health organizations are becoming more spatially literate, often wanting map-based insights into their data. Currently, the skill sets required to effectively use different types of GIS techniques are often lacking, meaning that only basic mapping results, or worse, analyses are incorrectly run and interpreted. The obvious answer is a means to obfuscate data in such a way that collaborative teams can work together, in near real time, without running the risk of violating patient confidentiality.

While there have been many elegant approaches to solving this problem, these have largely remained in the realm of academia. If a hospital wants to share data with a collaborator, there is no widely known solution to this problem, especially one that can be applied with a limited geospatial skillset. In this study, this problem is addressed through a three-pronged approach: design a method that is simple to understand, that is powerful in both protecting confidentiality and allowing for a variety of different analytical approaches, and that can be applied now, in any health organization. We have achieved this with Privy. Our results show that the obfuscation technique applied to point level data preserves spatial structure, which in turn provides the exact same results for masked and real data. This is one of the overarching goals of geomasking (Duckham, Kulik, & Kulik, 2006). Our comparative analysis shows that this approach also preserves statistical results (Table 5) as well as spatial patterns (Figure 56). Future


comparative analyses should incorporate other techniques important to health research, such as

SaTScan, though we have no reason to believe these results will be any different.

The ability to re-transform surfaces generated using obfuscated data back to their original locations adds further potential to this approach. This is important both in terms of being able to share output and have a simultaneous interpretation between both parties, and even in terms of being able to share finely aggregated original surfaces without concern. Even though KDE continuous surfaces are less prone to confidentiality issues, bullseye effects in remote areas still run an unacceptable risk of reengineering (Boulos, Curtis, & AbdelMalik, 2009). The obfuscation of the raster surface as displayed here provides a solution to this vulnerability of isolation. The comparison of unmasked and re-transformed rasters (Table 6) shows that, apart from maintaining the orientation and other structural properties, the software is able to preserve the data quality even after obfuscation and re-transformation.

While this approach is available now, some limitations need to be addressed. Firstly, the current approach requires address level data to be geocoded and output as a shapefile. While this might be a limitation for some organizations, some electronic medical record systems now offer geocoding as output, and the basic use of a GIS’s functionality is becoming more commonplace.

Even so, for full ubiquitous use, for example with smaller health organizations, a pre-module that provides geocoding services and shapefile creation would be a useful evolution.

Secondly, the only data that can be shared have to come from the health organization (or a similar unit). Publicly available data layers like boundaries, street files, or census data cannot be shared, as these increase the risk of reengineering. While this may limit the use of some techniques, such as regression, more and more socioeconomic, behavioral, and even environmental data are being collected by health organizations. These could provide a set of


independent variables. Indeed, the comparison of real and obfuscated data based on spatial modelling techniques such as ordinary least squares regression (OLS) and geographically weighted regression (GWR) is an area that requires further exploration. One spill-over benefit of the availability of tools like Privy is that it may spur greater demand for recording more data in-house, including temporal changes (both biological and address related) that can serve as input for spatio-temporal analysis.

Finally, the main vulnerability of the Privy approach is if a bad actor has information about one patient that could then be used to reengineer the rest of the system. While this will always be possible, it is unlikely, given that the bad actor would need the exact attribute values used as input to the transformation. It is not enough to know a birth weight, a BMI, or a blood count, as these are likely to be replicated across the data set, and many of these values vary with each medical visit. Therefore, the bad actor would have to have access to the electronic medical record file of one person, and then be able to place that record within the transformed and rotated data. This is even more unlikely if the geospatial team does not know which city the original data come from. The standalone nature of the software and the local database adds a further layer of security, as the key used for masking and re-transformation is only available to the health organization.

In summary, as custodians of medical and health data records often have minimal GIS expertise, it is essential to develop simple yet efficient software methodologies that help them preserve the confidentiality of spatial data while at the same time enabling collaborative research with GIS experts.


Chapter VIII. Conclusion and Future Work

The main objective of this dissertation was to help formalize, or standardize, the use of SV and

SVG as research tools, especially in terms of data collection for fine scale, dynamic, challenging environments, and in the broader advancement of “Context” in the geographical sciences. As a relatively new method, SV-based data collection requires the development of advanced software methodologies, ranging from data collection, storage, and processing to the generation of contextually enriched outputs. This dissertation has addressed these challenges by developing two new software solutions, “Camera Player” and “Spatial Video Player” (SVP) (Chapter III). Camera Player, by using an array of GPS parsing methodologies, enables GPS extraction, which is an important and mandatory step in generating spatial data from SV. Previously, such spatial data extraction was completely dependent on custom camera software, which was unreliable due to market dependency (if a camera manufacturer goes out of business, so does its software) and always carried complex data format dependencies (from NMEA tags to KML files). Camera Player provides a uniform framework for integrating SVs from a variety of camera sources by abstracting away the complexity of GPS extraction for the user. Apart from GPS extraction, time synchronization is also handled by Camera Player, which provides synchronous space-video visualization; this visualization is further used for mapping and generating new layers of contextual information. “Transferability” has also become a key issue for large-volume SV data, which SVP addresses by utilizing cloud storage and retrieval facilities, helping to reduce the data footprint by up to 99% (for a 4 GB SV file, a 140 KB JSON file is generated). A single JSON file contains the required spatial data, video offsets, and cloud storage details to support space-video visualizations. Apart from improving “transferability” between collaborators, the ability to join multiple videos from cameras with different viewing angles provides the SVP user with a holistic view of the videoed environment, which can help with better understanding of the topic, the environment, and the subsequent mapping.

The ability to generate a completely new GPS path for an SV by providing an interface for GPS creation through node addition and synchronization with the video is another key highlight of

SVP. When there is considerable GPS error or even a complete loss of GPS signal, SVP can be used to re-create the SV for further analysis. When large sets of SV data are collected, either for a single project or across multiple initiatives with overlapping geographies or themes, a tool was needed to sift through them and look for patterns spatially or thematically. The “Spatial Video Library” (SVL) (Chapter IV) utilizes advanced spatial indexing based on the R-tree data structure to support efficient spatio-temporal video retrieval as well as to perform complex spatial queries.

With these novel software solutions, the environmentally cued and spatially embedded textual data generated from SVG provide ample opportunities for both content and spatial analysis. Three different software packages were developed as a part of this dissertation, with a particular focus on exploratory content analysis, spatial analysis, and geo-spatial privacy related to SVG.

The “Wordmapper” software (Chapter V) utilizes time synchronization to combine textual narrative data and spatial GPS data to generate SVGs. Further, the software supports textual queries in the form of a word filter, exploratory map-based and wordcloud visualizations, and content-based narrative classification through category creation. The contextual data extracted from SVG in the form


of spatially embedded words and sentences, as well as categorically classified narrative sentences, can be further downloaded as ESRI shapefiles and KML files, which supports higher-level spatial analysis and ubiquitous use among different researchers, professionals, or other interested groups. The “GeoCluster” software (Chapter VI) adds a new spatial analytic approach to SVG analysis by helping to extract significant patterns of spatial words and sentences. The software achieves this by using a variation of a spatial filtering technique backed by Monte

Carlo simulations for testing the statistical significance of the identified clusters. Along with developing exploratory and spatial analysis techniques, spatial privacy preserving methods are vital, especially regarding the type of data that could be shared to support SVG research. “Privy” (Chapter VII), by utilizing advanced geo-masking and re-translation techniques, helps to mask as well as re-translate spatial point data. An extensive set of spatial point pattern experiments and raster-surface-based analyses (Chapter VII) show that “Privy” can mask and then re-translate spatial data without losing much-needed spatial precision, which is vital for high-end spatial analytical collaborations.

All software developed as a part of this dissertation has a direct impact on context-based fine scale spatial data research. “Camera Player” and “Spatial Video Player” are being extensively used by researchers at Kent State, across the United States, and even on international projects. For example, Curtis et al. (2019), in their work on assessing the space-time variation in enteric disease risk for informal settlements in Port au Prince, Haiti, utilized SVP for mapping, synchronized video exploration, and GPS error correction. In another project mapping health risks in Mathare, Kenya, one of the biggest informal settlements in the country, Curtis et al. (2019) utilized SVP extensively to correct GPS errors, which were initially a major impediment to the research. SVP has also been used in multiple community gatherings as a way to visualize data that


has been collected. This has extended to public servants also using SVP as a communication tool. It is expected that the utility of SVP will expand even further as other researchers and community projects become more aware of the method. SVL also offers exciting future utility for researchers and other collaborators. While the sample case studies described here (Chapter

IV) only showed temporal aggregation based on years, this could easily be changed to incorporate longitudinal analysis at far finer temporal scales. For example, an interesting extension would be to analyze the street-level changes of homeless camps across the course of a year. As with most of the software developed here, there is a synergistic advance at work, as these large datasets analyzed through SVL can be used in combination with SVG to capture both pattern and process.

The Wordmapper software has proven to be one of the most popular extensions of SV-related research. Wordmapper has helped to more fully leverage the power of SVG. Simply put, the combination of SVG and Wordmapper has not only excited researchers in multiple disciplines, especially those using more qualitative approaches, but has also advanced an appreciation for mapping of any kind. An invited presentation in 2017 opened the use of SVG (Curtis, 2017) as a tool to study genocide, and from this meeting a dialogue on the appropriateness of spatial digital humanities was begun. The NIH has also requested on multiple occasions that the SVG-Wordmapper combination be presented as a workshop to new or specialized scholars (Curtis, 2018). From an international perspective, the initial version of the Wordmapper software was used to explore SVG collected from Cali, Colombia to understand spatial patterns of mosquito-borne diseases such as dengue and Zika at the sub-neighborhood scale (Krystosik et al., 2017). Further, Wordmapper has been used to compare and contrast the contextual information gained from sketch maps and SVG (Curtis, Curtis, Ajayakumar, Jefferis,


& Mitchell, 2019). The same study also utilized an initial version of GeoCluster software to identify significant clusters of spatial words generated from SVG.

While not directly connected to SVG, the collaborations that have resulted have led to tangential discussions about other spatial data challenges. From these discussions Privy was developed. This software supports geo-masking and re-translation of spatial point data from any source. The standalone nature of the software is particularly suitable for health centers that do not want any external access to their data through web services or other internet-based services. While this work is in its infancy, there have already been commitments to future collaboration, for example with doctors in the Los Angeles County Health Department, because of the impact this could have on achieving the goal of incorporating more spatial approaches.

By developing an ensemble of software, this dissertation has laid the building blocks for a broad future research agenda. Currently, Camera Player is designed to incorporate only SVs obtained from digital cameras. In future versions of Camera Player, we plan to incorporate video from other sources, such as mobile phones, along with spatial locations also extracted from the phone. SVP is currently dependent on YouTube for data storage and retrieval. With further web-based storage capabilities, we plan to seamlessly transition the software to accept SV from any web-based cloud storage source. Such a platform is currently being developed through the collaborative efforts of Dr. Ye Zhao and his team at the Kent State Computer Science Department and Dr. Andrew Curtis and his team at the GIS Health & Hazards Lab, Kent State (Curtis, 2017). The SVL can further be enhanced by bringing in Convolutional Neural Networks (CNN) for frame-based image searching and analysis, and by introducing novel data sources such as geo-spatial tweets for spatio-temporal querying. The initial spatial filter query would help to reduce the number of


candidate frames required for the image recognition algorithm, which can then serve as input to the CNN. The SVL can also be extended to incorporate web-based storage, which would be particularly useful for urban planners, disaster managers, and health experts, allowing them to conduct large-scale cross-sectional and temporal analyses using SV data. With Natural Language Processing (NLP) techniques becoming accessible through easy-to-use packages such as the Natural Language Toolkit (NLTK), textual narratives from SVG can be further explored to identify latent patterns. As a next step, we are currently developing a new module for Wordmapper, which utilizes NLP techniques such as Part-of-Speech (POS) tagging and WordNet-based semantic analysis to extract spatial objects from SVGs automatically. Further, extracting emotive content from SVG using state-of-the-art techniques such as sentiment analysis is also a planned future direction for Wordmapper. Challenges regarding GeoCluster will also be further investigated, such as the “small numbers” effect and the variable minimum denominator threshold dilemma when conducting spatial filtering. We believe exploratory visual techniques could be a possible solution for these statistical issues. Privy could be further enhanced to incorporate user-controlled data aggregation. While geo-masking helps to preserve spatial privacy, the masked dataset cannot be combined with any other spatial data source, which is a limitation. If Privy can be enhanced to include user-controlled access to aggregated spatial point data, it could substantially improve collaborative work while continuing to preserve spatial privacy.

In summary, the work presented in this dissertation has already made an impact in terms of research, informing intervention, and promoting the utility of mapping, of spatial approaches in general, and of the need for context. But these are only the first steps, and there is still a lot more work needed as we draw on expertise from geospatial science, programming, various qualitative methods, topical research experts, professionals, and community members.


References

Aburizaiza, A. O., & Ames, D. P. (2009). GIS-Enabled Desktop Software Development

Paradigms. 2009 International Conference on Advanced Geographic Information Systems & Web

Services, 75–79. https://doi.org/10.1109/GEOWS.2009.28

Ajayakumar, J., Curtis, A., Smith, S., & Curtis, J. (2019). The Use of Geonarratives to Add

Context to Fine Scale Geospatial Research. International Journal of Environmental Research and Public Health, 16(3), 515. https://doi.org/10.3390/ijerph16030515

Allshouse, W. B., Fitch, M. K., Hampton, K. H., Gesink, D. C., Doherty, I. A., Leone, P. A., …

Miller, W. C. (2010). Geomasking sensitive health data and privacy protection: an evaluation using an E911 database. Geocarto International, 25(6), 443–452. https://doi.org/10.1080/10106049.2010.496496

Andrienko, G., & Andrienko, N. (2012). Privacy Issues in Geospatial Visual Analytics. In G.

Gartner & F. Ortag (Eds.), Advances in Location-Based Services: 8th International Symposium on Location-Based Services, Vienna 2011 (pp. 239–246). https://doi.org/10.1007/978-3-642-

24198-7_16

Anselin, L. (1995). Local indicators of spatial association—LISA. Geographical Analysis, 27(2),

93–115.


Ardagna, C. A., Cremonini, M., Vimercati, S. D. C. di, & Samarati, P. (2008). Privacy-enhanced

Location-based Access Control. In M. Gertz & S. Jajodia (Eds.), Handbook of Database

Security: Applications and Trends (pp. 531–552). https://doi.org/10.1007/978-0-387-48533-1_22

Armstrong, M. P., Rushton, G., & Zimmerman, D. L. (1999). Geographically masking health data to preserve confidentiality. Statistics in Medicine, 18(5), 497–525. https://doi.org/10.1002/(SICI)1097-0258(19990315)18:5<497::AID-SIM45>3.0.CO;2-#

Beckett, K., & Herbert, S. (2009). Banished: The new social control in urban America. Oxford

University Press.

Beckmann, N., Kriegel, H.-P., Schneider, R., & Seeger, B. (1990). The R*-tree: an efficient and robust access method for points and rectangles. Acm Sigmod Record, 19, 322–331. Acm.

Bell, S. L., Phoenix, C., Lovell, R., & Wheeler, B. W. (2015a). Seeking everyday wellbeing: The coast as a therapeutic landscape. Social Science & Medicine, 142, 56–67. https://doi.org/10.1016/j.socscimed.2015.08.011

Bell, S. L., Phoenix, C., Lovell, R., & Wheeler, B. W. (2015b). Using GPS and geo-narratives: a methodological approach for understanding and situating everyday green space encounters. Area,

47(1), 88–96. https://doi.org/10.1111/area.12152

Besag, J., & Diggle, P. J. (1977). Simple Monte Carlo tests for spatial pattern. Journal of the

Royal Statistical Society: Series C (Applied Statistics), 26(3), 327–333.

Bissell, D. (2009). Visualising everyday geographies: practices of vision through travel-time.

Transactions of the Institute of British Geographers, 34(1), 42–60.


Boulos, M. N. K. (2004). Towards evidence-based, GIS-driven national spatial health information infrastructure and surveillance services in the United Kingdom. International

Journal of Health Geographics, 3, 1. https://doi.org/10.1186/1476-072X-3-1

Boulos, M. N. K., Curtis, A. J., & AbdelMalik, P. (2009). Musings on privacy issues in health research involving disaggregate geographic data about individuals. International Journal of

Health Geographics, 8(1), 46. https://doi.org/10.1186/1476-072X-8-46

Bressert, E. (2012). SciPy and NumPy: an overview for developers. O’Reilly Media, Inc.

Brownstein, J. S., Cassa, C. A., Kohane, I. S., & Mandl, K. D. (2006). An unsupervised classification method for inferring original case locations from low-resolution disease maps.

International Journal of Health Geographics, 5, 56. https://doi.org/10.1186/1476-072X-5-56

Brownstein, J. S., Cassa, C., Kohane, I. S., & Mandl, K. D. (2005). Reverse Geocoding:

Concerns about Patient Confidentiality in the Display of Geospatial Health Data. AMIA Annual

Symposium Proceedings, 2005, 905.

Burkett, B., & Curtis, A. (2011). Classifying Wildfire Risk at the Building Scale in the Wildland-

Urban Interface: Applying Spatial Video Approaches to Los Angeles County. Risk, Hazards &

Crisis in Public Policy, 2(4), 1–20. https://doi.org/10.2202/1944-4079.1093

Bush, W. S., Crawford, D. C., Briggs, F., Freedman, D., & Sloan, C. (2018). Integrating community-level data resources for precision medicine research. Pac. Symp. Biocomput, 23,

618–622. World Scientific.


Carpiano, R. M. (2009). Come take a walk with me: The “Go-Along” interview as a novel method for studying the implications of place for health and well-being. Health & Place, 15(1),

263–272.

Chen, C.-C., Chuang, J.-H., Wang, D.-W., Wang, C.-M., Lin, B.-C., & Chan, T.-C. (2017).

Balancing geo-privacy and spatial patterns in epidemiological studies. Geospatial Health. https://doi.org/10.4081/gh.2017.573

Chrisman, N. R. (1991). The error component in spatial data. Geographical Information Systems,

1(12), 165–174.

Cinnamon, J., & Schuurman, N. (2010). Injury surveillance in low-resource settings using

Geospatial and Social Web technologies. International Journal of Health Geographics, 9, 25. https://doi.org/10.1186/1476-072X-9-25

Cromley, E. K., & McLafferty, S. L. (2011). GIS and Public Health, Second Edition. Guilford

Press.

Croner, C. M., Sperling, J., & Broome, F. R. (1996). Geographic Information Systems (gis):

New Perspectives in Understanding Human Health and Environmental Relationships. Statistics in Medicine, 15(18), 1961–1977. https://doi.org/10.1002/(SICI)1097-

0258(19960930)15:18<1961::AID-SIM408>3.0.CO;2-L

Crump, J., Newman, K., Belsky, E. S., Ashton, P., Kaplan, D. H., Hammel, D. J., & Wyly, E.

(2008). Cities Destroyed (Again) For Cash: Forum on the U.S. Foreclosure Crisis. Urban

Geography, 29(8), 745–784. https://doi.org/10.2747/0272-3638.29.8.745


Curtis, A. (2017a). GeoVisuals Software: Capturing, Managing, and Utilizing GeoSpatial

Multimedia Data for Collaborative Field Research (National Science Foundation Grant No.

1739491 S12-SSE).

Curtis, A. (2017b, October). Genocide Spatial Video Geo-narratives: Mapping the Personal

Experiences of Victims of the Khmer Rouge. Presented at the Digital Approaches to Genocide,

University of Southern California Shoah Foundation, Los Angeles. Retrieved from https://sfi.usc.edu/cagr/conferences/2017_international

Curtis, A. (2018). Workshop, Big Data and Systems Science & Innovative Methods. Presented at the NIH Health Disparities Research Institute, Washington D.C. NIH Health Disparities

Research Institute, Washington D.C.

Curtis, A., Bempah, S., Ajayakumar, J., Mofleh, D., & Odhiambo, L. (2019). Spatial Video

Health Risk Mapping in Informal Settlements: Correcting GPS Error. International Journal of

Environmental Research and Public Health, 16(1), 33. https://doi.org/10.3390/ijerph16010033

Curtis, A., Blackburn, J. K., Widmer, J. M., & Morris, J. G. (2013). A ubiquitous method for street scale spatial data collection and analysis in challenging urban environments: mapping health risks using spatial video in Haiti. International Journal of Health Geographics, 12, 21. https://doi.org/10.1186/1476-072X-12-21

Curtis, A., Blackburn, J., Smiley, S., Yen, M., Camilli, A., Alam, M., … Morris, J. (2016).

Mapping to support fine scale epidemiological cholera investigations: A case study of spatial video in Haiti. International Journal of Environmental Research and Public Health, 13(2), 187.


Curtis, A., Curtis, J. W., Ajayakumar, J., Jefferis, E., & Mitchell, S. (2018). Same space– different perspectives: comparative analysis of geographic context through sketch maps and spatial video geonarratives. International Journal of Geographical Information Science, 1–27.

Curtis, A., Curtis, J. W., Porter, L. C., Jefferis, E., & Shook, E. (2016). Context and Spatial

Nuance Inside a Neighborhood’s Drug Hotspot: Implications for the Crime–Health Nexus.

Annals of the American Association of Geographers, 106(4), 819–836. https://doi.org/10.1080/24694452.2016.1164582

Curtis, A., Curtis, J. W., Shook, E., Smith, S., Jefferis, E., Porter, L., … Kerndt, P. R. (2015).

Spatial video geonarratives and health: case studies in post-disaster recovery, crime, mosquito control and tuberculosis in the homeless. International Journal of Health Geographics, 14(1),

22. https://doi.org/10.1186/s12942-015-0014-8

Curtis, A., Duval-Diop, D., & Novak, J. (2010). Identifying Spatial Patterns of Recovery and

Abandonment in the Post-Katrina Holy Cross Neighborhood of New Orleans. Cartography and

Geographic Information Science, 37(1), 45–56. https://doi.org/10.1559/152304010790588043

Curtis, A., & Fagan, W. F. (2013). Capturing Damage Assessment with a Spatial Video: An

Example of a Building and Street-Scale Analysis of Tornado-Related Mortality in Joplin,

Missouri, 2011. Annals of the Association of American Geographers, 103(6), 1522–1538. https://doi.org/10.1080/00045608.2013.784098

Curtis, A., Felix, C., Mitchell, S., Ajayakumar, J., & Kerndt, P. R. (2018). Contextualizing

Overdoses in Los Angeles’s Skid Row between 2014 and 2016 by Leveraging the Spatial

Knowledge of the Marginalized as a Resource. Annals of the American Association of

Geographers, 108(6), 1521–1536. https://doi.org/10.1080/24694452.2018.1471386


Curtis, A. J. (2008). Three-dimensional visualization of cultural clusters in the 1878 yellow fever epidemic of New Orleans. International Journal of Health Geographics, 7(1), 47. https://doi.org/10.1186/1476-072X-7-47

Curtis, A. J., Mills, J. W., & Leitner, M. (2006). Spatial confidentiality and GIS: re-engineering mortality locations from published maps about Hurricane Katrina. International Journal of

Health Geographics, 5(1), 44. https://doi.org/10.1186/1476-072X-5-44

Curtis, A. J., Mills, J. W., McCarthy, T., Fotheringham, A. S., & Fagan, W. F. (2010). Space and

Time Changes in Neighborhood Recovery After a Disaster Using a Spatial Video Acquisition

System. In P. S. Showalter & Y. Lu (Eds.), Geospatial Techniques in Urban Hazard and

Disaster Analysis (pp. 373–392). https://doi.org/10.1007/978-90-481-2238-7_18

Curtis, A., & Mills, J. W. (2011). Crime in Urban Post-Disaster Environments: A

Methodological Framework from New Orleans. Urban Geography, 32(4), 488–510. https://doi.org/10.2747/0272-3638.32.4.488

Curtis, A., & Mills, J. W. (2012). Spatial video data collection in a post-disaster landscape: The

Tuscaloosa Tornado of April 27th 2011. Applied Geography, 32(2), 393–400. https://doi.org/10.1016/j.apgeog.2011.06.002

Curtis, A., Mills, J. W., Agustin, L., & Cockburn, M. (2011). Confidentiality risks in fine scale aggregations of health data. Computers, Environment and Urban Systems, 35(1), 57–64. https://doi.org/10.1016/j.compenvurbsys.2010.08.002

Curtis, A., Mills, J. W., & Blackburn, J. K. (2007). A Spatial Variant of the Basic Reproduction

Number for the New Orleans Yellow Fever Epidemic of 1878. The Professional Geographer,

59(4), 492–502. https://doi.org/10.1111/j.1467-9272.2007.00637.x


Curtis, A., Mills, J. W., Kennedy, B., Fotheringham, S., & McCarthy, T. (2007). Understanding the Geography of Post-Traumatic Stress: An Academic Justification for Using a Spatial Video

Acquisition System in the Response to Hurricane Katrina. Journal of Contingencies and Crisis

Management, 15(4), 208–219. https://doi.org/10.1111/j.1468-5973.2007.00522.x

Curtis, A., Mills, J. W., & Leitner, M. (2006). Keeping an eye on privacy issues with geospatial data. Nature, 441, 150. https://doi.org/10.1038/441150d

Curtis, A., Quinn, M., Obenauer, J., & Renk, B. M. (2017). Supporting local health decision making with spatial video: Dengue, Chikungunya and Zika risks in a data poor, informal community in Nicaragua. Applied Geography, 87, 197–206.

Curtis, A., Squires, R., Rouzier, V., Pape, J. W., Ajayakumar, J., Bempah, S., … Morris, J. G.

(2019). Micro-Space Complexity and Context in the Space-Time Variation in Enteric Disease

Risk for Three Informal Settlements of Port au Prince, Haiti. International Journal of

Environmental Research and Public Health, 16(5). https://doi.org/10.3390/ijerph16050807

De Smith, M. J., Goodchild, M. F., & Longley, P. (2007). Geospatial analysis: a comprehensive guide to principles, techniques and software tools. Troubador Publishing Ltd.

Doran, B. J., & Lees, B. G. (2005). Investigating the Spatiotemporal Links Between Disorder,

Crime, and the Fear of Crime. The Professional Geographer, 57(1), 1–12. https://doi.org/10.1111/j.0033-0124.2005.00454.x

Duckham, M., Kulik, L., & Kulik, L. (2006, November 10). Location Privacy and Location-

Aware Computing. https://doi.org/10.1201/9781420008609-11


Dunkel, A. (2015). Visualizing the perceived environment using crowdsourced photo geodata.

Landscape and Urban Planning, 142, 173–186.

Duval-Diop, D., Curtis, A., & Clark, A. (2010). Enhancing equity with public participatory GIS in hurricane rebuilding: faith based organizations, community mapping, and policy advocacy.

Community Development, 41(1), 32–49. https://doi.org/10.1080/15575330903288854

Elwood, S. (2006). Critical Issues in Participatory GIS: Deconstructions, Reconstructions, and

New Research Directions. Transactions in GIS, 10(5), 693–708. https://doi.org/10.1111/j.1467-

9671.2006.01023.x

Ester, M., Kriegel, H.-P., Sander, J., Xu, X., & others. (1996). A density-based algorithm for discovering clusters in large spatial databases with noise. Kdd, 96, 226–231.

Evans, J., & Jones, P. (2011). The walking interview: Methodology, mobility and place. Applied

Geography, 31(2), 849–858. https://doi.org/10.1016/j.apgeog.2010.09.005

Fillmore, C. J. (1966). Deictic Categories in the Semantics of “Come.” Foundations of

Language, 2(3), 219–227. Retrieved from JSTOR.

Fotheringham, A. S., & Wong, D. W. (1991). The modifiable areal unit problem in multivariate statistical analysis. Environment and Planning A, 23(7), 1025–1044.

Gattrell, A., & Loytonen, M. (1998). GIS and Health. Houghton Mifflin Harcourt.

Golden, M. L., Downs, R. R., & Davis-Packard, K. (2005). Confidentiality issues and policies related to the utilization and dissemination of geospatial data for public health applications. New

York, NY, The Socioeconomic Data and Applications Center (SEDAC) and Center for

International Information Network (CIESIN): Columbia University.


Goodchild, M. F. (2007). Citizens as sensors: the world of volunteered geography. GeoJournal,

69(4), 211–221.

Gruteser, M., & Grunwald, D. (2003). Anonymous Usage of Location-Based Services Through

Spatial and Temporal Cloaking. Proceedings of the 1st International Conference on Mobile

Systems, Applications and Services, 31–42. https://doi.org/10.1145/1066116.1189037

Hadjieleftheriou, M. (2015). LibSpatialIndex.

Hägerstrand, T. (1975). Space, time and human conditions.

Haines, A., Haines, A., & Donald, A. (2008). Getting research findings into practice. John Wiley

& Sons.

Haklay, M., Singleton, A., & Parker, C. (2008). Web Mapping 2.0: The Neogeography of the

GeoWeb. Geography Compass, 2(6), 2011–2039.

Hampton, K. H., Fitch, M. K., Allshouse, W. B., Doherty, I. A., Gesink, D. C., Leone, P. A., …

Miller, W. C. (2010). Mapping health data: improved privacy protection with donut method geomasking. American Journal of Epidemiology, 172(9), 1062–1069. https://doi.org/10.1093/aje/kwq248

Harrington, D. W., McLafferty, S., & Elliott, S. J. (2016). Population health intervention research: geographical perspectives. Routledge.

Harvey, P. (2013). ExifTool: Read, write and edit meta information. Software Package Available at Http://Www. Sno. Phy. Queensu. ca/ Phil/Exiftool.


Hawthorne, T. L., & Kwan, M.-P. (2013). Exploring the unequal landscapes of healthcare accessibility in lower-income urban neighborhoods through qualitative inquiry. Geoforum, 50,

97–106. https://doi.org/10.1016/j.geoforum.2013.08.002

Hawthorne, T. L., Solís, P., Terry, B., Price, M., & Atchison, C. L. (2015). Critical reflection mapping as a hybrid methodology for examining sociospatial perceptions of new research sites.

Annals of the Association of American Geographers, 105(1), 22–47.

Hide, C., Moore, T., & Smith, M. (2003). Adaptive Kalman filtering for low-cost INS/GPS. The

Journal of Navigation, 56(1), 143–152.

Hope, A. C. (1968). A simplified Monte Carlo significance test procedure. Journal of the Royal

Statistical Society: Series B (Methodological), 30(3), 582–598.

Initiative, C. A. (2004). Statistical Approaches for Small Numbers: Addressing Reliability and

Disclosure Risk.

Johari, R., & Sharma, P. (2012). A survey on web application vulnerabilities (SQLIA, XSS) exploitation and security engine for SQL injection. 2012 International Conference on

Communication Systems and Network Technologies, 453–458. IEEE.

Jones, P., & Evans, J. (2012). The spatial transcript: analysing mobilities through qualitative

GIS. Area, 44(1), 92–99.

Josselson, R. (1996). Ethics and process in the narrative study of lives (Vol. 4). Sage.

Jung, J.-K. (2009). Computer-aided qualitative GIS: A software-level integration of qualitative research and GIS. Qualitative GIS: A Mixed Methods Approach, 115–136.


Jung, J.-K., & Elwood, S. (2010). Extending the Qualitative Capabilities of GIS: Computer-

Aided Qualitative GIS. Transactions in GIS, 14(1), 63–87. https://doi.org/10.1111/j.1467-

9671.2009.01182.x

Kamel Boulos, M. N., Cai, Q., Padget, J. A., & Rushton, G. (2006). Using software agents to preserve individual health data confidentiality in micro-scale geographical analyses. Journal of

Biomedical Informatics, 39(2), 160–170. https://doi.org/10.1016/j.jbi.2005.06.003

Kamel Boulos, M. N., Resch, B., Crowley, D. N., Breslin, J. G., Sohn, G., Burtner, R., …

Chuang, K.-Y. S. (2011). Crowdsourcing, citizen sensing and sensor web technologies for public and environmental health surveillance and crisis management: trends, OGC standards and application examples. International Journal of Health Geographics, 10(1), 67. https://doi.org/10.1186/1476-072X-10-67

Kar, B., Sieber, R., Haklay, M., & Ghose, R. (2016). Public Participation GIS and Participatory

GIS in the Era of GeoWeb. The Cartographic Journal, 53(4), 296–299. https://doi.org/10.1080/00087041.2016.1256963

Kennedy, W. (2011, August 6). Tornado poses questions about putting shelters in apartment buildings. Retrieved April 7, 2019, from Joplin Globe website: https://www.joplinglobe.com/news/local_news/tornado-poses-questions-about-putting-shelters- in-apartment-buildings/article_5f747b90-a442-5ddc-bb80-7de008bc7760.html

Kitchin, R., Lauriault, T., & Wilson, M. (2016). Understanding Spatial Media (SSRN Scholarly

Paper No. ID 2799586). Retrieved from Social Science Research Network website: https://papers.ssrn.com/abstract=2799586


Kleres, J. (2011). Emotions and Narrative Analysis: A Methodological Approach. Journal for the

Theory of Social Behaviour, 41(2), 182–202. https://doi.org/10.1111/j.1468-5914.2010.00451.x

Knigge, L., & Cope, M. (2006). Grounded Visualization: Integrating the Analysis of Qualitative and Quantitative Data through Grounded Theory and Visualization. Environment and Planning

A: Economy and Space, 38(11), 2021–2037. https://doi.org/10.1068/a37327

Kounadi, O., & Leitner, M. (2014). Why Does Geoprivacy Matter? The Scientific Publication of

Confidential Data Presented on Maps. Journal of Empirical Research on Human Research

Ethics, 9(4), 34–45. https://doi.org/10.1177/1556264614544103

Kraak, M.-J., & He, N. (2009). Organizing the neo-geography collections with annotated space- time paths. The 24th International Cartographic Conference, Chile.

Krystosik, A. R., Curtis, A., Buritica, P., Ajayakumar, J., Squires, R., Dávalos, D., … James, M.

A. (2017). Community context and sub-neighborhood scale detail to explain dengue, chikungunya and Zika patterns in Cali, Colombia. PLOS ONE, 12(8), e0181208. https://doi.org/10.1371/journal.pone.0181208

Kulldorff, M., & Nagarwalla, N. (1995). Spatial disease clusters: detection and inference.

Statistics in Medicine, 14(8), 799–810.

Kulldorff, M., Rand, K., Gherman, G., Williams, G., & DeFrancesco, D. (1998). SaTScan v 2.1:

Software for the spatial and space-time scan statistics. Bethesda, MD: National Cancer Institute.

Kusenbach, M. (2003). Street phenomenology: The go-along as ethnographic research tool.

Ethnography, 4(3), 455–485.


Kwan, M.-P. (2002a). Feminist Visualization: Re-Envisioning GIS as a Method in Feminist

Geographic Research. Annals of the Association of American Geographers, 92(4), 645–661.

Retrieved from JSTOR.

Kwan, M.-P. (2002b). Introduction: feminist geography and GIS.

Kwan, M.-P. (2002c). Is GIS for women? Reflections on the critical discourse in the 1990s.

Gender, Place and Culture: A Journal of Feminist Geography, 9(3), 271–279.

Kwan, M.-P. (2007a). Affecting geospatial technologies: Toward a feminist politics of emotion. The Professional Geographer, 59(1), 22–34.

Kwan, M.-P. (2007b). Affecting Geospatial Technologies: Toward a Feminist Politics of Emotion. The Professional Geographer, 59(1), 22–34. https://doi.org/10.1111/j.1467-9272.2007.00588.x

Kwan, M.-P. (2012a). How GIS can help address the uncertain geographic context problem in social science research. Annals of GIS, 18(4), 245–255. https://doi.org/10.1080/19475683.2012.727867

Kwan, M.-P. (2012b). The Uncertain Geographic Context Problem. Annals of the Association of American Geographers, 102(5), 958–968. https://doi.org/10.1080/00045608.2012.687349

Kwan, M.-P., & Ding, G. (2008). Geo-Narrative: Extending Geographic Information Systems for Narrative Analysis in Qualitative and Mixed-Method Research. The Professional Geographer, 60(4), 443–465. https://doi.org/10.1080/00330120802211752

Laurier, E., Lorimer, H., Brown, B., Jones, O., Juhlin, O., Noble, A., … others. (2008). Driving and ‘passengering’: Notes on the ordinary organization of car travel. Mobilities, 3(1), 1–23.

Lee, E. C., Asher, J. M., Goldlust, S., Kraemer, J. D., Lawson, A. B., & Bansal, S. (2016). Mind the Scales: Harnessing Spatial Big Data for Infectious Disease Surveillance and Inference. The Journal of Infectious Diseases, 214(suppl_4), S409–S413. https://doi.org/10.1093/infdis/jiw344

Leitner, M., & Curtis, A. (2006). A first step towards a framework for presenting the location of confidential point data on maps—results of an empirical perceptual study. International Journal of Geographical Information Science, 20(7), 813–822. https://doi.org/10.1080/13658810600711261

Leitner, M. (2013). Crime modeling and mapping using geospatial technologies (Vol. 8). Springer Science & Business Media.

Lewis, P., Fotheringham, S., & Winstanley, A. (2011). Spatial video and GIS. International Journal of Geographical Information Science, 25(5), 697–716. https://doi.org/10.1080/13658816.2010.505196

Loper, E., & Bird, S. (2002). NLTK: The Natural Language Toolkit. arXiv:cs/0205028. Retrieved from http://arxiv.org/abs/cs/0205028

Lue, E., Wilson, J. P., & Curtis, A. (2014). Conducting disaster damage assessments with Spatial Video, experts, and citizens. Applied Geography, 52, 46–54. https://doi.org/10.1016/j.apgeog.2014.04.014

Madden, M., & Ross, A. (2009). Genocide and GIScience: Integrating Personal Narratives and Geographic Information Science to Study Human Rights. The Professional Geographer, 61(4), 508–526. https://doi.org/10.1080/00330120903163480

Matthews, S. A., Detwiler, J. E., & Burton, L. M. (2006). Geo-ethnography: Coupling Geographic Information Analysis Techniques with Ethnographic Methods in Urban Research. Cartographica: The International Journal for Geographic Information and Geovisualization. https://doi.org/10.3138/2288-1450-W061-R664

McCall, M. K. (2003). Seeking good governance in participatory-GIS: a review of processes and governance dimensions in applying GIS to participatory spatial planning. Habitat International, 27(4), 549–573. https://doi.org/10.1016/S0197-3975(03)00005-5

McIntosh, J., De Lozier, G., Cantrell, J., & Yuan, M. (2011). Towards a narrative GIS. Digital Humanities.

McLafferty, S., & Grady, S. (2004). Prenatal Care Need and Access: A GIS Analysis. Journal of Medical Systems, 28(3), 321–333. https://doi.org/10.1023/B:JOMS.0000032848.76032.28

Mennis, J., Mason, M. J., & Cao, Y. (2013). Qualitative GIS and the Visualization of Narrative Activity Space Data. International Journal of Geographical Information Science: IJGIS, 27(2), 267–291. https://doi.org/10.1080/13658816.2012.678362

Mihalcea, R., & Liu, H. (2006). A Corpus-based Approach to Finding Happiness. Retrieved April 7, 2019, from https://www.semanticscholar.org/paper/A-Corpus-based-Approach-to-Finding-Happiness-Mihalcea-Liu/836f5bc6b13404790d01c337e71511c1eed96b53

Miller, C. C. (2006). A beast in the field: The Google Maps mashup as GIS/2. Cartographica: The International Journal for Geographic Information and Geovisualization, 41(3), 187–199.

Miller, H. (2007). Place-Based versus People-Based Geographic Information Science. Geography Compass, 1(3), 503–535. https://doi.org/10.1111/j.1749-8198.2007.00025.x

Miller, H. J. (2004). Tobler’s first law and spatial analysis. Annals of the Association of American Geographers, 94(2), 284–289.

Mills, J. W., Curtis, A., Kennedy, B., Kennedy, S. W., & Edwards, J. D. (2010). Geospatial video for field data collection. Applied Geography, 30(4), 533–547. https://doi.org/10.1016/j.apgeog.2010.03.008

Montoya, L. (2003). Geo-data acquisition through mobile GIS and digital video: an urban disaster management perspective. Environmental Modelling & Software, 18(10), 869–876. https://doi.org/10.1016/S1364-8152(03)00105-1

Mooney, P., Corcoran, P., & Ciepluch, B. (2013). The potential for using volunteered geographic information in pervasive health computing applications. Journal of Ambient Intelligence and Humanized Computing, 4(6), 731–745. https://doi.org/10.1007/s12652-012-0149-4

Moran, P. A. P. (1950). Notes on Continuous Stochastic Phenomena. Biometrika, 37(1/2), 17–23. https://doi.org/10.2307/2332142

Myers, D. E. (1994). Spatial interpolation: an overview. Geoderma, 62(1–3), 17–28.

Ooi, B. C. (1987). Spatial kd-tree: A data structure for geographic database. Datenbanksysteme in Büro, Technik Und Wissenschaft, 247–258. Springer.

Openshaw, S., Charlton, M., Wymer, C., & Craft, A. (1987). A mark 1 geographical analysis machine for the automated analysis of point data sets. International Journal of Geographical Information System, 1(4), 335–358.

Pain, R., MacFarlane, R., Turner, K., & Gill, S. (2006). When, where, if and but: qualifying GIS and the effect of streetlighting on crime and fear. Environment and Planning A, 38, 2055–2074. http://dx.doi.org/10.1068/a38391

Pang, B., Lee, L., & others. (2008). Opinion mining and sentiment analysis. Foundations and Trends® in Information Retrieval, 2(1–2), 1–135.

Paulikas, M. J., Curtis, A., & Veldman, T. (2014). Spatial video street-scale damage assessment of the Washington, Illinois Tornado of 2013. ISCRAM.

Pavlovskaya, M. (2006). Theorizing with GIS: A Tool for Critical Geographies? Environment and Planning A: Economy and Space, 38(11), 2003–2020. https://doi.org/10.1068/a37326

Perchoux, C., Chaix, B., Cummins, S., & Kestens, Y. (2013). Conceptualization and measurement of environmental exposure in epidemiology: accounting for activity space related to daily mobility. Health & Place, 21, 86–93.

Phillips, M. L., Hall, T. A., Esmen, N. A., Lynch, R., & Johnson, D. L. (2001). Use of global positioning system technology to track subject’s location during environmental exposure sampling. Journal of Exposure Science and Environmental Epidemiology, 11(3), 207.

Pickles, J. (2008). Representations in an electronic age: Geography, GIS, and democracy. Praxis (e) Press.

Piza, E. L., & Gilchrist, A. M. (2018). Measuring the effect heterogeneity of police enforcement actions across spatial contexts. Journal of Criminal Justice, 54, 76–87.

Porter, L. C., De Biasi, A., Mitchell, S., Curtis, A., & Jefferis, E. (2018). Understanding the Criminogenic Properties of Vacant Housing: A Mixed Methods Approach. Journal of Research in Crime and Delinquency, 0022427818807965. https://doi.org/10.1177/0022427818807965

Rahman, M. K., Schmidlin, T. W., Munro-Stasiuk, M. J., & Curtis, A. (2017). Geospatial Analysis of Land Loss, Land Cover Change, and Landuse Patterns of Kutubdia Island, Bangladesh. International Journal of Applied Geospatial Research (IJAGR), 8(2), 45–60. https://doi.org/10.4018/IJAGR.2017040104

Rainham, D., Krewski, D., McDowell, I., Sawada, M., & Liekens, B. (2008). Development of a wearable global positioning system for place and health research. International Journal of Health Geographics, 7(1), 59. https://doi.org/10.1186/1476-072X-7-59

Ratcliffe, J. H. (2004). Geocoding crime and a first estimate of a minimum acceptable hit rate. International Journal of Geographical Information Science, 18(1), 61–72.

Richardson, D. B., Kwan, M.-P., Alter, G., & McKendry, J. E. (2015). Replication of scientific research: addressing geoprivacy, confidentiality, and data sharing challenges in geospatial research. Annals of GIS, 21(2), 101–110. https://doi.org/10.1080/19475683.2015.1027792

Richardson, D. B., Volkow, N. D., Kwan, M.-P., Kaplan, R. M., Goodchild, M. F., & Croyle, R. T. (2013). Spatial Turn in Health Research. Science, 339(6126), 1390–1392. https://doi.org/10.1126/science.1232257

Ripley, B. D. (1976). The Second-Order Analysis of Stationary Point Processes. Journal of Applied Probability, 13(2), 255–266. https://doi.org/10.2307/3212829

Rodríguez, D. A., Brown, A. L., & Troped, P. J. (2005). Portable global positioning units to complement accelerometry-based physical activity monitors. Medicine and Science in Sports and Exercise, 37(11 Suppl), S572–S581.

Rogers, R. (2011, May 23). Eyewitness in tornado video: “I didn’t think I was going to die this young.” Retrieved April 7, 2019, from Cecil Daily website: https://www.cecildaily.com/news/national_news/eyewitness-in-tornado-video-i-didn-t-think-i-was/article_458dd14c-85a2-11e0-bec2-001cc4c002e0.html

Rothman, L., Buliung, R., Macarthur, C., To, T., & Howard, A. (2014). Walking and child pedestrian injury: a systematic review of built environment correlates of safe walking. Injury Prevention, 20(1), 41–49.

Rushton, G., & Lolonis, P. (1996). Exploratory spatial analysis of birth defect rates in an urban population. Statistics in Medicine, 15(7–9), 717–726.

Seidl, D. E., Paulus, G., Jankowski, P., & Regenfelder, M. (2015). Spatial obfuscation methods for privacy protection of household-level data. Applied Geography, 63, 253–263. https://doi.org/10.1016/j.apgeog.2015.07.001

Sheridan, M. (2011, May 23). Video captures terror of Joplin, Missouri tornado; recorded inside Fastrip store. Retrieved April 7, 2019, from nydailynews.com website: https://www.nydailynews.com/news/national/video-captures-terror-joplin-missouri-tornado-recorded-fastrip-store-article-1.146265

Silverman, B. W. (2018). Density estimation for statistics and data analysis. Routledge.

Sinton, D. S., & Lund, J. J. (2007). Understanding place: GIS and mapping across the curriculum. ESRI, Inc.

Stefanidis, A., Crooks, A., & Radzikowski, J. (2013). Harvesting ambient geospatial information from social media feeds. GeoJournal, 78(2), 319–338.

Stein, M. L. (2012). Interpolation of spatial data: some theory for kriging. Springer Science & Business Media.

Stopka, T. J., Donahue, A., Hutcheson, M., & Green, T. C. (2017). Nonprescription naloxone and syringe sales in the midst of opioid overdose and hepatitis C virus epidemics: Massachusetts, 2015. Journal of the American Pharmacists Association, 57(2), S34–S44.

Taylor, P. J. (1977). Quantitative methods in geography: an introduction to spatial analysis. Boston: Houghton Mifflin.

Teixeira, S. (2018). Qualitative Geographic Information Systems (GIS): An untapped research approach for social work. Qualitative Social Work, 17(1), 9–23.

Tiwari, C., & Rushton, G. (2005). Using spatially adaptive filters to map late stage colorectal cancer incidence in Iowa. In Developments in spatial data handling (pp. 665–676). Springer.

Tomlin, C. D. (1994). Map algebra: one perspective. Landscape and Urban Planning, 30(1–2), 3–12.

Vannini, P. (2011). Constellations of ferry (im)mobility: islandness as the performance and politics of insulation and isolation. Cultural Geographies, 18(2), 249–271.

VanWey, L. K., Rindfuss, R. R., Gutmann, M. P., Entwisle, B., & Balk, D. L. (2005). Confidentiality and spatially explicit data: Concerns and challenges. Proceedings of the National Academy of Sciences, 102(43), 15337–15342. https://doi.org/10.1073/pnas.0507804102

Walt, S. van der, Colbert, S. C., & Varoquaux, G. (2011). The NumPy Array: A Structure for Efficient Numerical Computation. Computing in Science & Engineering, 13(2), 22–30. https://doi.org/10.1109/MCSE.2011.37

Wang, C., Ren, K., Lou, W., & Li, J. (2010). Toward publicly auditable secure cloud data storage services. IEEE Network, 24(4), 19–24.

Wang, S., & Armstrong, M. P. (2003). A quadtree approach to domain decomposition for spatial interpolation in grid computing environments. Parallel Computing, 29(10), 1481–1504.

Warmerdam, F. (2008). The Geospatial Data Abstraction Library. In G. B. Hall & M. G. Leahy (Eds.), Open Source Approaches in Spatial Data Handling (pp. 87–104). https://doi.org/10.1007/978-3-540-74831-1_5

Watson, G. S. (1969). Trend surface analysis and spatial correlation. Retrieved from Johns Hopkins University, Department of Statistics, Baltimore, MD, website: https://apps.dtic.mil/docs/citations/AD0699163

Watts, P. R. (2010). Mapping narratives: the 1992 Los Angeles riots as a case study for narrative-based geovisualization. Journal of Cultural Geography, 27(2), 203–227.

Wieland, S. C., Cassa, C. A., Mandl, K. D., & Berger, B. (2008). Revealing the spatial distribution of a disease while preserving privacy. Proceedings of the National Academy of Sciences of the United States of America, 105(46), 17608–17613. https://doi.org/10.1073/pnas.0801021105

Yang, D.-H., Goerge, R., & Mullner, R. (2006). Comparing GIS-based methods of measuring spatial accessibility to health services. Journal of Medical Systems, 30(1), 23–32.

Zandbergen, P. A. (2014). Ensuring Confidentiality of Geocoded Health Data: Assessing Geographic Masking Strategies for Individual-Level Data [Research article]. https://doi.org/10.1155/2014/567049

Zhang, S., Freundschuh, S. M., Lenzer, K., & Zandbergen, P. A. (2017). The location swapping method for geomasking. Cartography and Geographic Information Science, 44(1), 22–34. https://doi.org/10.1080/15230406.2015.1095655

Zimmerman, D. L., & Pavlik, C. (2008). Quantifying the Effects of Mask Metadata Disclosure and Multiple Releases on the Confidentiality of Geographically Masked Health Data. Geographical Analysis, 40(1), 52–76. https://doi.org/10.1111/j.0016-7363.2007.00713.x

Zook, M., Graham, M., Shelton, T., & Gorman, S. (2010). Volunteered geographic information and crowdsourcing disaster relief: a case study of the Haitian earthquake. World Medical & Health Policy, 2(2), 7–33.
