GeoSense An open publishing platform for visualization, social sharing, and analysis of geospatial data. Anthony DeVincenzi B.F.A. Visual Communication, Seattle Art Institute 2007

Submitted to the Program in Media Arts and Sciences, School of Architecture and Planning, in partial fulfillment of the requirements for the degree of Master of Science in Media Arts and Sciences at the Massachusetts Institute of Technology June 2012 © 2012 Massachusetts Institute of Technology. All rights reserved

Author Anthony DeVincenzi Program in Media Arts and Sciences May 11, 2012

Certified by Dr. Hiroshi Ishii Jerome B. Wiesner Professor of Media Arts and Sciences Associate Director, MIT Media Lab Program in Media Arts and Sciences

Accepted by Dr. Mitchel Resnick Chairperson, Departmental Committee on Graduate Students Program in Media Arts and Sciences


Thesis Supervisor Dr. Hiroshi Ishii Jerome B. Wiesner Professor of Media Arts and Sciences

Associate Director, MIT Media Lab Program in Media Arts and Sciences

Thesis Reader Cesar A. Hidalgo Assistant Professor, MIT Media Lab

Thesis Reader Joi Ito Director, MIT Media Lab

Acknowledgments

THANK YOU,

Hiroshi, my advisor, for allowing me to diverge greatly from our group's primary area of research to investigate an area I believe to be strikingly meaningful; for no-holds-barred critique, and for providing endless insight.

The Tangible Media Group, my second family, who adopted me as a designer and allowed me to play pretend engineer.

Samuel Luescher, for co-authoring GeoSense alongside me.

My thesis readers Joichi Ito and Cesar Hidalgo for providing feedback, inspiration, and guidance over the course of this work.

The people of Safecast, who support an idea larger than what any one man could accomplish. You are truly inspiring.

Dávid Lakatos and Matthew Blackshaw, for our many adventurous projects to date, and for those to come in the near future.

Mom and Dad, for allowing me to explore my passions despite how inapplicable they may have seemed at times.

My family, and Jessica for loving me. I learn from your patience.

My friends in Seattle, and around the world.

TABLE OF CONTENTS

Introduction

Related Work
    Contemporaries

Safecast
    A call for help
    Keeping quarters

Application Design
    Balancing simplicity and complexity
    Data mobility
    Summary of system
    Second order observation
    Data features
    Development timeline

Design Theory
    Geovisualization
    Aesthetics
    Spatial-temporal narratives

Process
    Concept
    Safecast worldmap (V1)
    Generalizing the platform (V2)
    User interface for data management
    GeoSense (V3)
    Spatial comments and chat
    Continued: Beyond the screen

Technical Design
    Server structure
    Amazon EC2
    Ubuntu
    Satellite & satellite API
    Architecture
    GeoSense Database
    Data import
    Aggregation and reduction through MapReduce
    Spatial indexing and grid queries
    Teamdata Database
    Application Structure
    Views
    Models
    Collections
    External Libraries

Challenges
    Data purity
    Performance
    Scale
    Custom instances

Use Cases
    Safecast
    Sourcemap
    The Lace Race
    Results

Future Work
    Tile servers
    Expanded visualization types
    Models & mechanistic explanations
    Boolean conditions and spatially bound alerts

Conclusion

References

Appendix
    Tablet AR installation

Introduction

Throughout this document we refer to two projects: GeoSense, a visualization platform, and Safecast [1], a sensing and data collection organization. Their differences will be described at length as well as their commonality and shared resources.

ONWARD -

Geovisualization is a common form of information visualization, or scientific data visualization, that, when combined with visual pattern recognition, increases human understanding in an effort to enhance decision making around a given view of data [2].

Geospatial data has become abundant, and so have the many sensors that we use to collect it. With over 1.2 billion web and GPS enabled devices in our pockets [3], the amount of geotagged metadata, ranging from tweets to photos, has skyrocketed. As more data becomes coupled with geospatial coordinates, the intrinsic relationship between the meaning of the data and the place in space from which it came can be visualized, observed, and analyzed to inform decision making. However, this abundance poses a problem: growing amounts of data become increasingly difficult to parse and understand.

Today, the tools available for geospatial mapping remain highly specialized, with significant technical overhead often outweighing the capabilities of the user. We use maps to codify the physical existence of immaterial media, and without accessible tools for visualization the meaning of data is lost in the columns and rows of spreadsheets. Further, the inability to quickly and simply create and share geovisualizations in a lightweight manner has slowed the evolution of sharing and collaboration in GIS [4].

How could a community, a university, or an entire industry benefit from having the complexity of geospatial data visualization reduced to that of email, or a single tweet? To be more specific, what if we could seamlessly share and engage with social features such as comments and live interaction around geospatial data? We believe that empowering users with the tools necessary to construct visual and social narratives around contextual data will enhance their collective ability to respond to current events while simultaneously planning for the future.

To achieve this we must first build a platform that can interpret the many disparate forms of data and enable them to co-exist in a single unified visualization. Without this tool, our data and voices are left in singular silos - never able to engage and interact with the voices of many. The visualization may take a number of forms, two or three-dimensional, varied in aesthetics at the author's discretion yet constrained within a sandbox so as to guide the user - in short, not too much control, but not too little. A simple interface for sharing and socializing the new geovisualization invites multiuser collaboration, where each user may contribute to and discuss the current datasets, supporting the claim that the shared knowledge of many users is often more valuable than that of one [5]. Finally, data pertaining to user interaction involving comments, tweets, and physical location may be aggregated to create a second order dataset, which in turn may be incorporated into the visualization for communal behavior analysis.

GeoSense aims to provide such a tool, where the user can perform tasks of both the visual artist and the data analyst, all while contributing to the shared cognition and collective intelligence of a broader community. GeoSense is an easy-to-use web based platform for the organization and upload of multiple datasets, a framework for 2D and 3D visualization, as well as a suite of social and analysis tools. GeoSense explores generating visual correlation models based on data layering and the aggregate of community analysis in lieu of unified theories or known mechanistic explanations.

After the 311 disasters in Japan involving the Tōhoku earthquake, tsunami, and Fukushima Daiichi reactor meltdown, the community was left with little information around the outcome of the crisis. The public struggled to obtain answers to even the most basic questions: "Is it safe for me to stay in my home?" and "Is my food safe to eat?" Thousands rose to aid, and amongst the responders was Safecast, an independently organized crowd-sourced mapping network. Despite the great amount of information and data that was collected, there was no clear path towards displaying, juxtaposing, and discussing the multivariate sources of critical information. GeoSense was founded to support the efforts of Safecast and the many communities of Japan.

Related Work

We have, for hundreds of years, refined our use of visual language in the art of data visualization. As early as the 18th century, men such as Joseph Priestley, an English theologian and academic, had begun exploring the graphical representation of statistical methods through what is believed to be one of the earliest implementations of a timeline, designed to illustrate the contemporaneity of ancient philosophers and statesmen [44]. During the same period William Playfair, a contemporary of Priestley, debuted what are believed to be the first known instances of bar and pie charts in his two books The Commercial and Political Atlas and Statistical Breviary, respectively. These early explorations laid a foundation upon which nearly three hundred years of related work has been conducted.

In more contemporary times, an innumerable amount of work has been done in the field of data visualization, much of which stems from the foundational work of Edward Tufte and his many visual definitions described in "Visual Explanations" [6]. Tufte's seminal work in visual explanation and analysis has provided the foundation for an even wider field of informational graphic design: a notable trend covering a massive spectrum of content ranging from visualizations for geospatial data [7] to social and emotional observations through data analysis [8].

One of the first time series graphs: William Playfair's trade-balance time-series chart, "Exports and Imports to and from Denmark & Norway from 1700 to 1780," published in Political Atlas, 1786

In the area of visualization for geospatial applications, much work has been done by the GIS community to provide tools which allow for the exploration and visualization of location based datasets. Of these, Google Earth [9] and NASA World Wind [10] have been widely adopted as platforms for plotting sets of data ranging from glacier footprints [11] to the displacement and distribution of refugees located in remote areas of the world [12]. This wide application space is evidence of the versatility of a three dimensional globe for displaying both the context and meaning of data.

Deeper into tools built specifically for spatial data analysis, both ArcGIS [13] and ESRI MapIt [14] provide tools which claim to provide "easy online discovery, access, visualization, and dissemination of geospatial information." Both services offer an extensive suite of data visualization and analysis tools, though neither provides a suitable framework for control over dataset comparison beyond basic layering, and both are constrained to two dimensional viewports. Similarly, community crisis mapping tools such as Pachube [15] and [16] allow users to take much of the foundational mapping work done by the aforementioned sources and add specific additions related to disaster relief.

In the space of tools and research for geospatial data comparison, analysis, and theoretical model generation, significant work has been done by Floraine Grabler et al. with Automatic Generation of Tourist Maps, where the salience of map elements is determined by using bottom-up vision-based image analysis and top-down web-based information extraction methods [17]. The technique of selective visualization with respect to geography and locational data is an important accomplishment towards identifying how to present visual data based on the user specified variables of interest within the data. Further, work by Jeffrey Heer and Michael Bostock of Stanford University has explored how to leverage crowd sourcing to generate a visual analysis of raw data in "Crowdsourcing Graphical Perception: Using Mechanical Turk to Assess Visualization Design" [18].

Contemporaries

Web-based authoring tools for generating geovisualizations have become more prominent in recent years, offering an assortment of services towards helping online visitors create custom visualizations. Of them, the following are most related to GeoSense:

GEOCOMMONS

Most notable is GeoCommons, a public community of GeoIQ users who are building an open repository of data and maps [19]. GeoCommons has a number of similarities to GeoSense, namely in that users are given an interface to assist in the upload and treatment of geospatial data, as well as a shared data repository amongst users. While many of GeoCommons' features are thoroughly implemented, including an impressive level of control over data layering through boolean operations, there remains little to no social infrastructure beyond the ability to share maps on external services such as Facebook.

HARVARD UNIVERSITY'S WORLDMAP

Similar to GeoCommons, yet slightly smaller in scale, is Harvard University's WorldMap project [20], which invites its users to "build [their] own mapping portal and publish it to the world or to just a few collaborators." WorldMap offers a complex and configurable user experience, giving users the ability to handle multiple sets of layered data atop an assortment of base map tiles. As with GeoCommons, WorldMap has no true infrastructure for communal dialog and analysis.

MAPBOX

MapBox [21] is a simplified toolkit for publishing static geovisualizations. Its clean aesthetic and well designed native authoring platform, TileMill [22], stand out as best in class regarding user interface and experience. The MapBox tools are less suited to community map building and better fitted to creating attractive visualizations as an embed or standalone site.

MANY EYES

Finally, the democratization and socialization of data visualization has been explored by Fernanda Viegas et al. in "Your Place or Mine? Visualization as a Community Component" [23], where a number of studies were conducted in order to "enable the use of visualization technology by lay users and to facilitate communication around the visualizations via tools for annotation, sharing and discussion." Many Eyes does not focus on geovisualization and instead explores community dialog around common data graphs such as bar and pie charts.

Safecast

GeoSense serves as the visualization engine for Safecast.org: a non profit collective of hackers and humanitarians who are actively crowd sourcing radiation mapping from the 3-11-11 Daiichi reactor meltdown in the Fukushima prefecture, Japan.

A call for help

On Friday, the 11th of March 2011, Japan suffered a national catastrophe known now as the 311 Earthquake. At a staggering magnitude of 9.0 (Mw) [24], the off-coast earthquake was the most powerful ever to affect Japan and amongst the most powerful ever recorded [25]. As a result of the undersea epicenter, a series of tsunamis were triggered, generating waves which were seen to reach as high as 130 feet. Amongst the tragic and catastrophic loss of life (~15,000), injury (~26,000), and property destruction (~129,000 buildings) [26], the damage caused by the tsunamis put into motion a chain of events which would lead to the eventual equipment failures, nuclear meltdown, and subsequent leakage of radioactive material from the Fukushima I Nuclear Power Plant (referred to as Daiichi). Rated as a level 7 catastrophe on the International Nuclear Event Scale (INES) [27], the Fukushima I meltdown was the largest nuclear incident since the 1986 Chernobyl disaster [28]. Estimated economic losses skyrocketed into the tens of billions [29].

While no factor could outweigh the tragic loss of life, a full recovery and ensured healthy future for the country and its inhabitants quickly became Japan's main focus. It was during this time, seemingly moments after the beginning of this tragedy, that Safecast was formed.

Safecast is a global organization working to empower people with data, primarily through building sensor networks that enable both contribution and free use of the data collected. After the 311 earthquake and resulting nuclear situation at Fukushima Daiichi it became clear that people needed more data than what was available. Since the post 311 formation of Safecast, the team has grown to a dedicated core team and over 150 supporting volunteers. It has recently received grants from the John S. and James L. Knight Foundation and has, to date, deployed over 150 handmade radiation sensors with a measurement aggregation of over 2 million individual readings [30].

Safecast is almost certainly the single largest source of radiation data in Japan, if not the world, all of which is open and available under CC0 dedication [31]. GeoSense, as a project and platform, was born out of the necessity for Safecast to make visible its growing collection of data, and quickly evolved into a larger study which aims to redefine the relationship between community driven datasets and the democratization of geovisualization and analysis.

Keeping quarters

On March 22nd, 2012, we held a meeting at the Tokyo Hacker Space to discuss the current state and future needs of GeoSense as it pertained to Safecast. The following day, a demonstration of the data and its visualization was given at the Roppongi Hills art night, part of the Mori Art Museum, in Roppongi, Japan. During this event, numerous members of the audience shouted out, uncharacteristically for Japanese culture, and declared their need for unfettered access to this critical data.

"They tell lies," one woman exclaimed from the audience, "they don't want us to know what's really happening and you're the only ones who know the truth!" We can only assume "they" refers to the local government or TEPCO, the power company responsible for the Fukushima reactors [32]. Regardless of political or conspiracy beliefs, one year past the 311 incident the cry for help was as clear as ever. During the event we presented a recap of the previous 12 months, announced that at least 2 million data points had been collected, demonstrated the GeoSense visualization platform, and presented a musical synthesizer which generated interpretive music related to the ambient radiation around it.

The following day a press conference was held at The Fab Cafe in Shibuya, Tokyo. Members of the press were invited to attend and learn about the achievements of Safecast to date. We again announced the 2 million data points collected, the GeoSense platform, as well as an exciting new Safecast Geiger Counter which was built entirely by Safecast team members.

The press, many of whom represented major Japanese outlets like NHK and TBS, had inquiries around the mapping platform. Questions such as "What do the colors mean? Is red dangerous? Is green safe? How can I tell who collected the data? What about data that is incorrect or malicious?" were the most common. The answer, of course, was that much like our data our visualization engine would be as agnostic as possible - meaning that all variables from data type to data display would be fully customizable.

Our answer in short -

"We are not presenting conclusions, only an observational platform from which you may draw your own."

Application Design

Balancing simplicity and complexity

The most fundamental design principle behind GeoSense is to bring simplicity and legibility where complexity and confusion exist. In order to produce a usable platform with the greatest amount of user coverage and rich feature depth, it has been carefully designed to promote ease of use from the API to the UI. However, this does not discount the need for a tool which provides even the most seasoned data analysts with new and actionable insights. To address this, GeoSense scales gracefully with its user's specific needs; a simple geovisualization can quickly grow into a deeply insightful tool for analysis through a series of complex, spatial-temporal queries across any number of data sets.

We believe that there exists value in large data analysis in place of known data models, as was philosophically described by Nobel prize winner Philip Anderson in "More and Different" [33], and further explored by the contemporary big data movement. Rather than incorporate complex, computationally expensive algorithms to understand, interpolate, or predict model behaviors, GeoSense instead invites the community to act as a source of analysis, using human intuition and natural pattern recognition to detect occurring phenomena. This is not to say there isn't inherent value in known models; it is, however, a different approach which lends itself to a level of accessibility and friendliness that may in turn better serve a large community.

Finally, GeoSense makes use of multiple open technologies, all of which contribute greatly to the usability of the platform. Only 5 years ago, offering a service at this scale would have come at astronomical cost, requiring dedicated physical servers, a team of engineers, and client-side computing power that simply did not exist. Open source software efforts and blossoming communities cannot be thanked enough.

Data mobility

All data brought into or authored within GeoSense is stored, managed, and appropriated by the GeoSense Satellite RESTful API. The GeoSense application invites users to explore different dimensions and parameters of their datasets, both spatial and temporal, providing a suite of tools which acquire their parameters via the API. In fact, any map or source of data may be used outside the GeoSense application. For example, should a user wish to develop their own front end application or integrate datasets into another service, the Satellite API provides sufficient scaffolding and endpoints to do so.

Summary of system

GeoSense is an open platform for the comparative and cooperative visualization of geospatial data. It is fundamentally different from similar platforms that aim to provide complex GIS mapping tools and as a result are often weighed down by a cumbersome feature set.

GeoSense aims at providing the highest level of simplicity by carefully considering the average ability and limited prior knowledge of users with regard to GIS systems. In order to build such a system, special considerations have been made in developing the UX. A vast majority of first world internet users are equipped with geospatially aware devices and platforms such as Google Maps and Bing, which have bolstered awareness of cartographic interaction; GeoSense therefore comes at a time when the user has already acquired familiarity with mapping concepts and is in prime condition to begin authoring. The system manifests as a web application available publicly at http://geo.media.mit.edu where any user can, within seconds, acquire a boilerplate visualization template to which they can upload or link geospatial datasets.

We believe that geospatial data is best understood collaboratively, as was explored by Viegas et al. in 2007 with Many Eyes [5]. To promote social behavior, a single user's map is easy to share, as it belongs to a unique URL. Maps can be shared through integrated social outlets such as Twitter and Facebook, or more traditionally through an email or text link. To promote multi-user collaboration, all maps are generated with a public and a private short URL (for public viewing and administration respectively) which can be used to access the visualization platform. A map accessed through a specific URL allows for user annotation and commenting, both on specific data points and on general location coordinates.

Users are also made aware of other current collaborators and their general whereabouts in the context of the map. To elaborate, the entire map is a chat room and message board to which invited users may co-author and analyze data. These features are explained in greater detail throughout this document.

GeoSense provides an extremely simple platform for visualizing multiple disparate sources of geospatial data. In parallel, it also provides a suite of tools for collaboration and data insight which have, to date, not existed in well executed form. GeoSense is built specifically to serve users whose main skills are not computer science or design, but who have curiosity around geospatial analysis and appreciate beautiful presentation design.

Second order observation

By exposing user behavior in the context of the geography from which users originate, alongside their areas of interaction, a second order observation can be described. Specifically, for geospatial data and geovisualization the place in space where the viewer or author exists may have special relevance to the data they are investigating - both at the individual and community level. To explore this concept, each instance of GeoSense keeps track of where its users originate from, where (and if) they leave geospatial comments, as well as how they interact within the integrated chat room.
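As an illustration only, the bookkeeping described above can be sketched in a few lines of Python. The record layout, the field names, and the one-degree cell size are assumptions made for this example and are not taken from the GeoSense implementation; the sketch simply counts interactions per grid cell so that the counts could be drawn as an additional map layer.

```python
import math
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Interaction:
    """One user action on a shared map. All field names are hypothetical."""
    user_id: str
    kind: str   # "origin" (where the user connected from), "comment", or "chat"
    lon: float  # longitude the action is tied to, in degrees
    lat: float  # latitude the action is tied to, in degrees


def second_order_counts(interactions, cell_deg=1.0):
    """Count interactions per (grid cell, kind).

    Each cell is identified by its south-west corner, snapped to a
    multiple of cell_deg, so the result can be plotted like any other
    point dataset.
    """
    counts = Counter()
    for item in interactions:
        cell = (
            math.floor(item.lon / cell_deg) * cell_deg,
            math.floor(item.lat / cell_deg) * cell_deg,
        )
        counts[(cell, item.kind)] += 1
    return counts
```

Keeping the interaction kind in the key lets an author separate, for instance, where viewers come from and where they choose to comment.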

Data features

Data representation is highly variable within GeoSense. It is left up to the map's author to select the visual style, though GeoSense maintains predefined data point aggregates for large or extremely dense datasets. Data may be explored interactively by clicking on either a cluster of aggregated data or an individual datum. Meta information associated with the specific data is then revealed in geospatial context, assisting the user in better understanding the information with which they are interacting. We discuss in great depth the visual and computational considerations of visualizing data features in the Design Theory chapter.

Development timeline

GeoSense was developed over a six-month period, all of which was spent in close collaboration with Safecast. To serve both the active Safecast community and prepare GeoSense for growth into additional communities, milestones vary from summit meetings in Japan to periods of presentation at the MIT Media Lab. This timeline is reflective of GeoSense's development, as well as its future plans and iterations:

Timeline figure (axis: Oct 2011, Dec, Jan, Feb, Mar, May): Conception; V.1 Safecast Worldmap; Research & Meetings; V.2 Development; Tokyo visit; V.3 GeoSense; Deployment

Design Theory

"The world is complex, dynamic, multidimensional; the paper is static, flat. How are we to represent the rich visual world of experience and measurement on mere flatland?" Edward Tufte [34]

Geovisualization

Producing effective visual representation of multi-layered information atop a map or any cartographic medium poses a torrent of potential complications. For every condition that produces a desirable result, one hundred new complications may reveal themselves, generating information-less patterns as a byproduct of their presence. As explained first by Josef Albers and underscored later by Tufte, the conundrum is that 1 + 1 may often equal 3 [35], where the byproduct of the initial variables produces an additional, distinct condition - adding to the visual complexity.

As described by Albers, the combination of one or more shapes may produce a third shape (shown in red) as their byproduct

To address this, we employ a number of techniques, both aesthetic and computational, that address the needs of user generated geovisualization. The key features we consider are:

1. Mindful representation of multivariate information layers drawn across both two and three dimensional planes.

2. Dynamic data densities where the application state (or UI) informs the visual output.

GeoSense is faced with a number of challenges when representing geographic data within the user interface. Aside from standard complexities that arise from visualizing large data, such as information density, other conditions must be considered when we investigate the user's interaction with the data. It is blunt and inefficient to show all data, as visual comprehension begins to suffer as the amount, or more specifically the density, of visualized data increases. Overabundant or incomprehensible arrangements stem from failures in design rather than from the information itself - regardless of magnitude.

To address this complication, we employ a well known tactic of fitting a grid of bounding boxes against the map, to which data is aggregated in relation to the user's visible viewport. The grid is dynamically generated and sized. Many geospatial visualizations have addressed this, either for visual or computational simplicity, by averaging the number of occurrences into a known cultural boundary. For instance, population density is often visualized as a choropleth map [figure below] where a polygon shape defines the state boundaries and all data within the given bounds is displayed as a single hue across the entire shape. This technique often misleads the viewer, as the data within the bounded areas is not nearly as uniform as the visualization suggests.

Left: A computationally generated interpolation of radiation levels. Right: A choropleth map showing population density by prefecture near Tokyo, Japan. Neither image produced from GeoSense

A similarly misleading tactic is to attempt averaging information over a given space. Computational interpolations [figure above], while often making the visualization seem denser and perhaps more visually compelling, do little more than generate an unqualified visual representation and, in the case of Safecast's radiation dataset, produce extremely misleading conceptions regarding the data's meaning. Interpolations are effective when attempting to predict or model the behavior or future state of a dataset, especially in the case of trajectory over time and space.
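The viewport-relative grid aggregation described in this section can be sketched as follows. This is a minimal in-memory illustration under stated assumptions, not the GeoSense implementation: the function name, the `cells_across` parameter, and the choice of count and mean as aggregates are all hypothetical, and the sketch ignores viewports that cross the antimeridian.

```python
from collections import defaultdict


def aggregate_to_grid(points, viewport, cells_across=32):
    """Aggregate (lon, lat, value) points into square grid cells.

    viewport is (west, south, east, north) in degrees. The cell size is
    derived from the viewport width, so the grid is regenerated and
    resized whenever the user pans or zooms.
    Returns {(col, row): {"count", "mean", "center"}} for non-empty cells.
    """
    west, south, east, north = viewport
    if east <= west or north <= south or cells_across < 1:
        raise ValueError("viewport must have positive extent")
    cell = (east - west) / cells_across  # cell edge length in degrees

    sums = defaultdict(lambda: [0, 0.0])  # (col, row) -> [count, total]
    for lon, lat, value in points:
        # Skip anything outside the visible viewport.
        if not (west <= lon < east and south <= lat < north):
            continue
        col = int((lon - west) // cell)
        row = int((lat - south) // cell)
        acc = sums[(col, row)]
        acc[0] += 1
        acc[1] += value

    return {
        (col, row): {
            "count": n,
            "mean": total / n,
            "center": (west + (col + 0.5) * cell, south + (row + 0.5) * cell),
        }
        for (col, row), (n, total) in sums.items()
    }
```

Unlike a choropleth, each cell here covers only the area it is drawn over, and empty cells are simply absent rather than filled in by interpolation.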

Aesthetics

Shape, color, and size of visualized objects are carefully considered, as the shape of an object is optically tethered to the geography on which it rests. For example, a single data point may represent one particular point in space, but showing it as a single pixel on a map is sometimes misleading. Showing the data point as a 10x10 pixel box instead suggests that it corresponds with an area of space on the map rather than a single point. Likewise, the visual change of data must coincide with adjustments in the map zoom level; if a datum does not change its size parameter as zoom is adjusted, the user will perceive the shape size to have no geographic binding in relation to the geo coordinates of the map. This is well illustrated in modern mapping tools such as Google Maps or OpenStreetMap, where the map tiles change resolution with respect to the user's perceived distance from earth (zoom).

Additionally, the color of data plays a critical role in both the visual legibility of each point of data and the intent expressed by the visualization. For example, the question continually arises whether or not certain types of data, radiation in our case, should be colored or have a fixed color scale. The most common example is a linear hue shift from green to red. In western culture green is widely accepted as safe, versus red, which is understood as being dangerous. In Japan, by contrast, the color red represents heroism and love, and is a positive visual indicator for the Tokyo stock exchange. Further, how does one normalize scale to color where the range value is either user generated or chosen arbitrarily? Non-linear value distributions cause additional complexities when representing data using a hue shift and often need to be represented in logarithmic scale. It was decided early on that the potential harm in suggestive coloring, especially within critical datasets like radiation, outweighs any aesthetic benefit.
A view of bold, brightly colored shapes atop a dark tonal map. Blue dots represent earthquakes sized by magnitude. Red dots represent nuclear reactors sized by power.

To address these concerns GeoSense gives the user complete control over data representation: the choice of whether data is represented as a single pixel, a relatively sized circle, or a bounding box, as well as single or hue-shifted color, is completely customizable. By default, the application promotes bold color set against dark, tonal map tiles, which best suit the type of data uploaded. To do this, we borrow a page from Swiss cartographer Eduard Imhof's first rule of color composition:

Pure, bright or very strong colors have loud, unbearable effects when they stand unrelieved over large areas adjacent to each other, but extraordinary effects can be achieved when they are used sparingly on or between dull background tones. "Noise is not music ... only on a quiet background can a colorful theme be constructed." [36]

GeoSense addresses the multivariate nature of geospatial visualization by combining the proper amount of end user control with system constraints, in turn addressing the technical, artistic, and cultural complications that arise. The figure below describes the three primary methods of data representation and their literal to representational qualities:

[Figure: Pixel, Box, and Circle renderings arranged from literal to representative]

Different rendering techniques used by GeoSense, from left (most literal) to right (most representative), with their corresponding visualizations below.
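The question raised earlier, how to normalize a value range onto a hue shift when the distribution is non-linear, can be illustrated with a minimal sketch. Python is used here for brevity; the function names, the clamping behavior, and the green-to-red mapping are assumptions for illustration and are not taken from the GeoSense code base, which deliberately avoids suggestive coloring by default.

```python
import colorsys
import math


def normalize(value, lo, hi, log_scale=False):
    """Map value in [lo, hi] to [0, 1], clamping out-of-range input."""
    if hi <= lo:
        raise ValueError("hi must be greater than lo")
    value = min(max(value, lo), hi)
    if log_scale:
        if lo <= 0:
            raise ValueError("a log scale needs a positive lower bound")
        return (math.log(value) - math.log(lo)) / (math.log(hi) - math.log(lo))
    return (value - lo) / (hi - lo)


def hue_shift_color(t):
    """Green (t=0) to red (t=1) through yellow, as an (r, g, b) tuple in 0..255."""
    hue = (1.0 - t) / 3.0  # in HSV, 1/3 is green and 0 is red
    r, g, b = colorsys.hsv_to_rgb(hue, 1.0, 1.0)
    return tuple(round(c * 255) for c in (r, g, b))
```

With a value range of 1 to 100, a reading of 10 sits near the bottom of a linear scale but exactly at the midpoint of a logarithmic one, which is why skewed distributions often read better on the latter.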

Spatial-temporal narratives

In addition to the two- and three-dimensional canvases on which GeoSense displays information, a fourth dimension for time has been implemented through a time series graphing system. Data sources containing temporal attributes may be explored alongside their geophysical attributes in a shared context. In order to expose the value of a dataset's time quality, each datum is sequenced in time, spaced by uniform time intervals. Coupling the spatial dimensions of the map viewport with the temporal sequence of the series graph deepens an onlooker's understanding of the dataset's depth. By reducing the complexity of the data into two understandable and intrinsically related parameters, time and space, an equally interactive and elegant view in all four dimensions is made tangible.

Top: Earthquakes shown with both geospatial and temporal analysis. Bottom: Arrangements of series graph display types - bar chart, scatterplot, area, sparkline.

Because data properties such as color and shape are selected at the data management level, parameters are synchronized across all visualization mediums: a series of red dots for earthquakes on the map will display as a red time series line on the graph. Additionally, temporal data may be explored through a number of time-based graph techniques that currently include scatterplot, line, area fill, and bar chart.
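Sequencing data into uniform time intervals, as described above for the time series graph, can be sketched as follows. This is an illustrative Python sketch only; `bucket_counts` and its parameters are hypothetical names, and the actual graphing code in GeoSense may bucket data differently.

```python
from collections import Counter
from datetime import datetime, timedelta


def bucket_counts(timestamps, start, interval):
    """Count events per uniform interval, keyed by each bucket's start time."""
    counts = Counter()
    for ts in timestamps:
        index = (ts - start) // interval  # timedelta // timedelta gives an int
        counts[start + index * interval] += 1
    return dict(sorted(counts.items()))


# Example: three earthquake timestamps grouped into one-day buckets.
events = [
    datetime(2011, 3, 11, 5),
    datetime(2011, 3, 11, 14),
    datetime(2011, 3, 13, 1),
]
daily = bucket_counts(events, datetime(2011, 3, 11), timedelta(days=1))
```

Each bucket then becomes one bar, dot, or line vertex on the series graph, depending on the chosen display type.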

[Figure: the SPACE (xy) plane and the TIME (z) axis, with a selection between t1 and t2]

The space (xy) plane represents the user's current viewport. It is defined by a constraining latitude/longitude and zoom level. The time (z) plane displays selections of a data set based on occurrences within a given time constraint. In this case, we show a selection between t1 and t2.

Users may also find interest in further exploring subsets of data through the time graph and can easily do so by interacting with a number of UI features that allow for time-range adjustment and on-graph annotation. The above figure describes the spatial-temporal relationship between the user's view of the time series graph and the visible geospatial viewport. As a user interacts with the time constraint controls, in this case t1 and t2, the data shown on both the map and the graph is constrained to the new parameters.
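The coupling between the viewport and the time constraint can be expressed as a single filter over space and time. The sketch below is a simplified illustration in Python; the `Point` type, its field names, and `select_visible` are assumptions, and the sketch ignores viewports that cross the antimeridian.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Point:
    lon: float
    lat: float
    time: float   # for example, seconds since the epoch
    value: float


def select_visible(points, west, south, east, north, t1, t2):
    """Keep points inside the viewport box whose time lies in [t1, t2]."""
    return [
        p for p in points
        if west <= p.lon <= east and south <= p.lat <= north and t1 <= p.time <= t2
    ]
```

Because the map and the graph would both draw from the same filtered selection, moving either t1 or t2 updates both views consistently.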

Process

GeoSense began with a simple concept: making the simplest, most friction-free experience for mapping geospatial data, with special attention to social collaboration and data analysis. Moreover, this tool should allow for the effortless creation of data and model mashups that expose insight into the meaning of the data. The initial goal was largely unconstrained in its definition and by design was allowed to grow and evolve as certain points of development were reached.

At the time of conceiving the idea, a number of related projects had recently been completed by members of the core team. For example, at least three large-scale geospatial projects had taken form, all of which required us to build custom geospatial visualizations. These projects, Peddl [37], Place Pulse [38], and Sourcemap [39], provided deep insights into the complexities of design and implementation for custom-made geospatial visualization where the datasets were both community driven and dynamically updating.

Left: A view from the Peddl marketplace. Middle: A view from a Place Pulse visualization. Right: A view from Sourcemap.com.

To begin, GeoSense was prototyped as a wireframe concept to help identify the UI/UX foundation from which to build the service. These early prototypes explored different arrangements of user interfaces that, if implemented, would serve as the app's foundation. Early wireframes borrowed a common design pattern found in applications such as Google Maps, where the leftmost column of the screen is reserved for content related to the right column, which takes up nearly two-thirds of the total real estate with a geovisualization.

Concept

The wireframe prototypes proposed three key features for the GeoSense platform: 1) a GUI with the map as the locus of interaction, 2) a simple management interface for adding and subtracting data, and 3) layers of interactivity atop the map object that expose features to the users. Some of these features were defined as the ability to comment on geospatial coordinates as well as building 'if-this-then-that' [40] style queries around the active datasets.

An early illustration demonstrating the split, two-column real estate, the ability to add data, and a three-dimensional global view.

Demonstration of an early "if this, then that" geo-bound condition. This feature was later removed for the release version of GeoSense and is discussed further in the Featured Work section.

As is common in prototype design, a significant amount of time was spent iterating on the UI and UX in the form of a visual storyboard, with development and system engineering postponed until the first "functional prototype".

Safecast worldmap (V1)

Upon completion of the GeoSense wireframe prototype, a production version implementing the Safecast dataset underwent development. Understanding that the application was going to be deployed periodically to a large user base, the development of experimental features was put into a sandbox, forked from the original repository, so that two instances of GeoSense could exist simultaneously: one for public viewing at http://blog.safecast.org/worldmap and another which would eventually become GeoSense.

With the Safecast worldmap, referred to as version one, only the most fundamental features were developed, while a small amount of visual design and aesthetic polish was applied over the entire application. Initial features included the ability to show or hide the Safecast mobile dataset as well as a choropleth map of Japanese population averages per prefecture. Core features such as geospatial search, basic map controls, multiple map themes, and social sharing were also implemented.

During this stage, the data being shown was populated from ten different Google Fusion Tables, each of which held an aggregated granularity of data dependent on zoom level. The tables were mapped to the user's zoom level within the application such that as the user clicked "zoom in" or "zoom out" the tables would be queried to render the respective height (100m, 1000m, and so on). Each table contained specific KML data which defined a four-point geographic box. The benefit of rendering tiles from a dedicated map server became immediately obvious, as the amount of client-side computation involved in displaying 10,000+ data points in a single view outmatched the capabilities of a JavaScript-based approach. The justification for and against the use of tile servers is explored further in the Technical Design section.

A view of the Safecast Worldmap showing data aggregation across the island of Japan.

A closer view of the Fukushima area, showing the 20km evacuation radius and a finer data resolution.

Zooming in on the Fukushima prefecture revealed the 20km evacuation zone, as well as a higher granularity of data points. Clicking on any individual data point, or cluster, would reveal information such as CPM (counts per minute) and uSv/h (microsieverts per hour) pertaining to that specific set of data. Version one of the Safecast Worldmap was live between February 15th, 2012, and May 11th, 2012, during which time it received more than 10,000 unique visitors. Significant feedback was received from both the Safecast community and the public. The general sentiment was that the Worldmap was impressively simple, easy to comprehend, and a step in the right direction.

Generalizing the platform (V2)

The second iteration of the platform, referred to as version two, began with a complete rewrite of the application structure, as will be outlined in the Technical Design chapter. Version one had been built as a standalone application, more akin to an advanced prototype: functional enough to garner interest and insight, but without the fundamental framework required for additional features. With a number of new members joining the development, version two quickly took on a much more structured framework with a specific focus on speed of development.

V2 takes a step back from Safecast and a step towards generality. Rather than build features specifically pertaining to the radiation dataset, efforts were spent building a platform that would extend the simple power of the Safecast Worldmap to any and all users who had their own types of geospatial data.

The current user interface for reviewing recently added data. Users are given the ability to select which columns represent the necessary attributes of location and intensity.

USER INTERFACE FOR DATA MANAGEMENT

Many of the design and engineering cycles during version two were put into creating a seamless experience for users to add their own data. This required a workflow that would assist users in preparing their data such that it could be understood and interpreted by our system. To that end, an "add data" wizard was developed, where users were instructed to attach a data file either by uploading from their file system or by URL link. Once the data had been received by GeoSense, it was parsed and displayed back to the user as a table of columns and rows. To identify the data headers, the user is instructed to drag and drop labels onto the columns. Properly imported datasets are represented in the system in the left column, where they are shown inline with additional data sources. This visual group, or "accordion" component, allows users to easily manage, edit, or remove the current data sources.

The data management panel, showing from left to right: initial view, data library browser, and expanded controls for added data sources.

An important advancement during V2 was the introduction of data models that exist separately from the visual representation. All data added into the system is pushed into a remote database where it is stored and made accessible through a public API. While this ultimately means that all data in GeoSense could be re-visualized elsewhere by a third party, it also means that the application can easily iterate through different types or methods of visualization. V2 began exploring this by introducing a modal switch which toggles between "Flat Map" and "3D Globe". Clicking the toggle changes the display type and automatically rebinds the active data models as appropriate for each visualization.
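The separation of data models from their visual representation, and the rebinding that happens when the display type is toggled, can be sketched in a few lines. This is a hedged Python illustration and not the GeoSense implementation: `DataLayer`, `FlatMapView`, `GlobeView`, and `App` are hypothetical names, and the views return plain tuples in place of real drawing calls.

```python
class DataLayer:
    """Data model: holds points and styling, knows nothing about rendering."""

    def __init__(self, name, points, color):
        self.name = name
        self.points = points  # list of (lon, lat) pairs
        self.color = color


class FlatMapView:
    def render(self, layer):
        return [("marker", lon, lat, layer.color) for lon, lat in layer.points]


class GlobeView:
    def render(self, layer):
        return [("spike", lon, lat, layer.color) for lon, lat in layer.points]


class App:
    def __init__(self, layers):
        self.layers = layers
        self.view = FlatMapView()

    def draw(self):
        return {layer.name: self.view.render(layer) for layer in self.layers}

    def toggle_view(self):
        """Swap the display type; the same models are rebound to the new view."""
        flat = isinstance(self.view, FlatMapView)
        self.view = GlobeView() if flat else FlatMapView()
        return self.draw()
```

The point of the design is that the layers never change when the view does, so a new visualization method only needs a new view class.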

GeoSense (V3)

The third version of the product brings the first actual instance of "GeoSense" in its entirety. Wrapped with an additional layer of instruction and messaging, GeoSense becomes less an experimental application and more a widely available consumer web service.

The landing page for geo.media.mit.edu, inviting users to create a map by entering a name and clicking the prominent create button.

Left: Coastal Japan showing earthquakes, nuclear reactors, and coastal flooding. Right: A view of Asia showing earthquakes alongside a time series graph. Bottom: A view of the WebGL 3D globe.

V3 features a homepage instructing users how to create their own mapping sandbox. The homepage also features a number of community insight tools such as "recently created maps" and "recently added datasets". Currently, all data stored on GeoSense is made publicly available. Maps created on GeoSense are likewise publicly viewable, though only users with a special admin URL are able to manipulate or add associated data.

SPATIAL COMMENTS AND CHAT

The release version of GeoSense also incorporates a number of critical features which add to the value of community input and collaboration around specific maps. A simple UI feature for leaving geotagged comments or commenting directly on a data point is provided. This familiar interface, akin to leaving a comment on YouTube or Facebook, invites users to leave annotations in direct spatial context. Similarly, a set of "on/off" toggles allows users to see the physical location of users currently viewing their map as well as the geolocations of all past contributors and editors.

During this phase, GeoSense underwent a complete API overhaul, from basic restructuring of naming conventions to complete refactoring of routes. The API was generalized and cleaned to improve workflow for the development team as well as to prepare for community-wide usage. Methods for data uploading, parsing, and aggregation were greatly enhanced during version three, which will be more fully detailed in the Technical Design chapter. GeoSense version three, undergoing active development at the time of writing, will serve as the platform from which the project will continue to evolve. It also mirrors the state of recent releases posted on GitHub (http://github.com/tonydevincenzi/geo).

Two users converse about the safety levels of the Safecast radiation dataset in reference to their residences. Comment bubbles on the map create links to and from the chat window, which users may use to specify a geo coordinate.

CONTINUED: BEYOND THE SCREEN

An obvious benefit of developing for web accessibility is the vast number of devices that can access the full range of the application. To test extensibility, we developed an iPad application that, with a simple wrapper around WebKit, allows for full functionality on an iPad tablet device. To complement the form factor and push the boundaries of how to present the project in situ at the MIT Media Lab, the GeoSense team developed a suite of technologies to transform the entire platform into an experimental augmented reality installation. Featuring a full-sized physical globe, the installation gives users tablet devices as instruments to explore data on and around the tangible earth. Moving the globe rotates the data accordingly, as does moving the tablet device around the space. This exploration raises questions about how best to represent virtual geospatial data tethered to a physical object, and what other user interaction scenarios may emerge in the future.

Technical Design

The GeoSense technical implementation is best described by outlining the underlying frameworks for the server, app, and web service respectively. A large number of framework services have been employed, iterated on, removed, and revised during the development of GeoSense. The current technology stack is by no means the most practical or scalable implementation, but is perhaps most fitting as it is built entirely atop open source platforms whose ethos aligns with the goals of GeoSense and Safecast.

Server structure

AMAZON EC2

The GeoSense web service, code named "Satellite", is hosted on an Amazon EC2 instance server. Amazon EC2 was chosen for its ability to scale to meet increased load demands as the service grows in size. It is also heavily adopted and well documented by the contemporary web development community.

UBUNTU

The server runs an instance of Ubuntu Linux, a Unix-like operating system with a thriving community of developers who have documented the many ways of "rolling a server" to your own specifications, much like Amazon EC2. Satellite can run on any Unix-like operating system and is completely managed and deployed through terminal configuration.

Satellite & satellite API

ARCHITECTURE

NODE

The Satellite web server is a Node.js based application. Node.js is a JavaScript platform for writing scalable internet applications, most commonly web servers [41]. Node uses event-driven, asynchronous I/O for improved scalability and reduced infrastructure overhead. Unlike the majority of JavaScript-based programs, it is executed server side, the benefit of which is a close coupling of language and method between server-side and client-side rendering. In the case of GeoSense, this was an obvious benefit, as a number of the application's features mix server-side and client-side rendering techniques.

Node comes coupled with npm, a standalone package manager for installing a community-curated collection of "modules" that extend the basic functionality of Node. GeoSense uses the following major npm packages:

EXPRESS / CONNECT

A fast, small server-side JavaScript web development framework with features including routing, session support, cookie handling, and logging.

MONGOOSE

An object modeling tool designed to work in an asynchronous environment, making integration with MongoDB extremely pleasant and straightforward.

NOWJS

A library built on web sockets (via socket.io) and node-proxy that provides real-time communication for live updates between users.

GEOSENSE DATABASE

MONGODB, GIS

For data storage and management, MongoDB [42] (from "humongous") is used as the central data repository. Mongo is a NoSQL database, meaning that it stores data as JSON-like documents with dynamic schemas rather than in tables. Table-free database architectures are known to be faster and more efficient for certain types of applications.

Mongo includes a number of crucial libraries, referred to as "MongoGIS", that are optimized for geospatial data operations. These operations are central to data storage and retrieval within GeoSense. For example, Mongo makes it easy to index and quickly return search results for complex queries such as "average the 500,000 points closest to my location where value is never higher than 5".

Data import

Importing data is handled by the server once the client has specified and uploaded a suitable data type. GeoSense currently supports XML, JSON, and CSV. Once a file has been posted to the server, it is put through a process which cleans and standardizes the import. Each line of the data source is read in linear order, and each column or property is transformed into a field within our associative MongoDB collection. Original conversions of the uploaded data are kept as a collection prefixed with o_ in the active database. As the document is being parsed, transformed fields are asynchronously dumped into a master collection that houses all uploaded points within GeoSense. Their unique _id is retained and used to associate the individual field with its parent collection. Attributes unique to the dataset, such as title, default color, created by, and modified date, are stored in an associative collection where the _id attribute is used as a linkage identifier.

Data import and parsing happen asynchronously once the user uploads their first dataset. The GUI shows the user the estimated time remaining on the data conversion. Once the data is properly converted and stored, it is drawn into the user's current viewport.
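The import steps described above (read each row in order, keep the original, and emit a cleaned point linked to its parent dataset) can be sketched for the CSV case. This is a simplified, synchronous Python illustration under stated assumptions: `import_csv`, the `columns` mapping, the `loc` and `dataset` field names, and the sequential `_id` values are all hypothetical, and in-memory lists stand in for the MongoDB collections.

```python
import csv
import io


def import_csv(text, dataset_id, columns):
    """Parse CSV text into (originals, points).

    columns maps the roles 'lat', 'lon', and 'value' to the header names
    chosen by the user in the add-data wizard.
    """
    originals, points = [], []
    for row in csv.DictReader(io.StringIO(text)):
        originals.append(dict(row))  # untouched copy, like the o_ collection
        try:
            lon = float(row[columns["lon"]])
            lat = float(row[columns["lat"]])
            value = float(row[columns["value"]])
        except (KeyError, TypeError, ValueError):
            continue  # skip rows that cannot be standardized
        points.append({
            "_id": len(points) + 1,   # placeholder for a database-assigned id
            "dataset": dataset_id,    # link back to the parent dataset
            "loc": [lon, lat],
            "value": value,
        })
    return originals, points
```

Keeping the original rows alongside the cleaned points means a failed or partial conversion can be repeated later without asking the user to upload the file again.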

Aggregation and reduction through MapReduce

For datasets exceeding a certain number of fields (arbitrarily ~1,000), an aggregation process is executed to greatly increase the performance of the data for both the client and server. To accomplish this, we create sub-collections of the dataset, each containing a reduced aggregate as a function of zoom level. We currently support reductions for 15 discrete zoom levels, as well as temporal reductions that hold only the time series for each dataset reduced into days, weeks, months, and years in accordance with the zoom level aggregate.

To accomplish this, we employ a technique referred to as "MapReduce". Traditionally, MapReduce is a framework for distributing the processing of huge datasets across a large number of nodes. In the case of GeoSense and the GIS libraries for MongoDB, it is a tool for batch processing data and aggregation operations.
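The shape of such a reduction can be sketched in plain Python: a map step assigns each point to a grid cell for a given zoom level, and a reduce step merges partial aggregates per cell. The cell-size rule, the key layout, and the function names below are assumptions for illustration; the actual GeoSense jobs run inside MongoDB and may differ in every detail.

```python
import math


def cell_size(zoom):
    """Cell edge in degrees; halves with each zoom level (an assumed rule)."""
    return 360.0 / (2 ** zoom) / 8


def map_point(point, zoom):
    """Map step: emit (cell key, partial aggregate) for one point."""
    size = cell_size(zoom)
    lon, lat = point["loc"]
    key = (math.floor(lon / size), math.floor(lat / size))
    v = point["value"]
    return key, {"count": 1, "sum": v, "min": v, "max": v}


def reduce_cell(a, b):
    """Reduce step: merge two partial aggregates for the same cell."""
    return {
        "count": a["count"] + b["count"],
        "sum": a["sum"] + b["sum"],
        "min": min(a["min"], b["min"]),
        "max": max(a["max"], b["max"]),
    }


def aggregate(points, zoom):
    """Run map and reduce over all points, then finalize the averages."""
    cells = {}
    for point in points:
        key, partial = map_point(point, zoom)
        cells[key] = reduce_cell(cells[key], partial) if key in cells else partial
    for agg in cells.values():
        agg["avg"] = agg["sum"] / agg["count"]
    return cells
```

Running `aggregate` once per supported zoom level would yield one reduced sub-collection per level, so the client never has to request more cells than fit the current view.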

Spatial indexing and grid queries

As described in the previous Design Theory chapter, all data stored and displayed within GeoSense is subject to a mesh grid. This grid, mesh, or lattice serves two functions: first, reducing the amount of visual complexity for the user and second, standardizing and reducing the amount of computational processing for the client and server. For example, at a global zoom level, showing all 180,000 earthquakes over a magnitude of 4.4 since 1973 would be both visually and computationally inefficient. Instead, occurrences are organized into micro clusters, fitted to the known geospatial grid, and displayed dynamically with regard to zoom level and the bounding extremities of the user's viewport. This approach produces an optimized number of queries against a geospatial index. To create and manage these queries, the GeoSense application constructs the viewport grid in accordance with the aggregate collections generated as explained in the previous section, Aggregation and Reduction through MapReduce.

The following structuring logic was developed with and paraphrased from Walter Mendez (MIT EE/CS 2015), who contributed to the GeoSense project during the spring of 2012:

On constructing the mesh grid -

The grid is managed by a set of ordered pairs, which are not created at random. They follow a geometric pattern that is based entirely on the physical dimensions of the zoom level and the parameters of the viewport grid being generated.

The origin of this coordinate system, $(x_0, y_0)$, is placed at the lower left hand corner of the bounding area. As a result, a change in the horizontal direction and the vertical direction, $\Delta x$ and $\Delta y$ respectively, can be defined as the following:

$$\Delta x = \frac{\mathrm{length}_{zoom}}{\mathrm{length}_{grid}}, \qquad \Delta y = \frac{\mathrm{width}_{zoom}}{\mathrm{width}_{grid}}$$

It hence follows that, given the zoom level's bounding corners, the lower left being $(x_0, y_0)$ and the upper right being $(x_f, y_f)$, any point in the grid can be reached by the following general formula:

$$\left(x_0 + m\,\frac{\mathrm{length}_{zoom}}{\mathrm{length}_{grid}},\; y_0 + n\,\frac{\mathrm{width}_{zoom}}{\mathrm{width}_{grid}}\right)$$

where $m$ is in the range $\{0, \ldots, \mathrm{length}_{grid}\}$ and $n$ is in the range $\{0, \ldots, \mathrm{width}_{grid}\}$. The geometric constraint on the bounds of the grid is then defined. When $m$ and $n$ are equal to their respective maxima:

$$\left(x_0 + \mathrm{length}_{grid}\,\frac{\mathrm{length}_{zoom}}{\mathrm{length}_{grid}},\; y_0 + \mathrm{width}_{grid}\,\frac{\mathrm{width}_{zoom}}{\mathrm{width}_{grid}}\right) = \left(x_0 + \mathrm{length}_{zoom},\; y_0 + \mathrm{width}_{zoom}\right) = (x_f, y_f)$$

Given MongoDB's geospatial indexing specifications, the database indexes the data using spatial coordinates (longitude, latitude). To create the boundaries of a grid, we specify a box by passing in a lower left hand corner and an upper right hand corner. Thus, for any given $m$ and $n$ in our grid, a bounded box would have as lower left and upper right corner respectively:

$$\left(x_0 + m\,\frac{\mathrm{length}_{zoom}}{\mathrm{length}_{grid}},\; y_0 + n\,\frac{\mathrm{width}_{zoom}}{\mathrm{width}_{grid}}\right), \qquad \left(x_0 + (m+1)\,\frac{\mathrm{length}_{zoom}}{\mathrm{length}_{grid}},\; y_0 + (n+1)\,\frac{\mathrm{width}_{zoom}}{\mathrm{width}_{grid}}\right)$$

This makes geometric sense. In order to get to the upper right hand corner of a box given the lower left hand corner, we need only add $\Delta x$ and $\Delta y$, that is, a single box side length and width, in each direction. Finally, each cell within the grid contains an array storing all the data points retrieved from the server, the number of points in said array, the minimum, the maximum, the average, and the center point of the respective container.
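The grid construction above translates directly into code. The following Python sketch computes the two corners of cell (m, n) from the viewport corners and builds a MongoDB-style box filter from them. The function names and the `loc` field name are assumptions; the filter uses the `$geoWithin` / `$box` operator form of current MongoDB releases, whereas releases contemporary with this work used `$within`.

```python
def cell_box(x0, y0, xf, yf, length_grid, width_grid, m, n):
    """Return the (lower-left, upper-right) corners of grid cell (m, n)."""
    if not (0 <= m < length_grid and 0 <= n < width_grid):
        raise IndexError("cell index outside the grid")
    dx = (xf - x0) / length_grid  # delta x = length_zoom / length_grid
    dy = (yf - y0) / width_grid   # delta y = width_zoom / width_grid
    lower_left = (x0 + m * dx, y0 + n * dy)
    upper_right = (x0 + (m + 1) * dx, y0 + (n + 1) * dy)
    return lower_left, upper_right


def box_query(lower_left, upper_right, field="loc"):
    """Build a MongoDB-style filter document selecting points in one cell."""
    return {field: {"$geoWithin": {"$box": [list(lower_left), list(upper_right)]}}}
```

For a viewport spanning (0, 0) to (10, 20) divided into a 5 by 4 grid, the last cell (4, 3) has its upper right corner at (10, 20), which matches the constraint that the maximal indices reach $(x_f, y_f)$.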

TEAMDATA DATABASE

POSTGIS

Data specific to Safecast is stored in a separate database, which operates outside the server bounds of GeoSense. Safecast's dataset, which is referred to as teamdata, is stored within a PostGIS database (PostGIS is a spatial extension for PostgreSQL) and is subject to a different upload and management process than data added directly through GeoSense. Though the Safecast dataset is community driven, it is handled and monitored by a number of Safecast volunteers due to the critical nature of the data.

APPLICATION STRUCTURE

The map platform, which is the publicly visible portion of GeoSense, is built fully in HTML5 and JavaScript. The application is organized in an MVC (Model, View, Controller) framework using Backbone.js [43], which provides logical structuring of the application into a manageable development flow. The application is organized into the following structure:

VIEWS

The visual build is constructed through a simple templating engine that serves views based on the application state. These views vary from '2D map view' to '3D map view' and 'About GeoSense' view. Each view is an individual module that contains a linked HTML and CSS file for format and styling.

MODELS

Models are used to define the parameters around how individual pieces of data are handled within the GeoSense application. For example, the most common model is 'point', which refers to a singular point of data containing a latitude and longitude coordinate. Each point may differ from the last, both in lat/lon and in additional values (intensity, date added, etc).

COLLECTIONS

Collections are bundles of models that exist together under the umbrella of parent properties. For example, a million points (taken from the point model) may make up the collection 'air pollution', which then has its own properties independent of the individual models themselves. Collections, as containers of models, are bound to views within the application.
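The relationship between models, collections, and bound views can be illustrated outside of Backbone.js with a small sketch. It is written in Python for brevity and mirrors the concepts only; `PointModel`, `PointCollection`, and the listener mechanism are hypothetical stand-ins and do not reproduce the Backbone.js API.

```python
from dataclasses import dataclass, field


@dataclass
class PointModel:
    lat: float
    lon: float
    attributes: dict = field(default_factory=dict)  # intensity, date added, ...


@dataclass
class PointCollection:
    title: str   # parent properties shared by all models in the collection
    color: str
    models: list = field(default_factory=list)
    listeners: list = field(default_factory=list)  # callbacks of bound views

    def add(self, model):
        """Store the model and notify every bound view."""
        self.models.append(model)
        for notify in self.listeners:
            notify(self, model)
```

A view binds itself by appending a callback to `listeners`, so it is redrawn whenever the collection changes without the collection knowing how drawing works.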

EXTERNAL LIBRARIES

A number of widely adopted external libraries are used as part of the GeoSense application. Listed below are their titles and basic operation:

TWITTER BOOTSTRAP

Twitter's Bootstrap framework is used underneath the application to provide easy access to commonly used design patterns such as headers, footers, button types, forms, modal windows, and more. Bootstrap is a welcome addition to the technology stack, as it saves a great deal of time-consuming work by replicating expected behaviors of a web app. It is, in general, a fantastic boilerplate for starting a new application. However, precautions have to be taken to ensure that the ubiquitous "look and feel" of Bootstrap does not overtake the application. To that end, nearly all the default styles provided are restyled or adjusted.

JQUERY / JQUERY UI

jQuery, a JavaScript library for accessing and manipulating the DOM (Document Object Model) of the application, is fundamental to many JavaScript-based applications. jQuery UI is a simple extension of jQuery that provides certain features such as "drag and drop", which may be necessary only in certain applications.

THREE.JS

Three.js is a JavaScript library that wraps a basic render model around the OpenGL-based WebGL. Three.js simplifies access to WebGL and is instrumental in GeoSense's ability to display data in the third dimension.

OPENLAYERS

OpenLayers is an open source library for displaying and manipulating map data. It is built entirely in JavaScript and provides an API for constructing interactive map applications. GeoSense uses OpenLayers as the rendering engine for two-dimensional maps and has heavily extended the canvas rendering class to support features unique to GeoSense.

This list covers the most fundamental libraries but is not exhaustive. For more information regarding the current state of the GeoSense library arrangement, visit the project on GitHub (http://www.github.com/tonydevincenzi/geo).

Challenges

Data purity

Because GeoSense does not offer itself as a source of data but rather as a source for data observation, certain precautions are taken in allowing the community to generate and share data sources. For example, erroneous data may be inserted into the system by any user and then replicated by future users. Rather than try to detect bad data, or even offer tools to report such incidents, GeoSense takes the position that it offers nothing but the platform and that all data within the platform is community generated. In the case of Safecast, the data is stored in the teamdata database, which is part of the Safecast repository. GeoSense has integrated bespoke hooks for the teamdata dataset, but only in a manner that is available at safecast.org. Therefore, for all intents and purposes, the data available at http://geo.media.mit.edu is community generated and not explicitly endorsed by the platform. This is made clear in the GeoSense terms and conditions, which are available online.

Data comes in many shapes and sizes. An ongoing challenge is continuing the development of upload compatibility from within the add data wizard. To date, GeoSense requires that the user specify at least three crucial columns for every uploaded dataset: latitude, longitude, and intensity. Ideally, a lightweight algorithm could handle the majority of the guesswork involved in specifying these columns, as the names held within header rows of geospatial data are often similar (e.g., lat or latitude).

Finally, certain considerations are taken when choosing how to handle a maximum file size for user upload. For instance, it is computationally expensive to upload and parse through a file the size of the Safecast dataset, which at the time of writing is over 3 million entries housed in a 50 MB CSV file. GeoSense currently limits uploads to 20 MB, which can still easily cover more than one to two million entries in a well-managed document. Increasing this capacity would require significant server enhancements and storage capacity, coming at significant cost.
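The lightweight column-guessing algorithm suggested above could be as simple as matching normalized header names against a short list of aliases. The sketch below is one possible approach in Python, not an existing GeoSense feature; the alias lists and the `guess_columns` name are assumptions, and any guess would still need to be confirmed by the user in the wizard.

```python
import re

# Hypothetical alias lists; a real deployment would tune these over time.
ROLE_PATTERNS = {
    "lat": re.compile(r"^(lat|latitude|y)$"),
    "lon": re.compile(r"^(lon|lng|long|longitude|x)$"),
    "value": re.compile(r"^(value|intensity|magnitude|cpm)$"),
}


def guess_columns(headers):
    """Return {role: header} for roles whose header matches a known alias."""
    guesses = {}
    for header in headers:
        name = re.sub(r"[^a-z]", "", header.lower())  # "Latitude " -> "latitude"
        for role, pattern in ROLE_PATTERNS.items():
            if role not in guesses and pattern.match(name):
                guesses[role] = header
    return guesses
```

Roles that cannot be guessed are simply absent from the result, which lets the interface fall back to the existing drag-and-drop labeling for those columns only.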

Performance

When attempting to process and visualize large amounts of data, performance issues are one of the first hurdles to overcome. Rendering millions of live data points requires a dynamic relationship between the rendering engine (front end) and data server (back end). In its current build, Satellite, the GeoSense web service, aggregates and returns data from the back end based on the specifications requested by the front end. Because the data within the GeoSense application is handled separately from the visualizations, it is easy to adjust the requests based on the current application state. This is most evident when rendering to the flat map, where we begin to experience extreme performance loss when more than ~20,000 individual objects are being rendered. Conversely, it is much easier to render large amounts of data through the WebGL pipeline, which is used by the 3D globe display type. Because WebGL has access to the video card's GPU, the majority of display logic can be pushed off the CPU, which is the general bottleneck for JavaScript-heavy applications. Future versions of GeoSense may implement a custom tile server, similar to how Google Fusion Maps are rendered, which in turn would alleviate the constraints of rendering data points into the map tiles. Tile servers are, at this time, complex and expensive to manage. New services such as MapBox have begun to innovate with products like TileMill, though the infancy of the software comes with too many limitations for it to be used by GeoSense.
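One way a back end could keep the number of returned objects under a rendering budget is to average points into a grid and coarsen the grid until the result fits. The sketch below assumes this approach for illustration; the text does not describe how Satellite aggregates data. The function names, the starting cell size, and the use of ~20,000 as the default budget (taken from the figure quoted above) are assumptions.

```python
from collections import defaultdict


def aggregate_to_grid(points, cell_size_deg):
    """Bin (lat, lon, value) points into square cells and average each cell."""
    sums = defaultdict(lambda: [0.0, 0])  # cell -> [running total, point count]
    for lat, lon, value in points:
        cell = (int(lat // cell_size_deg), int(lon // cell_size_deg))
        sums[cell][0] += value
        sums[cell][1] += 1
    cells = []
    for (row, col), (total, count) in sums.items():
        cells.append({
            "lat": (row + 0.5) * cell_size_deg,  # cell center
            "lon": (col + 0.5) * cell_size_deg,
            "mean": total / count,
            "count": count,
        })
    return cells


def aggregate_for_view(points, max_objects=20000, cell_size_deg=0.001):
    """Double the cell size until the result fits the rendering budget.

    The loop stops once cells span 180 degrees, so a budget smaller than a
    handful of cells may not be reachable.
    """
    cells = aggregate_to_grid(points, cell_size_deg)
    while len(cells) > max_objects and cell_size_deg < 180:
        cell_size_deg *= 2
        cells = aggregate_to_grid(points, cell_size_deg)
    return cells
```

In a design like this, the front end would only need to send its budget and current view, and the same stored data could serve both the flat map and the globe at different levels of detail.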

Scale

As GeoSense grows in users and scope, scale becomes a prevalent issue. In its current state, scale is handled by basic load balancing and an elastic instance on Amazon EC2 [44]. GeoSense has been carefully designed to handle substantial growth, though the costs of operation would scale in parallel. Future funding will be required to keep the service running if extreme growth is experienced.

Custom instances

As GeoSense continues to grow, the community may want to create their own instance of the platform on a different server. Because it is open source, the entirety of the project can be downloaded and installed via the public GitHub repository. This creates complexity when trying to develop GeoSense for both Safecast and community usage. Because of this, there may be ongoing branches of the GeoSense project that are specific to a certain instance of the project, Safecast in this example, and that differ in certain features from the instance hosted at http://geo.media.mit.edu. This fragmentation can cause complications when developing new features, as it requires that all custom or branched features be forward compatible with changes to the master repository. To avoid further complication, GeoSense will only "officially support" development of the master repository and specific derivatives that are generated by the core team.

Use Cases

GeoSense has been evaluated against a number of different usage scenarios whose interests and datasets differ greatly. In order to prove the versatility of the system, it was crucial to select example maps and users whose feedback would differ based on their individual needs. Our tool's true power is demonstrated through how we observe the community using it to tell stories; the narratives developed within GeoSense exceeded our original intent and expectations. The following case studies were conducted with the GeoSense platform:

SAFECAST

The first and most obvious usage scenario is Safecast, whose dataset was the spark behind the development of GeoSense. With over 10,000 active viewers through the development of GeoSense V3, Safecast has been the primary driver behind feature-set development. For the first time, the Safecast dataset was fully visible as a perfect mirror of its current state in the teamdata database: there were no intermediary hand-built aggregates or reductions as was previously the case.

An image of GeoSense for Safecast showing a coastal area of Japan featuring radiation levels (green to pink), coastal flood zones (red coast), nuclear reactors (red dot), and earthquakes (blue).

For our usage scenario, the Safecast dataset was combined with historical earthquake data, nuclear power reactors, and nuclear power plants with reported INES (International Nuclear and Radiological Event Scale) incidents, as well as model-generated coastal flooding data from the 3/11 earthquake and ensuing tsunami. The data layers were chosen not only to tell an important story, but to open the stage for discussion: common questions such as "Where should I consider building a house?", "Is my child's school playground safe from radiation?", and "What areas are at high risk for a similar catastrophe?" have been asked and addressed. By allowing the community to discuss data placed in context, the back-and-forth of email news groups and repetitive question-and-answer exchanges has been reduced. Much like the ancient stone markers found in coastal Japan warning inhabitants of tsunamis, GeoSense offers not only a view into the past but a glimpse into the future, where individuals and communities alike can make concise, informed decisions.

SOURCEMAP

Sourcemap.com is the open directory of supply chains and environmental footprints. Consumers use the site to learn about where products come from, what they're made of, and how they impact people and the environment. Companies use Sourcemap to communicate transparently with consumers and to tell the story of how products are made. [39]

The GeoSense team is working closely with CEO Leonardo Bonanni of Sourcemap on finding ways to explore the causal relationships between cultural and ecological data in conjunction with product supply chains. We have begun by exploring the relationship between North American farm location, food distribution patterns, global warming, and population density. When the data is properly visualized, new insights related to operational risk factors and supply chain optimization have arisen.

THE LACE RACE

The Lace Race is an ongoing global game developed by a team of artists and researchers from the MIT ACT, Media Lab, CSAIL and Department of Architecture. It debuted at the Reykjavik Arts Festival in Reykjavik, Iceland. The Lace Race game is simple: participants are given a single shoelace with a unique identifier number. Each participant is then encouraged to continually trade his or her shoelace(s) with strangers or other participants. At each encounter, the exchanging user is encouraged to tweet in the format "#LaceRace 123 location", where "#LaceRace" refers to the game's hashtag, "123" to the unique identifier, and "location" to the physical location of the exchange. GeoSense was then used to watch the Twitter hashtag #LaceRace and produce a realtime map of all ongoing Lace Race activity.

Users are also encouraged to use the geo-tagged comment system to annotate their exchanges, to note where they saw specific laces, or even to hunt down specific numbers as a form of information exchange.
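The tweet format described above is simple enough to parse with a regular expression. The following is a minimal sketch under that assumption; the function name `parse_lace_race_tweet` and the returned field names are hypothetical and not taken from the GeoSense code.

```python
import re

# Matches the "#LaceRace 123 location" format described in the text:
# the hashtag, a numeric lace identifier, then free-text location.
LACE_RACE_PATTERN = re.compile(r"#LaceRace\s+(\d+)\s+(.+)", re.IGNORECASE)


def parse_lace_race_tweet(text):
    """Return the lace identifier and location text, or None if the tweet does not match."""
    match = LACE_RACE_PATTERN.search(text)
    if match is None:
        return None
    return {
        "lace_id": int(match.group(1)),
        "location": match.group(2).strip(),
    }
```

The location is returned as free text; turning it into map coordinates would require a separate geocoding step that is not shown here.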

Results

As of writing, GeoSense has reached more than 10,000 users through Safecast alone. It was demonstrated to over 400 visitors and broadcast to thousands during the spring 2012 MIT Media Lab Member's Week. Many parties were interested in using GeoSense as a new way to decode their own cryptic data. Specific interest was shown by members of the National Wildlife Federation with regard to better understanding the social, economic, and environmental impact of seasonal fires; we anticipate many future partnerships.

Thanks to Safecast, a constant stream of users encounters GeoSense for mission-critical usage regarding the radiation dataset. Results so far are positive and promising, but we realize that only the surface has been scratched, and we will continue to feverishly develop GeoSense until it reaches its full potential.

Future Work

GeoSense is an ongoing, ever-evolving project. Because it is open source and serves as the visualization platform for Safecast's future work, it will always be defined not only by the experimental directions we hope to take but also by features that best suit the needs of the active user base. Hundreds of potential directions have been discussed; of these, the following are some of the most pressing:

Tile servers

As previously described in Technical Design and Challenges, technical limitations are quickly met when attempting to handle and visualize large and dense sets of data. The most efficient method remains one of the oldest: rendering all of the data as part of the map tile on the server itself. GeoSense currently renders visual information into the canvas layer client-side and displays it as an overlay atop a pre-generated map tile. To date, we have reached an efficiency that challenges the performance of even a dedicated tile server; however, older machines and mobile users may find the experience slower and, in some cases, completely broken.
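Rendering data into map tiles on the server requires deciding which tile each point belongs to. The text does not specify a tiling scheme, so the sketch below assumes the widely used Web Mercator XYZ ("slippy map") scheme, in which zoom level z divides the world into 2^z by 2^z tiles. The function names are illustrative.

```python
import math
from collections import defaultdict


def lat_lon_to_tile(lat, lon, zoom):
    """Return the (x, y) tile containing a point in the Web Mercator XYZ scheme."""
    n = 2 ** zoom
    x = int((lon + 180.0) / 360.0 * n)
    lat_rad = math.radians(lat)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    # Clamp so points on the outer edge of the map stay in a valid tile.
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)


def group_by_tile(points, zoom):
    """Group (lat, lon, value) points by the tile they would be drawn into."""
    tiles = defaultdict(list)
    for lat, lon, value in points:
        tiles[lat_lon_to_tile(lat, lon, zoom)].append((lat, lon, value))
    return dict(tiles)
```

A tile server built along these lines would render each group once per zoom level and serve the resulting images, so the client draws a fixed number of tiles regardless of how many points they contain.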

Expanded visualization types

With a robust method for handling large data sets and a community of active users, GeoSense is in a prime position to iterate and experiment with new types of visualizations. We imagine there to be a well of opportunity in exploring information visualization beyond geovisualization. We hope to work towards finding new and expressive visual explanations of a dataset's potential meaning.

Models & mechanistic explanations

As we have begun to explore through the introduction of time series graphs and pre-generated model overlays, the idea of allowing user-specified models to be cast against a dataset is compelling. We imagine that once a set of data is represented in GeoSense, a number of conditions can be applied against it. The possible conditions are endless, but we are currently exploring falloff decay, parameters for attraction and deflection, as well as movement and inertia. Ultimately, a suite of tools could be developed to allow users, or communities of users, to develop models towards understanding the meaning or future impact of their geovisualization.
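As one illustration of what a user-specified condition might look like, the sketch below models falloff decay as an exponential decrease with distance from a source. This is only one possible interpretation: the text does not define a falloff formula, and the function name and the `half_distance_km` parameter are assumptions.

```python
def falloff(source_value, distance_km, half_distance_km):
    """Exponential falloff: the value halves every half_distance_km from the source."""
    return source_value * 0.5 ** (distance_km / half_distance_km)
```

With a half-distance of 5 km, a source value of 100 would fall to 50 at 5 km and to 25 at 10 km. Other conditions named above, such as attraction or inertia, would need their own parameters.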

Boolean conditions and spatially bound alerts

Part and parcel of the original GeoSense proposal was to invite individual users to create geo-fenced conditional alerts atop their geovisualization. This interface will allow users to specify an "if-this-then-that" statement where, if certain criteria are met, a series of specified outcomes will execute. A situational example of this would be:

"If radiation over 500 CPM is reported within 5 km of my home, then email me a notice."

This feature was deprecated in the current build of GeoSense because, during development, it was found to be less crucial than a stable infrastructure for geospatial commenting and live chat among current users. We plan to reevaluate the importance of boolean conditions and spatial alerts in the coming months.
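The example rule in this section (radiation over 500 CPM within 5 km of a home location) can be expressed as a distance check plus a threshold check. The sketch below assumes great-circle distance computed with the haversine formula and a mean Earth radius of 6371 km. The function names and the reading format are hypothetical, and the notification step is left out.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, an approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometers."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def readings_triggering_alert(readings, home, threshold_cpm=500, radius_km=5.0):
    """Return readings above the threshold that fall within radius_km of home.

    Each reading is a dict with "lat", "lon", and "cpm" keys (an assumed format);
    home is a (lat, lon) pair.
    """
    home_lat, home_lon = home
    return [
        r for r in readings
        if r["cpm"] > threshold_cpm
        and haversine_km(home_lat, home_lon, r["lat"], r["lon"]) <= radius_km
    ]
```

A non-empty result would be the trigger for the "then" part of the rule, such as sending an email notice.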

Conclusion

GeoSense liberates the author, viewer, and data. It proposes that design may be used as a lens to enhance human understanding and promote imagination, and that provocative discoveries can be uncovered through intent and serendipity alike. We have demonstrated how, through the juxtaposition of visual language and observational analysis, insightful narratives can be discovered, leading a community of individuals to generate hypotheses around the causality of data and worldly events.

With geovisualization come many complexities. Daunting though they may be, their very presence also provides inherent value; to be massively complex is both boon and bane. To explore, to probe at, and to liberate lifeless tabulated data into instructive, insightful, and human-readable information is a prelude to an even larger effort. We have explored the visual marriage of time and space, where both parameters are tuned and tweaked to provide the viewer with insights that were once locked away within spreadsheets. We have also begun expanding the known vocabulary of geovisualization for the digital age, where each pixel can have tremendous meaning and consequence, devising a representational taxonomy that serves both form and function. Finally, we have seen the need for, and positive response to, community tools for building dialog and sharing intelligence. GeoSense has opened the doors for both thought and voice, where the user plays the role of designer, scientist, analyst, and philosopher. Our accomplishment is an important first step, but it is only that: the first step. To answer the harder questions, to gaze into the future, we must first have a tool to see into the past and into the now; with GeoSense we may begin this process with massive data as our vessel, assembled by and for a community of open minds and thinkers.

References

[1] Safecast, "Safecast," blog.safecast.org. [Online]. Available: http://blog.safecast.org/. [Accessed: 27-Apr.-2012].

[2] J. Mackinlay, S. K. Card, and B. Shneiderman, Readings in Information Visualization: Using Vision to Think. Morgan Kaufmann Publishers, 1999.

[3] MobiThinking, "Global mobile statistics 2012," mobithinking.com. [Online]. Available: http://mobithinking.com/mobile-marketing-tools/latest-mobile-stats. [Accessed: 27-Apr.-2012].

[4] Geospatial Today. [Online]. Available: http://geospatialtoday.com. [Accessed: 27-Apr.-2012].

[5] F. B. Viegas, M. Wattenberg, F. van Ham, J. Kriss, and M. McKeon, "Many Eyes: A Site for Visualization at Internet Scale," pp. 1-8, Aug. 2007.

[6] E. R. Tufte, Visual Explanations: Images and Quantities, Evidence and Narrative. Graphics Press, 1997, p. 156.

[7] stamen.com. [Online]. Available: http://stamen.com. [Accessed: 27-Apr.-2012].

[8] "We feel fine and searching the emotional web," presented at the Proceedings of the fourth ACM international conference on Web search and data mining, New York, NY, USA, 2011, pp. 117-126.

[9] Google, "," google.com. [Online]. Available: http://www.google.com/earth/index.html. [Accessed: 27-Apr.-2012].

[10] NASA, "World Wind JAVA SDK," worldwind.arc.nasa.gov, 18-Jul.-2011. [Online]. Available: http://worldwind.arc.nasa.gov/java/. [Accessed: 27-Apr.-2012].

[11] NSIDC, "View NSIDC Data on Virtual Globes: Google Earth," nsidc.org. [Online]. Available: http://nsidc.org/data/virtual-globes/. [Accessed: 27-Apr.-2012].

[12] unhcr.org. [Online]. Available: http://www.unhcr.org. [Accessed: 27-Apr.-2012].

[13] ArcGIS, "ArcGIS Online," arcgis.com. [Online]. Available: http://www.arcgis.com/home/. [Accessed: 27-Apr.-2012].

[14] ESRI, "MapIt - Create Interactive Business Maps | Map SQL Server & Excel Data," esri.com. [Online]. Available: http://www.esri.com/software/mapit/index.html. [Accessed: 27-Apr.-2012].

[15] Pachube, "The Internet of Things Real-Time Web Service and Applications - Pachube," pachube.com. [Online]. Available: https://pachube.com/. [Accessed: 27-Apr.-2012].

[16] Ushahidi, "Ushahidi :: Home," ushahidi.com. [Online]. Available: http://www.ushahidi.com/. [Accessed: 27-Apr.-2012].

[17] "Automatic generation of tourist maps," ACM Trans. Graph., vol. 27, no. 3, pp. 100:1-100:11, 2008.

[18] "Crowdsourcing graphical perception: using mechanical turk to assess visualization design," presented at the Proceedings of the 28th international conference on Human factors in computing systems, New York, NY, USA, 2010, pp. 203-212.

[19] geocommons.com. [Online]. Available: http://geocommons.com. [Accessed: 28-Apr.-2012].

[20] worldmap.harvard.edu. [Online]. Available: http://worldmap.harvard.edu. [Accessed: 28-Apr.-2012].

[21] MapBox, "MapBox | MapBox," mapbox.com. [Online]. Available: http://mapbox.com/. [Accessed: 27-Apr.-2012].

[22] TileMill, "TileMill | MapBox," mapbox.com. [Online]. Available: http://mapbox.com/tilemill/. [Accessed: 27-Apr.-2012].

[23] "Your place or mine?: visualization as a community component," presented at the Proceedings of the twenty-sixth annual SIGCHI conference on Human factors in computing systems, New York, NY, USA, 2008, pp. 275-284.

[24] CAIS, "thquake Prediction, Japan," cais.gsi.go.jp. [Online]. Available: http://cais.gsi.go.jp/YOCHIREN/activity/191/191.e.html. [Accessed: 27-Apr.-2012].

[25] CBS, "New USGS number puts Japan quake at 4th largest - CBS News," cbsnews.com, 14-Mar.-2011. [Online]. Available: http://www.cbsnews.com/stories/2011/03/14/501364/main20043126.shtml. [Accessed: 27-Apr.-2012].

[26] N. G. JP, "Damage Situation and Police Countermeasures associated with 2011 Tohoku district - off the Pacific Ocean Earthquake," npa.go.jp, 25-Apr.-2012. [Online]. Available: http://www.npa.go.jp/archive/keibi/biki/higaijokyo-e.pdf. [Accessed: 27-Apr.-2012].

[27] NISA, "INES (the International Nuclear and Radiological Event Scale) Rating on the Events in Fukushima Dai-ichi Nuclear Power Station by the Tohoku District," nisa.meti.go.jp. [Online]. Available: http://www.nisa.meti.go.jp/english/files/en20110412-4.pdf. [Accessed: 27-Apr.-2012].

[28] I. B. Times, "Analysis: A month on, Japan nuclear crisis still scarring - International Business Times," ibtimes.co.in, 09-Apr.-2011. [Online]. Available: http://www.ibtimes.co.in/articles/132391/20110409/japan-nuclear-crisis-radiation.htm. [Accessed: 27-Apr.-2012].

[29] L. Times, "Japan earthquake: Insurance cost for quake alone pegged at $35 billion, AIR says - Los Angeles Times," articles.latimes.com, 13-Mar.-2011. [Online]. Available: http://articles.latimes.com/2011/mar/13/world/la-fgw-japan-quake-insurance-20110314. [Accessed: 27-Apr.-2012].

[30] Safecast, "Safecast Data Downloads," maps.safecast.org. [Online]. Available: http://maps.safecast.org/downloads/. [Accessed: 27-Apr.-2012].

[31] CC, "Creative Commons - CC0 1.0 Universal," creativecommons.org. [Online]. Available: http://creativecommons.org/publicdomain/zero/1.0/. [Accessed: 27-Apr.-2012].

[32] TEPCO, "TEPCO: Status of Fukushima Daiichi and Fukushima Daini Nuclear Power Stations after great east japan earthquake," tepco.co.jp. [Online]. Available: http://www.tepco.co.jp/en/nu/fukushima-np/index-e.html. [Accessed: 27-Apr.-2012].

[33] P. W. Anderson, More and Different: Notes from a Thoughtful Curmudgeon, 1st ed. World Scientific Publishing Company, 2011, p. 424.

[34] E. R. Tufte, Envisioning Information. Graphics Pr, 1990, p. 126.

[35] J. Albers, Search versus re-search. Trinity College Press, 1969, p. 85.

[36] E. Imhof, Cartographic Relief Presentation. ESRI Press, 2007, p. 388.

[37] peddl.com. [Online]. Available: https://peddl.com. [Accessed: 27-Apr.-2012].

[38] P. Pulse, "Place Pulse | The Collaborative Image of the City," pulse.media.mit.edu. [Online]. Available: http://pulse.media.mit.edu/. [Accessed: 27-Apr.-2012].

[39] sourcemap.com. [Online]. Available: http://sourcemap.com. [Accessed: 27-Apr.-2012].

[40] ifttt.com. [Online]. Available: http://ifttt.com. [Accessed: 27-Apr.-2012].

[41] ReadWriteHack, "Wait, What's Node.js Good for Again?," readwriteweb.com. [Online]. Available: http://www.readwriteweb.com/hack/2011/01/wait-whats-nodejs-good-for-aga.php. [Accessed: 27-Apr.-2012].

[42] mongodb.org. [Online]. Available: http://www.mongodb.org. [Accessed: 27-Apr.-2012].

[43] backbonejs.org. [Online]. Available: http://backbonejs.org. [Accessed: 27-Apr.-2012].

[44] C. Hidalgo, "Graphical Statistical Methods for the Representation of the Human Development Index and its Components."

Appendix

Tablet AR installation

GSPEAK BRIDGE

In order to translate the coordinate positions of both the iPad and the physical globe, a translation bridge was developed and deployed as part of the GeoSense application. This bridge, written in Ruby, acts as an interpreter between Oblong's g-speak system and the GeoSense platform.

THE INTENT OF AN AUGMENTED REALITY APPLICATION

Paraphrased from Samuel Luescher's 2012 project proposal:

"As a tangible interface to this data, we propose a physical globe whose position and orientation in space the application is monitoring. When holding up a tablet to the globe, digital layers are superimposed on the camera image of the globe that is displayed on the tablet screen. By coupling the physical affordances of the object with an AR application for tablet computers, we expect to tackle a number of usability problems that commonly occur with mapping applications. We explore possible interaction techniques when coupling tablets with the globe and using them for individual navigation around the geospatial data, subsequent decoupling of specific map views from the globe and the tablet, as well as using the globe as a master control for larger views."

Left: Samuel Luescher (front) and Anthony DeVincenzi (back) created a new map with GeoSense. Right: A view of the tablet AR installation.