
Senseable City Lab, Massachusetts Institute of Technology

This paper might be a pre-copy-editing or a post-print author-produced .pdf of an article accepted for publication. For the definitive publisher-authenticated version, please refer directly to the publishing house’s archive system.

BIG DATA AND THE CITY

From Origins to Destinations: The Past, Present and Future of Visualizing Flow Maps

MATTHEW CLAUDEL, TILL NAGEL and CARLO RATTI

Flow maps are an established cartographic method to depict movements over time and space. In recent years, the exponential increase of geospatial data – what we call urban ‘big data’ – has introduced new uses and highlighted the need to expand this method. In this paper, we define existing strategies and tools, and examine their characteristics. From this, we identify challenges and opportunities for data-driven flow maps and suggest future developments. Specifically, we apply a new taxonomy to compare several geospatial data visualizations from the MIT Senseable City Lab and extract principles that can define the capabilities of a new interactive flow mapping tool. We have begun to work on such a tool – called the Datacollider – that is public, powerful, intuitive, and scalable. In the latter portion of this paper, we describe the Datacollider, detail its limitations, and outline directions for future development. We conclude by extrapolating broader trends for the field of geospatial data visualization. We articulate a shift from visualization as a set of graphic tools for representing found insights, to visualization as a way of engaging with data and deriving knowledge.

BUILT ENVIRONMENT VOL 42 NO 3

Flow maps emerged during the early 1800s as a new form of cartographic representation (Robinson, 1967). These introduced techniques for documenting geospatial data with time-varying properties – the movements and transfers of such things as people, objects, goods and disease – as information overlaid on a standard bird’s-eye view projection. Pioneering graphic techniques at the time strove for legibility while negotiating many simultaneous data types. These documents represented a new tool, a geospatial infographic. Maps augmented with information layers introduced a rich spectrum of possibilities beyond traditional cartography.

Charles Joseph Minard was an early pioneer of data-centric mapping, best known for his iconic documentation of Napoleon Bonaparte’s march to Moscow (a map that plots the number of troops, distance marched, temperature, latitude and longitude, direction of travel, and location with respect to time). Minard’s body of work also includes documents showing the geographic extents of cattle raised to feed Paris in 1858, worldwide immigration patterns, and global wine exports from France (Robinson, 1967). An even earlier visualization of traffic flows in the Pale of Dublin in 1837, produced by Henry Harness for the British Army prior to the decision to build a railway in Ireland, and later work on migration in the UK by Ernst Georg Ravenstein in the late 1880s reveal that flow maps have always been popular ways of picturing and communicating such movements, despite the difficulties of doing so (Batty, 2015).

Minard produced documents – a novel approach to information-augmented cartography, certainly, but with the structure of a traditional map. In anticipation of the Paris World Exhibition of 1900, Elisée Reclus, a geographer, and Patrick Geddes, an urban planner and sociologist, proposed an entirely new visualization device for geospatial data. Their giant Paris Globe project was intended as an educational tool capable of displaying global information (Meller, 1990). Decades later, during the 1960s, Buckminster Fuller proposed a similar geospatial infographic device, called Geoscope. His globe, 200 ft (61 m) in diameter, would be studded with coloured lights, dynamically showing information in real time. If realized, Geoscope would have been able to accept a variety of datasets with geospatial dimensions (such as transportation, economic activity or demographic shifts) and display their planetary flows. ‘With the Geoscope humanity would be able to recognize formerly invisible patterns and thereby to forecast and plan in vastly greater magnitude than heretofore’ (Fuller, 1981). Both concepts proved too ambitious to be realized during their time.

These conceptual projects anticipated the Internet, and the unprecedented amount of data that is now generated at every moment. The miniaturization of computing and the ubiquity of telecommunications networks have added another layer to physical space, enriching the physical world with a wealth of virtual bits. More and more material movements – whether people, vehicles or goods – can now be tracked and tagged in real time: their aggregate sum is an unprecedented ‘big data’. What Charles Joseph Minard spent years painstakingly recording with analogue tools can now be generated on a massive scale to describe planetary flows. In this article, we encounter and explore mapping in the hybrid digital-physical condition.

The contemporary data deluge – in addition to new tools for analysis and representation – demands a reconsideration of practices and applications for geospatial data visualization. To date, a wide variety of new techniques has been developed to contend with datasets, analytical tools and representation platforms (Andrienko and Andrienko, 2012). Given the concentration of data associated with urban agglomerations (owing to population density and a higher rate of technology adoption), as well as greater opportunities for application of results (geospatial data can readily inform policy at the metropolitan scale), many of these techniques focus on the urban condition. It is in this domain that a line of our research at the MIT Senseable City Lab has focused over the past 10 years. Simply as a reference, we evaluate a sequence of visualization projects from this time frame:

1. Real Time Rome was among the first explorations of aggregated data from cell phones (through Telecom Italia’s LocHNESs platform), buses, and taxis in Rome (Calabrese et al., 2011). Researchers sought a better understanding of urban dynamics, specifically surrounding major events. Their analysis revealed the pulse of the city, demonstrating that collective emotion, action and interaction are written in geospatial data.

2. Live Singapore brought together diverse datasets, including weather, energy use, social media, public and private transit and telecommunications, to enrich a portrait of the city (Kloeckl et al., 2012). The resulting visualizations have reached a broad audience not only through scientific papers but also exhibitions in the city-state.

3. A Tale of Many Cities investigated one category of data – telecommunications – with an in-depth analysis and comparison of urban areas around the globe, finding important patterns and characteristics through juxtaposition (Grauwin et al., 2014). Crucially, it is visualized as a public web app for browsing data.

These projects represent a development from the geospatial infographic to interactive geovisualizations. This is a move from animated maps to dynamic representations of time-based data, in which a user can interact by filtering content according to place, time, dimension, characteristic, and more. The examples progress from narrative representation of relationships or changes, to enabling non-linear user exploration and engagement (a detailed review follows in ‘State of the Art’ below).

While interactive geovisualizations are a powerful tool, they are nonetheless trammelled by significant constraints. The state of the art remains limited in its technical capacity and in its accessibility. More specifically, visualizations are made by highly trained experts, using prepared datasets, over a long period of time. Particularly in the urban context, a broad range of stakeholders stand to gain from geospatial infographics, not only technically capable geospatial scientists. Commuters, entrepreneurs, academics, government officials and business leaders create data every day, and could potentially benefit from using geospatial data tools for decision-making.

In this paper, we begin with an overview of the state of the art in the field of flow mapping, including data structure, capacities and limitations, to establish clearly the gaps in contemporary approaches. Next we present several examples of geospatial data visualization from the Senseable City Lab, focusing on the development of cartographic practice over time; we summarize the main contributions of selected projects, and discuss these in the context of the aforementioned usage scenarios. We synthesize our experience with the limitations we have identified in contemporary tools, in order to inform a path forward.

Together, these factors suggest a clear vector: to put the tools of analysis and visualization into public hands, with a system that is powerful, intuitive, and scalable, at little to no cost – spanning the edge of MacEachren’s conceptual ‘cartography cube’, which we show in figure 1 (MacEachren, 1994). Importantly, this will not sacrifice the extreme conditions, or ‘corners’ of the cube (specialists who require advanced tools and laypersons who require simple, intuitive functionality). We articulate our first attempt at such a tool, called the Datacollider. We detail the Datacollider’s structure and function, its current limitations and our plans for its future development. In conclusion, we discuss the impact and subsequent research directions for the broader field of data-driven geospatial flow mapping. We also include an Appendix which summarizes the key terminology related to visualization in this domain.

Context and Users

By definition, big data – in its raw form – cannot be understood without analysis, synthesis and context (Snijders et al., 2012). The term ‘big data’ was first used in a 1997 paper (Cox and Ellsworth, 1997) to indicate the limit at which NASA scientists faced computational challenges in the process of visualization. The authors assert that ‘in the area of scientific visualization, input datasets are often very large’, and propose a segmented memory system whose ‘techniques effectively reduce the total memory required by visualization at run-time’. In the years since this initial characterization, ‘big data’ has continued to be vexed by a number of limitations in the process of visualization. As datasets have become larger, the computational demands of processing for visualization have only become greater.

Urban agglomerations generate a tremendous amount of data and, furthermore, a diverse set of stakeholders in the metropolitan context demand synthesis and analysis to communicate insights. Particularly in the realm of so-called ‘smart cities’, analysis and visualization are well understood as a necessary corollary to sensors, urban operations and urban big data itself (Townsend, 2014). At stake is the challenge of turning raw data into information. A cadre of specialized geographers, geographic information scientists, and data visualization experts has emerged to produce complex geospatial data visualizations, yet the demand for simple and direct ways of creating such documents outstrips the capacity of the specialist coterie.

With the establishment of digital cartography and interactive geovisualization, a number of taxonomic schemes (e.g. Crampton, 2002; Roth, 2013) have been proposed to systematically classify tools and visualization techniques. The cube schema by MacEachren (MacEachren, 1994) describes a number of interrelated spectra: tasks (presenting knowns; revealing unknowns), interaction levels (passive; active), and audiences (public/laypeople; specialist/experts) (see figure 1). An additional axis through the middle of the cube spans from Analysis to Synthesis to Presentation. MacEachren and Kraak (1997) further elaborate the scope of Presentation, specifying that the viewing audience may gain new insights that had only been ‘largely known to the information designer’. Furthermore, presentation may be interactive while remaining ‘mostly’ on the Presenting Knowns extreme of the Task axis.

This is a useful classification tool, but in the past decade the meanings of ‘tasks’, ‘interactions’ and ‘users’ have changed with respect to a new and expanded digitally-literate audience. With the broad adoption of location-aware mobile devices, and the everyday use of personal navigation applications, a large segment of the population is geospatial data-savvy, and more accustomed to interactive maps. As a result, there is a need for new, simplified data management and visualization tools (Andrienko et al., 2010). While existing tools typically fit in one of the cube’s corners, opposed across the diagonal axis, we extend the concept of cartography. Rather than creating targeted tools that are specialized for extremes of the cube’s axes, we propose an alternative goal: to deliver software that spans the entire edge, from layperson to specialist, in a way that is highly interactive and reveals unknowns. Existing solutions are based either on advanced software development skills (for example, programming custom software), or on phrasing complex queries in a coding language. Furthermore, as detailed below, a vast amount of urban data has inherent temporal properties that are difficult or impossible to visualize with current tools.

Figure 1. Cartography cube (based on MacEachren 1994).
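
As an illustrative aid only, the cube's three axes can be treated as coordinates, so that the stated goal of spanning the edge from layperson to specialist becomes a simple check. The class name, the 0 to 1 scale, the thresholds and the sample profiles below are assumptions made for this sketch; they are not part of MacEachren's schema and do not rate any real tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolProfile:
    """Hypothetical placement of a tool in the cartography cube (all values 0..1)."""
    name: str
    task: float          # 0 = presenting knowns, 1 = revealing unknowns
    interaction: float   # 0 = passive, 1 = active
    audience_min: float  # least specialised user served (0 = layperson)
    audience_max: float  # most specialised user served (1 = expert)


def spans_audience_edge(tool: ToolProfile, low: float = 0.2, high: float = 0.8) -> bool:
    """True if the tool is highly interactive, reveals unknowns, and serves
    both ends of the audience axis (thresholds are arbitrary assumptions)."""
    return (
        tool.task >= high
        and tool.interaction >= high
        and tool.audience_min <= low
        and tool.audience_max >= high
    )


# Illustrative, made-up profiles: two 'corner' tools and one edge-spanning tool.
expert_corner = ToolProfile("expert analysis tool", 0.9, 0.9, 0.8, 1.0)
public_corner = ToolProfile("public animated map", 0.1, 0.1, 0.0, 0.2)
edge_spanning = ToolProfile("edge-spanning tool", 0.9, 0.9, 0.0, 1.0)

for tool in (expert_corner, public_corner, edge_spanning):
    print(tool.name, spans_audience_edge(tool))
```

Only the third profile satisfies the check, which mirrors the distinction drawn in the text between tools that sit in a corner of the cube and a tool that covers the whole audience axis.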


The State-of-the-Art

Existing Tools

Geographic Information Systems (GIS) software is the industry standard for the majority of professional geospatial data processing. Of existing tools, the two most powerful are the proprietary ArcGIS software package by ESRI (Environmental Systems Research Institute), and the open source software QGIS (Quantum GIS). These allow users to import, analyze, process and display geospatial data. The software accepts input data (both tabular and cartographic) and can perform a variety of computational processes on that data, both raster-based (map algebra) and vector-based (geoprocessing). This geospatial computation is defined by calculations and user input within the software, driven predominantly by internal processing and to a lesser extent by connection to existing programming languages (e.g. Python for ArcGIS).

Independent of GIS software, there are a variety of programming languages (including R, Python and Matlab) and libraries that allow a wide range of operations, from specialized geospatial analysis to data processing. Using these tools, entirely new visualizations can be created, yielding results more varied than GIS software – but requiring significant technical expertise. To address this, some scripting environments – for example, D3 and Processing – simplify analysis and visualization, but both require coding skills and do not offer data management functionalities.

There exist software tools for geovisualization that are capable of importing and displaying data in a variety of ways. This third group differs from GIS applications in that they are generally designed to be intuitive for a non-specialized audience, and from programming languages in that they are user-facing software. These range from general data tools (Tableau or Excel), to geovisualization applications (CartoDB or Mapbox). CartoDB simplifies map-making through an intuitive web application, but does not provide special support for large or time-stamped datasets. Software such as Tableau, Qlikview or Omniscope has made a step forward in usability by introducing a graphical interface to manage and process big databases, but these tools still rely on traditional visualization types.

A Taxonomy Matrix

We propose a basic taxonomy of functionality for geospatial data visualization tools, based on MacEachren’s cube diagram. This provides a rubric for classifying existing tools and identifying the advantages and drawbacks of different approaches. We first describe the input (data), the development (processing procedure), and the output (visualization) of each. Subsequent discussion is grounded in this taxonomic model.

Input (data): Data can come from many different sources. Some online tools allow users to edit or copy+paste data into a simple text field. Essentially all tools allow files to be uploaded, with some providing input from cloud services such as Google Drive or Dropbox. Each of these tools supports many and varied file formats, ranging from structured text files such as CSV, XML, and JSON to standardized formats for geospatial data such as KML, GeoJSON and Shapefiles. Furthermore, tools can be connected to external sources and accept data directly. Using a database as an external source typically requires standard spatial queries, while using APIs (such as Twitter) involves more complex programming mechanisms.

Development: Visualization tools can be classified according to the way a user brings raw data to visualization: textual programming, visual programming, or a tailored user interface. This strongly defines the audience that can interact with a given tool, according to technical expertise. Tools amenable to non-experts often guide users in choosing the appropriate representation strategy.
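
To make the input formats named above more concrete, the following minimal sketch converts tabular origin-destination records in CSV into a GeoJSON FeatureCollection of LineString features, the kind of structure a flow map could be drawn from. The column names, the sample coordinates and the function name are assumptions for illustration; none of the tools discussed here is known to use this exact layout.

```python
import csv
import io
import json

# Hypothetical sample input: one row per origin-destination pair.
CSV_TEXT = """origin_lon,origin_lat,dest_lon,dest_lat,trips
12.48,41.89,12.51,41.92,40
12.48,41.89,12.45,41.90,15
"""


def od_csv_to_geojson(text: str) -> dict:
    """Turn origin-destination CSV rows into GeoJSON LineString features."""
    features = []
    for row in csv.DictReader(io.StringIO(text)):
        origin = [float(row["origin_lon"]), float(row["origin_lat"])]
        destination = [float(row["dest_lon"]), float(row["dest_lat"])]
        features.append({
            "type": "Feature",
            # GeoJSON positions are ordered [longitude, latitude].
            "geometry": {"type": "LineString", "coordinates": [origin, destination]},
            "properties": {"trips": int(row["trips"])},
        })
    return {"type": "FeatureCollection", "features": features}


collection = od_csv_to_geojson(CSV_TEXT)
print(json.dumps(collection, indent=2))
```

Keeping the flow volume in `properties` leaves the visual encoding (for example, line width) to whichever tool renders the result.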


Output (visualization): While datasets can be employed to generate such visualizations are depicted in a variety of ways, some require either too complex for a broad user base other specifi c representations. A variety of visuali- than technical specialists, or lack appropriate zation types span this breadth of output. functionalities. Through this detailed exam- Users can engage in a range of interactions, ination of software and tools for the creation from observation of a single document of interactive geovisualizations, it is clear (Static), to watching a visualization change that the existing off ering only partly fulfi ls in time (Dynamic), to manipulating diff erent the variety of requirements and use cases aspects of the representation (Interactive): arising today. We will continue this with a discussion of projects by the Senseable City (a) Static graphical representation of geo- Lab as case studies, identifying their advan- spatial data: Export: (JPEG, PNG, TIFF). tages and drawbacks to arrive at a bet er understanding of how to structure a much (b) Dynamic representations take the form more comprehensive interactive geospatial of video, showing animated geospatial data. analyses and visualization tools. A subset of dynamic visualizations is custom- izable representations: these are not truly Geospatial Data Visualization at the MIT interactive but grant users a measure of Senseable City Lab control, usually over graphic characteristics: styles, labels, colours, or base-map. Dynamic, Since the Lab’s founding in 2005, its multi- non-interactive visualizations require the disciplinary group of researchers has de- user to structure the output as a sequential signed a variety of projects using geospatial narrative, to communicate with the end data. Working with vastly heterogeneous viewer. Export: (MP4, AVI, GIF). datasets, led by a broad spectrum of goals, the lab has produced many diff erent kinds (c) Interactive representations of geospatial of outputs. 
From this portfolio, we detail fi ve data are structured to dynamically query examples that showcase diff erent strategies the dataset based on viewer input. Common and encompass a chronological development. cartographic interaction types include pan, Drawing on our taxonomic framework, we zoom, select geographical objects, get details describe each project in terms of interaction, on demand. Some allow more advanced inter- visualization, task and audience, which actions such as automatically zooming and categorize the main features and objectives panning so that selected objects fi t in the of Senseable City Lab projects. view frame. Geospatial data commonly has temporal properties, placing importance on Geospatial Data Visualization time-based interactions: such as play, stop, zoom (in time) and identifying temporal Real Time Rome (2006) aimed to imagine and pat erns (rhythms). There are few standard- present the implications of ubiquitous con- ized platforms for presenting interactive nectivity in the urban environment. With tem- visualizations, and generally must be custom poral data from many diff erent sources, what built. Export: (web-app in a browser or a was, at the time, a new form of cartography app). synchronized to create a portrait of urban behaviour pat erns. Data were aggre- gated on the Localizing and Handling Network An Overview Event Systems (LocHNESs) platform, a system Overall, visualizations of large datasets with developed by Telecom Italia. temporal and spatial properties are useful in The project was presented at the 2006 a variety of domains, yet the tools that can be Venice Biennale of Architecture in the form

BUILT ENVIRONMENT VOL 42 NO 3 343 BIG DATA AND THE CITY of seven animated visualizations that showed visualizes mobile phone users as a 3D heat urban activity in Rome, including traffic map on top of a satellite map with the height congestion, human mobility, and the exact and density coding the amount of cellphone movements and locations of buses and taxis activity. In this way, areas with dense activity (using GPS). Analysis revealed consistent are revealed. The visualization is non- patterns of activity, as well as anomalies. interactive, and animates through one day, Mega-events such as the football World in this case, through the day of the Madonna Cup were expressed in the data, as the city’s concert in 2006 in Rome. population flooded through different districts The second animation in figure 3 combines to celebrate after the match. These maps show two datasets, showing mobile phone users and aggregated movements of mobile phone public transit vehicles. The density of mobile users (in figure 2) and public transportation phone users is visualized as a heat map over- (in figure 3). The raw data was analyzed and laid onto a satellite map, while buses are visualized by experts, resulting in carefully shown with their current position as white chosen animations to present a story and dots and trails. This visualization is non- answer specific pre-selected questions. interactive, and animates through a rep- The first animation in figure 2 shows resentative sample day. Real Time Rome sug- gatherings of people for special events. It gested a future in which the pioneering

Figure 2. Real Time Rome: Gatherings. Interaction: None (Animation). Visualization: 3D heat map showing mobile phone users.

344 BUILT ENVIRONMENT VOL 42 NO 3 FROM ORIGINS TO DESTINATIONS: THE PAST, PRESENT AND FUTURE OF VISUALIZING FLOW MAPS

Figure 3. Real Time Rome: Connectivity. Interaction: None (Animation). Visualization: showing bus routes; Heat map showing mobile phone users. Task: Presenting knowns: is public transportation where the people are? Audience: Public. methodologies of urban science and the tools marry these approaches in order to create a of big data analytics allow officials to target unifi ed data platform where policy-makers the inefficiencies of urban systems and open and citizens alike can easily create interactive the way to a more sustainable future. apps visualizing diverse datasets (see ‘Syn- thesis: Powerful and Public Visualization’ below). This aims at empowering citizens and Linking and Comparing Datasets offi cials to explore and understand their own Live Singapore is a multi-year research initia- data, ultimately to gain insights into Singa- tive that aims to give diff erent stakeholders pore’s complex urban system. in Singapore real-time access to information about their city. The goals of this initiative Raining Taxis shows how weather aff ects taxi are manifold, ranging from acquiring and usage on the island. This visualization acts as analyzing data, to creating urban data visuali- an example of linking datasets in new ways zations as prototypes, to exhibiting urban in order to investigate a local phenomenon. demos to communicate to stakeholders the Users can rotate the map to orient the 2.5D value of visualization and to realize that view and move over the to set the value. In a later stage, a major objective is to time of day for the animation. The histogram

BUILT ENVIRONMENT VOL 42 NO 3 345 BIG DATA AND THE CITY

Figure 4. Live Singapore: Raining Taxis. Interaction: Rotate the map; move in time. Visualization: Histogram showing amount of rain over time; dot density showing trip starts; fl ow map showing free and busy taxi routes; grid map showing rain density. Task: Presenting knowns: More rain results in less free taxis.

shows the amount of rain over time as a bar The system provides multiple perspectives . On the visually reduced base map of of the data and consists of three interactive Singapore, white dots show the start locations visualization modes conveying chrono-spatial of taxi trips, thus giving an impression of pat erns as map, arc view, and time-series, as taxi use distribution. A fl ow map depicts the shown in fi gure 5. origin, path and destination of each taxi, with In the larger process of working towards bolder lines showing free taxis, and fi ner lines an informed discourse on smart cities, we showing on-call taxi trips. Over the map, a believe that reaching out to both experts and second layer shows rain density in light blue citizens with a tool that facilitates exploring as a gridded map of vertical bars. data, gaining knowledge of urban activity and soliciting feedback is an important next Touching Transport is a multitouch visuali- step. One way of achieving that goal is to zation of the public transit network in Singa- exhibit publicly visualizations of urban data pore (Nagel et al., 2014). It supports the explora- in a demonstration, with an aim of providing tion and understanding of complex tempo- visual and interactive access to large urban spatial data for experts and non-experts. datasets, engaging public and industry part-

346 BUILT ENVIRONMENT VOL 42 NO 3 FROM ORIGINS TO DESTINATIONS: THE PAST, PRESENT AND FUTURE OF VISUALIZING FLOW MAPS

Figure 5. Touching Transport. Interaction: Zoom and pan the map; move in time; select time range (aggregate data); switch views (diff erent perspectives); select bus line + direction (fi lter data); details on demand (fi lter data). Visualization: Glyphs showing boarding and alighting pax per station; glyphs on a map showing spatio-temporal relations (cluster, areas); time-series with glyphs showing spatio-temporal relations (at a glance); arc diagram showing fl ow of pax (OD, clusters); histogram showing pax over time. Task: Presenting knowns: revealing unknowns: personally relevant insights. Audience: Public: laypeople and experts. ners, and collecting feedback and ideas from user to browse data (calls, SMS, data traffi c) users. Within Live Singapore, a variety of urban with high spatial granularity (districts, neigh- demos have been created. bourhoods) over time (days, weeks) as shown in fi gure 6. The project was conceptualized with two Evaluation and Comparison Across Cities main aims, specific to two intended user An analysis carried out during 2014 in part- groups. nership with the Ericsson Company (a multi- national digital services provider) looked 1. Providing the researcher with an easy- specifi cally at telecommunications pat erns to-use tool for quick, intuitive visualizations in cities around the world. The fi ndings were of spatio-temporal data (in this case, mobile presented as an interactive web application, phone usage data, a source that is increas- titled Many Cities, and fi rst presented at the ingly used by researchers for many purposes, New Cities Foundation Summit on 18 June from analysis of human mobility to demo- 2014. This tool maps mobile phone usage graphic and economic insights). Cities on data in cities around the world (London, New diff erent continents have characteristic and York, Hong Kong, Los Angeles), allowing the distinguishable activity pat erns due to local

BUILT ENVIRONMENT VOL 42 NO 3 347 BIG DATA AND THE CITY

Figure 6. The Many Cities web application. Interaction: Select city (fi lter data); switch views (diff erent perspectives); select city districts (fi lter and compare data); details on demand (fi lter data); select time range (aggregate data); share views and save comments (communicate). Visualization: Choropleth map; cluster map showing spatial clusters of similar activities; time-series of mobile phone activities. economic-cultural conditions (e.g. browse (not manipulate) data, such that they teenagers sending more SMS in the evening can find and explore their own insights. For than elsewhere in the world), and the fi nan- example, citizens may compare the typical cial centres of cities share a common pat ern patterns in the area they live with those of that is strikingly similar in all cities. As such, other areas in the same city or in cities around mobile phone data suggests that globalization the world. Most uniquely for a browser of (in the sense of uniformization) exists mainly this type is the capacity to export and share in the business and fi nance sector. any given session. Users can save, share, annotate and compare individual ‘explora- 2. Providing the citizen with a user-friendly tions’. A rich and expanding set of inter- tool that provides a bet er understanding of pretations reveals how users – from researchers individual and collective urban dynamics. to laypeople – make sense of complex datasets. For example, specifi c events (concerts, storms, exhibitions, holidays) can be directly identi- Synthesis: fi ed by deviations from the average mobile Powerful and Public Visualization phone usage pat ern. As a response to the barriers of technical Many Cities is a visualization tool for analyz- specialization, comprehensive utility and ing data. Yet the data has been managed or limited accessibility identifi ed in our review pre-analyzed, and represented in a custom- and taxonomy of geospatial software tools, ized web-based interface. 
This allows users to we propose a powerful and publicly access-

348 BUILT ENVIRONMENT VOL 42 NO 3 FROM ORIGINS TO DESTINATIONS: THE PAST, PRESENT AND FUTURE OF VISUALIZING FLOW MAPS

Figure 7. The visualization pipeline. ible system. Such a software tool would In the classic visualization pipeline (Card eff ectively span the edge of the ‘cartography et al., 1999) shown in figure 7, the raw data cube’ without sacrifi cing functionality at the is parsed and transformed, mapped, and corners. displayed in some visual form. The user is able With this agenda, the Senseable City Lab to interact with the visualization by adapt- is currently developing a new tool called ing each stage of the pipeline. This, however, the Datacollider – a web application that was intended for expert analysts. As we provides a library of visualization types (and have explained above, there is an increasing the capacity for specialists to create custom demand from laypeople for urban visuali- visualization types) along with an interface zation tools of varying complexity. In figure that simplifies the process of uploading, 8, we identify interaction mechanisms when structuring and processing heterogeneous matched to the capabilities of Senseable datasets for use within those visualization City Lab projects. The Datacollider aims to frameworks (Senn et al., 2015). In this section, provide functionality for the full spectrum of we begin with a summary of key elements visualization adaptation. of the Datacollider’s front-end interface and its back-end architecture. We then articulate The Datacollider the path from the raw data to visualization as four sequential steps. This demonstrates the The user interface is designed as a seamless tool’s utility, specifically for processing time- tool for querying and visualizing geospatial and-location-dependent data, and ultimately datasets, while the backend framework is producing interactive geospatial infographics. robust enough to manage large amounts

Figure 8. Selected interaction mechanisms for the Senseable City Lab projects.
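
The four stages of the classic pipeline described with figure 7 (parse, transform, map, display) can be illustrated with a minimal Python sketch. This is an illustration under stated assumptions, not the Datacollider's implementation: the sample records, the field names, the 15-minute bin size and the circle-radius mapping are all hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timezone

# Hypothetical raw input: delimiter-separated records with a time stamp and coordinates.
RAW = """timestamp,lat,lon,value
2015-01-01T08:02:00,1.30,103.80,4
2015-01-01T08:07:00,1.30,103.80,6
2015-01-01T08:21:00,1.35,103.85,5
"""

def parse(raw):
    """Stage 1: turn raw text into typed records (time stamps treated as UTC)."""
    lines = raw.strip().splitlines()
    header = lines[0].split(",")
    records = []
    for line in lines[1:]:
        row = dict(zip(header, line.split(",")))
        records.append({
            "time": datetime.fromisoformat(row["timestamp"]).replace(tzinfo=timezone.utc),
            "lat": float(row["lat"]),
            "lon": float(row["lon"]),
            "value": float(row["value"]),
        })
    return records

def transform(records, resolution_s=900):
    """Stage 2: sum values per location and per time bin (assumed default: 15 minutes)."""
    bins = defaultdict(float)
    for r in records:
        epoch = int(r["time"].timestamp())
        bin_start = epoch - epoch % resolution_s
        bins[(bin_start, r["lat"], r["lon"])] += r["value"]
    return bins

def map_to_visual(bins, max_radius=20.0):
    """Stage 3: map each aggregated value to a circle radius, relative to the peak."""
    peak = max(bins.values())
    return [{"bin_start": t, "lat": lat, "lon": lon, "radius": max_radius * v / peak}
            for (t, lat, lon), v in sorted(bins.items())]

def display(marks):
    """Stage 4: stand-in for rendering; returns one text line per visual mark."""
    return [f"{m['bin_start']} ({m['lat']}, {m['lon']}) r={m['radius']:.1f}" for m in marks]

marks = map_to_visual(transform(parse(RAW)))
for line in display(marks):
    print(line)
```

Each stage takes the previous stage's output as its only input, so adapting one stage (for instance, passing a different `resolution_s`) does not require touching the others. That separation is what lets a user interact with the visualization at any point along the pipeline.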

of data. The goal of the Datacollider is to provide high-level functionality (in the form of processing power for large and heterogeneous data with a temporal dimension) without sacrificing intuitive workflow and ease of use – as previously stated, to span the edge of the ‘cartography cube’. The Datacollider dramatically reduces the time required to glean and represent insights from large datasets. Furthermore, a large (and growing) library of visualization types contains many different representation strategies, amenable to a broad variety of datasets, and seeks to convey information in a way that is interactive and intuitive for non-experts. For all of these reasons, both the process and the product of the tool span a broad user and audience demographic. However, the Datacollider lacks important functionalities that will be the subject of its continued development. These may also inform the broader field of geospatial data visualization.

The Work Flow

We will outline the four-stage workflow which constitutes the foundation of the Datacollider:

(a) Upload data: Users begin by logging in and uploading data to their repository. The Datacollider accepts two data formats: GeoJSON and delimiter-separated files with spatial identifiers. Having uploaded a desired dataset (or using a dataset previously uploaded), the user then creates a new project.

(b) Select Time Frame and Resolution: The user must select a time window (when the visualization will begin and end) and the level of resolution (5m, 10m, 15m, 30m, 1h, 4h, 6h, 12h, 24h, 7d, 30d, 90d, 180d and 365d). This is done on a simple timeline with extent-pointers that can be dragged to demarcate the desired time frame.

(c) Analyze Data: The user can then manipulate the data through a range of SQL-like operators, for example: group, filter, join, etc. There are specific operators to perform transformations, for example aggregation into grids/polygons. Multiple operations can be chained together to form a workflow. This is achieved through a graphical user interface by dragging and dropping items. Furthermore, the user is simultaneously presented with a preview of the data that shows the implications of adding a given operator to the dataset.

(d) Visualize: After processing the data, the user can render a visualization using any of the templates provided (see map and chart types, below). Each template presents a list of requirements that can be mapped to data fields from a selection panel. Specific attributes of objects (for example colours, sizes and heights) are fully customizable once they have been linked to a specific data dimension. Workflow integration allows the user to review the data and operators as necessary, and, if changes are desired, to return to the data operations window for additional or alternate processing. The user can change the imagery of the maps, add additional GeoJSON layers, add contextual elements, or animate the visualization as a narrative.

The current offering of map and chart types:

Map – Scaled bars
Map – Scaled circles
Map – Heat map with bars
Map – Heat map with spheres
Map – Origin destination flow
Chart – Scatter
Chart –
Chart –

Datacollider Limitations

The Datacollider aims to overcome many of the identified limitations of contemporary GIS software, and it constitutes a first step towards an openly available, intuitive and powerful real-time geospatial visualization


tool. However, the software has limitations that should be acknowledged, and which may inform future research:

Real-time Functionality: The Datacollider does not support real-time input. Data must be uploaded as a cohesive and singular dataset for processing with the Datacollider’s structuring tool.

Data Structure: Input data can be saved natively in a variety of formats, but must meet three conditions: (a) it is a delimited file; (b) every record is associated with a time stamp; and (c) every record is associated with a latitude/longitude (for use with cartographic visualizations).

Data Size: Maximum size is limited (although it is variable, dependent upon the number of operations in a workflow – there are no universal bounds).

Data Hosting: When uploaded, data is hosted on servers in Singapore, which may raise political concerns for some users.

Visualization Export: The Datacollider currently exports high-resolution (2560 × 1920px, approximately 2.5MB) images in JPG and PNG format. Animated visualizations can be captured using screencast software such as QuickTime. Future developments will simplify and improve the export process.

Figure 9. Screenshots illustrating the Datacollider workflow. Top left: Upload data; top right: Select time frame and resolution; bottom left: Analyze data; bottom right: Visualize data.

Future Work

Building on our evaluation of contemporary GIS software and our experience with the Datacollider, there are several clear avenues for future development – both of the tool

BUILT ENVIRONMENT VOL 42 NO 3 351 BIG DATA AND THE CITY itself and to progress the general fi eld of ‘ambient mobility’ platform, linking and geospatial data visualization software. optimizing urban mobility options. In terms of technical capabilities, our prior- More broadly, we imagine that an important ity for the Datacollider is to enable real-time development will simply be the wider appli- function, such that data streams can be cation of geospatial data visualization tools. plugged into the tool, rather than uploaded This will include existing datasets that have as a discrete dataset. This will allow stake- not been visualized and entirely new data- holders to understand the real-time operation sets (created with or without the intention of of systems, rapidly gain insight and take action. visualization). We also suggest the representa- Simultaneously, we propose to continue tion and use of geospatial data visualizations expanding and improving the library of in new contexts, for example, decision-mak- visualization types. This will be significantly ing in business strategy – these will only advanced as the tool is released to the increase in the years to come. broader public, and developers are invited to create and upload custom visualization types Conclusion using open documentation of the application programming interface or API. Finally, we In this paper, we have outlined the state- intend to increase the scale and scope of the of-the-art in geospatial data visualization, tool. Larger servers, more file upload types, described a new synthetic approach to visuali- and locations outside Singapore will allow zation called the Datacollider, and suggested the Datacollider to process more data types, avenues for future development of the fi eld. more data, more quickly, and in a politically These are valuable for the improvement of agnostic way. 
software and for the general advancement of Beyond the Datacollider itself, we wish knowledge, but a widely accessible tool for to acknowledge policy standards as an real-time fl ow mapping also has the potential important aspect of continued development to radically transform urban operations by in the field of geospatial data visualization. engaging every segment of the population. This is particularly relevant in the urban Urban complexities – formerly the domain context, where activity happens primarily at of specialists or limited stakeholders – can the intersection of government, industry and generate concrete insights that anyone can academia. Each of these players currently works grasp intuitively. This is crucial for involving with proprietary standards and structures, communities of citizens who constitute the limiting or precluding any transfer of data, city itself, and whose behaviours defi ne its workflows or general best practice. Develop- overall function. A real-time omni-modal ing schema and protocols for real-time data transportation analytics platform, for example, will allow every potential actor to produce could benefi t individuals by elucidating the a wider variety of datasets that can be more fastest, most comfortable, or least expensive easily aggregated or streamed, manipulated, options, while simultaneously rebalancing and visualized. Examples include: compari- the overall system to reduce congestion. This son of ocean patterns between coastal cities generally represents a shift from visualization to better understand sea level rise and inform as a set of graphic tools for representing urban resilience strategies; analysis of energy previously found insights, to visualization as use patterns across many scales (individual a way of engaging with data and actively homes, cities, regions) and across private deriving insights: from post-rationalization to energy providers, to inform and manage interrogation. 
transition to a renewable energy grid; and Visual representations may help urban aggregating real-time transportation data inhabitants to become more aware of issues from many modes to drive a city-scale and patterns in the city around them by

providing a rich array of communication tools (Vande Moere and Hill, 2012). Visualization platforms can transfer information from the top down (government to citizens; businesses to subscribers) or horizontally among citizens (peer to peer). As they become increasingly intuitive they will reach a larger number of people across a broader demographic and be more desirable for everyday use. The goal for geospatial data visualization tools is to add value and enrich life across a significant segment of the population. At this point, people will be empowered to use such tools creatively and in a personally meaningful way. We suggest that there is an urgent need for a simple, intuitive and powerful geospatial and temporal data exploration and visualization tool.

Through our evaluation of the state-of-the-art in geospatial visualization tools, as well as a review of our own work at the Senseable City Lab, we have identified important best practices, limitations, and trends in geospatial data visualization. Considering the contemporary influx of data, increased availability of datasets, and wider use of data-driven cartography in a broad range of decision-making applications, we contend that there is a need for simple, unrestricted and powerful geospatial data visualization tools that span the edge of MacEachren’s (1994) ‘cartography cube’. We demonstrate that our Datacollider is a first step in this direction.

Flexible data analysis and visualization tools, amenable to many and varied real-time data streams (which, themselves, are governed by overarching standards) may become an integral aspect of urban life. Such tools span the edge of the conceptual ‘cartography cube’, and foreground the process of interacting with data. A visualization may be the end product, but new insights will be generated through providing broad public access to an interrogative tool for interacting with data. Geospatial visualization becomes an active endeavour rather than a specialist’s post-rationalization. We submit that this should be the goal for data-driven flow mapping, as the contemporary urban condition generates enormous real-time datasets. Ultimately, a wide community of interested participants can benefit from such exploration, synthesis and visualization.

REFERENCES

Andrienko, G., Andrienko, N., Demsar, U., Dransch, D., Dykes, J., Fabrikant, S.I. and Tominski, C. (2010) Space, time and visual analytics. International Journal of Geographical Information Science, 24(10), pp. 1577–1600.

Andrienko, N. and Andrienko, G. (2012) Visual analytics of movement: an overview of methods, tools and procedures. Information Visualization, 12(1), pp. 3–24.

Batty, M. (2015) Data about Cities: Redefining Big, Recasting Small. Paper prepared for the Data and the City Workshop. National University of Ireland, Maynooth. Abstract and video available at: http://progcity.maynoothuniversity.ie/2015/09/data-and-the-city-workshop-session-4-videos/.

Calabrese, F., Colonna, M., Lovisolo, P., Parata, D. and Ratti, C. (2011) Real-time urban monitoring using cell phones: a case study in Rome. IEEE Transactions on Intelligent Transportation Systems, 12(1), pp. 141–151.

Card, S.K., Mackinlay, J.D. and Shneiderman, B. (1999) Readings in Information Visualization: Using Vision to Think. San Francisco, CA: Morgan Kaufmann.

Chen, J., Guo, D. and MacEachren, A. (2005) Space-Time-Attribute Analysis and Visualization of US Company Data. Minneapolis, MN: IEEE Symposium on Information Visualization, Compendium Volume, pp. 23–25.

Cox, M. and Ellsworth, D. (1997) Application-Controlled Demand Paging for Out-of-Core Visualization. Report NAS-97-010, MS T27A-2. Moffett Field, CA: NASA Ames Research Center.

Crampton, J.W. (2002) Interactivity types in geographic visualization. Cartography and Geographic Information Science, 29(2), pp. 85–98.

Fuller, R.B. (1981) Critical Path. New York: St. Martins Press.

Grauwin, S., Sobolevsky, S., Moritz, S., Gódor, I. and Ratti, C. (2014) Towards a comparative science of cities: using mobile traffic records in New York, London, and Hong Kong. Computational Approaches for Urban Environments, 13, pp. 363–387.

Kloeckl, K., Senn, O. and Ratti, C. (2012) Enabling the real-time city: LIVE Singapore! Journal of Urban Technology, 19(2), pp. 89–112.

MacEachren, A.M. (1994) Visualization in Modern Cartography: Setting the Agenda. Oxford: Pergamon.

MacEachren, A.M. and Kraak, M. (1997) Exploratory cartographic visualization: advancing the agenda. Computers and Geosciences, 23(4), pp. 335–343.

Meller, H. (1990) Patrick Geddes, Social Evolutionist and City Planner. London: Routledge.

Nagel, T., Maitan, M., Duval, E., Moere, A.V., Klerkx, J., Kloeckl, K. and Ratti, C. (2014) Touching Transport – A Case Study on Visualizing Metropolitan Public Transit on Interactive Tabletops. ACM Proceedings of the 2014 International Working Conference on Advanced Visual Interfaces - AVI ‘14, pp. 281–288.

Robinson, A.H. (1967) The thematic maps of Charles Joseph Minard. Imago Mundi, 21(1), pp. 95–108.

Roth, R.E. (2013) An empirically-derived taxonomy of interaction primitives for interactive cartography and geovisualization. IEEE Transactions on Visualization and Computer Graphics, 19(12), pp. 2356–2365.

Senn, O., Khairul, M., Maitan, M., Pribadi, R., Shah, M. and Sivaprakasam, R. (2015) Datacollider, in Proceedings of SIGGRAPH Asia 2015 Visualization in High Performance Computing, Kobe, Japan.

Snijders, C., Matzat, U. and Reips, U.-D. (2012) ‘Big Data’: big gaps of knowledge in the field of internet. International Journal of Internet Science, 7, pp. 1–5.

Townsend, A.M. (2014) Smart Cities: Big Data, Civic Hackers, and The Quest for a New Utopia. New York: W.W. Norton.

Vande Moere, A. and Hill, D. (2012) Designing for the situated and public visualization of urban data. Journal of Urban Technology, 19(2), pp. 25–46.

van Wijk, J.J. (2005) The value of visualization. VIS 05 IEEE Visualization, pp. 79–86.

Appendix: Terminology for Visualizations

Map: A static tool for representing collected information about what exists in physical territory. Intended to show one or several dimensions of physical space.

Maps depict geospatial information, and are a classic, centuries old technique to represent our physical environment. For example: a maritime navigational document showing land masses, water bodies and ports.

Infographic: A visual representation of data. Intended to communicate information to a defined audience.

Infographics (or ‘information graphics’) are graphical representations of data or information, in order to communicate some facts, stories or insights visually. They are – in contrast to data visualization – typically static or at least non-interactive. While they can consist of multiple visualizations or graphs side-by-side, they tend to be visually less complex, and often use a broad design style. For example: two side-by-side coca cola bottles filled up to different heights, indicating average consumption in two different countries.

Geospatial Infographic: A visual representation of data that incorporates a map. Intended to communicate information from basic geospatial data to a wide audience.

Geospatial infographics are graphical representations of geospatial data or information. Maps typically are a part of a geospatial infographic. For example: a map of United States voting trends, showing red and blue states for a given year.

Geospatial Data Visualization (Geovisualization, Flow Map): A visual representation of dynamic (time-based) data across physical territory. Intended to show relationships or changes in geospatial data.

Information visualization focuses on graphical representations of data to help

people understand and analyze data. This foregrounds the question of how to use visualization techniques to effectively and efficiently reveal the internal structure of the data (Van Wijk, 2005). While information visualization typically focuses on abstract data (Chen et al., 2005), geovisualization, or geographic information visualization, deals with data that has physical and spatial correspondence. However, it covers not only geospatial data, but also multivariate georeferenced data, typically including time-varying properties. Geovisualization is a visual analysis of patterns, relationships, and trends, and allows exploration and understanding of spatio-temporal information.

Flow maps are a subset of this category: visualizations depicting movement between geographical locations on a map. They are a combination of flow charts and maps, typically showing a background map overlaid with lines connecting the flow origins with the destinations, often with the flow magnitudes mapped to line thickness. For example: a map of telecommunication patterns between one city and other cities, animated to show how those connections change throughout the day.

Interactive Geovisualization: A visual representation of dynamic (time-based) data across physical territory, which can be filtered by a user (according to place, time, dimension, characteristic). Intended to show relationships or changes in geospatial data, and can be browsed by a wide variety of users.

Data is pre-processed (by visualization and/or data science experts) to derive specific insights that can be communicated to users. Advanced data analysis techniques can be applied to the raw data, and the interactive tool can focus on communicating those insights in a personally relevant way. Interactivity can further two main goals:

(a) Presenting a larger amount of information without overwhelming users (particularly non-specialists). For example, an Interactive Geovisualization may show all transportation data, including car, bus, taxi, bike, train, metro, but only display one of those at a time.

(b) Allowing users to find personally relevant insights. In the above example, a user may only be interested in exploring the spatial relationship between bike routes and public transit.

For example: a browser for all transportation data in a city.

Geospatial Data Tool: A tool for collecting, structuring, manipulating, analyzing, and visualizing time-based geospatial data. Outputs can be interactive, to be browsed by a wide variety of users. Intended to empower anyone, regardless of background and expertise, to work with data, and to find and visualize insights.

A geospatial data tool will be applicable for every step in the production and consumption chain, from raw data to visualization. Stakeholders may apply this tool for the entire process, or interact with it at different stages. For example, an urban researcher may upload, analyze and visualize data, or a graphic designer may enter at the final stage to turn insights into a compelling dynamic graphic. The greatest value of this tool will be in finding insights and presenting them. That is, its strength is in enabling anyone to actively and productively explore raw data. For example: the Datacollider.
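
The flow map definition in this appendix (lines connecting origins with destinations, with flow magnitude mapped to line thickness) can be made concrete with a short Python sketch. The place names, coordinates, trip records and width range below are hypothetical, and the linear width scaling is only one possible mapping. The output is a GeoJSON FeatureCollection of LineString features; GeoJSON is one of the two input formats the paper lists for the Datacollider, but this is not a description of how that tool builds its origin-destination layer.

```python
import json
from collections import Counter

# Hypothetical place coordinates as (longitude, latitude), the order GeoJSON expects.
PLACES = {"A": (103.80, 1.30), "B": (103.85, 1.35), "C": (103.90, 1.32)}
# Hypothetical trip records as (origin, destination) pairs.
TRIPS = [("A", "B"), ("A", "B"), ("A", "B"), ("A", "C"), ("B", "C"), ("B", "C")]

def flow_lines(trips, places, min_width=1.0, max_width=8.0):
    """Count trips per origin-destination pair and scale line width linearly."""
    counts = Counter(trips)
    peak = max(counts.values())
    features = []
    for (origin, dest), n in sorted(counts.items()):
        # The largest flow gets max_width; smaller flows shrink towards min_width.
        width = min_width + (max_width - min_width) * n / peak
        features.append({
            "type": "Feature",
            "geometry": {"type": "LineString",
                         "coordinates": [list(places[origin]), list(places[dest])]},
            "properties": {"origin": origin, "destination": dest,
                           "magnitude": n, "width": width},
        })
    return {"type": "FeatureCollection", "features": features}

flows = flow_lines(TRIPS, PLACES)
print(json.dumps(flows, indent=2))
```

A renderer would draw each feature over a background map and read the `width` property as the stroke width. Recomputing the counts for successive time windows gives the animated variant mentioned in the definition.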
