An Interactive Widget to Visualise the Allergy Symptoms of a Country

A dissertation submitted to the University of Manchester for the degree of Master of Science in the Faculty of Engineering and Physical Sciences

Deemah Alqahtani

2016

School of Computer Science

Table of Contents

Table of Contents

List of Figures

List of Tables

Glossary of Terms

Abstract

Declaration

Intellectual Property Statement

Acknowledgement

1 Introduction

1.1 Aims and Objectives

1.2 Report Structure

2 Background Research

2.1 Overview

2.2 Spatio-Temporal and Time Series Data on Maps

2.3 Visual Information Comparisons

2.4 The Visual Information Seeking Mantra

2.5 Summary

3 Literature Analysis

3.1 Overview

3.2 Examining Multivariate Time Series Data Visualisation Techniques

3.2.1 Discussion

3.2.2 Comparison

3.3 Examining Cartesian and Polar Visualisation

3.3.1 Discussion

3.3.2 Comparison

3.4 User Interaction Techniques

3.5 Geo-Analysis Techniques

3.6 Research Questions

3.7 Summary

4 Research Methods

4.1 Overview

4.2 Methodology

4.2.1 Full Map

4.2.2 Small Multiples

4.3 Evaluation Methodology

4.3.1 Usability Testing

4.3.2 Tasks and Metrics

4.3.3 Experimental Design

4.4 Summary

5 Project Implementation

5.1 Overview

5.2 Technology and Architecture

5.3 Data

5.4 Development Sprints

5.4.1 Sprint One

5.4.1.1 Planning

5.4.1.2 Implementation

5.4.1.3 Testing

5.4.2 Sprint Two

5.4.2.1 Planning

5.4.2.2 Implementation

5.4.2.3 Testing

5.4.3 Sprint Three

5.4.3.1 Planning

5.4.3.2 Implementation

5.4.3.3 Testing

5.4.4 Sprint Four

5.4.4.1 Planning

5.4.4.2 Implementation

5.4.4.3 Testing

5.5 Summary

6 Project Evaluation

6.1 Overview

6.2 Demographics

6.3 Analysis and Results

6.4 Summary

7 Discussions

7.1 Overview

7.2 Analysis Interpretation

7.3 Implications for Design

7.4 Summary

8 Conclusions and Future Work

8.1 Conclusions

8.2 Limitations and Further Work

9 References

10 Appendix 1

Final word count: 18287

List of Figures

Figure 1. Two types of health data and the result of their combination.

Figure 2. The ThemeRiver: A river representing theme changes through time [10].

Figure 3. The TimeWheel for multivariate time-oriented data visualisation [11].

Figure 4. Two juxtaposed images of the same geographical area at different times [14].

Figure 5. (Left) Superposition of two maps [15]. (Right) A map with Blending Lens [3].

Figure 6. The three categories of visual designs applied to two heat maps [16].

Figure 7. The triangle of the categories of different applications [12].

Figure 8. A map showing the overview+details interface [19].

Figure 9. Small multiples showing American adult obesity levels for three different years [21].

Figure 10. EventViewer interface [22].

Figure 11. A ring map representing time series data of health alert levels per American ZIP code [23].

Figure 12. ThemeRiver icons represented on a map [8].

Figure 13. VIS-STAMP system. The matrix view is on top left, small multiples view is on top right, the parallel coordinates view is on bottom left and SOM is on bottom right [24].

Figure 14. A Google calendar-like interface for displaying temporal data [31].

Figure 15. Medical cost of a hip replacement by state [32].

Figure 16. Historic weather-station data for the UK [33].

Figure 17. The US mortality visualisation [34].

Figure 18. A choropleth map.

Figure 19. The project architecture.

Figure 20. Entity Relationship Diagram.

Figure 21. A map polygon [37].

Figure 22. The project’s database schema.

Figure 23. JSON object example.

Figure 24. The GeoJSON output.

Figure 25. The JSON output.

Figure 26. The first sprint result.

Figure 27. Data flow in Sprint one.

Figure 28. The updated GeoJSON output.

Figure 29. The choropleth map and the dropdown menu to filter the symptoms.

Figure 30. A postcode sector with a median well-being of 2 highlighted in grey as result of hovering over it.

Figure 31. The postcode sector after clicking it.

Figure 32. A pie chart is shown upon hovering over a postcode sector.

Figure 33. Data flow in Sprint two.

Figure 34. The time slider set to a specific time range shown below it.

Figure 35. Spline chart visualising the trend in the symptoms in postcode sector DE74 2.

Figure 36. Spline chart visualising the trend in ‘Nose’ and ‘Eyes’ symptoms in postcode sector CM3 6.

Figure 37. Data flow in sprint three.

Figure 38. Small multiples.

Figure 39. An area in the map before (left) and after (right) implementing the Douglas-Peucker algorithm.

Figure 40. The mean values of completion rate in each condition.

Figure 41. A histogram showing Shapiro-Wilk test result on the task time.

Figure 42. A Q-Q plot showing the result of Shapiro-Wilk test on the task time.

Figure 43. The mean values of completion time of tasks in each condition.

Figure 44. A histogram showing Shapiro-Wilk test result on the satisfaction.

Figure 45. A Q-Q plot showing the result of Shapiro-Wilk test on the satisfaction.

Figure 46. The mean values of the satisfaction of tasks in each condition.

Figure 47. The product backlog.

List of Tables

Table 1. Comparison of multivariate visualisation techniques.

Table 2. Cartesian and polar coordinate systems comparison.

Table 3. The usability test’s tasks.

Table 4. The summary statistics of the p-value in each task between the two conditions where the asterisks indicate the level of significance in the difference.

Glossary of Terms

Term: Definition

Basemap: A map displaying a collection of background details of GIS information

Choropleth Map: A thematic map used to classify quantitative data

COLORBREWER: An online tool that assists in choosing the right colour scheme for mapmaking tasks

Esri: An international provider of Geographic Information Systems

GeoJSON: A format that can represent geographic and non-geographic features based on the JSON format

Highcharts: An open source JavaScript charting library to add interactive charts to web applications

JSON: A data interchange format that uses data objects in the form of attribute-value pairs

Leaflet: An open source JavaScript mapping library

OpenStreetMap: An open source project to create editable maps of the world

Open Door Logistics: A UK-based firm that provides open source spatial solutions and data

Ordnance Survey: A national mapping agency and a producer of various types of UK map data and layers

Abstract

Visualising time series data is an important field of information visualisation since it allows capturing changes that occur in data over time. It is challenging, however, to visualise these data when they are multivariate and represented in a spatial context. Many studies have attempted to suggest visualisation techniques for such data. Approaches ranging from geovisual analytic tools to icon placements on maps have been claimed to be effective for identifying spatio-temporal trends and patterns in data represented on maps. However, many of the methods proposed thus far have not been evaluated thoroughly, making it difficult to determine which technique is the best. This project aims to provide an in-depth analysis of the various techniques and strategies for visualising and analysing multivariate time series data on maps and to implement and evaluate suitable techniques that lead to better spatio-temporal pattern detection and better learning from data. This project adopts a hybrid approach of implementing two visualisation techniques, namely, the full map and small multiples, with a strong emphasis on interaction by applying the Visual Information Seeking Mantra and other interaction techniques. A summative usability study that combines qualitative and quantitative approaches was undertaken to evaluate the two visualisation techniques to determine which better encourages knowledge extraction and spatial, temporal or spatio-temporal pattern discovery. The analysis revealed that tasks focusing on spatial comparisons were generally performed better with the full map, whereas temporal comparison tasks were performed better with small multiples. However, tasks involving spatio-temporal comparisons showed a difference of relatively low significance between the two visualisation techniques. These findings imply that, to obtain a collective advantage, expand usability and enhance interactivity, a single visualisation technique is insufficient. Instead, composite approaches of two or more visualisation techniques might be considered.

Declaration

No portion of the work referred to in the dissertation has been submitted in support of an application for another degree or qualification of this or any other university or other institute of learning.

Intellectual Property Statement

i. The author of this dissertation (including any appendices and/or schedules to this dissertation) owns certain copyright or related rights in it (the “Copyright”) and s/he has given The University of Manchester certain rights to use such Copyright, including for administrative purposes.

ii. Copies of this dissertation, either in full or in extracts and whether in hard or electronic copy, may be made only in accordance with the Copyright, Designs and Patents Act 1988 (as amended) and regulations issued under it or, where appropriate, in accordance with licensing agreements which the University has entered into. This page must form part of any such copies made.

iii. The ownership of certain Copyright, patents, designs, trademarks and other intellectual property (the “Intellectual Property”) and any reproductions of copyright works in the dissertation, for example graphs and tables (“Reproductions”), which may be described in this dissertation, may not be owned by the author and may be owned by third parties. Such Intellectual Property and Reproductions cannot and must not be made available for use without the prior written permission of the owner(s) of the relevant Intellectual Property and/or Reproductions.

iv. Further information on the conditions under which disclosure, publication and commercialisation of this dissertation, the Copyright and any Intellectual Property and/or Reproductions described in it may take place is available in the University IP Policy (see http://documents.manchester.ac.uk/display.aspx?DocID=24420), in any relevant Dissertation restriction declarations deposited in the University Library, and The University Library’s regulations (see http://www.library.manchester.ac.uk/about/regulations/_files/Library-regulations.pdf).

Acknowledgement

First and foremost, I praise the almighty for his blessings and for being with me throughout the lows and the highs of my MSc journey.

I would like to express my sincere gratitude to my supervisor Dr. Markel Vigo for his encouragement, support and guidance throughout all the stages of this dissertation.

Many thanks to my Government and University of Dammam for their support through their scholarship programme.

Many thanks to my friends, colleagues and to all those who provided me help and advice.

Last but not least, special thanks to my family for their love and endless support.

1 Introduction

The visual analysis of time series data is one of the main tasks in many application fields, including forecasting, health and environmental change. Visualising time series data exposes the evolution of the data over time and helps one observe changes that occur in the data that further assist with future decision making [1]. One of the challenges involved in the visualisation of time series data occurs when these data are multivariate, which means that they arise from more than one dependent variable. An additional challenge arises when the time series data are represented in a spatial context, such as maps, in different applications in geographic information systems (GIS). The spatial aspect of the multivariate time series data unlocks more opportunities for finding patterns. However, understanding the relationship and distribution of the data through an optimum method of visualising them is still problematic [2]. Some techniques have been suggested for visualising multiple time-dependent data on maps. Those suggestions range from geo-visual analytic tools to 2D and 3D icon placements on maps. Although they are claimed to be effective tools for identifying spatio-temporal trends and patterns in the data represented on maps, many of these techniques have not been evaluated. The uncertainty about the validity and reliability of these methods makes it debatable whether any of them can be considered to be the best way of visualising multivariate data. Furthermore, these methods have not been tested in visual comparison tasks where the dataset, in any form of geographic data structures such as points, polygons or file formats, is exposed to other datasets with similar or different forms to determine more interesting patterns. This is important in geo-analysis, because it would help one explore data from multiple resources on one map [3].

The purpose of this project is to visualise the allergy symptoms of asthma and hay fever as reported by citizen scientists on an interactive map. This project is part of the #BritainBreathing initiative to spread awareness about seasonal allergies in the United Kingdom (UK). The reports have been collected from various geographical locations, and each data point has a timestamp and the reported allergy symptoms. In light of the challenges involved in the visualisation of time series data on maps, this project will investigate and implement the most convenient techniques to visualise and analyse multivariate time series data drawn from the fields of information visualisation (Infovis), geo-visualisation and human–computer interaction (HCI). The project will specifically focus on the dynamic exploration of the information space by following the Visual Information-Seeking Mantra, explained in Section 2.4, which outlines the fundamental rules of interaction in information visualisation [4]. In addition, the project will examine and analyse visual composition strategies in maps to help individuals perform visual comparison tasks [3]. Information visualisation in 3D space is beyond the scope of this project.

1.1 Aims and Objectives

The aim of the project is to develop an interactive widget that not only allows users to visualise historical and recent seasonal allergy symptoms on a map, but also enables them to compare, learn and extract knowledge from the visualised data that can help them better manage their symptoms in the future.

The objectives of the project are:

1. To investigate the existing approaches to visualising time series data, with an emphasis on data with spatial dependencies.

2. To identify, apply and evaluate the appropriate visual design techniques based on the Visual Information-Seeking Mantra [4], which is based on the principles of overview, zoom and filter, and details-on-demand. This project adapts the mantra and analyses additional tasks that are described below:

a. Because it is difficult to display all the data in one picture, focusing on areas of interest must be taken into account. Since the dataset of this project involves spatial dependencies, focusing is implemented based on a geographical location by defining a level of granularity. The maps in the widget will be subdivided by postcode sectors so that each manageable area can render its data as they are received. Additionally, focusing on a specific location, such as a postcode, tends to yield more useful information. For example, users will be able to visualise allergy symptoms in the areas where they live and compare those symptoms with neighbouring areas.

b. There are different well-known graphical methods of data representation, such as various chart types that use Cartesian or polar coordinate systems. One of the project’s objectives is to determine and apply the best method for representing time series data on allergy symptoms for a geographical area. This is done by analysing studies in the literature that have evaluated the coordinate systems and presented suitable findings.

c. The selected visualisation method should conveniently fit within its specified area on the map without affecting the overall look of the interface. Moreover, it should be capable of visualising different symptoms of several allergens in each geographical area.

d. The widget should allow for the visualisation of historical data of any area of interest upon user demand. The data will be represented by charts related to the severity of symptoms on an adjustable timescale.

e. The widget, by default, should have a details-on-demand facility by which the user can specify the details that are to be shown on the map.

3. To develop a web widget that will be deployed on the #BritainBreathing website.

a. The data that feed the visualisation widget are a collection of reports about allergy symptoms submitted by citizen scientists through a mobile application that was developed by the #BritainBreathing team.

b. The widget will be developed in an agile fashion so that a working software version can be released several times, with each version meeting the minimum requirements. This will allow more engagement from its users and rapid feedback.

1.2 Report Structure

In addition to the introduction, this report comprises seven chapters. Chapter Two presents a general background of the visualisation of health data and spatio-temporal data. It defines the general concepts, reviews the relevant examples and provides an overview of the well-known taxonomies of information visualisation and information comparison in maps. In addition, this chapter discusses the Visual Information-Seeking Mantra that is adopted in the project. Chapter Three conducts a thorough analysis of the literature on the techniques of visualising multivariate time series data as well as on using Cartesian and polar coordinate systems for the visualisation of time series data. The analysis comprises detailed comparisons and discussions. It also examines current user interactions and geoanalysis techniques and sets the project’s research questions. Chapter Four discusses the project’s research methods. It first describes the research methodology and the chosen visualisation techniques. Then it discusses the evaluation methodology, including its objectives, evaluation methods, tasks and experimental design approach. Chapter Five covers the project implementation. The chapter outlines the project’s technology and architecture, gives an overview of the data used in the project, and details the development sprints, discussing the objectives, implementation and outcomes of each sprint. Chapter Six discusses the analysis and findings of the project evaluation. Chapter Seven provides a detailed discussion of the analysis results and of the implications for design to help in future work. Finally, Chapter Eight presents the conclusions of this project. It discusses the limitations that were present in the project and offers remarks concerning future work.

2 Background Research

2.1 Overview

A considerable number of sources of geographic information are available today, including global positioning system (GPS)-enabled mobile devices, health data, demographics, satellites and even text converted to coordinates. Geographic information systems (GIS) are gaining greater importance, and geographic information science (GIScience) is needed to initiate methods for analysing and exploring the geographic information more effectively [5]. Geovisualisation, which is a field that integrates cartography, information visualisation (Infovis), exploratory data analysis (EDA) and GIS, offers ways to visually explore, analyse and present geographic information [6]. Geovisualisation applications provide methods to compose different map layers to determine patterns. They also facilitate correlating data from different sources to create effective maps used in emergency response and crisis management tasks. There are challenges that are associated with geovisualisation. From the HCI perspective, there is a demand to investigate strategies to design interactive maps that are legible and easy to compare [3]. Research is also required to explore the complex patterns of spatio-temporal data.

Health information is a rich source of geographic information. It produces a huge amount of data that can be analysed visually to extract information that could assist in recent and future healthcare decisions [7]. Health data could be multivariate, with temporal and spatial aspects. Therefore, their visualisation is an ever-growing research discipline. From the temporal perspective, time is generally an important dimension in health data. Spatially, healthcare statistics utilise georeferenced data, which are associated with coordinate systems, to enable plotting and overlaying the data on maps. There is a vital need to develop state-of-the-art visualisation techniques that would facilitate advanced geospatial data exploration while maintaining a high level of interaction [8]. However, there is also a need to promote the integration of geospatial data exploration with time series data visualisation so that the potential spatio-temporal patterns that could play an essential role in healthcare analysis can be examined. Figure 1 depicts two different types of health data, namely, time series data and geographic data, as well as their corresponding ways of visualisation. When both are combined, they lead to spatio-temporal data for which geovisualisation can be used. The project will suggest techniques to help visualise complex patterns appearing from such data on interactive maps so as to provide new insights from time series healthcare data. This chapter further explains the spatio-temporal and time series data on maps and their current challenges, introduces visual information comparison on maps, and highlights the notable Visual Information Seeking Mantra that is adopted in the project to promote user interaction.

[Figure 1 diagram: Health Data divides into Time Series Data (InfoVis) and Geographic Data (GIS); combined, they give Spatio-Temporal Data (Geovisualisation).]

Figure 1. Two types of health data and the result of their combination.

2.2 Spatio-Temporal and Time Series Data on Maps

Spatio-temporal data have both temporal aspects, such as timestamps, and spatial aspects, such as geographical coordinates. Adding a spatial context to temporal data enables the data to be visualised on a map. Time series data are particularly significant because they allow for the capture of changes that occur over time. Different visual methods have been developed to help visualise time-oriented and time series data. Some examples include the ThemeRiver, in which an ordered group of time points are represented in a river-like representation (see Figure 2), and TimeWheel, which consists of a single time-axis for the temporal dimension and multiple colour-coded data axes for the data variables (see Figure 3) [9]. However, it is challenging to develop techniques to visualise multivariate time series data with geo-spatial properties. One of the challenges is to develop a convenient graphical method for visualising information in a single representation, taking into account the main visualisation tasks of presenting the overview first and then the details. Another challenge is the need to develop effective visualisation techniques that are capable of handling complicated spatio-temporal patterns, especially if the data are multivariate, in which case representing them in regular line graphs plotted on a map might not be sufficient [2]. It is important for users to evaluate the methods in terms of both their successes and their limitations. However, no evaluations have been found for many of the currently suggested methods. Therefore, it is unclear whether these techniques work best for visualising multivariate data. One of the key goals of data visualisations is to enable their comparison, thereby extracting knowledge from them. The main visual comparison techniques in maps will be discussed in the next section.

Figure 2. The ThemeRiver: A river representing theme changes through time [10].

Figure 3. The TimeWheel for multivariate time-oriented data visualisation [11].

2.3 Visual Information Comparisons

As the complexity of data has increased, a subsequent need for sophisticated tools that can analyse these data, particularly by performing comparison tasks, has developed [12]. Information comparison is a well-known method to learn from the objects by observing the differences and the interesting patterns developed in their behaviours.

In GIS, the base map only represents the key data, such as the administrative boundaries. To enrich the map, layers are often added that show various characteristics. These layers fall into two categories: vector layers, which render features as geometric shapes such as points and polygons, and raster layers, which render landscape features, including buildings, roads and vegetation [13]. Recent applications of GIS typically utilise maps with specific predefined layers, such as topographic maps. Since data are organised into overlapping layers, maps can be compared by superimposing the different types of map layers. However, there are also other categories of map comparison.

According to the taxonomy of visual designs [12], there are three core categories of visual designs: juxtaposition, superposition and explicit encoding. Juxtaposition is based on the separation of compared objects. For example, the two images shown in Figure 4 are juxtaposed to display the same geographical area at different time points with an adjustable slider to swipe between the two images for a more customised view. Superposition involves overlaying the compared objects in the same space, including place and time (see Figure 5 (left)). Some techniques that are currently used in map comparisons are extended versions of superposition. For example, Blending Lens involves placing a lens on top of a specific region of interest that allows the maps to be overlaid only at that region while maintaining a stable context in the upper layer (see Figure 5 (right)) [3].

Figure 4. Two juxtaposed images of the same geographical area at different times [14].

Figure 5. (Left) Superposition of two maps [15]. (Right) A map with Blending Lens [3].

The third category of visual designs is explicit encoding, which identifies the differences, similarities or correlations between two objects using visual encoding. Explicit encoding illustrates a comparative representation of flow data on maps [12]. In some cases, this technique outperforms both juxtaposition and superposition. For example, Figure 6 shows the three techniques applied to heat maps that were produced from the replay data of two players during a video game match [16]. Juxtaposition is applied to the upper two heat maps. Juxtaposition allows users to visually compare the images, although users’ attention becomes divided when looking at the two separate images [3]. The lower left image shows the superposition of both datasets, which results in the occlusion of useful information in the underlying layer; for example, the underlying heat map might be hidden. Explicit encoding is used in the lower right image. In this image, the values from the second heat map are subtracted from the values of the first heat map, and the difference between the two values is visually encoded on the map using a colour gradient. It is important to note that the three categories are not meant to be used separately, but can be combined to create hybrid categories for specific problems of comparative visualisations that might not be addressed using a single mechanism. Figure 7 shows the categories of different applications taken from a database of various resources. Most comparative visualisation designs fall into at least one of these categories [12].
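The explicit encoding step described above for the heat maps (cell-wise subtraction followed by a colour gradient) can be illustrated with a minimal sketch. The grid values, the function names `difference_grid` and `diverging_colour`, and the blue-white-red gradient are all assumptions made for illustration; they are not taken from [16] or from this project's implementation.

```python
from typing import List, Tuple

Grid = List[List[float]]


def difference_grid(first: Grid, second: Grid) -> Grid:
    """Subtract the second heat map from the first, cell by cell."""
    same_shape = len(first) == len(second) and all(
        len(a) == len(b) for a, b in zip(first, second)
    )
    if not same_shape:
        raise ValueError("both heat maps must have the same shape")
    return [
        [a - b for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(first, second)
    ]


def diverging_colour(value: float, max_abs: float) -> Tuple[int, int, int]:
    """Map a signed difference to an RGB triple (hypothetical gradient)."""
    if max_abs <= 0:
        return (255, 255, 255)  # no difference anywhere: plain white
    t = max(-1.0, min(1.0, value / max_abs))
    fade = round(255 * (1 - abs(t)))
    if t >= 0:
        return (255, fade, fade)  # first heat map dominates: towards red
    return (fade, fade, 255)  # second heat map dominates: towards blue


# Hypothetical activity counts for two players on a 2x2 grid.
player_a = [[3, 1], [0, 2]]
player_b = [[1, 1], [2, 0]]

diff = difference_grid(player_a, player_b)
max_abs = max(abs(v) for row in diff for v in row)
colours = [[diverging_colour(v, max_abs) for v in row] for row in diff]
```

A diverging gradient is used here because the difference is signed: cells where the two heat maps agree stay white, so only the disagreements draw attention, which is the advantage over superposition noted above.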

Figure 6. The three categories of visual designs applied to two heat maps [16].

Figure 7. The triangle of the categories of different applications [12].

Currently, there is a need for more interactive composition strategies that promote dynamic methods for visualisation and exploration. Such strategies not only focus on GIS research topics, such as map generalisation and static composition strategies, but they also emphasise the HCI perspective of how users can make the most of their interactions with maps [3]. Since the data of the project are multivariate, typical layer compositions might not be sufficient. Instead, there is a great need for a more dynamic approach that enables interactive visual analysis of the data [17]. The next section will provide an overview of one of the well-known methodologies of interactive information visualisation in the field of HCI. It is known as the Visual Information Seeking Mantra.

2.4 The Visual Information Seeking Mantra

Human perceptual abilities include scanning, recalling and detecting changes. These abilities are involved in the various visual design approaches. Ever since it was proposed by Shneiderman, the Visual Information Seeking Mantra has played a central role in the applications of information visualisation and has been widely used as a classical paradigm to effectively promote information exploration through interactive visualisation [4]. The mantra consists of three core tasks: overview, zoom and filter, and details-on-demand. Overview provides the overall context in which the entire dataset can be understood. This task is essential, since some patterns are simpler to recognise when the big picture of the data is seen. Zoom is the interactive technique of zooming to an area of interest to locate more details, while filter removes any uninteresting objects to provide a focused view. Lastly, details-on-demand provides dynamic interactivity without losing the general overview of the data by showing only the details requested by the user. Studies have shown that the mantra performs well in several applications, including maps [18]. For example, in a typical view of a map, the user first requires a general overview with the ability to zoom into a specific area, limiting the amount of information displayed. In most maps, more details can be displayed when the user demands it. In current visual displays, the tasks are often combined so that the map can be viewed more efficiently and can have a higher degree of interactivity. For example, some interfaces apply the overview+details concept to offer a spatially separated overview and a detailed view of the information space. Figure 8 illustrates an example of this approach taken from the OpenLayers mapping library. Other interfaces adopt focus+context, in which the general overview and the detailed view are shown in a single scene to facilitate vision-driven comparisons. The Blending Lens shown in Figure 5 (on the right) is an example of this approach. More details about the mantra and other interactive approaches are described in Section 3.4. The present project aims at using a combination of the mantra techniques, as well as other interactive techniques, to support visualisation with interactivity.
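To make the three tasks of the mantra concrete, the following is a minimal sketch of how they might map onto operations over a set of symptom reports. The `Report` record, its fields, the 0 to 3 severity scale and the sample values are hypothetical and serve only as an illustration; spatial zooming is approximated here by restricting the data to a set of postcode sectors.

```python
from dataclasses import dataclass
from datetime import date
from statistics import median
from typing import Dict, Iterable, List, Optional, Set


@dataclass(frozen=True)
class Report:
    sector: str    # postcode sector, e.g. "DE74 2"
    day: date      # date of the report
    symptom: str   # e.g. "Nose" or "Eyes"
    severity: int  # hypothetical 0-3 scale


def overview(reports: Iterable[Report]) -> Dict[str, float]:
    """Overview: one median severity per sector for all given reports."""
    by_sector: Dict[str, List[int]] = {}
    for r in reports:
        by_sector.setdefault(r.sector, []).append(r.severity)
    return {sector: median(values) for sector, values in by_sector.items()}


def zoom_and_filter(
    reports: Iterable[Report],
    sectors: Optional[Set[str]] = None,
    symptom: Optional[str] = None,
    start: Optional[date] = None,
    end: Optional[date] = None,
) -> List[Report]:
    """Zoom and filter: keep only the area, symptom and period of interest."""
    return [
        r for r in reports
        if (sectors is None or r.sector in sectors)
        and (symptom is None or r.symptom == symptom)
        and (start is None or r.day >= start)
        and (end is None or r.day <= end)
    ]


def details_on_demand(reports: Iterable[Report], sector: str) -> List[Report]:
    """Details-on-demand: the individual reports behind one sector, by date."""
    return sorted((r for r in reports if r.sector == sector), key=lambda r: r.day)


# Hypothetical sample data.
REPORTS = [
    Report("DE74 2", date(2016, 6, 1), "Nose", 2),
    Report("DE74 2", date(2016, 6, 3), "Eyes", 1),
    Report("CM3 6", date(2016, 6, 2), "Nose", 3),
    Report("CM3 6", date(2016, 7, 1), "Eyes", 0),
]

whole_picture = overview(REPORTS)
june_nose = zoom_and_filter(
    REPORTS, symptom="Nose", start=date(2016, 6, 1), end=date(2016, 6, 30)
)
focused_picture = overview(june_nose)
cm3_details = details_on_demand(REPORTS, "CM3 6")
```

In this sketch the overview is recomputed from the filtered reports, so the same aggregate view serves both the full dataset and the focused subset, while the individual records are only retrieved when a single sector is requested.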

Figure 8. A map showing the overview+details interface [19].
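The three mantra tasks can be read as three operations on the same dataset. The following is a minimal sketch under stated assumptions, not the implementation used in this project: the `Report` record, its fields and all function names are hypothetical and were chosen only to illustrate overview, zoom and filter, and details-on-demand on map data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Report:
    """One hypothetical symptom report located on the map (assumed record shape)."""
    sector: str    # area label, e.g. a postcode sector
    lat: float
    lon: float
    severity: int  # assumed integer severity score


def overview(reports):
    """Overview: summarise the whole dataset as a report count per sector."""
    counts = {}
    for r in reports:
        counts[r.sector] = counts.get(r.sector, 0) + 1
    return counts


def zoom(reports, south, west, north, east):
    """Zoom: keep only the reports inside a bounding box of interest."""
    return [r for r in reports
            if south <= r.lat <= north and west <= r.lon <= east]


def filter_out(reports, min_severity):
    """Filter: remove uninteresting reports below a severity threshold."""
    return [r for r in reports if r.severity >= min_severity]


def details_on_demand(reports, sector):
    """Details-on-demand: return full records only for the requested sector."""
    return [r for r in reports if r.sector == sector]
```

In an interactive map, `overview` would correspond to the initial view, `zoom` and `filter_out` to narrowing that view, and `details_on_demand` to a tooltip or side panel that is populated only when the user asks for it.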

2.5 Summary

This chapter presented the background research performed to understand the project’s related areas and current challenges. The general terms were defined, and the specific fields that the project will focus on were indicated. The chapter discussed spatio-temporal and time series data on maps, visual information comparisons on maps, and the Visual Information Seeking Mantra. In addition, some relevant examples were shown to further aid in understanding these fields as well as pointing out their present challenges.

3 Literature Analysis

3.1 Overview

Different techniques have been proposed for visualising time series data that are of a multivariate and spatial nature. This chapter presents an in-depth analysis that was performed on selected multivariate time series data visualisation techniques for the purpose of identifying the most convenient ones that were adopted in this project. In addition, time series data are usually represented in one of the Cartesian or polar coordinate systems. However, in order to determine which representation is better for the type of data in this project, the present researcher performed a detailed comparison of several evaluations from the literature to assess which of the coordinate systems outperformed the others in different requested tasks. In consideration of the importance of interaction in InfoVis, the chapter discusses user interaction techniques and re-emphasises the InfoVis mantra for the purpose of using the mantra as well as other suitable interactions. The chapter then outlines some current geo-analysis technique examples. Finally, as a result of the literature analysis, the research questions that this project tries to answer are outlined.

3.2 Examining Multivariate Time Series Data Visualisation Techniques

A survey of 101 visualisation techniques of time-oriented data was performed to reveal the different visualisation methods used for time-oriented data representation. These techniques varied in applicability from specific to very general domains [20]. For the purpose of this project, the criteria used to analyse the techniques included the spatial and multivariate nature of the data, linear or cyclic time arrangement, instant or interval time primitives, static or dynamic mapping and 2D space implementation. Ultimately, the scope of this project covered five visualisation techniques.

3.2.1 Discussion

Each of the five visualisation techniques is described below. Although the techniques perform visual associations, statistical measures of the associations were not provided in the original research. Thus, this criterion was not taken into account in the comparison.

- Small Multiples: This technique involves using a collection of thumbnail images, each visualising a different time point. The technique is general rather than specific and can be used with any type of graphical method. This technique is commonly used in map-based applications, as shown in Figure 9.

Figure 9. Small multiples showing American adult obesity levels for three different years [21] (see footnote 1).

- EventViewer: This technique was proposed as a framework to support the simultaneous visualisation and exploration of temporal and spatial patterns by triggering events that are defined as change units. Each event has its own spatial, temporal and thematic category that can be assigned to three graphic components: bands, stacks and panels [22]. A snapshot of the tool is shown in Figure 10.

1 The figure is for illustrative purposes only.


Figure 10. EventViewer interface [22].

- Ring Maps: This circular style visualisation technique involves surrounding a map with rings of time series data. Each ring depicts an attribute of a particular area on the map. Figure 11 shows an example of this technique.

Figure 11. A ring map representing time series data of health alert levels per American ZIP code [23].

- Icons on Maps: In this approach, existing techniques of time-oriented data visualisations are used to represent the temporal dimension in a spatial context. To create the map, the corresponding techniques are reduced in size so that the visualisations fit into a particular area on the map. If too many icons are present, icon positioning methods must be considered to avoid the occlusion of information. Figure 12 shows an example of this technique.


Figure 12. ThemeRiver icons represented on a map [8].

- VIS-STAMP: This visualisation system was developed to help analysts explore complex geo-visual patterns of multivariate spatio-temporal data using clustering, sorting and colouring. The tasks can be performed on a self-organising map (SOM) that is displayed along with three other views: a matrix view, in which columns represent time points and rows represent geographic regions; a map view, which implements the small multiples strategy on colour-coded maps that are distinguished by different time points; and a parallel coordinates view, which focuses on dealing with the multivariate data [24]. This system is shown in Figure 13.


Figure 13: VIS-STAMP system. The matrix view is on top left, small multiples view is on top right, the parallel coordinates view is on bottom left and SOM is on bottom right [24].

3.2.2 Comparison

Table 1 shows a detailed comparison of the advantages and disadvantages that are associated with each of the five visualisation techniques.

Table 1: Comparison of multivariate visualisation techniques

Small Multiples
Tools Available / Licenses: ✔
Pros:
- Can be applied to any graphical method
- Imposes the visual comparison and pattern trending among the data or objects captured at different time points
- Overcomes the complex overlying comparison methods that try to capture everything in a single display
- Resolves the over-plotting issues
Cons:
- Can show a large space
- Can result in less visible details if there are many images to display in thumbnail size
- Uses the same measure and scale across all thumbnails
- Can be challenging to compare specific locations or complex attributes

EventViewer
Tools Available / Licenses: ✔
Pros:
- Helps with the discovery of periodic patterns
- Allows users to make queries for events stored in an events database
- The user can specify a category of interest for a selected event among temporal, spatial and thematic categories
- Granularities of the category of interest can be modified to reveal more patterns
Cons:
- The spatial context is absent, which limits the ability to compare neighbouring areas

Ring Maps
Tools Available / Licenses: ✔
Pros:
- Represents a time series and a variable series in a single display
- Facilitates same area comparisons because it uses visual data categorisation based on regions, for example, postcodes
Cons:
- The number of rings and enumeration units must be limited to help with comprehension
- The loss of the spatial information of geographic units
- Since the spatial context is static, web-based ring maps that accept users’ interactions are needed

Icons on Maps
Tools Available / Licenses: ✔
Pros:
- Accepts any time series visualisation techniques
- Suitable for comparison of neighbouring areas
Cons:
- Must be integrated with techniques for icon positioning to prevent occlusion when too many icons are placed on the map
- The granularity of the information space might not match the granularity of the map
- Both the map area and the icon on that area might be resized. Thus, techniques for optimised information representation placement on an undistorted map must be applied

VIS-STAMP
Tools Available / Licenses: × (see footnote 2)
Pros:
- Provides multiple ways of detecting complex patterns across multivariate, spatio-temporal patterns
Cons:
- Might not be the best option for users who require easy and interactive mapping facilities
- Small objects may not be visible because the tool uses small multiples

3.3 Examining Cartesian and Polar Visualisation

Cartesian and polar coordinates are commonly accepted representations of data of all types, including time series data. Cartesian coordinates, usually in the form of line plots, demonstrate the temporal connections between the plotted data through a horizontal time axis. Although time is usually aligned along a visual axis, polar coordinates can be used to represent time using spiral geometry where the width of the spiral indicates the density of the data [20]. In order to decide which coordinate system to use to visualise the project’s data, a detailed comparison of the coordinates was done using four evaluations taken from the literature. Only one of the evaluations was performed on time series data. However, it is believed that having this type of comparison would justify the ultimate selection of the ideal coordinate system.
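To make the difference between the two layouts concrete, the sketch below places the same series on a horizontal time axis and on a spiral. It is an illustration under stated assumptions rather than a technique taken from the cited literature: the function names, the `period` parameter (samples per full turn) and the `ring_gap` spacing are hypothetical, and the data value is simply carried alongside each position so that a renderer could map it to the width of the spiral.

```python
import math


def cartesian_points(values):
    """Line-plot layout: time index on the x-axis, data value on the y-axis."""
    return [(float(i), float(v)) for i, v in enumerate(values)]


def spiral_points(values, period, ring_gap=1.0):
    """Spiral layout: one full turn per `period` samples, radius growing per turn.

    Returns (x, y, value) triples; the value is left for the renderer to
    encode, for example as the local width of the spiral.
    """
    points = []
    for i, v in enumerate(values):
        turns = i / period                  # completed turns so far
        angle = 2.0 * math.pi * turns       # position within the cycle
        radius = ring_gap * (1.0 + turns)   # successive cycles move outwards
        points.append((radius * math.cos(angle), radius * math.sin(angle), v))
    return points
```

With `period=7`, for instance, samples that are exactly one week apart fall on the same ray from the centre, which is what makes cyclic patterns visible in a polar layout, whereas the Cartesian layout keeps them seven steps apart on the time axis.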

3.3.1 Discussion

The coordinate systems were compared in four types of evaluation experiments. The evaluation experiments are described below. The details and conclusions of each experiment are provided in Table 2.

2 The support staff for VIS-STAMP was contacted to obtain permission to use the tool. However, they stated that the tool was no longer available for free use.

1. In this practical experiment, participants were shown screenshots while an eye-tracking system was used to monitor their eye movements. The participants then had to answer 18 questions [25].

2. Two studies were conducted using different numbers of participants. The first study was an online experiment in which participants were shown eight pairs of images and were asked to memorise the colour-coded positions/cells in the first image and to fill in the blank areas in the second image. Correctness and time were the dependent variables, while the different coordinate systems were the independent variables. The second study was an extended version of the first study that added visual context to the cells in the form of background patterns [26].

3. This experiment assessed the performance of time series visualisations during peak detection, temporal location and trend detection. Based on the small multiples concept, four glyph representations were used: Line, Stripe, Clock and Star glyphs [27].

4. In this randomised experiment, respondents received an email invitation once a month to visit a URL and complete a 30-min questionnaire. Participants were randomly shown one of five experimental arms in which each arm consisted of a bar or pie graph and a table. The participants were then asked three questions about the graph and table [28].

3.3.2 Comparison

To help reference the details of each evaluation in Table 2, each evaluation is given an index in the table that matches the index of its description above.

Table 2: Cartesian and polar coordinate systems comparison

Evaluation 1
Tools:
- Timeline Trees (TLT) used a Cartesian representation
- TimeRadarTrees (TRT) used radial representation
- Used an eye-tracking system
Domain:
- A sequence of transactions in information hierarchy
- The dataset was related to soccer
Population:
- 35 participants were divided into two groups: TLT (17 participants) and TRT (18 participants)
Background:
- Participants were students
Cartesian:
- Outperformed in counting questions with respect to correctness of answers and response times
Polar / Radial:
- More correct for the correlation questions
- The use of thumbnails worked better, as it was easier to distinguish locations in the radial layout

Evaluation 2
Tools:
- A sequence of webpages that contained visualisations in Cartesian or radial coordinate systems
Domain:
- Colour-coded cells to memorise
Population:
- First experiment had 674 participants
- Second experiment had 21 participants
Background:
- First experiment participants came from a mix of different backgrounds
- Second experiment participants had an academic background
Cartesian:
- First experiment: 60% answered correctly
- First experiment: participants found that it was easier to memorise three cells
- First experiment: outperformed in answer time (average answer time was 1417 ms)
- Second experiment: perception accuracy/learning effect increased at the beginning of the experiment
Polar / Radial:
- Appropriate for focusing on a particular dimension
- First experiment: 55% answered correctly
- First experiment: participants found that it was easier to memorise a single cell (the centre point helped when one cell was highlighted)
- First experiment: participants found that it was harder to learn (performed better in the second four visualisations)
- First experiment: average answer time was 1661 ms
- Second experiment: perception accuracy/learning effects increased at the beginning of the experiment

Evaluation 3
Tools:
- Four glyphs, Line (LIN), Stripe (STR), Clock (CLO) and Star (STA), were used to display time series data
Domain:
- Financial stock data
Population:
- 24 participants (median age 24 years)
Background:
- Participants were students
Results per glyph:
Task                                Measure      LIN      STR      CLO      STA
Peak Detection (value comparison)   Accuracy     96%      60%      69%      34%
Peak Detection (value comparison)   Efficiency   8 s      16.9 s   18.6 s   28.2 s
Peak Detection (time comparison)    Accuracy     24%      43%      77%      79%
Peak Detection (time comparison)    Efficiency   27.6 s   25.5 s   15 s     17.7 s
Trend Detection                     Accuracy     63%      25%      39%      31%
Trend Detection                     Efficiency   26.2 s   23.7 s   27.1 s   25.5 s

Evaluation 4
Tools:
- Webpages contained questionnaires about bar/pie chart and tables
Domain:
- Self-assessment health data
Population:
- 897 participants (age 18 years and older)
Background:
- Members of the American Life Panel
Cartesian:
- 41.0% correct answers in finding the largest category in the graph
- Performed better in finding the smallest category in the graph (75.8% correct answers)
Polar / Radial:
- Performed slightly better in finding the largest category in the graph (41.9% correct answers)
- 72.1% correct answers in finding the smallest category in the graph

3.4 User Interaction Techniques

The previous section focused on the representation of the information. Along with actual visual representation, interaction is also a core component of InfoVis. Interaction is a powerful method that can be used to gain more insight into data. However, despite the persistent recognition of the importance of interaction, it does not receive enough attention, especially in time series visualisation [29]. Many articles have intensively covered and conducted empirical evaluations of various data representation techniques, yet only a few papers have focused on interaction. Researchers have argued that interaction is an intangible concept because it can take place even with static images. This is because humans tend to mentally interact with images by navigating their eyes and focusing on details through passive interaction. However, there is a need for more research about interaction and how its techniques can be used as supportive methods in representations [30]. For example, in an InfoVis system, interaction techniques strengthen the corresponding visualisation techniques. Interactions also give the user the ability to alter the view, which could lead to further interpretations of the data [30].

The Visual Information Seeking Mantra is the foundational base of many techniques that use interactive visualisation. However, some papers have proposed techniques that are more widely used. In GIS applications, for example, highlighting and using tooltips are common interaction techniques. Highlighting manages how users view the dataset by directing their attention to a specific part(s), while tooltips interact with users by providing textual details as a result of hovering over an item in the interface. Other interaction techniques that have been associated with spatial contexts include select, explore, reconfigure, encode, abstract/elaborate, filter and connect [30].

This project aims at emphasising visualisation interaction by selecting relevant interaction techniques. The following section discusses different examples of visualisation techniques to aid geo-analysis.

3.5 Geo-Analysis Techniques

Various techniques can be used to visualise spatio-temporal data. The following are some examples of techniques that visualise temporal data, spatial data or both:

- A Google calendar-like display: This visualisation can be used to show temporal data only; the spatial element does not exist. Data for each day are displayed in pie charts in small multiples fashion. Users can navigate the date using forward and back buttons. Figure 14 shows an example of this technique.

Figure 14. A Google calendar-like interface for displaying temporal data [31].

- The medical cost of hip replacements by state: Two variables were visualised on a map of the United States (US). The map was divided by states, and the variables were placed as icons on each state. The temporal element does not exist. Figure 15 shows an example of this tool.


Figure 15. Medical cost of a hip replacement by state [32].

- Historic weather-station data for the UK: An animated visualisation of historical climate data using small multiples of a map of the UK shown along with a corresponding timeline (Figure 16).

Figure 16. Historic weather-station data for the UK [33].

- Mortality visualisation: An interactive dashboard is used to visualise the mortality rate in the US. Users are able to filter the results on the map based on a variety of criteria. A snapshot of the dashboard is displayed in Figure 17.

Figure 17. The US mortality visualisation [34] (see footnote 3).

3.6 Research Questions

As an outcome of the literature analyses, the following four research questions were established:

1. Considering the multivariate and spatial nature of the project’s data, the comparative types of tasks and the ultimate goal of learning from the data, can a single visualisation technique be used while considering the requirements mentioned, the nature of the data and the relevant interaction techniques?

2. How can we apply the Visual Information Seeking Mantra considering the requirements mentioned (refers to Q. 1), the nature of the data and the relevant interaction techniques?

3. What is the best spatial granularity, such as street level, postcodes, or regions, considering the mentioned requirements (also refers to Q. 1), the nature of the data and the relevant interaction techniques?

3 The figure is for illustrative purposes only.

4. Which coordinate system is suitable for representing multivariate time series data?

3.7 Summary

This chapter has provided an elaborate analysis from the literature in order to eventually derive the research questions. The chapter began by examining multivariate time series data visualisation techniques. Then it presented a detailed comparison between the two well-known coordinate systems that are also used to represent time series data. The chapter then went on to discuss user interaction techniques that hold a similar level of importance as representations in InfoVis. After that, several examples of geo-analysis techniques were highlighted. Finally, the analysis resulted in initiating the research questions that the project will try to answer.

4 Research Methods

4.1 Overview

This chapter describes the research methods followed in the project. It begins by elaborating the project’s methodology and detailing the choices of the visualisation techniques used. Then, the chapter explains the evaluation methodology of the project including the type of the evaluation methods used, the details of the evaluation tasks, the metrics and the strategy of the experimental design.

4.2 Methodology

A rational conclusion that can be derived from the conducted analysis is that no visualisation technique is demonstrably better than the others. Each technique has its own advantages and disadvantages, and therefore, these techniques are usually used in combination. For example, many applications tend to use multiple views in which each view represents a different visualisation technique that can be altered by performing one or more interaction techniques [35]. Another common implementation is to use dashboards as interactive interfaces that provide different options and views in one display. The tool illustrated in Figure 17 is one such example of a dashboard.

This project will follow a hybrid approach to realise a collective advantage, maximise usability and interactivity, and answer the research questions. Dashboards encompassing maps with two visualisation techniques in different views will be developed to answer the first research question. The application of the mantra paradigm will be emphasised to address the second research question. A spatial granularity level will be selected to answer the third research question. Both Cartesian and polar coordinate systems will be used to answer the fourth research question.

On the basis of the analysis of the visualisation techniques performed in the last chapter, two methods have been chosen. These methods will be implemented in two different views. The goal of creating two different visualisation techniques is to enrich the user experience by enabling different ways of visualising the allergy symptoms data on maps. Additionally, both techniques will be analysed to determine which of them helps in pattern detection and provides better learning from the data. The choice of the two visualisation techniques was made based on the literature analysis of multivariate time series data visualisation techniques conducted in the last chapter. Although the analysis concluded that no technique exceeds the others and that each has its own strengths and limitations, the choice of the two techniques was affected by a number of factors. The first factor is simplicity, which is particularly essential since the intended users of the widget are not specialists in geo-analysis. Therefore, the visualisation techniques that provide deep visual analytics, including EventViewer and VIS-STAMP, were eliminated. The second factor is the comprehension of the spatial information on the map. This eliminated ring maps, since one of their limitations is the loss of associated information about the geographic units. In other words, ring maps have a limited representation of spatial topology, that is, the details of how geographic features such as points and polygons are linked in the map [36]. This narrowed the choice down to two visualisation techniques. Since the characteristics of these two remaining techniques suited the project’s nature, the choice of techniques was finalised. The first visualisation technique is a full map that follows the same concept as icons on maps but with a thematic type of mapping instead of icons (see footnote 4), and the second visualisation technique is small multiples.

Each visualisation technique represents a different view in the project’s tool (see footnote 5). The maps in both views are built based on the choropleth map concept. A choropleth map is a thematic map used to classify quantitative data on a map. Statistical measurements are used to aggregate the data and define the number of classes to be symbolised in the map in the form of area shading in order to indicate the density of the visualised attribute. Figure 18 shows an example of a choropleth map to illustrate its concept [37]. The colour progression was chosen as a single hue progression; the density is depicted based on the colour shade, in which the lightest shade represents the lowest number and the darkest shade represents the largest number. The median was selected as the statistical measurement to specify the number of classes (i.e. the number of shades). Generally, the classes depict the severity level of the symptom by providing an exact value and a colour range. The severity level of the symptom as submitted from the mobile application ranges from zero to three. The level of spatial granularity that should be shaded is indicated by postcode sectors. This granularity level was chosen since the postcode sectors are manageable and, in most cases, are visible areas on the map with sufficient space to render data when they are received. In contrast to univariate choropleth maps that depend on analysing a single variable, the project’s choropleth maps for both visualisation techniques are multivariate, enabling one of four different attributes to be chosen for analysis. There will be a number of interaction techniques, including overview+details, a zoom pane to zoom in or out, side panels to provide menus and additional information, and details-on-demand tasks, such as providing allergy symptom data in textual format upon mouseover.

4 The icons, in the form of pie charts, on the map did not meet the project’s requirements. More details are described later in the implementation chapter.
5 Both ‘tool’ and ‘widget’ will be used interchangeably to refer to the implemented software.
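The choropleth classification described above (median severity per postcode sector, mapped to a single-hue shade) can be sketched as follows. This is only an illustrative sketch under stated assumptions: the input shape of `(sector, severity)` pairs, the four hex colours in `SHADES`, and the rule that truncates the median to pick a shade are all hypothetical and are not taken from the project’s implementation.

```python
from statistics import median

# Hypothetical single-hue progression, lightest (lowest) to darkest (highest).
SHADES = ["#eff3ff", "#bdd7e7", "#6baed6", "#2171b5"]


def sector_medians(reports):
    """Group (sector, severity) pairs and take the median severity per sector."""
    grouped = {}
    for sector, severity in reports:
        grouped.setdefault(sector, []).append(severity)
    return {sector: median(values) for sector, values in grouped.items()}


def shade_for(value):
    """Map a median severity in the range 0 to 3 onto one shade.

    Assumed rule: truncate the median to an integer class and clamp it to
    the darkest available shade.
    """
    index = min(int(value), len(SHADES) - 1)
    return SHADES[index]


def choropleth_fill(reports):
    """Return the fill colour for every postcode sector that has data."""
    return {sector: shade_for(m) for sector, m in sector_medians(reports).items()}
```

Switching the visualised attribute, as the multivariate maps allow, would amount to calling `choropleth_fill` on the reports of a different symptom.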

The development method of the project follows an agile approach in which a working software product is released several times; each release meets the minimum requirements. This approach was chosen to allow users to be engaged during the development process, as their acceptance was needed after each release. Furthermore, rapid feedback facilitated responses to change, thus ensuring that the application meets the users’ demands. The details of the two views are described next.

Figure 18. A choropleth map.

4.2.1 Full Map

The first view implements a full choropleth map of the medians of the allergy symptoms in postcode sectors with ‘General Wellness’ as the default attribute. The attribute can be changed to any of the three allergy symptoms based on the user’s selection. In addition, a line chart is provided on demand. This emphasises the temporal dimension in the spatial space by enabling symptoms’ comparison in any specified postcode sector over any selected time range. Furthermore, to aid time series data visualisation, an interactive time slider is provided to allow the user to choose any time range or to visualise the historic time series data of the specified allergy symptom in all areas by sliding the slider and observing the change in the map. By default, the choropleth map displays the symptoms over the entire time range. The user is able to aggregate the data from one day to any larger time range. The mantra tasks are present as well as other user interactions.
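The time slider and the on-demand line chart both reduce to selecting records by date and aggregating them. The sketch below illustrates that idea only; the `(day, sector, severity)` record shape and both function names are assumptions made for the example, not the widget’s actual data model.

```python
from datetime import date
from statistics import median


def window(records, start, end):
    """Keep the records whose day lies in the inclusive range set on the slider."""
    return [r for r in records if start <= r[0] <= end]


def daily_series(records, sector):
    """Median severity per day for one postcode sector, ordered by day.

    The result is the kind of series a line chart could plot on demand.
    """
    by_day = {}
    for day, rec_sector, severity in records:
        if rec_sector == sector:
            by_day.setdefault(day, []).append(severity)
    return [(day, median(values)) for day, values in sorted(by_day.items())]
```

For example, `window(records, date(2016, 4, 1), date(2016, 4, 30))` would restrict the map to April, and passing that result to `daily_series` would give the April trend for a single sector.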

4.2.2 Small Multiples

The second view adopts the small multiples technique. A number of maps are visualised, with each map representing a timestamp. Similar to the first view, each map is a choropleth map with the possibility of filtering the attribute visualised on the maps. The small multiples are synchronised in their interactions to support better comparisons across time and space. Owing to the limited screen space, a specific number of maps are displayed. The mantra tasks are present as well as other user interactions.
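One way to obtain a fixed number of maps from a time range is to split the range into consecutive panels of near-equal length. The following sketch shows that idea under stated assumptions: the function name, the choice of day-level granularity and the equal-length split are hypothetical and are not necessarily how the project assigns timestamps to maps.

```python
from datetime import date, timedelta


def panel_ranges(start, end, panels):
    """Split the inclusive range [start, end] into `panels` consecutive day ranges.

    Any leftover days are given to the earliest panels, one day each.
    """
    total_days = (end - start).days + 1
    if panels < 1 or panels > total_days:
        raise ValueError("panels must be between 1 and the number of days")
    base, extra = divmod(total_days, panels)
    ranges = []
    cursor = start
    for i in range(panels):
        length = base + (1 if i < extra else 0)
        last = cursor + timedelta(days=length - 1)
        ranges.append((cursor, last))
        cursor = last + timedelta(days=1)
    return ranges
```

Each returned range would then drive one choropleth map, and a shared zoom level or selected attribute applied to every panel would keep the small multiples synchronised.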

4.3 Evaluation Methodology

This section explains in detail the methodology that was applied to evaluate this project. The purpose of the evaluation was to assess users’ performance in each visualisation technique in order to determine the usability of the tool and identify whether the techniques enable users to compare, learn, and extract knowledge from the data shown on the map to make future decisions. In addition, an emphasis was made on interactivity, which is an essential aspect of time series data visualisation that has not yet been researched adequately [29]. To reach this objective, a summative usability evaluation was performed. The evaluation methods were both qualitative and quantitative. Qualitatively, a directed usability test using the ‘Think Aloud’ method was undertaken to assess users’ performance on specific tasks by measuring the effectiveness and efficiency when using each of the two visualisation techniques. Quantitatively, the evaluation included direct enquiry using questionnaires to test participants’ satisfaction. The usability testing, the approach of the experimental design and the tasks that were assessed are explained next.

4.3.1 Usability Testing

Interfaces are usability tested in order to improve their quality by discovering the strengths and flaws that users find in them. Generally, usability testing encompasses symbolic tasks performed by representative users for testing prototypes, enhanced systems or working versions of software prior to their release [38]. A usability study was conducted for this project after the tool’s implementation was finished, for the purpose of determining which of the two visualisation techniques that the tool presents performs better in visualising time series data and helps more in detecting spatio-temporal patterns. Prior to starting the study, the relevant Computer Science research ethics requirements for the evaluation were met. The study involved recruiting 20 participants, and it consisted of a directed evaluation test with an estimated maximum time of 45 minutes. The evaluation included a post-task questionnaire to measure participants’ satisfaction. The participants were given a 10-minute training session before starting the test, in which the project’s objectives were described and a tool walk-through was conducted to explain its main functionalities and features. The directed evaluation adopted the ‘Think Aloud’ methodology, in which the participants say aloud how they feel about the tasks while performing them. In the ‘Think Aloud’ method, participants are encouraged to verbalise their comments and thoughts, which the evaluator writes down while continuing to observe their performance [39]. The sessions were screen and audio recorded.

4.3.2 Tasks and Metrics

The participants were asked to complete nine tasks. Each group of three tasks belonged to a specific category. Each category represented a learning outcome. The outcome of the first category was to extract knowledge from the visualisation technique by testing whether the participants are able to find specific information. The outcome of the second category was to test to what extent the visualisation technique enabled performing spatial comparisons. The third category focused on the temporal dimension of the visualisation technique by testing whether it helped in conducting comparative tasks on time series data. These outcomes were examined through the three tasks in each category. In addition, each task corresponded to one of the InfoVis mantra tasks so that the implications of the mantra could be examined. Table 3 lists the tasks, their categories and the corresponding mantra task for each. The tasks were then assessed against three usability metrics: efficiency, which measured the completion time of each task; effectiveness, which measured the accuracy and completeness with which users achieved the tasks’ goals; and satisfaction, which was collected through a five-point Single Ease Question (SEQ) that followed the Likert scale at the end of each task. The visualisation techniques formed the evaluation conditions. The first visualisation technique, namely the full map view, will be referred to as Condition ‘A’, while the second visualisation technique, namely the small multiples view, will be referred to as Condition ‘B’. The three collected metrics represent the dependent variables or the effects, whereas the conditions represent the cause or the independent variable in the evaluation.
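The three metrics can be summarised per condition once each completed task has been logged. The sketch below is a hypothetical illustration, not the analysis script used in this project: the trial dictionary keys are assumptions, and effectiveness is simplified to the share of correctly completed tasks.

```python
from statistics import mean


def summarise(trials):
    """Summarise logged trials per condition ('A' or 'B').

    Each trial is assumed to be a dict with the keys 'condition',
    'seconds' (completion time), 'correct' (bool) and 'seq' (1 to 5).
    """
    grouped = {}
    for trial in trials:
        grouped.setdefault(trial["condition"], []).append(trial)

    summary = {}
    for condition, rows in grouped.items():
        summary[condition] = {
            # Efficiency: mean completion time in seconds.
            "efficiency_s": mean([r["seconds"] for r in rows]),
            # Effectiveness: fraction of tasks answered correctly.
            "effectiveness": sum(1 for r in rows if r["correct"]) / len(rows),
            # Satisfaction: mean Single Ease Question score.
            "satisfaction": mean([r["seq"] for r in rows]),
        }
    return summary
```

Comparing `summary["A"]` with `summary["B"]` then gives the per-condition view of the dependent variables described above.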

Table 3. The usability test’s tasks.

Each entry gives the task number, the corresponding mantra task in parentheses and the task description.

Task category: Extract knowledge
- Task 1 (Overview): Name a region/postcode that has severe Nose incidents just by looking at the map
- Task 2 (Zoom and Filter): What is the density range of Eyes incidents in postcode sector LD3 8 (Brecon, Wales)
- Task 3 (Details-on-Demand): Which allergy symptom has the highest rate of incidents out of the four symptoms in postcode sector NE66 2 (Northumberland, England)

Task category: Compare (Space)
- Task 4 (Overview): What is the difference in density of Breathing between postcode sectors KA26 0 and KA26 9 (South Ayrshire, Scotland) just by looking at the map
- Task 5 (Zoom and Filter): What is the difference in the general wellness factor between postcode sectors PE10 0 (Bourne, England) and LN4 3 (Lincoln, England) in June (see footnote 6)
- Task 6 (Details-on-Demand): What is the median Eyes value of postcode sector OX15 5 (Banbury, Oxfordshire, England) in April

Task category: Compare (Time)
- Task 7 (Overview): What time period witnessed low incidents of breathing in England just by looking at the map
- Task 8 (Zoom and Filter): What are your observations regarding the general wellness factor from March to May in postcode sector KA26 0 (South Ayrshire, Scotland) (see footnote 7)
- Task 9 (Details-on-Demand): In May, what are your observations regarding the evolution of Nose symptom compared to other months

6 This is a spatio-temporal comparison task.

4.3.3 Experimental Design

Having two conditions and a population of participants that was adequate for the nature of the research study implied the application of the ‘Between-group’ experimental design. In this design approach, only one of the experiment conditions is revealed to each participant [38]. This was implemented by dividing the test into two versions. Half of the participants performed the test in one version and the other half performed it in the other version. Although the participants all experienced both conditions, the three categories were swapped in each version so that the category conducted in Condition ‘A’ in the first version was conducted in Condition ‘B’ in the second version. This helped when comparing the participants’ performance for each condition.

4.4 Summary

This chapter has discussed the research methods that were applied in order to answer the research questions. It described the project’s methodology, which is divided into two parts. Each part represents a visualisation technique; the two techniques were compared with each other and against the nature of the project’s data and its objectives. Then, the chapter detailed the methodology that was used in the evaluation of the project’s tool. The chapter described the type of the evaluation method used, the number and nature of the tasks and metrics and the approach of the experimental design. The next chapter moves on to describe the project’s implementation.

7 This is a spatio-temporal comparison task.

5 Project Implementation

5.1 Overview

This chapter discusses the implementation phase of the project. It first describes the technology used and justifies its choice. It also explains the project’s architecture. Then, the chapter describes the characteristics of the project’s data. After that, the chapter elaborates on the development process.

5.2 Technology and Architecture

A web widget was developed for the #BritainBreathing project. Since the widget would be used by non-specialist users who probably seek interactivity and simplicity rather than advanced visual analytics, the technology choice was to go with open source JavaScript libraries that provide different facilities to build interactive maps instead of platforms such as ArcGIS or QGIS that provide wider options for intensive geo-analysis. PostgreSQL with the PostGIS spatial database extender was chosen as this project’s relational database. PostGIS enables storing and operating on geospatial data. It performs spatial operations internally in the database, and it can be used in GIS web applications through various libraries that support geospatial operations such as GEOS, PROJ.4 and GDAL.

The web framework that was used in the project is Django and its geographic module GeoDjango. Django is a Python web framework used for developing web applications while supporting rapid development and promoting software engineering principles such as abstraction and loose coupling. Django follows the Model-View-Template (MVT) architectural pattern. An MVT pattern divides the application into three layers. ‘Model’ refers to the data access layer, where the data are described in some defined Django or GeoDjango models after being retrieved from the database using the suitable database engine. ‘View’ represents the business logic layer that links the model(s) to the template(s). ‘Template’ is the presentation layer of the application [40]. GeoDjango is an application in the Django package whose purpose is to enable the creation of GIS web applications.

In order to use GeoDjango, two components should exist. The first component is the project’s spatial database engine, which was PostGIS. The second component involves the geospatial libraries GDAL and GEOS. GDAL, which stands for Geospatial Data Abstraction Library, helped in accessing different types of vector file formats to import them into the project’s GeoDjango models. The Geometry Engine Open Source library (GEOS) facilitated creating and managing geometric fields in the project’s GeoDjango models.

The main Model layer of the project was a number of GeoDjango models. These models were created using the Object-Relational Mapper (ORM), where the data stored in the project’s relational database was transferred into querysets. Each queryset represents a collection of objects that are stored in a model. The data in the models can then be migrated to the database, during any phase in the tool’s development, to update the database records. The View layer involved a set of Python functions; each specified a view, for which it requested information from the models. The information could then be processed before it was passed to the templates. The front-end of the application in the Template layer was a number of URLs and a set of templates. The URLs were the addresses of the project’s webpages. Using Django’s URL configuration, a number of patterns were created to help match the requested URL to the right view.

The templates are HTML documents that were primarily developed using Leaflet. Leaflet is an open source lightweight JavaScript mapping library that comes with a wide range of third-party plugins to extend its functionality. In combination with Leaflet, the Highcharts open source JavaScript charting library was used to add interactive charts to the web application. As its basemap, the web application utilised an open source tiled web map provided by ArcGIS that is compatible with Leaflet tile layers. Tiled web maps replace the traditional Web Map Service (WMS) method, in which a map is displayed as a single large image. Instead, a tiled web map follows the Tile Map Service (TMS) protocol, in which the map is displayed as a collection of tiles rendered from different resources. This ensures faster response time and better scalability. One example depicting these advantages is map panning, where the user in a tiled web map is able to pan the map without losing any of its parts, since the existing tile remains visible while the new tile is rendered.
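To make the tiling idea concrete, the following is a minimal sketch of how a geographic point maps to a tile index. It assumes the Web Mercator XYZ numbering that Leaflet tile layers use by default; the function names and the sample point are illustrative and are not taken from the project’s code.

```python
import math


def lat_lon_to_tile(lat_deg, lon_deg, zoom):
    """Return (x, y) indices of the tile that contains a WGS84 point.

    Assumes the Web Mercator XYZ scheme, in which y is counted from the top
    edge of the map; TMS counts y from the bottom edge instead.
    """
    n = 2 ** zoom  # number of tiles along each axis at this zoom level
    x = int((lon_deg + 180.0) / 360.0 * n)
    lat_rad = math.radians(lat_deg)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    return x, y


def xyz_to_tms_row(y, zoom):
    """Convert an XYZ row index to the equivalent TMS row index."""
    return (2 ** zoom - 1) - y


# Illustrative point near Manchester at zoom level 6.
tile = lat_lon_to_tile(53.48, -2.24, 6)
print(tile, xyz_to_tms_row(tile[1], 6))
```

Because each tile is addressed independently, panning only requires fetching the tiles that enter the viewport, which is consistent with the panning behaviour described above.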

The full architecture of the project is shown in the diagram in Figure 19. The architecture illustrates the three layers of the MVT pattern. The next section of this paper will focus on the nature and the sources of the project’s data.


Figure 19. The project Architecture.

5.3 Data

The project has two different sources of data. First, the data to be represented in the widget is a collection of historical and real-time allergy symptoms reports submitted by citizen scientists who participated in the #BritainBreathing initiative. Through a mobile application that was previously developed by the #BritainBreathing team and launched last March, citizen scientists were asked to access the app on a regular basis to provide their feelings as coarse inputs with three options: great, so-so, and bad. This reflected their ‘General Wellness’, which was one of the four analysed variables. Conditionally, they were required to provide a further description of the symptoms, namely, ‘Eyes’, ‘Nose’ and ‘Breathing’ as the other three variables. The reports are stored in a remote database and maintained by The University of Manchester IT department. Each submitted report consists of feeling, medication, and GPS coordinates; it also includes a number indicating the severity level of the symptoms. All the fields obtained from the mobile application and stored in the remote database, specifically in a table called remotedata, can be seen in the Entity-Relationship diagram in Figure 20.


Figure 20. Entity Relationship Diagram.

The other source of the data used in this project is the postcode sectors’ data. The data needed to be acquired from an open source provider in Shapefile format. A Shapefile is a well-known vector file format used to store the location, shape (such as polygons), and other attributes of geographic features [37]. After conducting some research on the available providers of open source map data for the UK, there were two options to choose from.

The first option was to obtain the data from Ordnance Survey (OS), which is a well-known national mapping agency and a producer of different types of UK map data and layers. The required data is provided in a package named Code-Point with Polygons. The package offers the ability to access different types of data related to the postcodes, such as postcode units and the geographic boundaries of the postcodes. However, the dataset is not publicly available and needs to be purchased. To obtain the data, access was granted to EDINA, which is a provider that supplies different online services for free access to benefit research and higher education in the UK. Particularly, Digimap, which provides maps and geospatial data for UK academia, including OS data and maps, was used to download the required dataset.

The second option was to obtain the data from Open Door Logistics. Open Door Logistics is a UK-based firm that provides open source spatial solutions for building systems with geographical features. It also produces different UK map data in a variety of formats acceptable in GIS applications and allows free access to them.

In order to decide between both sources, a comparison examining the properties of the data in each source was made. The first source is supported by OS, which updates its datasets regularly, making it more accurate than the second source. However, the size of the data in the first source was quite large, since there are many details provided that might not be necessary within the project’s scope. For instance, the dataset provides the postcode units of every postcode area in the UK with each in an individual Shapefile. These Shapefiles would have needed to be first combined into one Shapefile to be loaded as a vector layer on the map or to be added as a table in the database. However, besides the large size of the Shapefile, the map could have been cluttered with a lot of data that might have affected the overall performance of the application.

The other source provides a comprehensive Shapefile that contains all the UK postcode sectors with no further details about individual postcode units. This suits the level of detail that could be adopted in this project. In addition, the reachability of the second source, since it is open source, makes it more convenient for future maintenance. Knowing this, the second source was selected as the source of postcode data to feed the project’s map. The next section will detail the software development, which was implemented in an agile fashion.

5.4 Development Sprints

As mentioned earlier, the project has adopted agile software development that followed a time-boxed approach to build the software in an incremental manner, instead of the traditional sequential approach of building the software all at once. This choice was made since the dynamic nature of agile development suited the project, as the project’s requirements were not static, and rapid feedback was needed throughout the development. Prior to the development, the project’s requirements were broken down into user stories, which are short sentences, each of which describes a desired feature along with its justification. The stories were recorded in the product backlog, which is an on-going document that was frequently updated. A screenshot of the last version of the product backlog is shown in Appendix 1.

This section is divided into four subsections; each comprises a development sprint. In agile development, a sprint is a basic unit of development that is also called an iteration. The sprint life cycle starts with sprint planning to discuss and agree on a set of user stories to be implemented in the sprint and to set a high-level goal. This is followed by a fixed time-boxed development effort to implement these user stories. The duration of each sprint was decided to be two weeks. Finally, a sprint review is conducted for acceptance testing [41].

5.4.1 Sprint One

5.4.1.1 Planning

The goal of this sprint is to prepare and manage the data by implementing the appropriate data management processes. In addition, in order to show the technology’s suitability and to show sufficient understanding of the requirements, a minimum viable product with initial features should be created. The minimum viable product is a pilot experiment with the purpose of developing a small version of the product to test the validity of what are considered to be the product’s requirements, saving time and effort by avoiding the implementation of undesired functions [42].

5.4.1.2 Implementation

As mentioned in section 5.3, the project has two main data sources. The first source is the allergy symptoms reports submitted by the citizen scientists. These data reside in a remote database controlled by the University of Manchester IT department. The second source of the data consists of the polygons of the UK postcode sectors. In a map, a polygon represents a closed spatial geometric shape that consists of a set of X, Y coordinate pairs connected sequentially to form an area, as illustrated in Figure 21 [37].

In order to be prepared for development, the data passed through several stages. The first stage included importing the data from the two main sources. The remote data in the first source was imported as a comma-separated values (CSV) file. The initial collected dataset consisted of only two weeks’ data since the launch of the mobile application. Nevertheless, the size of the data was convenient for development-related tasks. The second source of data was acquired by downloading the polygons of the UK postcode sectors in Shapefile format from Open Door Logistics. After obtaining the required data, LayerMapping, which is a GeoDjango utility class built on the GDAL geospatial library, was used to map the Shapefile of the postcode sectors data into the project’s GeoDjango models. Using the Python CSV module, the CSV file of the allergy symptoms data was also imported into the project’s GeoDjango models. The geometric fields were created using the geospatial library GEOS.

However, the data needed to be processed before being stored in the PostgreSQL PostGIS database so that eventually each record of the allergy symptoms data would be connected to the right postcode sector. To reach this goal, the latitude and longitude of each record were converted into a geometric point, where a point is a single X, Y coordinate. It symbolises a geographic feature that cannot be represented as a line or area due to its small size [37]. Some of the points existed outside UK boundaries. This is because the citizen scientists committed to submitting their allergy symptoms even when they were outside the UK. Since the project is focused on the evolution of the allergy symptoms and what can be learned from them in the UK, the entries from outside its borders were ignored. The other points were then linked to their corresponding postcode sectors by checking whether the geometric point exists in that sector’s polygon. Finally, the data were migrated from the GeoDjango models to the PostgreSQL PostGIS database, which contains two tables for storing the allergy symptoms and the postcode sectors. The database schema structure is shown in Figure 22 (next page).
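The point-to-sector linking step can be illustrated with a small, library-free sketch. It uses the standard ray casting test as a stand-in for the GEOS containment check; the sector names, coordinates and function names are hypothetical, and the project’s actual implementation may differ.

```python
def point_in_polygon(x, y, ring):
    """Ray casting test: is (x, y) inside the polygon described by `ring`?

    A horizontal ray is cast to the right of the point; an odd number of
    edge crossings means the point lies inside.
    """
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges that straddle the ray's y value can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def assign_sector(lon, lat, sectors):
    """Return the name of the first sector containing the point, or None.

    Reports that fall in no sector (e.g. outside the UK) yield None and
    can then be ignored.
    """
    for name, ring in sectors.items():
        if point_in_polygon(lon, lat, ring):
            return name
    return None


# Hypothetical sectors drawn as simple squares.
sectors = {
    "AA1 1": [(0, 0), (2, 0), (2, 2), (0, 2)],
    "AA1 2": [(2, 0), (4, 0), (4, 2), (2, 2)],
}
print(assign_sector(1, 1, sectors), assign_sector(9, 9, sectors))
```

In a spatial database the same check would normally be delegated to an indexed containment query rather than a Python loop, which matters once the number of reports grows.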

Figure 21. A map polygon [37].


Figure 22: The project’s database schema

Upon preparation of the backend database and after installing the required JavaScript plugins for the frontend, the second part of the sprint was to implement a minimum viable product to show the basic functionality of the complete system. Initially, a web tile map from OpenStreetMap was used. OpenStreetMap is a large open source collaborative project for providing free and editable maps of the world. The map, in the form of a web Tile Layer, was used as the UK’s basemap and was loaded in a Leaflet map container. Leaflet accepts the JavaScript Object Notation (JSON) data format to feed its maps. JSON is a well-accepted data interchange format that uses data objects in the form of attribute-value pairs. Owing to its lightweight nature, readability and fast computer parsing, it is commonly used for asynchronous communications between the client and the server [43]. Figure 23 describes the JSON object syntax with a simple example. Using Python, the objects of the postcode sectors polygons were queried. Then, using Django’s serialisation framework, the objects were serialised and stored in a GeoJSON file that was added onto the Leaflet map. GeoJSON is a format that can represent geographic and non-geographic features based on the JSON format.
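As a rough illustration of the GeoJSON structure involved, the sketch below builds a FeatureCollection of multi-polygon features with Python’s standard json module. It does not use Django’s serialisation framework; the sector name, coordinates and function names are made up for the example.

```python
import json


def sector_to_feature(name, polygons):
    """Build one GeoJSON Feature for a postcode sector.

    `polygons` is a list of closed exterior rings, each a list of
    [longitude, latitude] positions whose first and last entries match.
    """
    return {
        "type": "Feature",
        "properties": {"name": name},
        "geometry": {
            "type": "MultiPolygon",
            # MultiPolygon nesting: polygons -> rings -> positions.
            "coordinates": [[ring] for ring in polygons],
        },
    }


def build_feature_collection(sectors):
    """Wrap all sector features in a GeoJSON FeatureCollection."""
    features = [sector_to_feature(n, p) for n, p in sectors.items()]
    return {"type": "FeatureCollection", "features": features}


# Hypothetical sector with made-up coordinates.
sectors = {"ZZ1 1": [[[-2.3, 53.4], [-2.2, 53.4], [-2.2, 53.5], [-2.3, 53.4]]]}
geojson_text = json.dumps(build_feature_collection(sectors))
print(geojson_text)
```

The resulting text is the kind of document a Leaflet GeoJSON layer can consume directly.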

Figure 24 shows a part of an object from the output file that represents a collection of GeoJSON features, where each feature is a multi-polygon containing the postcode sector’s name and coordinates. Multi-polygons were used to address each postcode sector that further includes postcodes as one feature in the map. However, for the sake of simplicity, the postcode sectors will be referred to as polygons only. Initially, the allergy symptoms data were aggregated based on the average of each of them in each postcode sector. This was then serialised into a JSON file along with the centroid point of each polygon, as can be seen in the snippet in Figure 25. Then, using the Leaflet Markercluster plugin, which facilitates drawing icons on maps, the JSON data were plotted in pie charts that were then placed on top of the already added postcode sectors by positioning them on the centroid points. The final result of the sprint, in Figure 26, shows the UK basemap, the postcode sector polygons bordered in blue, and the symptoms’ pie charts, in which blue represents ‘Eyes’, green represents ‘Nose’ and red represents ‘Breathing’. The flow of the data throughout the sprint is depicted in the chart in Figure 27.
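The per-sector record described above (average symptom values plus a centroid) could be assembled along the following lines. This is only a sketch: the centroid is computed with the standard shoelace formula rather than with GEOS, and the field names and sample data are assumptions, not the project’s schema.

```python
from statistics import mean


def polygon_centroid(ring):
    """Area-weighted centroid of a simple polygon (shoelace formula)."""
    area2 = 0.0  # twice the signed area
    cx = cy = 0.0
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        cross = x1 * y2 - x2 * y1
        area2 += cross
        cx += (x1 + x2) * cross
        cy += (y1 + y2) * cross
    if area2 == 0:
        raise ValueError("degenerate polygon with zero area")
    return cx / (3.0 * area2), cy / (3.0 * area2)


def sector_summary(name, ring, reports):
    """Average each symptom over a sector's reports and attach the centroid."""
    lon, lat = polygon_centroid(ring)
    return {
        "sector": name,
        "centroid": [lon, lat],
        "eyes": mean(r["eyes"] for r in reports),
        "nose": mean(r["nose"] for r in reports),
        "breathing": mean(r["breathing"] for r in reports),
    }


# Hypothetical sector and reports.
square = [(0, 0), (2, 0), (2, 2), (0, 2)]
reports = [
    {"eyes": 1, "nose": 2, "breathing": 3},
    {"eyes": 3, "nose": 2, "breathing": 1},
]
summary = sector_summary("ZZ1 1", square, reports)
print(summary)
```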


Figure 23. JSON object example.

Figure 24. The GeoJSON output.

Figure 25. The JSON output.


Figure 26. The first sprint result.

Figure 27. Data flow in Sprint one.

5.4.1.3 Testing

Following implementation, a sprint review meeting was held in which a demo of what was accomplished was conducted for user acceptance testing. The review resulted in feedback on the implemented stories and in new approaches towards the next sprint’s stories. The discussion points that were adopted in the second sprint are summarised below:

- The pie charts shouldn’t be displayed all at once, as this resulted in occluding icons on the map. Instead, it should be left to the user to reveal the charts by introducing interactive controls.

- The chart should have explanatory legends.

- Instead of outlining all the postcode sectors, only the ones that have data should be outlined.

- The ‘General Wellness’ factor should be visualised along with ‘Eyes’, ‘Nose’ and ‘Breathing’ symptoms.

- The density of the allergy symptoms and how they vary across the different regions could be displayed in the form of a choropleth map.

- The average of the symptoms per postcode should be substituted with the median. This is because the values are not distributed symmetrically but instead are based on users’ inputs, which vary, depending on how they feel. Thus, it is more sensible to use the median to better detect any general tendency in the data.
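The last point can be seen with a small numerical example. The severity values below are hypothetical; they only illustrate how a single high report pulls the mean upwards while the median stays with the bulk of the reports.

```python
from statistics import mean, median


def summarise(values):
    """Return the (mean, median) pair for a list of severity values."""
    return mean(values), median(values)


# Hypothetical reports for one sector: mostly mild, one severe.
severities = [0, 0, 1, 1, 5]
avg, med = summarise(severities)
print(avg, med)
```

With skewed inputs such as these, the median is less affected by the single extreme report than the mean.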

5.4.2 Sprint Two

5.4.2.1 Planning

The goal of this sprint is to reflect on the last sprint’s feedback and to further improve the interface by adding interactive features. This includes enabling the users to filter the allergy symptoms by choosing which one to explore solely on the choropleth map. Also, interactions should exist on clicking and/or hovering over the postcode sectors. Moreover, the pie charts should be displayed on user demand. The statistical measurement of the displayed allergy symptoms data should be the median of each symptom in each postcode sector.

5.4.2.2 Implementation

In order to implement the choropleth map, the GeoJSON backend had to be changed so that it included the values of the median of each of the three symptoms as well as the general wellbeing for each postcode along with its polygon value. Besides the allergy data, the new GeoJSON only included the postcode sectors where allergy data were submitted. In other words, the outputs of the two files implemented in the last sprint were joined into one file, discarding the postcode sectors that did not have allergy data. A sample object from the updated GeoJSON can be seen in Figure 28.

After updating the backend file, the frontend also had to be updated. Firstly, the colour scheme was selected with the help of COLORBREWER 2.0, which is an online tool that assists in choosing the right colour scheme in mapmaking tasks by providing several helpful features such as selecting the desired number of classes, the nature of the data, and the map’s context [44]. After the colour scheme was specified, a function was created to return the colours based on the severity of the symptoms, so that the lighter colours indicate zero to light symptoms, while the darker colours indicate that the symptoms were severe in the postcode sector of interest and require more attention. After that, the symptoms needed to be filtered to enable the user to control what attribute(s) would be displayed by default in the map’s overview. This was implemented by providing a dropdown menu to enable selecting among the different symptoms and updating the map accordingly. Figure 29 shows the choropleth map and the filtering dropdown menu along with a legend panel to view the colour scheme.
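A colour function of the kind described can be sketched as a lookup over class breaks. The project’s version lives in the JavaScript frontend; the Python sketch below only shows the logic. The palette is assumed to be a five-class sequential ColorBrewer-style scheme, and the break values are hypothetical, since the source does not list the classes that were used.

```python
import bisect

# Assumed five-class sequential palette, lightest to darkest.
PALETTE = ["#ffffb2", "#fecc5c", "#fd8d3c", "#f03b20", "#bd0026"]

# Hypothetical class breaks: [0, 1), [1, 2), [2, 3), [3, 4), [4, inf).
BREAKS = [1, 2, 3, 4]


def colour_for(median_value):
    """Map a sector's median severity to a fill colour.

    Lighter colours correspond to no or light symptoms, darker colours to
    severe symptoms.
    """
    return PALETTE[bisect.bisect_right(BREAKS, median_value)]


print(colour_for(0), colour_for(2.5), colour_for(5))
```

Keeping the breaks and the palette in two parallel lists means the legend panel can be generated from the same data as the map fill.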

Figure 28: The updated GeoJSON output.

Figure 29: The choropleth map and the dropdown menu to filter the symptoms.

In order to improve the user experience, interaction events were introduced. When the mouse hovers over a postcode sector, the sector is visually highlighted and its exact median value is shown in a side panel. When the mouse is moved out of the postcode sector, it returns to its previous un-highlighted state. When the postcode sector is clicked, the map zooms in to that postcode sector. Figure 30 illustrates a postcode sector being hovered over with its median value shown in the side panel in the upper right side of the window, while Figure 31 displays the postcode sector at a greater zoom level as a result of clicking it.

Figure 30. A postcode sector with a median well-being of 2 highlighted in grey as a result of hovering over it.


Figure 31. The postcode sector after clicking it.

In order to prevent the icon occlusions that result from presenting all the postcode sectors’ corresponding pie charts, as had occurred in sprint one, the pie charts were hidden and only displayed upon the user’s demand. To implement this feature, when the mouse hovered over the postcode sector, along with highlighting the sector, the pie chart of the median values of the ‘General Wellness’, ‘Eyes’, ‘Nose’ and ‘Breathing’ symptoms in that postcode sector was displayed on top of it. This can be seen in Figure 32. The pie chart faded away once the mouse was no longer hovering there. A descriptive legend for the pie charts was also added. The flow of the data throughout the second sprint is illustrated in the chart in Figure 33.

Figure 32. A pie chart is shown upon hovering over a postcode sector.


Figure 33. Data flow in Sprint two.

5.4.2.3 Testing

The features of the second sprint were demonstrated in the sprint’s review for user acceptance testing. The review resulted in several changes in the project’s methodology. The changes derived from a discussion of whether the polar coordinates in the form of pie charts helped in extracting knowledge from the multivariate allergy data, and whether this representation helped in presenting their time series nature, as posed in the fourth research question. The current pie charts allow a clear comparison of the proportions of the symptoms, letting the user determine which symptom is the most severe because it has the greatest median value. Nevertheless, identifying the evolution of these symptoms over time is problematic. A time slider can be used to enable users to visualise the pie charts at different time periods. However, this could be visualised via the choropleth map by observing the changes in the density colours. Also, visually comparing the pie charts for different time periods is not possible, as it would depend on recalling the previous values to compare with the new values. Although the fourth experiment in the literature analysis performed in section 3.3 indicated that pie charts performed better in finding the largest category, the lack of experiments comparing their performance on time series data with that of Cartesian coordinates in the form of line charts could lead to speculative assumptions. However, based on the user experience during the second sprint, the pie charts were not sufficient to extract the expected knowledge. Therefore, a decision was made to change the allergy data representation, which led to further changes in the project’s methodology. The post-sprint review points are listed below:

- Instead of pie charts, the allergy symptoms data should be represented in a line chart to emphasise the time dimension while also comparing the differences in the density of the symptoms.

- In order to filter the data based on the time they were recorded, a time slider should be introduced.

5.4.3 Sprint Three

5.4.3.1 Planning

The goal of the third sprint is to visualise the allergy symptoms in a line chart to enable their comparison across time. In addition, a time slider to aggregate the data based on any time range should be added. Interactions should be adjusted based on the new features.

5.4.3.2 Implementation

As a foundational step to introduce the time slider, the backend of the tool has been updated from static files to an API call via a server. Technically, once the slider is set to a specific time range, the allergy data should be aggregated over that time range. Consequently, fetching the data has been switched from using a static GeoJSON file to a more dynamic approach in which the data are provided through a URL, and the server updates the response at that URL, hence updating the choropleth map, each time the data is aggregated. The database has been updated to include a bigger dataset retrieved from the remote database. Following the server-side preparation, the time slider was added using the Leaflet time slider plugin LeafletSlider, which enables attaching a slider that can actively update the data on the map based on the time range selected. Descriptive labels showing the date and the exact time when the first report was submitted, and the date and exact time when the last report was submitted, were added to both ends of the slider. Additionally, a label displaying the currently selected time range was provided below the slider, as presented in Figure 34.
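The server-side aggregation triggered by the slider can be sketched as a filter followed by a per-sector median. The record layout, sector names and function name below are assumptions made for illustration; in the project this logic would sit behind the API endpoint and read from the database rather than from an in-memory list.

```python
from collections import defaultdict
from datetime import datetime
from statistics import median


def medians_in_range(reports, start, end):
    """Median value per sector for reports with start <= timestamp <= end.

    `reports` is an iterable of (sector, timestamp, value) tuples. Sectors
    with no reports in the range are omitted from the result.
    """
    by_sector = defaultdict(list)
    for sector, timestamp, value in reports:
        if start <= timestamp <= end:
            by_sector[sector].append(value)
    return {sector: median(values) for sector, values in by_sector.items()}


# Hypothetical reports.
reports = [
    ("ZZ1 1", datetime(2016, 5, 1), 1),
    ("ZZ1 1", datetime(2016, 5, 10), 3),
    ("ZZ1 1", datetime(2016, 6, 1), 5),
    ("ZZ1 2", datetime(2016, 7, 1), 2),
]
may = medians_in_range(reports, datetime(2016, 5, 1), datetime(2016, 5, 31))
print(may)
```

Because this work is repeated for every slider movement, its cost grows with the size of the dataset, which is relevant to the response-time feedback raised in the sprint review.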


Figure 34. The time slider set to a specific time range shown below it.

Using the JavaScript charting library Highcharts, a spline chart was created. A spline chart is a type of line chart that renders a curved line between the points in the data series. It was used to smooth out the data series to better distinguish peak occurrences. Contrary to pie charts, whose circular layout would spatially fit into any map region, the rectangular shape of line charts can easily hide the region underneath it. Also, displaying line charts with more than one data series on top of each affected region will result in a vast amount of data with fewer insights to build correlations from. In addition, the increased amount of information displayed at once does not necessarily imply better graphical perception [45]. Knowing this, the spline chart was placed in a side panel at the right side of the map. Once a postcode sector was clicked on, the spline chart should be unveiled to visualise the trend of the four attributes over the selected time period. Figure 35 shows the evolution of the symptoms for postcode sector DE74 2 from the beginning of May to the middle of July in the spline chart. Interactions were initiated such that a tooltip displaying the date and the value was shown when a data point was hovered over in any of the four data series. Furthermore, to aid with focused comparisons, any data series can be hidden and shown again by clicking its legend, as demonstrated in Figure 36.

Figure 35. Spline chart visualising the trend in the symptoms in postcode sector DE74 2.


Figure 36. Spline chart visualising the trend in ‘Nose’ and ‘Eyes’ symptoms in postcode sector CM3 6.

For the purpose of clarity, the map layer has been changed to a Leaflet extension of the ArcGIS tiled web map Esri.OceanBasemap. This layer provides a lighter background; hence a clearer choropleth map as can be seen in Figure 35 and in Figure 36. The new data flow that was followed in sprint 3 is displayed in Figure 37.

Figure 37. Data flow in sprint three.

5.4.3.3 Testing

The working software was demonstrated in the third sprint review. Consequently, the final result was approved in the acceptance test. The feedback to be addressed in the final sprint was as follows:

- As a result of increasing the dataset, the number of polygons on the map increased. Thus, the response time also increased. This was clearly seen in the time it takes to load the allergy data onto the map, zoom in and out, and drag the map. In addition, sliding the time slider takes time to progress. This is a result of having to aggregate the data and call the API to populate a new GeoJSON each time a new time range is selected. This performance issue should be fixed in the final sprint.

- The spline chart should have a fixed time granularity on its X-axis. This should later on be matched with the time granularity set for the other visualisation technique that will be created in the final sprint, namely, small multiples.

5.4.4 Sprint Four

5.4.4.1 Planning

The goal of the fourth sprint is to implement small multiples as another view in the tool. Additionally, the map polygons in both views should be optimised for the purpose of speeding up the map’s performance. The interactions should be further customised as required. Finally, the look and feel of the tool should be enhanced.

5.4.4.2 Implementation

Small multiples were principally created using the Leaflet plugin Leaflet.Sync. This plugin enables creating a synchronised view of two or more maps. The maps were divided at a time granularity of months, resulting in five small maps from March to July. The five choropleth maps were synchronised in different actions to help perform comparative tasks more accurately. For instance, once a map is zoomed in, zoomed out or dragged, the same action will be performed in all other maps. This helps users not lose track of the area they are interested in. It also minimises the divided attention effect. In addition, to facilitate deeper same-area comparisons, clicking a postcode sector in any map results in zooming to that postcode sector in all maps. A side panel was provided with a dropdown menu to filter the attributes and with the colour scheme legend information. The small multiples interface is shown in Figure 38.

In order to expedite the performance of the map, the polygon geometric shapes were simplified using the Douglas-Peucker algorithm. This algorithm replaces the original curves in the geometry with simplified ones. Using PostGIS, the algorithm was implemented by straightening out the wiggles in the original curves that varied from a straight line by less than a distance threshold of 0.10 meters. Figure 39 (left) shows an area in the map before implementing the Douglas-Peucker algorithm, while Figure 39 (right) shows the same area after applying the algorithm. The time it takes for the map to load data has been reduced from 15.29 seconds to 15.01 seconds. However, the map’s response to the mouse actions did not noticeably improve.
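For readers unfamiliar with the algorithm, the following is a compact, library-free sketch of Douglas-Peucker simplification. It is not the PostGIS implementation used in the project (the source does not name the exact database call); the function names and sample line are illustrative, and the tolerance is expressed in the same units as the input coordinates.

```python
import math


def _distance_to_line(p, a, b):
    """Perpendicular distance from point p to the line through a and b."""
    (x, y), (x1, y1), (x2, y2) = p, a, b
    dx, dy = x2 - x1, y2 - y1
    if dx == 0 and dy == 0:
        # a and b coincide (e.g. a closed ring): fall back to point distance.
        return math.hypot(x - x1, y - y1)
    return abs(dy * (x - x1) - dx * (y - y1)) / math.hypot(dx, dy)


def douglas_peucker(points, tolerance):
    """Simplify a polyline, dropping points closer than `tolerance`
    to the chord between the retained end points."""
    if len(points) < 3:
        return list(points)
    first, last = points[0], points[-1]
    index, max_dist = 0, 0.0
    for i in range(1, len(points) - 1):
        d = _distance_to_line(points[i], first, last)
        if d > max_dist:
            index, max_dist = i, d
    if max_dist <= tolerance:
        # Every intermediate point is a small wiggle: keep the chord only.
        return [first, last]
    # Otherwise keep the farthest point and simplify both halves.
    left = douglas_peucker(points[: index + 1], tolerance)
    right = douglas_peucker(points[index:], tolerance)
    return left[:-1] + right


line = [(0, 0), (1, 0.05), (2, -0.05), (3, 0), (4, 3)]
simplified = douglas_peucker(line, 0.1)
print(simplified)
```

The tolerance controls the trade-off: a larger value removes more vertices and speeds up rendering at the cost of coarser sector boundaries, whereas a very small value leaves the geometry, and hence the load time, largely unchanged.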

The time granularity for the X-axis in the spline chart was set to months. Nevertheless, due to the reduced amount of data from some postcode sectors, these data could not be plotted in the chart, as the chart has fixed durations. Displaying such data was only possible by not fixing the granularity. The data flow in this sprint was not changed.

Figure 38. Small multiples.


Figure 39. An area in the map before (left) and after (right) implementing the Douglas-Peucker algorithm

5.4.4.3 Testing

The features that were produced in the final sprint were demonstrated in the sprint review, resulting in final approval for both views provided by the tool. Despite the limitations that will be discussed later in Section 8.2, the tool was ready to be evaluated.⁸

5.5 Summary

This chapter has described the project’s implementation. The chapter began with a description of the technology chosen to implement the project’s tool. It also stated the type of the project architecture and explained its structure. Then, the types of the project’s data, its sources, and the data acquisition processes were explained. The chapter then focused on software development by dividing its fourth section into four further subsections. Each subsection represented an agile sprint, which is a development iteration that produces working software at its end. Each sprint consisted of three phases. The first phase was the planning phase, in which the requirements in terms of user stories were updated and the sprint goal was set. The second phase was the implementation phase, in which the software was developed. The last phase was the testing phase, which presented the sprint for review. In each review, the finished product was demonstrated to collect feedback and reflect on it. In the next chapter, the tool will be empirically evaluated via usability studies.

⁸ The full source code of the project is provided in GitLab, which is the web-based Git repository management service of the School of Computer Science [53].

6 Project Evaluation

6.1 Overview

The purpose of this chapter is to discuss the analysis and results of the evaluation methods. As previously mentioned, the evaluation data were acquired both qualitatively and quantitatively via a directed usability test and direct enquiry using questionnaires. The chapter is divided into two sections. The first section describes the demographics of the participants, while the second section discusses the evaluation analysis and results.

6.2 Demographics

The usability study was conducted with 20 participants (8 male, 12 female) from the University of Manchester student population, with a median age of 26. The participants had different academic backgrounds, including Neuroscience, Computer Science, Physics, Business and Social Sciences. Five of the participants were PhD students and fifteen were Master’s students. None of the participants reported colour vision deficiencies.

6.3 Analysis and Results

This section turns to the analysis performed on the data collected from the evaluation methods. The statistical software used to conduct the analysis was SPSS, which is a well-known software package for statistical analysis. To measure effectiveness, task completeness was assessed. This was a Boolean value of one if the participant was able to successfully answer the task, and zero if the answer was incorrect or the participant quit the task. The average completion rate of each task per condition was calculated. The column chart in Figure 40 illustrates the result. The X-axis represents the tasks, while the Y-axis shows the mean values. Each task has two columns for the two conditions. To test whether the completion rate differed significantly between Conditions ‘A’ and ‘B’, Fisher's exact test was used. The test was selected because the sample size is small and each observation has a value of either one or zero. When the mean values of the tasks in each condition were compared, the test showed no significant differences. The test results are reported in Table 4.
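As an illustration of how such a comparison can be computed, the following is a minimal Python sketch of a two-sided Fisher's exact test on a 2x2 table of completed and not completed tasks per condition. The analysis in this project was carried out in SPSS; the function names and the counts below are hypothetical and are not the study’s data.

```python
from math import comb


def contingency(outcomes_a, outcomes_b):
    """Build the 2x2 counts (success_a, fail_a, success_b, fail_b)
    from per-participant completion values of one or zero."""
    sa, sb = sum(outcomes_a), sum(outcomes_b)
    return sa, len(outcomes_a) - sa, sb, len(outcomes_b) - sb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x successes in the first row,
        # given fixed row and column totals.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables at least as extreme as the observed one.
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, p)


# Hypothetical outcomes for one task: 8 of 10 completed in condition A,
# 10 of 10 completed in condition B.
condition_a = [1, 1, 1, 1, 1, 1, 1, 1, 0, 0]
condition_b = [1] * 10
p_value = fisher_exact_two_sided(*contingency(condition_a, condition_b))
print(round(p_value, 3))  # → 0.474
```

With these hypothetical counts the difference would not be significant at α = 0.05, which shows why small samples with near-perfect completion rates rarely produce significant results under this test.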


[Column chart “Mean Values of Completion Rate”: the X-axis shows tasks 1 to 9, each with columns for conditions A and B; the Y-axis shows the mean, from 0 to 1.2.]

Figure 40. The mean values of completion rate in each condition.

To measure efficiency, the completion time of each task per condition was analysed. To achieve this, the task time in seconds was first logged for every participant. Since the data were collected from two independent samples using a ‘between-group’ design, the assumption that the data are normally distributed had to be examined before selecting the right statistical test. This was achieved by performing the Shapiro-Wilk normality test on the data. The alpha level (α) was set to 0.05, so that a deviation from normality would be considered significant if the p-value was less than 0.05. The results of the test showed that the time data differ significantly from the normal distribution, with a significance value of p < .001 (displayed by SPSS as .000). Therefore, the assumption was not met. This result is illustrated by the histogram in Figure 41 and the Q-Q plot of time in Figure 42.


Figure 41. A histogram showing Shapiro-Wilk test result on the task time.

Figure 42. A Q-Q plot showing the result of Shapiro-Wilk test on the task time.

Having established that the data are not normally distributed, Mann-Whitney's U test was conducted on the data. Mann-Whitney's U test, which is a rank-based test, was used to compare the task completion times in each condition and to determine whether the difference between them is statistically significant. Table 4 presents the summary statistics for the test. The results indicated that five out of the nine tasks are significantly different; namely, tasks 2, 3, 4, 7 and 9. Task 1 and task 8 are marginally significant (p < 0.1). These results indicate that task performance differs between the two conditions. Figure 43 shows the mean values of all nine tasks in each condition, illustrated in column charts with error bars representing a 95% confidence interval.
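The rank-based comparison described above can be sketched in Python as follows. This is a simplified illustration and not the SPSS procedure used in the project: it computes the U statistic with average ranks for ties and a two-sided p-value from the plain normal approximation, without the tie or continuity corrections or exact methods that statistical packages typically offer. The function names and the sample times are hypothetical.

```python
from math import erfc, sqrt


def average_ranks(values):
    """Return 1-based ranks, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def mann_whitney_u(sample_a, sample_b):
    """Return (U, two-sided p) using the normal approximation."""
    n1, n2 = len(sample_a), len(sample_b)
    ranks = average_ranks(list(sample_a) + list(sample_b))
    rank_sum_a = sum(ranks[:n1])
    u1 = rank_sum_a - n1 * (n1 + 1) / 2
    u2 = n1 * n2 - u1
    u = min(u1, u2)
    mean_u = n1 * n2 / 2
    sd_u = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)  # assumption: no tie correction
    z = (u - mean_u) / sd_u
    return u, erfc(abs(z) / sqrt(2))


# Hypothetical completion times in seconds for one task, not the study's data.
full_map = [35, 42, 50, 38, 61]
small_multiples = [88, 120, 75, 140, 97]
u, p = mann_whitney_u(full_map, small_multiples)
print(u, p < 0.05)
```

Because the test works on ranks rather than raw values, a few very long completion times do not dominate the outcome, which suits the skewed time data described above.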

Figure 43. The mean values of completion time of tasks in each condition.

As mentioned earlier, the third metric’s data were collected through a subjective Single Ease Questionnaire (SEQ) for satisfaction. Since the SEQ uses the Likert scale for each question, the distance between two adjacent data points might not be equal [38]. Therefore, the assumption of the normal distribution of the data might not be met. To test the assumption, the Shapiro-Wilk test was executed on the SEQ data. The test rejected the assumption, as it revealed that the data were not normally distributed, with a significance value of p < .001 (displayed by SPSS as .000). This can be seen in the histogram in Figure 44 and the Q-Q plot in Figure 45.


Figure 44. A histogram showing Shapiro-Wilk test result on the satisfaction.

Figure 45. A Q-Q plot showing the result of Shapiro-Wilk test on the satisfaction.

Mann-Whitney's U test was run on the data to compare the level of satisfaction of each task in each condition and to assess the statistical significance of the difference. The results obtained from the test analysis are reported in Table 4. It is apparent from this table that tasks 4, 7 and 9 are significantly different. Tasks 5 and 8 reported marginal significance. These results are supported by the graph in Figure 46, where the mean values of all nine tasks in each condition are displayed in column charts with error bars representing a 95% confidence interval. The next chapter will thoroughly discuss these results.

Figure 46. The mean values of the satisfaction of tasks in each condition.

Table 4. The summary statistics of the p-value in each task between the two conditions where the asterisks indicate the level of significance in the difference.

Task Number    Effectiveness (Completion Rate)    Efficiency (Completion Time)    Satisfaction
1              1                                  0.082                           0.664
2              NA⁹                                0.001**                         0.521
3              0.211                              0.016*                          0.488
4              1                                  0.021*                          0.005**
5              1                                  0.597                           0.074
6              1                                  0.850                           0.608
7              0.11                               0.000**                         0.002**
8              1                                  0.059                           0.089
9              0.582                              0.014*                          0.007**

⁹ The completion rate is constant.

6.4 Summary

This chapter has explained the analysis of the evaluation phase of this project. In the first part, the chapter described the participants’ demographic information. Then, in the second part of the chapter, the analysis conducted on the data gathered from the evaluation methods and the findings drawn from these analyses were highlighted. The results from this chapter indicated that there was a difference between the two visualisation techniques in terms of the ease of extracting knowledge and of making spatial and temporal comparisons. The next chapter, therefore, moves on to discuss these findings and states how they could affect the design of time series visualisation techniques.

7 Discussions

7.1 Overview

This part of the dissertation discusses the findings that emerged from the statistical analysis. It begins by interpreting the analysis conducted in the last chapter and explaining the results. It then reflects on these findings with respect to the research questions. Lastly, the chapter outlines the implications for design and discusses what can be learned based on these findings.

7.2 Analysis Interpretation

The analysis generally showed a clear difference between the conditions set to represent each visualisation technique. Some tasks were performed faster in one visualisation technique, and some were performed with higher satisfaction. However, a few tasks did not show a significant difference in their performance on either technique. The tasks’ completion rates, as depicted in Figure 40, show that most of the tasks were successfully completed. Notable exceptions with large mean differences are seen in tasks 3, 7 and 9. The results of the tasks per category are discussed below.

- The tasks with the goal of extracting knowledge with no specific emphasis on space or time took longer to perform in the small multiples visualisation technique. Surprisingly, the level of satisfaction did not show a significant difference between the two visualisation techniques.

  o The first task showed a marginally significant difference in its completion time. The users took more time performing the task in small multiples than in the full map. As mentioned in the literature analysis, when the small multiples visualisation technique was examined, it was hypothesised that the visibility of details could be difficult due to having a number of thumbnails. This is supported by the longer task time and by the fact that a number of participants needed to zoom in order to find the required information, although the task was also an InfoVis mantra overview task. However, the task’s level of satisfaction showed no difference between the two visualisation techniques.

  o The second task was significantly different in the time it took to be completed, with a mean difference of 186.4 seconds. It was faster to accomplish in the full map. The task examined the ability to find the density range of the ‘Eyes’ symptom in a specific postcode sector, where the participants were expected to navigate the map to locate the postcode and directly detect the density range by observing its colour and corresponding class. Similar to the first task, one possible explanation for this outcome is that small multiples could lead to less visibility of details and, hence, a longer task time. Another reason, observed among the participants who accomplished the task in small multiples, was the divided attention effect, especially as the task did not require focusing on a particular time period. Surprisingly, task satisfaction was not significantly different.

  o Similar to the previous two tasks, the third task was faster to perform in the full map. The aim of the task was to compare the multivariate allergy symptoms in a postcode sector. In the full map, the participants used the line chart, which allowed them to rapidly spot the trending symptom. However, those who performed the task in small multiples had to filter the symptoms using the dropdown menu and observe the change in colour and value of the area of interest. Given the performance issue discussed previously in the implementation chapter, this increased the waiting time, as the maps needed to reload each time the symptom was changed. Another surprising finding was that there was no significant difference in terms of task satisfaction.

- The tasks with the spatial comparison goal tended to be performed better in the full map view. Interestingly, in two of the tasks performed in the full map, the satisfaction level was either lower or showed no difference. The fifth task has a spatio-temporal nature of comparison. However, its results show only a marginally significant difference between the two conditions.

  o The fourth task reported a significant difference in both its completion time and satisfaction levels. More time was spent, with lower satisfaction levels, in small multiples. The goal of the task was to compare two postcode sectors in terms of the severity of the ‘Breathing’ symptom. The drawback found in the literature analysis, that comparing specific locations can be challenging in small multiples, was evident in this task. The divided attention effect is also applicable in this task, as the comparison was not temporal.

  o While there was no significant difference in the completion time of the fifth task, it had a marginally significant difference in participants’ satisfaction, with less satisfaction when the full map was used. The task compared the ‘General Wellness’ of two postcode sectors in June. A possible explanation for this might be that, since the time granularity in small multiples was based on months, the participants did not find it difficult to locate the month of interest and perform the comparison. However, using the full map, the task was not straightforward, as the time slider had to be first set to June. Also, having a time slider with no explicit month labels made it trickier to manipulate.

  o The results of the sixth task indicated no statistically significant differences in either metric.

- The last three tasks, which concentrated on temporal comparisons, suggested that comparing time series data was more efficient and effective when performed in small multiples. It is worth noting that the eighth task has a spatio-temporal aspect of comparison, and its results showed only a marginally significant difference between the two visualisation techniques.

  o Interestingly, the results of the seventh task showed a significant difference in terms of efficiency as well as satisfaction. When performed in the full map, the task took considerably longer than in small multiples. Additionally, task satisfaction was lower in the full map (with a mean difference of 2.1) than in small multiples. The purpose of the task was to perform a temporal comparison to discover a time period that had a few ‘Breathing’ incidents in England. The task was broad in scope, with no precise answer assumed. This was found to be challenging for many participants compared to the tasks that specified a time range or a postcode sector. However, performing the task was notably slower and more difficult in the full map. This might be related to the nature of the task, which depended heavily on the time slider. Another likely reason is that, while the view is synced in small multiples, participants using the full map had to rely on their memories when performing the task. In other words, when the time slider moved to a new time range, the participants had to recall the map’s previous scene in order to compare it with the current scene. The average successful completion rate of the task in the full map was 0.40, compared to an average of 1 in small multiples.

  o The eighth task, which asked users to perform a time series comparison in a particular postcode sector, showed a marginally significant difference in satisfaction, with lower satisfaction when performed in the full map. Nevertheless, the task did not show any significant difference in its completion time. It may be that the participants benefitted from the learning effect resulting from the previous task, as this task also included interactions with the time slider. However, it is somewhat unexpected that a number of participants did not reveal the line chart, which could have led to more insightful answers. This might be regarded as a design issue. The implications for design will be detailed in the next section.

  o The ninth task had a significant difference in its completion time and levels of satisfaction, with better results in small multiples. The task was based on temporal comparisons, but it did not specify an area. These results are likely related to the same reasons given for the former two tasks.

7.3 Implications for Design

The findings in the analysis have a number of practical implications for design. The design guidelines that emerged from what has been learned from the visualisation of multivariate spatio-temporal data are described below.

- The visualisation technique should not put a burden on users’ memories. In order to perform spatial and/or temporal comparisons, the comparison aspects should all exist in the same scene. This arose from the difficulty users faced in performing temporal comparisons on the full map, due to the need to remember the map’s scenes while moving the slider over the different time ranges. Strategies to tackle this issue might include capturing the currently viewed map using an approach similar to a Blending Lens, in which the current map is kept in the Blending Lens while the user updates the slider to a new time range. This method would aid vision-driven comparisons, and users would not be forced to rely on memory.

- The visualisation technique should have a limited number, preferably fewer than five, of small multiples. As noted from the literature analysis and validated in the tasks’ statistical analysis, small multiples might result in less apparent features. Although the number of maps displayed in the small multiples view was only five, in some tasks the participants found it difficult to spot the required information from the overview level and instead zoomed in to reveal more details. An approach similar to the one displayed in Figure 16 might be considered. In this approach, a timeline is provided so that once the visualisation is set to the animation mode, the time periods are updated based on the timeline, which updates the data in the three small multiples of the map accordingly.

- The visualisation technique should not cause divided attention problems. The analysis showed that the participants encountered divided attention when asked to perform tasks with no specific time period in small multiples. This led to the assumption that small multiples might not be the best visualisation technique for spatial comparisons.

- The visualisation technique might encourage more comparisons on the Cartesian coordinate system. The use of the line chart to compare the symptoms’ evolution in a specific postcode sector over time produced a positive outcome in terms of effectiveness, efficiency and satisfaction. Therefore, comparing the symptoms of two regions over time using dynamic multi-line chart(s) might be considered.

7.4 Summary

This chapter has discussed the results of the statistical analysis and its implications. In the first part, the chapter presented a detailed interpretation of the analysis, in which the significant findings were highlighted and some possible explanations for these results were drawn. Then, the second part of the chapter described a number of implications for design to help in future practice.

8 Conclusions and Future Work

8.1 Conclusions

This project set out to investigate the current visualisation techniques of multivariate time series data that are represented in a spatial context. This has helped to develop an interactive widget to visualise allergy symptoms in the UK and detect spatio-temporal trends. The lack of sufficient evaluation for many of the proposed techniques resulted in the decision to adopt a hybrid approach in which two visualisation techniques were implemented, namely the full map and small multiples. The two techniques were then evaluated for the purpose of determining which one better encourages knowledge extraction and the discovery of spatial, temporal or spatio-temporal patterns. While tasks focusing on spatial comparisons generally performed better in the full map, temporal comparison tasks showed better performance in small multiples. Tasks with the spatio-temporal type of comparison showed only a marginally significant difference between the two visualisation techniques.

The literature analysis raised four research questions for further investigation throughout the development and evaluation. With respect to the first research question, it is now possible to state that a single visualisation technique is insufficient to obtain a collective advantage, expand usability and enhance interactivity. Instead, composite approaches of two or more visualisation techniques might be considered. The second research question was addressed by introducing the InfoVis mantra tasks and other interactive events, such as clicking and hovering, in the implemented tool. After evaluating the tool, it was found that having interactions helped to improve task performance overall. The third research question remains unanswered at present. Although the postcode sectors were manageable and visible areas on the map, it is still unclear whether this spatial granularity is the best for the project’s nature and type of data. Further investigation of the granularities of a wider space, such as postcode areas, should be undertaken. Regarding the fourth research question, it can be concluded that, after implementing and testing both the polar coordinate system, represented in the pie chart, and the Cartesian coordinate system, represented in the line chart, the polar coordinate system did not meet the comparison-based nature of the project. Also, if the time dimension is not introduced in the polar coordinate system, as was the case with the pie chart, it will not be sufficient for presenting multivariate time series data. Therefore, due to its visual time axis and the promising outcomes from the evaluation analysis, the Cartesian coordinate system is suitable for representing multivariate time series data.

8.2 Limitations and Further Work

The main weakness of this project was the tool’s performance due to the heavy map polygon data, which led to an overall longer response time. In order to expedite the performance, raising the spatial granularity from postcode sectors to postcode areas is recommended. This will lead to fewer polygons on the map; hence, it is believed that it will result in faster data loading and map interactions. With respect to the analysis of multivariate time series data on maps, it would be interesting to assess the use of 3D information visualisation, such as 3D icons for mapping temporal dependencies, in a spatial context. Finally, the allergy dataset could be compared with other publicly available datasets, such as pollen count and pollution, so that further patterns and trends can be found.

9 References

[1] M. Weber, M. Alexa and W. Müller, “Visualizing Time-Series on Spirals,” in Infovis, San Diego, 2001.

[2] S. Thakur and A. J. Hanson, “A 3D Visualization of Multiple Time Series on Maps,” in 14th International Conference Information Visualisation, 2010.

[3] M.-J. Lobo, E. Pietriga and C. Appert, “An Evaluation of Interactive Map Comparison Techniques,” in 33rd Annual ACM Conference on Human Factors in Computing Systems, Seoul, 2015.

[4] B. Shneiderman, “The Eyes Have It: A Task by Data Type Taxonomy for Information Visualizations,” in The 1996 IEEE Symposium on Visual Languages, Boulder, 1996.

[5] A. M. MacEachren and S. Pezanowski, “Geovisualization: Leveraging the Opportunities of Geographic Information,” 15 August 2010. [Online]. Available: http://www.adobe.com/devnet/edu/articles/macEachren_pezanowski.html. [Accessed 27 4 2016].

[6] A. M. MacEachren and M.-J. Kraak, Eds., Exploring Geovisualization, Elsevier, 2005.

[7] S. Cha, A. Abusharekh and S. S. Abidi, “Towards a ‘Big’ Health Data Analytics Platform,” in IEEE First International Conference on Big Data Computing Service and Applications, Jeju, 2015.

[8] G. Fuchs and H. Schumann, “Visualizing Abstract Data on Maps,” in 8th International Conference on Information Visualisation, London, 2004.

[9] W. Aigner, S. Miksch, W. Müller, H. Schumann and C. Tominski, “Visual Methods for Analyzing Time-Oriented Data,” IEEE Transactions on Visualization and Computer Graphics, vol. 14, no. 1, pp. 47-60, 2008.

[10] S. Havre, E. Hetzler, P. Whitney and L. Nowell, “ThemeRiver: Visualizing Thematic Changes in Large Document Collections,” IEEE Transactions on Visualization and Computer Graphics, vol. 8, no. 1, pp. 9-20, 2002.

[11] C. Tominski, J. Abello and H. Schumann, “Axes-Based Visualizations with Radial Layouts,” in the 2004 ACM Symposium on Applied Computing, 2004.

[12] M. Gleicher, D. Albers, R. Walker, I. Jusufi, C. D. Hansen and J. C. Roberts, “Visual comparison for information visualization,” Information Visualization, vol. 10, no. 4, pp. 289-309, 2011.

[13] C. Dempsey, “GIS Data Explored – Vector and Raster Data,” 1 May 2012. [Online]. Available: https://www.gislounge.com/geodatabases-explored-vector-and-raster-data/.

[14] “JuxtaposeJS,” The Northwestern University Knight Lab, [Online]. Available: https://juxtapose.knightlab.com/#create-new. [Accessed 1 May 2016].

[15] “Adding a Custom Overlay,” APIs, [Online]. Available: https://developers.google.com/maps/documentation/javascript/examples/overlay-simple. [Accessed 24 April 2016].

[16] C. Sebastian, Y. Sheng and D. Ifenthaler, Eds., Serious games analytics : methodologies for performance measurement, assessment, and improvement, Springer, 2015.

[17] D. Guo, “Flow Mapping and Multivariate Visualization of Large Spatial Interaction Data,” IEEE Transactions on Visualization and Computer Graphics, vol. 15, no. 6, pp. 1041-1048, 2009.

[18] N. Ernst, “When the visual information-seeking mantra fails,” 24 October 2006. [Online]. Available: https://neilernst.net/2006/10/24/when-the-visual-information-seeking-mantra-fails/.

[19] “Creating an Overview Map,” Boundless and Prodevelop, [Online]. Available: http://girona-openlayers-workshop.readthedocs.io/en/latest/controls/overview.html. [Accessed 3 May 2016].

[20] W. Aigner, S. Miksch, H. Schumann and C. Tominski, Visualization of Time-Oriented Data, Springer Science & Business Media, 2011.

[21] A. Pandre, “Charts and their Dimensionality,” WordPress, [Online]. Available: https://apandre.wordpress.com/dataviews/dimensionality/. [Accessed 17 April 2016].

[22] K. Beard, H. Deese and N. R. Pettigrew, “A framework for visualization and exploration of events,” Information Visualization, vol. 7, no. 2, pp. 133-151, 2008.

[23] G. Huang, S. Govoni, J. Choi, D. M. Hartley and J. M. Wilson, “Geovisualizing data with ring maps,” ArcUser, vol. 11, no. 1, pp. 54-55, 2008.

[24] D. Guo, J. Chen, A. M. MacEachren and K. Liao, “A Visualization System for Space-Time and Multivariate Patterns (VIS-STAMP),” IEEE Transactions on Visualization and Computer Graphics, vol. 12, no. 6, pp. 1461-1474, 2006.

[25] M. Burch, F. Bott, F. Beck and S. Diehl, “Cartesian vs. Radial – A Comparative Evaluation of Two Visualization Tools,” in Advances in Visual Computing, Berlin, 2008.

[26] S. Diehl, F. Beck and M. Burch, “Uncovering Strengths and Weaknesses of Radial Visualizations—an Empirical Approach,” IEEE Transactions, vol. 16, no. 6, pp. 935-942, 2010.

[27] J. Fuchs, F. Fischer, F. Mansmann, E. Bertini and P. Isenberg, “Evaluation of Alternative Glyph Designs for Time Series Data in a Small Multiple Setting,” in the SIGCHI Conference on Human Factors in Computing Systems, 2013.

[28] M. Schonlau and E. Peters, “Graph Comprehension: An experiment in displaying data as bar charts, pie charts and tables with and without the gratuitous 3rd dimension,” Social Science Research Network Working Paper Series, 2008.

[29] M. Adnan, M. Just and L. Baillie, “Investigating time series visualisations to improve the user experience,” in Proceedings of the CHI Conference on Human Factors in Computing Systems, 2016.

[30] J. S. Yi, Y. a. Kang, J. T. Stasko and J. A. Jacko, “Toward a Deeper Understanding of the Role of Interaction in Information Visualization,” IEEE Transactions on Visualization and Computer Graphics, vol. 13, no. 6, pp. 1224-1231, 2007.

[31] C. Gurrapu, “A Google calendar like display for temporal data using D3.Js,” Blocks, [Online]. Available: http://bl.ocks.org/chaitanyagurrapu/6007521. [Accessed 17 April 2016].

[32] P. Do, “Medical Cost of Hip Replacement by State,” VIDA, [Online]. Available: https://vida.io/documents/s5qo5Gwrct5HNxAD2. [Accessed 17 April 2016].

[33] K. Dale, “Historic Weather-station data for the UK (1880-2013),” kyrandale, [Online]. Available: http://kyrandale.com/viz/uk-weather-stations.html. [Accessed 17 April 2016].

[34] “Visualization of USA Deaths by State,” infinome, [Online]. Available: https://www.infino.me/mortality/usmap. [Accessed 18 April 2016].

[35] A. Buja, J. A. McDonald, J. Michalak and W. Stuetzle, “Interactive data visualization using focusing and linking,” in the 2nd conference on Visualization '91, 1991.

[36] J. E Stewart, S. E Battersby, A. Lopez-De Fede, K. C Remington, J. W Hardin and K. Mayfield-Smith, “Diabetes and the socioeconomic and built environment: geovisualization of disease prevalence and potential contextual associations using ring maps,” International journal of health geographics, vol. 10, no. 1, p. 1, 2011.

[37] H. Kennedy, The ESRI Press Dictionary of GIS Terminology, Redlands, California: Environmental Systems Research Institute, 2001.

[38] J. Lazar, J. H. Feng and H. Hochheiser, Research Methods in Human-Computer Interaction, Chichester, West Sussex: John Wiley & Sons, 2010.

[39] E. Charters, “The Use of Think-aloud Methods in Qualitative Research An Introduction to Think-aloud Methods,” Brock Education, vol. 12, no. 2, pp. 68-82, 2003.

[40] A. Holovaty and J. Kaplan-Moss, The Django Book, New York: Apress, 2009.

[41] K. Schwaber and J. Sutherland, “The Scrum Guide,” Scrum Alliance, July 2016. [Online]. Available: https://www.scrumalliance.org/why-scrum/scrum-guide. [Accessed 9 August 2016].

[42] E. Ries, The Lean Startup: How Today's Entrepreneurs Use Continuous Innovation to Create Radically Successful Businesses, New York: Crown Publishing Group, 2011, p. 336.

[43] N. Nurseitov, M. Paulson, R. Reynolds and C. Izurieta, Comparison of JSON and XML Data Interchange Formats: A Case Study, San Francisco, San Francisco: Caine, 2009, pp. 157-162.

[44] C. A. Brewer, “A Transition in Improving Maps: The ColorBrewer Example,” Cartography and Geographic Information Science, vol. 30, no. 2, pp. 159-162, 2003.

[45] J. Heer, N. Kong and M. Agrawala, “Sizing the Horizon: The Effects of Chart Size and Layering on the Graphical Perception of Time Series Visualizations,” in SIGCHI Conference on Human Factors in Computing Systems, Boston, 2009.

[46] B. Eberhardinger, H. Seebach, A. Knapp and W. Reif, “Towards Testing Self-organizing, Adaptive Systems,” in ICTSS Conference, Madrid, 2014.

[47] M. Minami, A. Hatakeyama, A. Mitchell, B. Booth, B. Payne, C. Eicher, E. Blades, I. Sims, J. Bailey, P. Brennan, S. Stephens and S. Woo, “Symbolizing your data,” in Using ArcMap, 2000, pp. 133-166.

[48] I. Sommerville, Software Engineering, 8th ed., Harlow: Addison-Wesley, 2007.

[49] J. Andersson, R. de Lemos, S. Malek and D. Weyns, “Modeling Dimensions of Self-Adaptive Software Systems,” in Software Engineering for Self-Adaptive Systems Seminar, Dagstuhl, 2008.

[50] V. Grassi, R. Mirandola and E. Randazzo, “Model-Driven Assessment of QoS-Aware Self-Adaptation,” in Software Engineering for Self-Adaptive Systems Seminar, Dagstuhl, 2008.

[51] Y. Brun, G. D. M. Serugendo, C. Gacek, H. Giese, H. Kienle, M. Litoiu, H. Müller, M. Pezzè and M. Shaw, “Engineering Self-Adaptive Systems through Feedback Loops,” in Software Engineering for Self-Adaptive Systems Seminar, Dagstuhl, 2008.

[52] C. Tominski, P. Schulze-Wollgast and H. Schumann, “3D information visualization for time dependent data on maps,” in 9th International Conference on Information Visualisation, 2005.

[53] D. Alqahtani, “bbproject,” GitLab, [Online]. Available: https://gitlab.cs.man.ac.uk/deemah-alqahtani/bbproject. [Accessed 7 September 2016].

10 Appendix 1

A screenshot of the current version of the product backlog is shown below.

Figure 47: The product backlog.
