Communication of Statistical Uncertainty to Non-expert Audiences

A thesis submitted for the degree of

Master of Philosophy (Mathematics)

by

Jessie Roberts

B.Sc. Biotechnology Innovation, Queensland University of Technology

School of Mathematical Sciences

Science and Engineering Faculty

Queensland University of Technology

Australia

2019

Contents

Acknowledgements

Declaration

Abstract

1 Introduction

1.1 Aims and objectives

1.2 The Australian Cancer Atlas

1.3 Research contributions

1.4 Thesis structure

2 Literature Review

2.1 Uncertainty

2.2 Why is Uncertainty Information important to decision-makers?

2.3 Communicating Statistical Uncertainty

2.4 Uncertainty communication

2.5 Uncertainty Communication design

2.6 Spatial epidemiology and disease mapping

3 Research Activity 1.A: Grey literature review of internet published cancer maps.

3.1 Introduction

3.2 Aim and Research Question

3.3 Methods

3.4 Summary findings

3.5 Implications for the Australian Cancer Atlas

3.6 Conclusion

4 Research Activity 1.B: User centred uncertainty communication design (Australian Cancer Atlas as a case study)

4.1 Introduction

4.2 Methods

4.3 Results

4.4 Discussion & insights for the Australian Cancer Atlas

4.5 Conclusion

5 Research Activity 2: User study - Uncertainty representation in an online game.

5.1 Introduction

5.2 Aim

5.3 Online Simulation - Impact of uncertainty communication methods on decision making

5.4 Methods

5.5 Results

5.6 Discussion

6 Discussion

6.1 Uncertainty communication design

6.2 Testing uncertainty representation methods

6.3 Critique & limitations

6.4 Future Work

6.5 Conclusion

A Appendix: Literature Review

A.1 Uncertainty Representation in Mapping & GISciences

B Appendix: Research Activity 1.A - Grey Literature Review

B.1 Search Protocol

Pre-Scoping

Search Details

B.2 Database of identified cancer atlases

C Research Activity 1.B - User-centred design for uncertainty communication

C.1 Project Partners Workshop

C.2 NEUVis Audience Profiles

D Appendix: Online Game

D.1 Untransformed Performance Data

D.2 Logit(PR) by game mode: LME output and diagnostic plots

D.3 logit(PR) by risk profile - LME model output and diagnostic plots

E Ethics Approvals

E.1 Focus Groups Recruitment Flyer and Consent Form

E.2 Online Game

Bibliography

Acknowledgements

I would like to acknowledge the many people who have contributed to this thesis and, more importantly, to the development of my academic research skills and competencies.

I express my appreciation and thanks to my supervisors Distinguished Professor Kerrie Mengersen & Dr Kate Helmstedt for their guidance and support. Further to this I would like to specifically acknowledge the contribution and support of the following people:

1. Phil Gough - with whom I collaborated closely to design, develop and implement the online game (Chapter 5), as well as on the design, data collection and data analysis of the Cancer Atlas focus groups (Chapter 4).

2. The Cancer Council QLD and specifically Peter Baade and Susanna Cramb - I am very grateful for their guidance and feedback along the journey and their collaboration for the work detailed in Chapter 4, including financial support to conduct the cancer mapping focus groups.

3. Nicholas Dendle - who through his VRES summer project of 2016/17 translated a paper version of the pirate game to a functioning online game (Chapter 5).

4. Matt Sutton - who implemented the optimisation solution needed for part of the analysis of the online game data (Chapter 5).

5. The QUT ACEMS and BRAG community for their many coffees, lunches, board games and generous feedback.

6. Shannon Ryan - for editing and proof-reading services.

7. Lawrence Jones for his support and encouragement.


Declaration

I hereby declare that this thesis contains no material which has been accepted for the award of any other degree or diploma in any university or equivalent institution, and that, to the best of my knowledge and belief, this thesis contains no material previously published or written by another person, except where due reference is made in the text of the thesis.

QUT Verified Signature

Jessie Roberts

June 2019


Abstract

Introduction

Statistical uncertainty is present in modelled and data derived information and plays an important role in how the outputs of scientific and quantitative analyses are interpreted and used by decision makers, many of whom are non-experts. Despite the importance of uncertainty information, there remain impediments to successfully communicating it to the non-expert audience. Two recognised impediments are the lack of standardised uncertainty representation methods, and the lack of methods and guidelines for communicating uncertainty in a way that is accessible to non-expert audiences. I contribute to both of these in this thesis. A significant motivation of this thesis is to inform the design of an Australian national cancer atlas, which is used as a case study within this research.

Aim and Objectives

This research aims to contribute to a growing body of literature on uncertainty communication. It achieves this aim through four distinct objectives: 1 - provide a greater understanding of the current methods used to visualise uncertainty in disease mapping; 2 - investigate a tool for systematically identifying the sources of uncertainty within a multidisciplinary research project; 3 - explore a framework for including uncertainty within the design of scientific communication material; and 4 - investigate the impact of three different uncertainty visualisation methods on decision making.

Methods

The first, second and third objectives are connected to the Australian Cancer Atlas, which is used as a case study throughout this thesis. Objective 1 is achieved through a grey literature review of currently available cancer maps. Objective 2 is achieved through exploring the use of an uncertainty taxonomy (developed by Morgan and Henrion (1990)) as a tool to diagnose uncertainty sources within the atlas. Objective 3 explores the NEUVis design framework (Gough, Wall, and Bednarz, 2014) for its effectiveness in uncertainty communication design, including the use of focus groups to understand how people interpret uncertainty within cancer mapping. The fourth and final objective is achieved through a quantitative user study, which uses an online game to evaluate whether players’ behaviour differs depending on whether uncertainty is represented as: the upper and lower bounds of an interval, the semantic bounds of an interval, a point estimate ± error, or a point estimate (no uncertainty). Representation methods are evaluated in terms of the players’ ability to maximise their potential reward (performance), as well as how players distribute available resources (risk-averse behaviour).

Findings/Results

1. The grey literature review identified 33 publicly available cancer maps, of which 13 contained uncertainty information. Many different approaches were used to represent uncertainty, with no approach common across maps. Uncertainty was more common in maps that contained interactivity.

2. Morgan and Henrion’s (1990) taxonomy of uncertainty, in its current form, did not prove to be a useful tool for diagnosing sources of uncertainty within the Australian Cancer Atlas. The technical detail of the taxonomy and its lack of connection to the project or the target audiences hindered its applicability.

3. The extension of the NEUVis design framework to include uncertainty information worked well to navigate the uncertainty communication design process and supported a cross disciplinary team to identify audiences and their needs. The design tool provided a platform for mapping audience needs to uncertainty information within the Australian Cancer Atlas.

4. The user study evaluating the effect of different uncertainty representation methods on user behaviour showed that, in terms of maximising potential reward (performance), players performed better in the online game when a point estimate or a point estimate ± error was used, compared to when an interval was used. The inclusion of the ± error did not have a statistically significant effect on performance: the point estimate with error and the point estimate without error performed similarly. In terms of resource spreading, or risk-averse behaviour, when uncertainty was represented as a semantic interval there was greater diversity in how players distributed their available resources: they were more likely to spread their resources across the available options, but overall their behaviour was less consistent.

Conclusion

Uncertainty communication is a difficult challenge. It is not technically a statistical problem, but it is an important problem for statistics. Addressing this challenge requires: quantitative user studies to evaluate the effectiveness of uncertainty representation methods; qualitative investigations into how users interpret uncertainty and navigate communication material, as well as what their needs are in accessing this information; and finally, a framework that supports cross-disciplinary teams to bring the quantitative and qualitative methods together and embed uncertainty into the communication design process. Through a grey literature review, extending the NEUVis communication design framework to include uncertainty (including focus groups), testing an uncertainty diagnostic tool and conducting user testing, this thesis has contributed to all three of these challenges. Further work is required to develop a tool that supports the identification and prioritisation of uncertainty sources within a project.

Chapter 1

Introduction

Uncertainty is pervasive in every choice we make and knowledge of its presence and nature can clarify the applicability of information to real world decisions. Despite its pervasiveness and value to decision making, uncertainty is under-represented in the communication of scientific information to the non-expert audience who lack prior knowledge of statistics and/or data science (Fischhoff (2012); Frewer and Salter (2002); Schneider and Moss (1999); Manski (2014)).

The term uncertainty can have different definitions across domains. The different definitions or meanings of uncertainty are: linguistic imprecision, which arises when one term can be interpreted in several ways, or its meaning is ambiguous; human decision or behavioural uncertainty, which is present in the unknowns about the worldviews, objectives and future actions of stakeholders; and statistical or scientific uncertainty, which is uncertainty about facts, for example the uncertainty around tomorrow’s weather forecast (Kujala, Burgman, and Moilanen, 2013). Each of these types of uncertainty is defined further in the Literature Review (Chapter 2).

This thesis focuses specifically on the communication of statistical uncertainty, which is the uncertainty related to empirical values: quantities that can be measured to some level of accuracy and precision (e.g. birth weight, height, rainfall, cancer incidence, etc.) (Begg and Welsh (2014); Kujala, Burgman, and Moilanen (2013)). Statistical uncertainty is represented through a range of methods, including: uncertainty intervals, point estimates ± error, semantic versions of uncertainty, standard deviation, standard error, error bars and probabilities. The clear communication of scientific uncertainty is a critical and urgent challenge for statistics, but also for science more broadly (Palmer and Hardaker, 2011).
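To make these representation methods concrete, the following is a minimal illustrative sketch (not the implementation used in this research) that renders one estimate as a point estimate, a point estimate ± error, an interval, and a semantic interval. The names `Estimate`, `SEMANTIC_BANDS` and the formatting functions, as well as the percentage cut-offs used for the verbal labels, are assumptions introduced purely for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Estimate:
    point: float  # central estimate, expressed here as a percentage
    error: float  # half-width of the uncertainty interval, same units


# ASSUMPTION: hypothetical cut-offs for verbal labels; not taken from the thesis.
SEMANTIC_BANDS = [
    (20, "very unlikely"),
    (40, "unlikely"),
    (60, "about as likely as not"),
    (80, "likely"),
    (100, "very likely"),
]


def bounds(e: Estimate) -> tuple[float, float]:
    """Lower and upper bounds, clipped to the 0-100% range."""
    return max(0.0, e.point - e.error), min(100.0, e.point + e.error)


def semantic_label(value: float) -> str:
    """Map a percentage to the first band whose upper limit contains it."""
    for upper, label in SEMANTIC_BANDS:
        if value <= upper:
            return label
    return SEMANTIC_BANDS[-1][1]


def as_point(e: Estimate) -> str:
    return f"{e.point:.0f}%"


def as_point_with_error(e: Estimate) -> str:
    return f"{e.point:.0f}% ± {e.error:.0f}%"


def as_interval(e: Estimate) -> str:
    lower, upper = bounds(e)
    return f"{lower:.0f}% to {upper:.0f}%"


def as_semantic(e: Estimate) -> str:
    lower, upper = bounds(e)
    low_label, high_label = semantic_label(lower), semantic_label(upper)
    return low_label if low_label == high_label else f"{low_label} to {high_label}"


estimate = Estimate(point=50, error=15)
for render in (as_point, as_point_with_error, as_interval, as_semantic):
    print(f"{render.__name__}: {render(estimate)}")
```

All four strings describe the same underlying quantity; only the amount and form of uncertainty information shown to the reader changes.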

There is an increasing number of non-expert decision makers demanding data driven insights from the recently popularised big data and machine learning ‘revolutions’ (Olsen, Martuzzi, and Elliott (1996); Harbage and Dean (1999)). However, without the skills to interrogate the uncertainty of these data derived and modelled insights, the non-expert audience is in danger of misinterpreting them.

The challenge of communicating statistical uncertainty is relevant across all domains of science (Palmer and Hardaker, 2011), particularly if research outputs, and data derived or modelled estimates, are to be adopted in real world decisions and planning for industry, business and government. This type of data derived information is already being used within decision making across domains. For example, farmers collect data from equipment and environmental sensors to guide their farm management, pest control and forecasting decisions (Bobkoff (2015); Wolfert et al. (2017)), and brand managers use social media analytics to plan marketing campaigns (Schrage, 2016). If this information is perceived as more accurate and reliable than the data or methods warrant, decision-makers may be misguided in their planning. Important sectors of our society such as healthcare, education, infrastructure planning and environmental regulation utilise data and modelled insights in their decision making, and it is important that the uncertainties within this information are understood and considered.

Acknowledging uncertainty is not only important to decision makers, but is also important to the public’s perception of science (Frewer and Salter (2002); Gavankar, Anderson, and Keller (2015)). The researcher’s role is to push the boundaries of current knowledge and this means exploring unknowns. An inevitable result of exploring these unknowns is discrepancies or disagreements between independent studies, which are an essential step in the development of knowledge and the scientific process. Neglecting to inform the non-expert of the uncertainty in study outputs can lead to a mistrust in science. As independent studies can produce different results, if perceived as contradictory, these results erode the public’s confidence in the ‘scientist’, even though disagreement between scientific peers is a natural part of emerging knowledge. A prime example of this is the climate change debate, which is in part fuelled by a lack of acknowledgement and handling of uncertainty (Giles, 2002). Communicating uncertainty to the non-expert enables different research outputs to be seen as developments of knowledge, rather than contradictory findings which invalidate each other and create mistrust and fuel scepticism (Giles, 2002).

Uncertainty information is one of the key pieces of information used by the scientific community to evaluate the rigour of study outputs. Without an understanding of uncertainty, or at least a respect for its presence, the public audience does not have the tools to make sense of differing research outputs.

Furthermore, science is facing a greater presence of uncertainty both in the questions it targets and the tools it uses for investigation. This is not unexpected as science continues to explore increasingly complex systems and relationships, and as developments in computational capacity have enabled an increased uptake of simulation and stochastic methods, which explicitly use uncertainty to explain the scientific output. Uncertainty is an inevitable piece of metadata and is inseparable from study outputs. Even if it is not in a form that the decision maker can access, uncertainty is an important part of research outputs, and not communicating uncertainty to the end-user is neglecting to report the full story. It is the responsibility of the entire science community to ensure scientific outputs and statistical information are communicated effectively, in order to build a greater appreciation of the uncertainty present in any and all frontier/leading research and to ensure leaders and decision makers can use information appropriately to make decisions for our future.

Uncertainty communication is a difficult challenge to address. Buttenfield (1993) proposes there are three impediments to this challenge: 1. standardised terminology, 2. methods for uncertainty representation and visualisation, and 3. methods for communicating these representations in a meaningful way for the audience (i.e. uncertainty communication).

Impediment 1 is important to address; however, it is not the focus of this thesis. This thesis aims to contribute to addressing impediments 2 and 3.


Much work has been done to address impediment 2. However, while there is a growing body of literature and user studies on uncertainty representation and visualisation, there remains a lack of agreement and many unknowns about how the non-expert understands these uncertainty representation methods (Kinkeldey, MacEachren, and Schiewe, 2014), and which are most effective. This thesis contributes to this growing body of research through a user study which investigates how decision making is influenced by different uncertainty representation methods (Chapter 5: Research Activity 2).

Much less work has been done to address impediment 3, uncertainty communication. The literature lacks examples, frameworks, case studies or guidelines for integrating uncertainty representation methods to ensure uncertainty communication. An acknowledgement of the importance of the users’ needs in creating effective uncertainty communications is emerging across a range of domains (Davis and Keller (1997); Carroll et al. (2014); Sanyal et al. (2010)). This thesis considers how the users’ needs can be used in uncertainty communication design. Through the use of the Australian Cancer Atlas as a case study, this thesis aims to contribute to addressing this final impediment and does so through three activities: an evaluation of current practices of uncertainty communication in cancer mapping (Chapter 3: Research Activity 1.A); testing a potential tool for diagnosing uncertainty sources (Chapter 4: Research Activity 1.B); and extending the NEUVis science communication design framework to include uncertainty, place the user at the centre of the design process, and understand the users’ needs in the context of the ACA and its related uncertainty information (including conducting focus groups to validate user needs) (Chapter 4: Research Activity 1.B).

This thesis uses both quantitative and qualitative methods to contribute to impediments 2 and 3 outlined by Buttenfield (1993) (i.e. 2. methods for uncertainty representation and visualisation, and 3. methods for communicating these representations in a meaningful way for the audience (uncertainty communication)). Firstly, in Research Activities 1.A and 1.B, I use qualitative methods to explore the third impediment (uncertainty communication). This first research activity is motivated by the Australian Cancer Atlas (ACA), which I use as a case study. Within this research activity I: conducted a grey literature review of cancer maps available on the internet between January 2010 and June 2016; conducted a project participant workshop to identify target audiences and diagnose the uncertainty sources within the ACA; applied a user centred design framework (NEUVis) to build audience profiles that explicitly consider the impact of uncertainty information, and mapped the users’ needs and current behaviour; and conducted focus groups with target audiences to validate the user profiles, including their needs and understanding of uncertainty in cancer mapping. Secondly, within Research Activity 2 (uncertainty representation - impediment 2), I conduct a quantitative user study to evaluate the effect of common uncertainty representation methods on the decision-making behaviour of the non-expert audience. This component utilises an online game that I co-developed. In the game, players are presented with a risk of failure and a potential reward if successful. Players must allocate limited resources to reduce the risk of failure and maximise their potential reward. I use the game to compare the players’ behaviour across four different uncertainty representation methods. In addition, I also explore the influence of risk level and uncertainty level on players’ behaviour.

This research explores how to communicate statistical uncertainty to non-expert audiences. I contribute to the last two items on Buttenfield’s (1993) list of impediments. The successful communication of statistical uncertainty to non-expert audiences is an important challenge for the statistical sciences as well as science more broadly, and is essential to ensuring that scientific outputs are used appropriately in decision making in the ‘real world’. In this research I contribute to a growing body of research on uncertainty representation by investigating how uncertainty representation methods influence audience behaviour (Research Activity 2). In addition, I present a case study for uncertainty communication design which identifies uncertainty sources and places the user’s needs at the centre of the communication design process (Research Activity 1.A and 1.B).


1.1 Aims and objectives

1.1.1 Aim:

The overall aim of this research is to explore how to communicate statistical uncertainty to non-expert audiences.

1.1.2 Objectives:

I have focused on the following objectives within the field of uncertainty communication:

1. Apply an uncertainty diagnostic framework, user centred design approaches, and focus groups to inform uncertainty communication design.

2. Investigate the relationship between commonly used uncertainty representation methods and audience behaviour.

1.1.3 Research Plan:

To achieve my research objectives, I use both quantitative and qualitative methods. Qualitative workshops, design tools and focus groups place the user at the centre of the design process, and explore an uncertainty diagnostic tool. These are applied through the design of a national cancer atlas as a case study for user centred design for uncertainty communication. Quantitative methods are deployed through a user study, which explores the non-experts’ decision making when presented with different uncertainty representation methods. The user-study contributes to a growing body of knowledge exploring how the common uncertainty representation methods influence behaviour.

These objectives will be achieved through the following research plan.

Objective 1: Use the Australian Cancer Atlas as a case study in the application of the following research activities:

• Investigate and summarise the current practices for generating and visualising cancer maps.

• Explore the use of Morgan & Henrion’s (1990) taxonomy of uncertainty as a tool for uncertainty diagnosis.

• Explore the application of user centred design (NEUVis design framework) for informing uncertainty communication design.

• Conduct a range of focus groups to understand the target audiences of a national cancer atlas, including identifying audience needs, current behaviours and current understanding of existing uncertainty communication methods in cancer maps.

Objective 2:

• Design and build an online game to quantitatively investigate the relationship between behaviour and different uncertainty representation methods, including: uncertainty intervals, point estimates ± error and semantic versions of uncertainty.

1.2 The Australian Cancer Atlas

The Australian Cancer Atlas (ACA) is an Australian-first research study aimed at understanding national patterns in cancer incidence, survival and screening practices based on where people live. The development of the Atlas will allow health agencies, policy makers and the community to understand the location and resource requirements for the most common cancers in Australia. The outcomes of this thesis aim to inform the design of communication material that clearly and efficiently communicates the insights from the Australian Cancer Atlas project to a wide range of audiences.

The Australian Cancer Atlas brings together data from Australia’s state cancer registries and utilises cutting edge spatial statistical methodologies to investigate spatial variation in key cancer indicators. The project is a partnership between Cancer Council Queensland, The Queensland University of Technology, the Australian Institute of Health and Welfare and the Australia and New Zealand Cooperative Research Centre for Spatial Information (CRCSI).

The outputs from the Australian Cancer Atlas project will be communicated to a wide range of audiences including policy makers and health managers, cancer patients, the general public and the media. Inclusion of statistical uncertainty within these cancer maps is a key challenge within the development of the published atlas.

1.3 Research contributions

This research provides a unique case study demonstrating the use of the NEUVis design framework for uncertainty communication design, as well as the use of a taxonomy of uncertainty sources for diagnosing and mapping uncertainty sources with project stakeholders. This research provides a case study for the use of design thinking tools and focus groups in uncertainty communication design. In addition, this research contributes to a growing body of user studies which investigate uncertainty representation methods, and makes a unique contribution by exploring how commonly used methods for representing measures of uncertainty influence behaviour.

1.4 Thesis structure

This thesis has three main sections: a Literature Review; Research Activity 1 - user centred design of uncertainty communication in cancer mapping; and Research Activity 2 - User study: Uncertainty representation in an online game.

The literature review summarises the current peer reviewed literature around uncertainty classification, uncertainty sources, uncertainty representation and communication methods, user centred design for science communication to the non-expert audience and spatial epidemiology.

The first research activity of this thesis is further broken into two sections and explores uncertainty communication in the context of the Australian Cancer Atlas. Firstly, Research Activity 1.A investigates current practices in cancer map visualisation and cancer mapping through a grey literature review. Secondly, Research Activity 1.B uses qualitative methods, including an uncertainty diagnosis tool, design thinking tools and audience focus groups, to explore the users’ needs and current information seeking behaviour. Research Activity 1 concludes with a summary of design insights developed through this process, which will inform the design of the Australian Cancer Atlas.


The second part of this thesis, Research Activity 2, uses an online game to investigate the impact of uncertainty representation methods on players’ behaviour. The online game places the player as the quartermaster on a pirate ship preparing to attack three on-coming ships and steal their treasure. Each on-coming ship has: an estimated risk of defeat, a measure of the uncertainty around the estimate (high, medium or low), and a potential reward. The player has limited resources, which they must allocate to reduce their risk of defeat, thus making trade-offs between risk, uncertainty and reward within the options available. The collected interactions from the game were analysed to investigate changes in allocation behaviour and performance across game mode, risk level and uncertainty level.
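The allocation decision described above can be illustrated with a small sketch. This is not the game's actual scoring model, which is described in Chapter 5; it assumes, purely for illustration, that each resource unit lowers a ship's risk of defeat by a fixed amount (`RISK_REDUCTION_PER_UNIT`), and it ignores the uncertainty level when computing the expected reward. The names `Ship`, `expected_reward` and `best_allocation` are hypothetical.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Ship:
    risk: float       # estimated probability of defeat, between 0 and 1
    uncertainty: str  # "low", "medium" or "high"
    reward: float     # treasure gained if the attack succeeds


# ASSUMPTION: each resource unit lowers the risk of defeat by 0.05.
# This rule is invented for the sketch and is not taken from the thesis.
RISK_REDUCTION_PER_UNIT = 0.05


def expected_reward(ships, allocation):
    """Expected total reward when allocation[i] units are spent on ships[i]."""
    total = 0.0
    for ship, units in zip(ships, allocation):
        risk = max(0.0, ship.risk - RISK_REDUCTION_PER_UNIT * units)
        total += (1.0 - risk) * ship.reward
    return total


def best_allocation(ships, budget):
    """Exhaustively search every way to split the budget across the ships."""
    candidates = (
        a for a in product(range(budget + 1), repeat=len(ships))
        if sum(a) == budget
    )
    return max(candidates, key=lambda a: expected_reward(ships, a))


ships = [
    Ship(risk=0.5, uncertainty="low", reward=100.0),
    Ship(risk=0.3, uncertainty="high", reward=200.0),
    Ship(risk=0.8, uncertainty="medium", reward=50.0),
]
print(best_allocation(ships, budget=4))
```

Under these assumptions, a player who concentrates resources on the highest-value ship maximises expected reward, whereas a player who spreads resources across ships accepts a lower expected reward. The uncertainty label plays no role in this calculation, so any systematic effect of uncertainty representation on allocations reflects player behaviour rather than the arithmetic.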


Chapter 2

Literature Review

Uncertainty is present in all aspects of life, and influences decisions in government, industry and the personal lives of individuals. In the quantitative sciences, uncertainty is a valuable piece of metadata that can guide the decision maker about how the available information should be used to inform decisions (Davis and Keller, 1997). As more decision makers without training in quantitative methods are using data driven insights, it is important to develop methods that enable these new audiences to access the uncertainty information associated with these insights. Developing effective methods for uncertainty communication is a critical challenge that is important to both statistics and science more broadly. This review begins with an outline of the different types of uncertainty and focuses this thesis on statistical uncertainty (Section 2.1). Section 2.2 then discusses why uncertainty communication is important to the decision maker. Section 2.3 presents the current literature on uncertainty communication and highlights the challenges or gaps that still exist in this area. Finally, Section 2.6 gives an overview of spatial epidemiology and disease mapping, including a summary of the literature on uncertainty communication specific to this domain. This final section is included in order to support the use of the Australian Cancer Atlas as a case study in this thesis.

The non-expert audience

This research project is focused on the non-expert audience, which we define as users who do not have training in statistical methods or quantitative science (Gough, Wall, and Bednarz, 2014). Despite their lack of training in quantitative methods, the non-expert audience still consumes scientific and modelled insights on a daily basis to make decisions, allocate resources and plan for the future. These decision makers are found throughout our communities. Government and policy-makers use modelled information and data to inform policy, for example, the management of irrigation allocations in the Murray-Darling Basin Plan¹, setting water quality guidelines for the Great Barrier Reef (Authority, 2010), or allocating resources for remote cancer patients to travel to the city to receive treatment.

Beyond government, industry also demands access to insights from the ‘big data revolution’ to guide decision making, such as targeting marketing campaigns, evaluating technology safety and developing economic forecasts (Yin and Kaynak, 2015). Finally, the general public also use quantitative information, for example when planning their daily commute based on weather forecasts, making lifestyle choices based on the latest health research, or making investment decisions guided by economic or financial data. While these audiences are not trained in quantitative methods, statistical and modelled insights and information can still help them make more informed decisions.

I make a special note that uncertainty communication is also important to the expert audience. There remains much work to be done in order to standardise methods for including uncertainty in science communication and visualisation within the research community, particularly when communicating across domains, as terms to describe uncertainty can be used differently (Cummings, Fidler, and Vaux, 2007). However, when the intended audience understands the modelling process, the communication challenge is different. Within this thesis we focus on the non-expert audience.

2.1 Uncertainty

The term uncertainty is used to refer to a range of concepts. These include: different views, imprecision, error, subjectivity, non-specificity, a lack of knowledge, or a state of being (Aerts, Clarke, and Keuper (2003); Pang, Wittenbrink, and Lodha (1997); Deitrick and Edsall (2008); Thomson et al. (2005); Han et al. (2011)). While there is no single agreed classification for the different types of uncertainty, Kujala, Burgman, and Moilanen (2013) define three broad categories: linguistic imprecision, human decision/behavioural uncertainty, and scientific uncertainty. Han et al. (2011) add more detail to the last category, scientific uncertainty, by defining two subcategories: aleatoric uncertainty and epistemic uncertainty. Each of these is defined below.

¹ https://www.mdba.gov.au/basin-plan/developing-basin-plan/science-behind-basin-plan

Linguistic imprecision refers to occasions where one term can be interpreted in several ways, or its meaning is ambiguous (Kujala, Burgman, and Moilanen, 2013). For example, the meaning of the word ‘love’ in the sentence “I love my partner” compared to “I love chocolate” is similar but undeniably different. “Love” in these two contexts is not the same. Human decision or behavioural uncertainty is defined as the uncertainty about the worldviews, objectives and future behaviours of a stakeholder (Kujala, Burgman, and Moilanen, 2013). For example, the future policies of the US President are unknown. Finally, scientific (aka statistical) uncertainty relates to empirical quantities that can be measured to some level of accuracy and precision, e.g. birth weight, height, rainfall, cancer incidence, etc. (Begg and Welsh, 2014; Kujala, Burgman, and Moilanen, 2013).

Scientific/statistical uncertainty includes two subcategories: aleatoric and epistemic uncertainty (Han et al., 2011). Aleatoric uncertainties are a result of the underlying randomness within the model, or the processes that are being modelled. They cannot be reduced by gathering more information or additional data, and are a result of the fundamental irreducible randomness of natural events (Hora, 1996). Epistemic uncertainties, on the other hand, are presumed to be due to a lack of knowledge, and can be reduced by gathering more data or refining the model (Han et al., 2011).
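The distinction can be illustrated numerically. The following is a minimal Python sketch, not drawn from the cited literature: the process (a normal distribution with mean 3.4 and standard deviation 0.5) and the function name simulate are assumptions chosen purely for illustration. The spread of individual outcomes stands in for aleatoric uncertainty, and the standard error of the estimated mean stands in for epistemic uncertainty.

```python
import random
import statistics


def simulate(n, seed=0):
    """Draw n observations from a hypothetical noisy process.

    Returns (spread, std_error): the sample standard deviation of the
    observations and the standard error of their mean.
    """
    rng = random.Random(seed)
    # Assumed process: true mean 3.4, irreducible noise with sd 0.5.
    data = [rng.gauss(3.4, 0.5) for _ in range(n)]
    spread = statistics.stdev(data)   # aleatoric: spread of outcomes
    std_error = spread / n ** 0.5     # epistemic: uncertainty about the mean
    return spread, std_error


small = simulate(50)
large = simulate(5000)
print(f"n=50:   spread={small[0]:.3f}, standard error={small[1]:.4f}")
print(f"n=5000: spread={large[0]:.3f}, standard error={large[1]:.4f}")
```

Under these assumptions, collecting one hundred times more data leaves the spread of outcomes close to 0.5, while the standard error of the mean shrinks by roughly a factor of ten.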

In this thesis, statistical and scientific uncertainty are considered interchangeable, and I use statistical uncertainty from here onwards in order to be consistent. Future reference to uncertainty refers to statistical uncertainty unless otherwise specifically stated.

2.1.1 Focus of this Study: Statistical Uncertainty within a modelling context

This thesis focuses on statistical uncertainty, and does not consider behavioural and linguistic uncertainty any further. Specifically, in this thesis, I focus on the communication of statistical uncertainty within a modelling context.


Modelling is a simplified, mathematically formalised approach to approximating reality. Models are used to estimate/predict a future event (e.g. tomorrow’s weather). Statistical uncertainty arises when some relevant information about the phenomenon of interest is not definite, known or reliable (Thunnissen, 2003). Every model includes uncertainty, as it is not feasible or realistic to accurately identify and measure all the system processes or all model parameters, or to have 100% accurate data.

Before progressing to the next section, it is important to clarify a few terms that can be confused within the uncertainty literature. These are: risk versus uncertainty, and variability (a.k.a. natural variation) versus uncertainty. I also note that in this research I do not consider uncertainty in the context of quantum mechanics, as defined by Heisenberg’s uncertainty principle (Hughes, 1989), but focus on uncertainty as related to empirical quantities, where empirical quantities are properties of the real world that can, in principle, be measured to some level of accuracy (e.g. birth weight, height, rainfall, cancer incidence, etc.) (Begg and Welsh, 2014).

Risk vs Uncertainty

In general terms, uncertainty refers to a future event, or to available information, that is not known for sure. Uncertainty does not imply any positive or negative outcome, simply that the outcome is not known or not knowable. In contrast, risk is the possible negative outcome of an uncertain situation or event (Bedford and Cooke, n.d.). It is important to note that these terms are related but different.

Variability vs Uncertainty

Variability, also known as natural variation, is one of numerous sources of uncertainty. It can be used as a measure of uncertainty, but it is not the only source or measure of uncertainty. Variability arises when multiple measurements of an event or phenomenon are observed, and is a natural feature of a dataset. Variability attempts to describe one characteristic of a population. Take, for example, the birth weight of babies born in Australia in 2015. The weight of each baby (observation) is different, and the range from the smallest to the largest baby describes the normal weight range for a newborn. Investing in better scales, or collecting more observations, may improve the data and create a more accurate mean, but there will always be a range, because newborn babies are not all the same. This range is due to natural variation (Morgan and Henrion, 1990).

In contrast, if I wanted to predict the birthweight of your next child, I could use the 2015 data to make a good guess, but I could only give a prediction range, and could not guess with 100% accuracy a specific weight. The birthweight of your next child contains uncertainty as the event has not occurred yet. I don’t know where, within the prediction range, your child will fall. Natural variation/variability is a source of uncertainty, and is sometimes used as a measure of uncertainty, but the terms cannot be used interchangeably.

Much of the confusion between uncertainty and variability arises due to the fact that distributions are used to describe both. Variability is quantified using a frequency distribution derived from measurements or observations (i.e., data) and is a feature of an observed population. Uncertainties are often quantified using probability distributions or probabilities and are used to describe an unobserved or unobservable population. The confusion is often compounded by the fact that frequency distributions, derived from an observed population, are often used to inform probability distributions.
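The two uses of a distribution can be sketched in a few lines of Python. This is an illustrative sketch only: the birth weights are simulated rather than real 2015 data, and the mean of 3.4 kg, the standard deviation of 0.5 kg and all variable names are assumptions. The frequency table summarises the observed population (variability), while the empirical 2.5% and 97.5% quantiles are reused as a prediction range for a birth that has not yet occurred (uncertainty).

```python
import math
import random
import statistics

rng = random.Random(1)
# Hypothetical observed population: 2,000 simulated birth weights in kg.
observed = [rng.gauss(3.4, 0.5) for _ in range(2000)]

# Variability: a frequency distribution describing what was observed.
frequency = {}
for weight in observed:
    bin_start = math.floor(weight * 2) / 2  # 0.5 kg wide bins
    frequency[bin_start] = frequency.get(bin_start, 0) + 1

# Uncertainty: a 95% prediction range for one unobserved birth, read off
# the empirical 2.5% and 97.5% quantiles of the observed weights.
cuts = statistics.quantiles(observed, n=40)
low, high = cuts[0], cuts[-1]

for bin_start in sorted(frequency):
    print(f"{bin_start:.1f}-{bin_start + 0.5:.1f} kg: {frequency[bin_start]}")
print(f"95% prediction range for the next birth: {low:.2f} to {high:.2f} kg")
```

The same numbers appear in both outputs, which is one way to see why the two concepts are so easily conflated.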

2.1.2 Sources of statistical uncertainty

Several authors have attempted to categorise the different sources of statistical/scientific uncertainty. However, there is no agreed taxonomy. Some taxonomies are generalised (Kahneman and Tversky, 1982; Morgan and Henrion, 1990; Walker et al., 2003; Funtowicz and Ravetz, 1990), and try to provide a typology that can be applied to any domain. Table 2.1 outlines the key categories and references for a majority of these taxonomies. Alternatively, others have developed domain-specific taxonomies, for example: Regan, Colyvan, and Burgman (2002) outlined uncertainty sources in ecology & conservation biology; Skinner et al. (2013) and Burgman (2005) map uncertainty sources specific to environmental risk assessment and management; Beven et al. (2015) in hydrology; Han et al. (2011) in health; Thomson et al. (2005) in intelligence analysis & geospatial data; Ristovski et al. (2014) in medical imaging; Ramirez, Jensen, and Cheng (2012) in software engineering; and Landesberger, Bremm, and Wunderlich (2017) in geolocated graphs.


Of these generalised taxonomies, I consider two that could be used as tools to map out the uncertainty sources within a specific project: one defined by Morgan and Henrion (1990) and another defined by Funtowicz and Ravetz (1990). The taxonomy defined by Morgan and Henrion (1990) is the most comprehensive of the uncertainty taxonomies listed in Table 2.1 and defines the following sources of uncertainty: measurement error, systematic error, variability (natural variation), approximation, disagreements, inherent randomness, and linguistic uncertainty 2 (each of these is defined in section 2.1.3). Alternatively, Funtowicz and Ravetz (1990) classify uncertainty sources in terms of their location rather than the type of knowledge that is lacking. Within the Funtowicz and Ravetz (1990) framework, uncertainty arises from either the data or the model. Uncertainty that arises from the data refers to the quality or appropriateness of the data used as inputs, while uncertainty that arises from the model can be due to numerical approximations used in the mathematical representation of a process or relationship, or the assumptions and errors in the model structure. It is easy to see how each of Morgan and Henrion (1990)’s sources of uncertainty can also be classified by their location. This is demonstrated in Table 2.2.

In this thesis we consider the taxonomy outlined by Morgan and Henrion (1990) to be more comprehensive, and to provide valuable detail that is not apparent using the location classification provided by Funtowicz and Ravetz (1990). For example, in Table 2.2 both measurement error and systematic error are classified as arising from the data; however, they are very different sources of uncertainty, which require different actions if they are to be reduced, and may have different impacts on the interpretation of the modelled outputs. Measurement error can be addressed through the improvement of equipment or data collection methods, while systematic error creates bias and is very difficult to address. The extra depth and detail provided by Morgan and Henrion (1990) may be important in communication and decision making.

2 Linguistic uncertainty is not a source of scientific uncertainty but is included by Morgan and Henrion (1990) due to the potential impact it can have on communicating scientific information.


Table 2.1: Taxonomies of sources of scientific uncertainty identified in the academic literature.

Typology categories                              Reference

1. The data, the model, completeness             Funtowicz and Ravetz (1990)

2. Internal vs external                          Kahneman and Tversky (1982), Han et al. (2011)

3. Ignorance                                     Lipshitz and Strauss (1997)

4. The scientific pipeline (data acquisition >   Pang, Wittenbrink, and Lodha (1997), Johnson and
   transformation > modeling > data              Sanderson (2003), Brodlie, Osorio, and Lopes (2012)
   visualisation)

5. Location, level and nature                    Walker et al. (2003)

6. Context, inputs (data) and model              Refsgaard et al. (2007)

7. Data uncertainties and modelling              Funtowicz and Ravetz (1990)
   uncertainties

Table 2.2: Morgan and Henrion (1990)’s uncertainty sources classified in terms of Funtowicz and Ravetz (1990)’s location categories of the data or the model. ‘Other’ refers to uncertainty sources that do not map to either the data or the model.

The data The model Other

* Inherent randomness

Measurement error * Model Structure * Natural variability

Systematic error * Disagreement

* Model uncertainty

2.1.3 Uncertainty Sources from Morgan and Henrion (1990)

The following section provides a summary of each of the uncertainty sources described by Morgan and Henrion (1990).

1. Uncertainty induced by measurement variation is the most studied and best understood source of uncertainty and arises from random error in direct measurements of a quantity. Imperfections in the measuring instruments and observational techniques inevitably give rise to variations from one observation to the next. The resulting uncertainty depends on the magnitude of the variation between observations and the number of observations taken. Measurement error can be estimated by statistical methods, and there are a variety of well-known techniques for quantifying this uncertainty, such as standard deviation and confidence intervals.

Example: Imagine you were measuring the weights of 100 marathon athletes. There could be variation in the weights of each athlete at different times and variation between the scales used to measure them.

2. Uncertainty induced by systematic error (subjective judgment or bias) occurs when measurements are biased. Bias is defined as the difference between the true value of a parameter and the measured or estimated value of that parameter. In a sampling setup, this estimated value may be the value to which the mean of the measurements converges as the sample size increases. Measurements that have systematic error do not vary randomly around a true value. Systematic errors can also result from a scientist’s decision to exclude (or include) data that should be included (or excluded). This can be viewed as subjective judgment bias. The only way to deal with such an error is to recognize a bias in the experimental procedure and remove it. Systematic error can also result from consistent unintentional errors in calibrating equipment or recording measurements.

Example 1: In 1997 the EPA implemented regulations that limited permissible levels of pollutants. This was in part based on the result of a statistical analysis that demonstrated that when daily levels of soot rise, slightly more people die from heart and lung disease. In 2002 the US EPA discovered that studies linking deaths to very fine particulate matter (such as diesel exhaust) were biased due to a default setting in a computer program. The correction resulted in a revision of the estimate from a 0.41% rise in mortality per 10 µg/m3 of fine particles to a 0.27% increase (Kaiser, 2002; Figure 2.2). As a result, the EPA allowable emissions standards changed (Burgman, 2005).


3. Variability (natural variation) occurs naturally in a measured quantity over time and space and is a feature of the natural world. Variability is a descriptive feature of an observed population. Variability arises when multiple instances of an event or phenomenon are observed, and is a natural feature of the population being studied. Equipment accuracy and measuring methodologies can influence the variability of observed data, but variability cannot be removed by improving methodologies and measurement accuracy or collecting more observations. Variability is considered to be predominantly due to irreducible natural variation and is not, in itself, uncertainty. However, it is a source of uncertainty when we use observed information from a population to predict or estimate an unobserved event or measured quantity. For example, the natural variability of Queensland birth weights in 2015 gives rise to uncertainty when this information is used to predict the birth weight of the next baby born in Queensland.

Example: The birth weight of babies born in Australia in 2015. The weight of each baby is different, and the range from the smallest to the largest birth weight is due to natural variation. This variation gives rise to a natural and expected birth weight range for newborns from a specific population.

4. Model uncertainty arises because models are abstractions of reality. Some less influential variables and interactions are left out, and the shapes of the functions used to describe the system processes are always abstractions of the real processes. The current understanding of the system and its processes may be incomplete, and the shapes of the corresponding functions and their parameter values are only best estimates. Models may be used to further understand the structure of the system of interest, to predict future events or to answer questions about a system (Burgman, 2005).

Model uncertainty arises in two main ways. Firstly, there is uncertainty of the model parameters. This can be accounted for in probabilistic models, with careful consideration of the range of possible values and their probabilities. Secondly, there is uncertainty about the model structure, i.e. uncertainty about cause-and-effect relationships. This can be very difficult to quantify (Regan, Colyvan, and Burgman, 2002).


Example: Growth models that predict the abundance of populations do not include parameters for describing weather events, such as rainfall, because modellers believe that weather is not sufficiently important to understanding population dynamics to be explicitly incorporated in the model (Eberhardt, 1987).

5. Approximation gives rise to uncertainty by introducing simplified abstractions of the real-world system into the model. The spatial and temporal resolutions of a model, such as its grid size and time intervals, are approximations. Approximations are used due to a lack of knowledge about a specific feature of one aspect of the system being modelled or due to computational limitations.

6. Disagreements give rise to uncertainty when researchers must select between statistical, computational and mathematical methods and techniques that may not be agreed upon within the scientific community. Disagreement may also be a source of uncertainty due to differing interpretations of scientific information.

7. Inherent randomness can be thought of as being innate. This is linked to 1 above. Within modelling setups that “predict forward” to learn about a system and its outcomes, no matter how well we know the process and the initial (starting) conditions, we cannot be certain of what the outcome will be. This kind of randomness can often be quantified very well, and is easy to deal with in probabilistic models. There is still much debate in the mathematical and statistical literature about whether inherent randomness is, in principle, reducible or not (epistemic vs aleatoric uncertainty); that is, whether it is a phenomenon of the natural world or a by-product of our lack of knowledge about a system or relationship and its starting conditions.

8. Linguistic uncertainty (linguistic imprecision) arises because both natural and scientific language can be interpreted in several ways, or an event is ill-defined. Linguistic uncertainty can be classified into five distinct types: context dependence, ambiguity, indeterminacy of theoretical terms, and under-specificity. All of these uncertainties arise in natural and scientific language, and can impact the interpretation and application of scientific insights to real world decision making.


As stated above, linguistic uncertainty is not a source of scientific uncertainty, but it is included here as it is important for science communication.
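The practical difference between sources 1 and 2 above (random measurement error and systematic error) can be illustrated with a short simulation. This is a minimal Python sketch under stated assumptions: the scale, the true weight of 70 kg, the noise level and the 0.3 kg miscalibration are all hypothetical, the function names weigh and summarise are mine, and the confidence interval uses a normal approximation rather than a t-distribution.

```python
import random
import statistics


def weigh(true_weight, n, bias=0.0, noise_sd=0.5, seed=0):
    """Simulate n readings of one quantity on a scale (all values hypothetical)."""
    rng = random.Random(seed)
    return [true_weight + bias + rng.gauss(0.0, noise_sd) for _ in range(n)]


def summarise(readings, level=0.95):
    """Return the mean and a normal-approximation confidence interval."""
    n = len(readings)
    mean = statistics.fmean(readings)
    z = statistics.NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * statistics.stdev(readings) / n ** 0.5
    return mean, (mean - half_width, mean + half_width)


TRUE_WEIGHT = 70.0  # kg; known here only because the data are simulated

for n in (10, 1000, 100000):
    fair_mean, fair_ci = summarise(weigh(TRUE_WEIGHT, n))
    biased_mean, biased_ci = summarise(weigh(TRUE_WEIGHT, n, bias=0.3))
    print(f"n={n:>6}  unbiased scale: {fair_mean:.3f} "
          f"({fair_ci[0]:.3f}, {fair_ci[1]:.3f})  "
          f"miscalibrated scale: {biased_mean:.3f} "
          f"({biased_ci[0]:.3f}, {biased_ci[1]:.3f})")
```

In this sketch, taking more readings narrows both intervals, but the miscalibrated scale converges on the wrong value: its interval becomes narrow and excludes the true weight. Collecting more data reduces random measurement error but does nothing to remove bias.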

2.2 Why is Uncertainty Information important to decision-makers?

Ignoring uncertainty can lead to misinterpretation of statistical information, whereas considering its presence, possible sources, magnitude and potential impact can lead to more informed decision making. Uncertainty communication to the non-expert audience is being recognised as a critical challenge across a wide range of domains. This increase in recognition is driven by a range of factors including: the increasingly sophisticated systems and relationships that scientists are researching, an increase in simulation-based tools, a movement towards stochastic rather than deterministic approaches, and an increase in the number of non-expert decision makers demanding insights from the ‘big data’ and ‘machine learning’ revolutions. These non-expert decision makers are in government, business and industry, and their decisions impact us all in terms of policy, health care, infrastructure, environmental regulations, etc. Ensuring that these audiences understand the uncertainty of data-driven and/or statistically derived information is a critical challenge for all of science.

The very nature of using quantitative methods to derive information about a future or unmeasurable event means that complete knowledge is not available. As highlighted by George E. P. Box in his famous quote “All models are wrong, but some models are useful” (Box, 1979), incomplete knowledge can be valuable, even though it is uncertain. In practice, decision makers are very aware of the unavoidability of uncertainty, and potentially view uncertainty as integral to the definition of a problem (Brugnach et al., 2008). Rather than seeing uncertainty as a problem that must be dealt with, perhaps it is more useful to reframe uncertainty information as a tool that allows the available information to be used efficiently, when what is available is limited or incomplete.

The importance of uncertainty to decision makers is recognised across domains

The idea that uncertainty should be communicated to non-expert audiences is not new (Manski, 2015); Morgenstern argued forcefully for error estimates to be published alongside official economic statistics in the 1960s (Morgenstern et al., 1963). However, recognition of the importance of this challenge has increased in recent years. This is happening across a wide range of domains, in fields such as spatial analyses (Hunter and Goodchild, 1996), weather forecasting (Joslyn and Savelli, 2010), healthcare (Sanyal et al., 2010), decision science (Grubler, Ermoliev, and Kryazhimskiy, 2015), epidemiology (Carroll et al., 2014), disaster management (Eiser et al., 2012) and policy development (Morgan and Henrion, 1990), as well as many others. For some disciplines, such as geographic information science and forecasting, understanding, quantifying and communicating uncertainty is even considered the dominant challenge in the discipline (Harrower and Street, 2003; Grubler, Ermoliev, and Kryazhimskiy, 2015). Further to this, the ubiquitous nature of this challenge is also beginning to bring together a multidisciplinary cohort to discuss and compare methods, such as in the 2010 Discussion Meeting at the Royal Society of London (Palmer and Hardaker, 2011).

The acknowledgement of the importance of uncertainty is not only present in the research domains. The past decade has seen a growing awareness of uncertainty in policy-making, and a recognition that policies that ignore uncertainty in the long run often lead to unsatisfactory technical, social, and political outcomes (Morgan and Henrion, 1990). The conservation and environmental sciences that inform environmental management argue for explicit assessments of uncertainty in environmental data and models as a necessary, although not sufficient, condition for balancing uncertain scientific arguments against uncertain social, ethical, moral and legal arguments, and in managing environmental systems (Brown, 2004; Uusitalo et al., 2015). Additionally, health care workers are grappling with the role of scientific uncertainty within informed decision making between clinicians and patients. Understanding the certainty of evidence-based medicine becomes critical when people are making decisions about their future health outcomes and risks (Han et al., 2011). However, this is not an easy challenge, and many clinicians struggle with how to communicate statistical uncertainty and how this information should impact the informed consent process (Politi, Han, and Col, 2007). These recognitions signal a paradigm shift, moving from deterministic modelling and its quest for “optimality” to the concept of “robust” decision making that takes uncertainties explicitly into account (Grubler, Ermoliev, and Kryazhimskiy, 2015).

Ignoring the presence of uncertainty can lead to misinterpretation of information

One of the dangers of not communicating uncertainty is that estimates can appear more reliable than is warranted by the data or the model from which they are derived. For example, user studies in the GISciences have demonstrated that audiences interpreting data-generated graphics, where the data contains some level of missingness, tend to misinterpret results but do so with the same level of confidence as when the data are complete (Eaton, Plaisant, and Drizd, 2005). Similarly, issues of misinterpretation have also been highlighted in the interpretation of official economic statistics (Manski, 2014), which can have broad-reaching implications. For example, a central bank monitoring statistics on GDP growth, inflation and employment may mis-evaluate the state of the economy and consequently set inappropriate monetary policies (Manski, 2014). Several authors have demonstrated that only understanding the average, or point estimate, of a model output can lead to less robust decision making than understanding the underlying processes (O’Hagan, 2012; Uusitalo et al., 2015). An additional example is within modelling potential future climate change scenarios, where neglecting to include uncertainty can suggest to policy makers that all scenarios are equally likely, which is often not the case (Schneider and Moss, 1999; Giles, 2002).

The dangers of ignoring uncertainty are not restricted to estimates being misinterpreted. Not handling uncertainty communication explicitly can have a detrimental impact on the public’s perception of science. This is also clearly seen in the climate change debate, where the neglect of clear communication of uncertainty is in part responsible for the backlash by climate sceptics (Schneider and Moss, 1999). The absence of adequate uncertainty in reported estimates can be detrimental to the credibility of the scientific outputs because it increases public distrust in the motives of researchers, risk regulators, and scientific advisors (Frewer and Salter, 2002; Miles and Frewer, 2003; Scheufele and Lewenstein, 2005).


2.3 Communicating Statistical Uncertainty

Buttenfield (1993) proposes three challenges to effective uncertainty communication. They are:

1. standardisation of terminology,

2. methods for measuring and representing uncertainty, and

3. methods for depicting uncertainty simultaneously alongside the estimates/data it relates to in an understandable, useful, usable and meaningful way.

I have used these three impediments to organise the literature detailed in the following section.

2.3.1 Impediment 1: Standardisation of terminology

Uncertainty terms can be interpreted differently by different people in different circumstances (Morgan and Henrion, 1990). A call for a standardisation of uncertainty terminology has been made by Buttenfield (1993) and Schneider and Moss (1999); however, suggestions in the literature for addressing this challenge either generally or within specific domains are limited.

Schneider and Moss (1999) have been very vocal in the climate science community in calling for standardisation of quantitative and qualitative uncertainty terminology, and for a common language when describing results. For example, Schneider and Moss (1999) prescribe that when referring to probabilities within the climate change literature, the semantic description of the uncertainty should be mapped explicitly to the numerical probabilities. In total they recommend five categories that map to numeric probabilities, ranging from “very low confidence” (less than 5%) and “low confidence” (5% to 33%) through to “very high confidence” (95-100%), and advocate that these be used in all climate science communications. See Figure 2.1 for the terminology suggested for quantitative estimates (source: Figure 3 from Schneider and Moss (1999)).
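A mapping of this kind is simple to express in code. The following Python sketch is illustrative only: the “very low”, “low” and “very high” bands are those quoted above, while the labels and boundaries of the two middle bands, the treatment of values that fall exactly on a boundary, and the function name confidence_term are assumptions made here so that the scale covers the full range of probabilities.

```python
# Bands as (lower bound, upper bound, term). The "very low", "low" and
# "very high" bands are quoted in the text; the two middle bands are an
# assumption added so that the scale covers the whole 0-1 range.
CONFIDENCE_BANDS = [
    (0.00, 0.05, "very low confidence"),
    (0.05, 0.33, "low confidence"),
    (0.33, 0.67, "medium confidence"),   # assumed band
    (0.67, 0.95, "high confidence"),     # assumed band
    (0.95, 1.00, "very high confidence"),
]


def confidence_term(probability):
    """Return the verbal confidence term for a probability between 0 and 1."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    for lower, upper, term in CONFIDENCE_BANDS:
        # Assumption: each band includes its lower bound, not its upper.
        if lower <= probability < upper:
            return term
    return CONFIDENCE_BANDS[-1][2]  # probability is exactly 1.0


for p in (0.02, 0.20, 0.97):
    print(f"{p:.0%} -> {confidence_term(p)}")
```

Keeping the bands in a single table makes the mapping explicit and auditable, which is the property that a standardised terminology is intended to provide.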

Since standardised terminology is not a focus of this research I do not explore this beyond acknowledging its importance in the wider uncertainty communication challenge.


Figure 2.1: Scale for assessing state of knowledge, source: Schneider and Moss (1999), Figure 3.

2.3.2 Impediment 2: Uncertainty representation

As outlined by Buttenfield and Beard (1991), the lack of standardised and effective methods for encoding/representing uncertainty is a current impediment to communicating with the non-expert audience. While much work has been done in this space, there still remains a lack of agreement about the effectiveness of these representation methods, or which methods should be used with the non-expert audience. Different researchers have taken different approaches to investigating uncertainty representation. Some have focused on evaluating the effectiveness of methods commonly used to represent uncertainty, such as intervals, probabilities and frequencies. The geospatial domain, which has a long history of innovation in data visualisation, has explored the effectiveness of new visual variables for encoding uncertainty, such as fuzziness, transparency and others. Other researchers have explored aspects of the display other than the symbols within it, such as interactivity, representing scenarios and display comparisons. Many of these have been made possible by technological advances. Finally, worth noting is the work in the climate change literature which has suggested a new visualisation for making disagreement between experts more transparent, which is a type of uncertainty not usually measured or communicated. Each of these is discussed in further detail below.


Hijacking the human visual communication channel

Most uncertainty representation research uses visualisation to address this challenge. Visualisation is a proven channel for communicating data-driven insights to non-expert audiences, and can represent complex and dense information in a single view (Tufte, 1983; Cleveland and McGill, 1984). The human visual system is a very high-bandwidth channel to the brain, with a significant amount of processing occurring in parallel and at the preconscious level (Munzner, 2009). Data visualisation enables an audience and a communicator to take advantage of the highly evolved and sophisticated analysis capabilities of the human visual system (Munzner, 2014). However, visualisation is not the only method for representing uncertainty to a non-expert audience. For example, the semantic definitions of uncertainty above (Figure 2.1) are non-visual approaches to representing a measure of uncertainty in a form that a non-expert can interpret. These semantic versions of representing uncertainty are far less researched than the visual representations.

Intervals

Intervals are commonly used to represent uncertainty, including confidence intervals, credible intervals and predictive intervals. These have been more commonly researched than other existing representation methods. They are widely used in domains such as finance (e.g., Du et al., 2011) and weather forecasts, which are commonly utilised by the general public. User studies investigating intervals have shown that even when audiences are given a deterministic forecast (a point estimate rather than a range/interval), they expect information derived from data or future predictions to contain uncertainty (Joslyn and Savelli, 2010; Lazo, Morss, and Demuth, 2009; Morss, Demuth, and Lazo, 2008), and that audiences prefer ranges over deterministic information (Du et al., 2011). Audiences find the informativeness of the interval more useful than the precision of a deterministic representation (point estimate) (Yaniv and Foster, 1995). An interval quantifies the uncertainty the user knows is present, and therefore may narrow the range of expected values by giving the user guidance on where the bounds of that uncertainty are, rather than leaving them to guess.

Further to this, studies have shown that non-expert users are capable of effectively utilising predictive intervals without extensive training (Savelli and Joslyn, 2013), and that non-expert users provided with predictive intervals in weather forecasts are more able to identify uncertain forecasts, and are more decisive, compared to users provided with deterministic forecasts (Joslyn and LeClerc, 2011; Savelli and Joslyn, 2013).

Some research on intervals, however, is contradictory; for example, intervals have been shown to commonly lead to overconfidence (Alpert and Raiffa, 1982; Soll and Klayman, 2004). Furthermore, at least one study has shown that even researchers (who are quite familiar with uncertainty and uncertainty visualisation) frequently misunderstand visual representations of confidence intervals and error bars (Belia et al., 2005). Finally, a comparison of text and visual intervals has shown that text versions are less open to misinterpretation (Savelli and Joslyn, 2013).

Frequencies and probabilities

Frequencies and probabilities are also common forms of encoding uncertainty. Some research suggests that frequencies (e.g. 1 in 10) rather than probabilities (e.g. 10% chance) improve performance for some problems (Gigerenzer and Hoffrage (1995); Hertwig and Gigerenzer (1999)). In weather forecasts, however, probabilities have been shown to be better understood than frequencies (Joslyn and Nichols, 2009). David Spiegelhalter has done significant work in summarising the current methods for the visualisation of probabilities, relative frequencies and other graphs for the non-expert audience (Spiegelhalter, Pearson, and Short (2011)).
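To make the distinction between the two formats concrete, the following minimal Python sketch expresses the same chance as a probability and as a frequency. The function names and the cap on the denominator are illustrative assumptions and are not taken from the cited studies.

```python
from fractions import Fraction


def as_probability(p: float) -> str:
    """Format a chance as a percentage probability, e.g. '10% chance'."""
    return f"{round(p * 100)}% chance"


def as_frequency(p: float, max_denominator: int = 100) -> str:
    """Format the same chance as a frequency, e.g. '1 in 10'.

    The chance is approximated by the closest fraction whose denominator
    does not exceed max_denominator (an illustrative choice).
    """
    frac = Fraction(p).limit_denominator(max_denominator)
    return f"{frac.numerator} in {frac.denominator}"


print(as_probability(0.1))  # 10% chance
print(as_frequency(0.1))    # 1 in 10
```

Both strings carry the same information; the studies cited above suggest that which one is better understood may depend on the problem and the context.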

Exploring new visual variables for encoding uncertainty

The geospatial sciences have a long history of innovation in visualisation. An example is the work by Bertin (1981), who developed a catalogue of visual variables that could be used in mapping to encode quantitative variables. Uncertainty research has also explored Bertin’s visual variables (Bertin, 1981) for their application in uncertainty representation. These visual variables are: location, size, value, texture, colour, orientation, and shape. Edge crispness (fuzziness), fill clarity, fog, resolution, transparency, saturation, boundary (thickness, texture, and colour) and animation have subsequently been suggested as specific visual variables for representing uncertainty (MacEachren (1992); Slocum et al. (2008); Gershon (1992)), while other researchers have explored shapes and symbols (Newman and Lee, 2004; Sanyal et al., 2010).


Much research has been done to explore the effectiveness of these visual methods of uncertainty representation (Brodlie, Osorio, and Lopes (2012); Potter et al. (2012); Zuk, Carpendale, and Glanzman (2005); Pang, Wittenbrink, and Lodha (1997); Johnson and Sanderson (2003); Johnson (2004); Evans (1997); Wittenbrink et al. (1996); Aerts, Clarke, and Keuper (2003); Sanyal et al. (2010); Leitner and Buttenfield (2000)). Despite this body of research, there are still contradictory results, and more work is needed to validate these methods (Buttenfield and Beard (1991); MacEachren (1992); Sanyal et al. (2010)).

Additionally, the purpose for which the user needs the information, as well as the medium through which they access it, has been shown to affect the user’s interpretation of the represented uncertainty (Sanyal et al. (2010); Leitner and Buttenfield (2000)).

A full review of these studies is beyond the time and resources available for this thesis. However, these studies may be of use to the development of the Australian Cancer Atlas; therefore, additional research from this domain is included in Appendix A.

Cognitive overload

One of the concerns of adding more visual variables to data visualisation for the non-expert is cognitive overload (McGranaghan, 1993). Cognitive overload results when the user reaches an information threshold, beyond which they are not able to make sense of the information. This concern is not unfounded; complexity and density of representation methods seem to overwhelm novice decision makers, while experts are able to use the detail more readily in decision-making (Cliburn et al., 2002). Technological advances may provide a solution to this through the use of interactivity, animation, dynamic displays and sound, by providing additional channels for representing uncertainty information without interfering with the ability to see the features that are present in the display (MacEachren, 1992).

An unanswered question in uncertainty representation and communication is whether uncertainty information is viewed differently to data generally (Sanyal et al., 2010). Some researchers believe that data quality is a particular example of data and should, therefore, be treated and visualised as data in general (McGranaghan (1993)). Others (Buttenfield and Beard (1991); Buttenfield (1993)) believe that people do not think about data quality in the same way they think about data in general, and that data quality therefore needs to be visualised differently. This may also support the use of other display features such as interactivity, dynamic features, side by side comparisons and scenario representation, rather than encoding uncertainty using the visual variables described above.

Dynamic & interactive features

Technological advances in visualisation tools, interactivity and animation, as well as the capabilities of online platforms, fundamentally influence the types of approaches that can be used to visualise and communicate uncertainty. These technologies enable research to extend beyond the visual variables offered by Bertin (Davis and Keller, 1997) to variables such as the duration of a flashing symbol (DiBiase et al., 1992) and sound (Fisher (1994); Krygier (1994)). One advantage of dynamic displays is that the dynamic feature can draw the reader’s attention to the areas of uncertainty, which the novice user often ignores (Davis and Keller, 1997).

Additionally, interactivity allowing the user to toggle between the estimates and uncertainty information provides an additional mechanism for the user’s comprehension (MacEachren (1992); Van der Wel, Hootsmans, and Ormeling (1994)). These methods still require user testing to determine their effectiveness. Technology does not always enhance communication or meet the users’ needs, and in some cases, users actually prefer static maps over dynamic displays (Aerts, Clarke, and Keuper, 2003).

Display comparisons

Within mapping, Pang, Wittenbrink, and Lodha (1997) suggest that the way in which these visual variables are integrated into a map can be categorised into three groups: overloading (bivariate maps), side-by-side comparison (map pairs), or seamless integration (MacEachren, 1992). These approaches are also relevant to non-mapping contexts.

The three broad visualisation approaches are defined as:

Overloading (Bivariate Maps) - report data and the associated uncertainty information within one map. Overloading is an approach that augments a base visualisation technique with an uncertainty visualisation technique but the data and uncertainty information are clearly separable. This is probably the most popular mechanism for uncertainty visualisation.


Side-by-side comparison (Map Pairs) - two similar maps presented side by side. One map shows the data, and the other shows the associated uncertainty.

Seamless integration - the data and the uncertainty are displayed in a unified rendering. Unlike the overloading approach in which uncertainty is superimposed on the graphical representation of the dataset, the seamless integration approach directly includes (i.e., integrates) the uncertainty in the visualisation rendering.

Among these three types of methods, research suggests that bivariate maps are the most popular approach (Pang, Wittenbrink, and Lodha, 1997). Multiple studies have found this method to be easier to interpret, and to enhance users’ performance, compared to separate maps (Kubicek & Sasinka, 2011; Viard et al., 2011; Evans, 1997).
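As one illustration of the overloading approach, the sketch below encodes an estimate as a colour and its uncertainty as transparency, so that the two remain separable within a single map symbol. The function name, the blue-to-red colour ramp and the linear opacity rule are hypothetical choices made for illustration; they are not a method taken from the cited studies.

```python
def overloaded_colour(estimate: float, uncertainty: float) -> tuple:
    """Return an (r, g, b, a) tuple for one map area.

    Both inputs are assumed to be pre-scaled to the range [0, 1].
    The estimate drives the colour (blue = low, red = high); the
    uncertainty drives opacity (more uncertain = more transparent).
    """
    if not (0.0 <= estimate <= 1.0 and 0.0 <= uncertainty <= 1.0):
        raise ValueError("estimate and uncertainty must lie in [0, 1]")
    red = estimate
    blue = 1.0 - estimate
    alpha = 1.0 - uncertainty
    return (red, 0.0, blue, alpha)


# A high estimate with low uncertainty: strong red, nearly opaque.
print(overloaded_colour(0.9, 0.1))
```

A side-by-side comparison would instead draw two maps, one coloured by the estimate and one coloured by the uncertainty.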

Representing confidence when there is disagreement

As discussed above, some forms of uncertainty do not currently have a quantitative measure and are much more ambiguous. For example, knowledge gaps in a domain, the level of consensus among peers, discrepancies between independent studies, model uncertainties or disagreement between researchers can be indicators of incomplete knowledge. However, this type of uncertainty cannot be summarised using error bars, probability distributions or other quantified measures. For these sources of uncertainty, new representation methods need to be developed.

One proposed solution to showing confidence when there is disagreement between study results is a graph which shows consensus in results for modelled information. This approach tries to capture disagreement or differences between methods and/or results, with the aim of making these disagreements visible to the non-expert audience. Schneider and Moss (1999) have developed such a graphic, which aims to show where confidence, or lack of it, arises. The graphic (see Figure 2.2) involves plotting points on four axes (arranged like the points on a compass), corresponding to confidence in (1) the theory, (2) the observations, (3) the models and (4) the consensus within a field. The plotted points are then joined to create a shape, where the area indicates the overall degree of confidence in the final estimated results. This graph is intended to be used in scenarios where there is an absence of clear consensus, such as for climate-sensitivity estimates. Schneider and Moss (1999) proposed this graph as a way to communicate the confidence and level of evidence around climate policy. They successfully argued for this to be used in the 2001 IPCC (Intergovernmental Panel on Climate Change) report (Giles, 2002).

Figure 2.2: Summarising consensus, Source: Schneider and Moss (2000)

Critics of this four axis plot raised concerns that the area of the plotted shape may not accurately represent the overall confidence in the result if the uncertainties associated with the different axes are not independent. “Theory comes from observations, and both of these feed into models, so the different axes may depend on each other”, says Joe Perry, an ecologist at Rothamsted Research, an agricultural research institute in Harpenden, north of London, who studies the impact of genetically modified crops on biodiversity. “If they are dependent, the size of the shape will not be representative of the total probability” (Giles, 2002).
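The area calculation behind the four-axis graphic described above can be sketched as follows. This is a minimal illustration under stated assumptions: each confidence score is taken to lie between 0 and 1, the assignment of scores to compass directions is hypothetical, and the function name is not taken from Schneider and Moss.

```python
def consensus_area(theory: float, observations: float,
                   models: float, consensus: float) -> float:
    """Area of the shape joining four confidence scores on compass axes.

    Each score is assumed to lie in [0, 1]. The assignment of scores to
    compass directions (N, E, S, W) is an illustrative assumption.
    """
    vertices = [
        (0.0, theory),         # north
        (observations, 0.0),   # east
        (0.0, -models),        # south
        (-consensus, 0.0),     # west
    ]
    # Shoelace formula for the area of a simple polygon.
    twice_area = 0.0
    for i, (x1, y1) in enumerate(vertices):
        x2, y2 = vertices[(i + 1) % len(vertices)]
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0


print(consensus_area(1.0, 1.0, 1.0, 1.0))  # 2.0, the largest possible area
```

With this layout the area simplifies to 0.5 × (theory + models) × (observations + consensus), so the result also depends on which scores are placed on opposite axes. The shape is therefore a visual summary rather than a formal probability calculation.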

The primary focus of many user studies which explore uncertainty representation is to develop generalisable methods of uncertainty representation that would be applicable to a wide range of uses. Studies aimed at evaluating specific uncertainty representation methods often focus on: designing a visualisation (Buttenfield (1993); Fauerbach et al. (1996)), evaluating whether users are able to identify specific uncertainty values (Blenkinsop et al., 2000), and thus assessing the impact of uncertainty representation on perceptions and data identification (Hope and Hunter (2007)). However, effective representation may not guarantee communication, and the effectiveness of an appropriate representation method can depend on the question being asked (Sanyal et al., 2010). Therefore, the goal of developing a generalisable uncertainty representation approach may not be achievable, or may not be a solution to uncertainty communication on its own. This highlights the importance of distinguishing Buttenfield’s (1993) third impediment of uncertainty communication from uncertainty representation.

2.4 Uncertainty communication

Effective uncertainty representation does not guarantee effective uncertainty communication (Sanyal et al., 2010). As highlighted by Buttenfield (1993), communication must be considered separately from representation, as a unique challenge.

Multiple studies have shown that specific uncertainty measures are less important than an understanding of the impact of uncertainty on decision outcomes (Leitner and Buttenfield (2013); Pawson, Wong, and Owen (2011)). Successful uncertainty communication, i.e. integrating uncertainty representations in a way that is meaningful and useful to the end user, is more than validating methods for uncertainty representation. The most effective visualisation for uncertainty can depend on the users’ needs and reason for accessing the information (Sanyal et al., 2010). Uncertainty information also may not be viewed by the user as an additional variable, but rather a piece of metadata that is intrinsic to the interpretation of the insights presented (Leitner and Buttenfield, 2000).

While there are many examples of user studies in the literature that explore the effectiveness of uncertainty representation, there are limited resources in the literature which guide or support the development of effective uncertainty visualisations or other communication products. Two emerging approaches in the literature that try to extend uncertainty representation into uncertainty communication are the consideration of multiple possible futures and methods for uncertainty communication design. The first of these I discuss below, and the second is discussed in the next section.


Presenting multiple possible futures

The representation of a quantified measure of uncertainty can be valuable to the user, but requires the audience to connect the dots in terms of what that uncertainty means to potential futures. As noted above, one approach emerging in the literature that goes beyond uncertainty representation is to communicate the impact that uncertainty could have on possible future realisations. Specific uncertainty measures may be less important than an understanding of the impact of uncertainty on decision outcomes across a range of possible future conditions (Lempert, Popper, and Bankes (2003); Pawson, Wong, and Owen (2011)). Viewing realisations of several potential future outcomes enables users to focus on the effect of the uncertainty rather than only the representation of the uncertainty measure. This shift from producing and evaluating a single future, to envisioning a range of possible futures can support more robust decision-making (Couclelis (2003); Lempert, Popper, and Bankes (2003)).

Presenting multiple futures reframes uncertainty as a relationship between decision outcomes and differing future conditions. This offers a new framework for uncertainty communication, which is more in line with the way policy makers think about decisions under uncertainty (Cohen, Freeman, and Wolf (1996); Jonassen (2012)).

This approach, however, has its challenges. Numerous possible uncertain outcomes place an additional burden on the user, who now needs to make sense of multiple outputs. The available information can lose its usefulness when too many options are presented as almost any scenario is possible to produce (Davis and Keller, 1997). Additionally, dynamic visualisation displays that present multiple possible futures are only made possible with the development of interactive interfaces. This not only increases the cognitive load on the user but also increases the work required of the designer, where the task is not simply to communicate one message, but multiple messages. Considerable work is needed to determine what “reasonable” uncertainty/error values are, and how these translate into reality in the field (Davis and Keller, 1997), and these considerations may be beyond the expertise of the communication designer.
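The idea of showing a handful of possible realisations rather than a single summary measure can be sketched in a few lines of Python. The function name, the normal distribution and the default of five futures are assumptions made purely for illustration; a real application would draw from the predictive distribution of the model in use.

```python
import random


def sample_futures(estimate: float, std_error: float,
                   n_futures: int = 5, seed: int = 0) -> list:
    """Draw a small set of plausible outcomes around a point estimate.

    A normal distribution is assumed purely for illustration.
    """
    rng = random.Random(seed)  # fixed seed so the example is repeatable
    return [rng.gauss(estimate, std_error) for _ in range(n_futures)]


futures = sample_futures(estimate=100.0, std_error=10.0)
for i, value in enumerate(futures, start=1):
    print(f"Possible future {i}: {value:.1f}")
```

Keeping the number of futures small reflects the concern raised above that too many options can reduce the usefulness of the information.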


2.5 Uncertainty Communication design

The importance of the user’s needs in uncertainty communications is well recognised (Davis and Keller (1997); Carroll et al. (2014); Sanyal et al. (2010)). Each user of scientific output may have considerably different requirements. Therefore, an understanding of user requirements is necessary to make effective uncertainty communication possible (Carroll et al., 2014). There is a lack of case studies in the literature that explore communication design approaches for addressing the challenge of uncertainty communication, and additionally there is a lack of examples in the design literature that extend communication design frameworks to consider uncertainty communication. Often in uncertainty communication research, the user is only consulted in the evaluation or testing stage and not in the design or development phase. Design guidelines for scientific communication are rare and those that are available do not consider uncertainty information at all.

Guidelines for assessing uncertainties

In order to create communication material that effectively communicates uncertainty information, it is first important to understand the types of uncertainty present within the research. Only two frameworks that support this process were identified in the literature at the time of writing: that proposed by Schneider and Moss (1999), and that proposed by Lapinski (2009).

Schneider and Moss (1999) propose guidelines for assessing uncertainties for authors of the IPCC Assessment Reports. These guidelines map out a step by step process. These steps are:

1. Identify the most important factors and uncertainties that are likely to affect the conclusions

2. Document ranges and distributions in the literature

3. Make an initial determination of the appropriate level of precision

4. Characterise the distribution of values that a parameter, variable, or outcome may take

5. Rate and describe the state of scientific information (using the terms outlined in Figure 2.1)

6. Prepare a “traceable account” of how the estimates were constructed

7. OPTIONAL: Use formal probabilistic frameworks for assessing expert judgment

These assessment guidelines are targeted at researchers. They are valuable for diagnosing sources of uncertainty and their implications on the research outputs. However, there is no reference to the user or their needs in these assessment steps, and the steps may be outside of the expertise of a communication designer.

Designing scientific communications for the non-expert user

One approach, specific to scientific uncertainty visualisation design, is the Uncertainty Visualisation Development Strategy (UVDS), developed for a military context (Lapinski, 2009).

The UVDS has eleven steps:

1. identifying the uncertainty visualisation task

2. understanding the data that need to have their uncertainty visualised

3. understanding why uncertainty needs to be visualised and how the uncertainty visualisation needs to help the user

4. deciding on the uncertainty to be visualised

5. deciding on a definition of uncertainty

6. determining the specific causes of the uncertainty

7. determining the causal categories of the uncertainty

8. determining the visualisation requirements

9. calculating, assigning, or extracting the uncertainty

10. trying different uncertainty visualisation techniques

11. obtaining audience opinions and criticisms.

These steps help the designer understand the data as well as the uncertainty. The author applied the UVDS to uncertainty visualisation of the Canadian Recognized Maritime Picture (RMP).

A critique of the strategy suggests that an additional step is needed within the framework for embedding criticism and feedback from users (Lapinski, 2009).


User centred communication design for non-expert audiences of scientific material

Outside of the scientific literature, the field of design has insights that can contribute to this challenge. The Non-Expert User Visualisation (NEUVis) framework (Gough, Wall, and Bednarz, 2014) proposes a design led approach to the development of scientific visualisations for the non-expert user. Central to a design led approach is the recognition that each user of the data may have considerably different requirements. Consideration of, and consultation with, the end user is integral to the whole design process (Gough, Wall, and Bednarz, 2014), not only to the evaluation of the final design.

The design led approach provides a structured framework for the designer to develop scientific visualisations for a non-expert audience. The framework is built around the idea that non-expert users need to be introduced to the nature of the data and the results of data analytics, as well as its significance and often how it is practically interpreted. The kind of problems that arise in science communication design for the non-expert user can be challenging for the communication designer, and this framework provides practical guidance to navigate that challenge (Gough, Wall, and Bednarz, 2014). The framework, however, does not explicitly consider uncertainty.

The NEUVis design method uses six questions to guide the communication designer through the considerations important to scientific visualisations for the non-expert audience. These questions are:

1. How does this knowledge benefit the user?

2. What about this data is relevant or important?

3. What is otherwise inaccessible to the user?

4. What can the user access on their own?

5. What myths and misconceptions are relevant to the data?

6. What is the potential for impact and what are the risks of this visualisation?

The NEUVis framework is a tool that helps the designer connect the data and the users’ needs and then transition these to a visualisation prototype.


Figure 2.3: Components of a spatial statistical analysis (Modified from Figure 12.4 in Ord 2010, page 179.)

2.6 Spatial epidemiology and disease mapping

Place matters in health, and spatial epidemiology is the field of study that enables health and place to be connected. Spatial epidemiology aims to quantify and explain geographic variation in diseases and the relationship with environmental, demographic, behavioural, socioeconomic, genetic and infectious disease factors (Beale et al., 2008) and requires the inclusion of variables from disparate data sources (Green, Richard, and Potvin, 1996). Table 2.3 outlines the common data sources that are combined in disease mapping projects (Lai, So, and Chan, 2008). The combination of health data, Geospatial Information Systems (GIS) and spatial statistics methods enables linkage of these data from disparate sources and the exploration of complex questions in health promotion, public health, community medicine, epidemiology and a range of other fields (Nykiforuk and Flaman, 2011). Depending on the data available, spatial epidemiological analyses can be conducted at local, regional or international scales.

The data are only one of three integrated components in a spatial statistical analysis as outlined in Figure 2.3, the other two components being analysis and visualisation such as maps (Ord, 2010). Any one of these components may drive subsequent development of the other two, and multiple circuits will occur before the process is completed (Ord, 2010).

Disease mapping specifically deals with the mapping of health outcomes and their cofactors and is an integral component and subset of spatial epidemiology. The visual maps generated through disease mapping show geographical variation in health, disease and other variables of interest, such as socioeconomic status or access to services (Elliott and Wartenberg (2004)). Disease mapping is used to explain and predict patterns of disease outcomes across geographical areas, identify areas of increased risk, and assist in understanding the causes of diseases (Myers et al. (2000); Robinson (2000)). Disease maps can tell stories and communicate relationships in a way that otherwise may not be possible with other data presentation techniques (Mullner et al., 2004). Their usage can generally be categorised into four main themes: (a) disease surveillance, (b) risk analysis, (c) health access and planning, and (d) community health profiling (Nykiforuk and Flaman, 2011).

Disease modelling extends the disease-mapping application to (a) predict the current prevalence of a disease that may not be directly measurable such as cancer survival, (b) predict the future spread of disease, (c) identify factors that may foster or inhibit disease transmission, (d) pinpoint high-risk areas for disease prevention or intervention, (e) target control efforts, (f) identify gaps, and (g) increase stimulus for data collection in these areas (Myers et al. (2000); Robinson (2000)), as well as to advocate for change of health policy or access to services.

There are a few features that set disease mapping apart from other GIS applications, such as ecology or geology. Firstly, areas are often irregular shapes and sizes, as opposed to a regular gridded lattice. Secondly, map regions often have varying numbers of neighbours rather than constant numbers. Thirdly, any true boundaries in the underlying health or disease measures are likely to be obscured by random noise in public health data, whereas images, ecology and geology are more likely to have clearly defined boundaries that align with physical boundaries.

Table 2.3: Examples of the types of data used in spatial epidemiological studies

Data | Description
Health or disease | Vital statistics, notifiable diseases, patient registries, and health surveys from various international or government agencies (location is usually based on residential address)
Field epidemiology | Surveyed data on disease occurrences with location coordinates collected with a GPS
Spatially referenced base | Digital cartographic data available from various international or government agencies (often includes contours, rivers, and built environment features)
Remotely sensed | Land cover, elevation, soil type as reflected by satellite imagery
Environmental and natural resources | Interpreted data on land use, water quality, air quality, climate, geology, etc.
Census or demographic | Sociodemographic and economic data

What are disease maps used for?

The visual representation of the spatial variation of diseases and their cofactors enables the outputs of sophisticated spatial statistical analysis to be accessed by non-expert audiences.

Disease maps are found on the desks of healthcare providers, public health managers, policy makers and other health professionals in practice (Sabesan and Raju (2005); Davis and Keller (1997)). They have been shown to be effective tools for communication of public health risks (Ali et al. (2001); Choi, Afzal, and Sattler (2006)) as well as disease prevention and management through: facilitating and guiding design and implementation of interventions, evaluating health outcomes, estimating population risks, scenario building, planning interventions, policy recommendations, and summarizing and presenting health indicators (Garnelo, Brandão, and Levino, 2005).

One of the earliest and most well-known disease maps is John Snow’s map of cholera deaths. The map is a classic example of the power of data visualisation to enact policy change, and ultimately led to the removal, in September 1854, of the handle of the contaminated water pump causing the outbreak (Tufte and Robins, 1997).

After this date, the use of disease maps as a decision-making tool for public health continued to emerge slowly. In 1898, some 40 years later, the United States Government Printing Office published a collection of maps based on data from the eleventh U.S. Census. These included detailed, hand-rendered maps which compared population mortality rates for several infectious diseases, including measles, diphtheria, croup, consumption, scarlet fever and many more (Gannett, 1903). Today, disease maps are widely published and used, and a multitude of examples can be identified via a simple Google search for ‘disease maps’.

Future Challenges

The field of disease mapping has come a long way since John Snow’s initial cholera map influenced public health policy and saved lives. The methodologies used to generate these maps and the estimates they visualise have evolved significantly. How best to represent and communicate uncertainty associated with the estimates and insights in these maps has emerged as a well recognised current challenge for the field (Kraemer et al. (2016); Carroll et al. (2014)), alongside the following:

• How can researchers better communicate findings from disease cartography to public health officials and those in the field?

• New data are becoming available at an ever-faster pace and with high volume: how can we better direct data acquisition towards the needs of researchers?

• As data volume increases, what specific computational tools are necessary to allow rapid processing of disease and covariate data?

• How can uncertainty be better integrated into disease-risk maps?

• How can we best integrate human mobility data into a variety of disease models in a sensible way so that we can guide public health interventions and control?

2.6.1 Uncertainty Communication in Disease Mapping

One of the key objectives of this thesis is to inform the design of the Australian Cancer Atlas, a spatial epidemiological map of cancer incidence and survival across Australia.

As the statistical methods and data types that underlie spatial epidemiology and disease mapping have developed and become more complex, methods for visualising and communicating the statistical uncertainty associated with these models have not (Carroll et al., 2014). A 2014 systematic review of visualisation and analytical tools for infectious disease epidemiology, conducted by Carroll et al. (2014), identified no examples of visualisation tools that addressed or enabled the communication or inclusion of uncertainty. Subsequently, this review identified the inclusion of uncertainty information as a key challenge for spatial epidemiology in the future.

The ability to quantify the uncertainty in disease maps depends on the statistical methods and data used to model the studied disease. Many methods in disease modelling generate full probabilistic predictions that can be used to quantify uncertainty; however, the interpretation of these probabilistic predictions by the non-expert audience is difficult (Kraemer et al., 2016).

Within disease mapping, current discussions around uncertainty sources identify missingness, data aggregation, data accuracy, and the impact of residential address errors in geocoding as common concerns (Atkinson and Graham (2006); Tatem et al. (2011); Zinszer et al. (2010); Eaton, Plaisant, and Drizd (2005)). In order to draw meaningful and accurate conclusions from the data, visualisation tools should represent uncertainty clearly (Carroll et al., 2014). Particularly relevant to spatial epidemiology is the interplay between uncertainty and large geographical areas that dominate a map because of their size. Studies have shown that users overestimate rates in small populations, which often correspond to large, sparsely populated regions, resulting in visual biases in interpreting maps (Olsen, Martuzzi, and Elliott, 1996). Studies suggest that users may not be aware of the need for better representation of missingness and uncertainty, and more research is required to evaluate the best means of integrating this type of information with the mapped information (Carroll et al., 2014).
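The link between population size and the uncertainty of a mapped rate can be illustrated with a short Python sketch. It assumes that case counts follow a Poisson distribution and uses a normal approximation for the interval; this is a textbook simplification chosen for illustration, not the method used in the Australian Cancer Atlas, and the function name and example figures are hypothetical.

```python
import math


def rate_interval(cases: int, population: int, per: int = 100_000,
                  z: float = 1.96) -> tuple:
    """Crude rate with an approximate 95% interval (normal approximation).

    Assumes the case count is Poisson distributed, so the standard error
    of the rate is sqrt(cases) / population. The approximation is poor
    for very small counts and is used here only for illustration.
    """
    rate = cases / population * per
    std_error = math.sqrt(cases) / population * per
    return (rate, max(0.0, rate - z * std_error), rate + z * std_error)


# Two hypothetical areas with the same crude rate but different populations.
small_area = rate_interval(cases=4, population=2_000)
large_area = rate_interval(cases=400, population=200_000)
print("small area:", [round(v, 1) for v in small_area])
print("large area:", [round(v, 1) for v in large_area])
```

Both areas have a crude rate of about 200 per 100,000, but the interval for the small population is roughly ten times wider. A map that shows only the point estimates would present the two areas as equally reliable.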

The importance of the users’ needs in developing effective uncertainty communication is well recognised across domains and this is no different in disease mapping. Both qualitative and quantitative studies have found that user-friendly, reliable tools, with high-quality online documentation, and easy access to the source code are important to users of disease maps (Kothari et al. (2008); Hu et al. (2007); Robinson, MacEachren, and Roth (2011); Koenig, Samarasundera, and Cheng (2011)). Users of disease maps have expressed a strong interest in dynamic, interactive graphics that allow them to review their data at different levels (e.g. population or individual level) (Karlsson et al. (2013); Gesteland et al. (2012); Hu et al. (2007); Robinson (2009); Schneiderman, Plaisant, and Hesse (2013); Koenig, Samarasundera, and Cheng (2011)), and that help make abstract data digestible.

Chapter 3

Research Activity 1.A: Grey literature review of internet published cancer maps.

Cancer maps are powerful communication tools that are often used within a public health agenda to inform, educate and/or support advocacy for a change in public health policy (Davis and Keller, 1997). This chapter outlines a grey literature review of currently available cancer maps, which provides an understanding of the current methods and tools used to develop, visualise and disseminate cancer maps.

3.1 Introduction

Cancer maps are commonly published on the internet rather than in academic peer-reviewed journals because they are used as a communication tool for non-academic audiences. Additionally, as this research is focused on uncertainty communication to the non-expert audience, cancer maps found in the scientific literature cannot be assumed to be targeted towards non-experts. Therefore, in order to provide an overview of the current practices used to generate these cancer maps, a systematic grey literature review was conducted that investigated cancer maps published on the internet and available between 01/01/2010 and 01/05/2016.

43 CHAPTER 3. RESEARCH ACTIVITY 1.A: GREY LITERATURE REVIEW OF INTERNET PUBLISHED CANCER MAPS.

Maps are effective and powerful tools for communicating geographical variation in health and disease. They enable non-expert decision-makers to access, visualise and communicate the outputs of often sophisticated geospatial statistical analyses. Both the statistical methods and visualisation techniques used to generate these maps are highly varied, with differences depending on the disease being mapped, the intended message, the target audience, and the person or organisation publishing the material.

Improvements in statistical methods, data visualisation, geographical information system (GIS) techniques and interactive web technologies have enabled health and disease maps to increase in popularity and utility. Health and disease maps are now commonly used by governments, not-for-profit organisations, and research institutions to enable the use of statistical outputs in decision making, and to raise community awareness around target issues. Depending on the data and technology used to generate the maps, their interactive capabilities range from simple downloadable PDF documents to dynamic and interactive web interfaces.

This systematic grey literature review aims to support the development of the Australian Cancer Atlas, and provide a summary of the current practices, approaches, and technology platforms used to create and disseminate cancer maps for the non-expert audience. This review also aims to investigate if, and how, uncertainty information is included in these maps.

3.2 Aim and Research Question

Aim: To summarise the cancer atlases available publicly on the internet in terms of: geography covered, publishing organisation, data date range and publication date, geographical resolution, reported measure, statistical methods, inclusion of uncertainty, smoothing methods, interactivity features, additional functionality, and technology platform used.

Research Question: What cancer maps are currently available to the public via the internet, and what methodologies, tools and technologies have been used to generate them?


3.3 Methods

An investigation strategy comprising keywords, iteratively developed search strings, and inclusion and exclusion criteria was developed to identify internet published and publicly available cancer maps. Data from the identified cancer maps and their supplementary or supporting material/host websites were also extracted, summarised and recorded. The complete research strategy was documented to reduce the risk of bias. As this grey literature review was targeted at identifying publicly available cancer maps, all searches were conducted using Google; no other search engines were explored.

The systematic review was based on the Systematic Review Guidelines defined by the organisation Collaboration for Environmental Evidence (CEE), with modifications made where appropriate to accommodate a grey literature review rather than a scientific literature review¹.

Search Description

As noted earlier, searches were not restricted by country, and were conducted from Australia.

The following list details the final search strings. These strings were developed through an iterative process of trialling and refining searches until the desired specificity was reached.

See Appendix B.1 for a full description of the search protocol including search string development and associated hits.

Search Strings

1. intitle: spatial AND epidemiology AND cancer AND map OR mapping OR atlas -campus

2. allintitle: cancer AND map OR mapping OR atlas -campus -kinase -kinases -concept

3. allintitle: spatial AND cancer AND statistics

4. allintitle: spatial OR geographic AND cancer AND variation OR distribution

¹ http://www.environmentalevidence.org/wp-content/uploads/2017/01/Review-guidelines-version-4.2-final-update.pdf


5. allintitle: spatial AND epidemiology AND cancer AND map OR mapping OR atlas -campus

6. intitle: cancer AND atlas
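As an illustration of how the final search strings could be logged and re-run, the sketch below converts each string into a Google query URL. It is a minimal sketch and not part of the documented search protocol: the base URL pattern and the helper name `build_search_url` are assumptions introduced here for illustration only.

```python
from urllib.parse import urlencode

# Final search strings, copied from the list above.
SEARCH_STRINGS = [
    "intitle: spatial AND epidemiology AND cancer AND map OR mapping OR atlas -campus",
    "allintitle: cancer AND map OR mapping OR atlas -campus -kinase -kinases -concept",
    "allintitle: spatial AND cancer AND statistics",
    "allintitle: spatial OR geographic AND cancer AND variation OR distribution",
    "allintitle: spatial AND epidemiology AND cancer AND map OR mapping OR atlas -campus",
    "intitle: cancer AND atlas",
]


def build_search_url(query: str) -> str:
    """Return a Google search URL for one search string (hypothetical helper)."""
    # urlencode percent-encodes the query so it can be stored and re-run verbatim.
    return "https://www.google.com/search?" + urlencode({"q": query})


search_urls = [build_search_url(query) for query in SEARCH_STRINGS]

for number, url in enumerate(search_urls, start=1):
    print(number, url)
```

Storing the encoded URL alongside the date of each search would be one way to keep a record of what was run, although search results themselves change over time.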

Exclusion/Inclusion Criteria

Pages were selected for data extraction if they met the following criteria:

• contained a visual geographical map of cancer incidence, risk, mortality or counts (either PDF, static image or interactive web interface),

• accessible without a password or log in,

• published or updated on or after the 1st of January 2010.
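The three criteria above can be expressed as a simple screening rule. The sketch below is illustrative only: the `CandidatePage` fields, the example records, and the decision to set aside undated pages are assumptions made for this example and do not describe the data extraction form used in the review.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

# "published or updated on or after the 1st of January 2010"
CUTOFF = date(2010, 1, 1)


@dataclass
class CandidatePage:
    """One search hit, described by hypothetical fields used only in this sketch."""
    title: str
    has_cancer_map: bool             # visual map of incidence, risk, mortality or counts
    requires_login: bool             # password or log in needed to view the map
    last_published: Optional[date]   # most recent publication or update date, if known


def meets_inclusion_criteria(page: CandidatePage) -> bool:
    """Apply the three inclusion criteria listed above."""
    if page.last_published is None:
        # Assumption: undated pages are set aside for manual checking.
        return False
    return (
        page.has_cancer_map
        and not page.requires_login
        and page.last_published >= CUTOFF
    )


# Made-up records, one passing and three failing a different criterion each.
pages = [
    CandidatePage("Example atlas A", True, False, date(2014, 6, 1)),
    CandidatePage("Example atlas B", True, True, date(2015, 3, 1)),    # needs a log in
    CandidatePage("Example atlas C", True, False, date(2008, 9, 1)),   # too old
    CandidatePage("Example report D", False, False, date(2012, 1, 1)), # no map
]

selected = [page.title for page in pages if meets_inclusion_criteria(page)]
print(selected)
```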

3.4 Summary findings

Grey literature searches identified 33 cancer atlases that were publicly available on the internet. Three of the identified atlases were not published in English; however, the details of these maps were extracted where they could be determined. A database detailing all identified atlases is available in Appendix B.2.

3.4.1 Statistical uncertainty

Cancer atlases were considered to report uncertainty to the non-expert user if they included a measure of statistical uncertainty either within or alongside the map. Maps that only reported this information within the supplementary material were not considered to have directly attempted to report uncertainty.

The review did not reveal any novel uncertainty visualisation approaches. Maps used standard and well known measures including credible intervals, standard deviation, statistical significance, boxplots and distributions. These maps ranged from static PDFs or infographics to interactive online resources. The interactivity of the more modern maps enabled uncertainty information to be incorporated without cluttering the screen, for example in a tool tip feature.
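To make the tool tip idea concrete, the sketch below builds the text that an interactive map might show when a user hovers over a region, pairing the estimate with its interval. The function name, the wording of the message and the example values are all hypothetical; none of the reviewed atlases is claimed to use this format.

```python
def tooltip_text(region: str, estimate: float, lower: float, upper: float,
                 level: int = 95) -> str:
    """Build hover text that shows an estimate together with its interval."""
    if not lower <= estimate <= upper:
        raise ValueError("estimate should lie within the interval bounds")
    # One decimal place keeps the message short; this choice is an assumption.
    return f"{region}: {estimate:.1f} ({level}% interval {lower:.1f} to {upper:.1f})"


# Made-up values for illustration only.
example = tooltip_text("Region A", 52.34, 47.9, 57.12)
print(example)
```

Because the interval only appears on hover, the map itself stays uncluttered, which is the advantage noted above. The trade-off is that users who never hover over a region never see the uncertainty information.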


Figure 3.1: Example of CI visualisation for uncertainty representation in cancer mapping (1/3). Source: Alberta Health IHDA Geographic. (2012)

Figure 3.2: Example of CI visualisation for uncertainty representation in cancer mapping (2/3). Source: Pennsylvania Cancer Atlas

Close to half of the atlases identified (42%, n=14) included some measure of uncertainty. The most common measures used to represent uncertainty were credible or confidence intervals (CIs). CIs were either visualised by including their bounds in a scatterplot or graph of estimates vs region (see Figures 3.1² and 3.2³) positioned next to the map, or reported numerically through the CI upper and lower bounds listed in a data table (see Figure 3.4⁴). Of those that visualised the CIs, 30% (n=10) embedded the visualisation within a tool tip function which visualised the CI when the mouse hovered over the relevant area (see Figure 3.5⁵).

Figure 3.3: Example of CI visualisation for uncertainty representation in cancer mapping (3/3). Source: Centres for Disease Control and Prevention (CDC). United States Cancer Statistics: An Interactive Cancer Atlas (InCA)

Figure 3.4: Example of an interactive data table with CI upper and lower bounds used in cancer mapping. Source: Pennsylvania Cancer Atlas

² Alberta Health IHDA Geographic. (2012). Age-Standardised Incidence Rate of COPD, 2011. Retrieved from: http://www.health.alberta.ca/health-info/IHDA-geographic/COPD/incidence-agestandard/atlas.html?epik=0GJSpE_IW34lx

³ Centres for Disease Control and Prevention (CDC). (n.d). United States Cancer Statistics: An Interactive Cancer Atlas (InCA). Retrieved from: https://nccd.cdc.gov/DCPC_INCA/

⁴ Pennsylvania Cancer Atlas. (n.d). Retrieved from: https://www.geovista.psu.edu/grants/CDC/?epik=0lJSpE_IW34lx

Uncertainty information can be visualised or communicated in different ways; the examples identified through this grey literature review are listed in Table 3.1.

⁵ International Agency for Research on Cancer. (2017). Atlas of Cancer Mortality in the European Union and European Economic Area 1993-1997, Annex 4 - Cancer mortality maps by site. Retrieved from: http://www.iarc.fr/en/publications/pdfs-online/epi/sp159/


Figure 3.5: Example of standard deviation visualised in cancer mapping

Figure 3.6: Example of boxplot used in cancer mapping. Source: Pennsylvania Cancer Atlas

Table 3.1: Implicit and explicit measures of uncertainty.

Measure | Example

CI | Figures 3.1, 3.2, 3.3

Statistical significance | Textured overlay on top of coloured regions used to indicate statistical significance

Distribution | Figure 3.4

Boxplots | Figures 3.4, 3.5

Sample size | Textured overlay or lack of colour on a region was used to show regions with small sample size

Standard deviation | Figure 3.6 - the second map in the bottom right corner shows standard deviation


Geographical coverage

Identified cancer atlases covered geographies from all around the world: 4 were global, 3 from Australia (AUS), 11 from the United States (US), 2 from Canada (CAN), 7 from the United Kingdom (UK), 2 from Spain, 1 from Switzerland, 1 from Germany, 1 from Norway, and 1 covering the European Union. Not all maps had a national focus and 10 covered a region or state rather than an entire nation. The states or counties/regions covered were South Australia (AUS), Queensland (AUS), Ontario (CAN), Valencia (Spain), Pennsylvania (US), Massachusetts (US), New Hampshire (US), Cape Cod (US), Missouri (US), Florida (US), New York State (US) and Arizona (US).
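As a simple consistency check, the counts by geography reported above can be tallied and compared with the 33 atlases identified by the search. The sketch below uses only the counts stated in the text; the dictionary keys and variable names are introduced here for illustration.

```python
# Number of atlases by geography, as reported in the paragraph above.
atlas_counts = {
    "Global": 4,
    "Australia": 3,
    "United States": 11,
    "Canada": 2,
    "United Kingdom": 7,
    "Spain": 2,
    "Switzerland": 1,
    "Germany": 1,
    "Norway": 1,
    "European Union": 1,
}

total_atlases = sum(atlas_counts.values())

# Share of all identified atlases contributed by each geography, in percent.
shares = {
    geography: 100 * count / total_atlases
    for geography, count in atlas_counts.items()
}

print(total_atlases)
for geography, share in shares.items():
    print(f"{geography}: {share:.0f}%")
```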

Publishers

The majority of atlases were published by non-commercial organisations, including not-for-profits (NFPs), government, research organisations, advocacy groups or a partnership between an NFP and government. Only one map was published by a commercial entity (Maps of Cancer Mortality Rates in Spain), in this case the media organisation El Pais⁶.

Reported measures

The majority of maps identified in this study reported age-adjusted rates of either incidence, mortality or survival. Table 3.2 summarises these measures and provides a definition of each.

⁶ https://elpais.com/


Table 3.2: Measures used to report cancer statistics

Measure | Details

1. IR (Incidence Ratio) | $\mathrm{IR}_i = \frac{(\text{Incidence Rate})_i}{\text{Average Incidence Rate}}$. Cancer incidence rate in region i over the average cancer incidence rate for the total region

2. SIR (Standardised Incidence Ratio) | IR standardised by age structure in each region i

3. RER (Relative Excess Risk) | $\mathrm{RER}_i = \frac{(\text{Cancer related mortality})_i}{\text{Average cancer related mortality}}$. Represents the estimate of cancer related mortality within five years of diagnosis. Also referred to as ‘excess hazard ratio’

4. Age Adjusted Relative Risk | RR standardised by age structure in each region i

5. Rate per 100,000 | Cancer incidence per 100,000 population

6. Age Adjusted Rate per 100,000 | #5 standardised by age structure or region

7. New cancer cases per 100,000 | Specific methods could not be found

8. Count | Crude cancer counts

9. Below or above Expected | Alternative expression of the SIR
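The difference between the crude incidence ratio (measure 1) and an age-standardised ratio (measure 2) can be seen in a small worked example. The sketch below is illustrative only: the counts are made up, the "average" rate is assumed to be the pooled rate for the total region, and indirect standardisation (observed cases over expected cases) is assumed as the standardisation method, which the reviewed atlases may or may not have used.

```python
AGE_GROUPS = ["<50", "50+"]

# Hypothetical case counts and populations for two regions (illustration only).
cases = {"A": {"<50": 10, "50+": 90}, "B": {"<50": 20, "50+": 40}}
population = {"A": {"<50": 40_000, "50+": 20_000}, "B": {"<50": 80_000, "50+": 10_000}}


def crude_rate_per_100k(region: str) -> float:
    """Measure 5: cancer incidence per 100,000 population."""
    return 100_000 * sum(cases[region].values()) / sum(population[region].values())


# Rate for the total region, pooling all regions (assumed meaning of "average").
total_cases = sum(sum(c.values()) for c in cases.values())
total_population = sum(sum(p.values()) for p in population.values())
average_rate_per_100k = 100_000 * total_cases / total_population


def incidence_ratio(region: str) -> float:
    """Measure 1: regional rate over the rate for the total region."""
    return crude_rate_per_100k(region) / average_rate_per_100k


# Age-specific reference rates for the total region.
reference_rate = {
    age: sum(cases[r][age] for r in cases) / sum(population[r][age] for r in population)
    for age in AGE_GROUPS
}


def standardised_incidence_ratio(region: str) -> float:
    """Measure 2 via indirect standardisation: observed over expected cases."""
    observed = sum(cases[region].values())
    expected = sum(reference_rate[age] * population[region][age] for age in AGE_GROUPS)
    return observed / expected


for region in cases:
    print(region, incidence_ratio(region), standardised_incidence_ratio(region))
```

In this made-up example region A has an older population, so its crude ratio of about 1.56 falls to about 1.03 once the age structure is taken into account. This is why most of the identified maps report age-adjusted measures.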

Smoothing

Smoothing is an important tool used in spatial epidemiology to increase statistical power, and is a method by which data points are averaged with their neighbours. Neighbours are often defined as geographical neighbours, but can also be temporal or based on other factors such as socio-economic features or rurality. Smoothing is commonly used in the generation of small area estimates when the sample size for individual regions is small and the statistical power is therefore low. It is also useful in health maps when the number of cases in individual areas is small enough to jeopardise individual privacy. Smoothing is an important component of the Australian Cancer Atlas.
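The idea of averaging each data point with its neighbours can be sketched with a simple unweighted neighbourhood mean. This is a deliberately minimal illustration and not the model-based smoothing used by any of the atlases discussed here: the region names, rates and adjacency list are hypothetical.

```python
# Hypothetical raw rates per region and a hypothetical adjacency list.
raw_rate = {"A": 120.0, "B": 80.0, "C": 100.0, "D": 300.0}
neighbours = {
    "A": ["B", "C"],
    "B": ["A", "C"],
    "C": ["A", "B", "D"],
    "D": ["C"],
}


def smooth(values: dict, adjacency: dict) -> dict:
    """Replace each region's value with the mean of itself and its neighbours."""
    smoothed = {}
    for region, value in values.items():
        local_values = [value] + [values[n] for n in adjacency[region]]
        smoothed[region] = sum(local_values) / len(local_values)
    return smoothed


smoothed_rate = smooth(raw_rate, neighbours)
print(smoothed_rate)
```

In this toy example the extreme raw rate of 300 in region D is pulled towards the value of its single neighbour, giving 200. Model-based approaches achieve a similar effect but weight the amount of smoothing by how much information each region contributes.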

Four of the identified maps reported spatial smoothing and one used temporal smoothing (by calendar year but not geography); 22 did not use any form of smoothing within their methods, and seven had insufficient information available to determine whether smoothing was used. Of the cancer atlases that used model-based smoothing, three used the BYM model (Besag, York, and Mollie, 1991). This model incorporates a spatially structured component, commonly incorporating adjacent areas using a conditional autoregressive (CAR) prior. Spatial smoothing methods used in the identified maps are outlined in Table 3.3.

Table 3.3: Smoothing methods used in health atlases.

Atlas Title | Smoothing Method | Reference

All Ireland Cancer Atlas 1995-2007 | BYM | Besag, York, and Mollie (1991)

The Environment and Health Atlas of England and Wales | BYM | Besag, York, and Mollie (1991)

Atlas of Cancer in Queensland | 1. BYM (incidence), 2. Poisson piecewise with BYM components (survival). | 1. Besag, York, and Mollie (1991), 2. Fairley et al. (2008)

Atlas of Cancer Mortality in the European Union and the European Economic Area 1993-1997 | Examined regional variation by: 1. Poisson-gamma model (no spatial structure), 2. Multilevel model with 3 geographic hierarchies | 1. Pennello, Devesa, and Gail (1999), 2. Similar to Langford, Unwin, and Maguire (1990)
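For readers unfamiliar with the BYM model referred to in the text and in Table 3.3, a commonly used formulation is sketched below. The notation is an assumption introduced here for illustration, and the individual atlases may have used variants of this specification. Here $y_i$ is the observed count in region $i$, $E_i$ the expected count, $\theta_i$ the relative risk, $u_i$ the spatially structured component with an intrinsic CAR prior, $v_i$ the unstructured component, $j \sim i$ denotes that region $j$ is a neighbour of region $i$, and $n_i$ is the number of neighbours of region $i$.

```latex
\begin{align*}
y_i &\sim \text{Poisson}(E_i \theta_i), \\
\log \theta_i &= \alpha + u_i + v_i, \\
u_i \mid u_{j \neq i} &\sim \mathcal{N}\!\left( \frac{1}{n_i} \sum_{j \sim i} u_j,\; \frac{\sigma_u^2}{n_i} \right), \\
v_i &\sim \mathcal{N}(0, \sigma_v^2).
\end{align*}
```

The conditional mean of $u_i$ is the average of the neighbouring values, which is how the model borrows strength from adjacent areas, while $v_i$ allows for region-specific variation that is not spatially structured.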


Visualisation tools and platforms

A range of methods and approaches are used to visualise cancer maps. Visualisation platforms are rapidly changing as GIS technologies, graphic design tools, and interactive web capabilities continue to develop. These changes are giving rise to mapping and design tools that can generate customised and interactive web based maps. The skills required to generate sophisticated and professional outputs using these emerging platforms and tools vary; however, these platforms generally become easier to use over time.

The development of tools and technologies for generating visual cancer and disease maps has progressed rapidly. It is very common for the most recently published maps to have fully interactive web interfaces where users can customise the map to display the features they are interested in, such as population demographics, geographical resolution, cancer of interest and outcome measure(s) (survival, mortality and/or incidence), and can compare multiple customised maps or explore changes over time.

The identified cancer maps can be classified into three categories based on their sophistication: static infographics or downloadable PDFs; interactive maps (and/or dashboards) built on an existing GIS, data visualisation platform or tool such as Google Maps, ESRI⁷, ArcMap 9.3⁸, or Instant Atlas⁹; or custom built web products using tools such as d3.js¹⁰ + leaflet¹¹.

3.5 Implications for the Australian Cancer Atlas

The grey literature review described above provided valuable insight into the current best practice for generating and visualising publicly available cancer maps. The insights from this review can inform the design of the Australian Cancer Atlas by providing a benchmark for current practice. Many features were identified in this review that provide inspiration for the design of the Australian Cancer Atlas, as both positive and negative design examples. I have summarised these insights as design considerations and arranged them in terms of key features of a cancer map (see Table 3.4 below). Design considerations that are common in data visualisation, such as colour palette selection, have not been included.

⁷ https://www.esri.com/en-us/home

⁸ http://arcmap.software.informer.com/9.3/

⁹ https://www.instantatlas.com/

¹⁰ https://d3js.org/

¹¹ http://leafletjs.com/

Table 3.4: Design Considerations for the Australian Cancer Atlas.

Feature, followed by its design considerations

The Map

- Does the map resemble the underlying geographical area it represents or is it scaled or modified? For example, scaled by population or standardised so all regions have the same area, or other modifications that ensure the visual dominance of large geographical areas does not bias interpretations.

The Legend

- Does the colour scale enable easy comparison between regions?

- Does the transition of colours exaggerate or imply differences in the reported measure that are not proportional to the real differences between regions?

- Are the categories in the legend easy to interpret? Avoid giving one colour a numerical range and avoid decimal places.

- Is the colour selected for the mid point/average intuitively a neutral colour?

Reported Measure

- Does the reported measure contain any linguistic uncertainty or ambiguity? Will all audiences interpret it uniformly?

- Is it easy to determine if the measure is modelled or count data?

- Is it easy to find further details of the measure, how it was calculated and how it should be interpreted?

- Is mortality or survival most appropriate for your target audience?

Statistical Uncertainty

- Is uncertainty important to all target audiences?

- What measure of uncertainty is most appropriate for the chosen reported measure?

- Does uncertainty make sense when applied to the chosen reported measure?

- Is it explicit? Does the uncertainty representation method chosen require domain specific knowledge to understand how it influences interpretation of communicated estimates?

Cancer Type

- Will the map enable different cancers to be shown? If so, select a non-gendered cancer to load on the landing page.

- Is it clear that the cancer type can be changed?

- Should links to cancer specific resources be included?

Comparing & Searching

- Is it important to enable audiences to compare regions or covariates such as rurality, gender, age, socio-economics, etc.?

- Is it important to be able to search a specific region?

- How will users compare cancer types and regions? Are there comparisons that aren’t appropriate to enable?

Credibility

- Is it clear who the publisher, source of the data, and analysts are?

Other

- Can the audience download the map or data? What can they download? How customisable will it be?
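Two of the legend considerations in Table 3.4, a neutral colour at the mid point and colour transitions that are proportional to the underlying differences, can be illustrated with a simple diverging colour scale. The sketch below is one possible approach under stated assumptions: the three anchor colours, the default range of 0.5 to 1.5 around a mid point of 1.0 (as for a ratio measure such as the SIR) and the function names are all chosen for illustration.

```python
# Illustrative anchor colours as RGB triples (assumed, not taken from any atlas).
LOW = (33, 102, 172)        # blue, below average
NEUTRAL = (247, 247, 247)   # near-white, at the mid point
HIGH = (178, 24, 43)        # red, above average


def _blend(start, end, t):
    """Linearly interpolate between two RGB colours, with t in [0, 1]."""
    return tuple(round(s + (e - s) * t) for s, e in zip(start, end))


def diverging_colour(value: float, low: float = 0.5, mid: float = 1.0,
                     high: float = 1.5) -> str:
    """Map a ratio to a hex colour with a neutral colour at the mid point."""
    value = max(low, min(high, value))  # clamp values outside the legend range
    if value <= mid:
        rgb = _blend(LOW, NEUTRAL, (value - low) / (mid - low))
    else:
        rgb = _blend(NEUTRAL, HIGH, (value - mid) / (high - mid))
    return "#{:02x}{:02x}{:02x}".format(*rgb)


for ratio in (0.5, 0.75, 1.0, 1.25, 1.5):
    print(ratio, diverging_colour(ratio))
```

Because the interpolation is linear on each side of the mid point, equal differences in the ratio produce equal steps in colour. Whether a linear or logarithmic scale is more appropriate for a ratio measure is a design decision that this sketch does not settle.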

3.6 Conclusion

In this chapter I explored the current practices in cancer mapping. This grey literature review revealed that fewer than half (42%) of maps included uncertainty, and maps with interactivity contained uncertainty more often than static maps, suggesting that interactivity provides an extra visual channel for coding this information. There was no uncertainty visualisation method that could be considered a common practice within cancer mapping; rather, a range of uncertainty measures and visualisations were used. These measures included: standard deviation, credible/confidence intervals, error bars, distributions and boxplots. Visual representation mechanisms used for encoding uncertainty in these maps included: colour, bivariate maps, tool-tip interactivity, and error bars. This review provided valuable insight to the Australian Cancer Atlas team in terms of current best practice.

Chapter 4

Research Activity 1.B: User centred uncertainty communication design (Australian Cancer Atlas as a case study)

4.1 Introduction

Building on the insights from the grey literature review detailed previously, I applied a user-centred communication design process to embed uncertainty information for the non-expert audience into the Australian Cancer Atlas (ACA). This research activity had three distinct steps:

• A. Project partners workshop - Conduct a workshop with project partners to identify target audiences and identify sources of uncertainty within the ACA using the taxonomy provided by Morgan and Henrion (1990).

• B. NEUVis design framework - Apply the NEUVis design framework to build audience profiles and map out user-needs and information seeking behaviours of the target audiences identified in the workshop.

• C. Focus groups - Conduct focus groups with a subset of the target audiences to validate their needs, information seeking behaviour and current understanding of statistical uncertainty and risk within cancer maps. This helped to validate the assumptions made in Step B.

The importance of incorporating uncertainty into scientific communications for the non-expert audience is well recognised across domains (Hunter and Goodchild (1996); Eiser et al. (2012); Morgan and Henrion (1990); Grubler, Ermoliev, and Kryazhimskiy (2015)) including spatial epidemiology (Carroll et al., 2014). Additionally, the importance of considering the users’ needs in designing a successful communication product is well recognised (Davis and Keller (1997); Carroll et al. (2014); Sanyal et al. (2010)). Notwithstanding this, there is a lack of guidelines or case studies that can support scientists or communication designers to navigate this complicated communication challenge. This chapter is broken into three sections, each of which uses the Australian Cancer Atlas as its focus.

Firstly, I explore the use of the Morgan and Henrion (1990) typology of uncertainty sources as a tool for identifying uncertainty sources present within the Australian Cancer Atlas. This is important in the communication design process as different uncertainty sources can impact the interpretation of the results in different ways, so the communication message should be informed by, and tailored to, the uncertainty specific to that project. Within this first step I also identify target audiences with the project stakeholders. Secondly, I explore the application of the NEUVis design framework (Gough, Wall, and Bednarz, 2014) to place the user’s needs at the centre of the communication design process. Thirdly, I conduct focus groups with a subset of the target audiences to validate their needs, information seeking behaviour and current understanding of statistical uncertainty and risk within cancer maps.

4.2 Methods

The following methods section is arranged into three tasks: A - workshop with project partners, B - application of the NEUVis design framework, and C - focus groups.


4.2.1 Task A: Workshop with project partners: identifying target audiences and uncertainty sources in the Australian Cancer Atlas

A collaborative workshop brought together eight participants from all Project Partners. Participants included a Senior Research Fellow (epidemiology), the Head of Research and a Postdoctoral Research Fellow (epidemiology) from the Cancer Council Queensland (CCQ), a Project Officer from the National Health Performance Authority (NHPA), and a Distinguished Professor (Statistics), a Senior Lecturer (Health), a Senior Research Fellow (Data Visualisation) and an MPhil Candidate from the Queensland University of Technology (QUT).

Workshop participants were asked to consider the following questions:

1. Why is communicating uncertainty an important problem?

2. Who are the audiences of the Australian Cancer Atlas and what are their characteristics?

3. Can these audiences be grouped by the level of information detail they require?

4. What will the Atlas report (output measure or measures)?

5. What are the sources of uncertainty within the Atlas?

Topics 1 to 4 were discussed together with the entire group. In order to discuss and elicit responses for topic 5, workshop participants were broken into groups of two and given the task of identifying sources of uncertainty in the ACA relevant to two of the following uncertainty sources as defined by Morgan and Henrion (1990):

1. Measurement error

2. Systematic error

3. Variability (natural variation)

4. Model uncertainty

5. Disagreement

6. Inherent randomness

7. Linguistic imprecision


Workshop participants were not provided with any reading material prior to the workshop. The groups were given an hour to discuss two uncertainty sources (30 min per source). For each uncertainty source category, participants were asked to list the specific sources from within the ACA. Each group then presented their results back to the workshop participants. Any disagreements, questions or lack of clarity were discussed among the full group.

A full workshop programme can be seen in Appendix C.1.

4.2.2 Task B: Application of the NEUVis design framework - a user-centred approach to uncertainty communication design

The target audiences identified in the Project Partners workshop (described above) were further developed by myself and Dr. Phillip Gough using the NEUVis design framework (Gough, Wall, and Bednarz, 2014). For each audience, their needs, current information seeking behaviour and the impact of uncertainty information on their understanding of the insights from the ACA were mapped out using the 6-question method within the NEUVis design framework (Gough et al., 2016). Each question was considered in terms of the data as well as the uncertainty information associated with the insights. These questions helped to evaluate how the intended data visualisation for the Australian Cancer Atlas relates to each of the user groups in their specific context.

A seventh question was added to Gough’s method, “Potential for change”, which aimed to rank audiences in terms of the impact that an audience can have if they are armed with the information presented in the Australian Cancer Atlas. The aim of this question was to identify audiences that have the greatest potential for positive change, but also those with potential for negative change (i.e., audiences that represent a risk if their needs are not met, or if they misinterpret the information). The final seven questions considered for each audience, for both the data/modelled estimates and the associated uncertainty information, are listed in Table 4.1.


Table 4.1: Template - NEUVis questionnaire

Design question

1. How does this new knowledge benefit the user?

2. What about this data/uncertainty information is relevant or important to the user in their context?

3. What does this data/uncertainty information show that is otherwise inaccessible for this user?

4. What can this user access for themselves?

5. What myths/misconceptions are relevant to this data set/uncertainty information?

6. What is the potential impact on the audience?

7. The potential for this audience to have an impact beyond themselves?
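Because each of the seven questions is answered for every audience, once for the data/modelled estimates and once for the uncertainty information, the questionnaire forms a grid that can be laid out in advance. The sketch below generates an empty grid of this kind. It is illustrative only: the audience names are placeholders rather than the audiences identified in the workshop, and the data structure is an assumption rather than the template used in this research.

```python
from itertools import product

# The seven design questions from Table 4.1.
QUESTIONS = [
    "How does this new knowledge benefit the user?",
    "What about this data/uncertainty information is relevant or important "
    "to the user in their context?",
    "What does this data/uncertainty information show that is otherwise "
    "inaccessible for this user?",
    "What can this user access for themselves?",
    "What myths/misconceptions are relevant to this data set/uncertainty information?",
    "What is the potential impact on the audience?",
    "The potential for this audience to have an impact beyond themselves?",
]

# Each question is considered for both kinds of information.
ASPECTS = ["data/modelled estimates", "uncertainty information"]


def blank_questionnaire(audiences):
    """Return an empty answer grid keyed by (audience, aspect, question)."""
    return {
        (audience, aspect, question): ""
        for audience, aspect, question in product(audiences, ASPECTS, QUESTIONS)
    }


# Placeholder audience names, for illustration only.
grid = blank_questionnaire(["Audience 1", "Audience 2"])
print(len(grid))
```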

4.2.3 Task C: Focus groups

The focus groups were used to validate and expand on the user profiles developed through the NEUVis design framework (outlined above). As it was not within the budget of this research to conduct focus groups with all identified audiences, under direction from the Cancer Council Queensland, we targeted the four most important and accessible audiences. These were:

1. General audiences

2. Cancer patients and their caregivers, family or supporters

3. Health practitioners (including health managers & clinicians)

4. Policy makers.

It is well understood that focus groups are a useful qualitative method in health and medical research (Kitzinger, 1995). Focus groups prompt discourse that in turn allows people to express and clarify views more easily than in a one-on-one interview, while also allowing people of various demographics to participate, in a setting that is less confronting than a one-on-one interview (Kitzinger, 1995).

The focus groups had three specific goals:

• Goal A: Participants’ current understanding of cancer and how they obtained this information

• Goal B: Participants’ current awareness of how the cancer burden varies by geographical location and whether this is important to them

• Goal C: Participants’ understanding and interpretation of examples of disease maps showing information about how the burden of cancer varies by geographical area.

Recruitment

Participants were recruited using a variety of existing networks and methods. Health Practitioners were recruited using the Cancer Council Queensland’s “Queensland Cooperative Oncology Group (QCOG)”, restricted to those members in the Brisbane Region. Details of the focus groups were also included in the December and January editions of the Cancer Council Queensland’s “Health Professionals Network” newsletter. In addition, the focus groups were advertised on the Cancer Council Queensland’s website, and emails were sent to existing clinical contacts of the research team. Recruitment called for three audience types, namely a general audience, medical practitioners or health managers, and policy makers or advisors, using a series of social media promotions and advertisements, in addition to local contacts.

The first set of focus groups comprised health practitioners and the second comprised members of the general public. Only one participant was successfully recruited to take part in the Policy Makers/Advisors group and therefore this participant was included with the health practitioners. Two sessions were run for both groups. A set of questions was designed to promote a facilitated discussion around the way that the practitioners work, use statistics and statistical uncertainty, and how geographical issues relate to the way clients perceive or experience cancer.


Discussion Questions

Goal A - Background

• “If you hear the word ‘cancer’, what comes to mind? Do you know what causes it? Can it be cured? Do you know someone who has been diagnosed with cancer? What do you know about different types of cancer? How has your knowledge impacted on how you live your life? How likely do you think it is that you will be diagnosed with cancer? Why?”

• “What do you think of when you hear the word ‘incidence’, ‘mortality’, ‘survival’, ‘risk’, ‘diagnosis’? What about ‘reliability’, ‘uncertainty’, ‘accuracy’?”

Goal B – Geographical variation

• “Are you interested in finding out more about how the burden of cancer varies by geographical areas? Where would you find this information? What types of information would you be interested in? Is it important who produces the information?”

• “What do you know now about how the burden of cancer varies depending on geographical area? Do you think that where you live matters in terms of your health, and whether you will develop cancer? Do you think where you live now has a higher or lower risk of cancer compared to [pick a town]? Where would you get this type of information?”

Goal C - Disease maps

Participants were broken into 3 groups and provided with one of the following three maps:

Map 1: http://globalcancermap.com/

Map 2: https://nccd.cdc.gov/DCPC_INCA/

Map 3: http://www.envhealthatlas.co.uk/eha/Breast/

They were first given the following task:

• “Using the map you have been provided, compare the incidence of breast and prostate cancer.”

After this the facilitator prompted a general discussion using a sample of the questions listed below.

• What do you think are the key messages from this page?

• What features captured your attention?

• What aspects of the page were unclear/confusing?

• How confident are you of the certainty, reliability and accuracy of the information presented in the maps?

• Is their accuracy different? Discuss

• Do you feel they are all as reliable as each other? Discuss

• Which gives you the most confidence in the information presented?

• For someone who lives in region x (Australia vs NZ, Texas, greater London) what does this map tell you about their risk?

• What does graph x (location of uncertainty graph) mean? Discuss.

• Do you feel like this map was made for someone like you?

– With your level of expertise

– With your job requirements

– With your level of scientific/mathematical/statistical understanding

• Are there any terms you didn’t understand or that were unclear to you?

• Were you surprised by anything these maps show?

• Was the information predictable or boring?

• Did any part of these maps frustrate you?

• What terms were unclear to you?

Analysis of results

Affinity diagrams, also known as the KJ Method (Scupin, 1997) or topical notes (Farrell, 2017), were used to analyse data from the focus groups. This method is used to extract meaningful data from qualitative research with small sample sizes (Farrell, 2017). First, the focus group recordings were transcribed. The researcher then identified small themes that came up in the discussion and wrote each one on a sticky note as a single sentence or phrase in the first-person voice.

The first-person voice helps to elicit empathy for the participants, and an understanding of their perspectives. These sentences (sometimes short quotes that summarise a group consensus on a topic) are then shuffled and clustered together to form broad groups.

These groups are then given a name that describes the ideas in the groups, which are then assigned to higher-level themes. For example, using transcripts from the second pair of focus groups (non-health practitioner), we collected approximately 150 individual sentences on sticky notes, clustered them into 17 groups, which were then collected into 5 themes.
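The bookkeeping behind this hierarchy (sentence, then group, then theme) can be sketched in a few lines of Python. This is only an illustrative sketch: the clustering itself was a manual, judgement-based exercise with sticky notes, and the sentences, group names, theme names and the `build_affinity` helper below are hypothetical placeholders, not data or tools from this study.

```python
from collections import defaultdict

# Hypothetical first-person notes, each already assigned to a group by the researcher.
notes = [
    ("I skip over statistics I don't understand", "Technical language"),
    ("I want numbers explained in plain words", "Technical language"),
    ("I trust information from my medical team", "Trusted sources"),
    ("I am wary of what I read online", "Trusted sources"),
    ("I want to make my own decisions", "Control"),
]

# Hypothetical assignment of groups to higher-level themes.
group_to_theme = {
    "Technical language": "Relating to information",
    "Trusted sources": "Sourcing information",
    "Control": "Autonomy and control",
}


def build_affinity(notes, group_to_theme):
    """Nest the notes as theme -> group -> list of sentences."""
    themes = defaultdict(lambda: defaultdict(list))
    for sentence, group in notes:
        themes[group_to_theme[group]][group].append(sentence)
    # Convert to plain dicts so the result is easy to inspect or export.
    return {theme: dict(groups) for theme, groups in themes.items()}


affinity = build_affinity(notes, group_to_theme)
for theme, groups in affinity.items():
    print(theme)
    for group, sentences in groups.items():
        print(f"  {group}: {len(sentences)} note(s)")
```

Recording the result in this form makes it straightforward to count notes per group and to trace any theme back to the individual sentences that support it.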

Informed Consent and Ethics Approval

Ethics approval for the focus groups and the online game was granted by the Queensland University of Technology (QUT) Ethics Committee (Ethics Application Number: 1500000917). All participants provided informed consent in line with QUT regulations prior to participating in this research.

Copies of the Participant Information and Consent Form (PICF), the recruitment flyer and the ethics approval letter are attached in Appendix

4.3 Results

4.3.1 Results A: Project partners workshop

The key outcomes from the workshop are detailed below and a full report can be found in Appendix C.1.

1. Why is communicating statistical uncertainty important

Workshop participants considered the importance of statistical uncertainty in the context of science communication, geospatial data and disease mapping, as well as the Australian Cancer Atlas.

Participants felt that uncertainty was important within science communication for:

• quantifying the accuracy and reliability of statistical outputs

• guiding future research by highlighting areas of high uncertainty where further research is needed

• demonstrating the development of knowledge over time as uncertainty decreases

• being transparent with the public in regards to the evolution of knowledge, and

• supporting a greater public awareness of the scientific process.

In geospatial data and disease mapping, uncertainty was considered important in two respects. In the modelling process, it helps to manage the phenomenon whereby model outputs can be influenced by how the data are aggregated before analysis. In presentation, it helps to combat the misinterpretation of reliability that occurs when statistical estimates are rendered as solid regions on a map, making them appear more reliable than they may in fact be.

Within the Australian Cancer Atlas, in addition to the points addressed above, uncertainty was considered important in:

• communicating the reliability of the outputs to decision makers in order to inform appropriate decision making, public health policy development and health budget allocation,

• supporting and guiding future research in cancer outcomes,

• helping to quantitatively prioritise research projects, and

• telling the ‘whole story’ by communicating clearly our current state of knowledge about inequalities in cancer incidence and survival in Australia.

In addition, uncertainty communication is an explicit research focus of the Australian Cancer Atlas project.

2. Target audiences

Workshop participants identified the following eight target audiences as important to the Australian Cancer Atlas. A full description of the audience characteristics can be seen in Appendix C.1.

1. General Audience/ General Public

2. Media

3. Government, lobby groups and health policy makers and advisors

4. Health managers (regional and local)

5. Clinicians

6. Cancer patients and their caregivers, family or supporters

7. Researchers

8. Other Cancer Councils and Health Reporting organisations.

3. Uncertainty sources in the ACA

The following table summarises the sources of uncertainty identified by the workshop participants using the uncertainty sources taxonomy outlined by Morgan and Henrion (1990). Overall, this was a difficult process which workshop participants struggled to complete. The primary obstacles to completing this task were:

1. Many workshop participants did not understand all sources of uncertainty prior to the workshop. Even the statistical modellers struggled with some definitions.

2. Participants struggled to translate the technical definitions of the different categories into practical uncertainty sources within the ACA.

3. Some uncertainty sources can exist in multiple categories; for example, disagreement on methods could also be placed in model uncertainties. As a result, limited workshop time was spent on philosophical debate about the meanings of the different categories rather than on completing the diagnosis task.

4. Participants struggled to see the practical purpose and impact of identifying all sources of uncertainty when some, such as systematic error/bias, cannot be communicated or cannot be measured.

Participants did succeed in identifying some sources of uncertainty, even if they did not all fit within the categories defined by Morgan and Henrion (1990).

Table 4.2: Sources of uncertainty within the Australian Cancer Atlas

Uncertainty category              Source specific to the Australian Cancer Atlas

Data                              - Estimated population of each region (ABS)
                                  - Estimated demographic breakdown of each region (ABS)
                                  - Socio-economic status is generalised across the entire region
                                  - Classification uncertainty around the cause of death
                                  - Classification uncertainty around indigenous identification
                                  - Residential address does not contain any info of time at that residence or region

Methodologies                     - Smoothing algorithm
                                  - Model prior distributions (may also be an input rather than a method)

Disagreements - model/methods     - Spatial smoothing methods

Outputs - linguistic uncertainty  - Meaning of: probability, uncertainty, risk, cause, correlation, random

4.3.2 Results B: Audience group definitions (Application of NEUVis framework)

Audience profiles were developed for each of the eight audiences identified in the Project Partners Workshop (Appendix C.1). All audience profiles are detailed in Appendix C.2.

These audience profiles were valuable in identifying users’ needs and in considering how uncertainty information may be used by each audience. They were particularly valuable in identifying the audiences most prone to misinterpretation and those for which uncertainty information may not currently be relevant.

The addition of a seventh question not previously part of the NEUVis framework (i.e. “What is the potential for this audience to have an impact beyond themselves?”) was particularly useful for considering those audiences that may be most susceptible to misinterpreting the data or the uncertainty information. Because this research connects geography with cancer incidence, this matters: it is important to avoid unfounded assumptions of cancer clusters. The question was also valuable in focusing attention on the most important audiences, given that there were eight target audiences.

Extending these questions to the uncertainty information, and not only the data, was valuable in mapping out which audiences this information will be most meaningful for. The framework helped the non-technical team members (including designers) to develop an understanding of how the uncertainty impacts the interpretation of the insights, which can be difficult when the impact of uncertainty is discussed abstractly. Additionally, the process of identifying audience needs and behaviours enabled a guided discussion between the modellers and other stakeholders. This enabled the project partners to come to an agreement on which types of uncertainty were most important and for whom.

A clear understanding of the target audiences, their needs, current behaviour and expertise enabled the diverse perspectives of the project partners to remain focused on the target audiences and balance the desired project outcomes with the target users’ needs.

4.3.3 Results C: Focus groups

Due to limited resources it was not possible to consult all audience types through focus groups. Therefore, under the guidance of the Cancer Council Queensland, we focused on the following four audiences:

1. General audience

2. Cancer patients and their caregivers, family and supporters

3. Health practitioners (including health managers & clinicians)

4. Policy makers

These four audiences were combined into two groups: general audience + cancer patients and their caregivers, family and supporters (referred to as patient-participants from here on); and health practitioners.

Focus groups were used to build on the audience groups outlined in Section 4.3.2.

Table 4.3: Focus Group Participants

Focus group                   No. participants    Total

FG1 Health Practitioners      F = 5, M = 0
FG2 Health Practitioners      F = 7, M = 0        12

FG3 Patient-Participants      F = 6, M = 0
FG4 Patient-Participants      F = 6, M = 1        12

The first two focus groups were held with groups of health practitioners. These included nurses, General Practitioners, other clinicians and staff from support groups for cancer patients, as well as one participant from health policy development. In total, twelve respondents participated in these groups.

The second two focus groups were for people from a general audience who did not identify as health practitioners of any kind. It became apparent that all of the participants in these focus groups had personal experience with a cancer diagnosis or a close relationship with a person who had. This means that the findings from this group may not necessarily reflect the opinion of the general public. However, in the context of chronic disease maps and specifically the Australian Cancer Atlas, this audience is still highly relevant. I also note that this is not unexpected for cancer, as most people in a general audience have had some personal experience of cancer, whether through a personal diagnosis or through a relationship with someone who has.

Focus group insights: Health practitioners

These focus groups provided insight into the way that health practitioners work with cancer patients and data around cancer. The results of these focus groups can be grouped into four areas: the attitude of the health practitioner, the challenges related to their practice, the issues that their clients deal with (or that they have to deal with due to their clients) and uncertainty.

Results were further translated into health practitioner needs, which can then be acted upon within the design of the ACA.

Attitude

The attitude of the health practitioner to their work is very important, as they often face significant frustrations. Overall it can be noted that they value their work, and understand its importance to people, particularly in remote areas. They have a very broad understanding of cancer, which is based on: data and research which is as current as available to them; and the personal experiences they have with dealing with cancer. They shared a perspective on the difficulties of accessing screening and treatment information.

The discussions often came back to the way that the practitioner:

• Empathises with their client as an individual, whose experience is different from every other individual,

• Maintains a positive message and a positive outlook for their clients, and wants to base this on data that show how outcomes are improving for cancer patients compared to a few decades ago,

• Wants to be able to give their clients comparisons between their risks in diagnosis/treatment options/mortality and other activities in daily life that involve some risk, such as driving a car/being struck by lightning,

• Wants information that they give to their clients to be individually meaningful, but also supported by data,

• Believes that the way that cancer is talked about needs to be addressed, particularly in relation to how cancers are grouped. That is, cancer is not one disease, not all breast cancers are the same, and the same cancer in different areas of the body can have very different outcomes.

Challenges in their practice

Health practitioners spend a lot of time communicating with clients who are going through extremely stressful experiences. They talk a lot with clients, but also find and print out relevant information to give to their clients. They have tools that they rely on, and the trustworthiness of these is important, but they are unsure whether these tools are the most up to date, which is something they would value knowing. A large challenge is consistency in data, particularly for those who work with children and young adults. They may have difficulty accessing data consistently, or may be hampered by the way that age groups are defined. Some practitioners also wanted information about the outcome of non-intervention (doing nothing at all) and how that relates to patient outcomes. Finding the right information for the client can range from difficult to impossible. However, they are able and willing to adapt to the different challenges.

Client issues

The participants worked with a range of different clients, some with the very young and others with the very old. While there can be acceptance from aged clients about the terminal nature of cancer, there is a lot of desire for information from the parents of young patients. Clients, and their parents, often have misconceptions about cancer, which the practitioner needs to address, particularly to do with treatment options, causes and outcomes for persons with cancer. Related to this is the difficulty that many clients have with interpreting statistics, and how, or if, any application can be made to the individual from the statistics they find. This is significant because often clients will look for their own information online. Clients can be generally well-informed, though the source of their information is not always reliable; it was noted that often the clients have known someone who has had cancer, and it is not as unfamiliar as it was a few decades ago.

Uncertainty

The issue of uncertainty was brought up as part of the focus group, particularly how practitioners and their clients understood the idea of uncertainty around cancer estimates.

The practitioners who understood statistical uncertainty found it useful, but for others, terms such as confidence intervals or credible intervals are unclear, and are typically ignored. Whether the practitioner gives the clients any statistics depends on the client, and whether the practitioner thinks that it would be good for them; the practitioners see clients facing very real and difficult challenges and respond to them accordingly.

Focus group insights: Patient-participants

These focus groups provided insight into the current behaviour and needs of persons with cancer, or with family members affected by cancer. Discussions explored current information seeking behaviour, how they understand the way that cancer is communicated, how they find and use information about cancer, how it relates to them, the importance of autonomy and control and some of their desires and values connected with these issues.

Behaviour

Some participants commented that they only spent the time and effort to understand information about cancer after they were personally affected, through a diagnosis of themselves or a loved one. They understood that the risk of cancer developing is influenced by many behavioural choices (such as smoking, for example), but felt that the cause and effect was so far removed, that it didn’t seem real when they were younger.

Some patient-participants felt that there was some inevitability to their diagnosis, partly because of their environment, heritage, genetics, or a combination of each (such as living in northern Queensland with very fair skin), but also because of a lack of understanding about risk factors when they were younger. Outside of sun exposure, they didn’t suggest that they perceived geographical location on a national scale as a cause of risk increases.

However, risks were perceived to be influenced by local environmental factors, such as pollution or occupation.

Communication

Patient-participants generally felt that communication around cancer was fear-mongering and negative. A significant insight into the way that cancer is talked about, particularly in the media, was how left out or forgotten some cancer patients feel. It is perceived that some cancers, particularly breast cancer, but also now prostate cancers, are the ‘popular cancers’ because of successes in research, funding, and awareness. However, this can be discouraging for people who have other forms of cancer that don’t have the same level of support or awareness. Two extreme cases were brought up during the discussion. The first was lung cancer, where participants stated that they felt stigmatised, as if they had done something to cause it. The second was the description of cancer as a ‘manageable chronic disease’ by the media, particularly in relation to some kinds of prostate or breast cancers, which can be very disheartening for patients with other cancers where this is not a reality. It was noted that patients understood that it may be more relevant to group cancers by the mutation that causes them, instead of the organ where they are located.

Generally, the patient-participants would have preferred to have information about mortality rates reframed as survival rates. Some participants felt that this would make no difference to them personally, as they described how they mentally process these percentages, but the groups generally thought this would be better. This was seen as a more positive way to communicate the statistics. A 15% survival rate gave the participant more to hang on to than an 85% mortality rate. Patient-participants also value hearing and sharing personal experience about cancer. Many patient-participants shared stories of how they were able to support others in their experience, as well as how they received support from other cancer patients. This was very encouraging to them.

Sourcing information

The reputation of organisations, such as Cancer Council Queensland, is very important to cancer patients, as they are seen as a source of reputable information, whereas mass media was seen as either shallow or fear-mongering. The patient-participants were critical of information from the internet or friends, and strived to build a bank of information from a variety of sources they could draw from. They want information that is clearly explained, and ideally would be based on outcomes, which could help them take informed action.

Patient-participants often took information to their medical team, whose expertise they trust.

Relating to information

When sourcing information, users had different methods of making it personally meaningful, particularly statistical data. People made comparisons so that abstract statistical data could be made more tangible, for example by considering 100 people “like me” and asking how many of those with my condition would have survived after 5 years. Patient-participants said that it was difficult to understand technical and statistical language, and they often skipped over that information. However, they didn’t suggest a lack of confidence in the research itself, rather that it was inaccessible. One point that came up relating to relative risk was the assumption that when a range of risks is presented, people with a healthy lifestyle were at the bottom of the range, and people who have unhealthy habits (such as smokers) were at the higher end of the range.
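The “100 people like me” comparison, together with the preference reported under Communication for survival over mortality framing, amounts to a simple piece of arithmetic. The sketch below illustrates it in Python. The function name `reframe_mortality`, its parameters and the wording of the output sentence are hypothetical and are not part of the Australian Cancer Atlas; pairing the 85% figure with a 5-year horizon is likewise an assumption made only for illustration.

```python
def reframe_mortality(mortality_rate, cohort=100, years=5):
    """Express a mortality proportion as the number of survivors in a cohort."""
    if not 0.0 <= mortality_rate <= 1.0:
        raise ValueError("mortality_rate must be a proportion between 0 and 1")
    # Survival is the complement of mortality; round to whole people.
    survivors = round((1.0 - mortality_rate) * cohort)
    return (f"Out of {cohort} people like me, about {survivors} "
            f"would be alive after {years} years.")


# An 85% mortality rate reframed as survival in a cohort of 100 people.
print(reframe_mortality(0.85))
```

The underlying statistic is unchanged by this reframing; only the presentation differs, which is the distinction participants drew between a 15% survival rate and an 85% mortality rate.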

Autonomy and control

Cancer can mean a lot of different negative things to the people who are personally affected by it. Our patient-participants wanted to feel in control of their own decisions.

They wanted to rely on the advice of the experts in the medical team, and wanted to find useful, actionable research, about which they would get advice. One challenge that was discussed was alternative treatments. While patient-participants appreciated concern for and from friends and family, they did not want other people to push their own ideas onto them about what they should be doing, or who they should or shouldn’t trust. It was more important that the patient-participant was able to make their own decisions regarding where to place their trust.

Focus group limitations and challenges

We wish to clarify a few limitations and challenges we encountered when planning and running the focus groups. Firstly, there was limited representation from men; the only male participant was involved with one of the focus groups for the general public. Secondly, we were able to recruit only one health policy expert or policy maker to the focus groups, where further input from this audience would have been valuable. Finally, it quickly became apparent that most of the patient-participants had personal experience with cancer diagnosis, or had close friends or family that had. It is not unexpected that members of the general public would know someone who had previously received a cancer diagnosis. However, the level of personal experience with cancer diagnoses among our patient-participants was higher than would be expected in a random sample of the population. It is likely that this is because people who are part of the Cancer Council Queensland’s networks are more likely to be personally affected by cancer. Though this was a challenge in conducting and evaluating the focus group data, it is notable that the people involved in the general public focus groups represent a group who are interested in disease maps as an information resource in their own experiences with cancer diagnosis.

4.4 Discussion & insights for the Australian Cancer Atlas.

Embedding uncertainty information into an already visually rich cancer map is a significant design challenge that relies on: understanding the current practices and technologies; identifying target audiences; identifying why uncertainty is important and how it influences the key insights; and understanding the target audience’s current behaviours and needs. The Australian Cancer Atlas has been utilised as a case study for the application of design thinking tools, uncertainty diagnosis frameworks and user focus groups as approaches for addressing this complex challenge. Combined with the grey literature review detailed in Chapter 3 that explored current cancer mapping practices, many insights have been developed through this case study. The process, however, has also highlighted areas where further work is required to improve methods and tools for including uncertainty into the design of scientific communication material for the non-expert audience.

Insight 1: If uncertainty information is not understood, it is ignored. Key messages of a cancer map should stand on their own, independent of any uncertainty information.

Insight 2: Credible intervals are of interest to the non-expert audience of a Cancer Map once their meaning is explained.

As identified by the grey literature review, including uncertainty information is not common practice in cancer maps. Uncertainty is often not included and modelled outputs are commonly reported as point estimates. When uncertainty is included, it is usually in the form of credible/confidence intervals, error bars or statistical significance. The participants in the focus groups highlighted that these measures are not often understood by health practitioners or the general public, and in these cases are simply ignored. Map design should ensure that the main message can be understood independent of any uncertainty information.

Discussion with the focus group participants revealed that explaining the significance of the credible intervals and error bars was very easy, and the information was perceived as very useful, interpretable and interesting once explained. Focus group participants suggested that a short video explaining their meaning would be useful.

Insight 3: There is a need for practical tools for diagnosing uncertainty sources in language that is accessible to applied researchers and communication designers. Taxonomies of uncertainty sources are challenging to apply in real-world applications for this purpose.

There are many different sources of uncertainty within quantitative research projects, as outlined in Section 2.1.3. Identifying these different sources can inform how uncertainty within a specific project is expressed, communicated or visualised (or justify why it should be left out). For example, uncertainty due to high natural variation may be visualised in a different manner to uncertainty due to small sample size. Although the uncertainty taxonomy by Morgan and Henrion (1990) is well cited in the uncertainty communication literature, using this as a framework for identifying uncertainty sources in the Australian Cancer Atlas proved to be very challenging, even for experienced applied researchers. These challenges highlighted the need for practical diagnostic tools for identifying uncertainty sources within a specific research project. Pivotal to the success of such a tool would be an agreed taxonomy which explains the different sources of uncertainty in language that is accessible to both quantitative researchers and non-technical stakeholders such as design experts. Tools that help the design expert understand these complicated concepts and why they are important to the key messages of a project would be highly valuable but do not currently exist.

Insight 4: Reputation of the publishing organisation is important to a general audience’s perception of credibility of the information. Further to this, easily identifying the data sources and organisations who conducted the analysis influences perceived credibility for health practitioners.

The majority of maps identified in the grey literature review were published by research organisations, government, not-for-profit organisations or partnerships between these. Focus group participants highlighted that this was not a trivial matter: they specifically considered the credibility of the cancer information they accessed, and the publisher of that material was a key factor in evaluating credibility. Further to this, health practitioners specifically looked for where the data had come from to inform the credibility of the resource. In many of the online cancer maps it was very difficult to find the data used to generate the modelled estimates presented in the map.

Insight 5: Uncertainty can be misinterpreted when an interval is presented.

Care should be taken when intervals are used to present uncertainty information. Focus group participants perceived, for relative risk of a particular cancer, that people at the lower end of the range had made healthy lifestyle choices and people at the top end of the range had made poor lifestyle choices such as drinking, smoking and insufficient exercise.

Insight 6: Comparison of cancer outcomes and communicating the complexity of cancer are design opportunities for health practitioners.

Communication

There is an opportunity to communicate and think differently about cancer, which was expressed during the focus groups: cancer is not one disease, and cancers that share a name can have very different outcomes. The practitioners need to communicate this, which may be an opportunity for a design intervention. The practitioners also have to re-interpret statistics that are published, or found by the client. A design intervention should take this into account, particularly given that dealing with cancer is an individual experience for the client. Communicating effectively with the client is important, and should consider the way that a non-expert in statistics will respond to this information.

Comparison

In order to make the information relevant to the client, the practitioner may make comparisons with other parts of life that involve risk. This would be an interesting opportunity for a designed intervention, particularly for the client who comes to the practitioners with information they have found online. The practitioners were optimistic, positive, hard-working people who do extraordinary work helping clients through extremely challenging circumstances. Their optimism is fuelled by a broad outlook on cancer, and how it has changed. This may prove to be a useful opportunity for a design intervention to make comparison in order to show how diagnosis, treatment and patient outcomes are improving, and will continue to improve.

Insight 7: Design opportunities for a general audience - Communication and community.

Communication

As noted for the focus groups with the health practitioners, there is an opportunity to communicate and think differently about cancer, which was expressed during the focus group: cancer is not one disease, and breast cancer is not one disease either. In addition to this, it may also be beneficial to frame statistics more positively, moving from mortality to survival. A user without training in statistics has the same need and desire to understand their own circumstances. Statistical information should therefore be tailored to an end-user who has no experience with statistics. More detailed information may be given on demand if the user has the understanding to process it, but information should be accessible to the statistical novice. From examples that were described, it seems that some resources are not created with due consideration for the needs of the patient, but for statisticians. Data, and the results of data analyses, would be more useful if they were made more tangible to users. Finally, communication around cancers generally should be more mindful of the perceived biases that participants with cancers other than breast and prostate cancer have.

This is also an opportunity to create an intervention that addresses misconceptions around what causes cancers, to help alleviate some of the unfair stigma around certain cancers.

Community

Real personal experiences are valuable to cancer patients. Participants gave numerous examples of how they were called upon by friends and family who had recently received a diagnosis, or how they were frustrated by unwarranted advice from a friend that they didn’t trust. Personal stories and connections can potentially reduce feelings of isolation for patients who have rare, or typically stigmatised cancers. Creating links between communities (such as Facebook groups and advocacy groups) with individual stories and reliable, evidence-based information may be valuable to the everyday user.


4.5 Conclusion

In this chapter I explored uncertainty communication within cancer mapping. The work was informed by the grey literature review described in the previous chapter, which built an understanding of current practices in cancer mapping and of the general lack of uncertainty information included in these maps. I explored the use of Morgan and Henrion’s (1990) taxonomy for diagnosing uncertainty within the Australian Cancer Atlas, extended the NEUVis design method by proposing a user-centred design framework that can be applied to all science communication challenges, and conducted focus groups with target audiences of the Australian Cancer Atlas. I have summarised the conclusions for each of these research activities below.

Diagnosing uncertainty sources

The research team tested the use of Morgan and Henrion’s (1990) taxonomy of uncertainty as a practical tool for identifying the uncertainty sources within the Australian Cancer Atlas. This taxonomy alone was not sufficient and the team found many challenges including: perceived overlap between uncertainty sources, a high level of technical knowledge required to practically apply the framework, and a lack of perceived benefit in the activity. The team did identify the main uncertainty sources, but the taxonomy is not a practical tool for diagnosing uncertainty without further development.

Further development would focus on clearer definitions of the taxonomy, including examples for each category, and re-designing the tool so that the application focused on the most important uncertainty sources in a project first, rather than just going through every possible uncertainty source.

NEUVis Design Framework - for uncertainty communication design

Utilising the NEUVis design framework to systematically map the impact and importance of uncertainty information to the identified target audiences was highly successful. The framework provided a mechanism for technical experts, applied researchers and communication design practitioners to navigate the communication challenge collaboratively and share required knowledge. The framework maps different uncertainty information within the project to the target audiences and their needs. The addition of a 7th question to the NEUVis framework through this study, “What is the potential for this audience to have an impact beyond themselves”, helped further rank potential audiences in terms of importance. The NEUVis design framework places the users’ needs at the centre of the design/communication challenge. Within the uncertainty communication literature, the users’ needs are increasingly recognised as essential to the uncertainty communication challenge. This case study has demonstrated that the NEUVis design framework can provide a road map for design practitioners and technical experts to collaboratively navigate through this complex communication challenge. The framework was time consuming to apply and future research could focus on streamlining the process.

Focus groups

The focus groups validated, and expanded on, the audience profiles developed through the application of the NEUVis design framework. They revealed opportunities for improving the way cancer is discussed and communicated to both medical practitioners and the general audience within, or alongside, a cancer map.

In particular, it was noted that both groups saw value in raising understanding about different types of cancer and groupings of cancers. A recurring discussion in the focus groups was that cancer is not one disease; that is, the same type of cancer can be found in different tissue and different types of cancer can be found in the same tissue. Educating the general public about this may have a beneficial effect on the way that cancer is understood and perceived. In addition, it is apparent from both groups that any statistical information that is not understandable is simply ignored. This must be a design consideration when developing public-facing disease maps: because the data that is rendered may be only partially interpreted, an incomplete view should not be misleading. The relationship of statistics to an individual is another area that emerged in the discussions. It is difficult for a member of the general public to understand if and how statistics relate to them personally, and often practitioners are approached with this kind of question. A cancer atlas that supports the discussion around the difference between population statistics and personal statistics could be valuable for a practitioner who is trying to help their patients find meaning in the data.


Medical Practitioners

Medical practitioners stated that they wished to communicate improving results from medical interventions, especially in comparison to outcomes several years (or decades) ago, as well as in comparison to other activities in life that involve some risk. As outcomes have changed, it would be helpful to medical practitioners to have easy and consistent access to the most up-to-date information available, in order to pass this on to their clients.

General Public

The general public expressed that they would benefit from having a more thorough understanding of risk, and what affects risk, particularly cancer risk. One participant commented that she, as a non-smoker, felt isolated because she had lung cancer and people around her assumed that her cancer was probably caused by something she had done, such as smoking. Of course, this is not always the case and it is not how risk works. This type of stigma around some cancers (such as lung cancer) and popular or celebrity support for fundraising around other cancers (such as breast cancer) can leave some patients with feelings of isolation because of their own situation. While supporting all cancer research should obviously be encouraged, it is also important that all patients, particularly those with rare cancers, are able to connect to support and share their stories with each other.

Opportunities

The focus groups brought to light several opportunities for design interventions, which were outlined in the discussion. We recommend that future research and designs incorporate User Modelling within cancer maps to better serve the needs of different audiences¹.

Potential audiences for cancer maps may be: researchers, policy-makers, patients and the media. There is a wide range of understanding of statistics across these groups, which the design of cancer maps, or any chronic disease map, should take into account. The most effective way of publicly communicating uncertainty around modelled estimates will always depend on the nature of the message, the statistical methods used, the data, the context, and of course, the audience. Uncertainty communication must be embedded within the larger communication design challenge if it is going to contribute to addressing the users’ needs. The tempting and easy solution of simply adding an error bar or credible interval is generally ignored by the non-expert user.

¹ For more information about User Modelling in the field of Human-Computer Interaction, see (Kay2011, fischer2001).


Chapter 5

Research Activity 2: User study - Uncertainty representation in an online game.

The following investigation aims to contribute to a growing body of research on uncertainty communication. It uses an online game to investigate how different uncertainty representation methods influence players’ behaviour. In this research I explored how players allocate resources differently depending on how the uncertainty around an estimate of risk is communicated. The research involved multiple steps including: game design, digital implementation, recruitment of participants, and analysis of the online game interactions. In addition to exploring the impact of different uncertainty representation methods on behaviour, this chapter also explores the impact of the risk level and uncertainty levels within the game.

5.1 Introduction

There are three impediments to effective uncertainty communication (Buttenfield and Beard, 1991):

1. standardisation of terminology,

2. methods for measuring and representing uncertainty, and

85 CHAPTER 5. RESEARCH ACTIVITY 2: USER STUDY - UNCERTAINTY REPRESENTATION IN AN ONLINE GAME.

3. methods for depicting uncertainty information simultaneously alongside the estimates/data it relates to in an understandable, useful, and meaningful way.

The following chapter focuses on the second impediment identified by Buttenfield and Beard (1991) and contributes to the challenge of uncertainty communication through a user study that explores uncertainty representation methods. Uncertainty measures that commonly make their way into material targeted at a general audience include: numeric intervals (such as confidence intervals, credible intervals or prediction intervals), point estimates ± error, statistical significance, distributions, boxplots, standard deviation or semantic versions of intervals, as discussed in Section 2.3.2. These methods are often seen in communication material targeted at a general audience, and in particular are seen in the cancer map grey literature review detailed in Table 3.1; however, their impact on, or interpretation by, non-expert audiences is not well understood.

The study detailed here uses an online game to explore if users’ behaviour is influenced by the uncertainty representation method and compares: intervals, semantic uncertainty intervals, point estimates ± error and point estimates without error (i.e., no uncertainty).

Validated uncertainty representation methods are important building blocks for addressing uncertainty communication more broadly (Buttenfield and Beard, 1991). A greater understanding of the most effective ways to represent uncertainty measures can support communication designers to develop communication solutions that more effectively address the users’ needs.

Within the grey literature review of online cancer maps outlined in Section 3, 42% of the identified maps included a measure of uncertainty. The most common measures were credible/confidence intervals, and point estimates ± error. Most, however, reported only a point estimate of the modelled cancer incidence/survival or risk with no uncertainty.

When participants of the focus groups, outlined in Section 4.2.3, were asked to interpret confidence/credible intervals and point estimates ± error on example maps, those that did not have previous experience with statistical information ignored the measures as they did not understand them.


The challenge of uncertainty representation for the non-expert user has an extra layer of complexity when the estimate itself is a probability. This uncertainty around an estimated probability (or uncertainty on top of uncertainty) is present in the Australian Cancer Atlas and therefore is used within the online game. Due to the concern that referring to cancer within the game could be upsetting to any participant who has a lived experience of cancer, either personally or through a loved one suffering from a cancer diagnosis, the game uses a non-health-related context: a fictional pirate, exploring the seas and looking for ships to steal gold doubloons from.

While the communication of the statistical uncertainty within the game and the cancer atlas are similar, in that they are both an estimated probability with uncertainty, the different contexts clearly influence behaviour. The online game is a fictional setting with no connection to the players’ real life, while the cancer map could have a very real connection to the viewer, either through their own lived experience or that of a close friend or family member. Additionally, the outcome in the online game is financial, while those in the cancer atlas represent loss of health, potentially pain and death. It is reasonable to question if decision makers respond differently to risks of a financial versus a physical nature, as well as ‘real’ versus fictional. These considerations should be kept in mind when interpreting the study results for the Australian Cancer Atlas.

5.2 Aim

To use an online game to investigate how different uncertainty representation methods influence players’ behaviour. Specifically, this study explores three different uncertainty representation methods, these are: a numeric uncertainty interval, a semantic uncertainty interval, and a point estimate ± error. A point estimate with no uncertainty is used as a control.

5.3 Online Simulation - Impact of uncertainty communication methods on decision making

To quantitatively investigate the impact of different uncertainty representation methods on decision making I designed and built a web-based game to assess if players allocate resources differently when uncertainty information is presented in different formats. The research explored user behaviour across different uncertainty representation methods, levels of risk and levels of uncertainty.

5.4 Methods

5.4.1 Game Design

Players were randomly shown one of four versions of the online game. All features were identical across versions, except for how the uncertainty around the ‘risk of defeat’ was communicated. The four styles of representation are listed in Table 5.1 below.

Table 5.1: Uncertainty representation method for each game mode.

| Game Mode | Uncertainty Representation Method | Representation in Game |
|---|---|---|
| Interval | Upper & lower bounds of the interval | Risk of my defeat: is between 30% and 70% |
| Plus/minus | Point estimate ± error | Risk of my defeat: is 50% give or take 20% |
| Semantic Interval | Semantic bounds of an interval | Risk of my defeat: is between below average to above average |
| Point Estimate | No uncertainty shown | Risk of my defeat: is 50% |
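As an illustration only, the four representations in Table 5.1 could be generated from a single point estimate and uncertainty level as sketched below. This is not the code used in the game. The function names, the mode keys, the treatment of the uncertainty level as a half-width, and the cut-offs for the semantic labels are all assumptions made for this sketch.

```python
def semantic_label(pct: int) -> str:
    """Map a percentage to a verbal label (cut-offs are hypothetical)."""
    if pct < 40:
        return "below average"
    if pct <= 60:
        return "average"
    return "above average"


def describe_risk(mode: str, p: float, u: float) -> str:
    """Return the 'risk of defeat' sentence for one ship.

    p is the point estimate (0..1). u is ASSUMED to be the half-width of
    the uncertainty interval; the source does not specify this.
    """
    pct = round(p * 100)
    err = round(u * 100)
    low, high = max(0, pct - err), min(100, pct + err)
    if mode == "interval":
        return f"Risk of my defeat: is between {low}% and {high}%"
    if mode == "plus_minus":
        return f"Risk of my defeat: is {pct}% give or take {err}%"
    if mode == "semantic":
        return (
            "Risk of my defeat: is between "
            f"{semantic_label(low)} to {semantic_label(high)}"
        )
    if mode == "point":
        return f"Risk of my defeat: is {pct}%"
    raise ValueError(f"unknown game mode: {mode}")


if __name__ == "__main__":
    for mode in ("interval", "plus_minus", "semantic", "point"):
        print(describe_risk(mode, 0.5, 0.2))
```

With a point estimate of 0.5 and a half-width of 0.2, the four sentences reproduce the wording shown in the last column of Table 5.1.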

To make the game, we created a theme, a novel mechanic and a context, and gave the player an objective, an obstacle and an incentive within a rule framework (Deterding et al., 2011). Each of these is defined below.


Figure 5.1: Game Landing Page

Context

The theme or premise of the game places the player as the quartermaster on a pirate ship, see Figure 5.1. Three ships of different sizes, each carrying a known amount of gold doubloons (the ‘reward’), are sailing into range for the player to attack.

The ‘risk of defeat’ for each ship is represented in two ways, pictorially by the relative size of the ship and in sentence form, as shown in Figure 5.2. Large galleons and frigates are more difficult to attack, and have a higher ‘risk of defeat’ compared to smaller cutters and tiny sloops.

The player’s ‘risk of defeat’ and the uncertainty around that ‘risk of defeat’ (seen in the blue squares in Figure 5.2) are presented in one of four uncertainty modes, which are outlined in Table 5.1.

For each ship there is a stated ‘reward’, in gold doubloons, which the player wins if they are successful in their attack on that ship. The player must attack all three ships. Players were not informed that the purpose of the game was to explore uncertainty representation methods; see Appendix E.2 for the informed consent flyer.


Figure 5.2: Game Play Page

Objective

The player’s task is to defeat all three ships, thus winning the ‘reward’ (gold doubloons) on each ship. For each ship in the game, there is a risk that the ship will defeat the player; that is, there is a ‘risk of defeat’. This ‘risk of defeat’ is expressed as a probability, see the dark blue boxes with white lettering on the right of Figure 5.2. The player starts the game with 30 gold doubloons that must be spent across the ships to purchase ammunition and supplies needed for attacking the approaching ships. For every gold doubloon allocated to a ship, the ‘risk of defeat’ decreases by 1%.

Each ship has three variables associated with it: a ‘risk of defeat’, ‘uncertainty’ information, and a ‘reward’.

The goal of the game is for the player to make more money from the attack on the three ships than the original 30 doubloons they started with. For every ship that is defeated the player wins the stated reward. In this way, the research aims to explore if players allocate the available 30 gold doubloons differently across the four game modes. For example, consider a game that has the following ships (these are different ships from those shown in Figure 5.2): 1. a French frigate with a moderately high ‘risk of defeat’ and low ‘reward’, 2. a Dutch galleon with a very high ‘risk of defeat’ and high ‘reward’, and 3. a Spanish sloop with a low ‘risk of defeat’ and a high ‘reward’. All three ships in this example game are coming into range for attack. The player must attack all three ships and must decide how to spend the gold doubloons on ammunition and supplies across the three attacks in a way that balances risk and reward. They must choose a strategy that maximises their perceived potential reward. In this way, how the player allocates the gold doubloons provides an insight into their risk-taking behaviour and their ability to maximise their expected reward.

Action

The player’s available action for reaching the objective of the game (defeat all ships and win as many gold doubloons as possible) is to allocate doubloons across the three ships, thereby reducing the ‘risk of defeat’ on each ship and maximising the overall winnings. The player can allocate the 30 doubloons provided for each game using the +/- 1 and/or +/- 5 buttons seen in Figure 5.2 (white buttons with blue outline and blue lettering). The square to the left of these buttons indicates how many doubloons have been allocated to that ship. For example, in Figure 5.2, the first ship has 5 doubloons allocated to it, while the second ship has 20 doubloons. The ‘risk of defeat’ is reduced by 1% for each gold doubloon allocated to a specific ship, giving the player influence over their chance of defeating the target ship and thus the outcome of the game. The change in ‘risk of defeat’ is reflected directly to the user: the ‘risk of defeat’ displayed on the screen decreases as soon as doubloons are allocated to that ship.
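The allocation mechanic described above can be sketched as follows. This is a minimal illustration, not the game's actual implementation. The function names are hypothetical, and silently ignoring a button press that would overspend the budget or take a ship below zero is an assumption about how invalid presses are handled.

```python
BUDGET = 30  # doubloons available in each game


def adjust(allocation, ship, delta, budget=BUDGET):
    """Apply one button press (+1, -1, +5 or -5) to one ship.

    Presses that would make an allocation negative or exceed the budget
    are ignored (an assumption; the source does not describe this case).
    """
    new_value = allocation[ship] + delta
    new_total = sum(allocation) + delta
    if new_value < 0 or new_total > budget:
        return list(allocation)
    updated = list(allocation)
    updated[ship] = new_value
    return updated


def displayed_risk(p, doubloons):
    """Risk of defeat shown on screen after allocating doubloons.

    Each doubloon lowers the risk by 0.01, and the risk cannot fall below 0.
    """
    return max(0.0, p - doubloons / 100)


if __name__ == "__main__":
    state = [0, 0, 0]
    state = adjust(state, 0, 5)   # +5 on the first ship
    for _ in range(4):            # four presses of +5 on the second ship
        state = adjust(state, 1, 5)
    print(state, displayed_risk(0.5, state[1]))
```

The usage example reaches the allocation shown for the first two ships in Figure 5.2 (5 and 20 doubloons) and prints the reduced risk for a ship whose starting risk is assumed to be 0.5.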

Obstacle

The player has limited resources to spend on supplies and must decide a strategy of allocating the doubloons across the three ships in a way that maximises the potential ‘reward’.

Incentive

For each ship the player defeats, they win the stated ‘reward’ associated with that ship.


Game play

In each game there are three ships approaching the island at one time. All ships must be attacked simultaneously, and the outcome of an attack on one ship does not influence the outcome of the attack on any of the other ships.

The player is told the outcome of the game within a few seconds of submitting their allocation choices. The time it takes to play the game depends on how long the player deliberates on how to allocate resources. The shortest possible game would be the time it takes to click the buttons and allocate the 30 doubloons (< 30 seconds). Each game is set to time out after 20 minutes. Once the game is over and the results reported, the player is given a choice to reset the game and play again. There is no accumulation of points; that is, there is no relationship between games.
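The resolution of a single game, with the three attacks decided independently, could be simulated as in the sketch below. This is an assumption-laden illustration rather than the game's code. The function name and the use of Python's `random.Random` as the source of randomness are choices made for this example.

```python
import random


def play_round(rewards, risks, allocation, rng=None):
    """Resolve one game and return the doubloons won.

    Each ship is resolved independently: the player is defeated by a ship
    with probability max(0, p - b/100), where p is the ship's risk of
    defeat and b is the number of doubloons allocated to it.
    """
    rng = rng or random.Random()
    winnings = 0
    for reward, p, doubloons in zip(rewards, risks, allocation):
        risk = max(0.0, p - doubloons / 100)
        # rng.random() is uniform on [0, 1), so this test succeeds with
        # probability 1 - risk.
        if rng.random() >= risk:
            winnings += reward
    return winnings


if __name__ == "__main__":
    rng = random.Random(2017)  # fixed seed so the run is repeatable
    print(play_round([20, 40, 30], [0.6, 0.8, 0.2], [10, 10, 10], rng))
```

Because no state is carried from one call to the next, the sketch also reflects the statement that there is no relationship between games.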

Optimal strategy & player performance

It is assumed that the player aims to maximise the expected ‘reward’ in each game. That is, they are playing with the aim of winning as much gold as possible.

5.4.2 Game Hosting and Advertising

The game was built using Flask¹, a web server framework built in the Python programming language. The game was available online from March 4th to May 15th 2017, and was advertised through the CCQ’s e-newsletter and social media channels, as well as through Queensland University of Technology’s (QUT) social media channels.

5.4.3 Informed Consent and Ethics Approval

Ethics approval for the focus groups and the online game was granted by QUT’s Ethics Committee (Ethics Application Number: 1500000917).

All applicants provided informed consent in line with QUT regulations prior to participating in this research. The informed consent and email flyer for the game can be seen in Appendix E.2.

¹ http://flask.pocoo.org/


5.4.4 Data Collection

Game interactions were recorded in a MongoDB database² hosted on mLab³. To ensure that all collected data were securely stored, the database and the game were hosted separately.

The following variables from the game were recorded:

game mode : type of uncertainty communication method shown to the player.

$s_i$ : ship within a game, where $i$ can be 1, 2, or 3.

$r_i$ : reward if the attack on $s_i$ is successful (in gold doubloons). Drawn randomly from an arbitrary normal distribution $N(\mu = 30, \sigma = 11)$.

$p_i$ : probability of being defeated by $s_i$ (a number between 0 and 1), shown as ‘risk of defeat’ in Figure 5.2. Drawn randomly from an arbitrary normal distribution $N(\mu = 0.5, \sigma = 0.24)$.

$u_i$ : uncertainty level of $p_i$ expressed as an artificial uncertainty interval (selected randomly as either 0.1, 0.3 or 0.6).

$\beta_i^{usr} \in \{0, 1, ..., 30\}$ : the number of gold doubloons allocated by the player to each of $s_1$, $s_2$ and $s_3$.

$\beta_i^{*} \in \{0, 1, ..., 30\}$ : the set of gold doubloons allocated to $s_1$, $s_2$ and $s_3$ which optimises the expected value of the game.

session : game session, used as a proxy for player.

risk profile : categorical set of $p_i$ for $s_1$ to $s_3$, in which $p_i$ is either high (h) or low (l), and therefore risk profile is either lll, llh, lhh, or hhh. See further details below.

uncertainty profile : categorical set of $u_i$ for $s_1$ to $s_3$, in which $u_i$ is either high (h), medium (m) or low (l), and therefore uncertainty profile is either lll, hhh, mmm, llh, lhh, llm, lmm, mhh, mmh or lmh.
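The sketch below shows one way the per-ship variables and the two profile labels listed above could be generated. It is illustrative only. The rejection sampling used to keep draws inside the stated ranges, the 0.5 cut-off between low and high risk, the mapping of 0.1, 0.3 and 0.6 to l, m and h, and all function names are assumptions and not details confirmed by the source.

```python
import random

UNCERTAINTY_LEVELS = (0.1, 0.3, 0.6)
UNCERTAINTY_LABELS = {0.1: "l", 0.3: "m", 0.6: "h"}  # assumed mapping


def draw_reward(rng):
    """Draw a whole-number reward from N(30, 11), redrawing outside 10..50."""
    while True:
        reward = round(rng.gauss(30, 11))
        if 10 <= reward <= 50:
            return reward


def draw_risk(rng):
    """Draw a risk of defeat from N(0.5, 0.24), redrawing outside (0, 1)."""
    while True:
        p = rng.gauss(0.5, 0.24)
        if 0 < p < 1:
            return p


def draw_game(rng):
    """Return three ships, each with a reward, risk and uncertainty level."""
    return [
        {
            "reward": draw_reward(rng),
            "risk": draw_risk(rng),
            "uncertainty": rng.choice(UNCERTAINTY_LEVELS),
        }
        for _ in range(3)
    ]


def risk_profile(ships, threshold=0.5):
    """Label each risk l or h (threshold is hypothetical), low labels first."""
    labels = ["h" if ship["risk"] >= threshold else "l" for ship in ships]
    return "".join(sorted(labels, key="lh".index))


def uncertainty_profile(ships):
    """Label each uncertainty level l, m or h and sort in that order."""
    labels = [UNCERTAINTY_LABELS[ship["uncertainty"]] for ship in ships]
    return "".join(sorted(labels, key="lmh".index))


if __name__ == "__main__":
    ships = draw_game(random.Random(42))
    print(ships, risk_profile(ships), uncertainty_profile(ships))
```

Sorting the labels means that the order of the ships on screen does not matter, which is consistent with the ten unordered uncertainty profiles and four unordered risk profiles listed above.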

² https://www.mongodb.com/

³ https://mlab.com/


5.4.5 Analysis

Software

Statistical analysis was conducted using R version 3.3.3 (R Team, 2017) and RStudio version 1.1.383. All duplicates were removed from the dataset prior to analysis. GUROBI optimisation software (Gurobi Optimization, 2016) was used to solve the integer programming problem required to calculate $\beta_i^{*} \in \{0, 1, ..., 30\}$.

Quantifying Behaviour

Behaviour is complicated. It is multi-dimensional, complex and difficult to measure. In this research I use two measures to quantify the player’s behaviour: a performance ratio (PR) and the Gini coefficient. These two measures provide two different one-dimensional slices of a multi-dimensional variable. They are not intended to be a complete measure of behaviour but they can contribute valuable insights. Due to limited resources this investigation explores only these two measures, which are considered a starting point for exploring how behaviour is influenced by uncertainty representation methods.

The performance ratio (PR) provides a measure of the player’s ability to maximise the expected reward within the game, while the Gini coefficient provides a measure of inequality in how the player allocates their available resources. Resource spreading is a measure of risk-averse behaviour (Mistry and Trueblood, 2017; Wernerfelt and Karnani, 1987) and in this research the Gini coefficient is used in this way.

PR could also be considered a measure of risk-taking behaviour, in that the strategy that maximises reward also minimises risk of loss over repeated games. However, the PR measure does not differentiate between games that have the same score but have achieved that score through different strategies. For example, two players may achieve the same PR value, but one allocated the majority of their doubloons to a losing ship, while the other spread the doubloons across all three ships. Neither has maximised their expected reward; however, their allocation of resources is very different. The PR measure does not differentiate between the behaviour of these two players. Therefore I use the PR measure only as a measure of performance maximisation and not of risk-taking behaviour.


Notation

For each iteration of the game, the game mode is allocated at random, and a value of each of the parameters $r_i$, $p_i$ and $u_i$ is drawn randomly for each ship.

Let $S = \{s_1, s_2, s_3\}$ be the set of ships presented to the player, and for ship $s_i \in S$ the following are defined:

Data:

game mode : Game mode (0-3). Allocated at random.

$r_i \in \{10, 11, ..., 50\}$ : Reward for a successful attack on ship $s_i$. A number between 10 and 50, drawn randomly from an arbitrary normal distribution $N(\mu = 30, \sigma = 11)$.

$p_i \in (0, 1)$ : Probability (between 0 and 1) of being defeated by ship $s_i$. Drawn randomly from an arbitrary normal distribution $N(\mu = 0.5, \sigma = 0.24)$.

$u_i \in \{0.1, 0.3, 0.6\}$ : Uncertainty level of $p_i$, randomly allocated as either 0.1, 0.3 or 0.6.

Allocation:

$\beta_i^{usr} \in \{0, 1, ..., 30\}$ : Gold doubloons allocated by the user to $s_i$.

$\beta_i^{*} \in \{0, 1, ..., 30\}$ : Gold doubloons allocated to $s_i$ in a maximised version of a game.

Calculating the Performance Ratio (PR)

The player’s performance is measured as a ratio of the player’s strategy against the optimal strategy. This ratio is termed the performance ratio (PR), it is given as a percentage, and is calculated as:

$$PR = \frac{E_{user}[R]}{E_{optimal}[R]}, \qquad (5.4.1)$$

where $R$ is the reward, $E_{user}[R]$ is the expected reward using the user’s allocation of gold doubloons, and $E_{optimal}[R]$ is the expected reward using an optimal allocation of gold doubloons.

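Each doubloon reduces the probability of being defeated ($p_i$) by 1% (that is, by 0.01). If $\beta_i$ doubloons are allocated to $s_i$, the probability of being defeated becomes $\max(0, p_i - \beta_i/100)$, so the probability of defeating $s_i$ is

$$1 - \max(0, p_i - \beta_i/100).$$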


The performance of any strategy $\beta^{j}$, with allocations $\beta_i^{j}$, is

$$E_{j}[R] = \sum_{i=1}^{3} \left(1 - \max(0, p_i - \beta_i^{j}/100)\right) r_i.$$

The player’s chosen allocation $(\beta_1^{usr}, \beta_2^{usr}, \beta_3^{usr})$ is used to calculate the expected reward, and the optimal allocation of doubloons is used to calculate the maximum expected reward possible for that game. A maximised game is denoted $\beta^{*}$, such that $\beta^{*} = (\beta_1^{*}, \beta_2^{*}, \beta_3^{*})$. Therefore:

$$\beta^{*} = \operatorname*{argmax}_{\beta} \sum_{i=1}^{3} \left(1 - \max(0, p_i - \beta_i/100)\right) r_i \quad \text{such that} \quad \sum_{i=1}^{3} \beta_i \le 30.$$

The integer programming problem was solved using the GUROBI optimisation software (Gurobi Optimization, 2016).

PR provides a measure of the player’s ability to maximise the expected value of the game.
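To make the calculation concrete, the sketch below computes the expected reward and PR for a single game. It deliberately replaces the GUROBI integer program with exhaustive enumeration, which is feasible here because three ships and 30 doubloons give only 31³ candidate allocations to check. The function names and the example game are hypothetical.

```python
from itertools import product

BUDGET = 30


def expected_reward(rewards, risks, allocation):
    """Sum over ships of (1 - max(0, p_i - beta_i/100)) * r_i."""
    return sum(
        (1 - max(0.0, p - beta / 100)) * r
        for r, p, beta in zip(rewards, risks, allocation)
    )


def optimal_allocation(rewards, risks, budget=BUDGET):
    """Find the best allocation by brute force (not the GUROBI approach)."""
    candidates = (
        alloc
        for alloc in product(range(budget + 1), repeat=len(rewards))
        if sum(alloc) <= budget
    )
    return max(candidates, key=lambda a: expected_reward(rewards, risks, a))


def performance_ratio(rewards, risks, user_allocation, budget=BUDGET):
    """Expected reward of the user's allocation relative to the optimum."""
    best = optimal_allocation(rewards, risks, budget)
    return expected_reward(rewards, risks, user_allocation) / expected_reward(
        rewards, risks, best
    )


if __name__ == "__main__":
    rewards, risks = (20, 40, 30), (0.6, 0.8, 0.2)  # made-up example game
    print(optimal_allocation(rewards, risks))
    print(round(performance_ratio(rewards, risks, (10, 10, 10)), 3))
```

In this made-up game an even split of 10 doubloons per ship has an expected reward of 49, against 52 for the best allocation found, giving a PR of about 0.94.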

Calculating the Gini Coefficient

The Gini coefficient (also commonly called the Gini index) measures the inequality amongst values of a frequency distribution, and was developed by the Italian statistician Corrado Gini (Gini, 1912). The Gini coefficient is defined as a ratio with values between 0 and 1, where the numerator is the area between the Lorenz curve of the distribution (R in Figure 5.3) and the line of perfect equality (area ORP in Figure 5.3), and the denominator is the area under the uniform distribution line (area OPQ in Figure 5.3) (Bellu and Liberati, 2006).

A Gini coefficient of 0 expresses perfect equality, where all values are the same (e.g., where everyone in a population has the same income). A Gini coefficient of 1 expresses maximal inequality among the population (e.g., one person has all the income and all others have zero income). The measure is commonly used for evaluating income equality, but is also widely used in a range of applications where equality is of interest, such as the geographical distribution of rare and endangered species (Engler, Guisan, and Rechsteiner, 2004), access to education (Gregorio and Lee, 2002), or access to opportunity in health (Rosa Dias, 2009).

Figure 5.3: The Lorenz curve and Gini coefficient (Bellú, 2006)

$$G = \frac{\text{concentration area}}{\text{maximum concentration area}} = \frac{\text{area } ORP}{\text{area } OPQ}.$$

In this research the Gini coefficient (Gini, 1912) is a measure of how evenly players spread their gold doubloons across the three ships within a game. Since resource allocation is an expression of risk-taking behaviour (Mistry and Trueblood, 2017; Wernerfelt and Karnani, 1987), this measure quantifies risk-averse behaviour. A Gini coefficient of 0 suggests a player has allocated doubloons equally across the three ships, indicating risk-averse behaviour: an unwillingness to risk all resources on one option. A Gini coefficient of 1 suggests the player has allocated all doubloons to one ship, indicating that the player is willing to put all available resources on one option and is displaying more risk-taking behaviour.

The Gini coefficient was calculated using the ineq package in R (Zeileis, 2014).
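The calculation described above can be sketched as follows. This is an illustrative Python version rather than the R code used in the analysis; the function name `gini`, the `corrected` flag and the example allocations are assumptions made for the sketch. The endpoints described in the text (0 for an even split, 1 for all doubloons on one ship) correspond to the finite-sample corrected form when there are three ships; whether that option was the one used in the analysis is also an assumption.

```python
def gini(values, corrected=False):
    """Gini coefficient of non-negative allocations.

    corrected=True applies the finite-sample factor n / (n - 1), so that
    placing everything on one option gives exactly 1.
    """
    x = sorted(values)
    n = len(x)
    total = sum(x)
    if n < 2 or total == 0:
        raise ValueError("need at least two values and a positive total")
    # Rank-weighted sum over the sorted values (ranks start at 1).
    weighted = sum((2 * i - n - 1) * v for i, v in enumerate(x, start=1))
    g = weighted / (n * total)
    return g * n / (n - 1) if corrected else g


# Hypothetical allocations of 12 doubloons across three ships.
even_split = gini([4, 4, 4], corrected=True)   # fully spread
one_ship = gini([12, 0, 0], corrected=True)    # fully concentrated
two_ships = gini([6, 6, 0], corrected=True)    # split over two ships
print(even_split, one_ship, two_ships)
```

Without the correction, the most concentrated allocation across three options gives 2/3 rather than 1, so the choice of convention affects how the upper end of the scale should be read.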


Models and Tests

This research investigates the influence of uncertainty communication methods (‘game mode’) on players’ behaviour. In order to understand any between-player differences in behaviour, I have also investigated the influence that ‘risk profile’ and ‘uncertainty profile’ have on behaviour. Therefore, the following analysis is broken into three sections looking at the influence of game mode, risk profile or uncertainty profile on behaviour. Within each section I test for statistically significant differences in PR and the Gini coefficient.

Analysing the effect of game mode on behaviour

I remind the reader there are four different game modes in this analysis, see Table 5.1 for definitions.

Performance ratio (PR) by game mode

PR was highly skewed towards 1, with the median and mean for all game modes relatively high (within 5% of a PR value of 1). The skewness of the data is a result of the structure of the game, in that before the player allocates any gold doubloons the performance ratio is already at least 0.50 even if no further doubloons are allocated. The player however is forced to allocate all their gold doubloons, improving the PR value even if they make a less than optimal decision and thus resulting in a PR value of at least 0.7 in any game.

To address this skewness, and since PR is a ratio, the odds of PR (i.e. $\frac{PR}{1-PR}$) were considered instead and a logit transformation was applied as follows:

$$\operatorname{logit}(PR) = \log\left(\frac{PR}{1-PR}\right).$$

Hence logit(PR) represents the log odds of PR.
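A minimal Python sketch of this transformation is given below, to show how it spreads out values that are bunched near 1. The function name `logit` and the example PR values are illustrative assumptions, not taken from the game data.

```python
import math


def logit(pr):
    """Log odds of a performance ratio strictly between 0 and 1."""
    if not 0.0 < pr < 1.0:
        raise ValueError("PR must lie strictly between 0 and 1")
    return math.log(pr / (1.0 - pr))


# Hypothetical PR values bunched near the top of the scale.
for pr in (0.70, 0.90, 0.95, 0.975):
    print(f"PR = {pr:.3f}  logit(PR) = {logit(pr):.3f}")
```

On this scale a PR of 0.975 maps to roughly 3.66, and equal steps in PR near 1 become progressively larger steps in logit(PR).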

Testing for differences in logit(PR) was conducted in two stages. Firstly, a linear mixed effects model (LME) was used to test for overall differences, with game mode as the fixed effect and session (proxy for player) as the random effect. It was important to use this type of model as some participants played the game multiple times, therefore the independence of observations could not be assumed. The lme function from the nlme package (Pinheiro et al., 2017) in R (R Core Team, 2017) was used for this analysis, as follows:

$$\operatorname{logit}(PR) = \text{game\_mode}_i\,(\text{fixed}) + \text{session}_j\,(\text{random}) + e.$$

Posthoc pairwise analyses were then conducted, testing for differences between each group using Tukey’s test (Tukey, 1949). A Benjamini-Hochberg adjustment (Benjamini and Hochberg, 1995) was applied in order to control for Type 1 errors.
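The Benjamini-Hochberg step-up adjustment can be sketched in a few lines. The version below is an illustrative Python implementation, not the R routine used in the analysis; the function name `bh_adjust` and the raw p-values are hypothetical.

```python
def bh_adjust(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(p_values)
    # Indices of the p-values from largest to smallest.
    order = sorted(range(m), key=lambda i: p_values[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for position, idx in enumerate(order):
        rank = m - position  # rank of this p-value in ascending order
        candidate = p_values[idx] * m / rank
        running_min = min(running_min, candidate)  # keep results monotone
        adjusted[idx] = running_min
    return adjusted


raw = [0.010, 0.040, 0.030, 0.005]  # hypothetical raw p-values
adj = bh_adjust(raw)
print(adj)
```

Each p-value is scaled by the number of comparisons divided by its rank, so the smallest p-values are penalised most and the largest is left unchanged.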

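The estimated expected responses, denoted hereafter by $\hat{y}$, for each game mode were back-transformed to obtain the corresponding $\widehat{PR}$, and the odds ratio calculated such that:

$$\hat{y} = \operatorname{logit}(\widehat{PR}),$$

$$\widehat{PR} = \frac{e^{\hat{y}}}{1 + e^{\hat{y}}} \quad \text{and} \quad \operatorname{odds}(\widehat{PR}) = \frac{\widehat{PR}}{1 - \widehat{PR}}.$$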

The following four examples demonstrate the practical interpretation of PR and odds(PR):

1. A PR of 0.8 = odds(PR) of 4 = the player won 4 times what they missed.

2. A PR of 0.6 = odds(PR) of 1.5 = the player won 1.5 times what they missed.

3. A PR of 0.5 = odds(PR) of 1 = the player won the same as they missed. That is, they missed 50% of the total possible reward.

4. A PR of 0.2 = odds(PR) of 0.25 = the player only won 25% of what they missed.
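The four examples above, and the back-transformation from the logit scale, can be checked with a short Python sketch. The helper names `inv_logit` and `odds` are assumptions made for illustration, and the value 0.168 is used only as an example input on the logit scale.

```python
import math


def inv_logit(y):
    """Map a value on the logit scale back to the (0, 1) scale."""
    return math.exp(y) / (1.0 + math.exp(y))


def odds(p):
    """Odds p / (1 - p) for p strictly between 0 and 1."""
    return p / (1.0 - p)


# The four interpretation examples listed above.
for pr in (0.8, 0.6, 0.5, 0.2):
    print(f"PR = {pr:.1f}  odds(PR) = {odds(pr):.2f}")

# Round trip: odds(inv_logit(y)) is the same quantity as exp(y).
y_hat = 0.168  # example value on the logit scale
pr_hat = inv_logit(y_hat)
print(f"y = {y_hat}  PR = {pr_hat:.3f}  odds = {odds(pr_hat):.3f}")
```

The round trip makes explicit that the odds obtained after back-transformation are simply the exponential of the value on the logit scale.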

Power analysis

A power analysis for the lme model for game mode was conducted using the simr package in R (R Core Team, 2017).

Gini coefficient by game mode

Testing for differences in the Gini coefficient was conducted in two stages (Conover and Conover, 1980). Firstly, a Kruskal-Wallis test for overall differences in distributions between game modes was conducted. Based on a significant result of this test, pairwise comparisons were conducted using a Mann-Whitney test (Dunn, 1964), which involves re-ranking the observations for each comparison, and then applying a Benjamini-Hochberg correction for multiple comparisons (Shaffer, 1995).

The Kruskal-Wallis test, also known as the one-way ANOVA on ranks, is a non-parametric method for testing if samples originate from the same distribution (Kruskal and Wallis, 1952). Posthoc pairwise analysis using the Kolmogorov-Smirnov test was considered; however, due to ties present in the data, the Wilcoxon-Mann-Whitney test was selected as it is more capable of handling ties (Siegel, 1956). A Benjamini-Hochberg adjustment was applied to these posthoc pairwise analyses in order to control for Type 1 errors. Both statistical tests were conducted using the stats package in R (R Core Team, 2017).
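To illustrate how ties enter the Kruskal-Wallis test, the following Python sketch computes the tie-corrected H statistic from mid-ranks. It is a hand-rolled illustration rather than the R routine used in the analysis; the function names and the example Gini values are hypothetical.

```python
from collections import Counter


def midranks(values):
    """Rank values from 1 to N, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2.0 + 1.0  # positions are 0-based
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def kruskal_wallis_h(groups):
    """Tie-corrected Kruskal-Wallis H statistic and its degrees of freedom."""
    pooled = [v for g in groups for v in g]
    n_total = len(pooled)
    ranks = midranks(pooled)
    h, offset = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[offset:offset + len(g)])
        h += rank_sum ** 2 / len(g)
        offset += len(g)
    h = 12.0 / (n_total * (n_total + 1)) * h - 3.0 * (n_total + 1)
    # Tie correction: divide by 1 - sum(t^3 - t) / (N^3 - N).
    ties = sum(t ** 3 - t for t in Counter(pooled).values())
    correction = 1.0 - ties / (n_total ** 3 - n_total)
    if correction == 0:
        raise ValueError("all observations are identical")
    return h / correction, len(groups) - 1


# Hypothetical Gini coefficients for three game modes, with many ties.
h_stat, df = kruskal_wallis_h([[0.5, 0.667, 1.0, 1.0],
                               [0.0, 0.5, 0.5, 0.667],
                               [0.667, 1.0, 1.0, 1.0]])
print(h_stat, df)
```

H is then compared with a chi-squared distribution on the returned degrees of freedom. With heavily tied data such as Gini coefficients computed from three allocations, the correction term can change H noticeably.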

Analysing the effect of risk profile on behaviour

For each ship the ‘risk of defeat’ ($p_i$) was drawn randomly from an arbitrary normal distribution N(µ = 0.5, σ = 0.24) and presented to the player as a probability between 0 and 1. For analysis, $p_i$ was categorised as either high (h) or low (l), and therefore the risk profile for any game is either lll, llh, lhh or hhh. Order was not considered important to the analysis, and this subsequently influenced the sample size across the groups.
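The effect of ignoring ship order on group sizes can be seen by enumerating the possible assignments. The Python sketch below is illustrative only: the function name `risk_profile` and the cut-point of 0.5 between low and high are assumptions, since the cut-point is not stated here.

```python
from collections import Counter
from itertools import product


def risk_profile(p_values, threshold=0.5):
    """Order-free profile such as 'llh' from per-ship risks of defeat.

    The threshold of 0.5 is an assumed cut-point between low and high.
    """
    labels = ["h" if p > threshold else "l" for p in p_values]
    # Sorting with 'l' before 'h' discards the ship order.
    return "".join(sorted(labels, key="lh".index))


# Every ordered high/low assignment for three ships, collapsed to profiles.
counts = Counter(
    "".join(sorted(combo, key="lh".index)) for combo in product("lh", repeat=3)
)
print(risk_profile([0.7, 0.2, 0.4]), dict(counts))
```

If high and low are equally likely for each ship, the mixed profiles (llh and lhh) each arise from three of the eight ordered assignments, and the uniform profiles (lll and hhh) from one each, so the mixed profiles are expected about three times as often.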

The same methods used for investigating PR and the Gini coefficient across game modes were used for evaluating differences in PR and the Gini coefficient across risk profiles and uncertainty profiles. A linear mixed effects model was used to investigate these differences because this method can handle differences in sample sizes, and because independence of observations could not be assumed (as discussed above within the game mode analysis section).

Performance ratio by risk profile

A logit transformation was applied to PR. An LME model was applied to logit(PR) and the estimated coefficients back transformed as defined in the game mode section above.

The model for investigating the influence of risk profile on logit(PR) is defined as:

$$\operatorname{logit}(PR) = \text{risk\_profile}_i\,(\text{fixed}) + \text{session}_j\,(\text{random}) + e.$$


Power analysis

A power analysis for the lme model for risk profile was conducted using the simr package in R (R Core Team, 2017).

Gini Coefficient by risk profile

To test for differences in the Gini coefficient between risk profiles, a Kruskal-Wallis test (Daniel, 1990) was used. Pairwise comparisons were conducted using the Wilcoxon-Mann-Whitney test (Conover and Iman, 1979) with a Benjamini-Hochberg adjustment (Shaffer, 1995), as defined in the game mode section above.

The Kruskal-Wallis test is appropriate for samples of different sizes, as is the case across risk profiles.

Analysing the effect of uncertainty profile on behaviour

The uncertainty level for each ship was randomly allocated as an interval with a length of either 0.1, 0.3 or 0.6. This was added to $p_i$ in order to calculate the upper and lower bounds, which were reported to the player depending on the game mode. For analysis these intervals were categorised as either high (h), medium (m) or low (l). Therefore, for each game the uncertainty profile was any combination of l, m or h. The same methods outlined above for game mode were used in evaluating differences in PR and the Gini coefficient between uncertainty profiles.
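With three uncertainty levels and three ships there are 27 ordered assignments but only 10 order-free profiles. The Python sketch below counts how many ordered assignments fall into each profile and converts these to expected numbers of games. The names `profile_weights` and `expected_games` are hypothetical, and the calculation assumes each level is equally likely for each ship, which is not stated explicitly in the text.

```python
from collections import Counter
from itertools import product

LEVELS = "lmh"  # low, medium, high uncertainty

# Collapse all 27 ordered assignments for three ships into order-free profiles.
profile_weights = Counter(
    "".join(sorted(combo, key=LEVELS.index)) for combo in product(LEVELS, repeat=3)
)


def expected_games(n_games):
    """Expected games per profile if each level is equally likely per ship."""
    total = sum(profile_weights.values())  # 27 ordered assignments
    return {p: n_games * w / total for p, w in profile_weights.items()}


expected = expected_games(679)  # 679 is the total number of games analysed
for profile, value in sorted(expected.items()):
    print(f"{profile}: {value:.1f}")
```

Under that assumption the fully mixed profile (lmh) is expected six times as often as any uniform profile, which is the kind of imbalance that arises when levels are assigned per ship rather than per game.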

Performance ratio by uncertainty profile

As outlined in the game mode analysis section above, a logit transformation was applied to PR, then an LME was applied to logit(PR) and the estimated coefficients back-transformed.

The model for investigating the influence of uncertainty profile on logit(PR) is defined as:

$$\operatorname{logit}(PR) = \text{uncert\_profile}_i\,(\text{fixed}) + \text{session}_j\,(\text{random}) + e.$$

Gini coefficient by uncertainty profile


A Kruskal-Wallis test was conducted, with posthoc pairwise comparisons using a Wilcoxon-Mann-Whitney test with a Benjamini-Hochberg adjustment, as defined for the analysis of the Gini coefficient by game mode above.

5.5 Results

In this analysis both the performance ratio and the Gini coefficient were used to evaluate how the player allocated their gold doubloons, and if this behaviour changed across the different game modes, risk profiles or uncertainty profiles.

The section begins with descriptive statistics of the sample sizes across game mode, risk profile and uncertainty profile, followed by descriptive statistics of logit(PR) and the Gini coefficient. The results are then broken into three sections which analyse the effect of game mode, risk profile and uncertainty profile on behaviour. Within each section I look at both the logit(PR) and the Gini coefficient.

Figure 5.4: Number of games per session (first panel: number of games played by session ID; second panel: number of players by number of games played)


Table 5.2: Number of games per game mode

game_mode      n
0            162
1            175
2            178
3            164

Table 5.3: Number of games per uncertainty profile

game_uncert    n
HHH           21
LHH           89
LLH           80
LLL           19
LLM           73
LMH          153
LMM           67
MHH           67
MMH           85
MMM           25

5.5.1 Descriptive statistics: Number of games, logit(PR) and Gini coefficient

Number of games

Data were collected for a total of 679 games. Some players played more than one game: the median number of games per player fell in the 2 to 4 range (n = 40 players), and 23 players played just one game. Figure 5.4 shows the number of games per session. Game mode was randomly allocated to each game, thus the number of games played in each mode was relatively even, see Table 5.2. As described above, the number of games with each uncertainty profile and risk profile was not uniform. Because the order of the individual ships was not important, the uniform profiles (lll, hhh or mmm) occurred less often than other combinations. Sample sizes for each risk and uncertainty profile are detailed in Table 5.4 and Table 5.3 respectively.

Table 5.4: Number of games per risk profile

game_risk      n
hhh           77
lhh          236
llh          261
lll          105

Table 5.5: Descriptive statistics for Gini coefficient and logit(PR)

Behaviour_measure   No.games   mean    sd      min     Q1      median   Q3      max
Gini coefficient    679        0.689   0.281   0.000   0.500   0.667    1.000   1.000
logit(PR)           679        2.810   0.659   0.975   2.347   2.884    3.366   3.664

Descriptive statistics - logit(PR)

Figure 5.5: logit(PR) (histogram of the proportion of games against logit(PR), and box plot of logit(PR))

Table 5.5 and Figure 5.5 show descriptive statistics, a histogram and a box plot for logit(PR). The logit(PR) had a mean of 2.810, a standard deviation of 0.659 and a maximum of 3.664. Over 14% of players allocated their gold doubloons in line with the maximised solution, which was to allocate all doubloons to the ship with the highest reward, regardless of risk (logit(PR) ≥ 3.66). Players that did not use this strategy are explored further through the Gini coefficient (see below).

Descriptive statistics - Gini coefficient

Figure 5.6: Gini coefficient (histogram of proportion against Gini coefficient, and box plot of the Gini coefficient)


Table 5.6: Descriptive statistics logit(PR) by game mode

method           No.games   mean    sd      min     Q1      median   Q3      max
total            679        2.810   0.659   0.975   2.347   2.884    3.366   3.664
interval         162        2.703   0.648   1.007   2.278   2.733    3.165   3.664
plus/minus       175        2.879   0.637   1.117   2.435   2.973    3.468   3.664
semantic         178        2.783   0.606   0.989   2.393   2.801    3.267   3.664
point estimate   164        2.871   0.735   0.975   2.380   3.072    3.554   3.664

Figure 5.7: logit(PR) by game mode (logit(PR) against game mode: Interval, Plus/minus, Semantic Interval, Point Estimate)

Overall, more than 30% of players allocated all doubloons to one ship (Gini coefficient = 0.66) and fewer than 5% of players spread their doubloons equally across all ships (Gini coefficient = 0.0), see Table 5.5 and Figure 5.6.

Table 5.7: Logit(PR) vs game mode: Summary of post hoc analysis on linear mixed effects model (BH adjust)

Game Mode                 $\hat{y}$   SE      z value   p-value   p≤0.05   $\widehat{PR}$   odds($\widehat{PR}$)
Plus/minus - Interval      0.168      0.069    2.440    0.044     *        0.542            1.183
Semantic - Interval        0.099      0.069    1.440    0.223              0.525            1.105
Point Est. - Interval      0.200      0.070    2.850    0.027     *        0.550            1.222
Semantic - Plus/minus     -0.069      0.067   -1.030    0.364              0.483            0.934
Point Est. - Plus/minus    0.032      0.069    0.459    0.646              0.508            1.033
Point Est. - Semantic      0.101      0.068    1.480    0.223              0.525            1.105


Figure 5.8: logit(PR) by game mode (histograms of proportion against logit(PR); panels A: Interval, B: Plus/Minus, C: Semantic Interval, D: Point Estimate)

5.5.2 Effect of game mode on behaviour

Performance ratio and game mode

The analysis showed a statistically significant difference in the logit(PR) between game modes (F = 3.215, p = 0.023). Posthoc analysis showed that logit(PR) was higher in the plus/minus game mode (z = 2.440, p = 0.04) and the point estimate game mode (z = 2.850, p = 0.03) than in the interval game mode, see Table 5.7 and Figures 5.7 & 5.8.

This equates to an 18% increase ($\hat{y}$ = 0.168, $\widehat{PR}$ = 0.542, odds($\widehat{PR}$) = 1.18) and a 22% increase ($\hat{y}$ = 0.200, $\widehat{PR}$ = 0.550, odds($\widehat{PR}$) = 1.22) in the odds(PR) when uncertainty was presented as a point estimate plus/minus error or as a point estimate (without error), respectively, compared to using the numeric upper and lower bounds of the uncertainty interval. The power for this analysis was 0.76.

There was no statistically significant difference in the logit(PR) between any other game modes. Visual inspection of residual plots did not reveal any violations of the assumption of normality, and Levene’s test for homogeneity of variance (Brown and Forsythe, 1974) did not reveal any violation of the assumption of homoscedasticity, see Appendix D.2.

Table 5.8: Descriptive statistics Gini coefficient by game mode

method           No.games   mean    sd      min   Q1      median   Q3     max
total            679        0.689   0.281   0     0.500   0.667    1.00   1
interval         162        0.713   0.284   0     0.500   0.667    1.00   1
plus/minus       175        0.691   0.285   0     0.500   0.667    1.00   1
semantic         178        0.637   0.284   0     0.333   0.667    0.95   1
point estimate   164        0.718   0.266   0     0.500   0.733    1.00   1

Gini coefficient and game mode

Figure 5.9: Gini coefficient vs game mode (box plots of the Gini coefficient for the Interval, Plus/minus, Semantic Interval and Point Estimate game modes)

A Kruskal-Wallis test demonstrated that the Gini coefficients for each game mode were not from the same populations (chi-squared = 10.11, df = 3, p-value = 0.018). Posthoc pairwise analyses showed that this difference applied specifically to the interval compared to the semantic game modes (p.adjust = 0.042), and the semantic compared to the point estimate game modes (p.adjust = 0.042). Descriptive statistics, histograms and boxplots of the Gini coefficient across game mode can be seen in Table 5.8, Figure 5.9 and Figure 5.10.

The wider variance for the semantic game mode in Figure 5.9 shows greater variation in how players spread their doubloons across the three ships, as well as a larger number of players distributing their doubloons across the ships. This is also seen in the histograms in Figure 5.10, where it is clear that for the semantic interval game mode, more players spread their doubloons across ships (lower Gini coefficient).

It is interesting to note that no statistically significant difference was detected in the distribution of the Gini coefficient between the point estimate and point estimate ± error game modes, suggesting that the addition of the error term does not promote more or less risk-averse behaviour.

Figure 5.10: Gini coefficient by game mode (histograms of proportion against Gini coefficient; panels A: Interval, B: Plus/Minus, C: Semantic Interval, D: Point Estimate)

Table 5.9: Descriptive statistics logit(PR) by risk profile

risk_profile   No.games   mean    sd      min     Q1      median   Q3      max
hhh            77         2.737   0.843   1.055   2.094   2.794    3.664   3.664
lhh            236        2.780   0.695   0.989   2.326   2.830    3.431   3.664
llh            261        2.778   0.609   0.975   2.307   2.817    3.267   3.664
lll            105        3.010   0.498   1.610   2.700   3.103    3.396   3.664

5.5.3 Effect of risk profile on behaviour

As described in Section 5.4.5 above, since the order of the ‘risk of defeat’ ($p_i$) across the three ships within a single game was not important, the sample sizes across risk profiles were not uniform. As can be expected, the llh (n = 261) and lhh (n = 236) profiles each had a sample size at least twice that of the uniform profiles hhh (n = 77) and lll (n = 105).


Figure 5.11: Logit(PR) by risk profile (box plots of logit(PR) for the hhh, lhh, llh and lll risk profiles)

Table 5.10: logit(PR) - Summary of post hoc pairwise analysis of risk profiles (linear mixed effects model with BH adjustment)

Risk Profile   $\hat{y}$   SE      z value   p-value   p≤0.05   $\widehat{PR}$   odds($\widehat{PR}$)
lhh - hhh       0.030      0.083    0.360    0.884              0.5074994        1.030455
llh - hhh       0.028      0.082    0.336    0.884              0.5069995        1.028396
lll - hhh       0.257      0.095    2.700    0.014     *        0.5638987        1.293045
llh - lhh      -0.002      0.057   -0.040    0.968              0.4995000        0.998002
lll - lhh       0.227      0.074    3.080    0.006     *        0.5565076        1.254830
lll - llh       0.229      0.073    3.150    0.006     *        0.5570011        1.257342

Figure 5.12: logit(PR) by risk profile (histograms of proportion against logit(PR); panels: hhh, lhh, llh, lll)


Logit(PR) by risk profile

Results of the linear mixed-effects model and posthoc analysis (methods in Section 5.4.5) showed a statistically significant difference in logit(PR) between risk profiles (F = 4.053, p = 0.0072). The power for this analysis was 0.83.

Posthoc analyses showed that logit(PR) was higher in the uniform low risk profile (lll) games than in all other risk profiles (lll − llh: z = 3.150, p = 0.006; lll − lhh: z = 3.080, p = 0.006; lll − hhh: z = 2.700, p = 0.014). Specifically, the odds(PR) for lll were approximately 26% higher than for the mixed low risk profile (lll − llh: odds ratio = 1.257, $\hat{y}$ = 0.229, $\widehat{PR}$ = 0.557), 25% higher than for the mixed high-risk profile (lll − lhh: odds ratio = 1.254, $\hat{y}$ = 0.227, $\widehat{PR}$ = 0.557), and 29% higher than for the uniform high-risk profile (lll − hhh: odds ratio = 1.293, $\hat{y}$ = 0.257, $\widehat{PR}$ = 0.564).
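Because the pairwise estimates are differences on the logit scale, the direction of the comparison matters when they are expressed as percentages. The short Python sketch below illustrates this with the lll − llh estimate quoted above; the function name `pct_change_in_odds` is a hypothetical helper.

```python
import math


def pct_change_in_odds(y_diff):
    """Percentage change in odds implied by a difference on the logit scale."""
    return (math.exp(y_diff) - 1.0) * 100.0


y_lll_minus_llh = 0.229  # estimate quoted in the text for lll - llh
up = pct_change_in_odds(y_lll_minus_llh)     # lll relative to llh
down = pct_change_in_odds(-y_lll_minus_llh)  # llh relative to lll
print(f"lll vs llh: {up:+.1f}%   llh vs lll: {down:+.1f}%")
```

An odds ratio of about 1.257 therefore corresponds to odds roughly 26% higher for lll than for llh, or equivalently odds roughly 20% lower for llh than for lll; the two percentages are not interchangeable.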

Visual inspection of residual plots did not reveal any violations of the assumption of normality. Levene’s test for homogeneity of variance (Brown and Forsythe, 1974) did show that this assumption was violated (p < 0.01), see Appendix D.3. The presence of heteroscedasticity should be considered when interpreting these results.

See Table 5.9, Figure 5.11 and Figure 5.12 for descriptive statistics and Table 5.10 for outputs of the posthoc analysis.

Inspection of the boxplots and histograms in Figures 5.11 and 5.12 shows that while the mean logit(PR) for uniform low risk profiles (lll) is higher, the uniform high profiles (hhh) resulted in a much larger proportion of players allocating their doubloons in line with the maximised solution (i.e., allocating all doubloons to the ship with the largest reward).

Gini coefficient by risk profile

A Kruskal-Wallis test showed that the distributions of the Gini coefficient were statistically significantly different between risk profiles (chi-squared = 20.75, df = 3, p-value = 0.0001). Pairwise comparison showed these differences were between the low risk profile (lll) and all other risk profiles (lll - hhh: p.adjust = 0.006, lll - lhh: p.adjust = 0.018, lll - llh: p.adjust = 0.006). Inspection of the histograms (Figure 5.14) and boxplots (Figure 5.13) shows that a lower proportion of players concentrated their resources on one ship (Gini coefficient = 1) when the risk profile was uniformly low. Figure 5.13 also shows much less variation in behaviour for the low risk profile.

Figure 5.13: Gini coefficient by risk profile (box plots of the Gini coefficient for the hhh, lhh, llh and lll risk profiles)

Figure 5.14: Gini coefficient by risk profile (histograms of proportion against Gini coefficient; panels: hhh, lhh, llh, lll)

These results suggest that when all options have a uniformly low risk, players engage in a more risk spreading behaviour, and that players are more likely to concentrate their resources when risk across multiple options is uniformly high.


Table 5.11: Gini coefficient - Posthoc pairwise comparison by risk profile (Mann-Whitney test with BH adjust)

comparison   W.statistics   p.value   p.adjust   p≤0.05
hhh - lhh    3007.5         0.443     1.000
hhh - llh    4139.0         0.316     1.000
hhh - lll    5204.0         0.001     0.006      *
lhh - llh    3633.5         0.951     1.000
lhh - lll    4834.5         0.003     0.018      *
llh - lll    6567.5         0.001     0.006      *

5.5.4 Effect of uncertainty profile on behaviour

The uncertainty level for each ship was either high (h), medium (m) or low (l), therefore for each game the uncertainty profile was any combination of l, m and/or h, for example lll, lmh, llh, mmh, etc. Since order was not important, the uniform profiles occurred much less often than the mixed profiles such as lmh.

logit(PR) and uncertainty profile

Figure 5.15: logit(PR) by uncertainty profile (logit(PR) against uncertainty profile: HHH, LHH, LLH, LLL, LLM, LMH, LMM, MHH, MMH, MMM)

There was no statistically significant difference in the logit(PR) between uncertainty profiles. See Section 5.4.5 for methods.


Figure 5.16: Gini coefficient by uncertainty profile (box plots of the Gini coefficient for the HHH, LHH, LLH, LLL, LLM, LMH, LMM, MHH, MMH and MMM uncertainty profiles)

Figure 5.17: Gini coefficient by uncertainty profile (histograms of proportion against Gini coefficient; one panel per uncertainty profile: HHH, LHH, LLH, LLL, LLM, LMH, LMM, MHH, MMH, MMM)

Gini coefficient

A Kruskal-Wallis test did not provide any evidence that the distributions of the Gini coefficient over the different uncertainty profiles came from different populations (chi-squared = 9.99, df = 9, p-value = 0.35).


From the boxplots in Figure 5.16 the three uniform profiles (lll, mmm and hhh) appear to deviate most from the other profiles. However, this could also be a product of the smaller sample sizes of these groups.

5.6 Discussion

The aim of this study was to explore if the behaviour of players of an online game changed depending on the uncertainty representation method used to communicate an estimated risk. As well as investigating the effect of the uncertainty representation method on players’ behaviour, the analysis explored if behaviour was influenced by the level of risk and level of uncertainty associated with the estimated risk. Two measures were used to explore the players’ behaviour: a performance ratio (PR) and the Gini coefficient.

Behaviour is a complex and multidimensional variable. These two measures each provide a one-dimensional slice, or simplified view, of behaviour and are not designed to be comprehensive. PR measured the players’ ability to maximise the expected value of the game, while the Gini coefficient quantified how the player spread their resources across the three ships within the game, thus providing a measure of risk-spreading or risk-averse behaviour.

I acknowledge that PR could be used as a measure of risk-aversion, in that maximising the potential outcome is in essence a behaviour that attempts to minimise risk. However, the PR measure does not provide information about how players reached the PR score, and cannot differentiate between games that have the same score but allocated resources across the game differently. Therefore, in this analysis I did not use PR to investigate risk behaviour; it is used only as a measure of the players’ ability to maximise the expected outcome.

A limitation of this study is the presence of repeated games by individual players, which was not initially planned for in the study design. The repeated games of some players could have affected behaviour; however, this was not investigated further. One of the challenges of investigating this is that while many participants played the game more than once, the number of games played was not uniform, and very few played sufficient games to provide multiple observations of all game modes for individual players. While this is a consideration in the interpretation of the results, I believe that there are sufficient games with fewer than four repeats, or no repeats, for the results not to be biased by within-player changes. An interesting future analysis would be to explore how behaviour changes over repeated games.

An additional limitation of the study is that sample sizes across both risk profiles and uncertainty profiles were non-uniform. This was due to poor experimental design, and could have been addressed by setting the overall profile for each game instead of randomly assigning these values to the individual ships. Despite these differences in sample sizes, power analyses demonstrated that the analyses were not underpowered. Interactions between game mode, risk profile and uncertainty profile were not investigated more extensively in this study, also because of insufficient sample sizes across groups. Future analyses could investigate these interactions further.

Game mode (uncertainty communication method)

In terms of players’ performance, this analysis showed that using a point estimate or a point estimate ± error to communicate an estimated risk improved the players’ ability to maximise the expected value of the game, compared to when the upper and lower bounds of an uncertainty interval are used. This suggests that the point estimate provides support for maximising the expected outcome in a way that only providing an uncertainty interval does not. Thus, including the point estimate may be important in circumstances where an event is repeated and the decision-maker is interested in maximising the expected return of an event and/or maximising the long run average over multiple events.

In terms of risk behaviour (the Gini coefficient), the semantic version of the interval promoted greater resource spreading and therefore more risk-averse behaviour. There was little difference in behaviour observed between the other game modes. The difference seen in the semantic version of the game is possibly due to the linguistic uncertainty or ambiguity inherent in using semantic terms such as average, high, low, above-average, etc.

The semantic definitions of risk (average, above average, below average) possibly allow more room for subjective interpretation of these terms, explaining the higher variation in the Gini coefficient for the semantic game mode. Interestingly, there was no evidence that the addition of ± error to a point estimate influenced players’ risk-averse behaviour.


These results suggest that a point estimate should be included with or without uncertainty if the aim of the decision is to maximise performance. Semantic versions of uncertainty should be used with care as they may promote more risk-averse behaviour.

Risk profile

Both the performance ratio and the Gini coefficient demonstrated that player behaviour is influenced by the level of risk when allocating resources across uncertain options. This analysis showed that uniformly high-risk options promoted more risk-taking behaviour, in which players more readily concentrate their resources on one option.

In terms of performance it is not surprising that games with a uniform risk profile perform better than those without. When all options have the same risk, one variable is eliminated from the decision process, thus making the choice of allocating resources easier as players are not weighing up multiple risk vs reward combinations. It is surprising, however, that behaviour was not the same between the uniformly high and low risk games, suggesting that players use a different strategy depending on the level of risk present.

There are a range of possible explanations for this observed difference in behaviour. Potentially, players perceive that their available resources have more impact when allocated to a high-risk option compared to an option where the risk is already low. Players may also have an internal level of risk they are willing to accept, and thus they allocate resources to one ship until the risk reaches a level they are comfortable with, after which they turn their attention to the other ships. However, the lack of homogeneity of variance between groups means that these results should be interpreted with caution and further analyses are needed to validate this observation.

Uncertainty profile

The analysis showed no evidence that the uncertainty level influenced players’ behaviour. However, small sample sizes in the uniform profile subgroups limited the power of these analyses.

Learnings for the Australian Cancer Atlas

This research has a range of insights relevant to the design of the Australian Cancer Atlas. In particular, semantic versions of uncertainty should be avoided, as they leave room for subjective interpretations of the uncertainty measures. In addition, there appears to be no added advantage of the inclusion of ± error when reporting a point estimate. Designers of the atlas should keep in mind that the audiences’ level of risk-averse behaviour may be influenced by the level of risk presented in the Australian Cancer Atlas. However, the behaviour seen in this research, where uniformly high risk promotes risk-taking behaviour, is unlikely to apply in a cancer risk context, where audiences are not directly allocating resources.


Chapter 6

Discussion

Buttenfield and Beard (1991) identify three impediments to uncertainty communication:

1. Standardisation of terminology

2. Methods for presenting uncertainty

3. Methods for communicating uncertainty in a way that is meaningful and useful, and that meet the users’ needs.

This thesis aimed to contribute to the broad-reaching problem of how to communicate statistical uncertainty to non-expert audiences, and it did this by contributing to items 2 and 3 in the list above. This problem is an important challenge for the statistical sciences, but also a pressing issue for science more broadly. In this research I conducted two research activities. Firstly, I used the Australian Cancer Atlas as a case study for uncertainty communication design, and secondly, I investigated the relationship between uncertainty representation methods and audience behaviour, exploring commonly utilised uncertainty representation methods. Within activity 1 I: conducted a grey literature review of publicly available cancer maps, explored the use of Morgan and Henrion (1990)’s taxonomy of uncertainty as a practical tool for diagnosing uncertainty sources, applied the NEUVis design framework to uncertainty communication, and conducted focus groups with end-users of the Australian Cancer Atlas in order to understand their needs, current behaviour and understanding of uncertainty. In the second research activity, I designed, built and implemented an online game, and analysed its results. The game investigated how four uncertainty representation methods (numeric intervals, semantic intervals, point estimates with error and point estimates without error) influence player behaviour. Players of the game had to allocate resources across a range of options with different levels of risk, reward, and uncertainty.

6.1 Uncertainty communication design

Grey literature review

42% of cancer maps identified in this review reported some measure of uncertainty in the form of: standard deviation, credible/confidence intervals, error bars, distributions and boxplots. These measures and visualisations are common forms of representing uncertainty between scientific peers, but as the focus groups in this research demonstrated, they are not always understood by the non-expert. The lack of a consistent approach to uncertainty representation highlights the lack of standardisation for including uncertainty in cancer maps.

Uncertainty representation was more prevalent in maps that contained interactivity. This is not surprising considering that Aerts, Clarke, and Keuper (2003) and Gerharz and Pebesma (2009) have both demonstrated that interactivity can support non-experts to understand uncertainty information. This may be an intuitive response from the map designers. Interestingly, none of the maps identified in this review used cartograms, which are emerging as useful tools in maps that contain population information as they allow the map’s geography to be distorted to visually represent the underlying population distribution (Nusrat and Kobourov, 2016). Cartograms could make at least one source of uncertainty within disease maps, i.e., sample size, more visible. This review highlighted the lack of consistency in including uncertainty in cancer mapping. Examples and guidelines for map developers may promote the inclusion of uncertainty in future disease mapping.

Application of the NEUVis design framework

Communicating statistical uncertainty to the non-expert audience is not a simple task, and there are limited case studies in the literature to support communication creators/designers in the development of material that includes uncertainty information. The process of diagnosing uncertainty sources, understanding the impact of uncertainty on the interpretation of analysis outputs, and identifying the needs of the target audience(s) can be difficult to navigate. In many cases, the communication designer may not understand the importance and impact of the uncertainty information, while the scientist may lack the skills to identify and respond to the users’ needs. Considering the users’ needs in the design process is critical, as users interact differently depending on their motivations. Joslyn and Savelli (2010) have demonstrated that users of scientific information that includes uncertainty respond differently to the same information when they are given a goal-based decision rather than just asked to evaluate the information.

Using the ACA, this research provided a case study which explored the use of two tools to navigate the uncertainty communication design challenge. Firstly, Morgan and Henrion (1990)’s taxonomy of uncertainty was used to diagnose uncertainty sources within the ACA. Secondly, the NEUVis design framework was used to identify target audiences, and profile the users’ needs as well as the impact uncertainty information may have on their understanding of the presented insights (Gough et al., 2016).

The design framework was extended to consider uncertainty information and was a successful tool for the project stakeholders (communication designers, statistical modellers, Cancer Council QLD staff) to come to a consensus about which audiences to target, systematically consider and identify the users’ needs, and evaluate how uncertainty information may influence these audiences differently. The framework provided a platform for stakeholders to navigate the communication design process in a context where no one project partner had sufficient skills to address all angles of the communication challenge. This is a consideration that Schneider and Moss (1999)’s confidence in the information score or Lapinski (2009)’s uncertainty visualisation design framework explicitly support. Because the NEUVis design framework focuses on the users’ needs and the impact uncertainty may have on them, all stakeholders can engage with the framework. It does not take technical statistical expertise to empathise with how uncertainty will impact another non-expert’s interpretation of insights.


The framework is accessible for both the design and analytics novices and experts. For the design-novice it provides an intuitive structure for breaking down the communication challenge by audience, user needs, data, insights and uncertainty information. For the design-expert it provides a familiar design framework but specifically considers the scientific, data, data analysis and uncertainty information that they may not be familiar with, but which are important to the communication challenge.

Insights from consulting the user

Consulting the user through focus groups enabled unexpected user-needs and behaviours specific to the context of the ACA to be identified.

In general, health care practitioners did not perceive uncertainty information, in a cancer mapping context, to be valuable in their clinical responsibilities. This perspective from health care workers was backed up by the general audiences, who generally ignored the uncertainty information in the maps. However, although this information was ignored initially, once it was explained the general audience found it valuable and useful. This supports the study by Joslyn and Savelli (2010), who suggested that it is not the uncertainty itself that is difficult for the non-expert user to understand; rather, it is the representation method that makes it inaccessible to them. Studies in the social sciences indicate that people anticipate uncertainty (Morss et al., 2008; Lazo et al., 2009), suggesting that audiences are prepared to understand uncertainty if they can access the information.

Error bars on existing cancer maps were often misinterpreted. Participants interpreted their position within the error bar range to be determined by lifestyle choices. That is, the lower risk end of the error range contained individuals that make positive lifestyle decisions in regards to health, diet, exercise etc, while the higher risk end contained people that did not take care of their health.

It was interesting that the focus group participants personalised the uncertainty interval: they assumed that the interval represented a collection of individuals and that an individual’s position within the interval (their personal risk) was influenced by their lifestyle choices or situation, rather than by the modellers’ ability to predict the risk. As far as I am aware, this observation has not been explicitly identified in other uncertainty research. This interpretation by the non-expert may be more pertinent in health research, where the user often attempts to personalise the information available. It would be interesting to explore this question further in a non-health context to see if the uncertainty interval is interpreted in the same way.

This personalisation was also seen when focus group participants expressed a preference for risk to be presented in relative frequencies, for example ‘if there were 10 people just like me, 1 would develop bowel cancer’, rather than ‘a 10% chance of developing bowel cancer’. They felt this was more comparable with other risks in life. Much work from David Spiegelhalter has also shown that non-experts find this format much easier to relate to (Spiegelhalter, Pearson, and Short, 2011).
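
As a purely illustrative aside, the conversion from a percentage to the relative frequency format preferred by participants can be sketched in a few lines of Python. The function names and the sentence template are hypothetical and were not part of this research; the sketch assumes the risk is supplied as a probability between 0 and 1.

```python
from fractions import Fraction


def natural_frequency(probability, max_group=1000):
    """Return (cases, group_size) approximating a probability as 'cases in group_size'.

    max_group is an assumed cap on the reference group size, chosen so the
    statement stays easy to picture.
    """
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    # limit_denominator finds the closest fraction with a small denominator.
    frac = Fraction(probability).limit_denominator(max_group)
    return frac.numerator, frac.denominator


def describe_risk(probability, outcome):
    """Phrase a risk in the 'people just like you' format (hypothetical template)."""
    cases, group = natural_frequency(probability)
    return f"Of {group} people just like you, about {cases} would {outcome}."


if __name__ == "__main__":
    print(describe_risk(0.10, "develop bowel cancer"))
```

For a 10% risk this yields a ‘1 in 10’ style statement rather than a percentage. Whether the reference group should be simplified in this way (1 in 10) or held fixed (10 in 100) is a design choice that would itself need testing with users.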

Diagnosing sources of uncertainty

One of the necessary steps for incorporating uncertainty into any communication output is identifying the sources of uncertainty within a research project. Within this study I attempted to achieve this systematically, using Morgan and Henrion (1990)’s taxonomy of uncertainty sources as a diagnostic tool. All stakeholders, including the statistical modellers, struggled to identify the uncertainty sources within the ACA using this taxonomy. Three issues limited its use as a tool: the taxonomy categories were not readily understood by all stakeholders, not all stakeholders had technical knowledge of the project’s methods, and stakeholders did not perceive identifying these uncertainty sources as useful to the overall communication challenge.

The first task was a challenge for all stakeholders, even the experienced analysts who, despite their experience and training, still debated the differences between several of Morgan and Henrion (1990)’s uncertainty sources. Allocating more time to this task would have been beneficial, allowing participants to digest the taxonomy and then apply it to the ACA. More guidance and clear examples of each source may have also supported participants to digest the taxonomy and more readily apply it. For the visualisation specialists within the group, while the taxonomy was thorough, it did not help them understand the significance of these uncertainty sources, or the impact they have on the interpretation of outputs of the ACA. They could not engage with the task without a technical understanding of the different sources. Other attempts at turning traditional taxonomies into more useful tools (Walker et al., 2003; Knol et al., 2009) have also struggled to support the participants in understanding the taxonomy. One investigation showed that the taxonomy defined by Walker et al. (2003), which includes nature, location and level for diagnosing uncertainties, was limited by the differing abilities and experience of those applying the taxonomy (Gillund et al., 2008; Knol et al., 2009; Skinner, Rocks, and Pollard, 2016). Skinner et al. (2013) suggest structured guidance for understanding these taxonomies is essential and showed this was successful for an extension of Walker et al. (2003)’s taxonomy within environmental risk assessments.

Further to this, the non-technical stakeholders could not apply the uncertainty sources to the ACA, as this required technical understanding of the methods and data. This left the non-technical stakeholders with no access point to contribute to mapping the uncertainty sources to the ACA, and did not support their understanding of why these uncertainties were important to the project outputs. Morgan and Henrion (1990)’s taxonomy, while thorough and structured, was too technical to be user friendly for the non-technical team members.

Project stakeholders also found it difficult to see the usefulness of characterising all sources listed in the taxonomy, or how the return warranted the mental energy required to understand the different sources. An example is systematic bias. Firstly, the group spent time debating the definition; secondly, bias in any project is difficult to identify and formalise; and thirdly, participants questioned the usefulness of formalising sources of bias, as they cannot be measured or reduced, and would most probably not be relevant to the audience of the ACA. Diagnosing this source of uncertainty did not appear to contribute to the overall communication design challenge. A more effective way to identify the uncertainty sources and engage the full team would be to connect the key insights of the project more explicitly with the uncertainty sources, working backwards from the key insights rather than forwards from the taxonomy. In this way, work would be focused on identifying uncertainty sources that influence the key research findings, rather than all uncertainty sources. A hybrid between Morgan and Henrion (1990)’s taxonomy and the guidelines for assessing uncertainty developed by Schneider and Moss (1999) could be a starting point for this approach. Schneider and Moss (1999)’s guidelines start with identifying uncertainties that will most influence the final insights, but they do not provide any framework for defining the uncertainty sources in the way that Morgan and Henrion (1990) do.

Identifying which uncertainties are present in a project is an important component of uncertainty communication design. Morgan and Henrion (1990)’s taxonomy, in its current form, was not effective as a tool for diagnosing uncertainty sources. The taxonomy was difficult to understand and further guidance was required. A successful tool in the future should connect the key uncertainty sources to the key messages of the project, and also facilitate communication and technical stakeholders to discuss the uncertainty sources and their impact on the analysis outputs.

6.2 Testing uncertainty representation methods

Semantic version of uncertainty promoted more variation in behaviour and more risk averse behaviour

The semantic version of the upper and lower bounds of an uncertainty interval promoted more risk-averse behaviour (resource spreading) on average, as well as more variation in how players allocated the resource. Ambiguity in the interpretation of the uncertainty terms ‘average’, ‘above average’ or ‘below average’ may have led to players having less confidence in which ship to select. Semantic intervals compared to text intervals have been previously shown to be more open to misinterpretation (Savelli and Joslyn, 2013). The variation seen in the results is possibly due to more room for personal interpretation of the semantic uncertainty terminology between users. Studies by Joslyn and Savelli (2010) and Morgan and Henrion (1990) support this potential for misinterpretation and have shown that users interpret semantic terminology differently. These results further justify the need for standardisation of terminology as outlined by Buttenfield and Beard (1991) above, and support Schneider and Moss (1999)’s call for consistent mapping of semantic terms such as ‘average’ and ‘above average’ to numeric values within the IPCC publications.


Finally, reduced feedback could also be a contributing factor to the variation seen in the semantic version. In the game, when uncertainty was presented numerically, 1 gold doubloon spent on a particular ship automatically changed the risk displayed to the player, even if it was only 1%. However, with the semantic version, sufficient gold had to be allocated to move to the next semantic category, so 1 gold doubloon may not result in feedback to the player, while 10 gold doubloons might. Further investigation of the role feedback has on behaviour in these contexts would be valuable. At the time of writing, there was no known literature on the impact of feedback on risk behaviour or in uncertainty communication research.
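
The difference in feedback granularity can be illustrated with a short sketch. The category boundaries and the one-percentage-point reduction per doubloon below are assumptions made for illustration only; they are not the thresholds or mechanics used in the game.

```python
# Hypothetical upper bounds (in percent) for each semantic category.
SEMANTIC_BANDS = [
    (33, "below average"),
    (66, "average"),
    (100, "above average"),
]


def semantic_label(risk_percent):
    """Map a numeric risk (0-100) to the semantic category shown to the player."""
    for upper_bound, label in SEMANTIC_BANDS:
        if risk_percent <= upper_bound:
            return label
    raise ValueError("risk_percent must be between 0 and 100")


def risk_after_spend(initial_risk_percent, doubloons, reduction_per_doubloon=1):
    """Assumed mechanic: each doubloon lowers the risk by a fixed number of points."""
    return max(0, initial_risk_percent - doubloons * reduction_per_doubloon)


if __name__ == "__main__":
    start = 40
    for spent in (0, 1, 10):
        risk = risk_after_spend(start, spent)
        # The numeric display changes with every doubloon; the label only
        # changes once a category boundary is crossed.
        print(spent, f"{risk}%", semantic_label(risk))
```

Under these assumed boundaries, spending 1 doubloon changes the numeric display but leaves the semantic label unchanged, while spending 10 moves the option into a new category. This is the reduced feedback described above.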

Point estimates ± error

The players’ ability to maximise how much gold they won (and minimise how much they missed) was higher for the point estimate and the point estimate ± error game modes compared to the numeric uncertainty interval. Interestingly, the addition of an error term to the point estimate did not appear to have an influence. The lower performance of the confidence intervals aligns with results of Sanyal et al. (2010), which demonstrated that error bars underperformed in an information seeking task, although that research explored visual rather than numeric intervals. A reduction in performance is not always a negative outcome, depending on the stakes of the risk scenario. Studies have shown that interval forecasts allow participants to more effectively distinguish situations in which precautionary action is warranted, despite their decisions having a lower performance overall (Joslyn and LeClerc, 2011). Intervals may therefore be more appropriate when caution is warranted.

Influence of risk level

This study demonstrated that how a player allocates resources across options is influenced by the risk level of the available options. In terms of performance it is not surprising that games with a uniform risk profile perform better than those with a non-uniform risk profile. When all options have the same risk one variable is eliminated from the decision process, thus making the choice of allocating resources easier as players are not weighing up multiple risk vs reward combinations, and simply target the highest potential reward.

It is interesting, however, that behaviour was not the same between the uniformly high and uniformly low risk games, suggesting that players use a different strategy depending on the level of risk present.

There are a range of possible explanations for this observed difference in behaviour between uniform high and low risk profiles. Players may have an internal level of risk they are willing to accept, and thus they allocate resources to one ship until the risk reaches a level they are comfortable with, after which they turn their attention to the other ships. However, the violation of the assumption of homoscedasticity (homogeneity of variance) means these results should be interpreted with caution, and further analyses are required to validate these observations.

6.3 Critique & limitations

6.3.1 Literature Review

The uncertainty literature is broad and diverse. The best attempt was made to consolidate the most important aspects of this literature. Other areas of investigation may have been valuable to include in this review, but were considered to be outside the scope of this thesis. An example is uncertainty quantification. Understanding the uncertainty quantification landscape may have had valuable insights for the process of designing uncertainty communication, as well as selection of uncertainty representation methods used in the online game.

6.3.2 Research Activity 1.A: Grey literature review of internet published cancer maps

The grey literature review of internet published cancer maps aimed to provide an overview of the current practices in cancer mapping. Considering how quickly technology moves, this review will quickly date, particularly as programming languages and libraries that support bespoke visualisations, such as R and d3.js, become both more user friendly and more widely used.

6.3.3 Research Activity 1.B: User centred uncertainty communication design

Diagnosing uncertainty

Diagnosing uncertainty sources is a complex task. A limitation of the evaluation of Morgan and Henrion (1990)’s taxonomy was the time available in the workshop, and also that this task was set at the end of a full-day workshop. A second application of the taxonomy would have provided more data on the taxonomy’s strengths and weaknesses.

Focus groups

Focus group participants were difficult to attract, and only one participant from the policy-maker audience participated. I would have liked to have had more participation from the policy-maker or policy-advisor audience, as they are an important audience for both the ACA and uncertainty communication more broadly. The policy-maker is far more likely to need to make decisions informed by scientific information than a general audience.

Further to this, I believe more focused research on the non-expert decision-maker, rather than just the non-expert, would lead to more nuanced research outcomes. This is not to say that the general audience should not be a target for uncertainty communication research, but their needs and motives differ from those of the decision-maker.

An additional limitation of the focus groups was both their sample size and the recruitment strategy. Holding only two focus groups for each target audience could limit the generalisability of the insights. It is often difficult to dictate the conversation in a focus group. Some level of freedom is required for participants to feel that their opinion is valued and to allow their perspectives to emerge. Naturally this results in different focus groups often taking different paths and discussing different content. Repeating the focus groups for each audience more than twice could have been beneficial. In terms of recruitment, participants for these focus groups were predominantly recruited through the Cancer Council Queensland contact database. By nature, this database contains people who are likely to have a lived experience of cancer, either personally or through friends and family, and so represents a biased sample. This audience may therefore have been more emotionally motivated by the topic than a general audience that has had no experience of cancer.

6.3.4 Research Activity 2: User study - uncertainty representation in an online game

Designing, building, implementing and analysing data from the online user study was a very valuable learning experience. Uneven sample sizes across the sub-groups of interest were a limiting factor in this study, and something that could have been addressed with more careful experimental design. Specifically, utilising set combinations of risk profiles and uncertainty profiles across the four different game modes would have enabled a more balanced investigation of the effects of risk profile and uncertainty profile. Defining set combinations of variables could also have enabled analysis of interaction effects between uncertainty, risk and reward. That was not possible as some sub-groups did not have sufficient sample size.

An additional limitation of this study was the repeated games played by some players. For the players that had repeat games, their strategy and approach to the game may have changed as they played the game more times. However, there were insufficient numbers of repeated games across all game modes to investigate changes in behaviour over multiple games.

6.4 Future Work

Uncertainty communication is a complex challenge that will require contributions from both qualitative and quantitative domains to address. Future challenges that need to be addressed within this include:

• Specific focus on the non-expert decision maker.

• Developing and validating a practical tool, accessible to a range of stakeholders, for diagnosing uncertainty sources in research projects.

• Further case studies to solidify the use of an extended version of the NEUVis design method in contexts outside of cancer mapping.

• Investigating whether the differences seen between point estimates (and point estimates ± error) and intervals extend to graphic intervals.


6.5 Conclusion

In this research I have contributed to a growing body of literature on uncertainty communication, making a unique contribution by comparing uncertainty representation methods that are commonly used but so far under-investigated. I have demonstrated that Morgan and Henrion’s (1990) taxonomy of uncertainty is insufficient as a tool for diagnosing uncertainty in cross-disciplinary teams, and provided a case study for the use of the NEUVis design framework as a valuable tool in the design of uncertainty communications for the non-expert audience, including the value of focus groups to consult end-users within this design process. The outputs of both studies culminated in an outline of design considerations and user insights that will be used by the communication designers of the Australian Cancer Atlas.

Appendix A

Appendix: Literature Review

A.1 Uncertainty Representation in Mapping & GISciences

Motivated by the use of the Australian Cancer Atlas as a case study in this research, the scope of this section has been limited to uncertainty representation methods for GIScience and mapping. Building on a strong tradition of quantitatively standardizing visual variables for communication, the GISciences and mapping have made significant contributions to research in this area (MacEachren, 1992; Goodchild et al., 1994; Leitner and Buttenfield, 2002; Brodlie et al., 2012; Leitner, 2013; Aerts et al., 2003a; Pham and Brown, 2003; Li and Zhang, 2006; Bostrom et al., 2007; Viard et al., 2011; Dong and Hayes, 2012), and these insights are applicable to any visual display (McGranaghan, 1993).

Initial approaches to new uncertainty representation methods began with Bertin’s (1981) visual variables. Bertin (ref) was a renowned cartographer and one of the first to standardise visual variables in mapping and visualisation. He suggested location, size, value, texture, color, orientation, and shape as visual variables for encoding information.

Much research has been done to explore the effectiveness of these variables for uncertainty representation (Mathews et al., 2008; Brodlie et al., 2012; Potter et al., 2012; Zuk & Carpendale, 2006; Pang et al., 1997; Johnson & Sanderson, 2003; Johnson, 2004; Evans, 1997; Wittenbrink et al., 1996; Aerts et al., 2003; Sanyal et al., 2009; Leitner & Buttenfield, 2000).

MacEachren (1992) and Slocum et al. (2004) added to Bertin’s list and suggested edge crispness (fuzziness), fill clarity, fog, resolution, transparency and saturation as specific visual variables for representing uncertainty, and Gershon (1992) suggested boundary (thickness, texture, and color), transparency, and animation. Newman and Lee (2004) evaluated techniques for the visualization of uncertainty in volumetric data, comparing glyph-based techniques, such as cylinders and cones, with colormapping and transparency adjustments. They found that while each method was useful for identifying uncertainty in the scenario tested, the glyph techniques were most beneficial. However, this depended on the question being asked (Sanyal et al., 2010).

Despite this growing body of research, there are still contradictory results, and more work is needed to validate these methods further. Color value and texture have been repeatedly suggested for the display of data quality information. For example, darker value or finer texture should be applied to display data of higher quality in mapping, whereas lighter value or coarser texture should be used to visualize lower data quality (Buttenfield, 1991; MacEachren, 1992; van der Wel et al., 1994). The same relationship was found by McGranaghan (1986) when using symbols rather than just hue. However, this relationship can be dependent on the medium used. One result testing hue suggests that on a cathode-ray tube (CRT), compared to paper, people seem to associate lighter value, not darker value, with more certain information (Leitner & Buttenfield, 2013). MacEachren (1992) explored the use of highly saturated hues for data of higher quality and unsaturated (i.e., gray) hues for lower data quality. However, Robinson (1952) does not regard saturation as a very useful dimension of color, and McGranaghan (1986) did not find saturation, by itself, effective at conveying differences in magnitude.

Appendix B

Appendix: Research Activity 1.A - Grey Literature Review

B.1 Search Protocol

Research Question: What cancer maps are currently available to the public on the internet, and what methodologies and technologies have been used to generate them?

Aim: To summarise the breadth of cancer atlases published publicly on the internet in terms of: statistical methods used, outcome measures, inclusion of uncertainty, map interactivity features, available functions, access to data, availability of explanations or supporting material explaining methods and data sources, technology platform used to create the web product, country, the area of resolution, smoothing methods used, date of the data used, date of publication, generating organisation (government, research institution, university), and academic publications associated with the map.

Pre-Scoping

Cancer Atlas Synonyms:

Cancer map, oncology map, geospatial health statistics, geospatial cancer statistics, Health atlas, disease Atlas, health map, spatial statistics, spatial cancer statistics, geographic clustering, geographic cancer variation, geographic variation, Geographic patterns of disease, spatial patterns, geographic disease distribution, atlas of disease distribution, disease distribution, bayesian cancer map, spatial epidemiology, geospatial health data, geovisuali$ation, health geographics, Geographic maldistribution, disease distribution, thematic cancer map.

Search Details

Search Strings

The following list details the final search strings used:

1. intitle: spatial AND epidemiology AND cancer AND map OR mapping OR atlas -campus

2. allintitle: cancer AND map OR mapping OR atlas -campus -kinase -kinases -concept

3. allintitle: spatial AND cancer AND statistics

4. allintitle: spatial OR geographic AND cancer AND variation OR distribution

5. allintitle: spatial AND epidemiology AND cancer AND map OR mapping OR atlas -campus

6. intitle: cancer AND atlas

Within these search strings, we used the context-specific terms of “allintitle” (which requires all the search terms to be in the title) and “intitle” (which requires only the first search term to be in the title and the rest anywhere in the document). Hits containing “campus”, “kinase”, “kinases” or “concept” in their title were excluded. “Kinase” and “kinases” refer to protein enzymes that are often the focus of research into the biology of cancer, but are not relevant to geospatial mapping of cancer incidence or survival. “Campus” and “concept” were excluded for their obvious connection with “campus map” and “concept map”, neither of which is relevant to cancer mapping.

A search testing log that outlines the testing and refinement of these search strings is detailed in Table B.1 below. Search qualifiers have the following action:

• allintitle - restricts the results to those with all of the query words in the title. For instance, [allintitle: google search] will return only documents that have both “google” and “search” in the title. Without this limitation all the search strings listed above return in excess of 100,000,000 hits, many of which were irrelevant.

• intitle - restricts the results to documents containing that word in the title. For instance, [intitle:google search] will return documents that mention the word “google” in their title, and mention the word “search” anywhere in the document (title or elsewhere).

• date - all searches were limited to pages published between 01/01/2010 and 01/05/2016.
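The way a qualifier, the Boolean terms and the excluded words combine into one search string can be sketched in Python. This is only an illustration: the helper `build_query` and its parameter names are hypothetical, the searches in this review were entered in Google directly, and the date restriction is not represented here.

```python
def build_query(qualifier, terms, excluded=()):
    """Assemble a search string: qualifier, Boolean terms, then -exclusions.

    `qualifier` is "intitle" or "allintitle"; `terms` is the Boolean
    expression as typed; `excluded` lists words to exclude from titles.
    All names here are illustrative, not part of any search engine API.
    """
    if qualifier not in ("intitle", "allintitle"):
        raise ValueError(f"unsupported qualifier: {qualifier}")
    parts = [f"{qualifier}: {terms}"]
    parts.extend(f"-{word}" for word in excluded)
    return " ".join(parts)


# Reproduces search string 2 from the list of final search strings.
query = build_query(
    "allintitle",
    "cancer AND map OR mapping OR atlas",
    excluded=("campus", "kinase", "kinases", "concept"),
)
print(query)
```

Keeping the exclusions as a separate argument mirrors the testing log, where exclusion words were added one at a time and the hit count recorded after each change.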

Search Engine

Google was used for all searches. No other search engines were explored.

Language

Only English was used in these searches. Searching in additional languages was outside the resources of this project. Atlases that were identified in the searches but are not published in English were still extracted.

Eligibility Criteria

Hits were selected for data extraction if they met the following criteria:

• contained a visual geographical map of cancer incidence, mortality, survival or risk (either pdf, static image or interactive web interface).

• were accessible without a password or log in.

• were published or updated on or after the 1st of January 2010.
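The three criteria above can be read as a simple screening rule. The sketch below is a hypothetical illustration in Python: the `Hit` record, its field names and the `accept_undated` flag are assumptions introduced here, and how pages with no identifiable date were treated is not stated in the criteria, so that choice is left to the caller.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

CUTOFF = date(2010, 1, 1)  # "published or updated on or after 1 January 2010"


@dataclass
class Hit:
    """One search hit as recorded during screening (illustrative fields)."""
    title: str
    has_cancer_map: bool          # shows a geographical map of a cancer measure
    requires_login: bool          # needs a password or log in to view
    last_updated: Optional[date]  # None when no date could be found


def is_eligible(hit: Hit, accept_undated: bool = False) -> bool:
    """Apply the three eligibility criteria to a single hit."""
    if not hit.has_cancer_map or hit.requires_login:
        return False
    if hit.last_updated is None:
        # Assumption: the handling of undated pages is a caller decision.
        return accept_undated
    return hit.last_updated >= CUTOFF


# Invented examples, not hits from the review.
recent_atlas = Hit("Example cancer atlas", True, False, date(2014, 6, 1))
old_report = Hit("Example older atlas", True, False, date(2005, 3, 1))
print(is_eligible(recent_atlas), is_eligible(old_report))
```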

Table B.1: Search string development testing log

| Search | String | Hits | Date | Updated | Hits (updated) |
|---|---|---|---|---|---|
| 1a | Cancer AND map* OR Atlas | 260 × 10^6 | 22/10/15 | 12/5/16 | 126 |
| 1b | intitle:cancer AND map* OR atlas | 109 × 10^4 | 23/10/15 | 12/5/16 | 120 × 10^3 |
| 1c | allintitle:cancer AND map* OR atlas | 2,620 | 23/10/15 | | |
| 1c | allintitle:cancer AND map* OR atlas -campus | 2,620 | 23/10/15 | | |
| 1c | allintitle:cancer AND map* OR atlas -campus -kinase | 2,490 | 23/10/15 | | |
| 1c | allintitle:cancer AND map* OR atlas -campus -kinase | 189 | 23/10/15 | | |
| 1c | restricted to publications after 1/1/2010 | | | | |
| 1c | allintitle:cancer AND map* OR atlas -campus -kinase -kinases | 182 | 23/10/15 | 12/5/16 | 31 |
| 1c | restricted to publications after 1/1/2010 | | | | |
| 1d | allintitle:cancer AND map OR mapping OR atlas -campus -kinase -kinases | 7,160 | 23/10/15 | 12/5/16 | 122 |
| 1d | restricted to publications after 1/1/2010 | 623 | 23/10/15 | | |
| 1d | restricted in the past 2 yrs | 327 | 23/10/15 | | |
| 1d | restricted to publications in the past 12 months | 149 | 23/10/15 | | |
| 1d | restricted to publications in the past month | 23 | 23/10/15 | | |
| 1d | restricted to publications in the past week | 4 | 23/10/15 | | |
| 1e | allintitle:cancer AND map OR mapping OR atlas -campus -kinase -kinases -concept | 6,960 | 29/10/15 | | 122 |
| 1e | restricted to publications after 1/1/2010 | 625 | 29/10/15 | | |
| 1e | restricted to publications after 29/10/2013 | 333 | 29/10/15 | | |
| 1e | restricted to publications after 29/10/2014 (past yr) | 155 | | | |
| 1e | published in the last month (29/09/2015) | 20 | 29/10/15 | | |
| 1f | allintitle:cancer AND map OR mapping OR atlas -campus -kinase -kinases -concept | 7,110 | 23/10/15 | | |
| 2 | allintitle:Oncology AND map OR mapping OR atlas -campus -kinase -kinases -concept | 8,250 | 23/10/15 | | |
| 3 | allintitle:Spatial cancer statistics | 75 | 23/10/15 | | |
| 4 | allintitle:spatial OR geographic AND cancer AND variation OR distribution | 1,030 | 30/10/15 | | |
| 4a | restricted to pages published after 01/01/2010 | 90 | 29/10/15 | | |
| 5 | allintitle:Bayesian AND cancer AND Map OR atlas OR mapping | 0 | 23/10/15 | | |
| 6 | allintitle: thematic AND cancer AND Map OR atlas | 0 | 23/10/15 | | |
| 7 | allintitle:Spatial AND epidemiology AND cancer AND map OR mapping | 0 | 23/10/15 | | |
| 7a | intitle:Spatial AND epidemiology AND cancer AND map OR mapping OR atlas -campus | 12,200 | 30/10/15 | | |
| 7b | restricted to pages published after 1/1/2010 | 1,880 | 23/10/15 | | |
| 8 | intitle:cancer AND atlas | 258,000 | 9/11/15 | | |
| 8a | restricted to publications between 01/01/2010 to 09/11/2015 | 39,800 | 9/11/15 | | |
| 8b | -genome | 24,200 | 9/11/15 | | |
| 9 | intitle:atlas AND cancer | 207,000 | 11/11/15 | | |
| 9a | restricted to publications between 01/01/2010 to 09/11/2015 | 19,500 | 11/11/15 | | |
| 9b | -genome | 18,200 | 11/11/15 | | |

B.2 Database of identified cancer atlases

Table B.2: Title, URL and key for each cancer map identified

| Key | Map Title | URL |
|---|---|---|
| 1 | All Ireland Cancer Atlas 1995-2007 | http://www.ncri.ie/publications/cancer-atlases |
| 2 | Breast Cancer Mortality in Canada | http://www.ehatlas.ca/light-pollution/maps/breast-cancer-mortality |
| 3 | Globocan 2012: Estimated Cancer Incidence, Mortality and Prevalence Worldwide in 2012 | http://globocan.iarc.fr/Pages/Map.aspx |
| 4 | The Cancer Atlas | http://canceratlas.cancer.org/data/#?view=map&metric=INCID_ALL_M |
| 5 | Global Cancer Map | http://globalcancermap.com/ |
| 6 | Spatio-Temporal Atlas of Mortality in Comunitat Valenciana | http://www.geeitema.org/AtlasET/index.jsp?idioma=I |
| 7 | United States Cancer Statistics: An Interactive Cancer Statistics Website | https://nccd.cdc.gov/DCPC_INCA/ |
| 8 | MapNH Health | http://www.mapnhhealth.org/ |
| 9 | Pennsylvania Cancer Atlas | http://www.geovista.psu.edu/grants/CDC/ |
| 10 | NCI Geoviewer (NIH GIS Resources for Cancer Research) | https://gis.cancer.gov/geoviewer/ |
| 11 | Longer Lives Healthier Lives | |
| 12 | Lung Cancer Map - Global Lung Cancer Coalition | http://www.lungcancercoalition.org/e-atlas/ |
| 13 | Environmental Facilities and Cancer Mapping | https://apps.health.ny.gov/statistics/cancer/environmental_facilities/mapping/map/ |
| 14 | An Atlas of Cancer in South Australia | https://www.cancersa.org.au/assets/images/pdfs/An%20Atlas%20of%20Cancer%20in%20South%20Australia%20-%20Full%20Report.pdf |
| 15 | Bowel Cancer Australia Atlas | http://www.bowelcanceratlas.org/ |
| 16 | Epidemiologisches Krebsregister Nordrhein-Westfalen | http://www.krebsregister.nrw.de/index.php?id=116 |
| 17 | Helseatlas - Dagkirurgi, 2011 - 2013 (Skulderkirurgi) | http://www.helse-nord.no/helseatlas/atlas.html |
| 18 | Cancer Incidence in Switzerland | http://www.nicer.org/NicerReportFiles2015-2/EN/report/atlas.html?&geog=0 |
| 19 | Age Adjusted Invasive Cancer Incidence Rate: All Sites: 2011 (experimental dashboard) | http://mcriaweb.col.missouri.edu/IAS/dataviews/report?reportId=13&viewId=3&geoReportId=62&geoId=1&geoSubsetId= |
| 20 | CINA+ Online Cancer in North America | http://www.cancer-rates.info/naaccr/ |
| 21 | The Environment and Health Atlas of England and Wales | http://www.envhealthatlas.co.uk/eha/Breast/ |
| 22 | UK Cancer e-atlas NCIN | |
| 23 | Map of Cancer Mortality Rates in Spain | http://elpais.com/elpais/2014/10/06/media/1412612722_141933.html |
| 24 | The Florida Prostate Cancer Atlas | http://prostatecanceradvisorycouncil.org/categorynews/florida-prostate-cancer-atlas-2014/#prettyPhoto |
| 25 | Atlas of Cancer in Queensland | https://cancerqld.org.au/research/queensland-cancer-statistics/queensland-cancer-atlas/ |
| 26 | Atlas of Childhood Cancer in Ontario | http://www.pogo.ca/research-data/pogo-atlas/ |
| 27 | Atlas of Cancer Mortality in the European Union and the European Economic Area 1993-1997 | http://www.iarc.fr/en/publications/pdfs-online/epi/sp159/AtlasCancerMortalityEU-10.pdf |
| 28 | National Cancer Registry of Ireland - Cancer Atlas | http://www.ncri.ie/data/maps?field_cancers_tid_selective=61 |
| 29 | Geographic Variation in Primary Liver and Gallbladder Cancer | http://www.ncin.org.uk/publications/data_briefings/liver_and_gall_bladder |
| 30 | Cancer Atlases of UK and Ireland | http://www.ons.gov.uk/ons/rel/cancer-unit/cancer-atlas-of-the-united-kingdom-and-ireland/1991---2000/index.html |
| 31 | Cancer Mortality Maps (US) | http://ratecalc.cancer.gov/archivedatlas/ |
| 32 | Cape Cod Breast Cancer Atlas | http://silentspring.org/cape-cod-atlas-breast-cancer-incidence |
| 33 | Arizona Cancer Rates by Community Health Analysis Area (CHAA) > 2005 - 2009 | http://www.azdhs.gov/preparedness/public-health-statistics/cancer-registry/chaa/index.php |

Table B.3: General details of each map identified in grey literature review

| key | pub date | data range | publisher | publisher type | country | coverage |
|---|---|---|---|---|---|---|
| 1 | 2011 | 1995 - 2007 | National Cancer Registry Ireland - Cancer Atlas | Gov | Ireland | Multiple Nations |
| 2 | 2012 | 2012 | Social Sciences and Humanities Research Council of Canada & Canadian Institutes of Health Research | Gov | Canada | Nation |
| 3 | 2015 | 2012 | WHO & IARC (International Agency for Research on Cancer) | NFP | Global | Global |
| 4 | 2014 | 2000 - 2011¹ | The American Cancer Society & International Agency for Research on Cancer | NFP & Gov | Global | Global |
| 5 | 2012 | 2008 | www.pri.org | NFP | Global | Global |
| 6 | 2008 | 1987 - 2006 | Valencia health department | Gov | Spain | State |
| 7 | unknown (after 2012) | 1999 - 2012 | Centers for Disease Control and Prevention | Gov | US | Nation |
| 8 | unknown | 2001 - 2011 | MapNH Health | Gov | US | State |
| 9 | 2007 | 1994 - 2002 | Penn State Milton S. Hershey Medical Center (Hospital and teaching hospital) | Hospital & Research | US | State |
| 10 | 1/10/15 | 2008 - 2012 | National Cancer Institute (GIS Resource Centre for Cancer Research, NIH) | Gov | US | Nation |
| 11 | unknown | 2012 | Public Health England | Gov | England | Nation |
| 12 | 2014 | 2012 | Global Lung Cancer Coalition | NFP | Global | Global |
| 13 | 2013 | 2005 - 2009 | New York State (Department of Health) | Gov | US | State |
| 14 | 2012 | ² | Cancer Council SA | NFP | Australia | State |
| 15 | 2014 | 2011 - 2013 | Bowel Cancer Australia | NFP | Australia | Nation |
| 16 | 2013 | 2002 - 2011 | ? not published in English | Gov | Germany | Nation |
| 17 | unknown³ | 2011 - 2013 | unknown⁴ | unknown⁵ | Norway | Nation |
| 18 | 2015 | 2008 - 2012 | NICER Foundation - National Institute for Cancer Epidemiology and Registration | NFP | Switzerland | Nation |
| 19 | 2012 | 1996 - 2011 | Missouri Cancer Registry and Research Centre | NFP | US | State |
| 20 | 2015 | 2008 - 2012 | NAACCR - North American Association of Central Cancer Registries | Gov | US | Nation |
| 21 | 2014 | unknown | Small Area Health Statistics Unit (UK) | NFP | England + Wales | Multiple Nations |
| 22 | 2011 | 2008-2011 & 2009-2011⁶ | NCIN (National Cancer Intelligence Network) | Gov | UK | |
| 23 | 2014 | 2004 - 2008 | elpais | news and media organisation | Spain | |
| 24 | post 2012 | 1998 - 2007 | Florida Prostate Cancer Advisory Council | advisory/advocacy org. | US | State |
| 25 | 2011 | 1998 - 2007 | Cancer Council Queensland | NFP & Gov organisations | Australia | State |
| 26 | 2015 | 1985 - 2004 | Pediatric Oncology Group of Ontario | NFP | Canada | State |
| 27 | 2008 | 1993 - 1997 | WHO - International Agency for Research on Cancer | NFP/Gov | EU | Multiple Nations |
| 28 | unknown | 1994 - 2012* | National Cancer Registry of Ireland | Gov | Ireland | Nation |
| 29 | 2010 | 1998 - 2006 | Public Health England (National Cancer Intelligence Network) | Gov | UK | Nation |
| 30 | 2000 | 1991 - 2000* | Office for National Statistics | Gov | UK + Ireland | Multiple Nations |
| 31 | 1999 | 1950 - 1994⁷ | National Institutes of Health & National Cancer Institute (US) | Gov | US | Nation |
| 32 | unknown | 1982 - 1994, 1995 - 2012 | Silent Spring Institute | NFP | US | |
| 33 | unknown | 2005 - 2009 | Arizona Department of Health Services | Gov | US | State |

¹ Date of data depends on the cancer type.
² Depends on cancer type.
³ ⁴ ⁵ Not published in English.
⁶ and *: differs by cancer type.
⁷ Differs by cancer type.

Table B.4: Report measures of each map identified in grey literature review

| key | report measure | incidence/mortality/death/counts/other | details |
|---|---|---|---|
| 1 | | Incidence Ratio / Incidence Relative Risk | SIR (standardised incidence ratio). Relative Risks (smoothed age standardised incidence ratios) |
| 2 | rate per 100,000 | Mortality Rate | Crude mortality rate per 100,000 |
| 3 | age adjusted rate per 100,000 | Incidence Rate / Mortality Rate / Prevalence | Age adjusted rate per 100,000 (incidence, mortality and prevalence) |
| 4 | age adjusted rate per 100,000 | Incidence Rate / Survival Rate | Age adjusted rate per 100,000 people, incidence and survival |
| 5 | New cancer cases per 100,000 (age adjusted) | Incidence Rate / Mortality Rate | Age adjusted new cancer cases annually per 100,000 (incidence and mortality) |
| 6 | na | Mortality Rate / Mortality Probability | Spatio-temporal standardised smoothed mortality ratio and probability (excess risk). * No further details available on how the measure was calculated |
| 7 | age adjusted new cancer cases per 100,000 | Incidence Rate / Death Rate * / Incidence Count / Death Count | Incidence rate (number of new cancer cases), death rate (number of deaths due to cancer), incidence count and death count. Rates are per 100,000 persons, per year and are age adjusted to the 2000 U.S. standard population (19 age groups) |
| 8 | age adjusted rate per 100,000 | Incidence Rate (Projected) | Projected (2020 & 2030) age adjusted cancer incidence per 100,000 |
| 9 | age adjusted rate per 100,000 | Incidence Rate | Age adjusted incidence per 100,000 |
| 10 | age adjusted rate per 100,000 | Incidence Rate / Death Rate | Age adjusted annual incidence rates & age adjusted death rates (per 100,000) |
| 11 | na | Mortality Rate | Age standardised premature mortality per 100,000. Legend indicates if mortality rates are significantly different to the national average. No further detail is provided about the methods used to calculate these rates. |
| 12 | count, age adjusted rate per 100,000 | Incidence Rate / Incidence Count / Mortality Rate (death from cancer) / Survival Rate | Counts and age standardised rates of lung cancer incidence and mortality (death from cancer). |
| 13 | Below or above expected | Incidence (unknown) | Five-year cancer counts and an indication of whether the rate is below or above expected. |
| 14 | age adjusted rate per 100,000 | Incidence Rate | Age standardised incidence rates per 100,000 people (ASR per 100,000) |
| 15 | unknown | Incidence | % of cases in population. Measure is indecipherable: no titles on the graphs or legend, or units associated with the key. |
| 16 | age adjusted rate per 100,000 | Incidence Rate | Age adjusted incidence rates per 100,000 |
| 17 | age adjusted rate per 100,000 | Incidence Rate | Age adjusted incidence rates per 100,000 |
| 18 | age adjusted rate per 100,000, crude rate, number of cases | Incidence Rate / Incidence Crude Rate / Incidence Counts / Mortality rate (cancer deaths) | Age standardised incidence/mortality rate per 100,000 per year (crude rate + ASR + number of cases) |
| 19 | age adjusted rate | Incidence Rate / Incidence Counts | Age-adjusted invasive cancer incidence rate (presumably per 100,000 persons, but this is not stated on the map). Crude numbers also reported. |
| 20 | age adjusted rate per 100,000, crude rate, number of cases, cases | Incidence Rate / Incidence Counts | Age adjusted incidence rate per 100,000 (also reported: area population, cases, crude rates, age adjusted rates). |
| 21 | age adjusted relative risk | Incidence Ratio (Relative Risk) | Relative Risk (age adjusted incidence ratio) |
| 22 | age adjusted rate per 100,000 | Incidence Rate / Mortality Rate (cancer deaths) / Survival Rate (for some cancers) | Age standardised incidence/mortality/survival rate per 100,000. |
| 23 | na | Mortality Counts / Mortality Relative Rate (Mortality Risk) | Mortality, but no methods are provided. Appears to be a relative mortality rate: legend shows mortality as a % risk greater or lower than the average. |
| 24 | age adjusted rate per 100,000 | Incidence Rate | Age adjusted incidence rate per 100,000 |
| 25 | standardised incidence ratio (SIR), relative excess risk (RER) | Incidence Ratio / Mortality Rate | Smoothed SIR (standardised incidence ratio) and smoothed RER (relative excess risk). Incidence & mortality |
| 26 | age adjusted rate per 100,000 | Incidence Rate / Mortality Rate / Survival Rates | Age standardised incidence rates per 100,000 children 0 - 14 yrs + age standardised mortality rate (ASMR) + age adjusted survival rates |
| 27 | na | Mortality Rate | Age standardised mortality rate per 100,000 + RRSD (relative risk standard deviation) |
| 28 | standardised (age) incidence ratios (SIR), counts | Incidence Ratio (Relative Risk) / Incidence Counts | SIR (age standardised ratios). Observed incidence (counts) are also provided |
| 29 | age adjusted rate per 100,000 | Incidence Rate | Age standardised incidence rates per 100,000 |
| 30 | standardised incidence ratio (SIR), counts | Incidence Ratio / Mortality Ratio | SIR - ratio of directly age-standardised rate in health authority to overall UK + Ireland average |
| 31 | na | Mortality Rate | Mortality rates per 100,000 people (areas with sparse data were not reported) |
| 32 | standardised incidence ratio (SIR) | Incidence Rate / Mortality Rate | SIR (standardised incidence ratio) & SMR (standardised mortality rate) |
| 33 | age adjusted rate per 100,000 | Incidence Rate | Age adjusted incidence rates per 100,000 |

Table B.5: Modelling methods for each map identified in grey literature review

| key | co-variates | modelling methods | smoothing method |
|---|---|---|---|
| 1 | socio-economic indicators discussed in methods | Indirect age standardised, adjusted for age structure of each small area region⁸ | BYM⁹ |
| 2 | nil | Mortality rate per 100,000. Not adjusted for age, only crude rates reported. | nil |
| 3 | nil | Incidence: sex and age specific incidence rates. Mortality: varied depending on country. Prevalence: sex and cancer adjusted. | nil |
| 4 | Risk factors, actions being taken to address cancer across the world | | nil |
| 5 | nil | Age adjusted incidence and mortality rates per 100,000 | nil |
| 6 | nil | no information provided | no information provided |
| 7 | | age-adjusted incidence/death rates | nil |
| 8 | Population, obesity, tobacco use, births, poverty, diabetes, Alzheimer's disease. But the relationship with cancer and these diseases is not explored. | Projected, age adjusted cancer incidence rates per 100,000. | nil |
| 9 | race, age, cancer stage, time period. | age adjusted rate per 100,000 | nil |
| 10 | crowding, education, income, insurance, population, poverty, mobility, workforce. | age adjusted annual incidence rates (per 100,000). Data is suppressed if the counts are smaller than 16. | nil |
| 11 | nil | Age standardised rates | nil |
| 12 | mortality, incidence & survival | Age standardised incidence & mortality rates per 100,000 per country. | nil |
| 13 | environmental facilities | indirectly adjusted for age and sex, assuming equal risk throughout the state of New York. | nil |
| 14 | smoking, alcohol consumption, obesity, socio-economic status, rurality | age adjusted incidence rate per 100,000 | nil |
| 15 | nil | no information provided on methods used. | nil |
| 16 | nil | Age standardised incidence rates (per 100,000). | Unknown¹⁰ |
| 17 | Maybe (not in English) | Age standardised incidence rates (per 100,000). | nil |
| 18 | language regions | Age standardised incidence and mortality rates. Mortality rates are based on cause of death database. | nil |
| 19 | nil | age-adjusted cancer incidence rates | nil |
| 20 | nil | Age adjusted rate per 100,000 | nil |
| 21 | nil | Age adjusted rates - Bayesian hierarchical model¹¹ | BYM¹² |
| 22 | nil | Age standardised incidence/mortality rates (per 100,000). | nil |
| 23 | nil | No methods provided | No methods provided |
| 24 | Treatment rates (by treatment type), ethnicity, Urban/non Hospital service areas, deprivation, economic, rurality, education + others | Age adjusted rate per 100,000. | nil |
| 25 | Socioeconomic categories, rurality | Age adjusted standardised rates using Bayesian hierarchical modelling, with CAR prior for spatial smoothing | Incidence: BYM¹³. Survival: Poisson piecewise with BYM components¹⁴ |
| 26 | nil | Age standardised incidence rates per 100,000 children 0 - 14 yrs. Age standardised mortality rate (ASMR). The Brenner method for survival rates, modelled after period life tables. | nil |
| 27 | nil | Age standardised mortality rates. Average annual rates per 100,000 population, directly age standardised to world standard population (SICE, 1964) | Regional variation: 1. Poisson-gamma model (1 unstructured random effect, no spatial structure)¹⁵ 2. Multilevel model with 3 geographic hierarchies (no spatial structure)¹⁶ |
| 28 | nil | Age standardised incidence ratio | no information provided |
| 29 | nil | Age standardised incidence rates per 100,000 | nil |
| 30 | socioeconomic deprivation is mentioned but not analysed in this report, or shown in the map. | Ratio of directly age-standardised rate over UK and Ireland average. | nil - no information provided |
| 31 | nil | Age-adjusted cancer mortality rates per 100,000 (binomial approximations of the age adjusted rates were calculated)¹⁷ | nil |
| 32 | nil | Standardised incidence ratio adjusted for age structure of region (areas with sparse data were not reported) | No geographical smoothing. Smoothed temporally, however no details provided |
| 33 | Indian population (yes/no) | age standardised rate per 100,000 | nil |

⁸ For more details see http://www.ncri.ie/atlas/232-spatial-analysis-and-smoothing
⁹ Besag et al. (1991)
¹⁰ Not in English (German)
¹¹ http://onlinelibrary.wiley.com/doi/10.1002/env.571/abstract
¹² Besag et al. (1991), Fairley et al. (2008)
¹³ Besag et al. (1991)
¹⁴ Fairley et al. (2008)
¹⁵ 1. Pennello et al. (1990)
¹⁶ Similar to Langford et al. (1990)
¹⁷ Further detail of the methods can be found at http://ratecalc.cancer.gov/archivedatlas/pdfs/text.pdf

Table B.6: Uncertainty information within each map identified in grey literature review

| key | uncert. included | uncert. measure | uncert. visualisation | notes |
|---|---|---|---|---|
| 1 | no | na | na | na |
| 2 | no | na | na | na |
| 3 | no | na | na | na |
| 4 | no | na | na | na |
| 5 | no | na | na | na |
| 6 | no | na | na | |
| 7 | yes | CI¹⁸ | CI bar, CI bounds in data table | Confidence interval upper and lower bounds reported in data table. CIs also appear in the mouse tip function when rolling over a region. |
| 8 | no | na | na | na |
| 9 | yes | CI, interquartile range, variance | box plot, CI bar in tool tip, CI bounds in data table | Box plot of overall data. Confidence interval of each estimate visualised when hovering over the estimate on the scatterplot. Upper & lower bounds of CI reported in data table, range of data shown on scatterplot |
| 10 | no | na | na | na |
| 11 | no | na | na | na |
| 12 | no | na | na | |
| 13 | no | na | na | na |
| 14 | no | na | na | na |
| 15 | no | na | na | |
| 16 | no | na | na | |
| 17 | yes | Confidence Interval | CI bar | 95% CI shown in barchart of estimates. |
| 18 | yes | Credible Interval | error bar, CI bounds in data table | 95% CI shown on barchart of estimates. Numeric upper and lower bounds appear when the tool tip hovers over the barchart. |
| 19 | yes | CI, interquartile range | CI bar | 95% confidence intervals shown on barchart. Interquartile ranges shown. Statistically significant difference to state average indicated (no details on the measure of statistical significance). |
| 20 | yes | CI | nil | 95% CI bounds reported in a table alongside the map. No uncertainty shown on map directly |
| 21 | yes | Credible Interval | CI bar | 95% credible intervals on line graph of region vs relative risk (ascending). Bayesian methods that incorporate uncertainty |
| 22 | yes | CI, statistical significance | | 1. Shows confidence intervals on estimate vs region graph. 2. Shows a symbol for areas that are statistically significantly different from the national average (standardised incidence rates) |
| 23 | no | na | na | |
| 24 | yes | indication of small sample size | no colour applied to areas of small sample size | Small counts = fewer than 10 (excluded for privacy reasons) or fewer than 25 (unstable areas). |
| 25 | yes | Credible Interval | box plot, CI line | Credible intervals shown on graph of relative risk vs region, box plots shown for socioeconomic and rurality |
| 26 | yes | small sample size | shading | Shading indicates which census divisions have fewer than 30 cases |
| 27 | yes | distribution, standard deviation, interquartile range | box plot, distribution | Box plots, full distribution of incidence data shown, + RRSD (relative risk standard deviation) |
| 28 | no | na | na | |
| 29 | no | na | na | na |
| 30 | no | error bars | nil | Not in map, but error bars are present on supplementary graphs |
| 31 | no | small sample size | nil | Areas with sparse data not reported |
| 32 | yes | statistical significance | nil | Reported in graph if the difference of the estimated standardised incidence ratio from normal was statistically significant at a p-value of 0.05. |
| 33 | yes | CI | CI bar | Confidence intervals shown on barchart |

¹⁸ Credible/Confidence Interval

Table B.7: Technology platforms used to generate cancer maps identified in grey literature review

| key | Technology platform | Programming language (where applicable) |
|---|---|---|
| 1 | ESRI ArcMap 9.3 (part of the ESRI ArcGIS Desktop suite) used to generate pdf | na |
| 2 | Custom Built | jpeg + javascript |
| 3 | Custom Built - D3.js | d3.js + javascript |
| 4 | Custom Built | javascript + CSS + google maps API |
| 5 | Custom Built | modest maps + javascript + mapbox |
| 6 | Custom Built | could not determine |
| 7 | InstantAtlas | na |
| 8 | Custom Built - D3.js | d3.js + javascript + GIS data capabilities |
| 9 | GeoVista | Outputs to a flash application |
| 10 | ESRI ArcMap (part of the ESRI ArcGIS Desktop suite) | |
| 11 | Custom Built | Javascript + Googlemaps api |
| 12 | Googlemaps based visualisation | |
| 13 | Googlemaps based visualisation | |
| 14 | InstantAtlas | na |
| 15 | InstantAtlas | na |
| 16 | InstantAtlas | na |
| 17 | InstantAtlas | na |
| 18 | InstantAtlas | na |
| 19 | InstantAtlas | na |
| 20 | Custom Built | javascript |
| 21 | Custom Built - D3.js + leaflet | Professional (javascript + d3.js + leaflet + html) |
| 22 | InstantAtlas | na |
| 23 | pdf or infographic | na |
| 24 | pdf | na |
| 25 | pdf | na |
| 26 | pdf | na |
| 27 | pdf | na |
| 28 | pdf | na |
| 29 | pdf | na |
| 30 | pdf | na |
| 31 | pdf | na |
| 32 | pdf | na |
| 33 | InstantAtlas | na |

Appendix C

Research Activity 1.B - User-centred design for uncertainty communication

C.1 Project Partners Workshop

C.1.1 Workshop Programme

| Time | Activity |
|---|---|
| 10:00 - 10:20 | Welcome & stakeholder introductions |
| 10:20 - 10:40 | Why is communicating uncertainty an important problem? |
| 10:40 - 11:00 | Morning tea |
| 11:00 - 11:30 | Who are the audiences of the Australian Cancer Atlas and what are their characteristics? |
| 11:30 - 12:00 | Can these audiences be grouped by the level of information detail they require? |
| 12:00 - 1:00 | Lunch |
| 1:00 - 2:00 | What will the Atlas report (output measure or measures)? |
| 2:00 - 3:00 | Diagnosing uncertainty sources in the Australian Cancer Atlas |
| 3:00 - 3:45 | Groups present uncertainty sources back to the total group |
| 4:00 | Close |

163 APPENDIX C. RESEARCH ACTIVITY 1.B - USER-CENTRED DESIGN FOR UNCERTAINTY COMMUNICATION

This appendix summarises the outcomes of the Uncertainty Communication Design Workshop that was held with the project partners of the Australian Cancer Atlas project.

Date: September 9, 2015, 1pm - 4pm

Attendees: Kerrie Mengersen, Fiona Harden, Peter Baade, Joanne Aitken, William Watson, Tomasz Bednarz, Jessie Roberts

Topics discussed through this workshop:

1. Why is communicating uncertainty an important problem?

2. Who are the audiences of the Australian Cancer Atlas and what are their characteristics?

3. Can these audiences be grouped by the level of information detail they require?

4. What will the Atlas report (output measure or measures)?

5. What are the sources of uncertainty within the Atlas?

C.1.2 Discussion 1: Why is Communicating Uncertainty an Important Problem

Workshop participants were asked to consider why communicating uncertainty is an important problem in the context of: the Australian Cancer Atlas, geospatial health statistics or disease mapping, and within science communication generally. Results of this discussion are summarised below.

Science and Science Communication Generally

• Important in evaluating the quality of scientific research and the reliability of data-driven insights.

• Essential in comparing the accuracy of research methods and comparing between similar studies. Highlighting areas of uncertainty guides future research priorities by focusing on gaps in our current state of knowledge.

• Important in evaluating the effectiveness, reliability and performance of new technology and methodologies.

• Supports valid interpretations of results and applications of insights to real world settings.

• A lack of uncertainty information can degrade the general public’s trust in scientific insights and degrade the reputation of science generally. When widely accepted ‘facts’ are questioned due to new scientific discoveries, the public does not know who to trust. Hiding uncertainty hides the inaccuracies in all scientific discoveries and presents the scientific process as more solid than it actually is.

• The clear communication of uncertainty would better inform the general audience about the scientific process.

• Changes in uncertainty demonstrate whether our knowledge and methods are improving over time (is our uncertainty reducing over time?).

Geospatial Data and Disease Mapping

• Creating a map of modeled disease occurrence or risk can present estimates as more

certain and accurate than they may actually be. This can be particularly misleading

and lead to suboptimal decision making.

• Data aggregation decisions can influence final model outputs. Uncertainty can be

useful tool in both evaluating which decisions lead to the most accurate results and

also can make these inaccuracies more transparent. the Australian Cancer Atlas

• The small sample sizes present in some of the regions within the Cancer Atlas can

lead to uncertain estimates. while the model outputs may be our best estimate, it is

important that decision makers understand these uncertainties.

• Provides a guide to applying these insights to policy developments. Informs deci-

sions makers regarding the accuracy and reliability of estimates.

• Uncertainty is important in applying the regional generalisations from the atlas to

individual situations.

• Uncertainties support the need for future research in cancer outcomes and can help

prioritize research projects.

165 APPENDIX C. RESEARCH ACTIVITY 1.B - USER-CENTRED DESIGN FOR UNCERTAINTY COMMUNICATION

• Inclusion of uncertainty enhances the research output of the atlas. - tells the whole

story and communicates clearly our current state of knowledge about inequalities in

cancer incidence and survival in Australia.

• Provides examples of uncertainty communication methodologies for other Cancer

Councils.
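The point above about small sample sizes can be illustrated with a short R sketch. It uses hypothetical counts (not Atlas data) and a simple unsmoothed standardised incidence ratio with an exact Poisson interval; the Atlas itself reports modelled estimates, so this is only an illustration of why the same point estimate can carry very different uncertainty in small and large regions. The function name `sir_interval` and all numbers are assumptions made for the example.

```r
# Illustrative sketch only: hypothetical counts, not Atlas data.
# Standardised incidence ratio (SIR) = observed / expected cases, with an
# exact Poisson interval for the observed count (chi-square form).
sir_interval <- function(observed, expected, level = 0.95) {
  alpha <- 1 - level
  lower_count <- if (observed == 0) 0 else qchisq(alpha / 2, 2 * observed) / 2
  upper_count <- qchisq(1 - alpha / 2, 2 * (observed + 1)) / 2
  c(sir   = observed / expected,
    lower = lower_count / expected,
    upper = upper_count / expected)
}

# Two hypothetical regions with the same SIR but very different sizes.
small_region <- sir_interval(observed = 6, expected = 5)
large_region <- sir_interval(observed = 600, expected = 500)

# Width of each interval.
width <- function(x) unname(x["upper"] - x["lower"])

print(round(rbind(small_region, large_region), 2))
```

Both hypothetical regions have the same point estimate, but the interval for the small region is much wider and includes 1, whereas the interval for the large region does not. A map showing only the point estimate would present the two regions as equivalent.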

C.1.3 Discussion 2: Who are the Audiences of the Australian Cancer Atlas and what are their characteristics?

Workshop participants identified the following eight target audiences as important to the

Australian Cancer Atlas.

1. General Audience/ General Public

2. Media

3. Government, lobby groups, and health policy makers and advisors

4. Health managers

• Regional

• Local

5. Clinicians

6. Cancer patients and their carers, family or supporters

7. Researchers

8. Other Cancer Councils and Health Reporting organisations

For each of these audiences, participants then defined its characteristics in terms of:

• What are the key messages we want to communicate to them through the Atlas?

• What decisions or questions is each audience trying to answer when exploring the

Atlas?

• What are the skill levels of each audience (in terms of formal statistical training and

analytical skills)?


• What potential risks are there when including uncertainty in the Atlas (misinterpretation, disregarded information, etc.)?

• What potential benefits could each audience gain from including uncertainty in the

Atlas?

• What level of interest does each audience have in uncertainty?

General Audience

Key Messages:

• Highlight the regions with variation in cancer incidence and survival.

• Show any relationship between cancer risk/survival and socio-demographic or

rurality variables.

Knowledge & Skills:

• Formal statistical training: low

• Analytical skills: low

Decisions or Questions

• How does my region compare to other regions in Australia?

• What are the reasons for areas of low or high risk?

• Have I ever lived in an area of high risk?

Interest in uncertainty?

• Minimal: most are probably not aware of the presence of uncertainty.

Risks of including uncertainty

• Key messages could be lost in information overload.

• Graphs may be too complex or difficult to interpret, so there is a risk the audience will disengage.


Benefits of including uncertainty

• May calm an over-reaction to high risk regions.

Media

Key Messages

• Simple, short graphs and infographics that are accurate and shareable.

• Where is the geographic variation in cancer outcomes greatest? Are there any reasons why these areas have greater variation?

“this is really important work”

“this is innovative work”

• Clearly explain the uncertainty in any high risk regions. Provide examples and words

they can use to embed the uncertainty into their media messaging.

Skills

• Formal Stats training: low

• Analytical skills: low to medium

• May have some specialist training

Decisions & Questions

• Looking for a hook

• Is this newsworthy?

• Where are the highest risks?

• What is the government doing about these inequalities in cancer outcomes?

• What resources are available for people most at risk or with the highest needs?

Interest in Uncertainty


• Averse to uncertainty.

• Uncertainty confuses the hook; they are looking for simple, clear news stories.

Risks of including uncertainty

• Misinterpretation or misrepresentation.

• May mistake uncertainty for poor-quality research.

Benefits of including uncertainty

• General promotion and education about uncertainty.

• May reduce anxiety in small regions with high-risk incidence, e.g. “A high risk in Mackay doesn’t mean that everyone in Mackay will get cancer.”

Government, lobby groups, health policy makers and advisors

Key Messages

• Are the current cancer treatment, screening and support services sufficient?

• Are there inequalities and if so, where?

• Are the government programs working? Has there been a change over time?

• How does their jurisdiction compare to others?

• What are the highest priority interventions and research for the future?

• So what? How best to translate these insights into policy.

• What are the most pressing inequalities in cancer outcomes and are there any recom-

mendations for addressing these?

Skills

• Formal Stats Training: low to medium

• Analytical skills: medium to high

• Other: mostly communication and decision-making skills rather than statistical skills

Decisions and Questions


• What can we do to improve survival rates and reduce inequalities in cancer out-

comes?

• Can we show our current or recent health services are creating change?

Interest in uncertainty

• Uncertainty can be confusing and can hinder or slow down decision making. Often

viewed as a bad thing and decision makers would generally want to see a definite

number.

• May not know how to apply the uncertainty in their current decision making frame-

work.

• Could be seen as a valuable tool if presented the right way or if they have had

sufficient training.

Risks of including uncertainty

• Information may be regarded as being of poorer quality if uncertainty is included, although it is of greater long-term benefit because policy will be developed for better future outcomes.

• May lead to decision paralysis.

Benefits of including uncertainty

• Better represents our current state of knowledge

• Uncertainty may help quantify how much money should be spent on a program

and when. May be very valuable in designing milestones and clarification points for

a health program. May mean policy decisions are made that embed flexibility when

the current state of knowledge contains uncertainty.

Other

• Need to ensure that the scientific evidence provided can inform the decision making

process.

• This audience will have many competing priorities.

• Likes to be able to show improvement over time.


Cancer Patients/Survivors and their family, carers and friends

Key Messages

• Insights about their community.

• May be interested in additional information about where they can access services, support and information.

Decisions and Questions

• What are the benefits of my different treatment options and ancillary side effects?

• What is the best treatment available to me?

• What have other people with my cancer diagnosis, and/or in my region, done?

What services did they access? What treatment did they have?

• Is the chance of survival lower or higher in my region?

• What treatment options are available to me in my region?

• What resources are available in my region?

• How far do I have to travel for my treatment?

• What resources are available to me in my community?

• Is there a lower than average chance of survival in my region? If so, why? What can I

do about it?

• How does my community compare to other similar communities? (in the same peer

group)

• Looking for more accurate information to replace “Dr Google”

Skills and Knowledge

• Formal statistical training: overall low, but highly varied

• Analytical skills: overall low, but highly varied

• Will be looking for more accurate information to replace Dr Google.

Interest in uncertainty


• How likely is my treatment to be unsuccessful or successful?

• Could provide comfort for people living in high risk areas.

• May affect their life and their treatment choices.

• Uncertainty about life and treatment options will lead to high anxiety.

Risks of including uncertainty in the Atlas

• The uncertainty may create greater physical and emotional stress for the patient and

their family. Difficulty of the unknown and not having a clear right answer.

Benefits of including uncertainty in the Atlas

• May enable more informed decisions about how they manage their treatment.

• May provide comfort if they live in an area of high risk (for example, if their family also lives in the same region).

Researchers

Key Messages

• Here are the gaps in our knowledge.

• Here is the uncertainty in our outputs.

• The methods we used for developing these disease maps are accurate and robust.

• The methods we used to communicate the uncertainty are clear and accurate.

• Our methods of communicating/representing uncertainty have been successful and are accessible to non-expert audiences and decision makers.

• Our research is awesome and our methods robust!

Decisions and Questions

• What is the quality/accuracy/uncertainty of the estimates?

• Are the inferences made from the data appropriate?

• What is the current state of knowledge in this area, current best practice?


• What are the gaps in current knowledge, and how can this research relate to my research?

• Are these methods applicable to my area of research?

Skills & Knowledge

• Formal statistical training: high

• Analytical skills: high

What does uncertainty mean to this audience?

• Highlights the quality of the research.

• Highlights where future research should focus.

• Guides the application of the scientific insights to real world practice

Interest in uncertainty

• High

Risk of including uncertainty information

• minimal

Risks of excluding uncertainty information

• Excluding uncertainty information can give a false representation of our current state of knowledge. This could result in important research problems or knowledge gaps being missed because our knowledge is presented as more certain than it is.

• Inaccuracies are missed and future research is misguided.

• Missed opportunities for research and for patient outcomes.

Benefits of including uncertainty information

• Clear spotlight on future research opportunities.

• Clear support for the need for the research they may be seeking funding for.


Health Managers (Regional and Local)

Key Messages

• Where are the demands for services greatest?

• ‘These’ regions need to focus on these support services.

• Quantify what the needs of their region are.

• These are the services available in your region.

Decisions and Questions

• How do I budget and allocate resources to best meet the needs of residents in my region?

• How does my region/jurisdiction compare to other regions in Australia? Better, worse or the same?

• What services are available in my region and what services should I be advocating

for?

• Are there any shortfalls in screening or support services in my region?

• Do I need to budget any extra services to meet the needs of this group?

• Are these results what I expect? Better, worse or the same?

Skills

• Formal statistical training: medium

• Analytical skills: medium

Interest in uncertainty

• low to medium

Risks of including uncertainty in communications

• Confusing or difficult to understand (time poor audience)


Risks of excluding uncertainty information

• State of knowledge and information appear more accurate than they actually are.

• Recommendations and advice to patients could be represented as more solid than it

actually is.

• High or low risk in their region may be interpreted as more certain or accurate than it actually is, leading to over- or under-prescription.

Benefits of including uncertainty in communications

• Helps ensure that health strategies and spending are meeting real needs

• Optimise cashflow (reduce the risk of spending money when the estimates/insights

are not reliable)

Clinicians

(Similar to health managers)

Key Messages

• Information on the needs of the region they work in.

• Types of services that are available, and that should be provided, to this patient group.

• Which regions have a higher than average risk of cancer incidence or lower survival.

• Which regions have higher needs or greater ‘disadvantage’ (due to rurality or socio-demographic aspects).

Skills & Knowledge

• Formal statistical training: low

• Analytical skills: low

Decisions or Questions

• What services do I need to ensure are available in the region where I work?


• Do I need to promote a higher rate of screening in my region?

• Do I need to promote the services that are available in my region? (e.g. support

for travel, or other support, treatment options that might be impacted by travel

challenges).

• Are residents in my region facing greater challenges due to socio-economic or geographic boundaries?

Interest in Uncertainty

• low to medium - time poor.

Risk of including uncertainty information

• May overwhelm a time poor audience. They may give up on the Atlas because the uncertainty makes it difficult to digest the information quickly.

Benefits of including uncertainty information

• Can clarify the Atlas outputs and ensure that under or over treatment is not pre-

scribed due to estimates appearing more accurate or certain than they actually

are.

C.1.4 Discussion 3: Can these audiences be grouped by the level of information detail they require?

Product | Details | Audience

Executive Summary | Short, clear statements of insights | Media

Map + results | Results and uncertainty information presented in a format accessible to a non-expert | Media, general audience, cancer patients & carers

Map + numbers + technical uncertainty | Includes technical estimates and measures of uncertainty | Cancer patients & carers, clinicians, health policy advisors, health managers

Technical report + data set | Contains details of methods, access to data, other statistical outputs | Researchers

C.1.5 Discussion 4: What will the Atlas report (output measure or measures)?

What measures could be used in the Atlas? What are the pros and cons of each, and which are applicable to which audiences?

The workshop participants briefly discussed the pros and cons of: Relative Excess Risk,

Incidence, Survival and Crude Probabilities.

Outcome: The discussion highlighted that further research was required into the different measures commonly used in cancer maps, as well as the need for agreed-upon definitions of the terms, since participants understood the terms differently depending on their domain of expertise.

C.1.6 Discussion 5: Sources of uncertainty within the Australian Cancer Atlas?

Workshop participants were asked to focus only on scientific uncertainty and to consider that this could be present in the data, the methods, the model or the outputs.

The following table summarises the outcome of these discussions. In general, this process highlighted the need to develop an agreed understanding of what scientific uncertainty is and where it arises, and a template to help diagnose (or lay out) uncertainty sources within the Australian Cancer Atlas.

Location | Uncertainty Source

Data | Estimated population of each region (ABS)

 | Estimated demographic breakdown of each region (ABS)

 | Socio-economic status is generalised across the entire region.

 | Classification uncertainty around the cause of death.

 | Classification uncertainty around indigenous identification

 | Residential address does not contain any info of time at that residence or region.

Methodologies | Smoothing algorithm

 | Model prior distributions (may also be an input rather than a method)

Model Assumptions | Residential address does not contain any info of length of time at that residence.

Disagreements - model/methods | Spatial smoothing methods

Outputs - linguistic | Meaning of: probability, uncertainty, risk, cause, correlation, random


C.2 NEUVis Audience Profiles

Table C.4: Q1: How does this new knowledge benefit the user?

Audience | General Atlas | Uncertainty information

1. General Audience | Incidence Benefit - more informed decision makers. - What issues are important in my area - Does the information on the map resonate with a personal story? - are actions being taken that need to be to address inequalities? Survival Benefit - being informed. - inequalities of health care | More informed, understanding of the cancer prevalence and survival of residences in their region.

2. Government, lobby groups, policy makers/advisors | - Tool for campaigning and lobbying to address the needs to regions. - Being informed of the needs of the region. | - A greater understanding of how to apply the information presented to influence decisions. I.e. regions with medium risk & high uncertainty vs medium risk and low uncertainty.

3. Health managers (local & regional) | - help target health campaigns for local area needs. | Greater understanding - is their region a high priority for action to address inequalities and/or understanding inequalities & reducing uncertainties

4. Clinicians | Greater understanding of the health inequalities that may be present to their patients, and if additional resources are available to support them. |

5. Cancer patients (their carers or family) | Edification, showing they are not alone. - does this info align with my personal story? Am i typical? |

6. Other Cancer Councils and Health Reporting organisations | |

7. Researchers | - Highlights areas of greatest uncertainties and therefore research priorities/opps - Justifies extension of current research. - validates findings of previous work. - Highlights research opportunities that expand knowledge not only addresses inequalities. - Supports applications for $ to conduct research or collect data. - tracking uncertainty over time can show how knowledge is improving. |

8. Media | - News worthy stories. - being informed. - topical - regional stories. - add value to other regional personal or other stories. - advocate for new policy. - educate regions/audiences that could benefit from greater support/knowledge. |


Table C.5: Q2: What about this data is relevant or important to the user in their context?

Audience | General Atlas | Uncertainty information

1. General Audience | - Incidence & Survival estimates in ‘my area’ - uncertainty - regional borders |

2. Government, lobby groups, policy makers/advisors | - What are the high risk regions. - How has that risk changed over time. | - What type of action is best (address inequalities or understand risk better)

3. Health managers (local & regional) | - SIR, RER, uncertainty info, regional boundaries, compare ‘my region’ with socio-economic & rurality like regions. |

4. Clinicians | - comparing regions to ‘like’ regions. - learning from others |

5. Cancer patients (their carers or family) | - SIR, - RER, - Risk of Death |

6. Other Cancer Councils and Health Reporting organisations | |

7. Researchers | - Uncertainty info, SIR, RER, Spatial variation, methods used to model estimates. |

8. Media | Cluster, Why?, What is being done to address this, areas of high risk, areas of low risk |

Table: Q3: What does this data show that is otherwise inaccessible for this user?

Audience | General Atlas | Uncertainty information

1. General Audience | All info shown in the QLD atlas is currently inaccessible to the general audience. | currently unavailable

2. Government, lobby groups, policy makers/advisors | Same as Aud 1 +: uncertainty info, change over time | Same as Aud 1 +: info about why risk estimates have different levels of uncertainty for different regions, sample size info and change over time.

3. Health managers (local & regional) | Same as Audience 1 +: ability to compare 2 regions. | -

4. Clinicians | Same as Aud 3 |

5. Cancer patients (their carers or family) | Same as Aud 1 +: Personal stories of other cancer patients or their family/carers. | -

6. Other Cancer Councils and Health Reporting organisations | - | -

7. Researchers | The raw data, sample size information. | -

8. Media | As above(?) | -


Table C.7: Q4: What can this user access for themselves?

Audience | General Atlas | Uncertainty information

1. General Audience | The QLD Cancer Atlas | Nil

2. Government, lobby groups, policy makers/advisors | Same as Audience 1 +: historical census data | -

3. Health managers (local & regional) | Same as Audience 1 | -

4. Clinicians | Same as Audience 1 | -

5. Cancer patients (their carers or family) | Same as Audience 1 | -

6. Other Cancer Councils and Health Reporting organisations | Same as Audience 1 | -

7. Researchers | methods, uncertainty info. | CIs, standard deviation, distributions, statistical significance (p-values)

8. Media | Same as Audience 1 +: Interviews with researchers & others. | -


Table C.8: Q5: What myths/misconceptions are relevant to this data set?

Audience | General Atlas | Uncertainty information

1. General Audience | - High risk = I will get cancer. Environmental factors are causing high risk regions. All regions have the same level of accuracy, reliability & certainty. all cancers have the same level of accuracy, reliability and uncertainty. All cancer have the same prevalence (cannot compare relative risk between cancers). Correlation does not = cause. The knowledge/information presented is complete. All point estimates are created equal. | All regions have the same level of accuracy, reliability & certainty.

2. Government, lobby groups, policy makers/advisors | Spatial variation is due to environmental or geographical factors. |

3. Health managers (local & regional) | Two point estimates are the same. All region estimates are equally reliable. All city regions are better than rural regions for all cancers. |

4. Clinicians | As above |

5. Cancer patients (their carers or family) | Same as Aud 1 +: Correlation = cause. Statistics defines the individual. |

6. Other Cancer Councils and Health Reporting organisations | |

7. Researchers | na |

8. Media | - all point estimates are created equal. Correlation = cause. Regional variation is due to some environmental or geographical factors. |


Table C.9: Q6: Potential impact on Audience

Audience | General Atlas | Uncertainty information

1. General Audience | - High risk = I will get cancer. Environmental factors are causing high risk regions. all regions have the same level of accuracy, reliability & certainty all cancers have the same level of accuracy, reliability and uncertainty. All cancer have the same prevalence (cannot compare relative risk between cancers). Correlation does not = cause. The knowledge/information presented is complete. All point estimates are created equal. | -

2. Government, lobby groups, policy makers/advisors | - | - Discredit or degrade info. choice paralysis. promote further research, and support need for investing in further research. support better decisions. Show change in knowledge and reliability of information over time. choice paralysis

3. Health managers (local & regional) | Adds to health managers tacit knowledge of their patients. Better manage local needs. Motivate to engage in future research (provide data etc) | Uncert info could overwhelm.

4. Clinicians | - tacit knowledge similar to Aud 3. |

5. Cancer patients (their carers or family) | Depression, Anger, Denial, Advocacy, Survival, ask for additional support if areas of lower survival. Provide additional support if in areas of lower survival. Access screening earlier. |

6. Other Cancer Councils and Health Reporting organisations | |

7. Researchers | Explore research opportunities - highlight research opportunities | -

8. Media | overwhelmed by uncert info, mis-interpretation of info & inflation of myths. Advocacy for policy and action that addresses inequalities. | -


Table C.10: Q7: Potential for change

Audience | General Atlas | Uncertainty information

1. General Audience | |

2. Government, lobby groups, policy makers/advisors | - Action that address inequalities. Further research to understand inequalities. |

3. Health managers (local & regional) | |

4. Clinicians | |

5. Cancer patients (their carers or family) | |

6. Other Cancer Councils and Health Reporting organisations | |

7. Researchers | Generate greater knowledge and understanding of methods, and inequalities. |

8. Media | |

Appendix D

Appendix: Online Game

D.1 Untransformed Performance Data

## Warning in logit(performance): proportions remapped to (0.025, 0.975)

[Figure: axes labelled "Proportion of Games" and "Performance ratio (non-transformed)"]

Figure D.1: Performance ratio
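The warning at the start of this section indicates that the performance ratios were remapped to (0.025, 0.975) before the logit was taken, which keeps ratios of exactly 1 finite. The wording matches the message issued by the `logit` function in the car package, although the call itself is not shown here, so that is an assumption. A minimal base R sketch of such a transform and its inverse follows; the helper names `logit_remap` and `inv_logit_remap` and the example values are hypothetical.

```r
# Sketch of a logit transform that remaps proportions away from 0 and 1.
# adjust = 0.025 is inferred from the warning text; this is not the thesis code.
logit_remap <- function(p, adjust = 0.025) {
  # Linearly map [0, 1] onto [adjust, 1 - adjust], then take the logit.
  p_adj <- 0.5 + (1 - 2 * adjust) * (p - 0.5)
  log(p_adj / (1 - p_adj))
}

inv_logit_remap <- function(x, adjust = 0.025) {
  p_adj <- plogis(x)                       # standard inverse logit
  0.5 + (p_adj - 0.5) / (1 - 2 * adjust)   # undo the remapping
}

# Hypothetical performance ratios, including a perfect score of 1.
performance <- c(0.80, 0.90, 0.95, 1.00)
transformed <- logit_remap(performance)
recovered   <- inv_logit_remap(transformed)

print(cbind(performance, transformed, recovered))
```

Under this remapping, estimates reported on the logit scale need both steps of the inverse (inverse logit, then undoing the remap) to be read back as performance ratios.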

D.2 Logit(PR) by game mode: LME output and diagnostic plots

## Linear mixed-effects model fit by REML

## Data: game_perf

## AIC BIC logLik

## 1344.415 1371.503 -666.2073

##

## Random effects:

## Formula: ~1 | session

191 APPENDIX D. APPENDIX: ONLINE GAME

## (Intercept) Residual

## StdDev: 0.2238952 0.6161685

##

## Fixed effects: perf_logit_transform ~ game_mode

## Value Std.Error DF t-value p-value

## (Intercept) 2.6674917 0.05775239 577 46.18842 0.0000

## game_mode1 0.1683792 0.06894343 577 2.44228 0.0149

## game_mode2 0.0989419 0.06852640 577 1.44385 0.1493

## game_mode3 0.2000226 0.07026788 577 2.84657 0.0046

## Correlation:

## (Intr) gm_md1 gm_md2

## game_mode1 -0.622

## game_mode2 -0.628 0.519

## game_mode3 -0.614 0.510 0.519

##

## Standardized Within-Group Residuals:

## Min Q1 Med Q3 Max

## -3.00097516 -0.60233524 0.09753618 0.74692413 1.89738498

##

## Number of Observations: 679

## Number of Groups: 99

## Approximate 95% confidence intervals

##

## Fixed effects:

## lower est. upper

## (Intercept) 2.55406117 2.66749171 2.7809223

## game_mode1 0.03296851 0.16837920 0.3037899

## game_mode2 -0.03564971 0.09894188 0.2335335

## game_mode3 0.06201063 0.20002263 0.3380346

## attr(,"label")


## [1] "Fixed effects:"

##

## Random Effects:

## Level: session

## lower est. upper

## sd((Intercept)) 0.1620591 0.2238952 0.3093259

##

## Within-group standard error:

## lower est. upper

## 0.5825994 0.6161685 0.6516718
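The approximate 95% intervals for the fixed effects listed above can be checked by hand as estimate ± t × standard error, using the estimates, standard errors and 577 degrees of freedom from the model summary. The sketch below is only a consistency check in base R; the data frame name `fixed` is an assumption made for the example.

```r
# Fixed-effect estimates and standard errors copied from the model summary.
fixed <- data.frame(
  term = c("(Intercept)", "game_mode1", "game_mode2", "game_mode3"),
  est  = c(2.6674917, 0.1683792, 0.0989419, 0.2000226),
  se   = c(0.05775239, 0.06894343, 0.06852640, 0.07026788)
)
df_resid <- 577                      # denominator DF reported for each term

t_crit <- qt(0.975, df_resid)        # two-sided 95% critical value
fixed$lower <- fixed$est - t_crit * fixed$se
fixed$upper <- fixed$est + t_crit * fixed$se

print(fixed, digits = 6)
```

The interval for game_mode2 spans zero, which is consistent with its non-significant t-test in the summary table.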

## numDF denDF F-value p-value

## (Intercept) 1 577 5412.032 <.0001

## game_mode 3 577 3.215 0.0225

##

## Simultaneous Tests for General Linear Hypotheses

##

## Multiple Comparisons of Means: Tukey Contrasts

##

##

## Fit: lme.formula(fixed = perf_logit_transform ~ game_mode, data = game_perf,

## random = ~1 | session)

##

## Linear Hypotheses:

## Estimate Std. Error z value Pr(>|z|)

## 1 - 0 == 0 0.16838 0.06894 2.442 0.0438 *

## 2 - 0 == 0 0.09894 0.06853 1.444 0.2232

## 3 - 0 == 0 0.20002 0.07027 2.847 0.0265 *

## 2 - 1 == 0 -0.06944 0.06743 -1.030 0.3638

## 3 - 1 == 0 0.03164 0.06893 0.459 0.6462

## 3 - 2 == 0 0.10108 0.06811 1.484 0.2232

## ---

## Signif. codes: 0 ’***’ 0.001 ’**’ 0.01 ’*’ 0.05 ’.’ 0.1 ’ ’ 1

## (Adjusted p values reported -- BH method)

[Figure: two panels; "Quantiles of standard normal" against "Standardized residuals", and "Standardized residuals" against "Fitted values"]

Figure D.2: Diagnostic Plots of Linear Mixed Effects Model (Logit(PR) by Game Mode)

## Levene’s Test for Homogeneity of Variance (center = median)

## Df F value Pr(>F)

## group 3 2.3015 0.07601 .

## 675

## ---

## Signif. codes: 0 ’***’ 0.001 ’**’ 0.01 ’*’ 0.05 ’.’ 0.1 ’ ’ 1
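The BH-adjusted p-values reported for the pairwise game-mode contrasts in this section can be approximately reproduced from the printed z statistics. The sketch below uses base R's `pnorm` and `p.adjust`; because the z values are copied at three decimal places, agreement is only expected to about the third decimal place.

```r
# z statistics for the six pairwise game-mode contrasts, copied from the
# multiple-comparison output (order: 1-0, 2-0, 3-0, 2-1, 3-1, 3-2).
z <- c(2.442, 1.444, 2.847, -1.030, 0.459, 1.484)
names(z) <- c("1 - 0", "2 - 0", "3 - 0", "2 - 1", "3 - 1", "3 - 2")

p_raw <- 2 * pnorm(-abs(z))               # unadjusted two-sided p-values
p_bh  <- p.adjust(p_raw, method = "BH")   # Benjamini-Hochberg adjustment

print(round(cbind(p_raw, p_bh), 4))
```

The adjustment is monotone in the ordered p-values, which is why two contrasts with different z statistics (2 - 0 and 3 - 2) share the same adjusted p-value in the output.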

D.3 logit(PR) by risk profile - LME model output and diagnostic plots

## numDF denDF F-value p-value

## (Intercept) 1 577 5553.198 <.0001

## game_risk 3 577 4.053 0.0072

##

## Simultaneous Tests for General Linear Hypotheses

##

## Multiple Comparisons of Means: Tukey Contrasts

##

##

[Figure: two panels; "Quantiles of standard normal" against "Standardized residuals", and "Standardized residuals" against "Fitted values"]

Figure D.3: Diagnostic Plots of Linear Mixed Effects Model (Risk by Game Mode)

## Fit: lme.formula(fixed = perf_logit_transform ~ game_risk, data = game_perf,

## random = ~1 | session)

##

## Linear Hypotheses:

## Estimate Std. Error z value Pr(>|z|)

## lhh - hhh == 0 0.029867 0.082866 0.360 0.88413

## llh - hhh == 0 0.027574 0.082035 0.336 0.88413

## lll - hhh == 0 0.256790 0.094996 2.703 0.01374 *

## llh - lhh == 0 -0.002293 0.056632 -0.040 0.96771

## lll - lhh == 0 0.226923 0.073763 3.076 0.00629 **

## lll - llh == 0 0.229216 0.072826 3.147 0.00629 **

## ---

## Signif. codes: 0 ’***’ 0.001 ’**’ 0.01 ’*’ 0.05 ’.’ 0.1 ’ ’ 1

## (Adjusted p values reported -- BH method)

## Levene’s Test for Homogeneity of Variance (center = median)

## Df F value Pr(>F)

## group 3 11.827 1.475e-07 ***

## 675

## ---

## Signif. codes: 0 ’***’ 0.001 ’**’ 0.01 ’*’ 0.05 ’.’ 0.1 ’ ’ 1


# Since power is calculated through simulation,
# this code will not run when this file is rendered.
# The output of the following code is included as an image below.

# library(lme4)

# Game mode
# lme_per_lme4 <- lmer(perf_logit_transform ~ game_mode + (1 | session), data = game_perf)
# summary(lme_per_lme4)
# power_game_mode <- simr::powerSim(lme_per_lme4)
# power_game_mode

# Risk profile
# lme_per_risk_lme4 <- lmer(perf_logit_transform ~ game_risk + (1 | session), data = game_perf)
# summary(lme_per_risk_lme4)
# power_risk <- simr::powerSim(lme_per_risk_lme4)
# power_risk


Appendix E

Ethics Approvals

E.1 Focus Groups Recruitment Flyer and Consent Form

199 APPENDIX E. ETHICS APPROVALS

Figure E.1: Focus Group Recruitment Flyer


Figure E.2: Focus Group Consent Form


Figure E.3: Focus Group Consent Form


Figure E.4: Focus Group Consent Form


E.2 Online Game

Figure E.5: Online Game Recruitment flyer


Figure E.6: Online Game Consent Form


Figure E.7: Online Game Consent Form

Bibliography

Aerts, JC, KC Clarke, and AD Keuper (2003). Testing popular visualization techniques for

representing model uncertainty. Cartography and Geographic Information Science 30(3),

249–261.

Ali, M, M Emch, C Ashley, and PK Streatfield (2001). Implementation of a medical ge-

ographic information system: concepts and uses. Journal of Health, Population and

Nutrition, 100–110.

Alpert, M and H Raiffa (1982). A progress report on the training of probability assessors.

Atkinson, PM and A Graham (2006). Issues of scale and uncertainty in the global remote

sensing of disease. Advances in parasitology 62, 79–118.

Authority, GBRMP (2010). Water Quality Guidelines for the Great Barrier Reef Marine

Park.

Beale, L, JJ Abellan, S Hodgson, and L Jarup (2008). Methodologic issues and approaches

to spatial epidemiology. Environmental health perspectives 116(8), 1105–10.

Bedford, T and R Cooke (n.d.). Probabilistic Risk Analysis: Foundations and Methods.

Begg, SH and M Welsh (2014). Uncertainty vs. Variability: What’s the Difference and

Why is it Important? (May), 19–20.

Belia, S, F Fidler, J Williams, and G Cumming (2005). Researchers misunderstand

confidence intervals and standard error bars. Psychological methods 10(4), 389.

Bellu, LG and P Liberati (2006). Inequality Analysis The Gini Index.

Benjamini, Y and Y Hochberg (1995). Controlling the false discovery rate: a practical and

powerful approach to multiple testing. Journal of the royal statistical society. Series B

(Methodological), 289–300.

Bertin, J (1981). Graphics and graphic information processing. Walter de Gruyter.

Besag, J, J York, and A Mollie (1991). Bayesian image restoration, with two applications in

spatial statistics. Annals of the Institute of Statistical Mathematics 43(1).

Blenkinsop, S, P Fisher, L Bastin, and J Wood (2000). Evaluating the perception of

uncertainty in alternative visualization strategies. Cartographica: The International Journal for

Geographic Information and Geovisualization 37(1), 1–14.

Bobkoff, D (2015). Seed by seed, acre by acre, big data is taking over the farm. Business

Insider 15.

Box, GE (1979). “Robustness in the strategy of scientific model building”. In: Robustness in

statistics. Elsevier, pp.201–236.

Brodlie, K, RA Osorio, and A Lopes (2012). Expanding the Frontiers of Visual Analytics

and Visualization. Ed. by J Dill, R Earnshaw, D Kasik, J Vince, and PC Wong, 81–109.

Brown, MB and AB Forsythe (1974). Robust tests for the equality of variances. Journal of

the American Statistical Association 69(346), 364–367.

Brown, R (2004). Animated visual vibrations as an uncertainty visualisation technique.

In: Proceedings of the 2nd international conference on Computer graphics and interactive

techniques in Australasia and South East Asia - GRAPHITE ’04. Vol. 1. 212. ACM Press,

pp.84–89.

Brugnach, M, A Dewulf, C Pahl-Wostl, and T Taillieu (2008). Toward a relational concept

of uncertainty: about knowing too little, knowing too differently, and accepting not to

know. Ecology and society 13(2).

Burgman, M (2005). Risks and decisions for conservation and environmental management.

Cambridge University Press.

Buttenfield, BP (1993). Representing data quality. Cartographica: The International Journal

for Geographic Information and Geovisualization 30(2-3), 1–7.

Buttenfield, BP and MK Beard (1991). This paper represents part of Research Initiative #7,

"Visualizing the Quality of Spatial Information", of the National Center for Geographic

Information and Analysis, supported by a grant from the National Science Foundation

(SES-88-10917); support by NSF is gratefully acknowledged.

Carroll, LN, AP Au, LT Detwiler, T chieh Fu, IS Painter, and NF Abernethy (2014). Visual-

ization and analytics tools for infectious disease epidemiology: A systematic review.

Journal of Biomedical Informatics 51, 287–298.

Choi, M, B Afzal, and B Sattler (2006). Geographic information systems: a new tool for

environmental health assessments. Public Health Nursing 23(5), 381–391.

Cleveland, WS and R McGill (1984). Graphical perception: Theory, experimentation, and

application to the development of graphical methods. Journal of the American statistical

association 79(387), 531–554.

Cliburn, DC, JJ Feddema, JR Miller, and TA Slocum (2002). Design and evaluation of a

decision support system in a water balance application. Computers & Graphics 26(6),

931–949.

Cohen, MS, JT Freeman, and S Wolf (1996). Metarecognition in time-stressed decision

making: Recognizing, critiquing, and correcting. Human Factors 38(2), 206–219.

Conover, WJ (1980). Practical nonparametric statistics.

Conover, WJ and RL Iman (1979). On multiple-comparisons procedures. Los Alamos Sci.

Lab. Tech. Rep. LA-7677-MS, 1–14.

Couclelis, H (2003). The certainty of uncertainty: GIS and the limits of geographic

knowledge. Transactions in GIS 7(2), 165–175.

Cummings, G, F Fidler, and DL Vaux (2007). Error bars in experimental biology. 177(1),

7–11.

Daniel, WW (1990). Kruskal–Wallis one-way analysis of variance by ranks. Applied

nonparametric statistics, 226–234.

Davis, TJ and CP Keller (1997). Modelling and visualizing multiple spatial uncertainties.

Computers & Geosciences 23(4), 397–408.

Deitrick, S and R Edsall (2008). Making uncertainty usable: Approaches for visualizing

uncertainty information. Geographic Visualization: Concepts, Tools and Applications, 277–

291.

Deterding, S, D Dixon, R Khaled, and L Nacke (2011). From game design elements to

gamefulness. In: Proceedings of the 15th International Academic MindTrek Conference on

Envisioning Future Media Environments - MindTrek ’11. New York, New York, USA: ACM

Press, pp.9–11. eprint: 11/09 (ACM 978-1-4503-0816-8).

DiBiase, D, AM MacEachren, JB Krygier, and C Reeves (1992). Animation and the role of

map design in scientific visualization. Cartography and geographic information systems

19(4), 201–214.

Du, N, DV Budescu, MK Shelly, and TC Omer (2011). The appeal of vague financial

forecasts. Organizational Behavior and Human Decision Processes 114(2), 179–189.

Dunn, OJ (1964). Multiple comparisons using rank sums. Technometrics 6(3), 241–252.

Eaton, C, C Plaisant, and T Drizd (2005). Visualizing missing data: classification and

empirical study. In: IFIP International Conference on Human-Computer Interaction: September

12-16 2005; Rome, Italy. Springer, pp.861–872.

Eberhardt, L (1987). Population projections from simple models. Journal of Applied Ecology,

103–118.

Eiser, JR, A Bostrom, I Burton, DM Johnston, J McClure, D Paton, J Van Der Pligt, and MP

White (2012). Risk interpretation and action: A conceptual framework for responses to

natural hazards. International Journal of Disaster Risk Reduction 1, 5–16.

Elliott, P and D Wartenberg (2004). Spatial epidemiology: current approaches and future

challenges. Environmental health perspectives 112(9), 998.

Engler, R, A Guisan, and L Rechsteiner (2004). An improved approach for predicting the

distribution of rare and endangered species from occurrence and pseudo-absence data.

Journal of applied ecology 41(2), 263–274.

Evans, BJ (1997). Dynamic display of spatial data-reliability: Does it benefit the map user?

Computers & Geosciences 23(4), 409–422.

Fairley, L, D Forman, R West, and S Manda (2008). Spatial variation in prostate cancer

survival in the Northern and Yorkshire region of England using Bayesian relative

survival smoothing. British journal of cancer 99(11), 1786.

Farrell, S (2017). Group Notetaking for User Research. https://www.nngroup.com/articles/

group-notetaking/ (visited on 07/17/2017).

Fauerbach, E, R Edsall, D Barnes, and A MacEachren (1996). Visualization of uncertainty in

meteorological forecast models. In: Proceedings of the International Symposium on Spatial

Data Handling, Delft, The Netherlands, pp.465–76.

Fischhoff, B (2012). Communicating uncertainty fulfilling the duty to inform. Issues in

Science and Technology 28(4), 63–70.

Fisher, P (1994). Animation and sound for the visualization of uncertain spatial information.

Wiley, Chichester, UK.

Frewer, L and B Salter (2002). Public attitudes, scientific advice and the politics of

regulatory policy: the case of BSE. Science and public policy 29(2), 137–145.

Funtowicz, SO and JR Ravetz (1990). Uncertainty and quality in science for policy. Vol. 15.

Springer Science & Business Media.

Gannett, H (1903). Statistical atlas. United States Census Office.

Garnelo, L, LC Brandão, and A Levino (2005). Dimensions and potentialities of the

geographic information system on indigenous health. Revista de saude publica 39(4), 634–

640.

Gavankar, S, S Anderson, and AA Keller (2015). Critical Components of Uncertainty

Communication in Life Cycle Assessments of Emerging Technologies. Journal of Industrial

Ecology 19(3), 468–479.

Gerharz, LE and EJ Pebesma (2009). Usability of interactive and non-interactive

visualisation of uncertain geospatial information. Geoinformatik 4, 223–230.

Gesteland, PH, Y Livnat, N Galli, MH Samore, and AV Gundlapalli (2012). The EpiCanvas

infectious disease weather map: an interactive visual exploration of temporal and

spatial correlations. Journal of the American Medical Informatics Association 19(6), 954–

959.

Gigerenzer, G and U Hoffrage (1995). How to improve Bayesian reasoning without

instruction: frequency formats. Psychological review 102(4), 684.

Giles, J (2002). When doubt is a sure thing. Nature 418(6897), 476–478.

Gillund, F, KA Kjølberg, MK von Krauss, and AI Myhr (2008). Do uncertainty analyses

reveal uncertainties? Using the introduction of DNA vaccines to aquaculture as a case.

Science of the total environment 407(1), 185–196.

Gini, C (1912). Variabilità e Mutuabilità. Contributo allo Studio delle Distribuzioni e delle

Relazioni Statistiche.

Gough, P, T Bednarz, C de Bérigny, and J Roberts (2016). A process for non-expert user

visualization design. In: Proceedings of the 28th Australian Conference on Computer-Human

Interaction. ACM, pp.247–251.

Gough, P, CDB Wall, and T Bednarz (2014). Affective and effective visualisation:

communicating science to non-expert users. In: Visualization Symposium (PacificVis), 2014 IEEE

Pacific. IEEE, pp.335–339.

Green, LW, L Richard, and L Potvin (1996). Ecological foundations of health promotion.

American Journal of Health Promotion 10(4), 270–281.

Gregorio, JD and JW Lee (2002). Education and income inequality: new evidence from

cross-country data. Review of income and wealth 48(3), 395–416.

Grubler, A, Y Ermoliev, and A Kryazhimskiy (2015). Coping with uncertainties-examples

of modeling approaches at IIASA. Technological Forecasting and Social Change 98, 213–

222.

Gurobi Optimization, I (2016). Gurobi Optimizer Reference Manual. http://www.gurobi.com.

Han, PKJ, WMP Klein, T Lehman, B Killam, H Massett, and AN Freedman (2011).

Communication of Uncertainty Regarding Individualized Cancer Risk Estimates: Effects

and Influential Factors. Med Decis Making 31, 354–366.

Harbage, B and AG Dean (1999). Distribution of epi info software: an evaluation using the

Internet. American journal of preventive medicine 16(4), 314–317.

Harrower, M and NP Street (2003). Representing Uncertainty: Does it Help People Make

Better Decisions? (1).

Hertwig, R and G Gigerenzer (1999). The ‘conjunction fallacy’ revisited: How intelligent

inferences look like reasoning errors. Journal of behavioral decision making 12(4), 275.

Hope, S and G Hunter (2007). Testing the effects of positional uncertainty on spatial

decision-making. International Journal of Geographical Information Science 21(6), 645–665.

Hora, SC (1996). Aleatory and epistemic uncertainty in probability elicitation with an

example from hazardous waste management. Reliability Engineering & System Safety

54(2-3), 217–223.

Hu, PJH, D Zeng, H Chen, C Larson, W Chang, C Tseng, and J Ma (2007). System for

infectious disease information sharing and analysis: design and evaluation. IEEE

Transactions on information technology in biomedicine 11(4), 483–492.

Hughes, C (1989). The representation of uncertainty in medical expert systems. Medical

Informatics 14(4), 269–279.

Hunter, GJ and MF Goodchild (1996). Communicating uncertainty in spatial databases.

Transactions in GIS 1(1), 13–24.

Johnson, C (2004). Top scientific visualization research problems. IEEE Computer Graphics

and Applications 24(4), 13–17.

Johnson, CR and AR Sanderson (2003). A next step: Visualizing errors and uncertainty.

IEEE Computer Graphics and Applications 23(5), 6–10.

Jonassen, DH (2012). Designing for decision making. Educational technology research and

development 60(2), 341–359.

Joslyn, SL and JE LeClerc (2011). Uncertainty forecasts improve weather-related decisions

and attenuate the effects of forecast error. Journal of experimental psychology: applied

18(1), 126.

Joslyn, SL and RM Nichols (2009). Probability or frequency? Expressing forecast

uncertainty in public weather forecasts. Meteorological Applications 16(3), 309–314.

Joslyn, S and S Savelli (2010). Communicating forecast uncertainty: Public perception of

weather forecast uncertainty. Meteorological Applications 17(2), 180–195.

Kahneman, D and A Tversky (1982). Variants of uncertainty. Cognition 11(2), 143–157.

Karlsson, D, J Ekberg, A Spreco, H Eriksson, and T Timpka (2013). Visualization of

infectious disease outbreaks in routine practice. In: MedInfo, pp.697–701.

Kinkeldey, C, AM MacEachren, and J Schiewe (2014). How to assess visual communication

of uncertainty? A systematic review of geospatial uncertainty visualisation user studies.

The Cartographic Journal 51(4), 372–386.

Kitzinger, J (1995). Qualitative Research: Introducing focus groups. BMJ 311(7000), 299–

302.

Knol, AB, AC Petersen, JP Van der Sluijs, and E Lebret (2009). Dealing with uncertainties

in environmental burden of disease assessment. Environmental Health 8(1), 21.

Koenig, A, E Samarasundera, and T Cheng (2011). Interactive map communication: Pilot

study of the visual perceptions and preferences of public health practitioners. Public

Health 125(8), 554–560.

Kothari, A, SM Driedger, J Bickford, J Morrison, M Sawada, ID Graham, and E Crighton

(2008). Mapping as a knowledge translation tool for Ontario Early Years Centres: views

from data analysts and managers. Implementation Science 3(1), 4.

Kraemer, MU, SI Hay, DM Pigott, DL Smith, GW Wint, and N Golding (2016). Progress

and challenges in infectious disease cartography. Trends in parasitology 32(1), 19–29.

Kruskal, WH and WA Wallis (1952). Use of Ranks in One-Criterion Variance Analysis.

Journal of the American Statistical Association 47(260), 583–621. eprint:

https://www.tandfonline.com/doi/pdf/10.1080/01621459.1952.10483441.

Krygier, JB (1994). “Sound and geographic visualization”. In: Modern cartography series.

Vol. 2. Elsevier, pp.149–166.

Kujala, H, MA Burgman, and A Moilanen (2013). Treatment of uncertainty in conservation

under climate change. Conservation Letters 6(2), 73–85.

Lai, PC, FM So, and KW Chan (2008). Spatial epidemiological approaches in disease mapping

and analysis. CRC press.

Landesberger, T von, S Bremm, and M Wunderlich (2017). Typology of Uncertainty in

Static Geolocated Graphs for Visualization. IEEE computer graphics and applications

37(5), 18–27.

Langford, M, D Unwin, and D Maguire (1990). Generating improved population density

maps in an integrated GIS. In: EGIS’90: Proceedings of the First European Conference

on Geographical Information Systems, EGIS Foundation, Utrecht, The Netherlands. Vol. 2,

pp.651–660.

Lapinski, ALS (2009). A strategy for uncertainty visualization design. Tech. rep. Defence

Research and Development Atlantic Dartmouth (Canada).

Lazo, JK, RE Morss, and JL Demuth (2009). 300 billion served: Sources, perceptions, uses,

and values of weather forecasts. Bulletin of the American Meteorological Society 90(6),

785–798.

Leitner, M and BP Buttenfield (2000). Guidelines for the display of attribute certainty.

Cartography and Geographic Information Science 27(1), 3–14.

Leitner, M and BP Buttenfield (2013). Guidelines for the Display of Attribute Certainty.

Cartography and Geographic Information Science.

Lempert, RJ, SW Popper, and SC Bankes (2003). Shaping the next one hundred years: new

methods for quantitative, long-term policy analysis. Rand, Santa Monica, CA.

Lipshitz, R and O Strauss (1997). Coping with Uncertainty: A Naturalistic Decision-Making

Analysis. Organizational Behavior and Human Decision Processes 69(2), 149–163.

MacEachren, AM (1992). Visualizing uncertain information. Cartographic Perspectives (13),

10–19.

Manski, CF (2014). Communicating uncertainty in official economic statistics. Tech. rep.

National Bureau of Economic Research.

Manski, CF (2015). Communicating uncertainty in official economic statistics: an appraisal

fifty years after Morgenstern. Journal of Economic Literature 53(3), 631–53.

McGranaghan, M (1993). A cartographic view of spatial data quality. Cartographica: The

International Journal for Geographic Information and Geovisualization 30(2-3), 8–19.

Miles, S and LJ Frewer (2003). Public perception of scientific uncertainty in relation to food

hazards. Journal of risk research 6(3), 267–283.

Mistry, PK and JS Trueblood (2017). An Investigation of Factors that Influence Resource

Allocation Decisions.

Morgan, MG and M Henrion (1990). Uncertainty: a Guide to dealing with uncertainty in

quantitative risk and policy analysis. Cambridge University Press, New York, New York,

USA.

Morgenstern, O et al. (1963). On the accuracy of economic observations. Princeton University

Press.

Morss, RE, JL Demuth, and JK Lazo (2008). Communicating uncertainty in weather

forecasts: A survey of the US public. Weather and forecasting 23(5), 974–991.

Mullner, RM, K Chung, KG Croke, and EK Mensah (2004). Introduction: geographic

information systems in public health and medicine.

Munzner, T (2009). Visualization.

Munzner, T (2014). Visualization analysis and design. CRC press.

Myers, MF, D Rogers, J Cox, A Flahault, and S Hay (2000). “Forecasting disease risk for

increased epidemic preparedness in public health”. In: Advances in Parasitology. Vol. 47.

Elsevier, pp.309–330.

Newman, TS and W Lee (2004). On visualizing uncertainty in volumetric data: techniques

and their evaluation. Journal of Visual Languages & Computing 15(6), 463–491.

Nusrat, S and S Kobourov (2016). The state of the art in cartograms. In: Computer Graphics

Forum. Vol. 35. 3. Wiley Online Library, pp.619–642.

Nykiforuk, CI and LM Flaman (2011). Geographic information systems (GIS) for health

promotion and public health: a review. Health promotion practice 12(1), 63–73.

O’Hagan, A (2012). Probabilistic uncertainty specification: Overview, elaboration

techniques and their application to a mechanistic model of carbon flux. Environmental

Modelling & Software 36, 35–48.

Olsen, SF, M Martuzzi, and P Elliott (1996). Cluster analysis and disease mapping–why,

when, and how? A step by step guide. BMJ: British Medical Journal 313(7061), 863.

Ord, JK (2010). “Spatial Autocorrelation: A Statistician’s Reflections”. In: Perspectives on

Spatial Data Analysis. Springer, pp.165–180.

Palmer, TN and PJ Hardaker (2011). Handling uncertainty in science.

Pang, AT, CM Wittenbrink, and SK Lodha (1997). Approaches to uncertainty visualization.

The Visual Computer 13(8), 370–390.

Pawson, R, G Wong, and L Owen (2011). Known knowns, known unknowns, unknown

unknowns: the predicament of evidence-based policy. American Journal of Evaluation

32(4), 518–546.

Pennello, GA, SS Devesa, and MH Gail (1999). Using a mixed effects model to estimate

geographic variation in cancer rates. Biometrics 55(3), 774–781.

Pinheiro, J, D Bates, S DebRoy, D Sarkar, and R Core Team (2017). Linear and Nonlinear

Mixed Effects Models. R package version 3.1-131.

https://CRAN.R-project.org/package=nlme.

Politi, MC, PK Han, and NF Col (2007). Communicating the uncertainty of harms and

benefits of medical interventions. Medical Decision Making 27(5), 681–695.

Potter, K, M Kirby, D Xiu, and CR Johnson (2012). Interactive visualization of probability

and cumulative density functions. International journal for uncertainty quantification 2(4).

R Core Team (2017). R: A Language and Environment for Statistical Computing. R Foundation

for Statistical Computing. Vienna, Austria. https://www.R-project.org/.

Ramirez, AJ, AC Jensen, and BH Cheng (2012). A taxonomy of uncertainty for

dynamically adaptive systems. In: Proceedings of the 7th International Symposium on Software

Engineering for Adaptive and Self-Managing Systems. IEEE Press, pp.99–108.

Regan, HM, M Colyvan, and MA Burgman (2002). A taxonomy and treatment of

uncertainty for ecology and conservation biology. Ecological Applications 12(2), 618–628.

Ristovski, G, T Preusser, HK Hahn, and L Linsen (2014). Uncertainty in medical

visualization: Towards a taxonomy. Computers & Graphics 39, 60–73.

Robinson, AC (2009). Needs assessment for the design of information synthesis visual

analytics tools. In: Information Visualisation, 2009 13th International Conference. IEEE,

pp.353–360.

Robinson, AC, AM MacEachren, and RE Roth (2011). Designing a web-based learning

portal for geographic visualization and analysis in public health. Health informatics

journal 17(3), 191–208.

Robinson, T (2000). Spatial statistics and geographical information systems in

epidemiology and public health. Advances in Parasitology 47, 81–128.

Rosa Dias, P (2009). Inequality of opportunity in health: evidence from a UK cohort study.

Health Economics 18(9), 1057–1074.

Sabesan, S and K Raju (2005). GIS for rural health and sustainable development in India,

with special reference to vector-borne diseases. Current Science 88(11), 1749–1752.

Sanyal, J, S Zhang, J Dyer, A Mercer, P Amburn, and RJ Moorhead (2010). Noodles:

A Tool for Visualization of Numerical Weather Model Ensemble Uncertainty. IEEE

Transactions on Visualization and Computer Graphics 16(6), 1421–1430.

Savelli, S and S Joslyn (2013). The advantages of predictive interval forecasts for non-expert

users and the impact of visualizations. Applied Cognitive Psychology 27(4), 527–541.

Scheufele, DA and BV Lewenstein (2005). The public and nanotechnology: How citizens

make sense of emerging technologies. Journal of Nanoparticle Research 7(6), 659–667.

Schneider, SH and R Moss (1999). Uncertainties in the IPCC TAR: Recommendations to

lead authors for more consistent assessment and reporting. Unpublished document.

Schneiderman, B, C Plaisant, and B Hesse (2013). Improving health and healthcare with

interactive visualization methods. HCIL Technical Report 1, 1–13.

Schrage, M (2016). How the big data explosion has changed decision making. Harvard

Business Review.

Scupin, R (1997). The KJ method: A technique for analyzing data derived from Japanese

ethnology. Human organization 56(2), 233–237.

Shaffer, JP (1995). Multiple hypothesis testing. Annual Review of Psychology 46,

561.

Siegel, S (1956). Nonparametric statistics for the behavioral sciences. New York.

Skinner, DJ, SA Rocks, SJ Pollard, and GH Drew (2013). Identifying Uncertainty in

Environmental Risk Assessments: The Development of a Novel Typology and Its Implications

for Risk Characterisation. Human and Ecological Risk Assessment: An International Journal

7039(November), 130301143601004.

Skinner, DJ, SA Rocks, and SJ Pollard (2016). Where do uncertainties reside within

environmental risk assessments? Expert opinion on uncertainty distributions for pesticide

risks to surface water organisms. Science of the Total Environment 572, 23–33.

Slocum, TA, RM McMaster, FC Kessler, HH Howard, and RB Mc Master (2008). Thematic

cartography and geographic visualization.

Soll, JB and J Klayman (2004). Overconfidence in interval estimates. Journal of Experimental

Psychology: Learning, Memory, and Cognition 30(2), 299.

Spiegelhalter, D, M Pearson, and I Short (2011). Visualizing uncertainty about the future.

Science 333(6048), 1393–1400.

Tatem, AJ, N Campiz, PW Gething, RW Snow, and C Linard (2011). The effects of spatial

population dataset choice on estimates of population at risk of disease. Population

Health Metrics 9(1), 4.

Thomson, J, E Hetzler, A MacEachren, M Gahegan, and M Pavel (2005). A typology for

visualizing uncertainty. In: ed. by RF Erbacher, JC Roberts, MT Grohn, and K Borner.

International Society for Optics and Photonics, pp.146.

Thunnissen, DP (2003). Uncertainty classification for the design and development of

complex systems. In: 3rd annual predictive methods conference, pp.1–16.

Tufte, ER (1983). The Visual Display of Quantitative Information.

Tufte, ER and D Robins (1997). Visual explanations. Graphics Press, Cheshire, CT.

Tukey, JW (1949). Comparing individual means in the analysis of variance. Biometrics,

99–114.

Uusitalo, L, A Lehikoinen, I Helle, and K Myrberg (2015). An overview of methods

to evaluate uncertainty of deterministic models in decision support. Environmental

Modelling & Software 63, 24–31.

Van der Wel, FJ, RM Hootsmans, and F Ormeling (1994). “Visualization of data quality”.

In: Modern Cartography Series. Vol. 2. Elsevier, pp.313–331.

Walker, W, P Harremoës, J Rotmans, J van der Sluijs, M van Asselt, P Janssen, and M

Krayer von Krauss (2003). Defining Uncertainty: A Conceptual Basis for Uncertainty

Management in Model-Based Decision Support. Integrated Assessment 4(1), 5–17.

Wernerfelt, B and A Karnani (1987). Competitive strategy under uncertainty. Strategic

Management Journal 8(2), 187.

Wolfert, S, L Ge, C Verdouw, and MJ Bogaardt (2017). Big data in smart farming–a review.

Agricultural Systems 153, 69–80.

Yaniv, I and DP Foster (1995). Graininess of judgment under uncertainty: An

accuracy-informativeness trade-off. Journal of Experimental Psychology: General 124(4), 424.

Yin, S and O Kaynak (2015). Big data for modern industry: challenges and trends [point of

view]. Proceedings of the IEEE 103(2), 143–146.

Zeileis, A (2014). ineq: Measuring Inequality, Concentration, and Poverty. R package version

0.2-13. https://CRAN.R-project.org/package=ineq.

Zinszer, K, C Jauvin, A Verma, L Bedard, R Allard, K Schwartzman, L de Montigny,

K Charland, and DL Buckeridge (2010). Residential address errors in public health

surveillance data: a description and analysis of the impact on geocoding. Spatial and

Spatio-temporal Epidemiology 1(2-3), 163–168.

Zuk, T, MST Carpendale, and WD Glanzman (2005). Visualizing Temporal Uncertainty in

3D Virtual Reconstructions. In: VAST. Vol. 2005, pp.6th.
