Communication of Statistical Uncertainty to Non-expert Audiences
A thesis submitted for the degree of
Master of Philosophy (Mathematics)
by
Jessie Roberts
B.Sc. Biotechnology Innovation, Queensland University of Technology
School of Mathematical Sciences
Science and Engineering Faculty
Queensland University of Technology
Australia
2019
Contents
Acknowledgements
Declaration
Abstract
1 Introduction
1.1 Aims and objectives
1.2 The Australian Cancer Atlas
1.3 Research contributions
1.4 Thesis structure
2 Literature Review
2.1 Uncertainty
2.2 Why is Uncertainty Information important to decision-makers?
2.3 Communicating Statistical Uncertainty
2.4 Uncertainty communication
2.5 Uncertainty Communication design
2.6 Spatial epidemiology and disease mapping
3 Research Activity 1.A: Grey literature review of internet published cancer maps
3.1 Introduction
3.2 Aim and Research Question
3.3 Methods
3.4 Summary findings
3.5 Implications for the Australian Cancer Atlas
3.6 Conclusion
4 Research Activity 1.B: User centred uncertainty communication design (Australian Cancer Atlas as a case study)
4.1 Introduction
4.2 Methods
4.3 Results
4.4 Discussion & insights for the Australian Cancer Atlas
4.5 Conclusion
5 Research Activity 2: User study - Uncertainty representation in an online game
5.1 Introduction
5.2 Aim
5.3 Online Simulation - Impact of uncertainty communication methods on decision making
5.4 Methods
5.5 Results
5.6 Discussion
6 Discussion
6.1 Uncertainty communication design
6.2 Testing uncertainty representation methods
6.3 Critique & limitations
6.4 Future Work
6.5 Conclusion
A Appendix: Literature Review
A.1 Uncertainty Representation in Mapping & GISciences
B Appendix: Research Activity 1.A - Grey Literature Review
B.1 Search Protocol
Pre-Scoping
Search Details
B.2 Database of identified cancer atlases
C Research Activity 1.B - User-centred design for uncertainty communication
C.1 Project Partners Workshop
C.2 NEUVis Audience Profiles
D Appendix: Online Game
D.1 Untransformed Performance Data
D.2 Logit(PR) by game mode: LME output and diagnostic plots
D.3 logit(PR) by risk profile - LME model output and diagnostic plots
E Ethics Approvals
E.1 Focus Groups Recruitment Flyer and Consent Form
E.2 Online Game
Bibliography
Acknowledgements
I would like to acknowledge the many people who have contributed to the following thesis and, more importantly, to the development of my academic research skills and competencies.
I express my appreciation and thanks to my supervisors Distinguished Professor Kerrie Mengersen & Dr Kate Helmstedt for their guidance and support. Further to this I would like to specifically acknowledge the contribution and support of the following people:
1. Phil Gough - with whom I collaborated closely to design, develop and implement
the online game (Chapter 4), as well as on the design, data collection and data
analysis of the Cancer Atlas focus groups (Chapter 3).
2. The Cancer Council QLD and specifically Peter Baade and Susanna Cramb - I am very
grateful for their guidance and feedback along the journey and their collaboration
for the work detailed in Chapter 3, including financial support to conduct the cancer
mapping focus groups.
3. Nicholas Dendle - who through his VRES summer project of 2016/17 translated a
paper version of the pirate game to a functioning online game (Chapter 4).
4. Matt Sutton - who implemented the optimisation solution needed for part of the
analysis of the online game data (Chapter 4).
5. The QUT ACEMS and BRAG community for their many coffees, lunches, board
games and generous feedback.
6. Shannon Ryan - for editing and proof-reading services.
7. Lawrence Jones for his support and encouragement.
Declaration
I hereby declare that this thesis contains no material which has been accepted for the award of any other degree or diploma in any university or equivalent institution, and that, to the best of my knowledge and belief, this thesis contains no material previously published or written by another person, except where due reference is made in the text of the thesis.
QUT Verified Signature
Jessie Roberts
June 2019
Abstract
Introduction
Statistical uncertainty is present in modelled and data derived information and plays an important role in how the outputs of scientific and quantitative analyses are interpreted and used by decision makers, many of whom are non-experts. Despite the importance of uncertainty information, there remain impediments to successfully communicating this information to the non-expert audience. Two recognised impediments are the lack of standardised uncertainty representation methods, and the lack of methods and guidelines for communicating uncertainty in a way that is accessible to non-expert audiences. I contribute to both of these in this thesis. A significant motivation of this thesis is to inform the design of an Australian national cancer atlas, which is used as a case study within this research.
Aim and Objectives
This research aims to contribute to a growing body of literature on uncertainty communication. It achieves this aim through four distinct objectives: 1 - provide a greater understanding of the current methods used to visualise uncertainty in disease mapping; 2 - investigate a tool for systematically identifying the sources of uncertainty within a multidisciplinary research project; 3 - explore a framework for including uncertainty within the design of scientific communication material; and 4 - investigate the impact of three different uncertainty visualisation methods on decision making.
Methods
The first, second and third objectives are connected to the Australian Cancer Atlas, which is used as a case study throughout this thesis. Objective 1 is achieved through a grey literature review of currently available cancer maps. Objective 2 is achieved by exploring the use of an uncertainty taxonomy (developed by Morgan and Henrion (1990)) as a tool to diagnose uncertainty sources within the atlas. Objective 3 explores the NEUVis design framework (Gough, Wall, and Bednarz, 2014) for its effectiveness in uncertainty communication design, including the use of focus groups to understand how people interpret uncertainty within cancer mapping. The fourth and final objective is achieved through a quantitative user study, which uses an online game to evaluate whether players’ behaviour differs depending on whether uncertainty is represented as the upper and lower bounds of an interval, the semantic bounds of an interval, a point estimate ± error, or a point estimate (no uncertainty). Representation methods are evaluated in terms of the players’ ability to maximise their potential reward (performance), as well as how players distribute available resources (risk-averse behaviour).
Findings/Results
1. The grey literature review identified 33 publicly available cancer maps, of which
13 contained uncertainty information. Many different approaches were used to
represent uncertainty, with no common approach among them. Uncertainty was more
common in maps that contained interactivity.
2. Morgan and Henrion (1990)’s taxonomy of uncertainty, in its current form, did not
prove to be a useful tool for diagnosing sources of uncertainty within the Australian
Cancer Atlas. The technical detail of the taxonomy and lack of connection to the
project or the target audiences hindered its applicability.
3. The extension of the NEUVis design framework to include uncertainty information
worked well to navigate the uncertainty communication design process and
supported a cross-disciplinary team to identify audiences and their needs. The design
tool provided a platform for mapping audience needs to uncertainty information
within the Australian Cancer Atlas.
4. The user study evaluating the effect of different uncertainty representation methods
on user behaviour showed that, in terms of maximising potential reward (performance),
players performed better in the online game when a point estimate or a
point estimate ± error was used, compared to when an interval was used. The
inclusion of the ± error did not have a statistically significant effect on performance:
the point estimate with error and the point estimate without error performed similarly.
In terms of resource spreading, or risk-averse behaviour, when uncertainty
was represented as a semantic interval there was greater diversity in how players
distributed their available resources. They were more likely to spread their resources
across the available options, but overall their behaviour was less consistent.
Conclusion
Uncertainty communication is a difficult challenge. It is not technically a statistical problem, but it is an important problem for statistics. Addressing this challenge requires: quantitative user studies to evaluate the effectiveness of uncertainty representation methods; qualitative investigations into how users interpret uncertainty and navigate communication material, as well as what their needs are in accessing this information; and finally, a framework that supports cross-disciplinary teams to bring the quantitative and qualitative methods together and embed uncertainty into the communication design process. Through a grey literature review, extending the NEUVis communication design framework to include uncertainty (including focus groups), testing an uncertainty diagnostic tool and conducting user testing, this thesis has contributed to all three of these challenges. Further work is required to develop a tool that supports the identification and prioritisation of uncertainty sources within a project.
Chapter 1
Introduction
Uncertainty is pervasive in every choice we make, and knowledge of its presence and nature can clarify the applicability of information to real world decisions. Despite its pervasiveness and value to decision making, uncertainty is under-represented in the communication of scientific information to non-expert audiences, who lack prior knowledge of statistics and/or data science (Fischhoff (2012); Frewer and Salter (2002); Schneider and Moss (1999); Manski (2014)).
The term uncertainty has different definitions across domains. These include: linguistic imprecision, which arises when one term can be interpreted in several ways, or its meaning is ambiguous; human decision or behavioural uncertainty, which is present in the unknowns about the worldviews, objectives and future actions of stakeholders; and statistical or scientific uncertainty, which is uncertainty about facts, for example the uncertainty around tomorrow’s weather forecast (Kujala, Burgman, and Moilanen, 2013). Each of these types of uncertainty is defined further in the Literature Review (Chapter 2).
This thesis focuses specifically on the communication of statistical uncertainty, which is the uncertainty related to empirical values: quantities that can be measured to some level of accuracy and precision (e.g. birth weight, height, rainfall, cancer incidence, etc.) (Begg and Welsh (2014), Kujala, Burgman, and Moilanen (2013)). Statistical uncertainty is represented through a range of methods, including: uncertainty intervals, point estimates ± error, semantic versions of uncertainty, standard deviation, standard error, error bars and probabilities. The clear communication of scientific uncertainty is a critical and urgent challenge for statistics, but also for science more broadly (Palmer and Hardaker, 2011).
There is an increasing number of non-expert decision makers demanding data driven insights from the recently popularised big data and machine learning ‘revolutions’ (Olsen, Martuzzi, and Elliott (1996); Harbage and Dean (1999)). However, without the skills to interrogate the uncertainty of these data derived and modelled insights, the non-expert audience is in danger of misinterpreting them.
The challenge of communicating statistical uncertainty is relevant across all domains of science (Palmer and Hardaker, 2011), particularly if research outputs and data derived or modelled estimates are to be adopted in real world decisions and planning for industry, business and government. This type of data derived information is already being used within decision making across domains, for example by farmers who collect data from equipment and environmental sensors to guide their farm management, pest control and forecasting decisions (Bobkoff (2015); Wolfert et al. (2017)), or by brand managers who use social media analytics to plan marketing campaigns (Schrage, 2016). If this information is perceived as more accurate and reliable than the data or methods warrant, decision-makers may be misguided in their planning. Important sectors of our society, such as healthcare, education, infrastructure planning and environmental regulation, utilise data and modelled insights in their decision making, and it is important that the uncertainties within this information are understood and considered.
Acknowledging uncertainty is not only important to decision makers, but is also important to the public’s perception of science (Frewer and Salter (2002); Gavankar, Anderson, and Keller (2015)). The researcher’s role is to push the boundaries of current knowledge, and this means exploring unknowns. The inevitable outcome of exploring these unknowns is discrepancies or disagreements between independent studies, an essential step in the development of knowledge and the scientific process. Neglecting to inform the non-expert of the uncertainty in study outputs can lead to a mistrust in science. Independent studies can produce different results and, if perceived as contradictory, these results erode the public’s confidence in the ‘scientist’, even though disagreement between scientific peers is a natural part of emerging knowledge. A prime example of this is the climate change debate, which is in part fuelled by a lack of acknowledgement and handling of uncertainty (Giles, 2002). Communicating uncertainty to the non-expert enables different research outputs to be seen as developments of knowledge, rather than contradictory findings which invalidate each other, create mistrust and fuel scepticism (Giles, 2002).
Uncertainty information is one of the key pieces of information used by the scientific community to evaluate the rigour of study outputs. Without an understanding of uncertainty, or at least a respect for its presence, the public audience does not have the tools to make sense of differing research outputs.
Furthermore, science is facing a greater presence of uncertainty both in the questions it targets and the tools it uses for investigation. This is not unexpected as science continues to explore increasingly complex systems and relationships, and as developments in computational capacity have enabled an increased uptake of simulation and stochastic methods, which explicitly use uncertainty to explain the scientific output. Uncertainty is an inevitable piece of metadata and is inseparable from study outputs. Even if it is not in a form that the decision maker can access, uncertainty is an important part of research outputs, and not communicating uncertainty to the end-user is neglecting to report the full story. It is the responsibility of the entire science community to ensure scientific outputs and statistical information are communicated effectively, in order to build a greater appreciation of the uncertainty present in any and all frontier/leading research and to ensure leaders and decision makers can use information appropriately to make decisions for our future.
Uncertainty communication is a difficult challenge to address. Buttenfield (1993) proposes there are three impediments to this challenge: 1. standardised terminology, 2. methods for uncertainty representation and visualisation, and 3. methods for communicating these representations in a meaningful way for the audience (i.e. uncertainty communication).
Impediment 1 is important to address; however, it is not the focus of this thesis. This thesis aims to contribute to addressing impediments 2 and 3.
Much work has been done to address impediment 2. However, while there is a growing body of literature and user studies on uncertainty representation and visualisation, there remains a lack of agreement and many unknowns about how the non-expert understands these uncertainty representation methods (Kinkeldey, MacEachren, and Schiewe, 2014), and which are most effective. This thesis contributes to this growing body of research through a user study which investigates how decision making is influenced by different uncertainty representation methods (Chapter 5: Research Activity 2).
Much less work has been done to address impediment 3, uncertainty communication. The literature lacks examples, frameworks, case studies or guidelines for integrating uncertainty representation methods to achieve effective uncertainty communication. An acknowledgement of the importance of the users’ needs in creating effective uncertainty communications is emerging across a range of domains (Davis and Keller (1997); Carroll et al. (2014); Sanyal et al. (2010)). This thesis considers how the users’ needs can be used in uncertainty communication design. Through the use of the Australian Cancer Atlas (ACA) as a case study, this thesis aims to contribute to addressing this final impediment and does so through three activities: an evaluation of current practices of uncertainty communication in cancer mapping (Chapter 3: Research Activity 1.A); testing a potential tool for diagnosing uncertainty sources (Chapter 4: Research Activity 1.B); and extending the NEUVis science communication design framework to include uncertainty, place the user at the centre of the design process, and understand the users’ needs in the context of the ACA and its related uncertainty information (including conducting focus groups to validate user needs) (Chapter 4: Research Activity 1.B).
This thesis uses both quantitative and qualitative methods to contribute to impediments 2 and 3 outlined by Buttenfield (1993), i.e. 2. methods for uncertainty representation and visualisation, and 3. methods for communicating these representations in a meaningful way for the audience (uncertainty communication). Firstly, in Research Activities 1.A and 1.B, I use qualitative methods to explore the third impediment (uncertainty communication). This first research activity is motivated by the Australian Cancer Atlas (ACA), which I use as a case study. Within this research activity I: conducted a grey literature review of cancer maps available on the internet between January 2010 and June 2016; conducted a project participant workshop to identify target audiences and diagnose the uncertainty sources within the ACA; applied a user centred design framework (NEUVis) to build audience profiles that explicitly consider the impact of uncertainty information, and mapped the users’ needs and current behaviour; and conducted focus groups with target audiences to validate the user profiles, including their needs and understanding of uncertainty in cancer mapping. Secondly, within Research Activity 2 (uncertainty representation - impediment 2), I conduct a quantitative user study to evaluate the effect of common uncertainty representation methods on the decision-making behaviour of the non-expert audience. This component utilises an online game that I co-developed. In the game, players are presented with a risk of failure and a potential reward if successful. Players must allocate limited resources to reduce the risk of failure and maximise their potential reward. I use the game to compare the players’ behaviour across four different uncertainty representation methods. In addition, I explore the influence of risk level and uncertainty level on players’ behaviour.
This research explores how to communicate statistical uncertainty to non-expert audiences. I contribute to the last two items on Buttenfield’s (1993) list of impediments. The successful communication of statistical uncertainty to non-expert audiences is an important challenge for the statistical sciences as well as science more broadly, and is essential to ensuring that scientific outputs are used appropriately in decision making in the ‘real world’. In this research I contribute to a growing body of research on uncertainty representation by investigating how uncertainty representation methods influence audience behaviour (Research Activity 2). In addition, I present a case study for uncertainty communication design which identifies uncertainty sources and places the user’s needs at the centre of the communication design process (Research Activity 1.A and 1.B).
1.1 Aims and objectives
1.1.1 Aim:
The overall aim of this research is to explore how to communicate statistical uncertainty to non-expert audiences.
1.1.2 Objectives:
I have focused on the following objectives within the field of uncertainty communication:
1. Apply an uncertainty diagnostic framework, user centred design approaches, and
focus groups to inform uncertainty communication design.
2. Investigate the relationship between commonly used uncertainty representation
methods and audience behaviour.
1.1.3 Research Plan:
To achieve my research objectives, I use both quantitative and qualitative methods. Qualitative workshops, design tools and focus groups place the user at the centre of the design process, and explore an uncertainty diagnostic tool. These are applied through the design of a national cancer atlas as a case study for user centred design for uncertainty communication. Quantitative methods are deployed through a user study, which explores the non-experts’ decision making when presented with different uncertainty representation methods. The user-study contributes to a growing body of knowledge exploring how the common uncertainty representation methods influence behaviour.
These objectives will be achieved through the following research plan.
Objective 1: Use the Australian Cancer Atlas as a case study in the application of the following research activities:
• Investigate and summarise the current practices for generating and visualising
cancer maps.
• Explore the use of Morgan & Henrion’s (1990) taxonomy of uncertainty as a tool for
uncertainty diagnosis.
• Explore the application of user centred design (NEUVis design framework) for
informing uncertainty communication design.
• Conduct a range of focus groups to understand the target audiences of a national
cancer atlas, including identifying audience needs, current behaviours and current
understanding of existing uncertainty communication methods in cancer maps.
Objective 2:
• Design and build an online game to quantitatively investigate the relationship
between behaviour and different uncertainty representation methods, including:
uncertainty intervals, point estimates ± error and semantic versions of uncertainty.
1.2 The Australian Cancer Atlas
The Australian Cancer Atlas (ACA) is an Australian-first research study aimed at understanding national patterns in cancer incidence, survival and screening practices based on where people live. The development of the Atlas will allow health agencies, policy makers and the community to understand the location and resource requirements for the most common cancers in Australia. The outcomes of this thesis aim to inform the design of communication material that clearly and efficiently communicates the insights from the Australian Cancer Atlas project to a wide range of audiences.
The Australian Cancer Atlas brings together data from Australia’s state cancer registries and utilises cutting edge spatial statistical methodologies to investigate spatial variation in key cancer indicators. The project is a partnership between Cancer Council Queensland, the Queensland University of Technology, the Australian Institute of Health and Welfare and the Australia and New Zealand Cooperative Research Centre for Spatial Information (CRCSI).
The outputs from the Australian Cancer Atlas project will be communicated to a wide range of audiences, including policy makers and health managers, cancer patients, the general public and the media. Inclusion of statistical uncertainty within these cancer maps is a key challenge within the development of the published atlas.
1.3 Research contributions
This research provides a unique case study demonstrating the use of the NEUVis design framework for uncertainty communication design, as well as the use of a taxonomy of uncertainty sources for diagnosing and mapping uncertainty sources with project stakeholders. This research provides a case study for the use of design thinking tools and focus groups in uncertainty communication design. In addition, this research contributes to a growing body of user studies which investigate uncertainty representation methods, and makes a unique contribution by exploring how commonly used methods for representing measures of uncertainty influence behaviour.
1.4 Thesis structure
This thesis has three main sections: a Literature Review; Research Activity 1 - user centred design of uncertainty communication in cancer mapping; and Research Activity 2 - User study: Uncertainty representation in an online game.
The literature review summarises the current peer reviewed literature around uncertainty classification, uncertainty sources, uncertainty representation and communication methods, user centred design for science communication to the non-expert audience and spatial epidemiology.
The first research activity of this thesis is further broken into two sections and explores uncertainty communication in the context of the Australian Cancer Atlas. Firstly, Research Activity 1.A investigates current practices in cancer map visualisation and cancer mapping through a grey literature review. Secondly, Research Activity 1.B uses qualitative methods, including an uncertainty diagnosis tool, design thinking tools and audience focus groups, to explore the users’ needs and current information seeking behaviour. Research Activity 1 concludes with a summary of design insights developed through this process, which will inform the design of the Australian Cancer Atlas.
The second part of this thesis, Research Activity 2, uses an online game to investigate the impact of uncertainty representation methods on players’ behaviour. The online game places the player as the quartermaster on a pirate ship preparing to attack three on-coming ships and steal their treasure. Each on-coming ship has: an estimated risk of defeat, a measure of the uncertainty around the estimate (high, medium or low), and a potential reward. The player has limited resources, which they must allocate to reduce their risk of defeat, thus making trade-offs and decisions between risk, uncertainty and reward within the options available. The collected interactions from the game were analysed to investigate changes in allocation behaviour and performance across game mode, risk level and uncertainty level.
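To make the idea of a game mode concrete, the following is a minimal illustrative sketch (not the game's actual implementation) of how one ship's risk of defeat could be displayed under the four representation methods compared in this study: interval bounds, semantic interval bounds, a point estimate ± error, and a point estimate alone. The class, function and mode names, the example values and the thresholds behind the words "low", "medium" and "high" are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Ship:
    """One on-coming ship. Field names and values are hypothetical."""
    risk: float   # estimated risk of defeat, as a proportion between 0 and 1
    error: float  # half-width of the uncertainty around the estimate
    reward: int   # potential reward if the attack succeeds


def semantic_label(p: float) -> str:
    """Map a proportion to a word. The cut-offs are illustrative assumptions."""
    if p < 0.33:
        return "low"
    if p < 0.66:
        return "medium"
    return "high"


def describe_risk(ship: Ship, mode: str) -> str:
    """Return the text a player might see for one ship in a given game mode."""
    lower = max(0.0, ship.risk - ship.error)
    upper = min(1.0, ship.risk + ship.error)
    if mode == "interval":
        return f"{lower:.0%} to {upper:.0%}"
    if mode == "semantic_interval":
        return f"{semantic_label(lower)} to {semantic_label(upper)}"
    if mode == "point_error":
        return f"{ship.risk:.0%} ± {ship.error:.0%}"
    if mode == "point":
        return f"{ship.risk:.0%}"
    raise ValueError(f"unknown mode: {mode}")


ship = Ship(risk=0.40, error=0.10, reward=500)
for mode in ("interval", "semantic_interval", "point_error", "point"):
    print(f"{mode}: {describe_risk(ship, mode)}")
```

The same underlying estimate and uncertainty feed every mode, so in this sketch any difference in player behaviour between modes would be attributable to the representation rather than to the information itself.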
Chapter 2
Literature Review
Uncertainty is present in all aspects of life, and influences decisions in government, industry and the personal lives of individuals. In the quantitative sciences, uncertainty is a valuable piece of metadata that can guide the decision maker about how the available information should be used to inform decisions (Davis and Keller, 1997). As more decision makers without training in quantitative methods are using data driven insights, it is important to develop methods that enable these new audiences to access the uncertainty information associated with these insights. Developing effective methods for uncertainty communication is a critical challenge that is important to both statistics and science more broadly. This review begins with an outline of the different types of uncertainty and focuses this thesis on statistical uncertainty (Section 2.1). Section 2.2 then discusses why uncertainty communication is important to the decision maker. Section 2.3 presents the current literature on uncertainty communication and highlights the challenges or gaps that still exist in this area. Finally, Section 2.4 gives an overview of spatial epidemiology and disease mapping, including a summary of the literature on uncertainty communication specific to this domain. This final section is included in order to support the use of the Australian Cancer Atlas as a case study in this thesis.
The non-expert audience
This research project is focused on the non-expert audience, which we define as users who do not have training in statistical methods or quantitative science (Gough, Wall, and Bednarz, 2014). Despite their lack of training in quantitative methods, the non-expert audience still consumes scientific and modelled insights on a daily basis to make decisions, allocate resources and plan for the future. These decision makers are found throughout our communities. Government and policy-makers use modelled information and data to inform policy, for example, the management of irrigation allocations in the Murray-Darling Basin Plan¹, setting water quality guidelines for the Great Barrier Reef (Authority, 2010), or allocating resources for remote cancer patients to travel to the city to receive treatment. Beyond government, industry also demands access to insights from the ‘big data revolution’ to guide decision making, such as targeting marketing campaigns, evaluating technology safety and developing economic forecasts (Yin and Kaynak, 2015). Finally, the general public also use quantitative information, such as when planning their daily commute based on weather forecasts, making lifestyle choices based on the latest health research, or making investment decisions guided by economic or financial data. While these audiences are not trained in quantitative methods, statistical and modelled insights and information can still help them make more informed decisions.
I make a special note that uncertainty communication is also important to the expert audience. There remains much work to be done in order to standardise methods for including uncertainty in science communication and visualisation within the research community, particularly when communicating across domains, as terms to describe uncertainty can be used differently (Cummings, Fidler, and Vaux, 2007). However, when the intended audience understands the modelling process, the communication challenge is different. Within this thesis we focus on the non-expert audience.
2.1 Uncertainty
The term uncertainty is used to refer to a range of concepts. These include: different views, imprecision, error, subjectivity, non-specificity, a lack of knowledge, or a state of being (Aerts, Clarke, and Keuper (2003); Pang, Wittenbrink, and Lodha (1997); Deitrick and Edsall (2008); Thomson et al. (2005); Han et al. (2011)). While there is no single agreed classification for the different types of uncertainty, Kujala, Burgman, and Moilanen (2013) define three broad categories: linguistic imprecision, human decision/behavioural uncertainty, and scientific uncertainty. Han et al. (2011) add more detail to the last category, scientific uncertainty, by defining two subcategories: aleatoric uncertainty and epistemic uncertainty. Each of these is defined below.
¹ https://www.mdba.gov.au/basin-plan/developing-basin-plan/science-behind-basin-plan
Linguistic imprecision refers to occasions where one term can be interpreted in several ways, or its meaning is ambiguous (Kujala, Burgman, and Moilanen, 2013). For example, the meaning of the word ‘love’ in the sentence “I love my partner” compared to “I love chocolate” is similar but undeniably different. “Love” in these two contexts is not the same. Human decision or behavioural uncertainty is defined as the uncertainty about the worldviews, objectives and future behaviours of a stakeholder (Kujala, Burgman, and Moilanen, 2013). For example, the future policies of the US President are unknown. Finally, scientific (aka statistical) uncertainty relates to empirical quantities that can be measured to some level of accuracy and precision, e.g. birth weight, height, rainfall or cancer incidence (Begg and Welsh, 2014; Kujala, Burgman, and Moilanen, 2013).
Scientific/statistical uncertainty includes two subcategories: aleatoric and epistemic uncertainty (Han et al., 2011). Aleatoric uncertainties are a result of the underlying randomness within the model, or the processes that are being modelled. They cannot be reduced by gathering more information or additional data, and are a result of the fundamental, irreducible randomness of natural events (Hora, 1996). Epistemic uncertainties, on the other hand, are presumed to be due to a lack of knowledge, and can be reduced by gathering more data or refining the model (Han et al., 2011).
In this thesis, statistical and scientific uncertainty are considered interchangeable, and I use statistical uncertainty from here onwards in order to be consistent. Future reference to uncertainty refers to statistical uncertainty unless otherwise specifically stated.
2.1.1 Focus of this Study: Statistical Uncertainty within a modelling context
This thesis focuses on statistical uncertainty, and does not consider behavioural and linguistic uncertainties any further. Specifically, in this thesis, I focus on the communication of statistical uncertainty within a modelling context.
Modelling is a simplified, mathematically formalised approach to approximating reality. Models are used to estimate or predict a future event (e.g. tomorrow’s weather). Statistical uncertainty arises when some relevant information about the phenomenon of interest is not definite, known or reliable (Thunnissen, 2003). Every model includes uncertainty, as it is not feasible or realistic to accurately identify and measure all the system processes and model parameters, or to have 100% accurate data.
Before progressing to the next section, it is important to clarify a few terms that can be confused within the uncertainty literature. These are: risk versus uncertainty, and variability (a.k.a. natural variation) versus uncertainty. I also note that in this research I do not consider uncertainty in the context of quantum mechanics, as defined by Heisenberg’s uncertainty principle (Hughes, 1989), but focus on uncertainty as related to empirical quantities, where empirical quantities are properties of the real world that can, in principle, be measured to some level of accuracy (e.g. birth weight, height, rainfall, cancer incidence) (Begg and Welsh, 2014).
Risk vs Uncertainty
In general terms, uncertainty refers to a future event or available information that is not known with certainty. Uncertainty does not imply any positive or negative outcome, simply that the outcome is not known or not knowable. Risk, in contrast, is the possible negative outcome of an uncertain situation or event (Bedford and Cooke, n.d.). It is important to note that these terms are related but different.
Variability vs Uncertainty
Variability, also known as natural variation, is one of numerous sources of uncertainty. It can be used as a measure of uncertainty, but it is not the only source or measure of uncertainty. Variability arises when multiple measurements of an event or phenomenon are observed, and is a natural feature of a dataset. Variability describes one characteristic of a population. Take, for example, the birth weight of babies born in Australia in 2015. The weight of each baby (observation) is different, and the range from the smallest to the largest baby describes the normal weight range for a newborn. Investing in better scales, or collecting more observations, may improve the data and produce a more accurate mean, but there will always be a range, because newborn babies are not all the same. This range is due to natural variation (Morgan and Henrion, 1990).
In contrast, if I wanted to predict the birth weight of your next child, I could use the 2015 data to make a good guess, but I could only give a prediction range; I could not predict a specific weight with 100% accuracy. The birth weight of your next child is uncertain because the event has not yet occurred. I don’t know where, within the prediction range, your child will fall. Natural variation/variability is a source of uncertainty, and is sometimes used as a measure of uncertainty, but the terms cannot be used interchangeably.
Much of the confusion between uncertainty and variability arises due to the fact that distributions are used to describe both. Variability is quantified using a frequency distribution derived from measurements or observations (i.e., data) and is a feature of an observed population. Uncertainties are often quantified using probability distributions or probabilities and are used to describe an unobserved or unobservable population. The confusion is often compounded by the fact that frequency distributions, derived from an observed population, are often used to inform probability distributions.
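The distinction between variability and uncertainty can be illustrated numerically. The following is a minimal sketch and not part of the analysis in this thesis: it draws hypothetical birth weights from an assumed normal distribution (a mean of 3400 g and a standard deviation of 500 g are illustrative assumptions, not estimates from Australian data) and compares a small and a large sample. The function name `summarise` is introduced here for illustration only.

```python
from statistics import NormalDist, mean, stdev

# Hypothetical population of birth weights in grams (assumed values, illustration only).
ASSUMED_POPULATION = NormalDist(mu=3400, sigma=500)


def summarise(sample):
    """Return the sample mean, the sample standard deviation (variability)
    and the standard error of the mean (uncertainty about the mean)."""
    n = len(sample)
    spread = stdev(sample)
    return mean(sample), spread, spread / n ** 0.5


small = ASSUMED_POPULATION.samples(100, seed=1)
large = ASSUMED_POPULATION.samples(10_000, seed=1)

for label, sample in (("n = 100", small), ("n = 10,000", large)):
    centre, spread, std_error = summarise(sample)
    print(f"{label}: mean = {centre:.0f} g, spread (SD) = {spread:.0f} g, "
          f"standard error of mean = {std_error:.1f} g")
```

Under these assumptions, collecting more observations shrinks the standard error of the mean, but the spread of the observations stays at roughly the same size. The spread is the variability of the population, and it is also what limits how precisely the weight of a single future baby could be predicted.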
2.1.2 Sources of statistical uncertainty
Several authors have attempted to categorise the different sources of statistical/scientific uncertainty. However, there is no agreed taxonomy. Some taxonomies are generalised (Kahneman and Tversky, 1982; Morgan and Henrion, 1990; Walker et al., 2003; Funtowicz and Ravetz, 1990), and try to provide a typology that can be applied to any domain. Table 2.1 outlines the key categories and references for a majority of these taxonomies.
Alternatively, others have developed domain specific taxonomies, for example: Regan, Colyvan, and Burgman (2002) outlined uncertainty sources in ecology & conservation biology; Skinner et al. (2013) and Burgman (2005) map uncertainty sources specific to environmental risk assessment and management; Beven et al. (2015) in hydrology; Han et al. (2011) in health; Thomson et al. (2005) in intelligence analysis & geospatial data; Ristovski et al. (2014) in medical imaging; Ramirez, Jensen, and Cheng (2012) in software engineering; and Landesberger, Bremm, and Wunderlich (2017) in geolocated graphs.
Of these generalised taxonomies, I consider two that could be used as tools to map out the uncertainty sources within a specific project: one defined by Morgan and Henrion (1990) and another defined by Funtowicz and Ravetz (1990). The taxonomy defined by Morgan and Henrion (1990) is the most comprehensive of the uncertainty taxonomies listed in Table 2.1 and defines the following sources of uncertainty: measurement error, systematic error, variability (natural variation), model uncertainty, approximation, disagreements, inherent randomness, and linguistic uncertainty² (each of these is defined in section 2.1.3). Alternatively, Funtowicz and Ravetz (1990) classify uncertainty sources in terms of their location rather than the type of knowledge that is lacking. Within the Funtowicz and Ravetz (1990) framework, uncertainty arises from either the data or the model. Uncertainty that arises from the data refers to the quality or appropriateness of the data used as inputs, while uncertainty that arises from the model can be due to numerical approximations used in the mathematical representation of a process or relationship, or to the assumptions and errors in the model structure. Each of Morgan and Henrion’s (1990) sources of uncertainty can also be classified by its location. This is demonstrated in Table 2.2.
In this thesis I consider the taxonomy outlined by Morgan and Henrion (1990) to be more comprehensive, and to provide valuable detail that is not apparent using the location classification provided by Funtowicz and Ravetz (1990). For example, in Table 2.2 both measurement error and systematic error are classified as arising from the data; however, they are very different sources of uncertainty, which require different actions if they are to be reduced, and may have different impacts on the interpretation of the modelled outputs. Measurement error can be addressed through the improvement of equipment or data collection methods, while systematic error creates bias and is very difficult to address. The extra depth and detail provided by Morgan and Henrion (1990) may be important in communication and decision making.
2Linguistic uncertainty is not a source of scientific uncertainty but is included by Morgan and Henrion (1990) due to the potential impact it can have on communicating scientific information.
Table 2.1: Taxonomies of sources of scientific uncertainty identified in the academic literature.
Typology categories                                          Reference
1. The data, the model, completeness                         Funtowicz and Ravetz (1990)
2. Internal vs external                                      Kahneman and Tversky (1982), Han et al. (2011)
3. Ignorance                                                 Lipshitz and Strauss (1997)
4. The scientific pipeline (data acquisition >               Pang, Wittenbrink, and Lodha (1997), Johnson and Sanderson (2003),
   transformation > modeling > data visualisation)           Brodlie, Osorio, and Lopes (2012)
5. Location, level and nature                                Walker et al. (2003)
6. Context, inputs (data) and model                          Refsgaard et al. (2007)
7. Data uncertainties and modelling uncertainties            Funtowicz and Ravetz (1990)
Table 2.2: Morgan and Henrion’s (1990) uncertainty sources classified in terms of Funtowicz and Ravetz’s (1990) location categories of the data or the model. ‘Other’ refers to uncertainty sources that do not map to either the data or the model.
The data The model Other
* Inherent randomness
Measurement error * Model Structure * Natural variability
Systematic error * Disagreement
* Model uncertainty
2.1.3 Uncertainty Sources from Morgan and Henrion (1990)
The following section provides a summary of each of the uncertainty sources described by Morgan and Henrion (1990).
1. Uncertainty induced by measurement variation is the most studied and best understood source of uncertainty and arises from random error in direct measurements of a quantity. Imperfections in the measuring instruments and observational techniques inevitably give rise to variations from one observation to the next. The resulting uncertainty depends on the magnitude of the variation between observations and the number of observations taken. Measurement error can be estimated by statistical methods, and there are a variety of well-known techniques for quantifying this uncertainty, such as standard deviation and confidence intervals.
Example: Imagine you were measuring the weights of 100 marathon athletes.
There could be variation in the weights of each athlete at different times and
variation between the scales used to measure them.
2. Uncertainty induced by systematic error (subjective judgment or bias) occurs when measurements are biased. Bias is defined as the difference between the true value of a parameter and the measured or estimated value of that parameter. In a sampling setup, this estimated value may be the value to which the mean of the measurements converges as the sample size increases. Measurements that have systematic error do not vary randomly around a true value. Systematic errors can also result from a scientist’s decision to exclude (or include) data that should be included (or excluded). This can be viewed as subjective judgment bias. The only way to deal with such an error is to recognize a bias in the experimental procedure and remove it. Systematic error can also result from consistent unintentional errors in calibrating equipment or recording measurements.
Example 1: In 1997 the US EPA implemented regulations that limited permissible
levels of pollutants. This was in part based on the result of a statistical analysis
that demonstrated that when daily levels of soot rise, slightly more people
die from heart and lung disease. In 2002 the US EPA discovered that studies
linking deaths to very fine particulate matter (such as diesel exhaust) were
biased due to a default setting in a computer program. The correction resulted
in a revision of the estimate from a 0.41% rise in mortality per 10 µg/m³ of
fine particles to a 0.27% increase (Kaiser, 2002; Figure 2.2). As a result, the EPA
allowable emissions standards changed (Burgman, 2005).
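The practical difference between sources 1 and 2 can be shown with a small simulation. This is a minimal sketch under stated assumptions rather than an analysis from this thesis: the true weight, the size of the random error and the calibration offset are all hypothetical values, and `simulate_readings` is a name introduced here for illustration.

```python
import random
from statistics import mean

TRUE_WEIGHT_KG = 70.0   # hypothetical true value being measured
NOISE_SD_KG = 0.5       # assumed random measurement error of the scale
BIAS_KG = 1.5           # assumed calibration offset of a faulty scale


def simulate_readings(n, bias=0.0, seed=0):
    """Simulate n readings: true value + constant bias + random noise."""
    rng = random.Random(seed)
    return [TRUE_WEIGHT_KG + bias + rng.gauss(0.0, NOISE_SD_KG) for _ in range(n)]


for n in (10, 10_000):
    unbiased = mean(simulate_readings(n))
    biased = mean(simulate_readings(n, bias=BIAS_KG))
    print(f"n = {n}: unbiased scale error = {unbiased - TRUE_WEIGHT_KG:+.3f} kg, "
          f"miscalibrated scale error = {biased - TRUE_WEIGHT_KG:+.3f} kg")
```

With more readings, the average from the unbiased scale moves closer to the true value, whereas the average from the miscalibrated scale settles near the true value plus the offset. Taking more measurements reduces random measurement error but leaves systematic error untouched, which is why the bias has to be recognised and removed at its source.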
3. Variability (natural variation) occurs naturally in a measured quantity over time and space and is a feature of the natural world. Variability is a descriptive feature of an observed population. Variability arises when multiple instances of an event or phenomenon are observed, and is a natural feature of the population being studied.
Equipment accuracy and measuring methodologies can influence the variability of observed data, but variability cannot be removed by improving methodologies and measurement accuracy or by collecting more observations. Variability is considered predominantly due to irreducible natural variation and is not, in itself, uncertainty. However, it is a source of uncertainty when we use observed information from a population to predict or estimate an unobserved event or measured quantity. The natural variability of Queensland birth weights in 2015 gives rise to uncertainty when this information is used to predict the birth weight of the next baby born in Queensland.
Example: The birth weight of babies born in Australia, in 2015. The weight
of each baby is different, and the range from the smallest to the largest birth
weight is due to natural variation. This variation gives rise to a natural and
expected birth weight range for newborns from a specific population.
4. Model uncertainty arises because models are abstractions of reality. Some less influential variables and interactions are left out, and the shapes of the functions used to describe the system processes are always abstractions of the real processes. The current understanding of the system and its processes may be incomplete, and the shapes of the corresponding functions and their parameter values are only best estimates. Models may be used to further understand the structure of the system of interest, to predict future events or to answer questions about a system (Burgman, 2005).
Model uncertainty arises in two main ways. Firstly, there is uncertainty of the model parameters. This can be accounted for in probabilistic models, with careful consideration of the range of possible values and their probabilities. Secondly, there is uncertainty about the model structure, i.e. uncertainty about cause-and-effect relationships. This can be very difficult to quantify (Regan, Colyvan, and Burgman, 2002).
Example: Growth models that predict the abundance of populations do not
include parameters for describing weather events, such as rainfall, because
modellers believe that weather is not sufficiently important to understanding
population dynamics to be explicitly incorporated in the model (Eberhardt,
1987).
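The first kind of model uncertainty described under source 4, uncertainty about model parameters, is often handled by simulation. The following is a minimal sketch of that idea, assuming a deliberately simple constant-growth population model; the starting abundance, the growth rate and its standard deviation are hypothetical values chosen for illustration, and the functions `project` and `propagate` are not taken from any cited work.

```python
import random
from statistics import quantiles

INITIAL_POPULATION = 100     # hypothetical starting abundance
YEARS = 10
GROWTH_RATE_MEAN = 1.05      # assumed best estimate of the annual growth factor
GROWTH_RATE_SD = 0.02        # assumed uncertainty about that estimate


def project(growth_rate):
    """Deterministic growth model: abundance after YEARS of constant growth."""
    return INITIAL_POPULATION * growth_rate ** YEARS


def propagate(n_draws=10_000, seed=0):
    """Draw plausible growth rates and return the projected abundance for each."""
    rng = random.Random(seed)
    return [project(rng.gauss(GROWTH_RATE_MEAN, GROWTH_RATE_SD)) for _ in range(n_draws)]


point_estimate = project(GROWTH_RATE_MEAN)
cut_points = quantiles(propagate(), n=40)      # 39 cut points at 2.5% steps
lower, upper = cut_points[0], cut_points[-1]   # 2.5th and 97.5th percentiles
print(f"point estimate: {point_estimate:.0f}")
print(f"95% interval from parameter uncertainty: {lower:.0f} to {upper:.0f}")
```

The sketch addresses parameter uncertainty only. Uncertainty about model structure, such as the omitted weather variables in the example above, is not captured by varying the parameters of a single model, which is part of why it is harder to quantify.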
5. Approximation gives rise to uncertainty by introducing simplified abstractions of the real-world system into the model. The spatial and temporal resolution of a model (for example, its grid size and time intervals) is an approximation. Approximations are used due to a lack of knowledge about a specific feature of the system being modelled or due to computational limitations.
6. Disagreements give rise to uncertainty when researchers must select between statistical, computational and mathematical methods and techniques that may not be agreed upon within the scientific community. Disagreement may also be a source of uncertainty due to differing interpretations of scientific information.
7. Inherent randomness can be thought of as being innate. This is linked to source 1 above. Within modelling setups that “predict forward” to learn about a system and its outcomes, no matter how well we know the process and the initial (starting) conditions, we cannot be certain of what the outcome will be. This kind of randomness can often be quantified very well, and is easy to deal with in probabilistic models. There is still much debate in the mathematical and statistical literature about whether inherent randomness is in principle reducible or not (epistemic vs aleatoric uncertainty); that is, whether it is a phenomenon of the natural world or a by-product of our lack of knowledge about a system or relationship and its starting conditions.
8. Linguistic uncertainty (linguistic imprecision) arises because both natural and scientific language can be interpreted in several ways, or an event is ill-defined. Linguistic uncertainty can be classified into five distinct types: vagueness, context dependence, ambiguity, indeterminacy of theoretical terms, and under-specificity. All of these uncertainties arise in natural and scientific language, and can impact the interpretation and application of scientific insights to real world decision making.
As stated above, linguistic uncertainty is not a source of scientific uncertainty, but it is included here as it is important for science communication.
2.2 Why is Uncertainty Information important to decision-makers?
Ignoring uncertainty can lead to misinterpretation of statistical information, whereas considering its presence, possible sources, magnitude and potential impact can lead to more informed decision making. Uncertainty communication to the non-expert audience is being recognised as a critical challenge across a wide range of domains. This increase in recognition is driven by a range of factors, including the increasingly sophisticated systems and relationships that scientists are researching, an increase in simulation-based tools, a movement towards stochastic rather than deterministic approaches, and an increase in the number of non-expert decision makers demanding insights from the ‘big data’ and ‘machine learning’ revolutions. These non-expert decision makers are in government, business and industry, and their decisions impact us all in terms of policy, health care, infrastructure, environmental regulations, etc. Ensuring that these audiences understand the uncertainty of data-driven and/or statistically derived information is a critical challenge for all of science.
The very nature of using quantitative methods to derive information about a future or un-measurable event means that complete knowledge is not available. As highlighted by George E. P. Box in his famous quote “All models are wrong but some are useful” (Box, 1979), incomplete knowledge can be valuable, even though it is uncertain. In practice, decision makers are very aware of the unavoidability of uncertainty, and potentially view uncertainty as integral to the definition of a problem (Brugnach et al., 2008). Rather than seeing uncertainty as a problem that must be dealt with, perhaps it is more useful to reframe uncertainty information as a tool that allows the available information to be used efficiently when what is available is limited or incomplete.
The importance of uncertainty to decision makers is recognised across domains
The idea that uncertainty should be communicated to non-expert audiences is not new (Manski, 2015); Morgenstern argued forcefully in the 1960s for error estimates to be published alongside official economic statistics (Morgenstern et al., 1963). However, recognition of the importance of this challenge has grown in recent years. This is happening across a wide range of fields, including spatial analysis (Hunter and Goodchild, 1996), weather forecasting (Joslyn and Savelli, 2010), healthcare (Sanyal et al., 2010), decision science (Grubler, Ermoliev, and Kryazhimskiy, 2015), epidemiology (Carroll et al., 2014), disaster management (Eiser et al., 2012) and policy development (Morgan and Henrion, 1990). For some disciplines, such as geographic information science and forecasting, understanding, quantifying and communicating uncertainty is even considered the dominant challenge (Harrower and Street, 2003; Grubler, Ermoliev, and Kryazhimskiy, 2015). Further to this, the ubiquitous nature of this challenge is beginning to bring together a multidisciplinary cohort to discuss and compare methods, such as at the 2010 Discussion Meeting at the Royal Society of London (Palmer and Hardaker, 2011).
The acknowledgement of the importance of uncertainty is not confined to the research domains. The past decade has seen a growing awareness of uncertainty in policy-making, and a recognition that policies that ignore uncertainty often lead, in the long run, to unsatisfactory technical, social, and political outcomes (Morgan and Henrion, 1990). The conservation and environmental sciences that inform environmental management argue for explicit assessments of uncertainty in environmental data and models as a necessary, although not sufficient, condition for balancing uncertain scientific arguments against uncertain social, ethical, moral and legal arguments, and in managing environmental systems (Brown, 2004; Uusitalo et al., 2015). Additionally, health care workers are grappling with the role of scientific uncertainty within informed decision making between clinicians and patients. Understanding the certainty of evidence-based medicine becomes critical when people are making decisions about their future health outcomes and risks (Han et al., 2011). However, this is not an easy challenge, and many clinicians struggle with how to communicate statistical uncertainty and how this information should affect the informed consent process (Politi, Han, and Col, 2007). These recognitions signal a paradigm shift, moving from deterministic modelling and its quest for “optimality” to the concept of “robust” decision making that takes uncertainties explicitly into account (Grubler, Ermoliev, and Kryazhimskiy, 2015).
Ignoring the presence of uncertainty can lead to misinterpretation of information
One of the dangers of not communicating uncertainty is that estimates can appear more reliable than is warranted by the data or the model from which they are derived. For example, user studies in the GISciences have demonstrated that audiences interpreting data-generated graphics, where the data contain some level of missingness, tend to misinterpret results but do so with the same level of confidence as when the data are complete (Eaton, Plaisant, and Drizd, 2005). Similarly, issues of misinterpretation have been highlighted in the interpretation of official economic statistics (Manski, 2014), which can have far-reaching implications. For example, a central bank monitoring statistics on GDP growth, inflation and employment may mis-evaluate the state of the economy and consequently set inappropriate monetary policies (Manski, 2014). Several authors have demonstrated that understanding only the average, or point estimate, of a model output can lead to less robust decision making than understanding the underlying processes (O’Hagan, 2012; Uusitalo et al., 2015). An additional example arises in modelling potential future climate change scenarios, where neglecting to include uncertainty can suggest to policy makers that all scenarios are equally likely, which is often not the case (Schneider and Moss, 1999; Giles, 2002).
The dangers of ignoring uncertainty are not restricted to estimates being misinterpreted. Not handling uncertainty communication explicitly can have a detrimental impact on the public’s perception of science. This is also clearly seen in the climate change debate, where the neglect of clear communication of uncertainty is in part responsible for the backlash by climate sceptics (Schneider and Moss, 1999). The absence of adequate uncertainty information in reported estimates can be detrimental to the credibility of the scientific outputs because it increases public distrust in the motives of researchers, risk regulators, and scientific advisors (Frewer and Salter, 2002; Miles and Frewer, 2003; Scheufele and Lewenstein, 2005).
2.3 Communicating Statistical Uncertainty
Buttenfield (1993) proposes three challenges to effective uncertainty communication. They are:
1. standardisation of terminology,
2. methods for measuring and representing uncertainty, and
3. methods for depicting uncertainty simultaneously alongside the estimates/data it relates to in an understandable, useful, usable and meaningful way.
I have used these three impediments to organise the literature detailed in the following section.
2.3.1 Impediment 1: Standardisation of terminology
Uncertainty terms can be interpreted differently by different people in different circumstances (Morgan and Henrion, 1990). A call for a standardisation of uncertainty terminology has been made by Buttenfield (1993) and Schneider and Moss (1999); however, suggestions in the literature for addressing this challenge either generally or within specific domains are limited.
Schneider and Moss (1999) have been very vocal in the climate science community in calling for standardisation of quantitative and qualitative uncertainty terminology, and for a common language when describing results. For example, they prescribe that when referring to probabilities within the climate change literature, semantic descriptions of the uncertainty should be mapped explicitly to numerical probabilities. In total they recommend five categories that map to numeric probabilities, including “very low confidence” (less than 5%), “low confidence” (5% to 33%) and “very high confidence” (95% to 100%), and advocate that these be used in all climate science communications. See Figure 2.1 for the terminology suggested for quantitative estimates (source: Figure 3 from Schneider and Moss (1999)).
Since standardised terminology is not a focus of this research I do not explore this beyond acknowledging its importance in the wider uncertainty communication challenge.
Figure 2.1: Scale for assessing state of knowledge, source: Schneider and Moss (1999), Figure 3.
2.3.2 Impediment 2: Uncertainty representation
As outlined by Buttenfield and Beard (1991), the lack of standardised and effective methods for encoding/representing uncertainty is an impediment to communicating with the non-expert audience. While much work has been done in this space, there is still a lack of agreement about the effectiveness of these representation methods, or about which methods should be used with the non-expert audience. Researchers have taken different approaches to investigating uncertainty representation. Some have focused on evaluating the effectiveness of methods commonly used to represent uncertainty, such as intervals, probabilities and frequencies. The geospatial domain, which has a long history of innovation in data visualisation, has explored the effectiveness of new visual variables for encoding uncertainty, such as fuzziness and transparency. Other researchers have explored aspects of the display other than the symbols within it, such as interactivity, representing scenarios and display comparisons. Many of these have been made possible by technological advances. Finally, work in the climate change literature has suggested a new visualisation for making disagreement between experts more transparent, a type of uncertainty not usually measured or communicated. Each of these is discussed in further detail below.
Hijacking the human visual communication channel
Most uncertainty representation research uses visualisation to address this challenge. Visualisation is a proven channel for communicating data-driven insights to non-expert audiences, and can represent complex and dense information in a single view (Tufte, 1983; Cleveland and McGill, 1984). The human visual system is a very high-bandwidth channel to the brain, with a significant amount of processing occurring in parallel and at the preconscious level (Munzner, 2009). Data visualisation enables an audience and a communicator to take advantage of the highly evolved and sophisticated analysis capabilities of the human visual system (Munzner, 2014). Visualisation is not, however, the only method for representing uncertainty to a non-expert audience. For example, the semantic definitions of uncertainty above (Figure 2.1) are non-visual approaches to representing a measure of uncertainty in a form that a non-expert can interpret. These semantic representations of uncertainty are far less researched than the visual representations.
Intervals
Intervals, including confidence intervals, credible intervals and predictive intervals, are commonly used to represent uncertainty. They have been researched more than other existing representation methods and are widely used in domains such as finance (e.g., Du et al., 2011) and weather forecasting, whose outputs are commonly used by the general public. User studies investigating intervals have shown that even when audiences are given a deterministic forecast (a point estimate rather than a range/interval), they expect information derived from data or future predictions to contain uncertainty (Joslyn and Savelli, 2010; Lazo, Morss, and Demuth, 2009; Morss, Demuth, and Lazo, 2008), and that audiences prefer ranges over deterministic information (Du et al., 2011). Audiences find the informativeness of the interval more useful than the precision of a deterministic representation (point estimate) (Yaniv and Foster, 1995). An interval quantifies the uncertainty the user knows is present, and therefore may narrow the range of expected values by giving the user guidance on where the bounds of that uncertainty are, rather than leaving them to guess.
Further to this, studies have shown that non-expert users are capable of effectively utilising predictive intervals without extensive training (Savelli and Joslyn, 2013), and that non-expert users provided with predictive intervals in weather forecasts are better able to identify uncertain forecasts, and are more decisive, than users provided with deterministic forecasts (Joslyn and LeClerc, 2011; Savelli and Joslyn, 2013).
Some research on intervals, however, is contradictory; for example, intervals have been shown to commonly lead to overconfidence (Alpert and Raiffa, 1982; Soll and Klayman, 2004). Furthermore, at least one study has shown that even researchers (who are quite familiar with uncertainty and uncertainty visualisation) frequently misunderstand visual representations of confidence intervals and error bars (Belia et al., 2005). Finally, a comparison of text and visual intervals has shown that text versions are less open to misinterpretation (Savelli and Joslyn, 2013).
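To make the contrast between a deterministic forecast and an interval forecast concrete, the following is a minimal sketch of how a text-based predictive interval could be produced from a point forecast. It assumes that forecast errors are normally distributed with a known standard deviation, which is a simplifying assumption made here for illustration and not a method used in the studies cited above; the function name and the temperature values are hypothetical.

```python
from statistics import NormalDist


def predictive_interval(forecast, forecast_sd, coverage=0.95):
    """Central predictive interval, assuming normally distributed forecast error."""
    dist = NormalDist(mu=forecast, sigma=forecast_sd)
    tail = (1 - coverage) / 2
    return dist.inv_cdf(tail), dist.inv_cdf(1 - tail)


# Hypothetical forecast: a maximum of 20 degrees with an assumed error SD of 2 degrees.
low, high = predictive_interval(20.0, 2.0)
print("Deterministic forecast: 20 degrees")
print(f"Interval forecast: between {low:.0f} and {high:.0f} degrees (95% predictive interval)")
```

The interval version tells the reader where the bounds of the uncertainty are assumed to lie, which is the guidance that the deterministic version leaves the reader to guess.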
Frequencies and probabilities
Frequencies and probabilities are also common forms of encoding uncertainty. Some research suggests that frequencies (e.g. 1 in 10) rather than probabilities (e.g. 10% chance) improve performance for some problems (Gigerenzer and Hoffrage, 1995; Hertwig and Gigerenzer, 1999). In contrast, probabilities have been shown to be better understood than frequencies in weather forecasts (Joslyn and Nichols, 2009). David Spiegelhalter has done significant work in summarising the current methods for the visualisation of probabilities, relative frequencies and other graphs for the non-expert audience (Spiegelhalter, Pearson, and Short, 2011).
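The two encodings compared in these studies carry the same numerical information and differ only in presentation. As a minimal sketch, the following converts a probability into both wordings; the function names and the choice of a maximum denominator of 100 are assumptions made here for illustration, not formats prescribed by the cited studies.

```python
from fractions import Fraction


def as_probability(p):
    """Express a probability as a percentage chance, e.g. '10% chance'."""
    return f"{p:.0%} chance"


def as_frequency(p, max_denominator=100):
    """Express a probability as a frequency, e.g. '1 in 10'.

    The probability is rounded to the nearest fraction whose denominator
    does not exceed max_denominator."""
    fraction = Fraction(p).limit_denominator(max_denominator)
    return f"{fraction.numerator} in {fraction.denominator}"


for p in (0.1, 0.25, 0.3):
    print(f"{as_probability(p)}  <->  {as_frequency(p)}")
```

Because the conversion is mechanical, the choice between the two formats is a communication design decision, and the mixed findings above suggest that it should be tested with the intended audience and task.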
Exploring new visual variables for encoding uncertainty
The geospatial sciences have a long history of innovation in visualisation. An example is the work by Bertin (1981), who developed a catalogue of visual variables that could be used in mapping to encode quantitative variables. Uncertainty research has also explored Bertin’s visual variables (Bertin, 1981) for their application in uncertainty representation. These visual variables are: location, size, value, texture, colour, orientation, and shape. Edge crispness (fuzziness), fill clarity, fog, resolution, transparency, saturation, boundary (thickness, texture, and colour) and animation have subsequently been suggested as specific visual variables for representing uncertainty (MacEachren (1992); Slocum et al. (2008); Gershon (1992)), while other researchers have explored shapes and symbols (Newman and Lee, 2004; Sanyal et al., 2010).
Much research has been done to explore the effectiveness of these visual methods of uncertainty representation (Brodlie, Osorio, and Lopes (2012); Potter et al. (2012); Zuk, Carpendale, and Glanzman (2005); Pang, Wittenbrink, and Lodha (1997); Johnson and Sanderson (2003); Johnson (2004); Evans (1997); Wittenbrink et al. (1996); Aerts, Clarke, and Keuper (2003); Sanyal et al. (2010); Leitner and Buttenfield (2000)). Despite this body of research, there are still contradictory results, and more work is needed to validate these methods (Buttenfield and Beard (1991); MacEachren (1992); Sanyal et al. (2010)).
Additionally, what the user is using the information for, as well as the medium through which they access it, has been shown to affect the user’s interpretation of the represented uncertainty (Sanyal et al. (2010); Leitner and Buttenfield (2000)).
A full review of these studies is beyond the time and resources available for this thesis. However, these studies may be of use in the development of the Australian Cancer Atlas; additional research from this domain is therefore included in Appendix A.
Cognitive overload
One of the concerns of adding more visual variables to data visualisation for the non-expert is cognitive overload (McGranaghan, 1993). Cognitive overload results when the user reaches an information threshold, beyond which they are not able to make sense of the information. This concern is not unfounded; complexity and density of representation methods seem to overwhelm novice decision makers, while experts are able to use the detail more readily in decision-making (Cliburn et al., 2002). Technological advances may provide a solution to this through the use of interactivity, animation, dynamic displays and sound, by providing additional channels for representing uncertainty information without interfering with the ability to see the features that are present in the display (MacEachren, 1992).
An unanswered question in uncertainty representation and communication is whether uncertainty information is viewed differently to data generally (Sanyal et al., 2010). Some researchers believe that data quality is a particular example of data and should, therefore, be treated and visualised as data in general (McGranaghan (1993)). Others (Buttenfield and Beard (1991); Buttenfield (1993)) believe that people do not think about data quality in the same way as they think about data in general, and that data quality therefore needs to be visualised differently. This may also support the use of other display features such as interactivity, dynamic features, side by side comparisons and scenario representation, rather than encoding uncertainty using the visual variables described above.
Dynamic & interactive features
Technological advances in visualisation tools, interactivity and animation, as well as the capabilities of online platforms, fundamentally influence the types of approaches that can be used to visualise and communicate uncertainty. These technologies enable research to extend beyond the visual variables offered by Bertin (Davis and Keller, 1997) to variables such as the duration of a flashing symbol (DiBiase et al., 1992) and sound (Fisher (1994); Krygier (1994)). One advantage of dynamic displays is that the dynamic feature can draw the reader’s attention to the areas of uncertainty, which the novice user often ignores (Davis and Keller, 1997).
Additionally, interactivity that allows the user to toggle between the estimates and uncertainty information provides an additional mechanism for the user’s comprehension (MacEachren (1992); Van der Wel, Hootsmans, and Ormeling (1994)). These methods still require user testing to determine their effectiveness. Technology does not always enhance communication or meet the users’ needs, and in some cases, users actually prefer static maps over dynamic displays (Aerts, Clarke, and Keuper, 2003).
Display comparisons
Within mapping, Pang, Wittenbrink, and Lodha (1997) suggest that the way in which these visual variables are integrated into a map can be categorized into three groups: overloading (bivariate maps), side-by-side comparison (map pairs), or seamless integration (MacEachren, 1992). These approaches are also relevant to non-mapping contexts.
The three broad visualisation approaches are defined as:
Overloading (Bivariate Maps) - the data and the associated uncertainty information are reported within one map. Overloading augments a base visualisation technique with an uncertainty visualisation technique, but the data and uncertainty information remain clearly separable. This is probably the most popular mechanism for uncertainty visualisation.
Side-by-side comparison (Map Pairs) - two similar maps presented side by side. One map shows the data, and the other shows the associated uncertainty.
Seamless integration - the data and the uncertainty are displayed in a unified rendering. Unlike the overloading approach in which uncertainty is superimposed on the graphical representation of the dataset, the seamless integration approach directly includes (i.e., integrates) the uncertainty in the visualisation rendering.
Among these three types of methods, research suggests that bivariate maps are the most popular approach (Pang, Wittenbrink, and Lodha, 1997). Multiple studies have found this method to be easier to interpret, and to enhance users’ performance, compared to separate maps (Kubicek & Sasinka, 2011; Viard et al., 2011; Evans, 1997).
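To make the difference between overloading and map pairs concrete, the sketch below encodes one area’s estimate and uncertainty in both ways. It is illustrative only: the choice of red intensity for the estimate, transparency for the uncertainty, the greyscale for the second map, and all function names are assumptions made here rather than methods prescribed by the cited authors. Both inputs are assumed to be pre-scaled to the range 0 to 1.

```python
def _clamp01(x: float) -> float:
    """Restrict a value to the range [0, 1]."""
    return max(0.0, min(1.0, x))


def overloaded_symbol(estimate: float, uncertainty: float) -> dict:
    """Overloading: one symbol on one map carries both quantities.

    Estimate -> red intensity; uncertainty -> transparency
    (more uncertain areas are drawn fainter).
    """
    red = round(255 * _clamp01(estimate))
    alpha = 1.0 - _clamp01(uncertainty)
    return {"rgb": (red, 0, 0), "alpha": alpha}


def map_pair_symbols(estimate: float, uncertainty: float) -> tuple:
    """Map pair: two opaque symbols, one for each side-by-side map.

    First map shows the estimate; second map shows the uncertainty
    on a greyscale (darker means more uncertain).
    """
    red = round(255 * _clamp01(estimate))
    grey = round(255 * (1.0 - _clamp01(uncertainty)))
    data_symbol = {"rgb": (red, 0, 0), "alpha": 1.0}
    uncertainty_symbol = {"rgb": (grey, grey, grey), "alpha": 1.0}
    return data_symbol, uncertainty_symbol
```

In the overloaded version the reader sees a single map, whereas the map pair asks the reader to compare the same area across two maps.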
Representing confidence when there is disagreement
As discussed above, some forms of uncertainty do not currently have a quantitative measure and are much more ambiguous. For example, knowledge gaps in a domain, the level of consensus among peers, discrepancies between independent studies, model uncertainties or disagreement between researchers can be indicators of incomplete knowledge. However, this type of uncertainty cannot be summarised using error bars, probability distributions or other quantified measures. For these sources of uncertainty, new representation methods need to be developed.
One proposed solution to showing confidence when there is disagreement between study results is a graph which shows consensus in results for modelled information. This approach tries to capture disagreement or differences between methods and/or results, with the aim of making these disagreements visible to the non-expert audience. Schneider and Moss (1999) have developed such a graphic, which aims to show where confidence, or lack of it, arises. The graphic (see Figure 2.2) involves plotting points on four axes (arranged like the points on a compass), corresponding to confidence in: (1) the theory, (2) the observations, (3) the models and (4) the consensus within a field. The plotted points are then joined to create a shape, where the area indicates the overall degree of confidence in the final estimated results. This graph is intended to be used in scenarios where there is an absence of clear consensus, such as for climate-sensitivity estimates. Schneider and Moss (1999) proposed this graph as a way to communicate the confidence and level of evidence around climate policy. They successfully argued for this to be used in the 2001 IPCC (Intergovernmental Panel on Climate Change) report (Giles, 2002).
Figure 2.2: Summarising consensus, Source: Schneider and Moss (2000)
Critics of this four axis plot raised concerns that the area of the plotted shape may not accurately represent the overall confidence in the result if the uncertainties associated with the different axes are not independent. “Theory comes from observations, and both of these feed into models, so the different axes may depend on each other”, says Joe Perry, an ecologist at Rothamsted Research, an agricultural research institute in Harpenden, north of London, who studies the impact of genetically modified crops on biodiversity. “If they are dependent, the size of the shape will not be representative of the total probability” (Giles, 2002).
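The area of the shape in the four axis plot described above can be computed directly. The sketch below assumes each axis is scored from 0 to 1, that theory and models lie on opposite ends of one axis, and that observations and consensus lie on opposite ends of the other; this arrangement and the function name are assumptions made for illustration. Because the diagonals of the resulting quadrilateral are perpendicular, its area is half the product of their lengths.

```python
def confidence_area(theory: float, observations: float,
                    models: float, consensus: float) -> float:
    """Area of the shape joining four points on compass-style axes.

    Assumed layout: theory (north), observations (east),
    models (south), consensus (west). The two diagonals are
    perpendicular, so area = 0.5 * vertical * horizontal.
    """
    vertical = theory + models
    horizontal = observations + consensus
    return 0.5 * vertical * horizontal


if __name__ == "__main__":
    print(confidence_area(1, 1, 1, 1))  # all four axes at their maximum
    print(confidence_area(1, 0, 1, 0))  # shape collapses to a line
```

Under this layout the area is a product rather than a sum of the axis scores, so a score of zero on both ends of one axis collapses the shape regardless of the other two scores. This is a property of the geometry and is separate from the statistical dependence raised by the critics.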
The primary focus of many user studies which explore uncertainty representation is to develop generalisable methods of uncertainty representation that would be applicable to a wide range of uses. Studies aimed at evaluating specific uncertainty representation methods often focus on: designing a visualisation (Buttenfield (1993); Fauerbach et al. (1996)), evaluating whether users are able to identify specific uncertainty values (Blenkinsop et al., 2000), and assessing the impact of uncertainty representation on perceptions and data identification (Hope and Hunter (2007)). However, effective representation may not guarantee communication, and the effectiveness of an appropriate representation method can depend on the question being asked (Sanyal et al., 2010). Therefore, the goal of developing a generalisable uncertainty representation approach may not be possible or may not be a solution to uncertainty communication on its own. This highlights the importance of distinguishing Buttenfield’s (1993) third impediment of uncertainty communication from uncertainty representation.
2.4 Uncertainty communication
Effective uncertainty representation does not guarantee effective uncertainty communication (Sanyal et al., 2010). As highlighted by Buttenfield (1993), communication must be considered separately from representation, as a unique challenge.
Multiple studies have shown that specific uncertainty measures are less important than an understanding of the impact of uncertainty on decision outcomes (Leitner and Buttenfield (2013); Pawson, Wong, and Owen (2011)). Successful uncertainty communication, i.e. integrating uncertainty representations in a way that is meaningful and useful to the end user, is more than validating methods for uncertainty representation. The most effective visualisation for uncertainty can depend on the users’ needs and reason for accessing the information (Sanyal et al., 2010). Uncertainty information also may not be viewed by the user as an additional variable, but rather a piece of metadata that is intrinsic to the interpretation of the insights presented (Leitner and Buttenfield, 2000).
While there are many examples of user studies in the literature that explore the effectiveness of uncertainty representation, there are limited resources which guide or support the development of effective uncertainty visualisations or other communication products. Two approaches are emerging in the literature that try to move from uncertainty representation to uncertainty communication: the consideration of multiple possible futures, and methods for uncertainty communication design. The first of these I discuss below, and the second is discussed in the next section.
Presenting multiple possible futures
The representation of a quantified measure of uncertainty can be valuable to the user, but requires the audience to connect the dots in terms of what that uncertainty means to potential futures. As noted above, one approach emerging in the literature that goes beyond uncertainty representation is to communicate the impact that uncertainty could have on possible future realisations. Specific uncertainty measures may be less important than an understanding of the impact of uncertainty on decision outcomes across a range of possible future conditions (Lempert, Popper, and Bankes (2003); Pawson, Wong, and Owen (2011)). Viewing realisations of several potential future outcomes enables users to focus on the effect of the uncertainty rather than only the representation of the uncertainty measure. This shift from producing and evaluating a single future, to envisioning a range of possible futures can support more robust decision-making (Couclelis (2003); Lempert, Popper, and Bankes (2003)).
Presenting multiple futures reframes uncertainty as a relationship between decision outcomes and differing future conditions. This offers a new framework for uncertainty communication, which is more in line with the way policy makers think about decisions under uncertainty (Cohen, Freeman, and Wolf (1996); Jonassen (2012)).
This approach, however, has its challenges. Numerous possible uncertain outcomes place an additional burden on the user, who now needs to make sense of multiple outputs. The available information can lose its usefulness when too many options are presented as almost any scenario is possible to produce (Davis and Keller, 1997). Additionally, dynamic visualisation displays that present multiple possible futures are only made possible with the development of interactive interfaces. This not only increases the cognitive load on the user but also increases the work required of the designer, where the task is not simply to communicate one message, but multiple messages. Considerable work is needed to determine what “reasonable” uncertainty/error values are, and how these translate into reality in the field (Davis and Keller, 1997), and these considerations may be beyond the expertise of the communication designer.
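The idea of presenting several possible futures, rather than a single estimate with an uncertainty measure, can be sketched as follows. This is a hedged illustration only: it assumes the uncertainty around an estimate can be treated as normally distributed, and the names (demand, capacity) and the decision rule are hypothetical rather than drawn from the cited work.

```python
import random


def simulate_futures(estimate: float, std_error: float,
                     n_futures: int = 5, seed: int = 0) -> list:
    """Draw a small number of possible future values.

    Assumes a normal distribution around the estimate; a fixed seed
    keeps the displayed futures reproducible between page loads.
    """
    rng = random.Random(seed)
    return [rng.gauss(estimate, std_error) for _ in range(n_futures)]


def decision_outcomes(futures: list, capacity: float) -> list:
    """For each future, report whether demand would exceed capacity."""
    return [demand > capacity for demand in futures]


if __name__ == "__main__":
    futures = simulate_futures(estimate=100.0, std_error=15.0)
    exceeded = decision_outcomes(futures, capacity=110.0)
    for demand, over in zip(futures, exceeded):
        print(f"demand {demand:6.1f} -> capacity exceeded: {over}")
```

Keeping the number of futures small is one possible way of limiting the burden on the user described above, although how many realisations are "reasonable" remains a design question.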
2.5 Uncertainty Communication design
The importance of the user’s needs in uncertainty communications is well recognised (Davis and Keller (1997); Carroll et al. (2014); Sanyal et al. (2010)). Each user of scientific output may have considerably different requirements. Therefore, an understanding of user requirements is necessary to make effective uncertainty communication possible (Carroll et al., 2014). There is a lack of case studies in the literature that explore communication design approaches for addressing the challenge of uncertainty communication, and additionally there is a lack of examples in the design literature that extend communication design frameworks to consider uncertainty communication. Often in uncertainty communication research, the user is only consulted in the evaluation or testing stage and not in the design or development phase. Design guidelines for scientific communication are rare and those that are available do not consider uncertainty information at all.
Guidelines for assessing uncertainties
In order to create communication material that effectively communicates uncertainty information, it is first important to understand the types of uncertainty present within the research. At the time of writing, only two frameworks that support this process were identified in the literature: that proposed by Schneider and Moss (1999), and that proposed by Lapinski (2009).
Schneider and Moss (1999) propose guidelines for assessing uncertainties for authors of the IPCC Assessment Reports. These guidelines map out a step by step process. These steps are:
1. Identify the most important factors and uncertainties that are likely to affect the conclusions
2. Document ranges and distributions in the literature
3. Make an initial determination of the appropriate level of precision
4. Characterise the distribution of values that a parameter, variable, or outcome may take
5. Rate and describe the state of scientific information (using the terms outlined in Figure 2.1)
6. Prepare a “traceable account” of how the estimates were constructed
7. OPTIONAL: Use formal probabilistic frameworks for assessing expert judgment
These assessment guidelines are targeted at researchers. They are valuable for diagnosing sources of uncertainty and their implications on the research outputs. However, there is no reference to the user or their needs in these assessment steps, and the steps may be outside of the expertise of a communication designer.
Designing scientific communications for the non-expert user
One approach, specific to scientific uncertainty visualisation design, is the Uncertainty Visualisation Development Strategy (UVDS), developed for a military context (Lapinski, 2009).
The UVDS has eleven steps:
1. identifying the uncertainty visualisation task
2. understanding the data that need to have their uncertainty visualised
3. understanding why uncertainty needs to be visualised and how the uncertainty visualisation needs to help the user
4. deciding on the uncertainty to be visualised
5. deciding on a definition of uncertainty
6. determining the specific causes of the uncertainty
7. determining the causal categories of the uncertainty
8. determining the visualisation requirements
9. calculating, assigning, or extracting the uncertainty
10. trying different uncertainty visualisation techniques
11. obtaining audience opinions and criticisms.
These steps help the designer understand the data as well as the uncertainty. The authors applied the UVDS to uncertainty visualisation of the Canadian Recognized Maritime Picture (RMP). A critique of the strategy suggests that an additional step is needed within the framework for embedding criticism and feedback from users (Lapinski, 2009).
User centred communication design for non-expert audiences of scientific material
Outside of the scientific literature, the field of design has insights that can contribute to this challenge. The Non-Expert User Visualisation (NEUVis) framework (Gough, Wall, and Bednarz, 2014) proposes a design led approach to the development of scientific visualisations for the non-expert user. Central to a design led approach is that each user of the data may have considerably different requirements. Consideration of, and consultation with, the end user is integral to the whole design process (Gough, Wall, and Bednarz, 2014), not only to the evaluation of the final design.
The design led approach provides a structured framework for the designer to develop scientific visualisations for a non-expert audience. The framework is built around the idea that non-expert users need to be introduced to the nature of the data and the results of data analytics, as well as its significance and often how it is practically interpreted. The kind of problems that arise in science communication design for the non-expert user can be challenging for the communication designer, and this framework provides practical guidance to navigate that challenge (Gough, Wall, and Bednarz, 2014). The framework, however, does not explicitly consider uncertainty.
The NEUVis design method utilises six questions to guide the communication designer through the considerations important to scientific visualisations for the non-expert audience. These questions are:
1. How does this knowledge benefit the user?
2. What about this data is relevant or important?
3. What is otherwise inaccessible to the user?
4. What can the user access on their own?
5. What myths and misconceptions are relevant to the data?
6. What is the potential for impact and what are the risks of this visualisation?
The NEUVis framework is a tool that helps the designer connect the data and the users’ needs and then transition these to a visualisation prototype.
Figure 2.3: Components of a spatial statistical analysis (Modified from Figure 12.4 in Ord 2010, page 179.)
2.6 Spatial epidemiology and disease mapping
Place matters in health, and spatial epidemiology is the field of study that enables health and place to be connected. Spatial epidemiology aims to quantify and explain geographic variation in diseases and the relationship with environmental, demographic, behavioural, socioeconomic, genetic and infectious disease factors (Beale et al., 2008) and requires the inclusion of variables from disparate data sources (Green, Richard, and Potvin, 1996). Table 2.3 outlines the common data sources that are combined in disease mapping projects (Lai, So, and Chan, 2008). The combination of health data, Geospatial Information Systems (GIS) and spatial statistics methods enables linkage of these data from disparate sources and the exploration of complex questions in health promotion, public health, community medicine, epidemiology and a range of other fields (Nykiforuk and Flaman, 2011). Depending on the data available, spatial epidemiological analyses can be conducted at local, regional or international scales.
The data are only one of three integrated components in a spatial statistical analysis as outlined in Figure 2.3, the other two components being analysis and visualisation such as maps (Ord, 2010). Any one of these components may drive subsequent development of the other two, and multiple circuits will occur before the process is completed (Ord, 2010).
Disease mapping specifically deals with the mapping of health outcomes and their cofactors and is an integral component and subset of spatial epidemiology. The visual maps generated through disease mapping show geographical variation in health, disease and other variables of interest, such as socioeconomic status or access to services (Elliott and Wartenberg (2004)). Disease mapping is used to explain and predict patterns of disease outcomes across geographical areas, identify areas of increased risk, and assist in understanding the causes of diseases (Myers et al. (2000); Robinson (2000)). Disease maps can tell stories and communicate relationships in a way that otherwise may not be possible with other data presentation techniques (Mullner et al., 2004). Their usage can generally be categorised into four main themes: (a) disease surveillance, (b) risk analysis, (c) health access and planning, and (d) community health profiling (Nykiforuk and Flaman, 2011).
Disease modelling extends the disease-mapping application to (a) predict the current prevalence of a disease that may not be directly measurable such as cancer survival, (b) predict the future spread of disease, (c) identify factors that may foster or inhibit disease transmission, (d) pinpoint high-risk areas for disease prevention or intervention, (e) target control efforts, (f) identify gaps, and (g) increase stimulus for data collection in these areas (Myers et al. (2000); Robinson (2000)), as well as to advocate for change of health policy or access to services.
There are a few features that set disease mapping apart from other GIS applications, such as ecology or geology. Firstly, areas are often of irregular shapes and sizes, as opposed to a regular gridded lattice. Secondly, map regions often have varying numbers of neighbours rather than a constant number. Thirdly, any true boundaries in the underlying health or disease measures are likely to be obscured by random noise in public health data, whereas images, ecology and geology are more likely to have clearly defined boundaries that align with physical boundaries.
Table 2.3: Examples of the types of data used in spatial epidemiological studies
Data | Description
Health or disease | Vital statistics, notifiable diseases, patient registries, and health surveys from various international or government agencies (location is usually based on residential address)
Field epidemiology | Surveyed data on disease occurrences with location coordinates collected with a GPS
Spatially referenced base | Digital cartographic data available from various international or government agencies (often includes contours, rivers, and built environment features)
Remotely sensed | Land cover, elevation, soil type as reflected by satellite imagery
Environmental and natural resources | Interpreted data on land use, water quality, air quality, climate, geology, etc.
Census or demographic | Sociodemographic and economic data
What are disease maps used for?
Visual representations of the spatial variation of diseases and their cofactors enable the outputs of sophisticated spatial statistical analysis to be accessed by non-expert audiences.
Disease maps are found on the desks of healthcare providers, public health managers, policy makers and other health professionals in practice (Sabesan and Raju (2005); Davis and Keller (1997)). They have been shown to be effective tools for communication of public health risks (Ali et al. (2001); Choi, Afzal, and Sattler (2006)) as well as disease prevention and management through: facilitating and guiding design and implementation of interventions, evaluating health outcomes, estimating population risks, scenario building, planning interventions, policy recommendations, and summarizing and presenting health indicators (Garnelo, Brandão, and Levino, 2005).
One of the earliest and most well-known disease maps is John Snow’s map of cholera deaths. The map is a classic example of the power of data visualisation to enact policy change; it ultimately led to the removal of the handle of the contaminated water pump causing the outbreak, in September 1854 (Tufte and Robins, 1997).
After this date, the use of disease maps as a decision-making tool for public health continued to emerge slowly. In 1898, some 40 years later, the United States Government Printing Office published a collection of maps based on data from the eleventh U.S. Census. These included detailed, hand-rendered maps which compared mortality rates of the population from several infectious diseases, including measles, diphtheria, croup, consumption, scarlet fever and many more (Gannett, 1903). Today, disease maps are widely published and used, and a multitude of examples can be identified via a simple Google search for ‘disease maps’.
Future Challenges
The field of disease mapping has come a long way since John Snow’s initial cholera map influenced public health policy and saved lives. The methodologies used to generate these maps and the estimates they visualise have evolved significantly. How best to represent and communicate uncertainty associated with the estimates and insights in these maps has emerged as a well recognised current challenge for the field (Kraemer et al. (2016); Carroll et al. (2014)), alongside the following:
• How can researchers better communicate findings from disease cartography to public health officials and those in the field?
• New data are becoming available at an ever-faster pace and with high volume: how can we better direct data acquisition towards the needs of researchers?
• As data volume increases, what specific computational tools are necessary to allow rapid processing of disease and covariate data?
• How can uncertainty be better integrated into disease-risk maps?
• How can we best integrate human mobility data into a variety of disease models in a sensible way so that we can guide public health interventions and control?
2.6.1 Uncertainty Communication in Disease Mapping
One of the key objectives of this thesis is to inform the design of the Australian Cancer Atlas, a spatial epidemiological map of cancer incidence and survival across Australia. As the statistical methods and data types that underlie spatial epidemiology and disease mapping have developed and become more complex, methods for visualising and communicating the statistical uncertainty associated with these models have not (Carroll et al., 2014). A 2014 systematic review of visualisation and analytical tools for infectious disease epidemiology (Carroll et al., 2014) identified no examples of visualisation tools that addressed or enabled the communication or inclusion of uncertainty. The review subsequently identified the inclusion of uncertainty information as a key challenge for spatial epidemiology in the future.
The ability to quantify the uncertainty in disease maps depends on the statistical methods and data used to model the studied disease. Many methods in disease modelling generate full probabilistic predictions that can be used to quantify uncertainty; however, the interpretation of these probabilistic predictions by the non-expert audience is difficult (Kraemer et al., 2016).
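As an illustration of how a full probabilistic prediction might be reduced to something a non-expert can read, the sketch below summarises a set of predictive samples for one area as a central value and an interval. The sample values, the 10th and 90th percentile bounds and the wording of the summary are assumptions chosen for the example, not a method taken from the cited work.

```python
import statistics


def interval_from_samples(samples, lower_pct: int = 10, upper_pct: int = 90):
    """Return (lower, upper) percentile bounds of the samples.

    statistics.quantiles with n=100 returns the 1st to 99th
    percentiles, so percentile k is at index k - 1.
    """
    cuts = statistics.quantiles(samples, n=100, method="inclusive")
    return cuts[lower_pct - 1], cuts[upper_pct - 1]


def describe(samples, lower_pct: int = 10, upper_pct: int = 90) -> str:
    """Plain-language summary: central value plus an interval."""
    low, high = interval_from_samples(samples, lower_pct, upper_pct)
    mid = statistics.median(samples)
    coverage = upper_pct - lower_pct
    return f"about {mid:.0f} ({coverage}% interval: {low:.0f} to {high:.0f})"


if __name__ == "__main__":
    # Hypothetical predictive samples for one area (e.g. cases per 100,000).
    samples = list(range(101))
    print(describe(samples))
```

Whether an interval of this kind is understood as intended by the non-expert audience is the open question raised in the paragraph above.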
Within disease mapping, current discussions around uncertainty sources identify missingness, data aggregation, data accuracy, and the impact of residential address errors in geocoding as common concerns (Atkinson and Graham (2006); Tatem et al. (2011); Zinszer et al. (2010); Eaton, Plaisant, and Drizd (2005)). In order to draw meaningful and accurate conclusions from the data, visualisation tools should represent uncertainty clearly (Carroll et al., 2014). Particularly relevant to spatial epidemiology is the interplay between uncertainty and large geographical areas that dominate a map because of their size. Studies have shown that users overestimate rates in small populations, which often correspond to large, sparsely populated regions, resulting in visual biases in interpreting maps (Olsen, Martuzzi, and Elliott, 1996). Studies suggest that users may not be aware of the need for better representation of missingness and uncertainty, and more research is required to evaluate the best means of integrating this type of information with the mapped information (Carroll et al., 2014).
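A small worked example may help show why rates for sparsely populated areas deserve particular care. Under the simplifying assumption that case counts follow a Poisson distribution (an assumption made here for illustration, not a method used in the studies cited), the relative standard error of a crude rate depends only on the number of cases, so two areas with the same rate can differ greatly in reliability. The function names and figures are hypothetical.

```python
import math


def crude_rate_per_100k(cases: int, population: int) -> float:
    """Crude rate expressed per 100,000 people."""
    return cases * 100_000 / population


def relative_standard_error(cases: int) -> float:
    """Relative standard error of a Poisson count (and its crude rate).

    For a Poisson count the variance equals the mean, so the
    relative standard error is 1 / sqrt(cases).
    """
    return 1 / math.sqrt(cases)


if __name__ == "__main__":
    # Two hypothetical areas with the same crude rate.
    for name, cases, population in [("remote area", 4, 2_000),
                                    ("urban area", 400, 200_000)]:
        rate = crude_rate_per_100k(cases, population)
        rse = relative_standard_error(cases)
        print(f"{name}: {rate:.0f} per 100,000, relative SE {rse:.0%}")
```

Both areas would be shaded identically on a map of crude rates, yet the estimate for the remote area is far less certain, and that area may also occupy much more of the map.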
The importance of the users’ needs in developing effective uncertainty communication is well recognised across domains and this is no different in disease mapping. Both qualitative and quantitative studies have found that user-friendly, reliable tools, with high-quality online documentation, and easy access to the source code are important to users of disease maps (Kothari et al. (2008); Hu et al. (2007); Robinson, MacEachren, and Roth (2011); Koenig, Samarasundera, and Cheng (2011)). Users of disease maps have expressed a strong interest in dynamic, interactive graphics that allow them to review their data at different levels (e.g. population or individual level) (Karlsson et al. (2013); Gesteland et al. (2012); Hu et al. (2007); Robinson (2009); Schneiderman, Plaisant, and Hesse (2013); Koenig, Samarasundera, and Cheng (2011)), and that help make abstract data digestible.
Chapter 3
Research Activity 1.A: Grey literature review of internet published cancer maps.
Cancer maps are powerful communication tools that are often used within a public health agenda to inform, educate and/or support advocacy for a change in public health policy (Davis and Keller, 1997). This chapter outlines a grey literature review of currently available cancer maps, which provides an understanding of the current methods and tools used to develop, visualise and disseminate cancer maps.
3.1 Introduction
Cancer maps are commonly published on the internet rather than in academic peer-reviewed journals due to their use as a communication tool for non-academic audiences.
Additionally, as this research is focused on uncertainty communication to the non-expert audience, cancer maps found in the scientific literature cannot be assumed to be targeted towards non-experts. Therefore, in order to provide an overview of the current practices used to generate these cancer maps, a systematic grey literature review was conducted that investigated cancer maps published on the internet and available between 01/01/2010 and 01/05/2016.
Maps are effective and powerful tools for communicating geographical variation in health and disease. They enable non-expert decision-makers to access, visualise and communicate the outputs of often sophisticated geospatial statistical analyses. Both the statistical methods and visualisation techniques used to generate these maps are highly varied, with differences depending on the disease being mapped, the intended message, the target audience, and the person or organisation publishing the material.
Improvements in statistical methods, data visualisation, geographical information system (GIS) techniques and interactive web technologies have enabled health and disease maps to increase in popularity and utility. Health and disease maps are now commonly used by governments, not-for-profit organisations, and research institutions to enable the use of statistical outputs in decision making, and to raise community awareness around target issues. Depending on the data and technology used to generate the maps, their interactive capabilities range from simple downloadable PDF documents to dynamic and interactive web interfaces.
This systematic grey literature review aims to support the development of the Australian
Cancer Atlas, and provide a summary of the current practices, approaches, and technology platforms used to create and disseminate cancer maps for the non-expert audience. This review also aims to investigate if, and how, uncertainty information is included in these maps.
3.2 Aim and Research Question
Aim: To summarise the cancer atlases available publicly on the internet in terms of: geography covered, publishing organisation, data date range and publication date, geographical resolution, reported measure, statistical methods, inclusion of uncertainty, smoothing methods, interactivity features, additional functionality, and technology platform used.
Research Question: What cancer maps are currently available to the public via the internet, and what methodologies, tools and technologies have been used to generate them?
3.3 Methods
An investigation strategy comprising keywords, iteratively developed search strings, and inclusion and exclusion criteria was developed to identify internet-published and publicly available cancer maps. Data from the identified cancer maps and their supplementary or supporting material and host websites were also extracted, summarised and recorded. The complete research strategy was documented to limit bias. As this grey literature review was targeted at identifying publicly available cancer maps, all searches were conducted using Google; no other search engines were explored.
The systematic review was based on the Systematic Review Guidelines defined by the organisation Collaboration for Environmental Evidence (CEE), with modifications made where appropriate to accommodate a grey literature review rather than a scientific literature review¹.
Search Description
As noted earlier, searches were not restricted by country and were conducted from Australia.
The following list details the final search strings. These strings were developed through an iterative process of trialling and refining searches until the desired specificity was reached.
See Appendix B.1 for a full description of the search protocol including search string development and associated hits.
Search Strings
1. intitle: spatial AND epidemiology AND cancer AND map OR mapping OR atlas -campus
2. allintitle: cancer AND map OR mapping OR atlas -campus -kinase -kinases -concept
3. allintitle: spatial AND cancer AND statistics
4. allintitle: spatial OR geographic AND cancer AND variation OR distribution
5. allintitle: spatial AND epidemiology AND cancer AND map OR mapping OR atlas -campus
6. intitle: cancer AND atlas

¹ http://www.environmentalevidence.org/wp-content/uploads/2017/01/Review-guidelines-version-4.2-final-update.pdf
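The thesis does not state how the search strings were submitted to Google. As a minimal sketch, assuming each string is passed unchanged as the `q` parameter of a standard Google search URL, the strings above could be turned into reproducible query links as follows. The helper name `google_search_url` is hypothetical, and only two of the six strings are shown.

```python
from urllib.parse import urlencode

# Two of the final search strings, copied verbatim from the list above.
SEARCH_STRINGS = [
    "allintitle: spatial AND cancer AND statistics",
    "intitle: cancer AND atlas",
]


def google_search_url(query: str) -> str:
    """Return a Google search URL for one search string (illustrative helper)."""
    # urlencode percent-encodes reserved characters and turns spaces into '+'.
    return "https://www.google.com/search?" + urlencode({"q": query})


for search_string in SEARCH_STRINGS:
    print(google_search_url(search_string))
```

Recording the generated URLs alongside the date of each search is one way of keeping a search protocol auditable.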
Exclusion/Inclusion Criteria
Pages were selected for data extraction if they met the following criteria:
• contained a visual geographical map of cancer incidence, risk, mortality or counts (either PDF, static image or interactive web interface),
• accessible without a password or log in,
• published or updated on or after the 1st of January 2010.
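The three criteria above act as a single conjunction: a page is retained only if all of them hold. The following sketch illustrates that logic. The `CandidatePage` record and its field names are assumptions made for illustration and are not taken from the review database in Appendix B.2.

```python
from dataclasses import dataclass
from datetime import date

# "published or updated on or after the 1st of January 2010"
CUTOFF = date(2010, 1, 1)


@dataclass
class CandidatePage:
    """Hypothetical record for one search hit; field names are assumptions."""
    url: str
    has_cancer_map: bool    # visual map of incidence, risk, mortality or counts
    requires_login: bool    # a password or log in is needed to view the map
    last_published: date    # most recent publication or update date


def meets_inclusion_criteria(page: CandidatePage) -> bool:
    """Return True only when all three inclusion criteria are satisfied."""
    return (
        page.has_cancer_map
        and not page.requires_login
        and page.last_published >= CUTOFF
    )


# Illustrative pages (made-up values, not real search hits).
open_recent = CandidatePage("https://example.org/atlas", True, False, date(2014, 6, 1))
too_old = CandidatePage("https://example.org/old-atlas", True, False, date(2009, 12, 31))
print(meets_inclusion_criteria(open_recent), meets_inclusion_criteria(too_old))
```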
3.4 Summary findings
Grey literature searches identified 33 cancer atlases that were publicly available on the internet. Three of the identified atlases were not published in English; however, the details of these maps were extracted where they could be determined. A database detailing all identified atlases is available in Appendix B.2.
3.4.1 Statistical uncertainty
Cancer atlases were considered to report uncertainty to the non-expert user if they included a measure of statistical uncertainty either within or alongside the map. Maps that only reported this information within the supplementary material were not considered to have directly attempted to report uncertainty.
The review did not reveal any novel uncertainty visualisation approaches.
Maps used standard and well-known measures, including credible intervals, standard deviation, statistical significance, boxplots and distributions. These maps ranged from static PDFs or infographics to interactive online resources. The interactivity of the more modern maps enabled uncertainty information to be incorporated without cluttering the screen, for example in a tooltip.
Figure 3.1: Example of CI visualisation for uncertainty representation in cancer mapping (1/3). Source: Alberta Health IHDA Geographic. (2012)
Figure 3.2: Example of CI visualisation for uncertainty representation in cancer mapping (2/3). Source: Pennsylvania Cancer Atlas
Close to half of the atlases identified (42%, n=14) included some measure of uncertainty. The most common measures used to represent uncertainty were credible or confidence intervals (CIs). CIs were either visualised by including their bounds in a scatterplot or graph of estimates vs region positioned next to the map (see Figures 3.1² and 3.2³), or reported numerically through the CI upper and lower bounds listed in a data table (see Figure 3.4⁴). Of those that visualised the CIs, 30% (n=10) embedded the visualisation within a tooltip function which visualised the CI when the mouse hovered over the relevant area (see Figure 3.5⁵).

Figure 3.3: Example of CI visualisation for uncertainty representation in cancer mapping (3/3). Source: Centres for Disease Control and Prevention (CDC). United States Cancer Statistics: An Interactive Cancer Atlas (InCA)
Figure 3.4: Example of an interactive data table with CI upper and lower bounds used in cancer mapping. Source: Pennsylvania Cancer Atlas

² Alberta Health IHDA Geographic. (2012). Age-Standardised Incidence Rate of COPD, 2011. Retrieved from: http://www.health.alberta.ca/health-info/IHDA-geographic/COPD/incidence-agestandard/atlas.html?epik=0GJSpE_IW34lx
³ Centres for Disease Control and Prevention (CDC). (n.d). United States Cancer Statistics: An Interactive Cancer Atlas (InCA). Retrieved from: https://nccd.cdc.gov/DCPC_INCA/
⁴ Pennsylvania Cancer Atlas. (n.d). Retrieved from: https://www.geovista.psu.edu/grants/CDC/?epik=0lJSpE_IW34lx
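The percentages quoted in this subsection can be reproduced from the reported counts. The short check below is a sketch, assuming that both percentages use the 33 identified atlases as the denominator. The text does not state the denominator for the 30% figure, so that part is an assumption.

```python
TOTAL_ATLASES = 33       # atlases identified by the review
WITH_UNCERTAINTY = 14    # atlases that included some measure of uncertainty
TOOLTIP_CI = 10          # atlases reported as embedding the CI in a tooltip


def percent(count: int, total: int) -> int:
    """Percentage of `total` represented by `count`, rounded to a whole number."""
    return round(100 * count / total)


# Denominator assumed to be all identified atlases in both cases.
print(percent(WITH_UNCERTAINTY, TOTAL_ATLASES))
print(percent(TOOLTIP_CI, TOTAL_ATLASES))
```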
Sources of uncertainty information can be visualised or communicated in different ways; examples identified through this grey literature review are listed below.
⁵ International Agency for Research on Cancer. (2017). Atlas of Cancer Mortality in the European Union and European Economic Area 1993-1997, Annex 4 - Cancer mortality maps by site. Retrieved from: http://www.iarc.fr/en/publications/pdfs-online/epi/sp159/
Figure 3.5: Example of standard deviation visualised in cancer mapping
Figure 3.6: Example of boxplot used in cancer mapping. Source: Pennsylvania Cancer Atlas
Table 3.1: Implicit and explicit measures of uncertainty.
Measure | Example
CI | Figures 3.1, 3.2, 3.3
Statistical Significance | Textured overlay on top of coloured regions used to indicate statistical significance
Distribution | Figure 3.4
Boxplots | Figures 3.4, 3.5
Sample Size | Textured overlay or lack of colour on a region was used to show regions with small sample size
Standard deviation | Figure 3.6 - the second map in the bottom right corner shows standard deviation
Geographical coverage
Identified cancer atlases covered geographies from all around the world: 4 were global,
3 from Australia (AUS), 11 from the United States (US), 2 from Canada (CAN), 7 from the United Kingdom (UK), 2 from Spain, 1 from Switzerland, 1 from Germany, 1 from
Norway, and 1 covering the European Union. Not all maps had a national focus and
10 covered a region or state rather than an entire nation. The states or counties/regions covered were South Australia (AUS), Queensland (AUS), Ontario (CAN), Valencia (Spain),
Pennsylvania county Massachusetts (US), New Hampshire (US), Cape Cod (US), Missouri
(US), Florida (US), New York State (US) and Arizona (US).
Publishers
The majority of atlases were published by non-commercial organisations, including not-for-profits (NFPs), government, research organisations, advocacy groups or a partnership between an NFP and government. Only one map was published by a commercial entity (Maps of Cancer Mortality Rates in Spain), in this case the media organisation El Pais⁶.
Reported measures
The majority of maps identified in this study reported age-adjusted rates of either incidence, mortality or survival. Table 3.2 summarises the reported measures and provides a definition of each.
⁶ https://elpais.com/
Table 3.2: Measures used to report cancer statistics
Measure | Details
1. IR (Incidence Ratio) | $\mathrm{IR}_i = \dfrac{(\text{Incidence Rate})_i}{\text{Average Incidence Rate}}$. Cancer incidence rate in region $i$ over the average cancer incidence rate for the total region
2. SIR (Standardised Incidence Ratio) | IR standardised by age structure in each region $i$
3. RER (Relative Excess Risk) | $\mathrm{RER} = \dfrac{(\text{Cancer related mortality})_i}{\text{Average cancer related mortality}}$. Represents the estimate of cancer related mortality within five years of diagnosis. Also referred to as ‘excess hazard ratio’
4. Age Adjusted Relative Risk | RR standardised by age structure in each region $i$
5. Rate per 100,000 | Cancer incidence per 100,000 population
6. Age Adjusted Rate per 100,000 | #5 standardised by age structure or region
7. New cancer cases per 100,000 | Specific methods could not be found
8. Count | Crude cancer counts
9. Below or above Expected | Alternative expression of the SIR
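To make the first two measures and the rate per 100,000 concrete, the sketch below computes them for a single region. It assumes the SIR is obtained by indirect standardisation (observed cases divided by the cases expected under reference age-specific rates), which is one common approach and not necessarily the method used by any of the reviewed atlases. All numbers are invented for illustration.

```python
def rate_per_100k(cases: float, population: float) -> float:
    """Crude rate per 100,000 population."""
    return 100_000 * cases / population


def incidence_ratio(region_rate: float, average_rate: float) -> float:
    """IR_i: incidence rate in region i over the average incidence rate."""
    return region_rate / average_rate


def standardised_incidence_ratio(observed, population_by_age, reference_rate_by_age):
    """SIR by indirect standardisation (assumed method): observed / expected."""
    expected = sum(
        population * rate
        for population, rate in zip(population_by_age, reference_rate_by_age)
    )
    return observed / expected


# Hypothetical region: 30 cases in a population of 20,000.
region_rate = rate_per_100k(cases=30, population=20_000)
ir = incidence_ratio(region_rate, average_rate=100.0)

# Two age groups of 10,000 people each, with reference rates per person.
sir = standardised_incidence_ratio(
    observed=30,
    population_by_age=[10_000, 10_000],
    reference_rate_by_age=[0.0005, 0.0015],
)
print(region_rate, ir, sir)
```

A ratio above 1 indicates more cases than the reference would predict, which is the idea that the "Below or above Expected" measure expresses in words.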
Smoothing
Smoothing is an important tool used in spatial epidemiology to increase statistical power, and is a method by which data points are averaged with their neighbours. Smoothing is an important component of the Australian Cancer Atlas. Neighbours are often defined as geographical neighbours, but can also be temporal or based on other factors such as socio-economic features or rurality. Smoothing is commonly used in the generation of small area estimates when the sample size for individual regions is small and the statistical power is therefore low. It is also useful in health maps when the number of cases in individual areas is small enough to jeopardise individual privacy.
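The idea of averaging a data point with its neighbours can be illustrated with a deliberately simple, unweighted scheme. This is a sketch only: the region names, rates and adjacency list are invented, and model-based smoothers used in practice weight neighbours rather than averaging them equally.

```python
def smooth_with_neighbours(values, neighbours):
    """Replace each region's value with the mean of itself and its neighbours.

    values: dict mapping region name -> rate
    neighbours: dict mapping region name -> list of adjacent region names
    """
    smoothed = {}
    for region, value in values.items():
        pool = [value] + [values[n] for n in neighbours.get(region, [])]
        smoothed[region] = sum(pool) / len(pool)
    return smoothed


# Hypothetical rates for three regions arranged in a line: A - B - C.
rates = {"A": 10.0, "B": 20.0, "C": 60.0}
adjacency = {"A": ["B"], "B": ["A", "C"], "C": ["B"]}

print(smooth_with_neighbours(rates, adjacency))
```

In this toy case the extreme value in region C is pulled towards its neighbour, which shows both the stabilising effect of smoothing and the risk of masking genuine local differences.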
Four of the identified maps reported spatial smoothing and one used temporal smoothing (by calendar year but not geography); 22 did not use any form of smoothing within their methods, and seven had insufficient information available to determine whether smoothing was used. Of the cancer atlases that used model-based smoothing, three used the BYM model (Besag, York, and Mollie, 1991). This model incorporates a spatially structured component, commonly incorporating adjacent areas using a conditional autoregressive (CAR) prior. Spatial smoothing methods used in the identified maps are outlined in Table 3.3.
Table 3.3: Smoothing methods used in health atlases.
Atlas Title | Smoothing Method | Reference
All Ireland Cancer Atlas 1995-2007 | BYM | Besag, York, and Mollie (1991)
The Environment and Health Atlas of England and Wales | BYM | Besag, York, and Mollie (1991)
Atlas of Cancer in Queensland | 1. BYM (incidence), 2. Poisson piecewise with BYM components (survival). | 1. Besag, York, and Mollie (1991), 2. Fairley et al. (2008)
Atlas of Cancer Mortality in the European Union and the European Economic Area 1993-1997 | Examined regional variation by: 1. Poisson-gamma model (no spatial structure), 2. Multilevel model with 3 geographic hierarchies | 1. Pennello, Devesa, and Gail (1999), 2. Similar to Langford, Unwin, and Maguire (1990)
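For reference, the BYM model named in Table 3.3 is commonly written as below. The notation is an assumption introduced here and is not taken from the atlases: $O_i$ is the observed count in area $i$, $E_i$ the expected count, $\theta_i$ the relative risk, $\partial_i$ the set of areas adjacent to area $i$, and $n_i$ the number of such neighbours. Individual atlases may have used different parameterisations or priors.

```latex
\begin{align}
O_i \mid \theta_i &\sim \mathrm{Poisson}(E_i\,\theta_i), \\
\log \theta_i &= \alpha + u_i + v_i, \\
u_i \mid u_{j \neq i} &\sim \mathcal{N}\!\left(\frac{1}{n_i}\sum_{j \in \partial_i} u_j,\ \frac{\sigma_u^2}{n_i}\right), \\
v_i &\sim \mathcal{N}(0, \sigma_v^2).
\end{align}
```

Here $u_i$ is the spatially structured component with the CAR prior, which pulls each area towards the mean of its neighbours, and $v_i$ is an unstructured component that allows for area-specific variation.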
Visualisation tools and platforms
A range of methods and approaches is used to visualise cancer maps. Visualisation platforms are rapidly changing as GIS technologies, graphic design tools, and interactive web capabilities continue to develop. These changes are giving rise to mapping and design tools that can generate customised and interactive web-based maps. The skills required to generate sophisticated and professional outputs using these emerging platforms and tools vary; however, the platforms generally become easier to use over time.
The development of tools and technologies for generating visual cancer and disease maps has progressed rapidly. It is very common for the most recently published maps to have fully interactive web interfaces where users can customise the map to display the features they are interested in, such as population demographics, geographical resolution, cancer of interest and outcome measure(s) (survival, mortality and/or incidence), as well as the ability to compare multiple customised maps or explore changes over time.
The identified cancer maps can be classified into three categories based on their sophistication: static infographics or downloadable PDFs; interactive maps (and/or dashboards) built on an existing GIS, data visualisation platform or tool such as Google Maps, ESRI⁷, ArcMap 9.3⁸, or Instant Atlas⁹; or custom-built web products using tools such as d3.js¹⁰ + leaflet¹¹.
3.5 Implications for the Australian Cancer Atlas
The grey literature review described above provided valuable insight into the current best practice for generating and visualising publicly available cancer maps. The insights from this review can inform the design of the Australian Cancer Atlas by providing a benchmark for current practice. Many features were identified in this review that provide inspiration for the design of the Australian Cancer Atlas, as both positive and negative design examples. I have summarised these insights as design considerations and arranged them in terms of key features of a cancer map (see Table 3.4 below). Design considerations that are common in data visualisation, such as colour palette selection, have not been included.

⁷ https://www.esri.com/en-us/home
⁸ http://arcmap.software.informer.com/9.3/
⁹ https://www.instantatlas.com/
¹⁰ https://d3js.org/
¹¹ http://leafletjs.com/
Table 3.4: Design Considerations for the Australian Cancer Atlas.
Feature: The Map
- Does the map resemble the underlying geographical area it represents or is it scaled or modified? For example, scaled by population or standardised so all regions have the same area, or other modifications that ensure the visual dominance of large geographical areas does not bias interpretations.

Feature: The Legend
- Does the colour scale enable easy comparison between regions?
- Does the transition of colours exaggerate or imply differences in the reported measure that are not proportional to the real differences between regions?
- Are the categories in the legend easy to interpret? Avoid giving one colour a numerical range and avoid decimal places.
- Is the colour selected for the mid point/average intuitively a neutral colour?

Feature: Reported Measure
- Does the reported measure contain any linguistic uncertainty or ambiguity? Will all audiences interpret it uniformly?
- Is it easy to determine if the measure is modelled or count data?
- Is it easy to find further details of the measure, how it was calculated and how it should be interpreted?
- Is mortality or survival most appropriate for your target audience?

Feature: Statistical Uncertainty
- Is uncertainty important to all target audiences?
- What measure of uncertainty is most appropriate for the chosen reported measure?
- Does uncertainty make sense when applied to the chosen reported measure?
- Is it explicit? Does the uncertainty representation method chosen require domain specific knowledge to understand how it influences interpretation of communicated estimates?

Feature: Cancer Type
- Will the map enable different cancers to be shown? If so, select a non-gendered cancer to load on the landing page.
- Is it clear that the cancer type can be changed?
- Should links to cancer specific resources be included?

Feature: Comparing & Searching
- Is it important to enable audiences to compare regions or covariates such as rurality, gender, age, socio-economics, etc.?
- Is it important to be able to search a specific region?
- How will users compare cancer types and regions? Are there comparisons that aren’t appropriate to enable?

Feature: Credibility
- Is it clear who the publisher, source of the data, and analysts are?

Feature: Other
- Can the audience download the map or data? What can they download? How customisable will it be?
3.6 Conclusion
In this chapter I explored the current practices in cancer mapping. This grey literature review revealed that less than half (42%) of maps included uncertainty, and maps with interactivity contained uncertainty more often than static maps, suggesting that interactivity provides an extra visual channel for coding this information. There was no uncertainty visualisation method that could be considered a common practice within cancer mapping; rather, a range of uncertainty measures and visualisations was used. These measures included: standard deviation, credible/confidence intervals, error bars, distributions and boxplots. Visual representation mechanisms used for encoding uncertainty in these maps included: colour, bivariate maps, tooltip interactivity, and error bars. This review provided valuable insight to the Australian Cancer Atlas team in terms of current best practice.
Chapter 4
Research Activity 1.B: User centred uncertainty communication design (Australian Cancer Atlas as a case study)
4.1 Introduction
Building on the insights from the grey literature review detailed previously, I applied a user-centred communication design for embedding uncertainty information for the non-expert audience into the Australian Cancer Atlas. This research activity had three distinct steps:
• A. Project partners workshop - Conduct a workshop with project partners to identify target audiences and identify sources of uncertainty within the ACA using the taxonomy provided by Morgan and Henrion (1990).
• B. NEUVis design framework - Apply the NEUVis design framework to build audience profiles and map out user-needs and information seeking behaviours of the target audiences identified in the workshop.
• C. Focus groups - Conduct focus groups with a subset of the target audiences to validate their needs, information seeking behaviour and current understanding of statistical uncertainty and risk within cancer maps. This helped to validate the assumptions made in Step B.
The importance of incorporating uncertainty into scientific communications for the non-expert audience is well recognised across domains (Hunter and Goodchild (1996); Eiser et al. (2012); Morgan and Henrion (1990); Grubler, Ermoliev, and Kryazhimskiy (2015)) including spatial epidemiology (Carroll et al., 2014). Additionally, the importance of considering the users’ needs in designing a successful communication product is also well recognised (Davis and Keller (1997); Carroll et al. (2014); Sanyal et al. (2010)). Notwithstanding this, there is a lack of guidelines or case studies that can support scientists or communication designers to navigate this complicated communication challenge. The following chapter is broken into three sections, all of which use the Australian Cancer Atlas as their focus.
Firstly, I explore the use of the Morgan and Henrion (1990) typology of uncertainty sources as a tool for identifying uncertainty sources present within the Australian Cancer Atlas.
This is important in the communication design process as different uncertainty sources can impact the interpretation of the results in different ways, so the communication message should be informed by, and tailored to, the uncertainty specific to that project. Within this first step I also identify target audiences with the project stakeholders. Secondly, I explore the application of the NEUVis design framework (Gough, Wall, and Bednarz, 2014) to place the user’s needs at the centre of the communication design process. Thirdly, I conduct focus groups with a subset of the target audiences to validate their needs, information seeking behaviour and current understanding of statistical uncertainty and risk within cancer maps.
4.2 Methods
The following methods section is arranged into three tasks: A - workshop with project partners, B - application of the NEUVis design framework, and C - focus groups.
4.2.1 Task A: Workshop with project partners: identifying target audiences and uncertainty sources in the Australian Cancer Atlas
A collaborative workshop brought together eight participants from all Project Partners. Participants included: Senior Research Fellow (epidemiology), Head of Research, and Postdoctoral Research Fellow (epidemiology) from the Cancer Council Queensland (CCQ); Project Officer from the National Health Performance Authority (NHPA); and Distinguished Professor (Statistics), Senior Lecturer (Health), Senior Research Fellow (Data Visualisation) and MPhil Candidate from the Queensland University of Technology (QUT).
Workshop participants were asked to consider the following questions:
1. Why is communicating uncertainty an important problem?
2. Who are the audiences of the Australian Cancer Atlas and what are their characteristics?
3. Can these audiences be grouped by the level of information detail they require?
4. What will the Atlas report (output measure or measures)?
5. What are the sources of uncertainty within the Atlas?
Topics 1 to 4 were discussed together with the entire group. To elicit responses for topic 5, workshop participants were broken into groups of two and given the task of identifying sources of uncertainty in the ACA relevant to two of the following uncertainty sources as defined by Morgan and Henrion (1990):
1. Measurement error
2. Systematic error
3. Variability (natural variation)
4. Model uncertainty
5. Disagreement
6. Inherent randomness
7. Linguistic imprecision
Workshop participants were not provided with any reading material prior to the workshop.
The groups were given an hour to discuss two uncertainty sources (30 min per source).
For each uncertainty source category, participants were asked to list the specific sources from within the ACA. Each group then presented their results back to the workshop participants. Any disagreements, questions or lack of clarity were discussed among the full group.
A full workshop programme can be seen in Appendix C.1.
4.2.2 Task B: Application of the NEUVis design framework - a user-centred approach to uncertainty communication design
The target audiences identified in the Project Partners workshop (described above) were further developed by myself and Dr. Phillip Gough using the NEUVis Design framework
(Gough, Wall, and Bednarz, 2014). For each audience, their needs, current information seeking behaviour and the impact of uncertainty information on their understanding of the insights from the ACA were mapped out using the 6-question method within the
NEUVis design framework (Gough et al., 2016). Each question was considered in terms of the data as well as the uncertainty information associated with the insights. These questions helped to evaluate how the intended data visualisation for the Australian Cancer
Atlas relates to each of the user groups in their specific context.
A seventh question was added to Gough’s method, “Potential for change”, which aimed to rank audiences in terms of the impact that an audience can have if they are armed with the information presented in the Australian Cancer Atlas. The aim of this question was to identify audiences that have the greatest potential for positive change, but also those with potential for negative change (i.e., audiences that represent a risk if their needs are not met, or if they misinterpret the information). The final seven questions considered for each audience, for both the data/modelled estimates and the associated uncertainty information, are listed in Table 4.1.
Table 4.1: Template - NEUVis questionnaire
Design question
1. How does this new knowledge benefit the user?
2. What about this data/uncertainty information is relevant or important to the user in their context?
3. What does this data/uncertainty information show that is otherwise inaccessible for this user?
4. What can this user access for themselves?
5. What myths/misconceptions are relevant to this data set/uncertainty information?
6. What is the potential impact on the audience?
7. The potential for this audience to have an impact beyond themselves?
4.2.3 Task C: Focus groups
The focus groups were used to validate and expand on the user profiles developed through the NEUVis design framework (outlined above). As it was not within the budget of this research to conduct focus groups with all identified audiences, under direction from the Cancer Council Queensland we targeted the four most important and accessible audiences. These were:
1. General audiences
2. Cancer patients and their caregivers, family or supporters
3. Health practitioners (including health managers & clinicians)
4. Policy makers.
It is well understood that focus groups are a useful qualitative method in health and medical research (Kitzinger, 1995). Focus groups prompt discourse that in turn allows people to express and clarify views more easily than in a one-on-one interview, while also allowing people of various demographics to participate, in a setting that is less confronting than a one-on-one interview (Kitzinger, 1995).
The focus groups had three specific goals:
• Goal A: To establish participants’ current understanding of cancer and how they have obtained this information
• Goal B: To establish participants’ current awareness of how the cancer burden varies by geographical location and whether this is important to them
• Goal C: To explore participants’ understanding and interpretation of examples of disease maps showing information about how the burden of cancer varies by geographical area.
Recruitment
Participants were recruited using a variety of existing networks and methods. Health Practitioners were recruited using the Cancer Council Queensland’s “Queensland Cooperative Oncology Group (QCOG)”, restricted to those members in the Brisbane Region. Details of the focus groups were also included in the December and January editions of the Cancer Council Queensland’s “Health Professionals Network” newsletter. In addition, the focus groups were advertised on the Cancer Council Queensland’s website, and emails were sent to existing clinical contacts of the research team. Recruitment called for three audience types, namely a general audience, medical practitioners or health managers, and policy makers or advisors, using a series of social media promotions and advertisements, in addition to local contacts.
The first set of focus groups comprised health practitioners and the second comprised members of the general public. Only one participant was successfully recruited to take part in the Policy Makers/Advisors group and therefore this participant was included with the health practitioners. Two sessions were run for each group. A set of questions was designed to promote a facilitated discussion around the way that the practitioners work, use statistics and statistical uncertainty, and how geographical issues relate to the way clients perceive or experience cancer.
Discussion Questions
Goal A - Background
• “If you hear the word ‘cancer’, what comes to mind? Do you know what causes it? Can it be cured? Do you know someone who has been diagnosed with cancer? What do you know about different types of cancer? How has your knowledge impacted on how you live your life? How likely do you think it is that you will be diagnosed with cancer? Why?”
• “What do you think of when you hear the word ‘incidence’, ‘mortality’, ‘survival’, ‘risk’, ‘diagnosis’? What about ‘reliability’, ‘uncertainty’, ‘accuracy’?”
Goal B – Geographical variation
• “Are you interested in finding out more about how the burden of cancer varies by geographical areas? Where would you find this information? What types of information would you be interested in? Is it important who produces the information?”
• “What do you know now about how the burden of cancer varies depending on geographical area? Do you think that where you live matters in terms of your health, and whether you will develop cancer? Do you think where you live now has a higher or lower risk of cancer compared to [pick a town]. Where would you get this type of information?”
Goal C - Disease maps
Participants were broken into 3 groups and provided with one of the following three maps:
Map 1: http://globalcancermap.com/
Map 2: https://nccd.cdc.gov/DCPC_INCA/
Map 3: http://www.envhealthatlas.co.uk/eha/Breast/
They were first given the following task:
• “Using the map you have been provided, compare the incidence of breast and prostate cancer.”
After this the facilitator prompted a general discussion using a sample of the questions listed below.
• What do you think are the key messages from this page?
• What features captured your attention?
• What aspects of the page were unclear/confusing?
• How confident are you of the certainty, reliability and accuracy of the information presented in the maps?
• Is their accuracy different? Discuss
• Do you feel they are all as reliable as each other? Discuss
• Which gives you the most confidence in the information presented?
• For someone who lives in region x (Australia vs NZ, Texas, greater London) what does this map tell you about their risk?
• What does graph x (location of uncertainty graph) mean? Discuss.
• Do you feel like this map was made for someone like you?
– With your level of expertise
– With your job requirements
– With your level of scientific/mathematical/statistical understanding
• Are there any terms you didn’t understand or that were unclear to you?
• Were you surprised by anything these maps show?
• Was the information predictable or boring?
• Did any part of these maps frustrate you?
• What terms were unclear to you?
Analysis of results
Affinity diagrams, also known as the KJ Method (Scupin, 1997) or topical notes (Farrell,
2017), were used to analyse data from the focus groups. This method is used to extract meaningful data from qualitative research with small sample sizes (Farrell, 2017). First, this process involved transcribing the focus group recordings. The process then requires that the researcher, using a single sentence or phrase written in the first-person voice, identifies small themes that came up in the discussion and writes them on a sticky note.
The first-person voice helps to elicit empathy for the participants and an understanding of their perspectives. These sentences (sometimes short quotes that summarise a group consensus on a topic) are then shuffled and clustered together to form broad groups.
These groups are then given a name that describes the ideas in the groups, which are then assigned to higher-level themes. For example, using transcripts from the second pair of focus groups (non-health practitioner), we collected approximately 150 individual sentences on sticky notes, clustered them into 17 groups, which were then collected into 5 themes.
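The note, group and theme hierarchy described above can be recorded as a simple nested structure. The following is a minimal sketch only: the sentences, group names and group-to-theme mapping are hypothetical placeholders rather than study data, and the clustering itself remains a manual judgement made by the researcher. The code only stores and counts the result.

```python
from collections import defaultdict

# Hypothetical sticky notes: (first-person sentence, group name).
# These are illustrative placeholders, not data from the focus groups.
notes = [
    ("I skip over statistical terms I do not understand", "Statistical language"),
    ("I want numbers explained in plain words", "Statistical language"),
    ("I trust information from organisations I recognise", "Trusted sources"),
    ("I take what I find online to my medical team", "Trusted sources"),
    ("I want to make my own treatment decisions", "Control"),
]

# Hypothetical mapping from each group to a higher-level theme.
group_to_theme = {
    "Statistical language": "Relating to information",
    "Trusted sources": "Sourcing information",
    "Control": "Autonomy and control",
}


def build_affinity_diagram(notes, group_to_theme):
    """Nest the notes as theme -> group -> list of sentences."""
    diagram = defaultdict(lambda: defaultdict(list))
    for sentence, group in notes:
        diagram[group_to_theme[group]][group].append(sentence)
    # Convert to plain dicts so the result is easy to inspect or serialise.
    return {theme: dict(groups) for theme, groups in diagram.items()}


diagram = build_affinity_diagram(notes, group_to_theme)
n_groups = sum(len(groups) for groups in diagram.values())
n_notes = sum(len(s) for groups in diagram.values() for s in groups.values())
print(f"{n_notes} notes -> {n_groups} groups -> {len(diagram)} themes")
```

Keeping the sentences attached to their group and theme makes it possible to trace any theme back to the statements that produced it.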
Informed Consent and Ethics Approval
Ethics approval for the focus groups and the online game was granted by the Queensland University of Technology (QUT) Ethics Committee (Ethics Application Number: 1500000917). All participants provided informed consent in line with QUT regulations prior to participating in this research.
A copy of the Participant Information and Consent Form (PICF), recruitment flyer and ethics approval letter are attached in Appendix
4.3 Results
4.3.1 Results A: Project partners workshop
The key outcomes from the workshop are detailed below and a full report can be found in
Appendix C.1
1. Why is communicating statistical uncertainty important
Workshop participants considered the importance of statistical uncertainty in the context of science communication, geospatial data and disease mapping as well as the Australian
Cancer Atlas.
Participants felt that uncertainty was important within science communication for:
• quantifying the accuracy and reliability of statistical outputs
• guiding future research by highlighting areas of high uncertainty where further
research is needed
• demonstrating the development of knowledge over time as uncertainty decreases
• being transparent with the public in regards to the evolution of knowledge, and
• supporting a greater public awareness of the scientific process.
In geospatial data and disease mapping, uncertainty was considered important in the modelling process, to manage the phenomenon whereby model outputs can be influenced by how the data are aggregated before analysis. It was also considered important in combating the misinterpretation of reliability that occurs when statistical estimates are rendered solid on a map, making them appear more reliable than they may in fact be.
Within the Australian Cancer Atlas, in addition to the points addressed above, uncertainty was considered important in:
• communicating the reliability of the outputs to decision makers in order to inform
appropriate decision making, public health policy development and health budget
allocation,
• supporting and guiding future research in cancer outcomes
• helping to quantitatively prioritise research projects, and
• telling the ‘whole story’ by communicating clearly our current state of knowledge
about inequalities in cancer incidence and survival in Australia.
In addition, uncertainty communication is an explicit research focus of the Australian
Cancer Atlas project.
2. Target audiences
Workshop participants identified the following eight target audiences as important to the
Australian Cancer Atlas. A full description of the audience characteristics can be seen in
Appendix C.1.
1. General Audience/ General Public
2. Media
3. Government, lobby groups and health policy makers and advisors
4. Health managers (regional and local)
5. Clinicians
6. Cancer patients and their caregivers, family or supporters
7. Researchers
8. Other Cancer Councils and Health Reporting organisations.
3. Uncertainty sources in the ACA
The following table summarises the sources of uncertainty identified by the workshop participants using the uncertainty sources taxonomy outlined by Morgan and Henrion
(1990). Overall, this was a difficult process which workshop participants struggled to complete. The primary barriers to completing this task were:
1. Many workshop participants did not understand all sources of uncertainty prior to
the workshop. Even the statistical modellers struggled with some definitions.
2. Participants struggled to translate the technical definition of the different categories
to a practical uncertainty source within the ACA.
3. Some uncertainty sources can exist in multiple categories; for example, disagreement
on methods could also be placed in model uncertainties. As a result, limited
workshop time was spent on philosophical debate over the meanings of the
different categories rather than on completing the diagnosis task.
4. Participants struggled to see the practical purpose and impact of identifying all
sources of uncertainty, when some cannot be communicated or cannot be measured,
such as systematic error/bias.
Participants did succeed in identifying some sources of uncertainty, even if they did not all fit within the categories defined by Morgan and Henrion (1990).
Table 4.2: Sources of uncertainty within the Australian Cancer Atlas

Uncertainty category            Source specific to the Australian Cancer Atlas
Data                            - Estimated population of each region (ABS)
                                - Estimated demographic breakdown of each region (ABS)
                                - Socio-economic status is generalised across the entire region
                                - Classification uncertainty around the cause of death
                                - Classification uncertainty around Indigenous identification
                                - Residential address does not contain any information on time
                                  spent at that residence or region
Methodologies                   - Smoothing algorithm
                                - Model prior distributions (may also be an input rather than
                                  a method)
Disagreements - model/methods   - Spatial smoothing methods
Outputs - linguistic            - Meaning of: probability, uncertainty, risk, cause, correlation,
                                  random
4.3.2 Results B: Audience group definitions (Application of NEUVis framework)
Audience profiles were developed for each of the eight audiences identified in the Project
Partners Workshop (Appendix C.1). All audience profiles are detailed in Appendix C.2.
These audience profiles were valuable in identifying each audience’s needs and considering how each might use uncertainty information. They were particularly useful for identifying the audiences most prone to misinterpretation and those for whom uncertainty information may not currently be relevant.
The addition of a seventh question not previously part of the NEUVis framework (i.e. “What is the potential for this audience to have an impact beyond themselves?”) was particularly useful for considering those audiences that may be most susceptible to misinterpreting the data or the uncertainty information. Because this research connects geography with cancer incidence, this is an important consideration, as unfounded assumptions of cancer clusters must be avoided. It was also valuable in focusing on the most important audiences when there were eight target audiences.
Extending these questions to the uncertainty information, and not only the data, was valuable in mapping out which audiences this information will be most meaningful for. The framework helped the non-technical team members (including designers) to develop an understanding of how the uncertainty impacts the interpretation of the insights. This can be difficult when the impact of uncertainty is discussed abstractly. Additionally, the process of identifying audience needs and behaviours enabled a guided discussion between the modellers and other stakeholders. This enabled the project partners to come to an agreement on which types of uncertainty were most important and for whom.
A clear understanding of the target audiences, their needs, current behaviour and expertise enabled the diverse perspectives of the project partners to remain focused on the target audiences and balance the desired project outcomes with the target users’ needs.
4.3.3 Results C: Focus groups
Due to limited resources it was not possible to consult all audience types through focus groups. Therefore, under the guidance of the Cancer Council Queensland, we focused on the following four audiences:
1. General audience
2. Cancer patients and their caregivers, family and supporters
3. Health practitioners (including health managers & clinicians)
4. Policy makers
These four audiences were combined into two groups: general audience + cancer patients and their caregivers, family and supporters (referred to as patient-participants from here on); and health practitioners.
Focus groups were used to build on the audience groups outlined in Section 4.3.2.
Table 4.3: Focus Group Participants

Focus group   Audience               No. participants   Total
FG1           Health Practitioners   F = 5, M = 0
FG2           Health Practitioners   F = 7, M = 0       12
FG3           Patient-Participants   F = 6, M = 0
FG4           Patient-Participants   F = 6, M = 1       12
The first two focus groups were held with groups of health practitioners. These included nurses, General Practitioners, other clinicians and staff from support groups for cancer patients, as well as one participant from health policy development. In total, twelve respondents participated in these groups.
The second two focus groups were for people from a general audience and who did not identify as health practitioners of some kind. It became apparent that all of the participants in these focus groups had personal experience with a cancer diagnosis or a close relationship with a person who had. It was noted that this means that the findings from this group may not necessarily reflect the opinion of the general public. However, in the context of chronic disease maps and specifically the Australian Cancer Atlas, this audience is still highly relevant. I also note that this is not unexpected for cancer, as most people in a general audience have had some personal experience of cancer, whether through a personal diagnosis or through a relationship with someone who has.
Focus group insights: Health practitioners
These focus groups provided insight into the way that health practitioners work with cancer patients and data around cancer. The results of these focus groups can be grouped into four areas: the attitude of the health practitioner, the challenges related to their practice, the issues that their clients deal with (or that they have to deal with due to their clients) and uncertainty.
Results were further translated into health practitioner needs, which can then be acted upon within the design of the ACA.
Attitude
The attitude of the health practitioner to their work is very important, as they often face significant frustrations. Overall it can be noted that they value their work, and understand its importance to people, particularly in remote areas. They have a very broad understanding of cancer, which is based on: data and research which is as current as available to them; and the personal experiences they have with dealing with cancer. They shared a perspective on the difficulties of accessing screening and treatment information.
The discussions often came back to the way that the practitioner:
• Empathises with their client as an individual, whose experience is different from
every other individual,
• Maintains a positive message and a positive outlook for their clients, and wants
to base this on data that show how outcomes are improving for cancer patients
compared to a few decades ago
• Wants to be able to give their clients comparisons between their risks in
diagnosis/treatment options/mortality to other activities in daily life that
involve some risk, such as driving a car/being struck by lightning
• Wants information that they give to their clients to be individually meaningful, but
also supported by data
• Believes that the way that cancer is talked about needs to be addressed, particularly
in relation to how cancers are grouped. That is, cancer is not one disease, not all
breast cancers are the same, and the same cancer in different areas of the body has
very different outcomes.
Challenges in their practice
Health practitioners spend a lot of time communicating with clients who are going through extremely stressful experiences. They talk a lot with clients, but also find and print out
relevant information to give to their clients. They have tools that they rely on, and the trustworthiness of these is important, but they are unsure whether the tools are the most up-to-date, which they would find valuable to know. A large challenge is consistency in data, particularly for those who work with children and young adults. They may have difficulty accessing data consistently, or with the way that age groups are defined. Some practitioners also wanted information about the outcome of non-intervention (doing nothing at all) and how that relates to patient outcomes. Finding the right information for the client can be difficult to impossible. However, they are able and willing to adapt to the different challenges.
Client issues
The participants worked with a range of different clients, some with the very young and others with the very old. While there can be acceptance from aged clients about the terminal nature of cancer, there is a lot of desire for information from the parents of young patients. Clients, and their parents, often have misconceptions about cancer, which the practitioner needs to address, particularly to do with treatment options, causes and outcomes for persons with cancer. Related to this is the difficulty that many clients have with interpreting statistics, and how, or if, any application can be made to the individual from the statistics they find. This is significant because often clients will look for their own information online. Clients can be generally well-informed, though the source of their information is not always reliable; it was noted that often the clients have known someone who has had cancer, and it is not as unfamiliar as it was a few decades ago.
Uncertainty
The issue of uncertainty was brought up as part of the focus group, particularly how practitioners and their clients understood the idea of uncertainty around cancer estimates.
The practitioners who understood statistical uncertainty found it useful, but for others, terms such as confidence intervals or credible intervals are unclear, and are typically ignored. Whether the practitioner gives the clients any statistics depends on the client, and whether the practitioner thinks that it would be good for them; the practitioners see clients facing very real and difficult challenges and respond to them accordingly.
Focus group insights: Patient-participants
These focus groups provided insight into the current behaviour and needs of persons with cancer, or with family members affected by cancer. Discussions explored current information seeking behaviour, how they understand the way that cancer is communicated, how they find and use information about cancer, how it relates to them, the importance of autonomy and control and some of their desires and values connected with these issues.
Behaviour
Some participants commented that they only spent the time and effort to understand information about cancer after they were personally affected, through their own diagnosis or that of a loved one. They understood that the risk of cancer developing is influenced by many behavioural choices (such as smoking), but felt that the cause and effect were so far removed that it didn’t seem real when they were younger.
Some patient-participants felt that there was some inevitability to their diagnosis, partly because of their environment, heritage, genetics, or a combination of each (such as living in northern Queensland with very fair skin), but also because of a lack of understanding about risk factors when they were younger. Outside of sun exposure, they didn’t suggest that they perceived geographical location on a national scale as a cause of risk increases.
However, risks were perceived to be influenced by local environmental factors, such as pollution or occupation.
Communication
Patient-participants generally felt that communication around cancer was fear-mongering and negative. A significant insight into the way that cancer is talked about, particularly in the media, was how left out or forgotten some cancer patients feel. It is perceived that some cancers, particularly breast cancer, but also now prostate cancers, are the ‘popular cancers’ because of successes in research, funding, and awareness. However, this can be discouraging for people who have other forms of cancer that don’t have the same level of support or awareness. Two extreme cases were brought up during the discussion. The first was lung cancer, where participants stated that they felt stigmatised, as if they had done something to cause the lung cancer. The second was the description of cancer by the media as a ‘manageable chronic disease’, particularly in relation to some kinds of prostate or breast cancer, which can be very disheartening for patients with other cancers where this is not a reality. It was noted that patients understood that it may be more relevant to group cancers by the mutation that causes them, instead of the organ where they are located.
Generally, the patient-participants would have preferred to have information about mortality rates reframed as survival rates. Some participants felt that this would make no difference to them personally, given how they mentally process these percentages, but the groups generally thought this would be better. This was seen as a more positive way to communicate the statistics. A 15% survival rate gave the participant more to hang on to than an 85% mortality rate. Patient-participants also value hearing and sharing personal experience about cancer. Many patient-participants shared stories of how they were able to support others in their experience, as well as how they received support from other cancer patients. This was very encouraging to them.
Sourcing information
The reputation of organisations, such as Cancer Council Queensland, is very important to cancer patients, as they are seen as a source of reputable information, whereas mass media was seen as either shallow or fear-mongering. The patient-participants were critical of information from the internet or friends, and strived to build a bank of information from a variety of sources they could draw from. They want information that is clearly explained, and ideally would be based on outcomes, which could help them take informed action.
Patient-participants often took information to their medical team, whose expertise they trust.
Relating to information
When sourcing information, users had different methods of making it personally meaningful, particularly statistical data. People made comparisons so that abstract statistical data could be made more tangible, for example by considering 100 people “like me” and asking how many people with my condition would have survived after 5 years. Patient-participants said that it was difficult to understand technical and statistical language, and often skipped over that information. However, they didn’t suggest a lack of confidence in the research itself, rather that it was inaccessible. One point that came up relating to relative risk was the assumption that when a range of risks is presented, people with a healthy lifestyle were at the bottom of the range, and people who have unhealthy habits (such as smokers) were at the higher end of the range.
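The “100 people like me” comparison and the survival reframing described above amount to a small piece of arithmetic. The following is an illustrative sketch only: the function names and the wording of the statement are assumptions made for this example, and the 85% figure is the example quoted by participants, not an estimate for any particular cancer.

```python
def per_100(proportion, denominator=100):
    """Express a proportion as a whole-number count out of `denominator`."""
    if not 0.0 <= proportion <= 1.0:
        raise ValueError("proportion must be between 0 and 1")
    return round(proportion * denominator)


def survival_statement(mortality, years=5):
    """Reframe a mortality proportion as a survival count out of 100 people."""
    survivors = per_100(1.0 - mortality)
    return (
        f"Out of 100 people like me, about {survivors} "
        f"would be alive after {years} years."
    )


# An 85% mortality rate carries the same information as a 15% survival rate.
print(survival_statement(0.85))
```

Both framings are arithmetically equivalent. The difference lies only in which outcome is placed in front of the reader.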
Autonomy and control
Cancer can mean a lot of different negative things to the people who are personally affected by it. Our patient-participants wanted to feel in control of their own decisions.
They wanted to rely on the advice of the experts in the medical team, and wanted to find useful, actionable research, about which they would get advice. One challenge that was discussed was alternative treatments. While patient-participants appreciated concern for and from friends and family, they did not want other people to push their own ideas onto them about what they should be doing, or who they should or shouldn’t trust. It was more important that the patient-participant was able to make their own decisions regarding where to place their trust.
Focus group limitations and challenges
We wish to clarify a few limitations and challenges we encountered when planning and running the focus groups. Firstly, there was limited representation from men; the only male participant was involved with one of the focus groups for the general public. One challenge we encountered was that we were unable to attract health policy experts or policy makers to participate in the focus groups, where their input would have been valuable. Finally, it quickly became apparent that most of the patient-participants had personal experience with cancer diagnosis, or had close friends or family that had. It is not unexpected that members of the general public would know someone who had previously received a cancer diagnosis. However, the level of personal experience with cancer diagnoses among our patient-participants was higher than for a totally random sample of the population. It is likely that this is because people who are part of the
Cancer Council Queensland’s networks are more likely to be personally affected by cancer. Though this was a challenge in conducting and evaluating the focus group data, it is notable that the people involved in the general public focus groups represent a group who are interested in disease maps as an information resource in their own experiences with cancer diagnosis.
4.4 Discussion & insights for the Australian Cancer Atlas.
Embedding uncertainty information into an already visually rich cancer map is a significant design challenge that relies on: understanding the current practices and technologies; identifying target audiences; identifying why uncertainty is important and how it influences the key insights; and understanding the target audience’s current behaviours and needs. The Australian Cancer Atlas has been utilised as a case study for the application of design thinking tools, uncertainty diagnosis frameworks and user focus groups as approaches for addressing this complex challenge. Combined with the grey literature review detailed in Chapter 3 that explored current cancer mapping practices, many insights have been developed through this case study. The process, however, has also highlighted areas where further work is required to improve methods and tools for including uncertainty in the design of scientific communication material for the non-expert audience.
Insight 1: If uncertainty information is not understood, it is ignored. Key
messages of a cancer map should stand on their own, independent of any
uncertainty information.
Insight 2: Credible intervals are of interest to the non-expert audience of a
Cancer Map once their meaning is explained.
As identified by the grey literature review, including uncertainty information is not common practice in cancer maps. Uncertainty is often not included and modelled outputs are commonly reported as point estimates. When uncertainty is included, it is usually in the form of credible/confidence intervals, error bars or statistical significance. The participants in the focus groups highlighted that these measures are not often understood by health practitioners or the general public, and in these cases are simply ignored. Map design should ensure that the main message can be understood independent of any uncertainty information.
Discussion with the focus group participants revealed that explaining the significance of the credible intervals and error bars was very easy, and the information was perceived as very useful, interpretable and interesting once explained. Focus group participants suggested that a short video explaining their meaning would be useful.
Insight 3: There is a need for practical tools for diagnosing uncertainty sources
in language that is accessible to applied researchers and communication
designers. Taxonomies of uncertainty sources are challenging to apply in
real-world applications for this purpose.
There are many different sources of uncertainty within quantitative research projects, as outlined in Section 2.1.3. Identifying these different sources can inform how uncertainty within a specific project is expressed, communicated or visualised (or justify why it should be left out). For example, uncertainty due to high natural variation may be visualised in a different manner to uncertainty due to small sample size. Although the uncertainty taxonomy by Morgan and Henrion (1990) is well cited in the uncertainty communication literature, using this as a framework for identifying uncertainty sources in the Australian Cancer Atlas proved to be very challenging, even for experienced applied researchers. These challenges highlighted the need for practical diagnostic tools for identifying uncertainty sources within specific research projects. Pivotal to the success of such a tool would be an agreed taxonomy which explains the different sources of uncertainty in language that is accessible to both quantitative researchers and non-technical stakeholders such as design experts. Tools that help the design expert understand these complicated concepts and why they are important to the key messages of a project would be highly valuable but do not currently exist.
Insight 4: Reputation of the publishing organisation is important to a general
audience’s perception of credibility of the information. Further to this, easily
identifying the data sources and organisations who conducted the analysis
influences perceived credibility for health practitioners.
The majority of maps identified were published by research organisations, government, not-for-profit organisations or partnerships between these. Focus group participants highlighted that this was not a trivial matter, and expressed that they specifically considered the credibility of the cancer information they accessed, and the publisher of that material was a key factor in evaluating credibility. Further to this, health practitioners specifically looked for where the data had come from to inform the credibility of the resource. In many of the online cancer maps it was very difficult to find the data used to generate the modelled estimates presented in the map.
Insight 5: Uncertainty can be misinterpreted when an interval is presented.
Care should be taken when intervals are used to present uncertainty information. Focus group participants perceived, for relative risk of a particular cancer, that people at the lower end of the range had made healthy lifestyle choices and people at the top end of the range had made poor lifestyle choices such as drinking, smoking and insufficient exercise.
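To make the distinction concrete, the sketch below shows what a 95% interval summarises. It assumes that posterior draws of a single area's relative risk are available from a model. The draws here are simulated from an arbitrary log-normal distribution purely for illustration and are not output from the Australian Cancer Atlas, and the function name is hypothetical.

```python
import random
import statistics

random.seed(1)

# Hypothetical posterior draws of ONE area's relative risk (simulated, not real output).
draws = [random.lognormvariate(0.10, 0.08) for _ in range(4000)]


def credible_interval_95(samples):
    """Equal-tailed 95% interval taken from the 2.5% and 97.5% sample quantiles."""
    cuts = statistics.quantiles(samples, n=40)  # cut points at 2.5%, 5%, ..., 97.5%
    return cuts[0], cuts[-1]


lower, upper = credible_interval_95(draws)
print(f"Median relative risk: {statistics.median(draws):.2f}")
print(f"95% interval: {lower:.2f} to {upper:.2f}")

# The interval expresses how uncertain the estimate for the whole area is.
# It does not sort residents into low-risk and high-risk individuals.
```

The two ends of the interval are alternative plausible values for the same area-level quantity, which is why reading them as healthy and unhealthy residents is a misinterpretation.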
Insight 6: Comparison of cancer outcomes and communicating the complexity
of cancer are design opportunities for health practitioners.
Communication
There is an opportunity to communicate and think differently about cancer, which was expressed during the focus group. The practitioners need to communicate this, which may be an opportunity for a design intervention. The practitioners also have to re-interpret statistics that are published, or found by the client. A design intervention should take this into account, particularly given that dealing with cancer is an individual experience for the client. Communicating effectively with the client is important, and should consider the way that the non-expert in statistics will respond to this information.
Comparison
In order to make the information relevant to the client, the practitioner may make comparisons with other parts of life that involve risk. This would be an interesting opportunity for a designed intervention, particularly for the client who comes to the practitioners with information they have found online. The practitioners were optimistic, positive, hard-working people who do extraordinary work helping clients through extremely challenging circumstances. Their optimism is fuelled by a broad outlook on cancer, and how it has changed. This may prove to be a useful opportunity for a design intervention to make comparison in order to show how diagnosis, treatment and patient outcomes are improving, and will continue to improve.
Insight 7: Design opportunities for a general audience - Communication and
community.
Communication
As noted for the focus groups with the health practitioners, there is an opportunity to communicate and think differently about cancer, which was expressed during the focus group: cancer is not one disease, and breast cancer is not one disease either. In addition to this, it may also be beneficial to frame statistics more positively, shifting from mortality to survival. A user without training in statistics has the same need and desire to understand their own circumstances. Statistical information should therefore be tailored to an end-user who has no experience with statistics. More detailed information may be given on demand if the user has the understanding to process it, but information should be accessible to the statistical novice. From examples that were described, it seems that some resources are created with statisticians in mind, without due consideration for the needs of the patient. Data, and the results of data analyses, would be more useful if they were made more tangible to users. Finally, communication around cancers generally should be more mindful of the biases perceived by participants with cancers other than breast and prostate cancer.
This is also an opportunity to create an intervention that addresses misconceptions around what causes cancers, to help alleviate some of the unfair stigma around certain cancers.
Community
Real personal experiences are valuable to cancer patients. Participants gave numerous examples of how they were called upon by friends and family who had recently received a diagnosis, or how they were frustrated by unwarranted advice from a friend that they didn’t trust. Personal stories and connections can potentially reduce feelings of isolation for patients who have rare, or typically stigmatised cancers. Creating links between communities (such as Facebook groups and advocacy groups) with individual stories and reliable, evidence-based information may be valuable to the everyday user.
4.5 Conclusion
In this chapter I explored uncertainty communication within cancer mapping. The work was informed by the grey literature review described in the previous chapter, which built an understanding of current practices in cancer mapping and the general lack of uncertainty information included in these maps. I explored the use of Morgan and Henrion's (1990) taxonomy for diagnosing uncertainty within the Australian Cancer Atlas, extended the NEUVis design method by proposing a user-centred design framework that can be applied to all science communication challenges, and conducted focus groups with target audiences of the Australian Cancer Atlas. I have summarised the conclusions for each of these research activities below.
Diagnosing uncertainty sources
The research team tested the use of Morgan and Henrion's (1990) taxonomy of uncertainty as a practical tool for identifying the uncertainty sources within the Australian Cancer Atlas. This taxonomy alone was not sufficient and the team found many challenges, including: perceived overlap between uncertainty sources, a high level of technical knowledge required to practically apply the framework, and a lack of perceived benefit in the activity. The team did identify the main uncertainty sources, but the taxonomy is not a practical tool for diagnosing uncertainty without further development.
Further development would focus on clearer definitions of the taxonomy, including examples for each category, and re-designing the tool so that the application focused on the most important uncertainty sources in a project first, rather than just going through every possible uncertainty source.
NEUVis Design Framework - for uncertainty communication design
Utilising the NEUVis design framework to systematically map the impact and importance of uncertainty information to the identified target audiences was highly successful. The framework provided a mechanism for technical experts, applied researchers and communication design practitioners to navigate the communication challenge collaboratively and share required knowledge. The framework maps different uncertainty information within the project with the target audiences and their needs. The addition of a 7th question to the NEUVis framework through this study, “What is the potential for this audience to have an impact beyond themselves?”, helped further rank potential audiences in terms of importance. The NEUVis design framework places the users’ needs at the centre of the design/communication challenge. Within the uncertainty communication literature, the users’ needs are increasingly recognised as essential to the uncertainty communication challenge. This case study has demonstrated that the NEUVis design framework can provide a road map for design practitioners and technical experts to collaboratively navigate through this complex communication challenge. The framework was time consuming to apply and future research could focus on streamlining the process.
Focus groups
The focus groups validated, and expanded on, the audience profiles developed through the application of the NEUVis design framework. They revealed opportunities for improving the way cancer is discussed and communicated to both medical practitioners and the general audience within, or alongside, a cancer map.
In particular, it was noted that both groups saw value in raising understanding about different types of cancer and groupings of cancers. A recurring point in the focus groups was that cancer is not one disease; that is, the same type of cancer can be found in different tissue and different types of cancer can be found in the same tissue. Educating the general public about this may have a beneficial effect on the way that cancer is understood and perceived. In addition, it is apparent from both groups that any statistical information that is not understandable is simply ignored. This must be a design consideration when developing public-facing disease maps: because data that is rendered may be only partially interpreted, an incomplete view should not be misleading. The relationship of statistics to an individual is another area that emerged in the discussions. It is difficult for a member of the general public to understand if and how statistics relate to them personally, and often practitioners are approached with this kind of question. A cancer atlas that supports the discussion around the difference between population statistics and personal statistics could be valuable for a practitioner who is trying to help their patients find meaning in the data.
Medical Practitioners
Medical practitioners stated that they wished to communicate improving results from medical interventions, especially in comparison to outcomes several years (or decades) ago, as well as in comparison to other activities in life that involve some risk. As outcomes have changed, it would be helpful to medical practitioners to have easy and consistent access to the most up-to-date information available, in order to pass this on to their clients.
General Public
The general public expressed that they would benefit from having a more thorough understanding of risk, and what affects risk, particularly cancer risk. One participant commented that she, as a non-smoker, felt isolated because she had lung cancer and people around her assumed that her cancer was probably caused by something she had done, such as smoking. Of course, this is not always the case and it is not how risk works. This type of stigma around some cancers (such as lung cancer) and popular or celebrity support for fundraising around other cancers (such as breast cancer) can leave some patients with feelings of isolation because of their own situation. While supporting all cancer research should obviously be encouraged, it is also important that all patients, particularly those with rare cancers, are able to connect to support and share their stories with each other.
Opportunities
The focus groups brought to light several opportunities for design interventions, which were outlined in the discussion. We recommend that future research and designs incorporate User Modelling within cancer maps to better serve the needs of different audiences¹.
Potential audiences for cancer maps may be: researchers, policy-makers, patients and the media. There is a wide range of understanding of statistics across these groups, which the design of cancer maps, or any chronic disease map, should take into account. The most effective way of publicly communicating uncertainty around modelled estimates will always depend on the nature of the message, the statistical methods used, the data, the context, and of course, the audience. Uncertainty communication must be embedded within the larger communication design challenge if it is going to contribute to addressing the users’ needs. The tempting and easy solution of simply adding an error bar or credible interval is generally ignored by the non-expert user.
¹ For more information about User Modelling in the field of Human-Computer Interaction, see Kay (2011) and Fischer (2001).
Chapter 5
Research Activity 2: User study - Uncertainty representation in an online game.
The following investigation aims to contribute to a growing body of research on uncertainty communication. It uses an online game to investigate how different uncertainty representation methods influence players’ behaviour. In this research I explored how players allocate resources differently depending on how the uncertainty around an estimate of risk is communicated. The research involved multiple steps including: game design, digital implementation, recruitment of participants, and analysis of the online game interactions. In addition to exploring the impact of different uncertainty representation methods on behaviour, this chapter also explores the impact of the risk level and uncertainty levels within the game.
5.1 Introduction
There are three impediments to effective uncertainty communication (Buttenfield and Beard, 1991):
1. standardisation of terminology,
2. methods for measuring and representing uncertainty, and
3. methods for depicting uncertainty information simultaneously alongside the estimates/data it relates to in an understandable, useful, and meaningful way.
The following chapter focuses on Buttenfield and Beard’s (1991) second impediment and contributes to the challenge of uncertainty communication through a user study that explores uncertainty representation methods. Uncertainty measures that commonly make their way into material targeted at a general audience include: numeric intervals (such as confidence intervals, credible intervals or prediction intervals), point estimates ± error, statistical significance, distributions, boxplots, standard deviation or semantic versions of intervals, as discussed in Section 2.3.2. These methods appear, in particular, in the cancer maps identified in the grey literature review detailed in Table 3.1; however, their impact on, or interpretation by, non-expert audiences is not well understood.
The study detailed here uses an online game to explore if users’ behaviour is influenced by the uncertainty representation method and compares: intervals, semantic uncertainty intervals, point estimates ± error and point estimates without error (i.e., no uncertainty).
Validated uncertainty representation methods are important building blocks for addressing uncertainty communication more broadly (Buttenfield and Beard, 1991). A greater understanding of the most effective ways to represent uncertainty measures can support communication designers to develop communication solutions that more effectively address the users’ needs.
Within the grey literature review of online cancer maps outlined in Section 3, 42% of the identified maps included a measure of uncertainty. The most common measures were credible/confidence intervals, and point estimates ± error. Most, however, reported only a point estimate of the modelled cancer incidence/survival or risk with no uncertainty.
When participants of the focus groups, outlined in Section 4.2.3, were asked to interpret confidence/credible intervals and point estimates ± error on example maps, those that did not have previous experience with statistical information ignored the measures as they did not understand them.
The challenge of uncertainty representation for the non-expert user has an extra layer of complexity when the estimate itself is a probability. This uncertainty around an estimated probability (or uncertainty on top of uncertainty) is present in the Australian Cancer Atlas and therefore is used within the online game. Due to the concern that referring to cancer within the game could be upsetting to any participant who has a lived experience of cancer, either personally or through a loved one with a cancer diagnosis, the game uses a non-health related context: a fictional pirate, exploring the seas and looking for ships from which to steal gold doubloons.
While the communication of the statistical uncertainty within the game and the cancer atlas are similar, in that they are both an estimated probability with uncertainty, the different contexts clearly influence behaviour. The online game is a fictional setting with no connection to the players’ real life, while the cancer map could have a very real connection to the viewer, either through their own lived experience or that of a close friend or family member. Additionally, the outcome in the online game is financial, while those in the cancer atlas represent loss of health, potentially pain and death. It is reasonable to question if decision makers respond differently to risks of a financial versus a physical nature, as well as ‘real’ versus fictional. These considerations should be kept in mind when interpreting the study results for the Australian Cancer Atlas.
5.2 Aim
To use an online game to investigate how different uncertainty representation methods influence players’ behaviour. Specifically, this study explores three different uncertainty representation methods: a numeric uncertainty interval, a semantic uncertainty interval, and a point estimate ± error. A point estimate with no uncertainty is used as a control.
5.3 Online Simulation - Impact of uncertainty communication methods on decision making
To quantitatively investigate the impact of different uncertainty representation methods on decision making I designed and built a web-based game to assess if players allocate resources differently when uncertainty information is presented in different formats. The research explored user behaviour across different uncertainty representation methods, levels of risk and levels of uncertainty.
5.4 Methods
5.4.1 Game Design
Players were randomly shown one of four different versions of the online game. In each version of the game all features were identical, except for how the uncertainty around the ‘risk of defeat’ was communicated. The four styles of representation are listed in Table 5.1 below.
Table 5.1: Uncertainty representation method for each game mode.
Game Mode | Uncertainty Representation Method | Representation in Game
Interval | Upper & lower bounds of the interval | Risk of my defeat: is between 30% and 70%
Plus/minus | Point estimate ± error | Risk of my defeat: is 50% give or take 20%
Semantic Interval | Semantic bounds of an interval | Risk of my defeat: is between below average to above average
Point Estimate | No uncertainty shown | Risk of my defeat: is 50%
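As an illustration of Table 5.1, the four sentences can be generated from a single point estimate and an error margin. The following is a minimal sketch, not the game's actual code: the function names, the mode keys, and in particular the cut-offs used to turn a percentage into a semantic label (below 40% and above 60%) are assumptions made here for the example, since the thesis does not state how semantic bounds were assigned.

```python
def semantic_label(percent: int) -> str:
    """Map a percentage to a verbal label (hypothetical cut-offs, not from the thesis)."""
    if percent < 40:
        return "below average"
    if percent > 60:
        return "above average"
    return "average"


def describe_risk(mode: str, estimate: int, margin: int = 0) -> str:
    """Return the sentence shown to the player for one ship.

    estimate and margin are whole percentages, e.g. 50 and 20.
    The mode keys are illustrative names for the four game modes in Table 5.1.
    """
    low = max(0, estimate - margin)
    high = min(100, estimate + margin)
    if mode == "interval":
        return f"Risk of my defeat: is between {low}% and {high}%"
    if mode == "plus_minus":
        return f"Risk of my defeat: is {estimate}% give or take {margin}%"
    if mode == "semantic":
        return (
            "Risk of my defeat: is between "
            f"{semantic_label(low)} to {semantic_label(high)}"
        )
    if mode == "point":
        return f"Risk of my defeat: is {estimate}%"
    raise ValueError(f"unknown game mode: {mode}")


if __name__ == "__main__":
    for mode in ("interval", "plus_minus", "semantic", "point"):
        print(describe_risk(mode, 50, 20))
```

All four sentences describe the same underlying estimate; only the wording of the uncertainty differs, which is the single factor varied between game modes.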
To make the game, we created a theme, a novel mechanic and a context, and gave the player an objective, an obstacle and an incentive within a rule framework (Deterding et al., 2011). Each of these is defined below.
Figure 5.1: Game Landing Page
Context
The theme or premise of the game places the player as the quartermaster on a pirate ship, see Figure 5.1. Three ships of different sizes, each carrying a different, known amount of gold doubloons (‘reward’), are sailing into range for the player to attack.
The ‘risk of defeat’ for each ship is represented in two ways, pictorially by the relative size of the ship and in sentence form, as shown in Figure 5.2. Large galleons and frigates are more difficult to attack, and have a higher ‘risk of defeat’ compared to smaller cutters and tiny sloops.
The player’s ‘risk of defeat’ and the uncertainty around that ‘risk of defeat’ (seen in the blue squares in Figure 5.2) are presented in one of four uncertainty modes, which are outlined in Table 5.1.
For each ship there is a stated ‘reward’, in gold doubloons, which the player wins if they are successful in their attack on that ship. The player must attack all three ships. Players were not informed that the purpose of the game was to explore uncertainty representation methods, see Appendix E.2 for the informed consent flyer.
Figure 5.2: Game Play Page
Objective
The player’s task is to defeat all three ships, thus winning the ‘reward’ (gold doubloons) on each ship. For each ship in the game, there is a risk that the ship will defeat the player, that is there is a ‘risk of defeat’. This ‘risk of defeat’ is expressed as a probability, see the dark blue boxes with white lettering on the right of Figure 5.2. The player starts the game with 30 gold doubloons that must be spent across the ships to purchase ammunition and supplies needed for attacking the approaching ships. For every gold doubloon allocated to a ship, the ‘risk of defeat’ decreases by 1%.
Each ship has three variables associated with it: a ‘risk of defeat’, ‘uncertainty’ information, and a ‘reward’.
The goal of the game is for the player to make more money from the attack on the three ships than the original 30 doubloons they started with. For every ship that is defeated the player wins the stated reward. In this way, the research aims to explore if players allocate the available 30 gold doubloons differently across the four game modes. For example, consider a game that has the following ships (these are different ships than those shown in Figure 5.2): 1. a French frigate with a moderately high ‘risk of defeat’ and low ‘reward’, 2. a Dutch galleon with a very high ‘risk of defeat’ and high ‘reward’, and 3. a Spanish sloop with a low ‘risk of defeat’ and a high ‘reward’. All three ships in this example game are coming into range for attack. The player must attack all three ships and must decide how to spend the gold doubloons on ammunition and supplies across the three attacks in a way that balances risk and reward. They must choose a strategy that maximises their perceived potential reward. In this way, how the player allocates the gold doubloons provides an insight into their risk-taking behaviour and their ability to maximise their expected reward.
Action
The player’s available action for reaching the objective of the game (defeat all ships and
win as many gold doubloons as possible) is to allocate doubloons across the three ships
and thus reduce the ‘risk of defeat’ on each ship and maximise the overall winnings. The
player can allocate the 30 doubloons provided for each game using the +/- 1 and/or +/-
5 buttons seen in Figure 5.2 (white buttons with blue outline and blue lettering). The
square to the left of these buttons indicates how many doubloons have been allocated
to that ship. For example, in Figure 5.2, the first ship has 5 doubloons allocated to it,
while the second ship has 20 doubloons. The ‘risk of defeat’ is reduced by 1% for each gold
doubloon allocated to a specific ship, thus giving the player influence over their chance of
defeating the target ship and thus the outcome of the game. The change in ‘risk of defeat’ is
reflected directly to the user by the ‘risk of defeat’ displayed on the screen decreasing as
soon as doubloons are allocated to that ship.
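The allocation mechanic described above can be sketched in a few lines. This is an illustrative model only, not the game's implementation: the function names are hypothetical, risk is held as a whole percentage, and the rule that a button press is ignored when it would take a ship below zero or the total above 30 is an assumption (the thesis does not say how the interface handles such presses).

```python
BUDGET = 30                     # doubloons available in each game
ALLOWED_STEPS = (-5, -1, 1, 5)  # the +/- 1 and +/- 5 buttons


def press(allocation, ship, step, budget=BUDGET):
    """Return the allocation after one button press on one ship.

    Assumption: a press that would make a ship's allocation negative,
    or push the total above the budget, leaves the allocation unchanged.
    """
    if step not in ALLOWED_STEPS:
        raise ValueError(f"unsupported step: {step}")
    new_value = allocation[ship] + step
    new_total = sum(allocation) + step
    if new_value < 0 or new_total > budget:
        return list(allocation)
    updated = list(allocation)
    updated[ship] = new_value
    return updated


def displayed_risk(initial_risk, doubloons):
    """Risk of defeat (whole percent) after allocating doubloons to a ship.

    Each doubloon removes one percentage point; the risk cannot go below 0.
    """
    return max(0, initial_risk - doubloons)


if __name__ == "__main__":
    allocation = press([0, 0, 0], 0, 5)         # 5 doubloons on the first ship
    for _ in range(4):
        allocation = press(allocation, 1, 5)    # 20 doubloons on the second ship
    print(allocation)
    print(displayed_risk(50, allocation[1]))
```

The example reproduces the allocation described for Figure 5.2 (5 doubloons on the first ship and 20 on the second), which would lower a 50% risk on the second ship to 30%.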
Obstacle
The player has limited resources to spend on supplies and must decide a strategy of
allocating the doubloons across the three ships in a way that maximises the potential
‘reward’.
Incentive
For each ship the player defeats, they win the stated ‘reward’ associated with that ship.
Game play
In each game there are three ships approaching the island at one time. All ships must be attacked simultaneously, and the outcome of an attack on one ship does not influence the outcome of the attack on any of the other ships.
The player is told of the outcome of the game within a few seconds of submitting their allocation choices. The time it takes to play the game depends on how long the player deliberates on how to allocate resources. The shortest possible game would be how long it takes to click the buttons and allocate the 30 doubloons (< 30 seconds). Each game is set to time out after 20 minutes. Once the game is over and the results reported, the player is given a choice to reset the game and play again. There is no accumulation of points; that is, there is no relationship between games.
Optimal strategy & player performance
It is assumed that the player aims to maximise the expected ‘reward’ in each game. That is, they are playing with the aim of winning as much gold as possible.
5.4.2 Game Hosting and Advertising
The game was built using Flask¹, a web server framework written in the Python programming language. The game was available online from March 4th to May 15th 2017, and was advertised through the CCQ’s e-newsletter and social media channels, as well as through Queensland University of Technology’s (QUT) social media channels.
5.4.3 Informed Consent and Ethics Approval
Ethics approval for the focus groups and the online game was granted by QUT’s Ethics
Committee (Ethics Application Number: 1500000917).
All participants provided informed consent in line with QUT regulations prior to participating in this research. The informed consent and email flyer for the game can be seen in Appendix E.2.
¹ http://flask.pocoo.org/
5.4.4 Data Collection
Game interactions were recorded in a MongoDB database² hosted on mLab³. To ensure all collected data was securely stored, the database and the game were hosted separately.
The following variables from the game were recorded:
game mode : type of uncertainty communication method shown to the player
$s_i$ : ship within a game, where $i$ can be 1, 2, or 3
$r_i$ : reward if the attack on $s_i$ is successful (in gold doubloons). Drawn randomly from an arbitrary normal distribution $N(\mu = 30, \sigma = 11)$.
$p_i$ : probability of being defeated by $s_i$ (a number between 0 and 1), shown as ‘risk of defeat’ in Figure 5.2. Drawn randomly from an arbitrary normal distribution $N(\mu = 0.5, \sigma = 0.24)$.
$u_i$ : uncertainty level of $p_i$ expressed as an artificial uncertainty interval (selected randomly as either 0.1, 0.3 or 0.6)
$\beta_i^{usr} \in \{0, 1, ..., 30\}$ : the number of gold doubloons allocated by the player to each of $s_1$, $s_2$ and $s_3$
$\beta_i^{*} \in \{0, 1, ..., 30\}$ : the set of gold doubloons allocated to $s_1$, $s_2$ and $s_3$ which optimises the expected value of the game
session : game session used as a proxy for player
risk profile : categorical set of $p_i$ for $s_1$ to $s_3$, in which $p_i$ is either high (h) or low (l), and therefore risk profile is either lll, llh, lhh, or hhh. See further details below.
uncertainty profile : categorical set of $u_i$ for $s_1$ to $s_3$, in which $u_i$ is either high (h), medium (m) or low (l), and therefore uncertainty profile is either lll, hhh, mmm, llh, lhh, llm, lmm, mhh, mmh or lmh.
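To make these definitions concrete, the sketch below draws one set of three ships and derives its uncertainty profile. It is an illustration under stated assumptions rather than the game's code: the thesis does not say how draws outside the valid ranges were handled, so redrawing until a value falls inside the range is assumed here, as is the mapping of 0.1, 0.3 and 0.6 to low, medium and high. All function and variable names are hypothetical. The risk profile is omitted because the high/low threshold for $p_i$ is not given in this section.

```python
import random
from itertools import combinations_with_replacement

# Assumed mapping from uncertainty level to profile letter.
UNCERTAINTY_LABELS = {0.1: "l", 0.3: "m", 0.6: "h"}


def draw_within(rng, mu, sigma, low, high):
    """Draw from N(mu, sigma), redrawing until low < x < high (assumed rule)."""
    while True:
        x = rng.gauss(mu, sigma)
        if low < x < high:
            return x


def draw_ship(rng):
    """Draw the reward, risk of defeat and uncertainty level for one ship."""
    return {
        "r": round(draw_within(rng, 30, 11, 10, 50)),  # reward in doubloons
        "p": draw_within(rng, 0.5, 0.24, 0.0, 1.0),    # probability of defeat
        "u": rng.choice(list(UNCERTAINTY_LABELS)),     # uncertainty level
    }


def uncertainty_profile(ships):
    """Order-independent profile such as 'lmh', sorted low to high."""
    letters = sorted((UNCERTAINTY_LABELS[s["u"]] for s in ships), key="lmh".index)
    return "".join(letters)


# Every unordered combination of three levels: ten profiles in total.
ALL_PROFILES = ["".join(c) for c in combinations_with_replacement("lmh", 3)]

if __name__ == "__main__":
    rng = random.Random(2017)  # arbitrary seed, for reproducibility only
    ships = [draw_ship(rng) for _ in range(3)]
    print(ships, uncertainty_profile(ships))
    print(len(ALL_PROFILES), ALL_PROFILES)
```

Enumerating the combinations confirms that three ships with three possible uncertainty levels give exactly the ten unordered profiles listed above.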
² https://www.mongodb.com/
³ https://mlab.com/
5.4.5 Analysis
Software
Statistical analysis was conducted using R version 3.3.3 (R Core Team, 2017) and R Studio version 1.1.383. All duplicates were removed from the dataset prior to analysis. GUROBI optimisation software (Gurobi Optimization, 2016) was used to solve the integer programming problem required to calculate $\beta_i^{*} \in \{0, 1, ..., 30\}$.
Quantifying Behaviour
Behaviour is complicated. It is multi-dimensional, complex and difficult to measure. In this research I use two measures to quantify the players’ behaviour: a performance ratio (PR) and the Gini coefficient. These two measures provide two different one-dimensional slices of a multi-dimensional variable. They are not intended to be a complete measure of behaviour but they can contribute valuable insights. Due to limited resources this investigation explores only these two measures, which are considered a starting point for exploring how behaviour is influenced by uncertainty representation methods.
The performance ratio (PR) provides a measure of the player’s ability to maximise the expected reward within the game, while the Gini coefficient provides a measure of inequality in how the player allocates their available resources. Resource spreading is a measure of risk-averse behaviour (Mistry and Trueblood, 2017; Wernerfelt and Karnani, 1987) and in this research the Gini coefficient is used in this way.
PR could also be considered a measure of risk-taking behaviour, in that the strategy that maximises reward also minimises risk of loss over repeated games. However, the PR measure does not differentiate between games that have the same score but have achieved that score through different strategies. For example, two players may achieve the same PR value, but one allocated the majority of their doubloons to a losing ship, while the other spread the doubloons across all three ships. Neither has maximised their expected reward; however, their allocations of resources are very different. The PR measure does not differentiate between the behaviour of these two players. Therefore I use the PR measure only as a measure of performance maximisation and not of risk-taking behaviour.
Notation
For each iteration of the game, the game mode is allocated at random and the parameters $r_i$, $p_i$ and $u_i$ are drawn randomly for each ship.
Let $S = \{s_1, s_2, s_3\}$ be a set of ships presented to the player, and for ship $s_i \in S$ the following are defined:
Data:
game mode : Game mode (0-3). Allocated at random.
$r_i \in \{10, 11, ..., 50\}$ : Reward for a successful attack on ship $s_i$. A number between 10 and 50, drawn randomly from an arbitrary normal distribution $N(\mu = 30, \sigma = 11)$.
$p_i \in (0, 1)$ : Probability (between 0 and 1) of being defeated by ship $s_i$. Drawn randomly from an arbitrary normal distribution $N(\mu = 0.5, \sigma = 0.24)$.
$u_i \in \{0.1, 0.3, 0.6\}$ : Uncertainty level of $p_i$, randomly allocated as either 0.1, 0.3 or 0.6.
Allocation:
$\beta_i^{usr} \in \{0, 1, ..., 30\}$ : Gold doubloons allocated by the user to $s_i$.
$\beta_i^{*} \in \{0, 1, ..., 30\}$ : Gold doubloons allocated to $s_i$ in a maximised version of a game.
Calculating the Performance Ratio (PR)
The player’s performance is measured as a ratio of the player’s strategy against the optimal strategy. This ratio, termed the performance ratio (PR), is given as a percentage and is calculated as:
$$PR = \frac{E_{user}[R]}{E_{optimal}[R]}, \tag{5.4.1}$$
where $R$ is the reward, $E_{user}[R]$ is the expected reward using the user’s allocation of gold doubloons, and $E_{optimal}[R]$ is the expected reward using an optimal allocation of gold doubloons.
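Each doubloon reduces the probability of being defeated ($p_i$) by 1%, that is, by 0.01. If $\beta_i$ doubloons are allocated to $s_i$, the probability of being defeated becomes $\max(0, p_i - \beta_i/100)$, so the probability of defeating $s_i$ is
$$1 - \max(0, p_i - \beta_i/100).$$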
The performance of any strategy $\beta^j = (\beta_1^j, \beta_2^j, \beta_3^j)$ is
$$E_{\beta^j}[R] = \sum_{i=1}^{3} \left(1 - \max(0, p_i - \beta_i^j/100)\right) r_i.$$
The player’s chosen allocation $(\beta_1^{usr}, \beta_2^{usr}, \beta_3^{usr})$ is used to calculate the expected reward, and the optimal allocation of doubloons is used to calculate the maximum expected reward possible for that game. A maximised game is denoted $\beta^*$, such that $\beta^* = (\beta_1^*, \beta_2^*, \beta_3^*)$. Therefore:
$$\beta^* = \arg\max_{\beta} \sum_{i=1}^{3} \left(1 - \max(0, p_i - \beta_i/100)\right) r_i \quad \text{such that} \quad \sum_{i=1}^{3} \beta_i \le 30.$$
The integer programming problem was solved using the GUROBI optimisation software
(Gurobi Optimization, 2016).
PR provides a measure of the player’s ability to maximise the expected value of the game.
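The calculation can be illustrated with a short sketch. Note that this is not the method used in the study: the thesis solved the integer program with GUROBI, whereas the sketch below swaps in a plain exhaustive search, which is feasible because three ships and 30 doubloons give only a few thousand candidate allocations. The function names and the example ships are hypothetical.

```python
BUDGET = 30  # doubloons available in each game


def expected_reward(p, r, beta):
    """Expected reward: sum over ships of (1 - max(0, p_i - beta_i/100)) * r_i."""
    return sum(
        (1 - max(0.0, p_i - b_i / 100)) * r_i
        for p_i, r_i, b_i in zip(p, r, beta)
    )


def optimal_allocation(p, r, budget=BUDGET):
    """Exhaustively search all integer allocations with total <= budget.

    A stand-in for the integer programming solver used in the thesis.
    """
    best, best_value = None, float("-inf")
    for b1 in range(budget + 1):
        for b2 in range(budget - b1 + 1):
            for b3 in range(budget - b1 - b2 + 1):
                value = expected_reward(p, r, (b1, b2, b3))
                if value > best_value:
                    best, best_value = (b1, b2, b3), value
    return best, best_value


def performance_ratio(p, r, beta_user, budget=BUDGET):
    """PR = E_user[R] / E_optimal[R], as in equation (5.4.1)."""
    _, best_value = optimal_allocation(p, r, budget)
    return expected_reward(p, r, beta_user) / best_value


if __name__ == "__main__":
    p = (0.6, 0.8, 0.2)  # hypothetical risks of defeat
    r = (20, 45, 40)     # hypothetical rewards
    print(optimal_allocation(p, r))
    print(performance_ratio(p, r, (10, 10, 10)))
```

In this made-up game each doubloon is worth $r_i/100$ in expected reward until a ship's risk reaches zero, so the optimum places all 30 doubloons on the ship with the largest reward, for an expected reward of 62.5. An even split of 10 doubloons per ship gives 59.5, a PR of about 0.95.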
Calculating the Gini Coefficient
The Gini coefficient (also commonly called the Gini index) measures the inequality amongst values of a frequency distribution, and was developed by the Italian statistician Corrado Gini (Gini, 1912). The Gini coefficient is defined as a ratio with values between 0 and 1, where the numerator is the area between the Lorenz curve of the distribution (R in Figure 5.3) and the line of perfect equality (area ORP in Figure 5.3), and the denominator is the area under the uniform distribution line, that is, the line of perfect equality (area OPQ in Figure 5.3) (Bellu and Liberati, 2006).
A Gini coefficient of 0 expresses perfect equality, where all values are the same (e.g., where everyone in a population has the same income). A Gini coefficient of 1 expresses maximal inequality among the population (e.g., one person has all the income and all others have zero income). The measure is commonly used for evaluating income equality, but is also widely used in a range of applications where equality is of interest, such as the geographical distribution of rare and endangered species (Engler, Guisan, and Rechsteiner, 2004), access to education (Gregorio and Lee, 2002), or access to opportunity in health (Rosa Dias, 2009).
Figure 5.3: The Lorenz curve and Gini coefficient (Bellú, 2006)
$$G = \frac{\text{concentration area}}{\text{maximum concentration area}} = \frac{ORP}{OPQ}.$$
In this research the Gini coefficient (Gini, 1912) is a measure of how evenly players spread their gold doubloons across the three ships within a game. Since resource allocation is an expression of risk-taking behaviour (Mistry and Trueblood, 2017; Wernerfelt and Karnani, 1987), this measure quantifies risk-averse behaviour. A Gini coefficient of 0 suggests a player has allocated doubloons equally across the three ships, indicating risk-averse behaviour: an unwillingness to risk all resources on one option. A Gini coefficient of 1 suggests the player has allocated all doubloons to one ship, indicating that the player is willing to put all available resources on one option and is displaying more risk-taking behaviour.
The Gini coefficient was calculated using the ineq package in R (Zeileis, 2014).
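For readers who want to reproduce the measure outside R, the sketch below computes the Gini coefficient of a three-ship allocation using the standard rank-based formula. It is a hand-written Python illustration, not the ineq implementation, and the function name and the `corrected` flag are hypothetical. One point worth checking against the analysis: with only three values, the uncorrected coefficient reaches at most 2/3 when everything is placed on one ship, and equals 1 only if the finite-sample correction factor $n/(n-1)$ is applied. The thesis does not state which variant was used, so both are shown.

```python
def gini(values, corrected=False):
    """Gini coefficient of non-negative values via the rank-based formula.

    G = sum_i (2i - n - 1) * x_(i) / (n * sum(x)), with x sorted ascending.
    corrected=True multiplies by n / (n - 1) (finite-sample correction).
    """
    x = sorted(values)
    n = len(x)
    total = sum(x)
    if n < 2 or total <= 0:
        raise ValueError("need at least two values with a positive total")
    g = sum((2 * i - n - 1) * x_i for i, x_i in enumerate(x, start=1)) / (n * total)
    return g * n / (n - 1) if corrected else g


if __name__ == "__main__":
    print(gini([10, 10, 10]))                # even spread across the three ships
    print(gini([5, 20, 5]))                  # allocation shown in Figure 5.2 plus 5 more
    print(gini([0, 0, 30]))                  # everything on one ship, uncorrected
    print(gini([0, 0, 30], corrected=True))  # everything on one ship, corrected
```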
Models and Tests
This research investigates the influence of uncertainty communication methods (‘game mode’) on players’ behaviour. In order to understand any between-player changes in behaviour, I have also investigated the influence that ‘risk profile’ and ‘uncertainty profile’ have on behaviour. Therefore, the following analysis is broken into three sections looking at the influence of game mode, risk profile or uncertainty profile on behaviour. Within each section I test for statistically significant differences in PR and the Gini coefficient.
Analysing the effect of game mode on behaviour
I remind the reader there are four different game modes in this analysis, see Table 5.1 for definitions.
Performance ratio (PR) by game mode
PR was highly skewed towards 1, with the median and mean for all game modes relatively high (within 5% of a PR value of 1). The skewness of the data is a result of the structure of the game, in that before the player allocates any gold doubloons the performance ratio is already at least 0.50 even if no further doubloons are allocated. The player however is forced to allocate all their gold doubloons, improving the PR value even if they make a less than optimal decision and thus resulting in a PR value of at least 0.7 in any game.
To address this skewness, and since PR is a ratio, the odds of PR (i.e. $\frac{PR}{1-PR}$) were considered instead and a logit transformation was applied as follows:
$$\text{logit}(PR) = \log\left(\frac{PR}{1 - PR}\right).$$
Hence logit(PR) represents the log odds of PR.
Testing for differences in logit(PR) was conducted in two stages. Firstly, a linear mixed effects model (LME) was used to test for overall differences, with game mode as the fixed effect and session (a proxy for player) as the random effect. It was important to use this type of model because some participants played the game multiple times, so the independence of observations could not be assumed. The lme function from the nlme package (Pinheiro et al., 2017) in R (R Core Team, 2017) was used for this analysis, as follows:
$$\operatorname{logit}(PR) = \text{game\_mode}_i\,(\text{fixed}) + \text{session}_j\,(\text{random}) + e.$$
Posthoc pairwise analyses were then conducted, testing for differences between each group using Tukey’s test (Tukey, 1949). A Benjamini-Hochberg adjustment (Benjamini and Hochberg, 1995) was applied in order to control for Type 1 errors.
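The Benjamini-Hochberg adjustment mentioned above can be sketched as follows. This is a minimal stdlib illustration of the standard step-up rule (the rule that R's p.adjust offers as method "BH"), not the code used in the thesis. The function name and the example p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values, in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so the adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw p-values from four pairwise comparisons.
raw = [0.01, 0.04, 0.03, 0.20]
print([round(p, 4) for p in benjamini_hochberg(raw)])
```

Each raw p-value is scaled by the number of comparisons divided by its rank, and the running minimum keeps a smaller raw p-value from receiving a larger adjusted value than a larger one.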
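The estimated expected responses, denoted hereafter by $\hat{y}$, for each game mode were back transformed to obtain the corresponding $\widehat{PR}$, and the odds ratio calculated such that:
$$\hat{y} = \operatorname{logit}(\widehat{PR}),$$
$$\widehat{PR} = \frac{e^{\hat{y}}}{1 + e^{\hat{y}}} \quad \text{and} \quad \operatorname{odds}(\widehat{PR}) = \frac{\widehat{PR}}{1 - \widehat{PR}}.$$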
The following four examples demonstrate the practical interpretation of PR and odds(PR):
1. A PR of 0.8 = odds(PR) of 4 = the player won 4 times what they missed.
2. A PR of 0.6 = odds(PR) of 1.5 = the player won 1.5 times what they missed.
3. A PR of 0.5 = odds(PR) of 1 = the player won the same as they missed. That is, they missed 50% of the total possible reward.
4. A PR of 0.2 = odds(PR) of 0.25 = the player only won 25% of what they missed.
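The four examples above can be checked numerically with a short sketch of the logit transformation and its back-transformation. The function names are illustrative and are not taken from the thesis code.

```python
import math


def odds(pr):
    """Odds of PR: reward won relative to reward missed."""
    return pr / (1.0 - pr)


def logit(pr):
    """Log odds of PR, defined for 0 < pr < 1."""
    return math.log(odds(pr))


def inv_logit(y):
    """Back-transform a log odds value to the PR scale."""
    return math.exp(y) / (1.0 + math.exp(y))


# The four PR values used in the worked examples above.
for pr in (0.8, 0.6, 0.5, 0.2):
    y = logit(pr)
    print(f"PR={pr:.1f}  odds={odds(pr):.2f}  logit={y:.3f}  back={inv_logit(y):.1f}")
```

Applying inv_logit to logit(PR) returns the original PR, which is the property the back-transformation of model estimates relies on.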
Power analysis
A power analysis for the LME model for game mode was conducted using the simr package in R (R Core Team, 2017).
Gini coefficient by game mode
Testing for differences in the Gini coefficient was conducted in two stages (Conover and Conover, 1980). Firstly, a Kruskal-Wallis test for overall differences in distributions between game modes was conducted. Based on a significant result of this test, pairwise comparisons were conducted using a Mann-Whitney test (Dunn, 1964), which involves re-ranking the observations for each comparison, and then applying a Benjamini-Hochberg correction for multiple comparisons (Shaffer, 1995).
The Kruskal-Wallis test, also known as the one-way ANOVA on ranks, is a non-parametric method for testing whether samples originate from the same distribution (Kruskal and Wallis, 1952). Posthoc pairwise analysis using the Kolmogorov-Smirnov test was considered; however, due to ties present in the data, the Wilcoxon-Mann-Whitney test was selected as it is more capable of handling ties (Siegel, 1956). A Benjamini-Hochberg adjustment was applied to these posthoc pairwise analyses in order to control for Type 1 errors. Both statistical tests were conducted using the stats package in R (R Core Team, 2017).
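To make the role of ties concrete, the sketch below computes the Kruskal-Wallis H statistic from scratch, using mid-ranks for tied values and the usual tie correction. It is a stdlib illustration of the textbook formula under stated assumptions, not the R routine used in the analysis. The function names and example values are hypothetical.

```python
from collections import Counter


def midranks(values):
    """Rank values from 1..N, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the current run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """Tie-corrected Kruskal-Wallis H statistic for a list of samples."""
    pooled = [x for group in groups for x in group]
    n = len(pooled)
    ranks = midranks(pooled)
    total = 0.0
    start = 0
    for group in groups:
        rank_sum = sum(ranks[start:start + len(group)])
        total += rank_sum ** 2 / len(group)
        start += len(group)
    h = 12.0 / (n * (n + 1)) * total - 3.0 * (n + 1)
    # Tie correction: each run of t tied values contributes t^3 - t.
    tie_term = sum(t ** 3 - t for t in Counter(pooled).values())
    correction = 1.0 - tie_term / (n ** 3 - n)
    if correction == 0:
        raise ValueError("all observations are identical; H is undefined")
    return h / correction


# Hypothetical Gini coefficients for three small groups of games.
print(kruskal_wallis_h([[0.5, 0.667, 1.0], [0.333, 0.667, 1.0], [0.0, 0.5, 0.667]]))
```

H is then compared against a chi-squared distribution with (number of groups − 1) degrees of freedom, which is the form in which the results are reported.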
Analysing the effect of risk profile on behaviour
For each ship the ‘risk of defeat’ (p_i) was drawn randomly from an arbitrary normal distribution N(µ = 0.5, σ = 0.24) and presented to the player as a probability between 0 and 1. For analysis, p_i was categorised as either high (h) or low (l), and therefore the risk profile for any game is either lll, llh, lhh or hhh. Order was not considered important to the analysis, and this subsequently influenced the sample size across the groups.
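The effect of ignoring order on group sizes can be illustrated by counting how many ordered ship assignments collapse into each order-free profile. The sketch below assumes that each level is equally likely and independent across the three ships, which is a simplifying assumption: the cut-off between high and low is not stated here. The helper names are hypothetical.

```python
from collections import Counter
from itertools import product

# Labels are written low to high, matching profiles such as "llh" and "lmh".
LEVEL_ORDER = "lmh"


def profile_label(levels):
    """Order-free profile label for one game, e.g. ('h', 'l', 'l') -> 'llh'."""
    return "".join(sorted(levels, key=LEVEL_ORDER.index))


def profile_multiplicity(levels, ships=3):
    """Count the ordered assignments that map onto each order-free profile."""
    counts = Counter(
        profile_label(combo) for combo in product(levels, repeat=ships)
    )
    return dict(counts)


print(profile_multiplicity("lh"))   # two-level risk profiles
print(profile_multiplicity("lmh"))  # three-level uncertainty profiles
```

Under these assumptions a mixed two-level profile such as llh arises from three ordered assignments, whereas a uniform profile such as lll arises from only one, so uniform profiles are expected to be the smallest groups.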
The same methods used for investigating PR and the Gini coefficient across game modes were used for evaluating differences in PR and the Gini coefficient across risk profiles and uncertainty profiles. A linear mixed effects model was used to investigate these differences because it can handle unequal sample sizes and because the independence of observations could not be assumed (as discussed in the game mode analysis section above).
Performance ratio by risk profile
A logit transformation was applied to PR. An LME model was applied to logit(PR) and the estimated coefficients back transformed as defined in the game mode section above.
The model for investigating the influence of risk profile on logit(PR) is defined as:
$$\operatorname{logit}(PR) = \text{risk\_profile}_i\,(\text{fixed}) + \text{session}_j\,(\text{random}) + e.$$
Power analysis
A power analysis for the LME model for risk profile was conducted using the simr package in R (R Core Team, 2017).
Gini Coefficient by risk profile
To test for differences in the Gini coefficient between risk profiles, a Kruskal-Wallis test (Daniel, 1990) was used. Pairwise comparisons were conducted using the Wilcoxon-Mann-Whitney test (Conover and Iman, 1979) with a Benjamini-Hochberg adjustment (Shaffer, 1995), as defined in the game mode section above.
The Kruskal-Wallis test is appropriate for samples of different sizes, as is the case for the risk profiles.
Analysing the effect of uncertainty profile on behaviour
The uncertainty level for each ship was randomly allocated as an interval with a length of either 0.1, 0.3 or 0.6. This was added to p_i in order to calculate the upper and lower bounds, which were reported to the player depending on the game mode. For analysis these intervals were categorised as either high (h), medium (m) or low (l). Therefore, for each game the uncertainty profile was any combination of l, m or h. The same methods outlined above for game mode were used in evaluating differences in PR and the Gini coefficient between uncertainty profiles.
Performance ratio by uncertainty profile
As outlined in the game mode analysis section above, a logit transformation was applied to PR, then an LME was applied to Logit(PR) and estimated coefficients back transformed.
The model for investigating the influence of uncertainty profile on logit(PR) is defined as:
$$\operatorname{logit}(PR) = \text{uncert\_profile}_i\,(\text{fixed}) + \text{session}_j\,(\text{random}) + e.$$
Gini coefficient by uncertainty profile
A Kruskal-Wallis test was conducted, with posthoc pairwise comparisons using a Wilcoxon-Mann-Whitney test with a Benjamini-Hochberg adjustment, as defined for the analysis of the Gini coefficient by game mode above.
5.5 Results
In this analysis both the performance ratio and the Gini coefficient were used to evaluate how the player allocated their gold doubloons, and if this behaviour changed across the different game modes, risk profiles or uncertainty profiles.
The section begins with descriptive statistics of the sample sizes across game mode, risk profile and uncertainty profile, followed by descriptive statistics of logit(PR) and the Gini coefficient. The results are then broken into three sections which analyse the effect of game mode, risk profile and uncertainty profile on behaviour. Within each section I look at both the logit(PR) and the Gini coefficient.
Figure 5.4: Number of games per session. [Panels: number of games played by session ID; number of players by number of games played.]
Table 5.2: Number of games per game mode
game_mode   n
0           162
1           175
2           178
3           164

Table 5.3: Number of games per uncertainty profile
game_uncert   n
HHH           21
LHH           89
LLH           80
LLL           19
LLM           73
LMH           153
LMM           67
MHH           67
MMH           85
MMM           25
5.5.1 Descriptive statistics: Number of games, logit(PR) and Gini coefficient
Number of games
Data were collected for a total of 679 games. Some players played more than one game, with 2 to 4 games being the median number of games played (n = 40), and 23 players played just one game. Figure 5.4 shows the number of games per session. Game mode was randomly allocated to each game, thus the number of games played in each mode was relatively even, see Table 5.2. As described above, the number of games with each uncertainty profile and risk profile was not uniform. This was due to the order of the individual ships not being important, therefore the uniform profiles (lll, hhh or mmm) occurred less often than other combinations. Sample sizes for each risk and uncertainty profile are detailed in Table 5.4 and Table 5.3 respectively.

Table 5.4: Number of games per risk profile
game_risk   n
hhh         77
lhh         236
llh         261
lll         105

Table 5.5: Descriptive statistics for Gini coefficient and logit(PR)
Behaviour_measure   No.games   mean    sd      min     Q1      median   Q3      max
Gini coefficient    679        0.689   0.281   0.000   0.500   0.667    1.000   1.000
logit(PR)           679        2.810   0.659   0.975   2.347   2.884    3.366   3.664
Descriptive statistics - logit(PR)
Figure 5.5: logit(PR). [Histogram of the proportion of games by logit(PR), with accompanying box plot.]
Table 5.5 and Figure 5.5 show descriptive statistics, a histogram and a box plot for logit(PR). The logit(PR) had a mean of 2.810, a standard deviation of 0.659 and a maximum of 3.664. Over 14% of players allocated their gold doubloons in line with the maximised solution, which was to allocate all doubloons to the ship with the highest reward, regardless of risk (logit(PR) ≥ 3.66). Players that did not use this strategy are explored further through the Gini coefficient (see below).
Descriptive statistics - Gini coefficient
Figure 5.6: Gini coefficient. [Histogram of the proportion of games by Gini coefficient, with accompanying box plot.]
Table 5.6: Descriptive statistics logit(PR) by game mode
method           No.games   mean    sd      min     Q1      median   Q3      max
total            679        2.810   0.659   0.975   2.347   2.884    3.366   3.664
interval         162        2.703   0.648   1.007   2.278   2.733    3.165   3.664
plus/minus       175        2.879   0.637   1.117   2.435   2.973    3.468   3.664
semantic         178        2.783   0.606   0.989   2.393   2.801    3.267   3.664
point estimate   164        2.871   0.735   0.975   2.380   3.072    3.554   3.664
Figure 5.7: logit(PR) by game mode. [Box plots of logit(PR) for the Interval, Plus/minus, Semantic Interval and Point Estimate game modes.]
Overall, more than 30% of players allocated all doubloons to one ship (Gini coefficient = 1) and fewer than 5% of players spread their doubloons equally across all ships (Gini coefficient = 0), see Table 5.5 and Figure 5.6.
Table 5.7: Logit(PR) vs game mode: Summary of post hoc analysis on linear mixed effects model (BH adjust)
Game Mode                 yˆ       SE      z value   p-value   p≤0.05   PRˆ     odds(PRˆ)
Plus/minus - Interval     0.168    0.069   2.440     0.044     *        0.542   1.183
Semantic - Interval       0.099    0.069   1.440     0.223              0.525   1.105
Point Est. - Interval     0.200    0.070   2.850     0.027     *        0.550   1.222
Semantic - Plus/minus     -0.069   0.067   -1.030    0.364              0.483   0.934
Point Est. - Plus/minus   0.032    0.069   0.459     0.646              0.508   1.033
Point Est. - Semantic     0.101    0.068   1.480     0.223              0.525   1.105
Figure 5.8: logit(PR) by game mode. [Histograms of the proportion of games by logit(PR): A, Interval; B, Plus/Minus; C, Semantic Interval; D, Point Estimate.]
5.5.2 Effect of game mode on behaviour
Performance ratio and game mode
The analysis showed a statistically significant difference in the logit(PR) between game modes (F = 3.215, p = 0.023). Posthoc analysis showed that logit(PR) was higher in the plus/minus game mode (z = 2.440, p = 0.044) and the point estimate game mode (z = 2.850, p = 0.027) than in the interval game mode, see Table 5.7 and Figures 5.7 & 5.8.
This equates to an 18% increase (yˆ = 0.168, PRˆ = 0.542, odds(PRˆ) = 1.18) and a 22% increase (yˆ = 0.200, PRˆ = 0.550, odds(PRˆ) = 1.22) in the odds(PR) when uncertainty was presented as a point estimate plus/minus error or as a point estimate (without error) respectively, compared to using the numeric upper and lower bounds of the uncertainty interval. The power for this analysis was 0.76.
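As a worked check of the back-transformation, using only the estimates reported in Table 5.7: because each yˆ is a difference in log odds between two game modes, exponentiating it gives the corresponding ratio of odds. The values below reproduce the reported figures up to rounding of yˆ.

```latex
\begin{align*}
\operatorname{odds}(\widehat{PR}) &= e^{\hat{y}}, &
\widehat{PR} &= \frac{e^{\hat{y}}}{1 + e^{\hat{y}}} \\
\hat{y} = 0.168:\quad e^{0.168} &\approx 1.183, &
\widehat{PR} &\approx \frac{1.183}{2.183} \approx 0.542 \\
\hat{y} = 0.200:\quad e^{0.200} &\approx 1.221, &
\widehat{PR} &\approx \frac{1.221}{2.221} \approx 0.550
\end{align*}
```

An odds ratio of 1.183 corresponds to odds that are about 18% higher, and 1.221 to odds that are about 22% higher.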
There was no statistically significant difference in the logit(PR) between any other game modes. Visual inspection of residual plots did not reveal any violations of the assumption of normality, and Levene’s test for homogeneity of variance (Brown and Forsythe, 1974) did not reveal any violation of the assumption of homoscedasticity, see Appendix D.2.

Table 5.8: Descriptive statistics Gini coefficient by game mode
method           No.games   mean    sd      min   Q1      median   Q3     max
total            679        0.689   0.281   0     0.500   0.667    1.00   1
interval         162        0.713   0.284   0     0.500   0.667    1.00   1
plus/minus       175        0.691   0.285   0     0.500   0.667    1.00   1
semantic         178        0.637   0.284   0     0.333   0.667    0.95   1
point estimate   164        0.718   0.266   0     0.500   0.733    1.00   1
Gini coefficient and game mode
Figure 5.9: Gini coefficient vs game mode. [Box plots of the Gini coefficient for the Interval, Plus/minus, Semantic Interval and Point Estimate game modes.]
A Kruskal-Wallis test demonstrated that the Gini coefficients for each game mode were not from the same populations (chi-squared = 10.11, df = 3, p-value = 0.018). Posthoc pairwise analyses showed that this difference applied specifically to the interval compared to the semantic game modes (p.adjust = 0.042), and the semantic compared to the point estimate game modes (p.adjust = 0.042). Descriptive statistics, histograms and boxplots of the Gini coefficient across game mode can be seen in Table 5.8, Figure 5.9 and Figure 5.10.
The wider variance for the semantic game mode in Figure 5.9 shows greater variation in how players spread their doubloons across the three ships, as well as a larger number of players distributing their doubloons across the ships. This is also seen in the histograms in Figure 5.10, where it is clear that for the semantic interval game mode, more players spread their doubloons across ships (lower Gini coefficient).
It is interesting to note that no statistically significant difference was detected in the distribution of the Gini coefficient between the point estimate and point estimate ± error game modes, suggesting that the addition of the error term does not promote more or less risk-averse behaviour.

Figure 5.10: Gini coefficient by game mode. [Histograms of the proportion of games by Gini coefficient: A, Interval (game mode 0); B, Plus/Minus (game mode 1); C, Semantic Interval (game mode 2); D, Point Estimate (game mode 3).]

Table 5.9: Descriptive statistics logit(PR) by risk profile
risk_profile   No.games   mean    sd      min     Q1      median   Q3      max
hhh            77         2.737   0.843   1.055   2.094   2.794    3.664   3.664
lhh            236        2.780   0.695   0.989   2.326   2.830    3.431   3.664
llh            261        2.778   0.609   0.975   2.307   2.817    3.267   3.664
lll            105        3.010   0.498   1.610   2.700   3.103    3.396   3.664
5.5.3 Effect of risk profile on behaviour
As described in Section 5.4.5 above, since the order of the ‘risk of defeat’ (p_i) across the 3 ships within a single game was not important, the sample sizes across risk profiles were not uniform. As can be expected, the llh (n = 261) and lhh (n = 236) profiles had a sample size of at least twice that of the uniform profiles hhh (n = 77) and lll (n = 105).
Figure 5.11: Logit(PR) by risk profile. [Box plots of logit(PR) for the hhh, lhh, llh and lll risk profiles.]
Table 5.10: logit(PR) - Summary of post hoc pairwise analysis of risk profiles (linear mixed effects model with BH adjustment)
Risk Profile   yˆ       SE      z value   p-value   p≤0.05   PRˆ         odds(PRˆ)
lhh - hhh      0.030    0.083   0.360     0.884              0.5074994   1.030455
llh - hhh      0.028    0.082   0.336     0.884              0.5069995   1.028396
lll - hhh      0.257    0.095   2.700     0.014     *        0.5638987   1.293045
llh - lhh      -0.002   0.057   -0.040    0.968              0.4995000   0.998002
lll - lhh      0.227    0.074   3.080     0.006     *        0.5565076   1.254830
lll - llh      0.229    0.073   3.150     0.006     *        0.5570011   1.257342
Figure 5.12: logit(PR) by risk profile. [Histograms of the proportion of games by logit(PR) for the hhh, lhh, llh and lll risk profiles.]
Logit(PR) by risk profile
Results of the linear mixed-effects model and posthoc analysis (methods in Section 5.4.5) showed a statistically significant difference in logit(PR) between risk profiles (F = 4.053, p = 0.0072). The power for this analysis was 0.83.
Posthoc analyses showed that logit(PR) was higher in the uniform low risk profile (lll) games than in all other risk profiles (lll − llh: z = 3.150, p = 0.006; lll − lhh: z = 3.080, p = 0.006; lll − hhh: z = 2.700, p = 0.014). Specifically, the odds(PR) for lll games were approximately 26% higher than for the mixed low risk profile (lll − llh: odds ratio = 1.257, yˆ = 0.229, PRˆ = 0.557), 25% higher than for the mixed high risk profile (lll − lhh: odds ratio = 1.255, yˆ = 0.227, PRˆ = 0.557) and 29% higher than for the uniform high risk profile (lll − hhh: odds ratio = 1.293, yˆ = 0.257, PRˆ = 0.564).
Visual inspection of residual plots did not reveal any violations of the assumption of
normality. Levene’s test for homogeneity of variance (Brown and Forsythe, 1974) did
show that this assumption was violated (p < 0.01), see Appendix D.3. This presence of
heteroscedasticity should be considered when interpreting these results.
See Table 5.9, Figure 5.11 and Figure 5.12 for descriptive statistics and Table 5.10 for
outputs of the posthoc analysis.
Inspection of the histograms and boxplots in Figures 5.12 and 5.11 shows that while the mean logit(PR) for the uniform low risk profile (lll) is higher, the uniform high risk profile (hhh) resulted in a much larger proportion of players allocating their doubloons in line with the maximised solution (i.e., allocating all doubloons to the ship with the largest reward).
Gini coefficient by risk profile
A Kruskal-Wallis test showed that the distributions of the Gini coefficient were statistically significantly different between risk profiles (chi-squared = 20.75, df = 3, p-value = 0.0001). Pairwise comparisons showed these differences were between the low risk profile (lll) and all other risk profiles (lll - hhh: p.adjust = 0.006, lll - lhh: p.adjust = 0.018, lll - llh: p.adjust = 0.006). Inspection of the histograms (Figure 5.14) and boxplots (Figure 5.13) shows that a lower proportion of players concentrated their resources on one ship (Gini coefficient = 1) when the risk profile was uniformly low. Figure 5.13 also shows much less variation in behaviour for the low risk profile.

Figure 5.13: Gini coefficient by risk profile. [Box plots of the Gini coefficient for the hhh, lhh, llh and lll risk profiles.]

Figure 5.14: Gini coefficient by risk profile. [Histograms of the proportion of games by Gini coefficient for the hhh, lhh, llh and lll risk profiles.]
These results suggest that when all options have a uniformly low risk, players engage in a more risk spreading behaviour, and that players are more likely to concentrate their resources when risk across multiple options is uniformly high.
Table 5.11: Gini coefficient - Posthoc pairwise comparison by risk profile (Mann-Whitney test with BH adjust)
comparison   W statistic   p.value   p.adjust   p≤0.05
hhh - lhh    3007.5        0.443     1.000
hhh - llh    4139.0        0.316     1.000
hhh - lll    5204.0        0.001     0.006      *
lhh - llh    3633.5        0.951     1.000
lhh - lll    4834.5        0.003     0.018      *
llh - lll    6567.5        0.001     0.006      *
5.5.4 Effect of uncertainty profile on behaviour
The uncertainty level for each ship was either high (h), medium (m) or low (l), therefore for each game the uncertainty profile was any combination of l, m and/or h, for example lll, lmh, llh, mmh, etc. Since order was not important, the uniform profiles occurred much less often than mixed profiles such as lmh.
logit(PR) and uncertainty profile
Figure 5.15: logit(PR) by uncertainty profile. [Box plots of logit(PR) for the HHH, LHH, LLH, LLL, LLM, LMH, LMM, MHH, MMH and MMM uncertainty profiles.]
There was no statistically significant difference in the logit(PR) between uncertainty profiles. See Section 5.4.5 for methods.
Figure 5.16: Gini coefficient by uncertainty profile. [Box plots of the Gini coefficient for the HHH, LHH, LLH, LLL, LLM, LMH, LMM, MHH, MMH and MMM uncertainty profiles.]
Figure 5.17: Gini coefficient by uncertainty profile. [Histograms of the proportion of games by Gini coefficient, one panel per uncertainty profile: HHH, LHH, LLH, LLL, LLM, LMH, LMM, MHH, MMH and MMM.]
Gini coefficient
A Kruskal-Wallis test did not provide any evidence that the distributions of the Gini coefficient over the different uncertainty profiles came from different populations (chi-squared = 9.99, df = 9, p-value = 0.35).
From the boxplots in Figure 5.16, the three uniform profiles (lll, mmm and hhh) appear to deviate most noticeably from the other profiles. However, this could also be a product of the smaller sample sizes of these groups.
5.6 Discussion
The aim of this study was to explore whether the behaviour of players of an online game changed depending on the uncertainty representation method used to communicate an estimated risk. As well as investigating the effect of the uncertainty representation method on players’ behaviour, the analysis explored whether behaviour was influenced by the level of risk and the level of uncertainty associated with the estimated risk. Two measures were used to explore the players’ behaviour: a performance ratio (PR) and the Gini coefficient.
Behaviour is a complex and multidimensional variable. These two measures each provide a one-dimensional slice, or simplified view, of behaviour and are not designed to be comprehensive. PR measured the player’s ability to maximise the expected value of the game, while the Gini coefficient quantified how the player spread their resources across the three ships within the game, thus providing a measure of risk-spreading or risk-averse behaviour.
I acknowledge that PR could be used as a measure of risk-aversion, in that maximising the potential outcome is in essence a behaviour that attempts to minimise risk. However, the PR measure does not provide information about how players reached the PR score, and cannot differentiate between games that have the same score but allocated resources across the game differently. Therefore, in this analysis I did not use PR to investigate risk behaviour; it is used only as a measure of the player’s ability to maximise the expected outcome.
A limitation of this study is the presence of repeated games by individual players, which was not initially planned for in the study design. The repeated games of some players could have affected behaviour, however this was not investigated further. One of the challenges of investigating this is that while many participants played the game more than once, the number of games played was not uniform, and very few played sufficient games to provide multiple observations of all game modes for individual players. While this is a consideration in the interpretation of the results, I believe that there are sufficient games with fewer than 4 repeats, or no repeats, for the results not to be biased by within-player changes. An interesting future analysis would be to explore how behaviour changes over repeated games.
An additional limitation of the study is that sample sizes across both risk profiles and uncertainty profiles were non-uniform. This was due to poor experimental design, and could have been addressed by setting the overall profile for each game instead of randomly assigning these values to the individual ships. Despite these differences in sample sizes, power analyses indicated that the analyses were not underpowered. Interactions between game mode, risk profile and uncertainty profile were not investigated more extensively in this study, also due to insufficient sample sizes across all groups. Future analyses could investigate these interactions further.
Game mode (uncertainty communication method)
In terms of players’ performance, this analysis showed that using a point estimate or a point estimate ± error to communicate an estimated risk improved the players’ ability to maximise the expected value of the game, compared to when the upper and lower bounds of an uncertainty interval were used. This suggests that the point estimate provides support for maximising the expected outcome in a way that only providing an uncertainty interval does not. Thus, including the point estimate may be important in circumstances where an event is repeated and the decision-maker is interested in maximising the expected return of an event and/or maximising the long run average over multiple events.
In terms of risk behaviour (the Gini coefficient), the semantic version of the interval promoted greater resource spreading and therefore more risk-averse behaviour. There was little difference in behaviour observed between the other game modes. The difference seen in the semantic version of the game is possibly due to the linguistic uncertainty or ambiguity inherent in using semantic terms such as average, high, low, above-average, etc.
The semantic definitions of risk (average, above average, below average) possibly allow more room for subjective interpretation of these terms, explaining the higher variation in the Gini coefficient for the semantic game mode. Interestingly, there was no evidence that the addition of ± error to a point estimate influenced players’ risk-averse behaviour.
These results suggest that a point estimate should be included with or without uncertainty if the aim of the decision is to maximise performance. Semantic versions of uncertainty should be used with care as they may promote more risk-averse behaviour.
Risk profile
Both the performance ratio and the Gini coefficient demonstrated that player behaviour is influenced by the level of risk when allocating resources across uncertain options. This analysis showed that uniformly high-risk options promoted more risk-taking behaviour, in which players more readily concentrate their resources on one option.
In terms of performance it is not surprising that games with a uniform risk profile perform better than those without. When all options have the same risk, one variable is eliminated from the decision process, thus making the choice of allocating resources easier as players are not weighing up multiple risk vs reward combinations. It is surprising, however, that behaviour was not the same between the uniformly high and low risk games, suggesting that players use a different strategy depending on the level of risk present.
There are a range of possible explanations for this observed difference in behaviour.
Potentially, players perceive that their available resources have more impact allocated to a high-risk option compared to an option where the risk is already low. Players may also have an internal level of risk they are willing to accept, and thus they allocate resources to one ship until the risk reaches a level that aligns with this internal meter they are comfortable with, after which they turn their attention to the other ships. However, the lack of homogeneity of variance between groups means that these results should be interpreted with caution and further analyses are needed to validate this observation.
Uncertainty profile
The analysis showed no evidence that the uncertainty level influenced players’ behaviour. However, small sample sizes in the uniform profile subgroups limited the power of these analyses.
Learnings for the Australian Cancer Atlas
This research has a range of insights relevant to the design of the Australian Cancer Atlas. In particular, semantic versions of uncertainty should be avoided, as they leave room for subjective interpretations of the uncertainty measures. In addition, there appears to be no added advantage in including ± error when reporting a point estimate. Designers of the atlas should keep in mind that the audiences’ level of risk-averse behaviour may be influenced by the level of risk presented in the Australian Cancer Atlas. However, the behaviour seen in this research, where uniformly high risk promotes risk-taking behaviour, is unlikely to apply in a cancer risk context, where audiences are not directly allocating resources.
Chapter 6
Discussion
Buttenfield states there are three impediments to uncertainty communication (Buttenfield and Beard, 1991). They are:
1. Standardisation of terminology
2. Methods for presenting uncertainty
3. Methods for communicating uncertainty in a way that is meaningful and useful, and that meets the users’ needs.
This thesis aimed to contribute to the broad-reaching problem of how to communicate statistical uncertainty to non-expert audiences, and it did this by contributing to items 2 and 3 in the list above. This problem is an important challenge for the statistical sciences, but also a pressing issue for science more broadly. In this research I conducted two research activities. Firstly, I used the Australian Cancer Atlas as a case study for uncertainty communication design, and secondly, I investigated the relationship between uncertainty representation methods and audience behaviour, exploring commonly utilised uncertainty representation methods. Within activity 1 I: conducted a grey literature review of publicly available cancer maps, explored the use of Morgan and Henrion (1990)’s taxonomy of uncertainty as a practical tool for diagnosing uncertainty sources, applied the NEUVis design framework to uncertainty communication, and conducted focus groups with end-users of the Australian Cancer Atlas in order to understand their needs, current behaviour and understanding of uncertainty. In the second research activity, I designed, built, implemented and analysed the results from an online game which investigated how different uncertainty representation methods (numeric intervals, semantic intervals, point estimates with error and point estimates without error) influenced players’ behaviour. Players of the game had to allocate resources across a range of options with different levels of risk, reward, and uncertainty.
6.1 Uncertainty communication design
Grey literature review
42% of cancer maps identified in this review reported some measure of uncertainty in the form of: standard deviation, credible/confidence intervals, error bars, distributions and boxplots. These measures and visualisations are common forms of representing uncertainty between scientific peers, but as the focus groups in this research demonstrated, they are not always understood by the non-expert. The lack of a consistent approach to uncertainty representation highlights the lack of standardisation for including uncertainty in cancer maps.
Uncertainty representation was more prevalent in maps that contained interactivity. This is not surprising considering that Aerts, Clarke, and Keuper (2003) and Gerharz and Pebesma (2009) have both demonstrated that interactivity can support non-experts to understand uncertainty information. This may be an intuitive response from the map designers. Interestingly, none of the maps identified in this review used cartograms, which are emerging as useful tools in maps that contain population information as they allow the map’s geography to be distorted to visually represent the underlying population distribution (Nusrat and Kobourov, 2016). Cartograms could make at least one source of uncertainty within disease maps, i.e., sample size, more visible. This review highlighted the lack of consistency in including uncertainty in cancer mapping. Examples and guidelines for map developers may promote the inclusion of uncertainty in future disease mapping.
Application of the NEUVis design framework
Communicating statistical uncertainty to the non-expert audience is not a simple task, and there are limited case studies in the literature to support communication creators/designers in the development of material that includes uncertainty information. The process of diagnosing uncertainty sources, understanding the impact of uncertainty on the interpretation of analysis outputs, and identifying the needs of the target audience(s) can be difficult to navigate. In many cases, the communication designer may not understand the importance and impact of the uncertainty information, while the scientist may lack the skills to identify and respond to the users’ needs. Considering the users’ needs in the design process is critical, as users interact differently depending on their motivations. Joslyn and Savelli (2010) have demonstrated that users of scientific information that includes uncertainty respond differently to the same information when they are given a goal-based decision rather than simply asked to evaluate the information.
Using the ACA, this research provided a case study which explored the use of two tools to navigate the uncertainty communication design challenge. Firstly, Morgan and Henrion (1990)’s taxonomy of uncertainty was used to diagnose uncertainty sources within the ACA. Secondly, the NEUVis design framework was used to identify target audiences, and profile the users’ needs as well as the impact uncertainty information may have on their understanding of the presented insights (Gough et al., 2016).
The design framework was extended to consider uncertainty information and was a successful tool for the project stakeholders (communication designers, statistical modellers, Cancer Council QLD staff) to come to a consensus about which audiences to target, systematically consider and identify the users’ needs, and evaluate how uncertainty information may influence these audiences differently. The framework provided a platform for stakeholders to navigate the communication design process in a context where no one project partner had sufficient skills to address all angles of the communication challenge. This is a consideration that Schneider and Moss (1999)’s confidence in the information score or Lapinski (2009)’s uncertainty visualisation design framework explicitly support. Because the NEUVis design framework focuses on the users’ needs and the impact uncertainty may have on them, all stakeholders can engage with the framework. It does not take technical statistical expertise to empathise with how uncertainty will affect another non-expert’s interpretation of insights.
The framework is accessible for both the design and analytics novices and experts. For the design-novice it provides an intuitive structure for breaking down the communication challenge by audience, user needs, data, insights and uncertainty information. For the design-expert it provides a familiar design framework but specifically considers the scientific, data, data analysis and uncertainty information that they may not be familiar with, but which are important to the communication challenge.
Insights from consulting the user
Consulting the user through focus groups enabled unexpected user-needs and behaviours specific to the context of the ACA to be identified.
In general, health care practitioners did not perceive uncertainty information, in a cancer mapping context, to be valuable in their clinical responsibilities. This perspective from health care workers was backed up by the general audiences who generally ignored the uncertainty information in the maps. However, although this information was ignored initially, once it was explained the general audience found it valuable and useful. This supports the study from Joslyn and Savelli (2010) who suggested that it is not the uncertainty itself that is difficult for the non-expert user to understand, it is the representation method which makes it inaccessible to them. Studies in the social sciences indicate that people anticipate uncertainty (Morss et al., 2008; Lazo et al., 2009) suggesting that audiences are prepared to understand uncertainty, if they can access the information.
Error bars on existing cancer maps were often misinterpreted. Participants interpreted their position within the error bar range to be determined by lifestyle choices. That is, the lower risk end of the error range contained individuals that make positive lifestyle decisions in regards to health, diet, exercise etc, while the higher risk end contained people that did not take care of their health.
It was interesting that the focus group participants personalised the uncertainty interval: they assumed that the interval represented a collection of individuals and that an individual’s personal position within the interval (their personal risk) was influenced by their lifestyle choices or situation, rather than by the modeller’s ability to predict the risk. As far as I am aware, this observation has not been explicitly identified in other uncertainty research. This interpretation by the non-expert may be more pertinent in health research, where the user often attempts to personalise the information available. It would be interesting to explore this further in a non-health context to see whether the uncertainty interval is interpreted in the same way.
This personalisation was also seen when focus group participants expressed a preference for risk to be presented as relative frequencies, for example “if there were 10 people just like me, 1 would develop bowel cancer”, rather than “a 10% chance of developing bowel cancer”. They felt this was more comparable with other risks in life. Much work from David Spiegelhalter has also shown that non-experts find this format much easier to relate to (Spiegelhalter, Pearson, and Short, 2011).
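The relative-frequency format preferred by participants can be produced mechanically from a probability. The following Python sketch is illustrative only: the function names, the candidate group sizes and the sentence wording are assumptions made for this example and are not part of the Australian Cancer Atlas.

```python
def as_relative_frequency(probability, group_sizes=(10, 100, 1000, 10000)):
    """Return (count, group_size) for the smallest group giving a whole count.

    The candidate group sizes are an assumption for this illustration.
    """
    if not 0 <= probability <= 1:
        raise ValueError("probability must be between 0 and 1")
    for size in group_sizes:
        count = probability * size
        # Accept the first group size where the expected count is a whole
        # number of at least one person (the tolerance absorbs float error).
        if round(count) >= 1 and abs(count - round(count)) < 1e-9:
            return round(count), size
    # Otherwise fall back to the largest group and round to the nearest person.
    largest = group_sizes[-1]
    return round(probability * largest), largest


def describe_risk(probability, condition):
    """Phrase a risk in the 'people just like you' style (hypothetical wording)."""
    count, size = as_relative_frequency(probability)
    return (f"If there were {size} people just like you, "
            f"{count} would be expected to develop {condition}.")


print(describe_risk(0.10, "bowel cancer"))
# If there were 10 people just like you, 1 would be expected to develop bowel cancer.
```

Choosing the smallest group size that yields a whole count keeps the statement close to the “10 people just like me” phrasing used by participants.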
Diagnosing sources of uncertainty
One of the necessary steps for incorporating uncertainty into any communication output is identifying the sources of uncertainty within a research project. Within this study I attempted to achieve this systematically, using Morgan and Henrion (1990)’s taxonomy of uncertainty sources as a diagnostic tool. All stakeholders, including the statistical modellers, struggled to identify the uncertainty sources within the ACA using this taxonomy. Three issues limited its use as a tool: the understandability of the taxonomy categories for all stakeholders, a lack of technical knowledge of the project’s methods among all stakeholders, and a lack of perceived usefulness of identifying these uncertainty sources to the overall communication challenge.
The first task was a challenge for all stakeholders, even the experienced analysts who, despite their experience and training, still debated the differences between several of Morgan and Henrion (1990)’s uncertainty sources. Allocating more time to this task would have been beneficial, allowing participants to digest the taxonomy and then apply it to the ACA. More guidance and clear examples of each source may have also supported participants to digest the taxonomy and more readily apply it. For the visualisation specialists within the group, while the taxonomy was thorough, it did not help them understand the significance of these uncertainty sources, or the impact they have on the interpretation of outputs of the ACA. They could not engage with the task without a technical understanding of the different sources. Other attempts at turning traditional taxonomies into more useful tools (Walker et al., 2003; Knol et al., 2009) have also struggled to support the participants in understanding the taxonomy. One investigation showed that the taxonomy defined by Walker et al. (2003), which includes nature, location and level for diagnosing uncertainties, was limited by the differing abilities and experience of those applying the taxonomy (Gillund et al., 2008; Knol et al., 2009; Skinner, Rocks, and Pollard, 2016). Skinner et al. (2013) suggest structured guidance for understanding these taxonomies is essential and showed this was successful for an extension of Walker et al. (2003)’s taxonomy within environmental risk assessments.
Further to this, the non-technical stakeholders also could not apply the uncertainty sources to the ACA, as this required technical understanding of the methods and data. This left the non-technical stakeholders with no access point to contribute to mapping the uncertainty sources to the ACA, and also did not support their understanding of why these uncertainties were important to the project outputs. The technical nature of Morgan and Henrion (1990)’s taxonomy, while thorough and structured, was not user friendly for the non-technical team members.
Project stakeholders also found it difficult to see the usefulness of characterising all sources listed in the taxonomy, or how the mental energy required to understand the different sources warranted the return. An example is systematic bias. Firstly, time was spent debating the definition by the group; secondly, bias in any project is difficult to identify and formalise; and thirdly, participants questioned the usefulness of formalising sources of bias, as they cannot be measured or reduced, and would most probably not be relevant to the audience of the ACA. Diagnosing this source of uncertainty appeared not to contribute to the overall communication design challenge. A more effective way to identify the uncertainty sources and engage the full team would be to connect the key insights of the project more explicitly with the uncertainty sources, working backwards from the key insights rather than forwards from the taxonomy. In this way, work would be focused on identifying uncertainty sources that influence the key research findings, rather than all uncertainty sources. A hybrid between Morgan and Henrion (1990)’s taxonomy and the guidelines for assessing uncertainty developed by Schneider and Moss (1999) could be a starting point for this approach. Schneider and Moss (1999)’s guidelines start with identifying uncertainties that will most influence the final insights, but they do not provide any framework for defining the uncertainty sources in the way that Morgan and Henrion (1990) do.
Identifying which uncertainties are present in a project is an important component of uncertainty communication design. Morgan and Henrion (1990)’s taxonomy, in its current form, was not effective as a tool for diagnosing uncertainty sources. The taxonomy was difficult to understand and further guidance was required. A successful tool in the future should connect the key uncertainty sources to the key messages of the project, and also facilitate communication and technical stakeholders to discuss the uncertainty sources and their impact on the analysis outputs.
6.2 Testing uncertainty representation methods
Semantic version of uncertainty promoted more variation in behaviour and more risk averse behaviour
The semantic version of the upper and lower bounds of an uncertainty interval promoted more risk-averse behaviour (resource spreading) on average, as well as more variation in how players allocated the resource. Ambiguity in the interpretation of the uncertainty terms ‘average’, ‘above average’ or ‘below average’ may have led to players having less confidence in which ship to select. Semantic intervals compared to text intervals have been previously shown to be more open to misinterpretation (Savelli and Joslyn, 2013). The variation seen in the results is possibly due to more room for personal interpretation of the semantic uncertainty terminology between users. Studies by Joslyn and Savelli (2010) and Morgan and Henrion (1990) support this potential for misinterpretation and have shown that users interpret semantic terminology differently. These results further justify the need for standardisation of terminology as outlined by Buttenfield and Beard (1991) above and by Schneider and Moss (1999)’s call for consistent mapping of semantic terms such as ‘average’ and ‘above average’ to numeric values within the IPCC publications.
Finally, reduced feedback could also be a contributing factor to the variation seen in the semantic version. In the game, when uncertainty was presented numerically, 1 gold doubloon spent on a particular ship automatically changed the risk displayed to the player, even if it was only 1%. However, with the semantic version, sufficient gold had to be allocated to move to the next semantic category, so 1 gold doubloon may not result in feedback to the player, while 10 gold doubloons might. Further investigation of the role feedback has on behaviour in these contexts would be valuable. At the time of writing, there was no known literature on the impact of feedback on risk behaviour or in uncertainty communication research.
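The difference in feedback between the numeric and semantic displays can be illustrated with a short Python sketch. The category cut-offs (40% and 60%), the assumption that each doubloon raises the displayed risk by one percentage point, and the function names are all hypothetical; they are chosen only to show the mechanism and do not reproduce the game’s actual rules.

```python
def semantic_label(risk_percent):
    """Map a numeric risk to a semantic category (hypothetical cut-offs)."""
    if risk_percent < 40:
        return "below average"
    if risk_percent <= 60:
        return "average"
    return "above average"


def display_changes(start_risk, doubloons, step_per_doubloon=1):
    """Report whether the numeric and the semantic displays change.

    Assumes, for illustration only, that each doubloon moves the displayed
    risk by `step_per_doubloon` percentage points.
    """
    new_risk = start_risk + doubloons * step_per_doubloon
    numeric_changed = new_risk != start_risk
    semantic_changed = semantic_label(new_risk) != semantic_label(start_risk)
    return numeric_changed, semantic_changed


# One doubloon: the number moves (55% to 56%) but the label stays "average".
print(display_changes(55, 1))   # (True, False)
# Ten doubloons: the number moves (55% to 65%) and the label becomes "above average".
print(display_changes(55, 10))  # (True, True)
```

Under these assumptions a player in the semantic mode receives no visible response to small allocations, which is the reduced feedback described above.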
Point estimates ± error
The players’ ability to maximise how much gold they won (and minimise how much they missed) was higher for the point estimate and the point estimate ± error game modes compared to the numeric uncertainty interval. Interestingly, the addition of an error term to the point estimate did not appear to have an influence. The lower performance of the intervals aligns with the results of Sanyal et al. (2010), which demonstrated that error bars underperformed in an information seeking task, although that research explored visual rather than numeric intervals. A reduction in performance is not always a negative outcome, depending on the stakes of the risk scenario. Studies have shown that interval forecasts allow participants to more effectively distinguish situations in which precautionary action is warranted, despite the fact that their decisions have a lower performance overall (Joslyn and LeClerc, 2011), so intervals may be more appropriate when caution is warranted.
Influence of risk level
This study demonstrated that how a player allocates resources across options is influenced by the risk level of the available options. In terms of performance it is not surprising that games with a uniform risk profile perform better than those with a non-uniform risk profile. When all options have the same risk one variable is eliminated from the decision process, thus making the choice of allocating resources easier as players are not weighing up multiple risk vs reward combinations, and simply target the highest potential reward.
It is interesting, however, that behaviour was not the same between the uniformly high and uniformly low risk games, suggesting that players use a different strategy depending on the level of risk present.
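The argument that a uniform risk profile removes one variable from the decision can be made concrete with a small Python sketch. It assumes, purely for illustration, that the expected return of an option is (1 - risk) × reward and that risk does not change as resources are allocated; the ship names, risks and rewards are invented and are not taken from the game data.

```python
def best_by_expected_return(options):
    """Pick the option with the highest (1 - risk) * reward (assumed model)."""
    return max(options, key=lambda o: (1 - o["risk"]) * o["reward"])["name"]


def best_by_reward(options):
    """Pick the option with the highest potential reward, ignoring risk."""
    return max(options, key=lambda o: o["reward"])["name"]


# Hypothetical uniform profile: every ship carries the same risk.
uniform = [
    {"name": "A", "risk": 0.3, "reward": 100},
    {"name": "B", "risk": 0.3, "reward": 150},
    {"name": "C", "risk": 0.3, "reward": 200},
]
# Hypothetical non-uniform profile: higher rewards come with higher risk.
non_uniform = [
    {"name": "A", "risk": 0.1, "reward": 100},
    {"name": "B", "risk": 0.5, "reward": 150},
    {"name": "C", "risk": 0.8, "reward": 200},
]

# With uniform risk, chasing the highest reward is also the best expected choice.
print(best_by_reward(uniform), best_by_expected_return(uniform))          # C C
# With non-uniform risk the two rules disagree, so risk must be weighed as well.
print(best_by_reward(non_uniform), best_by_expected_return(non_uniform))  # C A
```

In the uniform case the two rules always agree under this model, which is why players can simply target the highest potential reward.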
There is a range of possible explanations for this observed difference in behaviour between uniform high and low risk profiles. Players may have an internal level of risk they are willing to accept, and thus they allocate resources to one ship until the risk reaches a level that aligns with the internal meter they are comfortable with, after which they turn their attention to the other ships. The violation of the assumption of homoscedasticity, however, means these results should be interpreted with caution, and further analyses are required to validate these observations.
6.3 Critique & limitations
6.3.1 Literature Review
The uncertainty literature is broad and diverse. The best attempt was made to consolidate the most important aspects of this literature. Other areas of investigation may have been valuable to include in this review, but were considered to be outside the scope of this thesis. An example is uncertainty quantification. Understanding the uncertainty quantification landscape may have had valuable insights for the process of designing uncertainty communication, as well as selection of uncertainty representation methods used in the online game.
6.3.2 Research Activity 1.A: Grey literature review of internet published cancer maps
The grey literature review of internet published cancer maps aimed to provide an overview of the current practices in cancer mapping. Considering how quickly technology moves, this review will quickly date, particularly as programming languages and libraries that support bespoke visualisations, such as R and d3.js, become both more user friendly and more widely used.
6.3.3 Research Activity 1.B: User centred uncertainty communication design
Diagnosing uncertainty
Diagnosing uncertainty sources is a complex task. A limitation in the evaluation of Morgan and Henrion (1990)’s taxonomy was the time available in the workshop, and also that this task was set at the end of a full-day workshop. A second application of the taxonomy would have provided more data on the taxonomy’s strengths and weaknesses.
Focus groups
Focus group participants were difficult to attract, and only one participant from the policy-maker audience participated. I would have liked to have had more participation from the policy-maker or policy-advisor audience, as they are an important audience for both the ACA and uncertainty communication more broadly. The policy-maker is far more likely to need to make decisions informed by scientific information than a general audience.
Further to this, I believe more focused research on the non-expert decision-maker, rather than just the non-expert, would lead to more nuanced research outcomes. This is not to say that the general audience should not be a target for uncertainty communication research, but their needs and motives differ from those of the decision-maker.
An additional limitation of the focus groups was both their sample size and the recruitment strategy. Holding only two focus groups for each target audience may limit the generalisability of the insights. It is often difficult to dictate the conversation in a focus group. Some level of freedom is required for participants to feel that their opinion is valued and to allow their perspectives to emerge. Naturally this results in different focus groups often taking different paths and discussing different content. Repeating the focus groups for each audience more than twice could have been beneficial. In terms of recruitment, participants for these focus groups were largely recruited through the Cancer Council Queensland contact database. This represents a biased sample in which participants are likely to have a lived experience of cancer, either personally or through friends/family. By nature, the Cancer Council contact database contains people with these lived experiences. Therefore, this audience may have been more emotionally motivated by the topic than a general audience that has had no experience of cancer.
6.3.4 Research Activity 2: User study - uncertainty representation in an online game
Designing, building, implementing and analysing data from the online user study was a very valuable learning experience. Uneven sample sizes across the sub-groups of interest were a limiting factor in this study, and something that could have been addressed with more careful experimental design. Specifically, utilising set combinations of risk profiles and uncertainty profiles across the four different game modes would have enabled a more balanced investigation of the effects of risk profile and uncertainty profile. Defining set combinations of variables could also have enabled analysis of interaction effects between uncertainty, risk and reward. That was not possible, as some sub-groups did not have a sufficient sample size.
An additional limitation of this study was the repeated games played by some players. For the players that had repeat games, their strategy and approach to the game may have changed as they played the game more times. However, there were insufficient numbers of repeated games across all game modes to investigate changes in behaviour over multiple games.
6.4 Future Work
Uncertainty communication is a complex challenge that will require contributions from both qualitative and quantitative domains to address. Future challenges that need to be addressed within this include:
• Specific focus on the non-expert decision maker.
• Developing and validating a practical tool for diagnosing uncertainty sources in research projects that is accessible to a range of stakeholders.
• Further case studies to solidify the use of an extended version of the NEUVis design method in contexts outside of cancer mapping.
• Investigating whether the differences seen between point estimates (and point estimates ± error) and intervals extend to graphic intervals.
6.5 Conclusion
In this research I have contributed to a growing body of literature on uncertainty communication, making a unique contribution by comparing uncertainty representation methods that are commonly used but so far under-investigated. I have demonstrated that Morgan and Henrion’s (1990) taxonomy of uncertainty is insufficient as a tool for diagnosing uncertainty in cross-disciplinary teams, and provided a case study for the use of the NEUVis design framework as a valuable tool in the design of uncertainty communications for the non-expert audience, including the value of focus groups to consult end-users within this design process. The outputs of both studies culminated in an outline of design considerations and user insights that will be used by the communication designers of the Australian Cancer Atlas.
Appendix A
Appendix: Literature Review
A.1 Uncertainty Representation in Mapping & GISciences
Motivated by the use of the Australian Cancer Atlas as a case study in this research, the scope of this section has been limited to uncertainty representation methods for GIScience and mapping. Building on a strong tradition of quantitatively standardizing visual variables for communication, the GISciences and mapping have made significant contributions to research in this area (MacEachren, 1992; Goodchild et al., 1994; Leitner and Buttenfield, 2002; Brodlie et al., 2012; Leitner, 2013; Aerts et al., 2003a; Pham and Brown, 2003; Li and Zhang, 2006; Bostrom et al., 2007; Viard et al., 2011; Dong and Hayes, 2012), and these insights are applicable to any visual display (McGranaghan, 1993).
Initial approaches to new uncertainty representation methods began with Bertin’s (1981) visual variables. Bertin was a renowned cartographer and one of the first to standardise visual variables in mapping and visualisation, suggesting location, size, value, texture, color, orientation, and shape as visual variables for encoding information.
Much research has been done to explore the effectiveness of these variables for uncertainty representation (Mathews et al., 2008; Brodlie et al., 2012; Potter et al., 2012; Zuk & Carpendale, 2006; Pang et al., 1997; Johnson & Sanderson, 2003; Johnson, 2004; Evans, 1997; Wittenbrink et al., 1996; Aerts et al., 2003; Sanyal et al., 2009; Leitner & Buttenfield, 2000).
MacEachren (1992) and Slocum et al. (2004) added to Bertin’s list and suggested edge crispness (fuzziness), fill clarity, fog, resolution, transparency and saturation as specific visual variables for representing uncertainty, and Gershon (1992) suggested boundary (thickness, texture, and color), transparency, and animation. Newman and Lee (2004) evaluated techniques for the visualization of uncertainty in volumetric data, comparing glyph-based techniques, such as cylinders and cones, with colormapping and transparency adjustments. They found that while each method was useful for identifying uncertainty in the scenario tested, the glyph techniques were most beneficial. However, this depended on the question being asked (Sanyal et al., 2010).
Despite this growing body of research, there are still contradictory results, and more work is needed to validate these methods further. Color value and texture have been repeatedly suggested for the display of data quality information. For example, darker value or finer texture should be applied to display data of higher quality in mapping, whereas lighter value or coarser texture should be used to visualize lower data quality (Buttenfield, 1991; MacEachren, 1992; van der Wel et al., 1994). The same relationship was found by McGranaghan (1986) when using symbols rather than just hue. However, this relationship can be dependent on the medium used. One result testing hue suggests that on a cathode-ray tube (CRT), compared to paper, people seem to associate lighter value, not darker value, with more certain information (Leitner & Buttenfield, 2013). MacEachren (1992) explored the use of highly saturated hues for data of higher quality and unsaturated (i.e., gray) hues for lower data quality. However, Robinson (1952) does not regard saturation as a very useful dimension of color, and McGranaghan (1986) did not find saturation, by itself, effective at conveying differences in magnitude.
Appendix B
Appendix: Research Activity 1.A - Grey Literature Review
B.1 Search Protocol
Research Question: What cancer maps are currently available to the public on the internet, and what methodologies and technologies have been used to generate them?
Aim: To summarise the breadth of cancer atlases published publicly on the internet in terms of: statistical methods used, outcome measures, inclusion of uncertainty, map interactivity features, available functions, access to data, availability of explanations or supporting material explaining methods and data sources, technology platform used to create the web product, country, the area of resolution, smoothing methods used, date of the data used, date of publication, generating organisation (government, research institution, university), and academic publications associated with the map.
Pre-Scoping
Cancer Atlas Synonyms:
Cancer map, oncology map, geospatial health statistics, geospatial cancer statistics, health atlas, disease atlas, health map, spatial statistics, spatial cancer statistics, geographic clustering, geographic cancer variation, geographic variation, geographic patterns of disease, spatial patterns, geographic disease distribution, atlas of disease distribution, disease distribution, Bayesian cancer map, spatial epidemiology, geospatial health data, geovisuali$ation, health geographics, geographic maldistribution, disease distribution, thematic cancer map.
Search Details
Search Strings
The following list details the final search strings used:
1. intitle: spatial AND epidemiology AND cancer AND map OR mapping OR atlas -campus
2. allintitle: cancer AND map OR mapping OR atlas -campus -kinase -kinases -concept
3. allintitle: spatial AND cancer AND statistics
4. allintitle: spatial OR geographic AND cancer AND variation OR distribution
5. allintitle: spatial AND epidemiology AND cancer AND map OR mapping OR atlas -campus
6. intitle: cancer AND atlas
Within these search strings, we used the search qualifiers “allintitle” (which requires all the search terms to be in the title) and “intitle” (which requires only the first search term to be in the title and the rest anywhere in the document). Hits containing “campus”, “kinase”, “kinases” or “concept” in their title were excluded. “Kinase” and “kinases” refer to protein enzymes that are often the focus of research into the biology of cancer, but are not relevant to geospatial mapping of cancer incidence or survival. “Campus” and “concept” were excluded for their obvious connection with “campus map” and “concept map”, neither of which is relevant to cancer mapping.
A search testing log that outlines the testing and refinement of these search strings is detailed in Table B.1 below. Search qualifiers have the following action:
• allintitle - restricts the results to those with all of the query words in the title. For instance, [allintitle: google search] will return only documents that have both “google” and “search” in the title. Without this limitation all the search strings listed above returned in excess of 100,000,000 hits, many of which were irrelevant.
• intitle - restricts the results to documents containing that word in the title. For instance, [intitle:google search] will return documents that mention the word “google” in their title, and mention the word “search” anywhere in the document (title or elsewhere).
• date - all searches were limited to pages published between 01/01/2010 and 01/05/2016.
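The way the final search strings combine a qualifier, Boolean terms and excluded words can be sketched in Python as follows. The helper name `build_query` and its parameters are hypothetical and were not part of the review protocol; the sketch only reassembles the text of a search string and does not submit anything to a search engine.

```python
def build_query(terms, qualifier="allintitle", exclude=()):
    """Assemble a search string from a qualifier, terms and excluded words.

    `terms` is the Boolean expression as typed (e.g. "cancer AND atlas");
    each word in `exclude` is prefixed with "-" to drop hits containing it.
    """
    parts = [f"{qualifier}:", terms]
    parts += [f"-{word}" for word in exclude]
    return " ".join(parts)


# Reproduces search string 2 from the list of final search strings.
query = build_query(
    "cancer AND map OR mapping OR atlas",
    exclude=("campus", "kinase", "kinases", "concept"),
)
print(query)
# allintitle: cancer AND map OR mapping OR atlas -campus -kinase -kinases -concept
```

Keeping the exclusions as a separate argument makes it easy to see how each refinement step in the testing log (adding -campus, then -kinase, and so on) changes the string.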
Search Engine
Google was used for all searches. No other search engines were explored.
Language
Only English was used in these searches. Searching in additional languages was outside the resources of this project. Atlases that were identified in the searches but are not published in English were still extracted.
Eligibility Criteria
Hits were selected for data extraction if they met the following criteria:
• contained a visual geographical map of cancer incidence, mortality, survival or risk (either pdf, static image or interactive web interface).
• were accessible without a password or log in.
• were published or updated on or after the 1st of January 2010.
Table B.1: Search string development testing log
Search String Hits Date Updated Hits
1a Cancer AND map* OR Atlas 260 × 106 22/10/15 12/5/16 126
1b intitle:cancer AND map* OR atlas 109 × 104 23/10/15 12/5/16 120 ×
103
1c allintitle:cancer AND map* OR 2,620 23/10/15
atlas
1c allintitle:cancer AND map* OR 2,620 23/10/15
atlas -campus
1c allintitle:cancer AND map* OR 2,490 23/10/15
atlas -campus -kinase
1c allintitle:cancer AND map* OR 189 23/10/15
atlas -campus -kinase
1c restricted to publications after
1/1/2010
1c allintitle:cancer AND map* OR 182 23/10/15 12/5/16 31
atlas -campus -kinase -kinases
1c restricted to publications after
1/1/2010
1d allintitle:cancer AND map OR 7,160 23/10/15 12/5/16 122
mapping OR atlas -campus -kinase
-kinases
1d | restricted to publications after 1/1/2010 | 623 | 23/10/15 | |
1d | restricted to the past 2 yrs | 327 | 23/10/15 | |
1d | restricted to publications in the past 12 months | 149 | 23/10/15 | |
1d | restricted to publications in the past month | 23 | 23/10/15 | |
1d | restricted to publications in the past week | 4 | 23/10/15 | |
1e | allintitle:cancer AND map OR mapping OR atlas -campus -kinase -kinases -concept | 6,960 | 29/10/15 | | 122
1e | restricted to publications after 1/1/2010 | 625 | 29/10/15 | |
1e | restricted to publications after 29/10/2013 | 333 | 29/10/15 | |
1e | restricted to publications after 29/10/2014 (past yr) | 155 | | |
1e | published in the last month (29/09/2015) | 20 | 29/10/15 | |
1f | allintitle:cancer AND map OR mapping OR atlas -campus -kinase -kinases -concept | 7,110 | 23/10/15 | |
2 | allintitle:Oncology AND map OR mapping OR atlas -campus -kinase -kinases -concept | 8,250 | 23/10/15 | |
3 | allintitle:Spatial cancer statistics | 75 | 23/10/15 | |
4 | allintitle:spatial OR geographic AND cancer AND variation OR distribution | 1,030 | 30/10/15 | |
4a | restricted to pages published after 01/01/2010 | 90 | 29/10/15 | |
5 | allintitle:Bayesian AND cancer AND Map OR atlas OR mapping | 0 | 23/10/15 | |
6 | allintitle: thematic AND cancer AND Map OR atlas | 0 | 23/10/15 | |
7 | allintitle:Spatial AND epidemiology AND cancer AND map OR mapping | 0 | 23/10/15 | |
7a | intitle:Spatial AND epidemiology AND cancer AND map OR mapping OR atlas -campus | 12,200 | 30/10/15 | |
7b | restricted to pages published after 1/1/2010 | 1,880 | 23/10/15 | |
8 | intitle:cancer AND atlas | 258,000 | 9/11/15 | |
8a | restricted to publications between 01/01/2010 to 09/11/2015 | 39,800 | 9/11/15 | |
8b | -genome | 24,200 | 9/11/15 | |
9 | intitle:atlas AND cancer | 207,000 | 11/11/15 | |
9a | restricted to publications between 01/01/2010 to 09/11/2015 | 19,500 | 11/11/15 | |
9b | -genome | 18,200 | 11/11/15 | |
B.2 Database of identified cancer atlases
Table B.2: Title, URL and key for each cancer map identified
Key | Map Title | URL
1 | All Ireland Cancer Atlas 1995-2007 | http://www.ncri.ie/publications/cancer-atlases
2 | Breast Cancer Mortality in Canada | http://www.ehatlas.ca/light-pollution/maps/breast-cancer-mortality
3 | Globocan 2012: Estimated Cancer Incidence, Mortality and Prevalence Worldwide in 2012 | http://globocan.iarc.fr/Pages/Map.aspx
4 | The Cancer Atlas | http://canceratlas.cancer.org/data/#?view=map&metric=INCID_ALL_M
5 | Global Cancer Map | http://globalcancermap.com/
6 | Spatio-Temporal Atlas of Mortality in Comunitat Valenciana | http://www.geeitema.org/AtlasET/index.jsp?idioma=I
7 | United States Cancer Statistics: An Interactive Cancer Statistics Website | https://nccd.cdc.gov/DCPC_INCA/
8 | MapNH Health | http://www.mapnhhealth.org/
9 | Pennsylvania Cancer Atlas | http://www.geovista.psu.edu/grants/CDC/
10 | NCI Geoviewer (NIH GIS Resources for Cancer Research) | https://gis.cancer.gov/geoviewer/
11 | Longer Lives Healthier Lives |
12 | Lung Cancer Map - Global Lung Cancer Coalition | http://www.lungcancercoalition.org/e-atlas/
13 | Environmental Facilities and Cancer Mapping | https://apps.health.ny.gov/statistics/cancer/environmental_facilities/mapping/map/
14 | An Atlas of Cancer in South Australia | https://www.cancersa.org.au/assets/images/pdfs/An%20Atlas%20of%20Cancer%20in%20South%20Australia%20-%20Full%20Report.
15 | Bowel Cancer Australia Atlas | http://www.bowelcanceratlas.org/
16 | Epidemiologisches Krebsregister Nordrhein-Westfalen | http://www.krebsregister.nrw.de/index.php?id=116
17 | Helseatlas - Dagkirurgi, 2011 - 2013 (Skulderkirurgi) | http://www.helse-nord.no/helseatlas/atlas.html
18 | Cancer Incidence in Switzerland | http://www.nicer.org/NicerReportFiles2015-2/EN/report/atlas.html?&geog=0
19 | Age Adjusted Invasive Cancer Incidence Rate: All Sites: 2011 (experimental dashboard) | http://mcriaweb.col.missouri.edu/IAS/dataviews/report?reportId=13&viewId=3&geoReportId=62&geoId=1&geoSubsetId=
20 | CINA+ Online Cancer in North America | http://www.cancer-rates.info/naaccr/
21 | The Environment and Health Atlas of England and Wales | http://www.envhealthatlas.co.uk/eha/Breast/
22 | UK Cancer e-atlas NCIN |
23 | Map of Cancer Mortality Rates in Spain | http://elpais.com/elpais/2014/10/06/media/1412612722_141933.html
24 | The Florida Prostate Cancer Atlas | http://prostatecanceradvisorycouncil.org/categorynews/florida-prostate-cancer-atlas-2014/#prettyPhoto
25 | Atlas of Cancer in Queensland | https://cancerqld.org.au/research/queensland-cancer-statistics/queensland-cancer-atlas/
26 | Atlas of Childhood Cancer in Ontario | http://www.pogo.ca/research-data/pogo-atlas/
27 | Atlas of Cancer Mortality in the European Union and the European Economic Area 1993-1997 | http://www.iarc.fr/en/publications/pdfs-online/epi/sp159/AtlasCancerMortalityEU-10.pdf
28 | National Cancer Registry of Ireland - Cancer Atlas | http://www.ncri.ie/data/maps?field_cancers_tid_selective=61
29 | Geographic Variation in Primary Liver and Gallbladder Cancer | http://www.ncin.org.uk/publications/data_briefings/liver_and_gall_bladder
30 | Cancer Atlases of UK and Ireland | http://www.ons.gov.uk/ons/rel/cancer-unit/cancer-atlas-of-the-united-kingdom-and-ireland/1991---2000/index.html
31 | Cancer Mortality Maps (US) | http://ratecalc.cancer.gov/archivedatlas/
32 | Cape Cod Breast Cancer Atlas | http://silentspring.org/cape-cod-atlas-breast-cancer-incidence
33 | Arizona Cancer Rates by Community Health Analysis Area (CHAA) > 2005 - 2009 | http://www.azdhs.gov/preparedness/public-health-statistics/cancer-registry/chaa/index.php
Table B.3: General details of each map identified in grey literature review
key | pub date | data range | publisher | publisher type | country | coverage
1 | 2011 | 1995 - 2007 | National Cancer Registry Ireland - Cancer Atlas | Gov | Ireland | Multiple Nations
2 | 2012 | 2012 | Social Sciences and Humanities Research Council of Canada & Canadian Institutes of Health Research | Gov | Canada | Nation
3 | 2015 | 2012 | WHO & IARC (International Agency for Research on Cancer) | NFP | Global | Global
4 | 2014 | 2000 - 2011 [1] | The American Cancer Society & International Agency for Research on Cancer | NFP & Gov | Global | Global
5 | 2012 | 2008 | www.pri.org | NFP | Global | Global
6 | 2008 | 1987 - 2006 | Valencia health department | Gov | Spain | State
7 | unknown (after 2012) | 1999 - 2012 | Centers for Disease Control and Prevention | Gov | US | Nation
8 | unknown | 2001 - 2011 | MapNH Health | Gov | US | State
[1] date of data depends on the cancer type
9 | 2007 | 1994 - 2002 | Penn State Milton S. Hershey Medical Center (Hospital and teaching hospital) | Hospital & Research | US | State
10 | 1/10/15 | 2008 - 2012 | National Cancer Institute (GIS Resource Centre for Cancer Research, NIH) | Gov | US | Nation
11 | unknown | 2012 | Public Health England | Gov | England | Nation
12 | 2014 | 2012 | Global Lung Cancer Coalition | NFP | Global | Global
13 | 2013 | 2005 - 2009 | New York State (Department of Health) | Gov | US | State
14 | 2012 | [2] | Cancer Council SA | NFP | Australia | State
15 | 2014 | 2011 - 2013 | Bowel Cancer Australia | NFP | Australia | Nation
16 | 2013 | 2002 - 2011 | ? not published in English | Gov | Germany | Nation
17 | unknown [3] | 2011 - 2013 | unknown [4] | unknown [5] | Norway | Nation
[2] depends on cancer type
[3] not published in English
[4] not published in English
[5] not published in English
18 | 2015 | 2008 - 2012 | NICER Foundation - National Institute for Cancer Epidemiology and Registration | NFP | Switzerland | Nation
19 | 2012 | 1996 - 2011 | Missouri Cancer Registry and Research Centre | NFP | US | State
20 | 2015 | 2008 - 2012 | NAACCR - North American Association of Central Cancer Registries | Gov | US | Nation
21 | 2014 | unknown | Small Area Health Statistics Unit (UK) | NFP | England + Wales | Multiple Nations
22 | 2011 | 2008-2011 & 2009-2011 [6] | NCIN (National Cancer Intelligence Network) | Gov | UK |
23 | 2014 | 2004 - 2008 | elpais | news and media organisation | Spain |
24 | post 2012 | 1998 - 2007 | Florida Prostate Cancer Advisory Council | advisory/advocacy org. | US | State
25 | 2011 | 1998 - 2007 | Cancer Council Queensland | NFP & Gov organisations | Australia | State
[6] differs by cancer type
26 | 2015 | 1985 - 2004 | Pediatric Oncology Group of Ontario | NFP | Canada | State
27 | 2008 | 1993 - 1997 | WHO - International Agency for Research on Cancer | NFP/Gov | EU | Multiple Nations
28 | unknown | 1994 - 2012* | National Cancer Registry of Ireland | Gov | Ireland | Nation
29 | 2010 | 1998 - 2006 | Public Health England (National Cancer Intelligence Network) | Gov | UK | Nation
30 | 2000 | 1991 - 2000* | Office of National Statistics | Gov | UK + Ireland | Multiple Nations
31 | 1999 | 1950 - 1994 [7] | National Institute of Health & National Cancer Institute (US) | Gov | US | Nation
32 | unknown | 1982 - 1994, 1995 - 2012 | Silent Spring Institute | NFP | US |
33 | unknown | 2005 - 2009 | Arizona Department of Health Services | Gov | US | State
[7] differs by cancer type
Table B.4: Report Measures of each map identified in grey literature review
key | report measure | incidence/mortality/death/counts | other details
1 | | Incidence Ratio / Incidence Relative Risk | SIR (standardised incidence ratio). Relative Risks (Smoothed Age Standardised incidence ratios)
2 | rate per 100,000 | Mortality Rate | Crude Mortality Rate per 100,000
3 | age adjusted rate per 100,000 | Incidence Rate / Mortality Rate / Prevalence | Age adjusted rate incidence per 100,000 (incidence, mortality and prevalence)
4 | age adjusted rate per 100,000 | Incidence Rate / Survival Rate | Age adjusted rate per 100,000 people, incidence and survival
5 | New cancer cases per 100,000 (age adjusted) | Incidence Rate / Mortality Rate | Age adjusted new cancer cases annually per 100,000 (incidence and mortality)
6 | na | Mortality Rate / Mortality Probability | Spatio-Temporal Standardised smoothed Mortality Ratio and Probability (Excess Risk) * no further details available on how measure was calculated
7 | age adjusted new cancer cases per 100,000 | Incidence Rate / Death Rate * / Incidence Count / Death Count | Incidence rate (number of new cancer cases), Death rate (number of deaths due to cancer), incidence count and death count. Rates are per 100,000 persons, per year and are age adjusted to the 2000 U.S. standard population (19 age groups)
8 | age adjusted rate per 100,000 | Incidence Rate (Projected) | Projected (2020 & 2030) age adjusted cancer incidence per 100,000
9 | age adjusted rate per 100,000 | Incidence Rate | age adjusted incidence per 100,000
10 | age adjusted rate per 100,000 | Incidence Rate / Death Rate | age adjusted annual incidence rates & age adjusted death rates (per 100,000)
11 | na | Mortality Rate | Age standardised premature mortality per 100,000. Legend indicates if mortality rates are significantly different to the national average - no further detail is provided about the methods used to calculate these rates.
12 | count, age adjusted rate per 100,000 | Incidence Rate / Incidence Count / Mortality Rate (death from cancer) / Survival Rate | counts and age standardised rates of lung cancer incidence and mortality (death from cancer).
13 | Below or above expected | Incidence (unknown) | five year cancer counts and an indication of whether the rate is below expected or above expected.
14 | age adjusted rate per 100,000 | Incidence Rate | Age Standardised incidence rates per 100,000 people (ASR per 100,000)
15 | unknown | Incidence | % of cases in population. Measure is indecipherable, no titles on the graphs or legend and units associated with the key.
16 | age adjusted rate per 100,000 | Incidence Rate | Age adjusted incidence rates per 100,000
17 | age adjusted rate per 100,000 | Incidence Rate | Age adjusted incidence rates per 100,000
18 | age adjusted rate per 100,000, crude rate, number of cases | Incidence Rate / Incidence Crude Rate / Incidence Counts / Mortality rate (cancer deaths) | Age standardized incidence/mortality rate per 100,000 per year (Crude rate + ASR + number of cases)
19 | age adjusted rate | Incidence Rate / Incidence Counts | Age-Adjusted Invasive Cancer Incidence Rate (presumably per 100,000 persons, but this is not stated on the map). (crude numbers also reported)
20 | age adjusted rate per 100,000, crude rate, number of cases, cases | Incidence Rate / Incidence Counts | Age adjusted incidence rate per 100,000 (also reported - area population, cases, crude rates, age adjusted rates).
21 | age adjusted relative risk | Incidence Ratio (Relative Risk) | Relative Risk (Age Adjusted incidence ratio)
22 | age adjusted rate per 100,000 | Incidence Rate / Mortality Rate (cancer deaths) / Survival Rate (for some cancers) | Age standardised incidence/mortality/survival rate per 100,000.
23 | na | Mortality Counts / Mortality Relative Rate (Mortality Risk) | Mortality, but no methods are provided. Appears to be a Relative Mortality Rate - legend shows mortality as a % risk greater or lower than the average.
24 | age adjusted rate per 100,000 | Incidence Rate | Age adjusted incidence rate per 100,000
25 | standardised incidence ratio (SIR), relative excess risk (RER) | Incidence Ratio / Mortality Rate | Smoothed SIR (standardised incidence ratio) and Smoothed RER (Relative Excess Risk), Incidence & Mortality
26 | age adjusted rate per 100,000 | Incidence Rate / Mortality Rate / Survival Rates | Age standardised incidence rates per 100,000 children 0 - 14yrs. + Age standardised mortality rate (ASMR) + Age adjusted survival rates
27 | na | Mortality Rate | Age standardised mortality rate per 100,000 + RRSD (relative risk standard deviation)
28 | standardised (age) incidence ratios (SIR), counts | Incidence Ratio (Relative Risk) / Incidence Counts | SIR (Age standardised ratios); observed incidence (counts) are also provided
29 | age adjusted rate per 100,000 | Incidence Rate | Age standardised incidence rates per 100,000
30 | standardised incidence ratio (SIR), counts | Incidence Ratio / Mortality Ratio | SIR - Ratio of directly age-standardised rate in health authority to overall UK + Ireland average
31 | na | Mortality Rate | Mortality rates per 100,000 people (areas with sparse data were not reported)
32 | standardised incidence ratio (SIR) | Incidence Rate / Mortality Rate | SIR (standardised incidence ratio) & SMR (standardised mortality rate)
33 | age adjusted rate per 100,000 | Incidence Rate | Age Adjusted incidence rates per 100,000
Table B.5: Modelling methods for each map identified in grey literature review
key | co-variates | modelling methods | smoothing method
1 | socio-economic indicators discussed in methods | Indirect Age Standardised adjusted for age structure of each small area region [8] | BYM [9]
2 | nil | Mortality rate per 100,000. Not adjusted for age, only crude rates reported. | nil
3 | nil | Incidence: Sex and age specific incidence rates. Mortality: varied depending on country. Prevalence: Sex and cancer adjusted. | nil
4 | Risk factors, actions being taken to address cancer across the world | | nil
5 | nil | Age adjusted incidence and mortality rates per 100,000 | nil
6 | nil | no information provided | no information provided
7 | | age-adjusted incidence/death rates | nil
[8] for more details see - http://www.ncri.ie/atlas/232-spatial-analysis-and-smoothing
[9] Besag et al. (1991)
8 | Population, obesity, tobacco use, births, poverty, diabetes, Alzheimer's disease. But the relationship with cancer and these diseases is not explored. | projected, age adjusted cancer incidence rates per 100,000. | nil
9 | race, age, cancer stage, time period. | age adjusted rate per 100,000 | nil
10 | crowding, education, income, insurance, population, poverty, mobility, workforce. | age adjusted annual incidence rates (per 100,000). Data is suppressed if the counts are smaller than 16. | nil
11 | nil | Age standardised rates | nil
12 | mortality, incidence & survival | Age standardised incidence & mortality rates per 100,000 per country. | nil
13 | environmental facilities | indirectly adjusted for age and sex, assuming equal risk throughout the state of New York. | nil
14 | smoking, alcohol consumption, obesity, socio-economic status, rurality | age adjusted incidence rate per 100,000 | nil
15 | nil | no information provided on methods used. | nil
16 | nil | Age standardised incidence rates (per 100,000). | Unknown [10]
17 | Maybe (not in English) | Age standardised incidence rates (per 100,000). | nil
18 | language regions | Age standardized incidence and mortality rates. Mortality rates are based on cause of death database. | nil
19 | nil | age-adjusted Cancer Incidence Rates | nil
20 | nil | Age adjusted rate per 100,000 | nil
21 | nil | Age Adjusted rates - Bayesian Hierarchical model [11] | BYM [12]
22 | nil | Age standardised incidence/mortality rates (per 100,000). | nil
23 | nil | No methods provided | No methods provided
24 | Treatment rates (by treatment type), ethnicity, Urban/non Hospital service areas, deprivation, economic, rurality, education. + others | Age adjusted rate per 100,000. | nil
[10] not in English (German)
[11] http://onlinelibrary.wiley.com/doi/10.1002/env.571/abstract
[12] Besag et al. (1991), Fairley et al. (2008)
25 | Socioeconomic categories, rurality | Age adjusted standardised rates using Bayesian hierarchical modelling - with CAR prior for spatial smoothing | Incidence: BYM [13]. Survival: Poisson piecewise with BYM components [14]
26 | nil | Age standardised incidence rates per 100,000 children 0 - 14yrs. Age standardised mortality rate (ASMR). The Brenner method for survival rates, modelled after period life tables. | nil
27 | nil | Age standardised mortality rates. Average annual rates per 100,000 population, directly age standardised to world standard population (SICE, 1964) | regional variation: 1. Poisson-gamma model (1 unstructured random effect, no spatial structure) [15] 2. Multilevel model with 3 geographic hierarchies (no spatial structure) [16]
28 | nil | Age Standardised Incidence Ratio | no information provided
[13] Besag et al. (1991)
[14] Fairley et al. (2008)
[15] 1. Pennello et al. (1990)
[16] Similar to Langford et al (1990)
29 | nil | Age Standardised incidence rates per 100,000 | nil
30 | socioeconomic deprivation is mentioned but not analysed in this report, or shown in the map. | Ratio of directly age-standardised rate over UK and Ireland average. | nil - no information provided
31 | nil | Age-adjusted cancer mortality rates per 100,000 (binomial approximation of the age adjusted rates were calculated) [17] | nil
32 | nil | Standardised Incidence Ratio adjusted for age structure of region (areas with sparse data were not reported) | No geographical smoothing. Smoothed temporally, however no details provided
33 | Indian population (yes/no) | age standardised rate per 100,000 | nil
[17] further detail of the methods can be found here - http://ratecalc.cancer.gov/archivedatlas/pdfs/text.pdf
Table B.6: Uncertainty information within each map identified in grey literature review
key | uncert. included | uncert. measure | uncert. visualisation | notes
1 | no | na | na | na
2 | no | na | na | na
3 | no | na | na | na
4 | no | na | na | na
5 | no | na | na | na
6 | no | na | na |
7 | yes | CI [18] | CI bar, CI bounds data table | Confidence Interval upper and lower bounds reported in data table. CIs also appear in mouse tip function when rolling over a region.
8 | no | na | na | na
9 | yes | CI, interquartile range, variance | box plot, CI bar in tool tip, CI bounds in data table | box plot of overall data, Confidence Interval of each estimate visualised when hover over estimate on scatterplot. Upper & Lower bound of CI reported in data table, range of data shown on scatterplot
10 | no | na | na | na
11 | no | na | na | na
12 | no | na | na |
13 | no | na | na | na
14 | no | na | na | na
15 | no | na | na |
16 | no | na | na |
17 | yes | Confidence Interval | CI Bar | 95% CI shown in barchart of estimates.
[18] Credible/Confidence Interval
18 | yes | Credible Interval | error bar, CI bounds in data table | 95% CI shown on barchart of estimates. Numeric upper and lower bounds appear when the tool tip hovers over the barchart.
19 | yes | CI, interquartile range | CI bar | 95% Confidence Intervals shown on barchart. Interquartile ranges shown. Statistically significant difference to state average indicated. (no details on the measure of statistical significance)
20 | yes | CI | nil | 95% CI bounds reported in a table alongside the map. No uncertainty shown on map directly
21 | yes | Credible Interval | CI bar | 95% Credible intervals on line graph of region vs relative risk (ascending). Bayesian methods that incorporate uncertainty
22 | yes | CI, statistical significance | | 1. Shows confidence intervals of estimate vs region graph. 2. Shows a symbol for areas that are statistically significantly different from the national average (Standardised incidence rates)
23 | no | na | na |
24 | yes | indication of small sample size | no colour applied to areas of small sample size. | small counts = fewer than 10 (excluded for privacy reasons) or fewer than 25 (unstable areas).
25 | yes | Credible Interval | box plot, CI line | Credible Intervals shown on graph of relative risk vs region, box plots shown for socioeconomic and rurality
26 | yes | small sample size | shading | Shading indicates which census divisions have less than 30 cases
27 | yes | distribution, standard deviation, interquartile range | box plot, distribution | box plots, full distribution of incidence data shown, + RRSD (relative risk standard deviation)
28 | no | na | na |
29 | no | na | na | na
30 | no | error bars | nil | not in map, but error bars are present on supplementary graphs
31 | no | small sample size | nil | areas with sparse data not reported
32 | yes | statistical significance | nil | Reported if the estimated Standardised Incidence Ratio difference from normal was statistically significant at a p-value (0.05) in graph.
33 | yes | CI | CI bar | Confidence intervals shown on barchart
Table B.7: Technology platforms used to generate cancer maps identified in grey literature review
key | Technology platform | Programming language (where applicable)
1 | ESRI ArcMap 9.3 (Part of the ESRI ArcGIS Desktop suite) used to generate pdf | na
2 | Custom Built | jpeg + javascript
3 | Custom Built - D3.js | d3.js + javascript
4 | Custom Built | javascript + CSS + google maps API
5 | Custom Built | modest maps + javascript + mapbox
6 | Custom Built | could not determine
7 | InstantAtlas | na
8 | Custom Built - D3.js | d3.js + javascript + GIS data capabilities
9 | GeoVista | Outputs to a flash application
10 | ESRI ArcMap (Part of the ESRI ArcGIS Desktop suite) |
11 | Custom Built | Javascript + Googlemaps api
12 | Googlemaps based visualisation |
13 | Googlemaps based visualisation |
14 | InstantAtlas | na
15 | InstantAtlas | na
16 | InstantAtlas | na
17 | InstantAtlas | na
18 | InstantAtlas | na
19 | InstantAtlas | na
20 | Custom Built | javascript
21 | Custom Built - D3.js + leaflet | Professional (javascript + d3.js + leaflet + html)
22 | InstantAtlas | na
23 | pdf or infographic | na
24 | pdf | na
25 | pdf | na
26 | pdf | na
27 | pdf | na
28 | pdf | na
29 | pdf | na
30 | pdf | na
31 | pdf | na
32 | pdf | na
33 | InstantAtlas | na
Appendix C
Research Activity 1.B - User-centred design for uncertainty communication
C.1 Project Partners Workshop
C.1.1 Workshop Programme
Time Activity
10:00 - 10:20 Welcome & stakeholder introductions
10:20 - 10:40 Why is communicating uncertainty an important problem?
10:40 - 11:00 Morning tea
11:00 - 11:30 Who are the audiences of the Australian Cancer Atlas and what are their characteristics?
11:30 - 12:00 Can these audiences be grouped by the level of information detail they require?
12:00 - 1:00 Lunch
1:00 - 2:00 What will the Atlas report (output measure or measures)?
2:00 - 3:00 Diagnosing uncertainty sources in the Australian Cancer Atlas
3:00 - 3:45 Groups present uncertainty sources back to the total group
4:00 Close
This appendix summarises the outcomes of the Uncertainty Communication Design
Workshop that was held with the project partners of the Australian Cancer Atlas project.
Date: September 9, 2015, 1pm - 4pm
Attendees: Kerrie Mengersen, Fiona Harden, Peter Baade, Joanne Aitken, William Watson,
Tomasz Bednarz, Jessie Roberts
Topics discussed through this workshop:
1. Why is communicating uncertainty an important problem?
2. Who are the audiences of the Australian Cancer Atlas and what are their characteristics?
3. Can these audiences be grouped by the level of information detail they require?
4. What will the Atlas report (output measure or measures)?
5. What are the sources of uncertainty within the Atlas?
C.1.2 Discussion 1: Why is Communicating Uncertainty an Important Problem
Workshop participants were asked to consider why communicating uncertainty is an important problem in the context of: the Australian Cancer Atlas, geospatial health statistics or disease mapping, and within science communication generally. Results of this discussion are summarised below.
Science and Science Communication Generally
• Important in evaluating the quality of scientific research and the reliability of data
driven insights.
• Essential in comparing the accuracy of research methods and comparing between
similar studies. Highlighting areas of uncertainty guides future research priorities by
focusing on gaps in our current state of knowledge.
• Important in evaluating the effectiveness, reliability and performance of new
technology and methodologies.
• Supports valid interpretations of results and applications of insights to real world
settings.
• A lack of uncertainty information can degrade the general public’s trust in scientific
insights and degrade the reputation of science generally. When widely accepted ‘facts’
are questioned due to new scientific discoveries, the public does not know who to trust.
Hiding uncertainty hides the inaccuracies in all scientific discoveries and presents
the scientific process as more solid than it actually is.
• The clear communication of uncertainty would better inform the general audience
of the scientific process.
• Changes in uncertainty demonstrate whether our knowledge and methods are improving
over time (is our uncertainty reducing over time?).
Geospatial Data and Disease Mapping
• Creating a map of modeled disease occurrence or risk can present estimates as more
certain and accurate than they may actually be. This can be particularly misleading
and lead to suboptimal decision making.
• Data aggregation decisions can influence final model outputs. Uncertainty can be a
useful tool both in evaluating which decisions lead to the most accurate results and
in making these inaccuracies more transparent.
The Australian Cancer Atlas
• The small sample sizes present in some of the regions within the Cancer Atlas can
lead to uncertain estimates. While the model outputs may be our best estimate, it is
important that decision makers understand these uncertainties.
• Provides a guide to applying these insights to policy developments. Informs
decision makers regarding the accuracy and reliability of estimates.
• Uncertainty is important in applying the regional generalisations from the atlas to
individual situations.
• Uncertainties support the need for future research in cancer outcomes and can help
prioritize research projects.
• Inclusion of uncertainty enhances the research output of the Atlas: it tells the whole
story and clearly communicates our current state of knowledge about inequalities in
cancer incidence and survival in Australia.
• Provides examples of uncertainty communication methodologies for other Cancer
Councils.
C.1.3 Discussion 2: Who are the Audiences of the Australian Cancer Atlas and what are their characteristics?
Workshop participants identified the following eight target audiences as important to the
Australian Cancer Atlas.
1. General Audience/ General Public
2. Media
3. Government, lobby groups, health policy makers and advisors
4. Health managers
• Regional
• Local
5. Clinicians
6. Cancer patients and their carers, family or supporters
7. Researchers
8. Other Cancer Councils and Health Reporting organisations
Participants then defined the characteristics of each audience in terms of:
• What are the key messages we want to communicate to them through the Atlas?
• What decisions or questions is each audience trying to answer when exploring the
Atlas?
• What are the skill levels of each audience (in terms of formal statistical training and
analytical skills)?
• What potential risks are there when including uncertainty in the Atlas
(misinterpretations, disregarding information, etc.)?
• What potential benefits could each audience gain from including uncertainty in the
Atlas?
• What level of interest does each audience have in uncertainty?
General Audience
Key Messages:
• Highlight the regions with variation in cancer incidence and survival.
• Show any relationship between cancer risk/survival and socio-demographic or
rurality variables.
Knowledge & Skills:
• Formal Statistical training: low
• Analytical skills: Low
Decisions or Questions
• How does my region compare to other regions in Australia?
• What are the reasons for areas of low or high risk?
• Have I ever lived in an area of high risk?
Interest in uncertainty?
• Minimal - most probably not aware of the presence of uncertainty.
Risks of including uncertainty
• Key messages could be lost in information overload.
• Graphs may be too complex or difficult to interpret, so there is a risk the audience will disengage.
Benefits of including uncertainty
• May calm an over-reaction to high risk regions.
Media
Key Messages
• Simple, short graphs and infographics that are accurate and shareable.
• Where is there the greatest variation in cancer outcomes geographically? Are there
any reasons why these areas have greater variation?
“this is really important work”
“this is innovative work”
• Clearly explain the uncertainty in any high risk regions. Provide examples and words
they can use to embed the uncertainty into their media messaging.
Skills
• Formal Stats training: low
• Analytical skills: low to medium
• May have some specialist training
Decisions & Questions
• Looking for a hook
• Is this newsworthy?
• Where are the highest risks?
• What is the government doing about these inequalities in cancer outcomes?
• What resources are available for people most at risk or with the highest needs?
Interest in Uncertainty
• Averse to uncertainty.
• Uncertainty confuses the hook; this audience is looking for simple, clear news stories.
Risks of including uncertainty
• Misinterpretation or misrepresentation?
• May mistake uncertainty for poor-quality research?
Benefits of including uncertainty
• General promotion and education about uncertainty.
• May reduce anxiety in small regions with high risk incidence, e.g. “A high risk in
Mackay doesn’t mean that everyone in Mackay will get cancer.”
Government, lobby groups, health policy makers and advisors
Key Messages
• Are the current cancer treatment, screening and support services sufficient?
• Are there inequalities and if so, where?
• Are the government programs working? Has there been a change over time?
• How does their jurisdiction compare to others?
• What are the highest priority interventions and research for the future?
• So what? How best to translate these insights into policy.
• What are the most pressing inequalities in cancer outcomes and are there any
recommendations for addressing these?
Skills
• Formal Stats Training: low to medium
• Analytical skills: medium to high
• Other: mostly communication and decision making skills not statistical
Decisions and Questions
• What can we do to improve survival rates and reduce inequalities in cancer outcomes?
• Can we show our current or recent health services are creating change?
Interest in uncertainty
• Uncertainty can be confusing and can hinder or slow down decision making. It is often
viewed as a bad thing, and decision makers would generally want to see a definite
number.
• May not know how to apply the uncertainty in their current decision-making framework.
• Could be seen as a valuable tool if presented the right way or if they have had
sufficient training.
Risks of including uncertainty
• Information may be regarded as being of poorer quality if uncertainty is included, even
though including it is of greater long-term benefit because policy will be developed for
better future outcomes.
• May lead to decision paralysis.
Benefits of including uncertainty
• Better represents our current state of knowledge
• Uncertainty may help quantify how much money should be spent on a program
and when. It may be very valuable in designing milestones and clarification points for
a health program, and may mean that policy decisions embed flexibility when
the current state of knowledge contains uncertainty.
Other
• Need to ensure that the scientific evidence provided can inform the decision making
process.
• This audience will have many competing priorities.
• Likes to be able to show improvement over time.
Cancer Patients/Survivors and their family, carers and friends
Key Messages
• Insights about their community.
• May be interested in additional information about where they can access services,
support and information.
Decisions and Questions
• What are the benefits of my different treatment options and ancillary side effects?
• What is the best treatment available to me?
• What have other people with my cancer diagnosis, and/or in my region, done?
What services did they access? What treatment did they have?
• Is the chance of survival lower or higher in my region?
• What treatment options are available to me in my region?
• What resources are available in my region?
• How far do I have to travel for my treatment?
• What resources are available to me in my community?
• Is there a lower than average chance of survival in my region? If so, why? What can I
do about it?
• How does my community compare to other similar communities? (in the same peer
group)
• Looking for more accurate information to replace “Dr Google”
Skills and Knowledge
• Formal statistical training: overall low, but highly varied
• Analytical skills: overall low, but highly varied
• Will be looking for more accurate information to replace Dr Google.
Interest in uncertainty
• How likely is my treatment to be unsuccessful/successful?
• Could provide comfort for people living in high risk areas.
• May affect their life and their treatment choices.
• Uncertainty about life and treatment options will lead to high anxiety.
Risks of including uncertainty in the Atlas
• The uncertainty may create greater physical and emotional stress for the patient and
their family. Difficulty of the unknown and not having a clear right answer.
Benefits of including uncertainty in the Atlas
• May enable more informed decisions about how they manage their treatment.
• May provide comfort if they live in an area of high risk (for example, if their family
also live in the same region).
Researchers
Key Messages
• Here are the gaps in our knowledge.
• Here is the uncertainty in our outputs.
• The methods we used for developing these disease maps are accurate and robust.
• The methods we used to communicate the uncertainty are clear and accurate.
• Our methods of communicating/representing uncertainty have been successful and
are accessible to non-expert audiences and decision makers.
• Our research is awesome and our methods robust!
Decisions and Questions
• What is the quality/accuracy/uncertainty of the estimates?
• Are the inferences made from the data appropriate?
• What is the current state of knowledge in this area, current best practice?
• What are the gaps in current knowledge, and how can this research relate to my
research?
• Are these methods applicable to my area of research?
Skills & Knowledge
• Formal statistical training: high
• Analytical skills: high
What does uncertainty mean to this audience?
• Highlights the quality of the research.
• Highlights where future research should focus.
• Guides the application of the scientific insights to real world practice
Interest in uncertainty
• High
Risk of including uncertainty information
• Minimal
Risks of excluding uncertainty information
• Excluding uncertainty information can give a false representation of our current state
of knowledge. This could result in important research problems or knowledge gaps
being missed because our knowledge is presented as more certain than it is.
• Inaccuracies are missed and future research is misguided.
• Missed opportunities for research and for patient outcomes.
Benefits of including uncertainty information
• Clear spotlight on future research opportunities.
• Clear support for the need for the research for which they may be seeking funding.
Health Managers (Regional and Local)
Key Messages
• Where are the demands for services greatest?
• ‘These’ regions need to focus on these support services.
• Quantify what the needs of their region are.
• These are the services available in your region.
Decisions and Questions
• How do I budget and allocate resources to best meet the needs of residents in my
region?
• How does my region/jurisdiction compare to other regions in Australia?
Better/worse/same.
• What services are available in my region and what services should I be advocating
for?
• Are there any shortfalls in screening or support services in my regions?
• Do I need to budget any extra services to meet the needs of this group?
• Are these results what I expect? Better/worse/the same?
Skills
• Formal statistical training: medium
• Analytical skills: medium
Interest in uncertainty
• Low to medium
Risks of including uncertainty in communications
• Confusing or difficult to understand (time-poor audience)
Risks of excluding uncertainty information
• State of knowledge and information appear more accurate than they actually are.
• Recommendations and advice to patients could be represented as more solid than they
actually are.
• High or low risk in their region may be interpreted as more certain or accurate than
it actually is, leading to over- or under-prescription.
Benefits of including uncertainty in communications
• Helps ensure that health strategies and spending are meeting real needs
• Optimise cashflow (reduce the risk of spending money when the estimates/insights
are not reliable)
Clinicians
(Similar to health managers)
Key Messages
• Information on the needs of the region they work in.
• Types of services that are available, and that should be provided, to this patient group.
• Which regions have a higher than average risk of cancer incidence or lower survival.
• Which regions have higher needs or greater ‘disadvantage’ (due to rurality
or socio-demographic aspects).
Skills & Knowledge
• Formal statistical training: low
• Analytical skills: low
Decisions or Questions
• What services do I need to ensure are available in the region I work in?
• Do I need to promote a higher rate of screening in my region?
• Do I need to promote the services that are available in my region (e.g. support
for travel, or other support, or treatment options that might be impacted by travel
challenges)?
• Are residents in my region facing greater challenges due to socio-economic or
geographic boundaries?
Interest in Uncertainty
• Low to medium; time poor.
Risk of including uncertainty information
• May overwhelm a time-poor audience. They may give up on the Atlas because the
uncertainty makes it difficult to digest the information quickly.
Benefits of including uncertainty information
• Can clarify the Atlas outputs and ensure that under- or over-treatment is not
prescribed due to estimates appearing more accurate or certain than they actually
are.
C.1.4 Discussion 3: Can these audiences be grouped by the level of information detail they require?
Product | Details | Audience
Executive Summary | Short, clear statements of insights | Media
Map + results | Results and uncertainty information presented in a format accessible to a non-expert | Media, general audience, cancer patients & carers
Map + numbers + technical uncertainty | Includes technical estimates and measures of uncertainty | Cancer patients & carers, clinicians, health policy advisors, health managers
Technical report + data set | Contains details of methods, access to data, other statistical outputs | Researchers
C.1.5 Discussion 4: What will the Atlas report (output measure or measures)?
What measures could be used in the Atlas? What are the pros and cons of each, and which are applicable to which audiences?
The workshop participants briefly discussed the pros and cons of: Relative Excess Risk,
Incidence, Survival and Crude Probabilities.
Outcome: The discussion highlighted that further research was required into the different measures commonly used in cancer maps, as well as the need for agreed-upon definitions of the terms, as different participants understood the terms differently depending on their domain of expertise.
C.1.6 Discussion 5: Sources of uncertainty within the Australian Cancer Atlas?
Workshop participants were asked to focus only on scientific uncertainty and to consider that this could be present in the data, the methods, the model or the outputs.
The following table summarises the outcome of these discussions. In general, this process highlighted the need to develop an agreed understanding of what scientific uncertainty is and where it arises from, and a template to help diagnose (or lay out) uncertainty sources within the Australian Cancer Atlas.
Location | Uncertainty Source
Data | Estimated population of each region (ABS)
Data | Estimated demographic breakdown of each region (ABS)
Data | Socio-economic status is generalised across the entire region.
Data | Classification uncertainty around the cause of death.
Data | Classification uncertainty around indigenous identification
Data | Residential address does not contain any info of time at that residence or region.
Methodologies | Smoothing algorithm
Methodologies | Model prior distributions (may also be an input rather than a method)
Model Assumptions | Residential address does not contain any info of length of time at that residence.
Disagreements - model/methods | Spatial smoothing methods
Outputs - linguistic | Meaning of: probability, uncertainty, risk, cause, correlation, uncertainty random
C.2 NEUVis Audience Profiles
Table C.4: Q1: How does this new knowledge benefit the user?
Audience | General Atlas | Uncertainty information
1. General Audience | Incidence Benefit - more informed decision makers. - What issues are important in my area? - Does the information on the map resonate with a personal story? - Are actions being taken that need to be to address inequalities? Survival Benefit - being informed. - Inequalities of health care. | More informed, understanding of the cancer prevalence and survival of residents in their region.
2. Government, lobby groups, policy makers/advisors | - Tool for campaigning and lobbying to address the needs of regions. - Being informed of the needs of the region. | - A greater understanding of how to apply the information presented to influence decisions, i.e. regions with medium risk & high uncertainty vs medium risk and low uncertainty.
3. Health managers (local & regional) | - Help target health campaigns for local area needs. | Greater understanding - is their region a high priority for action to address inequalities and/or understanding inequalities & reducing uncertainties
4. Clinicians | Greater understanding of the health inequalities that may be present to their patients, and if additional resources are available to support them. |
5. Cancer patients (their carers or family) | Edification, showing they are not alone. - Does this info align with my personal story? Am I typical? |
6. Other Cancer Councils and Health Reporting organisations | |
7. Researchers | - Highlights areas of greatest uncertainties and therefore research priorities/opps. - Justifies extension of current research. - Validates findings of previous work. - Highlights research opportunities that expand knowledge, not only those that address inequalities. - Supports applications for $ to conduct research or collect data. - Tracking uncertainty over time can show how knowledge is improving. |
8. Media | - Newsworthy stories. - Being informed. - Topical. - Regional stories. - Add value to other regional, personal or other stories. - Advocate for new policy. - Educate regions/audiences that could benefit from greater support/knowledge. |
Table C.5: Q2: What about this data is relevant or important to the user in their context?
Audience | General Atlas | Uncertainty information
1. General Audience | - Incidence & Survival estimates in ‘my area’ - uncertainty - regional borders |
2. Government, lobby groups, policy makers/advisors | - What are the high risk regions. - How has that risk changed over time. | - What type of action is best (address inequalities or understand risk better)
3. Health managers (local & regional) | - SIR, RER, uncertainty info, regional boundaries, compare ‘my region’ with socio-economic & rurality like regions. |
4. Clinicians | - comparing regions to ‘like’ regions. - learning from others |
5. Cancer patients (their carers or family) | - SIR, - RER, - Risk of Death |
6. Other Cancer Councils and Health Reporting organisations | |
7. Researchers | - Uncertainty info, SIR, RER, Spatial variation, methods used to model estimates. |
8. Media | Cluster, Why?, What is being done to address this, areas of high risk, areas of low risk |
Table C.6: Q3: What does this data show that is otherwise inaccessible for this user?
Audience | General Atlas | Uncertainty information
1. General Audience | All info shown in the QLD atlas is currently inaccessible to the general audience. | currently unavailable
2. Government, lobby groups, policy makers/advisors | Same as Audience 1 +: uncertainty info, change over time | Same as Audience 1 +: info about why risk estimates have different levels of uncertainty for different regions, sample size info and change over time.
3. Health managers (local & regional) | Same as Audience 1 +: ability to compare 2 regions. | -
4. Clinicians | Same as Audience 3 |
5. Cancer patients (their carers or family) | Same as Audience 1 +: Personal stories of other cancer patients or their family/carers. | -
6. Other Cancer Councils and Health Reporting organisations | - | -
7. Researchers | The raw data, sample size information. | -
8. Media | As above(?) | -
Table C.7: Q4: What can this user access for themselves?
Audience | General Atlas | Uncertainty information
1. General Audience | The QLD Cancer Atlas | Nil
2. Government, lobby groups, policy makers/advisors | Same as Audience 1 +: historical census data | -
3. Health managers (local & regional) | Same as Audience 1 | -
4. Clinicians | Same as Audience 1 | -
5. Cancer patients (their carers or family) | Same as Audience 1 | -
6. Other Cancer Councils and Health Reporting organisations | Same as Audience 1 | -
7. Researchers | methods, uncertainty info. | CIs, standard deviation, distributions, statistical significance (p-values)
8. Media | Same as Audience 1 +: Interviews with researchers & others. | -
Table C.8: Q5: What myths/misconceptions are relevant to this data set?
Audience | General Atlas | Uncertainty information
1. General Audience | - High risk = I will get cancer. Environmental factors are causing high risk regions. All regions have the same level of accuracy, reliability & certainty. All cancers have the same level of accuracy, reliability and uncertainty. All cancers have the same prevalence (cannot compare relative risk between cancers). Correlation does not = cause. The knowledge/information presented is complete. All point estimates are created equal. | All regions have the same level of accuracy, reliability & certainty.
2. Government, lobby groups, policy makers/advisors | Spatial variation is due to environmental or geographical factors. |
3. Health managers (local & regional) | Two point estimates are the same. All region estimates are equally reliable. All city regions are better than rural regions for all cancers. |
4. Clinicians | As above |
5. Cancer patients (their carers or family) | Same as Audience 1 +: Correlation = cause. Statistics defines the individual. |
6. Other Cancer Councils and Health Reporting organisations | |
7. Researchers | na |
8. Media | - All point estimates are created equal. Correlation = cause. Regional variation is due to some environmental or geographical factors. |
Table C.9: Q6: Potential impact on Audience
Audience | General Atlas | Uncertainty information
1. General Audience | - High risk = I will get cancer. - Environmental factors are causing high risk regions. All regions have the same level of accuracy, reliability & certainty. All cancers have the same level of accuracy, reliability and uncertainty. All cancers have the same prevalence (cannot compare relative risk between cancers). Correlation does not = cause. The knowledge/information presented is complete. All point estimates are created equal. |
2. Government, lobby groups, policy makers/advisors | - | - Discredit or degrade info. Choice paralysis. Promote further research, and support need for investing in further research. Support better decisions. Show change in knowledge and reliability of information over time. Choice paralysis
3. Health managers (local & regional) | Adds to health managers' tacit knowledge of their patients. Better manage local needs. Motivate to engage in future research (provide data etc) | Uncert info could overwhelm.
4. Clinicians | - tacit knowledge similar to Audience 3. |
5. Cancer patients (their carers or family) | Depression, Anger, Denial, Advocacy, Survival, ask for additional support if in areas of lower survival. Provide additional support if in areas of lower survival. Access screening earlier. |
6. Other Cancer Councils and Health Reporting organisations | |
7. Researchers | Explore research opportunities - highlight research opportunities | -
8. Media | Overwhelmed by uncert info, mis-interpretation of info & inflation of myths. Advocacy for policy and action that addresses inequalities. | -
Table C.10: Q7: Potential for change
Audience | General Atlas | Uncertainty information
1. General Audience | |
2. Government, lobby groups, policy makers/advisors | - Action that addresses inequalities. Further research to understand inequalities. |
3. Health managers (local & regional) | |
4. Clinicians | |
5. Cancer patients (their carers or family) | |
6. Other Cancer Councils and Health Reporting organisations | |
7. Researchers | Generate greater knowledge and understanding of methods, and inequalities. |
8. Media | |
Appendix D
Appendix: Online Game
D.1 Untransformed Performance Data
## Warning in logit(performance): proportions remapped to (0.025, 0.975)
[Figure omitted: axes are "Performance ratio (non-transformed)" and "Proportion of Games"]
Figure D.1: Performance ratio
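The warning above has the form issued by `car::logit` when the proportions include exact 0s or 1s, in which case the values are squeezed into (0.025, 0.975) before the logit is taken. Assuming that is the function behind `perf_logit_transform` (the source does not say), the following base-R sketch mirrors that remapping and its inverse. The names `logit_remap` and `inv_logit_remap`, and the example ratios, are illustrative and do not come from the thesis code.

```r
# Sketch of a logit transform with remapping, mirroring the behaviour
# described by the warning "proportions remapped to (0.025, 0.975)".
# Assumption: the remapping is linear about 0.5, as in car::logit.
logit_remap <- function(p, adjust = 0.025) {
  stopifnot(all(p >= 0 & p <= 1))
  scale <- 1 - 2 * adjust                # 0.95 when adjust = 0.025
  p_adj <- 0.5 + scale * (p - 0.5)       # maps [0, 1] onto [adjust, 1 - adjust]
  log(p_adj / (1 - p_adj))
}

# Inverse: from the logit scale back to the original performance ratio.
inv_logit_remap <- function(x, adjust = 0.025) {
  scale <- 1 - 2 * adjust
  0.5 + (plogis(x) - 0.5) / scale
}

pr   <- c(0.80, 0.90, 0.95, 1.00)        # hypothetical performance ratios
z    <- logit_remap(pr)                  # finite even for pr == 1
back <- inv_logit_remap(z)               # recovers pr up to rounding error
```

Under the same assumption, a fitted value on the logit scale could be passed to `inv_logit_remap` to express it as a performance ratio.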
D.2 Logit(PR) by game mode: LME output and diagnostic plots
## Linear mixed-effects model fit by REML
## Data: game_perf
## AIC BIC logLik
## 1344.415 1371.503 -666.2073
##
## Random effects:
## Formula: ~1 | session
## (Intercept) Residual
## StdDev: 0.2238952 0.6161685
##
## Fixed effects: perf_logit_transform ~ game_mode
## Value Std.Error DF t-value p-value
## (Intercept) 2.6674917 0.05775239 577 46.18842 0.0000
## game_mode1 0.1683792 0.06894343 577 2.44228 0.0149
## game_mode2 0.0989419 0.06852640 577 1.44385 0.1493
## game_mode3 0.2000226 0.07026788 577 2.84657 0.0046
## Correlation:
## (Intr) gm_md1 gm_md2
## game_mode1 -0.622
## game_mode2 -0.628 0.519
## game_mode3 -0.614 0.510 0.519
##
## Standardized Within-Group Residuals:
## Min Q1 Med Q3 Max
## -3.00097516 -0.60233524 0.09753618 0.74692413 1.89738498
##
## Number of Observations: 679
## Number of Groups: 99
## Approximate 95% confidence intervals
##
## Fixed effects:
## lower est. upper
## (Intercept) 2.55406117 2.66749171 2.7809223
## game_mode1 0.03296851 0.16837920 0.3037899
## game_mode2 -0.03564971 0.09894188 0.2335335
## game_mode3 0.06201063 0.20002263 0.3380346
## attr(,"label")
## [1] "Fixed effects:"
##
## Random Effects:
## Level: session
## lower est. upper
## sd((Intercept)) 0.1620591 0.2238952 0.3093259
##
## Within-group standard error:
## lower est. upper
## 0.5825994 0.6161685 0.6516718
## numDF denDF F-value p-value
## (Intercept) 1 577 5412.032 <.0001
## game_mode 3 577 3.215 0.0225
##
## Simultaneous Tests for General Linear Hypotheses
##
## Multiple Comparisons of Means: Tukey Contrasts
##
##
## Fit: lme.formula(fixed = perf_logit_transform ~ game_mode, data = game_perf,
## random = ~1 | session)
##
## Linear Hypotheses:
## Estimate Std. Error z value Pr(>|z|)
## 1 - 0 == 0 0.16838 0.06894 2.442 0.0438 *
## 2 - 0 == 0 0.09894 0.06853 1.444 0.2232
## 3 - 0 == 0 0.20002 0.07027 2.847 0.0265 *
## 2 - 1 == 0 -0.06944 0.06743 -1.030 0.3638
## 3 - 1 == 0 0.03164 0.06893 0.459 0.6462
## 3 - 2 == 0 0.10108 0.06811 1.484 0.2232
## ---
## Signif. codes: 0 ’***’ 0.001 ’**’ 0.01 ’*’ 0.05 ’.’ 0.1 ’ ’ 1
## (Adjusted p values reported -- BH method)
[Figure omitted: normal Q-Q plot of standardized residuals, and standardized residuals against fitted values]
Figure D.2: Diagnostic Plots of Linear Mixed Effects Model (Logit(PR) by Game Mode)
## Levene’s Test for Homogeneity of Variance (center = median)
## Df F value Pr(>F)
## group 3 2.3015 0.07601 .
## 675
## ---
## Signif. codes: 0 ’***’ 0.001 ’**’ 0.01 ’*’ 0.05 ’.’ 0.1 ’ ’ 1
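Two of the derived quantities in the D.2 output can be reproduced from the printed estimates, which may help when reading the tables. The sketch below uses base R only. It re-enters the rounded values from the output by hand (so results agree only to about three decimal places), takes the z values to be two-sided normal tests as the `Pr(>|z|)` column suggests, and applies the Benjamini-Hochberg adjustment with `p.adjust`. The variable names are illustrative.

```r
# Fixed-effect estimates, standard errors and degrees of freedom, copied
# from the summary table of the game-mode model.
est <- c(mode1 = 0.1683792, mode2 = 0.0989419, mode3 = 0.2000226)
se  <- c(mode1 = 0.06894343, mode2 = 0.06852640, mode3 = 0.07026788)
df  <- 577

# Approximate 95% intervals: estimate +/- t quantile * standard error.
t_crit <- qt(0.975, df)
ci <- cbind(lower = est - t_crit * se, est = est, upper = est + t_crit * se)

# z values of the six pairwise contrasts, copied from the Tukey table.
z <- c("1-0" = 2.442, "2-0" = 1.444, "3-0" = 2.847,
       "2-1" = -1.030, "3-1" = 0.459, "3-2" = 1.484)
p_raw <- 2 * pnorm(-abs(z))              # two-sided p-values
p_bh  <- p.adjust(p_raw, method = "BH")  # Benjamini-Hochberg adjustment

print(round(ci, 4))
print(round(p_bh, 4))
```

The adjusted values from this sketch should agree with the `Pr(>|z|)` column to within rounding, which is consistent with the note that BH-adjusted p-values are reported.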
D.3 logit(PR) by risk profile - LME model output and diagnostic plots
## numDF denDF F-value p-value
## (Intercept) 1 577 5553.198 <.0001
## game_risk 3 577 4.053 0.0072
##
## Simultaneous Tests for General Linear Hypotheses
##
## Multiple Comparisons of Means: Tukey Contrasts
##
##
## Fit: lme.formula(fixed = perf_logit_transform ~ game_risk, data = game_perf,
## random = ~1 | session)
##
## Linear Hypotheses:
## Estimate Std. Error z value Pr(>|z|)
## lhh - hhh == 0 0.029867 0.082866 0.360 0.88413
## llh - hhh == 0 0.027574 0.082035 0.336 0.88413
## lll - hhh == 0 0.256790 0.094996 2.703 0.01374 *
## llh - lhh == 0 -0.002293 0.056632 -0.040 0.96771
## lll - lhh == 0 0.226923 0.073763 3.076 0.00629 **
## lll - llh == 0 0.229216 0.072826 3.147 0.00629 **
## ---
## Signif. codes: 0 ’***’ 0.001 ’**’ 0.01 ’*’ 0.05 ’.’ 0.1 ’ ’ 1
## (Adjusted p values reported -- BH method)
[Figure omitted: normal Q-Q plot of standardized residuals, and standardized residuals against fitted values]
Figure D.3: Diagnostic Plots of Linear Mixed Effects Model (Risk by Game Mode)
## Levene’s Test for Homogeneity of Variance (center = median)
## Df F value Pr(>F)
## group 3 11.827 1.475e-07 ***
## 675
## ---
## Signif. codes: 0 ’***’ 0.001 ’**’ 0.01 ’*’ 0.05 ’.’ 0.1 ’ ’ 1
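Levene's test with `center = median` (the Brown-Forsythe variant) amounts to a one-way ANOVA on the absolute deviations of each observation from its group median. The sketch below illustrates this in base R on simulated data, since the game data are not reproduced here. The data frame `toy`, its columns and the chosen standard deviations are hypothetical; only the group labels echo the risk profiles in the output above.

```r
set.seed(1)
# Hypothetical data: four groups with unequal spread.
toy <- data.frame(
  group = factor(rep(c("hhh", "lhh", "llh", "lll"), each = 40)),
  y     = c(rnorm(40, sd = 0.4), rnorm(40, sd = 0.6),
            rnorm(40, sd = 0.6), rnorm(40, sd = 0.9))
)

# Absolute deviation of each observation from its own group median.
toy$abs_dev <- abs(toy$y - ave(toy$y, toy$group, FUN = median))

# One-way ANOVA on the absolute deviations gives the test statistic.
levene_tab <- anova(lm(abs_dev ~ group, data = toy))
print(levene_tab)
```

In the thesis output, the degrees of freedom (3 and 675) correspond to four risk profiles and 679 games, and the small p-value suggests that the variance of logit(PR) differs between risk profiles.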
# Since power is calculated through simulation,
# this code will not run when this file is rendered.
# The output of the following code is included as an image below.
# Game mode
#library(lme4)
#lme_per_lme4 <- lmer(perf_logit_transform ~ game_mode + (1|session), data=game_perf)
#summary(lme_per_lme4)
#power_game_mode <- simr::powerSim(lme_per_lme4)
#power_game_mode
# Risk profile
#lme_per_risk_lme4 <- lmer(perf_logit_transform ~ game_risk + (1|session), data=game_perf)
#summary(lme_per_risk_lme4)
#power_risk <- simr::powerSim(lme_per_risk_lme4)
#power_risk
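The commented code above estimates power by simulation with `simr::powerSim`. The general idea can be illustrated without that package: simulate many data sets under an assumed effect, refit the model each time, and record how often the test rejects. The sketch below is a simplified stand-in and not the analysis used in the thesis. It uses an ordinary linear model with no random session effect, and the group means, residual standard deviation, group size and number of simulations are all assumptions, chosen loosely to resemble the D.2 output.

```r
set.seed(42)

# Assumed design: 4 game modes, roughly 679 games in total.
n_per_mode <- 170
mode_means <- c(2.67, 2.84, 2.77, 2.87)  # assumed means on the logit scale
resid_sd   <- 0.62                       # assumed residual standard deviation
n_sims     <- 500
alpha      <- 0.05

simulate_p_value <- function() {
  game_mode <- factor(rep(0:3, each = n_per_mode))
  y <- rnorm(4 * n_per_mode,
             mean = mode_means[as.integer(game_mode)],
             sd = resid_sd)
  # p-value of the overall F test for game_mode
  anova(lm(y ~ game_mode))[["Pr(>F)"]][1]
}

p_values <- replicate(n_sims, simulate_p_value())
power_estimate <- mean(p_values < alpha)  # proportion of simulations that reject
power_estimate
```

Because this version ignores the session-level random intercept, its estimate should not be read as the power of the mixed model; it only shows the simulate, refit and count loop that simulation-based power analysis relies on.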
Appendix E
Ethics Approvals
E.1 Focus Groups Recruitment Flyer and Consent Form
Figure E.1: Focus Group Recruitment Flyer
Figure E.2: Focus Group Consent Form
Figure E.3: Focus Group Consent Form
Figure E.4: Focus Group Consent Form
E.2 Online Game
Figure E.5: Online Game Recruitment flyer
Figure E.6: Online Game Consent Form
Figure E.7: Online Game Consent Form
206 Bibliography
Aerts, JC, KC Clarke, and AD Keuper (2003). Testing popular visualization techniques for
representing model uncertainty. Cartography and Geographic Information Science 30(3),
249–261.
Ali, M, M Emch, C Ashley, and PK Streatfield (2001). Implementation of a medical ge-
ographic information system: concepts and uses. Journal of Health, Population and
Nutrition, 100–110.
Alpert, M and H Raiffa (1982). A progress report on the training of probability assessors.
Atkinson, PM and A Graham (2006). Issues of scale and uncertainty in the global remote
sensing of disease. Advances in parasitology 62, 79–118.
Authority, GBRMP (2010). Water Quality Guidelines for the Great Barrier Reef Marine
Park.
Beale, L, JJ Abellan, S Hodgson, and L Jarup (2008). Methodologic issues and approaches
to spatial epidemiology. Environmental health perspectives 116(8), 1105–10.
Bedford, T and R Cooke (n.d.). Probabilistic Risk Analysis: Foundations and Methods ().
Begg, SH and M Welsh (2014). Uncertainty vs . Variability : What ’ s the Difference and
Why is it Important ? (May), 19–20.
Belia, S, F Fidler, J Williams, and G Cumming (2005). Researchers misunderstand confi-
dence intervals and standard error bars. Psychological methods 10(4), 389.
Bellu, LG and P Liberati (2006). Inequality Analysis The Gini Index.
Benjamini, Y and Y Hochberg (1995). Controlling the false discovery rate: a practical and
powerful approach to multiple testing. Journal of the royal statistical society. Series B
(Methodological), 289–300.
Bertin, J (1981). Graphics and graphic information processing. Walter de Gruyter.
207 BIBLIOGRAPHY
Besag, J, J York, and A Mollie (1991). Bayesian image restoration, with two applications in
spatial statistics. Annals of the Institute of Statistical Mathematics 43(1).
Blenkinsop, S, P Fisher, L Bastin, and J Wood (2000). Evaluating the perception of uncer-
tainty in alternative visualization strategies. Cartographica: The International Journal for
Geographic Information and Geovisualization 37(1), 1–14.
Bobkoff, D (2015). Seed by seed, acre by acre, big data is taking over the farm. Business
Insider 15.
Box, GE (1979). “Robustness in the strategy of scientific model building”. In: Robustness in
statistics. Elsevier, pp.201–236.
Brodlie, K, RA Osorio, and A Lopes (2012). Expanding the Frontiers of Visual Analytics
and Visualization. Ed. by J Dill, R Earnshaw, D Kasik, J Vince, and PC Wong, 81–109.
Brown, MB and AB Forsythe (1974). Robust tests for the equality of variances. Journal of
the American Statistical Association 69(346), 364–367.
Brown, R (2004). Animated visual vibrations as an uncertainty visualisation technique.
In: Proceedings of the 2nd international conference on Computer graphics and interactive
techniques in Austalasia and Southe East Asia - GRAPHITE ’04. Vol. 1. 212. ACM Press,
pp.84–89.
Brugnach, M, A Dewulf, C Pahl-Wostl, and T Taillieu (2008). Toward a relational concept
of uncertainty: about knowing too little, knowing too differently, and accepting not to
know. Ecology and society 13(2).
Burgman, M (2005). Risks and decisions for conservation and environmental management.
Cambridge University Press.
Buttenfield, BP (1993). Representing data quality. Cartographica: The International Journal
for Geographic Information and Geovisualization 30(2-3), 1–7.
Buttenfield, BP and MK Beard (1991). This paper represents part of Research Initiative# 7,"
Visualizing the Quality of Spatial Information", of the National Center for Geographic
Information and Analysis, supported by flagrant from the National Science Foundation
(SES-88-10917); support by NSF is gratefully acknowledged.
Carroll, LN, AP Au, LT Detwiler, T chieh Fu, IS Painter, and NF Abernethy (2014). Visual-
ization and analytics tools for infectious disease epidemiology: A systematic review.
Journal of Biomedical Informatics 51, 287–298.
208 BIBLIOGRAPHY
Choi, M, B Afzal, and B Sattler (2006). Geographic information systems: a new tool for
environmental health assessments. Public Health Nursing 23(5), 381–391.
Cleveland, WS and R McGill (1984). Graphical perception: Theory, experimentation, and
application to the development of graphical methods. Journal of the American statistical
association 79(387), 531–554.
Cliburn, DC, JJ Feddema, JR Miller, and TA Slocum (2002). Design and evaluation of a
decision support system in a water balance application. Computers & Graphics 26(6),
931–949.
Cohen, MS, JT Freeman, and S Wolf (1996). Metarecognition in time-stressed decision
making: Recognizing, critiquing, and correcting. Human Factors 38(2), 206–219.
Conover, WJ (1980). Practical nonparametric statistics.
Conover, WJ and RL Iman (1979). On multiple-comparisons procedures. Los Alamos Sci.
Lab. Tech. Rep. LA-7677-MS, 1–14.
Couclelis, H (2003). The certainty of uncertainty: GIS and the limits of geographic knowl-
edge. Transactions in GIS 7(2), 165–175.
Cumming, G, F Fidler, and DL Vaux (2007). Error bars in experimental biology. Journal of
Cell Biology 177(1), 7–11.
Daniel, WW (1990). Kruskal–Wallis one-way analysis of variance by ranks. Applied non-
parametric statistics, 226–234.
Davis, TJ and CP Keller (1997). Modelling and visualizing multiple spatial uncertainties.
Computers & Geosciences 23(4), 397–408.
Deitrick, S and R Edsall (2008). Making uncertainty usable: Approaches for visualizing
uncertainty information. Geographic Visualization: Concepts, Tools and Applications, 277–
291.
Deterding, S, D Dixon, R Khaled, and L Nacke (2011). From game design elements to
gamefulness. In: Proceedings of the 15th International Academic MindTrek Conference on
Envisioning Future Media Environments - MindTrek ’11. New York, New York, USA: ACM
Press, pp.9–11. eprint: 11/09 (ACM 978-1-4503-0816-8).
DiBiase, D, AM MacEachren, JB Krygier, and C Reeves (1992). Animation and the role of
map design in scientific visualization. Cartography and geographic information systems
19(4), 201–214.
Du, N, DV Budescu, MK Shelly, and TC Omer (2011). The appeal of vague financial
forecasts. Organizational Behavior and Human Decision Processes 114(2), 179–189.
Dunn, OJ (1964). Multiple comparisons using rank sums. Technometrics 6(3), 241–252.
Eaton, C, C Plaisant, and T Drizd (2005). Visualizing missing data: classification and em-
pirical study. In: IFIP International Conference on Human-Computer Interaction: September
12-16 2005; Rome, Italy. Springer, pp.861–872.
Eberhardt, L (1987). Population projections from simple models. Journal of Applied Ecology,
103–118.
Eiser, JR, A Bostrom, I Burton, DM Johnston, J McClure, D Paton, J Van Der Pligt, and MP
White (2012). Risk interpretation and action: A conceptual framework for responses to
natural hazards. International Journal of Disaster Risk Reduction 1, 5–16.
Elliott, P and D Wartenberg (2004). Spatial epidemiology: current approaches and future
challenges. Environmental health perspectives 112(9), 998.
Engler, R, A Guisan, and L Rechsteiner (2004). An improved approach for predicting the
distribution of rare and endangered species from occurrence and pseudo-absence data.
Journal of applied ecology 41(2), 263–274.
Evans, BJ (1997). Dynamic display of spatial data-reliability: Does it benefit the map user?
Computers & Geosciences 23(4), 409–422.
Fairley, L, D Forman, R West, and S Manda (2008). Spatial variation in prostate cancer
survival in the Northern and Yorkshire region of England using Bayesian relative
survival smoothing. British journal of cancer 99(11), 1786.
Farrell, S (2017). Group Notetaking for User Research.
https://www.nngroup.com/articles/group-notetaking/ (visited on 07/17/2017).
Fauerbach, E, R Edsall, D Barnes, and A MacEachren (1996). Visualization of uncertainty in
meteorological forecast models. In: Proceedings of the International Symposium on Spatial
Data Handling, Delft, The Netherlands, pp.465–76.
Fischhoff, B (2012). Communicating uncertainty fulfilling the duty to inform. Issues in
Science and Technology 28(4), 63–70.
Fisher, P (1994). Animation and sound for the visualization of uncertain spatial information.
Wiley, Chichester, UK.
Frewer, L and B Salter (2002). Public attitudes, scientific advice and the politics of regula-
tory policy: the case of BSE. Science and public policy 29(2), 137–145.
Funtowicz, SO and JR Ravetz (1990). Uncertainty and quality in science for policy. Vol. 15.
Springer Science & Business Media.
Gannett, H (1903). Statistical atlas. United States Census Office.
Garnelo, L, LC Brandão, and A Levino (2005). Dimensions and potentialities of the geo-
graphic information system on indigenous health. Revista de saude publica 39(4), 634–
640.
Gavankar, S, S Anderson, and AA Keller (2015). Critical Components of Uncertainty Com-
munication in Life Cycle Assessments of Emerging Technologies. Journal of Industrial
Ecology 19(3), 468–479.
Gerharz, LE and EJ Pebesma (2009). Usability of interactive and non-interactive visualisa-
tion of uncertain geospatial information. Geoinformatik 4, 223–230.
Gesteland, PH, Y Livnat, N Galli, MH Samore, and AV Gundlapalli (2012). The EpiCanvas
infectious disease weather map: an interactive visual exploration of temporal and
spatial correlations. Journal of the American Medical Informatics Association 19(6), 954–
959.
Gigerenzer, G and U Hoffrage (1995). How to improve Bayesian reasoning without in-
struction: frequency formats. Psychological review 102(4), 684.
Giles, J (2002). When doubt is a sure thing. Nature 418(6897), 476–478.
Gillund, F, KA Kjølberg, MK von Krauss, and AI Myhr (2008). Do uncertainty analyses
reveal uncertainties? Using the introduction of DNA vaccines to aquaculture as a case.
Science of the total environment 407(1), 185–196.
Gini, C (1912). Variabilità e Mutuabilità. Contributo allo Studio delle Distribuzioni e delle
Relazioni Statistiche.
Gough, P, T Bednarz, C de Bérigny, and J Roberts (2016). A process for non-expert user
visualization design. In: Proceedings of the 28th Australian Conference on Computer-Human
Interaction. ACM, pp.247–251.
Gough, P, CDB Wall, and T Bednarz (2014). Affective and effective visualisation: commu-
nicating science to non-expert users. In: Visualization Symposium (PacificVis), 2014 IEEE
Pacific. IEEE, pp.335–339.
Green, LW, L Richard, and L Potvin (1996). Ecological foundations of health promotion.
American Journal of Health Promotion 10(4), 270–281.
Gregorio, JD and JW Lee (2002). Education and income inequality: new evidence from
cross-country data. Review of income and wealth 48(3), 395–416.
Grubler, A, Y Ermoliev, and A Kryazhimskiy (2015). Coping with uncertainties-examples
of modeling approaches at IIASA. Technological Forecasting and Social Change 98, 213–
222.
Gurobi Optimization, I (2016). Gurobi Optimizer Reference Manual. http://www.gurobi.com.
Han, PKJ, WMP Klein, T Lehman, B Killam, H Massett, and AN Freedman (2011).
Communication of Uncertainty Regarding Individualized Cancer Risk Estimates: Effects
and Influential Factors. Med Decis Making 31, 354–366.
Harbage, B and AG Dean (1999). Distribution of epi info software: an evaluation using the
Internet. American journal of preventive medicine 16(4), 314–317.
Harrower, M and NP Street (2003). Representing Uncertainty: Does it Help People Make
Better Decisions? (1).
Hertwig, R and G Gigerenzer (1999). The ‘conjunction fallacy’ revisited: How intelligent
inferences look like reasoning errors. Journal of behavioral decision making 12(4), 275.
Hope, S and G Hunter (2007). Testing the effects of positional uncertainty on spatial
decision-making. International Journal of Geographical Information Science 21(6), 645–665.
Hora, SC (1996). Aleatory and epistemic uncertainty in probability elicitation with an
example from hazardous waste management. Reliability Engineering & System Safety
54(2-3), 217–223.
Hu, PJH, D Zeng, H Chen, C Larson, W Chang, C Tseng, and J Ma (2007). System for
infectious disease information sharing and analysis: design and evaluation. IEEE
Transactions on information technology in biomedicine 11(4), 483–492.
Hughes, C (1989). The representation of uncertainty in medical expert systems. Medical
Informatics 14(4), 269–279.
Hunter, GJ and MF Goodchild (1996). Communicating uncertainty in spatial databases.
Transactions in GIS 1(1), 13–24.
Johnson, C (2004). Top scientific visualization research problems. IEEE Computer Graphics
and Applications 24(4), 13–17.
Johnson, CR and AR Sanderson (2003). A next step: Visualizing errors and uncertainty.
IEEE Computer Graphics and Applications 23(5), 6–10.
Jonassen, DH (2012). Designing for decision making. Educational technology research and
development 60(2), 341–359.
Joslyn, SL and JE LeClerc (2011). Uncertainty forecasts improve weather-related decisions
and attenuate the effects of forecast error. Journal of experimental psychology: applied
18(1), 126.
Joslyn, SL and RM Nichols (2009). Probability or frequency? Expressing forecast uncer-
tainty in public weather forecasts. Meteorological Applications 16(3), 309–314.
Joslyn, S and S Savelli (2010). Communicating forecast uncertainty: Public perception of
weather forecast uncertainty. Meteorological Applications 17(2), 180–195.
Kahneman, D and A Tversky (1982). Variants of uncertainty. Cognition 11(2), 143–157.
Karlsson, D, J Ekberg, A Spreco, H Eriksson, and T Timpka (2013). Visualization of infec-
tious disease outbreaks in routine practice. In: MedInfo, pp.697–701.
Kinkeldey, C, AM MacEachren, and J Schiewe (2014). How to assess visual communication
of uncertainty? A systematic review of geospatial uncertainty visualisation user studies.
The Cartographic Journal 51(4), 372–386.
Kitzinger, J (1995). Qualitative Research: Introducing focus groups. BMJ 311(7000), 299–
302.
Knol, AB, AC Petersen, JP Van der Sluijs, and E Lebret (2009). Dealing with uncertainties
in environmental burden of disease assessment. Environmental Health 8(1), 21.
Koenig, A, E Samarasundera, and T Cheng (2011). Interactive map communication: Pilot
study of the visual perceptions and preferences of public health practitioners. Public
Health 125(8), 554–560.
Kothari, A, SM Driedger, J Bickford, J Morrison, M Sawada, ID Graham, and E Crighton
(2008). Mapping as a knowledge translation tool for Ontario Early Years Centres: views
from data analysts and managers. Implementation Science 3(1), 4.
Kraemer, MU, SI Hay, DM Pigott, DL Smith, GW Wint, and N Golding (2016). Progress
and challenges in infectious disease cartography. Trends in parasitology 32(1), 19–29.
Kruskal, WH and WA Wallis (1952). Use of Ranks in One-Criterion Variance Analysis.
Journal of the American Statistical Association 47(260), 583–621. eprint:
https://www.tandfonline.com/doi/pdf/10.1080/01621459.1952.10483441.
Krygier, JB (1994). “Sound and geographic visualization”. In: Modern cartography series.
Vol. 2. Elsevier, pp.149–166.
Kujala, H, MA Burgman, and A Moilanen (2013). Treatment of uncertainty in conservation
under climate change. Conservation Letters 6(2), 73–85.
Lai, PC, FM So, and KW Chan (2008). Spatial epidemiological approaches in disease mapping
and analysis. CRC press.
Landesberger, T von, S Bremm, and M Wunderlich (2017). Typology of Uncertainty in
Static Geolocated Graphs for Visualization. IEEE computer graphics and applications
37(5), 18–27.
Langford, M, D Unwin, and D Maguire (1990). Generating improved population density
maps in an integrated GIS. In: EGIS’90: Proceedings of the First European Conference
on Geographical Information Systems, EGIS Foundation, Utrecht, The Netherlands. Vol. 2,
pp.651–660.
Lapinski, ALS (2009). A strategy for uncertainty visualization design. Tech. rep. Defence
Research and Development Atlantic Dartmouth (Canada).
Lazo, JK, RE Morss, and JL Demuth (2009). 300 billion served: Sources, perceptions, uses,
and values of weather forecasts. Bulletin of the American Meteorological Society 90(6),
785–798.
Leitner, M and BP Buttenfield (2000). Guidelines for the display of attribute certainty.
Cartography and Geographic Information Science 27(1), 3–14.
Leitner, M and BP Buttenfield (2013). Guidelines for the Display of Attribute Certainty.
Cartography and Geographic Information Science.
Lempert, RJ, SW Popper, and SC Bankes (2003). Shaping the next one hundred years: new
methods for quantitative, long-term policy analysis. Rand, Santa Monica, CA.
Lipshitz, R and O Strauss (1997). Coping with Uncertainty: A Naturalistic Decision-Making
Analysis. Organizational Behavior and Human Decision Processes 69(2), 149–163.
MacEachren, AM (1992). Visualizing uncertain information. Cartographic Perspectives (13),
10–19.
Manski, CF (2014). Communicating uncertainty in official economic statistics. Tech. rep. Na-
tional Bureau of Economic Research.
Manski, CF (2015). Communicating uncertainty in official economic statistics: an appraisal
fifty years after Morgenstern. Journal of Economic Literature 53(3), 631–53.
McGranaghan, M (1993). A cartographic view of spatial data quality. Cartographica: The
International Journal for Geographic Information and Geovisualization 30(2-3), 8–19.
Miles, S and LJ Frewer (2003). Public perception of scientific uncertainty in relation to food
hazards. Journal of risk research 6(3), 267–283.
Mistry, PK and JS Trueblood (2017). An Investigation of Factors that Influence Resource
Allocation Decisions.
Morgan, MG and M Henrion (1990). Uncertainty: a Guide to dealing with uncertainty in
quantitative risk and policy analysis. Cambridge University Press, New York, New York,
USA.
Morgenstern, O et al. (1963). On the accuracy of economic observations. Princeton University
Press.
Morss, RE, JL Demuth, and JK Lazo (2008). Communicating uncertainty in weather
forecasts: A survey of the US public. Weather and forecasting 23(5), 974–991.
Mullner, RM, K Chung, KG Croke, and EK Mensah (2004). Introduction: geographic informa-
tion systems in public health and medicine.
Munzner, T (2009). Visualization.
Munzner, T (2014). Visualization analysis and design. CRC press.
Myers, MF, D Rogers, J Cox, A Flahault, and S Hay (2000). “Forecasting disease risk for
increased epidemic preparedness in public health”. In: Advances in Parasitology. Vol. 47.
Elsevier, pp.309–330.
Newman, TS and W Lee (2004). On visualizing uncertainty in volumetric data: techniques
and their evaluation. Journal of Visual Languages & Computing 15(6), 463–491.
Nusrat, S and S Kobourov (2016). The state of the art in cartograms. In: Computer Graphics
Forum. Vol. 35. 3. Wiley Online Library, pp.619–642.
Nykiforuk, CI and LM Flaman (2011). Geographic information systems (GIS) for health
promotion and public health: a review. Health promotion practice 12(1), 63–73.
O’Hagan, A (2012). Probabilistic uncertainty specification: Overview, elaboration tech-
niques and their application to a mechanistic model of carbon flux. Environmental
Modelling & Software 36, 35–48.
Olsen, SF, M Martuzzi, and P Elliott (1996). Cluster analysis and disease mapping–why,
when, and how? A step by step guide. BMJ: British Medical Journal 313(7061), 863.
Ord, JK (2010). “Spatial Autocorrelation: A Statistician’s Reflections”. In: Perspectives on
Spatial Data Analysis. Springer, pp.165–180.
Palmer, TN and PJ Hardaker (2011). Handling uncertainty in science.
Pang, AT, CM Wittenbrink, and SK Lodha (1997). Approaches to uncertainty visualization.
The Visual Computer 13(8), 370–390.
Pawson, R, G Wong, and L Owen (2011). Known knowns, known unknowns, unknown
unknowns: the predicament of evidence-based policy. American Journal of Evaluation
32(4), 518–546.
Pennello, GA, SS Devesa, and MH Gail (1999). Using a mixed effects model to estimate
geographic variation in cancer rates. Biometrics 55(3), 774–781.
Pinheiro, J, D Bates, S DebRoy, D Sarkar, and R Core Team (2017). Linear and Nonlinear
Mixed Effects Models. R package version 3.1-131.
https://CRAN.R-project.org/package=nlme.
Politi, MC, PK Han, and NF Col (2007). Communicating the uncertainty of harms and
benefits of medical interventions. Medical Decision Making 27(5), 681–695.
Potter, K, M Kirby, D Xiu, and CR Johnson (2012). Interactive visualization of probability
and cumulative density functions. International journal for uncertainty quantification 2(4).
R Core Team (2017). R: A Language and Environment for Statistical Computing. R Foundation
for Statistical Computing. Vienna, Austria. https://www.R-project.org/.
Ramirez, AJ, AC Jensen, and BH Cheng (2012). A taxonomy of uncertainty for dynami-
cally adaptive systems. In: Proceedings of the 7th International Symposium on Software
Engineering for Adaptive and Self-Managing Systems. IEEE Press, pp.99–108.
Regan, HM, M Colyvan, and MA Burgman (2002). A taxonomy and treatment of
uncertainty for ecology and conservation biology. Ecological Applications 12(2), 618–628.
Ristovski, G, T Preusser, HK Hahn, and L Linsen (2014). Uncertainty in medical visualiza-
tion: Towards a taxonomy. Computers & Graphics 39, 60–73.
Robinson, AC (2009). Needs assessment for the design of information synthesis visual
analytics tools. In: Information Visualisation, 2009 13th International Conference. IEEE,
pp.353–360.
Robinson, AC, AM MacEachren, and RE Roth (2011). Designing a web-based learning
portal for geographic visualization and analysis in public health. Health informatics
journal 17(3), 191–208.
Robinson, T (2000). Spatial statistics and geographical information systems in epidemiol-
ogy and public health. Advances in Parasitology 47, 81–128.
Rosa Dias, P (2009). Inequality of opportunity in health: evidence from a UK cohort study.
Health Economics 18(9), 1057–1074.
Sabesan, S and K Raju (2005). GIS for rural health and sustainable development in India,
with special reference to vector-borne diseases. Current Science 88(11), 1749–1752.
Sanyal, J, S Zhang, J Dyer, A Mercer, P Amburn, and RJ Moorhead (2010). Noodles:
A Tool for Visualization of Numerical Weather Model Ensemble Uncertainty. IEEE
Transactions on Visualization and Computer Graphics 16(6), 1421–1430.
Savelli, S and S Joslyn (2013). The advantages of predictive interval forecasts for non-expert
users and the impact of visualizations. Applied Cognitive Psychology 27(4), 527–541.
Scheufele, DA and BV Lewenstein (2005). The public and nanotechnology: How citizens
make sense of emerging technologies. Journal of Nanoparticle Research 7(6), 659–667.
Schneider, SH and R Moss (1999). Uncertainties in the IPCC TAR: Recommendations to
lead authors for more consistent assessment and reporting. Unpublished document.
Shneiderman, B, C Plaisant, and B Hesse (2013). Improving health and healthcare with
interactive visualization methods. HCIL Technical Report 1, 1–13.
Schrage, M (2016). How the big data explosion has changed decision making. Harvard
Business Review.
Scupin, R (1997). The KJ method: A technique for analyzing data derived from Japanese
ethnology. Human organization 56(2), 233–237.
Shaffer, JP (1995). Multiple hypothesis testing. Annual Review of Psychology 46,
561.
Siegel, S (1956). Nonparametric statistics for the behavioral sciences. New York.
Skinner, DJ, SA Rocks, SJ Pollard, and GH Drew (2013). Identifying Uncertainty in
Environmental Risk Assessments: The Development of a Novel Typology and Its Implications
for Risk Characterisation. Human and Ecological Risk Assessment: An International Journal
7039(November), 130301143601004.
Skinner, DJ, SA Rocks, and SJ Pollard (2016). Where do uncertainties reside within envi-
ronmental risk assessments? Expert opinion on uncertainty distributions for pesticide
risks to surface water organisms. Science of the Total Environment 572, 23–33.
Slocum, TA, RM McMaster, FC Kessler, HH Howard, and RB McMaster (2008). Thematic
cartography and geographic visualization.
Soll, JB and J Klayman (2004). Overconfidence in interval estimates. Journal of Experimental
Psychology: Learning, Memory, and Cognition 30(2), 299.
Spiegelhalter, D, M Pearson, and I Short (2011). Visualizing uncertainty about the future.
Science 333(6048), 1393–1400.
Tatem, AJ, N Campiz, PW Gething, RW Snow, and C Linard (2011). The effects of spatial
population dataset choice on estimates of population at risk of disease. Population
Health Metrics 9(1), 4.
Thomson, J, E Hetzler, A MacEachren, M Gahegan, and M Pavel (2005). A typology for
visualizing uncertainty. In: ed. by RF Erbacher, JC Roberts, MT Grohn, and K Borner.
International Society for Optics and Photonics, pp.146.
Thunnissen, DP (2003). Uncertainty classification for the design and development of
complex systems. In: 3rd annual predictive methods conference, pp.1–16.
Tufte, ER (1983). The Visual Display of Quantitative Information.
Tufte, ER and D Robins (1997). Visual explanations. Graphics Press, Cheshire, CT.
Tukey, JW (1949). Comparing individual means in the analysis of variance. Biometrics,
99–114.
Uusitalo, L, A Lehikoinen, I Helle, and K Myrberg (2015). An overview of methods
to evaluate uncertainty of deterministic models in decision support. Environmental
Modelling & Software 63, 24–31.
Van der Wel, FJ, RM Hootsmans, and F Ormeling (1994). “Visualization of data quality”.
In: Modern Cartography Series. Vol. 2. Elsevier, pp.313–331.
Walker, W, P Harremoës, J Rotmans, J van der Sluijs, M van Asselt, P Janssen, and M
Krayer von Krauss (2003). Defining Uncertainty: A Conceptual Basis for Uncertainty
Management in Model-Based Decision Support. Integrated Assessment 4(1), 5–17.
Wernerfelt, B and A Karnani (1987). Competitive strategy under uncertainty. Strategic
Management Journal 8(2), 187.
Wolfert, S, L Ge, C Verdouw, and MJ Bogaardt (2017). Big data in smart farming–a review.
Agricultural Systems 153, 69–80.
Yaniv, I and DP Foster (1995). Graininess of judgment under uncertainty: An accuracy-
informativeness trade-off. Journal of Experimental Psychology: General 124(4), 424.
Yin, S and O Kaynak (2015). Big data for modern industry: challenges and trends [point of
view]. Proceedings of the IEEE 103(2), 143–146.
Zeileis, A (2014). ineq: Measuring Inequality, Concentration, and Poverty. R package version
0.2-13. https://CRAN.R-project.org/package=ineq.
Zinszer, K, C Jauvin, A Verma, L Bedard, R Allard, K Schwartzman, L de Montigny,
K Charland, and DL Buckeridge (2010). Residential address errors in public health
surveillance data: a description and analysis of the impact on geocoding. Spatial and
Spatio-temporal Epidemiology 1(2-3), 163–168.
Zuk, T, MST Carpendale, and WD Glanzman (2005). Visualizing Temporal Uncertainty in
3D Virtual Reconstructions. In: VAST. Vol. 2005, pp.6th.