Visualizing Uncertainty: What We Talk About When We Talk About Uncertainty
Zheng Yan Yu

Thesis presented by Zheng Yan Yu to the Department of Arts, Media and Design in partial fulfillment of the requirements for the degree of Master of Fine Arts in Information Design and Visualization

Northeastern University
Boston, Massachusetts
April 2018

Visualizing uncertainty Thesis defense

Abstract

Most visualizations have been designed on the assumption that the visually represented data are free from uncertainty. However, this is rarely the case. Visualizing uncertainty is essential if we want to improve how people understand statistics. We are surrounded by summary statistics and statistical inference in our daily lives, such as means and probabilities in newspapers, government journals, and mobile applications, and these statistics often lead us to a biased understanding of the data. Summary statistics and statistical inference often cannot reflect the reality of the data. Visualizing the uncertainty hidden beneath statistics can improve the way we interpret them by letting us look at whole datasets and make comparisons. This thesis focuses on the uncertainties of summary statistics and statistical inference.

Dealing with the visualization of uncertainty is an important issue. However, there are few existing best-practice guidelines for visualizing uncertainty. Although many data visualization practitioners are highly interested in this topic and have produced many related examples, visualizing uncertainty is regarded as one of the unsolved problems in the data visualization community. There are several reasons for this.

First, there is a gap between the scientific visualization community and other domains such as news agencies and consulting companies. In the scientific community, researchers have systematic ways to deal with visualizing uncertainty, but these are not accessible to non-experts outside of the scientific community. That is partly because scientific researchers proposed methodologies for visualizing uncertainty but did not connect them to the problems the general public cares more about.

Second, the traditional methods for visualizing uncertainty are limited. For example, the box plot cannot reflect the true distribution of the data. In addition, traditional ways to visualize uncertainty usually do not consider visual cognition and perception. They often require users to understand statistics before they can understand the visualization.

I propose an updated taxonomy of uncertainty to help people deal with visualizing uncertainty. By connecting examples of visualizing uncertainty to the scientific uncertainty taxonomy, I bridge the gap between science and the other domains. Furthermore, I propose three experiments in visualizing the uncertainty of statistical predictive models. By visualizing my research outcomes, the work touches on the less-discussed topic of visualizing the uncertainty of statistical predictive models.

Acknowledgements

This thesis could not have been completed without the help of many people. I would like to express my deepest appreciation to all of them.

Without the emotional and financial support of my family, I could not have studied abroad in the United States. Thank you, my grandfather: you gave me all of your earnings to support my tuition. It was your hard-earned money, but you gave me all of it when you learned that I had decided to pursue my master's degree in the USA.

Thank you, my father and mother. You have always supported me, especially when I decided to study in Beijing and Boston. When I am not sure whether the decisions I make are correct, you always say that I will not regret them if I think them through.

Thank you, my advisor, Pedro Cruz. When I felt confused about my thesis, you guided me out of the maze (several times).

Thank you, Dietmar Offenhuber, instructor of the thesis course and program lead of IDV. You put so much energy into caring for each student; you listen to us carefully and solve our issues patiently. You pointed out many problems with my thesis. I am very grateful for these suggestions, which made my thesis solid.

Thank you, Miso Kim, for encouraging me to keep working on the same thesis topic after the proposal.

Thank you, Nathan Felde, for helping me go through an extensive conceptual exploration of uncertainty. The exploration helped me build my system of thinking about uncertainty.

Thank you, Paul Kahn, for continuously giving feedback on my thesis exhibition. Our discussions over a couple of weeks helped me build and polish the artifacts of my thesis.

Thank you, my IDV colleagues. Besides the help from professors, all of you provided great ideas and help with my thesis over the past two semesters. The discussions during class are the most valued treasure I have received from learning in IDV.

Thank you all.

Contents

Part 1 Introduction

Part 2 What we talk about when we talk about uncertainty
1 Uncertainty visualization precursors
2 An overview of visualizing uncertainty in the scientific community
3 The topic of visualizing uncertainty in the news agencies, consulting companies, and other domains
4 New ways to visualize uncertainty that consider human visual perception
5 Visualization of certain types of uncertainty that are yet to be addressed

Part 3 Updated taxonomy of uncertainty
1 Updated taxonomy of uncertainty
2 Examples of the taxonomy

Part 4 Visualizing the uncertainty of statistical predictive models
1 Types of uncertainty of statistical models used in this chapter
2 The three instantiations
3 First instantiation: visualizing probability
4 Second instantiation: conveying uncertainty through data points
5 Third instantiation: conveying uncertainty through physical metaphor
6 Conclusion

Part 5 Conclusion and next step

Part 6 References

Introduction

Most visualizations have been designed on the assumption that the visually represented data are free from uncertainty. However, this is rarely the case. Visualizing uncertainty is essential if we want to improve how people understand statistics. We are surrounded by summary statistics and statistical inference in our daily lives, such as means and probabilities in newspapers, government journals, and mobile applications. These statistics often lead us to a biased understanding of the data. Visualizing the uncertainty hidden beneath statistics can improve the way we interpret them by letting us look at whole datasets and/or possible outcomes and make comparisons. This thesis focuses on the uncertainties generated from inefficient representation of data (summary statistics) and from statistical inference (data samples and probability). Visualizing uncertainty can help us to "reveal" the uncertainty.
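A small numerical sketch can make the first kind of uncertainty, the one hidden by summary statistics, concrete. It assumes two illustrative three-value datasets, 1, 5, 9 and 4, 5, 6; the function name `summarize` and the variable names are hypothetical and chosen only for this example. It shows that the two datasets share a mean of 5 while their range and spread differ.

```python
from statistics import mean, pstdev


def summarize(data):
    """Return the mean, range, and population standard deviation of a dataset."""
    return {
        "mean": mean(data),
        "range": max(data) - min(data),
        "spread": pstdev(data),
    }


# Illustrative datasets (an assumption for this sketch): same mean, different range.
wide = [1, 5, 9]
narrow = [4, 5, 6]

for name, data in (("wide", wide), ("narrow", narrow)):
    print(name, summarize(data))
```

Reporting only the mean would make the two datasets indistinguishable. The range and the standard deviation are what separate them, which is why the sketch reports all three values side by side.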
First, what is the uncertainty in inefficient representation of data? The uncertainty in inefficient representation of data is generated from summary statistics. Merely looking at summary statistics, we cannot know the reality of the data. To "reveal" the uncertainty hidden in summary statistics, we need to visualize the whole dataset.

Figure caption: Suppose two datasets have the same mean value. The dataset with the wider range is more uncertain than the dataset with the narrower range.

To be more specific, if you are told that the mean (average value) of a dataset is 5, have you ever wondered what the overall dataset looks like? It may be 1, 5, 9. It may be 4, 5, 6 as well. When we are told that the mean of a dataset is 5, we often believe or guess that all the data are close to 5, or even think that every value is 5. Summary statistics, such as the mean and median, hide the reality of the data. The two datasets are different, but their means are the same. The uncertainty comes from the fact that we have no idea what the real data look like, because they have been summarized by statistics. Visualizing uncertainty in summary statistics means visualizing the reality of the data.

The most common, and inefficient, visual representation of one-dimensional data in statistics is the box plot. The median is the middle point of the data and is shown by the line that divides the box into two parts. The middle "box" represents the middle 50% of the data in the dataset, and the range of the box is called the interquartile range. Data points that lie more than 1.5 interquartile ranges above the upper edge or below the lower edge of the box (the third and first quartiles) are called outliers.

Figure caption: A boxplot with all its elements annotated.

However, the box plot cannot reflect the reality of the data well. In addition, most people cannot read a box plot, because it requires readers to know what quartiles, medians, and outliers are before they can interpret it.

Figure caption: When the range of the data changes, the three boxplots remain the same.

In addition, the uncertainty due to inefficient representation of two (or more) dimensions of data is about distribution. For example, a scatterplot shows the distribution of the data through the relation of x and y. Nevertheless, if we use only summary statistics, such as the mean, standard deviation, and correlation, to present the relation of x and y, we will misunderstand the whole dataset.

Figure caption: Five identical summary statistics (X mean, Y mean, X standard deviation, Y standard deviation, and correlation of X and Y) with different datasets and different visualizations.

Figure caption: Hans Rosling's 200 Countries, 200 Years, 4 Minutes - The Joy of Stats - BBC Four.

This is a very famous video about the relation between lifespan and income in each country. This example shows the uncertainty in distribution. If we only look at the data of China as a whole, without separating Shanghai (the richest province) from Guizhou (the poorest province), we cannot gain a good understanding of the dataset and may form a biased one.

Second, what is the uncertainty in statistical inference? Things are getting more and more complex.

is an estimated value and it cannot reflect the real outcome perfectly. To be more specific, the averaged value in statistics is not as same as the