A Bounded Measure for Estimating the Benefit of Visualization
Total Page:16
File Type:pdf, Size:1020Kb
arXiv Report 2021 Volume 0 (1981), Number 0 A Bounded Measure for Estimating the Benefit of Visualization: Theoretical Discourse and Conceptual Evaluation Min Chen1 and Mateu Sbert2 1University of Oxford, UK and 2 University of Girona, Spain Visual Mapping with Topological Abstraction Visual Mapping with a Volume Rendering Integral Color Pixel Color Opacity Color Pixel Color Opacity ...... Color Pixel Color Opacity (a) mapping from different sets of voxel values to the same pixel color (b) mapping from different geographical paths to the same line segment Figure 1: Visual encoding typically features many-to-one mapping from data to visual representations, hence information loss. The significant amount of information loss in volume visualization and metro maps suggests that viewers not only can abide the information loss but also benefit from it. Measuring such benefits can lead to new advancements of visualization, in theory and practice. Abstract Information theory can be used to analyze the cost-benefit of visualization processes. However, the current measure of benefit contains an unbounded term that is neither easy to estimate nor intuitive to interpret. In this work, we propose to revise the existing cost-benefit measure by replacing the unbounded term with a bounded one. We examine a number of bounded measures that include the Jenson-Shannon divergence and a new divergence measure formulated as part of this work. We describe the rationale for proposing a new divergence measure. As the first part of comparative evaluation, we use visual analysis to support the multi-criteria comparison, narrowing the search down to several options with better mathematical properties. The theoretical discourse and conceptual evaluation in this paper provide the basis for further comparative evaluation through synthetic and experimental case studies, which are to be reported in a separate paper. arXiv:2103.02505v1 [cs.IT] 3 Mar 2021 1. Introduction In terms of information theory, the types of inaccuracy featured in Figure1 are different forms of information loss (or many-to- To most of us, it seems rather intuitive that visualization should be one mapping). Chen and Golan proposed an information-theoretic accurate, different data values should be visually encoded differ- measure [CG16] for analyzing the cost-benefit of data intelligence ently, and visual distortion should be disallowed. However, when workflows. It enables us to consider the positive impact of informa- we closely examine most (if not all) visualization images, we can tion loss (e.g., reducing the cost of storing, processing, displaying, notice that inaccuracy is ubiquitous. The two examples in Figure1 perceiving, and reasoning about the information) as well as its neg- evidence the presence of such inaccuracy. In volume visualization, ative impact (e.g., being mislead by the information). The measure when a pixel is used to depict a set of voxels along a ray, many provides a concise explanation about the benefit of visualization different sets of voxel values may result in the same pixel color. In because visualization and other data intelligence processes (e.g., a metro map, a variety of complex geographical paths may be dis- statistics and algorithms) all typically cause information loss and torted and depicted as a straight line. Since there is little doubt that visualization allows human users to reduce the negative impact of volume visualization and metro maps are useful, some “inaccurate” information loss effectively using their knowledge. visualization must be beneficial. © 2021 The Author(s) with LATEX template from Computer Graphics Forum © 2021 The Eurographics Association and John Wiley & Sons Ltd. Published by John Wiley & Sons Ltd. M. Chen and M. Sbert / A Bounded Measure for Estimating the Benefit of Visualization: Theoretical Discourse and Conceptual Evaluation The mathematical formula of the measure features a term based Visual Mapping & Displaying Performing Visualization Tasks on the Kullback-Leibler (KL) divergence [KL51] for measuring the P P potential distortion of a user or a group of users in reconstructing i i+1 the information that may have been lost or distorted during a vi- increasing AC can cause more PD sualization process. While using the KL-divergence is mathemati- cally intrinsic for measuring the potential distortion, its unbound- human knowledge can edness property has some undesirable consequences. The simplest reduce PD in visualization phenomenon of making a false representation (i.e., always display- increasing AC PD PD can AC PD ing 1 when a binary value is 0 or always 0 when it is 1) happens to lead to be a singularity condition of the KL-divergence. The amount of dis- more cost tortion measured by the KL-divergence often has much more bits Cost Cost than the entropy of the information space itself. This is not intuitive increasing AC can lead to cost reduction to interpret and hinders practical applications. In this paper, we propose to replace the KL-divergence with a Figure 2: Alphabet compression may reduce the cost of Pi and Pi+1, bounded term. We first confirm the boundedness is a necessary especially when human knowledge can reduce potential distortion. property. We then conduct multi-criteria decision analysis (MCDA) [IN13] to compare a number of bounded measures, which include uation by Wei et al. [WLS13], measuring observation capacity the Jensen–Shannon (JS) divergence [Lin91], and a new divergence by Bramon et al. [BRB∗13a], measuring information content by k measure Dnew (including its variations) formulated as part of this Biswas et al. [BDSW13], proving the correctness of “overview first, work. We use visual analysis to aid the observation of the math- zoom, details-on-demand” by Chen and Jänicke [CJ10] and Chen ematical properties of these candidate measures, narrowing down et al. [CFV∗16], and confirming visual multiplexing by Chen et from eight options to five. In a separate but related paper (included al. [CWB∗14]. in the supplementary materials), we use synthetic and experimen- tal case studies to to instantiate values that may be returned by the Ward first suggested that information theory might be an un- five options. It also explores the relationship between measuring derpinning theory for visualization [PAJKW08]. Chen and Jänicke the benefit of visualization and measuring the viewers’ knowledge [CJ10] outlined an information-theoretic framework for visualiza- used during visualization. The search for the best way to measure tion, and it was further enriched by Xu et al. [XLS10] and Wang and the benefit of visualization will likely entail a long journey. The Shen [WS11] in the context of scientific visualization. Chen and main contribution of this work is to initiate this endeavor. Golan proposed an information-theoretic measure for analyzing the cost-benefit of visualization processes and visual analytics work- flows [CG16]. It was used to frame an observation study showing 2. Related Work that human developers usually entered a huge amount of knowledge Claude Shannon’s landmark article in 1948 [Sha48] signifies the into a machine learning model [TKC17]. It motivated an empirical birth of information theory. It has been underpinning the fields study confirming that knowledge could be detected and measured of data communication, compression, and encryption since. As quantitatively via controlled experiments [KARC17]. It was used a mathematical framework, information theory provides a collec- to analyze the cost-benefit of different virtual reality applications tion of useful measures, many of which, such as Shannon en- [CGJM19]. It formed the basis of a systematic methodology for tropy [Sha48], cross entropy [CT06], mutual information [CT06], improving the cost-benefit of visual analytics workflows [CE19]. and Kullback-Leibler divergence [KL51] are widely used in appli- It survived qualitative falsification by using arguments in visual- cations of physics, biology, neurology, psychology, and computer ization [SEAKC19]. It offered a theoretical explanation of “visual abstraction” [VCI20]. The work reported in this paper continues the science (e.g., visualization, computer graphics, computer vision, ∗ data mining, machine learning), and so on. In this work, we also path of theoretical developments in visualization [CGJ 17], and is consider Jensen-Shannon divergence [Lin91] in detail. intended to improve the original cost-benefit formula [CG16], in order to make it a more intuitive and usable measurement in prac- Information theory has been used extensively in visualization tical visualization applications. [CFV∗16]. It has enabled many applications in visualization, in- cluding scene and shape complexity analysis by Feixas et al. 3. Overview and Motivation [FdBS99] and Rigau et al. [RFS05], light source placement by Gumhold [Gum02], view selection in mesh rendering by Vázquez Visualization is useful in most data intelligence workflows, but the et al. [VFSH04] and Feixas et al. [FSG09], attribute selection by usefulness is not universally true because the effectiveness of vi- Ng and Martin [NM04], view selection in volume rendering by sualization is usually data-, user-, and task-dependent. The cost- Bordoloi and Shen [BS05], and Takahashi and Takeshima [TT05], benefit ratio proposed by Chen and Golan [CG16] captures the multi-resolution volume visualization by Wang and Shen [WS05], essence of such dependency. Below is the qualitative expression focus of attention in volume rendering by Viola et al. [VFSG06], of the measure: feature highlighting