Tiling Heuristics and Evaluation Metrics for Treemaps with a Target Node Aspect Ratio

DEGREE PROJECT IN COMPUTER SCIENCE AND ENGINEERING, SECOND CYCLE, 30 CREDITS
STOCKHOLM, SWEDEN 2017

Rodrigo Roa Rodríguez
School of Computer Science and Communication, KTH Royal Institute of Technology
[email protected]

Swedish title: Tegelläggningsheuristiker och evalueringsmått för treemaps med ett målsatt bredd-höjd-förhållande för noder

Sammanfattning (Summary)
Treemaps are popular space-efficient visualizations of hierarchical data that map an attribute or aggregate of a datum to a proportional area. For a treemap consisting of recursively nested rectangular nodes (also called tiles), there are many possible valid tilings.

A common optimization criterion for treemaps is the aspect ratio of the nodes. Because treemaps consist of multiple nodes, the aspect ratio must be aggregated. The basic definition of aspect ratio (width divided by height) cannot be aggregated in a meaningful way. The literature therefore proposes a definition of aspect ratio that does not distinguish between height and width. This definition enables meaningful aggregation, but only if there are no large differences in value between data elements and the target aspect ratio is 1:1.

A target aspect ratio of 1:1 was originally assumed axiomatically to be optimal in the literature. Since then, perceptual studies have shown that precisely this ratio leads to the largest area estimation error.

This thesis proposes a corrected version of the metric that can be used even when there are large differences in value between data elements. In addition, both the original and the corrected metric have been generalized so that they work with an arbitrary target value.

A further concern regarding evaluation methodology is that tiling algorithms have so far been evaluated with the Monte Carlo method. In this method, synthetic data is generated and then aggregated into a final result. However, a single value cannot summarize an algorithm's behaviour, since its performance depends on the value distribution of the data. The alternative proposed in this thesis is visual cluster analysis, which is claimed to have greater predictive power.

All of the above is realized through an experiment. In the experiment, a new kind of algorithm, based on the results of perceptual tests in the literature, is evaluated against the currently most popular tiling algorithm: Squarify. The results show that there are large but consistent differences in value depending on the distribution of the data. At least for a target of 1.5, and for the vast majority of distributions, the new algorithms turn out to perform better than Squarify with respect to the resulting aspect ratio.

ABSTRACT
Treemaps are a popular space-filling visualization of hierarchical data that maps an attribute of a datum, or a data aggregate, to a proportional amount of area. Assuming a rectangular treemap consisting of nested rectangles (also called tiles), there are multiple possible valid tiling arrangements.

A common criterion for optimization is aspect ratio. Treemaps usually consist of multiple rectangles, however, so the aspect ratios need to be aggregated.

The basic definition of aspect ratio (width divided by height) cannot be meaningfully aggregated. Given this, a definition of aspect ratio that does not differentiate height from width was suggested. This definition allows for meaningful aggregation, but only as long as there are no large differences in the data distribution and the target aspect ratio is 1:1.

Originally, a target aspect ratio of 1:1 was deemed to be axiomatically ideal. Perceptual studies have since found an aspect ratio of 1:1 to lead to the largest area estimation error. However, with any other target this definition of aspect ratio cannot be meaningfully aggregated.

This thesis suggests a correction that can be applied to the current metric and would allow it to be meaningfully aggregated even when there are large value differences in the data. Furthermore, both the uncorrected and corrected metrics can be generalized for any target (i.e. targets other than 1:1).

Another issue with current evaluation techniques is that algorithm fitness is evaluated through Monte Carlo trials. In this method, synthetic data is generated and the outcomes are then aggregated into a single final result. However, tiling algorithm performance is dependent on data distribution, so a single aggregate result cannot generalize overall performance. The alternative suggested in this thesis is visual cluster analysis, which should hold more general predictive power.

All of the above is put into practice with an experiment. In the experiment, a new family of tiling algorithms, based on criteria derived from the results of the perceptual tests in literature, is compared to the most popular tiling algorithm, Squarify.

The results confirm that there are indeed vast but consistent value fluctuations for different normal distributions. At least for a target aspect ratio of 1.5, the new proposed algorithms are shown to perform better than Squarify for most use cases in terms of aspect ratio.

Author Keywords
Treemap, heuristics, tiling, tessellation, metrics, aspect ratio, orientation agnostic, OAAR, FOAAR, orientation, offset factor, offset quotient, information visualization, infoviz, macro-economic metaphor, eat the poor, eat the rich, subsidy, welfare

INTRODUCTION
Treemaps are one of multiple methods for the visualization of hierarchical data in the form of aggregating trees. The underlying mechanism is assigning numeric values to the leaves and then recursively calculating the value of the parent nodes by aggregating the values of the children all the way up to the root.

To represent these numeric values visually they are mapped to area. Traditionally, treemaps consist of nested rectangles that represent each data element. In spite of a fixed area acting as a constraint, there is an infinite number of valid rectangle configurations that would represent the data. Nevertheless, there are differences in how desirable the configurations are for the purpose of visualization.

As originally proposed [11], treemaps would draw parallel lines to subdivide the root area either vertically or horizontally and then switch direction for the next level of nesting. This approach is now known as the Slice and Dice tiling algorithm (for an example see fig. 2). Slice and Dice has two desirable qualities beyond simplicity: ordering transparency and update stability [16].

However, Slice and Dice is no longer the most popular tiling algorithm (see fig. 1). Presently, Squarify (see fig. 3) appears to be the de facto standard. Squarify is the default tiling algorithm in the visualization library D3 [5] and the only available algorithm in popular visualization software such as Excel, Tableau and Google Charts.

Figure 1. Pie-chart visualization of proportion of tiling algorithms used by the first 100 Google images of rectangular treemaps (May 9, 2017, search term = "treemap"). Legend: Squarify, Pivot, Slice and Dice.

Squarify was developed under the assumption that an aspect ratio of 1 (a square) is ideal for the purpose of visualization literacy [6]. Nevertheless, this assumption was not based on experimental data or any explicit perceptual principles.

Figure 3. A Squarify(r = 3/2) tiled treemap visualization of {Zipf(k, s) | 1 < k ≤ 30 ∧ k ∈ ℕ ∧ s = 1}. Although the dataset is the same as fig. 2, the aspect ratios are less extreme. However, the layout is unstable under data updates, ordering has been obfuscated and most aspect ratios differ significantly from the target aspect ratio of 3/2.

Motivation
The optimization criterion, an aspect ratio of 1, does not match the experimental evidence, and the aggregate metrics do not seem representative of the actual distribution of aspect ratios. There is therefore a real need for alternative quantitative evaluation metrics for treemap tiling algorithms.
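
To make the mechanism from the introduction concrete, the following is a minimal sketch of value aggregation followed by a Slice and Dice layout. It is an illustration under stated assumptions, not the implementation used in the thesis: the names Node, aggregate, slice_and_dice, leaves and orientation_agnostic_ratio are hypothetical, and the ratio is assumed to take the form max(w/h, h/w), since the text only states that the definition does not differentiate height from width.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Node:
    """A tree node; leaves carry a value, parents aggregate their children."""
    name: str
    value: float = 0.0
    children: List["Node"] = field(default_factory=list)
    rect: Optional[Tuple[float, float, float, float]] = None  # x, y, width, height


def aggregate(node: Node) -> float:
    """Recursively set each parent's value to the sum of its children's values."""
    if node.children:
        node.value = sum(aggregate(child) for child in node.children)
    return node.value


def slice_and_dice(node: Node, x: float, y: float, w: float, h: float,
                   vertical_cuts: bool = True) -> None:
    """Subdivide a rectangle with parallel cuts, switching direction per level."""
    node.rect = (x, y, w, h)
    offset = 0.0
    for child in node.children:
        share = child.value / node.value if node.value else 0.0
        if vertical_cuts:
            # Children sit side by side; each gets a share of the width.
            slice_and_dice(child, x + offset, y, w * share, h, False)
            offset += w * share
        else:
            # Children are stacked; each gets a share of the height.
            slice_and_dice(child, x, y + offset, w, h * share, True)
            offset += h * share


def leaves(node: Node) -> List[Node]:
    """Collect the leaf nodes in their original order."""
    if not node.children:
        return [node]
    return [leaf for child in node.children for leaf in leaves(child)]


def orientation_agnostic_ratio(w: float, h: float) -> float:
    """Assumed form: long side over short side, so the result is always >= 1.

    Requires w > 0 and h > 0 (a zero-valued leaf has no meaningful ratio).
    """
    return max(w / h, h / w)


# Hypothetical example data: leaf values 6, 3 and 1 in a two-level tree.
root = Node("root", children=[
    Node("a", value=6.0),
    Node("b", children=[Node("b1", value=3.0), Node("b2", value=1.0)]),
])
aggregate(root)
slice_and_dice(root, 0.0, 0.0, 10.0, 10.0)
for leaf in leaves(root):
    _, _, w, h = leaf.rect
    print(leaf.name, round(w * h, 2), round(orientation_agnostic_ratio(w, h), 2))
```

In this sketch each leaf's area stays proportional to its value, while its aspect ratio is simply a by-product of the cutting order. This is the behaviour that makes Slice and Dice stable and order-preserving but prone to extreme aspect ratios on skewed data.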

