Tiling Heuristics and Evaluation Metrics for Treemaps with a Target Node Aspect Ratio
Degree project in computer engineering, advanced level, 30 credits
Stockholm, Sweden, 2017
Rodrigo Roa Rodríguez
KTH School of Computer Science and Communication

Swedish title: Tegelläggningsheuristiker och evalueringsmått för treemaps med ett målsatt bredd-höjd-förhållande för noder

Summary (Sammanfattning)

Treemaps are popular space-efficient visualizations of hierarchical data that map an attribute of a datum, or an aggregate, to a proportional area. For a treemap consisting of recursively nested rectangular nodes (also called tiles), there are many possible valid tilings.

A common optimization criterion for treemaps is the width-to-height ratio (aspect ratio) of the nodes. Since treemaps consist of multiple nodes, the aspect ratio must be aggregated. The basic definition of aspect ratio (width divided by height) cannot be aggregated in a meaningful way. The literature therefore proposes a definition of aspect ratio that does not distinguish between height and width. This definition allows meaningful aggregation, but only if there are no large value differences between data elements and the target aspect ratio is 1:1.

A target aspect ratio of 1:1 was originally assumed, axiomatically, to be optimal in the literature. Perceptual studies have since shown that precisely this ratio leads to the largest area estimation error. This thesis proposes a corrected version of the metric that can be used even when there are large value differences between data elements. In addition, both the original and the corrected metric have been generalized so that they work with an arbitrary target value.

A further concern regarding evaluation methodology is that tiling algorithms have so far been evaluated with the Monte Carlo method. In this method, synthetic data is generated and then aggregated into a final result. However, a single value cannot summarize an algorithm's behaviour, since its performance depends on the value distribution of the data. The alternative proposed in this thesis is visual cluster analysis, which is claimed to have greater predictive power.

All of the above is realized through an experiment. In the experiment, a new kind of algorithm, based on the results of perceptual tests in the literature, is evaluated against the currently most popular tiling algorithm: Squarify. The results show that there are large but consistent differences in value depending on the distribution of the data. At least for a target of 1.5, and for the vast majority of distributions, the new algorithms perform better than Squarify with respect to the resulting aspect ratio.

Rodrigo Roa Rodríguez
School of Computer Science and Communication, KTH Royal Institute of Technology

ABSTRACT

Treemaps are a popular space-filling visualization of hierarchical data that maps an attribute of a datum, or a data aggregate, to a proportional amount of area. Assuming a rectangular treemap consisting of nested rectangles (also called tiles), there are multiple possible valid tiling arrangements.

A common criterion for optimization is aspect ratio. Treemaps usually consist of multiple rectangles, however, so the aspect ratios need to be aggregated.

The basic definition of aspect ratio (width divided by height) cannot be meaningfully aggregated. Given this, a definition of aspect ratio that does not differentiate height from width was suggested. This definition allows for meaningful aggregation, but only as long as there are no large differences in the data distribution and the target aspect ratio is 1:1.

Originally, a target aspect ratio of 1:1 was deemed to be axiomatically ideal. Perceptual studies have since found an aspect ratio of 1:1 to lead to the largest area estimation error. However, with any other target this definition of aspect ratio cannot be meaningfully aggregated.

This thesis suggests a correction that can be applied to the current metric and would allow it to be meaningfully aggregated even when there are large value differences in the data. Furthermore, both the uncorrected and corrected metrics can be generalized for any target (i.e. targets other than 1:1).

Another issue with current evaluation techniques is that algorithm fitness is evaluated through Monte Carlo trials. In this method, synthetic data is generated and then aggregated to generate a single final result. However, tiling algorithm performance is dependent on data distribution, so a single aggregate result cannot generalize overall performance. The alternative suggested in this thesis is visual cluster analysis, which should hold more general predictive power.

All of the above is put into practice with an experiment. In the experiment, a new family of tiling algorithms, based on criteria derived from the results of the perceptual tests in the literature, is compared to the most popular tiling algorithm, Squarify.

The results confirm that there are indeed vast but consistent value fluctuations for different normal distributions. At least for a target aspect ratio of 1.5, the newly proposed algorithms are shown to perform better than Squarify for most use cases in terms of aspect ratio.

Author Keywords

Treemap, heuristics, tiling, tessellation, metrics, aspect ratio, orientation agnostic, OAAR, FOAAR, orientation, offset factor, offset quotient, information visualization, infoviz, macro-economic metaphor, eat the poor, eat the rich, subsidy, welfare

INTRODUCTION

Treemaps are one of several methods for visualizing hierarchical data in the form of aggregating trees. The underlying mechanism is to assign numeric values to the leaves and then recursively calculate the value of each parent node by aggregating the values of its children, all the way up to the root.

To represent these numeric values visually, they are mapped to area. Traditionally, treemaps consist of nested rectangles that represent each data element. Despite the fixed area acting as a constraint, there is an infinite number of valid rectangle configurations that would represent the data. Nevertheless, there are differences in how desirable the configurations are for the purpose of visualization.

As originally proposed [11], treemaps would draw parallel lines to subdivide the root area either vertically or horizontally and then switch direction for the next level of nesting. This approach is now known as the Slice and Dice tiling algorithm (for an example see fig. 2). Slice and Dice has two desirable qualities beyond simplicity: ordering transparency and update stability [16].

However, Slice and Dice is no longer the most popular tiling algorithm (see fig. 1). Presently, Squarify (see fig. 3) appears to be the de facto standard. Squarify is the default tiling algorithm in the visualization library D3 [5] and the only available algorithm in popular visualization software such as Excel, Tableau and Google Charts.

Squarify was developed under the assumption that an aspect ratio of 1 (a square) is ideal for the purpose of visualization literacy [6]. Nevertheless, this assumption was not based on experimental data or on any explicit perceptual principles.
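The two mechanisms described in the introduction, bottom-up value aggregation and Slice and Dice tiling, can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation from [11] or from this thesis: the names `Node`, `aggregate` and `slice_and_dice` are hypothetical, leaf values are assumed to be strictly positive, and the first level is assumed to be cut with vertical lines.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """A tree node; leaves carry a value, parents aggregate their children."""
    name: str
    value: float = 0.0
    children: list[Node] = field(default_factory=list)


def aggregate(node: Node) -> float:
    """Recursively set each parent's value to the sum of its children's values."""
    if node.children:
        node.value = sum(aggregate(child) for child in node.children)
    return node.value


def slice_and_dice(node, x, y, w, h, vertical=True, out=None):
    """Tile `node` into the rectangle (x, y, w, h), switching direction per level.

    Returns a dict mapping leaf names to (x, y, width, height).
    Assumes aggregate() has been called and all values are positive.
    """
    if out is None:
        out = {}
    if not node.children:
        out[node.name] = (x, y, w, h)
        return out
    offset = 0.0  # fraction of the parent already handed out to earlier siblings
    for child in node.children:
        share = child.value / node.value  # child's fraction of the parent's area
        if vertical:
            # Vertical cut lines: children sit side by side, full height.
            slice_and_dice(child, x + offset * w, y, share * w, h, False, out)
        else:
            # Horizontal cut lines: children are stacked, full width.
            slice_and_dice(child, x, y + offset * h, w, share * h, True, out)
        offset += share
    return out


# Hypothetical example data: one leaf and one group of two leaves.
root = Node("root", children=[
    Node("a", 6.0),
    Node("group", children=[Node("b", 3.0), Node("c", 1.0)]),
])
aggregate(root)
layout = slice_and_dice(root, 0.0, 0.0, 10.0, 10.0)
```

Each leaf rectangle in `layout` has an area proportional to its value, and children keep their input order along the cut direction, which is the ordering transparency mentioned above.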
Figure 1. Pie-chart visualization of the proportion of tiling algorithms used by the first 100 Google images of rectangular treemaps (May 9, 2017, search term = "treemap"). Legend entries: Squarify, Pivot, Slice and Dice. Slice labels: 81%, 11%, 8%.

Figure 3. A Squarify(r = 3/2) tiled treemap visualization of {Zipf(k, s) | 1 < k ≤ 30 ∧ k ∈ ℕ ∧ s = 1}. Although the dataset is the same as in fig. 2, the aspect ratios are less extreme. However, the layout is unstable under data updates, the ordering has been obfuscated, and most aspect ratios differ significantly from the target aspect ratio of 3/2.

Motivation

Given that the optimization criterion, an aspect ratio of 1, does not match the experimental evidence, and that the aggregate metrics do not seem representative of the actual distribution of aspect ratios, there is a real need for alternative quantitative evaluation metrics for treemap tiling algorithms.
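To make the aggregation problem concrete, the following Python sketch compares the basic aspect ratio with an orientation-agnostic one. It assumes the common longer-side-over-shorter-side form for the orientation-agnostic ratio. The function names and the example tiles are hypothetical, and the area-weighted mean is included only to expose how few tiles can dominate an unweighted mean; it is not claimed to be the correction proposed in this thesis.

```python
from __future__ import annotations


def aspect_ratio(width: float, height: float) -> float:
    """Basic aspect ratio: width divided by height."""
    return width / height


def oaar(width: float, height: float) -> float:
    """Orientation-agnostic aspect ratio: longer side over shorter side (>= 1)."""
    return max(width / height, height / width)


def mean(values: list[float]) -> float:
    """Unweighted arithmetic mean."""
    return sum(values) / len(values)


def area_weighted_mean(tiles: list[tuple[float, float]]) -> float:
    """Mean orientation-agnostic ratio, each tile weighted by its area.

    Illustrative assumption only, not the thesis's corrected metric.
    """
    total_area = sum(w * h for w, h in tiles)
    return sum(w * h * oaar(w, h) for w, h in tiles) / total_area


# A wide tile and a tall tile are equally elongated, yet their
# width/height values average to something that looks nearly square.
pair = [(2.0, 1.0), (1.0, 2.0)]
basic_mean = mean([aspect_ratio(w, h) for w, h in pair])
oaar_mean = mean([oaar(w, h) for w, h in pair])

# One large square plus nine thin slivers: large value differences in the data.
skewed = [(10.0, 10.0)] + [(0.1, 1.0)] * 9
skewed_mean = mean([oaar(w, h) for w, h in skewed])
skewed_weighted = area_weighted_mean(skewed)
```

For the pair, the basic mean hides the elongation that the orientation-agnostic mean reports. For the skewed tiles, the unweighted mean is driven almost entirely by slivers that cover less than 1% of the total area, which is the kind of unrepresentative aggregate described above.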