Generation of Semantic Layouts for Interactive Multidimensional Data Visualization

Generation of Semantic Layouts for Interactive Multidimensional Data Visualization

Generation of Semantic Layouts for Interactive Multidimensional Data Visualization Erick Gomez-Nieto∗y and Luis Gustavo Nonatoy ∗Research and Innovation Center in Computer Science Universidad Católica San Pablo, Arequipa, Peru yInstituto de Ciências Matemáticas e de Computação Universidade de São Paulo São Carlos-SP, Brazil Abstract—Visualization methods make use of interactive are easily readable, but they pay the price of scalability. graphical representations embedded on a display area in order Hierarchical methods such as Treemaps mitigate the issue to enable data exploration and analysis. These typically rely of scalability while making an efficient use of display area. on geometric primitives for representing data or building more sophisticated representations to assist the visual analysis process. However, readability and semantic organization of data are One of the most challenging tasks in this context is to determinate aspects not so easily handled by those methods. Overlap-free an optimal layout of these primitives which turns out to be semantic preserving techniques generate somewhat structured effective and informative. Existing algorithms for building layouts layouts and keep instances with similar content close to each from geometric primitives are typically designed to cope with other. Nevertheless, they are not designed to make an efficient requirements such as orthogonal alignment, overlap removal, optimal area usage, hierarchical organization, dynamic update use of display area and also suffer from scalability. among others. However, most techniques are able to tackle just Handling many requirements is not straightforward because a few of those requirements simultaneously, impairing their distinct requirements can compete with each other during use and flexibility. In this dissertation, we propose a set of layout construction. For instance, to facilitate readability, approaches for building layouts from geometric primitives that layouts should be built with as large as possible geometric concurrently addresses a wider range of requirements. Relying on multidimensional projection and optimization formulations, entities. However, large objects easily fill up the display area, our methods arrange geometric objects in the visual space so as thus limiting the number of instances that can be visualized. to generate well-structured layouts that preserve the semantic Therefore, finding an optimal balance among multiple concur- relation among objects while still making an efficient use of rent requirements is a challenging task, which has not been display area. A comprehensive set of quantitative comparisons completely tackled by existing methods. against existing methods for layout generation and applications on text, image, and video data set visualization prove the In this dissertation we addressed this challenging problem effectiveness of our approaches1 2. by proposing new techniques for building layouts. We denom- inate to Semantic Layout Arrangement as the task of allocating I. INTRODUCTION efficiently a set of geometric instances, which summarizes Arranging geometric primitives to generate meaningful lay- a multidimensional dataset, into a fixed-size display area, outs is a major task in visualization, which inherently appears subject to preserve, as much as possible and at all times, in important applications such as word cloud constructionand the semantic relationships among instances. The arrangement visual boards.The difficulty in building layouts made up of should simultaneously play with several requirements, such dozens of geometric objects rests in the set of requirements as area usage optimization, overlap removal, object scaling, to be handled simultaneously, e.g., readability, overlaps, object orthogonal alignment and dynamic updating. size, semantic proximity and area usage. Moreover, the number In the proposed methods, semantic relationships are estab- of data instances represented as geometric entities is typically lished by a similarity measure. We use dimensionality reduc- much larger than the visualization area, demanding the use of tion techniques for embedding data from multidimensional clustering, hierarchies, and navigation resources to assist the to visual space in order to, subsequently, build a map of visualization. geometric entities. Then, we applied the proposed optimization Although significant advances have been made towards operators to rearrange entities according to the requirements building meaningful layouts from geometric primitives, exist- we deem the most relevant, thereby generating meaningful ing techniques are formulated to deal with a limited number visualizations. of requirements simultaneously, restricting their use to specific This document presents a compilation of different tech- applications. For instance, techniques such as visual boards niques for interactive and semantic layouts generation for data and small multiples provide well structured layouts which visualization. Each proposed method brings new contributions for the field with the purpose of addressing and solving 1Ph.D. thesis at ICMC-Universidade de São Paulo. specific problems involved in generating geometric semantic 2Email: [email protected], [email protected] layouts for interactive data visualization. Essentially, three A. The energy functional The energy functional E involves two components, one that considers the overlap of snippets, denoted by EO, and a second component related to the neighborhood relations Fig. 1: Pipeline to produce the neighborhood-preserving snip- resulting from the multidimensional projection step, denoted pet visualization. Snippet textual content is represented as by EN . In mathematical terms, the energy E is written as: multidimensional data points (left). The multidimensional data E = (1 − α)E + αE (1) is mapped to the visual space and snippets are embedded into O N rectangles (middle). Optimization is applied to avoid overlap where the parameter α 2 [0; 1] balances the relative contribu- while preserving neighborhoods. tions of both EO and EN in the total energy. Energy E, as well as EO and EN , are functions of the coordinates of the bottom-left corners of the rectangles embed- proposed methods (ProjSnippet [1], MIOLA [2] and Dealing ding the snippets, which initially correspond to the projected with Multiple Requirements [3]) rely on novel optimization coordinates of the multidimensional snippet vectors. We omit formulations for simultaneously dealing with requirements that the independent variables from the equations to simplify the were identified as the most relevant during our study. The last notation. method (Semantically Aware Dynamic Layouts [4]) presented The example illustrates a visualization displaying the results in this summary focuses on an application that demands of a query on the terms “jaguar features” submitted to Google’s semantic preserving layout updates during analyst’s interactive search engine. The view in Figure 2a shows the 10 best ranked data exploration. snippets shown in the first page. Figure 2b displays a ProjSnip- To our knowledge, the proposed approaches figure among pet view with the 64 best ranked snippets. Inspection discloses the most straightforward, efficient and intuitive alternatives to that the snippets on the left (cyan, red, blue, yellow) all refer explore large amounts of multidimensional data while enabling to different models of Jaguar cars, whereas the green ones on a fluent interaction with data analysts. the right refer to a surprising variety of topics, that include multiple references to the wild animal (3 snippets) and also to II. PROJSNIPPET supercomputer models named Jaguar (2 instances). There are also unique references to an earlier MacOs operating system The proposed technique comprises three steps as shown named Jaguar, to a video game, a swimming pool brand, a hair in the pipeline in Figure 1: pre-processing of search results, product brand, an aircraft model and a few other varied stuff. multidimensional projection, and optimization. In the first step Looking at the left region, one identifies that most snippets in each entry returned from a textual query is processed and its the blue cluster contain general references to the car brand, term frequency vector extracted whereas the each of the three other clusters refer mostly to a Each term frequency vector may be handled as a point in specific Jaguar model, namely most yellow snippets refer to a multidimensional space that can be mapped to the visual the XK model, cyan snippets refer to XJ and red to XF models. space with a multidimensional projection technique. Albeit our There are some noticeable exceptions, e.g., a yellow snippet current implementation adopts the Least Squares Projection refers to the XF model and a blue one refers to the XJ model. (LSP) [5] – due to its good accuracy in terms of distance Still, overall the final layout depicts a representative overview preservation and low computational cost – any projection of the search hits, as far grouping/separating similar/dissimilar technique with similar properties might be employed. The results is concerned. Notice that it is pretty difficult to handle projection preserves much of the neighborhood structure of such a variety topics and subtopics in Google’s list-based view, the original data, ensuring that similar instances are placed which indeed brings only results on cars, animals and the game close to each other in the visual space. in the first page. The following step is to embed the content of each

View Full Text

Details

  • File Type
    pdf
  • Upload Time
    -
  • Content Languages
    English
  • Upload User
    Anonymous/Not logged-in
  • File Pages
    6 Page
  • File Size
    -

Download

Channel Download Status
Express Download Enable

Copyright

We respect the copyrights and intellectual property rights of all users. All uploaded documents are either original works of the uploader or authorized works of the rightful owners.

  • Not to be reproduced or distributed without explicit permission.
  • Not used for commercial purposes outside of approved use cases.
  • Not used to infringe on the rights of the original creators.
  • If you believe any content infringes your copyright, please contact us immediately.

Support

For help with questions, suggestions, or problems, please contact us