ReVision: Automated Classification, Analysis and Redesign of Chart Images

Anonymous
Institution
Location

ABSTRACT
Poorly designed charts are prevalent in reports, magazines, books and on the Web. Yet, most of these charts are only available as bitmap images. Without access to the underlying data it is prohibitively difficult for viewers to create more effective visual representations. In response we present ReVision, a system that automatically redesigns visualizations to improve graphical perception. Given a bitmap image of a chart as input, our system applies computer vision and machine learning techniques to identify the chart type (e.g. pie chart, bar chart, scatterplot, etc.). It then extracts the graphical marks and the data encoded by each mark. Using the resulting data table ReVision applies perceptually-based design principles to populate an interactive gallery that enables users to view alternative chart designs and retarget content to different visual styles.

ACM Classification: H5.2 [Information interfaces and presentation]: User Interfaces. - Graphical user interfaces.

Keywords: Visualization, chart understanding, information extraction, redesign, computer vision.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. UIST’11, October 16-19, 2011, Santa Barbara, CA, USA. Copyright 2011 ACM 978-1-4503-0716-1/11/10...$10.00.

INTRODUCTION
Over the last 300 years, charts, graphs, and other visual depictions of data have become a primary vehicle for communicating quantitative information [27]. However, designers of visualizations must navigate a number of decisions, including choices of visual encoding and styling. These decisions influence the look-and-feel of a graphic and can have a profound effect on graphical perception [6]: the ability of viewers to decode information from a chart. Despite progress in the development of design principles for effective visualization [6, 11, 16, 19, 20, 23, 27], many charts produced today exhibit poor design choices that hamper understanding of the underlying data and lead to unaesthetic displays.

Figure 1: Chart Redesign. (a) A pie chart of NIH expenses per condition-related death. The chart suffers from random sorting, highly saturated colors, and erratic label placement. (b) Plotting the data as a sorted bar chart enables more accurate comparisons of data values [6, 19].

Consider the pie chart in Figure 1(a), which depicts data concerning the 2005 research budget of the National Institutes of Health (NIH). The design of this chart could be improved in multiple ways: slices are ordered haphazardly, labels are placed erratically, and label text competes with saturated background colors. Moreover, the chart encodes values as angular extents, which are known to result in less accurate value comparisons than position encodings [6, 16, 23]. Figure 1(b) shows the same data in a redesigned visualization: the bar chart sorts the data, uses a perceptually-motivated color palette [25], and applies a position encoding of values.

For analysts working with their own data, visual design principles [6, 11, 16, 27] and automated design methods [19, 20] can lead to more effective visualizations. However, the vast majority of visualizations are only available as bitmap images. Without access to the underlying data it is prohibitively difficult to create alternative visual representations.

We present ReVision, a system that takes bitmap images of charts as input and automatically generates redesigned visualizations as output. For example, ReVision produces Figure 1(b) as a suggested redesign when given Figure 1(a) as input. Our system identifies the type of chart, extracts the marks and underlying data, and then uses this data in tandem with a list of guidelines to provide alternate designs. ReVision also supports stylistic redesign; users can change mark types, colors or fonts to adhere to a specific visual aesthetic or brand. We make the following research contributions:

Classification of Chart Images. ReVision determines the type of chart using both low-level image features and extracted text-level features. We propose a novel set of features that achieve an average classification accuracy of 96%. Our image features are more accurate, simpler to compute, and easier to interpret than those used in prior work. While we focus on the task of chart redesign, our classification method is applicable to additional tasks such as indexing and retrieval.

Extraction of Chart Data. We present a set of chart-specific techniques for extracting the graphical marks and the data values they represent from a visualization. Our implementation focuses on bar and pie charts. We first locate the bars and pie slices that encode the data. We then introduce heuristics for associating these marks with related elements such as axes and labels. Finally, we use this information about the graphical marks to extract the table of data values underlying the visualization. Limiting the classification corpus to bar and pie charts without 3D effects or non-solid shading, we achieve an average accuracy of 64.5% in extracting the marks, and an average accuracy of 66.5% in extracting the data for charts whose marks were successfully extracted. We note that the classification corpus contains a wide diversity of charts with respect to image resolution and quality. We detail the challenges of robustly extracting data from such chart images.

We demonstrate how these contributions can be combined with existing design guidelines to automatically generate a variety of alternative visualizations. In the process of pursuing these goals, we also compiled a corpus of more than 2,500 chart images labeled by chart type. We are making this corpus publicly available in the hope of spurring continued research on computational visualization interpretation.

Figure 2: Our 10-category chart image corpus. Numbers in parentheses: (strictly 2D images + images with 3D effects). Area Graphs (90+17), Bar Graphs (363+81), Curve Plots (318), Maps (249), Pareto Charts (168), Pie Charts (210+21), Radar Plots (137), Scatter Plots (372), Tables (263), Venn Diagrams (263).

RELATED WORK
Our ReVision system builds on three areas of related work.

Classifying Visualization Images
Classification of natural scenes is a well-studied problem in computer vision [1, 3]. A common approach is to use a “bag of visual words” representation [28] where words are low-level image features (e.g., gradients, local region textures, etc.). Each input image is encoded as a feature vector using these words. Machine learning methods then classify the feature vectors and corresponding images into categories.

Researchers have developed specialized techniques for classifying chart images by type (e.g., bar chart, pie chart, etc.). Shao and Futrelle [22] first extract high-level shape types (e.g., line segments) from vectorized representations of charts and then use these shape types as features for classifying six [...] in the images is ambiguous. We are interested in image features at the level of graphical marks, as we hypothesize that they may predict the chart type more accurately.

Extracting Marks from Charts
A few researchers have investigated techniques for extracting marks from charts. Zhou and Tan [29] combine boundary tracing with the Hough transform to extract bars from bar charts. Huang et al. [17, 18] generate edge maps, vectorize the edge maps, and use rules to extract marks from bar, pie, line, and low-high charts. However, they focus on extracting the mark geometry rather than the underlying data. Their techniques also rely on an accurate edge map, which can be difficult to retrieve from many real-world charts, due to large variance in image quality. We designed our techniques to be more robust to such variance.

Automated Visualization Design
Researchers have applied graphical perception principles to automatically produce effective visualizations. Mackinlay’s APT [19] generates charts guided by rankings of visual variables for specific data types (e.g., categorical, ordinal, or quantitative data). Stolte et al.’s Polaris [24] is a system for generating small multiples [27] displays based on user queries of a multidimensional data table. Mackinlay et al. [20] later extend this work to support a broader range of automated visualization designs. These systems assist users in producing visualizations directly from data; we seek to first extract data from chart images and then redesign the visualizations to improve graphical perception. Our current work is complementary to these earlier systems: once we have extracted the data table we could feed it into any of these systems to generate improved alternative designs.

SYSTEM OVERVIEW
ReVision is comprised of a three stage pipeline: 1) chart classification, 2) mark and data extraction and 3) redesign. In stage 1, ReVision classifies an input image according to its chart type. This stage uses a corpus of test images to learn distinctive image features and train classifiers. In stage 2, ReVision locates graphical marks, associates them with text labels, and extracts a data table. In stage 3, ReVision uses the extracted data to generate a gallery of alternative designs.
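The three-stage structure described in the system overview can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `Redesign`, `redesign`, and `run_pipeline` are hypothetical, the classifier and extractor are passed in as stand-in callables, and stage 3 applies only one illustrative rule (suggest a bar chart sorted by value, in the spirit of Figure 1(b)).

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# One row of the extracted data table: (label, value).
Row = Tuple[str, float]


@dataclass
class Redesign:
    """A suggested alternative design (hypothetical structure)."""
    chart_type: str   # chart type proposed for the redesign
    rows: List[Row]   # data table in the order it should be drawn


def redesign(rows: List[Row]) -> Redesign:
    """Stage 3, reduced to a single illustrative rule:
    propose a bar chart whose rows are sorted by descending value."""
    ordered = sorted(rows, key=lambda row: row[1], reverse=True)
    return Redesign(chart_type="bar", rows=ordered)


def run_pipeline(
    image: object,
    classify: Callable[[object], str],
    extract: Callable[[object, str], List[Row]],
) -> Redesign:
    """Chain the three stages; `classify` and `extract` are assumed
    stand-ins for the classification and extraction components."""
    chart_type = classify(image)        # stage 1: chart classification
    rows = extract(image, chart_type)   # stage 2: mark and data extraction
    return redesign(rows)               # stage 3: redesign


if __name__ == "__main__":
    # Made-up stand-ins and made-up data, for illustration only.
    result = run_pipeline(
        "chart.png",
        classify=lambda img: "pie",
        extract=lambda img, kind: [("A", 5.0), ("B", 70.0), ("C", 6.0)],
    )
    print(result.chart_type, result.rows)
```

Passing the first two stages in as callables keeps the sketch independent of any particular classifier or extraction heuristic; only the data table handed from stage 2 to stage 3 is fixed.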
