Stacked Graphs – Geometry & Aesthetics
Lee Byron & Martin Wattenberg

Abstract: In February 2008, the New York Times published an unusual chart of box office revenues for 7500 movies over 21 years. The chart was based on a similar visualization, developed by the first author, that displayed trends in music listening. This paper describes the design decisions and algorithms behind these graphics, and discusses the reaction on the Web. We suggest that this type of complex layered graph is effective for displaying large data sets to a mass audience. We provide a mathematical analysis of how this layered graph relates to traditional stacked graphs and to techniques such as ThemeRiver, showing how each method optimizes a different “energy function”. Finally, we discuss techniques for coloring and ordering the layers of such graphs. Throughout the paper, we emphasize the interplay between considerations of aesthetics and legibility.

Index Terms: Streamgraph, ThemeRiver, listening history, last.fm, aesthetics, communication-minded visualization, time series.

1 INTRODUCTION

In February 2008, The New York Times stirred up a debate. The famous newspaper is no stranger to controversy, but this time the issue was not political bias or anonymous sources: it was an unusual graph of movie ticket sales. On information design blogs, opinions of the chart ranged from “fantastic” to “unsavory.” Meanwhile, on other online forums and blogs, hundreds of people posted insights and questions spurred by the visualization.

The story of the design process and algorithms behind this engaging (and polarizing) graphic makes an illuminating case study in the role of aesthetics in visualization design. Our goal in this paper is to tell this story, while documenting and analyzing the specific geometric algorithms used in creating the visualization. We believe that both the design process and the algorithms may be of use in other contexts.

The visualization method behind the Times graphic was originally developed by the first author to visualize trends in personal music listening. Data for that visualization came from last.fm [11], a social music service that tracks the listening histories of its members. These histories, one time series per artist representing the number of “listens” per week, were shown on last.fm only via bar charts of the activity over the last week and overall top artist rankings.

Since this data was of obvious personal significance, finding a better way to display it was a natural challenge. One conventional method is a stacked graph, with each layer representing an artist’s time series. For histories with a large number of artists, however, legibility of the individual layers became a problem. Equally troublesome was the sense that this type of graph was too “statistical” and did not visually embody the rich emotional connection that listeners have with their music.

To solve these problems, the first author created a new form of stacked graph, called a Streamgraph (see frontispiece). A Streamgraph layout emphasizes legibility of individual layers, arranging the layers in a distinctively organic form. Applied to last.fm data as part of an academic project called Listening History, the Streamgraph design received a strong popular response online from both information visualization enthusiasts and music lovers. It eventually drew the attention of the New York Times, where it was used to create a printed graphic and accompanying online interactive visualization of the box office revenue for 7500 movies over a 21-year period.

In this paper we first provide a case study of the New York Times and last.fm visualizations. We pay special attention to the response on the web and the role of aesthetics in the appeal of visualizations. Second, we perform a detailed analysis of the algorithms that define these graphs. A key theme is the role of aesthetics in visualization design, and the process and trade-offs necessary to create engaging information graphics.

2 RELATED WORK

Visualizations of multiple time series date back centuries. Scholars have long recognized that despite their simplicity, time series graphs involve many subtle tradeoffs. Bertin [1] and Cleveland [5] both noted that the aspect ratio of a graph has a significant effect on the readability of slopes. Bertin also pointed out that for seeing shapes at different levels of detail, different aspect ratios might be optimal. Heer and Agrawala [9] introduced the “multi-scale banking” technique for automatically handling these compromises. In some systems interactivity has been a concern, with tools such as TimeSearcher [10] introducing elegant ways to filter many series. While not directly applicable to stacked graphs, these efforts to handle conflicting criteria for different levels of detail presage much of our work.

Graphs that show time series by using stacked layers date back at least to Playfair’s work [15]. Only recently, however, have versions been created that can scale to larger numbers of time series. The inspirational ThemeRiver system of Havre et al. [17] may be the first advance to exploit computation to enhance the power of stacked graphs. In this system, the layers represented the frequency of occurrence of certain terms or “themes” in a historical news feed.

Among the innovations in ThemeRiver were a novel technique for creating a smooth interpolation from discrete data, and a layout method in which layers were not stacked starting on the x-axis, but rather in a symmetrical shape with the x-axis at the center.

In [19] the second author introduced a highly interactive layered graph, the NameVoyager, which enabled rapid exploration of more than 6,000 data sets at once.
While the layout method of the NameVoyager was not novel (it used a standard stacked graph layout, with some level-of-detail calculations), the popular response to the applet suggested that stacked graphs have the ability to engage mass audiences.

A follow-up design to the NameVoyager, described in [20], showed hierarchical time series. That is, it used interactivity and color to display time series that were arranged into categories and subcategories. In the Many Eyes system [17], this technique was made broadly available on the web.

A final related work is the Revisionist [7] visualization of changes in source code over time. While not technically a stacked graph, the geometry is related since each line of code is represented by a curved stripe. Revisionist minimizes visual distortion by having a curved baseline that allows the visualization to roughly align identical lines of code between releases.

3 LAST.FM AND THE NEW YORK TIMES

3.1 Listening History - Last.fm

Listening History was created by the first author for a class project at Carnegie Mellon University. The six-week assignment was to collect and display a data set in an interesting and novel way. As described in the introduction, Listening History [4] visualizes trends in an individual’s music listening, as derived from data in the last.fm service. The x-axis represents time and each stripe represents an artist. The thickness of a stripe shows the number of times that songs from the artist were listened to in a given week. The color, as detailed in section 5, encodes two dimensions: the saturation is determined by the overall number of times an artist was listened to, and the hue is related to the earliest date at which one of the artist’s songs was heard.

A critical design goal for this visualization was to create a graphic that did not look scientific or mathematical, but rather felt organic and emotionally pleasing. In section 5 we will see that, ironically, achieving this goal relied on significant computation. A side effect of the algorithm is the signature asymmetry between the top and bottom curves, which gives the graph its organic shape and, as discussed later, minimizes internal distortion.

At the end of the course, a few large-scale posters, some over 12 feet long, were printed as holiday gifts. The reaction of the recipients provides evidence, if anecdotal, that the graphic succeeded in eliciting strong emotional reactions when people saw their own listening history. People often remarked on the ability to see critical life events reflected in their music listening habits. One pointed to the beginning and end of three separate relationships, and how his listening trends changed dramatically. Another noted the moment when her dog had died, and the resulting impact on the next month of listening.

Fig. 2: Films from the summer of 2007.

After Listening History was made public, there was high demand for personalized versions of these graphics by other last.fm members. In fact this demand was so strong that a number of imitators emerged, including Maya’s Extra Stats [12] and Godwin’s Last Graph [13]. Interestingly, these services and other imitators use the simpler ThemeRiver layout and a simpler color scheme.

The popularity of these imitators (Last Graph has created visualizations for more than 24,000 users) suggests the hypothesis that stacked graphs have an ability to communicate large amounts of data to the general public in an intriguing and satisfactory way.

3.2 New York Times - Box Office Revenue

The Box Office Revenue graph, created by the first author and the graphics department of the Times [2,6], highlighted the dichotomy between box office hits and Oscar nominations, discussed in the original article. The printed graphic ran vertically to best use the available space, time running top to bottom; the online version ran left to right.

To allow a quick reading of the graph, coloring was much simpler than in Listening History: a discrete palette signified ranges of overall revenue. Furthermore, stroke lines were added because of issues with print registration.
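
The two coloring approaches described in this section can be contrasted in a short sketch. It is illustrative only and is not the authors' implementation: the function names, the linear scalings, the revenue thresholds, and the palette values are all hypothetical, and the actual Listening History scheme is the one detailed in section 5. The first function maps an artist to a continuous color (hue from the first week heard, saturation from total listens); the second picks one of a few fixed colors from a film's overall revenue.

```python
import colorsys
from bisect import bisect_right


def continuous_color(total, max_total, first_week, num_weeks):
    """Continuous encoding in the spirit of Listening History.

    Assumed (hypothetical) linear scalings:
      hue        <- first week in which the artist was heard
      saturation <- the artist's total listens relative to the top artist
    Returns an (r, g, b) tuple of floats in [0, 1].
    """
    hue = first_week / num_weeks      # earlier first listen -> lower hue
    saturation = total / max_total    # more listens -> more saturated
    return colorsys.hsv_to_rgb(hue, saturation, 1.0)


# Hypothetical revenue thresholds (millions of dollars) and palette;
# the source does not state the actual ranges or colors.
THRESHOLDS = [50, 100, 200]
PALETTE = ["#f3e4c2", "#e8c37a", "#d98b3a", "#b5491f"]


def discrete_color(total_revenue):
    """Discrete encoding in the spirit of the Box Office Revenue graph:
    each film takes the palette entry for its overall revenue range."""
    return PALETTE[bisect_right(THRESHOLDS, total_revenue)]


if __name__ == "__main__":
    print(continuous_color(total=120, max_total=400, first_week=10, num_weeks=52))
    print(discrete_color(75))
```

The discrete version trades detail for speed of reading: a viewer only has to tell a handful of colors apart, which matches the stated goal of allowing a quick reading of the graph.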