why does this suck? Information

Jeffrey Heer UC Berkeley | PARC, Inc. CS160 – 2004.11.22

(includes numerous slides from Marti Hearst, Ed Chi, Stuart Card, and Peter Pirolli)

Basic Problem

Scientific Journals

Journals/person increases 10X every 50 years

[Figure: log-scale plot of Journals and Journals/People (×10^6) against Year, 1750 to 2000, with markers for Darwin, V. Bush, and You. Annotation: We live in a new ecology.]
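As a rough worked figure, assuming the 10X-per-50-years growth is a constant exponential rate (an assumption, not something the chart states), the implied annual growth factor g and doubling time T_2 are:

```latex
\[
g = 10^{1/50} \approx 1.047 \quad (\text{about } 4.7\% \text{ per year}),
\qquad
T_{2} = 50 \,\log_{10} 2 \approx 15.05 \text{ years}
\]
```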

Web Ecologies

[Figure: log-scale plot of Servers, Aug-92 to Aug-98. Annotations: 1 new server every 2 seconds; 7.5 new pages per second. Source: World Wide Web Consortium, Mark Gray, Netcraft Server Survey]

Human Capacity

[Figure: log-scale plot against Year, 1750 to 2000, with markers for Darwin, V. Bush, and You]

Attentional Processes

“What information consumes is rather obvious: it consumes the attention of its recipients. Hence a wealth of information creates a poverty of attention, and a need to allocate that attention efficiently among the overabundance of information sources that might consume it.”
~Herb Simon, as quoted by Hal Varian, Scientific American, September 1995

Human-Information Interaction

• The real design problem is not increased access to information, but greater efficiency in finding useful information.
• Increasing the rate at which people can find and use relevant information improves human intelligence.

[Figure: Amount of Accessible Knowledge plotted against Cost [Time]]

Information Visualization

• Leverage highly-developed human visual system to achieve rapid understanding of abstract information.

1.2 b/s (Reading)
2.3 b/s (Pictures)

Information Visualization

• “Transformation of the symbolic into the geometric” (McCormick et al., 1987)
• “... finding the artificial memory that best supports our natural means of perception.” (Bertin, 1983)
• The depiction of information using spatial or graphical representations, to facilitate comparison, pattern recognition, change detection, and other cognitive skills by making use of the visual system. (Hearst, 2003)

Why Visualization?

• Use the eye for pattern recognition; people good at
  • scanning
  • recognizing
  • remembering images
• Graphical elements facilitate comparisons via
  • length
  • shape
  • orientation
  • texture
• Animation shows changes across time
• Color helps make distinctions
• Aesthetics make the process appealing

Visualization Success Stories

Visualization Success Story

Mystery: what is causing a cholera epidemic in London in 1854?

Visualization Success Story

Illustration of John Snow’s deduction that a cholera epidemic was caused by a bad water pump, circa 1854. Horizontal lines indicate location of deaths.

From Visual Explanations by Edward Tufte, Graphics Press, 1997

A Visualization Expedition
(a tour through past and present)

Perspective Wall

Starfield Displays

Slide adapted from Chris North

Film Finder Lens

Distortion Techniques

Indented Hierarchy Layout

• Places all items along vertically spaced rows
• Uses indentation to show parent child relationships
• Breadth and depth end up fighting for space resources
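A minimal sketch of an indented hierarchy layout, assuming a tree given as nested (label, children) tuples. The function name and the `indent` and `row_height` parameters are illustrative, not taken from any particular toolkit. Each node gets its own row (y) in depth-first order, and its depth sets the indentation (x).

```python
def indented_layout(tree, depth=0, rows=None, indent=2, row_height=1):
    """Return a list of (label, x, y): x encodes depth, y encodes visit order."""
    if rows is None:
        rows = []
    label, children = tree
    # One row per node: the row index is simply how many nodes came before.
    rows.append((label, depth * indent, len(rows) * row_height))
    for child in children:
        indented_layout(child, depth + 1, rows, indent, row_height)
    return rows


# Hypothetical example tree.
tree = ("root", [("a", [("a1", []), ("a2", [])]), ("b", [])])
rows = indented_layout(tree)
for label, x, y in rows:
    print(" " * x + label)
```

Every node consumes a full row, so a tree with many nodes grows tall quickly, while deep trees push labels to the right. This is the competition between breadth and depth for space noted above.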

Reingold-Tilford Layout

• Top-down layout
• Uses separate dimensions for breadth and depth

tidier drawing of trees - reingold, tilford

TreeMaps

• Space-filling technique that divides space recursively
• Segments space according to ‘size’ of children nodes

Map of the Market – smartmoney.com
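The recursive space division used by TreeMaps can be sketched as follows. This is the simple slice-and-dice variant, chosen here for brevity; the slides do not say which treemap algorithm is meant. Nodes are assumed to be (name, value, children) tuples, and all names are illustrative.

```python
def size(node):
    """Size of a leaf is its value; size of an inner node is the sum of its children."""
    name, value, children = node
    return value if not children else sum(size(c) for c in children)


def treemap(node, x, y, w, h, depth=0, out=None):
    """Slice-and-dice treemap: split along x at even depths, along y at odd depths."""
    if out is None:
        out = []
    name, value, children = node
    out.append((name, x, y, w, h))
    total = size(node)
    offset = 0.0  # fraction of the parent rectangle already handed out
    for child in children:
        frac = size(child) / total if total else 0.0
        if depth % 2 == 0:
            treemap(child, x + offset * w, y, frac * w, h, depth + 1, out)
        else:
            treemap(child, x, y + offset * h, w, frac * h, depth + 1, out)
        offset += frac
    return out


# Hypothetical example: leaf sizes 1, 3, and 4 laid out in an 8 x 4 rectangle.
tree = ("root", 0, [("a", 0, [("a1", 1, []), ("a2", 3, [])]), ("b", 4, [])])
rects = treemap(tree, 0.0, 0.0, 8.0, 4.0)
for name, x, y, w, h in rects:
    print(name, x, y, w, h)
```

Each leaf ends up with an area proportional to its size, and every pixel of the root rectangle is used.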

SpaceTree

Cone Trees

• Tree layout in three dimensions
• Shadows provide 2D structure
• Can also make “Balloon Trees” – 2D version of ConeTree

cone tree – robertson, mackinlay, and card

Degree-of-Interest Trees

Hyperbolic Trees

Network Visualization

Often uses physics models (e.g., edges as springs) to perform layout. Can be animated and interacted with.
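A minimal sketch of a physics-based layout, assuming edges act as Hooke's-law springs and every pair of nodes repels with an inverse-square force. The constants, the step clamp, and the function name are illustrative choices, not taken from any specific system.

```python
import math
import random


def spring_layout(nodes, edges, iterations=200, rest_length=1.0,
                  spring_k=0.1, repulsion_k=0.1, max_step=0.5, seed=0):
    """Iteratively move nodes along the net force until the layout settles."""
    rng = random.Random(seed)
    pos = {n: [rng.uniform(-1, 1), rng.uniform(-1, 1)] for n in nodes}
    for _ in range(iterations):
        force = {n: [0.0, 0.0] for n in nodes}
        # Pairwise repulsion keeps unrelated nodes apart.
        for i, a in enumerate(nodes):
            for b in nodes[i + 1:]:
                dx, dy = pos[a][0] - pos[b][0], pos[a][1] - pos[b][1]
                d = math.hypot(dx, dy) or 1e-9
                f = repulsion_k / (d * d)
                force[a][0] += f * dx / d
                force[a][1] += f * dy / d
                force[b][0] -= f * dx / d
                force[b][1] -= f * dy / d
        # Springs pull connected nodes toward the rest length.
        for a, b in edges:
            dx, dy = pos[b][0] - pos[a][0], pos[b][1] - pos[a][1]
            d = math.hypot(dx, dy) or 1e-9
            f = spring_k * (d - rest_length)
            force[a][0] += f * dx / d
            force[a][1] += f * dy / d
            force[b][0] -= f * dx / d
            force[b][1] -= f * dy / d
        # Apply forces, clamping each step so close pairs do not explode apart.
        for n in nodes:
            fx, fy = force[n]
            step = math.hypot(fx, fy)
            scale = min(1.0, max_step / step) if step > 0 else 0.0
            pos[n][0] += fx * scale
            pos[n][1] += fy * scale
    return pos


# Hypothetical example graph: a square with one diagonal.
layout = spring_layout(["a", "b", "c", "d"],
                       [("a", "b"), ("b", "c"), ("c", "d"), ("d", "a"), ("a", "c")])
for node, (x, y) in layout.items():
    print(node, round(x, 2), round(y, 2))
```

Because the layout is computed by repeated small steps, drawing the positions after each iteration gives the animation mentioned above, and dragging a node amounts to overriding its position between iterations.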

Skitter, www.caida.org

WebBook

Web Forager

Document Lens

Data Mountain

Supports document organization in a 2.5 dimensional environment.

Graphical Excellence [Tufte]

• the well-designed presentation of interesting data – a matter of substance, of statistics, and of design
• consists of complex ideas communicated with clarity, precision and efficiency
• is that which gives to the viewer the greatest number of ideas in the shortest time with the least ink in the smallest space
• requires telling the truth about the data.

Designing Visualizations
(some tricks of the trade)

Interactive Tasks [Shneiderman]

1. Overview: Get an overview of the collection
2. Zoom: Zoom in on items of interest
3. Filter: Remove uninteresting items
4. Details on demand: Select items and get details
5. Relate: View relationships between items
6. History: Keep a history of actions for undo, replay, refinement
7. Extract: Make subcollections

Proposed Data Types

1. 1D: timelines,…
2. 2D: ,…
3. 3D: volumes,…
4. Multi-dimensional: databases,…
5. Hierarchies/Trees: directories,…
6. Networks/Graphs: web,…
7. Document collections: digital libraries,…

This is useful, but what’s wrong here?

Basic Types of Data

• Nominal (qualitative)
  • (no inherent order)
  • city names, types of diseases, ...
• Ordinal (qualitative)
  • (ordered, but not at measurable intervals)
  • first, second, third, …
  • cold, warm, hot
  • Mon, Tue, Wed, Thu …
• Interval (quantitative)
  • integers or reals

Ranking of Applicability of Properties for Different Data Types
(Mackinlay 88, Not Empirically Verified)

QUANT              ORDINAL            NOMINAL
Position           Position           Position
Length             Density            Color Hue
Angle              Color Saturation   Texture
Slope              Color Hue          Connection
Area               Texture            Containment
Volume             Connection         Density
Density            Containment        Color Saturation
Color Saturation   Length             Shape
Color Hue          Angle              Length
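One way to put the ranking to work is a greedy assignment: give each data field the highest-ranked visual property that is still free. The sketch below is an illustration, not Mackinlay's actual algorithm. The lists copy only the rows shown on the slide; the `position_slots` parameter (x and y axes) and the field names are assumptions.

```python
# Rankings as listed on the slide, most applicable first.
RANKING = {
    "quantitative": ["position", "length", "angle", "slope", "area", "volume",
                     "density", "color saturation", "color hue"],
    "ordinal": ["position", "density", "color saturation", "color hue", "texture",
                "connection", "containment", "length", "angle"],
    "nominal": ["position", "color hue", "texture", "connection", "containment",
                "density", "color saturation", "shape", "length"],
}


def assign_encodings(fields, position_slots=2):
    """Greedily map each (name, data_type) field to the best unused property."""
    used = {}
    result = {}
    for name, data_type in fields:
        for channel in RANKING[data_type]:
            # Position can be used twice (x and y); every other property once.
            capacity = position_slots if channel == "position" else 1
            if used.get(channel, 0) < capacity:
                used[channel] = used.get(channel, 0) + 1
                result[name] = channel
                break
    return result


# Hypothetical fields, most important first.
fields = [("price", "quantitative"), ("year", "ordinal"),
          ("region", "nominal"), ("rating", "ordinal")]
encodings = assign_encodings(fields)
print(encodings)
```

Field order matters: the most important fields should come first so that they claim position before it runs out.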

Visualization Design Patterns

• Pre-Attentive Patterns
  • Leverage things that automatically “pop-out” to human attention
  • Stark contrast in color, shape, size, orientation
• Gestalt Properties
  • Use psychological theories of visual grouping
  • proximity, similarity, continuity, connectedness, closure, symmetry, common fate, figure/ground separation
• High Data Density
  • Maximize number of items/area of graphic
  • This is controversial! Whitespace may contribute to good visual design… so balance appropriately.
• Small Multiples
  • Show varying visualizations/patterns adjacent to one another
• Enable Comparisons

Visualization Design Patterns

• Focus+Context
  • Highlight regions of current interest, while de-emphasizing but keeping visible surrounding context.
  • Can visually distort space, or use degree-of-interest function to control what is and isn’t visualized.
• Dynamic Queries
  • Allow rapid refinement of visualization criteria
  • Range sliders, Query sliders
• Panning and Zooming
  • Navigate large spaces using a camera metaphor
• Semantic Zooming
  • Change content presentation based on zooming level
  • Hide/reveal additional data in accordance with available space
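A degree-of-interest function for Focus+Context can be sketched for a tree, following the common fisheye formulation in which interest equals a priori importance minus distance from the focus. Here a priori importance is assumed to be negative depth, and the tree is assumed to be a child-to-parent dictionary; the node names and thresholds are illustrative.

```python
def path_to_root(node, parent):
    """List of nodes from `node` up to the root, inclusive."""
    path = [node]
    while parent[node] is not None:
        node = parent[node]
        path.append(node)
    return path


def tree_distance(a, b, parent):
    """Number of edges on the path between a and b."""
    steps_from_a = {n: i for i, n in enumerate(path_to_root(a, parent))}
    for j, n in enumerate(path_to_root(b, parent)):
        if n in steps_from_a:  # first common ancestor
            return steps_from_a[n] + j


def doi(node, focus, parent):
    """Degree of interest: a priori importance (-depth) minus distance to focus."""
    depth = len(path_to_root(node, parent)) - 1
    return -depth - tree_distance(node, focus, parent)


# Hypothetical tree, stored as child -> parent.
parent = {"root": None, "a": "root", "b": "root",
          "a1": "a", "a2": "a", "b1": "b"}
focus = "a1"
visible = [n for n in parent if doi(n, focus, parent) >= -2]
wider = [n for n in parent if doi(n, focus, parent) >= -4]
print(visible)
print(wider)
```

With the stricter threshold only the path from the root to the focus stays visible; lowering the threshold brings in siblings and other nearby context while distant nodes stay hidden.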

Software Architectures

• The Information Visualization Reference Model [Chi, Card, Mackinlay, Shneiderman]

Evaluating Visualizations

Evaluating Hyperbolic Trees

• The Great CHI’97 Browse-Off: Individual browsers race against the clock to perform various retrieval and comparison tasks.
• Hyperbolic Tree won against M$ File Explorer

Evaluating Visualizations

• Visualizations are user interfaces, too…established methodologies can be used.
• Questions to ask
  • What tasks do you expect people to perform with the visualization?
  • What int