why does this suck? Information

Jeffrey Heer UC Berkeley | PARC, Inc. CS160 – 2004.11.22

(includes numerous slides from Marti Hearst, Ed Chi, Stuart Card, and Peter Pirolli)

Basic Problem

We live in a new ecology.

Scientific Journals

Journals/person increases 10X every 50 years

[Figure: Journals and Journals/People (x10^6) by Year, 1750 to 2000, log scale from 0.01 to 1000000; Darwin, V. Bush, and You marked along the time axis]

Web Ecologies

[Figure: number of Servers by date, Aug-92 to Aug-98, log scale from 1 to 10000000]

1 new server every 2 seconds
7.5 new pages per second

Source: World Wide Web Consortium, Mark Gray, Netcraft Server Survey

Human Capacity

[Figure: log scale from 0.01 to 1000000 by year, 1750 to 2000; Darwin, V. Bush, and You marked along the time axis]

Attentional Processes

“What information consumes is rather obvious: it consumes the attention of its recipients. Hence a wealth of information creates a poverty of attention, and a need to allocate that attention efficiently among the overabundance of information sources that might consume it.”
~Herb Simon, as quoted by Hal Varian, Scientific American, September 1995

Human-Information Interaction

• The real design problem is not increased access to information, but greater efficiency in finding useful information.
• Increasing the rate at which people can find and use relevant information improves human intelligence.

[Figure: Amount of Accessible Knowledge plotted against Cost [Time]]

Information Visualization

• Leverage the highly developed human visual system to achieve rapid understanding of abstract information.

1.2 b/s (Reading)
2.3 b/s (Pictures)

Information Visualization

• “Transformation of the symbolic into the geometric” (McCormick et al., 1987)
• “... finding the artificial memory that best supports our natural means of perception.” (Bertin, 1983)
• The depiction of information using spatial or graphical representations, to facilitate comparison, pattern recognition, change detection, and other cognitive skills by making use of the visual system. (Hearst, 2003)

Why Visualization?

• Use the eye for pattern recognition; people are good at
  - scanning
  - recognizing
  - remembering images
• Graphical elements facilitate comparisons via
  - length
  - shape
  - orientation
  - texture
• Animation shows changes across time
• Color helps make distinctions
• Aesthetics make the process appealing

Visualization Success Stories

Visualization Success Story

Mystery: what is causing a cholera epidemic in London in 1854?

Visualization Success Story

Illustration of John Snow’s deduction that a cholera epidemic was caused by a bad water pump, circa 1854.

Horizontal lines indicate location of deaths.

From Visual Explanations by Edward Tufte, Graphics Press, 1997

A Visualization Expedition

(a tour through past and present)

Wall Starfield Displays

Slide adapted from Chris North

Film Finder Lens Distortion Techniques

Indented Hierarchy Layout

Places all items along vertically spaced rows
Uses indentation to show parent-child relationships
Breadth and depth end up fighting for space resources

Reingold-Tilford Layout
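As a rough illustration of the indented hierarchy layout described above, the sketch below gives every item its own row and derives the horizontal offset from tree depth. The `Node` class, the `indented_layout` function, and the pixel constants are assumptions made for this example, not part of any system shown in the slides.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Hypothetical tree node used only for this sketch."""
    label: str
    children: List["Node"] = field(default_factory=list)


def indented_layout(root, indent=20, row_height=16):
    """Return (label, x, y) per node: one row per item, x grows with depth."""
    placed = []

    def visit(node, depth):
        # The row index is simply the number of items placed so far,
        # so breadth and depth both consume the same vertical space.
        placed.append((node.label, depth * indent, len(placed) * row_height))
        for child in node.children:
            visit(child, depth + 1)

    visit(root, 0)
    return placed


tree = Node("root", [Node("a", [Node("a1"), Node("a2")]), Node("b")])
layout = indented_layout(tree)
for label, x, y in layout:
    print(label, x, y)
```

Because every node takes a full row, a wide or deep tree quickly runs off the screen, which is the space contention the slide points out.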

Top-down layout
Uses separate dimensions for breadth and depth

Tidier drawing of trees - Reingold, Tilford

TreeMaps

Space-filling technique that divides space recursively
Segments space according to ‘size’ of children nodes

Map of the Market – smartmoney.com

SpaceTree

Cone Trees
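Returning to the TreeMaps idea above (divide space recursively, giving each child a share proportional to its size), here is a minimal sketch of one simple variant, often called slice-and-dice, which alternates the split direction at each level. The tuple-based tree format and the function name are assumptions for this example; the systems pictured in the slides may use different layout algorithms.

```python
def slice_and_dice(node, x, y, w, h, depth=0, out=None):
    """Lay out node = (label, size, children) inside the rectangle (x, y, w, h).

    Returns a list of (label, x, y, w, h) rectangles, parents before children.
    """
    if out is None:
        out = []
    label, _size, children = node
    out.append((label, x, y, w, h))

    total = sum(child[1] for child in children)
    offset = 0.0  # fraction of the parent rectangle already handed out
    for child in children:
        share = child[1] / total if total else 0.0
        if depth % 2 == 0:
            # Even depth: cut the rectangle into side-by-side vertical strips.
            slice_and_dice(child, x + offset * w, y, share * w, h, depth + 1, out)
        else:
            # Odd depth: cut the rectangle into stacked horizontal strips.
            slice_and_dice(child, x, y + offset * h, w, share * h, depth + 1, out)
        offset += share
    return out


# Made-up example tree: sizes are arbitrary illustrative numbers.
tree = ("root", 10, [("a", 6, [("a1", 3, []), ("a2", 3, [])]), ("b", 4, [])])
rects = slice_and_dice(tree, 0.0, 0.0, 100.0, 50.0)
for rect in rects:
    print(rect)
```

Each child's area ends up proportional to its size relative to its siblings, which is what lets the viewer compare sizes across the whole hierarchy at a glance.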

Tree layout in three dimensions
Shadows provide 2D structure
Can also make “Balloon Trees” – 2D version of ConeTree

Cone tree – Robertson, Mackinlay, and Card

Degree-of-Interest Trees

Hyperbolic Trees

Network Visualization

Often uses physics models (e.g., edges as springs) to perform layout.
Can be animated and interacted with.

Network Visualization
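To make the "edges as springs" idea concrete, the following is a minimal sketch of a force-directed layout: every edge pulls its endpoints toward a rest length, and every pair of nodes repels so the drawing stays spread out. The function names, the inverse-square repulsion, and all constants are illustrative assumptions, not the method of any particular system in these slides.

```python
import math


def _distance(pos, a, b):
    return math.hypot(pos[a][0] - pos[b][0], pos[a][1] - pos[b][1])


def _push(pos, force, a, b, magnitude):
    """Add a force on a directed away from b (negative pulls a toward b),
    and the equal and opposite force on b."""
    dx = pos[a][0] - pos[b][0]
    dy = pos[a][1] - pos[b][1]
    dist = math.hypot(dx, dy) or 1e-9  # avoid division by zero
    fx, fy = magnitude * dx / dist, magnitude * dy / dist
    force[a][0] += fx
    force[a][1] += fy
    force[b][0] -= fx
    force[b][1] -= fy


def spring_layout(nodes, edges, start, rest=1.0, k_spring=0.1, k_repel=0.05, steps=500):
    """Iteratively move nodes under spring and repulsion forces.

    nodes: list of names; edges: list of (a, b); start: dict name -> (x, y).
    """
    pos = {n: [float(start[n][0]), float(start[n][1])] for n in nodes}
    for _ in range(steps):
        force = {n: [0.0, 0.0] for n in nodes}
        # Every pair of nodes repels with an inverse-square force.
        for i, a in enumerate(nodes):
            for b in nodes[i + 1:]:
                dist = _distance(pos, a, b) or 1e-9
                _push(pos, force, a, b, k_repel / (dist * dist))
        # Each edge is a spring: stretched edges pull, compressed edges push.
        for a, b in edges:
            _push(pos, force, a, b, -k_spring * (_distance(pos, a, b) - rest))
        # Move every node one small step along its net force.
        for n in nodes:
            pos[n][0] += force[n][0]
            pos[n][1] += force[n][1]
    return {n: tuple(p) for n, p in pos.items()}


ring = ["a", "b", "c", "d"]
layout = spring_layout(
    ring,
    [("a", "b"), ("b", "c"), ("c", "d"), ("d", "a")],
    {"a": (0, 0), "b": (2, 0), "c": (2, 2), "d": (0, 2)},
)
```

Because the layout is just repeated small position updates, drawing the intermediate positions each step gives the animation the slide mentions, and a dragged node can simply be pinned while the rest of the graph relaxes around it.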

Skitter, www.caida.org

WebBook

Web Forager

Document Lens

Data Mountain

Supports document organization in a 2.5 dimensional environment.

Designing Visualizations

(some tricks of the trade)

Graphical Excellence [Tufte]

• the well-designed presentation of interesting data – a matter of substance, of statistics, and of design
• consists of complex ideas communicated with clarity, precision and efficiency
• is that which gives to the viewer the greatest number of ideas in the shortest time with the least ink in the smallest space
• requires telling the truth about the data.

Interactive Tasks [Shneiderman]

1. Overview: Get an overview of the collection
2. Zoom: Zoom in on items of interest
3. Filter: Remove uninteresting items
4. Details on demand: Select items and get details
5. Relate: View relationships between items
6. History: Keep a history of actions for undo, replay, refinement
7. Extract: Make subcollections

Proposed Data Types

1. 1D: timelines,…
2. 2D: ,…
3. 3D: volumes,…
4. Multi-dimensional: databases,…
5. Hierarchies/Trees: directories,…
6. Networks/Graphs: web,…
7. Document collections: digital libraries,…

This is useful, but what’s wrong here?

Basic Types of Data

• Nominal (qualitative)
  - (no inherent order)
  - city names, types of diseases, ...
• Ordinal (qualitative)
  - (ordered, but not at measurable intervals)
  - first, second, third, …
  - cold, warm, hot
  - Mon, Tue, Wed, Thu …
• Interval (quantitative)
  - integers or reals

Ranking of Applicability of Properties for Different Data Types (Mackinlay 88, Not Empirically Verified)

QUANT             ORDINAL           NOMINAL
Position          Position          Position
Length            Density           Color Hue
Angle             Color Saturation  Texture
Slope             Color Hue         Connection
Area              Texture           Containment
Volume            Connection        Density
Density           Containment       Color Saturation
Color Saturation  Length            Shape
Color Hue         Angle             Length

Visualization Design Patterns

• Pre-Attentive Patterns
  - Leverage things that automatically “pop-out” to human attention
  - Stark contrast in color, shape, size, orientation
• Gestalt Properties
  - Use psychological theories of visual grouping
  - proximity, similarity, continuity, connectedness, closure, symmetry, common fate, figure/ground separation
• High Data Density
  - Maximize number of items/area of graphic
  - This is controversial! Whitespace may contribute to good visual design… so balance appropriately.
• Small Multiples
  - Show varying visualizations/patterns adjacent to one another
• Enable Comparisons

Visualization Design Patterns

• Focus+Context
  - Highlight regions of current interest, while de-emphasizing but keeping visible surrounding context.
  - Can visually distort space, or use degree-of-interest function to control what is and isn’t visualized.
• Dynamic Queries
  - Allow rapid refinement of visualization criteria
  - Range sliders, Query sliders
• Panning and Zooming
  - Navigate large spaces using a camera metaphor
• Semantic Zooming
  - Change content presentation based on zooming level
  - Hide/reveal additional data in accordance with available space

Software Architectures

• The Information Visualization Reference Model [Chi, Card, Mackinlay, Shneiderman]

Evaluating Visualizations

• Visualizations are user interfaces, too… established methodologies can be used.
• Questions to ask
  - What tasks do you expect people to perform with the visualization?
  - What interfaces currently exist for this task?
  - In what ways do you expect different visualizations to help or hurt aspects of these tasks?
• Metrics: task time, success rate, information gained (e.g., test the user, or exploit priming effects), eye tracking.

Evaluating Hyperbolic Trees

• The Great CHI’97 Browse-Off: Individual browsers race against the clock to perform various retrieval and comparison tasks.
• Hyperbolic Tree won against M$ File Explorer and others.
• Can we conclude that it is the better browser?

Evaluating Hyperbolic Trees

• No!
• Different people operating each browser.
• Tasks were not ecologically valid.
• Can’t say what is better for what.
• PARC researchers did extensive eye-tracking studies uncovering very nuanced visual psychology.
• Found Hyperbolic Tree is better when underlying information design (e.g., tree structure and labeling) is better.
• In case of CHI Browse Off, the Hyperbolic Tree had a quicker human user “behind the wheel”.
• Moral: Exercise judicious study design, but also don’t feel let down if task times are not being radically improved… subtleties abound.

Questions?

Jeffrey Heer [email protected] http://prefuse.sourceforge.net

Accuracy Ranking of Quantitative Perceptual Tasks

Estimated; only pairwise comparisons have been validated (Mackinlay 88 from Cleveland & McGill)

Interpretations of Visual Properties

Some properties can be discriminated more accurately but don’t have intrinsic meaning

• Density (Greyscale): Darker -> More
• Size / Length / Area: Larger -> More
• Position: Leftmost -> first, Topmost -> first
• Hue: ??? no intrinsic meaning
• Slope: ??? no intrinsic meaning

Micro-Aspects of Visualization Design

(aka fun with visual psychology)

Preattentive Processing

• A limited set of visual properties is processed preattentively
  - (without need for focusing attention).
• This is important for design of visualizations
  - what can be perceived immediately
  - what properties are good discriminators
  - what can mislead viewers

All Preattentive Processing figures from Healey 97 http://www.csc.ncsu.edu/faculty/healey/PP/PP.html

Example: Color Selection

Viewer can rapidly and accurately determine whether the target (red circle) is present or absent. Difference detected in color.

Example: Shape Selection

Viewer can rapidly and accurately determine whether the target (red circle) is present or absent. Difference detected in form (curvature).

Pre-attentive Processing

• < 200 - 250ms qualifies as pre-attentive
  - eye movements take at least 200ms
  - yet certain processing can be done very quickly, implying low-level processing in parallel
• If a decision takes a fixed amount of time regardless of the number of distractors, it is considered to be preattentive.

Example: Conjunction of Features
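The fixed-time criterion from the Pre-attentive Processing slide can be checked numerically: fit a line to response time against the number of distractors and see whether the slope is close to zero. The sketch below does that with a plain least-squares fit. The response times are made-up illustrative numbers, not measurements, and the slope threshold of 5 ms per item is an assumption chosen for the example; only the 250 ms ceiling comes from the slide. A conjunction search like the one on this slide would be expected to show response time rising with the number of distractors.

```python
def search_slope(set_sizes, response_ms):
    """Least-squares slope of response time (ms) against number of distractors."""
    n = len(set_sizes)
    mean_x = sum(set_sizes) / n
    mean_y = sum(response_ms) / n
    sxx = sum((x - mean_x) ** 2 for x in set_sizes)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(set_sizes, response_ms))
    return sxy / sxx


def looks_preattentive(set_sizes, response_ms, max_slope_ms=5.0, max_time_ms=250.0):
    """True if time stays roughly flat and under the ceiling.

    max_slope_ms is an assumed threshold for this sketch; max_time_ms
    follows the 200-250 ms figure on the slide.
    """
    slope = search_slope(set_sizes, response_ms)
    return abs(slope) <= max_slope_ms and max(response_ms) <= max_time_ms


# Hypothetical data for illustration only.
distractors = [4, 8, 16, 32]
color_popout_ms = [210, 212, 209, 215]    # flat: time barely changes
conjunction_ms = [300, 420, 650, 1100]    # grows with every added distractor

print(looks_preattentive(distractors, color_popout_ms))
print(looks_preattentive(distractors, conjunction_ms))
```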

Viewer cannot rapidly and accurately determine whether the target (red circle) is present or absent when target has two or more features, each of which is present in the distractors. Viewer must search sequentially.

Example: Emergent Features

Target has a unique feature with respect to distractors (open sides) and so the group can be detected preattentively.

Example: Emergent Features

Target does not have a unique feature with respect to distractors and so the group cannot be detected preattentively.

Asymmetric and Graded Preattentive Properties

• Some properties are asymmetric
  - a sloped line among vertical lines is preattentive
  - a vertical line among sloped ones is not
• Some properties have a gradation
  - some more easily discriminated among than others

Use Grouping of Well-Chosen Shapes for Displaying Multivariate Data

SUBJECT PUNCHED QUICKLY OXIDIZED TCEJBUS DEHCNUP YLKCIUQ DEZIDIXO
CERTAIN QUICKLY PUNCHED METHODS NIATREC YLKCIUQ DEHCNUP SDOHTEM
SCIENCE ENGLISH RECORDS COLUMNS ECNEICS HSILGNE SDROCER SNMULOC
GOVERNS PRECISE EXAMPLE MERCURY SNREVOG ESICERP ELPMAXE YRUCREM
CERTAIN QUICKLY PUNCHED METHODS NIATREC YLKCIUQ DEHCNUP SDOHTEM
GOVERNS PRECISE EXAMPLE MERCURY SNREVOG ESICERP ELPMAXE YRUCREM
SCIENCE ENGLISH RECORDS COLUMNS ECNEICS HSILGNE SDROCER SNMULOC
SUBJECT PUNCHED QUICKLY OXIDIZED TCEJBUS DEHCNUP YLKCIUQ DEZIDIXO
CERTAIN QUICKLY PUNCHED METHODS NIATREC YLKCIUQ DEHCNUP SDOHTEM
SCIENCE ENGLISH RECORDS COLUMNS ECNEICS HSILGNE SDROCER SNMULOC

Text NOT Preattentive

Preattentive Visual Properties (Healey 97)

length               Treisman & Gormican [1988]
width                Julesz [1985]
size                 Treisman & Gelade [1980]
curvature            Treisman & Gormican [1988]
number               Julesz [1985]; Trick & Pylyshyn [1994]
terminators          Julesz & Bergen [1983]
intersection         Julesz & Bergen [1983]
closure              Enns [1986]; Treisman & Souther [1985]
colour (hue)         Nagy & Sanchez [1990, 1992]; D'Zmura [1991]; Kawai et al. [1995]; Bauer et al. [1996]
intensity            Beck et al. [1983]; Treisman & Gormican [1988]
flicker              Julesz [1971]
direction of motion  Nakayama & Silverman [1986]; Driver & McLeod [1992]
binocular lustre     Wolfe & Franzel [1988]
stereoscopic depth   Nakayama & Silverman [1986]
3-D depth cues       Enns [1990]
lighting direction   Enns [1990]

Gestalt Principles

• Idea: forms or patterns transcend the stimuli used to create them.
  - Why do patterns emerge?
  - Under what circumstances?

• Principles of Pattern Recognition
  - “gestalt” is German for “pattern” or “form, configuration”
  - Original proposed mechanisms turned out to be wrong
  - Rules themselves are still useful

Gestalt Properties: Proximity

Why perceive pairs vs. triplets?

Gestalt Properties: Similarity

Slide adapted from

Gestalt Properties: Continuity

Slide adapted from Tamara Munzner

Gestalt Properties: Connectedness

Slide adapted from Tamara Munzner

Gestalt Properties: Closure

Slide adapted from Tamara Munzner

Gestalt Properties: Symmetry

Slide adapted from Tamara Munzner

Gestalt Laws of Perceptual Organization (Kaufman 74)

• Figure and Ground
  - Escher are good examples
  - Vase/Face contrast
• Subjective Contour

More Gestalt Laws

• Law of Common Fate
  - like preattentive motion property
  - move a subset of objects among similar ones and they will be perceived as a group

Colors for Labeling

• Ware recommends taking into account:
  - Distinctness
  - Unique hues
  - Component process model
  - Contrast with background
  - Color blindness
  - Number
    - Only a small number of codes can be rapidly perceived
  - Field Size
    - Small changes in color are difficult to perceive
  - Conventions

Ware’s Recommended Colors for Labeling

Red, Green, Yellow, Blue, Black, White, Pink, Cyan, Gray, Orange, Brown, Purple. The first six colors are chosen because they are the unique colors that mark the ends of the opponent color axes. The entire set corresponds to the eleven color names found to be the most common in a cross-cultural study, plus cyan (Berlin and Kay).
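As a small sketch of how the recommended list might be used in practice, the code below hands out label colors in the order given above, skips the background color so labels keep some contrast, and refuses to label more categories than there are colors, reflecting the point that only a small number of codes can be rapidly perceived. The function name and the simple "skip the background color" rule are assumptions for this example; they do not account for color blindness, field size, or conventions.

```python
# Colors in the order listed on the slide.
WARE_LABEL_COLORS = [
    "red", "green", "yellow", "blue", "black", "white",
    "pink", "cyan", "gray", "orange", "brown", "purple",
]


def assign_label_colors(categories, background=None):
    """Map each category to a color name, earliest colors in the list first.

    background: optional color name to exclude so labels do not vanish
    against it (a simplifying assumption, not a full contrast check).
    """
    palette = [c for c in WARE_LABEL_COLORS if c != background]
    if len(categories) > len(palette):
        raise ValueError(
            f"{len(categories)} categories but only {len(palette)} label colors; "
            "consider grouping categories or using another encoding"
        )
    return dict(zip(categories, palette))


labels = assign_label_colors(["cats", "dogs", "birds"], background="white")
print(labels)
```

Raising an error instead of recycling colors is a deliberate choice in this sketch: reusing a color for two categories would silently break the labeling.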