Information Visualization
Lecture 14: Evaluation
CPSC 533C, Fall 2011
UBC Computer Science
Wed, 2 November 2011

Readings Covered

- Evaluating Information Visualizations. Sheelagh Carpendale. Chapter in Information Visualization: Human-Centered Issues and Perspectives. Springer LNCS 4950, 2008, p 19-45.
- The Perceptual Scalability of Visualization. Beth Yost and Chris North. IEEE TVCG 12(5) (Proc. InfoVis 06), Sep 2006, p 837-844.
- Effectiveness of Animation in Trend Visualization. George G. Robertson, Roland Fernandez, Danyel Fisher, Bongshin Lee, and John T. Stasko. IEEE TVCG 14(6) (Proc. InfoVis 2008), p 1325-1332, 2008.
- Turning Pictures into Numbers: Extracting and Generating Information from Complex Visualizations. J. Gregory Trafton, Susan S. Kirschenbaum, Ted L. Tsui, Robert T. Miyamoto, James A. Ballas, and Paula D. Raymond. Intl Journ. Human Computer Studies 53(5), p 827-850.
- Artery Visualizations for Heart Disease Diagnosis. Michelle A. Borkin, Krzysztof Z. Gajos, Amanda Peters, Dimitrios Mitsouras, Simone Melchionna, Frank J. Rybicki, Charles L. Feldman, and Hanspeter Pfister. IEEE TVCG 17(12) (Proc. InfoVis 2011), p 2479-2488.

Further Readings

- The Challenge of Information Visualization Evaluation. Catherine Plaisant. Proc. Advanced Visual Interfaces (AVI) 2004.
- Task-Centered User Interface Design. Clayton Lewis and John Rieman. Chapters 0-5.

Evaluation, Carpendale

- thorough survey/discussion; won't summarize here

Psychophysics

- method of limits
  - find limitations of human perception
  - find threshold of performance degradation
  - staircase procedure to find threshold faster (sketch below)
- method of adjustment
  - find optimal level of stimuli by letting subjects control the level
- interference between channels
  - multi-modal studies
  - ex: MacLean 2005, Perceiving Ordinal Data Haptically Under Workload: using haptic feedback for interruption when the participants were visually (and cognitively) busy

Cognitive Psychology

- repeating simple but important tasks, and measuring reaction time or error
- Miller's 7 +/- 2 short-term memory experiments
- Fitts' Law (target selection)
- Hick's Law (decision making given n choices)

Structural Analysis

- requirement analysis, task analysis
- structured interviews
  - can be used almost anywhere, for open-ended questions and answers
- error detection methods
- rating/Likert scales
  - commonly used to solicit subjective feedback
  - ex: NASA-TLX (Task Load Index) to assess mental workload
  - measurement: "it is frustrating to use the interface" -- Strongly Disagree | Disagree | Neutral | Agree | Strongly Agree

Comparative User Studies

- hypothesis testing
- hypothesis: a precise problem statement
  - ex: participants will be faster with a coordinated overview+detail display than with an uncoordinated display or a detail-only display when the task requires reading details
- objects of comparison:
  - coordinated O+D display
  - uncoordinated O display
  - uncoordinated D display
- condition of comparison: task requires reading details
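The staircase procedure above is easy to simulate. Here is a minimal Python sketch, assuming a hypothetical `respond(intensity)` callback that returns True when the participant detects the stimulus: a simple one-up/one-down rule that steps harder after each correct response and easier after each error, then averages the reversal points as the threshold estimate. The starting value, step size, and reversal count are illustrative placeholders, not from any particular study.

```python
import random

def staircase_threshold(respond, start=1.0, step=0.1, max_reversals=8):
    """One-up/one-down staircase: converge on the intensity where
    responses flip between correct and incorrect."""
    intensity = start
    reversals = []            # intensities where response direction flipped
    last_direction = None
    while len(reversals) < max_reversals:
        correct = respond(intensity)
        direction = -1 if correct else +1   # harder if correct, easier if not
        if last_direction is not None and direction != last_direction:
            reversals.append(intensity)
        last_direction = direction
        intensity = max(0.0, intensity + direction * step)
    return sum(reversals) / len(reversals)  # threshold estimate

# toy subject: reliably detects the stimulus above intensity 0.35,
# with a 10% lucky-guess rate below it
print(staircase_threshold(lambda i: i > 0.35 or random.random() < 0.1))
```

Because the staircase spends most trials near the flip point, it needs far fewer trials than sweeping the whole intensity range with the method of limits.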

Comparative User Studies: study design (factors and levels)

- factors: independent variables
  - ex: interface, task, participant demographics
- levels: the values each factor can take
- combinatorial explosion
  - limited by length of study and number of participants
  - severe limits on number of conditions
  - possible workaround is multiple sessions

Comparative User Studies: study design (within or between?)

- within: everybody does all the conditions
  - can lead to ordering effects
  - can account for individual differences and reduce noise
  - thus can be more powerful and require fewer participants
- between: divide participants into groups
  - each group does only some conditions

Comparative User Studies: measurements (dependent variables)

- performance indicators: task completion time, error rates, mouse movement
- subjective participant feedback: satisfaction ratings, closed-ended questions, interview
- observations: behaviors, signs of frustration
- number of participants
  - depends on effect size and study design: power of experiment
  - usually good enough for HCI studies
- possible confounds?
  - learning effect: did everybody use the interfaces in a certain order?
  - if so, are people faster because they are more practiced, or because of a true interface effect?

Comparative User Studies: result analysis

- should know how to analyze the main results/hypotheses BEFORE the study
- hypothesis testing analysis (using ANOVA or t-tests), as in the sketch below
  - tests how likely observed differences between groups are due to chance alone
  - ex: a p-value of 0.05 means there is a 5% probability the difference occurred by chance
- pilots!
  - should have a good idea of the forthcoming results of the study BEFORE running actual study trials
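As a concrete illustration of the analysis named above, a minimal scipy sketch reusing the coordinated overview+detail hypothesis from the earlier slide; the completion times are fabricated placeholders purely to show the mechanics, not data from any study.

```python
from scipy import stats

# hypothetical task-completion times (seconds), one list per condition
coord_od  = [41, 38, 45, 39, 42, 40, 37, 44]   # coordinated O+D display
uncoord_o = [52, 49, 55, 50, 48, 53, 51, 54]   # uncoordinated O display
uncoord_d = [58, 61, 57, 60, 59, 63, 56, 62]   # uncoordinated D display

# one-way ANOVA: is there any difference among the three conditions?
f, p = stats.f_oneway(coord_od, uncoord_o, uncoord_d)
print(f"ANOVA: F = {f:.2f}, p = {p:.4f}")   # p < 0.05 -> reject "no difference"

# pairwise t-test for one planned comparison
t, p = stats.ttest_ind(coord_od, uncoord_o)
print(f"O+D vs O: t = {t:.2f}, p = {p:.4f}")
```

In practice, pairwise follow-ups after an ANOVA need a multiple-comparison correction such as Bonferroni, since running many tests inflates the chance of a spurious p < 0.05.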

Evaluation Throughout Design Cycle

- user/task centered design cycle
  - initial assessments
  - iterative design process
  - benchmarking
  - deployment
- The Challenge of Information Visualization Evaluation. Catherine Plaisant. Proc. AVI 2004.
- Task-Centered User Interface Design. Clayton Lewis and John Rieman. Chapters 0-5.

Initial Assessments

- what kinds of problems is the system aiming to address?
  - ex: analyze a large and complex dataset
- who are your target users?
  - ex: data analysts
- what are the tasks? what are the goals?
  - Amar/Stasko task taxonomy paper
  - ex: find trends and patterns in the data via exploratory analysis
- what are their current practices? why and how can visualization be useful?
  - ex: statistical analysis; visual spotting of trends and patterns
- talk to the users, and observe what they do: task analysis

Iterative Design Process

- does your design address the users' needs? can they use it?
- where are the usability problems?
- evaluate without users
  - cognitive walkthrough
  - action analysis
  - heuristics analysis
- evaluate with users
  - usability evaluations (think-aloud)
- identify problems, go back to previous step

Benchmarking

- how does your system compare to existing ones?
- empirical, comparative studies
- ask specific questions: compare an aspect of the system with specific tasks
- quantitative, but limited
- bottom-line measurements


Deployment

- how is the system used in the wild? how are people using it?
- does the system fit into the existing work flow? environment?
- (identify problems, go back to previous step)
- contextual studies, field studies

Compare Systems vs. Characterize Usage

- user/task centered design cycle:
  - initial assessments
  - iterative design process
  - benchmarking: head-to-head comparison
  - deployment
- understanding/characterizing techniques
  - tease apart factors
  - when and how is a technique appropriate
- line is blurry: intent

Perceptual Scalability

- what are the perceptual/cognitive limits when screen-space constraints are lifted?
  - 2 vs. 32 Mpixel display (data size also increased proportionally)
- macro/micro views perceptually scalable: no increase in task completion times when normalized to amount of data

Perceptual Scalability: design

- 2 display sizes, between-subjects
- 3 visualization designs, within-subjects (counterbalancing sketch below)
  - small multiples: bars
  - embedded graphs
  - embedded bars
- 7 tasks, within-subjects
- 42 tasks per participant: 3 vis x 7 tasks x 2 trials

[The Perceptual Scalability of Visualization. Beth Yost and Chris North. IEEE TVCG 12(5) (Proc. InfoVis 06), Sep 2006, p 837-844.]
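The slides don't say how Yost and North ordered the three within-subjects visualizations, but the standard guard against the ordering effects discussed earlier is counterbalancing. A minimal sketch of a cyclic Latin-square assignment, using the three design labels from the slide:

```python
# assign each participant a rotated presentation order of the
# three within-subjects visualization conditions
CONDITIONS = ["small multiples bars", "embedded graphs", "embedded bars"]

def rotated_order(participant_id, conditions=CONDITIONS):
    """Cyclic Latin square: each condition appears in each serial
    position equally often across every block of 3 participants."""
    k = participant_id % len(conditions)
    return conditions[k:] + conditions[:k]

for pid in range(6):
    print(pid, rotated_order(pid))
```

With only three conditions, running all 3! = 6 permutations across participants is an equally common choice, and unlike the simple rotation it also balances which condition immediately follows which.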

Embedded Visualizations

- attribute-centric instead of space-centric

Small Multiples Visualizations

- (figure from the paper)

Results

- 20x increase in data, but only 3x increase in absolute task times (i.e., time per data item dropped to roughly 3/20 = 0.15x)

Results

- significant 3-way interaction between display size, visualization, and task

[The Perceptual Scalability of Visualization. Beth Yost and Chris North. IEEE TVCG 12(5) (Proc. InfoVis 06), Sep 2006, p 837-844.]

Results

- visual encoding important on small displays
  - DS: mults sig slower than graphs on small
  - DS: mults sig slower than embedded on large
  - OS: bars sig faster than graphs for small
  - OS: no sig difference bars/graphs for large
- spatial grouping important on large displays
  - embedded sig faster + preferred over small mult
  - no bar/graph differences

Critique

- first study of macro/micro effects
- breaking new ground

Critique

- many possible followups
  - physical navigation vs. virtual navigation
  - The Effects of Peripheral Vision and Physical Navigation in Large Scale Visualization. GI 08.
  - Move to Improve: Promoting Physical Navigation to Increase User Performance with Large Displays. CHI 07.

Trends: Animation, Trails, SmallMult

- Gapminder: animated bubble + human
  - x/y position, size, color, animation
- is animation effective?
  - presentation vs. analysis
  - trend vs. transitions

[Effectiveness of Animation in Trend Visualization. Robertson et al. IEEE TVCG 14(6) (Proc. InfoVis 2008), p 1325-1332.]

Trends

- many countertrends lost in clutter

Small Multiples

- individual plots get small

Design

- 2 uses: presentation vs. analysis (between-subjects)
- 3 vis encodings: animation vs. traces vs. small mults
- 2 dataset sizes: small vs. large
- 3 encodings x 2 sizes: within-subjects
- 24 tasks per participant: 4 tasks x 3 encodings x 2 sizes (trial-list sketch below)

Results

- small multiples more accurate than animation
- animation faster for presentation, but slower for analysis than small multiples and traces
- dataset size matters (unsurprisingly)

[Effectiveness of Animation in Trend Visualization. Robertson et al. IEEE TVCG 14(6) (Proc. InfoVis 2008), p 1325-1332.]
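A minimal sketch of how the fully crossed within-subjects design above multiplies out to 24 trials; the task names t1..t4 are hypothetical stand-ins, and per-participant shuffling is just one simple way to spread ordering effects:

```python
import itertools, random

# hypothetical stand-ins for the study's factors
tasks     = ["t1", "t2", "t3", "t4"]                      # 4 tasks
encodings = ["animation", "traces", "small multiples"]    # 3 encodings
sizes     = ["small", "large"]                            # 2 dataset sizes

trials = list(itertools.product(tasks, encodings, sizes))
assert len(trials) == 24          # 4 x 3 x 2 trials per participant

random.shuffle(trials)            # vary presentation order per participant
for task, encoding, size in trials[:3]:
    print(task, encoding, size)
```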


Critique

- nice idea to investigate the Gapminder phenomenon!

Critique

- well done study

Pictures Into Numbers

- field study
- participants: professional meteorologists
  - two people: forecaster, technician
- interfaces: multiple programs used
- talk-aloud protocol
- videotaped sessions with 3 cameras

Cognitive Task Analysis

- initialize understanding of large-scale weather
- build qualitative mental model (QMM)
- verify and adjust QMM
- write the brief
- task breakdown part of paper contribution

[Turning Pictures into Numbers: Extracting and Generating Information from Complex Visualizations. Trafton et al. Intl J. Human Computer Studies 53(5), p 827-850.]

Coding Methodology

- interface
  - which interface used
  - whether picture/graph
- usage (every utterance!)
- goal
  - extract quant/qual
  - goal-oriented/opportunistic
  - integrated/unintegrated
- brief-writing
  - quant/qual
  - QMM/vis/notes

Results

- sig difference between vis used at CTA stages
  - charts to build QMM
  - images to verify/adjust QMM
  - all kinds during brief-writing
  - many others...

Critique

- video coding is a huge amount of work, but very illuminating
- untangling complex story of real tool use

Critique

- methodology of CTA construction not discussed here
  - often a bottom-up/top-down mix

[Turning Pictures into Numbers: Extracting and Generating Information from Complex Visualizations. Trafton et al. Intl J. Human Computer Studies 53(5), p 827-850.]
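The slides don't discuss it, but utterance-level coding like the scheme above is usually accompanied by an inter-coder reliability check. A minimal Cohen's kappa sketch over two hypothetical coders' labels (the label strings below are illustrative, not the paper's actual codes):

```python
from collections import Counter

def cohens_kappa(coder_a, coder_b):
    """Agreement between two coders' category labels, corrected for
    the agreement expected by chance alone."""
    assert len(coder_a) == len(coder_b)
    n = len(coder_a)
    p_observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    counts_a, counts_b = Counter(coder_a), Counter(coder_b)
    p_chance = sum(counts_a[c] * counts_b[c] for c in counts_a) / n**2
    return (p_observed - p_chance) / (1 - p_chance)

# hypothetical utterance codes from two independent coders
a = ["quant", "qual", "quant", "goal", "qual", "quant", "goal", "qual"]
b = ["quant", "qual", "qual",  "goal", "qual", "quant", "goal", "quant"]
print(f"kappa = {cohens_kappa(a, b):.2f}")   # 1.0 = perfect, 0 = chance
```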

User Study Goals

- compare systems (Untangling the Usability of Fisheye Menus, Hornbaek 2007)
- characterize methods (Sizing the Horizon, Heer 2009)
- formative feedback (Hotel Visits, Weaver 2007; Genealogical Graphs, McGuffin 2005)
- summative judgement (LiveRAC, McLachlan 2008; Genealogical Graphs, McGuffin 2005; Heer 2005)
- convince stakeholders (InfoVis Eval in Large Companies, Sedlmair 2011; Evaluation of Artery Visualizations for Heart Disease Diagnosis, Borkin 2011)

HemoVis: Round 1

- formative qualitative study with experts
- task taxonomy led to design of HemoVis

HemoVis: Round 2

- experts balk: demand familiar 3D and rainbows
- quantitative user study (comparison sketch below)
  - advanced Harvard Med School students
  - real patient data
  - 91% with 2D/diverging vs. 39% with 3D/rainbow
- medical experts now convinced to use new system

[Fig 1. Borkin et al. Artery Visualizations for Heart Disease Diagnosis. Proc InfoVis 2011.]
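A minimal sketch of how a headline accuracy gap like 91% vs. 39% can be tested as a difference of proportions; the counts below are hypothetical placeholders chosen only to match the reported rates, not the paper's actual participant or trial numbers:

```python
from scipy import stats

# hypothetical counts: correct / incorrect diagnoses per condition
table = [[91,  9],    # 2D / diverging color map: 91 of 100 correct
         [39, 61]]    # 3D / rainbow color map:   39 of 100 correct

chi2, p, dof, expected = stats.chi2_contingency(table)
print(f"chi2 = {chi2:.1f}, p = {p:.2g}")   # tiny p -> unlikely under chance
```

With trials nested inside participants, a per-trial test like this overstates the evidence; a per-participant accuracy comparison or a mixed-effects model is the more careful route.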

Credits

- significant influence from Heidi Lam guest lecture
  - http://www.cs.ubc.ca/~tmm/courses/cpsc533c-06-fall/#lect10

Readings For Next Time

- Process and Pitfalls in Writing Information Visualization Research Papers. Tamara Munzner. Chapter in Information Visualization: Human-Centered Issues and Perspectives. Andreas Kerren, John T. Stasko, Jean-Daniel Fekete, Chris North, eds. Springer LNCS 4950, p 134-153, 2008.
- Reproducible Research in Signal Processing - What, Why, and How. Patrick Vandewalle, Jelena Kovacevic, and Martin Vetterli. IEEE Signal Processing Magazine, 26(3):37-47, May 2009.