Direct Pen Input and Hand Occlusion

by

Daniel Vogel

A thesis submitted in conformity with the requirements for the degree of Doctor of Philosophy

Department of Computer Science
University of Toronto

© Copyright 2010 Daniel Vogel

Direct Pen Input and Hand Occlusion

Daniel Vogel

Doctor of Philosophy
Department of Computer Science, University of Toronto
2010

Abstract

We investigate, model, and design interaction techniques for hand occlusion with direct pen input. Our focus on occlusion follows from a qualitative and quantitative study of direct pen usability with a conventional graphical user interface (GUI). This study reveals overarching problems relating to poor precision, ergonomics, cognitive differences, limited input, and problems resulting from occlusion. To investigate occlusion more closely, we conduct three formal experiments to examine its area and shape, its effect on performance, and compensatory postures. We find that the shape of the occluded area varies across participants with some common characteristics. Our results provide evidence that occlusion affects target selection performance, especially for continuous tasks or when the goal is initially hidden. We observe how users contort their wrist posture during a simultaneous monitoring task, and show this can increase task time. Based on these investigations, we develop a five-parameter geometric model to represent the shape of the occluded area and extend this to a user-configurable, real-time version. To evaluate our model, we introduce a novel analytic testing methodology using optimization for geometric fitting and precision-recall statistics for comparison, as well as conducting a user study. To address problems with occlusion, we introduce the notion of occlusion-aware interfaces: techniques which can use our configurable model to track currently occluded regions and then counteract potential problems and/or utilize the occluded area. As a case study, we present the Occlusion-Aware Viewer: an interaction technique which displays otherwise missed previews and status messages in a non-occluded area. Within this thesis we also present a number of methodology contributions for quantitative and qualitative study design, multi-faceted study logging using synchronized video, qualitative analysis, image-based analysis, task visualization, optimization-based analytical testing, and user interface image processing.


Acknowledgements

I feel incredibly lucky to have Ravin Balakrishnan as an advisor, whose guidance and encouragement made the completion of this dissertation possible. I was also fortunate to have an outstanding and well-rounded committee: Ron Baecker, Khai Truong, Karan Singh, and my external examiner, Brad Myers. There are many students, faculty members, and administrative staff at the University of Toronto who contributed directly and indirectly. John Hancock deserves specific mention for technical assistance; and Géry Casiez, whom I met when he was a Postdoctoral Researcher at the University of Toronto, developed an initial version of the real-time occlusion model.

There are several individuals at Mount Allison University who made it much easier for me to complete this work after I moved to Sackville, New Brunswick: Liam Keliher, Anna Sheridan-Jonah, Jeff Ollerhead, Laurie Ricker, and Ron Beattie in particular. Matthew Cudmore's assistance with programming and facilitation of experiments, in addition to video analysis, was especially valuable.

Of course, I owe a debt of gratitude to Jennifer, family, and friends who encouraged and supported me throughout.


Copyright Notices and Disclaimers

Sections of this document have appeared in publications or are forthcoming (at the time of writing). In all cases, permission has been granted by the publisher for these works to appear here. Below, the publisher’s copyright notice and/or disclaimer is given, with thesis chapter(s) and corresponding publication(s) noted.

Taylor and Francis Copyright © 2010 Taylor & Francis. Full terms and conditions of use: http://www.informaworld.com/terms-and-conditions-of-access.pdf. This article may be used for research, teaching and private study purposes. Any substantial or systematic reproduction, re-distribution, re-selling, loan or sub-licensing, systematic supply or distribution in any form to anyone is expressly forbidden. The publisher does not give any warranty express or implied or make any representation that the contents will be complete or accurate or up to date. The accuracy of any instructions, formulae and drug doses should be independently verified with primary sources. The publisher shall not be liable for any loss, actions, claims, proceedings, demand or costs or damages whatsoever or howsoever caused arising directly or indirectly in connection with or arising out of the use of this material.

portions of chapters 2 and 3

Vogel, D., and Balakrishnan, R. (forthcoming). Direct Pen Interaction with a Conventional Graphical User Interface. Human-Computer Interaction. Taylor and Francis.

Association for Computing Machinery

Copyright © 2009, 2010 by the Association for Computing Machinery, Inc. (ACM). Permission to make digital or hard copies of portions of this work for personal or classroom use is granted without fee provided that the copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page in print or the first screen in digital media. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, to republish, to post on servers, or to redistribute to lists, requires prior specific permission and/or a fee. Send written requests for republication to ACM Publications, Copyright & Permissions at the address above or fax +1 (212) 869-0481 or email [email protected]. Copyright © 2009, 2010 ACM Inc. Included here by permission.


portions of chapters 4 and 5

Vogel, D., Cudmore, M., Casiez, G., Balakrishnan, R., and Keliher, L. (2009). Hand occlusion with tablet-sized direct pen input. In Proceedings of the 27th international Conference on Human Factors in Computing Systems (Boston, MA, USA, April 04 - 09, 2009). CHI '09. ACM, New York, NY, 557-566.

portions of chapters 5 and 6

Vogel, D., and Balakrishnan, R. (2010). Occlusion-Aware Interfaces. In Proceedings of the 28th international Conference on Human Factors in Computing Systems (Atlanta, GA, USA, April 10 - 15, 2010). CHI '10. ACM, New York, NY, 263-272.


Table of Contents

1 Introduction ...... 1
1.1 Research Objectives and Overview ...... 5
1.2 Contributions ...... 8
1.3 Dissertation Outline ...... 12

2 Background Literature ...... 13
2.1 The Hand and the Pen ...... 14
2.2 The Pen as a Computer Input Device ...... 27
2.3 Pen Input Performance and Capabilities ...... 34
2.4 Pen Interaction Paradigms ...... 46
2.5 Summary ...... 50

3 Observational Study of Pen Input ...... 51
3.1 Related Work ...... 52
3.2 Study ...... 55
3.3 Analysis ...... 66
3.4 Results ...... 75
3.5 Interactions of Interest ...... 95
3.6 Discussion ...... 109
3.7 Summary ...... 116

4 Investigating Occlusion ...... 119
4.1 Related Work ...... 120
4.2 Experiment 4-1: Area and Shape ...... 126
4.3 Experiment 4-2: Performance ...... 144
4.4 Experiment 4-3: Influence on Hand and Arm Posture ...... 172
4.5 Design Implications ...... 183
4.6 Summary ...... 184

5 Modelling Occlusion ...... 187
5.1 Related Work ...... 189
5.2 Geometric Model for Occlusion Shape ...... 191
5.3 Space of Fitted Parameters and Mean Model ...... 199
5.4 User Configurable Model ...... 203
5.5 Experiment 5-1: Occlusion Model Evaluation ...... 209
5.6 Future Directions ...... 215
5.7 Summary ...... 217

6 Occlusion-Aware Interfaces ...... 221
6.1 Related Work ...... 222
6.2 Occlusion-Aware Interfaces ...... 226
6.3 Occlusion-Aware Viewer ...... 227
6.4 Experiment 6-1: Occlusion-Aware Viewer Evaluation ...... 233
6.5 Deployment Issues and Future Directions ...... 245
6.6 Other Occlusion-Aware Techniques ...... 246
6.7 Summary ...... 249

7 Conclusions ...... 251
7.1 Summary ...... 252
7.2 Assumptions and Limitations ...... 254
7.3 Future Research ...... 256
7.4 Final Word ...... 264


List of Tables

Table 2-1. Anthropomorphic measurements for hand and arm...... 18

Table 2-2. Comparisons between pen input with ...... 35

Table 2-3. Comparisons between direct pen input and indirect stylus input...... 39

Table 3-1. Ideal amount of widget and action usage in our study...... 65

Table 3-2. Ideal number of expected interactions by interaction type...... 66

Table 3-3. Wrong click errors...... 80

Table 3-4. Unintended action errors...... 81

Table 3-5. Repeated invocation, hesitation, and inefficient operation errors...... 83

Table 4-1. Linear regression values for Time from Index of Difficulty (ID)...... 171

Table 5-1. Overview of model tests...... 194

Table 5-2. Summary statistics of fitted geometric model parameters...... 201


List of Figures

Figure 1-1. Illustration of occlusion...... 4

Figure 1-2. Research path showing research problems, activities, and main results. ....7

Figure 2-1. Sensorimotor continuum of human hand function...... 14

Figure 2-2. Bones and joints of the hand...... 16

Figure 2-3. Anthropomorphic measurements...... 18

Figure 2-4. Selected hand and wrist postures...... 19

Figure 2-5. Principal range of motion for hand and wrist...... 19

Figure 2-6. Horizontal arc of grasp...... 20

Figure 2-7. Jones’s force-displacement framework for manual dexterity...... 21

Figure 2-8. Dynamic tripod pen grip illustrated by Mercator, 1540...... 23

Figure 2-9. Examples of different adult pen grips reported in the literature...... 24

Figure 2-10. Commercial ergonomic pen designs...... 27

Figure 2-11. Illustration of pen point placements in Kao et al.’s experiment...... 27

Figure 2-12. Direct input and indirect input ...... 29

Figure 2-13. Sutherland’s Sketchpad with light pen input...... 30

Figure 2-14. RAND Tablet ...... 31

Figure 2-15. Electromagnetic pen position sensor...... 32

Figure 2-16. Hardware and visual parallax...... 34

Figure 2-17. Forearm and hand postures observed by Wu & Luo...... 44

Figure 2-18. Extreme pen grips observed by Wu & Luo...... 45

Figure 2-19. Wu and Luo’s ergonomic Tablet PC pen...... 45

Figure 2-20. Pen sizes evaluated by Wu and Luo ...... 46

Figure 3-1. Experimental setup and apparatus...... 59

Figure 3-2. Study screen captures taken from initial task sequence...... 61

Figure 3-3. Screen captures of selected scenario tasks...... 62

Figure 3-4. Illustration of selected widgets...... 64


Figure 3-5. Analysis software tool...... 67

Figure 3-6. Motion capture player...... 68

Figure 3-7. Coding decision when participant makes a noticeable pause...... 72

Figure 3-8. Coding decision tree when participant attempts an action...... 73

Figure 3-9. Mean time for all constrained tasks per group...... 76

Figure 3-10. Mean non-interaction errors per group...... 77

Figure 3-11. Mean interaction errors per group...... 78

Figure 3-12. Mean interaction errors by error type...... 78

Figure 3-13. Pen participant heat map plots for taps/click and errors...... 79

Figure 3-14. Estimated interaction error rate for widget and action contexts...... 86

Figure 3-15. Average pen or mouse movement distance per minute...... 88

Figure 3-16. Proportion of movements greater than 0.25 mm per frame...... 89

Figure 3-17. Mean Euclidean distance between down and up click ...... 90

Figure 3-18. Obtrusive tooltip hover visualizations, “hover junk” ...... 91

Figure 3-19. Tablet or laptop movement per minute for all constrained tasks...... 92

Figure 3-20. Heat map plot of forearm and pen/mouse rest positions...... 93

Figure 3-21. Examples of occlusion contortion: the “hook posture.” ...... 94

Figure 3-22. Example of occluded status message when pressing save button...... 96

Figure 3-23. Button trajectory example...... 97

Figure 3-24. Scrollbar parts...... 98

Figure 3-25. Example of scrollbar occlusion causing “ramp” movement...... 100

Figure 3-26. Pen tip trajectories during scrollbar interaction...... 100

Figure 3-27. Proportion of left-to-right and right-to-left text selection directions. ...102

Figure 3-28. Pen tip (and selected wrist trajectories) during text selection...... 104

Figure 3-29. Handwriting examples...... 106

Figure 3-30. Tracing examples...... 107

Figure 3-31. Occlusion resulting from MiniBar floating palette...... 108

Figure 4-1. Brandl et al.’s occlusion area experiment...... 121


Figure 4-2. Bieber, Rahman, and Urban’s analytic study...... 122

Figure 4-3. Experimental tasks used by Forlines and Balakrishnan...... 123

Figure 4-4. Hancock and Booth’s results for direct and indirect input task time...... 125

Figure 4-5. Inkpen et al.’s left-handed users and right-aligned scrollbars...... 126

Figure 4-6. Anthropomorphic measurements...... 127

Figure 4-7. Estimated error introduced by monocular versus stereo view...... 129

Figure 4-8. Experiment apparatus...... 129

Figure 4-9. Head mounted camera...... 130

Figure 4-10. Experiment 4-1 experimental stimuli...... 131

Figure 4-11. Estimated rectification error from head-mounted camera...... 133

Figure 4-12. Image processing steps...... 134

Figure 4-13. Mean occlusion ratio...... 135

Figure 4-14. Participant size (S) vs. max occlusion ratio...... 136

Figure 4-15. Occlusion shape silhouettes for each participant...... 138

Figure 4-16. Mean occlusion silhouettes...... 139

Figure 4-17. Pixels most likely to be occluded...... 141

Figure 4-18. Video stills of observed grip styles...... 143

Figure 4-19. Left-handed participant results...... 144

Figure 4-20. Crosshair to minimize effect of occlusion...... 147

Figure 4-21. Tapping, Dragging, and Tracing tasks...... 148

Figure 4-22. Target directions and distances...... 149

Figure 4-23. Error rate by Task and Visibility...... 152

Figure 4-24. Selection time by Task and Visibility...... 153

Figure 4-25. Mean Selection Time by Distance, Direction, Task, and Visibility...... 154

Figure 4-26. Error rate by Distance, Direction for Tracing and Hidden Visibility. ..155

Figure 4-27. Overshoot errors with Tracing Task and Hidden Visibility...... 156

Figure 4-28. Illustration of other performance measures...... 157

Figure 4-29. Movement Direction Change (MDC) by Task, and Visibility...... 158


Figure 4-30. Movement Direction Change (MDC) by Distance, Direction, ...... 159

Figure 4-31. Movement Error (ME) by Task, and Visibility...... 160

Figure 4-32. Movement Error (ME) by Distance, Direction, Task, and Visibility. ...161

Figure 4-33. Out-of-Range (OOR) by Task, and Visibility...... 162

Figure 4-34. Out-of-Range (OOR) by Distance, Direction, ...... 163

Figure 4-35. Comparison with Hancock and Booth’s results...... 164

Figure 4-36. Comparison of target position and mean occlusion silhouette...... 165

Figure 4-37. Motion paths by Direction for 61.2 mm Distance...... 167

Figure 4-38. Motion paths by Direction for 102.0 mm Distance...... 168

Figure 4-39. Relationship of Time to Index of Difficulty (ID)...... 171

Figure 4-40. Simultaneous monitoring task...... 173

Figure 4-41. Simultaneous monitoring task positioning...... 175

Figure 4-42. Completion time by target box Position...... 178

Figure 4-43. Pen azimuth angle by target box Position...... 178

Figure 4-44. Mean occlusion silhouette by Position...... 179

Figure 4-45. Comparison of target box position and mean occlusion silhouette...... 180

Figure 4-46. Different occlusion contortion strategies ...... 182

Figure 4-47. Design guidelines for avoiding occluded areas...... 184

Figure 5-1. Approximating the actual occluded area with a model...... 188

Figure 5-2. Previous implicit and explicit occlusion models...... 191

Figure 5-3. Different geometric models of occlusion...... 192

Figure 5-4. Offset circle and pivoting rectangle model parameters...... 193

Figure 5-5. Illustration of precision and recall...... 195

Figure 5-6. Illustration of objective function area calculation...... 197

Figure 5-7. Precision-recall plots for bounding box and fitted geometry...... 199

Figure 5-8. Mean configuration for the geometric model...... 202

Figure 5-9. Precision-recall plot for mean model...... 203

Figure 5-10. Occlusion model user configuration steps...... 206


Figure 5-11. Vertically sliding elbow to set Θ...... 207

Figure 5-12. Precision-recall plots for analytical configurable model...... 209

Figure 5-13. Experiment 5-1 experimental stimuli...... 210

Figure 5-14. Experiment precision-recall for mean model and fitted geometry...... 213

Figure 5-15. Precision-recall plots for experimentally configured model...... 214

Figure 5-16. Summary of precision recall performance for tested models...... 217

Figure 5-17. Using occlusion model in formal experiment analysis...... 219

Figure 5-18. Using occlusion model to design an interaction technique...... 220

Figure 6-1. Occlusion-Aware Viewer technique...... 222

Figure 6-2. Vogel and Baudisch’s Shift touch screen selection technique...... 224

Figure 6-3. Simple occlusion-awareness in Apple’s iPhone...... 224

Figure 6-4. Brandl et al.’s occlusion-aware pie menu...... 225

Figure 6-5. Occlusion-Aware Viewer demonstration...... 228

Figure 6-6. Detecting importance and callout positioning...... 232

Figure 6-7. Simultaneous monitoring task...... 235

Figure 6-8. Completion times of Technique by Angle...... 240

Figure 6-9. Participant ratings...... 241

Figure 6-10. Sample task completion times and occlusion silhouettes...... 242

Figure 6-11. Ambiguity problems when feedback box is at Angle 45...... 243

Figure 6-12. Left-handed sample task completion times and occlusion silhouettes. .245

Figure 6-13. Occlusion-Aware Dragging...... 247

Figure 6-14. Occlusion-Aware Pop-Ups...... 248

Figure 6-15. Hidden Widget...... 249

Figure 7-1. Inflatable Widget...... 259

Figure 7-2. Conté manipulations for GUI interaction...... 263


List of Video Figures

Video 3-1. Time-lapse demonstration of study scenario...... 61

Video 3-2. Obtrusive tooltip hover visualization examples...... 91

Video 3-3. Occlusion contortion examples: the “hook posture.” ...... 94

Video 3-4. Button trajectory example...... 98

Video 3-5. Scrollbar trajectory examples...... 101

Video 3-6. Text selection trajectory examples...... 105

Video 4-1. Area and shape experiment demonstration...... 131

Video 4-2. Performance experiment demonstration...... 149

Video 4-3. Simultaneous monitoring demonstration...... 174

Video 5-1. Geometric model fitting demonstration...... 198

Video 5-2. Model configuration demonstration...... 206

Video 6-1. Occlusion-Aware Viewer demonstration...... 228

Video 6-2. Occlusion-Aware Viewer experiment demonstration...... 238

Video 6-3. Occlusion-Aware Dragging technique demonstration...... 247

Additional Information for Video Figures

All digital videos are encoded using the MPEG-4 H.264 codec and saved in a “.mp4” file container. The filename of each video figure is given in text below the thumbnails, with a hyperlink for viewing:

video filename, click to view


List of Appendices

A. Observational Study Scenario Script ...... 281


1 Introduction

Given our familiarity with using pens and pencils, one might expect that operating a computer using a pen would be more natural and efficient. The second generation of commercial pen input devices, such as the Tablet PC, are reasonably priced and readily available. Yet, they have failed to live up to analysts' predictions for marketplace adoption (Spooner & Foley, 2005; Stone & Vance, 2009). When the first wave of commercial pen computing devices was released in the early 1990s, marketers claimed that non-typists such as business executives would find pen input faster than using a keyboard (Bricklin, 2002). Today, this claim seems more tenuous, since users are more likely to have keyboard experience. Perhaps the problem with pen computing is entirely due to entering text without a physical keyboard? For example, the average computer user types 20 to 40 words-per-minute (wpm) (C. Karat, Halverson, Horn, & J. Karat, 1999) and 60 wpm or more if proficient (Matias, I. S. MacKenzie, & Buxton, 1996).

Contrast this to tapping a pen on a QWERTY soft keyboard (a keyboard rendered on the display), which has a predicted maximum speed of 30 wpm (Soukoreff & I. S. MacKenzie, 1995). Or, if natural handwriting recognition is used, text entry speeds can be no better than actual writing speeds, between 12 and 23 wpm for printing (Card, Moran, & Newell, 1986) or up to 30 wpm for cursive (Wiklund, Dumas, & Hoffman, 1987). When compared to typing speeds, this suggests a performance deficit for pen-based text entry. However, there exist alternative pen-based text entry techniques which could perform as fast as, or faster than, most typists. For example, users can attain speeds of 41 wpm with Zhai, Hunter, and Smith's ATOMIK optimized soft keyboard layout (2002), and Kristensson and Zhai's technique (2004) has produced speeds as high as 80 wpm. These speeds are encouraging and variants of these techniques can be installed in many devices – but, like keyboard typing, they require training and practice to master.

Regardless, even if there is some performance loss for pen-based text entry, is this the only problem? Another, perhaps less obvious, problem is that commercial pen-based devices use a graphical user interface (GUI). A GUI is built on the premise of pointing and clicking for target selection and direct manipulation. Note that the verb click is used instead of tap or touch – the typical assumption is that a mouse is used for input. The style of GUI used in the Tablet PC was designed for indirect input using a mouse, where there is a spatial separation between the input space and output display (Meyer, 1995). Thus issues specific to direct pen input, where the input and output spaces are coincident (Whitefield, 1986), have not been considered. The research community has responded with pen-specific interaction paradigms such as crossing (Accot & Zhai, 2002; Apitz & Guimbretière, 2004), gestures (Aliakseyeu, Irani, Lucero, & Subramanian, 2008; Grossman, Hinckley, Baudisch, Agrawala, & Balakrishnan, 2006; Kurtenbach & Buxton, 1991a, 1991b), pen tilting (Tian et al., 2008), pen rolling (Bi, Moscovich, Ramos, Balakrishnan, & Hinckley, 2008), and pressure (Ramos, Boulos, & Balakrishnan, 2004); pen-tailored GUI widgets (Bi et al., 2008; Fitzmaurice, Khan, Pieké, Buxton, & Kurtenbach, 2003; Guimbretière & Winograd, 2000; Hinckley et al., 2006; Ramos & Balakrishnan, 2005); and pen-specific applications (Agarawala & Balakrishnan, 2006; Bae, Balakrishnan, & Singh, 2008; Hinckley et al., 2007; Ramos & Balakrishnan, 2003; Schilit, Golovchinsky, & Price, 1998; Zeleznik, Bragdon, C. Liu, & Forsberg, 2008). While these all demonstrate ways to improve pen usability, retrofitting the vast number of existing software applications to accommodate these new paradigms is arguably not practically feasible.

Moreover, the popularity of convertible Tablet PCs, which operate in laptop or slate mode, suggests that users may prefer to switch between using a mouse and keyboard for certain working contexts (such as when seated at a desk), and using a pen for other situations (such as when standing, or seated without a flat surface on a bus or park bench) (Twining et al., 2005). Thus any pen-specific GUI refinements or pen-tailored applications should also be compatible with mouse and keyboard input.


So, if we accept that conventional GUIs are unlikely to change in the near future, are there still ways to improve pen input? The first step towards such a goal is determining what the major issues are with pen interaction and a conventional GUI. A large body of work has already investigated low-level aspects of pen performance, mostly using controlled experiments. While this type of investigation is certainly important, we agree with Ramos et al. (2006) and Briggs et al. (1993), who argue that investigating pen-based interaction with realistic tasks and applications may provide a more complete picture of actual performance. Indeed, there are examples of qualitative and observational pen research (Briggs et al., 1993; Turner, Pérez-Quiñones, & Edwards, 2007; Inkpen et al., 2006; Fitzmaurice, Balakrishnan, Kurtenbach, & Buxton, 1999; Haider, Luczak, & Rohmert, 1982). Unfortunately, these use older technologies like indirect styli with opaque tablets and light pens, or they focus on a single widget or specialized task. In this thesis, we present the results of an observational study of direct pen interaction with a realistic scenario involving popular office applications and tasks designed to exercise standard GUI components, covering typical interactions such as parameter selection, object manipulation, text selection, and ink annotation.

Based on our analysis, we believe that improvements can be made at three levels without altering the fundamental behaviour and layout of conventional GUIs: hardware, base interaction, and widget behaviour. Hardware improvements can reduce parallax and lag, increase input sensitivity, and reduce the weight of the tablet. Individual widget behaviour could be tuned for pen input without altering their initial size or appearance. Base interaction improvements focus on improving fundamental input operations such as pointing, tapping, and dragging, as well as adding enhancements which address global issues specific to direct pen input. We believe that well-designed base interaction improvements may have the greatest capability to dramatically improve direct input overall since they use current hardware technology and could potentially apply to all GUI operations and widgets.

In our observational study, we found overarching problems, many of which could be addressed at the base interaction level: poor precision when pointing or tapping; instability and fatigue due to ergonomics; cognitive differences such as pen "tapping" versus mouse "double clicking"; and frustration due to limited input capabilities such as the lack of shortcut keys. Researchers have already presented ideas which seek to address many of these problems. Many could be implemented as base interaction improvements, and many are, or could be, made compatible with conventional GUIs. Examples include: improving precision with new target selection techniques (Accot & Zhai, 2002; Ramos, Cockburn, Balakrishnan, & Beaudouin-Lafon, 2007; Ren, Yin, Zhao, & Li, 2007; Ren & Moriya, 2000); reducing ergonomic problems associated with direct pen input and reaching (Forlines, Vogel, & Balakrishnan, 2006); and pen-specific command invocation techniques to address problems with limited input (Grossman et al., 2006; Ramos & Balakrishnan, 2007).

However, we found another issue which is under-researched and which we believe can also be addressed with base interaction improvements. That issue is occlusion: when the user's hand and forearm cover portions of the display during interaction (Figure 1-1). In our observational study, we found that occlusion likely contributed to user errors, led to fatigue, and created inefficient movements.

Figure 1-1. Illustration of occlusion. If a light source was placed at a user’s eyes, the resulting shadow of the hand and forearm against the tablet would be the region of the display hidden (or “occluded”) from the user.
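To make the geometric idea in this definition concrete, the following minimal sketch (illustrative only, not part of the thesis apparatus) treats the eye as a point light source and projects a single 3-D point on the hand onto the display plane; the union of such projections over the hand and forearm is the occluded region. The coordinates, units, and function name are hypothetical.

    # Illustrative sketch of the "shadow" definition of occlusion.
    # Assumes the display lies in the z = 0 plane and that the eye and a sampled
    # point on the hand are expressed in the same 3-D frame (values are made up).

    def project_to_display(eye, hand_point):
        """Project hand_point onto the z = 0 display plane along the ray from eye."""
        ex, ey, ez = eye
        hx, hy, hz = hand_point
        if ez == hz:
            raise ValueError("ray is parallel to the display plane")
        t = ez / (ez - hz)  # ray parameter where z reaches 0
        return (ex + t * (hx - ex), ey + t * (hy - ey))

    # Example: eye roughly 450 mm above the display, a knuckle 40 mm above it.
    eye = (0.0, -300.0, 450.0)
    knuckle = (120.0, 80.0, 40.0)
    print(project_to_display(eye, knuckle))  # display point hidden behind the knuckle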

Past researchers have suggested that occlusion impedes performance in specific contexts and widgets (Brandl et al., 2009; Hancock & Booth, 2004; Inkpen et al., 2006), used occlusion as motivation for interaction techniques (Apitz & Guimbretière, 2004; Ramos & Balakrishnan, 2003; Schilit et al., 1998), and argued for its effect during experiments and usability studies (Forlines & Balakrishnan, 2008; Grossman et al., 2006; Hinckley, Baudisch, Ramos, & Guimbretière, 2005; Hinckley et al., 2007; Ramos et al., 2007). However, to date there has been no systematic study of the fundamental characteristics of occlusion or its effect on performance, nor have general techniques been developed to address it at a base interaction level. Thus, after reporting results from an initial observational study of direct pen input, we will focus on examining, modelling, and designing techniques for hand occlusion.

1.1 Research Objectives and Overview

The research objective of this thesis can be simply stated as:

Identify issues with direct pen interaction with a conventional GUI, and improve the experience by investigating, modelling, and addressing hand occlusion.

To ultimately reach this goal, we investigate a series of primary research problems, many of which build on (or are dependent on) research outcomes from previous steps. The six research problem statements are:

(a) Why is direct pen input difficult with a conventional GUI?

(b) What is the area and shape of hand occlusion?

(c) How does occlusion affect performance when tapping, dragging, or tracing?

(d) How do people compensate for occlusion?

(e) Is it possible to model the occluded area and update it in real time using only conventional pen input?

(f) Can techniques be developed for a conventional GUI to counteract occlusion?


To answer these research problems, we took the following steps (also illustrated in Figure 1-2):

1. To answer the first question (a), we conduct an observational study with realistic tasks and common software applications. Our results verify that occlusion is an aspect worth investigating and place it in context with other direct pen input issues.

2. To investigate the phenomenon of occlusion more closely, we answer questions (b), (c), and (d) by conducting three formal experiments to examine its area and shape, its effect on performance, and compensatory postures.

3. The results from inquiry (b) lead us to design a five-parameter geometric model which captures the general shape of the occluded area, and can be configured for a particular individual so that it can be updated in real time using only the current pen position and optionally, pen tilt.

4. Motivated by results from inquiries (a), (c), and (d), we design and evaluate an occlusion-aware interface technique called the Occlusion-Aware Viewer. This provides a case study for other occlusion-related base interaction techniques that would be compatible with current GUIs.


[Figure 1-2 diagram: research problem statements (a) through (f) with their corresponding research activities and primary results, organized by chapter.]

Figure 1-2. Research path showing research problems, activities, and main results. Bold text is the research problem statement; italic text is the research activity; and the final block of text is the primary contribution which leads to the next stage. Highlighted text and arrows illustrate dependencies forming the research path used in this thesis.

Note that our focus is on GUI manipulation rather than text entry – text entry is an isolated and difficult problem with a large body of existing work relating to handwriting recognition and direct input keyboard techniques – Shilman, Tan, and Simard (2006) and Zhai and Kristensson (2003) provide overviews of this literature. We see our work as complementary; improvements in pen-based text entry are needed, but our focus is on improving direct input with standard GUI manipulation.

1.2 Contributions

We make the following contributions relating to human factors, interaction design, and methodology.

Issues for Direct Pen Interaction with a Conventional GUI

To our knowledge, there has been no comprehensive qualitative, observational study of Tablet PC or direct pen interaction with realistic tasks and common GUI software applications. Our study, described in chapter 3, presents results that can help guide future pen input researchers. We found that pen participants made more errors, performed inefficient movements, and expressed frustration compared to mouse users. When examined as a whole, our quantitative and qualitative observations reveal overarching problems with direct pen input: poor precision when pointing or tapping; problems caused by hand occlusion; instability and fatigue due to ergonomics; cognitive differences between pen and mouse usage; and frustration due to limited input capabilities. We believe these to be the primary causes of non-text errors and to contribute to user frustration when using a pen with a conventional GUI. We feel that these issues can be addressed by improving hardware, base interaction, and widget behaviour without sacrificing the consistency of current GUIs and applications. Moreover, previous research has focused on issues other than occlusion, yet our results suggest that occlusion has a profound effect on the usability of direct pen input.

Characteristics of Direct Pen Occlusion

Our investigation into fundamental aspects of direct pen occlusion in chapter 4 – its size and shape relative to the pen position, how it affects performance, and the ways in which users contort their hand posture to minimize its effect – reveals new insights into the characteristics of hand occlusion and its effect on user performance.

The Area and Shape of Occlusion

In Experiment 4-1, we use a novel combination of computer vision and image processing to capture an image showing the shape of the occluded area from the perspective of our participants (we call these images occlusion silhouettes). We find that the hand and arm can occlude a large area, as much as 47% of a 12 inch display, and that the shape of the occluded portion of the display varies across participants according to anatomical size and the style of pen grip. However, there are some common features and similar grip characteristics among users. Using mean images of the occluded area, we present three basic design implications which take the occluded area into account.
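As a rough, hedged illustration of how mean images can be derived from occlusion silhouettes, the sketch below averages a stack of binary masks pixel-wise to estimate how often each display pixel is occluded; the random arrays stand in for real, registered silhouettes, and the sizes and threshold are arbitrary assumptions.

    import numpy as np

    # Hedged sketch: each silhouette is a binary mask (1 = occluded) registered
    # to the display; the random masks below stand in for real captured images.
    rng = np.random.default_rng(1)
    silhouettes = (rng.random((20, 768, 1024)) < 0.35).astype(np.float32)

    mean_silhouette = silhouettes.mean(axis=0)   # per-pixel occlusion frequency
    often_occluded = mean_silhouette > 0.5       # pixels occluded in most images
    print(mean_silhouette.shape, float(often_occluded.mean()))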

The Effect of Occlusion on Performance; Compensatory Postures

We support our qualitative findings from the observational study with further evidence from controlled experiments. The results of Experiment 4-2 suggest that occlusion has an effect on performance. Moreover, for continuous tasks in which the pen remains against the display surface, such as dragging, or when the desired target is initially hidden, the effect appears more pronounced. However, we also found that it is difficult to experimentally control for occlusion within a single direct input context. The results from Experiment 4-3 show that users contort their posture to minimize the effect of occlusion during a simultaneous monitoring task, and different users utilize different contortion strategies. Compensating for occlusion in this way reduces performance by increasing task time.

A Configurable Model of Occlusion

We show that a five-parameter geometric model can adequately represent the general shape of the occluded area examined in Experiment 4-1. We use this geometric model to design a configurable model of occlusion which can be customized for a particular individual using a four-step interactive process. Once completed, the model can be updated in real time based only on pen location and, if available, pen tilt. We evaluate our model analytically using a novel methodology, and in a user study (Experiment 5-1). Finally, we illustrate three examples showing how the model can be used by designers and researchers.
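To give a feel for how such a model can be driven in real time, the sketch below implements one plausible reading of the offset circle and pivoting rectangle geometry suggested by Figure 5-4: a circle offset from the pen position stands in for the hand, and a rectangle pivoting from the circle centre stands in for the forearm. The parameter names, default values, and screen-coordinate conventions are assumptions for illustration, not the fitted values reported in chapter 5.

    import math

    def occluded(pen_x, pen_y, query_x, query_y,
                 offset=40.0, phi=math.radians(30),    # circle offset distance and angle from the pen tip
                 radius=60.0,                          # circle radius (hand)
                 theta=math.radians(55), width=70.0,   # forearm angle and width
                 length=400.0):                        # forearm extent toward the display edge
        """Return True if (query_x, query_y) falls inside the modelled occluded area
        for a pen at (pen_x, pen_y). Screen coordinates with y increasing downward;
        right-handed user assumed; all parameter values are illustrative."""
        # Circle representing the hand, offset down-and-right from the pen position.
        cx = pen_x + offset * math.cos(phi)
        cy = pen_y + offset * math.sin(phi)
        if (query_x - cx) ** 2 + (query_y - cy) ** 2 <= radius ** 2:
            return True
        # Rectangle representing the forearm, pivoting at the circle centre along theta.
        dx, dy = query_x - cx, query_y - cy
        along = dx * math.cos(theta) + dy * math.sin(theta)     # distance along the forearm axis
        across = -dx * math.sin(theta) + dy * math.cos(theta)   # distance across the forearm axis
        return 0.0 <= along <= length and abs(across) <= width / 2.0

    print(occluded(100, 100, 150, 160))   # a point just below and right of the pen tip

In a configuration step of the kind described in chapter 5, parameters like these would be set once per user and then re-evaluated each frame from the sampled pen position and, if available, pen tilt.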

Occlusion-Aware Interface Techniques

We introduce the notion of occlusion-aware interfaces: interaction techniques which know what regions of the display are currently occluded, and use this knowledge to counteract potential problems with occlusion and/or utilize the occluded area. As a case study, we present a fully realized design for an interaction technique called the Occlusion-Aware Viewer which displays otherwise missed previews and status messages in a non-occluded area using a bubble-like callout. Based on results from a user study, the Occlusion-Aware Viewer can decrease the time of a simultaneous monitoring task by up to 23%. However, the study also revealed that techniques such as this need to carefully consider cases where the occluded area is ambiguous, or else performance will decrease. In spite of this problem, our participants rated using our technique as better than no technique. We also describe designs and ideas for other occlusion-aware interface techniques.

Methodology, Analysis, and Implementation

In addition to contributions pertaining to direct pen interaction and occlusion, this thesis contains a number of contributions for quantitative and qualitative study design, multi-faceted logging, qualitative analysis, image-based analysis, optimization-based analytical testing, and user interface image processing.

Hybrid Quantitative and Qualitative Study Design

In the observational study presented in chapter 3, we describe a study design which is a hybrid of typical controlled HCI experimental studies, usability studies, and qualitative research. We believe this enables more diverse observations involving a variety of contexts and interactions, and moves researchers closer to studying how people might perform in real settings.

Multi-Faceted Logging

In every study we conducted for this dissertation, we captured head-mounted video in addition to conventional input event logs. For the observational study in chapter 3, we also captured 3-D positions of the participant's forearm, pen, Tablet PC, and head, as well as a full-scale screen capture video. To use this extra logging data, we describe techniques for synchronizing, segmenting, and annotating – and we developed a reasonably complete software application to perform these tasks efficiently.

Qualitative Analysis

For qualitative analysis, we introduce an adapted open coding approach (Strauss & Corbin, 1998) which includes a preliminary step of identifying important events before performing actual coding. We demonstrate the utility of creating a coding decision tree to train raters and reduce coding ambiguity. We also provide a strategy to combine events identified by two different raters.

Image Based Analysis

The experiments in chapters 4, 5, and 6 utilize a novel combination of head-mounted video logging, augmented reality marker tracking, and image processing techniques to capture images of hand and arm occlusion from the point-of-view of the user. We use these images, which we call occlusion silhouettes, to visualize the mean shape of occlusion for individual participants, to compute a quantitative occlusion ratio in Experiment 4-1, for analytical tests in chapter 5, and to test whether the experimental stimuli were occluded in Experiments 4-2 and 4-3.
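For example, an occlusion ratio of the kind computed in Experiment 4-1 can be read off a single registered silhouette as the fraction of display pixels covered; the short hedged sketch below uses a placeholder mask rather than real experiment data.

    import numpy as np

    # Hedged sketch: occlusion ratio = occluded display pixels / total display pixels.
    # `silhouette` is a binary mask (1 = occluded) rectified and registered to the
    # display; the random mask is only a stand-in for a captured silhouette.
    rng = np.random.default_rng(0)
    silhouette = (rng.random((768, 1024)) < 0.3).astype(np.uint8)

    occlusion_ratio = silhouette.sum() / silhouette.size
    print(f"occlusion ratio: {occlusion_ratio:.1%}")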

Optimization-Based Analytical Testing

To analytically test our configurable model of occlusion, we created what we believe is a novel methodology using techniques taken from classification and numerical optimization.

We use mean F2 scores (Van Rijsbergen, 1979), calculated from the model's precision-recall performance for a corpus of occlusion silhouettes, to compare different versions of the model.

We demonstrate how to establish a theoretical maximum F2 score by fitting geometric parameters to a corpus of silhouettes using non-linear optimization.
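A minimal sketch of the comparison statistic, assuming the model output and a ground-truth silhouette are binary masks of the same size: precision and recall are computed per silhouette and combined into an F-beta score with beta = 2, which weights recall more heavily than precision (Van Rijsbergen, 1979). The masks below are placeholders, and the non-linear fitting step itself is omitted.

    import numpy as np

    def f2_score(model_mask, silhouette_mask, beta=2.0):
        """F-beta score between a model's predicted occluded area and a captured
        occlusion silhouette, both given as boolean masks of equal shape."""
        tp = np.logical_and(model_mask, silhouette_mask).sum()
        fp = np.logical_and(model_mask, ~silhouette_mask).sum()
        fn = np.logical_and(~model_mask, silhouette_mask).sum()
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        if precision == 0.0 and recall == 0.0:
            return 0.0
        b2 = beta ** 2
        return (1 + b2) * precision * recall / (b2 * precision + recall)

    # Placeholder masks standing in for a model prediction and a silhouette image.
    rng = np.random.default_rng(2)
    model = rng.random((480, 640)) < 0.4
    truth = rng.random((480, 640)) < 0.4
    print(round(f2_score(model, truth), 3))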

User Interface Image Processing

As part of our Occlusion-Aware Viewer implementation, we introduce what we believe is a novel application of image processing and computer vision techniques for real-time user interface analysis to enable an interaction technique. In real time, we monitor what regions of the interface are changing, and use this to recognise occluded status messages and document previews. This makes our technique compatible with current GUIs by functioning at a base interaction level.
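The general idea can be sketched with simple frame differencing over consecutive greyscale screen captures, as below; this is a hedged stand-in for the actual implementation described in chapter 6, and the threshold and frame sizes are arbitrary.

    import numpy as np

    def changed_region(prev_frame, curr_frame, threshold=10):
        """Return a mask of screen pixels that changed between two greyscale
        captures, plus the bounding box of the changed area (or None)."""
        diff = np.abs(curr_frame.astype(np.int16) - prev_frame.astype(np.int16))
        mask = diff > threshold
        if not mask.any():
            return mask, None
        ys, xs = np.nonzero(mask)
        return mask, (xs.min(), ys.min(), xs.max(), ys.max())  # left, top, right, bottom

    # Placeholder frames: a status-bar-sized region "changes" in the second frame.
    prev = np.zeros((768, 1024), dtype=np.uint8)
    curr = prev.copy()
    curr[700:730, 10:200] = 255
    mask, bbox = changed_region(prev, curr)
    print(bbox)  # a candidate region to test against the occlusion model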

1.3 Dissertation Outline

The remainder of this document is organized as follows (see also Figure 1-2):

In chapter 2, we summarize relevant background information regarding how we use our hand to grasp and manipulate a pen, and discuss pen input technologies. Then, we summarize previous research results pertaining to performance and input characteristics, and give a brief overview of pen-specific interaction techniques and applications that have been developed.

In chapter 3, we describe the methodology and results for our observational study of direct pen input.

In chapter 4, we describe three controlled experiments which investigate occlusion.

In chapter 5, we present a configurable, real-time, geometric model of the area occluded by the hand and forearm.

In chapter 6, we introduce occlusion-aware interfaces and describe and evaluate a case study technique called the Occlusion-Aware Viewer.

In chapter 7, we draw conclusions, summarize limitations, and suggest possible future work.


2 Background Literature

Using a pen (or stylus) to operate a computer is not new. Tethered light-pens and digitizer tablets have existed since at least the 1950s (Davis & Ellis, 1964; Gurley & Woodward, 1959). Pen-based personal computers have been commercially available since the early 1990s with early entrants such as Go Corporation's PenPoint, Apple's Newton, and the Palm Pilot. In fact, Microsoft released Windows for Pen Computing in 1992 (Bricklin, 2002; Meyer, 1995). It is not surprising, then, that there exists a large body of academic and commercial research seeking to understand and improve pen input for computers.

A common benefit touted by industry marketing teams and suggested by some researchers (e.g., Mark D. Gross & Do, 1996; Whitefield, 1986) is that pen computing should be more natural than other input modalities since we are already familiar with pen and paper. This is based on the simple notion that drawing on paper should be the same as drawing on a digitizer display. Thus, we begin by describing pen input from the most basic level: how we use our hand to grasp and manipulate a pen. This includes a review of hand and upper limb anatomy and mechanical capabilities, followed by an overview of prehension and grip as it relates to pen usage.

With these fundamentals in mind, we move to pen input with a computer. After an overview of pen sensing technologies and input modalities, we summarize research results pertaining to performance and input characteristics. This includes low-level performance for basic tasks such as target selection, docking (dragging), and path following; and human capabilities for controlling additional input channels such as pressure, tilt, and rotation. Next, we discuss findings examining pen interaction with specific widgets and in common usage contexts. Finally, we give an overview of pen-specific interaction techniques and applications that have been developed.

Based on our background survey, we conclude that investigating the usability of direct pen input with a conventional GUI is an important but overlooked context. Furthermore, we note that the effect of hand occlusion has been an often mentioned but under-researched aspect of direct pen input.

2.1 The Hand and the Pen

One does not need to be convinced that our hands are incredibly important for many daily functions. We use them to explore, touch, manipulate, and move objects around us (Napier, 1993; F. R. Wilson, 1999). Jones and Lederman (2006) present a conceptual framework of human hand function along a continuum from predominantly sensory to predominantly motor (Figure 2-1).

Figure 2-1. Sensorimotor continuum of human hand function, from tactile sensing and active haptic sensing to prehension and non-prehensile skilled movements. (from L. A. Jones & Lederman, 2006, fig. 1.1)

Many of these basic functions are further refined with the use of hand-held tools. For example, we can break a piece of wood in two with our bare hands, but a saw will make this much easier and more accurate. Other common tasks are nearly impossible without the assistance of a hand-held tool (Chris Baber, 2006); consider cutting metal without a hack saw or writing on paper without an ink pen.

When wielded by capable and dexterous hands, a handwriting instrument becomes an extremely flexible tool. Pens, pencils, styli, brushes, crayons, and chalk enable a wide range of expression with relatively simple technology. In most cases a keyboard may be more efficient for entering written text (Zhai, Hunter, & Barton A. Smith, 2000), but it lacks flexibility in spite of its increased technical complexity. Of course speech requires no tool at all, but it is really only effective for certain modes of communication. Using speech to describe schematics, identify specific areas of interest, or render a portrait can be challenging if not impossible.

Writing a word, or drawing a shape with a pen, presupposes the existence of a hand for support and manipulative control. How successful the hand is at manipulating the pen is partially due to human anatomical properties and capabilities such as movement range, stability, strength, and precision. In the same way, the physical properties and ergonomics of the pen, such as its size, mass, and friction, also affect manipulative performance.

Hand and Upper Limb Anatomy

It is the anatomy of the hand, as well as the arm and shoulder, which enables its positioning in space. We briefly review relevant aspects of skeletal structure, externally observable structure, and movement capability. For more detailed descriptions, the reader should consult Napier (1993, chap. 2), Jones and Lederman (2006, chap. 2), or C. L. MacKenzie and Iberall (1994, pp. 349-).

Bones

The hand comprises 27 bones: 14 phalanges in the digits, 5 metacarpals in the palm, and 8 carpals in the wrist (Figure 2-2a). The most common names for the digits are thumb, index, middle, ring, and little. The latter four are collectively referred to as the fingers. Beginning at the finger tip, each finger has 3 phalanges: the distal, middle, and proximal. The thumb has no middle phalanx¹. The proximal phalanges are connected to metacarpals, which are in turn connected to the wrist carpals.

¹ Phalanx is the singular form of phalanges.


Figure 2-2. Bones and joints of the hand: (a) bones; (b) joints. (skeleton illustration based on C. L. MacKenzie & Iberall, 1994)

The terms proximal and distal are used in anatomy to describe the relative position of body structures. Distal denotes a structure attached farther from the centre of the body and proximal nearer. To avoid ambiguity, a standard body reference position is needed – otherwise, we could point the tip of our finger at our heart and change the relative positioning. This body reference position is called the standard anatomical position. It places the arms straight and slightly away from the side of the body with the palm facing forward. This standard position comes from the practice of suspending cadavers for dissection in the eighteenth century (Napier, 1993, p. 13).

Joints and Range-of-Motion

The finger joints are the distal interphalangeal joint (DIP) and the proximal interphalangeal joint (PIP) (Figure 2-2b). Based on the standard anatomical position, the DIP is farther from the centre of the body than the PIP. These are primarily hinge joints which restrict most finger movement to bending or extending, with some small side-to-side movement. The thumb has only a single interphalangeal joint (IP). Each proximal phalanx in the digits connects to a metacarpal with the aptly named metacarpophalangeal joint (MP). For the fingers, this joint functions similarly to the interphalangeal joints, but enables much greater side-to-side movement. To assist in forming a precise oppositional grip with the thumb (such as holding a pen), this joint also has some capability for rotation, especially with the index finger. The metacarpals are connected to the wrist carpals by the carpometacarpal joints. The carpometacarpal joint for the thumb permits a very wide range of motion side-to-side and when flexing and straightening, functioning like a saddle joint. The eight carpals that make up the palm are able to move independently to varying amounts, with most movement occurring along the axis aligned with the middle digit, permitting the hand to be cupped.

Musculature

Anatomically, there are 29 muscles to control hand movement, but some of these muscles perform different functions via tendon subdivisions. The majority of muscles for the hand are actually located in the forearm, and their movements are transferred to the digits, palm, and wrist using a system of tendons. This enables the hand to have strength without bulk. For the fingers, there are two sets of primary muscles, the superficial and the deep, which divide their work between controlling the DIP joint and the PIP joint respectively. Each set of muscles is arranged in opposition, which enables joints to bend (or "flex") and extend (a flexion-extension movement), and through some ingenious routing, can also control how fingers spread apart as well. The thumb and the wrist are each controlled by three primary muscles, reflecting their greater range of motion.

Anthropometry and Range of Motion

The human hand comes in different shapes and sizes, but overall there are some consistent trends (Napier, 1993, p. 18). For western men and women, the ratio of hand breadth to hand length is remarkably consistent, with only a slight indication of women's hands being more slender. The longest digit is the middle finger, with length decreasing as digits deviate from the centre. Other digit lengths are not symmetrical: the thumb is much shorter than the little finger, and interestingly, the relative ordering of length for the index and ring finger varies by individual (although females have longer index fingers more often than males). Table 2-1 and Figure 2-3 provide selected 50th percentile anthropomorphic measurements for the hand and forearm. Additional percentiles and dimensions can be found in Pheasant and Hastlegrave (2006) or Kroemer and Grandjean (1997).


Dimension                               Men    Women
EL   elbow to fingertip length          480    435
SL   shoulder to elbow length           365    335
UL   upper limb length including hand   790    715
HL   hand length                        189    174
HB   hand breadth                        87     76

Table 2-1. Anthropomorphic measurements for hand and arm. All dimensions in millimetres, given for 50th percentile only (from Pheasant & Hastlegrave, 2006, tables 6.1 and 10.11).

Figure 2-3. Anthropomorphic measurements. Note that FL = EL - HL (based on Pheasant & Hastlegrave, 2006, figs. 2.11 and 6.1)

In spite of variation of shape, Napier (1993) argues that human hand function is universal. We already gave general characteristics for the range-of-motion of digits, but as we shall see, during pen manipulation, their primary function is to hold the pen and perform fine movements. Larger movements involve the wrist, forearm, and upper arm. Within the kinematic chain from shoulder to pen, hand movements using the wrist and elbow joints afford a large range-of-motion (Figure 2-4 and Figure 2-5).


Figure 2-4. Selected hand and wrist postures: extension, radial deviation, neutral, ulnar deviation, flexion. (based on Pheasant & Hastlegrave, 2006, fig. 6.2)

Figure 2-5. Principal range of motion for hand and wrist. Values given for 50th percentile males and females: (a) wrist extension 62°, 72°; (b) wrist flexion 68°, 72°; (c) wrist abduction 32°, 28°; (d) wrist adduction 22°, 27°; (e) forearm supination 108°, 109°; (f) forearm pronation 65°, 81°. (illustration and measurements from Kroemer & Grandjean, 1997, fig. 4.9)

The maximum distance addressable by the hand is more difficult to precisely define due to anatomical differences and joint coordination. When seated at a standard desk most individuals can grasp objects in a horizontal area bounded by a 35 – 45 cm arc (Figure 2-6) (based on 5th %tile data, Pheasant & Hastlegrave, 2006, fig. 4.6). With upper arm movement, the arc can be increased to 55 – 65 cm; and with torso movement, such as when pianists lean slightly when reaching for distant keys, this distance can be extended further.


Figure 2-6. Horizontal arc of grasp. (based on 5th %tile data, Pheasant & Hastlegrave, 2006, fig. 4.6)

Manual Dexterity

Manual dexterity tasks can be mapped in a force-displacement framework (Figure 2-7). The force typically exerted by the hand for most tasks ranges from approximately 0.1 to 100 N, and the hand can perform movements as fine as 0.1 mm (L. Jones, 1998). When writing with a pen, researchers have measured barrel grip forces as high as 7.3 N (Chau, 2006). For comparison, when manipulating a mouse, researchers report mean grip forces near 0.8 N (Visser, De Loose, De Graaff, & Dieen, 2004). The strongest fingers in the hand are the index and middle fingers (Radwin et al., 1992, as cited in Pheasant & Hastlegrave, 2006, p. 149), which oppose the thumb and prevent an object from slipping during these fine manipulations.


Figure 2-7. Jones's force-displacement framework for manual dexterity. (from L. Jones, 1998, fig. 4.1)

A considerable amount of research (especially in human-computer interaction) has focused on relatively large, forceful movements, where dexterity is primarily measured by speed and accuracy and modeled using Fitts' Law². Many of these studies are one- and two-dimensional variations of Fitts' original experiment (Fitts, 1954), and typically involve a rapid coordinated movement between fingers, wrist, and forearm over distances more than a few centimetres (Soukoreff & I. S. MacKenzie, 2004). A notable exception is Balakrishnan and I. S. MacKenzie (1997), who tested the performance of isolated limbs with motor-space distances as small as 3 mm. They found that pointing with isolated index finger movements was slower and less accurate than using a wrist or forearm individually, or when manipulating a pen. Balakrishnan and I. S. MacKenzie conclude that "... stylus³-type input devices that exploit the high bandwidth of the thumb and index finger, working in unison, is likely to yield high performance." We survey additional research investigating pen input performance below.

² Since Fitts' Law is generally accepted to be a core component of the Human-Computer Interaction literature, and since it is not the focus of this dissertation, we will refrain from providing an explanation here. The uninitiated reader should consult MacKenzie (1992) and Soukoreff and MacKenzie (2004) for detailed background, tutorials, and practical guidelines for its application.

³ The terms "pen" and "stylus" are often used interchangeably in Human-Computer Interaction literature and industry. We will adopt a more strict convention explained on page 28.

The Pen

Writing instruments have been in existence for more than five thousand years, and sticks and fingers have been used for drawing much longer. The general form of writing and drawing instruments remains largely unchanged – a cylindrical shank with a tip to leave an impression. For example, Sumerians from 3500 BC made marks in clay tablets to keep inventories of items such as food stocks (Fischer, 2001, p. 28). These standardized marks were created with a hollow reed stylus which was pushed into wet clay at different angles to create a vocabulary of symbols. The use of a stylus to leave a mark by scratching into a surface continued for thousands of years, and advanced to incorporate re-usable tablet surfaces such as wax. Modern writing instruments such as the pen and pencil enable the creation of continuous lines and marks by leaving a trail of ink or lead. It is interesting to note that in their early form, the pen and pencil were designed to serve different purposes. The pencil was primarily used for drawing lines and a nib pen for writing (Petroski, 1992). This was not due to convention, but rather because writing was calligraphic, which required different stroke widths. Today we still use specialized pen-like mark-making tools such as the cabinetmakers’ marking knife, or artists’ conté crayons and brushes. These are more accurate or more expressive than modern two-dimensional pens and pencils – in fact, Fischer (2001, p. 51) argues that a stylus marking clay is capable of a richer visual vocabulary.

Pen Grips

The interface between the hand and pen is the grip, the third stage of prehension during which manipulation of the acquired object occurs. There are many different types of grips and many different taxonomies for their categorization (C. L. MacKenzie & Iberall, 1994, provide a comprehensive discussion). Using a functional categorization, most grips incorporating thumb and finger opposition can be labelled as power grips or precision grips4 (Napier, 1956). With a power grip, the object is held immobile against the palm of the hand with fingers and thumb wrapped around. This makes the object function as a static extension of wrist and arm movements: a common example is swinging a hammer. With a precision grip, the object is held with the tips of the thumb and fingers. This enables some fine movements of the object using only the digits, but with a reduction in gripping force. When manipulating a pen, most adults use a type of precision grip, most often a variation on the dynamic tripod (C. L. MacKenzie & Iberall, 1994, p. 27). The dynamic tripod grip is named for the way in which the thumb, index, and middle finger work in opposition to support and manipulate the pen (Figure 2-8). Gripping an object for manipulation requires balancing a firm hold to keep the object stationary and securely attached to the hand, while at the same time enabling the manipulation of the object independent of the hand (Napier, 1956). In most cases, the goal is to amplify or attenuate finger movements and enable a wider variety of tool control (Elliott & Connolly, 1984).

Figure 2-8. Dynamic tripod pen grip illustrated by Mercator, 1540. (from Kao, Van Galen, & Hoosain, 1986)

Although the dynamic tripod is considered by many teachers and therapists to be the ideal pen grip (Selin, 2003, p. 4), it is not the only way in which individuals hold a pen. The greatest diversity of grips is seen with young children (Elliott & Connolly, 1984; Sassoon, 1993; Selin, 2003), but adults also employ different grips, not all of which are considered efficient (Elliott & Connolly, 1984). Common adult grip variants include the lateral tripod (Figure 2-9b) and adapted tripod (Figure 2-9c), as well as many variations on the dynamic tripod itself (Figure 2-9d,e,f). Examples of less common, and argued to be less efficient (Sassoon, 1993; Selin, 2003), adult grips include the thumb wrap (Figure 2-9g), ventral grip (Figure 2-9h), and index grip (Figure 2-9i)5.

4 There is some debate whether this is a grip per se, since the pen is not held in a single, static phase. The term precision handling has been proposed (Landsmeer, 1962). For simplicity, and to remain consistent with most of the literature, we will continue to use the term grip.

Figure 2-9. Examples of different adult pen grips reported in the literature: (a) dynamic tripod; (b) lateral tripod; (c) adapted tripod; (d) dynamic tripod variation with thumb and finger extended; (e) dynamic tripod variation with thumb flexed and finger hyperextended; (f) dynamic tripod variation with thumb hyperextended and finger flexed; (g) thumb wrap; (h) ventral grip; (i) index grip. Grips (a) through (f) are considered efficient; grips (g) through (i) are not. (based on illustrations in Greer & Lockman, 1998; Sassoon, 1993; Selin, 2003)

For the most part, research has shown that adults use a consistent grip when writing (e.g., Greer & Lockman, 1998). As we noted earlier, when using a precision grip, the held object has some freedom for independent movement. Greer and Lockman (1998) observed that adults often tilt the pen slightly to the right when making vertical lines, and towards their body when making horizontal lines. They also found individuals tilt the pen more at the beginning of a line than at the end, and the degree of tilt was independent of the drawn line’s position on the page. Other characteristics, such as the distance from pen tip to grip fulcrum, the amount of the pen barrel that extends beyond the grip, or even the grip force applied to the pen, appear to be less frequently considered. Part of the reason may be the difficulty of accurately measuring these statistics, or of establishing a baseline set of tasks for which to measure them. Chau (2006) demonstrated a pen capable of accurately measuring the grip forces applied around the pen barrel. In a small pilot experiment, he established the first accurate estimate of pen grip forces (which we discussed earlier) and found that participants held the pen with a mean grip height of 35 mm (SD 13.7).

5 For interested readers, Selin (2003) provides a comprehensive overview of pen grip research.

Writing Posture

The hand and fingers which grip the pen are positioned in space by the forearm, which is positioned by the arm, which is positioned by the shoulder, which is positioned by the upper body. Part of the reason why the dynamic tripod is advocated as a preferred grip is that it also suggests an ideal posture for the rest of the body. The flexed ring and little fingers are the hand’s connection to the writing surface, and form an arch with the elbow (Erhardt, 1994, in Selin, 2003). The wrist, forearm, and upper arm are therefore in a relaxed position (Rosenbloom and Horton, 1971, in Selin, 2003). The position of the body relative to the table has been studied by Sassoon (1993, chap. 2) using school children of different ages. She coded general posture categories comprised of individual observations relating to forearm pronation, wrist flexion and extension, and body lean direction. With the oldest, 15-year-old, right-handed group, she found that in 78% of her observations the children used a neutral, upright posture. The remaining observations were almost evenly distributed between leaning forward and left, with a few instances of leaning right. Observations with left-handed children of the same age revealed roughly mirrored results. Sassoon does not comment on whether her observations confirmed a common association between left-handedness and a flexed (or “hooked”) wrist. Enstrom (1962) conducted a large study of postural methods employed by left-handed elementary school students when writing6. He classified 15 different techniques broken into two primary groups: 6 techniques used by students who kept their hand below the writing line, and 9 techniques for students who kept their hand above the writing line (Enstrom refers to the second group as “hookers” [sic]). Based on factors comprising writing quality, speed, good posture, and lack of smearing (the students used graphite pencils), three techniques in the first, below-the-writing-line group were recommended, and only one technique in the hooking group was considered good, but with reservations. Overall, 69% of the students in the below-the-writing-line group were already using the recommended techniques, but only 20% of the students in the hooking group used the technique identified as good. No statistical breakdown is given for the groups themselves, so the frequency of extreme postures used by left-handed writers remains unclear. In three smaller observational studies, also with school children, Selin (2003) was not able to statistically prove or disprove this left-handed characteristic.

Pen Ergonomics

The majority of pens in use today have a similar cylindrical shape, with some variation in the diameter of the barrel, the addition of faceted sides, or the style of tapering at the bottom or top. The overall weight and balance can vary considerably, and different materials can be used to wrap the barrel to increase friction, or cushion against fatigue from tight grips. Some of the ergonomically inspired material changes are motivated by writer’s cramp and carpal tunnel syndrome (Harris and Hodges, 1995, in Selin, 2003). Somewhat radical departures in pen design have been patented and are available commercially (Figure 2-10). Note that therapists train patients suffering from chronic writer’s cramp to use the adapted tripod grip (Figure 2-9c) since it provides similar support to these ergonomic pen designs.

6 The study involved observing more than 1000 left-handed elementary school students over two years.


Figure 2-10. Commercial ergonomic pen designs. (a) RingPen (Gorbunov, 1995); (b) PenAgain (Roche & Ronsse, 2003); (c) Evo pen (derivative of Debbas, 1995).

More subtle improvements to pen design may also provide ergonomic benefits. Kao, Smith, and Knutson (1969) suggest that the relationship between the pen point and the shank is the most important factor for writing comfort and efficiency. They found that if a pen tip was off-centred, so that it aligned with the edge of the shank rather than having a conventional central placement (Figure 2-11), overall writing time could be reduced. They attribute this finding to the improved visibility of the pen point during manipulation, which “... slightly enhances the space-displacement of visual feedback of the writing point, as compared to the feedback from movement of the pen shank.”

Figure 2-11. Illustration of pen point placements in Kao et al.’s experiment. (a) off-centred; (b) centred. (from Kao et al., 1969, fig. 2)

2.2 The Pen as a Computer Input Device

Using a pen for computer input has long been thought to be a natural solution for interactive control. In 1962, Licklider and Clark describe how they were addressing the second “immediate problem” to enable effective human-machine communication:

Devise an electronic input-output surface on which both the operator and the computer can display, and through which they can communicate, correlated symbolic and pictorial information. ... We are employing an oscilloscope and light pen to fulfill the function ... (Licklider & Clark, 1962, p. 121)

Note that Licklider and Clark’s comments specify a single surface for input and output. Before presenting our survey of pen input technology, performance, techniques, and applications, we explore the possible device configurations of input and output space, as well as different control-to-display mappings. Licklider and Clark are specifying a direct input device, which is different from a device that uses indirect input (Figure 2-12) (Forlines et al., 2006; Whitefield, 1986). A direct input device combines the input and display together into one coincident space (most often a planar surface, given the 2-D nature of current displays). The most common control mapping is absolute: the input device is physically moved to the actual target position shown on the display. With indirect input, the input and display spaces are separated. Because of this separation, the current position specified by the input device is usually shown on the display as a cursor. Although the mapping can be absolute, it can also be relative, meaning that the input device specifies a new offset for the display cursor, rather than a unique position.

With a relative or absolute mapping, movements made by the input device can be amplified or attenuated in display space. With an absolute mapping, this is a direct result of the ratio of tablet size to display size. But with a relative mapping, this is achieved with a control-to-display transfer function. Two common functions are: multiplying movements by a constant gain factor; and multiplying movements by a dynamic gain factor based on current device velocity (Casiez, Vogel, Balakrishnan, & Cockburn, 2008). Note that direct input devices can use a relative mapping as well, but research has indicated it is only beneficial for very large displays (Forlines et al., 2006).

Unfortunately, the literature does not always make these distinctions clear. The term pen is used for both direct pen input and indirect pen input. Moreover, the term stylus is used interchangeably with pen in either context. In some cases, for indirect input, the type of mapping function is not given at all, or its parameters are not specified. For clarity and consistency in this dissertation, we will use the term stylus to refer to indirect input (Figure 2-12b), and pen for direct input (Figure 2-12a). Our use of these terms
can be justified if one considers that digital ink is being emitted directly from the pen in the direct case, and in the indirect case, the stylus leaves no ink trail. We will also explicitly note the absolute or relative mapping for stylus input. For pen input, the assumption is that the mapping is absolute unless stated otherwise.

Figure 2-12. Direct input and indirect input: (a) direct input using a pen, where the pen input space and the output display are coincident; (b, c) indirect input with a stylus on an opaque tablet or a mouse, where the input space and the output display space are separated.
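As an illustration of the two transfer function styles mentioned above, the following sketch contrasts a constant gain with a simple velocity-dependent gain applied to relative device displacements. The gain values, velocity thresholds, and linear ramp are our own illustrative assumptions, not the functions used in any cited study or operating system.

# Illustrative sketch of two control-to-display transfer functions for a
# relative mapping (values are assumptions, not those of any cited study).

def constant_gain(dx_mm: float, dy_mm: float, cg: float = 2.0):
    """Scale device displacement (mm) by a constant control-display gain."""
    return dx_mm * cg, dy_mm * cg

def velocity_gain(dx_mm: float, dy_mm: float, dt_s: float,
                  low_gain: float = 1.0, high_gain: float = 4.0,
                  v_low: float = 20.0, v_high: float = 200.0):
    """Scale displacement by a gain that grows with device velocity (mm/s).

    Gain ramps linearly from low_gain to high_gain between v_low and v_high,
    a simplified stand-in for the dynamic functions surveyed by Casiez et al.
    """
    speed = (dx_mm ** 2 + dy_mm ** 2) ** 0.5 / dt_s
    t = min(max((speed - v_low) / (v_high - v_low), 0.0), 1.0)
    gain = low_gain + t * (high_gain - low_gain)
    return dx_mm * gain, dy_mm * gain

if __name__ == "__main__":
    print(constant_gain(1.5, -0.5))          # slow or fast, same gain
    print(velocity_gain(1.5, -0.5, 0.008))   # fast movement, amplified
    print(velocity_gain(0.1, 0.0, 0.008))    # slow movement, nearly 1:1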

Technology

Early Devices

Pen and stylus input devices are perhaps the earliest forms of X-Y input to a computer, and were demonstrated years before Engelbart’s invention of the mouse in 1964 (Myers, 1998, p. 49)7. The earliest type of pen input was a tethered light pen designed in 1957 and reported in 1959 (Gurley & Woodward, 1959) at MIT’s Lincoln Laboratory (Hurst, Mahoney, Gilmore, Roberts, & Forrest, 1989)8. A light pen is a direct input device which works with a cathode-ray-tube display. It calculates the current pointing position by detecting a pulse produced by the electron gun in the display as it refreshes the image. Sutherland used this type of light pen to provide drawing and selection input for his Sketchpad system (Sutherland, 1963).

7 There are sources that claim the joystick was the first form of X-Y computer input, but we could not find a trustworthy reference to verify this claim. Regardless, a joystick is a rate control device, so specifying an X-Y position is inherently relative and indirect. However, pen input is certainly pre-dated by “light gun” input; see the next footnote.

8 The functionality of the light pen is the same as an earlier device called a “light gun”, invented by Bob Everett in 1952, also at MIT. The light gun had a large gun-like handle and trigger attached to the barrel.

Figure 2-13. Sutherland’s Sketchpad with light pen input. (image from MIT archives)

The first indirect stylus input device was the RAND tablet (Davis & Ellis, 1964)9. It was conceptually similar to a light pen, with the stylus sensing pulses emitted from the digitiser surface. Unlike the light pen, which sensed one pulse during a linear scan of the entire display, the RAND tablet generated a unique pattern of binary pulses at each X-Y location. The stylus detected the pulse pattern at the current physical location, and translated this to a unique X-Y location, creating an absolute mapping. The stylus contained a lightweight tip switch to detect when it was pressed against the tablet. The version of the RAND tablet reported by Davis and Ellis (1964) had a resolution of 1024 × 1024 within a 10.4 inch square surface. Davis and Ellis use the terms “ink” and “stroke” to describe the marks created by the stylus (in spite of it being indirect), and motivate their design as one which “maintains ‘naturalness’”.
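The idea of a unique binary pulse pattern for every X-Y location can be sketched in a few lines of code. Here we use Gray codes, in which neighbouring positions differ by only a single bit, purely as an illustrative encoding; the RAND tablet’s actual pulse generation and sensing circuitry differed in detail.

# Simplified sketch of absolute position encoding in the spirit of the RAND
# tablet: each grid coordinate is identified by a unique binary pulse pattern.
# Gray codes are used so neighbouring positions differ in only one bit; the
# real hardware details differ, so treat this purely as an illustration.

def to_gray(n: int) -> int:
    return n ^ (n >> 1)

def from_gray(g: int) -> int:
    n = 0
    while g:
        n ^= g
        g >>= 1
    return n

def encode_position(x: int, y: int, bits: int = 10) -> tuple:
    """Return the pulse patterns a stylus would sense at grid cell (x, y)."""
    mask = (1 << bits) - 1
    return to_gray(x) & mask, to_gray(y) & mask

def decode_position(pattern_x: int, pattern_y: int) -> tuple:
    """Recover the absolute X-Y location from the sensed pulse patterns."""
    return from_gray(pattern_x), from_gray(pattern_y)

if __name__ == "__main__":
    px, py = encode_position(512, 300)
    assert decode_position(px, py) == (512, 300)   # absolute mapping on a 1024 x 1024 grid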

9 There are sources which claim that the Stylator (Dimond, 1958) was the first pen input device. While it is true that the Stylator enabled stylus input to a computer, it was purpose-built to recognize only handwritten characters which were entered one at a time on a physically and electronically constrained template: it had no capacity for sensing X-Y position.


Figure 2-14. RAND Tablet. (image courtesy Computer History Museum, www.computerhistory.org/collections/accession/102630781)

In their conclusion, Davis and Ellis speculate that the RAND tablet could be adapted for direct input by using a translucent surface and a back projected display. Gallenson (1967) describes a working prototype of this configuration, which he calls a graphic tablet display. He notes that although Davis and Ellis report that users could adapt to a side-by-side indirect tablet and display context, he found people had difficulty with this configuration. However, with his direct input graphic tablet display, Gallenson suggests the effect is one of a “live piece of paper”. He writes:

The superposition of the display on the tablet surface is a natural evolution and makes the displayed feedback more meaningful to the user, as well as easier to use. (p. 693)

Current Technology

Pen input technology continued to evolve using other hardware sensing techniques such as resistive, capacitive, acoustical, electromagnetic (Meyer, 1995; J. Ward & M. Phillips, 1987), and computer vision. Most pen-based hand-held mobile devices use a resistive digitizer, which detects where the tip of the stylus is pressed against the display. However, resistive digitizers are sensitive to anything pressed against the display, so they are more practical for small displays where other objects such as the hand typically rest outside the sensing area. Capacitive digitizers were originally designed to sense finger contact, but special pens can be used as well. Of course, when used with a compatible pen, they will suffer from the same hand sensitivity problems as resistive digitizers. Acoustical sensors monitor changing characteristics of a sound pulse to calculate pen position. They are relatively simple to implement and require no dedicated surface, but interference from environmental noise and potential inconsistencies in air pressure make them less practical, especially for larger surfaces (J. Ward & M. Phillips, 1987, p. 33). Recently, computer vision techniques have been used for pen input. This has been achieved by integrating cameras into the bezel, behind the surface, or in the pen tip itself. When integrated into the pen tip, a special pattern printed on the writing surface enables very accurate absolute positioning. This pattern can be made very subtle on a transparent surface so that it may be used in conjunction with a back projected display (Leitner et al., 2009). Electromagnetic sensors are currently the most common pen input technology for Tablet PCs and medium-sized direct input devices such as Wacom Cintiq tablets. They work by sensing the characteristics of a magnetic pulse sent through a grid of conductors to a powered wire loop in the pen (or vice versa) (J. Ward & M. Phillips, 1987, p. 32). For a more detailed explanation of electromagnetic sensing, see Schomaker’s (1998) illustration and description reproduced in Figure 2-15.

Figure 2-15. Electromagnetic pen position sensor. “A controller samples the field strength emitted by the resonating tuned circuit at each line of a relatively coarse grid. Low-pass filtering of the sensed signal strength followed by differentiation yields a good position estimate on the basis of the time of zero crossing.” (illustration and description from Schomaker, 1998)


The main advantage is that only the pen is sensed, with no interference from hands or fingers. Early implementations of electromagnetic sensing required a battery-powered pen, but most current Tablet PCs use a slight variation called EMR® (Electro-Magnetic Resonance) Technology from Wacom (“Wacom EMR,” 2009). This uses the magnetic field to power a resonant circuit in the pen, which in turn returns a magnetic pulse through the wire coil to the digitizer surface (no other power is provided to the pen). Other sensors such as pen tilt and pressure also receive power from the resonant circuit and communicate their state within the pattern of return pulses. This non-contact technology also enables detection of pen movement up to 14 mm from the sensor grid (“Wacom EMR,” 2009). However, to reduce magnetic interference from ferrous materials and electronics, great care must be taken to shield the sensor grid and compensate for known irregularities (for example, when the pen is near the edge of the display).
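Schomaker’s description above (filter the per-grid-line signal strengths, differentiate, and locate the zero crossing) can be illustrated with a small numerical sketch. The grid spacing, smoothing kernel, and simulated signal values below are invented for illustration; real digitizer firmware is considerably more involved.

# Numerical sketch of the position-estimation idea described above: smooth the
# per-grid-line signal strengths, differentiate, and interpolate where the
# derivative crosses zero (the peak). Grid spacing and the signal model are
# invented for illustration.

def estimate_pen_position(strengths, line_spacing_mm=5.0, kernel=(0.25, 0.5, 0.25)):
    """Estimate pen position (mm) from signal strength sampled at each grid line."""
    # Low-pass filter with a small smoothing kernel (edge samples copied unchanged).
    s = list(strengths)
    smooth = s[:1] + [
        kernel[0] * s[i - 1] + kernel[1] * s[i] + kernel[2] * s[i + 1]
        for i in range(1, len(s) - 1)
    ] + s[-1:]
    # Differentiate between successive grid lines.
    diff = [smooth[i + 1] - smooth[i] for i in range(len(smooth) - 1)]
    # Find where the derivative crosses from positive to negative (the peak)
    # and linearly interpolate the zero crossing for sub-line resolution.
    for i in range(len(diff) - 1):
        if diff[i] > 0 >= diff[i + 1]:
            frac = diff[i] / (diff[i] - diff[i + 1])
            return (i + 0.5 + frac) * line_spacing_mm
    return None

if __name__ == "__main__":
    # Simulated strengths for a pen hovering between the fourth and fifth grid lines.
    samples = [0.1, 0.3, 0.7, 1.6, 1.9, 1.0, 0.4, 0.1]
    print(estimate_pen_position(samples))  # roughly 18-19 mm from the first line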

Digitizer Problems

Ward and Phillips (1987) and Meyer (1995) survey potential problems with digitizer technology. Many of the problems described by Ward and Phillips appear to be corrected in modern digitizers such as Wacom’s, but four problems seem to persist. There are two types of parallax errors (Figure 2-16). Hardware parallax errors are caused by the divergence of the pen coil from the pen tip contact against the display glass. Visual parallax errors are caused by the thickness of the glass, which makes the rendered position diverge from the pen tip. Two other common problems are eccentricity, when rotating the pen barrel perturbs the sensed position, and magnetic field effects caused by ferrous hand jewellery or poor shielding near the bezel. A final problem noted by Ward and Phillips (1987, p. 43) is when large pen tips obscure (or occlude) what the user is drawing.


[Figure 2-16 diagram labels: user’s eye; pen and coil; glass (4 mm); LCD display; digitiser sensor]

Figure 2-16. Hardware and visual parallax. Hardware parallax: as the pen is tilted, the sensed position of the pen coil in the digitiser sensor (red dot) diverges from the tip contact point on the glass display surface (black dot). Visual parallax: as the user’s viewing angle diverges, the visual position of the cursor rendered by the LCD display (green dot) diverges from the tip contact point (black dot). Many digitizers attempt to compensate for parallax with user calibration (shown with adjusted position of green dots in LCD display), but this rarely works for all pen tilt angles.
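The two parallax effects in Figure 2-16 reduce to simple trigonometry, sketched below. The 4 mm glass thickness follows the figure, but the coil offset, pen tilt, and viewing angles are assumptions, and refraction in the glass is ignored.

# Simple trigonometric sketch of the two parallax offsets in Figure 2-16.
# The 4 mm glass thickness follows the figure; the coil offset, tilt, and
# viewing angle are illustrative assumptions, and refraction is ignored.

from math import radians, sin, tan

def hardware_parallax_mm(pen_tilt_deg: float, coil_offset_mm: float = 6.0) -> float:
    """Horizontal divergence between the sensed coil position and the tip
    contact point when the pen is tilted away from vertical."""
    return coil_offset_mm * sin(radians(pen_tilt_deg))

def visual_parallax_mm(viewing_angle_deg: float, glass_thickness_mm: float = 4.0) -> float:
    """Horizontal divergence between the rendered cursor (below the glass)
    and the tip contact point (on top of the glass) for an oblique viewer."""
    return glass_thickness_mm * tan(radians(viewing_angle_deg))

if __name__ == "__main__":
    print(round(hardware_parallax_mm(30), 2))  # about 3 mm of sensing error
    print(round(visual_parallax_mm(40), 2))    # about 3.4 mm of visual error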

2.3 Pen Input Performance and Capabilities

Comparisons of Pen Input with Other Devices

Studies which directly compare pen input to the mouse and other devices are summarized in Table 2-2. The studies cover a wide range of tasks, and almost all find pen input to be faster than using the mouse in some cases. However, there are potential problems due to the type of pen device used, mouse settings, and in some cases, inadequate study reporting.

Study | Device(s) | Task(s) | Results | Notes
I. S. MacKenzie et al., 1991 | indirect stylus (A*), mouse** | 1-D tapping; 1-D dragging | stylus faster when dragging; stylus less accurate when dragging; no difference when tapping | *mapping is not reported, but assumed to be absolute based on author's later work; **mouse assumed to use CG 1.88 based on author's later work
Kabbash et al., 1993 | indirect stylus (A), mouse* | 1-D tapping; 1-D dragging | stylus "somewhat" faster; stylus more accurate when dragging; stylus less accurate when tapping | *mouse uses CG 2.0
Accot & Zhai, 1999 | indirect stylus (A), mouse* | straight and circular tracing ("steering") | stylus faster for circular steering and narrow straight steering | *mouse uses two-stage threshold, with maximum CG 2.0
Guiard et al., 1999 | indirect stylus (A), puck "mouse" (A)* | 1-D multi-scale pointing | stylus faster for high-accuracy tasks; puck faster for low-accuracy tasks | *puck functioned like an absolute mapped mouse
Kotani & Horii, 2003 | indirect stylus (R), mouse | 1-D tracing; 2-D tracing | stylus faster and more accurate for 1-D tracing; no difference with 2-D tracing; less muscle activity with stylus |
Charness et al., 2004 | direct pen*, mouse | "menu"** | direct pen faster; mouse has lower workload*** | *used a tethered light pen; **task not defined; ***according to participant rating
Jastrzembski et al., 2005 | direct pen*, mouse | "web browsing task"** | mouse faster | *used a tethered light pen; **task not defined
Myers et al., 2002 | direct pen*, mouse | 1-D tapping | pen faster; mouse has fewer errors | *hybrid direct pen technique

Table 2-2. Comparisons between pen input and mouse input. In the Device(s) column, (A) is absolute mapping and (R) is relative mapping. In the Notes column, CG is the control-to-display gain factor for the mouse transfer function.

I. S. MacKenzie, Sellen, and Buxton (1991) are often cited regarding pen pointing performance. In their comparison with mouse input10 when pointing and dragging, they find no difference between indirect stylus input and the mouse for pointing, but a significant speed benefit for the stylus when dragging. However, the mouse had a lower error rate when dragging. The authors conclude that the stylus can outperform a mouse in a GUI, especially if drawing or gesture activities are common. Like most Fitts’ law style research, they use a highly controlled, one-dimensional, reciprocal pointing task. Kabbash, I. S. MacKenzie, and Buxton (1993) compare the same devices with similar 1-D pointing and dragging tasks in their study of dominant and non-dominant hand performance. They report that the stylus is “somewhat” better than the mouse (no post-hoc test results are reported), and that the stylus is more accurate when tapping, but worse for dragging. In an evaluation of two tracing (or “steering”) tasks, straight tracing and circular tracing, Accot and Zhai (1999) found that an indirect stylus with absolute mapping was faster than a mouse for narrow straight tracing and for circular tracing. They argue that the concept of error rate does not apply to steering tasks and do not report results pertaining to accuracy. Guiard, Beaudouin-Lafon, and Mottet (1999) compared an indirect stylus with a tablet puck in multi-scale pointing tasks. A puck is physically similar to a mouse, but the authors configured it to operate in absolute mode. They found the puck was faster for low-accuracy pointing tasks, but stylus performance was higher for tasks requiring high-precision hand movements. No error rates are given. Kotani and Horii (2003) found that with practice, participants had higher speed and accuracy when using an indirect stylus with a relative mapping, compared to a mouse, for simple tracing tasks. Surprisingly, they did not find any difference with a precise tracing task. They also examined their participants’ electromyograms (EMG) and found lower activity in the fingers and biceps for the stylus compared to the mouse.

10 Many of these studies compare pen input with other devices in addition to the mouse (such as a trackball, touchpad, etc.). We only report results for the mouse comparison since it is the best baseline device. For the most part, other input devices performed worse than both mouse and pen device.

Charness et al. (2004) compared a tethered light pen version of a direct pen to mouse input with different age groups. They used a “menu selection task”, but it is not described in detail (there is no indication of essential elements such as target distance, width, or direction). Their results show that the light pen is faster than the mouse. However, using the NASA workload scale (Hart & Staveland, 1988, cited in Charness et al. 2004) they found participants rated the mouse as having lower workload. An odd experimental condition asked participants to use the pen or mouse with their non-dominant hand, and some participants said they found it easier to use the pen in this way. The authors suggest this may be due to hand occlusion. Jastrzembski et al. (2005) also compare a tethered light pen version of a direct pen with mouse input. They use a loosely defined “web browsing task” with interleaved keyboard use, and found the light pen to be less efficient than the mouse. Overall, these are favourable results for the pen, but one has to be somewhat cautious. Charness et al. (2004) and Jastrzembski et al. (2005) do not report experimental task parameters, and they use a tethered light pen, making their results difficult to confirm and less relevant. Guiard et al. (1999) use an absolute mapped puck in their comparison. Kotani and Horii (2003) use an indirect stylus with a relative mapping, which is quite different than direct pen input. The direct-pen related results from Myers et al. (2002) are interesting in that the speed-accuracy trade-off makes the pen and mouse favourable in terms of time and error rate respectively. These results are part of a larger study comparing the performance of laser pointing. The direct pen condition used in their experiment is in actuality a hybrid technique called Semantic Snarfing (Myers, Peck, Nichols, Kong, & R. Miller, 2001). The technique uses a hand-held computer with a built-in laser pointer to select a general area of the display first, and then the pen is used on the hand-held device to select the desired target within this area. We include this study for comparison, but the reader should be aware that the nature of the study and the use of a hybrid pen input technique make it less of a direct comparison. The remaining studies all use an indirect stylus with absolute input, but they may have used an inferior mouse transfer function. Although not reported in I. S. MacKenzie, Sellen, and Buxton (1991), a similar paper by I. S. MacKenzie and Buxton (1992) uses the same mouse hardware and reports a constant control-to-display gain (CG) factor of only 1.88. Kabbash et al. (1993) use a constant CG of 2.0. Accot and Zhai (1999) use a threshold
acceleration function which switches between a constant CG of 1.0 and 2.0. Recent work by Casiez et al. (2008) suggests that these CG settings are quite low and would likely reduce the performance of the mouse. In fact, Accot and Zhai acknowledge that the type of transfer function used with relative devices can introduce an experimental confound, but for the sake of comparison, they resort to the default CG setting.

Comparisons of Direct Pen and Indirect Stylus

If we ignore the potential confound of the CG used for the mouse, the positive findings for indirect stylus input could still be relevant for direct input. If the performance of direct pen input is as good as or better than indirect stylus input, then we can conclude that direct pen input is advantageous over mouse input. Theoretically, Whitefield argues that a direct input pen is advantageous since it does not need extra workspace like the mouse (Whitefield, 1986). However, he also notes problems with occlusion and parallax:

... if one is inputting more than a single isolated response, one might have to move one's hand away from the screen between responses in order to get a clear view of the screen and thus to locate the next target: and parallax effects can produce a tendency to point at a location usually slightly nearer the centre of the screen than the target. (p. 100)

Whitefield continues that indirect devices, after some initial learning, are more comfortable due to a more optimal body position; they eliminate visual feedback issues such as parallax; and they could enable CG manipulation. There are three studies we are aware of that make such a comparison (summarized in Table 2-3). Hancock and Booth (2004) compared 2-D target selection performance with direct and indirect input. Overall, they found direct pen input to be faster. Phillips, Triggs, and Meehan (2005) investigated differences between direct pen and indirect stylus input. Their direct pen condition actually used a stylus on an opaque tablet, but target locations were painted on the tablet. For the indirect stylus condition, they forced participants to monitor the display by placing a curtain between the participant and their hand operating the stylus. They found significantly higher task times for their indirect stylus condition (when participants were forced to watch the display) compared to direct pen (when watching their
hand). Forlines and Balakrishnan (2008) compare crossing and pointing performance with direct pen and indirect stylus input. They find that direct pen input is advantageous for crossing tasks, but when selecting very small targets, there is little difference between direct pen and indirect stylus.

Study | Device(s) | Task(s) | Results | Notes
Hancock & Booth, 2004 | direct pen, indirect stylus (A) | 2-D tapping | direct pen faster |
Phillips, Triggs, & Meehan, 2005 | direct pen (simulated)*, indirect stylus (A) | 2-D tapping | direct pen faster | *direct pen condition used painted targets on tablet
Forlines & Balakrishnan, 2008 | direct pen, indirect stylus (A) | 1-D crossing; 1-D tapping | direct pen faster; effect more pronounced with larger and more distant targets |

Table 2-3. Comparisons between direct pen input and indirect stylus input. In the Device(s) column, (A) is absolute mapping and (R) is relative mapping.

Performance Experiments

Researchers have examined aspects of pen performance such as speed and accuracy in common low-level interactions like target selection, area selection, and dragging. Not all studies use direct pen input with a Tablet PC-sized display: some use indirect pen input, and other researchers have focused on pen input with smaller hand-held mobile devices. Pen characteristics such as mode selection, handedness, tactile feedback, tip pressure control, and barrel rotation have also been investigated quite thoroughly; other aspects like pen tilt and hand occlusion are often discussed, but not investigated in great detail. In addition, ergonomic factors have been observed and have inspired improvements to pen design.

Target Selection

Ren and Moriya (2000) examine the accuracy of six variants of pen tapping selection techniques in a controlled experiment with direct pen input on a large display. They find very high error rates for 0.72 and 3.24 mm targets using two basic selection techniques: Direct On, where a target is selected when the pen first contacts the display (the pen down event), and Direct Off, where selection occurs when the pen is lifted from the display (the pen up event). Note that in a mouse-based GUI, targets are typically selected successfully only when both down and up events occur on the target; hence accuracy will likely further degrade. Ramos et al. (2007) argue that accuracy is further impaired when using direct pen input because of visual parallax and pen tip occlusion – users cannot simply rely on the physical position of the pen tip. They found that users had very high error rates when selecting targets smaller than 1.1 mm. To compensate, their Pointing Lens technique enlarges the target area with increased pressure, and selection is triggered by lift-off. With this extra assistance, they find that users can reliably select very small targets. In Accot and Zhai’s study of their crossing interaction paradigm (2002), they find that when using a pen, users can select a target by crossing as fast, or faster, than tapping in many cases. However, their experiment uses indirect pen input and the target positions are in a constrained space, so it is not clear if the performance they observe translates to direct pen input. Hourcade and Berkel (2006) later compare crossing and tapping with direct pen input (on a small hand-held mobile device), as well as the interaction of age. They find older users have lower error rates with crossing, but find no difference for younger users. Unlike Accot and Zhai’s work, Hourcade and Berkel use circular targets as a stimulus. Without a crossing visual constraint, they find that participants exhibit characteristic movements, such as making a checkmark. The authors speculate that this may be because people do not tap on notepads, but make more fluid actions like writing or making checkmarks, supporting the general notion of crossing. Mizobuchi and Yasumura (2004) investigate tapping and lasso selection on a pen-based hand-held mobile device. They find that tapping multiple objects is generally faster and less error prone than lasso circling, except when the group of targets is highly cohesive and forms less complex shapes. Note that enhancements introduced with Windows Vista encourage selecting multiple file and folder objects by tapping through the introduction of selection check-boxes placed on the objects. Lank and Edward (2005) note that when users lasso objects, the “sloppy” inked path alone may not be the best indicator of their intention. They find that by also using trajectory information, the system can better infer the user’s intent.
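As a concrete illustration of the simplest lasso rule discussed above (select whatever the inked path encloses), the sketch below tests object centres against the closed pen path with a ray-casting point-in-polygon test. It deliberately ignores trajectory information, so it reflects the baseline behaviour rather than any of the refinements proposed in the cited work.

# Sketch of basic lasso selection: an object is selected when its centre falls
# inside the closed lasso path (ray-casting point-in-polygon test). This is
# the simple "inked path" rule only; it does not model trajectory-based
# inference of user intent.

def point_in_lasso(point, lasso) -> bool:
    """lasso: list of (x, y) vertices of the pen path, treated as closed."""
    x, y = point
    inside = False
    n = len(lasso)
    for i in range(n):
        x1, y1 = lasso[i]
        x2, y2 = lasso[(i + 1) % n]
        crosses = (y1 > y) != (y2 > y)
        if crosses and x < x1 + (y - y1) * (x2 - x1) / (y2 - y1):
            inside = not inside
    return inside

def lasso_select(object_centres, lasso):
    return [c for c in object_centres if point_in_lasso(c, lasso)]

if __name__ == "__main__":
    lasso = [(0, 0), (10, 0), (10, 10), (0, 10)]   # roughly square pen path
    icons = [(5, 5), (12, 3), (1, 9)]
    print(lasso_select(icons, lasso))               # [(5, 5), (1, 9)]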

Mode Selection

To operate a conventional GUI, the pen must support multiple button actions to emulate left and right mouse clicks. The Tablet PC simulates right-clicking using dwell time and visual feedback, by pressing a barrel button, or by inverting the pen to use the “eraser” end. Li, Hinckley, Guan, and Landay (2005) find that using dwell time for mode selection is slower, more error prone, and is disliked by most participants. In addition to the increased time for the dwell itself, the authors also found that additional preparation time is needed for the hand to slow down and prepare for a dwell action. Pressing the pen barrel button, pressing a button with the non-preferred hand, or using pressure are all fast techniques, but using the eraser or a button with the non-preferred hand are the least error prone. Hinckley, Baudisch, Ramos, and Guimbretière’s related work examining mode delimiters (2005) also finds dwell timeout to be slowest, but in contrast to Li et al., finds that pressing a button with the non-dominant hand can be error prone due to synchronization issues. However, Hinckley et al.’s Springboard (2006) shows that if the button is used for temporary selection of a kinaesthetic quasi mode (where the user selects a tool momentarily, but afterwards returns to the previous tool), then it can be beneficial. Grossman et al. (2006) provide an alternate way to differentiate between inking and command input by using distinctive pen movements in hover space (i.e., while the pen is within tracking range above the digitizing surface but not in contact with it). An evaluation shows that this reduces errors due to divided attention and is faster than using a conventional toolbar in this scenario. Forlines, Vogel, and Balakrishnan’s (2006) Trailing Widget provides yet another way of controlling mode selection. The Trailing Widget floats nearby, but out of the immediate area of pen input, and can be “trapped” with a swift pen motion.
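For concreteness, the sketch below shows the kind of dwell heuristic implied above: the pen must stay within a small radius for a minimum time while in contact with the surface before the alternate mode fires. The 500 ms timeout and 2 mm radius are assumptions for illustration, not the parameters of any shipping system.

# Sketch of a dwell detector of the kind used to simulate a right-click:
# the pen must stay within a small radius for a minimum time while touching
# the surface. The 500 ms timeout and 2 mm radius are assumptions.

def detect_dwell(samples, timeout_s=0.5, radius_mm=2.0):
    """samples: list of (t_seconds, x_mm, y_mm) pen positions while the pen is down.
    Returns the time at which a dwell is recognized, or None."""
    anchor = None
    for t, x, y in samples:
        if anchor is None:
            anchor = (t, x, y)
            continue
        t0, x0, y0 = anchor
        if ((x - x0) ** 2 + (y - y0) ** 2) ** 0.5 > radius_mm:
            anchor = (t, x, y)          # moved too far: restart the dwell
        elif t - t0 >= timeout_s:
            return t                    # held still long enough: dwell fires
    return None

if __name__ == "__main__":
    still = [(i * 0.05, 100 + 0.1 * i, 50) for i in range(20)]
    moving = [(i * 0.05, 100 + 3 * i, 50) for i in range(20)]
    print(detect_dwell(still))   # 0.5 (dwell recognized)
    print(detect_dwell(moving))  # None (pen kept moving)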

Handedness

Hancock and Booth (2004) study how handedness affects performance for simple context menu selection with direct pen input on large and small displays. They note that identifying handedness is an important consideration, since the area occluded by the hand is mirrored for left- or right-handed users and the behaviour of widgets will need to change accordingly. Inkpen et al. (2006) study usage patterns for left-handed users with left- and right-handed scrollbars on a direct pen input hand-held mobile device. By using a range of evaluation methodologies, including a longitudinal field study with an open-ended task and two formal experiments, they find a performance advantage and user preference for left-handed scrollbars. All left-handed participants cite occlusion problems when using the right-handed scrollbar. To reduce occlusion, some participants raised their grip on the pen or arched their hand over the screen, both of which are reported as feeling unnatural and awkward. Their methodological approach, which includes two controlled experiments and a longitudinal study, lends more ecological validity to their findings.
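The observation that the occluded area is mirrored for left- and right-handed users can be illustrated with a deliberately crude sketch: approximate the occluded region with a rectangle below and to the pen side of the tip, mirror it for left-handed users, and test whether a widget falls inside. The rectangle and its dimensions are placeholders, not the configurable occlusion model developed later in this dissertation.

# Minimal sketch of handedness-aware widget placement: approximate the area
# occluded by the hand with a rough rectangle below and to the pen side of the
# pen tip, mirrored for left-handed users, and test whether a widget falls
# inside it. The rectangle is a crude placeholder; all sizes are assumptions.

def occluded_rect(pen_x, pen_y, right_handed=True, width=120.0, height=90.0):
    """Return (x_min, y_min, x_max, y_max) in display mm, origin at top-left."""
    if right_handed:
        return (pen_x, pen_y, pen_x + width, pen_y + height)
    return (pen_x - width, pen_y, pen_x, pen_y + height)   # mirrored for left hand

def widget_occluded(widget_rect, pen_x, pen_y, right_handed=True) -> bool:
    wx1, wy1, wx2, wy2 = widget_rect
    ox1, oy1, ox2, oy2 = occluded_rect(pen_x, pen_y, right_handed)
    return not (wx2 < ox1 or wx1 > ox2 or wy2 < oy1 or wy1 > oy2)

if __name__ == "__main__":
    scrollbar_right = (250.0, 0.0, 260.0, 180.0)
    print(widget_occluded(scrollbar_right, 200.0, 80.0, right_handed=True))   # True
    print(widget_occluded(scrollbar_right, 200.0, 80.0, right_handed=False))  # False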

Pressure

Ramos, Boulos, and Balakrishnan (2004) argue that pen pressure can be used as an effective input channel in addition to x-y position. In a controlled experiment, they found that participants could use up to 6 levels of pressure with the aid of continuous visual feedback and a well-designed transfer function, creating the possibility of pressure-activated GUI widgets. Ramos and colleagues subsequently explore using pressure in a variety of applications, including an enhanced slider that uses pressure to change the resolution of the parameter (Ramos & Balakrishnan, 2005), a pressure-activated Pointing Lens (Ramos et al., 2007), which is found to be more effective than other lens designs, and a lasso selection performed with different pressure profiles used to denote commands (Ramos & Balakrishnan, 2007).
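To illustrate what a discrete pressure widget involves, the sketch below maps raw pressure to six levels through a simple transfer function. The square-root transform (which expands the easier-to-control light-pressure range) and the equal-width bins are assumptions, not the transfer function designed by Ramos et al.

# Sketch of mapping raw pen pressure to a small number of discrete levels for
# a pressure-operated widget. The square-root transfer function and the six
# equal-width bins are illustrative assumptions.

def pressure_level(raw: float, levels: int = 6) -> int:
    """Map raw pressure in [0, 1] to a discrete level 1..levels."""
    raw = min(max(raw, 0.0), 1.0)
    transformed = raw ** 0.5               # give finer control at light pressure
    level = int(transformed * levels) + 1  # bin into equal-width levels
    return min(level, levels)

if __name__ == "__main__":
    for raw in (0.02, 0.1, 0.3, 0.6, 0.95):
        print(raw, "->", pressure_level(raw))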

Tactile Feedback

Lee, Dietz, Leigh, Yerazunis, and Hudson (2004) design a haptic pen using a solenoid actuator that provides tactile feedback along the longitudinal axis of the pen, and show how the resulting “thumping” and “buzzing” feedback can be used for enhancing interaction with GUI elements. Sharmin, Evreinov, and Raisamo (2005) investigate using vibrating pen feedback during a tracing task and find that tactile feedback reduces time and the number of errors compared to audio feedback. Poupyrev, Okabe, & Maruyama (2004) evaluate tactile feedback sent through the display to the pen. They found it improved performance when dragging, but did not find a difference when tapping. Forlines and Balakrishnan (2008) compare tactile feedback with visual feedback for direct and indirect pen input on different display orientations. They find that even a small amount of tactile feedback can be helpful, especially when standard visual feedback is occluded by the hand. Current Tablet PC pens do not support active tactile feedback, but the user does receive passive feedback when the pen tip strikes the display surface. However, this may not always correspond to the system registering the tap: consider why Windows Vista designers included a small “ripple” animation to visually reinforce a tap. A similar type of ripple contact visualization (tested with a multi-touch device) was found to increase accuracy (Wigdor et al., 2009).

Barrel Rotation and Tilt

Bi et al. (2008) investigate pen barrel rotation as a viable form of input. They find that unintentional pen rolling can be reliably filtered out using thresholds on rolling speed and rolling angle, and that users can explicitly control an input parameter with rolling in 10 degree increments over a 90 degree range. Based on these findings, the authors designed pen barrel rolling interactions to control object rotation, simultaneous parameter input, and mode selection. Because most input devices do not support barrel rotation, the authors use a custom built, indirect stylus. Tian et al. (2007; 2008) explore using pen tilt to enhance the orientation of a cursor and to operate a tilt-based pie-menu. The authors argue for an advantage to using tilt in these scenarios, but they used indirect stylus input in their experiments.
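The filtering idea can be sketched as two simple thresholds applied to a candidate rolling motion, as below. The 10 degree amplitude and 40 degree-per-second speed thresholds are illustrative assumptions, not the values reported by Bi et al.

# Sketch of filtering unintentional pen-barrel rolling with thresholds on
# rolling speed and accumulated rolling angle. Threshold values are assumptions.

def is_intentional_roll(angles_deg, dt_s=0.01,
                        min_angle_deg=10.0, min_speed_deg_s=40.0) -> bool:
    """angles_deg: successive barrel rotation samples for one candidate roll."""
    if len(angles_deg) < 2:
        return False
    total = abs(angles_deg[-1] - angles_deg[0])
    peak_speed = max(
        abs(b - a) / dt_s for a, b in zip(angles_deg, angles_deg[1:])
    )
    return total >= min_angle_deg and peak_speed >= min_speed_deg_s

if __name__ == "__main__":
    deliberate = [0, 4, 9, 15, 22, 30]       # large, fast rotation
    incidental = [0, 0.5, 1.0, 1.2, 1.5]     # slow drift while writing
    print(is_intentional_roll(deliberate))   # True
    print(is_intentional_roll(incidental))   # False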

Ergonomics

Haider, Luczak, and Rohmert (1982) conducted perhaps one of the earliest studies of pen computing focused on ergonomics. They log variables such as eye movement, muscle activity, and heart activity when using a light pen on a vertical display, touch screen, and keypad with a simulated police command and control system. The authors find lower levels of eye movement with the light pen, but high amounts of muscle strain in the arms and shoulders as well as more frequent periods of increased heart rate. They note that since the display was fixed, participants would bend and twist their bodies to reduce the strain. In a study comparing indirect stylus input to pencil on paper, Fitzmaurice et al. (1999) find that when writing or drawing on paper, people prefer to rotate and position the paper with their non-dominant hand rather than reach with their dominant hand. In addition to setting a comfortable working context and reducing fatigue, they also speculate that this reduces hand occlusion when drawing. They find that when using stylus input instead of pencil on paper, this natural rotation tendency is hampered because of the tablet’s thickness, weight, and tethered connections. An unpublished, but often cited, study in the commercial world was performed by Global Ergonomic Technologies, a commercial consulting firm. They analyzed joint angles when using a pen and mouse for various tapping and dragging tasks (GET Consulting Study, 1998). Their findings favour the pen: unlike the mouse, pen participants adopted postures with almost no wrist pronation, very little flexion, lower ulnar deviation, and lower radial deviation. Overall, they found that pen postures were more neutral, and thus “biomechanically superior” to the mouse. However, the experiment used only 8 participants, the tasks, apparatus, and design are not described adequately, and no statistical tests are used in their analysis. For instance, it is not clear whether they used an indirect stylus or direct pen for input. In addition, the study appears to be sponsored by Wacom, a large manufacturer of pen input devices, and may be inherently biased. Wu and Luo (2006a) examined arm, hand, and pen postures with direct pen input when tapping, drawing, and writing using a Tablet PC placed flat on a table. They noted four characteristic postures in which users did or did not support their arm and hand when using the pen: no support at all (“hanging in the air”), wrist local support, little finger local support, and elbow local support (Figure 2-17). When tapping, no participants supported their arm or hand, but when writing, just over half the time, participants used one of the three support methods. In addition, they noted that some participants adopted a high and loose pen grip when tapping (Figure 2-18). Post-study interviews found that participants complained about the thickness of the tablet making it uncomfortable to rest their arm as they would with pencil and paper, and that several participants were worried about scratching or staining the display.

Figure 2-17. Forearm and hand postures observed by Wu & Luo: (a) no support; (b) wrist support; (c) finger support; (d) elbow support. (from F. Wu & Luo, 2006a, fig. 1)


Figure 2-18. Extreme pen grips observed by Wu & Luo. (from F. Wu & Luo, 2006a, fig. 2)

To counteract these issues and in support of past pen ergonomic literature, the authors designed an ergonomic pen that included a fourth support point below the thumb (Figure 2-19). In an experimental evaluation, they found their pen design increased hand stability and reduced hand fatigue with slightly lower error rates and task times.

Figure 2-19. Wu and Luo’s ergonomic Tablet PC pen. (a) pointing; (b) writing; (c) drawing (from F. Wu & Luo, 2006a, fig. 8)

Wu and Luo (2006b) also evaluated the performance of different pen barrel diameters and lengths (see Figure 2-20) when drawing, writing, and pointing. Overall, they found that longer pens were faster and sometimes more accurate across all tasks. The shortest, 80 mm pen was consistently ranked less favourably by participants. There was an interaction of pen width and task: when pointing, the thinnest, 5.5 mm pen was fastest; when writing, the medium 8 and 11 mm pens were faster and more accurate; and when drawing, the thicker 11 and 15 mm pens were faster and more accurate. They suggest the poor overall performance of the short, 80 mm pen is because its length is close to most participants’ hand breadth, which led to a palmar grip. They cite evidence suggesting that short objects manipulated in this way are uncomfortable and difficult to hold (Lewis & Narayan, 1993, Stanton, 1998, cited by F. Wu & Luo, 2006b). However, the authors suggest a length of 100 mm would be acceptable for portable hardware.


Figure 2-20. Pen sizes evaluated by Wu and Luo: lengths of 140, 110, and 80 mm; diameters of 15, 11, 8, and 5.5 mm. (based on F. Wu & Luo, 2006b, fig. 1)

2.4 Pen Interaction Paradigms

Part of the problem with the relatively poor commercial success of direct pen input and the Tablet PC may be that current software applications and GUIs are not well-suited for the pen. One reason for the lack of pen-specific commercial applications is that the primary adopters of pen computing are in education, healthcare, illustration, computer-aided design, and mobile data entry (A. Chen, 2004; Shao, Fiering, & Kort, 2007; Whitefield, 1986). These vertical markets use specialized software which emphasizes handwritten input and drawing, rather than general computing. For the general business and consumer markets, software applications and GUIs are designed for the mouse, thus usability issues specific to direct pen input have not always been considered. Researchers and designers have responded by developing new pen-centric interaction paradigms, widgets that leverage pen input capabilities, and software applications designed from the ground up for pen input.

Gestures and Sketch-Based Interfaces

It is reasonable to argue that tasks which are achieved through drawing are more easily performed with a pen. Perhaps one of the most demanding is hand-drawn animation, where users can draw shapes and specify motion through their own movements. Regarding his GENESYS picture-driven animation system, Baecker (1969) argues that a key component is a pen-based interface:


An input device such as a light pen, tablet plus stylus, or wand, which allows direct drawing to the computer in at least two spatial dimensions. The operating environment must, upon user demand, provide at least brief intervals during which the sketch may be made in real time. The animator must then be able to draw a picture without any interruption. Furthermore, the computer must record the "essential temporal information" from the act of sketching. (p. 274)

The notion of capturing the dynamics of pen movement as temporal and positional input led researchers to gestures as a way of invoking a command with a distinctive motion rather than manipulating GUI widgets. Early explorations include Buxton, Sniderman, Reeves, Patel, and Baecker (1979), who use elementary gestures to enter musical notation, Buxton, Fiume, Hill, Lee, and Woo (1983), who use more complex gestures for electronic sketching, and Kurtenbach and Buxton’s Gedit (1991b), which demonstrates gesture-based object creation and manipulation. Later, completely gesture-based research applications appeared, such as Lin, Newman, Hong, and Landay’s DENIM (2000), Moran, Chiu, & Melle’s Tivoli (1997), Forsberg, Dieterich, and Zeleznik’s music composition application (1998), and Bae, Balakrishnan, and Singh’s ILoveSketch (2008). Note these all target very specific domains which emphasize drawing, sketching, and notation. Although these researchers (and others) have suggested that gestures are more natural with a pen, issues with human perception (C. A. J. Long, Landay, Rowe, & Michiels, 2000), biomechanical performance (Cao & Zhai, 2007), and disambiguation between “ink” and gesture (Zeleznik & T. Miller, 2006) can make the design of unambiguous gesture sets challenging. Perhaps most problematic is that gestures are not self-revealing and must be memorized through training. Marking Menus (Kurtenbach & Buxton, 1991a) address this problem with a visual preview and directional stroke to help users smoothly transition from novice to expert usage, but these are limited to menu-like command selection. Hinckley et al. (2007) found that previous experience with point-and-click interfaces prevented users from discovering stroking or crossing gestures. Even after the availability of the gestures was explained, the users had no mental model of what they were supposed to do. As a solution, they created on-screen hints with gesture names and a highlighter-like tracing of the gesture stroke.


In most commercial operating systems, such as Windows 7 or Apple’s iPhone OS, the number of available gestures is quite small. Ignoring multi-touch gestures, the most common gestures are directional flicks for pagination and scrolling. Aliakseyeu et al. (2008) evaluated this type of multi-flick gesture, and found its performance superior to using a GUI scrollbar widget. Perhaps due to limitations with gestures, several researchers have created applications which combine standard GUI widgets and gestures. Examples include Schilit, Golovchinsky, and Price’s Xlibris electronic book device (1998), Truong and Abowd’s StuPad (1999), Chatty and Lecoanet’s air traffic control system (1996), Gross and Do’s Electronic Cocktail Napkin (1996), and Zeleznik et al.’s Lineogrammer (2008). These all support free-form inking for drawing and annotations but rely on a conventional GUI tool bar for many functions, which suggests a limitation when using gestures with a large command set.

Pen Specific Widgets and Interfaces

Later, researchers introduce pen-specific widgets in their otherwise gesture-based applications. Ramos and Balakrishnan’s LEAN (2003) is a pen-specific video annotation application which uses gestures along with an early form of pressure widget (Ramos et al., 2004) and two slider-like widgets for timeline navigation. Agarawala and Balakrishnan’s BumpTop (2006) uses physics and 3-D graphics to lend more realism to pen-based object manipulation. Both of these applications are initially free of any GUI, but once a gesture is recognized, or when the pen hovers over an object, widgets are revealed to invoke further commands or exit modes. Hinckley et al.’s InkSeine (2007) presents what is essentially a pen-specific GUI. It combines and extends several pen-centric widgets and techniques in addition to making use of gestures and crossing widgets. Aspects of its use required interacting with standard GUI applications, where the authors found users had particular difficulty with scrollbars. To help counteract this, they adapted Fitzmaurice et al.’s Tracking Menu (2003) to initiate a scroll ring gesture in a control layer above the conventional application. Fitzmaurice et al.’s Tracking Menu (2003) is designed to support the rapid switching of commands by keeping a toolbar near the pen tip at all times. The scroll ring gesture uses a circular pen motion as an alternative to the scrollbar (Graham Smith, schraefel, & Baudisch, 2005; Moscovich & Hughes, 2004). Creating pen-specific GUI widgets has been an area of pen research for some time; for example, Guimbretière and Winograd’s FlowMenu (2000) combines a crossing-based menu with smooth transitioning to parameter input. In most cases, compatibility with current GUIs is either not a concern or unproven.
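The scroll ring idea reduces to converting angular pen motion around a centre into scroll distance, sketched below. The gain (pixels per revolution) and the omission of engagement and disengagement handling are simplifications of our own.

# Sketch of the basic idea behind a scroll-ring gesture: circular pen motion
# around a centre point is converted to scrolling, so the user can scroll
# indefinitely without a scrollbar. The gain is an assumption; real
# implementations also handle how the gesture is started and stopped.

from math import atan2, pi

def scroll_ring_deltas(points, centre, pixels_per_rev=300.0):
    """points: successive (x, y) pen samples; returns per-sample scroll deltas."""
    deltas = []
    prev = atan2(points[0][1] - centre[1], points[0][0] - centre[0])
    for x, y in points[1:]:
        angle = atan2(y - centre[1], x - centre[0])
        d = angle - prev
        # unwrap so crossing the +/- pi boundary does not cause a jump
        if d > pi:
            d -= 2 * pi
        elif d < -pi:
            d += 2 * pi
        deltas.append(d / (2 * pi) * pixels_per_rev)
        prev = angle
    return deltas

if __name__ == "__main__":
    # half circle of pen motion around (50, 50)
    pts = [(60, 50), (57, 57), (50, 60), (43, 57), (40, 50)]
    print([round(d, 1) for d in scroll_ring_deltas(pts, centre=(50, 50))])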

Crossing

Another, perhaps less radical, pen interaction paradigm is selecting targets by crossing through them rather than tapping on them (Accot & Zhai, 2002). Apitz and Guimbretière’s CrossY (2004) is a sketching application which exclusively uses a crossing-based interface, including crossing-based versions of standard GUI widgets such as buttons, check-boxes, radio buttons, scrollbars, and menus. Two potential issues with crossing-based interfaces are target orientation and the space between targets. Accot and Zhai (2002) suggest that targets could automatically rotate to remain orthogonal to the pen direction, but this could further exacerbate the space dilemma. They note that as the space between the goal target and nearby targets is decreased, the task difficulty becomes a factor of this distance rather than the goal target width. Dixon, Guimbretière and Chen (2008) investigate this “space versus speed tradeoff” in the context of crossing-based dialogue boxes. They find that if the recognition algorithm is relaxed to recognize “sloppy” crossing gestures, then lower operation times can be achieved (with only slightly higher error rates). This relaxed version of crossing could ease the transition from traditional click behaviour, and, with reduced spatial requirements, it could co-exist with current GUIs.
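The core of crossing-based selection is a geometric test: did the pen stroke pass through the target’s goal line? The sketch below implements that test with standard segment intersection; the example geometry and function names are illustrative only.

# Sketch of crossing-based selection: a target is "selected" when the pen
# stroke crosses its goal line, tested with a standard segment intersection.

def _orient(p, q, r):
    return (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])

def segments_cross(a1, a2, b1, b2) -> bool:
    """True if segment a1-a2 properly crosses segment b1-b2."""
    d1, d2 = _orient(b1, b2, a1), _orient(b1, b2, a2)
    d3, d4 = _orient(a1, a2, b1), _orient(a1, a2, b2)
    return (d1 * d2 < 0) and (d3 * d4 < 0)

def stroke_crosses_target(stroke, target_line) -> bool:
    """stroke: list of pen points; target_line: (end1, end2) of the goal line."""
    b1, b2 = target_line
    return any(
        segments_cross(p, q, b1, b2) for p, q in zip(stroke, stroke[1:])
    )

if __name__ == "__main__":
    target = ((50.0, 40.0), (50.0, 60.0))                 # vertical goal line
    stroke = [(30.0, 50.0), (45.0, 51.0), (62.0, 49.0)]   # pen swipe left to right
    print(stroke_crosses_target(stroke, target))          # True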

Pen-Specific Operating Systems and Applications

In spite of the activity in the pen research community, commercial applications for the Tablet PC tend to emphasize free-form inking for note taking, drawing, or mark-up while relying on standard GUI widgets for commands. Autodesk’s Sketchbook Pro (Autodesk, 2007) is perhaps the most pen-centric commercial application at present. It uses a minimal interface, takes advantage of pen pressure, and users can access most drawing commands using pen-specific widgets such as the Tracking Menu and Marking Menu. However, it still relies on conventional menus for some commands.


2.5 Summary

Using direct pen input seems like a reasonable idea (perhaps even “natural”). The physical capabilities of our hands are well suited to grasping and manipulating a pen. Pen input technology has been around for over five decades. The raw speed of pen input is very encouraging, and may even be faster than a mouse. Researchers have studied various aspects of pen characteristics, and developed new pen-centric interaction techniques, widgets, and applications to leverage pen capabilities. Yet, pen input for general computing has not been widely adopted. Are we missing some aspect of pen performance? Persistent hardware issues such as parallax error and bulk could also be to blame. There could be fundamental problems with pen input in spite of its speed: for example, researchers note problems with high-precision tasks and have found possible ergonomic issues. There is also some speculation about the effect of hand occlusion, but as yet there are no in-depth studies of its characteristics or its possible effect on performance. The persistence of current GUIs may also be a factor: many researchers seem to imply that current GUIs must be abandoned or altered significantly to better support pen interaction. The lack of adoption is likely due to a combination of these issues. Although researchers have examined aspects of pen input with conventional GUI widgets in the process of designing and evaluating alternative widgets and techniques, their investigations and solutions have been evaluated in experimental isolation with synthetic tasks. Our belief is that to study how pen input really performs, we need to observe users engaged in realistic tasks using current hardware with current GUIs and standard software applications.


3 Observational Study of Pen Input

We begin by investigating what the major issues are with pen interaction and a conventional GUI. Researchers have suggested that more open-ended tasks can give a better idea of how something will perform in real life (Ramos et al., 2006). Indeed, there are some examples of qualitative and observational pen research (Briggs et al., 1993; Turner et al., 2007; Inkpen et al., 2006; Fitzmaurice et al., 1999; Haider et al., 1982). Unfortunately, these have used older technologies like indirect pens with opaque tablets and light pens, or focused on a single widget or a specialized task. To our knowledge, there has been no comprehensive qualitative, observational study of Tablet PC or direct pen interaction with realistic tasks and common GUI software applications. In this chapter, we present the results of such a study. The study includes pen and mouse conditions for baseline comparison, and to control for user experience, we recruited participants who are expert Tablet PC users, power computer users who do not use Tablet PCs, and typical business application users. We used a realistic scenario involving popular office applications with tasks designed to exercise standard GUI components, and covered typical interactions such as parameter selection, object manipulation, text selection, and ink annotation. We base our analysis methodology on Interaction Analysis (Jordan & Henderson, 1995) and Open Coding (Strauss & Corbin, 1998). Instead of examining a broad and open-ended social working context for which these techniques were originally designed, we adapt them to analyze lower-level interactions between a single user and software. This style of qualitative study is more often seen in the Computer Supported Cooperative Work (CSCW)


community (e.g., Ranjan, Birnholtz, & Balakrishnan, 2006; Scott, Carpendale, & Inkpen, 2004), but CSCW studies typically do not take advantage of detailed and diverse observation data like the kind we gather: video taken from the user’s point-of-view; 3-D positions of their forearm, pen, Tablet PC, and head; screen capture; and pen input events. To synchronize, segment, and annotate these multiple streams of logging data, we developed a custom software application. This allows us to view all data streams at once, annotate interesting behaviour at specific times with a set of annotation codes, and extract data for visualization and quantitative analysis. We see our methodology as a hybrid of typical controlled HCI experimental studies, usability studies, and qualitative research.

3.1 Related Work

In the previous chapter, we examined a large body of work investigating low-level aspects of pen performance, mostly using controlled experiments. While evaluating low-level aspects is certainly important, we contend that understanding how pen-based interactions function with real applications and realistic tasks is equally, if not more, important. However, as Briggs et al. (1993) note, this is less understood:

“While there has been a great deal of prior empirical research studying pen-based interfaces, virtually all prior research has examined the elementary components of pen-based interfaces separately (cursor movement, software navigation, handwriting recognition) for very elementary subtasks such as pointing, cursor movement, and menu selection.” (p. 73)

Ramos et al. (2006) argue that a more open-ended, ecologically-valid study can give users a better idea how a new tool will perform in real life. Studying pen input with more ecological validity necessitates using techniques like field work to examine in situ usage, and observational usability studies of realistic scenarios. Yet, there are few examples of these techniques used to examine direct pen interaction. We discuss them below.

Field Studies of In Situ Usage

Since field studies are most successful when investigating larger social and work related issues, researchers have focused on how pen computing has affected general working


practices. For example, a business case study of mobile vehicle inspectors finds that with the addition of handwriting recognition and wireless networking, employees can submit reports faster and more accurately with pen computers (A. Chen, 2004). However, specific results relating to pen-based interaction – such as Inkpen et al. (2006) who include a longitudinal field study in their examination of handedness and PDAs – are less common. Two field studies in education do report some aspects of Tablet PC interaction. Twining et al. (2005) reports on Tablet PC usage in twelve British elementary schools, including some discussion of usage habits. They find that staff members tend to use convertible Tablet PCs in laptop mode, primarily to allow typing with the keyboard; although many still used the pen for GUI manipulation. However, when marking assignments or working with students, they use the pen in slate mode. The students were more likely to use the pen for making notes, though they used the onscreen keyboard or left their writing as digital ink instead of using writing recognition. Pen input enables the students to do things which would be more difficult with a mouse, such as create art work and animations. In fact, comments from several schools indicate that the pen is more natural for children. They also note problems with the initial device cost, battery life, screen size, glare, and frequently lost pens. In a field study of high school students and Tablet PCs, Sommerich et al. (2007) find that the technology does affect schoolwork patterns, but more relevant for our purposes is their discussion of ergonomic issues. For example, they find that the students used their Tablet PC away from a desk or table 35% of the time, 50% reported assuming awkward postures, and 69% experienced eye discomfort. No specific applications or issues with interaction are discussed.

Observational Usability Studies of Realistic Scenarios

Most pen-based research applications have been evaluated informally or with limited usability studies. In an informal study of the Electronic Cocktail Napkin (1996), the authors find that although there is no negative reaction to gestures, users have difficulty accurately specifying the pen position, and encounter problems with hand occlusion when using marking menus. The authors of BumpTop (2006) conducted a small evaluation and find that


participants are able to discover and use the functionality, but that crossing widgets are awkward near display edges and note problems with hand occlusion. More formal studies with realistic tasks often focus on specific aspects of pen input. Inkpen et al. (2006) used laboratory and field experiments to observe scrollbar usage with realistic tasks. Haider, Luczak, and Rohmert (1982) focus on ergonomics with their observational study of police officers using a simulated police command and control system with various input devices, including a tethered light pen. Fitzmaurice, Balakrishnan, Kurtenbach, and Buxton (1999) included realistic tasks in their observational study of art board orientation and indirect pen input. Briggs, Dennis, Beck, and Nunamaker (1993) compare user performance and preference when using indirect pen input and mouse/keyboard for operating business applications: word processing, spreadsheets, presentations, and disk management. Only the presentation graphics application and word processor supported mouse input in addition to keyboard commands. The experiment tests each application separately and the authors recruited both novice and expert users. They use custom-made, physical digitizer overlays with “buttons” to access specific commands for each application in addition to devoting digitizer space for controlling an onscreen cursor. Overall, they find that task times for the pen are longer for novice users with the word processor, and for all users when using the spreadsheet and file management. Much of their focus is on hand writing recognition, since at that time it was suggested that the pen was a viable, and even preferred, alternative to the keyboard for novice typists. However, the authors state that “once the novelty wore off, most of the users hated the handwriting recognition component of the pen-based interface.” For operations other than handwriting, the participants said that they preferred the fine motor control of the pen over the mouse when pointing, selecting, moving, and sketching. They also preferred selecting menus and buttons using the digitizer tablet. A more recent study by Turner, Pérez-Quiñones, and Edwards (2007) compares how students revise and annotate UML diagrams using pen and paper, the Tablet PC, and a mouse and keyboard. They found that more detailed editing instructions are given with pen and paper and the use of gestural marks such as circles and errors were more common with pen and paper and Tablet PC. However, with mouse and keyboard, their participants made notes


with more explicit references to object names in the diagram. Their evaluation includes only writing and drawing actions with a single application. In spite of these researchers attempting to answer Briggs et al.’s call for more realistic pen input studies, one must remain cautious regarding their results since only the Turner et al. study uses direct pen input with a modern Tablet PC device and operating system. Moreover, only Turner et al. evaluate behaviour with a conventional GUI.

Summary

Although observing tool usage in real life is often done with a field study, these ethnographic inquiries are more suited to addressing general aspects of Tablet PC usage in a larger work context. In contrast, the observational studies by Briggs et al. (1993) and Turner et al. (2007) focus on specific tasks while maintaining more ecological validity. Recent work from the Computer Supported Cooperative Work community (e.g., Ranjan et al., 2006; Scott et al., 2004) has combined aspects of traditional field research methodologies with more specific inquiries into lower-level interaction behaviour with controlled tasks – an approach we draw upon.

3.2 Study

Our goal is to examine how usable direct pen input is with a conventional GUI. For our study, we imagine a scenario where an office worker must complete a presentation while away from their desk using their Tablet PC. They use typical office applications like a web browser, spreadsheet and presentation tool. Because our focus is on GUI manipulation, the scenario could be completed without any text entry. Rather than conduct a highly controlled experimental study to examine individual performance characteristics in isolation, or, at the other extreme, an open-ended field study, we elected to perform a laboratory-based observational study situated between these two ends of the continuum with real tasks and real applications. By adopting this approach, we hope to gain a better understanding of how pen input performs using the status-quo GUI used with current Tablet PC computers. Users primarily interact with a GUI through widgets – the individual elements which enable direct manipulation of underlying variables (see Figure 3-4 for examples). The


frequency of use and location of widgets is not typically uniform. For example, in most applications, menus are used more often than tree-views. Buttons can appear almost anywhere, while scrollbars are typically located on the right or bottom. Also, some widgets provide redundant ways to control the same variable, enabling different usage strategies. For example, a scrollbar can be scrolled by either dragging a handle or clicking a button. To further add variability, a series of widgets may be used in quick succession forming a type of phrase (Buxton, 1995). For example making text “10 pt Arial Bold” requires selecting the text, picking from drop-down menus, and clicking a button in quick succession. Controlling all these aspects in a formal experiment would be difficult and we would likely miss effects only seen in more realistic contexts. We had one group of users complete the tasks with a mouse as a control condition for intra-device comparison. We also recruited three groups of pen participants according to their level of computer and Tablet PC experience. To support our observational analysis, we gathered a rich set of logging data including 3-D motion capture, video taken from the participant’s point-of-view, screen capture video, and pen events such as movement and taps. We use this data to augment and refine our observational methodology with high-level quantitative analysis and visualizations to illustrate observations.

Participants

Sixteen volunteers (5 female, 11 male), with a mean age of 30.8 years (SD 5.4) were recruited. All participants were right-handed, had experience with standard office applications, and used a computer for at least 5 hours per day on average. In a pre-study questionnaire, participants were asked to rate their experience with various devices and applications on a scale of 0 to 3, where 3 was a high amount of experience and 0 no experience. All participants said their experience with a mouse was 3 (high). All participants had occupations which required a computer: office worker, researcher, designer, administrative assistant, and illustrator.

Design

A between-participants design was used, with the 16 participants divided into 4 groups of 4 people each. One group used a mouse during the study and acted as a baseline control


group. The remaining three groups all used the Tablet PC during the study, but each of these groups contained participants with different levels of Tablet PC or conventional computer experience. In summary, the four groups were:

• Mouse. This was the control group where participants used a conventional mouse. Participants in this group said they used a computer for between 8 and 9 hours per day.

• Pen1-TabletExperts. These were the only experienced Tablet PC users. Unlike the other pen groups, they all reported a high amount of experience with Tablet PC pen based computing in the pre-study questionnaire. They also reported using a computer for between 6 and 10 hours per day.

• Pen2-ConventionalExperts. These were experienced computer users who reported using a wide range of hardware, software and operating systems, but they did not report having any experience with Tablet PC pen based computing. They also reported that, on average, they used a computer between 9 and 10 hours per day.

• Pen3-ConventionalNovices. These were computer users with limited experience who used a single operating system and had a limited range of software experience (primarily standard office applications like word processors, spreadsheets, web browsing, and presentation tools). As with the Pen2-ConventionalExperts group, they did not have any experience with Tablet PCs. They reported using a computer between 5 and 7 hours per day, which is less than all other groups.

Apparatus

The study was conducted using a Lenovo X60 Tablet PC with an Intel L2400 @ 1.66 GHz and 2 GB RAM. It has a 1050 × 1400 pixel (px) display measuring 184 × 246 mm (12.1 inch diagonal) for a device resolution of 5.7 px/mm. We used the Windows Vista operating system and Office 2007 applications since they were state of the art at the time (we conducted this experiment in 2007) and both were marketed as having improvements for pen computing. The scenario applications were Microsoft PowerPoint 2007 (presentation tool), Microsoft Excel 2007 (spreadsheet), and Internet Explorer 7 (web browser). Since the completion of this study, Microsoft released Windows 7. It includes all of Vista’s pen computing improvements and adds two pen-related improvements: faster and more accurate


handwriting recognition in more languages and a handwriting entry method for mathematical notation (Microsoft, 2009). Since these improvements are both text-entry related, the results of our study remain as relevant for Windows 7 as they do for Windows Vista. We gathered data from four logging sources:

• User view. A head mounted 640 × 480 px camera recorded the user’s view of the tablet at 30 fps (Figure 3-1a). A microphone also captured participant comments and experimenter instructions.

• Motion capture. A Vicon 3-D motion capture system (“Vicon Motion Systems”) recorded the position and orientation of the head, forearm, tablet, and pen using 9.5 mm markers (Figure 3-1b) at 120 frames-per-second (fps). This data was filtered and down sampled to 30 fps for playback and analysis.

• Screen capture. The entire 1050 × 1400 px display was recorded as a digital screen capture video at 22 fps.

• Pen events. Custom logging software recorded the pen (or mouse) position, click events, and key presses. Our event logger was implemented as a Windows Global Hook process (Microsoft, "Hooks") and we were unable to capture pen specific data such as pressure.

At the basic level, these logs provided a record of the participant’s progress through the scenario. By recording their actions in multiple ways, we hoped we could discern when an intended action was successful or not. Moreover, capturing 2-D and 3-D movements would enable us to visualize characteristic motions. We also felt that a view of the pen, hand, and display together would be particularly useful for analysing direct input interactions. The Motion Capture and User View logs ran on dedicated computers. The Screen and Pen Event logs ran on the tablet without introducing any noticeable lag. Although the Vicon motion tracking system supports sub-millimetre tracking, the captured data can be somewhat noisy due to the computer vision-based reconstruction process. To compensate for this noise, we applied a low pass filter using cut-off frequencies of 2 Hz for position and 6 Hz for rotation before down sampling to 30 fps. Unlike most controlled experiments with Tablet PC input, we intentionally did not place the tablet in a fixed location on a desk. Instead participants were seated in a standard


chair with the tablet configured in slate mode and held on their lap (Figure 3-1). This was done to approximate a working environment where tablet usage would be most beneficial (e.g. while riding a subway, sitting at a park bench, etc.). If the device was placed on a desk, then using a mouse becomes practical, and perhaps users in the environment would opt to use a mouse instead of the pen. Mouse participants were seated at a standard desk with the same Tablet PC configured in open laptop mode. A wired, 800 DPI, infra-red mouse was used with the default Windows pointer acceleration (dynamic control-display gain) setting.
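A minimal sketch of the motion capture filtering and down-sampling step described above, assuming NumPy/SciPy and a zero-phase Butterworth filter (the filter order and exact implementation used in the study are assumptions):

```python
import numpy as np
from scipy.signal import butter, filtfilt

CAPTURE_FPS = 120   # Vicon capture rate
PLAYBACK_FPS = 30   # rate used for playback and analysis

def lowpass(samples: np.ndarray, cutoff_hz: float, fs: float = CAPTURE_FPS) -> np.ndarray:
    """Zero-phase low-pass filter applied along the time axis (axis 0)."""
    b, a = butter(2, cutoff_hz, btype="low", fs=fs)   # 2nd-order filter (assumed order)
    return filtfilt(b, a, samples, axis=0)

def smooth_and_downsample(position_xyz: np.ndarray, rotation_xyz: np.ndarray):
    """position_xyz and rotation_xyz: (n_frames, 3) arrays captured at 120 fps."""
    pos = lowpass(position_xyz, cutoff_hz=2.0)   # 2 Hz cut-off for position
    rot = lowpass(rotation_xyz, cutoff_hz=6.0)   # 6 Hz cut-off for rotation
    step = CAPTURE_FPS // PLAYBACK_FPS           # keep every 4th frame
    return pos[::step], rot[::step]
```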


Figure 3-1. Experimental setup and apparatus. In the Tablet PC conditions, the participant was seated with the tablet on their lap: (a) a head-mounted video camera captured their point-of-view; (b) 9.5 mm passive markers attached to the head, forearm, pen and tablet enabled 3-D motion capture.

Protocol

The study protocol required participants to follow instructions as they built a short Microsoft PowerPoint presentation slide deck using a web browser, spreadsheet, and presentation tool. The slide deck was partially constructed at the beginning of the study and participants could complete all tasks without entering any text. Participants were told that the study was not about memorization or application usability. They were instructed


to listen closely to instructions, and then complete the task as efficiently as possible. As they worked, they were told to “think aloud” by saying what they were thinking, especially when encountering problems. If they forgot what they were doing altogether, the experimenter intervened and clarified the task. Each session was conducted as follows: first, each of the participants in the 3 tablet groups completed the pointing, dragging and right-clicking sections of the standard Windows Vista Tablet PC tutorial. Then all participants completed a brief PowerPoint drawing object manipulation training exercise, since early pilot experiments revealed that these widgets were not immediately intuitive. This training period took between 5 and 10 minutes. Once training completed, the main portion of the study began. The experimenter used a prepared script to issue tasks for the participant to complete. The participant completed each task before the experimenter read the next one. At the conclusion of the study, we conducted a brief debriefing interview.

Tasks

In total, our seven-page study script contained 47 tasks which had to be completed. The entire script is reproduced in Appendix A, and a time-lapse demonstration of the entire scenario can be viewed in Video 3-1; here we summarize the main sections and give a small example from the script. The first 8 tasks asked the participant to open a PowerPoint file and correct the position and formatting of labelled thumbnails on a map (Figure 3-2). The text for tasks 4 through 8 is shown below to convey the general style:

Task 4: Correct the position of the polar bear and owl thumbnails: the polar bear’s habitat is in the north and the owl is in the south.

Task 5: Make the size, orientation, and aspect ratio of all the thumbnails approximately the same as the . It’s fine to judge this “by eye.”

Task 6: Change all labels to the same typeface and style as the beaver (20pt Arial Bold). You may have to select the “Home” tab to see the font controls.

Task 7: Now, we’ll make all animal names perfectly centered below the thumbnail. Select the label and the thumbnail together and pick “Align/Align Center” from the “Arrange” button in the “Drawing” toolbar.


Task 8: This is a good time to save your presentation. Press the save document icon located at the top left of the application.

(a) initial state of slide at beginning of task 4 (b) final state of slide at end of task 8

Figure 3-2. Study screen captures taken from initial task sequence. Here, the participant corrects the formatting of labelled animal thumbnails on a map: (a) before task 4 where some thumbnails are in the wrong position, scaled or rotated incorrectly, or have text labels rendered in different fonts and sizes; (b) at the conclusion of task 8 when the participant said all requested corrections to the thumbnails are complete.

Video 3-1. Time-lapse demonstration of study scenario (Vogel_Daniel_J_201006_PhD_video_3_1.mp4, duration 01:36).

Tasks 9 through 20 continued with the participant navigating to, and then completing, a slide about one of the animals. This required copying and pasting text from Wikipedia, and inserting a picture from a file folder (Figure 3-3a). Tasks 21 to 26 repeated the same steps with two more partially completed animal slides. In tasks 27 to 37, the participant used Excel to create a chart of animal population trends from a pre-existing data table, copied and pasted


it into the final slide of the presentation and added ink annotations such as written text and coloured highlighting (Figure 3-3b). Finally tasks 38 to 47 asked the participant to configure various slide show options and save an HTML version. After viewing the final presentation in the web browser, the study was complete.

(a) task 11; tasks 12–19; task 20

(b) task 31; tasks 32–40; task 41

Figure 3-3. Screen captures of selected scenario tasks. (a) tasks 11 and 20, in which the participant is asked to complete a slide about one of the animals by copying and pasting text from Wikipedia, and inserting a picture from a file folder; (b) tasks 31 and 41, in which the participant uses Excel to create a chart of animal population trends from a pre-existing data table, copy and paste it into the final slide of the presentation, and add ink annotations.


The tasks in the study covered simple interactions like pressing the save button, as well as more complex tasks like formatting several text boxes on a presentation slide. We included two different types of tasks to reflect real world usage patterns:

• 40 Constrained tasks had a predictable sequence of steps and a clearly defined goal for task completion. Task 8 (shown above), which asked the participant to press the save button, is an example.

• 7 Open-ended tasks had a variable sequence of steps and required the participant to assess when the goal was reached and the task complete. Task 5 (shown above), which asked the participant to match the size and orientation of animal thumbnails, is an example.

Participants took between 40 minutes and 1 hour to complete the study, including apparatus set-up, training time, and debriefing. Comments from our participants suggested that the tasks were not too repetitive or too difficult, and in contrast to formal experiments we have administered, participants felt that time passed quickly – some even said the study was enjoyable.

Widgets and Actions

The tasks in our study were designed to exercise different widgets like menus and scrollbars, and common actions like text selection, drawing, and handwriting. Figure 3-4 illustrates some of the lesser-known widgets with a brief explanation. For our purposes, we categorize widgets according to their functionality, visual appearance, and invocation. For example, a menu is different than a context menu since the latter is invoked with a right-click, a drop-down is different from a menu because the former has a scrollable list of items, and a tree-view is different because hierarchical items can be expanded and collapsed. Note that widgets can also be categorized according to widget capabilities (Gajos & Weld, 2004), but this would consider a menu, tree-view, drop-down, and even a group of radio buttons equivalent, since all are capable of selecting a single choice given a set of possible options. However, in our categorization, these are different widgets because their functionality and visual appearance are different.


[Figure panels illustrate the following widgets: scrollbar, slider, up-down, splitter, handles, drop-down, text-box object, tooltip, tree-view]

Figure 3-4. Illustration of selected widgets. A slider adjusts a parameter by dragging a constrained handle; an up-down increments or decrements a parameter using buttons or by entering the value in a text-box; a drop-down selects one option from a list of choices which are shown upon invocation (it may use a scrollbar to access a long list); a tree-view enables navigation through a hierarchical collection of items by expanding and collapsing nodes; a text-box object is one kind of drawing object in PowerPoint, which uses handles for scaling and rotation; a splitter changes the proportion of two neighbouring regions by dragging a constrained handle.

During our study, we expected participants to use 20 different widgets and 5 actions (Table 3-1). The actions include 3 types of selection: marquee (enclosing 2 or more objects by defining a rectangle with a drag), cell (selecting an area of spreadsheet cells by dragging), and text (selecting text by dragging). By analyzing our script, we calculated the expected minimum number of widget or action occurrences (Table 3-1, column N) necessary to complete the scenario. If a complex widget is composed of other complex widgets (such as when a drop-down includes a scrollbar) we considered these to be nested but distinct occurrences of two different widgets. In the case of buttons, we did not create a distinct occurrence for a button used to open another widget such as a menu or drop-down, or a button that is inherently part of a widget, such as the pagination buttons on a scrollbar. The specific instance of each widget or action may vary by size, orientation, position, and magnitude of adjustment. Note that the occurrence frequency distribution is not balanced and is dependent on the particular tasks in our script. This is a trade-off when adopting a realistic scenario such as in our study. However, we feel our tasks are representative of those used in other common non-text-entry office application scenarios, and compared to repetitive controlled experiments, are much more representative of real usage patterns.


Widget/Action N I | Widget/Action N I | Widget/Action N I

button 52 52 | up-down 8 88 | window handle 4 8
drop-down 20 40 | text select (A) 6 6 | writing (A) 3 3
scrollbar 20 20 | marquee select (A) 6 6 | cell select (A) 2 2
menu 19 36 | check-box 6 6 | chart object 2 3
text-box object 18 27 | slider 5 9 | hyperlink 2 2
object handle 17 17 | drawing (A) 5 5 | color choice 1 1
tab 16 16 | image object 5 5 | splitter 1 1
context menu 16 32 | tree-view 4 7
file object 11 11 | radio button 4 4

Table 3-1. Ideal amount of widget and action usage in our study. N is the number of widget occurrences, I is the number of interactions such as clicks and drags. Actions are indicated with (A).

The total number of widget or action occurrences only conveys part of their usage frequency. We also computed the expected ideal number of interactions (clicks, drags, right- clicks, etc.) for each occurrence (Table 3-1, column I). Some widgets, such as a single level context menu, always have a predictable number of interactions – right-click followed by a single-click – resulting in 2 interactions per occurrence. Other widgets such as an up-down widget can have wide variance due to their magnitude of adjustment. If an up-down has to be decremented from a value of 5 to a value of 4 using 0.1 steps, it requires 9 click interactions. Some widgets, such as the scrollbar, enable a task to be completed in different ways. For example, if a browser page must be scrolled to the bottom, the user may either grab the scroll box and drag to the bottom of the page, or make many clicks in the paging region. When calculating types of interactions for this type of widget, we selected the optimum strategy in terms of effort. For the ideal number of interactions we used the minimum. In other words, we use the most efficient interaction style without any errors. For example, our predictions may assume that a scrollbar can be operated with a single drag rather than a sequence of page down taps for a long scrolling action. In practice, this ideal number of interactions may be difficult to achieve, but it does provide a theoretical bound on efficiency and enabled us to get a sense of the widget frequencies a-priori. In total, we calculated ideal numbers of 253


widget occurrences, and 407 interactions like clicking and dragging during the study (Table 3-2).

Interaction Number

single-click 301

double-click 11

drag 79

right-click 16

TOTAL 407

Table 3-2. Ideal number of expected interactions by interaction type.
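As a quick cross-check of the totals reported above, the per-widget estimates sum to 253 occurrences and 407 interactions; a small script (values transcribed from Tables 3-1 and 3-2) verifies this arithmetic:

```python
# Values transcribed column by column from Table 3-1: N (occurrences) and I (interactions).
occurrences = [52, 20, 20, 19, 18, 17, 16, 16, 11,   # button ... file object
               8, 6, 6, 6, 5, 5, 5, 4, 4,            # up-down ... radio button
               4, 3, 2, 2, 2, 1, 1]                  # window handle ... splitter
interactions = [52, 40, 20, 36, 27, 17, 16, 32, 11,
                88, 6, 6, 6, 9, 5, 5, 7, 4,
                8, 3, 2, 3, 2, 1, 1]

assert sum(occurrences) == 253              # total widget/action occurrences
assert sum(interactions) == 407             # total clicks, drags, etc.
assert 301 + 11 + 79 + 16 == 407            # Table 3-2 breakdown by interaction type
print(sum(occurrences), sum(interactions))  # 253 407
```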

3.3 Analysis

We based our methodology on Interaction Analysis (Jordan & Henderson, 1995) and open coding (Strauss & Corbin, 1998). Interaction Analysis “is an interdisciplinary method for the empirical investigation of the interaction of human beings with each other and with objects in their environment” (Jordan & Henderson, 1995, p. 39). It leverages video as the primary record of observation and uses this to perform much more detailed analyses of subtle observations than would be possible with traditional written field notes. However, we gathered an even richer and more quantitative collection of observational data, and focus on a much more constrained interaction context. Thus, we follow Scott (2005) and use the open coding approach. Open coding begins with a loosely defined set of codes in a first examination of the data logs, and then, with subsequent iterations, the codes are refined and focused as trends emerge. In our case, we began coding general errors and refined this to ten specific types of errors, and then iterated again to code specific widgets and actions for more detailed analysis.

Custom Log Viewing and Analysis Software

Our user point-of-view video log, motion capture data, screen capture video, and pen event log were synchronized, segmented, and annotated using custom software we developed (Figure 3-5). Each data source is viewed in a separate player with the ability to pause, time


scrub, frame advance, adjust playback speed, etc. Our software is similar to the INTERACT commercial software package (“Mangold”). However, by building our own system, we could include custom visualizations for the pen event log and motion capture (see also Figure 3-6), more accurately synchronize the logs using our “time mark” scheme (explained below), easily gather data points based on annotation, and write code to gather specific data for statistics. Although this tool was purpose built for this analysis, we later revised it to be more general and used it to review video of the experiments discussed in chapters 4, 5 and 6.


Figure 3-5. Analysis software tool. (a) screen capture player; (b) pen event player; (c) user view player; (d) synchronization markers and unified playback control; (e) motion capture player; (f) annotation detailed description; (g) annotation codes.


[Two motion capture views shown side by side: Tablet and Pen; Laptop and Mouse. Labels (a)–(d) mark the tracked objects described in the caption.]

Figure 3-6. Motion capture player. In addition to conventional playback controls, a 3-D camera can be freely positioned for different views. In the views shown in the figure the camera is looking over the participant’s shoulder, similar to the photo in Figure 3-1 right. Objects being tracked are: (a) head position; (b) tablet or laptop display; (c) pen tip or mouse; (d) forearm. In the laptop condition, the laptop keyboard was also tracked (shown in red).

Synchronization

The user view video source was used as the master time log. To assist with synchronization, six visual “time markers” were created by the participant during the study session. Each time mark was created by lifting the pen high above the tablet, and then swiftly bringing the pen down to press a large button in the pen event logging application. This created a time stamp in the pen event log, and changed the button to red for one second which created a visual marker for the user view and screen capture logs. The distinctive pen motion functioned as a recognizable mark in the motion capture log.
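A simplified sketch of how such time marks could be used to align the streams, assuming each stream’s clock runs at the same rate so that a single constant offset (averaged over the six markers) is sufficient; the actual tool’s synchronization logic is not reproduced here, and the marker times below are hypothetical:

```python
import numpy as np

def clock_offset(master_marks_s, stream_marks_s):
    """Estimate the offset between a stream clock and the master (user-view video) clock.
    Both arguments are the six time-mark times, in seconds and in order, as seen in each stream."""
    return float(np.mean(np.asarray(master_marks_s) - np.asarray(stream_marks_s)))

def to_master_time(stream_time_s, offset_s):
    """Convert a timestamp from a logging stream into master time."""
    return stream_time_s + offset_s

# Hypothetical example: the pen-event log started 12.4 s after the user-view video.
offset = clock_offset([30.0, 95.2, 160.7, 230.1, 300.9, 365.5],
                      [17.6, 82.8, 148.3, 217.7, 288.5, 353.1])
print(round(to_master_time(100.0, offset), 1))   # pen-event time 100.0 s -> 112.4 s master time
```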

Segmentation into Task Interaction Sequences

Once synchronized, the logs were manually segmented into 47 sequences in which the participant was actively completing each task in our script. Since our study is more structured than typical interaction analysis, this is analogous to the first step of the open coding process where the data is segmented into a preliminary structure. A sequence began when the participant started to move towards the first widget or object, and the sequence ended when they completed the task. This removed times when the experimenter was introducing a task and providing initial instructions, the participant was commenting on a task after it was


completed, or when stopping and restarting the motion tracking system. This reduced the total data log time for each participant to between 20 and 30 minutes.

Event Annotation and Coding

The coding of the 47 task sequences for each of the 16 participants was performed in stages with a progressive refinement of the codes based on an open coding approach (Strauss & Corbin, 1998) with two raters – two different people identified events and coded annotations. First, a single rater annotated where some event of interest occurred, with an emphasis on errors. Next, these general annotations were split into three classes of codes, and one class, interaction errors, was further split into six specific types. A second rater was then trained using this set of codes. During the training process, the codes were further refined with the addition of a seventh type of interaction error and a fourth class (both of these were subsets of existing codes). Training also produced coding decision trees which provided both raters with more specific guidelines for code classification and event time assignment (see below). The second rater used this final set of codes and classification guidelines to independently identify events and code them across all participants. The first rater also independently refined their codes across all participants as dictated by the final set of codes and guidelines. There was a high level of agreement of codes for events found by both raters (Cohen’s Kappa of 0.89), but also a high number of events identified by one rater but not the other. We considered an event to be found by both raters if each rater annotated events with times separated by less than two seconds. Both raters found 779 events, but rater 1 and rater 2 found 238 and 251 additional events respectively. A random check of these additional events strongly indicated that these were valid events, but had simply been missed by the other rater. Moreover, the codes for these missed events did not appear to have a strong bias – raters did not seem to miss a particular type of event. Thus, with the assumption that all identified events are valid, events found by both raters account for 61% of all events found. In a similar coding exercise, Scholtz et al. (2004) also found that raters had difficulty finding events of interest (called “critical incidents” in their domain).
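A rough sketch, under our own assumptions about the matching procedure, of the two steps described above: pairing events from the two raters when their annotated times fall within two seconds of each other, and computing Cohen’s kappa over the codes of the paired events:

```python
def match_events(rater1, rater2, window_s=2.0):
    """rater1, rater2: lists of (time_s, code) tuples.
    Greedily pair each rater-1 event with the nearest unused rater-2 event within the window."""
    pairs, used = [], set()
    for t1, c1 in rater1:
        best_j, best_dt = None, window_s
        for j, (t2, _) in enumerate(rater2):
            dt = abs(t1 - t2)
            if j not in used and dt <= best_dt:
                best_j, best_dt = j, dt
        if best_j is not None:
            used.add(best_j)
            pairs.append((c1, rater2[best_j][1]))
    return pairs

def cohens_kappa(pairs):
    """pairs: list of (code_rater1, code_rater2) for events found by both raters."""
    n = len(pairs)
    codes = {c for pair in pairs for c in pair}
    p_observed = sum(a == b for a, b in pairs) / n
    p_chance = sum((sum(a == c for a, _ in pairs) / n) *
                   (sum(b == c for _, b in pairs) / n) for c in codes)
    return (p_observed - p_chance) / (1 - p_chance)
```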


Given the high level of agreement between raters when both raters identified the same event, we felt justified to merge all events and codes from both raters. When there was disagreement (66 out of 779 cases), we arbitrarily chose to use the code from rater 1. We should note that rater 1 was the primary investigator. To guard against any unintentional bias, we examined our results when rater 2 was chosen: we found no significant changes in the quantitative results.

Annotation Events and Codes

Each annotation included the code type, synchronized log time for the event, the widget or object context if applicable, and additional description as necessary.
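One possible record structure for such an annotation (a sketch of our own, not the actual format used by the logging tool):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Annotation:
    code: str                      # e.g. "Target Selection", "Visual Search"
    time_s: float                  # synchronized log time of the event
    context: Optional[str] = None  # widget or object context, if applicable
    note: Optional[str] = None     # additional free-form description
```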

Code Types

We identified four classes of codes: experiment instructions, application usability, visual search, and interaction errors. Events coded as experiment instructions, application usability, and visual search are general in nature and should not be specific to an input device. We felt these codes forced us to separate true interaction error codes from these other types of non-interaction errors:

• Experiment Instructions: performed the wrong task, adjusted wrong parameter (e.g., when asked to make photo 4 inches wide, the participant adjusted the height instead), or asked the experimenter for clarification

• Application Usability: application design led directly to an error (we identified specific cases as guidelines for raters, see below)

• Visual Search: performing a prolonged visual search for a known target

Since our focus is on pen based interaction on a Tablet PC versus the baseline mouse based interaction, we were most interested in interaction errors which occurred when the participant had difficulty manipulating a widget or performing an action. We defined eight types of interaction error codes:

• Target Selection: could not click on intended target (e.g., clicking outside the scrollbar)


• Missed Click: making a click action, but not registering any type of click (e.g., tapping too lightly to register a click)

• Wrong Click: attempting one type of click action, but a different one is recognized by the system (e.g., right-clicking instead of dragging a scrollbar, single click instead of a double click)

• Unintended Action: attempting one type of interaction, but accidentally invoking a different one (e.g., attempted to open a file with a single click when a double-click is required)

• Inefficient Operation: reaching the desired goal, but without doing so in the most efficient manner (e.g., scrolling a large document with individual page down clicks rather than dragging the scroll box; overshooting an up-down value and having to backtrack)

• Repeated Invocation: unnecessarily invoking the same action multiple times (e.g., pressing the save button more than once just to be sure it registered)

• Hesitation: pausing before clicking or releasing (e.g., about to click on target, then stop to carefully position pen tip)

• Other: errors not described by the above codes

Event Times

The time logged for an event was almost always its beginning. An ambiguous case occurs for some error events, such as when the participant is dragging. In these cases, we defined the time of the event to be when the participant set the error in motion. For example, when selecting text or multiple objects with a marquee, if the click down location constrained the selection such that an error was unavoidable, then the event time is logged at the down action. However, if the error occurs at the up action, such as a movement while releasing the pen tip from the display, then the event time is logged at the up action.

Coding Decision Trees

We developed two coding selection decision trees: one is used when a participant makes a noticeable pause between actions (Figure 3-7) and a second is used when a participant attempts an action (Figure 3-8). We defined “action” as an attempted click (“tap”); “right-click”; beginning or ending of a drag; or operating a physical key, wheel, or button. The definition of “noticeable” is somewhat subjective and required training – a rough guideline is to look for pauses of more than two seconds that interrupt an otherwise fluid movement. With some practice, these noticeable pauses became more obvious.

The decision sequence when the participant makes a noticeable pause between actions is:

• Are they asking the experimenter a question to clarify the task? If yes, code Experiment Instructions; if no, continue.

• Is the participant searching for the target of their next interaction? If yes, code Visual Search; if no, continue.

• Was the pause immediately before and near the next action (1)? If yes, code Hesitation; if no, there is nothing to code.

Figure 3-7. Coding decision when participant makes a noticeable pause. See below for additional notes (numbered notes in parentheses).


The decision sequence when the participant attempts an action is:

• Are they intentionally performing a different action from the script (2)? If yes, code Experiment Instructions; if no, continue.

• Was the manipulation successful? If yes: was the action performed in an inefficient manner (4)? If so, code Inefficient Operation. Otherwise, did they perform the same action multiple times unnecessarily? If so, code Repeated Invocation; if not, there is nothing to code.

• If the manipulation was not successful, ask in order: Was the error caused by a known application usability problem (3)? If so, code Application Usability. Did the system fail to recognize any action? If so, code Missed Click. Did they miss the intended target? If so, code Target Selection. Was the resulting click type different than the one expected? If so, code Wrong Click. Was a different action triggered than the one expected? If so, code Unintended Action. Otherwise, code Other.

Figure 3-8. Coding decision tree when participant attempts an action. See below for additional notes (numbered notes in parentheses).


Additional Notes on Decision Trees for Figure 3-7 and Figure 3-8:

(1) “just before”: The participant has found the location for the action, but does not perform the interaction immediately.

(2) “different action”: wrong task, adjusting wrong parameter such as adjusting height instead of width, attempting to gain access to parameter in a different way than requested (e.g., using a context menu instead of a toolbar).

(3) Known usability problems were identified to remove errors that are application specific or exhibit obvious poor design. Many involved the PowerPoint (PPT) textbox:

attempting to move a PPT textbox by trying to drag the text directly (PPT requires the user to first select the text box, then drag it from the edge);

the default insertion point in a PPT textbox selects a single word only, so subsequent formatting does not affect the entire textbox (rather than all of the text being selected first);

marquee selection misses the invisible PPT textbox bounding box;

attempting to select text in a PPT textbox but the down action is off of the visible text which deselects the textbox and begins a selection marquee instead;

changing a checkbox state by clicking on the label text (but the application only supports a direct click on a checkbox);

or, a problem selecting an Excel chart object (when opening a context menu or moving a chart, the application often selects an inner “plot area” rather than entire chart).

(4) Inefficient Manner: reaching the desired goal, but doing so in a noticeably inefficient manner (e.g., scrolling a large document with individual page down clicks rather than dragging the scroll box; overshooting an up-down value by several steps and having to backtrack). This is admittedly somewhat subjective and difficult to quantify; we relied on rater training to ascertain this behaviour.
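For readers who prefer code, the Figure 3-8 decision logic can be restated compactly as follows (a sketch using a hypothetical observation record; the raters of course applied these decisions manually):

```python
from dataclasses import dataclass

@dataclass
class ActionObservation:
    off_script: bool               # intentionally performing a different action from the script (2)
    successful: bool               # the manipulation succeeded
    known_usability_problem: bool  # a known application usability problem (3)
    no_action_recognized: bool     # system failed to recognize any action
    missed_target: bool
    wrong_click_type: bool
    wrong_action_triggered: bool
    inefficient: bool              # performed in an inefficient manner (4)
    repeated_unnecessarily: bool

def code_attempted_action(o: ActionObservation) -> str:
    if o.off_script:
        return "Experiment Instructions"
    if o.successful:
        if o.inefficient:
            return "Inefficient Operation"
        if o.repeated_unnecessarily:
            return "Repeated Invocation"
        return "Nothing to code"
    if o.known_usability_problem:
        return "Application Usability"
    if o.no_action_recognized:
        return "Missed Click"
    if o.missed_target:
        return "Target Selection"
    if o.wrong_click_type:
        return "Wrong Click"
    if o.wrong_action_triggered:
        return "Unintended Action"
    return "Other"
```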

Interactions of Interest

After a preliminary qualitative analysis of the task sequences and interaction errors, we identified specific interactions of interest and further segmented these for more detailed analysis. We selected widgets and actions that are used frequently (button), highly error prone (scrollbar), presented interesting movements (text selection), or highlighted differences between the pen and mouse (drawing, handwriting, and keyboard use). These are discussed in section 3.5.

Other Annotations

We also transcribed relevant comments from the participant and noted sequences where problems were caused by occlusion or tooltips and other hover-triggered visual feedback.


3.4 Results

Time

To get a sense of performance differences between mouse and the three pen groups, we calculated group means for the total time used to complete the 40 constrained tasks (since these tasks have a predictable sequence of steps and a clearly defined goal for task completion). Note that these times include some non-task segments such as when participants provide their comments as per the think-aloud protocol. The graph suggests slightly higher times for the pen condition, decreasing with increased computer experience (Figure 3-9). A similar trend can be seen for the total time for all tasks, but with much higher variance. A

one-way analysis of variance found a significant main effect of group on time (F(3,12) = 7.445, p < .005), with the total times for the Pen3-ConventionalNovices group significantly longer than the Mouse and Pen1-TabletExperts groups (p < .02; all post-hoc analyses use the conservative Bonferroni adjustment). No other significant differences were found between the other groups. Perhaps more interesting is the range of completion times for mouse and pen participants. The best and worst times for the Mouse group were 12.8 minutes (P1) and 19.0 minutes (P3), while the best and worst times across all pen groups were 16.1 min (P7, Pen1-TabletExperts) and 28.0 min (P13, Pen3-ConventionalNovices). The best time for a mouse participant is well below the best pen user time, even for expert Tablet PC users.
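A hedged sketch of this kind of analysis (not the analysis scripts actually used; the per-participant times below are hypothetical fillers chosen only to include the reported best and worst times):

```python
from itertools import combinations
from scipy import stats

# Hypothetical per-participant completion times in minutes (4 participants per group).
groups = {
    "Mouse": [12.8, 15.1, 17.1, 19.0],
    "Pen1":  [16.1, 16.8, 17.4, 18.5],
    "Pen2":  [17.0, 18.2, 18.9, 19.9],
    "Pen3":  [20.5, 22.5, 24.2, 28.0],
}

f, p = stats.f_oneway(*groups.values())   # one-way between-subjects ANOVA
print(f"F(3,12) = {f:.3f}, p = {p:.4f}")

pairs = list(combinations(groups, 2))
alpha = 0.05 / len(pairs)                 # Bonferroni-adjusted alpha for 6 comparisons
for a, b in pairs:
    t, p_pair = stats.ttest_ind(groups[a], groups[b])
    print(f"{a} vs {b}: p = {p_pair:.4f} {'(significant)' if p_pair < alpha else '(n.s.)'}")
```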



[Bar chart: mean time in minutes (y-axis 0 to 30) for the 40 constrained tasks, by group (Mouse, Pen1, Pen2, Pen3); the plotted group means are 16.0, 17.2, 18.5, and 23.3 minutes]

Figure 3-9. Mean time for all constrained tasks per group. (Note: error bars in all graphs are 95% confidence intervals.) (Pen1-TabletExperts, Pen2-ConventionalExperts, Pen3-ConventionalNovices)

Errors

We annotated 1276 errors across all 16 participants in all 47 tasks. This included non-interaction errors (experiment instruction, visual search, and application usability), which we briefly discuss before focusing on interaction errors, which are the most relevant.

Non-Interaction Errors

We found 72 application usability errors, 151 experiment instruction errors, and 41 visual search errors overall. The mean number per group does not appear to form a pattern, with the possible exception of application usability appearing higher with Pen3-ConventionalNovices (Figure 3-10). However, no significant differences were found. The large variance for experiment instructions with the Mouse group was due to participant 3, who often clarified task instructions (that person had 25 experiment instruction errors compared to the next highest participant in the study with 15, a Pen2-ConventionalExpert).


[Bar chart: mean number of errors (y-axis 0 to 25) per group (Mouse, Pen1, Pen2, Pen3) for the Application Usability, Experiment Instructions, and Visual Search error classes]

Figure 3-10. Mean non-interaction errors per group. (Pen1-TabletExperts, Pen2-ConventionalExperts, Pen3-ConventionalNovices)

The breakdown of specific application usability errors (which we identified during the coding process, see above) found a large proportion of errors when attempting to select text in the textbox (49%) and marquee selection missing the “invisible” textbox bounding box (17%).

Interaction Errors

Recall that interaction errors occur when a click or drag interaction is attempted. The mean number of interaction errors in each group suggests a pronounced difference between mouse and pen groups (Figure 3-11). A one-way analysis of variance found a significant main effect of group on interaction errors (F(3,12) = 10.496, p = .001). Post-hoc analysis found the Mouse group lower than both Pen2-ConventionalExperts and Pen3-ConventionalNovices, and Pen1-TabletExperts lower than Pen3-ConventionalNovices (all p < .05, using the Bonferroni adjustment).


[Bar chart: mean interaction errors (y-axis 0 to 160) per group (Mouse, Pen1, Pen2, Pen3); the plotted group means are 15, 51, 73, and 114 errors]

Figure 3-11. Mean interaction errors per group. (Pen1-TabletExperts, Pen2-ConventionalExperts, Pen3-ConventionalNovices)

From a breakdown of interaction error type for each group (Figure 3-12), target selection appears to be most problematic with pen users, especially those with less experience, the Pen3-ConventionalNovices group. For the mouse group, unintended actions and target selection were similar, but all errors were low. Wrong clicks, missed clicks, and unintended actions are roughly comparable across pen groups, with a slight increase with less experience. Not surprisingly, there were no missed click errors with the mouse group – a dedicated button makes this type of error unlikely. We analyze each interaction error type below, with an emphasis on target selection, wrong clicks, unintended actions, and missed clicks.

[Grouped bar chart: mean interaction errors (y-axis 0 to 100) per group (Mouse, Pen1, Pen2, Pen3), broken down by error type: Target Selection, Wrong Click, Unintended Action, Missed Clicks, and All Other]

Figure 3-12. Mean interaction errors by error type. All other errors include hesitation, inefficient operation, and repeated invocation. (Pen1-TabletExperts, Pen2-ConventionalExperts, Pen3-ConventionalNovices)


Target Selection Error Location

All Tablet PC participants had trouble selecting targets, with the number of errors increasing with lower computer experience. In reviewing the data logs, we noted that target selection errors may be related to location. A heat map plot (a 2-D histogram using color intensity to represent quantity in discrete bins) compares the concentration of all taps and clicks for all tasks for all pen participants (Figure 3-13a) with the relative error rate, computed by taking the ratio of the number of errors to the number of taps/clicks per bin (Figure 3-13b). The concentration of taps/clicks is somewhat centralized (aside from the peak in the upper right where up-down widgets in tasks 15, 21, 23, and 34 required many sequential clicks). The error rate has a different distribution, with higher concentrations near the mid-upper-left and along the right side of the display compared to all taps/clicks. There may also be an interaction with target size, which we could not control for. We could have carefully annotated actual target size based on the screen capture log, but this would have required considerable effort which did not seem to provide a suitably large gain in analysis. A better approach would be to conduct a controlled experiment in the future to validate this initial finding.
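A small sketch of how these heat maps can be computed, assuming NumPy and the 105 × 100 px bins noted in the figure caption (this is an illustration, not the analysis script itself):

```python
import numpy as np

DISPLAY_W, DISPLAY_H = 1050, 1400   # display size in px
BIN_W, BIN_H = 105, 100             # bin size in px, as in Figure 3-13

def heat_maps(tap_xy: np.ndarray, error_xy: np.ndarray):
    """tap_xy, error_xy: (n, 2) arrays of x,y pixel coordinates of taps/clicks and of errors."""
    x_edges = np.arange(0, DISPLAY_W + 1, BIN_W)
    y_edges = np.arange(0, DISPLAY_H + 1, BIN_H)
    taps, _, _ = np.histogram2d(tap_xy[:, 0], tap_xy[:, 1], bins=[x_edges, y_edges])
    errors, _, _ = np.histogram2d(error_xy[:, 0], error_xy[:, 1], bins=[x_edges, y_edges])
    # Per-bin error rate, leaving bins with no taps/clicks at zero.
    rate = np.divide(errors, taps, out=np.zeros_like(errors), where=taps > 0)
    return taps, rate
```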

(a) Number of Taps/Clicks (b) Error Rate

[Color scales: (a) 0 to 300 taps/clicks per bin; (b) 0.0 to 0.9 error rate]

Figure 3-13. Pen participant heat map plots for taps/click and errors. (a) all taps/clicks; and (b) target selection error rate, the number of errors over taps/clicks. A heat map is a 2-D histogram using color intensity to represent quantity in discrete bins; in this case, each bin is 105 x 100 pixels.


Wrong Click Errors

We observed four types of wrong click errors, each occurring with seven or more Tablet PC participants (Table 3-3).

Wrong Click Type | Tablet PC | Mouse | Frequent Pen Contexts

a) right-click instead of a single-click | 32 occurrences, 9 participants, 0.9% rate (1) | none | up-down (15), drop-down (7), button (4), menu (2)

b) right-click instead of drag | 26 occurrences, 9 participants, 2.7% rate (2) | none | slider (12), scrollbar (7)

c) click instead of double-click | 24 occurrences, 8 participants, 18% rate (3) | 1 occurrence, 1 participant, 2% rate (3) | file object (all)

d) click instead of right-click | 14 occurrences, 7 participants, 7.3% rate (4) | none | context menu invocation (13)

1. Occurrence rate calculated using 300 estimated single-clicks (Table 3-2). 2. Occurrence rate calculated using 79 estimated drags (Table 3-2). 3. Occurrence rate calculated using 11 estimated file operations (Table 3-1). 4. Occurrence rate calculated using 16 estimated right-clicks (Table 3-2). In this table and Table 3-4, rates are computed as occurrences divided by the estimated ideal count summed over all participants in the condition (12 pen participants, 4 mouse participants).

Table 3-3. Wrong click errors.
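A worked check of that rate calculation (our reading of the footnotes; for example, row (a) is 32 erroneous right-clicks over 300 estimated single-clicks for each of the 12 pen participants):

```python
pen_participants, mouse_participants = 12, 4

print(32 / (300 * pen_participants))   # ~0.009 -> 0.9 % (Table 3-3a, Tablet PC)
print(24 / (11 * pen_participants))    # ~0.18  -> 18 %  (Table 3-3c, Tablet PC)
print(1 / (11 * mouse_participants))   # ~0.023 -> 2 %   (Table 3-3c, Mouse)
print(28 / (11 * pen_participants))    # ~0.21  -> 21 %  (Table 3-4a, Tablet PC)
```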

Participants had problems accidentally invoking a right-click when attempting to single-click or drag (Table 3-3a,b). Right-clicking instead of clicking occurred most often with the up-down widget. In some cases participants held the increment and decrement buttons down expecting this to be an alternate way to adjust the value. Other common contexts were long button presses, and triggering in menus by dwelling on the top-level item to open nested items. Right-clicking instead of dragging occurred most often with scrollbars and sliders. With these widgets, pressing and holding the scrolling handle triggered a right-click if the drag motion did not begin before the right-click dwell period. A similar problem occurred when a slow double-click was recognized as two single clicks in file navigation (Table 3-3c). This often put the file or folder object in rename mode, and subsequent actions could corrupt the text. It appears to be because of timing and location: if the two clicks were not performed quickly enough and near enough to each other, they were not registered as a double-click. Accot and Zhai (2002) note that the rapid successive clicking actions required by double-clicking can be difficult to perform


while keeping a consistent cursor position, and our data supports this intuition. This is a symptom of pen movement while tapping, which we explore in more detail below. Clicking instead of right-clicking was less frequent (Table 3-3d), but when an error occurred, the results were often costly. For example, when invoking a context menu over a text selection, several participants clicked on a hyperlink by accident. This was not only disorienting, but also required them to navigate back from the new page and select the text a second time before trying again. We also noted cases of dragging or right-dragging instead of clicking or right-clicking. Accidental right-dragging was also disorienting. The end of even a short right-drag on many objects opens a small context menu – since this occurred most often when participants were opening a different context menu, they easily became confused.

Unintended Action Errors

We observed three types of unintended action errors which occurred with seven or more Tablet PC participants (Table 3-4). Considering the wrong click and unintended action error types together, five were caused by erroneous click or drag invocations; one occurred when completing a drag operation; and one occurred when navigating file folders. In fact, two of these error types occurred almost exclusively with file folder navigation. Another culprit was the right-click, which was a factor in four of the error types.

Unintended Action Type | Tablet PC | Mouse | Frequent Pen Contexts

a) attempt to open file or folder with single-click instead of double-click | 28 occurrences, 10 participants, 21% rate (1) | 1 occurrence, 1 participant, 2% rate (1) | file folder navigation (all)

b) movement on drag release | 27 occurrences, 10 participants, 2.8% rate (2) | 1 occurrence, 1 participant, 0.3% rate (2) | dragging a drawing object's rotation or translation handle (22), or text selection (4)

c) dragging corner resize handle locks aspect ratio, unable to resize in one dimension only | 9 occurrences, 7 participants, 0.9% rate (2) | 1 occurrence, 1 participant, 0.3% rate (2) | PowerPoint image object (all)

(1) Occurrence rate calculated using 11 estimated file interactions (Table 3-1). (2) Occurrence rate calculated using 79 estimated drags (Table 3-2).

Table 3-4. Unintended action errors.

There were three unintended action errors with a wide distribution across pen participants. One common error was attempting to open a file or folder with a single-click instead of a double-click (Table 3-4a). 10 out of 12 pen participants attempted at least twice to open a file


by single clicking rather than using a conventional double-click. 2 of these participants made this mistake 4 or more times. It seems that the affordance with the pen is to single-click objects, not double-click, as one participant commented:

"I almost want this to be single-click which I would never use in normal windows, but with this thing it seems like it would make more sense to have a single click." (P9-Pen1-TabletExperts 33:30) 12

Unintended movement when lifting the pen at the end of a drag action was another commonly occurring error (Table 3-4b). This occurred most often when manipulating the rotation handle of a drawing object or when selecting text. The participant would position the rotation handle in the desired orientation, or drag the caret to select the desired text, but as they lifted the pen to end the drag, the handle or caret would unexpectedly shift.

A third unintended action error occurred when participants tried to resize a PowerPoint object in one dimension using the corner handle, but the corner handle constrained the aspect ratio (Table 3-4c). This may have been better classified as an application usability problem.

There are two unintended actions with which we expected to find more problems; however, this did not turn out to be the case. We found only one occurrence of a premature end to a dragging action, which we expected to be more common since it requires maintaining a threshold pressure while moving (Ramos et al., 2004). Also, we intentionally did not disable the Tablet PC hardware buttons during the study, yet only found a single instance of an erroneous press.

Missed Click Errors

Due to how we classified missed click errors, 67 out of 75 (91%) of these errors occurred when single-clicking (tapping with the pen); if the system failed to recognize both taps of an attempted double-click, this was classified as a wrong click error instead. All pen participants had at least one missed click, with four participants missing more than 9. The most common contexts were button (25%), menu (21%), and image (13%).

12 The numeric text 33:30 is the time of the quote in the synchronized log. Specifically, the text expresses the time in minutes and seconds separated by a colon. In this example, the quote was recorded at 33 minutes and 30 seconds. We will use this convention for time throughout this dissertation.


The cause appears to be too little or too much force. Tapping too lightly is a symptom of a tentative tapping motion. We noted that when participants had trouble targeting small items (such as menu buttons, check-boxes, and menu items) they sometimes hovered above the target to verify the cursor position, but the subsequent tap down motion was short, making it difficult to tap the display with enough force. Tapping too hard13 seemed to be a strategy used by some Pen1-TabletExperts participants as a (not always successful) error avoidance technique – see our discussion below.

Repeated Invocation, Hesitation, Inefficient Operation, and Other Errors

These errors had 184 occurrences in total across all participants. There were only 3 occurrences of other errors; the remainder were repeated invocation, hesitation, and inefficient operation errors (Table 3-5).

Error Type | Tablet PC | Mouse | Frequent Pen Contexts

a) Inefficient Operation | 68 occurrences, 12 participants | 25 occurrences, 4 participants | scrollbar (31), up-down (4)

b) Hesitation | 57 occurrences, 11 participants | 3 occurrences, 3 participants | file folder navigation (all)

c) Repeated Invocation | 27 occurrences, 10 participants | none | PowerPoint image object (all)

Table 3-5. Repeated invocation, hesitation, and inefficient operation errors.

We noted 68 cases of obvious inefficient operation across all 12 pen participants (Table 3-5a). More than half of these cases involved the scrollbar (31) or up-down (9). With the scrollbar, some participants chose to repeatedly click to page down for very long pages instead of dragging the scroll box. With the up-down, we noted several participants overshooting the desired value and having to backtrack. Of the 25 occurrences of inefficient

13 Recall that our pen event logger did not log pressure. Thus, our observations concerning soft or hard taps are based on interpretation of the motion capture and user view logs capturing pen movement as it strikes the display, as well as the loudness of the physical tap as recorded by the user view camera microphone.


operation with mouse participants, 14 involved the mouse scroll-wheel: three out of four mouse participants would scroll very long pages with the wheel instead of dragging the scroll box, resulting in a similar type of inefficient operation as pen participants repeatedly clicking page down.

We noted 57 cases of hesitation in all three pen groups, but only 3 in the mouse group (Table 3-5b). There seemed to be two main causes. One is the user visually confirming the position of the cursor, rather than trusting the pen tip, when selecting a small button or when opening a small drop-down target. The other is caused by many tooltips popping up and visually obscuring parts of the target. We discuss this type of "hover junk" in more detail below.

We identified 27 cases of repeated invocation across 10 pen participants (Table 3-5c). Although a small number in total, it suggests some distrust when making selections. 11 of these occurred when tapping objects to select them (images, charts, textboxes, and windows), and 9 occurred when pressing a button or tab. Recall that a repeated invocation error is only logged if the first tap was successful. This is likely a symptom of subtle or hidden visual feedback, which we discuss below, and the fact that the extra cost of repeated invocation "just in case" is not that high (Sellen, Kurtenbach, & Buxton, 1992).

Error Recovery and Avoidance Techniques

We have already discussed error avoidance through repeated invocation, but there were two other related observations with the experienced Tablet PC group. These participants seemed to recover from errors much more quickly, almost as though they expected a few errors to occur. For example, when they encountered a missed click error, there was no hesitation before repeating the selection action – one participant rapidly tapped a second time if the result did not instantly appear, which caused several repeated invocation errors. Sellen et al. (1992) also observed this type of behaviour with experts using other devices. In contrast, participants in the Pen3-ConventionalNovices and Pen2-ConventionalExperts groups tended to pause for a moment if an error occurred; they seemed to be processing what had just happened before trying again.

A related strategy used by three Pen1-TabletExperts participants was to tap the pen very hard and very fast, almost as though they were using a very hard leaded pencil on paper. When asked about this behaviour, one participant commented that this helped avoid missed clicks (suggesting they may have felt the digitizer was not very sensitive). However, we


noted cases where a click was missed even with this hard tap; in fact, the speed and force seemed to be the cause. A better mental model for Tablet PC users is to think of the digitizer as being as sensitive as an ink pen on paper – this requires a medium speed (to allow the ink to momentarily flow) and a medium pressure (more than a felt tip marker, but less than a very hard leaded pencil).

Interaction Error Context

By examining the most common widget or action contexts for interaction errors overall, we can get a sense of which are most problematic with pen input. Recall that since our study tasks follow a realistic scenario, the frequencies of interaction contexts are not balanced. For example, we expected 52 button interactions, but only 6 text selection interactions (see Table 3-1). Thus, for relative comparison, a normalized error rate is appropriate. Ideally, we would normalize across the actual number of interactions per widget, but this has inherent difficulties: participants use different widget interaction strategies resulting in an unbalanced distribution, and we were not able to automatically log which widget was being used, much less know which widget was intended (in the case of a target selection error). However, most interaction errors occur when an interaction is attempted, so we can normalize against the ideal number of interactions per widget and calculate an estimated interaction error rate.

It is important to acknowledge that the ideal number of interactions is a minimum, reflecting an optimal widget manipulation strategy. Thus, although our estimated error rate may be useful for relative comparison between widgets, the actual rates may be somewhat exaggerated if a widget was used more times than expected or manipulated with an inefficient strategy. If we accept this normalized error rate estimation, an ordered plot of widgets and actions reveals differences between mouse and pen (Figure 3-14). We only include widgets or actions with more than 3 expected interactions. The relative ordering differs between mouse and pen, with the highest number of pen errors occurring with the scrollbar and the highest number of mouse errors with marquee selection.
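A minimal sketch of this normalization, using the two worked figures that appear in the text (a 16% button rate against 52 expected interactions, and the scrollbar case examined below); the per-participant error counts here are back-derived for illustration, not logged study data:

# Estimated interaction error rate: mean interaction errors per pen participant for a
# context, divided by the ideal (scripted) number of interactions for that context.
ideal_interactions = {"button": 52, "scrollbar": 20}              # from the task script
mean_errors_per_participant = {"button": 8.3, "scrollbar": 10.0}  # illustrative values

def estimated_error_rate(context):
    return mean_errors_per_participant[context] / ideal_interactions[context]

for context in ideal_interactions:
    print(context, f"{estimated_error_rate(context):.0%}")
# button -> 16%, scrollbar -> 50% (matching the figures quoted in the text)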


Figure 3-14. Estimated interaction error rate for widget and action contexts. Error rate is calculated using the ideal number of interactions (see Table 3-1). Only widgets or actions with more than 3 expected interactions are shown. (*actions are denoted with an asterisk; all other contexts are widgets)

The top error contexts for the pen range from complex widgets like sliders, tree-views, and drop-downs to simple ones like files, handles, and buttons. Note that three of the top five contexts involve dragging exclusively: slider, handle, and text select.

The high error rate for the scrollbar is likely due to inefficient usage. Above, we noted that many inefficient usage errors occurred when participants manipulated a scrollbar with a sequence of page down taps, rather than a single drag. Since in most cases our ideal interaction count expected a single drag, this inflates the error rate relative to the normalizing factor. Below, we examine the scrollbar in more detail and establish a more accurate error rate. We noted above that the slider and handle both had small targets, and the dragging action caused wrong click errors, occasionally triggering right-clicks instead of drags. The high error rate for file objects is largely due to the high number of unintended action errors where participants attempted to open files with a single tap, and a high number of wrong click errors where double taps were recognized as a single tap.

Note that although many errors with the text-box were coded as application usability errors, it still produced a 16% estimated error rate. This is partly because, even after participants avoided the application usability error of attempting to reposition a text-box by dragging from its centre after tapping to select it, the text-box must still be repositioned by


dragging a very narrow 7 px (1.2 mm) border target: 39% of text-box interaction errors were target selection errors.

Mouse error rates are consistently much lower, with the exception of the marquee. Several participants had a different mental model of marquee selection: they expected objects to be selected as soon as the marquee touched them. This resulted in marquees that missed desired objects for fear of touching an undesired object. There are also two mouse-specific action contexts not shown: scroll-wheel and keys. These contexts were not estimated by our script, so a normalized error rate cannot be computed. Mouse users often used the scroll-wheel to scroll very long documents, instead of simply dragging the scrollbar thumb, and often overshot the target position, both resulting in inefficient operation errors. In addition, mouse users sometimes missed keys when using shortcuts.

Movements

In addition to errors, when reviewing the logs we could see differences in movement characteristics between mouse and pen. Using the pen seemed to require larger movements involving the arm, whereas the mouse had a more subtle movement style, often using only the wrist and fingers. Due to the number of target selection and missed click errors with the pen, we also wished to investigate the movement characteristics of taps.

Overall Device Movement Amount

To measure movements, we first had to establish a threshold movement distance per data sample that would signal a moving or stationary frame. To find this threshold, we graphed Euclidean movement distance by time for different 10 second movement samples taken from four participants. Based on these graphs, we established a dead band cut-off at 0.25 mm of movement per frame (a velocity of 7.5 mm/s). With this threshold, we calculated the total amount of pen and mouse movement across all constrained tasks. Since participants completed the study in different times, we calculated a mean movement amount per minute for comparison, and calculated the mean for each group (Figure 3-15).

A one-way analysis of variance found a significant main effect of group on device movement (F3,12 = 19.488, p < .001). The total movement for the mouse group was significantly shorter than all pen groups (p < .001, using the Bonferroni adjustment). We


found that these statistics supported our observations: pen users moved more than 4.5 times farther than mouse users per minute.
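A minimal sketch of this movement measure, assuming 3-D position samples in millimetres and a 30 Hz sample rate (inferred from the 0.25 mm per-frame deadband corresponding to 7.5 mm/s); this is an illustration, not the logging or analysis code used in the study:

import numpy as np

FRAME_RATE_HZ = 30.0   # assumed; 0.25 mm/frame * 30 frames/s = 7.5 mm/s
DEADBAND_MM = 0.25     # per-frame movement below this counts as stationary

def movement_per_minute(positions_mm):
    """positions_mm: (n_frames, 3) array of pen or mouse positions in millimetres."""
    step = np.linalg.norm(np.diff(positions_mm, axis=0), axis=1)  # per-frame distance
    total_mm = step[step >= DEADBAND_MM].sum()                    # ignore stationary frames
    minutes = len(positions_mm) / FRAME_RATE_HZ / 60.0
    return (total_mm / 1000.0) / minutes                          # metres per minute

# Example with two minutes of synthetic samples
rng = np.random.default_rng(1)
positions = np.cumsum(rng.normal(0.0, 0.4, size=(2 * 60 * 30, 3)), axis=0)
print(round(movement_per_minute(positions), 2))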

Average Distance (m/min): Mouse 0.8; pen groups between 3.4 and 4.1.

Figure 3-15. Average pen or mouse movement distance per minute for all constrained tasks. (Pen1-TabletExperts, Pen2-ConventionalExperts, Pen3-ConventionalNovices)

Since we tracked the movements of the forearm in addition to the pen, we can examine the proportion of forearm movement to wrist and finger movements – with the assumption that if the pen or mouse moves without forearm movement, it must be the result of the wrist or fingers (Figure 3-16). We found that the pen had a much greater proportion of combined wrist, finger, and forearm movements than the mouse (67 – 72% compared to 36%). A one-way analysis of variance found a significant main effect of group on combined wrist, finger, and forearm movements (F3,12 = 71.926, p < .001), with less movement in the Mouse group compared to all pen groups (p < .001, using the Bonferroni adjustment). This was most evident when pen participants selected items at the top or left side of the display: to do this they always had to move both their arm and hand, whereas mouse users were often able to use only their wrist and fingers. The reason is partly pointer acceleration with the mouse, but also that mouse participants had a more central home position, which we discuss later in the Posture section. The larger forearm-only portion with the mouse group is likely attributable to the fact that, unlike the pen, the mouse is not supported by the hand, and the forearm can more easily move independently.
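A sketch of how each frame could be attributed to a movement source under the assumption stated above (device movement without forearm movement is attributed to the wrist and fingers); the marker data format is assumed for illustration:

import numpy as np

DEADBAND_MM = 0.25  # same per-frame threshold as the movement analysis above

def classify_frames(device_pos_mm, forearm_pos_mm):
    """Label each frame transition by apparent movement source.
    Inputs are (n_frames, 3) arrays of 3-D positions in millimetres."""
    device_moving = np.linalg.norm(np.diff(device_pos_mm, axis=0), axis=1) >= DEADBAND_MM
    forearm_moving = np.linalg.norm(np.diff(forearm_pos_mm, axis=0), axis=1) >= DEADBAND_MM

    labels = np.full(device_moving.shape, "stationary", dtype=object)
    labels[device_moving & ~forearm_moving] = "wrist and fingers only"
    labels[~device_moving & forearm_moving] = "forearm only"
    labels[device_moving & forearm_moving] = "wrist, fingers and forearm"
    return labels

def movement_proportions(labels):
    """Proportion of each category among moving frames (as plotted in Figure 3-16)."""
    moving = labels[labels != "stationary"]
    kinds, counts = np.unique(moving, return_counts=True)
    return dict(zip(kinds, counts / counts.sum()))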


Proportion of Movement for Mouse, Pen1, Pen2, Pen3 (legend: Wrist and Fingers Only; Forearm Only; Wrist, Fingers and Forearm).

Figure 3-16. Proportion of movements greater than 0.25 mm per frame (velocity greater than 7.5 mm/s). Euclidean distance in 3-D space. (Pen1-TabletExperts, Pen2-ConventionalExperts, Pen3-ConventionalNovices)

Pen Tapping Stability

When selecting small objects such as handles and buttons, most GUIs (like Windows Vista used in our study) deem a selection successful only when both pen down and pen up events occur within the target region of the widget (Ahlstroem, Alexandrowicz, & Hitz, 2006). Our observations suggest that this is a problem when using the pen. To disambiguate single clicks from drags, we only included a pen down and pen up event pair if the pen up event occurred within 200 ms. Using this threshold, we found that the mean distance between a pen down and up event was 1.3 to 1.8 pixels (Figure 3-17). This is similar to results reported by Ren and Moriya (2000) and discussed by Zuberec (2000, sec. 5.12). Although a relatively small movement, this can make a difference when selecting near the edge of objects. In contrast, the mean distance between mouse down and up events was only 0.2 pixels.
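A minimal sketch of this tap-stability measure, assuming a simple event stream of (timestamp in ms, x, y, kind) tuples; the event format is an assumption for illustration:

import math

TAP_MAX_MS = 200  # down-up pairs separated by less than this are treated as taps, not drags

def tap_down_up_distances(events):
    """Euclidean distance in pixels between each pen-down and its matching pen-up."""
    distances = []
    pending_down = None
    for t, x, y, kind in events:
        if kind == "down":
            pending_down = (t, x, y)
        elif kind == "up" and pending_down is not None:
            t0, x0, y0 = pending_down
            if t - t0 < TAP_MAX_MS:
                distances.append(math.hypot(x - x0, y - y0))
            pending_down = None
    return distances

events = [(0, 100.0, 200.0, "down"), (120, 101.5, 200.8, "up"),   # a tap (kept)
          (900, 50.0, 50.0, "down"), (1600, 300.0, 400.0, "up")]  # a drag (ignored)
print([round(d, 2) for d in tap_down_up_distances(events)])        # [1.7]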


Distance (pixels): Mouse 0.2; pen groups between 1.3 and 1.8.

Figure 3-17. Mean Euclidean distance between down and up click (for down and up click pairs separated by less than 200 ms). (Pen1-TabletExperts, Pen2-ConventionalExperts, Pen3-ConventionalNovices)

We observed that after participants recognized that they were suffering from frequent selection errors, they began to visually confirm the physical pen position by watching the small dot cursor. One participant commented:

"It is a little finicky, eh. It's like I'm looking for the cursor instead of just assuming that I'm clicking on the right place, so it is a little bit slower than normal." (P9-Pen1-TabletExperts 27:39)

Participants also encountered problems with visual parallax, which made selecting targets difficult at the extreme top of the tablet, supporting the arguments of Ward and Phillips (1987) and Ramos et al. (2007):

"It seems that when I get to the a ... um ... portions like the menu, ... the dot doesn't seem to be in place of exactly where I'm pressing it. It seems to be rocking a little bit to the right where I need to be more precise." (P16-Pen3-ConventionalNovices 19:59)

One of the purported global pen input improvements introduced with Windows Vista is visual "ripple" feedback to confirm all single and double taps. However, not one participant commented on the ripple feedback or appeared to use it as a visual confirmation to identify missed clicks. In fact, this additional visual feedback seems to only add to the general "visual din" that is more prevalent with the pen compared to mouse input.


Figure 3-18. Obtrusive tooltip hover visualizations ("hover junk"). P15-Pen3-ConventionalNovices, task 11. See also Video 3-2.

Time: 00:25 Vogel_Daniel_J_201006_PhD_video_3_2.mp4 Video 3-2. Obtrusive tooltip hover visualization examples.

Pen users seemed to be more adversely affected by tooltips and other information triggered by hover: 20% of pen hesitation errors occurred when tooltips were popping up near a desired target. When compensating for digitizer tracking errors by watching the cursor, participants tended to hover more often before making a selection, which in turn triggered tooltips more often. At times the area below the participant's pen tip seemed to become filled with this type of obtrusive hover visualization, which we nicknamed "hover junk" (Figure 3-18).

"Why doesn't this go away?" [referring to a tooltip blocking the favourites tree-view] (P13-Pen3-ConventionalNovices, 33:05)

When there were many tooltip hover visualizations, participants appeared to limit their target selection to the area outside the tooltip, which decreased the effective target size. We did not see this behaviour with the mouse. The nature of direct interaction seems to change how people perceive the depth ordering of device pointer, target, and tooltip. With the pen, the stacking order from the topmost object down becomes: pointer → tooltip → target,

compared to tooltip → pointer → target with mouse users. This is supported by comments from Hinckley et al. (2007) regarding the design of InkSeine. They tried using tooltips to explain gestures, but found they were revealed too late, could be mistaken for buttons, and that the user's hand could occlude them altogether.


Tablet Movement

Previous work has investigated user preference for setting the input frame-of-reference with the non-dominant hand (Fitzmaurice et al., 1999). Unlike Fitzmaurice et al., whose participants used an indirect pen tablet on a desk, our participants held the Tablet PC on their lap. We expected that the lap would create a more malleable support surface – perhaps the legs would assist in orienting and tilting the tablet – so there may be more tablet movement. Using the same 0.25 mm deadband threshold used previously, we calculated the mean amount of device movement per minute (Figure 3-19). Not surprisingly, we saw almost no movement in the mouse condition in spite of participants' ability to adjust the display angle or move the entire laptop on the desk. The amount of tablet movement in the pen condition was also small compared to the amount of movement of the pen itself (Figure 3-15): less than 7%. There is no significant difference between pen groups due to high variance: some participants repositioned the tablet more than others.

Distance per Min (m/min): Mouse near 0; pen groups approximately 0.1 to 0.3.

Figure 3-19. Tablet or laptop movement per minute for all constrained tasks. (Pen1-TabletExperts, Pen2-ConventionalExperts, Pen3-ConventionalNovices)

Posture

Related to movements are the types of postures that participants adopted to rest between remote target interactions and to avoid occlusion.

Home Position

We observed that at the beginning and end of a sequence of interactions, pen participants tended to rest their hand near the lower right of the tablet (recall that all of our participants were right-handed). A heat map plot of rest positions for forearm and pen


illustrates this observation (Figure 3-20a). Pen rest positions approximate the distribution of clicks and taps (see also Figure 3-13). Forearm rest positions are concentrated near the bottom right of the tablet, and do not follow the same distribution as the pen. In contrast, the mouse distributions are more compact, with peaks near their centre of mass suggesting a typical rest point in the centre of the interaction space (Figure 3-20b). The wider spread of pen positions follows from the greater overall movement with the pen (Figure 3-15).

(a) Tablet PC: 185 x 247 mm display area; (b) Mouse: general mouse movement area. Separate maps shown for pen/mouse and forearm positions; normalized scale 0.0 to 1.0; each cell 8 x 8 mm.

Figure 3-20. Heat map plot of forearm and pen/mouse rest positions. Generated across all participants for (a) Tablet PC participants; (b) Mouse participants. Rest positions are defined as movement less than 0.25 mm per frame (velocity less than 7.5 mm/s). Tablet positions are relative to the plane of the display; mouse positions are relative to the plane of the laptop keyboard. The dashed areas represent the tablet display and general mouse movement area on the desk where all pen or mouse positions lie.

Occlusion Contortion

We observed all pen users adopting a "hook posture" during tasks in which they had to adjust a parameter while simultaneously monitoring its effect on another object, for example, adjusting HSL color values in task 17 or image brightness and contrast levels in task 25 (Figure 3-21, Video 3-3). The shape of the hook almost always arched away from the participant, although participant 12 began with an inverted hook and then switched to the typical style midway. We could also see this hook posture when participants were selecting


text, although it was more subtle. This type of posture was also observed by Inkpen et al. (2006) when left-handed users were accessing a right-handed scrollbar. Adopting such a hook posture may affect accuracy and increase fatigue: since it forces the user to lift their wrist above the display, it reduces pen stability. One participant commented that they found it tiring to keep the image unobstructed when adjusting the brightness and contrast. In task 25, where the hook posture was most obvious, the participant had the option to move the image adjustment window below the image, which would have eliminated the need for contortion. However, we observed only one pen participant do this (P5-Pen1-TabletExperts), which suggests that the overhead of manually adjusting a dialog position to eliminate occlusion may be considered too high. Yet previous work found that adjusting the location and orientation of paper to reduce occlusion is instinctive (Fitzmaurice et al., 1999).

Figure 3-21. Examples of occlusion contortion: the “hook posture.” See also Video 3-3.

Time: 00:43 Vogel_Daniel_J_201006_PhD_video_3_3.mp4 Video 3-3. Occlusion contortion examples: the “hook posture.”


3.5 Interactions of Interest

Based on our analysis of errors, movement, and posture, we identified widgets and actions for more detailed analysis. We selected widgets and actions that are used frequently (button), are highly error prone (scrollbar), present interesting movements (text selection), or highlight differences between the pen and mouse (drawing, handwriting, the Office MiniBar, and keyboard use).

Button

The single, isolated button is one of the simplest widgets and also the most ubiquitous. We expected participants to use a button (not including buttons that were part of other, larger widgets) 52 times during our study, which constitutes 21% of all widget occurrences. Although simple, we found an interaction error rate of 16% for pen participants compared to less than 1.5% for mouse (using the expected number of button interactions as a normalizing factor, Table 3-1). 55% of these errors were target selection errors, 17% missed clicks, 11% hesitation, and 6% repeated invocation. We already discussed problems with target selection with pen taps above, so in this section we concentrate on other errors and distinctive usage patterns.

Repeated invocation errors occurred when the user missed the visual feedback of the button press and pressed it a second time. This was most evident when the button was small and the resulting action delayed or subtle. There were three commonly occurring cases in our scenario: opening an application using the quick launch bar in the lower left, pressing the save file button in the upper left, and closing an application by pressing the "x" at the top right. Participants did not appear to see, or perhaps did not trust, the standard button invocation feedback, or the visual feedback used for all taps introduced with Windows Vista. Depending on the timing of the second press, the application could simply ignore the second click, save the file a second time, or introduce more substantial errors like opening a second application.

Missing a click on a button could result in more severe mistakes. When saving a file, the confirmation of a successful save was conveyed by a message and progress bar located in the bottom right. Since the save button is located at the top left, this meant that the


participant’s arm was almost always blocking this confirmation, making it easy to miss (Figure 3-22). We observed 3 participants who thought they saved their file when a missed click or target selection error prevented the save action.


Figure 3-22. Example of occluded status message when pressing save button. P5, Pen1-TabletExperts, task 20, 18:40

Sometimes the location of buttons (and other widgets) prevented participants from realizing that the current application state did not match their understanding. For example, in task 6, we saw 4 pen users go through the steps of selecting the font size and style for a text-box when they did not have the text-box correctly selected, and so missed applying these changes to some or all of the text. We did not see this with any of the mouse users. Since these controls are most often accessed in the menu at the top of the display, this is likely because the formatting preview is occluded by the arm. After making this mistake several times, one participant asked:

"How do I know if that's bold? Like I keep pressing the bold button." (P16, Pen3-ConventionalNovices, 18:27)

Although the bold button was visually indicating the bold state, he failed to realize the text he wished to make bold was not selected.

While reviewing the logs of button presses, we could occasionally see a distinctive motion which interrupted a smooth trajectory towards the target, creating a hesitation error. As the user moved the pen towards the button, they would sometimes deviate away to re-establish visual confirmation of the location, and then complete the movement (Figure 3-23, 1). In some cases this was a subtle deviation, and at other times it could be quite pronounced as the pen actually circled back before continuing. We saw this happen most


often when the button was located in the upper left corner, and the deviation was most apparent with our novice group.

(legend: pen tip 3D path & shadow; path start; path end)

Figure 3-23. Button trajectory example when selecting the save button in the upper left corner. (1) movement deviation to reduce occlusion and sight the target; (2) corrective movements near the target; (3) return to home position. (P15, Pen3-ConventionalNovices, task 26). See also Video 3-4.


Time: 00:09 Vogel_Daniel_J_201006_PhD_video_3_4.mp4 Video 3-4. Button trajectory example.

Scrollbar

In our study, we found that pen participants made an average of 10 errors while using a scrollbar. With 20 expected scrollbar interactions, our estimated scrollbar error rate is 50% (Table 3-2). However, we suspected this error rate was inflated by participants using scrollbars more often than expected (e.g., repeated invocation due to previous task errors), and by participants using an inefficient and error-prone scrollbar usage strategy (e.g., multiple paging taps instead of a single drag interaction). For a more detailed examination, we coded scrolling interactions in tasks 2, 6, 21, 23, 25, and 27. According to our script, we expected 15 scrolling interactions within these six tasks, breaking down into four types: four when using drop-downs, four during web browsing, two during file browsing, and five while paging slides. We found an average of 14.8 scrolling interactions (SD 0.6), which suggests that, at least in these tasks, our estimated number of scrollbar interactions was reasonable.

All pen participants used different scrollbar interaction strategies, except for two Pen2-ConventionalExperts participants: one of these participants always clicked the paging region, and the other always dragged the scroll box (see Figure 3-24 for scrollbar parts). When participants changed their strategy, they often clicked in the paging region for short scrolls and dragged the scroll box for long scrolls, but this was not always the case. We observed only four cases where participants used the scrollbar button: one participant used it to increment down and three participants held it down to scroll continuously as part of a mixed strategy. Overall, we counted 91 occurrences of dragging and 54 occurrences of paging.

Figure 3-24. Scrollbar parts: scroll box, paging region, and button.


There were 17 occurrences of pen participants using a mixed scrolling strategy – where combinations of scrollbar manipulation techniques are used together for one scroll. Six participants used such a mixed strategy two or more times, and all did so exclusively for long scrolls in drop-downs or web browsing. Most often a scroll box drag was followed by one or more paging region clicks, or vice versa.

Regarding errors, we found two patterns. First, for scrollbar strategy, we found error rates of 16% for paging, 9% for dragging, and 44% for mixed strategies (the rate of strategy occurrences with at least one error). A mixed strategy often caused multiple errors, with an average of 1.6 errors per occurrence. Participants often moved to a mixed strategy after an error occurred – for example, if repeated paging was inefficient or resulted in errors, they would switch to dragging. Pure paging and dragging strategies had 0.5 and 0.2 errors per occurrence. Many errors were target selection related, suggesting that the repetitive nature of clicking in the paging region creates more opportunities for error. Second, regarding errors and location, we found that 77% of scrollbar errors occurred in the 100 right-most pixels of the display (i.e. when the scrollbar was located at the extreme right), but only 61% of scrollbar interactions were in the 100 right-most pixels. Although not dramatic, this pattern is in agreement with our observation of error locations (Figure 3-13).

We also noted a characteristic trajectory when acquiring a scrollbar with the pen, which we call the "ramp." When acquiring a scrollbar to the right of the hand, we observed several users moving down, to the right, and then up to acquire the scroll box (Figure 3-26, Video 3-5). Based on the user view video log, we could see that much of the scrollbar was occluded, and that this movement pattern was necessary to reveal the target before acquiring it (Figure 3-25).
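A brief sketch of the two strategy-level error measures used above (the proportion of strategy occurrences with at least one error, and the mean number of errors per occurrence) over coded scrolling interactions; the interaction list here is illustrative, not the coded study data:

from collections import defaultdict

# Each coded scrolling interaction: (strategy label, number of errors during it). Illustrative data.
scrolls = [("paging", 0), ("paging", 2), ("dragging", 0), ("dragging", 0), ("mixed", 1), ("mixed", 3)]

by_strategy = defaultdict(list)
for strategy, n_errors in scrolls:
    by_strategy[strategy].append(n_errors)

for strategy, errors in sorted(by_strategy.items()):
    rate_with_error = sum(e > 0 for e in errors) / len(errors)  # occurrences with >= 1 error
    errors_per_occurrence = sum(errors) / len(errors)           # mean errors per occurrence
    print(f"{strategy}: {rate_with_error:.0%} of occurrences had errors, "
          f"{errors_per_occurrence:.1f} errors per occurrence")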



Figure 3-25. Example of scrollbar occlusion causing “ramp” movement. (a) the scrollbar is initially occluded; (b) hand moves beyond scrollbar; (c) hand moves back and up to select scrollbar. P16, Pen3-ConventionalNovices, task 21


Figure 3-26. Pen tip trajectories during scrollbar interaction. (a) short drag scroll, centre of display, P16, Pen3-ConventionalNovices, task 21; (b) long drag scroll, edge of display, P7, Pen1-TabletExperts, task 21; (c) paging scroll, P9, Pen2-ConventionalExperts, task 23. In each case, (1) denotes the characteristic "ramp" movement and (2) denotes a repetitive paging motion segment. See also Video 3-5.


Time: 00:21 Vogel_Daniel_J_201006_PhD_video_3_5.mp4 Video 3-5. Scrollbar trajectory examples.

The mouse scroll-wheel can act as a hardware alternative to the scrollbar widget, and we observed three out of four mouse participants use the scroll-wheel for short scrolls. These three participants never clicked to page up or down during short scrolls. All mouse participants used the scroll-wheel at least once for longer scrolls, but we observed each of them also abandoning it at least once – and continuing the scroll interaction by dragging the scroll box (see Figure 3-24 for scrollbar parts). The scroll-wheel does not always appear to have an advantage over the scrollbar widget, corroborating evidence from Zhai, Smith, and Selker (1997). However, this may be due to the scrolling distance and the standard scroll-wheel acceleration function (Hinckley, Cutrell, Bathiche, & Muss, 2002). In fact, all of our mouse participants encountered one or more errors with the scroll-wheel, but there were only two mouse errors with the scrollbar widget.

Text Selection

There are 6 expected text selections in the script, in tasks 12, 13, 21, and 23. Three involve selecting a sentence in a web page and three a bullet. We coded all text selections performed by participants in these tasks, and found a mean number of 6.3 (SD 0.6). The slightly higher number is due to errors requiring participants to repeat the selection action. We found high error rates for text selection with the pen. At 40%, this is three times the mouse error rate of 13%. Text selection errors were either target selection, or an unintended action such as accidentally triggering a hyperlink. Most of these errors seem to be related to the style of movement. An immediately obvious characteristic of text selection is the direction of movement, from right-to-left or left-to-right. Across pen participants, we found 43 left-to-right selections and 36 right-to-left (Figure 3-27). Given that our participants are right-handed, a right-to-left selection should in theory have an advantage, since the end target is not occluded by the


hand. Instead, we found that all of our expert pen users performed left-to-right selection 2 or more times, with one expert participant (P6) only selecting left-to-right. Five pen participants exclusively performed left-to-right text selections, while three exclusively used right-to-left selections. Surprisingly, the latter were all in the Pen2-ConventionalExperts group, not the Pen1-TabletExperts group as one might expect. The insistence on a left-to-right motion in spite of occlusion is likely due to the reading direction in Western languages, which creates a habitual left-to-right selection direction. Indeed, we found that mouse participants most often used a left-to-right direction, with two participants doing this exclusively. However, even mouse users performed the occasional right-to-left selection, suggesting that there are cases when this is more advantageous even in the absence of occlusion. One participant stated:

"People write left to right, not right to left so my hand covers up where they're going." (P14-Pen3-ConventionalNovices 38:49)


Figure 3-27. Proportion of left-to-right and right-to-left text selection directions. Pen1-TabletExperts, Pen2-ConventionalExperts, Pen3-ConventionalNovices

We observed three characteristic pen trajectory patterns which suggest problems with left-to-right selection and occlusion. Note that we did not code for these patterns to enable a quantitative analysis14; instead, we offer a description of what we feel are characteristic examples of these behaviours.

14 Our observations of these behaviours emerged after the coding process was complete.


We observed expert pen users intentionally moving the pen well beyond the bounds of the desired text during a left-to-right selection movement (Figure 3-28c). The most likely reason for this deviation is that it moved their hand out of the way so that they could see the target at the end of the text. Another related movement is backtracking (Figure 3-28b), which more often occurred with novice participants. Here, the selection overshoots the end target and backtracks. This appears to be more by accident, but may be the behaviour that leads to the intentional deviation movement we saw with expert users. Another, sometimes more subtle, behaviour is a "glimpse": a quick wrist roll downwards to momentarily reveal the occluded area above (Figure 3-28a).

We also noted a characteristic trajectory when participants invoked the context menu for copying with a right-click. We observed many pen participants routinely introducing an extra movement to invoke it near the centre of the selection, rather than in the immediate area (Figure 3-28d). Since the right-click has to be invoked on the selection itself, this may be to minimize the possibility of missing the selection when opening the context menu. However, this extra movement was most often observed with right-to-left selection. This may be a symptom of participants needing to visually verify the selection before copying by moving their hand.


(legend: forearm 3D path & shadow; pen tip 3D path & shadow; path start; path end)

Figure 3-28. Pen tip (and selected wrist) trajectories during text selection. (a) left-to-right selection with forearm glimpse at (1), P15, Pen3-ConventionalNovices, task 12; (b) left-to-right selection with backtrack at (2), P9, Pen2-ConventionalExperts, task 23; (c) left-to-right selection with deviation at (3), P6, Pen1-TabletExperts, task 21; (d) right-to-left selection with central right-click invocation at (4), P11, Pen2-ConventionalExperts, task 23. See also Video 3-6.


Time: 00:38 Vogel_Daniel_J_201006_PhD_video_3_6.mp4 Video 3-6. Text selection trajectory examples.

Common errors with text selection were small target selection errors such as missing the first character, clicking instead of dragging and triggering a hyperlink, or an unintended change of the selection when releasing. While the first two are related to precision with the pen, the latter is a symptom of stability. As the pen is lifted from the display, a small movement causes the caret to shift slightly, which can be as subtle as dropping the final character or, if it moves down, selecting a whole additional line. We noticed this happening often when releasing the handle, another case of precise dragging. One participant commented:

"When I'm selecting text I'm accidentally going to the next line when I'm lifting up" (P7-Pen1-TabletExperts 16:40)

Writing and Drawing

Although we avoided formal text entry and handwriting recognition, we did include a small sample of freehand handwriting and drawing. In theory, these are tasks to which the pen is better suited. Tasks 39 and 41 asked participants to make ink annotations on an Excel chart (see Figure 3-3e). In task 39, they traced one of the chart lines using the yellow highlighter as a very simple drawing exercise. In task 41, they wrote “effects of fur trade” and drew two arrows pointing to low points on the highlighted line. In the post study interview, many pen participants said that drawing and writing were the easiest tasks. After finishing tasks 39 and 41, one participant commented: "You know this is the part that is so fun to work with, you know, using a tablet, but all the previous things are so painful to use. I mean, it's just like a mixture of things ..." (P8-Pen1-TabletExperts 38:01)


Handwriting

We expected to see a large difference in the quality of mouse and pen writing, but aside from pen writing appearing smaller and smoother, a visual comparison suggests this is not the case (Figure 3-29). We did see some indication of differences in mean times, with pen and mouse participants taking an average of 27.3 and 47.3 seconds respectively (SD 3.5 and 26.2). In terms of style, all mouse handwriting has a horizontal baseline, whereas four of the pen participants wrote on an angle. This supports Fitzmaurice et al.'s (1999) work on workspace orientation with pen input.


Figure 3-29. Handwriting examples. (a) mouse and (b) pen. (approx. 70% actual size)

Tracing

When comparing participants' highlighter paths in task 39, we could see little difference (Figure 3-30). Pen tracing appears slightly smoother, but not necessarily more accurate. There also appears to be no noticeable difference in task time, with pen and mouse participants taking an average of 15.8 and 13.5 seconds respectively (SD 6.4 and 2). Half of the mouse participants traced from right-to-left, as opposed to left-to-right, whereas only 3 out of 12 pen participants traced from right-to-left. As explained above, with the pen, tracing right-to-left has a distinct advantage for a right-handed person since it minimizes pen


occlusion. Across all participants, all except one (a Pen2-ConventionalExperts participant) traced the entire line with one drag motion.


Figure 3-30. Tracing examples. (a) Mouse; (b) Tablet PC, discontinuous points highlighted. (approx. 70% actual size)

Office MiniBar

PowerPoint has an alternate method of selecting frequently used formatting options, a floating tool palette called the MiniBar which appears when selecting an object that can be formatted, like a text-box (Harris, 2005). It is invoked by moving the pointer towards an initially “ghosted” version of the MiniBar; moving the pointer away makes it disappear. The behaviour has some similarities to Forlines et al.’s Trailing Widget (2006), except that the MiniBar remains at a fixed position. In theory, the MiniBar should also be well suited to pen input since it eliminates reach. However, in practice it was difficult for some pen users to invoke. The more erratic movements of the pen often resulted in its almost immediate disappearance, preventing several participants from even noticing it and making it difficult for others to understand how to reliably invoke it. We observed one of our expert Tablet PC users try to use the MiniBar more than five times – they finally aborted and returned to using the conventional application toolbar.



Figure 3-31. Occlusion resulting from MiniBar floating palette. The text-box preview is occluded by the hand when using the MiniBar. (P12-Pen2-ConventionalExperts, task 6)

The other problem is that the location of the MiniBar is such that when using it, the object receiving the formatting is almost always occluded by the hand (Figure 3-31). We observed participants select multiple formatting options without realizing that the destination text was not selected properly: hand occlusion prevented them from noticing that the text formatting was not changing during the operation. A lesson here is that as widgets become more specialized they may not be suitable for all input devices, at least without some parameter tuning.

Keyboard Usage

Although we gave no direct instructions regarding keyboard usage for the mouse group, all participants automatically reached for the keyboard for common keyboard shortcuts like CTRL-Z, CTRL-C, CTRL-V, and CTRL-TAB, and often to enter numerical quantities. In task 6, two mouse participants (P1 and P3) accelerated their drop-down selection by typing the first character. However, they each did this only a single time, in spite of this task requiring them to access the same choice, in the same drop-down, four times. Yet we saw that keyboard use can also lead to errors. For example, P1 accidentally hit a function key instead of closing a dialog with the Esc key – this abruptly opened a large help window and broke the rhythm of the task as they paused to understand what had happened before closing it and continuing.

Three pen participants explicitly commented on the lack of accelerator keys when using the pen, with comments like: "Where's CTRL-Z?" (while making a key press action with the left hand), then again later, "I can't tell you how much I wish I could use a keyboard..." (P9-Pen2-ConventionalExperts, 24:43 and 29:50)


However, not one pen participant commented on what the Tablet PC hardware keys were for, or if they could use them. Yet, we suspect they were conscious of their existence, since only one participant pressed one of these keys by accident.

3.6 Discussion

The goal of our study was to observe direct pen input in a realistic GUI task involving representative widgets and actions. In the analysis above, we presented findings for various aspects – time, errors, movements, posture, and visualization – as well as an examination of specific widgets and actions. While we did not find a significant difference in overall completion times between mouse users and experienced Tablet PC users, this null result does not mean that direct pen input is equivalent to mouse input (especially considering increased variance due to differing participant levels, task completion strategies, and inclusion of think-aloud comment times). Yet we found that pen participants made more errors, performed inefficient movements, and expressed frustration. Moreover, widget error rates had a different relative ordering between mouse and pen: the highest number of pen errors was with the scrollbar and the highest number of mouse errors with marquee selection. The top error contexts for the pen range from complex widgets like scrollbars, drop-downs, and tree-views to simple ones like buttons and handles.

Overarching Problems with Direct Pen Input

When examined as a whole, our quantitative and qualitative observations reveal overarching problems with direct pen input: poor precision when pointing or tapping; problems caused by hand occlusion; instability and fatigue due to ergonomics; cognitive differences between pen and mouse usage; and frustration due to limited input capabilities. We believe these to be the primary causes of non-text errors, and that they contribute to user frustration when using a pen with a conventional GUI.

Precision

Selecting objects by tapping the pen tip on the display serves the same purpose as pushing a button on a mouse, but the two actions are radically different. The most obvious difference is that tapping allows only one type of "click", unlike pressing different buttons on a


mouse. To get around this issue, right-click and single-click are disambiguated with a time delay, overloading the tapping action to represent more than one action. Although participants did better than we expected, we found that the pen group were not always able to invoke a right-click reliably, and either unintentionally single-clicked or simply missed the click. A related problem occurred with drag-able widgets like scrollbars and sliders: when performing a slow, precise drag, users could unintentionally invoke a right-click. We found these problems affected expert pen users as well as novices.

The second difference between mouse and pen selection may not be as immediately obvious: tapping with the pen simultaneously specifies the selection action and position, unlike clicking with the mouse, where the button press and mouse movement are designed such that they do not interfere with each other. The higher number of target selection errors with the pen compared to the mouse suggests that this extra coordination is a problem. Our findings also reveal subtle selection and pointing coordination issues: unintended action errors due to movement when releasing a drag-able widget, such as the handle, were non-existent with the mouse, but affected 10 out of 12 pen participants; on average, the distance between pen down and pen up events was 6 to 9 times greater than with the mouse; and pen double-clicks were problematic, either missed altogether or interpreted as two single clicks.

We also found problems with missed taps when the tapping motion was too hard or too soft. This could be an issue with hardware sensitivity, but given our other observations, it may also be a factor of the tapping motion. We found that some participants did not notice when they missed a click, leading to potentially serious errors such as not saving a document. The tactile feedback from tapping the pen tip does not seem to be enough, especially when compared to the sensation of pressing and releasing the micro switch on a mouse button. Although Windows Vista displays a "ripple" feedback for clicks, no participants seemed to make use of it.

Surprisingly, we did not observe a large difference in the quality of pen writing and tracing compared to mouse input: pen handwriting appeared smaller and smoother, and pen tracing appeared slightly smoother.

In the same way that hardware sensitivity is likely contributing to the number of missed clicks, other hardware problems such as lag and parallax (J. Ward & M. Phillips, 1987) also affect performance. When using a pen, any lag or parallax seems to have an amplified effect


since the visual feedback and input are coincident. When users become aware of these hardware problems, they begin to focus on the cursor, rather than trusting the physical position of the pen. This may reduce errors, but the time taken to process visual feedback will hurt performance.

Occlusion

Occlusion from the pen tip, hand, or forearm can make it difficult to locate a target, verify the success of an action, or monitor real-time feedback, which may lead to errors, inefficient movements, and fatigue. We observed participants missing status updates and visual feedback because of occlusion. This can lead to serious errors, such as a user assuming they successfully pressed the save button when they had not – hand occlusion of the "file is being saved" message at the bottom of the display prevented verification. Other frustrating moments occurred when users assumed the system was in a certain state, but it was not. For example, we saw more than one case of a wasted interaction in the top portion of the display to adjust formatting because the object to be formatted had been unintentionally de-selected. This occurred in spite of the application previewing the formatting on the object itself. Unfortunately, when the destination object is occluded and the user assumes the system is in the desired state (the object is selected), the previews do not help and the error is not prevented.

To reduce the effect of occlusion, we observed users adopting unnatural postures and making inefficient movements. For example, when adjusting a parameter that simultaneously requires visual feedback, we noticed participants changing the posture of their hand rather than adjusting the location of the parameter window. This posture, which we call the "hook," did enable them to monitor an area of the display that would otherwise be occluded, but unfortunately this type of posture is not stable or comfortable.

We also found that occlusion can lead to inefficient movements such as glimpses, backtracks, and ramps. Glimpses and backtracks tend to occur during dragging operations. Since dragging uses a kinaesthetic quasi-mode (Sellen et al., 1992) with the pen pressed against the display, the user cannot lift their hand momentarily to perform a quick visual search for an occluded target when in mid-drag. To work around this limitation, we observed expert users intentionally deviating from the known selection area while drag selecting, to visually


acquire the target and complete the selection. We call this a glimpse. We also observed novice users backtracking after they accidentally passed the intended target and had to move back again – in some ways an unintentional glimpse. Our tendency to drag and select from left-to-right, to match how text is read in Western languages, seems to make glimpse and backtrack movements more common. Note that this does not only occur when selecting text: 9 out of 12 pen participants chose to trace lines from left-to-right, in spite of commenting that occlusion made this more difficult.

The ramp is a characteristic movement which adjusts the movement path to reveal a greater portion of the intended target area. When the hand is in mid movement, it can occlude the general visual search area and require a deviation to visually scan a larger space. We observed ramp movements most often when locating the scrollbar widget on the extreme right side of the display – the pen moves down as it moves to the right, to maximize the non-occluded portion of the scrollbar and increase the chance that the target is not occluded. We also saw ramp movements, sometimes with helical paths, when moving to other targets, most often when the target was located at the upper left.

Finally, pen users tend to adopt a consistent home position which provides an overview of the display when they are not currently interacting. Participants would return their hand to this position between tasks and even after command phrases, such as when waiting for a dialog to appear after pressing a toolbar button. For right-handed pen users, the home position is near the lower right corner, just beyond the display.

Ergonomics

Although the display of a Tablet PC is relatively small, there still appear to be ergonomic issues when reaching targets near the top or at the extreme edges. We found that pen movements covered a greater distance with more limb coordination compared to the mouse. Not only can this lead to more repetitive strain disorders and fatigue, but studies have shown that coordinated limb movements lead to decreased performance and accuracy (Balakrishnan & I. S. MacKenzie, 1997). In support of this, we found a different distribution of target selection error rate compared to the location of all taps/clicks: more errors seem to occur in the mid-upper-left and right side. However, as we discussed above, there may be an influence of target size which we did not control for.


Possible explanations for the extra distance covered by the pen compared to the mouse include deviating movements to reduce occlusion, the pen tip moving more frequently in three dimensions to produce taps, and arcing above the display when travelling between distant targets. However, the main contributing factors are most likely the unavailability of any control display gain manipulation with the pen since it is a direct input device, and the tendency for pen users to return to a home position between tasks. By frequently returning to this home position, there are more round trip movements compared to mouse participants who simply rest at the location of the last interaction. Although the home position allows an overview of the display in spite of occlusion, it may also serve to rest the arm muscles to avoid fatigue and eliminate spurious errors that could occur if a relaxed hand accidentally rests the pen tip on the display.

Another issue with reach may be the size and weight of the tablet. Not surprisingly, we found that tablet users moved the device more than mouse users, but they moved it less than 7% of the distance moved by the pen (in spite of the tablet resting on their lap, which we expected would make it easier to move and tilt using the legs and body). Further support can be seen in the characteristic slant of some tablet participants’ written text – these people elected to write in a direction that was most comfortable for their hand, regardless of the position of the tablet. This suggests that the pen is more often moved to the location on the display, rather than the non-dominant hand bringing the display to the pen to set the context. Note that the latter has been shown to be a more natural movement with pen and paper or an indirect tablet on a desk (Fitzmaurice et al., 1999). Our speculation is that the problem may be due to the size and weight of the device.

Cognitive Differences

Cognitive differences between the pen and mouse are difficult to measure, but our observations suggest high level trends. Pen users prefer to single-click instead of double-click, and hover visualizations appear more distracting. These may reveal a difference in the conceptual model of the GUI when using a pen compared to a mouse. Pen users preferred to single click, even for objects which are conventionally activated by a double-click, such as file and folder objects. The difficulty of double-clicking also leads to errors such as accidental file rename or duplication, making it important to reduce these problems.


We observed differences when pen users interacted with objects which displayed tooltips and other information triggered by hover. There seemed to be more tooltips appearing and disappearing with the pen group compared to the mouse group, an effect we refer to as “hover junk.” This is not only visually distracting, but pen participants also seemed to more consciously avoid tooltips when selecting the underlying target compared to mouse users. It was as though pen users perceived the depth ordering of the pointer, tooltip, and target differently. The direct input nature of the pen seemed to bring the display pointer above the tooltip instead of below. Mouse users did not seem to be bothered by tooltips and simply clicked “through” them.

Limited Input

It is perhaps obvious, but the lack of a keyboard appears to be a serious handicap for pen users. The main problem of entering text with only a pen has been an active research area for decades: refinements to handwriting recognition, gesture-based text input, and soft keyboards continue. However, even though text entry was not part of our task, several pen participants noted the lack of a keyboard and even mimed pressing common keyboard shortcuts like copy (CTRL-C) and undo (CTRL-Z). We observed mouse users instinctively reaching for the keyboard to access command shortcut keys, use list accelerator keys, and enter quantitative values. Although all the tasks in our study could be completed without pressing a single key, this is not the way that users work with a GUI. Recent command-line-like trends in GUIs, such as full text search and keyboard application launchers, will further contribute to the problem.

Study Methodology

Our hybrid study methodology incorporates aspects of traditional controlled HCI experimental studies, usability studies, and qualitative research. Our motivation was to enable more diverse observations involving a variety of contexts and interactions – hopefully approaching how people might perform in real settings.


Degree of Realism

Almost any study methodology will have some effect on how participants perform. In our study we asked participants to complete a prepared set of tasks on a device we supplied, instrumented them with 3-D tracking markers and a head-mounted camera, and ran the study in our laboratory. These steps were necessary to have some control over the types of interactions they performed and to provide us with rich logging data to analyze. It is important to note that our participants were not simply going about their daily tasks as they would in a pure field study. However, given that our emphasis is on lower level widget interactions, rather than application usability or larger working contexts, we feel that we achieved an appropriate degree of realism for our purposes.

Analysis Effort and Importance of Data Logs

Synchronizing, segmenting, and annotating the logs to get multi-faceted qualitative and quantitative observations felt like an order-of-magnitude increase in effort beyond conducting a usability study or a controlled experiment. Our custom built software helped, but it did not eliminate long hours spent reviewing the actions of our participants. Qualitative observations from multiple rich observation logs are valuable, but not easy to achieve.

Using two raters proved to be very important. Training a second coder forced us to iterate and formalize our coding decision process significantly. We feel this contributed greatly to a consistent assignment of codes to events and a high level of agreement between raters. In addition, with two raters we were able to identify a greater number of events to annotate. Regardless of training and perseverance, raters will miss some events.

We found that each of the observational logging techniques gave us a different view of a particular interaction and enabled a different aspect to analyze. We found the combination of the pen event log, screen capture video, and head-mounted user view invaluable for qualitative analysis. The pen event log and screen capture video are the easiest to instrument and have no impact on the participant. The head-mounted camera presents a mild intrusion, but observations regarding occlusion and missed clicks would have been very difficult to make without it. For quantitative analysis, we relied on the pen event log and the motion capture logs. Although the motion capture data enabled the analysis of participant movements and posture, and the visualization of 3-D pen trajectories, it required the most work to instrument, capture, and process.

3.7 Summary

We have presented and discussed results from our study of direct pen interaction with realistic tasks and common software applications. Our findings reveal five overarching issues when using direct pen input with a conventional GUI: lack of precision, hand occlusion, ergonomics when reaching, cognitive differences, and limited input. We feel that these issues can be addressed by improving hardware, base interaction, and widget behaviour without sacrificing the consistency of current GUIs and applications. Moreover, previous research has focused on issues other than occlusion, yet our results suggest that occlusion also has a profound effect on the usability of direct pen input.

Improving Direct Pen Input with Conventional GUIs

Ideally, addressing the overarching issues identified above should be done without radical changes to the fundamental behaviour and layout of the conventional GUI and applications. This would enable a consistent user experience regardless of usage context – for example, when a Tablet PC user switches between slate mode with pen input, and laptop mode with mouse input – and ease the burden on software developers in terms of design, development, testing, and support15. With this in mind, we feel improvements should be made at three levels: hardware, base interaction, and widget behaviour.

Hardware improvements which reduce parallax and lag, increase input sensitivity, and reduce the size and weight of the tablet are ongoing, and will likely continue. Other improvements which increase the input bandwidth of the pen, such as pressure, tilt, and rotation sensing, may provide additional utility – but past experience with adding buttons and wheels to pens has not been encouraging.

15 Techniques which automatically generate user interface layouts specific to an input modality (Gajos & Weld, 2004) may ease the design and development burden in the future, but increased costs for testing and support will likely remain.


More innovative pen form factors may provide a new direction entirely: for example, a pen-like device which operates more like a mouse as the situation requires.

Base interaction improvements target basic input such as pointing, tapping, and dragging, as well as adding enhancements to address aspects such as occlusion, reach, and limited input. Conceptually these function like a pen-specific interaction layer which sits above the standard GUI. A technique can become active with pen input, but without changing the underlying behaviour of the GUI or altering the interface layout. Windows Vista includes examples of this strategy: the ripple visualization for taps, the tap-and-hold for right-clicking, and “pen flicks” for invoking common commands with gestures. However, the success of these specific techniques is dubious since we did not observe any experienced Tablet PC users using them.

The behaviour of individual widgets can also be tailored for pen input, but this should be done without altering their initial size or appearance to maintain GUI consistency. Windows operating systems have already contained a simple example of this for some time: an explicit option to cause menus to open to the left rather than the right (to reduce occlusion for right-handed users). This illustrates how a widget’s behaviour can be altered without changing its default layout – the size and appearance of an inactive menu remains unchanged.

Applications to Other Paradigms and Input Contexts

Our work emphasizes direct pen interaction with a conventional GUI. However, many of our results apply to other interaction paradigms and input contexts as well. During any type of direct manipulation (be it tapping, crossing, gestures, etc.), there are times when a target must be selected on the display. When this is necessary, then many of the same issues we identify above apply. Techniques such as crossing may eliminate the type of tapping errors we observed, and gestures will help reduce problems resulting from limited input, but issues such as occlusion are inherently part of any form of direct input.


Next Step: Base Level Techniques which Address Occlusion

After hardware improvements, which will continue to occur with engineering advancements, well-designed base interaction techniques have the greatest capability to improve direct input overall. In fact, researchers have already addressed many of the issues we identified, and in most cases their techniques appear to be compatible with base level improvements to conventional GUIs (though this aspect is not always demonstrated). Ren and Moriya’s enhanced selection techniques (2000), Accot and Zhai’s crossing paradigm (2002), Ramos et al.’s Pointing Lenses (2007), and Ren et al.’s Adaptive Hybrid Cursor (2007) are all designed to enhance precision. Grossman et al.’s Hover Widgets (2006), Ramos and Balakrishnan’s pressure-based combined command and lasso selection (2007), and Forlines et al.’s HybridPointing (2006) address limited input and ergonomic reach. Cognitive differences, such as the preference for single-click over double-click for file selection, could be easily accommodated.

One aspect missing from previous efforts is techniques that address occlusion. Recall that we found that occlusion from the hand and forearm makes it difficult to locate targets, verify the success of actions, and monitor real-time feedback. This often leads to errors, inefficient movements, and fatigue. Our hypothesis is that if we can make improvements that address occlusion at a base level, the usability of pen input can be improved. However, before designing new interaction techniques, a thorough understanding of occlusion is needed.


4 Investigating Occlusion

In the previous chapter, we found that occlusion likely contributed to user errors, led to fatigue, and created inefficient movements. These results are based on an observational study with realistic tasks and common GUI software applications. The next logical step is to further validate and quantify these fundamental aspects of direct pen occlusion in a controlled setting. Certainly, any designer can simply look down at their own hand while they operate a Tablet PC and attempt to take the perceived occlusion into account, but this type of ad hoc observation is unlikely to yield sound scientific findings or universal design guidelines. To study occlusion properly, we need to employ experimental methods. In this chapter we present three experiments which explore the area and shape of occlusion, how occlusion affects target selection performance, and ways in which users contort their hand posture to minimize its effect. Our first experiment, Experiment 4-1, uses a novel combination of video recording, computer vision marker tracking, and image processing techniques to capture images of the hand and arm as they appear from the point-of-view of the user. We call these images occlusion silhouettes. Analyses of these silhouettes found that the hand and arm can occlude as much as 47% of a 12 inch display and that the shape of the occluded portion of the display varied across participants according to the style and size of their pen grip. The second experiment, Experiment 4-2, examines the effect of occlusion when performing three fundamental GUI interactions: tapping, dragging, and tracing. Our results show that although it is difficult to control for occlusion within a single direct input context,


there is reasonable evidence that occlusion has an effect on these fundamental GUI interactions. The third experiment, Experiment 4-3, investigates how participants contort their hand posture to minimize occlusion while performing a simultaneous monitoring task. We found that this posture contortion reduces performance and discuss how different participants use different posture contortion strategies. Based on the results of these experiments, we propose a small set of simple guidelines for designers and researchers regarding how to avoid the occluded area.

4.1 Related Work

Few researchers have investigated occlusion directly, but many have speculated on its effect, using it as motivation for the design of interaction techniques or to explain unexpected behaviour during usability studies and experiments.

In pen computing, the design of Ramos and Balakrishnan’s (2003) Twist Lens, a sinusoidal shaped slider, is partially motivated by reducing occlusion from the user’s hand; Apitz and Guimbretière’s (2004) CrossY uses predominant right-to-left movement to counteract occlusion with right-handed users; and Schilit, Golovchinsky, and Price’s pen-based XLibris ebook reader (1998) places a menu bar at the bottom of the display to avoid occlusion when navigating pages.

In touch screen and tabletop interaction, occlusion is also cited as motivation. Shen et al. (2006) discuss tabletop techniques to combat occlusion, including remote manipulation of objects and visual feedback that expands beyond the area typically occluded by a finger. Other strategies include: placing the hand behind (Wigdor, Forlines, Baudisch, Barnwell, & Shen, 2007) or under the display (Wigdor et al., 2006); and shifting a copy of the intended selection area up and out of the area occluded by the finger (Vogel & Baudisch, 2007).

Other researchers have cited problems with occlusion in non-occlusion specific experiments and usability studies. Grossman et al. (2006) found that users sometimes moved away from the experimental target to invoke their hover widget, which they attribute to hand occlusion. Hinckley et al. (2007) discovered that conventional GUI tooltips could be easily blocked by the hand. Hinckley et al. (2006) found that users needed a chance to lift their hand to view the screen and verify progress when making a lasso selection. Dixon,
Guimbretière, and Chen (2008) located a start button below their main experimental stimulus to counteract hand occlusion. Ramos et al. (2007) argue that accuracy is impaired when using a direct pen because of pen tip occlusion, but provide no evidence. However, these occlusion-related design decisions and observations are based on an ad hoc understanding. To our knowledge, there are only five examples of researchers studying occlusion explicitly.

Brandl et al.

Brandl et al. (2009) use a simple paper-based experiment to determine which slices of a circular pie menu are most often occluded. A circle was drawn on paper with 12 slices identified by letters (Figure 4-1a). Participants placed a pen at the centre of the circle, and self-reported which slices they could see. Based on results from 15 right-handed and 3 left-handed participants, the authors calculated the mean number of times each slice was reported as visible (Figure 4-1b).

(a) experiment (b) results

Figure 4-1. Brandl et al.’s occlusion area experiment. (a) paper based experiment where participants self-reported visibility of 12 labelled pie slices; (b) results showing frequency of occluded pie slice. (from Brandl et al., 2009)

For right-handed participants, they found that four pie slices are occluded more than 50% of the time and that these slices emanate to the right of the pen position. Left-handed results are almost mirrored, with most occluded pie slices rotating clockwise one position. Their results provide adequate justification for the design of their occlusion-aware pie menu, but they are difficult to generalize since they only cover 12 discrete areas in the immediate vicinity of the pen.


Bieber, Rahman, and Urban

Bieber et al. (2007) used a simplified analytical approach to quantify the amount of hand occlusion with a hand-held pen computer. They used a single shape to represent the occluded area (Figure 4-2b), and used this shape to calculate the amount of occlusion across all possible pen positions on the PDA’s display (a 160 × 160 px, 5.7 × 5.7 cm display size). Based on this analysis, they found that 55% of the display can be occluded when pointing at the top-left corner, and that the bottom-right corner is almost always occluded. However, they do not describe how they created the occlusion shape in the first place, making reproduction and validation of their results difficult.

(a) occlusion with PDA (b) occlusion shape


Figure 4-2. Bieber, Rahman, and Urban’s analytic study. The authors created a single shape (b) which is intended to represent the area occluded when using a PDA at point p (a). (diagram based on Bieber et al., 2007)

Forlines and Balakrishnan

Forlines and Balakrishnan (2008) investigate the interaction of tactile pen feedback and occlusion with tapping and crossing selection methods. They use the same 1-D tasks as Accot and Zhai (2002), but in addition to the tactile pen condition, they also include a between-subjects factor for direct and indirect input (the authors argue that a within-subjects design for input type may have asymmetric skill transfer). Their assumption is that direct and indirect input controls for occlusion. Based on results from a crossing task, the authors argue that tactile feedback can make up for loss of visual feedback caused by pen and hand occlusion – the implicit finding is that


occlusion reduces the benefit of visual feedback. In addition, they suggest that occlusion is less problematic for discrete crossing tasks, where the pen is lifted during the movement, compared to continuous crossing tasks, where the pen tip maintains contact with the display (Figure 4-3b). They explain that the user can survey the display when their hand is lifted and visually acquire the target more quickly. However, these results are based on crossing selection, not tapping. With indirect input and tapping, the authors found no significant differences (Figure 4-3a) when augmenting visual feedback with the tactile pen.

(a) experiment one tasks (b) experiment two tasks

Figure 4-3. Experimental tasks used by Forlines and Balakrishnan. (from Forlines & Balakrishnan, 2008)

For the tapping task, the authors found a surprising interaction of target width and distance with direct and indirect input: at the largest target width of 5.6 mm, direct input was faster than indirect input; and at the largest target distance of 179.8 mm, direct input was faster than indirect input. The authors suggest that this is because users can more easily track the position of the physical pen tip when using direct input, compared to tracking a cursor. Perhaps most alarming is that this interaction shows that using direct versus indirect input as an experimental control for occlusion is problematic, especially as the target size and distance increase.

Hancock and Booth

Hancock and Booth (2004) investigate 2-D tapping selection performance with direct and indirect pen input. Like Forlines and Balakrishnan, they use direct versus indirect input as an experimental control for occlusion, but as a within-subjects factor. In their experimental task, a 6.1 mm end target is located at one of 12 radial positions 35 mm from the starting pen position (the task ID is 2.8)16. They divided the display into four quadrants, to control for different starting positions, but found no statistical differences. Since their motivation is the performance of context menu invocation, the end target remains hidden until the pen first taps the display. Overall, they found direct input faster than indirect (with mean times of 0.7 s and 1.0 s respectively) – a pronounced difference for such a simple task.

Note that the authors used identical hardware for input (a 1024 × 768 px, 21 × 16 cm Tablet PC), but for the indirect condition, they rendered the targets on a much larger vertical display (a 1024 × 768 px, 141 × 102 cm SmartBoard) requiring a control-display ratio of 0.15:1. This may have introduced a confounding effect as participants reconciled the large visual output with the much smaller input. There may also have been an asymmetric transfer effect of the kind suggested by Forlines and Balakrishnan.

Within the direct or indirect input conditions, there were significant effects for target direction. Post-hoc analysis revealed that, in the direct input condition, there was one target direction significantly slower than three or more other directions for right-handed participants (Figure 4-4b), and three target directions significantly slower than three or more other directions for left-handed participants (Figure 4-4a). There were no directions in the indirect input condition slower than three or more other directions. Since the physical hand movement is identical in either condition, the authors argue that the greater significant differences in the direct condition could be attributed to occlusion. However, the difference at the -60° direction for left-handed participants is unexpected given where we would expect the hand to be located.

16 Note that the target size and distance are reported in the paper as the unbelievably large dimensions of 61 mm and 350 mm respectively. In a personal communication with the first author (June 30, 2009), we confirmed that this was a typographical error.

(a) left-handed (b) right-handed

direct input indirect input Figure 4-4. Hancock and Booth’s results for direct and indirect input task time. (a) left-handed users; (b) right-handed users. Longitudinal axes in seconds, latitudinal axes compass direction. Data points which are reported as significantly greater than three or more other directions in the same input condition are circled. (data from Hancock & Booth, 2004)

The authors also asked participants which quadrant they preferred to have a context menu located relative to the pen. For right-handed participants, the least preferred location was the bottom-right and the most preferred the bottom-left. Left-handed participants responded with the exact same rankings, but vertically mirrored.

Inkpen et al.

Inkpen et al. (2006) conducted a series of four quantitative and qualitative experiments studying how left-handed users interacted with right- and left-aligned scrollbars on hand-held pen computers. In scrolling tasks in which participants were asked to acquire a specific icon or line of text in a list, participants were faster using a left-aligned scrollbar. Participants also reported that it was difficult to use the right-aligned scrollbar without blocking the display with their hand (Figure 4-5a). The authors observed some participants raise their grip on the pen so they could keep their hand below the display, or arch their hand over the screen, to reduce occlusion (Figure 4-5b,c). The authors conclude that occlusion makes it difficult to visually scan the list while simultaneously scrolling, increasing fatigue and decreasing performance. The scrolling task is an example of a simultaneous monitoring task, but without a strong control for the position of the content to be monitored.


(a) (b) (c)

Figure 4-5. Inkpen et al.’s left-handed users and right-aligned scrollbars. (a) occlusion when using right-aligned scrollbar; (b) and (c) compensating gestures used to counteract occlusion.

Summary

These examples suggest three avenues for investigation of occlusion: its area and shape, its effect on target selection performance, and compensatory postures to minimize its effect. However, the results remain inconclusive. Brandl et al. and Bieber et al.’s findings with regard to the area and shape of the occluded area are incomplete. Forlines and Balakrishnan use a 1-D target task which is unlikely to adequately capture the effect of occlusion on selection performance according to target direction. Hancock and Booth only investigate one type of target selection when the target is initially hidden. Inkpen et al. noted compensatory postures, but their experimental tasks only enable comparisons between left- and right-aligned scrollbars.

4.2 Experiment 4-1: Area and Shape

The goal of our first experiment is to measure the size and shape of the occluded area of display. To accomplish this, we record the participant’s view of their hand with a head- mounted video camera as they select targets at different locations on the display. We then extract key frames from the video and isolate an occlusion silhouette of the participant’s hand and pen as they appear from their vantage point.


Participants

22 people (8 female, 14 male) with a mean age of 26.1 (SD 8.3) participated. All participants were right-handed and pre-screened for color blindness. Participants had little or no experience with direct pen input, but this is acceptable since we are observing a lower level physical behaviour. At the beginning of each session, we measured the participant’s hand and forearm since anatomical dimensions likely influence the amount of occlusion (Figure 4-6).

EL elbow to fingertip length

SL shoulder to elbow length

UL upper limb length including hand

FL forearm length, elbow to crease of wrist (EL - HL)

HL hand length, crease of the wrist to the tip of finger

HB hand breadth, maximum width of palm

Figure 4-6. Anthropomorphic measurements. Diagram adapted from Pheasant & Hastlegrave (2006).

We considered controlling for these dimensions, but recruiting participants to conform to certain anatomical sizes proved to be difficult, and the ranges for each control dimension were difficult to define.


Apparatus

The experiment was conducted using a Wacom Cintiq 12UX direct input pen tablet. It has a 307 mm (12.1 inch) diagonal display, a resolution of 1280 × 800 pixels, and a device resolution of 4.9 px/mm (125 DPI). We chose the Cintiq because it provides pen tilt information which is unavailable on current Tablet PCs. We fixed the tablet in portrait-orientation and supported it such that it remained at an angle of 12 degrees off the desk, oriented towards the participant. Participants were seated in an adjustable office chair with the height adjusted so that the elbow formed a 90 degree angle when the forearm was on the desk. This body posture is the most ergonomically sound according to Pheasant & Hastlegrave (2006).

To capture the participant’s point-of-view, we use a small head-mounted video camera which records the entire experiment at 640 × 480 px resolution and 15 frames-per-second (Figure 4-8a). The camera is attached to the head harness using hook-and-loop strips. This made it easy to move it up or down so that it could be aligned as close as possible to the center of the eyes without interfering with the participants’ line of sight. In pilot experiments, we found that we could position the camera approximately 40 mm above and forward of the line of sight, and the resulting image was very similar to what the participant saw.

We considered mounting a pair of miniature cameras above each eye, and then warping the image using stereo vision to achieve the exact participant perspective. However, we were concerned about additional error introduced by stereo reconstruction. Perhaps more importantly, the characteristics of human stereo vision may horizontally shrink the actual occluded area compared to the area captured by a monocular, centrally located camera. Human eyes are separated by an interocular distance of about 60 mm (depending on population), which horizontally shifts the image perceived with each eye as a function of that distance (Steinman & Garzia, 2000). As an example, consider a hand-like shape 40 mm wide and located 40 mm above the surface of the tablet with the eye (or eyes) centrally located 500 mm from the tablet (Figure 4-7). In this example, the occluded area produced by the hand would be 37.5 mm wide with stereo vision and 45 mm with monocular vision. The actual error will vary according to the height of the occluding object and its width, and the relative location of the eye. This is a limitation of single-camera capture, but we felt this was acceptable and adequate for our purposes, especially considering it would be simpler to process, synchronize, and store.

(a) stereo (b) monocular


Figure 4-7. Estimated error introduced by monocular versus stereo view. Example with eye(s) located 500 mm from the tablet, occluding object such as a hand located 40 mm above the tablet and 40 mm wide. The width of the occluded area would be 37.5 mm for stereo vision and 45.0 mm for monocular. Note: diagram is not to scale.

Fiducial markers were attached around the bezel of the tablet to enable us to transform the point-of-view frames to a standard, registered image perspective for analysis (Figure 4-8b). These are printed paper markers designed to work with the augmented reality marker tracking toolkit we used. Details of the image analysis steps are in the next section.

(a) (b)

Figure 4-8. Experiment apparatus. (a) seated at desk with head mounted camera to capture participants’ point-of-view; (b) fiducial markers attached around the tablet bezel (image from head mounted camera frame).


Figure 4-9. Head mounted camera.

Task and Stimuli

Participants were presented with individual trials consisting of an initial selection of a home target, followed by selection of one of two types of measurement targets. The 13.0 × 26.3 mm (64 × 128 px) home target was consistently located at the extreme right edge of the tablet display, 52.0 mm from the display bottom. This controlled the initial position of the hand and forearm at the beginning of each trial. We observed participants instinctively returning to this rest position in our initial observational study. The location of the measurement target was varied across trials at positions inscribed by a 7 × 11 unit invisible grid (Figure 4-10a). This created 77 different locations with target centers spaced 24.9 mm (122 px) horizontally and 25.1 mm (123 px) vertically.

We observed two primary styles of pen manipulation in our initial observational study: localized interactions where the participant rested the palm of their hand on the display (such as adjusting a slider), and singular, short interactions performed without resting the hand (such as pushing a button). Based on this observation, our task had two types of selection of measurement targets: tap – selection of a 13.0 mm (64 px) square target with a single tap (Figure 4-10b); and circle – selection of a circular target by circling within a 5.7 mm (28 px) tolerance between a 0.8 mm (4 px) inner and 6.5 mm (32 px) outer radius (Figure 4-10c). The circle selection is designed to encourage participants to rest their palm, while the tap selection can be quickly performed with the palm in the air. The different shapes for the two selection tasks were intended to serve as a mnemonic to the user as to what action was required.

The circle selection used an ink trail visualization to indicate progress. Errors occurred when the pen tip crossed the inner or outer radius. We wanted this to be difficult enough to require a palm plant, but not tedious. In practice, participants took at least half a second to circle the target, which seemed to be enough to plant the palm.

At the beginning of each trial, a red home target and a gray measurement target were displayed. After successfully selecting the home target, the measurement target turned red and the participant selected it to complete the trial. We logged all aspects of pen input, including pressure and tilt. Video 4-1 provides a demonstration of the experiment apparatus and task.
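As a rough illustration only (not the experiment software), the following Python sketch shows the kind of tolerance test the circling task implies, assuming pen samples arrive as (x, y) offsets in millimetres from the target centre; the function names and the 30° completion threshold are our own assumptions.

```python
import math

# Circle-task tolerances from the text (mm): an error occurs when the pen tip
# moves inside the 0.8 mm inner radius or outside the 6.5 mm outer radius.
INNER_RADIUS_MM = 0.8
OUTER_RADIUS_MM = 6.5

def pen_sample_ok(x_mm: float, y_mm: float) -> bool:
    """True if a pen sample (relative to the target centre) stays inside the annulus."""
    r = math.hypot(x_mm, y_mm)
    return INNER_RADIUS_MM <= r <= OUTER_RADIUS_MM

def circle_complete(samples) -> bool:
    """Rough completion test: the traced samples must sweep a full turn,
    i.e. leave no angular gap larger than an assumed 30 degrees."""
    angles = sorted(math.atan2(y, x) for x, y in samples)
    gaps = [b - a for a, b in zip(angles, angles[1:])]
    gaps.append(angles[0] + 2 * math.pi - angles[-1])   # wrap-around gap
    return max(gaps) < math.radians(30)

# A sample 3 mm from the centre lies within the tolerance band.
print(pen_sample_ok(3.0, 0.0))   # True
```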

(a)

(b) (c)

Figure 4-10. Experiment 4-1 experimental stimuli. (a) 7 x 11 grid used to place the measurement target; red start target is located near the bottom right; (b) square measurement target for tapping; (c) circle measurement target showing ink trail (to encourage a longer interaction with the palm resting).

00:27 Vogel_Daniel_J_201006_PhD_video_4_1.mp4 Video 4-1. Area and shape experiment demonstration.


Design

We presented 3 blocks of trials for each of the two tasks. A block consisted of 77 trials covering each target position in the grid, making 3 repetitions for each grid position and task type. Trials were presented in randomized order within a block and the presentation order of tasks was balanced across participants. Before beginning the first block of a task, the participant completed 40 practice trials. In summary, the experimental design was: 2 Tasks (Tap, Circle) × 77 Target Positions × 3 Blocks = 462 data points per participant

Image Processing

To transform the point-of-view video into a series of occlusion silhouettes, we performed the following steps with custom built software (Figure 4-12):

Frame Extraction

We extracted video frames taken between the successful down and up pen events selecting the square target, or just before the circular target was completely circled. To do this, we had to synchronize the video with the data log. We used a visual time marker which functions like a movie clapperboard. The time marker is a large red square containing a unique number. When this square is tapped, it disappears and a timestamp is saved to our data log. After the experiment, we scrubbed through the video and found the video time where the time marker disappeared. Then, using linear interpolation between bounding time marks, we located the corresponding video frame for a given log time. In most cases, the frame captured the pen at the intended target location, but occasional lags during video capture produced a frame with the pen separated from the target location.
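A minimal sketch of this clapperboard-style synchronization, assuming a list of (log time, video time) pairs recovered from the visual time markers; the names used here (e.g. sync_marks) are hypothetical, not from the study software.

```python
import bisect

def log_time_to_frame(t_log: float, sync_marks: list, fps: float = 15.0) -> int:
    """Map an experiment log timestamp to a video frame index by linear
    interpolation between the two bounding (log_time, video_time) sync marks."""
    log_times = [m[0] for m in sync_marks]
    i = bisect.bisect_right(log_times, t_log)
    i = min(max(i, 1), len(sync_marks) - 1)              # clamp to a bounding pair
    (t0, v0), (t1, v1) = sync_marks[i - 1], sync_marks[i]
    alpha = (t_log - t0) / (t1 - t0)
    return round((v0 + alpha * (v1 - v0)) * fps)

# Example: two markers 60 s apart in the log that are 59.5 s apart in the video.
marks = [(10.0, 2.0), (70.0, 61.5)]
print(log_time_to_frame(40.0, marks))                    # frame roughly mid-way
```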

Rectification

We used the ARToolkitPlus augmented reality library (Wagner & Schmalstieg, 2007) to track the fiducial markers in each frame and determine the location of the four corners of the display. In practice, this sometimes required hand tuning when the markers were occluded by the hand or were out of frame due to head position. Using the four corner positions, we un-warped the perspective using the Java Advanced Imaging (Sun Microsystems, n.d.) functions PerspectiveTransform and WarpPerspective with bilinear interpolation, and cropped it to a final 267 × 427 px image.

Note that due to our single camera set-up, the rectification step will shift the image of the hand down slightly relative to the actual eye view. As an example, if the eye position is at the end of a vector 500 mm and 50° from the centre of the tablet, and the camera is located 40 mm above and forward of the eye, the rectified image of a point on the hand 40 mm above the tablet will be shifted down by 6.2 mm (about 4 px in our rectified image) (Figure 4-11). The exact error will vary according to participant size and grip style, but the values above are typical. Rather than try to compensate for this slight shift and possibly introduce additional errors, we accepted this as a reasonable limitation of our technique.

Figure 4-11. Estimated rectification error from head-mounted camera. Example with eye located 500 mm from the centre of the tablet at an angle of 50°, camera located 40 mm above and forward of the eye, and point on hand located 40 mm above the tablet surface. The rectified position of the point on the hand as captured by the camera will be shifted down by 6.2 mm from p to p’. Note: diagram is not to scale.
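The rectification itself was done with ARToolkitPlus and Java Advanced Imaging; purely as an illustrative sketch, the following Python/OpenCV code performs an equivalent un-warp, assuming the four display corners have already been recovered from the fiducial markers (the corner coordinates below are made up).

```python
import cv2
import numpy as np

def rectify_frame(frame_bgr: np.ndarray, display_corners: np.ndarray) -> np.ndarray:
    """Un-warp a camera frame so the tablet display fills a 267 x 427 px image.
    `display_corners` holds the four display corners in camera-frame pixels,
    ordered top-left, top-right, bottom-right, bottom-left."""
    width, height = 267, 427
    dst = np.array([[0, 0], [width - 1, 0],
                    [width - 1, height - 1], [0, height - 1]], dtype=np.float32)
    H = cv2.getPerspectiveTransform(display_corners.astype(np.float32), dst)
    # Bilinear interpolation, matching the interpolation described in the text.
    return cv2.warpPerspective(frame_bgr, H, (width, height), flags=cv2.INTER_LINEAR)

# Hypothetical corner positions in one 640 x 480 head-camera frame.
corners = np.array([[180, 40], [460, 55], [470, 430], [170, 420]], dtype=np.float32)
frame = np.zeros((480, 640, 3), dtype=np.uint8)           # stand-in for a video frame
print(rectify_frame(frame, corners).shape)                # (427, 267, 3)
```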


Isolation

We used simple image processing techniques to isolate the silhouette of the hand. First, we applied a light blur filter to reduce noise. Then we extracted the blue color channel and applied a threshold to create an inverted binary image. We were able to use the blue channel to isolate the hand because the camera’s color balance caused the display background to appear blue (it was actually white). Since the color space of skin is closer to red, this made isolating the hand relatively easy. To remove any edge pixels from the display bezel, we applied standard dilation and erosion morphological operations (Dougherty, 1992). Finally, we filled holes based on the connectivity of pixels to produce the final silhouette.
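Again as an illustrative sketch rather than the authors' implementation, the same isolation steps can be expressed with OpenCV; the blur kernel, threshold value, and morphology kernel size are assumptions.

```python
import cv2
import numpy as np

def isolate_silhouette(rectified_bgr: np.ndarray) -> np.ndarray:
    """Binary occlusion silhouette (hand/arm = 255, display = 0) from a rectified frame."""
    blurred = cv2.GaussianBlur(rectified_bgr, (5, 5), 0)        # light blur to reduce noise
    blue = blurred[:, :, 0]                                     # OpenCV channel order is B, G, R
    # The display background appears blue while skin does not, so an inverted
    # threshold on the blue channel keeps the hand; 128 is an assumed cut-off.
    _, binary = cv2.threshold(blue, 128, 255, cv2.THRESH_BINARY_INV)
    kernel = np.ones((3, 3), np.uint8)
    binary = cv2.morphologyEx(binary, cv2.MORPH_OPEN, kernel)   # drop thin bezel-edge pixels
    # Fill interior holes: flood-fill the background from a corner, then merge.
    flood = binary.copy()
    mask = np.zeros((binary.shape[0] + 2, binary.shape[1] + 2), np.uint8)
    cv2.floodFill(flood, mask, (0, 0), 255)
    return binary | cv2.bitwise_not(flood)

silhouette = isolate_silhouette(np.zeros((427, 267, 3), dtype=np.uint8))
print(silhouette.shape, int(silhouette.max()))                  # (427, 267) 255
```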

(a) (b) (c)

Figure 4-12. Image processing steps. (a) frame extraction; (b) rectification; (c) silhouette isolation.

Results

Unfortunately, lighting and video problems corrupted large portions of data for participants 7, 14, 21, and 22, making isolation of their occlusion silhouettes unreliable. Capture problems with participant 8 corrupted the first block, but we kept this participant and their remaining blocks. In the end, our analysis included 18 out of the original 22 participants (6 female, 12 male) with a mean age of 26.3 (SD 8.4). These types of problems are typical when using video capture to generate empirical data: it is difficult to produce the same kind of “clean” data generated by experiments recording straightforward variables such as performance time and errors. Researchers attempting similar work should recruit extra participants and run multiple trials as we did, to ensure a reasonable number of clean trials can be obtained.


Participants occasionally produced errors (mean 4.4%), but we included the silhouette regardless. Since each target must be successfully tapped or circled before continuing, the final video frame for an error trial would not differ. Also, the logged pen tilt values were very noisy, in spite of silhouette images suggesting tilt should be more uniform. Our attempts to filter them post-hoc were unsuccessful, and we were forced to leave them out of our analysis.

Occlusion Ratio

We define the occlusion ratio as the percentage of occluded pixels within all possible display pixels. We used a ratio, rather than actual area, for unit independence. The actual area can be computed using the display area of 42,654 mm2. Since the occlusion ratio varies according to pen location, we calculate it for each X-Y target location in the 7 × 11 grid. Not surprisingly, we found the highest occlusion ratios when the pen was near the top left of the display. However, the highest value did not occur at the extreme top, but rather a short distance below (Figure 4-13). The highest values did not differ greatly by task with 38.6% for circle (SD 6.2) and 38.8% for tap (SD 14.2). Participant 1 had the highest occlusion ratio with 47.4% for tap and 46.3% for circle. These mean ratios may reflect a sampling bias among our participants since controlling for aspects such as anatomical size and pen grip style is difficult to do a-priori. To help address this, we compare occlusion ratios given participant size.

Figure 4-13. Mean occlusion ratio. Plotted by X-Y display location (in pixels) for: (a) circle task; (b) tap task. Peak mean ratios were 38.6% (circle) and 38.8% (tap).
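A minimal sketch of the occlusion ratio computation, assuming a binary silhouette resampled to the display's 800 × 1280 px portrait resolution; the function names are hypothetical.

```python
import numpy as np

DISPLAY_AREA_MM2 = 42_654        # display area given in the text

def occlusion_ratio(silhouette: np.ndarray) -> float:
    """Fraction of display pixels covered by the hand/arm (non-zero silhouette pixels)."""
    return float(np.count_nonzero(silhouette)) / silhouette.size

def occluded_area_mm2(silhouette: np.ndarray) -> float:
    """Convert the unit-independent ratio back to a physical area."""
    return occlusion_ratio(silhouette) * DISPLAY_AREA_MM2

demo = np.zeros((1280, 800), dtype=np.uint8)
demo[400:, 300:] = 255           # crude stand-in: lower-right block "occluded"
print(f"{occlusion_ratio(demo):.1%}, {occluded_area_mm2(demo):.0f} mm^2")
```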


Influence of Participant Size

We established a simple size metric S to capture the relative size of each participant’s arm and hand compared to the general population. S is the mean of three ratios between a participant measurement and 50th percentile values from a table of anthropomorphic statistics17. We use measurements for shoulder length (SL), hand length (HL), and hand breadth (HB). Since tables of anthropomorphic statistics are divided by gender, we compute S for men and women using different 50th percentile values. We found mean S values of 0.99 (SD 0.04) and 1.01 (SD 0.06) for men and women respectively, indicating that the size of our participants was representative. We expected to see a relationship between S and the maximum occlusion ratio since larger hands and forearms should cover more of the display. However, a plot of S vs. maximum occlusion ratio does not suggest a relationship (Figure 4-14).

Figure 4-14. Participant size (S) vs. max occlusion ratio. Horizontal axis: S – participant size ratio; vertical axis: maximum occlusion ratio (%); points plotted by gender.
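A minimal sketch of the size metric S; the 50th percentile values below are placeholders only, not the figures from Pheasant & Hastlegrave (2006).

```python
# Placeholder 50th percentile values in mm, split by gender as in the text;
# the real values come from Pheasant & Hastlegrave (2006).
P50 = {
    "male":   {"SL": 365.0, "HL": 190.0, "HB": 87.0},
    "female": {"SL": 335.0, "HL": 174.0, "HB": 76.0},
}

def size_metric_s(measurements: dict, gender: str) -> float:
    """S = mean of participant / 50th-percentile ratios for SL, HL, and HB."""
    ratios = [measurements[k] / P50[gender][k] for k in ("SL", "HL", "HB")]
    return sum(ratios) / len(ratios)

# A participant slightly larger than the placeholder 50th percentile values.
print(round(size_metric_s({"SL": 370.0, "HL": 185.0, "HB": 90.0}, "male"), 2))
```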

Occlusion Shape

Although occlusion ratio gives some sense of the scope of occlusion, it is the shape of the occluded pixels relative to the pen position that is most useful to designers. Figure 4-15 illustrates the mean shapes for participants for circling and tapping tasks.

17 Anthropomorphic statistics for U.S Adults 19 to 65 years old (Pheasant & Hastlegrave, 2006).

Since the captured image of the forearm and hand is increasingly cropped as the pen moves right and downward, we illustrate shapes for positions sampled near the middle-left portion of the display.

It is immediately apparent that occlusion shape varies between participants. There are differences which are likely due to anatomical size, possibly related to gender: compare how slender female participant 4 appears with male participant 5. Some participants adopt a lower hand position occluding fewer pixels above the target: contrast the height of participant 8 with participant 9. The forearm angle also often varies: for example, participant 20 has a much higher angle than participant 10. A few participants grip the pen far away from the tip, occluding fewer pixels around the target: participant 18 in the tapping task is one example.

When comparing individual participant shapes between the tap and circle tasks, the visual differences are more subtle and inconsistent. For example, we expected the higher speed of the tapping task to create a more varied posture resulting in blurry mean shapes. This seems to be the case for participants 2, 8, and 17, but there are contrary examples where circling shapes are more blurred: see participants 6 and 20. Only participants 2 and 12 seemed to adopt very different postures for tapping (low) and circling (high).


(a) tap task

(b) circle task

Figure 4-15. Occlusion shape silhouettes for each participant. (a) tap task and (b) circle task. Generated from 9 samples from 3 pen positions at middle-left portion of display; see text for discussion of participant and task comparison highlights.

The combined participant mean silhouette gives an overall picture of occluded pixels near the pen position across all participants (Figure 4-16). As with individual participants, differences between tasks are subtle. The tapping task mean silhouette appears slightly larger, higher (Figure 4-16a), and sharper compared to the circling task (Figure 4-16b). In both cases many pixels above the horizontal position of the pen tip are typically occluded. Note that fewer pixels are occluded in the immediate vicinity of the pen’s position.

(a) tap (b) circle


Figure 4-16. Mean occlusion silhouettes. (a) tap task; (b) circle task. The lower row is visually augmented to show silhouette areas with greater than 50% concentration.


Pixels Most Likely to be Occluded

Another way to view occlusion shape is to look at which display pixels are most likely to be occluded given a distribution of pen positions. To create a simple baseline for analysis, we assume that the probability of accessing any position on the display is uniform. Under this distribution, occluded display pixels across participants and target positions form a cluster of frequently occluded pixels emanating from the lower two-thirds of the right edge (Figure 4-17). There appears to be no difference between the circle and tap tasks. A uniform distribution of pen positions is not representative of common application layouts: consider the frequency of accessing menus and toolbars located along the top of the display. With this in mind, the often occluded pixels near the bottom right are even more likely to be occluded.


(a) tap (b) circle

Figure 4-17. Pixels most likely to be occluded. Given a uniform distribution of pen positions: (a) tap task; (b) circle task. Darker pixels are occluded more often, the lower row is visually augmented to show areas with greater than 50% concentration.
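A minimal sketch of how such a per-pixel occlusion frequency can be accumulated, assuming the binary silhouettes (one per trial, in display coordinates) are available; averaging them over the roughly uniform 7 × 11 grid of pen positions approximates the uniform-distribution baseline.

```python
import numpy as np

def occlusion_frequency(silhouettes: list) -> np.ndarray:
    """Per-pixel probability of being occluded across a set of binary silhouettes."""
    stack = np.stack([(s > 0).astype(np.float32) for s in silhouettes])
    return stack.mean(axis=0)

def often_occluded(freq: np.ndarray, cutoff: float = 0.5) -> np.ndarray:
    """Pixels occluded more than half the time (the augmented lower row of Figure 4-17)."""
    return freq > cutoff

# Three random stand-in silhouettes at the display's 800 x 1280 px resolution.
demo = [np.random.default_rng(i).integers(0, 2, (1280, 800), dtype=np.uint8) for i in range(3)]
freq = occlusion_frequency(demo)
print(freq.shape, float(often_occluded(freq).mean()))
```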

Discussion

The results of this experiment reveal four main findings:

1. A large portion of the display can be occluded depending on pen position; with our participants it was typically as high as 38%, but could range up to 47%.


2. The pixels immediately below the pen position are not occluded by the hand as much as we expected, but more pixels are occluded above the pen tip horizon than previously thought.

3. Individuals typically have a signature occlusion silhouette, but silhouettes for different individuals can have large differences.

4. There appears to be no simple relationship between the size of the occluded area and anatomical size.

The Impact of Grip Style

The difference in occlusion silhouettes is due to a combination of different pen grip styles (Figure 4-18) with some contribution of anatomical size. In chapter 2, we discussed research investigating and classifying pen grips in traditional pen and paper (Figure 2-9) as well as Wu and Luo’s (2006a) observations of grips used when operating a Tablet PC (Figure 2-18). We did not see the extreme style of grip Wu and Luo describe, but rather the more traditional pen grip styles seen in writing. While we could classify our participant grips as variants of the tripod grip or perhaps even inefficient power grips, these classifications alone do not provide a relationship to the area and shape of the occlusion silhouette. Instead, describing pen grip according to three basic dimensions would be more useful: the size of the fist, the angle at which the pen is held, and the height of the grip on the shaft of the pen. We believe it is these characteristics of grip style that interact with anatomical measurements and ultimately govern occlusion area.


(a) (b) (c)

Figure 4-18. Video stills of observed grip styles. (a) loose fist, low angle, medium grip height; (b) tight fist, high angle, high grip height; (c) loose fist, straight angle, low grip height.

Left-handed Users

We conducted a small follow-up study with two left-handed users. Similar to Hancock and Booth’s (2004) finding for performance, we found that the left-handed data mirrored that of the right-handed individuals (Figure 4-19).


(a) tap task

(b) circle task

Figure 4-19. Left-handed participant results. Individual occlusion shapes on left, pixels most likely to be occluded on right for (a) tap task; (b) circle task. Darker pixels are occluded more often, the right column is visually augmented to show areas with greater than 50% concentration.

Influence of Clothing

We gathered our data with sleeveless participants to maintain a consistent baseline, but we recognize that the size of the occlusion silhouette could be much larger when clothed (consider using a tablet outside while wearing a parka and mittens, or even with a loose fitting sweater or jacket). As a general rule, Pheasant and Hastlegrave (2006) suggest adding 25 mm to all anatomical dimensions for men and 45 mm for women to account for the thickness of clothing.

4.3 Experiment 4-2: Performance

In this experiment, we wish to measure the effect of occlusion on performance with three fundamental GUI selection tasks: tapping, dragging, and tracing. We extend the work of Hancock and Booth (2004) with the addition of dragging and tracing, multiple target distances, a visible condition for the end target, and a control for occlusion within a single direct input context. In the language of Forlines and Balakrishnan (2008), tapping is a discrete task, and dragging and tracing are continuous tasks.


Previous work using indirect devices has already established that tapping (or “clicking”) is faster than dragging (I. S. MacKenzie et al., 1991), and dragging is faster than tracing (also referred to as a “trajectory task”) (Accot & Zhai, 1997, 1999). This is because of the decreased freedom of movement as one goes from tapping to dragging and from dragging to tracing: when tapping, the location of the pen between targets has no effect on the success or failure of the task; when dragging, the pen must stay pressed against the surface of the display, but the intermediate 2-D location has no effect; and when tracing, the pen must not only be pressed down, but the 2-D path between targets is also constrained.

Hancock and Booth (2004) and Forlines and Balakrishnan (2008) argue for an effect of occlusion based on a comparison of direct and indirect input conditions. However, de-coupling input and display space, especially when display orientation is changed, may introduce other hidden variables which confound a strict control for occlusion. We approach this differently, by controlling for occlusion in a single direct input context using a cross-hair visual augmentation (on the end target) intended to circumvent the effect of occlusion. We include two additional conditions for target visibility: where the end target is hidden until the start target is selected; and where the end target is always visible. Hancock and Booth (2004) only include the hidden condition since their motivation was to simulate a context menu.

Our experiment has three main goals:

1. Investigate the effect of end target direction and distance on performance:

we expect that when the end target is occluded, or when an occluded portion of the display separates the current location from the end target, performance will suffer;

we expect this effect will be most pronounced with tracing, followed by dragging, and then tapping;

we expect this effect will be most pronounced when the end target is initially hidden.

2. Determine if the cross-hair visual augmentation mitigates the effect of occlusion:

we expect the cross-hair augmentation will have higher performance than other target visibilities.


3. Verify the relative performance of tapping, dragging, and tracing with a direct input pen device:

we expect lowest performance with tracing, followed by dragging, and then tapping;

we expect the ordering to be the same regardless of target visibility.

Participants and Apparatus

18 people (7 female, 12 male) with a mean age of 25.3 (SD 7.0) participated18. All participants wrote with their right hand and were pre-screened for color blindness. As in Experiment 4-1, participants had little or no experience with direct pen input, but this is acceptable since we are observing a lower level physical behaviour.

The apparatus is identical to Experiment 4-1: a Wacom Cintiq 12UX direct input pen tablet for input and display, fiducial markers attached around the tablet bezel, and a head-mounted video camera to capture the participant’s point-of-view (Figure 4-9).

Task and Stimuli

Participants were asked to complete three types of fundamental target selection tasks: Tapping, Dragging, and Tracing. Each of these tasks has a start target and an end target which begin and end the task respectively. We introduced three variations for displaying the end target: Hidden, where the end target only appears after the start target is selected; Visible, where the end target is visible from the beginning; and Crosshair, which is the same as Visible except for the addition of a large crosshair to visually augment the target position. Note that the distinction between Visible and Hidden is based on when the target is rendered on the display, not according to whether the target was visible or hidden due to hand occlusion.

18 In fact, we ran 22 participants. We used three participants at the beginning as a pilot to fine tune the experimental conditions. After running 18 more participants for the main study, initial analysis found that one participant was an outlier. Their overall mean time was almost twice as slow as all other participants, so that during outlier removal, 37% of their non-error trials would have been removed. This may have been due to an emphasis on accuracy rather than speed (their mean error rate was 1.2% overall, compared to the overall participant mean closer to 5%), or because they were our oldest participant at 50 years of age, a factor shown to reduce performance in pointing tasks (Worden, Walker, Bharat, & Hudson, 1997). We decided to remove this participant and run one additional person in their place.

Controlling for hand occlusion a-priori is difficult given the different occlusion silhouettes we observed in Experiment 4-1. We chose the Visible and Hidden types of end target reveal based on common situations we observed in our observational study in chapter three. The hidden condition models cases such as selecting a tab which in turn reveals a palette of widgets for subsequent selection. The visible condition models cases such as pressing a button on a visible toolbar. We added the Crosshair visibility condition in an effort to minimize the effect of occlusion within a single direct input context. The crosshair is composed of vertical and horizontal gray lines that extend the full length and width of the display and intersect behind the end target. In theory, with the addition of this crosshair, participants should locate the end target more quickly even when it is occluded by their hand (Figure 4-20).

(a) (b) Figure 4-20. Crosshair to minimize effect of occlusion. (a) the end target is initially occluded (indicated with a dashed outline), but visually augmented by the crosshair; (b) the end target is selected.

Details of Tasks

The Tapping task required selecting a start target and then an end target in rapid succession, both 6.1 mm (30 px) square (Figure 4-21a).

The Dragging task required the participant to first tap the pen down on a red, drag-able start target which was 6.1 mm (30 px) square, then, with the pen pressed down, drag it so that it was positioned completely inside the bounds of a 12.2 mm (60 px) square docking area end target displayed as a red dashed outline (Figure 4-21b). The centre of the start target snaps to the pen position, so the 12.2 mm dock and 6.1 mm drag-able target required the same precision as tapping a 6.1 mm target. There was no constraint on the path between start target and dock. When an error occurred, the start target remained at the last position instead of forcing the participant to repeat the trial again from the beginning. Like tapping, there are two visibility variants: the dock is visible before the start target is dragged, and the dock is hidden until after the drag has begun.

The Tracing task required the participant to draw a line from a start position to an end position while keeping the pen within an 8.2 mm (40 px) wide, straight path (Figure 4-21c). Progress was displayed as an ink trail and errors occurred when the pen crossed the path boundary lines or continued past the end position. If an error occurred, a new start position was displayed at the farthest point traced thus far, and the participant could continue from there. The visibility variant applies to tracing as well, with the hidden variant showing the path only after the participant begins drawing in the start position.

Figure 4-21. Tapping, Dragging, and Tracing tasks. Examples are for the Visible end target condition. For each task, the initial state is shown on the top row with the start target in red and the end target in blue; the final state after task completion is shown on the bottom row, with an in-progress state in between. The Hidden condition is similar, except the blue end target is not rendered in the initial state.

Note that only a single target size was used for each task since, unlike previous work examining precision (Ren & Moriya, 2000) or modeling a device with Fitts' law (I. S. MacKenzie et al., 1991), we are primarily interested in the effect of occlusion given target distance and direction. The distance and direction of the end target relative to the start target was experimentally controlled to cover eight radial directions and four distances (see Figure 4-22). The position of the start target was randomized to prevent participants from anticipating a trial.
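For illustration, the trial geometry can be generated as follows. This is our own sketch, not the original experiment software; the display dimensions and margin are assumptions (the Cintiq 12UX is assumed to run at 1280 × 800 px).

```python
import math
import random

DISPLAY_W, DISPLAY_H = 1280, 800          # assumed work area in pixels
DIRECTIONS = {"E": 0, "SE": 45, "S": 90, "SW": 135,
              "W": 180, "NW": -135, "N": -90, "NE": -45}   # degrees, per Figure 4-22
DISTANCES_PX = [100, 300, 500, 700]        # 20.4, 61.2, 102.0, 142.9 mm

def make_trial(direction, distance_px, margin=40):
    """Pick a random start position such that the end target stays on screen."""
    angle = math.radians(DIRECTIONS[direction])
    dx = distance_px * math.cos(angle)
    dy = distance_px * math.sin(angle)     # y grows downward in screen coordinates
    while True:
        sx = random.uniform(margin, DISPLAY_W - margin)
        sy = random.uniform(margin, DISPLAY_H - margin)
        ex, ey = sx + dx, sy + dy
        if margin <= ex <= DISPLAY_W - margin and margin <= ey <= DISPLAY_H - margin:
            return (sx, sy), (ex, ey)

start, end = make_trial("SE", 300)
```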

Video 4-2. Performance experiment demonstration (Vogel_Daniel_J_201006_PhD_video_4_2.mp4, 00:35).

Figure 4-22. Target directions and distances. Eight compass directions (0° = E, 90° = S, 180° = W, -90° = N, with SE, SW, NW, NE at the intermediate 45° angles) at distances of 20.4, 61.2, 102.0, and 142.9 mm (100, 300, 500, 700 px).

Design

A repeated measures within-participant factorial design was used. The independent variables were: Task (Tapping, Dragging, Tracing); Visibility (Crosshair, Visible, Hidden); Direction (N, NE, E, SE, S, SW, W, NW); and Distance (20.4, 61.2, 102.0, 142.9 mm) (100, 300, 500, 700 px).


The presentation of the 3 Tasks was counterbalanced across participants, and the 3 Visibilities were presented in order from most visible to least visible (first Crosshair, then Visible, and last Hidden). Ideally, both Task and Visibility would have been fully counterbalanced, but this would have required 9 participant orders. We felt that a progression of Visibility was more acceptable since any transfer effect would most likely reduce the strength of significant effects, making our analysis more conservative. All 8 Directions and 4 Distances were presented in random order as a block of trials. The position of the start target was also randomized. In early pilots, we tried placing the start target at a constant location at the centre of the display like Ren and Moriya (2000), but participants were able to more easily anticipate the distance and direction of the end target. In addition, Hancock and Booth (2004) did not find an effect for start position on their larger display, so a fully random position is reasonable. There were 3 blocks for each Task and Visibility combination, creating 3 repetitions of each direction and distance trial. Before each new Task, the experimenter gave a demonstration and the participant completed 9 practice trials. In summary, the experimental design was:

3 Tasks (Tapping, Dragging, Tracing)
× 3 Visibility styles (Crosshair, Visible, Hidden)
× 3 Blocks
× 4 Distances (20.4, 61.2, 102.0, 142.9 mm)
× 8 Directions (N, NE, E, SE, S, SW, W, NW)
= 864 data points per participant

Data Preparation

Response time measurements tend to be positively skewed, since it is possible for a single trial time to be many times longer than the mean, but virtually impossible for the inverse to occur. This can be due to loss of concentration, momentary hesitation, environmental distraction, or hardware problems. We compensate by removing outliers (I. S. MacKenzie & Buxton, 1992). We removed 574 error-free trials (3.7% of 15552 total trials) which had a selection time more than 3 SD away from the cross-participant mean for the corresponding combination of Task, Visibility, Direction, and Distance.
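This outlier criterion can be expressed directly over the trial log. A minimal pandas sketch, assuming one row per error-free trial and the hypothetical column names shown:

```python
import pandas as pd

# trials: columns 'task', 'visibility', 'direction', 'distance', 'time' (hypothetical names)
def remove_outliers(trials: pd.DataFrame, k: float = 3.0) -> pd.DataFrame:
    """Drop trials more than k SD from the cross-participant mean of their condition cell."""
    groups = trials.groupby(['task', 'visibility', 'direction', 'distance'])['time']
    mean = groups.transform('mean')
    sd = groups.transform('std')
    keep = (trials['time'] - mean).abs() <= k * sd
    return trials[keep]
```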


Results

Repeated measures analysis of variance (ANOVA) showed that the order of presentation of Task had no significant effect on time or errors, indicating that a within-subjects design was appropriate.

Learning and/or Fatigue Effects

A 3 × 3 (Block × Task) within-subjects ANOVA found a significant interaction between Block and Task on selection time (F2,30 = 4.352, p = .022, Greenhouse-Geisser adjustment). A post hoc analysis¹⁹ found one significant difference, between Blocks 2 and 3 for Tapping (p = .001). The mean selection time was 22 ms slower in Block 3 compared to Block 2, with mean times of 663 ms (SE 18.0) and 641 ms (SE 18.3) respectively. Another 3 × 3 (Block × Task) within-subjects ANOVA also found a significant interaction between Block and Task on error rate (F4,60 = 2.570, p = .047). A post hoc analysis found significant differences between Block 3 and Blocks 1 and 2 for Tapping (p < .04). The mean error rate for Block 3 was .02 higher than Block 1 and .023 higher than Block 2, with mean rates of .043 (SE 0.8), .040 (SE 0.9), and .063 (SE 1.1) for Blocks 1, 2, and 3 respectively. These results suggest no learning effects are present, but that participants experienced fatigue by Block 3 during the Tapping task. Although the effect is relatively small, we decided to remove this block from subsequent analysis.

Error and Time by Visibility

To determine if the cross-hair visibility augmentation was successful in mitigating occlusion, we examine its overall effect in terms of error rate and selection time. We aggregated the data by Task and Visibility and performed two 3 × 3 (Task × Visibility) within-subjects ANOVAs using error rate and selection time as dependent variables.

For error rate, most values were reasonably low, between 3% and 6%. We did not find a main effect for Visibility, but did find a significant Task × Visibility interaction (F4,68 = 11.776, p < .001). A post hoc multiple means comparison found that for Tracing, Hidden had a higher error rate than Visible; for Dragging, Hidden had a slightly lower error rate than Visible; and for Tapping, Hidden had a slightly lower error rate than Crosshair (Figure 4-23). There were no Task differences between Crosshair and Visible.

19 All post-hoc analyses use the conservative Bonferroni adjustment.

Figure 4-23. Error rate by Task and Visibility. Error bars are 95% confidence interval.

For selection time, we found a main effect for Visibility (F1.5,25.1 = 528.359, p < .001, Greenhouse-Geisser adjustment) and a significant Task × Visibility interaction (F4,68 = 28.060, p < .001). A post hoc multiple means comparison found all Visibility conditions significantly different overall, with Crosshair 14 ms slower than Visible, and Hidden 227 ms slower than Crosshair. Within each task, a post hoc multiple means comparison found a similar pattern (Figure 4-24): Hidden slower than Crosshair and Visible (at least 160 ms, 209 ms, and 310 ms slower for Tapping, Dragging, and Tracing respectively). Crosshair and Visible were only significantly different in the Dragging Task, with Crosshair 30 ms slower in that case. Note that although it is tempting to compare performance between Tasks in Figure 4-24, this does not consider the relative task index of difficulty (ID). Later, we compare Tasks by modelling them with Fitts' law based on ID.


Figure 4-24. Selection time by Task and Visibility. Error bars are 95% confidence interval.

Our analysis of selection times and error rates for the Crosshair condition for each Task by Angle, and by Angle × Distance, did not reveal any obvious differences from the Visible condition. This, together with the results discussed in the preceding paragraphs, suggests that the Crosshair visibility condition did not provide any significant benefit beyond the Visible condition. Therefore, to simplify reporting, we removed Crosshair from subsequent analysis, and focus on differences between Visible and Hidden.

Angle × Distance Interactions for Selection Time and Error Rate

The data was aggregated for each Task and Visibility by Direction and Distance across blocks for subsequent analysis. We analyze each Task and Visibility combination individually with six separate 8 × 4 (Direction × Distance) within-subjects ANOVAs. The Direction × Distance interaction is the most useful for interpretation. For selection time, the Direction × Distance interaction was significant for all Task and Visibility combinations (all F21,357 > 3.3, p < .001). Post hoc multiple means comparisons found many differences. Figure 4-25 illustrates Directions which are significantly slower than three or more other Directions within the same Distance: this highlights more pronounced trends. For the most part, more significant differences were found as Task becomes more complex (Tapping to Dragging to Tracing) and when the end target is Hidden. With the exception of Tapping, Hidden (Figure 4-25b), the shortest Distance has the fewest significantly slower Directions. Most interesting is that overall, the most common Directions within a Distance to be significantly slower are NW, E, and SE.
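These within-subjects ANOVAs can be reproduced with standard statistics packages. A minimal sketch using statsmodels' AnovaRM on the per-participant aggregates; the column names are hypothetical, and the Greenhouse-Geisser adjustments reported in this chapter would require an additional sphericity-correction step:

```python
from statsmodels.stats.anova import AnovaRM

# df: one row per participant x Direction x Distance cell (aggregated over blocks),
# with columns 'participant', 'direction', 'distance', 'time' (hypothetical names).
def direction_by_distance_anova(df):
    """8 x 4 (Direction x Distance) repeated-measures ANOVA on selection time."""
    fit = AnovaRM(df, depvar='time', subject='participant',
                  within=['direction', 'distance']).fit()
    return fit.anova_table   # F value, numerator/denominator df, and p per effect
```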


Figure 4-25. Mean Selection Time by Distance, Direction, Task, and Visibility. Panels: (a) Tap, Visible; (b) Tap, Hidden; (c) Drag, Visible; (d) Drag, Hidden; (e) Trace, Visible; (f) Trace, Hidden; Distance bands of 20.4, 61.2, 102.0, and 142.9 mm. Longitudinal axes in seconds, latitudinal axes compass direction. Direction data points which are significantly greater (p < .05) than three or more Directions in the same Distance band are circled. Note that the most common Directions within a Distance to be significantly slower are NW, E, and SE.


For error rate, the Direction × Distance interaction was only significant for the Tracing and Hidden combination (F21,357 = 1.794, p = .018). A post hoc multiple means comparison found that the E Direction had a higher error rate than NW, N, and W at the shortest Distance, and a higher error rate than S at the longest Distance (Figure 4-26).

Figure 4-26. Error rate by Distance, Direction for Tracing and Hidden Visibility. Longitudinal axes is error rate, latitudinal axes compass direction. Direction data points which are significantly greater (p < .05) than one or more Directions in the same Distance band are circled.

To explore error in the Tracing task in more detail, we examine the type of error which occurred. During the experiment, we logged three types of errors for the Tracing task: a premature pen lift before the entire line was traced (Up error); when the pen went beyond the width of the tracing boundary (Tolerance error); and when the pen overshot the end of the line (Overshoot error). Note that only a single error type can occur with the Tapping and Dragging tasks. We analyze each of the Tracing task error types with each Visibility individually using six separate 8 × 4 (Direction × Distance) within-subjects ANOVAs. The only significant Direction × Distance interaction was with the Hidden Visibility and the Overshoot error type (F5.5,91.1 = 3.117, p = .01, Greenhouse-Geisser adjustment). A post hoc multiple means comparison did not find any differences, most likely due to the conservative Bonferroni adjustment and the large number of comparisons. However, a plot may suggest some trend towards more Overshoot errors in the E Direction (Figure 4-27).


Figure 4-27. Overshoot errors with Tracing Task and Hidden Visibility. Longitudinal axes is mean number of overshoot errors, latitudinal axes compass direction. Only the shortest (20.4 mm) and longest (142.9 mm) Distances are shown, since only they had significant differences with overall error rate.

Other Performance Measures

To examine movement characteristics during each trial, we use two accuracy measures proposed by I. S. MacKenzie, Kauppinen, and Silfverberg (2001) and a third measure we devised which is more specific to pen input. Movement Direction Change (MDC) captures the rate at which the movement path changes direction (Figure 4-28b) relative to the ideal task path (Figure 4-28a). To normalize across distances, we report this statistic as the number of changes per cm. Movement Error (ME) captures how far the movement path deviates from the ideal task path between start and end target (Figure 4-28c):

ME = (1/n) Σ |y_i|     (4-1)

where y_i is the distance from each movement sample point to an ideal left-to-right task trajectory along y = 0, and n is the number of movement samples. For calculation purposes, we transform all movement samples and target locations to this simplified left-to-right task trajectory reference frame. We report ME in mm. Out-of-range (OOR) captures the rate at which the pen is lifted out of the digitizer's tracking range (Figure 4-28d). This statistic is only relevant to the Tapping task, since a pen lift in Dragging or Tracing would cause an error. We report this statistic as the number of out-of-range events per cm.
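As an illustration of how these measures can be computed from logged pen samples (this is our sketch, not the thesis software), the samples are first transformed into the left-to-right task reference frame described above, after which ME and MDC follow directly:

```python
import math

def to_task_frame(samples, start, end):
    """Rotate/translate (x, y) samples so the task axis runs from (0, 0) to (d, 0)."""
    dx, dy = end[0] - start[0], end[1] - start[1]
    theta = math.atan2(dy, dx)
    cos_t, sin_t = math.cos(-theta), math.sin(-theta)
    out = []
    for x, y in samples:
        tx, ty = x - start[0], y - start[1]
        out.append((tx * cos_t - ty * sin_t, tx * sin_t + ty * cos_t))
    return out

def movement_error(task_samples):
    """ME (Equation 4-1): mean absolute deviation from the task axis y = 0."""
    return sum(abs(y) for _, y in task_samples) / len(task_samples)

def direction_changes_per_cm(task_samples, path_length_cm):
    """MDC: sign changes of velocity perpendicular to the task axis, per cm of path."""
    changes, prev = 0, 0.0
    for (_, y0), (_, y1) in zip(task_samples, task_samples[1:]):
        dy = y1 - y0
        if dy != 0 and prev != 0 and (dy > 0) != (prev > 0):
            changes += 1
        if dy != 0:
            prev = dy
    return changes / path_length_cm
```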

Figure 4-28. Illustration of other performance measures: (a) idealized movement trajectory; (b) Movement Direction Change (MDC); (c) Movement Error (ME); (d) Out-of-range (OOR). Illustration based on Figures 1, 3, and 4 in I. S. MacKenzie et al. (2001).

We chose these measures to capture qualities of movement which are most likely caused by occlusion. MDC and ME should capture movement deviation strategies used to visibly acquire an occluded end target during the Tapping and Dragging tasks. MDC should also capture occlusion-related hesitation or contortion symptoms during the Tracing task. As before, the Direction × Distance interaction is perhaps most useful for interpretation. For this analysis, as with the selection time analysis, the data was aggregated for each Task and Visibility by Direction and Distance across blocks. We analyze each Task and Visibility combination individually with six separate 8 × 4 (Direction × Distance) within-subjects ANOVAs. To characterize overall differences between Tasks, we aggregated each Task and Visibility combination across all Directions, Distances, and blocks, and performed a single 3 × 2 (Task × Visibility) within-subjects ANOVA. Although this level of analysis is specific to our experimental design, it does indicate a possible trend of a Task and Visibility effect with these performance measures.

Movement Direction Change (MDC)

Overall, there was a significant Task × Visibility interaction on MDC (F2,34 = 52.315, p < .001). Post hoc multiple means comparisons found differences between all Tasks in the Visible condition and between Tracing and the other two Tasks in the Hidden condition (Figure 4-29). The slower, constrained movement while Tracing may introduce more direction changes as the user adjusts their posture to see beyond their hand.

Figure 4-29. Movement Direction Change (MDC) by Task and Visibility. Error bars are 95% confidence interval.

The Direction × Distance interaction with MDC was significant for all three Tasks with the Hidden Visibility and for the Tracing Task with Visible Visibility (all F21,357 > 1.8, p < .02). Post hoc multiple means comparisons found several differences, but only in the Tracing Task were there Directions which are significantly greater than three or more other Directions within the same Distance (Figure 4-30). These cases appear exclusively in the E Direction.


Figure 4-30. Movement Direction Change (MDC) by Distance, Direction, Task, and Visibility. Panels: (a) Tap, Hidden; (b) Drag, Hidden; (c) Trace, Visible; (d) Trace, Hidden; Distance bands of 20.4, 61.2, 102.0, and 142.9 mm. Longitudinal axes in number per cm, latitudinal axes compass direction. Direction data points which are significantly greater (p < .05) than three or more Directions in the same Distance band are circled. Note that Tapping and Dragging with Visible Visibility are not shown since no statistical differences were observed.

Movement Error (ME)

Overall, there was a significant Task × Visibility interaction on ME (F2,34 = 34.111, p < .001). Post hoc multiple means comparisons found differences between all Tasks in the Visible condition and between Tracing and the other two Tasks in the Hidden condition (Figure 4-31). In both Visibility conditions, Tracing has lower ME which is expected due to the constrained nature of the task – in fact, Tapping and Dragging had ME four times greater in the Hidden condition. Dragging has slightly lower ME than Tapping in the Visible condition.


Figure 4-31. Movement Error (ME) by Task and Visibility. Error bars are 95% confidence interval.

For ME, the Direction × Distance interaction was significant for all Task and Visibility combinations except the Tracing Task with Hidden Visibility (all F21,357 > 2.008, p < .001). Post hoc multiple means comparisons found several differences, but only a few Directions which are significantly greater than three or more other Directions within the same Distance (Figure 4-32). With the Hidden Visibility there are pronounced increases at certain Directions and certain Distances for Tapping and Dragging, concentrated around the E Direction. With Dragging these differences occur at the 102.0 mm and 142.9 mm Distances, but with Tapping, they occur at the 61.2 mm and 102.0 mm Distances. With the Visible Visibility, there are fewer pronounced increases, with almost all occurring with Dragging: at the SE Direction with the 61.2 mm and 102.0 mm Distances. Unlike the Tracing Task, the intermediate X-Y pen positions (and therefore ME) with Tapping and Dragging are unconstrained – ME will of course be much smaller and more regular for the constrained Tracing task.


Figure 4-32. Movement Error (ME) by Distance, Direction, Task, and Visibility. Panels: (a) Tap, Visible; (b) Tap, Hidden; (c) Drag, Visible; (d) Drag, Hidden; (e) Trace, Visible; (f) Trace, Hidden; Distance bands of 20.4, 61.2, 102.0, and 142.9 mm. Longitudinal axes in mm, latitudinal axes compass direction. Direction data points which are significantly greater (p < .05) than three or more Directions in the same Distance band are circled. *The E Direction for the 102.0 mm Distance of the Dragging Task with Hidden Visibility is significantly different than all other Directions at that Distance.


Out-of-Range (OOR)

For OOR, we only compare Visibility within the Tapping Task using a one-way within-subjects ANOVA. An overall effect of Visibility was found (F1,17 = 14.884, p = .001), with higher OOR (more pen lifts) in the Hidden condition (see Figure 4-33 for values).

Figure 4-33. Out-of-Range (OOR) by Task and Visibility. Error bars are 95% confidence interval. Note that because Dragging and Tracing require that the pen contact the display throughout, the OOR is 0.

The Direction × Distance interaction was significant for the Tapping Task with both Visibility conditions (all F21,357 > 2.297, p < .001). Post hoc multiple means comparisons found that the E Direction, at three Distances, was significantly greater than three or more other Directions within the same Distance (Figure 4-34).


Figure 4-34. Out-of-Range (OOR) by Distance, Direction, and Visibility for the Tapping Task. Panels: (a) Tap, Visible; (b) Tap, Hidden. Longitudinal axes in number per cm, latitudinal axes compass direction. Direction data points which are significantly greater (p < .05) than three or more Directions in the same Distance band are circled.

Discussion

Recall that our experiment had three main goals: investigate the effect of end target direction and distance on performance; determine if the cross-hair visual augmentation mitigates the effect of occlusion; and verify the relative performance of tapping, dragging, and tracing with a direct input pen device. We now discuss how our findings relate to these goals.

The Effect of Target Direction and Distance and Occlusion

Overall, across dependent variables, there is a pattern where most significant differences are in the E and SE Directions. A comparison of our Tapping and Hidden Visibility selection results with Hancock and Booth (2004) reveals the same pattern, with most frequent significant differences at the E Direction.


Figure 4-35. Comparison with Hancock and Booth's results. (a) Hancock et al.'s results for the right-handed, direct horizontal condition (35.0 mm distance); (b) our results for similar distances (20.4 and 61.2 mm) and our equivalent Tapping and Hidden Visibility condition; (c) a combination of both graphs for comparison. Hancock et al. use the same 6.1 mm target. Direction data points which are significantly greater (p < .05) than three or more Directions in the same Distance band are circled. (Data from Hancock & Booth, 2004.)

These common significant differences are typically more frequent with target Distances greater than 20.4 mm, and with Hidden Visibility. Based on selection time alone, Tracing has more numerous significant differences in the E and SE directions across Distances compared to Dragging or Tapping. Recalling our results in Experiment 4-1 for the mean occlusion shape (Figure 4-16), these differences seem to coincide with the occluded area. To explore this observation more specifically, we created a mean occlusion shape using the participants in Experiment 4-2. We processed occlusion silhouettes captured at the moment participants successfully selected the start target²⁰. Unlike Experiment 4-1, we did not have a strict control for this position. Instead, we selected all starting positions that were within 31 mm of the left side of the display, and between 82 mm and 123 mm from the top of the display. These bounds were chosen such that the majority of the hand and forearm would be inside the display area, and are similar to the positions sampled in Experiment 4-1. There were about 20 samples from each participant, roughly spread across Task and Visibility conditions.

20 We had to exclude participants 5 and 6 due to lighting problems making silhouette isolation difficult.


We used the same image processing steps for frame extraction, rectification, and isolation described above in Experiment 4-1. The results match our expectation that the E Direction is most often occluded, with some occlusion evident in the SE and NE Directions as well (Figure 4-36). Focusing on the area with greater than 50% concentration (more than half of the silhouettes cover this area), we see that the 61.2 and 102.0 mm Distances are centrally located within this often occluded area in the E Direction. It should follow that targets located at these positions have the greatest chance of being occluded. Referring back to our results for selection time (Figure 4-25), we see that the 102.0 mm Distance has more than 3 significant differences in the E Direction for all Task and Visibility combinations except Tracing with Visible Visibility, and the 61.2 mm Distance has more than 3 significant differences in the E Direction for all Task and Visibility combinations except Tapping and Dragging with Visible Visibility. Moreover, for Movement Error (ME), Tapping and Dragging have pronounced significant spikes in the E Direction for the 102.0 mm Distance.

Figure 4-36. Comparison of target position and mean occlusion silhouette. (a) target positions with mean occlusion silhouette superimposed and visually augmented to show silhouette areas with greater than 50% concentration; (b) mean occlusion silhouette. Note the similarity of this mean occlusion shape with the ones reported in Experiment 4-1 (Figure 4-16).


To further visualize a possible effect of occlusion and to illustrate these differences in movement time and ME, we generated motion paths for Dragging and Tapping for the 61.2 mm (Figure 4-37) and 102.0 mm (Figure 4-38) Distances. Note the greater variety of paths used when the target was Hidden at the E Direction for Tapping and Dragging, especially at the 102.0 mm Distance. This matches our observations regarding inefficient dragging movements into an occluded area in the observation study of chapter 3.


Figure 4-37. Motion paths by Direction for the 61.2 mm Distance. Panels: (a) Tapping, Visible; (b) Tapping, Hidden; (c) Dragging, Visible; (d) Dragging, Hidden. All error-free trials shown across all participants. Directions which have significantly greater selection time, ME, MDC, or OOR than three or more other Directions are circled.


Figure 4-38. Motion paths by Direction for the 102.0 mm Distance. Panels: (a) Tapping, Visible; (b) Tapping, Hidden; (c) Dragging, Visible; (d) Dragging, Hidden. All error-free trials shown across all participants. Directions which have significantly greater selection time, ME, MDC, or OOR than three or more other Directions are circled.

Based on the comparison with the mean occlusion silhouette, the general pattern of motion paths for the E Direction, and the quantitative results, there does appear to be evidence that when the end target is occluded, or when an occluded portion of the display separates the current location from the end target, performance will suffer due to occlusion.


We do not illustrate Tracing motion paths since the constrained nature of the task makes them visually identical. Indeed, Tracing has much lower ME overall (Figure 4-31). However, for MDC, all three significant differences are in the E Direction, and the curves across Distances bulge somewhat in the E Direction (most pronounced in Hidden Visibility) (Figure 4-30). In addition, only Tracing with Hidden Visibility had a significant Distance by Direction interaction on error rate, with significantly greater error rates in the E direction only (Figure 4-26). Although not significant, there appears to be a possible trend towards more Overshoot errors in the E Direction (Figure 4-27). After reviewing the video logs for a sampling of Tracing trials, we believe this is due to hesitation and posture changes to view the occluded area mid task. These often cause the pen position to momentarily pause and backtrack slightly before continuing in the intended direction. Another symptom of this behaviour is the significant interaction of error rate and Direction for Tracing in the NE Direction.

The astute reader will notice that the NW and W Directions also have several significant differences in selection time and ME. Hancock and Booth (2004) also report a similar pattern. These cannot be due to occlusion since targets at these locations are far away from the occluded area (based on the mean silhouette in Figure 4-36). Moreover, the movement paths (Figure 4-37 and Figure 4-38) do not reveal any obvious visual differences in these directions. For selection time, there are actually more significant differences in the NW and W with Visible Tapping and Dragging at the 142.9 mm Distance, suggesting that this non-occlusion effect is related to target distance.

Cross-Hair Augmentation

The cross-hair visual augmentation had no positive effect on selection time or error, behaving virtually the same as the standard visible target condition. There are two possible reasons for this: occlusion has little or no effect when the goal target is already visible on the display, or the visualization failed to counteract the effect of occlusion. We believe it is, in fact, a combination of these reasons. Compared to Hidden, there were fewer significant differences in the Visible condition in Directions most likely to be occluded (E, SE and NW). This suggests that occlusion may be a small factor in performance when the end target is Visible. However, the motion paths in the Visible conditions still show more variance in the E, SE and NW Directions, indicating that although it may be small, an effect of occlusion is present. If the effect of occlusion is small in this case, any further compensation for occlusion can only make a small, and perhaps undetectable, improvement.

The simple cross-hair visualization may not have provided enough assistance to overcome an effect of occlusion. To use it effectively, the user needs to resolve the visual intersection of the horizontal and vertical components quickly and move accordingly. The mental overhead to do this may counteract any possible benefit. We originally designed a halo visualization, which has been shown to be effective for locating off-screen targets (Baudisch & Rosenholtz, 2003). To make this effective, the halo had to be larger than the largest hand, and pilot participants found this to be distracting. In fact, even with the minimal 1 pixel grey cross-hair visualization, several participants said that it was distracting.

Relative Performance of Tracing, Dragging, and Tapping

To compare the relative performance of Tapping, Dragging, and Tracing, it is most useful to perform a Fitts’ law linear regression analysis and compare the resulting model parameters. Since we use a single target size and remove error trials, we use target width rather than computing effective width (I. S. MacKenzie, 1992) when calculating ID. This simplifies analysis and interpretation and makes a comparison with the tracing task more reasonable since it has no effective width equivalent. We performed the regression on aggregated data points for each Task and Visibility by Direction and Distance across blocks. Plots of mean selection time by ID visually suggest good fits to Fitts’ law (Figure 4-39), but the computed r2 values (Table 4-1) are lower than those previously published. The low r2 values are a symptom of using many data points across participants in the regression, rather than single overall mean times by ID as is often done. Recall also that our experiment design only includes a single target width, and relatively small range of task IDs (at least for Tapping and Dragging), so these results are useful for comparing within our range of IDs, but are not definitive models. Our intent is not to perform a rigorous verification of Fitts’ law, but to use Fitts’ law to characterize the relative differences between Tasks under different Visibility conditions.
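With a single 6.1 mm target width, ID reduces to a simple function of distance for the Tapping and Dragging tasks. The following numpy sketch (ours, assuming the Shannon formulation ID = log2(D/W + 1)) shows the ID computation and the per-model regression used to obtain the coefficients and r² values in Table 4-1:

```python
import math
import numpy as np

W = 6.1                                   # single target width in mm
DISTANCES = [20.4, 61.2, 102.0, 142.9]    # mm
ids = [math.log2(d / W + 1) for d in DISTANCES]   # Shannon formulation of ID

def fit_fitts(ids_per_point, times_ms):
    """Least-squares fit of MT = a + b * ID; returns (intercept a, slope b, r_squared)."""
    x, y = np.asarray(ids_per_point, float), np.asarray(times_ms, float)
    b, a = np.polyfit(x, y, 1)
    pred = a + b * x
    ss_res = np.sum((y - pred) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return a, b, 1 - ss_res / ss_tot
```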


Figure 4-39. Relationship of Time to Index of Difficulty (ID). (a) Visible Visibility; (b) Hidden Visibility.

Visibility   Task       Model (ms)       r²
Visible      Tapping    152 + 122 ID     .53
Visible      Dragging    42 + 205 ID     .71
Visible      Tracing    228 +  73 ID     .79
Hidden       Tapping    501 +  70 ID     .24
Hidden       Dragging   467 + 154 ID     .36
Hidden       Tracing    568 +  72 ID     .73

Table 4-1. Linear regression values for Time (ms) from Index of Difficulty (ID).

The relative ordering of Task in each Visibility condition is somewhat surprising. Dragging is slower than Tapping and Tracing for equivalent IDs in both Visible and Hidden Visibilities. Tapping is slower than Tracing in the Visible condition, but faster in the Hidden condition. This may be partially explained by how the Tracing Task is rendered on the display – the user needs to simply follow the path from start to finish, whereas Dragging and Tapping provide no intermediate visual guidance. However, in the Hidden condition, since Tapping is a discrete task, the user can lift their hand to survey the display which may provide a greater benefit than following the Tracing path (Forlines & Balakrishnan, 2008).


A comparison of the time intercept values (the constant value in each model, Table 4-1) shows that the Hidden has a large constant time overhead of about 0.5 s, regardless of task. This is most likely due to an initial visual search time to find the initially hidden target. Both Dragging and Tapping have lower slope values with Hidden Visibility (the coefficient of ID in each model, Table 4-1), showing that the increase of selection time with increasing ID is more gradual. Over our range of IDs, Dragging and Tapping in the Hidden condition are slower – but according to our regression models, Visible Tapping would become slower than Hidden Tapping at an ID of 6.75 and Visible Dragging would become slower than Hidden Dragging at an ID of 8.3. This seems very unlikely to occur, and may be symptomatic of performing a regression on a narrow range of IDs.
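The crossover IDs quoted above follow from equating the Visible and Hidden regression lines for a task. A small sketch using the rounded coefficients from Table 4-1 (so the computed value differs slightly from the quoted 6.75 because of rounding):

```python
def crossover_id(a1, b1, a2, b2):
    """ID at which model 1 (a1 + b1*ID) and model 2 (a2 + b2*ID) predict equal time."""
    return (a2 - a1) / (b1 - b2)

# Visible vs. Hidden Tapping, coefficients in ms from Table 4-1:
print(crossover_id(152, 122, 501, 70))    # ~6.7
# Visible vs. Hidden Dragging:
print(crossover_id(42, 205, 467, 154))    # ~8.3
```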

4.4 Experiment 4-3: Influence on Hand and Arm Posture

In Experiment 4-1, we examined the shape and area of occlusion resulting from a typical, neutral posture. While these may account for the majority of interactions, users may occasionally contort their hand and wrist to alter the naturally occluded area. For example, Inkpen et al. (2006) observed left-handed users arching their hand when using a right-hand scrollbar, and we observed users adopting a high arching “hook” posture when adjusting image parameters using nearby slider widgets during the observational study reported in chapter 3. These two examples of posture contortion occur during simultaneous monitoring tasks: the user adjusts a parameter while at the same time monitoring feedback which occurs in a naturally occluded area. The goal of this qualitative experiment is to observe how participants adjust their posture to counteract occlusion while performing a simultaneous monitoring task.

Participants and Apparatus

20 people (12 female, 8 male) with a mean age of 26 (SD 9) participated. All were pre-screened for color blindness. As in Experiments 4-1 and 4-2, participants had little or no experience with direct pen input, but this is acceptable since we are observing a lower-level physical behaviour. We measured and recorded the same anatomical dimensions used in Experiment 4-1 (Figure 4-6). The apparatus is identical to Experiments 4-1 and 4-2: a Wacom Cintiq 12UX direct input pen tablet for input and display, fiducial markers attached around the tablet bezel, and a head-mounted video camera to capture the participant's point-of-view (Figure 4-9).

Task

The high level task is to adjust a numeric value using a slider so that it matches a target value. To begin the task, the participant tapped a 13.0 × 26.3 mm (64 × 128 px) home target positioned the same as in Experiment 4-1: 52.0 mm (255 px) from the bottom of the display at the extreme right side (Figure 4-41b). The slider is located at the centre left side of the display, oriented horizontally and 41.0 × 4.9 mm (200 × 24 px) in size with a 4.0 mm (20 px) wide drag-able thumb (Figure 4-40a). The participant's objective is to drag the slider thumb to adjust the numeric value so that it matches a single target value displayed beside the current slider value in a feedback box (Figure 4-40b), both set in a 40 pt font. After the target value is matched, the feedback box turns red with an equality sign (Figure 4-40d), and an 8.1 × 4.9 mm (40 × 24 px) red continue button appears beside the slider (Figure 4-40d). The participant taps this button to continue to the next task.

Figure 4-40. Simultaneous monitoring task. (a) Slider with continue button and drag-able thumb; (b) feedback box with current value and target value; (c) in-progress state where the participant drags the red slider thumb to change the current value of -4 to match the target value of 10; (d) satisfied state after the target value of 10 has been reached and held constant for 250 ms: the next task button is now red and the thumb grey.


Video 4-3. Simultaneous monitoring demonstration (Vogel_Daniel_J_201006_PhD_video_4_3.mp4, 00:27).

We took steps to ensure that participants actually monitored the displayed value while simultaneously manipulating the slider. First, the slider’s numeric scale, direction, target, and start position were randomized before each trial to prevent participants from using the slider thumb position to visually locate the target value. Second, the slider value had to rest at the target value for more than 250 ms before correct match feedback appeared. This prevented participants from watching for a red flicker on the continue button to locate the target value. Finally, target values were selected so they were never the minimum or maximum slider value. This prevented locating the target value by simply pushing the thumb all the way to the left or right. Note that during the experiment we did not observe any participants “cheating” – all performed the simultaneous monitoring task as intended. The specific steps to select values for the task were:

1. A left value for the slider was randomly selected in the range [-100, 100].

2. A right value for the slider was randomly set to be one of left + 50 or left – 50, creating a random direction.

3. A random start value was selected in the range [left, right].

4. A random target value was selected in the range [left, right], such that | target – start | > 4, | target – left | > 4, and | target – right | > 4.
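A minimal sketch of these four steps in code (our illustration; the constant 4 is the minimum separation stated in step 4):

```python
import random

def make_slider_trial():
    """Steps 1-4: pick left/right scale endpoints, a start value, and a target value."""
    left = random.randint(-100, 100)             # step 1
    right = left + random.choice([50, -50])      # step 2: random direction
    lo, hi = min(left, right), max(left, right)
    start = random.randint(lo, hi)               # step 3
    while True:                                  # step 4: keep target away from start/ends
        target = random.randint(lo, hi)
        if (abs(target - start) > 4 and
                abs(target - left) > 4 and abs(target - right) > 4):
            return left, right, start, target
```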


Figure 4-41. Simultaneous monitoring task positioning. (a) the target value is shown in a feedback box which is placed at one of 8 different positions (from -90° to 120°) on a 31 mm radius arc around the slider; (b) the positioning of the slider, feedback box, and start target relative to the display.

Design

The target value display was placed at 8 different Positions (Figure 4-41a) on a 31.0 mm (150 px) radius arc from -90° to 120°. These positions were chosen from pilot experiments and cover a wide enough range to include what we expect to be normally occluded and not occluded regions. Three blocks were presented; each block presented all positions in random order. In summary, the experimental design was:

8 target value Positions
× 3 Blocks
= 24 observations per participant

Data Preparation

We removed 7 outlier trials (1.5% of 480 total trials) which had a task time more than

3 SD away from the cross-participant mean for corresponding Positions.


Results

There are two primary dependent variables:

Errors: Since a participant could encounter multiple errors during a single trial, our error measure is the mean number of error occurrences per trial.

Completion Time: This is the total time from successful selection of the start target until selection of the continue button.

Note that completion time includes all trials regardless of whether errors are encountered. Unlike experiments measuring low level movements such as Fitts’ law target selection, our task is inherently more complex and the time to recover from errors is a natural part of task completion.

Learning and/or Fatigue Effects

A one-way within-subjects ANOVA found a significant main effect for Block on number of errors²¹ (F2,38 = 5.933, p = .006). A post hoc multiple means comparison found that Block 3 had .24 more errors than Block 2 (but not Block 1), with means of .47 (SE .06) for Block 1, .40 (SE .06) for Block 2, and .63 (SE .1) for Block 3. A one-way within-subjects ANOVA did not find a significant main effect for Block on completion time. Since completion time includes error recovery time and the mean number of errors for Block 3 was not significantly different than Block 1, we felt the fatigue effect was slight and decided not to remove any blocks in the subsequent analysis.

Errors

We aggregated errors by Position across blocks to perform a one-way within-subjects ANOVA. There was no significant main effect for Position (using Greenhouse-Geisser adjustment).

21 All post-hoc analyses use the conservative Bonferroni adjustment.


Completion Time

Recall that we used random start and end values for each trial. This was done to prevent our participants from completing the task using the visual slider thumb distance to “cheat” – we wanted them to rely on the current value shown in the feedback box. However, after running the experiment, we realized that this may introduce a confounding effect on completion time. According to Fitts' law, changing the distance between start and end thumb positions while the target width is held constant will produce different times. Our random start and end values could result in Fitts' law IDs between 2.3 and 5.4 (given the small 4 px target). However, because the end target position is essentially hidden throughout, the movement cannot be pre-planned as in a classic Fitts' law task. Yet, the Pearson product-moment correlation (Pearson's r) between trial ID and completion time was .301 and found to be significant in a two-tailed test (p < .001). However, this is a relatively low correlation, and since the target position is intentionally hidden, the confounding effect is relatively weak. Thus, we continue with our analysis of completion time, but with some caution. In chapter 6, we report our results for a refined version of this experiment using a constant distance between start and end values, and confirm the pattern of these results.

We aggregated completion time by Position across blocks to perform a one-way within-subjects ANOVA. There was a significant main effect for Position (F7,133 = 5.640, p < .001). A post hoc analysis found Positions -30°, 0°, and 30° to be significantly slower than at least one other Position: -30° was 707 ms slower than 120°; 0° was 1110 ms slower than 90° and 1240 ms slower than 120°; and 30° was 482 ms slower than 120° (Figure 4-42).


Figure 4-42. Completion time by target box Position. Position data points which are significantly greater (p < .05) than one or more Positions are highlighted with a circle.

Pen Azimuth Angle

Since this task used slower, consistent pen movements which were centrally located on the device, the pen tilt data exhibited much less noise than in the previous two experiments. We were able to extract the azimuth angle of the pen at the moment the values were successfully matched. This data reveals that participants hold the pen such that the azimuth angle samples form two clusters, one above and one below 0° (Figure 4-43a). The mean azimuth angle reveals that participants move from below 0° to above 0° (Figure 4-43b), roughly opposite to the progression of target box Positions (Figure 4-41a).

Figure 4-43. Pen azimuth angle by target box Position. (a) all pen azimuth angles across participants; (b) mean pen azimuth angle in thick red, standard deviation shown on either side in magenta.


Occlusion Silhouettes

To explore the pattern of hand postures suggested by the mean pen azimuth angle, we created occlusion silhouettes for all 480 video frames captured at the moment a participant successfully matched the values. To compensate for different end positions of the slider thumb, we registered all silhouettes using the pen location. We used similar image processing steps for frame extraction, rectification, and isolation described above in Experiment 4-1. However, instead of using the blue channel to isolate the hand, we transformed each RGB image into YCbCr colour space and used the Cr channel (the red-difference chroma component). This had the advantage of capturing areas of the hand and forearm outside the display, but did not include the pen as part of the silhouette. Since we are not calculating quantitative occlusion statistics as in Experiment 4-1, this trade-off is acceptable – in fact, it is somewhat advantageous since the area left out by the pen can be seen in the silhouette. The mean silhouettes for each Position form two separate shapes, one above and one below (Figure 4-44a). By visually augmenting the mean silhouettes to show areas with greater than 70% concentration (Figure 4-44b), one can see the same “predominantly below 0°” to “predominantly above 0°” trend as seen with the mean pen azimuth angle (Figure 4-43).
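The isolation step can be sketched with OpenCV: convert the rectified frame to YCbCr, keep the Cr channel, and threshold it to obtain a binary silhouette, then average the silhouettes to get per-pixel concentration. This is an illustration of the approach described above, not the exact pipeline; the threshold value is made up.

```python
import cv2
import numpy as np

def silhouette_from_frame(rectified_bgr, cr_threshold=145):
    """Return a binary hand/forearm mask from a rectified point-of-view frame."""
    ycrcb = cv2.cvtColor(rectified_bgr, cv2.COLOR_BGR2YCrCb)
    cr = ycrcb[:, :, 1]                     # channel order is Y, Cr, Cb in OpenCV
    _, mask = cv2.threshold(cr, cr_threshold, 255, cv2.THRESH_BINARY)
    kernel = np.ones((5, 5), np.uint8)      # small clean-up to remove speckle noise
    return cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)

def mean_silhouette(masks):
    """Per-pixel concentration (0..1): fraction of silhouettes covering each pixel."""
    stack = np.stack([m.astype(np.float32) / 255.0 for m in masks])
    return stack.mean(axis=0)
```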

Figure 4-44. Mean occlusion silhouette by Position. (a) mean occlusion silhouettes; (b) mean occlusion silhouettes with visual augmentation showing areas with greater than 70% concentration.

Discussion

Our results show that the posture contortion used to compensate for occlusion does have an effect on performance. Although there was no effect for number of errors, task time was longer when the target box was at Positions -30°, 0°, and 30°, Positions which are likely to be occluded when compared with a mean occlusion silhouette (Figure 4-45).

Figure 4-45. Comparison of target box position and mean occlusion silhouette. The mean occlusion silhouette from the circle task in Experiment 4-1 (Figure 4-16) is superimposed and visually augmented to show silhouette areas with greater than 50% concentration. Based on this comparison, Positions -60°, -30°, 0°, and 30° are fully occluded; and Positions -90° and 60° are partially occluded.

Analysis of occlusion silhouettes and pen azimuth angles reveals a pattern of compensatory hand contortion to minimize occlusion. To explore why contortion affects performance times, we reviewed video segments for commonly occluded tasks. We observed two likely reasons: the deviation from a neutral posture reduced participants' ability to comfortably control the pen; and extra time was needed to plan their posture at the beginning of the task, or to adjust their posture during the task.

Contortion Strategies

The mean clusters of pen azimuth angles (Figure 4-43a) and occlusion silhouettes (Figure 4-44) suggest that participants are using different contortion strategies. To explore this further, we found four individual participants who illustrate different strategies, and created mean occlusion silhouettes for each (Figure 4-46). Since each participant repeated a target box Position three times, there are three silhouettes per Position.


Participants 8 and 18 used a mixed strategy, in which they arched their hand above for some target box positions and below for others (Figure 4-46a,b). These two participants exhibited different cross-over points, where they moved from a below to an above strategy: participant 8 crossed over between 0° and 30°, and participant 18 crossed over between -30° and 0°. Participants 20 and 3 used a uniform strategy where they adopted nearly the same posture for all positions (Figure 4-46c,d). Participant 20 kept their hand arched above the target box nearly all the time, while participant 3 kept their hand below. With all of these participants, there are examples in which they use a different strategy at different times for the same position. The different permutations of strategy corroborate our argument that some extra time is needed for posture planning.


Figure 4-46. Different occlusion contortion strategies. (a) participant 8 using a mixed strategy with cross-over between 0° and 30°; (b) participant 18 using a mixed strategy with cross-over between -30° and 0°; (c) participant 20 using a near-consistent “above” strategy; (d) participant 3 using a near-consistent “below” strategy.


4.5 Design Implications

The results of these experiments suggest basic implications for the design of direct pen input interfaces. Experiments 4-2 and 4-3 demonstrate that occlusion has an effect on performance, and Experiment 4-1 provides guidance regarding how to avoid the occluded area. These can be summarized as the following design implications for right-handed users:

1. Avoid displaying status messages or document previews in the right third and bottom four-fifths of the display, since it may be often occluded by the hand (Figure 4-47b).

2. Avoid showing simultaneous visual feedback or related widgets in the occluded area relative to the pen (Figure 4-47a).

3. When designing for occlusion, be aware that real users have a wide range of pen grips and postures.

The third implication seems to contradict the first two. Universal rules which govern which pixels are likely occluded can be problematic in light of individual differences. We could make the above rules more conservative to cover a wide range of individual grip styles, but this will reduce the space of available pixels to use. Additional factors such as loose clothing would also affect the area being occluded. Note that our initial left-handed results reported above suggest that implications 1 and 2 can be mirrored about the vertical axis when designing for left-handed users.
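As a rough illustration only, the first guideline can be phrased as a coarse programmatic check; the thresholds come directly from the guideline above, the intersection reading of “right third and bottom four-fifths” is our interpretation, and the mirroring follows the left-handed note:

```python
def in_often_occluded_region(x, y, right_handed=True):
    """x, y are normalized display coordinates in [0, 1], origin at top-left.

    Coarse heuristic from design implication 1: treat the area that is both in
    the right third and in the bottom four-fifths of the display as frequently
    occluded; mirror horizontally for left-handed users.
    """
    if not right_handed:
        x = 1.0 - x
    return x >= 2.0 / 3.0 and y >= 0.2
```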


Figure 4-47. Design guidelines for avoiding occluded areas. (a) occluded areas near the pen: mean occlusion silhouette thresholded to 50% concentration (based on the circle task in Figure 4-16); (b) occluded areas on the display: areas most likely to be occluded given a uniform distribution of pen positions (based on the circle task in Figure 4-17). For left-handed users, reflect these guidelines across the vertical.

4.6 Summary

The experiments described above provide relevant findings regarding characteristics of occlusion, but they also demonstrate novel ways to perform experiment logging and analysis. We extend the utility of recording video taken from the participant's point-of-view from simply acting as an observational tool in chapter 2, to a useful quantitative measurement tool. By combining simple computer vision marker tracking and image processing steps, we demonstrate how to rectify and isolate images of the hand and arm relative to the display to produce occlusion silhouettes. When the video is synchronized with our traditional experiment event log, we can perform these steps for frames captured at critical moments during experiment tasks. We show how the occlusion silhouettes can be used for quantitative analysis and visualization in Experiment 4-1, to illustrate an interaction between the hand and arm and target positions in Experiment 4-2, and to relate different posture strategies to task stimulus in Experiment 4-3. In the next chapter, we continue using occlusion silhouettes for analytical testing.

In Experiment 4-1, our results show that the shape of occlusion varies among participants largely due to grip style rather than simply anatomical measurements. Yet, individuals tend to adopt a consistent posture for long and short localized interactions. Moreover, the general shape of occluded pixels is higher relative to the pen than previously thought, and given a uniform distribution of pen positions, a large part of the display is often occluded. When a right-handed user is accessing pen positions near the upper-left of the display, as much as 47% of a 12 inch display is occluded.

In Experiment 4-2, based on an analysis of time, error, movement paths, and other performance metrics, we find evidence that occlusion has an effect on three fundamental GUI interactions. Our experiment confirms Hancock and Booth's (2004) findings for tapping when the end target is initially hidden at a single distance. However, we expand their study design significantly: we add two more fundamental GUI interactions, dragging and tracing; introduce a baseline condition where the end target is initially visible; and incorporate four target distances. Our findings provide evidence that when selecting occluded targets, participants typically perform more slowly. This effect is most pronounced when dragging, where the resulting movement paths and movement error metric show large deviations from an optimal path. Detailed analysis suggests that participants are more likely to raise their pen out-of-range when moving to tap on an occluded target, and more likely to encounter errors when tracing into an occluded area, possibly caused by less consistent pen movements as suggested by higher movement direction changes. Overall, we find that the effect of occlusion appears more subtle when the end target is not initially hidden. However, it is important to note that our attempt to explicitly control for occlusion using a visual augmentation did not appear to work, so our results (like Hancock and Booth's) likely include an interaction effect between target direction and characteristics of hand movement.

The third experiment, Experiment 4-3, found that the posture contortion used to minimize occlusion during a simultaneous monitoring task reduces performance.


Moreover, different participants use different posture contortion strategies. Some use a predominantly constant strategy with slight adjustments, such as keeping their hand primarily below or above the area to be monitored. Others use a mix of above and below postures, and choose to switch posture at different locations of the target to be monitored. One possible side-effect of this type of posture contortion could be discomfort and, if repeated many times over a long period, physical damage from repetitive strain. Recall that Haider, Luczak, and Rohmert (1982) found their participants had high amounts of muscle strain when using a light pen on a vertical display, so even conventional pen use may be tiring. Unlike Haider et al., we did not measure muscle strain directly, nor did we ask participants to self-report on this issue. Without this data, we cannot even speculate on possible short or long term physical effects.

Although we provide guidelines for avoiding the occluded area in Figure 4-47, in practice these may have two potential problems. First, we attempt to quantify these guidelines by providing an illustration of the mean occluded area with an overlaid measurement grid. This may be adequate as a coarse set of rules when designing static layouts or to guide widget behaviours, but it would be difficult to implement in software. Second, and most important, is that these general guidelines do not accommodate the wide range of pen grips and postures which we observed. In the next chapter, we describe a model of the occluded area which addresses these two problems by expressing the occluded area relative to the pen in a simplified geometric way suitable for real-time software applications, and provide a mechanism to adapt to specific user grips and postures.


5 Modelling Occlusion

Our observational study in chapter three found that occlusion likely contributed to error and fatigue, and Experiments 4-2 and 4-3, presented in the previous chapter, suggest that occlusion can hurt performance. If system designers had at their disposal a representation of the currently occluded area, could this be used to counteract these issues? In addition, if researchers had this information, could it aid experimental analysis?

The most accurate representation of the occluded area would be a literal image of the hand and arm as seen from the user's point of view (Figure 5-1a); we introduced this in the previous chapter as an occlusion silhouette. We already described a methodology for extracting and un-warping video frames captured from a head-mounted video camera to produce these silhouettes off-line. However, asking users to wear a head-mounted camera at all times is obtrusive to say the least, and relying on a fixed camera in the environment (such as Cotting & Gross, 2006) is not practical for mobile contexts. Capturing the occluded area un-obtrusively would require a multi-touch device capable of tracking objects above the surface (Echtler, Huber, & Klinker, 2008; Hilliges et al., 2009), but these devices are still being developed and they typically require a strong above-surface light source.

An alternative is to use a simplified representation to model the actual occluded area (Figure 5-1b). Although a model is only an approximation of the actual occluded area, it can be configured and positioned without capturing a literal occlusion silhouette, perhaps only requiring input such as pen position and pen tilt.



Figure 5-1. Approximating the actual occluded area with a model. (a) literal representation of the occluded area, an occlusion silhouette image taken from the point-of-view of a user and rectified; (b) an approximate model representation capturing the general shape of the occluded area.

Previous examples exist where design or implementation rules suggest underlying simple models. Some researchers and designers use a simple rule to avoid occlusion which implicitly suggests a bounding-rectangle model (Figure 5-2a), where all pixels below and to the right of the pen position are considered occluded. Brandl et al. (2009) and Hancock and Booth (2004) employ specialized rules, which can be thought of as using underlying model-like representations (Figure 5-2b,c). While these examples can all function with limited input, aside from handedness they do not consider the different shapes produced by different user grips. In addition, the latter two examples are designed specifically for menu placement, and thus only attempt to describe what is occluded in the immediate vicinity of the pen. A more accurate, more user-specific, and more complete description of the occluded area would be desirable for more general applications.

In this chapter, we describe a configurable model of occlusion which fulfills these goals. At its core is a five-parameter geometric model, composed of an offset circle and pivoting rectangle (Figure 5-1b), capturing the general shape of the occluded area. To adapt the model to different user grips and anatomical differences, we introduce a simple configuration process. To assess our model's fidelity, we conduct analytical tests using occlusion silhouettes gathered in Experiment 4-1 from the previous chapter. To evaluate the usability of our configuration process and test the performance of an actual model configuration, we conducted a short user study.


We found that all participants completed configuration successfully, but that further refinements could make the process easier and more self-explanatory. Finally, we briefly discuss future improvements to the model, including adaptations for large displays, multi-user contexts, and direct touch interaction.

5.1 Related Work

There has been no previous work explicitly presenting models for tracking occlusion. However, researchers have developed systems to capture literal representations for related purposes, and interface techniques have been designed and implemented using rules that suggest underlying model-like representations.

Literal Representations

Cotting and Gross (2006) demonstrate a tabletop which detects shadows of objects that occlude the beam of a ceiling-mounted projector. They do this using a video camera mounted near the projector to capture the shadows, and then distort the display contents to avoid those areas. Their goal is to address problems resulting from front-projector occlusion, but in theory, the system could be adapted to detect hand and arm occlusion from the user's point-of-view. This would require mounting the camera on or near the user's head. Of course, a head-mounted camera is intrusive, and mounting a camera near a user's point-of-view in a mobile Tablet PC context would be challenging.

An alternative is to use a multi-touch device (e.g., Han, 2005) to capture a view of the hand and arm as seen from below, and then warp the image to approximate the user's point-of-view. However, most multi-touch devices can only track the hand and arm when they are touching the surface. Researchers have presented ways to track objects above the surface as well (Echtler et al., 2008; Hilliges et al., 2009), but these typically require a strong above-surface light source which would be difficult to control in a mobile Tablet PC context.

Another consideration for literal representations is that capturing and warping can be processor intensive and the result is often noisy. Regardless, such a high level of fidelity may not be needed for most applications anyway.


Rules-of-Thumb and Model-Like Representations

Rather than capture a literal image, designers and researchers have used rules-of-thumb, which can be expressed as simple models, to guide their designs in avoiding occluded areas. The simplest model, which we call the bounding-rectangle model, is often used implicitly by researchers and designers (Figure 5-2a). In this model, all pixels below and to the right of the pen position are considered to be occluded. For example: the Twist Lens slider (Ramos & Balakrishnan, 2003) displays relevant information to the left and above the pen position, avoiding the area to the right due to occlusion; CrossY (Apitz & Guimbretière, 2004) uses a predominant right-to-left movement to avoid displaying subsequent targets in a sequence to the right of the pen position; and XLibris (Schilit et al., 1998) places a menu bar at the bottom of the display to avoid occlusion when navigating pages. Although simple to use and implement, this model considers too much below and to the right of the pen to be occluded, and misses any occlusion above and to the right.

Hancock and Booth (2004) use an experimentally validated rule-of-thumb which can be expressed as a four-quadrant model centred at the pen position (Figure 5-2b). Their rule considers movement time as well as occlusion, and recommends menu placement in the bottom-left quadrant (for a right-handed person). They monitor pen location and orientation over a period of time, and use a classifier to determine if the user is left- or right-handed. This enables the rule to be automatically flipped to accommodate left- and right-handed users.

Brandl et al. (2009) demonstrate an occlusion-aware pie menu which uses an underlying model (although the authors do not refer to it as a model) to label pie slices near the pen as typically occluded or not, given a reference hand orientation (Figure 5-2c). Based on where the hand and pen each contact the surface, the pie menu is rotated to minimize occlusion. Although their implementation uses a multi-touch table, pen tilt could provide similar hand to pen orientation information. However, like Hancock and Booth, Brandl et al. only identify occlusion in the immediate vicinity of the pen as it pertains specifically to positioning a menu widget.



Figure 5-2. Previous implicit and explicit occlusion models. (a) simple bounding rectangle used implicitly; (b) Hancock and Booth's four quadrant rule-based model for context menu placement; (c) Brandl et al.'s (2009) radial pie slice representation for pie menu orientation.
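To make the implicit bounding-rectangle rule concrete, the following is a minimal sketch (our own illustration, not code from any of the systems above) of the rule as a per-pixel classifier. The coordinate convention (y increasing downwards) and the mirroring for left-handed users are assumptions.

```python
# A minimal sketch of the implicit bounding-rectangle rule: every pixel below
# and to the right of the pen position is treated as occluded (right-handed
# user). Display coordinates are assumed to have y increasing downwards.
def bounding_rectangle_occluded(pen_x, pen_y, x, y, right_handed=True):
    """Return True if pixel (x, y) is considered occluded for a pen at (pen_x, pen_y)."""
    if right_handed:
        return x >= pen_x and y >= pen_y
    # Assumed mirroring for left-handed users: below and to the left.
    return x <= pen_x and y >= pen_y
```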

Aside from adjustments for handedness, all of these are "one-size-fits-all" models, in that they do not compensate for individual grip styles. They also do not attempt to model the whole occluded area of the hand or the forearm. Our aim is to provide a more flexible model for designers and researchers, one that adapts to specific users beyond accommodating handedness and provides a complete representation of the occluded area on the display.

5.2 Geometric Model for Occlusion Shape

Experiment 4-1 in the previous chapter revealed that the shape of the occluded area is quite uniform within each participant, and that across participants there were high-level similarities. We wondered if a geometric model could be created to predict the complete shape and location of the occluded area, given relatively sparse inputs such as the pen position and aspects such as the user's anatomical size and grip style. This model would essentially function as a binary classifier, labelling each pixel on the display as occluded or not occluded.

There are many possible approaches to representing the complete shape of the occluded area. We have already discussed perhaps the most straightforward approach, the bounding rectangle model (Figure 5-2a). This model is constant relative to the pen's position and requires no other input, but its accuracy is poor. At the other end of the spectrum, we could create a model with a flexible shape, such as one composed of Bézier spline segments (Figure 5-3d).


While this would certainly yield a very accurate representation of the occluded area, the huge number of parameters would make creating and using the model difficult and less practical for real interface design. Our aim, then, is to create a relatively simple model with a small number of parameters that still produces a reasonable degree of accuracy, ensuring it is viable for practical use.


Figure 5-3. Different geometric models of occlusion. (a) oriented rectangle; (b) offset circle and pivoting rectangle; (c) rectangle for pen, ellipse for hand, trapezoid for forearm; (d) highly detailed Bézier spline.

We noticed that occlusion silhouettes often resembled a lopsided circle for the fist, a thick narrowing rectangle sticking out the bottom for the arm and, for some participants, a thinner rectangle puncturing the top of the ball for the pen. The irregularity of shape suggested that a single oriented rectangle (Figure 5-3a) would be unlikely to capture all grip styles accurately, but that a composition of simple shapes might suffice.

Our first approach was to create a geometric model using an ellipse for the fist, an isosceles trapezoid for the arm, and an oriented rectangle for the pen (Figure 5-3c). This model captures most aspects of a typical silhouette, but even this relatively simple model required 11 parameters to position. Instead, we simplified our representation further to an offset circle and a rectangle with only 5 parameters (Figure 5-3b). This ignores some details like the protruding pen and the widening forearm, but with so few parameters, we expected that it could be configured easily and implemented to function in real time with very few inputs. The five model parameters are illustrated in Figure 5-4 and described below:

1. q is the offset distance from the pen position p to the edge of the circle,


2. r is the radius of the circle over the fist area,

3. Φ is the rotation angle of the circle around p (expressed in degrees where Φ = 0° when the centre is due East, Φ = -45° for North-East, and Φ = 45° for South-East, see Figure 5-4b),

4. Θ is the angle of rotation of the rectangle around the centre of the circle (using the same angle configuration as Φ),

5. w is the width of the rectangle representing the forearm.


Figure 5-4. Offset circle and pivoting rectangle model parameters. (a) model parameters; (b) illustration of angular convention used by Θ and Φ

For convenience, we refer to the centre of the circle in the geometric model as c, and for device independence, all non-angular parameters are recorded in millimetres. Note that the length of the rectangle is infinite for our purposes. If we were building a model for larger displays, this may become another parameter, but at present we are concerned with smaller displays such as the portable Tablet PC.
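As an illustration of how this model can act as a binary classifier, the following minimal sketch (our own, not code from the thesis) tests whether a display point falls inside the offset circle or the pivoting rectangle. The assumption that the rectangle extends away from the circle centre on one side only, and the screen coordinate convention, are ours.

```python
import math

# A minimal sketch of the five-parameter circle-and-rectangle model used as a
# binary classifier. Distances are in mm; angles follow the convention of
# Figure 5-4b (0 deg = East, negative = North, positive = South), assuming
# display y-coordinates increase downwards.

def occluded(point, pen, q, r, phi_deg, theta_deg, w):
    """Return True if `point` (x, y) is classified as occluded for pen at `pen`."""
    px, py = pen
    x, y = point

    # Circle representing the fist: its centre c lies q + r away from the pen
    # position, in the direction phi (q is the offset to the circle's edge).
    phi = math.radians(phi_deg)
    cx = px + (q + r) * math.cos(phi)
    cy = py + (q + r) * math.sin(phi)
    if (x - cx) ** 2 + (y - cy) ** 2 <= r ** 2:
        return True

    # Rectangle representing the forearm: width w, pivoting about the circle
    # centre at angle theta, treated as effectively infinite in length and
    # assumed to extend away from the pen on one side of the circle centre.
    theta = math.radians(theta_deg)
    dx, dy = math.cos(theta), math.sin(theta)   # forearm direction
    vx, vy = x - cx, y - cy
    along = vx * dx + vy * dy                   # distance along the forearm axis
    across = abs(-vx * dy + vy * dx)            # distance from the forearm midline
    return along >= 0 and across <= w / 2.0
```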


Analytic Evaluation of Performance

To test the fidelity and performance of our geometric model, we perform a series of analytical evaluations using the occlusion silhouette data from Experiment 4-1 in chapter 4 (see Table 5-1 a,b,c) and new silhouette data gathered in Experiment 5-1 from this chapter (see Table 5-1 d,e,f).

(a) Purpose: What is the theoretical upper bound for geometric approximation? Model(s) tested: fitted models. Test data: Experiment 4-1. Notes: use optimization to fit the model to each silhouette; also test the simple bounding box for baseline comparison.

(b) Purpose: How well does a mean model perform? Model(s) tested: mean model. Test data: Experiment 4-1. Notes: the mean model uses mean parameter settings from the fitted models in test (a).

(c) Purpose: How well could a configured model perform? Model(s) tested: participant mean models. Test data: Experiment 4-1. Notes: use per-participant mean model parameters from test (a) to approximate an actual configuration.

(d) Purpose: Re-test the theoretical upper bound for geometric approximation. Model(s) tested: fitted models. Test data: Experiment 5-1. Notes: use optimization to fit the model to each silhouette as in test (a), but using different data.

(e) Purpose: How well does a mean model perform using different test data? Model(s) tested: mean model. Test data: Experiment 5-1. Notes: use exactly the same mean model parameters as in test (b), but different test data.

(f) Purpose: How does the model perform with actual user configurations? Model(s) tested: participant configured models. Test data: Experiment 5-1. Notes: use the actual model configurations performed by participants in an experiment.

Table 5-1. Overview of model tests.

Note that the purpose of these tests is not to build a model (in the machine learning sense). Our five-parameter geometric model has already been built, and the purpose of these tests is to evaluate its performance. For example, by fitting our model to each occlusion silhouette using optimization, we can test how well it captures the real silhouette data (see tests a and d in Table 5-1). Testing how well a user-configured model performs is similar, but instead of fitting the model to each silhouette, we use only a single model configuration per participant (see tests c and f in Table 5-1). The mean model could be thought of as the result of very naive learning, so testing the mean model with the same data


that generated the mean values (Experiment 4-1 data; see test b in Table 5-1) will likely exhibit some over-fitting (Bishop, 1995). However, we validate the performance of these same mean model parameters using Experiment 5-1 test data, so any over-fitting is of minimal concern.

F Scores, Precision, and Recall

For quantitative assessment, we use precision-recall plots and F scores, standard measures used in information retrieval (Van Rijsbergen, 1979). This is justified by considering the geometric model as a binary classifier which labels each pixel as occluded or not occluded. In this context, precision is the number of pixels correctly classified as occluded over all pixels classified as occluded. Recall is the number of pixels correctly classified as occluded over all pixels that are actually occluded. Precision is a measure of the model's exactness, whereas recall is a measure of its completeness. As an example, the model parameters can rather easily be configured to achieve perfect precision (Figure 5-5a) or perfect recall (Figure 5-5c). In both cases, however, the other measure will be low. The challenge is to find a reasonable balance between precision and recall (Figure 5-5b).

Figure 5-5. Illustration of precision and recall. (a) perfect precision is achieved because all pixels covered by the model are actually occluded, but recall is low because the model misses many occluded pixels; (b) a balance between precision and recall; (c) perfect recall is achieved because the model covers all pixels that are actually occluded, but precision is low because the model also covers many non-occluded pixels.

In our case, we would prefer to have slightly higher recall by including more occluded pixels even if it means losing some precision. Misclassifying non-occluded pixels as occluded will result in a more conservative design or layout, but misclassifying many pixels


that are actually occluded may lead to lower performance since occlusion will still be a problem.

An F score expresses precision and recall in one value. The inherent trade-off between precision and recall is captured by a weight, which can be adjusted to emphasise one or the other more strongly. We use a weighting known as the F2 score, which emphasises recall more than twice as much as precision:

F2 = (5 · precision · recall) / (4 · precision + recall)     (5-1)

We use the corpus of occlusion silhouettes gathered in Experiment 4-1 in the previous chapter as a ground truth to compute precision and recall for corresponding settings of the model parameters. To perform this test, we needed a way to set the five model parameters for each silhouette so that the model "fit" over the actual shape as accurately as possible. With more than 6000 silhouettes, doing this operation manually was out of the question. Our solution is to automatically fit the geometric model to each silhouette using non-linear optimization techniques. Note that other optimization algorithms, or other fitting techniques, could be used; we describe our process as an example of one possible solution.
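For reference, a minimal sketch (an assumed helper, not code from the thesis) of how precision, recall, and the F2 score of Equation 5-1 can be computed when both the silhouette and the model are rasterized as boolean pixel masks:

```python
import numpy as np

def precision_recall_f2(model_mask: np.ndarray, silhouette_mask: np.ndarray):
    """Both arguments are boolean arrays of identical shape (True = occluded)."""
    true_pos = np.logical_and(model_mask, silhouette_mask).sum()
    predicted = model_mask.sum()        # pixels the model labels occluded
    actual = silhouette_mask.sum()      # pixels that are actually occluded

    precision = true_pos / predicted if predicted else 0.0
    recall = true_pos / actual if actual else 0.0

    # F2 weights recall more heavily than precision (Equation 5-1).
    denom = 4 * precision + recall
    f2 = 5 * precision * recall / denom if denom else 0.0
    return precision, recall, f2
```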

Geometric Fitting

To guide the optimizers to an optimal fit, we use an objective function. This function returns 0 for a perfect fit, when the geometry matches the silhouette exactly, and increases as the alignment diverges.

f = 0.5 · (B / A) + 0.4 · (C / A) + 0.1 · N     (5-2)

It consists of two area ratios, illustrated in Figure 5-6: B/A, the area of the silhouette not covered by the geometry (B) over the area of the silhouette covered by the model (A); and C/A, the area of the geometry not covering the silhouette (C) over the area of the silhouette covered by the model (A). We give slightly more weight to the first ratio to favour covering more occluded pixels, at the potential cost of also covering non-occluded pixels. The third term, N, is a normalized objective term with a small weight, described below. To compute these area ratios, we converted the silhouette binary images to polygons and

22 In Vogel et al. (2009), we used the F1 score which weights precision and recall equally. Since we have a preference for high recall, even if it means somewhat lower precision, we use the more representative F2 score here.


computed the ratios analytically. The inverse would have worked as well: converting the geometric model to a binary image and "counting pixels" to calculate the ratios. To reduce the chance of the optimizer finding anatomically improbable configurations, we constrained the possible angles for Θ and Φ to be in (0, 90) and (-90, 90) respectively. We also added the normalized objective term N with a relatively small weight to encourage a smaller rectangle width w and a shorter distance q from the circle to the pen position.


Figure 5-6. Illustration of objective function area calculation. (a) the model is positioned over the silhouette; (b) A is the area of the silhouette covered by the model, B is the area of the silhouette not covered by the model, and C is the area of the model not covering the silhouette.
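The following is a minimal sketch of the pixel-counting variant of this objective (Equation 5-2), which the text above notes would have worked as well as the polygon-based calculation. The helper name rasterize_model and the constants inside the normalization term N are our own assumptions.

```python
import numpy as np

def objective(params, pen, silhouette_mask, rasterize_model):
    """Objective of Equation 5-2 computed by counting pixels in boolean masks."""
    q, r, phi, theta, w = params
    model_mask = rasterize_model(pen, q, r, phi, theta, w,
                                 shape=silhouette_mask.shape)

    A = np.logical_and(silhouette_mask, model_mask).sum()    # silhouette covered by model
    B = np.logical_and(silhouette_mask, ~model_mask).sum()   # silhouette missed by model
    C = np.logical_and(~silhouette_mask, model_mask).sum()   # model outside silhouette
    if A == 0:
        return float("inf")  # geometry misses the silhouette entirely

    # Small normalized term discouraging an overly wide rectangle and a large
    # pen-to-circle offset (the scaling constants here are illustrative).
    N = (w / 200.0 + q / 100.0) / 2.0

    return 0.5 * (B / A) + 0.4 * (C / A) + 0.1 * N
```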

One problem during our initial optimization attempts was caused by cropped occlusion silhouette images. As the pen moves towards the bottom right, more and more of the forearm and fist fall outside the display area and were cropped during image processing, making it difficult for the optimizer to find an optimal placement of the geometry. We solved this by fitting the geometry in two stages for each participant and target type (circle and tap). In the first stage, we optimized all parameters using 3 pen positions near the upper-left portion of the display, where the hand and forearm would not be cropped. From these fits, we computed mean values for r and w. In stage two, we locked r and w to these mean values and optimized over the remaining parameters. We rationalize this two-stage strategy by arguing that the size of the silhouette produced by the fist and forearm is unlikely to vary greatly with pen X- and Y-coordinates, but their position and angle may change. If we had silhouette images capturing the entire fist and forearm, including parts outside the display, we would not have needed this step.


We ran the optimization using two algorithms in sequence over all target locations except the rightmost, where the hand was completely off the display. First, a pattern search algorithm found a candidate neighbourhood for the global minimum, and then a standard gradient search found a local minimum (see Venkataraman, 2002 for algorithm descriptions). We could not use a gradient search alone since our objective function produced a rough error surface. The total time for optimization on 6488 silhouettes was approximately 12 hours on a 2.66 GHz quad-core processor (Video 5-1).

Video 5-1. Geometric model fitting demonstration (Vogel_Daniel_J_201006_PhD_video_5_1.mp4, 00:27).
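As a rough illustration of this two-algorithm fitting pipeline, the sketch below substitutes SciPy's differential evolution for the pattern search stage and a simplex search for the gradient stage (a derivative-free local method copes better with the rough, rasterized objective sketched earlier). The bounds on the distance parameters, and the reuse of the objective and rasterize_model names from the previous sketch, are our own assumptions.

```python
from scipy import optimize

def fit_model(pen, silhouette_mask, rasterize_model):
    """Fit the five model parameters to a single silhouette."""
    # Bounds for (q, r, Phi, Theta, w): distances in mm (illustrative ranges),
    # with Phi constrained to (-90, 90) and Theta to (0, 90) as in the text.
    bounds = [(0, 60), (30, 90), (-90, 90), (0, 90), (30, 90)]
    fun = lambda p: objective(p, pen, silhouette_mask, rasterize_model)

    # Coarse global stage, then local refinement from the coarse solution.
    coarse = optimize.differential_evolution(fun, bounds, maxiter=50, seed=0)
    refined = optimize.minimize(fun, coarse.x, method="Nelder-Mead")
    return refined.x  # fitted (q, r, Phi, Theta, w)
```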

Evaluation

By plotting the results of each fitted silhouette in precision-recall space, we get a sense for how well the model performs (Figure 5-7). A near-perfect model will have a concentration of points in the upper right corner and an F2 score close to 1. We calculate mean F2 scores across all cases. High precision means that pixels labelled by the model as occluded are actually occluded, but other occluded pixels may have been missed. High recall means that the model is correctly labelling occluded pixels, but could also be labelling non- occluded pixels as occluded.

Our geometric model has a mean F2 score of 0.891 (SD 0.111) with mean precision 0.746 (SD 0.207) and mean recall 0.971 (SD 0.030). The precision-recall plot illustrates the very high recall, with some falloff for precision (Figure 5-7a). This precision falloff is expected since we designed our optimization objective function to fit the model in a more conservative manner, favouring covering more occluded pixels at the potential cost of

23 Recall that 7.8% of silhouettes were removed in Experiment 4-1 due to capture lag.


covering non-occluded pixels. A designer would probably be more comfortable over-compensating for occlusion, but this is nonetheless a limitation.

We included the bounding box model as a baseline comparison. It has an F2 score of

0.472 (SD 0.203) with mean precision 0.377 (SD 0.235) and mean recall 0.585 (SD 0.243). A precision-recall plot illustrates the poor fit in terms of both precision and recall (Figure 5-7b). A comparison with Brandl et al.'s or Hancock and Booth's representations would not be fair because they only attempt to capture occlusion in the immediate vicinity of the pen and their parameters are not completely specified (e.g., the radius of Brandl et al.'s circle).

Figure 5-7. Precision-recall plots for fitted geometry and bounding box. (a) fitted circle and rectangle geometry; (b) simple bounding box model. A concentration near the upper-right indicates good performance. The ellipse shows the mean and standard deviation of precision and recall.

5.3 Space of Fitted Parameters and Mean Model

We can use the space of fitted parameters to create a simple, non-configurable version of the geometric model. We call this a mean model.

Space of Fitted Parameters

To enable comparison with Figure 4-15 in chapter 4, in Table 5-2 we summarize the participant mean parameters for the circle task across the same nine pen positions at the middle-left portion of the display. This focuses our comparison on positions in which the entire hand and forearm silhouette is captured without cropping, and reduces variance from parameters such as Θ (the forearm angle) that change across pen positions.


For the most part, the fitted parameters match visual intuition from the mean silhouette images in Figure 4-15 in chapter 4. For example, a low value of Φ indicates a high grip and a high value of Φ indicates a low grip: the two lowest Φ values of -25.5 and -21.4 for participants 6 and 16 match the high grips seen in Figure 4-15, and the high Φ values of 12.1 and 11.9 for participants 9 and 17 match their low grips. Likewise, q captures how close participants hold the pen relative to the pen tip: high q values of 28.3 and 26.1 for participants 2 and 17 indicate they hold the pen far from the tip, and low q values of 5.3 and 4.1 for participants 16 and 19 indicate the opposite. A comparison of other mean parameters with the silhouettes in Figure 4-15 reveals similar patterns.

We expected more variance in parameter values between participants than within a participant. For the most part this was the case, but there are exceptions. Participants 6 and 20 have high variance, but we expected this from their blurry mean silhouettes in Figure 4-15. The high variance for participant 17 is somewhat surprising; we speculate that this may be due to image cropping caused by the grip style.

24 Recall that we use the convention of negative angles above the y=0 axis, see Figure 5-4b. Thus a negative angle of Φ places the centre of the circle representing the fist above the y=0 axis, and therefore indicates a high hand grip.


Participant   q (mm)       r (mm)        Φ (°)          Θ (°)         w (mm)
1             12.3 (2.2)   61.5 (1.4)    10.1 (3.7)     58.0 (2.3)    58.9 (1.8)
2             28.3 (3.8)   64.0 (6.6)    -4.9 (3.8)     63.5 (3.6)    62.7 (2.3)
3             14.9 (2.5)   64.5 (1.1)    -13.9 (2.6)    57.7 (3.7)    72.8 (3.3)
4             7.1 (4.7)    50.3 (0.8)    -7.4 (5.1)     60.1 (3.3)    49.0 (2.2)
5             17.6 (4.3)   59.9 (0.8)    -7.9 (3.9)     53.8 (2.1)    61.8 (2.2)
6             15.3 (4.8)   58.4 (13.3)   -25.5 (8.1)    60.5 (5.0)    58.9 (5.5)
8             14.1 (5.5)   53.8 (1.4)    8.6 (6.3)      68.6 (4.0)    50.1 (1.7)
9             21.5 (1.9)   63.2 (1.1)    12.1 (4.5)     62.0 (3.6)    59.5 (1.1)
10            9.5 (3.4)    55.3 (1.7)    -1.6 (3.9)     69.2 (3.2)    54.9 (4.0)
11            15.5 (2.7)   56.8 (1.1)    -7.0 (5.9)     53.8 (5.4)    56.1 (3.3)
12            14.9 (3.5)   59.5 (0.8)    1.9 (3.5)      61.5 (2.6)    57.0 (3.5)
13            23.9 (3.8)   65.4 (1.2)    7.6 (4.1)      56.6 (3.4)    61.0 (5.7)
15            13.0 (3.8)   64.6 (1.6)    -9.1 (2.7)     45.8 (5.0)    63.9 (3.9)
16            5.3 (3.0)    52.6 (1.8)    -21.4 (6.9)    61.0 (3.2)    50.7 (2.4)
17            26.1 (7.9)   63.4 (6.8)    11.9 (7.7)     39.9 (23.3)   60.5 (19.1)
18            23.5 (2.9)   60.4 (0.9)    -16.4 (3.9)    46.9 (6.5)    73.1 (1.8)
19            4.1 (3.5)    58.6 (1.0)    -11.0 (2.8)    48.1 (3.8)    48.7 (2.9)
20            11.2 (5.4)   56.5 (9.5)    -11.9 (12.4)   39.0 (8.3)    58.6 (1.4)
all           15.5 (7.9)   59.5 (6.2)    -5.1 (12.3)    55.6 (10.8)   59.0 (8.5)

Table 5-2. Summary statistics of fitted geometric model parameters for each participant in the circle task (9 samples from 3 pen positions at the middle-left portion of the display; standard deviations in parentheses).

Mean Model

If we can assume that our experimental sample is representative, the mean parameter values across all participants (bottom row of Table 5-2) can form a mean configuration for the geometric model (Figure 5-8). In practice, this may not be a reasonable assumption considering the between-subject variance we observed. However, a mean model has the advantage of requiring no configuration aside from handedness, and would be trivial to implement.


Figure 5-8. Mean configuration for the geometric model: q = 16 mm, r = 60 mm, Φ = -5°, Θ = 56°, w = 59 mm.
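Using the classifier sketched in section 5.2, the mean model amounts to a single fixed parameter set; the snippet below is a small illustration of our own, reusing the assumed occluded() function from that earlier sketch.

```python
# The "one-size-fits-all" mean configuration from Table 5-2 / Figure 5-8,
# requiring only the current pen position as input (mm and degrees).
MEAN_MODEL = dict(q=16.0, r=60.0, phi_deg=-5.0, theta_deg=56.0, w=59.0)

def mean_model_occluded(point_mm, pen_mm):
    return occluded(point_mm, pen_mm, **MEAN_MODEL)
```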

Conducting an analytical test of the mean model with the corpus of occlusion silhouettes resulted in an F2 score of 0.748 (SD 0.159). This is much better than expected, given that we observed high variance in grip style between participants. However, the precision-recall plot (Figure 5-9) shows a much wider concentration compared to the plot for the fitted geometry (Figure 5-7). This type of "one-size-fits-all" model does not provide reliable accuracy in many specific instances, in spite of the F2 score suggesting that it performs reasonably well overall. Since it is a mean model, if an individual participant's grip is far from the mean grip, the model will consistently perform poorly. It may suffice as a rough guide for designers, but for our purposes we would like more fidelity.



Figure 5-9. Precision-recall plot for the mean model. A concentration near the upper-right indicates good performance. The ellipse shows the mean and standard deviation of precision and recall.

5.4 User Configurable Model

We had initially expected to find correlations between the fitted model parameters and user characteristic variables. Unfortunately, although the final participant parameters seem to have a relationship to the occlusion silhouettes, we could not find any simple correlations with user variables such as anthropometric measurements, gender, pen pressure, or pen tilt. Finding a way to set model parameters to match a specific user's grip would have to be done another way.

One solution would be to capture occlusion silhouette images of the user's hand and arm at different locations on the display (as in Experiment 4-1 in the previous chapter), and then fit the geometric model using the non-linear optimization steps described above. We could then use the mean parameter values either to set final parameters directly, or in simple algorithms that vary a parameter according to commonly available input such as pen position or pen tilt. Of course, capturing the silhouettes would be obtrusive and time consuming: users would need to don a head-mounted camera to configure the model. In addition, our optimization technique is not infallible and can occasionally become trapped in undesirable minima, especially with a small number of occlusion silhouettes.

Instead, we essentially ask the user to perform those steps for us: they manually fit the model to their own view of their hand and arm using an interactive tool. To simplify this further, we assumed that this could be done once, at a single location near the centre of the


display. Once this short configuration process is completed, we have a set of base parameters, which in turn can be used to position the model in real time.

Assumptions and Limitations for Our Approach

We make certain assumptions to keep our configurable model tractable, which in turn result in possible limitations to our approach. The first and most obvious assumption is that the size of the device display is similar to that used for our data gathering experiments (a 307 mm, 12.1 inch diagonal display). A larger display may need more shapes to capture the upper arm, and perhaps additional configuration steps. If the display is too small to render the full geometric model, the configuration steps would need to be adjusted and potentially extraneous portions of the model, such as the rectangle capturing the forearm, removed for simplicity.

Second, we assume the tablet remains in a stable portrait orientation relative to the user. Although the geometric model is unlikely to change in a landscape orientation, switching back and forth between landscape and portrait would have to be sensed. Fortunately, the operating system can provide this information based on a North, South, East, or West landscape or portrait display resolution, and accelerometers could provide intermediate orientations as well (they are fast becoming standard equipment on Tablet PCs; the Lenovo X60 Tablet PC used for the study presented in chapter 3 contains an onboard accelerometer).

A third assumption is that users maintain a near constant grip regardless of task. Our results in Experiment 4-1 indicate that this is most often the case, but recall that we found two cases where participants adjusted their grip for tapping versus circling.

A fourth simplifying assumption is that the user's head remains in a near constant location relative to the display. We did not find that users tilted the tablet a great deal in the observational study in chapter 3, in spite of placing the device on their lap, which presumably encouraged some movement. Yet, based on watching the video logs from the experiments in chapters 3 and 4, we know this assumption is not always upheld. For


example, the simultaneous monitoring task in Experiment 4-3 caused some users to change their head position to see around their hand. In practice, we do not believe these assumptions and limitations undermine the integrity of the model or the configuration steps. Some inaccuracy may be acceptable and produce results suitable for many applications. However, extreme differences, such as a change in orientation, would need to be accommodated. At the end of this chapter we speculate on how this and other limitations could be addressed.

Model Configuration

A four step process guides the user through progressive refinement of the model’s rendered shapes until they roughly match the user’s arm and hand from their point-of-view (Figure 5-10, Video 5-2). We also capture handedness to “flip” the model for left-handed users. Base parameters are denoted with a prime mark (e.g., x′ is the base version of x). The model is rendered at a fixed reference position with the circle centred at c′, creating a set of base parameters q′, r′, Φ′, Θ′, and w′. The steps are as follows:

Step 1. (Figure 5-10a). While gripping the pen, the user places their hand so that it is roughly centred on a cross-hair and circle displayed at the centre of the display c′. Once positioned, and without lifting their hand, they tap the pen to record a base pen position p′. Based on p′ and c′, we calculate the two hand-offset parameters, q′ and Φ′. At the same time, handedness is determined using a simple rule: if p′ is to the left of c′, the user is right-handed, otherwise they are left-handed.

Step 2. (Figure 5-10b). Keeping their hand on the display, they adjust the circle size with two repeat-and-hold buttons displayed immediately above and below p. This adjusts the base hand size parameter r′ and also refines the base offset distance q′ as needed. Once satisfied, they tap a continue button located at p.

25 Note that the head movement in this example is actually caused by occlusion. In the next chapter we describe an interaction technique which avoids this type of occlusion in the first place.


Step 3. (Figure 5-10c). Using the same two adjustment buttons, the user rotates a set of dashed lines to align with the direction of their forearm, which sets Θ′.

Step 4. (Figure 5-10d). Finally, the thickness of the rectangle is increased or decreased until it roughly matches their arm, setting w′.

(a) Step 1: handedness, hand offset (Φ and q); (b) Step 2: hand radius (r)

(c) Step 3: forearm angle (Θ); (d) Step 4: forearm width (w)

Figure 5-10. Occlusion model user configuration steps.

Video 5-2. Model configuration demonstration (Vogel_Daniel_J_201006_PhD_video_5_2.mp4, 00:27).
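A minimal sketch (with assumed helper names) of the arithmetic behind step 1, which derives handedness and the base offset parameters from the recorded pen position p′ and the reference circle centre c′. The default circle radius and the non-negativity guard on q′ are our own assumptions; the radius and offset are then refined interactively in step 2.

```python
import math

def configure_step1(p_base, c_base, r_base=60.0):
    """Derive (handedness, q', Phi') from the step-1 tap."""
    px, py = p_base
    cx, cy = c_base

    right_handed = px < cx               # pen tip left of circle centre -> right-handed
    dist = math.hypot(cx - px, cy - py)  # pen tip to circle centre
    q_base = max(0.0, dist - r_base)     # offset to the *edge* of the circle

    # Angle of the circle centre around p, in the convention of Figure 5-4b
    # (0 deg = East, negative above, positive below; y grows downwards).
    phi_base = math.degrees(math.atan2(cy - py, cx - px))
    return right_handed, q_base, phi_base
```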

Real-Time Model Positioning

Using these base parameters, we can position the model at arbitrary pen positions p. Without tilt, we use all base parameters directly with the exception of Θ since the forearm angle varies as the wrist is moved. We considered using a full kinematic model, but this


proved difficult to configure with only Θ′ and added complexity. We also considered taking multiple Θ′ samples at different positions, but this would lengthen the configuration process. Instead, we use an extremely simple model of forearm angle which is adequate for our medium-sized display. Θ is calculated as the angle between the approximate wrist position c and a vertically sliding elbow position e. The 2-D coordinates of e are calculated during configuration as the end point of a vector originating at c′ at angle Θ′ and 270 mm long, the mean forearm length (Pheasant & Hastlegrave, 2006).


Figure 5-11. Vertically sliding elbow to set Θ. (a) adjusting Θ in step 3 of the configuration creates an elbow position e relative to a base position, in this case the circle centre c′; (b) during real-time positioning, as the circle centre c moves horizontally, Θ is set to the angle of the vector ce; (c) as the circle moves vertically, an adjusted base position of the circle centre follows along c′′ so that e moves vertically as well, and Θ remains constant.
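A minimal sketch of this sliding-elbow scheme, in our own formulation of Figure 5-11; the function names are assumptions, and the convention that the elbow keeps its horizontal position while following the hand vertically is our reading of the figure.

```python
import math

FOREARM_MM = 270.0  # mean forearm length used during configuration

def elbow_from_configuration(c_base, theta_base_deg):
    """Place the elbow 270 mm from the base circle centre c' along Theta'."""
    cx, cy = c_base
    t = math.radians(theta_base_deg)
    return (cx + FOREARM_MM * math.cos(t), cy + FOREARM_MM * math.sin(t))

def current_theta(c, c_base, elbow):
    """Theta for the current circle centre c: the elbow keeps its x position
    but slides vertically with the hand, so Theta only changes as the hand
    moves horizontally."""
    cx, cy = c
    ex, ey = elbow
    ey_slid = ey + (cy - c_base[1])   # elbow follows vertical hand movement
    return math.degrees(math.atan2(ey_slid - cy, ex - cx))
```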

With Stylus Tilt

Some direct input pen devices detect the azimuth and altitude of pen tilt. With a constant grip, pen tilt should be correlated with q and Φ, so our model uses this additional information when available. The azimuth, φ, uses the same angle convention as Φ and Θ, and the altitude, θ, is the angular deviation away from the display surface normal. To compensate for somewhat noisy tilt data, we apply a dynamic recursive low-pass filter (Vogel & Balakrishnan, 2005) at 60 Hz, with cut-offs of 0.05 and 2 Hz interpolated between 4 and 20 degrees/s. Base values φ′ and θ′ are sampled during configuration in step 1. Thus, q is calculated from q′ using the ratio of the current altitude to the base altitude:


q = q′ · (θ / θ′)     (5-3)

The parameter Φ is calculated as an offset from the base azimuth:

Φ = Φ′ + attenuate(φ − φ′)     (5-4)

where attenuate is a function that attenuates the azimuth offset as the pen nears a perpendicular orientation to the display (θ nears 0), or when the pen azimuth deviates more than 30° from the base azimuth. This compensates for sometimes noisy tilt data (in spite of filtering): users may change their grip slightly, but large deviations in φ and θ are likely to be outliers.
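The following minimal sketch illustrates one way Equations 5-3 and 5-4 could be implemented. The linear attenuation ramp is our own assumption, since the exact attenuate function is not specified beyond its behaviour, and the azimuth/altitude samples are assumed to have been low-pass filtered beforehand.

```python
def adjusted_q(q_base, altitude_deg, altitude_base_deg):
    # Equation 5-3: scale the pen-to-circle offset by the altitude ratio.
    if altitude_base_deg <= 0:
        return q_base
    return q_base * (altitude_deg / altitude_base_deg)

def adjusted_phi(phi_base_deg, azimuth_deg, azimuth_base_deg, altitude_deg):
    # Equation 5-4: offset Phi by the change in azimuth, attenuated when the
    # pen nears perpendicular (altitude near 0) or the azimuth strays more
    # than 30 degrees from its base value (treated as an outlier).
    delta = azimuth_deg - azimuth_base_deg
    weight = min(1.0, altitude_deg / 10.0)   # assumed ramp near vertical
    if abs(delta) > 30.0:
        weight = 0.0
    return phi_base_deg + weight * delta
```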

Analytical Test of Configurable Model Accuracy

To test the fidelity of our configurable model, we use the same technique explained above. To configure our model analytically, we use the participant mean fitted parameters as base parameters. For tilt, we use the actual data logged during the experiment.

The configurable model test results found mean F2 scores of 0.803 (SD 0.166) without tilt, and 0.795 (SD 0.166) with tilt. Our results approach the theoretical maximum F2 score of 0.89 for silhouettes generated by "fitting" the model using non-linear optimization, and are well above the 0.47 for a simple bounding box.

It is surprising that the tilt version has a slightly lower F2 score than the non-tilt version. We attribute this to the noisy, unfiltered tilt data logged in Experiment 4-1. Due to the tilt noise affecting the tilt version of the model, the non-tilt version appears to match the task posture slightly better. Looking at the individual precision and recall scores, we find that the tilt version actually has slightly better recall (0.856 for tilt, 0.865 without tilt), but lower precision (0.662 for tilt, 0.700 without tilt). The precision-recall plots look very similar, both illustrating overall patterns of higher recall with some falloff for precision (Figure 5-12a,b). In informal tests of our implementation, we found that with the addition of filtered tilt data, the model tracked postures better as they deviated from the configured neutral posture.

26 Note that these results are similar to the results of an early version of the model reported in Vogel et al. (2009), which used a much more complex kinematic model to set Θ.

27 We tried to filter the tilt data from Experiment 4-1, but due to relatively low and irregular sampling rates, we could not apply a filter without introducing noticeable lag.



Figure 5-12. Precision-recall plots for the analytically configured model: (a) without tilt; (b) with tilt. A concentration near the upper-right indicates good performance. The ellipse shows the mean and standard deviation of precision and recall.

5.5 Experiment 5-1: Occlusion Model Evaluation

We conducted a short experiment to test the configurable occlusion model. This experiment has two main goals:

1. Test the usability of the occlusion model user configuration steps.

2. Verify the fidelity of the model with a non-analytic configuration.

Participants

15 people (3 female, 12 male) with a mean age of 23.5 (SD 4.8) participated; 12 were right-handed and 3 were left-handed. All were pre-screened for colour blindness, and most had little experience with direct pen input, but this is acceptable since we are observing a relatively simple style of interaction. It would have been ideal to have a fully balanced number of participants in the left- and right-handed conditions, but recruiting left-handed participants was difficult. However, since our study is not a typical human performance study, this is not a critical issue. Our objective for including left-handed participants is to verify that our implementation works correctly when mirrored, and to provide initial results and discussion for non-right-handed users.


Apparatus

The experiment used the same apparatus set-up as the experiments in chapter 4: a Wacom Cintiq 12UX direct input pen tablet in portrait orientation and a small head-mounted video camera to record the entire experiment. The Cintiq tablet has a 307 mm (12.1 inch) diagonal display and a resolution of 1280 × 800 px (261 × 163 mm), creating a device resolution of 4.9 px/mm (125 DPI). It was supported at an angle close to 12 degrees off the desk, oriented towards the participant. Participants were seated in an adjustable office chair with the height adjusted so that the elbow formed a 90 degree angle when the forearm was on the desk. The head-mounted video camera recorded the entire experiment at 640 × 480 px resolution and 15 frames per second. The camera is attached to a head harness using hook-and-loop strips, making it easy to move up or down so that it can be positioned close to the centre of the eyes without interfering with the participant's line of sight. Printed fiducial markers were attached around the bezel of the tablet to enable us to transform the point-of-view frames to a standardized image perspective for analysis.

Tasks

The experiment consisted of two tasks. First, participants configured the occlusion model using the interactive steps explained previously (Figure 5-10). Second, a series of circle selection targets were presented on a grid, similar to Experiment 4-1 in the previous chapter.


Figure 5-13. Experiment 5-1 experimental stimuli. (a) 6 x 10 grid used to place the measurement target; blue start target is located near the bottom right; (b) circle measurement target showing ink trail.


The circle selection target worked exactly the same as in Experiment 4-1. An ink trail visualization indicates progress, and errors occurred when the pen tip moved beyond the inner or outer radius. As before, the distance between the 0.8 mm (4 px) inner and 6.5 mm (32 px) outer radius creates a 5.7 mm (28 px) tolerance for error-free circling. The only difference was the target colour: we coloured all targets blue instead of red to make occlusion silhouette extraction using the blue colour channel easier.

The targets are located at a subset of positions on the same 7 × 11 unit invisible grid. We removed target positions at the extreme right (for right-handed participants) and bottom, creating 60 different locations with target centres spaced 24.9 mm (122 px) horizontally and 25.1 mm (123 px) vertically. The grid locations and start target location were mirrored horizontally for left-handed participants.

At the beginning of each trial, a 13.0 mm (64 px) circular blue start target was displayed 52 mm from the display bottom. At the same time, the first circle target was shown in a grey disabled state, only becoming active once the start target was selected. Unlike Experiment 4-1, participants selected a row of 6 targets one after the other in sequence, without having to tap a start target each time. This was done to shorten the duration of the experiment and decrease the amount of pen tilt noise caused by the type of rapid, short movements used in Experiment 4-1. Recall that we are not interested in measuring target selection time, so randomizing target presentation order or including a start target before each target is not strictly required.

Design

After the model was configured, we presented 1 block of trials for the circle selection task. A block consisted of 60 trials covering each target position in the grid. Trials were presented in sequence from target positions at the top of the display to the bottom. Before beginning the first block of a task, the participant completed 12 practice trials. In summary, the experimental design was:

1 Task (Circle) × 60 Target Positions × 1 Block = 60 data points per participant


Results

Occlusion Model Configuration

All participants completed the model configuration step successfully, but they had some difficulty and required guidance. In step 1, participants found the notion of centring their hand in a circle ambiguous, and often placed their hand too high or too low. A related issue occurred in step 4, when the rectangle was shifted from the forearm midline due to a limitation of the simple geometric model, and participants were not clear as to what constituted a good rectangle width. Participants also tended to lift their hand during configuration, which seemed to be motivated by a desire to see what was on the entire display (which, of course, was mostly occluded), or by a seemingly natural lifting motion as they tapped the adjustment buttons.

Precision and Recall

We evaluate the capability of our configurable model to capture the essence of the actual occluded area using the same techniques described earlier. To do this, we first need to generate occlusion silhouettes from video frames taken at the end of each circle selection task, using the same steps described in Experiment 4-1. First, we extracted each video frame taken just before the trial ended (just before the circular target was completely circled). Then, by tracking the fiducial markers using ARToolkit ("ARToolKit"), we found the four corner positions of the display and un-warped the perspective. Once cropped to a 267 × 427 px image of the display area only, we applied the same sequence of image processing steps to create the final silhouette.

To establish an upper bound for the new participant data in this experiment, we fit the geometric model shapes to each silhouette using the same non-linear optimization techniques explained above. Testing the fitted geometry against each silhouette, we found an F2 score of

0.924 (SD 0.046), with mean precision 0.800 (SD 0.152) and mean recall 0.965 (SD 0.070). The corresponding precision-recall plot (Figure 5-14a) has the same basic pattern as the previous fitted geometry plot (Figure 5-7a). We also tested the mean model configuration (Figure 5-8). Recall that the mean model is a single "one-size-fits-all" configuration using mean parameter values calculated from the

fitted geometry on occlusion silhouettes gathered in Experiment 4-1. We found an F2 score of


0.685 (SD 0.169) with mean precision 0.791 (SD 0.161) and mean recall 0.675 (SD 0.197). Because this version of the model is configured from mean fitted values for silhouettes

gathered in Experiment 4-1 rather than in this experiment, a much lower F2 score is expected. The precision-recall plot illustrates the wide dispersion of data points and the surprisingly high precision, but lower recall (Figure 5-14b).

Figure 5-14. Experiment 5-1 precision-recall plots for fitted geometry and mean model: (a) fitted geometry; (b) mean model. The ellipse shows the mean and standard deviation of precision and recall.

Next, we tested the performance of the configurable version of the model, as configured in the interactive steps during the experiment, with and without tilt data from the tablet. To do this, we ran the real-time model algorithms using the logged experiment data and compared the model state at the end of each circle selection task with the occlusion silhouette captured at the same moment.

We found mean F2 scores of 0.737 (SD 0.164) for the configurable model without tilt,

and 0.755 (SD 0.162) with tilt. The concentration patterns shown in the precision-recall plots are similar, but there is evidence of more concentration in recall with tilt (Figure 5-15). With tilt and without tilt respectively, the mean precision values are 0.750 (SD 0.213) and 0.768 (SD

0.198) and mean recall values are 0.770 (SD 0.179) and 0.744 (SD 0.187).



Figure 5-15. Precision-recall plots for the experimentally configured model: (a) without tilt; (b) with tilt. The ellipse shows the mean and standard deviation of precision and recall.

Discussion

Usability of Occlusion Model Configuration

Our results suggest that the configuration process could be further improved to better match the mental model and physical tendencies of users. To discourage the lifting of hands, adjustment widgets could be redesigned such that the pen remains pressed against the display throughout; for example, continuous crossing widgets (Accot & Zhai, 2002) could be used. The visual difference between the model and the participants' view of their hand and forearm also appears to be somewhat problematic. One way to address this is by rendering a more realistic representation, such as a typical occlusion silhouette, for the purpose of user configuration; in this case, the underlying circle and rectangle model would be adjusted indirectly. A more radical departure would be for users to trace the actual shape of their hand and forearm, as seen from their point-of-view, using the pen held in their non-dominant hand. The geometric model could then be automatically fitted to the traced outline using an optimization process similar to the fitting steps described above.

Model Fidelity

Unlike in the analytical configuration test, the configurable model with tilt data outperformed the non-tilt version. This suggests that our dynamic filtering in the real-time

model implementation had a positive effect. However, the configurable model F2 scores are both lower than the scores for the configurable model in the analytical configuration test. The higher precision values and lower recall scores suggest that participants configured the model too tightly – they attempted to match the shapes as closely as possible to their hand and


forearm. This lowered recall, and thus also lowered the resulting F2 scores (which favour recall over precision). As discussed above, in an implementation setting, a model which has more recall is preferred – we would rather capture all areas which are actually occluded, even if this means misidentifying some areas which are not occluded. A simple solution to remedy this is to instruct participants to fit the shapes as closely as possible, and then increase r and w by a constant factor or percentage.

5.6 Future Directions

In addition to refining the configuration process as discussed above, we believe there are ways to make the model more robust and applicable to other input contexts and display form factors.

Using Relative Head Position

Recall that we make the simplifying assumption that users keep their head in a constant position relative to the display. In practice, we know this is not the case. However, many new laptops and desktops come equipped with a small, unobtrusive webcam built into the display bezel – and recent research has demonstrated that computer vision based head tracking is robust and usable for this type of scenario (Harrison & Dey, 2008). With even an approximate estimate of head position, the model state could be influenced. This could be as simple as adjusting q, Φ, and Θ to modify the position of the circle and rectangle relative to the current head position. Or, a full transform matrix could be applied to the rendered shapes similar to shadow mapping in computer graphics (L. Williams, 1978).

Other Device Sizes

Although we built our model with a 12 inch tablet, we believe it can be extended to work with larger display sizes. The base parameters are unlikely to change, but the configuration process and real time positioning algorithms would need to be refined. First, we expect that tracking head position would be essential, since intuitively, larger displays will cause larger head movements. Also, using a single, central display position for configuration may no longer be adequate. Additional steps could be added to adjust the model as viewed at extreme positions across the display. Our scheme for setting Θ using a


vertically sliding elbow is unlikely to scale to larger displays. However, with enough refined model positions across the display, all parameters, including Θ, could simply be interpolated between sets of location-specific base parameters. When the display diagonal is larger than the combined length of the hand and forearm, the upper arm may have to be considered as well: 480 mm for men and 435 mm for women according to anthropometric data (Pheasant & Hastlegrave, 2006). As displays grow in size, so does the likelihood of multi-user interaction, and in a tabletop context, the orientation of a user relative to the display must be tracked (Brandl et al., 2009). On very large displays with multiple users, it may also be necessary to model occlusion caused by another user's body. Body tracking techniques used for front-projector occlusion (Audet & Cooperstock, 2007; Tan & Pausch, 2002) or the Shadow Reaching interaction technique (Shoemaker, Tang, & Booth, 2007) may provide partial solutions.

Touch and Multi-Touch Devices

Although we believe that a configurable occlusion model could be developed for touch or multi-touch interaction, the geometry, configuration, and real time positioning would all need to be revised for it to be accurate. For instance, known problems with finger occlusion (Vogel & Baudisch, 2007) suggest that the geometric model would have to include shapes which capture the position of the thumb and fingers. Moreover, the wide variety of hand postures which occur naturally between and within individuals would make creating a model more difficult than with pen input. However, when the accurate tracking of objects above a touch or multi-touch surface (Echtler et al., 2008) becomes reliable and without environmental lighting assumptions, the occluded area could be captured directly. We also suspect that compared to a hand gripping a pen, a hand engaged in touch input is typically flatter and closer to the display. This would make an image of the hand and fingers captured from beneath the surface approximate the actual occluded area more closely. However, even if the shape of the occluded area can be captured reliably, a geometric model of expected occlusion shape may assist computer vision algorithms by identifying each user’s relative orientation to the display, or perhaps, identifying different users based on individually configured occlusion models. Conducting a study similar to Experiment 4-1 for touch or multi-touch input would be a logical first step in exploring this area.


5.7 Summary

Experiment 4-1 in chapter 4 found that the occlusion shape varies for different users, yet, we were able to develop a configurable, five-parameter geometric model that captures the general shape of the occluded area created by an individual’s hand and forearm. We now summarize the performance of the various incarnations of our occlusion model (see also Figure 5-16).


Figure 5-16. Summary of precision-recall performance for the tested models (bounding box, mean model, configurable model without tilt, configurable model with tilt, and fitted geometry). (a) models compared analytically using occlusion silhouettes from Experiment 4-1; (b) models compared using the actual user configurations and occlusion silhouettes from Experiment 5-1. The ellipses show the mean and standard deviation of precision and recall.

We began by developing and testing our geometric model analytically using the corpus of occlusion silhouette images gathered in Experiment 4-1 (Figure 5-16a). To initially test the theoretical optimum performance of the geometric model, we used non-linear optimization algorithms to "fit" the model shapes to the occlusion silhouettes. This procedure resulted in

an F2 score of 0.891 for the fitted geometry. Based on the space of fitted parameters, we

constructed a “one-size-fits-all” mean model which has a F2 score of 0.748. Despite the encouraging score, in practice, this type of “one-size-fits-all” model in unlikely to provide good performance. Since it is a mean model, if an individual participant grip is far from the mean grip, the model will consistently perform poorly. However, it may suffice as a rough guide for designers. To adapt the geometric model to an individual grip, we designed a four-step interactive configuration process. This allows an individual to progressively refine the position of the

217

model shapes to conform to their own view of their hand and forearm on the display. This creates a set of base-parameters, which we use to position the geometric model in real-time using only the current cursor coordinates and, and if available, pen tilt data. Our real-time positioning scheme is simple: we use four out of five of the base parameters directly with some adjustment given pen tilt if available. The fifth parameter, which captures the angle of the forearm, is calculated using a very simple kinematically inspired calculation. We tested this configurable version of the model analytically, using the same corpus of silhouette images gathered in Experiment 4-1, and also with actual user configurations and silhouettes captured during a user experiment. The F2 scores for the analytical configuration are 0.803 without tilt and 0.795 with tilt (Figure 5-16a). We attribute the lower score to the unfiltered tilt data in the experiment log.

The F2 scores for the actual configuration as performed in Experiment 5-1 are somewhat lower at 0.737 without tilt and 0.755 with tilt (summarized in Figure 5-16b). We attribute this to participants configuring the model too tightly, which lowered recall. At the end of the configuration process, we could simply increase the size of the model slightly to counteract this behaviour. Note also that by filtering the tilt data during the experiment, we arrived at a higher F2 for the tilt version of the model compared to the no-tilt version.
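For reference, the F2 score combines precision and recall with recall weighted more heavily. The minimal Python sketch below shows the comparison computation, assuming the model rendering and the captured silhouette are available as same-sized boolean images; it is illustrative only, not the analysis pipeline described in chapter 5.

    import numpy as np

    def precision_recall_f2(model_mask, silhouette_mask, beta=2.0):
        # model_mask, silhouette_mask: boolean arrays, True where occluded.
        tp = np.logical_and(model_mask, silhouette_mask).sum()
        fp = np.logical_and(model_mask, ~silhouette_mask).sum()
        fn = np.logical_and(~model_mask, silhouette_mask).sum()
        precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
        recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
        if precision + recall == 0:
            return precision, recall, 0.0
        # F-beta with beta = 2 weights recall twice as heavily as precision
        f_beta = (1 + beta**2) * precision * recall / (beta**2 * precision + recall)
        return precision, recall, f_beta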

Applications

We envision three main areas of application for our model: to analyze the influence of occlusion in controlled experiments; as a simple, quantitative design guideline to minimize occlusion; and as an enabling technology to create real-time occlusion-aware interaction techniques. This model could be used in the analysis of occlusion in formal experiments. Past researchers such as Hancock and Booth (2004), Grossman et al. (2006), Hinckley et al. (2006), Guimbretière and Chen (2008), and Forlines and Balakrishnan (2008) could only speculate on the effect of occlusion in their results. In the previous chapter, we used occlusion silhouettes to illustrate different strategies when contorting to minimize occlusion in Experiment 4-3, and overlaid the mean occlusion silhouette to visualize which targets were likely occluded in Experiment 4-2. Of course, we had a set of occlusion silhouettes at our disposal. Researchers may wish to illustrate or test for occlusion without having to outfit participants with a head-mounted camera and perform the somewhat tedious post-processing steps to get their own set of occlusion silhouettes. Instead, they can use a mean model or configure the real-time model for their participants. In the simplest form, the mean model could be used to test for occluded experimental stimuli (e.g., Figure 5-17), or with an implementation of the real-time model, each participant could complete the configuration steps and the individual participant models used for more detailed analysis. Researchers could also use the model to refine the position of experiment stimuli to induce or avoid occlusion a priori.
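As a rough illustration of this kind of a priori check, the Python sketch below tests whether a stimulus position would fall inside a circle-plus-rectangle occlusion shape of the kind described in chapter 5. The parameter values, parameter names, and the exact geometric interpretation here are placeholders for the example, not the reported mean model.

    import math

    def point_occluded(pen_x, pen_y, stim_x, stim_y,
                       q=50.0, theta=math.radians(26), r=65.0,
                       phi=math.radians(60), width=70.0, forearm_len=400.0):
        # Circle (hand) centred at an offset q from the pen position at angle theta.
        cx = pen_x + q * math.cos(theta)
        cy = pen_y + q * math.sin(theta)
        if math.hypot(stim_x - cx, stim_y - cy) <= r:
            return True
        # Rectangle (forearm) of the given width extending from the circle centre
        # at angle phi (screen coordinates, y increasing downward).
        dx, dy = stim_x - cx, stim_y - cy
        along = dx * math.cos(phi) + dy * math.sin(phi)
        across = -dx * math.sin(phi) + dy * math.cos(phi)
        return 0.0 <= along <= forearm_len and abs(across) <= width / 2

    # e.g., is a status message at (1200, 750) occluded when the pen is at (640, 400)?
    print(point_occluded(640, 400, 1200, 750))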

Figure 5-17. Using the occlusion model in formal experiment analysis. As a simple example, the mean model could be used instead of capturing and processing a mean silhouette image. Note the likely occluded target positions here are similar to those shown by overlaying the mean silhouette from Experiment 4-2 (Figure 4-36).

It is easier to communicate and apply occlusion-aware design rules based on the geometric model. In the previous chapter, we provided a guideline for avoiding the occluded area relative to the pen. To communicate the occlusion area itself, we were forced to provide an illustration showing the mean area overlaid by a measured grid (Figure 4-45). Although this may be adequate in some instances, using the geometric model is more concise and usable. It can be expressed in only five parameters, and the geometry could be applied directly to assist in positioning elements or laying out a widget. As a simple example,


consider how the geometric model could be used to further refine (and justify) Ramos and Balakrishnan’s Twist Lens sinusoidal slider (Figure 5-18).

Figure 5-18. Using the occlusion model to design an interaction technique. As an example, we refine Ramos and Balakrishnan’s Twist Lens (2003): (a) the sinusoidal shape of the original Twist Lens is partly motivated to reduce occlusion; (b) by using the mean model geometry as a guideline, the Twist Lens could be refined to further reduce occlusion.

Perhaps the most exciting application of our model is to enable occlusion-aware interfaces. These are interaction techniques which adjust and refine an interface automatically based on what is currently occluded according to the position of the configured, real-time model. Consider, for example, the problem noted in our observational study in chapter 3, where status messages displayed in the lower right corner are occluded and missed by the user. Based on our first design guideline in chapter 4, we could simply display the status messages elsewhere – but what if this is not possible, or extremely difficult, due to existing operating system conventions or other layout constraints? Using our real-time model of occlusion, the occluded status messages could be automatically recognized, and the interface could take steps to reposition them in a nearby non-occluded portion of the display. We present this and other designs for occlusion-aware interfaces in the next chapter.


6 Occlusion-Aware Interfaces

In previous chapters, we identified problems with occlusion and developed a real-time configurable model to track the currently occluded area. In this chapter we introduce the notion of occlusion-aware interfaces. These are interaction techniques which know what area of the display is currently occluded – in our case, with our configurable model – and use this knowledge to counteract potential problems with occlusion and/or utilize the hidden area. We present the Occlusion-Aware Viewer technique (Figure 6-1, demonstrated in Video 6-1) which displays otherwise missed previews and status messages in a non-occluded area using a bubble-like callout. It demonstrates how a sufficiently accurate representation of the occluded area can be utilized, and provides a case study of research problems for creating other occlusion-aware techniques. A related problem in this example is determining if anything of interest is occluded. Rather than ask programmers to implement a custom protocol and notify the technique (Ishak & Feiner, 2004), our technique monitors the interface for changes using image processing, and uses what is changing as a proxy for what is important.


Figure 6-1. Occlusion-Aware Viewer technique. Displays otherwise missed previews and status messages in a non-occluded area using a bubble-like callout.

In this chapter, we describe and evaluate the Occlusion-Aware Viewer technique, which uses our configurable model of occlusion described in chapter 5, a real-time image processing technique to identify important areas, and a simple callout positioning algorithm. Our experiment design uses a refined version of the simultaneous monitoring task used in Experiment 4-3. At the end of this chapter, we also discuss initial designs and ideas for three other occlusion-aware techniques: dragging into occluded areas, placing pop-ups in non-occluded areas, and a hidden widget.

6.1 Related Work

Researchers have developed techniques at least partly motivated by occlusion. Direct pen-input techniques include Ramos and Balakrishnan’s (2003) sinusoidal shaped slider, which should reduce occlusion from the user’s hand; Apitz and Guimbretière’s (2004) CrossY, which uses predominant right-to-left movement to counteract occlusion with right-handed users; and Schilit, Golovchinsky, and Price’s (1998) pen-based XLibris ebook reader, which places a menu bar at the bottom of the display to avoid occlusion when navigating pages. Touch screen and tabletop techniques focus on finger occlusion: examples include Shen et al.’s (2005) design for visual feedback which expands beyond the area typically occluded by a finger; and selection techniques which use a second hand to explicitly or implicitly


counteract occlusion by shifting the cursor up (Potter, Weldon, & B. Shneiderman, 1988), adding a handle (Albinsson & Zhai, 2003), or zooming the target area (Benko, A. D. Wilson, & Baudisch, 2006). In these examples, there is no underlying user-specific model of what is actually being occluded. Instead, simple rules-of-thumb are used, such as “avoid the area South-East of the cursor position for right-handed users”, or the user explicitly adjusts factors themselves to address occlusion.

Vogel and Baudisch’s Shift and the Apple iPhone

Vogel and Baudisch’s Shift (2007) is a touch screen target selection technique for avoiding occlusion problems when selecting small targets with a relatively large finger (Figure 6-2). The technique uses a simple estimate of finger size (which can be thought of as a simple model) and compares this with the size of targets underneath the finger as it touches the display. If one or more targets are found to be smaller than the finger, a copy of the occluded area is re-displayed in a circular callout near the finger. The user can then fine-tune their selection by adjusting a cursor before lifting their finger again (called “take-off” selection). By tracking whether the user makes adjustments before each take-off selection, Shift fine-tunes its estimate of finger size in real time to compensate for individual differences. If the user adjusts the cursor before lifting their finger, the estimated finger size is increased slightly. If the user makes a selection without any cursor adjustment, the finger size is decreased slightly. The callout is typically placed immediately above the finger, but for selections near the top of the display, the callout position is adjusted using a simple deterministic algorithm. The Apple iPhone GUI also includes simple techniques to counteract finger occlusion while entering text using the onscreen keyboard. Like Vogel and Baudisch’s Shift, it re-displays the otherwise occluded area of the display in a callout placed in a non-occluded area, but without any individual adaptation. We adopt the notion of using a callout to display otherwise occluded content, and present a more flexible callout placement technique.


Figure 6-2. Vogel and Baudisch’s Shift touch screen selection technique. (a) small targets are occluded by a user’s finger; (b) Shift reveals occluded screen content in a callout displayed above the finger, so that users can fine tune with take-off selection; (c) by adjusting the relative callout location, Shift handles targets anywhere on the screen. (from Vogel & Baudisch, 2007)

Figure 6-3. Simple occlusion-awareness in Apple’s iPhone. (a) a callout re-displaying keyboard selections in a non-occluded area; (b) a “magnifying glass” callout re-displaying the occluded text cursor insertion point.

Hancock and Booth’s Context menu Placement

Hancock and Booth (2004) use a simple technique for context menu placement based on user handedness. Based on experimental data, they suggest that pop-up menus should be placed South-West of the menu invocation location for right-handed users, and South-East for left-handed users. This recommendation echoes the same basic occlusion-avoidance strategy used by the other researchers discussed above, but Hancock and Booth introduced a method to detect handedness automatically, moving closer to the notion of occlusion-awareness.


Brandl et al.’s Occlusion-Aware Pie Menu

Brandl et al.’s occlusion-aware pie menu (2009) rotates to minimize occlusion according to the user’s hand orientation (Figure 6-4). This is achieved by tracking the pen location and hand contact point on a multi-touch table. The tabletop context motivates the technique since user orientation relative to the table is largely unknown. Although their implementation uses a multi-touch table, pen tilt could provide similar hand-to-pen orientation information. However, they do not conduct a user evaluation, so it is unknown whether the technique reliably functions as intended or provides a tangible performance benefit. In addition, like Hancock and Booth, Brandl et al. only identify occlusion in the immediate vicinity of the pen as it pertains specifically to positioning a menu widget.

Figure 6-4. Brandl et al.’s occlusion-aware pie menu. Pen location and hand contact point are tracked independently using a multi-touch table. Based on a pen-to-hand reference orientation, the menu is rotated about the pen location to minimize occlusion. (from Brandl et al., 2009)

Interaction Techniques for Other Types of Occlusion

Relevant to our work is research investigating occlusion caused by GUI elements, physical objects, and shadows cast by front projectors. Shen et al. (2005) note that hand occlusion is compounded by “pop-up” user interface elements such as menus, tool-tips, and toolbars which also occlude potentially important display space. Ramos et al.’s Tumble and Splat (2006) and Dragicevic’s fold-and-drop technique (2004) are two techniques which address occluding windows or user interface objects, but both require explicit user interaction.


Cotting and Gross’s environment-aware display bubbles (2006) distort the display to avoid physical objects and arm shadows caused by the beam of a front-projected interactive table. Leithinger and Haller (2007) present a menu which unveils options along a user drawn path to avoid physical objects cluttering a tabletop. Bezerianos, Dragicevic, and Balakrishnan’s Mnemonic Rendering (2006) buffers hidden pixel changes and re-displays them later when they are no longer hidden. This could occur when a window covers interface updates and is finally moved out of the way, or when the user turns their head back to a previously active area. The authors identify physical hand occlusion as one motivation for their Mnemonic Rendering technique, but their focus and prototype implementations only identify pixels which are hidden by overlapping windows or out of the user’s field of view. However, the technique of buffering and re-displaying interface changes may be appropriate for our technique in some scenarios.

6.2 Occlusion-Aware Interfaces

An occlusion-aware interface knows which portions of the display are currently occluded by the hand, and uses this knowledge to counteract potential problems and make better use of the hidden area. There are two key issues that must be resolved before occlusion-aware interaction techniques can be designed and implemented. First, a sufficiently accurate representation of the occluded area must be determined in real time with available sensors. We achieve this using our configurable geometric model described in the previous chapter. The second issue is identifying what kinds of problems are caused by occlusion. For this, we draw on the results of our initial observational study of Tablet PC usability and the experiments investigating occlusion performance.

Occlusion-Related Usability Problems

Recall that in our observational study of Tablet PC usability discussed in chapter 3, we found that occlusion likely contributed to user errors, led to fatigue, and forced inefficient movements:


1. Missed Status Messages. Participants missed occluded system status messages which can lead to errors caused by mismatched user and system states.

2. Missed Previews. Real time document previews were often occluded when using a formatting toolbar which led to this feature going unnoticed, or again, leading to errors from mismatched user and system states.

3. Occlusion Contortion. Participants arched their wrist when adjusting options so they could simultaneously monitor otherwise occluded document changes.

4. Inefficient Movements. When dragging, participants made movement deviations past or away from the intended target when it was occluded.

The first three of these issues relate to cases where important content is occluded by the hand. Missed status messages and missed previews occur when the user does not know that important content is occluded. Occlusion contortion is a coping mechanism also observed by Inkpen et al. (2006). In Experiment 4-3 in chapter 4, we found that occlusion contortion can reduce performance in a simultaneous monitoring task.

6.3 Occlusion-Aware Viewer

The Occlusion-Aware Viewer technique addresses three out of four issues we identified in previous studies and experiments, and provides a good case study of related research problems when developing occlusion-aware techniques. Later in this chapter, we present designs for three more occlusion-aware interaction techniques: an extension to Hancock and Booth’s work (2004) for displaying non-occluded menus, tooltips, and other types of temporary layers explicitly invoked by the user; an occlusion-aware dragging technique to reduce inefficient movements when dragging towards an occluded area; and a technique which makes use of the occluded area. The Occlusion-Aware Viewer displays otherwise missed previews and status messages in a non-occluded area using a bubble-like callout (demonstrated in Video 6-1). As the user interacts (Figure 6-5a), our system uses our configured real-time model to track which display areas are currently occluded (Figure 6-5b), and compares this with the current state of the interface. If something important is occurring in the interface in an occluded region


(Figure 6-5c), the Viewer re-displays that important information in a non-occluded area (Figure 6-5d). The occlusion model is also used to guide callout positioning.

Figure 6-5. Occlusion-Aware Viewer demonstration. (a) as the user interacts; (b) a configured real-time model tracks the area currently occluded by the hand; (c) image processing techniques detect that an important region is occluded; (d) the important region is displayed in another, non-occluded location.

Video 6-1. Occlusion-Aware Viewer demonstration (Vogel_Daniel_J_201006_PhD_video_6_1.mp4, 01:03).

Rather than require application programmers to inform us what is important (Ishak & Feiner, 2004), we use an application-agnostic image processing layer. We look for regions which are dynamically changing, and consider these important. To automatically determine what areas are dynamically changing, the system monitors the visual state of the interface


using simple image processing techniques. Unlike Bezerianos et al.’s Mnemonic Rendering (2006), we identify the changes as they happen and re-display them without any time shift. This is an important distinction because users often need to monitor document previews as they manipulate parameters elsewhere, or use status messages to confirm immediate actions. When choosing the location of the callout, we attempt to keep it near its content source while also trying not to cover other important areas of the display. Background distortion (Cotting & Gross, 2006) is an alternative display technique, but this could become distracting with frequent region changes. The identification of display changes and locating the callout are related research problems which we had to solve to realize the full technique.

Detecting Importance through Interface Changes

Compared to processing real-world images, the uniformity, clarity, and restricted visual domain of a GUI make image analysis more viable. We consider this a proof-of-concept. It actually works very well, but some changes are not always important (e.g., constant feedback of cursor position in a drawing program) and should be filtered out. Other techniques like texture analysis or object recognition could improve importance identification and further filter out false positives. Truong and Abowd (2004) also perform image processing on screen captures in real time for their classroom capture and access application. They compute the percentage of pixels which differ on two consecutive screen captures of the projector display to determine when slides change – without instrumenting the slide presentation software. We extend their idea and apply it to an interaction technique with more detailed region identification. Our change detection process can be conceptually divided into two distinct steps: change detection mask creation, and then occluded region identification.

Change Detection Mask Creation

The change detection mask is a binary image of the entire user interface which labels pixels which are changing with a “0” and pixels which are not changing with a “1” (see Figure 6-6a). The change detection mask is created using the following three steps:


1. Screen Capture: The entire screen is captured in Red Green Blue (RGB) space at 5 Hz and scaled to 30% to reduce subsequent processing requirements. Since we reduce the scale of the screen capture, the final change detection mask actually identifies groups of pixels as changing. Note also that the screen capture does not include our technique’s bubble callouts if they are currently shown.

2. Average Image Accumulator: The screen capture is added to a running average image accumulation buffer with an alpha weight of 0.5. This means that each pixel of the current screen capture is multiplied by 0.5 and added to the corresponding pixel of the image accumulation buffer which itself is multiplied by 0.5. This average image is used as an estimate of what the current display looked like before the current screen capture – it slowly changes to absorb changes over time. Note that a lower weight amplifies and prolongs changes and a higher weight filters out more short duration, subtle changes.

3. Change Detection: The absolute difference of the screen capture and the accumulation buffer is calculated and converted to greyscale. The difference is then thresholded to a binary image using a cut-off of 8 (out of 255 possible luminance values for each pixel). We arrived at this cut-off by experimentation: at 5 Hz, pixel intensity must change at least 3% to be detected. Finally, to reduce noise and merge nearby regions, we apply 10 iterations of morphological dilation and erosion (with a 3 × 3 structuring element) (Dougherty, 1992).
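The following Python/OpenCV sketch mirrors these three steps with the same constants. It is illustrative only (our prototype is written in C#), and for convenience it marks changing pixels as 255 rather than 0.

    import cv2
    import numpy as np

    class ChangeDetector:
        def __init__(self, scale=0.3, alpha=0.5, threshold=8, morph_iters=10):
            self.scale = scale
            self.alpha = alpha
            self.threshold = threshold
            self.iters = morph_iters
            self.accum = None                              # running average image (float32)

        def update(self, frame_bgr):
            # 1. scale the incoming screen capture to 30%
            small = cv2.resize(frame_bgr, None, fx=self.scale, fy=self.scale).astype(np.float32)
            if self.accum is None:
                self.accum = small.copy()
            # 2. running average estimates what the display looked like previously
            cv2.accumulateWeighted(small, self.accum, self.alpha)
            # 3. threshold the absolute difference, then dilate/erode to merge regions
            diff = cv2.absdiff(small, self.accum)
            grey = cv2.cvtColor(diff.astype(np.uint8), cv2.COLOR_BGR2GRAY)
            _, mask = cv2.threshold(grey, self.threshold, 255, cv2.THRESH_BINARY)
            kernel = np.ones((3, 3), np.uint8)
            mask = cv2.dilate(mask, kernel, iterations=self.iters)
            mask = cv2.erode(mask, kernel, iterations=self.iters)
            return mask                                    # 255 where pixels are changing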

Occluded Region Identification

We identify important occluded regions with the image space operations described below, but this could also be done at a geometric level. Currently, we pick a single best region, but this could be extended to multiple regions (and thus, multiple callouts).

1. Occlusion Mask (Figure 6-6b): A second accumulation buffer is used as a mean occlusion mask. At 5 Hz, the rendered model is added to the buffer with a 0.3 alpha weight; a 5 × 5 blur is applied, and the result is thresholded with a cut-off of 128. This means that occlusion is detected when the hand and arm remain still for a moment, rather than when sweeping across the display.


2. Identify Occluded Regions (Figure 6-6c): Using the change detection mask and occlusion mask, we find bounding boxes of regions which are at least 40% occluded. Very small or very large regions are removed: areas less than 256 px² (the area of a small icon) or more than 25% of the display; width or height less than 16 px, or more than 50% of the smallest display side. Also, regions which are within 16 px of the cursor are removed – this eliminates false positives when dragging or selecting text, and proved to be very important.

3. Final Region Selection (Figure 6-6d): The remaining region with the largest area is selected. For consistency, if a region was identified on the previous iteration, and it overlaps with this one, the union of the two regions is used.
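A Python sketch of these selection steps is shown below, using connected components over the change mask. The helper structure, coordinate space, and use of OpenCV are assumptions made for the illustration; the filtering constants follow the text.

    import cv2
    import numpy as np

    def select_occluded_region(change_mask, occlusion_mask, cursor_xy, display_wh,
                               prev_region=None):
        # change_mask, occlusion_mask: binary uint8 images (255 = changing / occluded).
        num, labels, stats, _ = cv2.connectedComponentsWithStats(change_mask)
        disp_w, disp_h = display_wh
        cur_x, cur_y = cursor_xy
        best = None
        for i in range(1, num):                               # label 0 is the background
            x, y, w, h, area = stats[i]
            patch = occlusion_mask[y:y + h, x:x + w]
            if np.count_nonzero(patch) / float(w * h) < 0.4:  # keep regions >= 40% occluded
                continue
            if area < 256 or area > 0.25 * disp_w * disp_h:   # very small or very large
                continue
            side = 0.5 * min(disp_w, disp_h)
            if w < 16 or h < 16 or w > side or h > side:
                continue
            if x - 16 <= cur_x <= x + w + 16 and y - 16 <= cur_y <= y + h + 16:
                continue                                      # too close to the cursor
            if best is None or area > best[4]:
                best = (x, y, w, h, area)
        if best is None:
            return None
        bx, by, bw, bh, _ = best
        if prev_region is not None:                           # merge with overlapping previous region
            px, py, pw, ph = prev_region
            if bx < px + pw and px < bx + bw and by < py + ph and py < by + bh:
                x0, y0 = min(bx, px), min(by, py)
                return (x0, y0, max(bx + bw, px + pw) - x0, max(by + bh, py + ph) - y0)
        return (bx, by, bw, bh)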


Figure 6-6. Detecting importance and callout positioning. A change detection mask (a) and occlusion mask (b) identify regions which are more than 40% occluded (regions #1, #2, #3, #4, #7) (c); occluded regions which are very small or large (#4, #7) or too close to the pen position P (#1) are also removed and the largest remaining region selected (#2) (d); the callout is positioned by optimizing an objective function over a small set of candidate positions (e); the callout is visible (f).

Callout Visibility and Positioning

We update the callout state after importance detection.

Callout Positioning

We want to find a non-occluded callout position close to the actual region, but not covering anything else important. In early tests, we found that once visible, it is important to keep the callout position stable. A simple objective function expresses these qualities:

f = w1 · (d1 / d) + w2 · (d2 / d) + w3 · overlap    (6-1)


where d1 is the distance from the callout centre to the region centre, d2 is the distance from the last callout centre, d is a constant to normalize the distance terms, and overlap is the percentage of callout area occluded or covering other important regions. Two sets of weights are used: when the callout was previously hidden, w1 = 0.3, w2 = 0.0, w3 = 0.7; otherwise, w1 = 0.1, w2 = 0.3, w3 = 0.6. We experimented with finding a global minimum, empirically the best position, but the visible result for the user could be very inconsistent and unstable. Instead, we consider a small number of possible positions which are typically not occluded by the hand or arm, and use the objective function to find the best one. We use six candidate directions relative to the region centre (W, SW, S, N, NE, NW – which are flipped for left-handed users), and two possible distances (250 and 350 px) (Figure 6-6e). This is fast to compute, and makes callout positions predictable. Of course, with so few possibilities, there are times when poor callout positions are selected, but in practice it works surprisingly well. A hybrid approach, such as using a small set of candidate positions to initialize a local optimization step to “fine tune” the position, could also be used.
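The sketch below illustrates this candidate-based evaluation of Equation 6-1 in Python. The overlap estimate is passed in as an assumed callback, the sixth compass direction is taken to be NW, and the screen-coordinate conventions are assumptions of the example rather than details of our implementation.

    import math

    CANDIDATE_DIRECTIONS = [180, 225, 270, 90, 45, 135]   # W, SW, S, N, NE, NW (degrees)
    CANDIDATE_DISTANCES = [250, 350]                      # px

    def place_callout(region_centre, last_centre, overlap_at, d_norm=400.0,
                      was_hidden=True, right_handed=True):
        # overlap_at(pos) -> fraction of a callout at pos that is occluded or
        # covers other important regions (a hypothetical helper for this sketch).
        w1, w2, w3 = (0.3, 0.0, 0.7) if was_hidden else (0.1, 0.3, 0.6)
        rx, ry = region_centre
        best_pos, best_score = None, float('inf')
        for angle_deg in CANDIDATE_DIRECTIONS:
            if not right_handed:
                angle_deg = 180 - angle_deg               # mirror directions for left-handed users
            a = math.radians(angle_deg)
            for dist in CANDIDATE_DISTANCES:
                pos = (rx + dist * math.cos(a), ry - dist * math.sin(a))  # screen y is down
                d1 = dist                                  # distance to the region centre
                d2 = math.hypot(pos[0] - last_centre[0],
                                pos[1] - last_centre[1]) if last_centre else 0.0
                score = w1 * d1 / d_norm + w2 * d2 / d_norm + w3 * overlap_at(pos)
                if score < best_score:
                    best_pos, best_score = pos, score
        return best_pos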

Callout Visibility

If the callout is hidden, and a region has been found in a consistent location for at least 333 ms, the callout is made opaque and visible (Figure 6-6f). If the callout was visible but no region is found, its opacity begins to decrease, completely hiding it after 1 second. Delaying visibility reduces spurious callouts and fading before hiding helps convey the sensitivity of the detection algorithm.
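A minimal sketch of this show/fade behaviour is given below (Python). The per-frame update structure and the simplification of “consistent location” to a single boolean are assumptions of the example.

    class CalloutVisibility:
        SHOW_DELAY = 0.333   # seconds a region must persist before the callout appears
        FADE_TIME = 1.0      # seconds to fade out once no region is found

        def __init__(self):
            self.stable_time = 0.0
            self.opacity = 0.0

        def update(self, region_found, dt):
            # region_found: True if a region was detected in a consistent location
            # this frame; dt: elapsed time in seconds since the last update.
            if region_found:
                self.stable_time += dt
                if self.stable_time >= self.SHOW_DELAY:
                    self.opacity = 1.0
            else:
                self.stable_time = 0.0
                self.opacity = max(0.0, self.opacity - dt / self.FADE_TIME)
            return self.opacity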

6.4 Experiment 6-1: Occlusion-Aware Viewer Evaluation

In Experiment 4-3, we found that during a simultaneous monitoring task, if the region to be monitored is occluded, people contort their hand posture to minimize this occlusion and task times suffer. The Occlusion-Aware Viewer technique is designed to compensate for this, and should result in a nearly uniform task time regardless of where the region to be monitored is located. However, in that previous experiment, the way we randomized the task may have led to a confounding effect. There was a weak correlation between task time and the physical distance between the initial and target scrollbar thumb positions. In this new


experiment, we revised the simultaneous monitoring task to remove the possibility of this confound. In addition, the revised task requires slightly more pointing accuracy, and the experiment design explores more intermediate target box positions. Our experiment has three main goals:

1. Validate that our Occlusion-Aware Viewer technique mitigates occlusion contortion and its effect.

2. Confirm that occlusion contortion increases the task duration.

3. Test whether occlusion contortion decreases accuracy.

Note that this experiment was performed immediately after Experiment 5-1, with the same 15 participants. We use the 12 right-handed participants for primary analysis, but discuss observations from the 3 left-handed participants afterward.

Participants

12 right-handed people (4 female, 8 male) with a mean age of 22.3 (SD 3.7) participated. All participants were pre-screened for color blindness. Participants had little experience with direct pen input, but this is acceptable since we are observing a relatively simple style of interaction.

Apparatus

The experiment used the same apparatus set-up as Experiment 4-3: a Wacom Cintiq 12UX direct input pen tablet in portrait orientation and a small head-mounted video camera to record the entire experiment. The Cintiq tablet has a 307 mm (12.1 inch) diagonal display and a resolution of 1280 by 800 px (261 × 163 mm), creating a device resolution of 4.9 px/mm (125 DPI). It was supported at an angle close to 12 degrees off the desk, oriented towards the participant. Participants were seated in an adjustable office chair with the height adjusted so that the elbow formed a 90 degree angle when the forearm was on the desk. The head-mounted video camera recorded the entire experiment at 640 × 480 px resolution and 15 frames-per-second. The camera is attached to a head harness using hook-and-loop strips making it easy to move up or down so that it can be positioned close to the center of the eyes, without interfering with the participants’ line of sight.


Printed fiducial markers were attached around the bezel of the tablet to enable us to transform the point-of-view frames to a standardized image perspective for analysis.

Task

A sequence of simultaneous monitoring task trials was presented with and without the Occlusion-Aware Viewer technique enabled. The full Occlusion-Aware Viewer technique was used – including the real-time occlusion model, importance detection, and automated callout placement – since our objective was to test the full technique. As in Experiment 4-3, this is a controlled version of a real-life document preview task. The user adjusts a numeric value with a slider widget (Figure 6-7a) until it matches a target value displayed in a feedback box located elsewhere (Figure 6-7b). Each trial begins with a successful tap on a 13.0 mm (64 px) circular start target located near the lower right of the display. Once tapped, the slider and feedback box are revealed. The participant acquires the slider thumb and drags it left or right until the current value matches the target value. After the thumb is held at the matching position for 500 ms, the trial ends with a satisfying tick sound.

Figure 6-7. Simultaneous monitoring task. A slider (a) is used to match values displayed in a feedback box (b) (in this example, the current value is -4 and target is 10); (c) the feedback box is positioned at one of 13 radial positions around the centre of the slider.


The slider is located near the centre of the display, oriented horizontally, 20.4 × 3.3 mm (100 × 16 px) in size with a 3.3 mm (16 px) square drag-able thumb. The values are displayed in a 36 pt font inside the 22.4 × 12.2 mm (110 × 60 px) feedback box, which is positioned at 13 different radial locations along a 40.8 mm (200 px) arc from the centre of the slider at 15° increments (Figure 6-7). The distance between the slider thumb start and target positions is 34 px (6.9 mm) and the target dock width is 2 px (0.4 mm). With a 100 px wide slider, this means that the slider range is fixed at 50 and the difference between start and target values is always 17. This was done to avoid the confounding effect from unequal docking task distances in Experiment 4-3.

Note that the task is designed to require more precise movements to complete. Unlike most sliders, the pen tip must stay inside the slider track. Also, once the thumb is acquired, the pen tip has to remain on the display until the trial ends. If the participant misses the slider thumb, or any of these conditions are violated, an error is logged along with audio and visual feedback.

We also did not want participants to use the visual slider thumb position to locate the target value, so we took steps to ensure that they had to monitor the displayed value. First, the slider's numeric scale, direction, and start position are randomized for each trial. The minimum slider value is randomly chosen to be a number between -99 and 49. Second, the slider value had to stop at the target value for more than 500 ms before any correct match feedback appeared. Third, target values were selected so they were never the minimum or maximum slider value. Finally, to hide the consistent target distances, 6 extra trials with random distances are inserted but excluded from analysis. During the experiment we did not observe anyone "cheating" – all performed the simultaneous monitoring task as intended.

Participants were asked to immediately recover from errors and continue the task until completion. This prevents rushing through the experiment, but most importantly, it enables us to include error recovery time in overall task time. We also delay the appearance of the slider and target value box until after the start target is selected so that the time used by the participant to adjust their posture to accommodate occlusion is included in the trial.

The dashed border of the feedback box is animated as the value changes ("marching ants" feedback). This indicates that the displayed value is being adjusted and acts as a hint to the Viewer technique's display change algorithm. This is a concession forced by our artificial monitoring task: the user must match a requested target value, and watching the current value only would require memorizing the target before each task and would add more variance.

In addition to using a constant target distance in the task, there are other refinements over Experiment 4-3 to increase accuracy, capture the time to plan a posture, and remove unessential portions of the task. The slider is thinner and shorter than in Experiment 4-3, where it was 13.0 × 26.3 mm (64 × 128 px) with a 4.0 mm (20 px) thumb. In Experiment 4-3, participants had to hold the thumb at the target position for 250 ms and then press a continue button. In this version, we removed the unnecessary continue button and instead required that the thumb be held for 500 ms. This created a situation where once the slider thumb was acquired, the entire task had to be completed without lifting the pen. In Experiment 4-3, the position of the target box was shown before the start button was tapped. In this version, we hide the slider and target box until after the start button is tapped so task time includes time to prepare and implement a posture strategy.
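For clarity, the Python sketch below generates slider values under the stated constraints (range fixed at 50, a constant start-to-target difference of 17, target never at either end of the scale, direction randomized). It is only an illustration of the randomization, not our experiment software.

    import random

    def generate_trial():
        minimum = random.randint(-99, 49)        # randomized numeric scale
        maximum = minimum + 50                   # slider range fixed at 50
        direction = random.choice([-1, 1])       # randomized drag direction
        while True:
            start = random.randint(minimum, maximum)
            target = start + direction * 17      # constant start-to-target difference
            if minimum < target < maximum:       # never the minimum or maximum value
                return {'min': minimum, 'max': maximum, 'start': start, 'target': target}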

Design

A repeated measures within-participant factorial design was used with two independent variables: Technique and Angle. The two Technique conditions were: with the Occlusion-Aware Viewer (Viewer) and without (Baseline). The target value display was positioned at 13 Angles from -90° to 90° in 15° increments, 40.8 mm (200 px) from the centre of the slider (Figure 6-7c). These factors were selected to increase the resolution of positions from Experiment 4-3. Note that in Experiment 4-3, the target box was located on a 31.0 mm (150 px) radius arc from the right side of the slider. We added an extra 10 mm to compensate for the arc's shift to the slider centre. Presentation of Technique was counter-balanced across participants. Each Technique consisted of four consecutive Blocks with each block presenting all 13 Angles in random order. As explained above, 6 additional special non-timed trials were inserted in each block to prevent participants from recognizing the consistent target value difference of 17 in timed trials. At the beginning of each Technique, a short demonstration and practice block was presented. The entire experiment took 20 to 30 minutes to complete.


Video 6-2. Occlusion-Aware Viewer experiment demonstration (Vogel_Daniel_J_201006_PhD_video_6_2.mp4, 00:34).

In summary, the experimental design was: 2 Techniques (Viewer, Baseline) × 13 Angles × 4 Blocks = 104 trials per participant

Data Preparation

We removed 22 outlier trials (1.7% of 1248 total trials) which had a task time more than 3 SD away from the cross-participant mean for the corresponding Technique and Angle. We did not find any pattern in these outliers except that 17 had one or more errors. All had times greater than the mean.

Results

There are two primary dependent variables:

Errors: Since a participant could encounter multiple errors during a single trial, our error measure is the mean number of error occurrences per trial.

Completion Time: This is the total time from successful selection of the start target until the end of the trial (when the target value has been matched and held for 500 ms).

Note that completion time includes all trials regardless of whether errors are encountered. Unlike experiments measuring low level movements such as Fitts’ law target selection, our task is inherently more complex and the time to recover from errors is a natural part of task completion.


Repeated measures analysis of variance (ANOVA) showed that order of presentation of Technique had no significant effect on time or errors, indicating that a within-subjects design was appropriate.

Learning and/or Fatigue Effects

A 2 × 4 (Technique × Block) within subjects ANOVA found a significant main effect for

Block on task time (F3,30 = 5.602, p < .01) indicating the presence of a learning effect. Post hoc analysis revealed that Block 1 was significantly slower than the other 3 blocks (p < .2), so Block 1 was not included in subsequent analysis.

Mean Number of Errors

Since a participant could encounter multiple errors during a single trial, our error measure is the mean number of error occurrences per trial. We aggregated errors by Angle across blocks 2, 3, and 4 to perform a 2 × 13 (Technique × Angle) within subjects ANOVA. There was a significant main effect for Angle (F12,120 = 2.649, p < .01) and a Technique × Angle interaction (F12,120 = 2.810, p < .01). A post hoc multiple means comparison of the interaction found that at an Angle of 15°, the Baseline technique had more errors per task than Viewer (0.694 vs. 0.097 respectively, p = 0.09).

Completion Time

We aggregated completion time by Angle across blocks 2, 3, and 4 to perform a 2 × 13 (Technique × Angle) within subjects ANOVA. There was a significant main effect for Angle (F12,120 = 5.918, p < .001) and a Technique × Angle interaction (F12,120 = 5.912, p < .001). The Technique × Angle interaction is most relevant (Figure 6-8c): a post hoc multiple means comparison of Technique at each Angle found Viewer faster at -30° and -15°, but slower at 45° (all p < .05). In summary, Viewer was 16% faster than Baseline at -30° (4.8 vs. 5.7 s), 23% faster at -15° (5.0 vs. 6.7 s), but 24% slower at 45° (5.6 vs. 4.5 s).

Note: all post hoc analyses use the Bonferroni adjustment.


Figure 6-8. Completion times of Technique by Angle. (a) Baseline by Angle; (b) Viewer by Angle: circled Angle data points are significantly greater (p < .05) than one or more other Angles; (c) comparison of Baseline and Viewer by Angle: circled Angle data points are significantly greater (p < .05) than the same Angle for the other Technique.

A post hoc multiple means comparison of Angle for each Technique also found significant differences. For Baseline, the mean time at -15° was slower than 75°, 60°, 45°, 30°, 15°, 0°, and -60° (Figure 6-8a). For Viewer, the mean time at 45° was slower than 90°, 60°, 15°, and -90° (Figure 6-8b).

Participant Rating

At the end of the experiment, participants were asked to rate the two techniques based on their perception of their speed, fewest errors, comfort, ease-of-use, and least tiring. The numeric rating scale ranges from -1 to 1 where -1 means that Viewer is better, 1 means that Baseline is better, and 0 means no difference. The results suggest that participants rated the Viewer technique as somewhat better in all categories (Figure 6-9). Ratings for fewer errors, comfort, and least tiring are all clustered near -0.5, a medium measure of benefit. Several participants commented that the hand contortion required by Baseline was uncomfortable and error-prone, and that the Viewer technique seemed to help. Viewer ease-of-use and speed were favourable, but ranked less strongly due to occasional inconsistencies in callout position and visibility.


Figure 6-9. Participant ratings for speed, fewer errors, comfort, ease of use, and least tiring (scale from -1, Viewer better, to 1, Baseline better). The vertical mark is the mean; the shaded area is the standard deviation.

Discussion

Effect of Occlusion Contortion on Performance

The significant effect of angular position of the monitored target value box on task time supports previous qualitative observations regarding hand contortion (Inkpen et al., 2006). The poorest performance near -15° supports our observation from Experiment 4-1 that the occluded area is high relative to the pen position, but compared to Experiment 4-3, which had significant differences at -30°, 0°, and 30°, the range of significant differences between angles is very small. Given the large occluded area of the hand, we expected to see a similar pattern of time differences. The reason for fewer differences between angles may be that our participants in this experiment had larger inter-participant variance of occlusion silhouettes, strategies, and dexterity. As an example, we give individual task times and silhouettes for participants 3 and 10 (Figure 6-10). The baseline silhouettes below -60° and above 60° capture a neutral posture, but in between, one can see different strategies used such as arching above or twisting below. Task times for participant 3 suggest broader problems (ranging from -30° to 15°) compared to participant 10 (-15° to 0°). The silhouettes suggest why: participant 3 has more posture variance and mixed contortion strategies across a broader range of angles, perhaps due to their larger hand grip. Regarding dexterity, comparing their baseline silhouettes to time profiles indicates these participants are capable of slight contortion to peer around their hand to counteract occlusion. With a larger preview area, or in the case of missed status messages, this ability may not apply.


Figure 6-10. Sample task completion times and occlusion silhouettes. (a) participant 3; (b) participant 10. The silhouettes for Baseline and Viewer conditions are indicated by colour of reference line (Viewer is above Baseline for each participant example). Silhouettes are captured at the end of each task in blocks 2 to 4 and image processed using the technique outlined in Experiments 4-1 and 4-3.

Performance of Occlusion-Aware Viewer

At first, it may seem that a technique which essentially repositions the target value box in a non-occluded area would produce consistent task times across angular positions. Of course, this assumes the technique has no cognitive overhead and the model, importance detection, and callout positioning work perfectly in all situations. With lower task times at angles 0° and -15°, we know that when the Viewer technique is working well, it can mitigate occlusion. However, the 45° task time spike suggests further refinement. To investigate this issue, we reviewed the point-of-view video logs for trials where the feedback box was near 45°. We found this was often an ambiguous zone for the occlusion model, creating more frequent false negatives and false positives. With false negatives, the feedback box may really be occluded, but no callout appears (e.g., Figure 6-11a); or the callout appears, but may be placed in an occluded position. With false positives, in spite of an un-occluded feedback box, a callout appears (e.g., Figure 6-11b) which can be distracting


– especially in mid-task. The worst case is when ambiguity creates callout visibility and position vacillation (e.g., Figure 6-11c). Note also that some participants experienced this kind of ambiguity elsewhere (e.g., participant 10’s time at -60°, Figure 6-10).

Figure 6-11. Ambiguity problems when the feedback box is at Angle 45°. (a) the feedback box is occluded, but no callout appears (participant 3, block 4, time 11.10.39); (b) the feedback box is not occluded, but a callout appears anyway (participant 12, block 2, time 15:55.33); (c) ambiguity causing callout position to vacillate (participant 6, block 4, times 19:38.47 – 19:40.28).

Many participants commented on the sometimes unpredictable position and visibility of the callout in spite of preferring it to having no technique at all. We discussed earlier how we had already improved the callout layout algorithm for predictability. The layout objective function could be further tuned to increase the penalty for callout movements regardless of a slight increase in occlusion. An additional term could encourage new callouts to appear as close as possible to previous ones, especially if little time has passed. Overall, we think that users prefer callout consistency, even if this causes some slight occlusion.

More generally, the same high variance in participant grip and dexterity prevented more statistical differences over a broader range of angles. As an example, participants 3 and 10 show very different task time profiles across angles for the Viewer compared to the Baseline (Figure 6-10). What is clear is the consistent hand posture with the Viewer technique. This suggests that they trusted the technique enough to simply start adjusting the slider – and expected that the callout would appear if needed.

Left-Handed Participants

We ran a small pilot with two left-handed participants to see if our results for right-handed participants hold for left-handed participants as well. This is too small for any conclusive quantitative analysis, but we will briefly compare individual left-handed participant results. Overall, the occlusion silhouettes (Figure 6-12) show a similar type of pattern as the right-handed participants (Figure 6-10): silhouettes for Viewer are quite consistent, but silhouettes for Baseline exhibit more variance in both cases. However, changes to posture strategy in the Baseline condition are quite subtle for left-handed participant 2 (Figure 6-12b). Completion times appear somewhat noisier overall, perhaps due to the skill of these two participants (for example, note that completion times for left-handed participant 1 are almost twice as great as for left-handed participant 2). The times for left-handed participant 1 (Figure 6-12a) do have a somewhat similar pattern to participant 10 (Figure 6-10b) with the Viewer experiencing more ambiguity when just above the hand. Overall, it appears that the Viewer does reduce posture contortion for these left-handed participants, but any conclusion regarding overall performance would require a full left-handed study to be performed.


Figure 6-12. Left-handed sample task completion times and occlusion silhouettes. (a) left-handed participant 1; (b) left-handed participant 2. The silhouettes for Baseline and Viewer conditions are indicated by colour of reference line (Viewer is above Baseline for each participant example). Silhouettes are captured at the end of each task in blocks 2 to 4 and image processed using the technique outlined in Experiments 4-1 and 4-3.

6.5 Deployment Issues and Future Directions

The Occlusion-Aware Viewer was designed to make integration with current GUIs straightforward and could, in theory, be deployed without making any changes to existing applications, GUI window management, or display driver code. This is because we use image processing to identify what is important rather than relying on a specific protocol. However, our prototype implementation would have to be re-written, and the change detection thresholds will likely have to be increased to make the technique more conservative by reducing the chance of false positives. Our prototype is implemented using C# and the Windows Presentation Foundation (WPF) library, so the current capability for cross-process screen capture, input hooking, and event injection is limited (Microsoft, "Hooks"). However, based on an initial investigation, we believe that re-implementing in C++ would make lower-level operating system Application Programming Interfaces available and enable cross-application capture, hooking, and injection. We accomplished something similar in terms of hooking and injection with C++ for the Rubber Edge prototype device driver (Casiez, Vogel, Pan, & Chaillou, 2007) and were successful in applying it globally across the Windows GUI. However, capturing the display has become somewhat more complex due to the introduction of the Desktop Window Manager with Windows Vista (Microsoft, "DWM").

Another factor for deployment is the memory and processing overhead for screen capture and image processing. Our prototype runs on a 3.1 GHz dual processor machine with 4 GB of RAM, more than what is offered on most Tablet PCs currently. Even with this machine configuration, we found it necessary to scale the screen capture to 30% before converting the bitmap for OpenCV and to limit all image processing steps to only 5 Hz. However, with optimized C++ code for critical capture, scaling, and region identification, the memory and processing requirements could be significantly reduced.

As future work, we see improvements for the Occlusion-Aware Viewer technique such as multiple simultaneous callouts (though this may be somewhat distracting) and refinements to the algorithms for callout visibility and location with an emphasis on stability and consistency. Mnemonic Rendering (Bezerianos et al., 2006) could also be used when the occluded area is not a real-time preview, such as system alerts.

6.6 Other Occlusion-Aware Techniques

Based on our initial results with the Occlusion-Aware Viewer technique, we have already begun exploring designs and prototypes for other Occlusion-Aware techniques. These techniques all utilize our underlying model of the occluded area, and most build on the image processing techniques used with the Occlusion-Aware Viewer.

Occlusion-Aware Dragging

An Occlusion-Aware Dragging technique (Figure 6-13) could address the fourth occlusion-related issue identified by our observational study and confirmed in the dragging task in Experiment 4-2: the problem of inefficient movements when dragging. By using the model to detect when the user is dragging the cursor into an occluded area, the area in front of the cursor could be displayed in a non-occluded callout.


Figure 6-13. Occlusion-Aware Dragging. (a) the user begins to drag towards the currently occluded area; (b) a callout opens to reveal the occluded area in front of the cursor.

We have already built an initial prototype to explore different callout dynamics and behaviours (Video 6-3). For instance, the callout could remain anchored at the initial invocation location, maintain a constant offset from the current pointer location, or move in a viscous manner depending on pointer velocity and how far the callout is trailing behind – perhaps acting similarly to a Trailing Widget (Forlines et al., 2006). Other behaviours we have begun exploring include expanding the size of the callout in the direction of movement based on velocity. We have also experimented with adjusting the focal point of the callout (the location of the pointer as displayed in the callout) in a similar way based on pointer velocity and direction, as well as effects like velocity-dependent zooming (Igarashi & Hinckley, 2000). Other variables include when to display the callout, how to smoothly animate the callout opening so the user can continue to visually track the same point of interest, and when the callout should disappear. There are also possibilities to tailor callout behaviour depending on the underlying widget. For example, when dragging to select a line of text, the callout could be a very wide rectangle; but, when dragging a file icon, the callout could be round.

Video 6-3. Occlusion-Aware Dragging technique demonstration (Vogel_Daniel_J_201006_PhD_video_6_3.mp4, 00:27).

Occlusion-Aware Pop-Ups

Hancock and Booth’s work (2004) could be extended to include hierarchical menus, tooltips, and other types of temporary pop-ups. Before a pop-up is shown, it can be checked


for occlusion and, if necessary, moved to a non-occluded area near the invocation point (Figure 6-14). Our importance detection techniques could also be used to prevent occluding other important or dynamic areas. Our findings regarding the importance of callout stability and predictability with the Occlusion-Aware Viewer would apply to Occlusion-Aware Pop-Ups as well. When moving the position of a menu or sub-menu to avoid occlusion, the adjusted position should not be too far from the invocation point and the chosen location should remain fixed once displayed.

Figure 6-14. Occlusion-Aware Pop-Ups. (a) the user invokes a menu; (b) the default location of the "align" sub-menu would be occluded by the hand, but its location is moved to a non-occluded location.

Hidden Widget

To take advantage of the otherwise occluded area, we envision a Hidden Widget (Figure 6-15) reminiscent of a Trailing Widget (Forlines et al., 2006). A Hidden Widget floats underneath the user's hand until a ballistic movement "traps" it, opening a context menu. This reduces the Trailing Widget's visual distraction while maintaining the benefit of a visible gesture. An obvious limitation is when the pen location is at the extreme right or bottom of the display: there would be no place to locate the widget since very little of the display is occluded. A simple solution is to train users to intentionally move the pen so that their hand once again occludes a portion of the display, and invoke the Hidden Widget then. This would of course introduce an additional time penalty. If this technique can be satisfactorily implemented, it could be a viable alternative to the customary press-and-hold delay used to invoke context menus. Unlike a standard context menu, the invocation would occur away from the object context. The context object would be the last object selected just before the Hidden Widget is invoked. This would need to be communicated to the user with a well-designed visualization, perhaps similar to the coloured bands used to communicate a similar message with the Context-Rooted Rotatable Draggables tabletop technique (Shen et al., 2005).

Figure 6-15. Hidden Widget. (a) the Hidden Widget is in the occluded area; (b) a quick movement traps the Hidden Widget and opens a context menu.

6.7 Summary

While previous researchers have considered occlusion in their designs, and even incorporated simple rules-of-thumb for menu placement to compensate, our configurable model, user interface image processing technique for change detection, simple callout positioning, and experimental results demonstrate that a broader class of occlusion-aware interface techniques is plausible. Our intention is that the Occlusion-Aware Viewer provides a case study for building other occlusion-aware interface techniques. This technique is motivated by occlusion-related issues observed in the observational study in chapter 3, and investigated more closely in Experiment 4-3 in chapter 4. We built this technique using the occlusion model described in chapter 5; for the most part, this worked as intended, but it is not infallible – especially when occlusion itself is ambiguous. We conducted an experiment with a simultaneous monitoring task, similar to Experiment 4-3, to evaluate the performance of the Occlusion-Aware Viewer. We found the Viewer can decrease task time by up to 23% when the value to be monitored is in an often occluded position; but it also increased time by 24% in one position where occlusion was ambiguous and the model became unstable, which in turn created unstable callout visibility and location. In spite of this, our participants rated our technique as better than no technique. In addition, in the baseline condition without the Occlusion-Aware Viewer, we confirmed and refined the results of Experiment 4-3.


We applied image processing within an interaction technique, analyzing the interface itself in real time. This was surprisingly robust and enabled our technique to operate without special instrumentation of the underlying GUI layer. However, we also identified detection issues during our experiment: subtly changing regions can be missed (false negatives), or the wrong region can be identified (false positives). Whether these can be eliminated entirely, or what level of error is acceptable to users, remains to be investigated. The inclusion of more image processing layers, such as simple texture analysis or trained object recognition, would likely create a better estimate of what is important. Finally, we briefly describe designs and ideas for other occlusion-aware interface techniques: Occlusion-Aware Dragging, which displays the portion of an occluded region just in front of the cursor in a non-occluded callout when the user drags into that region; Occlusion-Aware Pop-Ups, which extend Hancock and Booth's context menu work (2004) to locate hierarchical menus, tooltips, and other types of temporary pop-ups in non-occluded areas; and a Hidden Widget, which could take advantage of the otherwise occluded area.
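
As a hedged illustration of the general change-detection idea (not our exact pipeline; the use of OpenCV and the thresholds below are assumptions made for the sketch), frame differencing with morphological cleanup can locate changing screen regions; regions that fall inside the modelled occluded area then become candidates for display in the callout:

    import cv2
    import numpy as np

    def changed_regions(prev_frame, cur_frame, min_area=100):
        """Find screen regions that changed between two captured frames.

        Simplified sketch: threshold the absolute difference, clean it up with
        morphological operations, and return bounding rectangles of the
        remaining connected regions. (OpenCV 4.x.)
        """
        diff = cv2.absdiff(prev_frame, cur_frame)
        gray = cv2.cvtColor(diff, cv2.COLOR_BGR2GRAY)
        _, mask = cv2.threshold(gray, 15, 255, cv2.THRESH_BINARY)

        kernel = np.ones((5, 5), np.uint8)
        mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)   # drop speckle noise
        mask = cv2.morphologyEx(mask, cv2.MORPH_CLOSE, kernel)  # join nearby pixels

        contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                       cv2.CHAIN_APPROX_SIMPLE)
        return [cv2.boundingRect(c) for c in contours
                if cv2.contourArea(c) >= min_area]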


7 Conclusions

To our knowledge, this dissertation provides the first detailed exploration of the ubiquitous phenomenon of hand occlusion with direct pen input. Our decision to focus on occlusion followed directly from the results of an initial observational study using a realistic scenario and actual software applications. These two aspects – general issues with direct pen input with a GUI and the problem of hand occlusion – fulfill our research objective stated at the beginning:

Identify issues with direct pen interaction with a conventional GUI, and improve the experience by investigating, modelling, and addressing hand occlusion.

To achieve our results, we utilize a novel collection of methods: quantitative and qualitative study design, multi-faceted logging, qualitative analysis, image-based analysis, task visualization, optimization-based analytical testing, and user interface image processing. Of course, there is more work to be done on the subject of hand occlusion, but our hope is that this dissertation provides motivation, fundamental results, and potential applications relating to occlusion – as well as a set of techniques to assist future researchers in this area and beyond. In this final chapter, we provide a summary of our work, summarize its limitations and assumptions, discuss opportunities for future research, and offer final conclusions.


7.1 Summary

The chapters of this dissertation form a progressive research path: an initial exploration of direct pen input, followed by a focus on the important, but previously under-researched, phenomenon of occlusion.

In chapter 1, we introduce the area of direct pen input by discussing past research directions and the disappointing performance of these devices in the mass market. We argue that this may be partly because the graphical user interfaces (GUIs) used by the most popular operating systems on these devices are designed for the mouse. Moreover, although a GUI designed specifically for pen input would improve the experience, this may not be feasible given the large amount of existing software and the prevalence of convertible tablets, where users can switch between using a pen and a mouse.

In chapter 2, we survey background literature relating to pen manipulation and digital pen input. This includes work sampled from the fields of psychology, physiology, ergonomics, commercial technology, and human-computer interaction. We conclude that although direct pen technology has been available for many decades, it has not been widely used for general computer input. This is in spite of the human hand's capability for controlling a writing instrument and research suggesting that pen input may be more natural – possibly even faster – than conventional input such as the mouse. However, we note that researchers typically study direct pen input in isolation using low-level tasks.

In chapter 3, we report our results from an observational study of direct pen interaction with realistic tasks and common GUI software applications. We capture a rich set of logging data including 3-D motion capture, video taken from the participants' point-of-view, screen capture video, and pen events such as movement and taps. For qualitative analysis, we use an adapted open coding approach (Strauss & Corbin, 1998) which includes a preliminary step of identifying important events before performing the actual coding. Our results reveal overarching problems with direct pen input: poor precision when tapping and dragging; errors caused by hand occlusion; instability and fatigue due to ergonomics and reach; cognitive differences between pen and mouse usage; and frustration due to limited input capabilities. We feel that these issues can be addressed by improving hardware, base interaction, and widget behaviour without sacrificing the consistency of current GUIs and applications. Moreover, previous
research has focused on issues other than occlusion, yet our results suggest that occlusion also has a profound effect on the usability of direct pen input.

In chapter 4, we explore fundamental aspects of direct pen occlusion in three controlled experiments. In the first experiment (Experiment 4-1), we use a novel combination of video recording, computer vision marker tracking, and image processing techniques to capture images of the hand and arm as they appear from the point-of-view of the user. We call these images occlusion silhouettes. Our analyses of these silhouettes find that the hand and arm can occlude as much as 47% of a 12 inch display and that the shape of the occluded portion of the display varies across participants according to the style and size of their pen grip. In the second experiment (Experiment 4-2), we examine the effect of occlusion when performing three fundamental GUI interactions: tapping, dragging, and tracing. Our results show that although it is difficult to control for occlusion within a single direct input context, there is reasonable evidence that occlusion has an effect on these fundamental GUI interactions. In the third experiment (Experiment 4-3), we investigate how participants contort their hand posture to minimize occlusion while performing a simultaneous monitoring task. We find that posture contortion reduces performance, and discuss how different participants use different posture contortion strategies. Based on these experiments, we propose a set of guidelines for designers and researchers regarding how to avoid the occluded area. One guideline for areas to avoid near the pen is illustrated using a mean occlusion silhouette. This may be reasonable in some situations, but we note it does not capture individual grip characteristics, and the image-based format is difficult to communicate and use in a real-time setting.

In chapter 5, we design and evaluate a simplified representation of the occluded area: a configurable model of occlusion. It comprises a five parameter geometric model – an offset circle and a pivoting rectangle – which captures the general shape of the occluded area. Using a simple four-step configuration process, we show how the model can be adapted to conform to different user grips and anatomical differences, and then updated in real time based only on pen position and, optionally, pen tilt. Using analytical precision-recall tests on occlusion silhouettes, we show that the model provides a sufficient amount of fidelity. A small user study (Experiment 5-1) demonstrated that participants could successfully complete the model configuration steps, and identified areas for further refinement. Finally, we provide
examples of three ways to use our model: to analyze the interaction of occlusion in controlled experiments; as a simple, quantitative design guideline to minimize occlusion; and as an enabling technology to create real-time occlusion-aware interaction techniques.

In chapter 6, we introduce the notion of occlusion-aware interfaces. These are interaction techniques which know what area of the display is currently occluded, and use this knowledge to counteract potential problems with occlusion and/or utilize the hidden area. To track occlusion, we use the configurable model of occlusion developed in chapter 5. As a case study, we present a fully realized design for an Occlusion-Aware Viewer interaction technique which displays otherwise missed previews and status messages in a non-occluded area using a bubble-like callout. This technique is motivated by occlusion-related issues identified in the observational study in chapter 3, and further explored in Experiment 4-3 in chapter 4. To realize this technique at a base interaction level, and for it to remain compatible with current GUIs, we use image processing to monitor which regions of the interface are changing, and use this to recognize occluded status messages and document previews. We conducted a controlled experiment (Experiment 6-1) using a simultaneous monitoring task, similar to Experiment 4-3, to evaluate the performance of the Occlusion-Aware Viewer. Our results show that the Viewer can decrease task time by up to 23% when the value to be monitored is in an often occluded position; but, in one position where occlusion is ambiguous, the technique can also hurt performance. In spite of this, our participants rated our technique as better than no technique. We discuss ways to refine our technique and briefly describe designs and ideas for other occlusion-aware interface techniques.
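
For reference, the core of the analytic precision-recall test summarized above can be sketched as follows, where both masks are boolean images at display resolution (an illustrative reimplementation, not our original analysis code):

    import numpy as np

    def precision_recall(model_mask, silhouette_mask):
        """Compare a rendered occlusion-model mask against a captured silhouette.

        Both arguments are boolean 2-D arrays where True marks occluded pixels.
        Precision is the fraction of the model's area that is truly occluded;
        recall is the fraction of the truly occluded area that the model covers.
        """
        model = np.asarray(model_mask, dtype=bool)
        actual = np.asarray(silhouette_mask, dtype=bool)

        true_positive = np.logical_and(model, actual).sum()
        predicted = model.sum()
        actual_total = actual.sum()

        precision = true_positive / predicted if predicted else 0.0
        recall = true_positive / actual_total if actual_total else 0.0
        return precision, recall

A combined score, such as an F-measure over these two quantities, could then drive the optimization-based geometric fitting.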

7.2 Assumptions and Limitations

To make this research tractable, we make a number of assumptions which in turn impose possible limitations. We summarize them here so that our results may be further tested and verified before applying them to other device contexts or specific populations.

The observational study in chapter 3 necessarily includes a certain bias due to our choice of test scenario. Our aim was to select common applications and a realistic task, and we took steps to include a diverse set of interactions. Yet, by simulating a realistic task, we were forced to make choices in our scenarios which may impact our results. Our hope is
that future researchers may conduct similar studies with different scenarios but similar analysis, which would further increase the validity of our results.

As with any examination of human behaviour, the results of our studies and experiments are drawn from a population of participants who were practically available to us. We took steps to include a reasonable diversity of age and gender, but this was not always feasible. For example, in Experiment 4-1, we had to remove an older participant who was an outlier in terms of performance time. We also counterbalance and randomize presentation orders and use statistical tests of significance where appropriate to avoid ad hoc quantitative judgements. In the observational experiment in chapter 3, we did control for levels of Tablet PC and general computing experience. However, in the experiments relating to occlusion, there are individual factors of grip style and anatomical size which we did not explicitly control for. If these experiments were conducted with a group of participants with a different anthropometric or age bias, a different cultural background which influenced pen grip or writing style (e.g., a logographic writing system or a right-to-left writing direction), or specialized training (e.g., illustrators, painters, calligraphers), our results for the mean occlusion shapes would be somewhat different. However, based on consistencies in human anatomy and hand capability, we believe our five parameter geometric model can accommodate any population, but this remains untested.

Related to this is that we used right-handed participants for primary analysis in all experiments. We did discuss results for a limited number of left-handed participants in Experiment 4-1, Experiment 4-3, Experiment 5-1, and Experiment 6-1. However, we did not test enough left-handed participants to draw any quantitative conclusions. Hancock and Booth (2004) did perform a full quantitative comparison of left-handed and right-handed participants, and found that their results were for the most part mirrored. Our limited left-handed results suggest the same trend, but could be validated with a full handedness study. Note that our occlusion model, and therefore also our Occlusion-Aware Viewer, were designed and implemented to accommodate both left-handed and right-handed users.

In Experiment 4-1, we discuss limitations due to our head-mounted camera apparatus for capturing occlusion silhouettes. We elected not to use a stereo camera, and because the single camera is offset above the participant's eyes, it shifts the silhouettes downward slightly. We include a detailed description of our error estimate in that section. Related to this
are limitations resulting from our image processing techniques for analysis. Video capture lag, log synchronization errors, and fiducial marker tracking error may shift the captured silhouette. The image processing techniques may also add or remove pixels from the occluded area during the morphological operations used to reduce noise. It is unlikely that these limitations significantly distort the shape of the occluded area, but they may affect quantities such as the size of the occluded area.

In chapter 5, we discuss assumptions and limitations of our configurable model. We assume a display size similar to that used for our data gathering experiments, that the tablet remains in a stable portrait orientation relative to the user, that users maintain a near constant grip regardless of task, and that the user's head remains in a near constant location relative to the display. In practice, we do not believe these undermine the integrity of the configurable model. For the applications we envision, some inaccuracy may be acceptable and still produce suitable results. However, extreme differences such as a change in orientation would need to be accommodated, and we did observe ambiguity problems in Experiment 6-1 which may be partly due to these limitations.

7.3 Future Research

Based on the work presented in the previous chapters, we believe there are opportunities to perform additional studies, refine our implementations, create additional techniques, and apply our methodology and techniques to other direct input modalities such as touch. At the end of chapter 5, we already discussed specific opportunities to improve the steps for our configurable model, make the model more robust, and adapt it to other input contexts such as touch and larger or horizontal display form factors. Also, in chapter 6, we identify improvements to the Occlusion-Aware Viewer and describe initial designs for other occlusion-aware interface techniques such as Occlusion-Aware Dragging, Occlusion-Aware Pop-Ups, and a Hidden Widget. Rather than reiterate those refinements here, we discuss new opportunities which have not already been identified.

Altering Reality as an Experimental Control for Occlusion

In Experiment 4-2 in chapter 4, we attempted to control for occlusion within a single direct input context using a crosshair visual augmentation. Unfortunately, our results found
that this was not effective. Perhaps this was because participants found the extra visual feedback distracting, or the mental overhead of using the crosshair cancelled out any benefit. Yet, overall, the same experiment provides evidence that occlusion does affect performance. The question is: how can we experimentally remove direct input occlusion, so that we can isolate its effect? To do this, participants would have to "see through their arm" in the non-occlusion condition. One way to do this is to use immersive virtual reality or augmented reality to create with- and without-occlusion conditions. In both cases, a conventional tablet could be used for pen input, but the participant would view the tablet (or a rendering of the tablet) using a head-mounted display. In the case of augmented reality, this would also require a head-mounted camera.

To remove occlusion with augmented reality, targets would be rendered so as to appear in front of the hand and arm, yet in the same plane as the physical tablet display. This would require an accurate object tracking system to synchronize the physical position of the tablet with the participant's head. To eliminate a confounding effect in the with-occlusion condition, the targets would also be rendered and occlusion simulated. The simulation could be done by also tracking the hand and arm, but perhaps a more straightforward (and ultimately more effective) method would be to render their area as seen from the head-mounted camera using simple image processing techniques (similar to the techniques we introduced in chapter 4, but in real time).

With virtual reality, the tablet, pen, hand, arm, and targets are all rendered. This has the benefit of eliminating visible alignment problems between the physical and virtual elements in augmented reality. However, it may require tracking the hand, arm, and pen as well as the tablet. Based on the position of these objects, the targets and the virtual pen, hand, and arm can be rendered to either simulate occlusion or remove it. The actual size and position of the hand and arm could also be normalized across participants, or a more experimentally controlled version of the hand and arm used (such as a rectangular shape).

Virtual and augmented reality have been used in Fitts' Law style studies previously (I. S. MacKenzie & Ware, 1993; Mason, Walji, E. J. Lee, & C. L. MacKenzie, 2001). The current generation of commercially available head-mounted displays boasts very high resolutions and panoramic fields of view. For example, the piSight ("Sensics") has a resolution of 4200 × 1800 pixels per eye and a 180° horizontal field of view. However, using a head-mounted
display is not without problems. For example, in a 3-D Fitts' Law task comparing different virtual reality display devices, researchers found that participants wearing a head-mounted display had lower target selection performance and reported that wearing the HMD was uncomfortable and tiring (C. Lin, Sun, H. Chen, & Cheng, 2009). These issues would need to be considered and tested in detail to ensure that the introduction of a mediated reality does not confound the results.
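
As a hedged sketch of how the hand and arm silhouette might be extracted in real time for the augmented reality condition, assuming the camera frame has already been rectified to display coordinates (for example, with fiducial-marker tracking) and the display shows a known solid background colour; the library choice and thresholds are assumptions made only for illustration:

    import cv2
    import numpy as np

    def hand_silhouette(rectified_frame, background_bgr=(255, 0, 0), tol=60):
        """Segment the hand/arm against a known display background colour.

        Anything sufficiently different from the background colour is treated
        as the occluding hand and arm; morphological operations clean up noise.
        Purely illustrative, not our experimental implementation.
        """
        diff = np.abs(rectified_frame.astype(np.int32) -
                      np.array(background_bgr, dtype=np.int32))
        mask = (diff.sum(axis=2) > tol).astype(np.uint8) * 255

        kernel = np.ones((7, 7), np.uint8)
        mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)
        mask = cv2.morphologyEx(mask, cv2.MORPH_CLOSE, kernel)
        return mask                        # 255 marks occluded pixels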

Pen Specific Widgets for Conventional GUIs

At the end of chapter 3, we also suggested that the behaviour of individual widgets could be tailored for pen input without altering their initial size or appearance, to maintain GUI consistency. The change in widget behaviour could be a subtle refinement of its operation or a transition to a complete paradigm shift.

An example of a subtle refinement of widget behaviour is Dixon et al.'s (2008) crossing-based dialogue boxes. They found that by relaxing the crossing semantic to blur the line between a point-and-click action and a crossing action, they could improve performance, yet still maintain the spatial layout of current dialogue boxes. Note that their relaxed crossing semantic could be applied to any widget that would otherwise be tapped, such as a button, checkbox, or radio button. However, the measurable performance gain in their experiment was due to the rapid, serial crossing selection of multiple widgets in one motion.

Another example of a subtle refinement is for widgets which are often tapped multiple times in quick succession. In the observational study in chapter 3, we noticed this type of behaviour in the paging region of scrollbars and when using a numeric up-down control. We observed that participants were often successful with the first one or two taps, but then began to drift and miss the target. To counteract this behaviour, we have already experimented with a widget enhancement that progressively "inflates" the target size with each subsequent tap, then "deflates" back to the original size when the tapping stops. We call this an Inflatable Widget (Figure 7-1).


Figure 7-1. Inflatable Widget. When a widget is tapped repeatedly, it “inflates”. Once tapping stops, it “deflates”.
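
A minimal sketch of the inflation logic (Python; the growth, decay, and tap-window values are illustrative assumptions):

    import time

    class InflatableWidget:
        """Illustrative inflation logic for a repeatedly tapped widget.

        Each tap within `tap_window` seconds of the previous one grows the
        widget's scale toward `max_scale`; once tapping stops, the widget
        decays back to its original size.
        """
        def __init__(self, grow=0.25, max_scale=2.0, tap_window=0.6,
                     decay_per_second=1.0):
            self.scale = 1.0
            self.grow = grow
            self.max_scale = max_scale
            self.tap_window = tap_window
            self.decay_per_second = decay_per_second
            self.last_tap = None

        def on_tap(self, now=None):
            now = time.monotonic() if now is None else now
            if self.last_tap is not None and now - self.last_tap <= self.tap_window:
                self.scale = min(self.max_scale, self.scale + self.grow)  # inflate
            self.last_tap = now

        def on_frame(self, dt):
            # Deflate smoothly once rapid tapping has stopped.
            if self.last_tap is None or \
               time.monotonic() - self.last_tap > self.tap_window:
                self.scale = max(1.0, self.scale - self.decay_per_second * dt)
            return self.scale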

More radical paradigm shifts may be accommodated, as long as the transition from a conventional GUI widget to the pen-specific technique can be performed seamlessly. For example, there are several alternatives to hierarchical menus, such as the Marking Menu (Kurtenbach & Buxton, 1991a) and its single-flick variations (Zhao, Agrawala, & Hinckley, 2006; Zhao & Balakrishnan, 2004). Although they use an interaction style which is quite different from a standard GUI menu, they could be invoked from a current GUI button on a toolbar, or as a context menu. This same strategy could be used to transition to a continuous gesture mode to enter a parameter value (K. B. Evans, Tanner, & Wein, 1981), or to scroll a document (Moscovich & Hughes, 2004; G. M. Smith & schraefel, 2004). When the general visual layout of the widget is integral to its function, such as the expand-and-collapse tree-view widget, a crossing-based menu design could be used (Apitz & Guimbretière, 2004; Lapizco-Encinas & Rodríguez, 2003). A related example is a crossing-based scrollbar which maintains the spatial layout of a conventional scrollbar, but radically changes its interaction behaviour (Apitz & Guimbretière, 2004). Altering the behaviour of widgets in a GUI without re-writing the window presentation code is a challenge. A potential design and implementation strategy is to use an interface façade, similar to Stuerzlinger et al. (2006), with global "customizations" based on input device rather than user preference.
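
A hedged sketch of what such a façade might look like, with hypothetical names; the point is only that behaviour is selected per input device while the widget's appearance and layout remain untouched:

    class WidgetFacade:
        """Sketch of a per-device behaviour facade (hypothetical API).

        Events are routed to a pen-specific behaviour (e.g., relaxed crossing,
        a marking-menu invocation, or an Inflatable Widget) when the active
        input device is a pen, and to the widget's default behaviour otherwise.
        """
        def __init__(self, widget, pen_behaviour, default_behaviour):
            self.widget = widget
            self.behaviours = {"pen": pen_behaviour, "mouse": default_behaviour}

        def handle_event(self, event):
            behaviour = self.behaviours.get(event.device, self.behaviours["mouse"])
            return behaviour.handle(self.widget, event)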

Touch and Multi-Touch Input

Recently, there has been a considerable amount of interest in another form of direct input, touch input. Like pen input, using our fingers to control a computer directly on the
screen is not new. However, recent multi-touch research (Han, 2005) has received widespread attention, and the unveiling of new commercial devices such as Apple's iPhone and iPad, and Microsoft's Surface, has rekindled the public's imagination. These devices have GUIs designed specifically for touch and multi-touch, but occlusion may still be a problem. Selecting small targets with a finger remains problematic: we discussed above how the iPhone GUI contains simple "occlusion-aware" techniques reminiscent of Vogel and Baudisch's Shift (2007). However, in the case of the small iPhone, even though the entire display could be occluded in some instances, the relatively brief interactions and short distance to an un-occluded home position may reduce occlusion's effect.

There are also general computing devices, such as Hewlett-Packard's TouchSmart, which enable touch input with a (mostly) standard GUI. Microsoft has added touch support to its Windows operating system: Vista introduced basic enhancements to support single touch interaction, such as the "Touch Pointer", and Windows 7 adds native multi-touch support. However, even with these enhancements, it is likely that touch users will experience many of the same problems as pen users. Past research suggests that problems like precision will be even worse (Benko et al., 2006; Sears & Shneiderman, 1991; Vogel & Baudisch, 2007). The area occluded by the hand and arm will likely increase, since it includes the fingers as well.

We believe that the study design and methodology in chapter 3 could be repurposed to investigate touch input in a GUI context. Our experiments in chapter 4 could also be adapted to investigate touch input after such a study, or independently. Using our image processing steps to analyze occlusion silhouettes for touch interaction would be a logical first step. The experimental tasks would need to be expanded to capture different types of hand postures, especially in the case of multi-touch interaction. At the conclusion of chapter 5, we discussed potential extensions to our geometric model to accommodate the shape of touch input occlusion. This may include additional shapes to capture the area occluded by the fingers, and the ability to detect user orientation. Researchers exploring suitable models may wish to re-purpose our analytic model testing methodology to compare different potential models. The notion of occlusion-aware interfaces remains unchanged for touch input, and techniques like the Occlusion-Aware Viewer could be re-used directly once a suitable model for touch input is designed. Moreover, even if multi-touch sensing techniques can capture the shape of the hand, arm, and
fingers when near the surface, an occlusion model may assist computer vision algorithms by identifying relative user orientation and distinguishing different users according to their individually configured occlusion models.

There is also the possibility of combining touch input with pen input (Brandl, Forlines, Wigdor, Haller, & Shen, 2008; Hinckley, Dixon, Sarin, Guimbretiere, & Balakrishnan, 2009; Leitner et al., 2009). Leaked documents describe a dual screen, combined touch and pen input device from Microsoft called Courier (Stone & Vance, 2009); the Courier seems to be a touch-enabled commercial incarnation of Hinckley et al.'s InkSeine pen-based application (2007) and Codex dual-screen prototype device (2009). A device which can be controlled with either touch or pen input raises a problem similar to that of convertible Tablet PCs, which can be controlled with pen or mouse input: any changes to accommodate one form of input need to remain compatible with the other.

The excitement over touch input is reminiscent of the excitement over commercial pen-based computers in the early 1990s (and then again in 2002). If our findings regarding direct pen input have taught us anything, it is that touch and multi-touch are unlikely to be relevant for typical users without a deep understanding of how they perform with conventional GUIs and conventional applications.

Using a Conté Metaphor for More Expressive Pen-like Interaction

Our focus has been on improving pen input at a base interaction level, but we also discussed opportunities for improvements at the hardware level. Fundamental hardware refinements which reduce parallax error, increase input sensitivity, and decrease the bulk of the device should continue to occur with new engineering advancements. With those aside, are there ways to increase the usability of direct pen interaction at a hardware level? One idea is to re-imagine the design and manipulation of the pen itself. We reviewed various ergonomic pen designs in chapter 2 which present one area for improvement. However, we are interested in addressing another problem identified in chapter 3, the issue of limited input. Harnessing additional pen input capabilities such as pressure (Ramos et al., 2004), tilt (Tian et al., 2008), and rotation (Bi et al., 2008) is one avenue for exploration. However, these investigations all maintain the standard physical design of a conventional pen
or pencil – a tool which may not have strong physical affordances to encourage these extra capabilities. Instead, our idea is to create a digital version of an artist's conté stick. A conté stick is a flat-sided crayon which is manipulated to achieve a variety of lines and strokes: the corner is used for thin lines, the end edge for medium strokes, and the long side for broad fill strokes. In addition, artists typically use their finger to blend the strokes, while simultaneously grasping the conté stick. This suggests that a digital conté stick could be a natural fit for use in combination with multi-touch devices, which would further increase its expressivity. Digital conté, combined with a multi-touch surface, builds on these existing physical world techniques by also sensing which corner, which edge, or which side is contacting the surface, along with enabling simultaneous finger contact. This would significantly expand the vocabulary of strokes and modes of use. Such a device could be applied to contexts beyond an artist's drawing program – conté can be a rich mode of input for conventional GUIs: one corner could be used to write text like a pen (Figure 7-2a), another corner could act as cursor control (Figure 7-2b), the short side could enter gestures (Figure 7-2c), the long side could be used to scroll (Figure 7-2d), and if the stick is laid down, it could act like a conventional mouse for cursor control with a relative input mapping (Figure 7-2e). With the addition of fingers, the behaviour of a particular mode could be further altered while the conté is still touching the surface: for example, a finger tap near the stick could function as a button click when in mouse mode.


Figure 7-2. Conté manipulations for GUI interaction. (a) using the corner to write; (b) using another corner for cursor control; (c) using the short side to enter gestures; (d) using the long side to scroll; (e) held similar to a mouse flat on the surface for relative cursor control with nearby finger tap as button click.
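
The mode mapping of Figure 7-2 could be expressed as a simple lookup, assuming the sensing hardware can classify which part of the conté stick is in contact; the names below are illustrative, not an implemented API:

    # Hypothetical mapping from sensed conté contact to GUI interaction mode,
    # following Figure 7-2. Contact classification is assumed to come from
    # the sensing hardware.
    CONTE_MODES = {
        "corner_a":   "ink",              # write text like a pen (Figure 7-2a)
        "corner_b":   "cursor",           # absolute cursor control (Figure 7-2b)
        "short_edge": "gesture",          # enter gestures (Figure 7-2c)
        "long_side":  "scroll",           # scroll the document (Figure 7-2d)
        "flat":       "relative_cursor",  # laid down, mouse-like (Figure 7-2e)
    }

    def conte_mode(contact_type, finger_tap_nearby=False):
        mode = CONTE_MODES.get(contact_type, "cursor")
        # A nearby finger tap can further qualify the mode, e.g. acting as a
        # button click while in the mouse-like relative cursor mode.
        if mode == "relative_cursor" and finger_tap_nearby:
            return "relative_cursor_click"
        return mode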

Previous work combining devices with a multi-touch surface focuses on accessing storage (A. D. Wilson, 2005) or acting as a tangible physical control (Rekimoto, Ullmer, & Oba, 2001). Researchers combining pen input with touch have focused on bimanual interaction (Brandl et al., 2008). In fact, current "dual-mode" touch-enabled Tablet PCs cannot work simultaneously with pen and fingers. Leaked demonstration videos of the Microsoft Courier prototype tablet (Stone & Vance, 2009) show cases where both pen and fingers are used to interact with one hand, but not simultaneously. The idea of using pen orientation or contact point to trigger modes has existed in a simple binary form for some time: most current pen tablets sense whether the eraser or the nib is touching the display. A more fully realized example is Rekimoto and Sciammarella's ToolStone (2000), which enables many modes according to the orientation and location of touch on the surface. However, it is an indirect input device which is held in the non-dominant hand and resembles a larger block.


7.4 Final Word

In the text above, we devote a great deal of space to describing problems with direct pen input. To be clear, we do not believe that using a pen to control a computer is fundamentally flawed. There are many situations, such as drawing or writing, where it is arguably superior to any other form of input. Yet, using a pen to interact with a conventional GUI can feel awkward, inefficient, and error-prone. We wished to investigate whether this was due to an inherent incompatibility, or whether the experience could be improved. The problems identified in our observational study are not insurmountable – as we discuss, researchers have offered several possible solutions. However, hand occlusion, perhaps the most idiosyncratic problem with direct pen input, has been largely overlooked. Our subsequent focus on hand occlusion demonstrates how a deeper understanding of this phenomenon can motivate techniques which improve the direct pen input experience with a conventional GUI, and perhaps in other computing contexts as well. This alone constitutes a relevant contribution, but it may be our contributions to quantitative and qualitative study design, multi-faceted logging, qualitative analysis, image-based analysis, task visualization, optimization-based analytical testing, and user interface image processing that provide the broadest utility to future researchers and practitioners.


References

Accot, J., & Zhai, S. (1997). Beyond Fitts' law: models for trajectory-based HCI tasks. In Proceedings of the SIGCHI conference on Human factors in computing systems (pp. 295-302). Atlanta, Georgia, United States: ACM. Accot, J., & Zhai, S. (1999). Performance evaluation of input devices in trajectory-based tasks: an application of the steering law. In Proceedings of the SIGCHI conference on Human factors in computing systems: the CHI is the limit (pp. 466-472). Pittsburgh, Pennsylvania, United States: ACM. Accot, J., & Zhai, S. (2002). More than dotting the i's --- foundations for crossing-based interfaces. In Proceedings of the SIGCHI conference on Human factors in computing systems: Changing our world, changing ourselves (pp. 73-80). Minneapolis, Minnesota, USA: ACM. Agarawala, A., & Balakrishnan, R. (2006). Keepin' it real: pushing the desktop metaphor with physics, piles and the pen. In Proceedings of the SIGCHI conference on Human Factors in computing systems (pp. 1283-1292). Montréal, Québec, Canada: ACM. Ahlstroem, D., Alexandrowicz, R., & Hitz, M. (2006). Improving menu interaction: a comparison of standard, force enhanced and jumping menus. In Proceedings of the SIGCHI conference on Human Factors in computing systems (pp. 1067-1076). Montréal, Québec, Canada: ACM. Albinsson, P., & Zhai, S. (2003). High precision touch screen interaction. In Proceedings of the SIGCHI conference on Human factors in computing systems (pp. 105-112). Ft. Lauderdale, Florida, USA: ACM. doi:10.1145/642611.642631 Aliakseyeu, D., Irani, P., Lucero, A., & Subramanian, S. (2008). Multi-flick: an evaluation of flick-based scrolling techniques for pen interfaces. In Proceeding of the twenty-sixth annual SIGCHI conference on Human factors in computing systems (pp. 1689-1698). Florence, Italy: ACM. Apitz, G., & Guimbretière, F. (2004). CrossY: a crossing-based drawing application. In Proceedings of the 17th annual ACM symposium on User interface software and technology (pp. 3-12). Santa Fe, NM, USA: ACM. ARToolKit. (n.d.). . Retrieved from http://www.hitl.washington.edu/artoolkit/ Audet, S., & Cooperstock, J. (2007). Shadow Removal in Front Projection Environments Using Object Tracking. In Computer Vision and Pattern Recognition, 2007. CVPR '07. IEEE Conference on (pp. 1-8). Presented at the Computer Vision and Pattern Recognition, 2007. CVPR '07. IEEE Conference on. doi:10.1109/CVPR.2007.383470 Autodesk. (2007). SketchBook Pro. Retrieved November 29, 2007, from http://www.autodesk.com/sketchbookpro


Bae, S., Balakrishnan, R., & Singh, K. (2008). ILoveSketch: as-natural-as-possible sketching system for creating 3d curve models. In Proceedings of the 21st annual ACM symposium on User interface software and technology (pp. 151-160). Monterey, CA, USA: ACM. doi:10.1145/1449715.1449740 Baecker, R. M. (1969). Picture-driven animation. In Proceedings of the May 14-16, 1969, spring joint computer conference (pp. 273-288). Boston, Massachusetts: ACM. doi:10.1145/1476793.1476838 Balakrishnan, R., & MacKenzie, I. S. (1997). Performance differences in the fingers, wrist, and forearm in computer input control. In Proceedings of the SIGCHI conference on Human factors in computing systems (pp. 303-310). Atlanta, Georgia, United States: ACM. Baudisch, P., & Rosenholtz, R. (2003). Halo: a technique for visualizing off-screen objects. In Proceedings of the SIGCHI conference on Human factors in computing systems (pp. 481-488). Ft. Lauderdale, Florida, USA: ACM. doi:10.1145/642611.642695 Benko, H., Wilson, A. D., & Baudisch, P. (2006). Precise selection techniques for multi- touch screens. In Proceedings of the SIGCHI conference on Human Factors in computing systems (pp. 1263-1272). Montréal, Québec, Canada: ACM. doi:10.1145/1124772.1124963 Bezerianos, A., Dragicevic, P., & Balakrishnan, R. (2006). Mnemonic rendering: an image- based approach for exposing hidden changes in dynamic displays. In Proceedings of the 19th annual ACM symposium on User interface software and technology (pp. 159- 168). Montreux, Switzerland: ACM. doi:10.1145/1166253.1166279 Bi, X., Moscovich, T., Ramos, G., Balakrishnan, R., & Hinckley, K. (2008). An exploration of pen rolling for pen-based interaction. In Proceedings of the 21st annual ACM symposium on User interface software and technology (pp. 191-200). Monterey, CA, USA: ACM. Bieber, G., Abd Al Rahman, E., & Urban, B. (2007). Screen Coverage: A Pen-Interaction Problem for PDA's and Touch Screen Computers. In Wireless and Mobile Communications, 2007. (p. 87). Presented at the ICWMC '07. doi:10.1109/ICWMC.2007.78 Bishop, C. M. (1995). Neural networks for pattern recognition. Oxford University Press. Brandl, P., Forlines, C., Wigdor, D., Haller, M., & Shen, C. (2008). Combining and measuring the benefits of bimanual pen and direct-touch interaction on horizontal interfaces. In Proceedings of the working conference on Advanced visual interfaces (pp. 154-161). Napoli, Italy: ACM. doi:10.1145/1385569.1385595 Brandl, P., Leitner, J., Seifried, T., Haller, M., Doray, B., & To, P. (2009). Occlusion-aware menu design for digital tabletops. In Proceedings of the 27th international conference extended abstracts on Human factors in computing systems (pp. 3223-3228). Boston, MA, USA: ACM. doi:10.1145/1520340.1520461 Bricklin, D. (2002, November 22). About Tablet Computing Old and New. www.bricklin.com. Retrieved November 28, 2007, from http://www.bricklin.com/tabletcomputing.htm


Briggs, R. O., Dennis, A. R., Beck, B. S., & Nunamaker, J. F. (1993). Whither the pen-based interface. J. Manage. Inf. Syst, 9(3), 71-90. Buxton, W. A. S. (1995). Chunking and phrasing and the design of human-computer dialogues. In Human-computer interaction: toward the year 2000 (pp. 494-499). Morgan Kaufmann Publishers Inc. Buxton, W. A. S., Fiume, E., Hill, R., Lee, A., & Woo, C. (1983). Continuous Hand-Gesture Driven Input. In Proceedings of Graphics Interface '83, 9th Conference of the Canadian Man-Computer Communications Society (pp. 191-195). Presented at the Graphics Interface, Edmonton, AB. Buxton, W. A. S., Sniderman, R., Reeves, W., Patel, S., & Baecker, R. (1979). The Evolution of the SSSP Score Editing Tools. Computer Music Journal, 3(4), 14-25. Cao, X., & Zhai, S. (2007). Modeling human performance of pen stroke gestures. In Proceedings of the SIGCHI conference on Human factors in computing systems (pp. 1495-1504). San Jose, California, USA: ACM. Card, S. K., Moran, T. P., & Newell, A. (1986). The Psychology of Human-Computer Interaction (1st ed.). CRC. Casiez, G., Vogel, D., Balakrishnan, R., & Cockburn, A. (2008). The Impact of Control- Display Gain on User Performance in Pointing Tasks. Human-Computer Interaction, 23(3), 215-250. Casiez, G., Vogel, D., Pan, Q., & Chaillou, C. (2007). RubberEdge: reducing clutching by combining position and rate control with elastic feedback. In Proceedings of the 20th annual ACM symposium on User interface software and technology (pp. 129-138). Newport, Rhode Island, USA: ACM. Charness, N., Holley, P., Feddon, J., & Jastrzembski, T. (2004). Light Pen Use and Practice Minimize Age and Hand Performance Differences in Pointing Tasks. Human Factors: The Journal of the Human Factors and Ergonomics Society, 46(3), 373-384. doi:10.1518/hfes.46.3.373.50396 Chatty, S., & Lecoanet, P. (1996). Pen computing for air traffic control. In Proceedings of the SIGCHI conference on Human factors in computing systems: common ground (pp. 87- 94). Vancouver, British Columbia, Canada: ACM. Chau, T. (2006). A Novel Instrument for Quantifying Grip Activity During Handwriting. Archives of Physical Medicine and Rehabilitation, 87(11), 1542-1547. Chen, A. (2004, May 24). Tablet PCs Pass Inspection. eWeek.com. Retrieved from http://www.eweek.com Chris Baber. (2006). Cognitive aspects of tool use. Applied Ergonomics, 37(1), 3 - 15. Comparison of Postures from Pen and Mouse Use – An Ergonomic Study. (1998). . Global Ergonomic Technologies. Retrieved from http://www.wacom-europe.com/int/use- it/ergonomics/index.asp?lang=en


Cotting, D., & Gross, M. (2006). Interactive environment-aware display bubbles. In Proceedings of the 19th annual ACM symposium on User interface software and technology (pp. 245-254). Montreux, Switzerland: ACM. Daniel Leithinger. (2007, October). Improving Menu Interaction for Cluttered Tabletop Setups with User-Drawn Path Menus. text, . Davis, M. R., & Ellis, T. O. (1964). The RAND tablet: a man-machine graphical communication device. In Proceedings of the October 27-29, 1964, fall joint computer conference, part I (pp. 325-331). San Francisco, California: ACM. doi:10.1145/1464052.1464080 Debbas, C. G. (1995, June 20). United States Patent: D359508 - Ergonomic pen. Dimond, T. L. (1958). Devices for reading handwritten characters. In Papers and discussions presented at the December 9-13, 1957, eastern joint computer conference: Computers with deadlines to meet (pp. 232-237). Washington, D.C.: ACM. doi:10.1145/1457720.1457765 Dixon, M., Guimbretière, F., & Chen, N. (2008). Maximizing Efficiency in Crossing-Based Dialog Boxes. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems. Florence, Italy: ACM. Dougherty, E. R. (1992). An Introduction to Morphological Image Processing. Bellingham, Wash., USA. Dragicevic, P. (2004). Combining crossing-based and paper-based interaction paradigms for dragging and dropping between overlapping windows. In Proceedings of the 17th annual ACM symposium on User interface software and technology (pp. 193-196). Santa Fe, NM, USA: ACM. Echtler, F., Huber, M., & Klinker, G. (2008). Shadow tracking on multi-touch tables. In Proceedings of the working conference on Advanced visual interfaces (pp. 388-391). Napoli, Italy: ACM. doi:10.1145/1385569.1385640 Elliott, J. M. 1., & Connolly, K. J. (1984). A classification of manipulative hand movements. Developmental Medicine & Child Neurology. Vol 26(3). Enstrom, E. A. (1962). The Relative Efficiency of the Various Approaches to Writing with the Left Hand. The Journal of Educational Research, 55(10), 573-577. Evans, K. B., Tanner, P. P., & Wein, M. (1981). Tablet-based valuators that provide one, two, or three degrees of freedom. In Proceedings of the 8th annual conference on Computer graphics and interactive techniques (pp. 91-97). Dallas, Texas, United States: ACM. Fischer, S. R. (2001). A History of Writing. Reaktion Books. Fitts, P. M. (1954). The information capacity of the human motor system in controlling the amplitude of movement. Journal of Experimental Psychology, 47(6), 381-391. Fitzmaurice, G., Khan, A., Pieké, R., Buxton, W. A. S., & Kurtenbach, G. (2003). Tracking menus. In Proceedings of the 16th annual ACM symposium on User interface software and technology (pp. 71-79). Vancouver, Canada: ACM.


Fitzmaurice, G. W., Balakrishnan, R., Kurtenbach, G., & Buxton, W. A. S. (1999). An exploration into supporting artwork orientation in the user interface. In Proceedings of the SIGCHI conference on Human factors in computing systems: the CHI is the limit (pp. 167-174). Pittsburgh, Pennsylvania, United States: ACM. Forlines, C., & Balakrishnan, R. (2008). Evaluating tactile feedback and direct vs. indirect stylus input in pointing and crossing selection tasks. In Proceeding of the twenty-sixth annual SIGCHI conference on Human factors in computing systems (pp. 1563-1572). Florence, Italy: ACM. Forlines, C., Vogel, D., & Balakrishnan, R. (2006). HybridPointing: fluid switching between absolute and relative pointing with a direct input device. In Proceedings of the 19th annual ACM symposium on User interface software and technology (pp. 211-220). Montreux, Switzerland: ACM. Forsberg, A., Dieterich, M., & Zeleznik, R. (1998). The music notepad. In Proceedings of the 11th annual ACM symposium on User interface software and technology (pp. 203-210). San Francisco, California, United States: ACM. Gajos, K., & Weld, D. S. (2004). SUPPLE: automatically generating user interfaces. In Proceedings of the 9th international conference on Intelligent user interfaces (pp. 93- 100). Funchal, Madeira, Portugal: ACM. Gallenson, L. (1967). A graphic tablet display console for use under time-sharing. In Proceedings of the November 14-16, 1967, fall joint computer conference (pp. 689- 695). Anaheim, California: ACM. doi:10.1145/1465611.1465703 Gorbunov, A. E. (1995, February 21). United States Patent: 5391010 - Writing device. Greer, T., & Lockman, J. J. (1998). Using writing instruments: invariances in young children and adults. Child Development, 69(4), 888(15). Gross, M. D., & Do, E. Y. (1996). Ambiguous intentions: a paper-like interface for creative design. In Proceedings of the 9th annual ACM symposium on User interface software and technology (pp. 183-192). Seattle, Washington, United States: ACM. Grossman, T., Hinckley, K., Baudisch, P., Agrawala, M., & Balakrishnan, R. (2006). Hover widgets: using the tracking state to extend the capabilities of pen-operated devices. In Proceedings of the SIGCHI conference on Human Factors in computing systems (pp. 861-870). Montréal, Québec, Canada: ACM. Guiard, Y., Beaudouin-Lafon, M., & Mottet, D. (1999). Navigation as multiscale pointing: extending Fitts' model to very high precision tasks. In Proceedings of the SIGCHI conference on Human factors in computing systems: the CHI is the limit (pp. 450-457). Pittsburgh, Pennsylvania, United States: ACM. Guimbretière, F., & Winograd, T. (2000). FlowMenu: combining command, text, and data entry. In Proceedings of the 13th annual ACM symposium on User interface software and technology (pp. 213-216). San Diego, California, United States: ACM. Gurley, B. M., & Woodward, C. E. (1959). Light-Pen Links Computer to Operator. Electronics, 85-87.


Haider, E., Luczak, H., & Rohmert, W. (1982). Ergonomics investigations of work-places in a police command-control centre equipped with TV displays. Applied ergonomics, 13(3), 163-70. Han, J. Y. (2005). Low-cost multi-touch sensing through frustrated total internal reflection. In Proceedings of the 18th annual ACM symposium on User interface software and technology (pp. 115-118). Seattle, WA, USA: ACM. doi:10.1145/1095034.1095054 Hancock, M. S., & Booth, K. S. (2004). Improving menu placement strategies for pen input. In Proceedings of Graphics Interface 2004 (pp. 221-230). London, Ontario, Canada: Canadian Human-Computer Communications Society. Harris, J. (2005, October 6). Saddle Up to the MiniBar. An Office User Interface Blog. Retrieved October 20, 2008, from http://blogs.msdn.com/jensenh/archive/2005/10/06/477801.aspx Harrison, C., & Dey, A. K. (2008). Lean and zoom: proximity-aware user interface and content magnification. In Proceeding of the twenty-sixth annual SIGCHI conference on Human factors in computing systems (pp. 507-510). Florence, Italy: ACM. doi:10.1145/1357054.1357135 Hilliges, O., Izadi, S., Wilson, A. D., Hodges, S., Garcia-Mendoza, A., & Butz, A. (2009). Interactions in the air: adding further depth to interactive tabletops. In Proceedings of the 22nd annual ACM symposium on User interface software and technology (pp. 139- 148). Victoria, BC, Canada: ACM. doi:10.1145/1622176.1622203 Hinckley, K., Baudisch, P., Ramos, G., & Guimbretière, F. (2005). Design and analysis of delimiters for selection-action pen gesture phrases in scriboli. In Proceedings of the SIGCHI conference on Human factors in computing systems (pp. 451-460). Portland, Oregon, USA: ACM. Hinckley, K., Cutrell, E., Bathiche, S., & Muss, T. (2002). Quantitative analysis of scrolling techniques. In Proceedings of the SIGCHI conference on Human factors in computing systems: Changing our world, changing ourselves (pp. 65-72). Minneapolis, Minnesota, USA: ACM. Hinckley, K., Dixon, M., Sarin, R., Guimbretiere, F., & Balakrishnan, R. (2009). Codex: a dual screen tablet computer. In Proceedings of the 27th international conference on Human factors in computing systems (pp. 1933-1942). Boston, MA, USA: ACM. doi:10.1145/1518701.1518996 Hinckley, K., Guimbretière, F., Agrawala, M., Apitz, G., & Chen, N. (2006). Phrasing techniques for multi-stroke selection gestures. In Proceedings of Graphics Interface 2006 (pp. 147-154). Quebec, Canada: Canadian Information Processing Society. Hinckley, K., Guimbretière, F., Baudisch, P., Sarin, R., Agrawala, M., & Cutrell, E. (2006). The springboard: multiple modes in one spring-loaded control. In Proceedings of the SIGCHI conference on Human Factors in computing systems (pp. 181-190). Montréal, Québec, Canada: ACM.


Hinckley, K., Zhao, S., Sarin, R., Baudisch, P., Cutrell, E., Shilman, M., & Tan, D. (2007). InkSeine:In Situ search for active note taking. In Proceedings of the SIGCHI conference on Human factors in computing systems (pp. 251-260). San Jose, California, USA: ACM. Hourcade, J. P., & Berkel, T. R. (2006). Tap or touch?: pen-based selection accuracy for the young and old. In CHI '06 extended abstracts on Human factors in computing systems (pp. 881-886). Montréal, Québec, Canada: ACM. Hurst, J., Mahoney, M. S., Gilmore, J. T., Roberts, L. G., & Forrest, R. (1989). Retrospectives II: the early years in computer graphics at MIT, Lincoln Lab, andd Harvard. In ACM SIGGRAPH 89 Panel Proceedings (pp. 39-73). Boston, Massachusetts, United States: ACM. doi:10.1145/77276.77280 Igarashi, T., & Hinckley, K. (2000). Speed-dependent automatic zooming for browsing large documents. In Proceedings of the 13th annual ACM symposium on User interface software and technology (pp. 139-148). San Diego, California, United States: ACM. Inkpen, K., Dearman, D., Argue, R., Comeau, M., Fu, C., Kolli, S., Moses, J., et al. (2006). Left-Handed Scrolling for Pen-Based Devices. International Journal of Human- Computer Interaction, 21(1), 91-108. Ishak, E. W., & Feiner, S. K. (2004). Interacting with hidden content using content-aware free-space transparency. In Proceedings of the 17th annual ACM symposium on User interface software and technology (pp. 189-192). Santa Fe, NM, USA: ACM. doi:10.1145/1029632.1029666 Jastrzembski, T., Charness, N., Holley, P., & Feddon, J. (2005). Input devices for Web browsing: age and hand effects. Universal Access in the Information Society, 4(1), 39- 45. Jones, L. (1998). Manual Dexterity. In K. J. Connolly (Ed.), The Psychobiology of the Hand, Clinics in developmental medicine (p. 276). London: Mac Keith Press. Jones, L. A., & Lederman, S. J. (2006). Human Hand Function (1st ed.). Oxford University Press, USA. Jordan, B., & Henderson, A. (1995). Interaction Analysis: Foundations and Practice. The Journal of the Learning Sciences, 4(1), 39-103. Kabbash, P., MacKenzie, I. S., & Buxton, W. (1993). Human performance using computer input devices in the preferred and non-preferred hands. In Proceedings of the INTERACT '93 and CHI '93 conference on Human factors in computing systems (pp. 474-481). Amsterdam, The Netherlands: ACM. doi:10.1145/169059.169414 Kao, H. S., Smith, K. U., & Knutson, R. (1969). An experimental cybernetic analysis of handwriting and penpoint design. Ergonomics, 12(3), 453-458. Kao, H. S., Van Galen, G. P., & Hoosain, R. (1986). Graphonomics: contemporary research in handwriting. Elsevier.


Karat, C., Halverson, C., Horn, D., & Karat, J. (1999). Patterns of entry and correction in large vocabulary continuous speech recognition systems. In Proceedings of the SIGCHI conference on Human factors in computing systems: the CHI is the limit (pp. 568-575). Pittsburgh, Pennsylvania, United States: ACM. doi:10.1145/302979.303160 Kotani, K., & Horii, K. (2003). An analysis of muscular load and performance in using a pen-tablet system. Journal of physiological anthropology and applied human science, 22(2), 89-95. Kristensson, P., & Zhai, S. (2004). SHARK2: a large vocabulary shorthand writing system for pen-based computers. In Proceedings of the 17th annual ACM symposium on User interface software and technology (pp. 43-52). Santa Fe, NM, USA: ACM. doi:10.1145/1029632.1029640 Kroemer, K. H. E., & Grandjean, E. (1997). Fitting The Task To The Human, Fifth Edition: A Textbook Of Occupational Ergonomics (5th ed.). CRC. Kurtenbach, G., & Buxton, W. A. S. (1991a). Issues in combining marking and direct manipulation techniques. In Proceedings of the 4th annual ACM symposium on User interface software and technology (pp. 137-144). Hilton Head, South Carolina, United States: ACM. Kurtenbach, G., & Buxton, W. A. S. (1991b). GEdit: a test bed for editing by contiguous gestures. SIGCHI , 23(2), 22-26. Landsmeer, J. M. F. (1962). Power Grip and Precision Handling. Annals of Rheumatic Diseases, 21(2), 164-170. Lank, E., & Saund, E. (2005). Sloppy selection: Providing an accurate interpretation of imprecise selection gestures. Computers & Graphics, 29(4), 490-500. Lapizco-Encinas, G., & Rodrıguez, A. (2003). CrossEd: Novel Interaction for Pen-Based Systems. University of Maryland. Lee, J. C., Dietz, P. H., Leigh, D., Yerazunis, W. S., & Hudson, S. E. (2004). Haptic pen: a tactile feedback stylus for touch screens. In Proceedings of the 17th annual ACM symposium on User interface software and technology (pp. 291-294). Santa Fe, NM, USA: ACM. Leitner, J., Powell, J., Brandl, P., Seifried, T., Haller, M., Dorray, B., & To, P. (2009). Flux: a tilting multi-touch and pen based surface. In Proceedings of the 27th international conference extended abstracts on Human factors in computing systems (pp. 3211- 3216). Boston, MA, USA: ACM. doi:10.1145/1520340.1520459 Li, Y., Hinckley, K., Guan, Z., & Landay, J. A. (2005). Experimental analysis of mode switching techniques in pen-based user interfaces. In Proceedings of the SIGCHI conference on Human factors in computing systems (pp. 461-470). Portland, Oregon, USA: ACM. Licklider, J. C. R., & Clark, W. E. (1962). On-line man-computer communication. In Proceedings of the May 1-3, 1962, spring joint computer conference (pp. 113-128). San Francisco, California: ACM. doi:10.1145/1460833.1460847


Lin, C., Sun, T., Chen, H., & Cheng, P. (2009). Evaluation of Visually-Controlled Task Performance in Three Dimension Virtual Reality Environment. In Virtual and Mixed Reality (pp. 465-471). Lin, J., Newman, M. W., Hong, J. I., & Landay, J. A. (2000). DENIM: finding a tighter fit between tools and practice for Web site design. In Proceedings of the SIGCHI conference on Human factors in computing systems (pp. 510-517). The Hague, The Netherlands: ACM. Long, C. A. J., Landay, J. A., Rowe, L. A., & Michiels, J. (2000). Visual similarity of pen gestures. In Proceedings of the SIGCHI conference on Human factors in computing systems (pp. 360-367). The Hague, The Netherlands: ACM. MacKenzie, C. L., & Iberall, T. (1994). The Grasping Hand (Advances in Psychology) (1st ed.). North Holland. MacKenzie, I. S. (1992). Fitts' Law as a Research and Design Tool in Human-Computer Interaction. Human-Computer Interaction, 7(1), 91-139. doi:10.1207/s15327051hci0701_3 MacKenzie, I. S., & Buxton, W. (1992). Extending Fitts' law to two-dimensional tasks. In Proceedings of the SIGCHI conference on Human factors in computing systems (pp. 219-226). Monterey, California, United States: ACM. doi:10.1145/142750.142794 MacKenzie, I. S., Kauppinen, T., & Silfverberg, M. (2001). Accuracy measures for evaluating computer pointing devices. In Proceedings of the SIGCHI conference on Human factors in computing systems (pp. 9-16). Seattle, Washington, United States: ACM. MacKenzie, I. S., Sellen, A., & Buxton, W. A. S. (1991). A comparison of input devices in element pointing and dragging tasks. In Proceedings of the SIGCHI conference on Human factors in computing systems: Reaching through technology (pp. 161-166). New Orleans, Louisiana, United States: ACM. MacKenzie, I. S., & Ware, C. (1993). Lag as a determinant of human performance in interactive systems. In Proceedings of the INTERACT '93 and CHI '93 conference on Human factors in computing systems (pp. 488-493). Amsterdam, The Netherlands: ACM. Mangold International. (n.d.). . Retrieved June 1, 2009, from http://www.mangold- international.com Mason, A. H., Walji, M. A., Lee, E. J., & MacKenzie, C. L. (2001). Reaching movements to augmented and graphic objects in virtual environments. In Proceedings of the SIGCHI conference on Human factors in computing systems (pp. 426-433). Seattle, Washington, United States: ACM. doi:10.1145/365024.365308 Matias, E., MacKenzie, I. S., & Buxton, W. (1996). One-handed touch typing on a QWERTY keyboard. Human-Computer Interaction, 11(1), 1-27. Meyer, A. (1995). Pen computing: a technology overview and a vision. SIGCHI Bull, 27(3), 46-90.

273

Microsoft. (2009, October 22). Windows 7 Product Guide. Retrieved March 22, 2010, from http://www.microsoft.com Microsoft. (n.d.). Hooks. Microsoft Windows Developer Centre. Retrieved from http://msdn.microsoft.com/en-us/library/ms632589(VS.85).aspx Microsoft. (n.d.). Desktop Window Manager. Microsoft Windows Developer Centre. Retrieved from http://msdn.microsoft.com/en-us/library/aa969540(VS.85).aspx Mizobuchi, S., & Yasumura, M. (2004). Tapping vs. circling selections on pen-based devices: evidence for different performance-shaping factors. In Proceedings of the SIGCHI conference on Human factors in computing systems (pp. 607-614). Vienna, Austria: ACM. Moran, T. P., Chiu, P., & Melle, W. V. (1997). Pen-based interaction techniques for organizing material on an electronic whiteboard. In Proceedings of the 10th annual ACM symposium on User interface software and technology (pp. 45-54). Banff, Alberta, Canada: ACM. Moscovich, T., & Hughes, J. F. (2004). Navigating documents with the virtual scroll ring. In Proceedings of the 17th annual ACM symposium on User interface software and technology (pp. 57-60). Santa Fe, NM, USA: ACM. Myers, B. A. (1998). A brief history of human-computer interaction technology. interactions, 5(2), 44-54. doi:10.1145/274430.274436 Myers, B. A., Bhatnagar, R., Nichols, J., Peck, C. H., Kong, D., Miller, R., & Long, A. C. (2002). Interacting at a distance: measuring the performance of laser pointers and other devices. In Proceedings of the SIGCHI conference on Human factors in computing systems: Changing our world, changing ourselves (pp. 33-40). Minneapolis, Minnesota, USA: ACM. doi:10.1145/503376.503383 Myers, B. A., Peck, C. H., Nichols, J., Kong, D., & Miller, R. (2001). Interacting at a Distance Using Semantic Snarfing. In Proceedings of the 3rd international conference on Ubiquitous Computing (pp. 305-314). Atlanta, Georgia, USA: Springer-Verlag. Retrieved from http://portal.acm.org/citation.cfm?id=741332 Napier, J. R. (1956). The prehensile movements of the human hand. The Journal of Bone and Joint Surgery. British Volume, 38-B(4), 902-913. Napier, J. R. (1993). Hands (Rev Sub.). Princeton University Press. Petroski, H. (1992). The Pencil: A History of Design and Circumstance. Knopf. Pheasant, S., & Hastlegrave, C. (2006). Bodyspace: Anthropometry, Ergonomics and the Design of the Work (3rd ed.). CRC. Phillips, J., Triggs, T., & Meehan, J. (2005). Forward/up directional incompatibilities during cursor placement within graphical user interfaces. Ergonomics, 48(6), 722. Potter, R. L., Weldon, L. J., & Shneiderman, B. (1988). Improving the accuracy of touch screens: an experimental evaluation of three strategies. In Proceedings of the SIGCHI conference on Human factors in computing systems (pp. 27-32). Washington, D.C., United States: ACM. doi:10.1145/57167.57171

274

Poupyrev, I., Okabe, M., & Maruyama, S. (2004). Haptic feedback for pen computing: directions and strategies. In CHI '04 extended abstracts on Human factors in computing systems (pp. 1309-1312). Vienna, Austria: ACM. Ramos, G., & Balakrishnan, R. (2003). Fluid interaction techniques for the control and annotation of digital video. In Proceedings of the 16th annual ACM symposium on User interface software and technology (pp. 105-114). Vancouver, Canada: ACM. Ramos, G., & Balakrishnan, R. (2005). Zliding: fluid zooming and sliding for high precision parameter manipulation. In Proceedings of the 18th annual ACM symposium on User interface software and technology (pp. 143-152). Seattle, WA, USA: ACM. Ramos, G., & Balakrishnan, R. (2007). Pressure marks. In Proceedings of the SIGCHI conference on Human factors in computing systems (pp. 1375-1384). San Jose, California, USA: ACM. Ramos, G., Boulos, M., & Balakrishnan, R. (2004). Pressure widgets. In Proceedings of the SIGCHI conference on Human factors in computing systems (pp. 487-494). Vienna, Austria: ACM. Ramos, G., Cockburn, A., Balakrishnan, R., & Beaudouin-Lafon, M. (2007). Pointing lenses: facilitating stylus input through visual-and motor-space magnification. In Proceedings of the SIGCHI conference on Human factors in computing systems (pp. 757-766). San Jose, California, USA: ACM. Ramos, G., Robertson, G., Czerwinski, M., Tan, D., Baudisch, P., Hinckley, K., & Agrawala, M. (2006). Tumble! Splat! helping users access and manipulate occluded content in 2D drawings. In Proceedings of the working conference on Advanced visual interfaces (pp. 428-435). Venezia, Italy: ACM. Ranjan, A., Birnholtz, J. P., & Balakrishnan, R. (2006). An exploratory analysis of partner action and camera control in a video-mediated collaborative task. In Proceedings of the 2006 20th anniversary conference on Computer supported cooperative work (pp. 403- 412). Banff, Alberta, Canada: ACM. Rekimoto, J., & Sciammarella, E. (2000). ToolStone: effective use of the physical manipulation vocabularies of input devices. In Proceedings of the 13th annual ACM symposium on User interface software and technology (pp. 109-117). San Diego, California, United States: ACM. doi:10.1145/354401.354421 Rekimoto, J., Ullmer, B., & Oba, H. (2001). DataTiles: a modular platform for mixed physical and graphical interactions. In Proceedings of the SIGCHI conference on Human factors in computing systems (pp. 269-276). Seattle, Washington, United States: ACM. doi:10.1145/365024.365115 Ren, Yin, Zhao, & Li. (2007). The Adaptive Hybrid Cursor: A Pressure-Based Target Selection Technique for Pen-Based User Interfaces. Human-Computer Interaction – INTERACT 2007. Ren, X., & Moriya, S. (2000). Improving selection performance on pen-based systems: a study of pen-based interaction for selection tasks. ACM Transactions on Computer- Human Interaction, 7(3), 384-416.

275

Roche, C., & Ronsse, R. (2003, October 28). United States Patent: 6637962 - Ergonomic writing instrument. Sassoon, R. (1993). The Art and Science of Handwriting. Intellect Books. Schilit, B. N., Golovchinsky, G., & Price, M. N. (1998). Beyond paper: supporting active reading with free form digital ink annotations. In Proceedings of the SIGCHI conference on Human factors in computing systems (pp. 249-256). Los Angeles, California, United States: ACM Press/Addison-Wesley Publishing Co. Scholtz, J., Young, J., Drury, J., & Yanco, H. (2004). Evaluation of human-robot interaction awareness in search and rescue. In Robotics and Automation, 2004. (Vol. 3, pp. 2327- 2332 Vol.3). Presented at the Robotics and Automation, 2004. Proceedings. ICRA '04. 2004 IEEE International Conference on. doi:10.1109/ROBOT.2004.1307409 Schomaker, L. (1998). From handwriting analysis to pen-computer applications. Electronics & Communication Engineering Journal, 10(3), 93-102. Scott, S. D. (2005). Territoriality in Collaborative Tabletop Workspaces. (Ph.D.). University of Calgary. Scott, S. D., Carpendale, S. M., & Inkpen, K. M. (2004). Territoriality in collaborative tabletop workspaces. In Proceedings of the 2004 ACM conference on Computer supported cooperative work (pp. 294-303). Chicago, Illinois, USA: ACM. Sears, A., & Shneiderman, B. (1991). High Precision Touchscreens: Design Strategies and Comparisons with a Mouse. International Journal of Man-Machine Studies, 34(4), 593- 613. Selin, A. (2003). Pencil Grip: A descriptive Model and Four Empirical Studies. Akademi University Press. Sellen, A., Kurtenbach, G., & Buxton, W. A. S. (1992). The Prevention of Mode Errors Through Sensory Feedback. Human-Computer Interaction, 7(2), 141-164. Sensics - Lightweight Panoramic Head-Mounted Displays. (2009). . Retrieved October 27, 2009, from http://sensics.com/ Shao, J., Fiering, L., & Kort, T. (2007). Dataquest Insight: Tablet PCs Are Slowly Gaining Momentum (No. G00147423). Gartner. Retrieved from http://www.gartner.com Sharmin, S., Evreinov, G., & Raisamo, R. (2005). Non-visual feedback cues for pen computing. In Eurohaptics Conference, 2005 and Symposium on Haptic Interfaces for Virtual Environment and Teleoperator Systems, 2005. World Haptics 2005. (pp. 625- 628). Shen, C., Hancock, M. S., Forlines, C., & Vernier, F. D. (2005). CoR2Ds. In CHI '05 extended abstracts on Human factors in computing systems (pp. 1781-1784). Portland, OR, USA: ACM. Shen, C., Ryall, K., Forlines, C., Esenther, A., Vernier, F. D., Everitt, K., Wu, M., et al. (2006). Informing the Design of Direct-Touch Tabletops. IEEE Comput. Graph. Appl., 26(5), 36-46.

276

Shilman, M., Tan, D. S., & Simard, P. (2006). CueTIP: a mixed-initiative interface for correcting handwriting errors. In Proceedings of the 19th annual ACM symposium on User interface software and technology (pp. 323-332). Montreux, Switzerland: ACM. Shoemaker, G., Tang, A., & Booth, K. S. (2007). Shadow reaching: a new perspective on interaction for large displays. In Proceedings of the 20th annual ACM symposium on User interface software and technology (pp. 53-56). Newport, Rhode Island, USA: ACM. doi:10.1145/1294211.1294221 Smith, G. M., & schraefel, M. C. (2004). The radial scroll tool: scrolling support for stylus- or touch-based document navigation. In Proceedings of the 17th annual ACM symposium on User interface software and technology (pp. 53-56). Santa Fe, NM, USA: ACM. Smith, G., schraefel, M. C., & Baudisch, P. (2005). Curve dial: eyes-free parameter entry for GUIs. In CHI '05 extended abstracts on Human factors in computing systems (pp. 1146-1147). Portland, OR, USA: ACM. Sommerich, C. M., Ward, R., Sikdar, K., Payne, J., & Herman, L. (2007). A survey of high school students with ubiquitous access to tablet PCs. Ergonomics, 50(5), 706-727. Soukoreff, R. W., & MacKenzie, I. S. (1995). Theoretical upper and lower bounds on typing speed using a stylus and a soft keyboard. BEHAVIOUR & INFORMATION TECHNOLOGY, 14(1995), 370-379. Soukoreff, R. W., & MacKenzie, I. S. (2004). Towards a standard for pointing device evaluation, perspectives on 27 years of Fitts' law research in HCI. Int. J. Hum.-Comput. Stud., 61(6), 751-789. Spooner, J. G., & Foley, M. J. (2005, August 30). Tablet PCs' Future Uncertain. eWeek.com. Retrieved from http://www.eweek.com Steinman, B. A., & Garzia, R. P. (2000). Foundations of binocular vision. McGraw-Hill Professional. Stone, B., & Vance, A. (2009, October 4). Just a Touch Away, the Elusive Tablet PC. The New York Times. Strauss, A. L., & Corbin, J. M. (1998). Basics of Qualitative Research: Techniques and Procedures for Developing. Sage Publications Inc. Sun Microsystems. (n.d.). Java Advanced Imaging API. Retrieved from http://java.sun.com Sutherland, I. E. (1963). Sketchpad: a man-machine graphical communication system. In Proceedings of the May 21-23, 1963, spring joint computer conference (pp. 329-346). Detroit, Michigan: ACM. doi:10.1145/1461551.1461591 Tan, D. S., & Pausch, R. (2002). Pre-emptive shadows: eliminating the blinding light from projectors. In CHI '02 extended abstracts on Human factors in computing systems (pp. 682-683). Minneapolis, Minnesota, USA: ACM. doi:10.1145/506443.506544

277

Tian, F., Ao, X., Wang, H., Setlur, V., & Dai, G. (2007). The tilt cursor: enhancing stimulus- response compatibility by providing 3d orientation cue of pen. In Proceedings of the SIGCHI conference on Human factors in computing systems (pp. 303-306). San Jose, California, USA: ACM. Tian, F., Xu, L., Wang, H., Zhang, X., Liu, Y., Setlur, V., & Dai, G. (2008). Tilt menu: using the 3D orientation information of pen devices to extend the selection capability of pen- based user interfaces. In Proceedings of the twenty-sixth annual SIGCHI conference on Human factors in computing systems (pp. 1371-1380). Florence, Italy: ACM. Truong, K. N., & Abowd, G. D. (1999). StuPad: integrating student notes with class lectures. In CHI '99 extended abstracts on Human factors in computing systems (pp. 208-209). Pittsburgh, Pennsylvania: ACM. Truong, K. N., & Abowd, G. D. (2004). INCA: A Software Infrastructure to Facilitate the Construction and Evolution of Ubiquitous Capture & Access Applications. In Pervasive Computing (pp. 140-157). Turner, S. A., Pérez-Quiñones, M. A., & Edwards, S. H. (2007). Effect of interface style in peer review comments for UML designs. J. Comput. Small Coll, 22(3), 214-220. Twining, P., Evans, D., Cook, D., Ralston, J., Selwood, I., Jones, A., Underwood, J., et al. (2005). Tablet PCs in schools: Case study report. Open University. Retrieved from http://publications.becta.org.uk/display.cfm?resID=25914 Van Rijsbergen, C. J. (1979). Information Retrieval (2nd ed.). London ; Toronto. Venkataraman, P. (2002). Applied Optimization with MATLAB Programming. New York. Vicon Motion Systems. (n.d.). . Retrieved from http://www.vicon.com/ Visser, B., De Loose, M. P., De Graaff, M. P., & Dieen, J. H. V. (2004). Effects of precision demands and mental pressure on muscle activation and hand forces in computer mouse tasks. Ergonomics, 47(2), 202-217. Vogel, D., & Balakrishnan, R. (2005). Distant freehand pointing and clicking on very large, high resolution displays. In Proceedings of the 18th annual ACM symposium on User interface software and technology (pp. 33-42). Seattle, WA, USA: ACM. Vogel, D., & Baudisch, P. (2007). Shift: a technique for operating pen-based interfaces using touch. In Proceedings of the SIGCHI conference on Human factors in computing systems (pp. 657-666). San Jose, California, USA: ACM. Vogel, D., Cudmore, M., Casiez, G., Balakrishnan, R., & Keliher, L. (2009). Hand Occlusion with Tablet-sized Direct Pen Input. In Proceedings of the SIGCHI conference on Human factors in computing systems. Wacom Components : Input Technology : EMR® Technology. (2009). . Retrieved November 4, 2009, from http://www.wacom- components.com/english/technology/emr.html Wagner, D., & Schmalstieg, D. (2007). ARToolKitPlus for Pose Tracking on Mobile Devices. In Proceedings of 12th Computer Vision Winter Workshop (CVWW'07).

278

Ward, J., & Phillips, M. (1987). Digitizer Technology: Performance Characteristics and the Effects on the User Interface. Computer Graphics and Applications, IEEE, 7(4), 31-44. Whitefield, A. (1986). Human factors aspects of pointing as an input technique in interactive computer systems. Applied ergonomics, 17(2), 97-104. Wigdor, D., Forlines, C., Baudisch, P., Barnwell, J., & Shen, C. (2007). Lucid touch: a see- through mobile device. In Proceedings of the 20th annual ACM symposium on User interface software and technology (pp. 269-278). Newport, Rhode Island, USA: ACM. Wigdor, D., Leigh, D., Forlines, C., Shipman, S., Barnwell, J., Balakrishnan, R., & Shen, C. (2006). Under the table interaction. In Proceedings of the 19th annual ACM symposium on User interface software and technology (pp. 259-268). Montreux, Switzerland: ACM. Wigdor, D., Williams, S., Cronin, M., Levy, R., White, K., Mazeev, M., & Benko, H. (2009). Ripples: utilizing per-contact visualizations to improve user interaction with touch displays. In Proceedings of the 22nd annual ACM symposium on User interface software and technology (pp. 3-12). Victoria, BC, Canada: ACM. doi:10.1145/1622176.1622180 Wiklund, M. E., Dumas, J. S., & Hoffman, L. R. (1987). Optimizing a portable terminal keyboard for combined one-handed and two-handed use. In Proceedings of the Human Factors Society 31st Annual Meeting: Rising to New Heights with Technology. (Vol. 1, pp. 585-9). Santa Monica: Human Factors Soc. Williams, L. (1978). Casting curved shadows on curved surfaces. SIGGRAPH Comput. Graph., 12(3), 270-274. doi:10.1145/965139.807402 Wilson, A. D. (2005). PlayAnywhere: a compact interactive tabletop projection-vision system. In Proceedings of the 18th annual ACM symposium on User interface software and technology (pp. 83-92). Seattle, WA, USA: ACM. doi:10.1145/1095034.1095047 Wilson, F. R. (1999). The Hand: How Its Use Shapes the Brain, Language, and Human Culture (1st ed.). Vintage. Worden, A., Walker, N., Bharat, K., & Hudson, S. (1997). Making computers easier for older adults to use: area cursors and sticky icons. In Proceedings of the SIGCHI conference on Human factors in computing systems (pp. 266-271). Atlanta, Georgia, United States: ACM. doi:10.1145/258549.258724 Wu, F., & Luo, S. (2006a). Design and evaluation approach for increasing stability and performance of touch pens in screen handwriting tasks. Applied Ergonomics [Kidlington], 37(3). Wu, F., & Luo, S. (2006b). Performance study on touch-pens size in three screen tasks. Applied Ergonomics, 37(2), 149-158. Zeleznik, R., & Miller, T. (2006). Fluid inking: augmenting the medium of free-form inking with gestures. In Proceedings of Graphics Interface 2006 (pp. 155-162). Quebec, Canada: Canadian Information Processing Society.

279

Zeleznik, R. C., Bragdon, A., Liu, C., & Forsberg, A. (2008). Lineogrammer: creating diagrams by drawing. In Proceedings of the 21st annual ACM symposium on User interface software and technology (pp. 161-170). Monterey, CA, USA: ACM. doi:10.1145/1449715.1449741 Zhai, S., Hunter, M., & Smith, B. A. (2000). The metropolis keyboard - an exploration of quantitative techniques for virtual keyboard design. In Proceedings of the 13th annual ACM symposium on User interface software and technology (pp. 119-128). San Diego, California, United States: ACM. doi:10.1145/354401.354424 Zhai, S., Hunter, M., & Smith, B. A. (2002). Performance Optimization of Virtual Keyboards. Human-Computer Interaction, 17(2&3), 229 - 269. doi:10.1207/S15327051HCI172&3_4 Zhai, S., & Kristensson, P. (2003). Shorthand writing on stylus keyboard. In Proceedings of the SIGCHI conference on Human factors in computing systems (pp. 97-104). Ft. Lauderdale, Florida, USA: ACM. Zhai, S., Smith, B. A., & Selker, T. (1997). Dual stream input for pointing and scrolling. In CHI '97 extended abstracts on Human factors in computing systems: looking to the future (pp. 305-306). Atlanta, Georgia: ACM. Zhao, S., Agrawala, M., & Hinckley, K. (2006). Zone and polygon menus: using relative position to increase the breadth of multi-stroke marking menus. In Proceedings of the SIGCHI conference on Human Factors in computing systems (pp. 1077-1086). Montréal, Québec, Canada: ACM. Zhao, S., & Balakrishnan, R. (2004). Simple vs. compound mark hierarchical marking menus. In Proceedings of the 17th annual ACM symposium on User interface software and technology (pp. 33-42). Santa Fe, NM, USA: ACM. Zuberec, S. (2000). The Interaction Design of Microsoft Windows CE. In E. Bergman (Ed.), Information appliances and beyond (p. 385).

280

Appendices

A. Observational Study Scenario Script

TIME-MARK

Training

In this experiment, you’ll be creating a presentation about animals of Canada. You can complete this task without having to enter any text.

Begin by taking the “Tablet PC Pen Training” tutorial by double clicking on the icon in the middle of the desktop. Since you won’t be entering any text, stop the tutorial when you get to the part about handwriting recognition.

To get you comfortable with PowerPoint, we’ll do a short training and orientation session.

- Open the “PowerPoint Training” file on your desktop.

- This is a new version of PowerPoint. Note that there are no menus like in the old version. Instead, there is a series of tabbed toolbars across the top. This is called a Ribbon. I’ll be referring to particular ribbon tabs during the study, so if I say pick the “Design” ribbon tab, I mean click on the word “Design” near the top of the program.

- Dragging shapes in PowerPoint is similar to the Tablet PC tutorial you just took. To drag the blue square, just point anywhere on the square and drag it.

- Dragging text is more difficult. You need to drag text by the edge, since pointing at the words themselves selects the words and doesn’t drag the text. Try moving the “This is Text” text by pointing at the edge and dragging.

- Resizing shapes is done by dragging the small handles at the corners or in the middle of the edges. Dragging a corner resizes the height and width equally. Dragging the handles in the middle of an edge just stretches the length or width separately.

- The green circle handle sticking out of the shape is for rotating the shape. By dragging it in a circular pattern, you can rotate the shape.

- Resizing and rotating the text is exactly the same, except that the text box will snap its height back to fit the text.

- To select more than one thing at a time you need to drag a “selection rectangle” to completely enclose everything you want to select. Note that the text often has an invisible rectangle larger than the actual text, so you need to surround that as well as the text you can see. If you don’t completely enclose something, then it won’t be included in the selection. Try selecting the red square and the “Select me” text, but make a big rectangle. Don’t worry if you cover part of the green square; it won’t be selected unless it is entirely inside the selection rectangle.

- Close the training presentation without saving it.

TIME-MARK

Map

Now, you’ll open an in progress “Animals of Canada” PowerPoint presentation which you’ll be completing.

1. Tap on the “Start Menu” button at the lower left and select “Documents”. Then navigate to the folder “Presentations/In Progress” and open “Animals of Canada”.

2. To help you see the full presentation area, you will need to make the left side panel narrower. Drag it as far to the left as possible. Now advance to slide 2 showing the animal’s habitat.

3. Your first task is to correct the switched labels for the spotted owl and polar bear by moving the text boxes. To move a text box, select its edge and drag it.

4. Correct the position of the polar bear and owl thumbnails: the polar bear’s habitat is in the north and the owl is in the south.

5. Make the size, orientation and aspect ratio of all the thumbnails approximately the same as the beaver. It’s fine to judge this “by eye.”

6. Change all labels to be in the same typeface and style as the beaver (20pt Arial Bold). You may have to select the “Home” tab to see the font controls.

7. Now, we’ll make all animal names perfectly centred below the thumbnail. Select the label and the thumbnail together and pick “Align/Align Center” from the “Arrange” button in the “Drawing” toolbar. Hint: make a large selection area to get the label and picture together.

8. This is a good time to save your presentation. Press the save document icon located at the top left of the application.

TIME-MARK

Owl

Now you will finish the slides for each animal.

9. Start by advancing to slide 3, the Beaver.


This one’s already done and gives you an idea of what the completed slides should look like. Each animal will have a description on the left, a picture on the right with a green border, and a reference at the bottom. You’ll be told where to get the necessary text and the pictures for each animal.

10. When ready, advance to slide 5, the Spotted Owl (the Lynx slide is also completed). Your first task is to copy a one sentence description from a webpage to use in this slide.

11. Begin by opening Internet Explorer by selecting the Internet Explorer icon on the bottom quick launch toolbar. Use the “Favourites Centre” in IE to open the bookmark for the Wikipedia page on the Spotted Owl.

12. Copy the first sentence of the Wikipedia article into the clipboard. Then return to the presentation and paste it in the “Description” textbox where the bullet is.

13. Next you will copy a citation reference for the slide. Copy the first reference displayed in the “references” section on the Wikipedia page (don’t include the bullet) and paste it into the text box outlined in green at the bottom of the slide.

Next you will insert a picture of the “Spotted Owl” into the slide.

14. From the “Insert” toolbar, select “Picture”. Navigate to “Source Images/Animals”. Select the picture and press OK.

The picture you inserted needs to be re-sized, positioned and styled so that it looks nice in the presentation.

15. First, adjust the size. From the “Format” toolbar, adjust the “Width” to be as close to 4 inches as you can get it using the buttons in the top right. Make sure you’re adjusting width and not height.

Now we’ll add a nice green border.

16. Open the “Picture Styles” dialogue (the small button at the right of the “Picture Styles” label). Select “Line Color” in the list at the left. Select “Solid Line” and open the “Color” button to select “More Colors…”

The colour you want is a dark forest green with approximately Hue: 92, Saturation: 200 and Luminance: 50. Plus or minus 10 is fine.

17. Open the “Custom” tab, switch to the “HSL” Color Model. Then adjust the position of the pointers in the colour space and luminance slider until you get something close. Press “OK” to accept your colour.

We want to make the border thicker and give it a nice style.

18. Select “Line Style” in the list at the left, then pick 6pt for “Width” and “Thick Thin” for “Compound Type”. Press “Close” to return to your slide.


19. Move the picture to the right side of the slide and resize the “Description” text to be on the left side.

20. This is a good time to save your presentation.

TIME-MARK

Moose

Now you have completed the slide for the Spotted Owl. Continue with the two remaining incomplete animals, following the same steps with the following exceptions and notes:

Copy the first sentence and the citation from the Wikipedia page.

Insert the picture from the file.

NOTE: The last two animals aren’t listed in the favourites, you’ll have to find the animal name or species group in the “List of Animal Names” Wikipedia page.

NOTE: Instead of using the green border, pick one of the ready-made “Picture Styles” in the “Format” ribbon tab. Note that there are three rows of styles and you can preview the style when you hover.

21. Advance to the Moose slide, get the description and citation from Wikipedia, then insert the picture and resize and position it.

22. Instead of using the green border, pick one of the ready-made “Picture Styles” in the “Format” ribbon tab. Note that there are three rows of styles and you can preview the style when you hover.

23. Advance to Polar Bear slide, get description, citation from Wikipedia, then insert the picture and resize and position it.

24. Pick the same “Picture Styles” in the “Format” ribbon tab as you did for the Moose.

You may have noticed that the picture quality of the moose and polar bear is not the greatest. You’ll adjust the brightness and contrast to improve how they look.

25. From the “Format” ribbon tab, open the “Brightness” menu and select “Picture Correction Options” at the bottom of the menu. Adjust the brightness and contrast sliders to improve the picture quality.

26. This is a good time to save your in-progress presentation.

TIME-MARK

Graph


27. On the last slide we’ll create a graph of population trends for these animals over the last two centuries. Advance to the last slide.

28. Begin by opening Microsoft Office Excel using the start menu at the bottom left, picking “All Programs”, and opening “Microsoft Office/Microsoft Office Excel 2007”.

29. Tap the circular icon at the top left and select “Open”. Find the “Animals of Canada” spreadsheet located in “Documents/Spreadsheets/Source Data”

This document contains a table showing population counts for the last two centuries. Now we’ll create a chart.

30. Select the “Insert” tab and select “Line” for the chart type and use the subtype with description “Line.” Move the chart to the right of the data.

Now we’ll add in the data for all the animals.

31. Press “Select Data” and then select the cells for all five animals at once (B4 to F25), including the animal names at the top of each column, but without including the averages at the bottom. You should see the animal population lines in the chart preview.

Now add in the years along the horizontal chart axis. Click “Edit” under “Horizontal (Category) Axis Labels” and select the cells with the years (cells A5 to A25) but not the “year” column heading. Press “Ok”. Press “Ok” again to return to your Excel document. You should see the years along the X axis in the chart preview.

We want the legend to be at the bottom of the chart.

32. Pick “Layout 4” under “Chart Layouts” in the “Design” toolbar (the first one in the second row).

33. Finally, we don’t need Sheet2 or Sheet3, so “right click” on each and select “Delete”.

Now we’ll paste the completed chart into the presentation.

34. Select the chart in Excel (you need to select near the edges), copy it, and paste it into slide 8. Resize the chart to be 4.3 by 7 inches.

We want it to be centred on the slide.

35. Open the “Align” menu and make sure “Align to Slide” is checked. Then pick “Align/Align Centre” and then “Align/Align Middle”. Uncheck “Align to Slide” when done.

36. Return to Excel and close it, but say “no” when asked to save.

37. This is also a good time to save your in-progress presentation.


TIME-MARK

Drawing

Now, we’ll add some ink annotations to our presentation. First, we want to highlight the effects of the fur trade on the beaver population shown in the population chart.

38. Select the “Review” toolbar and press “Start Inking”. Select “Highlighter”.

39. Use the highlighter to emphasize the beaver line in the chart.

Now we’ll add a written annotation.

40. Select the “Ballpoint Pen” and pick bright red for the line colour and 1 ½ pt for the line weight.

41. Circle the two low parts on the beaver population line. Make a written note on the right side saying “effects of fur trade” and draw two arrows pointing at the circles.

42. As a final step, we’ll configure presentation settings.

a. Select “Set Up Show” in the “Slide Show” ribbon tab.

b. Under “Show Type” pick “Browsed by an individual” and uncheck “Show scrollbar”

c. Under “Show Options” select all three options.

d. Under “Show slides”, set it to show slides 2 to 7.

e. Under “Advance slides” pick “Manually”

f. Under “Performance” check “Use hardware graphics acceleration” and pick “Slowest/Highest Fidelity” for the “Slide show resolution”. NOTE: alternate between “Slowest/Highest Fidelity” and “Fastest/Lowest Fidelity” across participants, since the default will be saved

g. Press “Ok”

43. Save your final presentation.

We also want to save an HTML version of the presentation.

44. Select the round Office icon at the top left and select “Save As/Other Formats.” In the save dialogue, pick “Web Page” from the “Save as type:” dropdown menu.

45. We need to set some options for the web page. At the bottom of the file dialogue, select “Tools/Web Options…”


In the Web Options dialogue, select the “Browsers” tab and pick “Microsoft Internet Explorer® 6 or later” in the “People who view this web page will be using:” drop-down. Uncheck “Allow PNG as a graphics format” and “Rely on VML for displaying graphics in browsers”. Press “Ok” to exit the “Web Options” dialogue.

Press “Save” to save the HTML version of the presentation to the current folder.

Close the presentation.

You’re almost done. We just want to check the html file you created. We’ll do this by dragging the HTML file into your browser window.

46. First, position and size the browser and file manager windows one above the other.

47. Drag the html file into the browser window. You’ll need to say “Ok”, then click on the yellow bar and select “Allow blocked content”, and then select “yes” in the confirmation dialogue. Go through the slides quickly to make sure they look ok.

Close Internet Explorer and the file window.

Congratulations, you’re done.
