Direct Pen Input and Hand Occlusion
by
Daniel Vogel
A thesis submitted in conformity with the requirements for the degree of Doctor of Philosophy
Department of Computer Science, University of Toronto
© Copyright 2010 Daniel Vogel
Direct Pen Input and Hand Occlusion
Daniel Vogel
Doctor of Philosophy
Department of Computer Science, University of Toronto
2010

Abstract
We investigate, model, and design interaction techniques for hand occlusion with direct pen input. Our focus on occlusion follows from a qualitative and quantitative study of direct pen usability with a conventional graphical user interface (GUI). This study reveals overarching problems relating to poor precision, ergonomics, cognitive differences, limited input, and occlusion. To investigate occlusion more closely, we conduct three formal experiments to examine its area and shape, its effect on performance, and compensatory postures. We find that the shape of the occluded area varies across participants, with some common characteristics. Our results provide evidence that occlusion affects target selection performance, especially for continuous tasks or when the goal is initially hidden. We observe how users contort their wrist posture during a simultaneous monitoring task, and show this can increase task time. Based on these investigations, we develop a five-parameter geometric model to represent the shape of the occluded area and extend this to a user-configurable, real-time version. To evaluate our model, we introduce a novel analytic testing methodology using optimization for geometric fitting and precision-recall statistics for comparison, as well as conducting a user study. To address problems with occlusion, we introduce the notion of occlusion-aware interfaces: techniques which can use
our configurable model to track currently occluded regions and then counteract potential
problems and/or utilize the occluded area. As a case study, we present the Occlusion-Aware
Viewer: an interaction technique which displays otherwise missed previews and status messages in a non-occluded area. Within this thesis, we also present a number of methodological contributions for quantitative and qualitative study design, multi-faceted study logging using synchronized video, qualitative analysis, image-based analysis, task visualization, optimization-based analytical testing, and user interface image processing.
Acknowledgements
I feel incredibly lucky to have Ravin Balakrishnan as an advisor, whose guidance and encouragement made the completion of this dissertation possible. I was also fortunate to have an outstanding and well-rounded committee: Ron Baecker, Khai Truong, Karan Singh, and my external examiner, Brad Myers. There are many students, faculty members, and administrative staff at the University of Toronto who contributed directly and indirectly. John
Hancock deserves specific mention for technical assistance; and Géry Casiez, whom I met when he was a Postdoctoral Researcher at the University of Toronto, developed an initial version of the real-time occlusion model.
There are several individuals at Mount Allison University who made it much easier for me to complete this work after I moved to Sackville, New Brunswick: Liam Keliher, Anna
Sheridan-Jonah, Jeff Ollerhead, Laurie Ricker, and Ron Beattie in particular. Matthew
Cudmore’s assistance with programming and facilitation of experiments, in addition to video analysis, was especially valuable.
Of course, I owe a debt of gratitude to Jennifer, family, and friends who encouraged and supported me throughout.
Copyright Notices and Disclaimers
Sections of this document have appeared in publications or are forthcoming (at the time of writing). In all cases, permission has been granted by the publisher for these works to appear here. Below, the publisher’s copyright notice and/or disclaimer is given, with thesis chapter(s) and corresponding publication(s) noted.
Taylor and Francis

Copyright © 2010 Taylor & Francis. Full terms and conditions of use: http://www.informaworld.com/terms-and-conditions-of-access.pdf. This article may be used for research, teaching and private study purposes. Any substantial or systematic reproduction, re-distribution, re-selling, loan or sub-licensing, systematic supply or distribution in any form to anyone is expressly forbidden. The publisher does not give any warranty express or implied or make any representation that the contents will be complete or accurate or up to date. The accuracy of any instructions, formulae and drug doses should be independently verified with primary sources. The publisher shall not be liable for any loss, actions, claims, proceedings, demand or costs or damages whatsoever or howsoever caused arising directly or indirectly in connection with or arising out of the use of this material.
portions of chapters 2 and 3
Vogel, D., and Balakrishnan, R. (forthcoming). Direct Pen Interaction with a Conventional Graphical User Interface. Human-Computer Interaction. Taylor and Francis.
Association for Computing Machinery
Copyright © 2009, 2010 by the Association for Computing Machinery, Inc. (ACM). Permission to make digital or hard copies of portions of this work for personal or classroom use is granted without fee provided that the copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page in print or the first screen in digital media. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, to republish, to post on servers, or to redistribute to lists, requires prior specific permission and/or a fee. Send written requests for republication to ACM Publications, Copyright & Permissions at the address above or fax +1 (212) 869-0481 or email [email protected]. Copyright © 2009, 2010 ACM Inc. Included here by permission.
portions of chapters 4 and 5
Vogel, D., Cudmore, M., Casiez, G., Balakrishnan, R., and Keliher, L. (2009). Hand occlusion with tablet-sized direct pen input. In Proceedings of the 27th International Conference on Human Factors in Computing Systems (Boston, MA, USA, April 04 - 09, 2009). CHI '09. ACM, New York, NY, 557-566.

portions of chapters 5 and 6
Vogel, D., and Balakrishnan, R. (2010). Occlusion-Aware Interfaces. In Proceedings of the 28th International Conference on Human Factors in Computing Systems (Atlanta, GA, USA, April 10 - 15, 2010). CHI '10. ACM, New York, NY, 263-272.
Table of Contents
1 Introduction ...... 1
1.1 Research Objectives and Overview ...... 5
1.2 Contributions ...... 8
1.3 Dissertation Outline ...... 12
2 Background Literature ...... 13
2.1 The Hand and the Pen ...... 14
2.2 The Pen as a Computer Input Device ...... 27
2.3 Pen Input Performance and Capabilities ...... 34
2.4 Pen Interaction Paradigms ...... 46
2.5 Summary ...... 50
3 Observational Study of Pen Input ...... 51
3.1 Related Work ...... 52
3.2 Study ...... 55
3.3 Analysis ...... 66
3.4 Results ...... 75
3.5 Interactions of Interest ...... 95
3.6 Discussion ...... 109
3.7 Summary ...... 116
4 Investigating Occlusion ...... 119
4.1 Related Work ...... 120
4.2 Experiment 4-1: Area and Shape ...... 126
4.3 Experiment 4-2: Performance ...... 144
4.4 Experiment 4-3: Influence on Hand and Arm Posture ...... 172
4.5 Design Implications ...... 183
4.6 Summary ...... 184
5 Modelling Occlusion ...... 187
5.1 Related Work ...... 189
5.2 Geometric Model for Occlusion Shape ...... 191
5.3 Space of Fitted Parameters and Mean Model ...... 199
5.4 User Configurable Model ...... 203
5.5 Experiment 5-1: Occlusion Model Evaluation ...... 209
5.6 Future Directions ...... 215
5.7 Summary ...... 217
6 Occlusion-Aware Interfaces ...... 221
6.1 Related Work ...... 222
6.2 Occlusion-Aware Interfaces ...... 226
6.3 Occlusion-Aware Viewer ...... 227
6.4 Experiment 6-1: Occlusion-Aware Viewer Evaluation ...... 233
6.5 Deployment Issues and Future Directions ...... 245
6.6 Other Occlusion-Aware Techniques ...... 246
6.7 Summary ...... 249
7 Conclusions ...... 251
7.1 Summary ...... 252
7.2 Assumptions and Limitations ...... 254
7.3 Future Research ...... 256
7.4 Final Word ...... 264
List of Tables
Table 2-1. Anthropometric measurements for hand and arm...... 18
Table 2-2. Comparisons between pen input and mouse input...... 35
Table 2-3. Comparisons between direct pen input and indirect stylus input...... 39
Table 3-1. Ideal amount of widget and action usage in our study...... 65
Table 3-2. Ideal number of expected interactions by interaction type...... 66
Table 3-3. Wrong click errors...... 80
Table 3-4. Unintended action errors...... 81
Table 3-5. Repeated invocation, hesitation, and inefficient operation errors...... 83
Table 4-1. Linear regression values for Time from Index of Difficulty (ID)...... 171
Table 5-1. Overview of model tests...... 194
Table 5-2. Summary statistics of fitted geometric model parameters...... 201
List of Figures
Figure 1-1. Illustration of occlusion...... 4
Figure 1-2. Research path showing research problems, activities, and main results. ....7
Figure 2-1. Sensorimotor continuum of human hand function...... 14
Figure 2-2. Bones and joints of the hand...... 16
Figure 2-3. Anthropometric measurements...... 18
Figure 2-4. Selected hand and wrist postures...... 19
Figure 2-5. Principal range of motion for hand and wrist...... 19
Figure 2-6. Horizontal arc of grasp...... 20
Figure 2-7. Jones’s force-displacement framework for manual dexterity...... 21
Figure 2-8. Dynamic tripod pen grip illustrated by Mercator, 1540...... 23
Figure 2-9. Examples of different adult pen grips reported in the literature...... 24
Figure 2-10. Commercial ergonomic pen designs...... 27
Figure 2-11. Illustration of pen point placements in Kao et al.’s experiment...... 27
Figure 2-12. Direct input and indirect input ...... 29
Figure 2-13. Sutherland’s Sketchpad with light pen input...... 30
Figure 2-14. RAND Tablet ...... 31
Figure 2-15. Electromagnetic pen position sensor...... 32
Figure 2-16. Hardware and visual parallax...... 34
Figure 2-17. Forearm and hand postures observed by Wu & Luo...... 44
Figure 2-18. Extreme pen grips observed by Wu & Luo...... 45
Figure 2-19. Wu and Luo’s ergonomic Tablet PC pen...... 45
Figure 2-20. Pen sizes evaluated by Wu and Luo ...... 46
Figure 3-1. Experimental setup and apparatus...... 59
Figure 3-2. Study screen captures taken from initial task sequence...... 61
Figure 3-3. Screen captures of selected scenario tasks...... 62
Figure 3-4. Illustration of selected widgets...... 64
Figure 3-5. Analysis software tool...... 67
Figure 3-6. Motion capture player...... 68
Figure 3-7. Coding decision when participant makes a noticeable pause...... 72
Figure 3-8. Coding decision tree when participant attempts an action...... 73
Figure 3-9. Mean time for all constrained tasks per group...... 76
Figure 3-10. Mean non-interaction errors per group...... 77
Figure 3-11. Mean interaction errors per group...... 78
Figure 3-12. Mean interaction errors by error type...... 78
Figure 3-13. Pen participant heat map plots for taps/click and errors...... 79
Figure 3-14. Estimated interaction error rate for widget and action contexts...... 86
Figure 3-15. Average pen or mouse movement distance per minute...... 88
Figure 3-16. Proportion of movements greater than 0.25 mm per frame...... 89
Figure 3-17. Mean Euclidean distance between down and up click ...... 90
Figure 3-18. Obtrusive tooltip hover visualizations, “hover junk” ...... 91
Figure 3-19. Tablet or laptop movement per minute for all constrained tasks...... 92
Figure 3-20. Heat map plot of forearm and pen/mouse rest positions...... 93
Figure 3-21. Examples of occlusion contortion: the “hook posture.” ...... 94
Figure 3-22. Example of occluded status message when pressing save button...... 96
Figure 3-23. Button trajectory example...... 97
Figure 3-24. Scrollbar parts...... 98
Figure 3-25. Example of scrollbar occlusion causing “ramp” movement...... 100
Figure 3-26. Pen tip trajectories during scrollbar interaction...... 100
Figure 3-27. Proportion of left-to-right and right-to-left text selection directions. ...102
Figure 3-28. Pen tip (and selected wrist trajectories) during text selection...... 104
Figure 3-29. Handwriting examples...... 106
Figure 3-30. Tracing examples...... 107
Figure 3-31. Occlusion resulting from MiniBar floating palette...... 108
Figure 4-1. Brandl et al.’s occlusion area experiment...... 121
Figure 4-2. Bieber, Rahman, and Urban’s analytic study...... 122
Figure 4-3. Experimental tasks used by Forlines and Balakrishnan...... 123
Figure 4-4. Hancock and Booth’s results for direct and indirect input task time...... 125
Figure 4-5. Inkpen et al.’s left-handed users and right-aligned scrollbars...... 126
Figure 4-6. Anthropometric measurements...... 127
Figure 4-7. Estimated error introduced by monocular versus stereo view...... 129
Figure 4-8. Experiment apparatus...... 129
Figure 4-9. Head mounted camera...... 130
Figure 4-10. Experiment 4-1 experimental stimuli...... 131
Figure 4-11. Estimated rectification error from head-mounted camera...... 133
Figure 4-12. Image processing steps...... 134
Figure 4-13. Mean occlusion ratio...... 135
Figure 4-14. Participant size (S) vs. max occlusion ratio...... 136
Figure 4-15. Occlusion shape silhouettes for each participant...... 138
Figure 4-16. Mean occlusion silhouettes...... 139
Figure 4-17. Pixels most likely to be occluded...... 141
Figure 4-18. Video stills of observed grip styles...... 143
Figure 4-19. Left-handed participant results...... 144
Figure 4-20. Crosshair to minimize effect of occlusion...... 147
Figure 4-21. Tapping, Dragging, and Tracing tasks...... 148
Figure 4-22. Target directions and distances...... 149
Figure 4-23. Error rate by Task and Visibility...... 152
Figure 4-24. Selection time by Task and Visibility...... 153
Figure 4-25. Mean Selection Time by Distance, Direction, Task, and Visibility...... 154
Figure 4-26. Error rate by Distance, Direction for Tracing and Hidden Visibility. ..155
Figure 4-27. Overshoot errors with Tracing Task and Hidden Visibility...... 156
Figure 4-28. Illustration of other performance measures...... 157
Figure 4-29. Movement Direction Change (MDC) by Task, and Visibility...... 158
Figure 4-30. Movement Direction Change (MDC) by Distance, Direction, ...... 159
Figure 4-31. Movement Error (ME) by Task, and Visibility...... 160
Figure 4-32. Movement Error (ME) by Distance, Direction, Task, and Visibility. ...161
Figure 4-33. Out-of-Range (OOR) by Task, and Visibility...... 162
Figure 4-34. Out-of-Range (OOR) by Distance, Direction, ...... 163
Figure 4-35. Comparison with Hancock and Booth’s results...... 164
Figure 4-36. Comparison of target position and mean occlusion silhouette...... 165
Figure 4-37. Motion paths by Direction for 61.2 mm Distance...... 167
Figure 4-38. Motion paths by Direction for 102.0 mm Distance...... 168
Figure 4-39. Relationship of Time to Index of Difficulty (ID)...... 171
Figure 4-40. Simultaneous monitoring task...... 173
Figure 4-41. Simultaneous monitoring task positioning...... 175
Figure 4-42. Completion time by target box Position...... 178
Figure 4-43. Pen azimuth angle by target box Position...... 178
Figure 4-44. Mean occlusion silhouette by Position...... 179
Figure 4-45. Comparison of target box position and mean occlusion silhouette...... 180
Figure 4-46. Different occlusion contortion strategies ...... 182
Figure 4-47. Design guidelines for avoiding occluded areas...... 184
Figure 5-1. Approximating the actual occluded area with a model...... 188
Figure 5-2. Previous implicit and explicit occlusion models...... 191
Figure 5-3. Different geometric models of occlusion...... 192
Figure 5-4. Offset circle and pivoting rectangle model parameters...... 193
Figure 5-5. Illustration of precision and recall...... 195
Figure 5-6. Illustration of objective function area calculation...... 197
Figure 5-7. Precision-recall plots for bounding box and fitted geometry...... 199
Figure 5-8. Mean configuration for the geometric model...... 202
Figure 5-9. Precision-recall plot for mean model...... 203
Figure 5-10. Occlusion model user configuration steps...... 206
Figure 5-11. Vertically sliding elbow to set Θ...... 207
Figure 5-12. Precision-recall plots for analytical configurable model...... 209
Figure 5-13. Experiment 5-1 experimental stimuli...... 210
Figure 5-14. Experiment precision-recall for mean model and fitted geometry...... 213
Figure 5-15. Precision-recall plots for experimentally configured model...... 214
Figure 5-16. Summary of precision recall performance for tested models...... 217
Figure 5-17. Using occlusion model in formal experiment analysis...... 219
Figure 5-18. Using occlusion model to design an interaction technique...... 220
Figure 6-1. Occlusion-Aware Viewer technique...... 222
Figure 6-2. Vogel and Baudisch’s Shift touch screen selection technique...... 224
Figure 6-3. Simple occlusion-awareness in Apple’s iPhone...... 224
Figure 6-4. Brandl et al.’s occlusion-aware pie menu...... 225
Figure 6-5. Occlusion-Aware Viewer demonstration...... 228
Figure 6-6. Detecting importance and callout positioning...... 232
Figure 6-7. Simultaneous monitoring task...... 235
Figure 6-8. Completion times of Technique by Angle...... 240
Figure 6-9. Participant ratings...... 241
Figure 6-10. Sample task completion times and occlusion silhouettes...... 242
Figure 6-11. Ambiguity problems when feedback box is at Angle 45...... 243
Figure 6-12. Left-handed sample task completion times and occlusion silhouettes. .245
Figure 6-13. Occlusion-Aware Dragging...... 247
Figure 6-14. Occlusion-Aware Pop-Ups...... 248
Figure 6-15. Hidden Widget...... 249
Figure 7-1. Inflatable Widget...... 259
Figure 7-2. Conté manipulations for GUI interaction...... 263
List of Video Figures
Video 3-1. Time-lapse demonstration of study scenario...... 61
Video 3-2. Obtrusive tooltip hover visualization examples...... 91
Video 3-3. Occlusion contortion examples: the “hook posture.” ...... 94
Video 3-4. Button trajectory example...... 98
Video 3-5. Scrollbar trajectory examples...... 101
Video 3-6. Text selection trajectory examples...... 105
Video 4-1. Area and shape experiment demonstration...... 131
Video 4-2. Performance experiment demonstration...... 149
Video 4-3. Simultaneous monitoring demonstration...... 174
Video 5-1. Geometric model fitting demonstration...... 198
Video 5-2. Model configuration demonstration...... 206
Video 6-1. Occlusion-Aware Viewer demonstration...... 228
Video 6-2. Occlusion-Aware Viewer experiment demonstration...... 238
Video 6-3. Occlusion-Aware Dragging technique demonstration...... 247
Additional Information for Video Figures
All digital videos are encoded using the MPEG-4 H.264 codec and saved in a “.mp4” file container. The filename of each video figure is given in text below the thumbnails, with a hyperlink for viewing:
video filename, click to view
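For readers producing their own clips in the format described above, a typical invocation of the widely used ffmpeg tool (assuming ffmpeg is built with the libx264 encoder; filenames are placeholders, not files from this thesis) would be:

```shell
# Encode a source clip as H.264 video in an MPEG-4 (.mp4) container.
# -crf 23 is a reasonable default quality; -pix_fmt yuv420p maximizes
# compatibility with common video players.
ffmpeg -i source-clip.avi -c:v libx264 -crf 23 -pix_fmt yuv420p video-figure.mp4
```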
List of Appendices
A. Observational Study Scenario Script ...... 281
1 Introduction
Given our familiarity with using pens and pencils, one might expect that operating a computer using a pen would be natural and efficient. The second generation of commercial pen input devices, such as the Tablet PC, is reasonably priced and readily available. Yet these devices have failed to live up to analysts' predictions for marketplace adoption (Spooner & Foley, 2005; Stone & Vance, 2009). When the first wave of commercial pen computing devices was released in the early 1990s, marketers claimed that non-typists such as business executives would find pen input faster than using a keyboard (Bricklin, 2002). Today, this claim seems more tenuous, since users are more likely to have keyboard experience. Perhaps the problem with pen computing is entirely due to entering text without a physical keyboard? Consider typing speeds: the average computer user types 20 to 40 words-per-minute (wpm) (C. Karat, Halverson, Horn, & J. Karat, 1999), and proficient typists reach 60 wpm or more (Matias, I. S. MacKenzie, & Buxton, 1996).
Contrast this with tapping a pen on a QWERTY soft keyboard (a keyboard rendered on the display), which has a predicted maximum speed of 30 wpm (Soukoreff & I. S. MacKenzie, 1995). Or, if natural handwriting recognition is used, text entry speeds can be no better than actual writing speeds: between 12 and 23 wpm for printing (Card, Moran, & Newell, 1986) or up to 30 wpm for cursive (Wiklund, Dumas, & Hoffman, 1987). Compared to typing speeds, this suggests a performance deficit for pen-based text entry. However, there exist alternative pen-based text entry techniques which could perform as fast as, or faster than, most typists. For example, users can attain speeds of 41 wpm with
Zhai, Hunter, and Smith's ATOMIK optimized soft keyboard layout (2002), and Kristensson
and Zhai's SHARK technique (2004) has produced speeds as high as 80 wpm. These speeds are encouraging, and variants of these techniques can be installed on many devices – but, like keyboard typing, they require training and practice to master. Regardless, even if there is some performance loss for pen-based text entry, is this the only problem?

Another, perhaps less obvious, problem is that commercial pen-based devices use a graphical user interface (GUI). A GUI is built on the premise of pointing and clicking for target selection and direct manipulation. Note that the verb click is used instead of tap or touch – the typical assumption is that a mouse is used for input. The style of GUI used in the Tablet PC was designed for indirect input using a mouse, where there is a spatial separation between the input space and the output display (Meyer, 1995). Thus, issues specific to direct pen input, where the input and output spaces are coincident (Whitefield, 1986), have not been considered.

The research community has responded with pen-specific interaction paradigms such as crossing (Accot & Zhai, 2002; Apitz & Guimbretière, 2004), gestures (Aliakseyeu, Irani, Lucero, & Subramanian, 2008; Grossman, Hinckley, Baudisch, Agrawala, & Balakrishnan, 2006; Kurtenbach & Buxton, 1991a, 1991b), pen tilting (Tian et al., 2008), pen rolling (Bi, Moscovich, Ramos, Balakrishnan, & Hinckley, 2008), and pressure (Ramos, Boulos, & Balakrishnan, 2004); pen-tailored GUI widgets (Bi et al., 2008; Fitzmaurice, Khan, Pieké, Buxton, & Kurtenbach, 2003; Guimbretière & Winograd, 2000; Hinckley et al., 2006; Ramos & Balakrishnan, 2005); and pen-specific applications (Agarawala & Balakrishnan, 2006; Bae, Balakrishnan, & Singh, 2008; Hinckley et al., 2007; Ramos & Balakrishnan, 2003; Schilit, Golovchinsky, & Price, 1998; Zeleznik, Bragdon, C. Liu, & Forsberg, 2008).
While these all demonstrate ways to improve pen usability, retrofitting the vast number of existing software applications to accommodate these new paradigms is arguably not practically feasible. Moreover, the popularity of convertible Tablet PCs, which operate in laptop or slate mode, suggests that users may prefer to switch between using a mouse and keyboard for certain working contexts (such as when seated at a desk), and using a pen for other situations (such as when standing, or seated without a flat surface on a bus or park bench) (Twining et al., 2005). Thus any pen-specific GUI refinements or pen-tailored applications should also be compatible with mouse and keyboard input.
So, if we accept that conventional GUIs are unlikely to change in the near future, are there still ways to improve pen input? The first step towards such a goal is determining what the major issues are with pen interaction and a conventional GUI. A large body of work has already investigated low-level aspects of pen performance, mostly using controlled experiments. While this type of investigation is certainly important, we agree with Ramos et al. (2006) and Briggs et al. (1993), who argue that investigating pen-based interaction with realistic tasks and applications may provide a more complete picture of actual performance. Indeed, there are examples of qualitative and observational pen research (Briggs et al., 1993; Turner, Pérez-Quiñones, & Edwards, 2007; Inkpen et al., 2006; Fitzmaurice, Balakrishnan, Kurtenbach, & Buxton, 1999; Haider, Luczak, & Rohmert, 1982). Unfortunately, these use older technologies like indirect styli with opaque tablets and light pens, or they focus on a single widget or specialized task.

In this thesis, we present the results of an observational study of direct pen interaction with a realistic scenario involving popular office applications and tasks designed to exercise standard GUI components, covering typical interactions such as parameter selection, object manipulation, text selection, and ink annotation. Based on our analysis, we believe that improvements can be made at three levels without altering the fundamental behaviour and layout of conventional GUIs: hardware, base interaction, and widget behaviour. Hardware improvements can reduce parallax and lag, increase input sensitivity, and reduce the weight of the tablet. Individual widget behaviour could be tuned for pen input without altering initial size or appearance. Base interaction improvements focus on improving fundamental input operations such as pointing, tapping, and dragging, as well as adding enhancements which address global issues specific to direct pen input.
We believe that well-designed base interaction improvements may have the greatest potential to dramatically improve direct input overall, since they use current hardware technology and could potentially apply to all GUI operations and widgets.

In our observational study, we found overarching problems, many of which could be addressed at the base interaction level: poor precision when pointing or tapping; instability and fatigue due to ergonomics; cognitive differences such as pen “tapping” versus mouse
“double clicking”; and frustration due to limited input capabilities such as the lack of shortcut keys. Researchers have already presented ideas which seek to address many of these problems. Many could be implemented as base interaction improvements, and many are, or could be, made compatible with conventional GUIs. Examples include: improving precision with new target selection techniques (Accot & Zhai, 2002; Ramos, Cockburn, Balakrishnan, & Beaudouin-Lafon, 2007; Ren, Yin, Zhao, & Li, 2007; Ren & Moriya, 2000); reducing ergonomic problems associated with direct pen input and reaching (Forlines, Vogel, & Balakrishnan, 2006); and pen-specific command invocation techniques to address problems with limited input (Grossman et al., 2006; Ramos & Balakrishnan, 2007). However, we found another issue which is under-researched and which we believe can also be addressed with base interaction improvements. That issue is occlusion: when the user’s hand and forearm cover portions of the display during interaction (Figure 1-1). In our observational study, we found that occlusion likely contributed to user errors, led to fatigue, and created inefficient movements.
Figure 1-1. Illustration of occlusion. If a light source were placed at a user’s eyes, the resulting shadow of the hand and forearm against the tablet would be the region of the display hidden (or “occluded”) from the user.
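The light-source analogy above can be made concrete with a little projective geometry: each point on the hand or forearm casts its “shadow” where the ray from the eye through that point meets the tablet surface. The following Python sketch illustrates this for a single point; all coordinates are invented for illustration, not measurements from this thesis.

```python
# Illustrative sketch of the Figure 1-1 shadow analogy: project a point on
# the hand onto the tablet plane (z = 0), with the projection centre at the
# user's eye. Units are mm; all values are invented for illustration.

def project_to_tablet(eye, point):
    """Follow the ray from `eye` through `point` to the tablet plane z = 0."""
    ex, ey, ez = eye
    px, py, pz = point
    if ez == pz:
        raise ValueError("ray is parallel to the tablet plane")
    t = ez / (ez - pz)  # ray parameter where z reaches 0
    return (ex + t * (px - ex), ey + t * (py - ey))

# An eye 500 mm above the tablet and a knuckle 30 mm above it: the
# knuckle's shadow lands slightly beyond the knuckle's own (x, y) position.
eye = (0.0, -200.0, 500.0)
knuckle = (100.0, 50.0, 30.0)
print(project_to_tablet(eye, knuckle))
```

The occluded region is then the union of such projections over every point of the hand and forearm, which is why its shape depends on grip posture and head position as well as pen location.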
Past researchers have suggested that occlusion impedes performance in specific contexts and widgets (Brandl et al., 2009; Hancock & Booth, 2004; Inkpen et al., 2006), used occlusion as motivation for interaction techniques (Apitz & Guimbretière, 2004; Ramos &
Balakrishnan, 2003; Schilit et al., 1998), and argued for its effect during experiments and usability studies (Forlines & Balakrishnan, 2008; Grossman et al., 2006; Hinckley, Baudisch, Ramos, & Guimbretière, 2005; Hinckley et al., 2007; Ramos et al., 2007). However, to date there has been no systematic study of the fundamental characteristics of occlusion or its effect on performance, nor have general techniques been developed to address it at the base interaction level. Thus, after reporting results from an initial observational study of direct pen input, we focus on examining, modelling, and designing techniques for hand occlusion.
1.1 Research Objectives and Overview
The research objective of this thesis can be simply stated as:
Identify issues with direct pen interaction with a conventional GUI, and improve the experience by investigating, modelling, and addressing hand occlusion.
To ultimately reach this goal, we investigate a series of primary research problems, many of which build on (or are dependent on) research outcomes from previous steps. The six research problem statements are:
(a) Why is direct pen input difficult with a conventional GUI?
(b) What is the area and shape of hand occlusion?
(c) How does occlusion affect performance when tapping, dragging, or tracing?
(d) How do people compensate for occlusion?
(e) Is it possible to model the occluded area and update it in real time using only conventional pen input?
(f) Can techniques be developed for a conventional GUI interface to counteract occlusion?
To answer these research problems, we took the following steps (also illustrated in Figure 1-2):
1. To answer the first question (a), we conduct an observational study with realistic tasks and common software applications. Our results verify that occlusion is an aspect worth investigating and place it in context with other direct pen input issues.
2. To investigate the phenomenon of occlusion more closely, we answer questions (b), (c), and (d) by conducting three formal experiments to examine its area and shape, its effect on performance, and compensatory postures.
3. The results from inquiry (b) lead us to design a five-parameter geometric model which captures the general shape of the occluded area, and which can be configured for a particular individual so that it can be updated in real time using only the current pen position and, optionally, pen tilt.
4. Motivated by results from inquiries (a), (c), and (d), we design and evaluate an occlusion-aware interface technique called the Occlusion-Aware Viewer. This provides a case study for other occlusion-related base interaction techniques that would be compatible with current GUIs.
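As a preview of step 3, the model developed in chapter 5 represents the occluded area as an offset circle (roughly, the fist) plus a rectangle pivoting from it (roughly, the forearm). The sketch below is a plausible rendering of that idea in Python; the parameter names, default values, and coordinate conventions here are illustrative assumptions, not the calibrated values from chapter 5.

```python
import math

# Hypothetical point-in-region test for an offset-circle-plus-pivoting-
# rectangle occlusion model. Coordinates are in mm with y growing downward
# (toward the user); all default parameter values are invented.

def is_occluded(pen, pt, q=40.0, phi=math.radians(30), r=60.0,
                theta=math.radians(60), w=70.0, reach=1000.0):
    """Return True if display point `pt` falls in the modelled occluded area.

    q, phi -- offset distance and angle of the circle centre from the pen tip
    r      -- circle radius
    theta  -- pivot angle of the forearm rectangle; w -- rectangle width
    """
    cx = pen[0] + q * math.sin(phi)   # circle centre, offset from the pen tip
    cy = pen[1] + q * math.cos(phi)
    dx, dy = pt[0] - cx, pt[1] - cy
    if dx * dx + dy * dy <= r * r:    # inside the "fist" circle
        return True
    # Project the point onto the forearm axis (unit vector at angle theta).
    ux, uy = math.sin(theta), math.cos(theta)
    along = dx * ux + dy * uy         # signed distance along the forearm
    across = abs(dx * -uy + dy * ux)  # perpendicular distance from the axis
    return 0.0 <= along <= reach and across <= w / 2.0

pen = (200.0, 150.0)
print(is_occluded(pen, (215.0, 190.0)))   # near the fist, below the pen tip
print(is_occluded(pen, (150.0, 100.0)))   # up and left of the pen tip
```

Because the test depends only on the current pen position (and the fitted parameters), it can be evaluated in real time on every pen move, which is what makes occlusion-aware techniques such as the Occlusion-Aware Viewer practical.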
Chapter 3: (a) Why is direct pen input difficult with a conventional GUI? We conduct an observation study with realistic tasks and common software applications. There are five overarching issues, but one is the nearly uninvestigated issue of hand occlusion.

Chapter 4: (b) What is the area and shape of hand occlusion? We conduct a controlled experiment to capture and analyze rectified and registered images of the occluded area. The hand and arm can occlude a large area, and although the shape varies across participants, there are common features.

(c) How does occlusion affect performance when tapping, dragging, or tracing? We conduct a controlled experiment to investigate how occlusion affects time, error, and other performance metrics. It is difficult to experimentally control for occlusion within a single direct input context, but there is reasonable evidence that occlusion has an effect.

(d) How do people compensate for occlusion? We conduct a controlled experiment with a simultaneous monitoring task. People use different posture contortion strategies which can reduce task performance.

Chapter 5: (e) Is it possible to model the occluded area and update it in real time using only conventional pen input? We design a user-configurable, geometric model and test it analytically and in a user study. The occluded area can be modelled using a five-parameter geometric representation; the model can be configured for individual users with a four-step process and can be updated in real time using only pen location and, optionally, pen tilt.

Chapter 6: (f) Can techniques be developed for a conventional interface to counteract occlusion? We design and evaluate the Occlusion-Aware Viewer interaction technique. The Occlusion-Aware Viewer can reduce the effect of occlusion in a simultaneous monitoring task, and it functions as a case study for creating other occlusion-aware interface techniques.
Figure 1-2. Research path showing research problems, activities, and main results. Bold text is the research problem statement; italic text is the research activity; and the final block of text is the primary contribution which leads to the next stage. Highlighted text and arrows illustrate dependencies forming the research path used in this thesis.
Note that our focus is on GUI manipulation rather than text entry – text entry is an isolated and difficult problem with a large body of existing work relating to handwriting
recognition and direct input keyboard techniques – Shilman, Tan, and Simard (2006) and Zhai and Kristensson (2003) provide overviews of this literature. We see our work as complementary; improvements in pen-based text entry are needed, but our focus is on improving direct input with standard GUI manipulation.
1.2 Contributions
We make the following contributions relating to human factors, interaction design, and methodology.
Issues for Direct Pen Interaction with a Conventional GUI
To our knowledge, there has been no comprehensive qualitative, observational study of Tablet PC or direct pen interaction with realistic tasks and common GUI software applications. Our study, described in chapter 3, presents results that can help guide future pen input researchers. We found that pen participants made more errors, performed inefficient movements, and expressed frustration compared to mouse users. When examined as a whole, our quantitative and qualitative observations reveal overarching problems with direct pen input: poor precision when pointing or tapping; problems caused by hand occlusion; instability and fatigue due to ergonomics; cognitive differences between pen and mouse usage; and frustration due to limited input capabilities. We believe these to be the primary causes of non-text errors and to contribute to user frustration when using a pen with a conventional GUI. We feel that these issues can be addressed by improving hardware, base interaction, and widget behaviour without sacrificing the consistency of current GUIs and applications. Moreover, previous research has focused on issues other than occlusion, yet our results suggest that occlusion has a profound effect on the usability of direct pen input.
Characteristics of Direct Pen Occlusion
Our investigation into fundamental aspects of direct pen occlusion in chapter 4 – its size and shape relative to the pen position, how it affects performance and ways in which
users contort their hand posture to minimize its effect – reveals new insights into the characteristics of hand occlusion and its effect on user performance.
The Area and Shape of Occlusion
In Experiment 4-1, we use a novel combination of computer vision and image processing to capture an image showing the shape of the occluded area from the perspective of our participants (we call these images occlusion silhouettes). We find that the hand and arm can occlude a large area, as much as 47% of a 12 inch display, and that the shape of the occluded portion of the display varies across participants according to anatomical size and the style of pen grip. However, there are some common features and similar grip characteristics among users. Using mean images of the occluded area, we present three basic design implications which take the occluded area into account.
The Effect of Occlusion on Performance; Compensatory Postures
We support our qualitative findings from the observational study with further evidence from controlled experiments. The results of Experiment 4-2 suggest that occlusion has an effect on performance. Moreover, for continuous tasks in which the pen remains against the display surface, such as dragging, or when the desired target is initially hidden, the effect appears more pronounced. However, we also found that it is difficult to experimentally control for occlusion within a single direct input context. The results from Experiment 4-3 show that users contort their posture to minimize the effect of occlusion during a simultaneous monitoring task, and different users utilize different contortion strategies. Compensating for occlusion in this way reduces performance by increasing task time.
A Configurable Model of Occlusion
We show that a five parameter geometric model can adequately represent the general shape of the occluded area examined in Experiment 4-1. We use this geometric model to design a configurable model of occlusion which can be interactively customized for a particular individual using a four-step interactive process. Once completed, the model can be updated in real time based only on pen location and pen tilt if available. We evaluate our model analytically using a novel methodology, and in a user study (Experiment 5-1). Finally,
we illustrate three examples showing how the model can be used by designers and researchers.
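As an illustration of how such a geometric model can be queried, the sketch below approximates the occluded area with a circle offset from the pen tip plus a band for the forearm. The parameter roles used here (offset angle q, offset distance d, circle radius r, forearm angle phi, forearm width w) are our assumptions for illustration, not necessarily the exact five-parameter formulation developed in chapter 5:

```python
import math

def occludes(px, py, pen_x, pen_y, q, d, r, phi, w):
    """Test whether display point (px, py) falls inside a sketched
    occlusion shape: a circle of radius r whose centre is offset from
    the pen tip by distance d at angle q, plus a 'forearm' band of
    width w extending from that centre at angle phi. The parameter
    roles are illustrative assumptions, not the thesis's exact model."""
    cx = pen_x + d * math.cos(q)
    cy = pen_y + d * math.sin(q)
    # Circle test (the ball of the hand).
    if math.hypot(px - cx, py - cy) <= r:
        return True
    # Forearm band: project the point onto the arm direction.
    dx, dy = px - cx, py - cy
    along = dx * math.cos(phi) + dy * math.sin(phi)    # distance along the arm
    across = -dx * math.sin(phi) + dy * math.cos(phi)  # perpendicular offset
    return along >= 0 and abs(across) <= w / 2
```

A real-time version would recompute (pen_x, pen_y) on every pen event, and adjust q and phi from pen tilt when the hardware reports it.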
Occlusion-Aware Interface Techniques
We introduce the notion of occlusion-aware interfaces: interaction techniques which know what regions of the display are currently occluded, and use this knowledge to counteract potential problems with occlusion and/or utilize the occluded area. As a case study, we present a fully realized design for an interaction technique called the Occlusion-Aware Viewer, which displays otherwise missed previews and status messages in a non-occluded area using a bubble-like callout. Based on results from a user study, the Occlusion-Aware Viewer can decrease the time of a simultaneous monitoring task by up to 23%. However, the study also revealed that techniques such as this need to carefully consider cases where the occluded area is ambiguous, or else performance will decrease. In spite of this problem, our participants rated using our technique as better than no technique. We also describe designs and ideas for other occlusion-aware interface techniques.
Methodology, Analysis, and Implementation
In addition to contributions pertaining to direct pen interaction and occlusion, this thesis contains a number of contributions for quantitative and qualitative study design, multi-faceted logging, qualitative analysis, image-based analysis, optimization-based analytical testing, and user interface image processing.
Hybrid Quantitative and Qualitative Study Design
In the observational study presented in chapter 3, we describe a study design which is a hybrid of typical controlled HCI experimental studies, usability studies, and qualitative research. We believe this enables more diverse observations involving a variety of contexts and interactions, and moves researchers closer to studying how people might perform in real settings.
Multi-Faceted Logging
In every study we conducted for this dissertation, we captured head-mounted video in addition to conventional input event logs. For the observational study in chapter 3, we also
captured the 3-D positions of each participant's forearm, pen, Tablet PC, and head, as well as a full-scale screen capture video. To use this extra logging data, we describe techniques for synchronizing, segmenting, and annotating it – and we developed a reasonably complete software application to perform these tasks efficiently.
Qualitative Analysis
For qualitative analysis, we introduce an adapted open coding approach (Strauss & Corbin, 1998) which includes a preliminary step of identifying important events before performing the actual coding. We demonstrate the utility of creating a coding decision tree to train raters and reduce coding ambiguity. We also provide a strategy to combine events identified by two different raters.
Image Based Analysis
The experiments in chapters 4, 5, and 6 utilize a novel combination of head-mounted video logging, augmented reality marker tracking, and image processing techniques to capture images of hand and arm occlusion from the point-of-view of the user. We use these images, which we call occlusion silhouettes, to visualize the mean shape of occlusion for individual participants, to compute a quantitative occlusion ratio in Experiment 4-1, for analytical tests in chapter 5, and to test whether the experimental stimuli were occluded in Experiments 4-2 and 4-3.
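For a binary silhouette, the occlusion ratio reduces to the fraction of display pixels marked as occluded. A minimal sketch, assuming a silhouette stored as nested lists of 0/1 values (our representation, not necessarily the one used in the experiments):

```python
def occlusion_ratio(silhouette):
    """Fraction of display pixels covered by the hand and arm, given a
    binary occlusion silhouette (nested lists of 0/1, 1 = occluded).
    An illustrative sketch of the quantitative occlusion ratio."""
    occluded = sum(sum(row) for row in silhouette)
    total = sum(len(row) for row in silhouette)
    return occluded / total
```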
Optimization-Based Analytical Testing
To analytically test our configurable model of occlusion, we created what we believe is a novel methodology using techniques taken from classification and numerical optimization.
We use mean F2 scores (Van Rijsbergen, 1979), calculated from the model’s precision-recall performance for a corpus of occlusion silhouettes, to compare different versions of the model.
We demonstrate how to establish a theoretical maximum F2 score by fitting geometric parameters to a corpus of silhouettes using non-linear optimization.
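The F2 statistic is an instance of Van Rijsbergen’s F-beta measure, F_beta = (1 + beta^2) P R / (beta^2 P + R), which with beta = 2 weights recall twice as heavily as precision. A minimal sketch of computing it over sets of occluded pixel coordinates (the set-based representation is our assumption; the thesis computes it from silhouette images):

```python
def f_beta(predicted, actual, beta=2.0):
    """F-beta score (Van Rijsbergen, 1979) over two sets of 'occluded'
    pixel coordinates. beta = 2 weights recall twice as heavily as
    precision, matching the mean F2 comparisons described above."""
    predicted, actual = set(predicted), set(actual)
    tp = len(predicted & actual)          # true positives
    if tp == 0:
        return 0.0
    precision = tp / len(predicted)
    recall = tp / len(actual)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)
```

Establishing a theoretical maximum F2 then amounts to maximizing this score over the geometric parameters for each silhouette, e.g. by handing 1 - f_beta to a non-linear optimizer as the objective.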
User Interface Image Processing
As part of our Occlusion-Aware Viewer implementation, we introduce what we believe is a novel application of image processing and computer vision techniques for real-time user interface analysis to enable an interaction technique. In real-time, we monitor what regions of
the interface are changing, and use this to recognise occluded status messages and document previews. This makes our technique compatible with current GUIs by functioning at a base interaction level.
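A minimal sketch of this frame-differencing idea, monitoring which regions change between successive screen captures. The nested-list frame representation and threshold are our assumptions; the actual implementation uses richer image processing and computer vision techniques:

```python
def changed_region(frame_a, frame_b, threshold=10):
    """Bounding box (top, left, bottom, right) of pixels whose grayscale
    value differs by more than `threshold` between two frames, or None
    if nothing changed. Frames are equal-sized nested lists; this is an
    illustrative stand-in for the real-time UI analysis described above,
    not the thesis's actual implementation."""
    changed = [(r, c)
               for r, row in enumerate(frame_a)
               for c, (a, b) in enumerate(zip(row, frame_b[r]))
               if abs(a - b) > threshold]
    if not changed:
        return None
    rows = [r for r, _ in changed]
    cols = [c for _, c in changed]
    return (min(rows), min(cols), max(rows), max(cols))
```

A changed region that falls inside the currently occluded area is a candidate for relay in a callout outside that area.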
1.3 Dissertation Outline
The remainder of this document is organized as follows (see also Figure 1-2):
In chapter 2, we summarize relevant background information regarding how we use our hand to grasp and manipulate a pen, and discuss pen input technologies. Then, we summarize previous research results pertaining to performance and input characteristics, and give a brief overview of pen-specific interaction techniques and applications that have been developed.
In chapter 3, we describe the methodology and results for our observational study of direct pen input.
In chapter 4, we describe three controlled experiments which investigate occlusion.
In chapter 5, we present a configurable, real-time, geometric model of the area occluded by the hand and forearm.
In chapter 6, we introduce occlusion-aware interfaces and describe and evaluate a case study technique called the Occlusion-Aware Viewer.
In chapter 7, we draw conclusions, summarize limitations, and suggest possible future work.
2 Background Literature
Using a pen (or stylus) to operate a computer is not new. Tethered light-pens and digitizer tablets have existed since at least the 1950s (Davis & Ellis, 1964; Gurley & Woodward, 1959). Pen-based personal computers have been commercially available since the early 1990s with early entrants such as Go Corporation’s PenPoint, Apple’s Newton, and the Palm Pilot. In fact, Microsoft released Windows for Pen Computing in 1992 (Bricklin, 2002; Meyer, 1995). It is not surprising then, that there exists a large body of academic and commercial research seeking to understand and improve pen input for computers. A common benefit touted by industry marketing teams and suggested by some researchers (e.g., Mark D. Gross & Do, 1996; Whitefield, 1986) is that pen computing should be more natural than other input modalities since we are already familiar with pen and paper. This is based on the simple notion that drawing on paper should be the same as drawing on a digitizer display. Thus, we begin by describing pen input from the most basic level: how we use our hand to grasp and manipulate a pen. This includes a review of hand and upper limb anatomy and mechanical capabilities; followed by an overview of prehension and grip as it relates to pen usage. With these fundamentals in mind, we move to pen input with a computer. After an overview of pen sensing technologies and input modalities, we summarize research results pertaining to performance and input characteristics. This includes low-level performance for basic tasks such as target selection, docking (dragging), and path following; and human capabilities for controlling additional input channels such as pressure, tilt, and rotation. Next, we discuss findings examining pen interaction with specific widgets and in common usage
contexts. Finally, we give an overview of pen-specific interaction techniques and applications that have been developed. Based on our background survey, we conclude that investigating the usability of direct pen input with a conventional GUI is an important, but overlooked context. Furthermore, we note that the effect of hand occlusion has been an often mentioned, but under-researched aspect of direct pen input.
2.1 The Hand and the Pen
One does not need to be convinced that our hands are incredibly important for many daily functions. We use them to explore, touch, manipulate, and move objects around us (Napier, 1993; F. R. Wilson, 1999). Jones and Lederman (2006) present a conceptual framework of human hand function along a continuum from predominantly sensory to predominantly motor (Figure 2-1).
Sensory to motor: tactile sensing; active haptic sensing; prehension; non-prehensile skilled movements.
Figure 2-1. Sensorimotor continuum of human hand function. (from L. A. Jones & Lederman, 2006, fig. 1.1)
Many of these basic functions are further refined with the use of hand-held tools. For example, we can break a piece of wood in two with our bare hands, but a saw will make this much easier and more accurate. Other common tasks are nearly impossible without the assistance of a hand-held tool (Chris Baber, 2006); consider cutting metal without a hack saw or writing on paper without an ink pen. When wielded by capable and dexterous hands, a handwriting instrument becomes an extremely flexible tool. Pens, pencils, styli, brushes, crayons, and chalk enable a wide range of expression with relatively simple technology. In most cases a keyboard may be more efficient for entering written text (Zhai, Hunter, & Barton A. Smith, 2000), but it lacks flexibility in spite of its increased technical complexity. Of course speech requires no tool at all, but it is really only effective for certain modes of communication. Using speech to
describe a schematic, identify a specific area of interest, or render a portrait can be challenging if not impossible. Writing a word or drawing a shape with a pen presupposes the existence of a hand for support and manipulative control. How successful the hand is at manipulating the pen is partially due to human anatomical properties and capabilities such as movement range, stability, strength, and precision. In the same way, the physical properties and ergonomics of the pen, such as its size, mass, and friction, also affect manipulative performance.
Hand and Upper Limb Anatomy
The anatomy of the hand, together with that of the arm and shoulder, enables its positioning in space. We briefly review relevant aspects of skeletal structure, externally observable structure, and movement capability. For more detailed descriptions, the reader should consult Napier (1993, chap. 2), Jones and Lederman (2006, chap. 2), or C. L. MacKenzie and Iberall (1994, pp. 349-).
Bones
The hand is comprised of 27 bones: 14 phalanges in the digits, 5 metacarpals in the palm, and 8 carpals in the wrist (Figure 2-2a). The most common names for the digits are thumb, index, middle, ring, and little. The latter four are collectively referred to as the fingers. Beginning at the finger tip, each finger has 3 phalanges: the distal, middle, and proximal. The thumb has no middle phalanx1. The proximal phalanges are connected to metacarpals, which are in turn connected to the wrist carpals.
1 phalanx is the singular form of phalanges
(a) bones (b) joints Figure 2-2. Bones and joints of the hand: (a) bones, including the distal, middle, and proximal phalanges, metacarpals, carpals, ulna, and radius; (b) joints, including the distal interphalangeal (DIP), proximal interphalangeal (PIP), interphalangeal (IP), metacarpophalangeal (MP), and carpometacarpal joints. (skeleton illustration based on C. L. MacKenzie & Iberall, 1994)
The terms proximal and distal are used in anatomy to describe the relative position of body structures. Distal denotes a structure attached farther from the centre of the body, and proximal one nearer. To avoid ambiguity, a standard body reference position is needed – otherwise, we could point the tip of our finger at our heart and change the relative positioning. This body reference position is called the standard anatomical position. It places the arms straight and slightly away from the side of the body with the palms facing forward. This standard position comes from the practice of suspending cadavers for dissection in the eighteenth century (Napier, 1993, p. 13).
Joints and Range-of-Motion
The finger joints are the distal interphalangeal joint (DIP) and the proximal interphalangeal joint (PIP) (Figure 2-2b). Based on the standard anatomical position, the DIP is farther from the centre of the body than the PIP. These are primarily hinge joints which restrict most finger movement to bending or extending, with some small side-to-side movement. The thumb has only a single interphalangeal joint (IP). Each proximal phalanx in the digits connects to a metacarpal with the aptly named metacarpophalangeal joint (MP). For the fingers, this joint functions similarly to the interphalangeal joints, but enables much greater side-to-side movement. To assist in forming a
precise oppositional grip with the thumb (such as holding a pen), this joint also has some capability for rotation, especially with the index finger. The metacarpals are connected to the wrist carpals by the carpometacarpal joints. The carpometacarpal joint for the thumb permits a very wide range of motion side-to-side and when flexing and straightening, functioning like a saddle joint. The metacarpals that make up the palm are able to move independently by varying amounts, with most movement occurring along the axis aligned with the middle digit, permitting the hand to be cupped.
Musculature
Anatomically, there are 29 muscles controlling hand movement, but some of these muscles perform different functions through tendon subdivisions. The majority of the muscles for the hand are actually located in the forearm; their movements are transferred to the digits, palm, and wrist by a system of tendons. This gives the hand strength without bulk. For the fingers, there are two sets of primary muscles, the superficial and the deep, which divide their work between controlling the PIP joint and the DIP joint respectively. Each set of muscles is arranged in opposition, enabling joints to bend (or “flex”) and extend (a flexion-extension movement), and, through some ingenious routing, can also control how the fingers spread apart. The thumb and the wrist are each controlled by three primary muscles, consistent with their greater range of motion.
Anthropometry and Range of Motion
The human hand comes in different shapes and sizes, but overall there are some consistent trends (Napier, 1993, p. 18). For western men and women, the ratio of hand breadth to hand length is remarkably consistent, with only a slight indication that women’s hands are more slender. The longest digit is the middle finger, with length decreasing as digits deviate from the centre. Digit lengths are not symmetrical: the thumb is much shorter than the little finger, and, interestingly, the relative ordering of length for the index and ring fingers varies by individual (although women have longer index fingers more often than men do). Table 2-1 and Figure 2-3 provide selected 50th percentile anthropometric measurements for the hand and forearm. Additional percentiles and dimensions can be found in Pheasant and Hastlegrave (2006) or Kroemer and Grandjean (1997).
Dimension Men Women
EL elbow to fingertip length 480 435
SL shoulder to elbow length 365 335
UL upper limb length including hand 790 715
HL hand length 189 174
HB hand breadth 87 76
Table 2-1. Anthropometric measurements for the hand and arm. All dimensions in millimetres, given for the 50th percentile only (from Pheasant & Hastlegrave, 2006, tables 6.1 and 10.11).
Figure 2-3. Anthropometric measurement dimensions for the upper limb (UL, SL, EL, FL, HL, HB; see Table 2-1). Note that FL = EL - HL (based on Pheasant & Hastlegrave, 2006, figs. 2.11 and 6.1)
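The figure note’s relation FL = EL - HL can be checked directly against the 50th percentile values in Table 2-1 (millimetres):

```python
# 50th percentile values from Table 2-1 (millimetres).
measurements = {
    "men":   {"EL": 480, "HL": 189},
    "women": {"EL": 435, "HL": 174},
}

def forearm_length(group):
    """Derived forearm length FL = EL - HL for a table group."""
    m = measurements[group]
    return m["EL"] - m["HL"]
```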
In spite of variation of shape, Napier (1993) argues that human hand function is universal. We already gave general characteristics for the range-of-motion of digits, but as we shall see, during pen manipulation, their primary function is to hold the pen and perform fine movements. Larger movements involve the wrist, forearm, and upper arm. Within the kinematic chain from shoulder to pen, hand movements using the wrist and elbow joints afford a large range-of-motion (Figure 2-4 and Figure 2-5).
Figure 2-4. Selected hand and wrist postures: extension, radial deviation, neutral, ulnar deviation, flexion. (based on Pheasant & Hastlegrave, 2006, fig. 6.2)
Figure 2-5. Principal ranges of motion for the hand and wrist: (a) extension, (b) flexion, (c) abduction, (d) adduction, (e) supination, (f) pronation. Values given for 50th percentile males and females: (a) wrist extension 62°, 72°; (b) wrist flexion 68°, 72°; (c) wrist abduction 32°, 28°; (d) wrist adduction 22°, 27°; (e) forearm supination 108°, 109°; (f) forearm pronation 65°, 81°. (illustration and measurements from Kroemer & Grandjean, 1997, fig. 4.9)
The maximum distance addressable by the hand is more difficult to precisely define due to anatomical differences and joint coordination. When seated at a standard desk most individuals can grasp objects in a horizontal area bounded by a 35 – 45 cm arc (Figure 2-6) (based on 5th %tile data, Pheasant & Hastlegrave, 2006, fig. 4.6). With upper arm movement, the arc can be increased to 55 – 65 cm; and with torso movement, such as when pianists lean slightly when reaching for distant keys, this distance can be extended further.
Figure 2-6. Horizontal arc of grasp. (based on 5th %tile data, Pheasant & Hastlegrave, 2006, fig. 4.6)
Manual Dexterity
Manual dexterity tasks can be mapped in a force-displacement framework (Figure 2-7). For most tasks, the hand typically exerts forces ranging from approximately 0.1 to 100 N and can perform movements as fine as 0.1 mm (L. Jones, 1998). When writing with a pen, researchers have measured barrel grip forces as high as 7.3 N (Chau, 2006). For comparison, when manipulating a mouse, researchers report mean grip forces near 0.8 N (Visser, De Loose, De Graaff, & Dieen, 2004). The strongest fingers in the hand are the index and middle fingers (Radwin et al., 1992, as cited in Pheasant & Hastlegrave, 2006, p. 149), which oppose the thumb and prevent an object from slipping during these fine manipulations.
Figure 2-7. Jones’s force-displacement framework for manual dexterity, plotting force (N, roughly 0.01 to 100) against displacement (m, roughly 0.0001 to 1); regions of human hand and upper limb function include timed dexterity tests, piano playing, electronic keyboard typing, assembly, and micro-surgery. (from L. Jones, 1998, fig. 4.1)
A considerable amount of research (especially in human-computer interaction) has focused on relatively large, forceful movements, where dexterity is primarily measured by speed and accuracy and modeled using Fitts’ Law2. Many of these studies are one- and two-dimensional variations of Fitts’ original experiment (Fitts, 1954), and typically involve a rapid coordinated movement between fingers, wrist, and forearm over distances of more than a few centimetres (Soukoreff & I. S. MacKenzie, 2004). A notable exception is Balakrishnan and I. S. MacKenzie (1997), who tested the performance of isolated limbs with motor-space distances as small as 3 mm. They found that pointing with isolated index finger movements was slower and less accurate than using the wrist or forearm individually, or than manipulating a pen. Balakrishnan and I. S. MacKenzie conclude that “... stylus3-type input devices that exploit the high bandwidth of the thumb and index finger, working in unison, is
2 Since Fitts’ Law is generally accepted to be a core component of the Human-Computer Interaction literature, and since it is not the focus of this dissertation, we will refrain from providing an explanation here. The uninitiated reader should consult MacKenzie (1992) and Soukoreff and MacKenzie (2004) for detailed background, tutorials, and practical guidelines for its application.
3 The terms “pen” and “stylus” are often used interchangeably in the Human-Computer Interaction literature and industry. We will adopt a more strict convention explained on page 28.
likely to yield high performance.” We survey additional research investigating pen input performance below.
The Pen
Writing instruments have existed for more than five thousand years, and sticks and fingers have been used for drawing far longer. The general form of writing and drawing instruments remains largely unchanged – a cylindrical shank with a tip to leave an impression. For example, Sumerians from 3500 BC made marks in clay tablets to keep inventories of items such as food stocks (Fischer, 2001, p. 28). These standardized marks were created with a hollow reed stylus which was pushed into wet clay at different angles to create a vocabulary of symbols. The use of a stylus to leave a mark by scratching into a surface continued for thousands of years, and advanced to incorporate re-usable tablet surfaces such as wax. Modern writing instruments such as the pen and pencil enable the creation of continuous lines and marks by leaving a trail of ink or lead. It is interesting to note that in their early form, the pen and pencil were designed to serve different purposes. The pencil was primarily used for drawing lines and a nib pen for writing (Petroski, 1992). This was not due to convention, but rather because writing was calligraphic, which required different stroke widths. Today we still use specialized pen-like mark-making tools such as the cabinetmakers’ marking knife, or artists’ conté crayons and brushes. These are more accurate or more expressive than modern two-dimensional pens and pencils – in fact, Fischer (2001, p. 51) argues that a stylus marking clay is capable of a richer visual vocabulary.
Pen Grips
The interface between the hand and the pen is the grip, the third stage of prehension, during which manipulation of the acquired object occurs. There are many different types of grips and many different taxonomies for their categorization (C. L. MacKenzie & Iberall, 1994, provide a comprehensive discussion). Using a functional categorization, most grips
incorporating thumb and finger opposition can be labelled as power grips or precision grips4 (Napier, 1956). With a power grip, the object is held immobile against the palm of the hand with the fingers and thumb wrapped around it. This makes the object function as a static extension of wrist and arm movements: a common example is swinging a hammer. With a precision grip, the object is held with the tips of the thumb and fingers. This enables some fine movements of the object using only the digits, but with a reduction in gripping force. When manipulating a pen, most adults use a type of precision grip, most often a variation on the dynamic tripod (C. L. MacKenzie & Iberall, 1994, p. 27). The dynamic tripod grip is named for the way in which the thumb, index, and middle finger work in opposition to support and manipulate the pen (Figure 2-8). Gripping an object for manipulation requires balancing a firm hold to keep the object stationary and securely attached to the hand, while at the same time enabling manipulation of the object independent of the hand (Napier, 1956). In most cases, the goal is to amplify or attenuate finger movements and enable a wider variety of tool control (Elliott & Connolly, 1984).
Figure 2-8. Dynamic tripod pen grip illustrated by Mercator, 1540. (from Kao, Van Galen, & Hoosain, 1986)
Although the dynamic tripod is considered by many teachers and therapists to be the ideal pen grip (Selin, 2003, p. 4), it is not the only way in which individuals hold a pen. The
4 There is some debate whether this is a grip per se, since the pen is not held in a single, static phase. The term precision handling has been proposed (Landsmeer, 1962). For simplicity, and to remain consistent with most of the literature, we will continue to use the term grip.
greatest diversity of grips is seen with young children (Elliott & Connolly, 1984; Sassoon, 1993; Selin, 2003), but adults also employ different grips, not all of which are considered efficient (Elliott & Connolly, 1984). Common adult grip variants include the lateral tripod (Figure 2-9b) and adapted tripod (Figure 2-9c), as well as many variations on the dynamic tripod itself (Figure 2-9d,e,f). Examples of less common adult grips, argued to be less efficient (Sassoon, 1993; Selin, 2003), include the thumb wrap (Figure 2-9g), ventral grip (Figure 2-9h), and index grip (Figure 2-9i)5.
(a) dynamic tripod (b) lateral tripod (c) adapted tripod
(d) dynamic tripod variation: (e) dynamic tripod variation: (f) dynamic tripod variation: thumb and finger extended thumb flexed, finger hyperextended thumb hyperextended, finger flexed
(g) thumb wrap (h) ventral grip (i) index grip Figure 2-9. Examples of different adult pen grips reported in the literature. Grips (a) through (f) are considered efficient, grips (g) through (i) are not. (based on illustrations in Greer & Lockman, 1998; Sassoon, 1993; Selin, 2003)
For the most part, research has shown that adults use a consistent grip when writing (e.g., Greer & Lockman, 1998). As we noted earlier, when using a precision grip, the held
5 For interested readers, Selin (2003) provides a comprehensive overview of pen grip research.
object has some freedom for independent movement. Greer and Lockman (1998) observed that adults often tilt the pen slightly to the right when making vertical lines, and towards their body when making horizontal lines. They also found that individuals tilt the pen more at the beginning of a line than at the end, and that the degree of tilt was independent of the drawn line’s position on the page. Other characteristics, such as the distance from the pen tip to the grip fulcrum, the amount of the pen barrel that extends beyond the grip, or even the grip force applied to the pen, appear to be less frequently considered. Part of the reason may be the difficulty of accurately measuring these quantities, or of establishing a baseline set of tasks with which to measure them. Chau (2006) demonstrated a pen capable of accurately measuring the grip forces applied around the pen barrel. In a small pilot experiment, he established the first accurate estimate of pen grip forces (which we discussed earlier) and found that participants held the pen with a mean grip height of 35 mm (SD 13.7).
Writing Posture
The hand and fingers which grip the pen are positioned in space by the forearm, which is positioned by the arm, which is positioned by the shoulder, which is positioned by the upper body. Part of the reason why the dynamic tripod is advocated as a preferred grip is because it also suggests an ideal posture for the rest of the body. The flexed ring and little fingers are the hand’s connection to the writing surface, and form an arch with the elbow (Erhardt, 1994, in Selin, 2003). The wrist, forearm, and upper arm are therefore in a relaxed position (Rosenbloom and Horton, 1971, in Selin, 2003). The position of the body relative to the table has been studied by Sassoon (1993, chap. 2) using school children of different ages. She coded general posture categories which are comprised of individual observations relating to forearm pronation, wrist flexion and extension, and body lean direction. With the oldest group of right-handed 15-year-olds, she found that in 78% of her observations, the children used a neutral, upright posture. The remaining observations were almost evenly distributed between leaning forward and left, with a few instances of leaning right. Observations with left-handed children of the same age revealed roughly mirrored results. Sassoon does not comment on whether her observations confirmed a common association between left-handedness and a flexed (or “hooked”) wrist. Enstrom (1962) conducted a large
study of postural methods employed by left-handed elementary school students when writing6. He classified 15 different techniques broken into two primary groups: 6 techniques used by students who kept their hand below the writing line; and 9 techniques for students who kept their hand above the writing line (Enstrom refers to the second group as “hookers” [sic]). Based on factors comprised of writing quality, speed, good posture, and lack of smearing (the students used graphite pencils), three techniques in the first below-the-writing-line group were recommended, and only one technique in the hooking group was considered good, but with reservations. Overall, 69% of the students in the below-the-writing-line group were already using the recommended techniques, but only 20% of the students in the hooking group used the technique identified as good. No statistical breakdown is given for the groups themselves, so the frequency of extreme postures among left-handed writers remains unclear. In three smaller observational studies, also with school children, Selin (2003) was not able to statistically confirm or refute this left-handed characteristic.
Pen Ergonomics
The majority of pens in use today have a similar cylindrical shape, with some variation in the diameter of the barrel, the addition of faceted sides, or the style of tapering at the bottom or top. The overall weight and balance can vary quite a bit, and different materials can be used to wrap the barrel to increase friction, or cushion against fatigue from tight grips. Some of the ergonomically inspired material changes are motivated by writer’s cramp and carpal tunnel syndrome (Harris and Hodges, 1995, in Selin, 2003). Somewhat radical departures in pen design have been patented and are available commercially (Figure 2-10). Note that therapists train patients suffering from chronic writer’s cramp to use the adapted tripod grip (Figure 2-9c) since it provides similar support to these ergonomic pen designs.
6 The study involved observing more than 1000 left-handed elementary school students over two years.
(a) RingPen (b) PenAgain (c) Evo pen Figure 2-10. Commercial ergonomic pen designs. (a) RingPen (Gorbunov, 1995); (b) PenAgain (Roche & Ronsse, 2003); (c) Evo pen (derivative of Debbas, 1995).
More subtle improvements to pen design may also provide ergonomic benefits. Kao, Smith, and Knutson (1969) suggest that the relationship between the pen point and the shank is the most important factor for writing comfort and efficiency. They found that if a pen tip was off centred, so that it aligned with the edge of the shank rather than having a conventional central placement (Figure 2-11), overall writing time could be reduced. They attribute this finding to the improved visibility of the pen point during manipulation, which “... slightly enhances the space-displacement of visual feedback of the writing point, as compared to the feedback from movement of the pen shank.”
(a) off centred pen point
(b) centred pen point
Figure 2-11. Illustration of pen point placements in Kao et al.’s experiment. (a) off-centred; (b) centred (from Kao et al., 1969, fig. 2).
2.2 The Pen as a Computer Input Device
Using a pen for computer input has long been thought to be a natural solution for interactive control. In 1962, Licklider and Clark describe how they were addressing the second “immediate problem” to enable effective human-machine communication: Devise an electronic input-output surface on which both the operator and the computer can display, and through which they can communicate, correlated
symbolic and pictorial information. ... We are employing an oscilloscope and light pen to fulfill the function ... (Licklider & Clark, 1962, p. 121) Note that Licklider and Clark’s comments specify a single surface for input and output. Before presenting our survey of pen input technology, performance, techniques, and applications, we explore the possible device configurations of input and output space, as well as different control-to-display mappings. Licklider and Clark are specifying a direct input device, which is different from a device which uses indirect input (Figure 2-12) (Forlines et al., 2006; Whitefield, 1986). A direct input device combines the input and display together into one coincident space (most often a planar surface, given the 2-D nature of current displays). The most common control mapping is absolute: the input device is physically moved to the actual target position shown on the display. With indirect input, the input and display spaces are separated. Because of this separation, the current position specified by the input device is usually shown on the display as a cursor. Although the mapping can be absolute, it can also be relative, meaning that the input device specifies a new offset for the display cursor, rather than a unique position. With a relative or absolute mapping, movements made by the input device can be amplified or attenuated in display space. With an absolute mapping, this is a direct result of the ratio of tablet size to display size. But with a relative mapping, this is achieved with a control-to-display transfer function. Two common functions are: multiplying movements by a constant gain factor; and multiplying movements by a dynamic gain factor based on current device velocity (Casiez, Vogel, Balakrishnan, & Cockburn, 2008). Note that direct input devices can use a relative mapping as well, but research has indicated it is only beneficial for very large displays (Forlines et al., 2006).
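The two relative transfer functions just described can be sketched as follows; the gain values and the velocity threshold are illustrative assumptions, not parameters from any cited study.

```python
# Sketch of two control-to-display transfer functions for relative input.
# Gain values and the velocity threshold are illustrative assumptions.

def constant_gain(device_dx_mm: float, cg: float = 2.0) -> float:
    """Constant gain: display movement is device movement times a fixed CG."""
    return device_dx_mm * cg

def velocity_gain(device_dx_mm: float, dt_s: float,
                  low_cg: float = 1.0, high_cg: float = 4.0,
                  threshold_mm_per_s: float = 100.0) -> float:
    """Dynamic gain: fast device movements are amplified more than slow ones."""
    velocity = abs(device_dx_mm) / dt_s
    cg = high_cg if velocity >= threshold_mm_per_s else low_cg
    return device_dx_mm * cg
```

Under constant gain, a 5 mm device movement becomes a 10 mm cursor movement; the dynamic function amplifies the same movement by either factor depending on how quickly it was made.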
Unfortunately, the literature does not always make these distinctions clear. The term pen is used for both direct pen input and indirect pen input. Moreover, the term stylus is used interchangeably with pen in either context. In some cases, for indirect input, the type of mapping function is not given at all, or its parameters not specified. For clarity and consistency in this dissertation, we will use the term stylus to refer to indirect input (Figure 2-12b), and pen for direct input (Figure 2-12a). Our use of these terms
can be justified if one considers that digital ink is being emitted directly from the pen in the direct case, and in the indirect case, the stylus leaves no ink trail. We will also explicitly note the absolute or relative mapping for stylus input. For pen input, the assumption is that the mapping is absolute unless stated otherwise.
(a) direct input, pen (b) indirect input, stylus on tablet (c) indirect input, mouse Figure 2-12. Direct input and indirect input. (a) Direct input using a pen: the pen input space and the output display are coincident; (b, c) indirect input with a stylus on an opaque tablet or a mouse: the input space and the output display space are separated.
Technology
Early Devices
Pen and stylus input devices are perhaps the earliest forms of X-Y input to a computer, and were demonstrated years before Engelbart’s invention of the mouse in 1964 (Myers, 1998, p. 49)7. The earliest type of pen input was a tethered light pen designed in 1957 and reported in 1959 (Gurley & Woodward, 1959) at MIT’s Lincoln Laboratory (Hurst, Mahoney, Gilmore, Roberts, & Forrest, 1989)8. A light pen is a direct input device which works with a cathode-ray-tube display. It calculates the current pointing position by detecting a pulse produced by an electron gun in the display as it refreshes the image. Sutherland used
7 There are sources that claim the joystick was the first form of X-Y computer input, but we could not find a trustworthy reference to verify this claim. Regardless, a joystick is a rate control device, so specifying an X-Y position is inherently relative and indirect. However, pen input is certainly pre-dated by “light gun” input, see next footnote. 8 The functionality of the light pen is the same as an earlier device called a “light gun”, invented by Bob Everett in 1952, also at MIT. The light gun had a large gun-like handle and trigger attached to the barrel.
this type of light pen to provide drawing and selection input for his Sketchpad system (Sutherland, 1963).
Figure 2-13. Sutherland’s Sketchpad with light pen input (image courtesy MIT archives).
The first indirect stylus input device was the RAND tablet (Davis & Ellis, 1964)9. It was conceptually similar to a light pen, with the stylus sensing pulses emitted from the digitiser surface. Unlike the light pen, which sensed one pulse during a linear scan of the entire display, the RAND tablet generated a unique pattern of binary pulses at each X-Y location. The stylus detected the pulse pattern at the current physical location and translated this to a unique X-Y location, creating an absolute mapping. The stylus contained a lightweight tip switch to detect when it was pressed against the tablet. The version of the RAND tablet reported by Davis and Ellis (1964) had a resolution of 1024 × 1024 within a 10.4 inch square surface. Davis and Ellis use the terms “ink” and “stroke” to describe the marks created by the stylus (in spite of it being indirect), and motivate their design as one which “maintains ‘naturalness’”.
9 There are sources which claim that the Stylator (Dimond, 1958) was the first pen input device. While it is true that the Stylator enabled stylus input to a computer, it was purpose-built to recognize only handwritten characters which were entered one at a time on a physically and electronically constrained template: it had no capacity for sensing X-Y position.
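The idea of a spatial code that assigns a unique bit pattern to each position can be illustrated with a Gray code, in which adjacent positions differ by exactly one bit. This is a plausible sketch of such an encoding, not a reproduction of the actual RAND pattern.

```python
# Sketch of a per-position binary code, illustrated with a Gray code
# (a plausible scheme; NOT the actual RAND tablet encoding).

def to_gray(position: int) -> int:
    """Bit pattern emitted at a given grid position along one axis."""
    return position ^ (position >> 1)

def from_gray(pattern: int) -> int:
    """Recover the absolute position from a sensed bit pattern."""
    position = 0
    while pattern:
        position ^= pattern
        pattern >>= 1
    return position

# A 10-bit code distinguishes the 1024 positions per axis reported for
# the RAND tablet; adjacent positions differ in a single bit.
assert from_gray(to_gray(512)) == 512
assert bin(to_gray(511) ^ to_gray(512)).count("1") == 1
```

A single-bit difference between neighbouring positions is attractive for a sensing device because a slightly misread pattern decodes to an adjacent location rather than a distant one.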
Figure 2-14. RAND Tablet (image courtesy Computer History Museum, www.computerhistory.org/collections/accession/102630781).
In their conclusion, Davis and Ellis speculate that the RAND tablet could be adapted for direct input by using a translucent surface and a back projected display. Gallenson (1967) describes a working prototype of this configuration which he calls a graphic tablet display. Although Davis and Ellis report that users could adapt to a side-by-side indirect tablet and display configuration, Gallenson found that people had difficulty with it. However, with his direct input graphic tablet display, Gallenson suggests the effect is one of a “live piece of paper”. He writes: The superposition of the display on the tablet surface is a natural evolution and makes the displayed feedback more meaningful to the user, as well as easier to use. (p. 693)
Current Technology
Pen input technology continued to evolve using other hardware sensing techniques such as resistive, capacitive, acoustical, electromagnetic (Meyer, 1995; J. Ward & M. Phillips, 1987), and computer vision. Most pen-based hand-held mobile devices use a resistive digitizer, which detects where the tip of the stylus is pressed against the display. However, resistive digitizers are sensitive to anything pressed against the display, so they are more practical for small displays where other objects such as the hand typically rest outside the sensing area. Capacitive digitizers were originally designed to sense finger contact, but
special pens can be used as well. Of course, when used with a compatible pen, they will suffer from the same hand sensitivity problems as resistive digitizers. Acoustical sensors monitor the changing characteristics of a sound pulse to calculate pen position. They are relatively simple to implement and require no dedicated surface, but interference from environmental noise and potential inconsistencies in air pressure make them less practical, especially for larger surfaces (J. Ward & M. Phillips, 1987, p. 33). Recently, computer vision techniques have been used for pen input. This has been achieved by integrating cameras into the bezel, behind the surface, or in the pen tip itself. When integrated into the pen tip, a special pattern printed on the writing surface enables very accurate absolute positioning. This pattern can be made very subtle on a transparent surface so that it may be used in conjunction with a back projected display (Leitner et al., 2009). Electromagnetic sensors are currently the most common pen input technology for Tablet PCs and medium sized direct input devices such as Wacom Cintiq tablets. They work by sensing the characteristics of a magnetic pulse sent through a grid of conductors to a powered wire loop in the pen (or vice versa) (J. Ward & M. Phillips, 1987, p. 32). For a more detailed explanation of electromagnetic sensing, see Schomaker’s (1998) illustration and description reproduced in Figure 2-15.
Figure 2-15. Electromagnetic pen position sensor. “A controller samples the field strength emitted by the resonating tuned circuit at each line of a relatively coarse grid. Low-pass filtering of the sensed signal strength followed by differentiation yields a good position estimate on the basis of the time of zero crossing.” (illustration and description from Schomaker, 1998)
The main advantage is that only the pen is sensed, with no interference from hands or fingers. Early implementations of electromagnetic sensing required a battery powered pen, but most current Tablet PCs use a slight variation called EMR® (Electro-Magnetic Resonance) Technology from Wacom (“Wacom EMR,” 2009). This uses the magnetic field to power a resonant circuit in the pen, which in turn returns a magnetic pulse through the wire coil to the digitizer surface (no other power is provided to the pen). Other sensors such as pen tilt and pressure also receive power from the resonant circuit and communicate their state within the pattern of return pulses. This non-contact technology also enables detection of pen movement up to 14 mm from the sensor grid (“Wacom EMR,” 2009). However, to reduce magnetic interference from ferrous materials and electronics, great care must be taken to shield the sensor grid and compensate for known irregularities (for example, when the pen is near the edge of the display).
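The zero-crossing position estimate described in Figure 2-15 can be sketched as follows; the grid spacing and signal values are illustrative, and the low-pass filtering stage is omitted for brevity.

```python
# Sketch of the zero-crossing position estimate from Figure 2-15:
# sample field strength at each grid line, differentiate, and place the
# pen where the derivative crosses zero (the peak of the signal).
# Grid spacing and signal values are illustrative.

def estimate_position(strengths, spacing_mm=5.0):
    """Estimate pen position along one axis from per-grid-line samples."""
    # First differences approximate the derivative between grid lines.
    diffs = [b - a for a, b in zip(strengths, strengths[1:])]
    for i in range(len(diffs) - 1):
        d0, d1 = diffs[i], diffs[i + 1]
        if d0 > 0 >= d1:  # sign change: the peak lies between these samples
            # Linearly interpolate the zero crossing; diffs[i] is centred
            # at grid position i + 0.5.
            frac = d0 / (d0 - d1)
            return (i + 0.5 + frac) * spacing_mm
    return None  # no peak found (pen out of range)

# A symmetric bump peaking at the third grid line (position 2):
print(estimate_position([0.1, 0.5, 1.0, 0.5, 0.1]))  # -> 10.0
```

Interpolating the zero crossing is what lets a relatively coarse grid yield a position estimate much finer than the grid spacing itself.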
Digitizer Problems
Ward and Phillips (1987) and Meyer (1995) survey potential problems with digitizer technology. Many of the problems described by Ward and Phillips appear to be corrected in modern digitizers like Wacom’s, but four problems seem to persist. There are two types of parallax errors (Figure 2-16). Hardware parallax errors are caused by the divergence between the sensed position of the pen coil and the pen tip’s contact point on the display glass. Visual parallax errors are caused by the thickness of the glass, which makes the rendered position diverge from the pen tip. Two other common problems are eccentricity, when rotating the pen barrel perturbs the sensed position, and magnetic field effects caused by ferrous hand jewellery or poor shielding near the bezel. A final problem noted by Ward and Phillips (1987, p. 43) is when large pen tips obscure (or occlude) what the user is drawing.
Figure 2-16. Hardware and visual parallax. Hardware parallax: as the pen is tilted, the sensed position of the pen coil in the digitiser sensor (red dot) diverges from the tip contact point on the glass display surface (black dot). Visual parallax: as the user’s viewing angle diverges, the visual position of the cursor rendered by the LCD display (green dot) diverges from the tip contact point (black dot). Many digitizers attempt to compensate for parallax with user calibration (shown with the adjusted position of green dots in the LCD display), but this rarely works for all pen tilt angles.
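A single-point calibration of the kind mentioned above can be sketched as follows; the coordinates are illustrative, and, as noted, a constant offset is only correct for the pen tilt and viewing angle used during calibration.

```python
# Sketch of single-point parallax calibration: the user taps a known
# crosshair, the system records the offset between sensed and true
# positions, and later applies that fixed offset to every sensed point.
# Coordinates are illustrative; a constant offset cannot compensate for
# changing pen tilt or viewing angle.

class ParallaxCalibration:
    def __init__(self):
        self.dx = 0.0
        self.dy = 0.0

    def calibrate(self, sensed, target):
        """Record the offset from one calibration tap (sensed vs. true point)."""
        self.dx = target[0] - sensed[0]
        self.dy = target[1] - sensed[1]

    def correct(self, sensed):
        """Apply the stored offset to a sensed pen position."""
        return (sensed[0] + self.dx, sensed[1] + self.dy)

cal = ParallaxCalibration()
cal.calibrate(sensed=(102.0, 99.0), target=(100.0, 100.0))
print(cal.correct((202.0, 149.0)))  # -> (200.0, 150.0)
```

A fuller correction would model the offset as a function of pen tilt and position, which is presumably why, as the caption notes, single calibrations rarely work for all tilt angles.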
2.3 Pen Input Performance and Capabilities
Comparisons of Pen Input with Other Devices
Studies which directly compare pen input to the mouse and other devices are summarized in Table 2-2. The studies cover a wide range of tasks, and almost all find pen input to be faster than using the mouse in some cases. However, there are potential problems due to the type of pen device used, mouse settings, and in some cases, inadequate study reporting.
Study | Device(s) | Task(s) | Results | Notes
I. S. MacKenzie et al., 1991 | indirect stylus (A*); mouse** | 1-D tapping; 1-D dragging | stylus faster when dragging; stylus less accurate when dragging; no difference when tapping | *mapping is not reported, but assumed to be absolute based on the authors’ later work; **mouse assumed to use CG 1.88 based on the authors’ later work
Kabbash et al., 1993 | indirect stylus (A); mouse* | 1-D tapping; 1-D dragging | stylus “somewhat” faster; stylus more accurate when dragging; stylus less accurate when tapping | *mouse uses CG 2.0
Accot & Zhai, 1999 | indirect stylus (A); mouse* | straight and circular tracing (“steering”) | stylus faster for circular and narrow straight steering | *mouse uses two-stage threshold, with maximum CG 2.0
Guiard et al., 1999 | indirect stylus (A); puck “mouse” (A)* | 1-D multi-scale pointing | stylus faster for high-accuracy tasks; puck faster for low-accuracy tasks | *puck functioned like an absolute mapped mouse
Kotani & Horii, 2003 | indirect stylus (R); mouse | 1-D tracing; 2-D tracing | stylus faster and more accurate for 1-D tracing; no difference with 2-D tracing; less muscle activity with stylus |
Charness et al., 2004 | direct pen*; mouse | “menu”** | direct pen faster; mouse has lower workload*** | *used a tethered light pen; **task not defined; ***according to participant rating
Jastrzembski et al., 2005 | direct pen*; mouse | “web browsing task”** | mouse faster | *used a tethered light pen; **task not defined
Myers et al., 2002 | direct pen*; mouse | 1-D tapping | pen faster; mouse has fewer errors | *hybrid direct pen technique
Table 2-2. Comparisons between pen input and mouse input. In the Device(s) column, (A) is absolute mapping and (R) is relative mapping. In the Notes column, CG is the control-to-display gain factor for the mouse transfer function.
I. S. MacKenzie, Sellen, and Buxton (1991) are often cited regarding pen pointing performance. In their comparison with mouse input10 when pointing and dragging, they find no difference between indirect stylus input and the mouse for pointing, but a significant speed benefit for the stylus when dragging. However, the mouse had a lower error rate when dragging. The authors conclude that the stylus can outperform a mouse in a GUI, especially if drawing or gesture activities are common. Like most Fitts’ law style research, they use a highly controlled, one-dimensional, reciprocal pointing task. Kabbash, I. S. MacKenzie, and Buxton (1993) compare the same devices with similar 1-D pointing and dragging tasks in their study of dominant and non-dominant hand performance. They report that the stylus is “somewhat” better than the mouse (no post-hoc test results are reported), and that the stylus was more accurate when tapping, but worse for dragging. In an evaluation of two tracing (or “steering”) tasks, straight tracing and circular tracing, Accot and Zhai (1999) found that an indirect stylus with absolute mapping was faster than a mouse for narrow straight tracing and circular tracing. They argue that the concept of error rate does not apply to steering tasks and do not report results pertaining to accuracy. Guiard, Beaudouin-Lafon, and Mottet (1999) compared an indirect stylus with a tablet puck in multi-scale pointing tasks. A puck is physically similar to a mouse, but the authors configured it to operate in absolute mode. They found the puck was faster for low-accuracy pointing tasks, but stylus performance was higher for tasks requiring high precision hand movements. No error rates are given. Kotani and Horii (2003) found that with practice, participants had higher speed and accuracy when using an indirect stylus with a relative mapping, compared to a mouse, for simple tracing tasks. Surprisingly, they did not find any difference with a precise tracing task.
They also examined their participants’ electromyograms (EMG) and found lower activity in the fingers and bicep for the stylus compared to the mouse.
10 Many of these studies compare pen input with other devices in addition to the mouse (such as a trackball, touchpad, etc.). We only report results for the mouse comparison since it is the best baseline device. For the most part, other input devices performed worse than both mouse and pen device.
Charness et al. (2004) compared a tethered light pen version of a direct pen to mouse input with different age groups. They used a “menu selection task”, but it is not described in detail (there is no indication of essential elements such as target distance, width, or direction). Their results show that the light pen is faster than the mouse. However, using the NASA workload scale (Hart & Staveland, 1988, cited in Charness et al., 2004) they found participants rated the mouse as having lower workload. An odd experimental condition asked participants to use the pen or mouse with their non-dominant hand, and some participants said they found it easier to use the pen in this way. The authors suggest this may be due to hand occlusion. Jastrzembski et al. (2005) also compare a tethered light pen version of a direct pen with mouse input, using a loosely defined “web browsing task” with interleaved keyboard use. They found the light pen to be less efficient than the mouse. Overall, these are favourable results for pen input, but one has to be somewhat cautious. Charness et al. (2004) and Jastrzembski et al. (2005) do not report experimental task parameters, and they use a tethered light pen, making their results difficult to confirm and less relevant. Guiard et al. (1999) use an absolute mapped puck in their comparison. Kotani and Horii (2003) use an indirect stylus with a relative mapping, which is quite different from direct pen input. The direct-pen related results from Myers et al. (2002) are interesting in that the speed-accuracy trade-off makes the pen and mouse favourable in terms of time and error rate respectively. These results are part of a larger study comparing the performance of laser pointing. The direct pen condition used in their experiment is in actuality a hybrid technique called Semantic Snarfing (Myers, Peck, Nichols, Kong, & R. Miller, 2001).
The technique uses a hand-held computer with a built-in laser pointer to select a general area of the display first, and then the pen is used on the hand-held device to select the desired target within this area. We include this study for comparison, but the reader should be aware that the nature of the study and the use of a hybrid pen input technique make it less of a direct comparison. The remaining studies all use an indirect stylus with absolute input, but they may have used an inferior mouse transfer function. Although not reported in I. S. MacKenzie, Sellen, and Buxton (1991), a similar paper by I. S. MacKenzie and Buxton (1992) uses the same mouse hardware and reports a constant control-to-display gain (CG) factor of only 1.88. Kabbash et al. (1993) use a constant CG of 2.0. Accot and Zhai (1999) use a threshold
acceleration function which switches between a constant CG of 1.0 and 2.0. Recent work by Casiez et al. (2008) suggests that these CG settings are quite low and would likely reduce the performance of the mouse. In fact, Accot and Zhai acknowledge that the type of transfer function used with relative devices can introduce an experimental confound, but for the sake of comparison, they resort to the default CG setting.
Comparisons of Direct Pen and Indirect Stylus
If we ignore the potential confound of the CG settings used for the mouse, the positive findings for indirect stylus input could still be relevant for direct input. If the performance of direct pen input is as good as or better than indirect stylus input, then we can conclude that direct pen input is advantageous over mouse input. Theoretically, Whitefield argues that a direct input pen is advantageous since it does not need extra workspace like the mouse (Whitefield, 1986). However, he also notes problems with occlusion and parallax: ... if one is inputting more than a single isolated response, one might have to move one's hand away from the screen between responses in order to get a clear view of the screen and thus to locate the next target: and parallax effects can produce a tendency to point at a location usually slightly nearer the centre of the screen than the target. (p. 100) Whitefield continues that indirect devices, after some initial learning, are more comfortable due to a more optimal body position, eliminate visual feedback issues such as parallax, and could enable CG manipulation. There are three studies we are aware of that make such a comparison (summarized in Table 2-3). Hancock and Booth (2004) compared 2-D target selection performance with direct and indirect input. Overall, they found direct pen input to be faster. Phillips, Triggs, and Meehan (2005) investigated differences between direct pen and indirect stylus input. Their direct pen condition actually used a stylus on an opaque tablet, but target locations were painted on the tablet. For the indirect stylus condition, they forced participants to monitor the display by placing a curtain between the participant and their hand operating the stylus. They found significantly higher task times for their indirect stylus condition (when participants were forced to watch the display) compared to direct pen (when watching their
hand). Forlines and Balakrishnan (2008) compare crossing and pointing performance with direct pen and indirect stylus input. They find that direct pen input is advantageous for crossing tasks, but when selecting very small targets, there is little difference between direct pen and indirect stylus.
Study | Device(s) | Task(s) | Results | Notes
Hancock & Booth, 2004 | direct pen; indirect stylus (A) | 2-D tapping | direct pen faster |
Phillips, Triggs, & Meehan, 2005 | direct pen (simulated)*; indirect stylus (A) | 2-D tapping | direct pen faster | *direct pen condition used painted targets on tablet
Forlines & Balakrishnan, 2008 | direct pen; indirect stylus (A) | 1-D crossing; 1-D tapping | direct pen faster; effect more pronounced with larger and more distant targets |
Table 2-3. Comparisons between direct pen input and indirect stylus input. In Device column, (A) is absolute mapping and (R) is relative mapping.
Performance Experiments
Researchers have examined aspects of pen performance such as speed and accuracy in common low-level interactions like target selection, area selection, and dragging. Not all studies use direct pen input with a Tablet PC-sized display: some use indirect pen input, and other researchers have focused on pen input with smaller hand-held mobile devices. Pen characteristics such as mode selection, handedness, tactile feedback, tip pressure control, and barrel rotation have been investigated quite thoroughly, while other aspects like pen tilt and hand occlusion are often discussed but not investigated in great detail. In addition, ergonomic factors have been observed and inspired improvements to pen design.
Target Selection
Ren and Moriya (2000) examine the accuracy of six variants of pen tapping selection techniques in a controlled experiment with direct pen input on a large display. They find very high error rates for 0.72 and 3.24 mm targets using two basic selection techniques: Direct On, where a target is selected when the pen first contacts the display (the pen down event) and Direct Off, where selection occurs when the pen is lifted from the display (the pen up
event). Note that in a mouse-based GUI, targets are typically selected successfully only when both down and up events occur on the target; hence accuracy will likely further degrade. Ramos et al. (2007) argue that accuracy is further impaired when using direct pen input because of visual parallax and pen tip occlusion – users cannot simply rely on the physical position of the pen tip. They found that users had very high error rates when selecting targets smaller than 1.1 mm. To compensate, their Pointing Lens technique enlarges the target area with increased pressure, and selection is triggered by lift-off. With this extra assistance, they find that users can reliably select very small targets. In Accot and Zhai’s study of their crossing interaction paradigm (2002), they find that when using a pen, users can select a target by crossing as fast, or faster, than tapping in many cases. However, their experiment uses indirect pen input and the target positions are in a constrained space, so it is not clear if the performance they observe translates to direct pen input. Hourcade and Berkel (2006) later compare crossing and tapping with direct pen input (on a small hand-held mobile device), as well as the interaction of age. They find older users have lower error rates with crossing, but find no difference for younger users. Unlike Accot and Zhai’s work, Hourcade and Berkel use circular targets as a stimulus. Without a crossing visual constraint, they find that participants exhibit characteristic movements, such as making a checkmark. The authors speculate that this may be because people do not tap on notepads, but make more fluid actions like writing or making checkmarks, supporting the general notion of crossing. Mizobuchi and Yasumura (2004) investigate tapping and lasso selection on a pen-based hand-held mobile device.
They find that tapping multiple objects is generally faster and less error prone than lasso circling, except when the group of targets is highly cohesive and forms less complex shapes. Note that enhancements introduced with Windows Vista encourage selecting multiple file and folder objects by tapping, through the introduction of selection check-boxes placed on the objects. Lank and Edward (2005) note that when users lasso objects, the “sloppy” inked path alone may not be the best indicator of their intention. They find that by also using trajectory information, the system can better infer the user’s intent.
Mode Selection
To operate a conventional GUI, the pen must support multiple button actions to emulate left and right mouse clicks. The Tablet PC simulates right-clicking using dwell time and
visual feedback, by pressing a barrel button, or by inverting the pen to use the “eraser” end. Li, Hinckley, Guan, and Landay (2005) find that using dwell time for mode selection is slower, more error prone, and disliked by most participants. In addition to the increased time for the dwell itself, the authors also found an additional preparation time is needed for the hand to slow down and prepare for a dwell action. Pressing the pen barrel button, pressing a button with the non-preferred hand, or using pressure are all fast techniques, but using the eraser or a button with the non-preferred hand are the least error prone. Hinckley, Baudisch, Ramos, and Guimbretière’s related work examining mode delimiters (2005) also finds dwell timeout to be slowest, but in contrast to Li et al., finds that pressing a button with the non-dominant hand can be error prone due to synchronization issues. However, Hinckley et al.’s Springboard (2006) shows that if the button is used for temporary selection of a kinaesthetic quasi mode (where the user selects a tool momentarily, but afterwards returns to the previous tool), then it can be beneficial. Grossman et al. (2006) provide an alternate way to differentiate between inking and command input by using distinctive pen movements in hover space (i.e., while the pen is within tracking range above the digitizing surface but not in contact with it). An evaluation shows that this reduces errors due to divided attention and is faster than using a conventional toolbar in this scenario. Forlines, Vogel, and Balakrishnan’s (2006) Trailing Widget provides yet another way of controlling mode selection. The Trailing Widget floats nearby, but out of the immediate area of pen input, and can be “trapped” with a swift pen motion.
Handedness
Hancock and Booth (2004) study how handedness affects performance for simple context menu selection with direct pen input on large and small displays. They note that identifying handedness is an important consideration, since the area occluded by the hand is mirrored for left- or right-handed users and the behaviour of widgets will need to change accordingly. Inkpen et al. (2006) study usage patterns for left-handed users with left- and right-handed scrollbars on a direct pen input hand-held mobile device. By using a range of evaluation methodologies, including a longitudinal field study with an open-ended task and two formal experiments, they find a performance advantage and user preference for left-handed scrollbars. All left-handed participants cite occlusion problems when using the right-handed scrollbar. To reduce occlusion, some participants raised their grip on the pen or
arched their hand over the screen, both of which are reported as feeling unnatural and awkward. Their methodological approach, combining two controlled experiments with a longitudinal study, lends ecological validity to their findings.
Pressure
Ramos, Boulos, and Balakrishnan (2004) argue that pen pressure can be used as an effective input channel in addition to x-y position. In a controlled experiment, they found that participants could use up to 6 levels of pressure with the aid of continuous visual feedback and a well-designed transfer function, creating the possibility of pressure-activated GUI widgets. Ramos and colleagues subsequently explore using pressure in a variety of applications, including an enhanced slider that uses pressure to change the resolution of the parameter (Ramos & Balakrishnan, 2005), a pressure-activated Pointing Lens (Ramos et al., 2007), which is found to be more effective than other lens designs, and a lasso selection performed with different pressure profiles used to denote commands (Ramos & Balakrishnan, 2007).
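Continuous feedback plus a well-designed transfer function is what makes discrete pressure levels usable. As a rough illustration only (a hypothetical sketch, not Ramos et al.’s actual design), the following quantizes a normalized pressure value into six levels and adds a small hysteresis margin so the selected level does not flicker at level boundaries:

```python
# Hypothetical sketch: discretizing pen pressure into N levels with hysteresis.
# The level count (6) follows Ramos et al.'s finding; everything else is assumed.

N_LEVELS = 6

def quantize(pressure, current_level, hysteresis=0.15):
    """pressure in [0, 1]; returns a level in 0..N_LEVELS-1."""
    width = 1.0 / N_LEVELS
    raw = min(int(pressure / width), N_LEVELS - 1)
    if raw == current_level:
        return current_level
    if raw > current_level:
        # only move up once pressure clears the boundary by a margin
        boundary = (current_level + 1) * width
        return raw if pressure > boundary + hysteresis * width else current_level
    # only move down once pressure drops below the boundary by a margin
    boundary = current_level * width
    return raw if pressure < boundary - hysteresis * width else current_level
```

Without the margin, sensor noise near a boundary would make a pressure widget oscillate between adjacent levels, which is one reason continuous visual feedback alone is not sufficient.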
Tactile Feedback
Lee, Dietz, Leigh, Yerazunis, and Hudson (2004) design a haptic pen using a solenoid actuator that provides tactile feedback along the longitudinal axis of the pen, and show how the resulting “thumping” and “buzzing” feedback can be used to enhance interaction with GUI elements. Sharmin, Evreinov, and Raisamo (2005) investigate vibrating pen feedback during a tracing task and find that tactile feedback reduces time and number of errors compared to audio feedback. Poupyrev, Okabe, and Maruyama (2004) evaluate tactile feedback sent through the display to the pen. They found it improved performance when dragging, but found no difference when tapping. Forlines and Balakrishnan (2008) compare tactile feedback with visual feedback for direct and indirect pen input on different display orientations. They find that even a small amount of tactile feedback can be helpful, especially when standard visual feedback is occluded by the hand. Current Tablet PC pens do not support active tactile feedback, but the user does receive passive feedback when the pen tip strikes the display surface. However, this may not always correspond to the system registering the tap: this is likely why Windows Vista designers included a small “ripple”
animation to visually reinforce a tap. A similar type of ripple contact visualization (tested with a multi-touch device) was found to increase accuracy (Wigdor et al., 2009).
Barrel Rotation and Tilt
Bi et al. (2008) investigate pen barrel rotation as a viable form of input. They find that unintentional pen rolling can be reliably filtered out using thresholds on rolling speed and rolling angle, and that users can explicitly control an input parameter with rolling in 10 degree increments over a 90 degree range. Based on these findings, the authors designed pen barrel rolling interactions to control object rotation, simultaneous parameter input, and mode selection. Because most input devices do not support barrel rotation, the authors use a custom-built, indirect stylus. Tian et al. (2007; 2008) explore using pen tilt to enhance the orientation of a cursor and to operate a tilt-based pie menu. The authors argue for an advantage to using tilt in these scenarios, but they used indirect stylus input in their experiments.
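The essence of this filtering can be sketched as follows; the threshold values here are illustrative assumptions, not the values reported by Bi et al.:

```python
# Hypothetical sketch of threshold-based roll filtering: treat a roll as
# intentional only if both its total angle and its angular speed exceed
# thresholds, then map it onto 10-degree increments over a 90-degree range.

MIN_ANGLE = 10.0   # degrees; smaller rolls are treated as incidental (assumed value)
MIN_SPEED = 30.0   # degrees per second (assumed value)

def classify_roll(delta_angle, duration):
    """True if a roll of delta_angle degrees over duration seconds is intentional."""
    speed = abs(delta_angle) / duration
    return abs(delta_angle) >= MIN_ANGLE and speed >= MIN_SPEED

def roll_to_increment(delta_angle):
    """Map an intentional roll to the nearest 10-degree step, clamped to ±90."""
    clamped = max(-90.0, min(90.0, delta_angle))
    return round(clamped / 10.0) * 10
```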
Ergonomics
Haider, Luczak, and Rohmert (1982) conducted perhaps one of the earliest studies of pen computing, with a focus on ergonomics. They log variables such as eye movement, muscle activity, and heart activity when using a light pen on a vertical display, a touch screen, and a keypad with a simulated police command and control system. The authors find lower levels of eye movement with the light pen, but high amounts of muscle strain in the arms and shoulders as well as more frequent periods of increased heart rate. They note that since the display was fixed, participants would bend and twist their bodies to reduce the strain. In a study comparing indirect stylus input to pencil on paper, Fitzmaurice et al. (1999) find that when writing or drawing on paper, people prefer to rotate and position the paper with their non-dominant hand rather than reach with their dominant hand. In addition to setting a comfortable working context and reducing fatigue, they speculate that this also reduces hand occlusion when drawing. They find that when using stylus input instead of pencil on paper, this natural rotation tendency is hampered by the tablet’s thickness, weight, and tethered connections. An unpublished, but often cited, study in the commercial world was performed by Global Ergonomic Technologies, a commercial consulting firm. They analyzed joint angles when using a pen and a mouse for various tapping and dragging tasks (GET Consulting Study,
1998). Their findings favour the pen: unlike the mouse, pen participants adopted postures with almost no wrist pronation, very little flexion, lower ulnar deviation, and lower radial deviation. Overall, they found that pen postures were more neutral, and thus “biomechanically superior” to the mouse. However, the experiment used only 8 participants, the tasks, apparatus, and design are not described adequately, and no statistical tests are used in the analysis. For instance, it is not clear whether they used an indirect stylus or a direct pen for input. In addition, the study appears to be sponsored by Wacom, a large manufacturer of pen input devices, and may be inherently biased. Wu and Luo (2006a) examined arm, hand, and pen postures with direct pen input when tapping, drawing, and writing using a Tablet PC placed flat on a table. They noted four characteristic postures in which users did or did not support their arm and hand when using the pen: no support at all (“hanging in the air”), wrist local support, little finger local support, and elbow local support (Figure 2-17). When tapping, no participants supported their arm or hand, but when writing, participants used one of the three support methods just over half the time. In addition, they noted that some participants adopted a high and loose pen grip when tapping (Figure 2-18). In post-study interviews, participants complained that the thickness of the tablet made it uncomfortable to rest their arm as they would with pencil and paper, and several participants were worried about scratching or staining the display.
Figure 2-17. Forearm and hand postures observed by Wu and Luo: (a) no support; (b) wrist support; (c) finger support; (d) elbow support. (from F. Wu & Luo, 2006a, fig. 1)
Figure 2-18. Extreme pen grips observed by Wu & Luo. (from F. Wu & Luo, 2006a, fig. 2)
To counteract these issues, and drawing on past pen ergonomics literature, the authors designed an ergonomic pen that includes a fourth support point below the thumb (Figure 2-19). In an experimental evaluation, they found their pen design increased hand stability and reduced hand fatigue, with slightly lower error rates and task times.
Figure 2-19. Wu and Luo’s ergonomic Tablet PC pen. (a) pointing; (b) writing; (c) drawing (from F. Wu & Luo, 2006a, fig. 8)
Wu and Luo (2006b) also evaluated the performance of different pen barrel diameters and lengths (see Figure 2-20) when drawing, writing, and pointing. Overall, they found that longer pens were faster and sometimes more accurate across all tasks. The shortest, 80 mm pen was consistently ranked less favourably by participants. There was an interaction of pen width and task: when pointing, the thinnest, 5.5 mm pen was fastest; when writing, the medium 8 and 11 mm pens were faster and more accurate; and when drawing, the thicker 11 and 15 mm pens were faster and more accurate. They suggest the poor overall performance of the short, 80 mm pen is because its length is close to most participants’ hand breadth, which led to a palmar grip. They cite evidence suggesting that short objects manipulated in this way are uncomfortable and difficult to hold (Lewis & Narayan, 1993, Stanton, 1998, cited by F. Wu & Luo, 2006b). However, the authors suggest a length of 100 mm would be acceptable for portable hardware.
Figure 2-20. Pen sizes evaluated by Wu and Luo: lengths of 140, 110, and 80 mm; diameters of 15, 11, 8, and 5.5 mm. (based on F. Wu & Luo, 2006b, fig. 1)
2.4 Pen Interaction Paradigms
The relatively poor commercial success of direct pen input and the Tablet PC may be partly because current software applications and GUIs are not well-suited to the pen. One reason for the lack of pen-specific commercial applications is that the primary adopters of pen computing are in education, healthcare, illustration, computer-aided design, and mobile data entry (A. Chen, 2004; Shao, Fiering, & Kort, 2007; Whitefield, 1986). These vertical markets use specialized software which emphasizes handwritten input and drawing rather than general computing. For the general business and consumer markets, software applications and GUIs are designed for the mouse, so usability issues specific to direct pen input have not always been considered. Researchers and designers have responded by developing new pen-centric interaction paradigms, widgets that leverage pen input capabilities, and software applications designed from the ground up for pen input.
Gestures and Sketch-Based Interfaces
It is reasonable to argue that tasks which are achieved through drawing are more easily performed with a pen. Perhaps one of the most demanding is hand-drawn animation, where users draw shapes and specify motion through their own movements. Regarding his
GENESYS picture-driven animation system, Baecker (1969) argues a key component is a pen-based interface:
An input device such as a light pen, tablet plus stylus, or wand, which allows direct drawing to the computer in at least two spatial dimensions. The operating environment must, upon user demand, provide at least brief intervals during which the sketch may be made in real time. The animator must then be able to draw a picture without any interruption. Furthermore, the computer must record the "essential temporal information" from the act of sketching. (p. 274)

The notion of capturing the dynamics of pen movement as temporal and positional input led researchers to gestures as a way of invoking a command with a distinctive motion rather than manipulating GUI widgets. Early explorations include Buxton, Sniderman, Reeves, Patel, and Baecker (1979), who use elementary gestures to enter musical notation; Buxton, Fiume, Hill, Lee, and Woo (1983), who use more complex gestures for electronic sketching; and Kurtenbach and Buxton’s Gedit (1991b), which demonstrates gesture-based object creation and manipulation. Later, completely gesture-based research applications appeared, such as Lin, Newman, Hong, and Landay’s DENIM (2000), Moran, Chiu, and van Melle’s Tivoli (1997), Forsberg, Dieterich, and Zeleznik’s music composition application (1998), and Bae, Balakrishnan, and Singh’s ILoveSketch (2008). Note that these all target very specific domains which emphasize drawing, sketching, and notation. Although these researchers (and others) have suggested that gestures are more natural with a pen, issues with human perception (C. A. J. Long, Landay, Rowe, & Michiels, 2000), biomechanical performance (Cao & Zhai, 2007), and disambiguation between “ink” and gesture (Zeleznik & T. Miller, 2006) can make the design of unambiguous gesture sets challenging. Perhaps most problematic is that gestures are not self-revealing and must be memorized through training.
Marking Menus (Kurtenbach & Buxton, 1991a) address this problem with a visual preview and directional stroke to help users smoothly transition from novice to expert usage, but these are limited to menu-like command selection. Hinckley et al. (2007) found that previous experience with point-and-click interfaces prevented users from discovering stroking or crossing gestures. Even after the availability of the gestures was explained, the users had no mental model of what they were supposed to do. As a solution, they created on-screen hints with gesture names and a highlighter-like tracing of the gesture stroke.
In most commercial operating systems, such as Windows 7 or Apple’s iPhone, the number of available gestures is quite small. Ignoring multi-touch gestures, the most common gestures are directional flicks for pagination and scrolling. Aliakseyeu et al. (2008) evaluated this type of multi-flick gesture and found its performance superior to using a GUI scrollbar widget. Perhaps due to limitations with gestures, several researchers have created applications which combine standard GUI widgets and gestures. Examples include Schilit, Golovchinsky, and Price’s Xlibris electronic book device (1998), Truong and Abowd’s StuPad (1999), Chatty and Lecoanet’s air traffic control system (1996), Gross and Do’s Electronic Cocktail Napkin (1996), and Zeleznik et al.’s Lineogrammer (2008). These all support free-form inking for drawing and annotations, but rely on a conventional GUI tool bar for many functions, which suggests a limitation when using gestures with a large command set.
Pen Specific Widgets and Interfaces
Later, researchers introduce pen-specific widgets in their otherwise gesture-based
applications. Ramos and Balakrishnan’s LEAN (2003) is a pen-specific video annotation application which uses gestures along with an early form of pressure widget (Ramos et al., 2004) and two slider-like widgets for timeline navigation. Agarawala and Balakrishnan’s BumpTop (2006) uses physics and 3-D graphics to lend more realism to pen-based object manipulation. Both of these applications are initially free of any GUI, but once a gesture is recognized, or when the pen hovers over an object, widgets are revealed to invoke further commands or exit modes. Hinckley et al.’s InkSeine (2007) presents what is essentially a pen-specific GUI. It combines and extends several pen-centric widgets and techniques in addition to making use of gestures and crossing widgets. Aspects of its use required interacting with standard GUI applications, in which the authors found users had particular difficulty with scrollbars. To help counteract this, they adapted Fitzmaurice et al.’s Tracking Menu (2003), which keeps a toolbar near the pen tip at all times to support rapid command switching, to initiate a scroll ring gesture in a control layer above the conventional application. The scroll ring uses a circular pen motion as an alternative to the scrollbar (Graham Smith, schraefel, & Baudisch, 2005; Moscovich & Hughes, 2004). Creating pen-specific GUI widgets has been an area of pen research for some
time; for example, Guimbretière and Winograd’s FlowMenu (2000) combines a crossing-based menu with smooth transitioning to parameter input. In most cases, compatibility with current GUIs is either not a concern or unproven.
Crossing
Another, perhaps less radical, pen interaction paradigm is selecting targets by crossing through them rather than tapping on them (Accot & Zhai, 2002). Apitz and Guimbretière’s CrossY (2004) is a sketching application which exclusively uses a crossing-based interface, including crossing-based versions of standard GUI widgets such as buttons, check-boxes, radio buttons, scrollbars, and menus. Two potential issues with crossing-based interfaces are target orientation and the space between targets. Accot and Zhai (2002) suggest that targets could automatically rotate to remain orthogonal to the pen direction, but this could further exacerbate the space dilemma. They note that as the space between the goal target and nearby targets is decreased, the task difficulty becomes a factor of this distance rather than the goal target width. Dixon, Guimbretière, and Chen (2008) investigate this “space versus speed tradeoff” in the context of crossing-based dialogue boxes. They find that if the recognition algorithm is relaxed to recognize “sloppy” crossing gestures, then lower operation times can be achieved (with only slightly higher error rates). This relaxed version of crossing could ease the transition from traditional click behaviour, and, with reduced spatial requirements, it could co-exist with current GUIs.
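At its core, crossing selection replaces a point-in-rectangle hit test with a segment intersection test between the pen’s most recent movement and a target’s goal line. The sketch below is a minimal geometric illustration of that idea, not CrossY’s actual implementation:

```python
# Minimal sketch of crossing-based selection: a target fires when the pen's
# motion segment strictly crosses the target's goal-line segment.

def _side(p, a, b):
    # sign of the 2-D cross product (b - a) x (p - a)
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])

def crosses(pen_from, pen_to, goal_a, goal_b):
    """True if the segment pen_from->pen_to strictly crosses goal_a->goal_b."""
    d1 = _side(goal_a, pen_from, pen_to)
    d2 = _side(goal_b, pen_from, pen_to)
    d3 = _side(pen_from, goal_a, goal_b)
    d4 = _side(pen_to, goal_a, goal_b)
    # endpoints of each segment must lie on opposite sides of the other segment
    return d1 * d2 < 0 and d3 * d4 < 0
```

The “sloppy” relaxation Dixon et al. describe could be approximated by widening the goal line or tolerating near misses, trading selection strictness for speed.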
Pen-Specific Operating Systems and Applications
In spite of the activity in the pen research community, commercial applications for the Tablet PC tend to emphasize free-form inking for note taking, drawing, or mark-up while relying on standard GUI widgets for commands. Autodesk’s Sketchbook Pro (Autodesk, 2007) is perhaps the most pen-centric commercial application at present. It uses a minimal interface, takes advantage of pen pressure, and users can access most drawing commands using pen-specific widgets such as the Tracking Menu and Marking Menu. However, it still relies on conventional menus for some commands.
2.5 Summary
Using direct pen input seems like a reasonable idea (perhaps even “natural”). The physical capabilities of our hands are well suited to grasping and manipulating a pen. Pen input technology has been around for over five decades. The raw speed of pen input is very encouraging, and may even be faster than that of a mouse. Researchers have studied various aspects of pen characteristics, and developed new pen-centric interaction techniques, widgets, and applications to leverage pen capabilities. Yet pen input for general computing has not been widely adopted. Are we missing some aspect of pen performance? Persistent hardware issues such as parallax error and bulk could also be to blame. There could be fundamental problems with pen input in spite of its speed: for example, researchers note problems with high-precision tasks and possible ergonomic issues. There is also some speculation about the effect of hand occlusion, but as yet there are no in-depth studies of its characteristics or its possible effect on performance. The persistence of current GUIs may also be a factor: many researchers seem to imply that current GUIs must be abandoned or altered significantly to better support pen interaction. The answer is likely a combination of these issues. Although researchers have examined aspects of pen input with conventional GUI widgets in the process of designing and evaluating alternative widgets and techniques, their investigations and solutions have been evaluated in experimental isolation with synthetic tasks. Our belief is that to study how pen input really performs, we need to observe users engaged in realistic tasks using current hardware with current GUIs and standard software applications.
3 Observational Study of Pen Input
We begin by investigating the major issues with pen interaction and a conventional GUI. Researchers have suggested that more open-ended tasks can give a better idea of how something will perform in real life (Ramos et al., 2006). Indeed, there are some examples of qualitative and observational pen research (Briggs et al., 1993; Turner et al., 2007; Inkpen et al., 2006; Fitzmaurice et al., 1999; Haider et al., 1982). Unfortunately, these have used older technologies like indirect pens with opaque tablets and light pens, or focused on a single widget or a specialized task. To our knowledge, there has been no comprehensive qualitative, observational study of Tablet PC or direct pen interaction with realistic tasks and common GUI software applications. In this chapter, we present the results of such a study. The study includes pen and mouse conditions for baseline comparison, and to control for user experience, we recruited participants who are expert Tablet PC users, power computer users who do not use Tablet PCs, and typical business application users. We used a realistic scenario involving popular office applications with tasks designed to exercise standard GUI components, covering typical interactions such as parameter selection, object manipulation, text selection, and ink annotation. We base our analysis methodology on Interaction Analysis (Jordan & Henderson, 1995) and Open Coding (Strauss & Corbin, 1998). Instead of examining a broad and open-ended social working context, for which these techniques were originally designed, we adapt them to analyze lower-level interactions between a single user and software. This style of qualitative study is more often seen in the Computer Supported Cooperative Work (CSCW)
community (e.g., Ranjan, Birnholtz, & Balakrishnan, 2006; Scott, Carpendale, & Inkpen, 2004), but CSCW studies typically do not take advantage of detailed and diverse observation data like the kind we gather: video taken from the user’s point-of-view; 3-D positions of their forearm, pen, Tablet PC, and head; screen capture; and pen input events. To synchronize, segment, and annotate these multiple streams of logging data, we developed a custom software application. This allows us to view all data streams at once, annotate interesting behaviour at specific times with a set of annotation codes, and extract data for visualization and quantitative analysis. We see our methodology as a hybrid of typical controlled HCI experimental studies, usability studies, and qualitative research.
3.1 Related Work
In the previous chapter, we examined a large body of work investigating low-level aspects of pen performance, mostly using controlled experiments. While evaluating low-level aspects is certainly important, we contend that understanding how pen-based interactions function with real applications and realistic tasks is equally, if not more, important. However, as Briggs et al. (1993) note, this is less understood: While there has been a great deal of prior empirical research studying pen-based interfaces, virtually all prior research has examined the elementary components of pen-based interfaces separately (cursor movement, software navigation, handwriting recognition) for very elementary subtasks such as pointing, cursor movement, and menu selection. (p. 73) Ramos et al. (2006) argue that a more open-ended, ecologically valid study can give a better idea of how a new tool will perform in real life. Studying pen input with more ecological validity necessitates techniques like field work to examine in situ usage, and observational usability studies of realistic scenarios. Yet there are few examples of these techniques used to examine direct pen interaction. We discuss them below.
Field Studies of In Situ Usage
Since field studies are most successful when investigating larger social and work related issues, researchers have focused on how pen computing has affected general working
practices. For example, a business case study of mobile vehicle inspectors finds that with the addition of handwriting recognition and wireless networking, employees can submit reports faster and more accurately with pen computers (A. Chen, 2004). However, specific results relating to pen-based interaction – such as Inkpen et al. (2006), who include a longitudinal field study in their examination of handedness and PDAs – are less common. Two field studies in education do report some aspects of Tablet PC interaction. Twining et al. (2005) report on Tablet PC usage in twelve British elementary schools, including some discussion of usage habits. They find that staff members tend to use convertible Tablet PCs in laptop mode, primarily to allow typing with the keyboard, although many still used the pen for GUI manipulation. However, when marking assignments or working with students, they use the pen in slate mode. The students were more likely to use the pen for making notes, though they used the onscreen keyboard or left their writing as digital ink instead of using handwriting recognition. Pen input enables the students to do things which would be more difficult with a mouse, such as creating art work and animations. In fact, comments from several schools indicate that the pen is more natural for children. They also note problems with the initial device cost, battery life, screen size, glare, and frequently lost pens. In a field study of high school students and Tablet PCs, Sommerich et al. (2007) find that the technology does affect schoolwork patterns, but more relevant for our purposes is their discussion of ergonomic issues. For example, they find that the students used their Tablet PCs away from a desk or table 35% of the time, 50% reported assuming awkward postures, and 69% experienced eye discomfort. No specific applications or issues with interaction are discussed.
Observational Usability Studies of Realistic Scenarios
Most pen-based research applications have been evaluated informally or with limited usability studies. In an informal study of the Electronic Cocktail Napkin (1996), the authors find that although there is no negative reaction to gestures, users have difficulty accurately specifying the pen position and encounter problems with hand occlusion when using marking menus. The authors of BumpTop (2006) conduct a small evaluation and find that
participants are able to discover and use the functionality, but that crossing widgets are awkward near display edges, and they note problems with hand occlusion. More formal studies with realistic tasks often focus on specific aspects of pen input. Inkpen et al. (2006) used laboratory and field experiments to observe scrollbar usage with realistic tasks. Haider, Luczak, and Rohmert (1982) focus on ergonomics in their observational study of police officers using a simulated police command and control system with various input devices, including a tethered light pen. Fitzmaurice, Balakrishnan, Kurtenbach, and Buxton (1999) included realistic tasks in their observational study of art board orientation and indirect pen input. Briggs, Dennis, Beck, and Nunamaker (1993) compare user performance and preference when using indirect pen input and mouse/keyboard for operating business applications: word processing, spreadsheets, presentations, and disk management. Only the presentation graphics application and word processor supported mouse input in addition to keyboard commands. The experiment tests each application separately, and the authors recruited both novice and expert users. They use custom-made, physical digitizer overlays with “buttons” to access specific commands for each application, in addition to devoting digitizer space to controlling an onscreen cursor. Overall, they find that task times for the pen are longer for novice users with the word processor, and for all users with the spreadsheet and file management. Much of their focus is on handwriting recognition, since at that time it was suggested that the pen was a viable, and even preferred, alternative to the keyboard for novice typists.
However, the authors state that “once the novelty wore off, most of the users hated the handwriting recognition component of the pen-based interface.” For operations other than handwriting, the participants said that they preferred the fine motor control of the pen over the mouse when pointing, selecting, moving, and sketching. They also preferred selecting menus and buttons using the digitizer tablet. A more recent study by Turner, Pérez-Quiñones, and Edwards (2007) compares how students revise and annotate UML diagrams using pen and paper, the Tablet PC, and a mouse and keyboard. They found that more detailed editing instructions are given with pen and paper, and that the use of gestural marks such as circles and arrows is more common with pen and paper and the Tablet PC. However, with mouse and keyboard, their participants made notes
with more explicit references to object names in the diagram. Their evaluation includes only writing and drawing actions with a single application. Although these researchers attempt to answer Briggs et al.’s call for more realistic pen input studies, one must remain cautious regarding their results, since only the Turner et al. study uses direct pen input with a modern Tablet PC device and operating system. Moreover, only Turner et al. evaluate behaviour with a conventional GUI.
Summary
Although observing tool usage in real life is often done with a field study, these ethnographic inquiries are better suited to addressing general aspects of Tablet PC usage in a larger work context. In contrast, the observational studies by Briggs et al. (1993) and Turner et al. (2007) focus on specific tasks while maintaining more ecological validity. Recent work from the Computer Supported Cooperative Work community (e.g., Ranjan et al., 2006; Scott et al., 2004) has combined aspects of traditional field research methodologies with more specific inquiries into lower-level interaction behaviour with controlled tasks – an approach we draw upon.
3.2 Study
Our goal is to examine how usable direct pen input is with a conventional GUI. For our study, we imagine a scenario where an office worker must complete a presentation while away from their desk using their Tablet PC. They use typical office applications: a web browser, a spreadsheet, and a presentation tool. Because our focus is on GUI manipulation, the scenario could be completed without any text entry. Rather than conduct a highly controlled experimental study to examine individual performance characteristics in isolation, or, at the other extreme, an open-ended field study, we elected to perform a laboratory-based observational study situated between these two ends of the continuum, with real tasks and real applications. By adopting this approach, we hope to gain a better understanding of how pen input performs using the status-quo GUI of current Tablet PC computers. Users primarily interact with a GUI through widgets – the individual elements which enable direct manipulation of underlying variables (see Figure 3-4 for examples). The
frequency of use and location of widgets are not typically uniform. For example, in most applications, menus are used more often than tree-views. Buttons can appear almost anywhere, while scrollbars are typically located on the right or bottom. Also, some widgets provide redundant ways to control the same variable, enabling different usage strategies. For example, a scrollbar can be scrolled either by dragging its handle or by clicking a button. To further add variability, a series of widgets may be used in quick succession, forming a type of phrase (Buxton, 1995). For example, making text “10 pt Arial Bold” requires selecting the text, picking from drop-down menus, and clicking a button in quick succession. Controlling all these aspects in a formal experiment would be difficult, and we would likely miss effects only seen in more realistic contexts. We had one group of users complete the tasks with a mouse as a control condition for inter-device comparison. We also recruited three groups of pen participants according to their level of computer and Tablet PC experience. To support our observational analysis, we gathered a rich set of logging data including 3-D motion capture, video taken from the participant’s point-of-view, screen capture video, and pen events such as movement and taps. We use this data to augment and refine our observational methodology with high-level quantitative analysis and visualizations to illustrate observations.
Participants
Sixteen volunteers (5 female, 11 male) with a mean age of 30.8 years (SD 5.4) were recruited. All participants were right-handed, had experience with standard office applications, and used a computer for at least 5 hours per day on average. In a pre-study questionnaire, participants were asked to rate their experience with various devices and applications on a scale of 0 to 3, where 3 indicated a high amount of experience and 0 no experience. All participants rated their experience with a mouse as 3 (high). All participants had occupations which required a computer: office worker, researcher, designer, administrative assistant, and illustrator.
Design
A between-participants design was used, with the 16 participants divided into 4 groups of 4 people each. One group used a mouse during the study and acted as a baseline control
group. The remaining three groups all used the Tablet PC during the study, but each of these groups contained participants with different levels of Tablet PC or conventional computer experience. In summary, the four groups were:
• Mouse. This was the control group where participants used a conventional mouse. Participants in this group said they used a computer for between 8 and 9 hours per day.
• Pen1-TabletExperts. These were the only experienced Tablet PC users. Unlike the other pen groups, they all reported a high amount of experience with Tablet PC pen-based computing in the pre-study questionnaire. They also reported using a computer for between 6 and 10 hours per day.
• Pen2-ConventionalExperts. These were experienced computer users who reported using a wide range of hardware, software and operating systems, but they did not report having any experience with Tablet PC pen based computing. They also reported that, on average, they used a computer between 9 and 10 hours per day.
• Pen3-ConventionalNovices. These were computer users with limited experience who used a single operating system and a limited range of software (primarily standard office applications like word processors, spreadsheets, web browsers, and presentation tools). As with the Pen2-ConventionalExperts group, they did not have any experience with Tablet PCs. They reported using a computer between 5 and 7 hours per day, less than all other groups.
Apparatus
The study was conducted using a Lenovo X60 Tablet PC with an Intel L2400 @ 1.66 GHz and 2 GB RAM. It has a 1050 × 1400 pixel (px) display measuring 184 × 246 mm (12.1 inch diagonal), for a device resolution of 5.7 px/mm. We used the Windows Vista operating system and Office 2007 applications since they were state-of-the-art at the time (we conducted this experiment in 2007) and both were marketed as having improvements for pen computing. The scenario applications were Microsoft PowerPoint 2007 (presentation tool), Microsoft Excel 2007 (spreadsheet), and Internet Explorer 7 (web browser). Since the completion of this study, Microsoft released Windows 7. It includes all of Vista’s pen computing improvements and adds two pen-related improvements: faster and more accurate
handwriting recognition in more languages and a handwriting entry method for mathematical notation (Microsoft, 2009). Since these improvements are both text-entry related, the results of our study remain as relevant for Windows 7 as they do for Windows Vista. We gathered data from four logging sources:
• User view. A head mounted 640 × 480 px camera recorded the user’s view of the tablet at 30 fps (Figure 3-1a). A microphone also captured participant comments and experimenter instructions.
• Motion capture. A Vicon 3-D motion capture system (“Vicon Motion Systems”) recorded the position and orientation of the head, forearm, tablet, and pen using 9.5 mm markers (Figure 3-1b) at 120 frames-per-second (fps). This data was filtered and down sampled to 30 fps for playback and analysis.
• Screen capture. The entire 1050 × 1400 px display was recorded as a digital screen capture video at 22 fps.
• Pen events. Custom logging software recorded the pen (or mouse) position, click events, and key presses. Our event logger was implemented as a Windows Global Hook process (Microsoft, "Hooks"), so we were unable to capture pen-specific data such as pressure.
At the basic level, these logs provided a record of the participant’s progress through the scenario. By recording their actions in multiple ways, we hoped to discern when an intended action was successful or not. Moreover, capturing 2-D and 3-D movements would enable us to visualize characteristic motions. We also felt that a view of the pen, hand, and display together would be particularly useful for analysing direct input interactions.

The Motion Capture and User View logs ran on dedicated computers. The Screen and Pen Event logs ran on the tablet without introducing any noticeable lag. Although the Vicon motion tracking system supports sub-millimetre tracking, the captured data can be somewhat noisy due to the computer vision-based reconstruction process. To compensate for this noise, we applied a low-pass filter using cut-off frequencies of 2 Hz for position and 6 Hz for rotation before downsampling to 30 fps.

Unlike most controlled experiments with Tablet PC input, we intentionally did not place the tablet in a fixed location on a desk. Instead, participants were seated in a standard
chair with the tablet configured in slate mode and held on their lap (Figure 3-1). This was done to approximate a working environment where tablet usage would be most beneficial (e.g., while riding a subway or sitting on a park bench). If the device were placed on a desk, using a mouse would become practical, and users in that environment might opt to use a mouse instead of the pen. Mouse participants were seated at a standard desk with the same Tablet PC configured in open laptop mode. A wired, 800 DPI, infra-red mouse was used with the default Windows pointer acceleration (dynamic control-display gain) setting.
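As a concrete illustration, the motion-capture filtering and downsampling described above could be implemented as follows. This is a minimal sketch: the Butterworth filter order and the zero-phase `filtfilt` call are our assumptions, since the thesis specifies only the cut-off frequencies and frame rates.

```python
# Sketch: low-pass filter 120 fps Vicon data (2 Hz cut-off for position,
# 6 Hz for rotation), then downsample to 30 fps for playback and analysis.
import numpy as np
from scipy.signal import butter, filtfilt

CAPTURE_FPS = 120
PLAYBACK_FPS = 30

def lowpass(samples, cutoff_hz, fs=CAPTURE_FPS, order=2):
    """Zero-phase low-pass filter over a 1-D array of samples (assumed order)."""
    b, a = butter(order, cutoff_hz / (fs / 2.0))
    return filtfilt(b, a, samples)

def smooth_and_downsample(position_xyz, rotation_xyz):
    """Filter each axis, then keep every 4th frame (120 fps -> 30 fps)."""
    step = CAPTURE_FPS // PLAYBACK_FPS
    pos = np.column_stack([lowpass(position_xyz[:, i], 2.0) for i in range(3)])
    rot = np.column_stack([lowpass(rotation_xyz[:, i], 6.0) for i in range(3)])
    return pos[::step], rot[::step]
```

Zero-phase filtering avoids introducing a time lag between the smoothed motion data and the other (unfiltered) logs.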
Figure 3-1. Experimental setup and apparatus. In the Tablet PC conditions, the participant was seated with the tablet on their lap: (a) a head-mounted video camera captured their point-of-view; (b) 9.5 mm passive markers attached to the head, forearm, pen and tablet enabled 3-D motion capture.
Protocol
The study protocol required participants to follow instructions as they completed building a short Microsoft PowerPoint presentation slide deck using a web browser, spreadsheet, and presentation tool. The slide deck was partially constructed at the beginning of the study and users could complete all tasks without entering any text. Participants were told that the study was not about memorization or application usability. They were instructed
to listen closely to instructions, and then complete the task as efficiently as possible. As they worked, they were told to “think aloud” by saying what they were thinking, especially when encountering problems. If they forgot what they were doing altogether, the experimenter intervened and clarified the task.

Each session was conducted as follows: first, each of the participants in the 3 tablet groups completed the pointing, dragging, and right-clicking sections of the standard Windows Vista Tablet PC tutorial. Then all participants completed a brief PowerPoint drawing object manipulation training exercise, since early pilot experiments revealed that these widgets were not immediately intuitive. This training period took between 5 and 10 minutes. Once training was complete, the main portion of the study began. The experimenter used a prepared script to issue tasks for the participant to complete. The participant completed each task before the experimenter read the next one. At the conclusion of the study, we conducted a brief debriefing interview.
Tasks
In total, our seven-page study script contained 47 tasks. The entire script is reproduced in Appendix A, and a time-lapse demonstration of the entire scenario can be viewed in Video 3-1; here we summarize the main sections and give a small example from the script. The first 8 tasks asked the participant to open a PowerPoint file and correct the position and formatting of labelled animal thumbnails on a map (Figure 3-2). The text for tasks 4 through 8 is shown below to convey the general style:
Task 4: Correct the position of the polar bear and owl thumbnails: the polar bear’s habitat is in the north and the owl is in the south.
Task 5: Make the size, orientation, and aspect ratio of all the thumbnails approximately the same as the beaver. It’s fine to judge this “by eye.”
Task 6: Change all labels to the same typeface and style as the beaver (20pt Arial Bold). You may have to select the “Home” tab to see the font controls.
Task 7: Now, we’ll make all animal names perfectly centered below the thumbnail. Select the label and the thumbnail together and pick “Align/Align Center” from the “Arrange” button in the “Drawing” toolbar.
Task 8: This is a good time to save your presentation. Press the save document icon located at the top left of the application.
Figure 3-2. Study screen captures taken from initial task sequence. Here, the participant corrects the formatting of labelled animal thumbnails on a map: (a) before task 4 where some thumbnails are in the wrong position, scaled or rotated incorrectly, or have text labels rendered in different fonts and sizes; (b) at the conclusion of task 8 when the participant said all requested corrections to the thumbnails are complete.
Video 3-1. Time-lapse demonstration of study scenario (01:36; Vogel_Daniel_J_201006_PhD_video_3_1.mp4).
Tasks 9 through 20 continued with the participant navigating to, and then completing a slide about one of the animals. This required copying and pasting text from Wikipedia, and inserting a picture from a file folder (Figure 3-3a). Tasks 21 to 26 repeated the same steps with two more partially completed animal slides. In tasks 27 to 37, the participant used Excel to create a chart of animal population trends from a pre-existing data table, copied and pasted
it into the final slide of the presentation, and added ink annotations such as written text and coloured highlighting (Figure 3-3b). Finally, tasks 38 to 47 asked the participant to configure various slide show options and save an HTML version. After viewing the final presentation in the web browser, the study was complete.
Figure 3-3. Screen captures of selected scenario tasks. (a) tasks 11 and 20, in which the participant is asked to complete a slide about one of the animals by copying and pasting text from Wikipedia, and inserting a picture from a file folder; (b) tasks 31 and 41, in which the participant uses Excel to create a chart of animal population trends from a pre-existing data table, copy and paste it into the final slide of the presentation, and add ink annotations.
The tasks in the study covered simple interactions like pressing the save button, as well as more complex tasks like formatting several text boxes on a presentation slide. We included two different types of tasks to reflect real world usage patterns:
• 40 Constrained tasks had a predictable sequence of steps and a clearly defined goal for task completion. Task 8 (shown above), which asked the participant to press the save button, is an example.
• 7 Open-ended tasks had a variable sequence of steps and required the participant to assess when the goal was reached and the task complete. Task 5 (shown above), which asked the participant to match the size and orientation of animal thumbnails, is an example.
Participants took between 40 minutes and 1 hour to complete the study, including apparatus set-up, training time, and debriefing. Comments from our participants suggested that the tasks were not too repetitive or too difficult, and in contrast to formal experiments we have administered, participants felt that time passed quickly – some even said the study was enjoyable.
Widgets and Actions
The tasks in our study were designed to exercise different widgets like menus and scrollbars, and common actions like text selection, drawing, and handwriting. Figure 3-4 illustrates some of the lesser-known widgets with a brief explanation. For our purposes, we categorize widgets according to their functionality, visual appearance, and invocation. For example, a menu is different from a context menu since the latter is invoked with a right-click; a drop-down is different from a menu because the former has a scrollable list of items; and a tree-view is different because hierarchical items can be expanded and collapsed. Note that widgets can also be categorized according to widget capabilities (Gajos & Weld, 2004), but this would consider a menu, tree-view, drop-down, and even a group of radio buttons equivalent, since all are capable of selecting a single choice from a set of possible options. However, in our categorization, these are different widgets because their functionality and visual appearance differ.
Figure 3-4. Illustration of selected widgets. a slider adjusts a parameter by dragging a constrained handle; an up-down increments or decrements a parameter using buttons or by entering the value in a text-box; a drop-down selects one option from a list of choices shown upon invocation (it may use a scrollbar to access a long list); a tree-view enables navigation through a hierarchical collection of items by expanding and collapsing nodes; a text-box object is one kind of drawing object in PowerPoint, using handles for scaling and rotation; a splitter changes the proportion of two neighbouring regions by dragging a constrained handle.
During our study, we expected participants to use 20 different widgets and 5 actions (Table 3-1). The actions include 3 types of selection: marquee (enclosing 2 or more objects by defining a rectangle with a drag), cell (selecting an area of spreadsheet cells by dragging), and text (selecting text by dragging). By analyzing our script, we calculated the expected minimum number of widget or action occurrences (Table 3-1, column N) necessary to complete the scenario. If a complex widget is composed of other complex widgets (such as when a drop-down includes a scrollbar), we considered these to be nested but distinct occurrences of two different widgets. In the case of buttons, we did not create a distinct occurrence for a button used to open another widget such as a menu or drop-down, or a button that is inherently part of a widget, such as the pagination buttons on a scrollbar. The specific instance of each widget or action may vary by size, orientation, position, and magnitude of adjustment. Note that the occurrence frequency distribution is not balanced and is dependent on the particular tasks in our script. This is a trade-off when adopting a realistic scenario such as in our study. However, we feel our tasks are representative of those used in other common non-text-entry office application scenarios, and compared to repetitive controlled experiments, are much more representative of real usage patterns.
Widget/Action        N   I
button              52  52
drop-down           20  40
scrollbar           20  20
menu                19  36
text-box object     18  27
object handle       17  17
tab                 16  16
context menu        16  32
file object         11  11
up-down              8  88
text select (A)      6   6
marquee select (A)   6   6
check-box            6   6
slider               5   9
drawing (A)          5   5
image object         5   5
window handle        4   8
tree-view            4   7
radio button         4   4
writing (A)          3   3
cell select (A)      2   2
chart object         2   3
hyperlink            2   2
color choice         1   1
splitter             1   1
Table 3-1. Ideal amount of widget and action usage in our study. N is the number of widget occurrences, I is the number of interactions such as clicks and drags. Actions are indicated with (A).
The total number of widget or action occurrences only conveys part of their usage frequency. We also computed the expected ideal number of interactions (clicks, drags, right-clicks, etc.) for each occurrence (Table 3-1, column I). Some widgets, such as a single level context menu, always have a predictable number of interactions – a right-click followed by a single-click – resulting in 2 interactions per occurrence. Other widgets, such as an up-down, can have wide variance due to their magnitude of adjustment. If an up-down has to be decremented from a value of 5 to a value of 4 using 0.1 steps, it requires 10 click interactions. Some widgets, such as the scrollbar, enable a task to be completed in different ways. For example, if a browser page must be scrolled to the bottom, the user may either grab the scroll box and drag it to the bottom of the page, or make many clicks in the paging region. When calculating interactions for this type of widget, we selected the optimum strategy in terms of effort, and for the ideal number of interactions we used the minimum. In other words, we assume the most efficient interaction style without any errors. For example, our predictions may assume that a scrollbar can be operated with a single drag rather than a sequence of page down taps for a long scrolling action. In practice, this ideal number of interactions may be difficult to achieve, but it does provide a theoretical bound on efficiency and enabled us to get a sense of the widget frequencies a priori. In total, we calculated ideal numbers of 253
widget occurrences, and 407 interactions like clicking and dragging during the study (Table 3-2).
Interaction Number
single-click 301
double-click 11
drag 79
right-click 16
TOTAL 407
Table 3-2. Ideal number of expected interactions by interaction type.
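The per-widget reasoning above can be sketched as a few small counting rules. The helper names and the specific widgets shown are illustrative; the thesis only tabulates the resulting totals (Tables 3-1 and 3-2).

```python
# Sketch of ideal (minimum, error-free) interaction counts per occurrence.

def context_menu_interactions():
    # Always a right-click followed by a single-click: 2 interactions.
    return 2

def up_down_interactions(start, target, step):
    # One click per increment/decrement step.
    return round(abs(target - start) / step)

def scrollbar_interactions():
    # Ideal strategy: a single drag of the scroll box, regardless of distance.
    return 1

# Decrementing an up-down from 5 to 4 in 0.1 steps requires 10 clicks:
clicks = up_down_interactions(5.0, 4.0, 0.1)
```

Summing such per-occurrence minimums over the script yields the 407 expected interactions of Table 3-2.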
3.3 Analysis
We based our methodology on Interaction Analysis (Jordan & Henderson, 1995) and open coding (Strauss & Corbin, 1998). Interaction Analysis “is an interdisciplinary method for the empirical investigation of the interaction of human beings with each other and with objects in their environment” (Jordan & Henderson, 1995, p. 39). It leverages video as the primary record of observation and uses this to perform much more detailed analyses of subtle observations than would be possible with traditional written field notes. However, we gathered an even richer and more quantitative collection of observational data, and focused on a much more constrained interaction context. Thus, we follow Scott (2005) and use the open coding approach. Open coding begins with a loosely defined set of codes in a first examination of the data logs; then, with subsequent iterations, the codes are refined and focused as trends emerge. In our case, we began by coding general errors, refined this to ten specific types of errors, and then iterated again to code specific widgets and actions for more detailed analysis.
Custom Log Viewing and Analysis Software
Our user point-of-view video log, motion capture data, screen capture video, and pen event log were synchronized, segmented, and annotated using custom software we developed (Figure 3-5). Each data source is viewed in a separate player with the ability to pause, time
scrub, frame advance, adjust playback speed, and so on. Our software is similar to the commercial INTERACT software package (“Mangold”). However, by building our own system, we could include custom visualizations for the pen event log and motion capture (see also Figure 3-6), more accurately synchronize the logs using our “time mark” scheme (explained below), easily gather data points based on annotations, and write code to gather specific data for statistics. Although this tool was purpose-built for this analysis, we later revised it to be more general and used it to review video of the experiments discussed in chapters 4, 5 and 6.
Figure 3-5. Analysis software tool. (a) screen capture player; (b) pen event player; (c) user view player; (d) synchronization markers and unified playback control; (e) motion capture player; (f) annotation detailed description; (g) annotation codes.
(Two panels: Tablet and Pen; Laptop and Mouse.)
Figure 3-6. Motion capture player. In addition to conventional playback controls, a 3-D camera can be freely positioned for different views. In the views shown in the figure the camera is looking over the participant’s shoulder, similar to the photo in Figure 3-1 right. Objects being tracked are: (a) head position; (b) tablet or laptop display; (c) pen tip or mouse; (d) forearm. In the laptop condition, the laptop keyboard was also tracked (shown in red).
Synchronization
The user view video source was used as the master time log. To assist with synchronization, six visual “time markers” were created by the participant during the study session. Each time mark was created by lifting the pen high above the tablet, and then swiftly bringing the pen down to press a large button in the pen event logging application. This created a time stamp in the pen event log, and changed the button to red for one second which created a visual marker for the user view and screen capture logs. The distinctive pen motion functioned as a recognizable mark in the motion capture log.
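Because each of the six time markers appears in every log, the offset between any log's clock and the master (user view) clock can be estimated by averaging the timestamp differences. This is a minimal sketch; the function names and the averaging choice are our assumptions.

```python
# Sketch: estimate a per-log clock offset from the six shared "time markers"
# and use it to map that log's timestamps onto the master (user view) clock.

def estimate_offset(master_marks, log_marks):
    """Offset (seconds) to add to log times to align them with the master."""
    assert len(master_marks) == len(log_marks)
    diffs = [m - l for m, l in zip(master_marks, log_marks)]
    return sum(diffs) / len(diffs)

def to_master_time(t, offset):
    """Map a single log timestamp onto the master clock."""
    return t + offset
```

Averaging over all six markers also smooths out small per-marker detection errors (e.g., the one-second red button flash spanning several video frames).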
Segmentation into Task Interaction Sequences
Once synchronized, the logs were manually segmented into 47 sequences in which the participant was actively completing each task in our script. Since our study is more structured than typical interaction analysis, this is analogous to the first step of the open coding process where the data is segmented into a preliminary structure. A sequence began when the participant started to move towards the first widget or object, and the sequence ended when they completed the task. This removed times when the experimenter was introducing a task and providing initial instructions, the participant was commenting on a task after it was
completed, or when stopping and restarting the motion tracking system. This reduced the total data log time for each participant to between 20 and 30 minutes.
Event Annotation and Coding
The coding of the 47 task sequences for each of the 16 participants was performed in stages, with a progressive refinement of the codes based on an open coding approach (Strauss & Corbin, 1998) with two raters – two different people identified events and coded annotations.

First, a single rater annotated where some event of interest occurred, with an emphasis on errors. Next, these general annotations were split into three classes of codes, and one class, interaction errors, was further split into six specific types. A second rater was then trained using this set of codes. During the training process, the codes were further refined with the addition of a seventh type of interaction error and a fourth class (both of these were subsets of existing codes). Training also produced coding decision trees which provided both raters with more specific guidelines for code classification and event time assignment (see below). The second rater used this final set of codes and classification guidelines to independently identify events and code them across all participants. The first rater also independently refined their codes across all participants as dictated by the final set of codes and guidelines.

There was a high level of agreement of codes for events found by both raters (Cohen’s Kappa of 0.89), but also a high number of events identified by one rater but not the other. We considered an event to be found by both raters if each rater annotated events with times separated by less than two seconds. Both raters found 779 events, but rater 1 and rater 2 found 238 and 251 additional events respectively. A random check of these additional events strongly indicated that they were valid events that had simply been missed by the other rater. Moreover, the codes for these missed events did not appear to have a strong bias – raters did not seem to miss a particular type of event.
Thus, with the assumption that all identified events are valid, events found by both raters account for 61% of all events found. In a similar coding exercise, Scholtz et al. (2004) also found that raters had difficulty finding events of interest (called “critical incidents” in their domain).
Given the high level of agreement between raters when both identified the same event, we felt justified in merging all events and codes from both raters. When there was disagreement (66 out of 779 cases), we arbitrarily chose to use the code from rater 1. We should note that rater 1 was the primary investigator. To guard against any unintentional bias, we examined our results when rater 2’s codes were chosen instead: we found no significant changes in the quantitative results.
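The inter-rater comparison described above can be sketched as two steps: pairing events whose annotated times fall within two seconds of each other, then computing Cohen's Kappa over the codes of the matched pairs. The greedy nearest-time matching strategy is our assumption; the thesis specifies only the two-second window.

```python
# Sketch: match two raters' events within a 2 s window, then compute
# Cohen's Kappa over the codes of the matched pairs.
from collections import Counter

def match_events(events1, events2, window=2.0):
    """events: list of (time_s, code). Greedily pair events within `window`."""
    pairs, used = [], set()
    for t1, c1 in events1:
        best = None
        for j, (t2, _) in enumerate(events2):
            if j in used or abs(t1 - t2) > window:
                continue
            if best is None or abs(t1 - t2) < abs(t1 - events2[best][0]):
                best = j
        if best is not None:
            used.add(best)
            pairs.append((c1, events2[best][1]))
    return pairs

def cohens_kappa(pairs):
    """Kappa over paired codes: (observed - chance) / (1 - chance)."""
    n = len(pairs)
    po = sum(a == b for a, b in pairs) / n                 # observed agreement
    c1 = Counter(a for a, _ in pairs)
    c2 = Counter(b for _, b in pairs)
    pe = sum(c1[k] * c2.get(k, 0) for k in c1) / n ** 2    # chance agreement
    return (po - pe) / (1 - pe)
```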
Annotation Events and Codes
Each annotation included the code type, synchronized log time for the event, the widget or object context if applicable, and additional description as necessary.
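One way to represent this annotation record is a small data structure; the field names here are our own, not the thesis's.

```python
# Sketch of an annotation record: code type, synchronized event time,
# optional widget/object context, and free-form description.
from dataclasses import dataclass
from typing import Optional

@dataclass
class Annotation:
    code: str                      # e.g. "Target Selection", "Hesitation"
    time_s: float                  # synchronized log time of the event
    widget: Optional[str] = None   # widget or object context, if applicable
    description: str = ""          # additional description as necessary
```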
Code Types

We identified four classes of codes: experiment instructions, application usability, visual search, and interaction errors. Events coded as experiment instructions, application usability, and visual search are general in nature and should not be specific to an input device. These classes forced us to separate true interaction errors from other types of non-interaction errors:
• Experiment Instructions: performed the wrong task, adjusted wrong parameter (e.g., when asked to make photo 4 inches wide, the participant adjusted the height instead), or asked the experimenter for clarification
• Application Usability: application design led directly to an error (we identified specific cases as guidelines for raters, see below)
• Visual Search: performing a prolonged visual search for a known target
Since our focus is on pen based interaction on a Tablet PC versus the baseline mouse based interaction, we were most interested in interaction errors which occurred when the participant had difficulty manipulating a widget or performing an action. We defined eight types of interaction error codes:
• Target Selection: could not click on intended target (e.g., clicking outside the scrollbar)
• Missed Click: making a click action, but not registering any type of click (e.g., tapping too lightly to register a click)
• Wrong Click: attempting one type of click action, but a different one is recognized by the system (e.g., right-clicking instead of dragging a scrollbar, single click instead of a double click)
• Unintended Action: attempting one type of interaction, but accidentally invoking a different one (e.g., attempted to open a file with a single click when a double-click is required)
• Inefficient Operation: reaching the desired goal, but without doing so in the most efficient manner (e.g., scrolling a large document with individual page down clicks rather than dragging the scroll box; overshooting an up-down value and having to backtrack)
• Repeated Invocation: unnecessarily invoking the same action multiple times (e.g., pressing the save button more than once just to be sure it registered)
• Hesitation: pausing before clicking or releasing (e.g., about to click on target, then stop to carefully position pen tip)
• Other: errors not described by the above codes
Event Times

The time logged for an event was almost always at its beginning. An ambiguous case occurs for some error events, such as when the participant is dragging. In these cases, we defined the time of the event to be when the participant set the error in motion. For example, when selecting text or multiple objects with a marquee, if the click down location constrained the selection such that an error was unavoidable, then the event time is logged at the down action. However, if the error occurs at the up action, such as a movement while releasing the pen tip from the display, then the event time is logged at the up action.
Coding Decision Trees

We developed two coding selection decision trees: one is used when a participant makes a noticeable pause between actions (Figure 3-7) and a second is used when a
participant attempts an action (Figure 3-8). We defined “action” as an attempted click (“tap”); a “right-click”; the beginning or ending of a drag; or operating a physical key, wheel, or button. The definition of “noticeable” is somewhat subjective and required training – a rough guideline is to look for pauses of more than two seconds that interrupt an otherwise fluid movement. With some practice, these noticeable pauses became more obvious.

The participant makes a noticeable pause between actions:
1. Are they asking the experimenter a question to clarify the task? Yes → Experiment Instructions. No → continue.
2. Is the participant searching for the target of their next interaction? Yes → Visual Search. No → continue.
3. Was the pause immediately before and near the next action (1)? Yes → Hesitation. No → nothing to code.

Figure 3-7. Coding decision tree used when participant makes a noticeable pause. See below for additional notes (numbered notes in parentheses).
The participant attempts an action:
1. Are they intentionally performing a different action from the script (2)? Yes → Experiment Instructions. No → continue.
2. Was the manipulation successful? If yes, go to 3; if no, go to 5.
3. Was the action performed in an inefficient manner (4)? Yes → Inefficient Operation.
4. Did they perform the same action multiple times unnecessarily? Yes → Repeated Invocation. No → nothing to code.
5. Was the error caused by a known application usability problem (3)? Yes → Application Usability.
6. Did the system fail to recognize any action? Yes → Missed Click.
7. Did they miss the intended target? Yes → Target Selection.
8. Was the resulting click type different than the one expected? Yes → Wrong Click.
9. Was a different action triggered than the one expected? Yes → Unintended Action.
10. Otherwise → Other.

Figure 3-8. Coding decision tree used when participant attempts an action. See below for additional notes (numbered notes in parentheses).
Additional Notes on Decision Trees for Figure 3-7 and Figure 3-8:
(1) “just before”: The participant has found the location for the action, but does not perform the interaction immediately.
(2) “different action”: wrong task, adjusting wrong parameter such as adjusting height instead of width, attempting to gain access to parameter in a different way than requested (e.g., using a context menu instead of a toolbar).
(3) Known usability problems were identified to remove errors that are application specific or exhibit obvious poor design. Many involved the PowerPoint (PPT) textbox:
• attempting to move a PPT textbox by dragging the text directly (PPT requires the user to first select the text box, then drag it from the edge);
• the default insertion point in a PPT textbox selects a single word only, so subsequent formatting does not affect the entire textbox (rather than selecting all text first);
• a marquee selection misses the invisible PPT textbox bounding box;
• attempting to select text in a PPT textbox when the down action is off of the visible text, which deselects the textbox and begins a selection marquee instead;
• changing a checkbox state by clicking on the label text (but the application only supports a direct click on the checkbox);
• or, a problem selecting an Excel chart object (when opening a context menu or moving a chart, the application often selects an inner “plot area” rather than the entire chart).
(4) Inefficient Manner: reaching the desired goal, but doing so in a noticeably inefficient manner (e.g., scrolling a large document with individual page down clicks rather than dragging the scroll box; overshooting an up-down value by several steps and having to backtrack). This is admittedly somewhat subjective and difficult to quantify; we relied on rater training to ascertain this behaviour.
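The decision logic of Figures 3-7 and 3-8 can be sketched as a pair of classification functions. The `Observation` field names are hypothetical; the question order follows the two trees.

```python
# Sketch: the two coding decision trees as classification functions.
from dataclasses import dataclass

@dataclass
class Observation:
    # Pause tree (Figure 3-7)
    asked_clarification: bool = False
    visually_searching: bool = False
    pause_just_before_action: bool = False
    # Action tree (Figure 3-8)
    off_script_action: bool = False
    manipulation_successful: bool = True
    inefficient_manner: bool = False
    repeated_unnecessarily: bool = False
    known_usability_problem: bool = False
    no_action_recognized: bool = False
    missed_target: bool = False
    wrong_click_type: bool = False
    unintended_action: bool = False

def code_pause(o: Observation) -> str:
    """Figure 3-7: participant makes a noticeable pause between actions."""
    if o.asked_clarification: return "Experiment Instructions"
    if o.visually_searching: return "Visual Search"
    if o.pause_just_before_action: return "Hesitation"
    return "Nothing to code"

def code_action(o: Observation) -> str:
    """Figure 3-8: participant attempts an action."""
    if o.off_script_action: return "Experiment Instructions"
    if o.manipulation_successful:
        if o.inefficient_manner: return "Inefficient Operation"
        if o.repeated_unnecessarily: return "Repeated Invocation"
        return "Nothing to code"
    if o.known_usability_problem: return "Application Usability"
    if o.no_action_recognized: return "Missed Click"
    if o.missed_target: return "Target Selection"
    if o.wrong_click_type: return "Wrong Click"
    if o.unintended_action: return "Unintended Action"
    return "Other"
```

The early-return ordering mirrors the top-to-bottom question order of the trees, so the first matching question determines the code.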
Interactions of Interest
After a preliminary qualitative analysis of the task sequences and interaction errors, we identified specific interactions of interest and further segmented these for more detailed analysis. We selected widgets and actions that are used frequently (button), highly error prone (scrollbar), presented interesting movements (text selection), or highlighted differences between the pen and mouse (drawing, handwriting, and keyboard use). These are discussed in section 3.5.
Other Annotations
We also transcribed relevant comments from the participant and noted sequences where problems were caused by occlusion or tooltips and other hover-triggered visual feedback.
3.4 Results
Time
To get a sense of performance differences between mouse and the three pen groups, we calculated group means for the total time used to complete the 40 constrained tasks (since these tasks have a predictable sequence of steps and a clearly defined goal for task completion). Note that these times include some non-task segments such as when participants provide their comments as per the think-aloud protocol. The graph suggests slightly higher times for the pen condition, decreasing with increased computer experience (Figure 3-9). A similar trend can be seen for the total time for all tasks, but with much higher variance. A
one-way analysis of variance found a significant main effect of group on time (F3,12 = 7.445, p < .005), with the total times for the Pen3-ConventionalNovices group significantly longer than the Mouse and Pen1-TabletExperts groups (p < .02).¹¹ No other significant differences were found between groups. Perhaps more interesting is the range of completion times for mouse and pen participants. The best and worst times for the Mouse group were 12.8 minutes (P1) and 19.0 minutes (P3), while the best and worst times across all pen groups were 16.1 min (P7, Pen1-TabletExperts) and 28.0 min (P13, Pen3-ConventionalNovices). The best time for a mouse participant is well below the best pen user time, even for expert Tablet PC users.
11 All post-hoc analyses use the conservative Bonferroni adjustment.
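The one-way analysis of variance used throughout this chapter can be sketched in a few lines of Python. This is a minimal illustration of the F-statistic computation, not the study's analysis code; the per-participant times below are placeholder values (only group means are reported in Figure 3-9), chosen to match the study's 4-participants-per-group design.

```python
# Minimal one-way ANOVA sketch. Group data below is illustrative, not the
# study's actual per-participant times.

def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of sample lists."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    # Between-group sum of squares: weighted squared deviations of group means.
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: deviations of samples from their group mean.
    ss_within = sum(sum((x - sum(g) / len(g)) ** 2 for x in g) for g in groups)
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Four groups of four participants each, as in the study design.
times = [
    [12.8, 15.0, 16.2, 19.0],   # Mouse (placeholder values)
    [16.1, 17.0, 17.5, 18.2],   # Pen1-TabletExperts (placeholder values)
    [17.5, 18.3, 19.0, 19.7],   # Pen2-ConventionalExperts (placeholder values)
    [21.0, 22.5, 24.0, 28.0],   # Pen3-ConventionalNovices (placeholder values)
]
F, df1, df2 = one_way_anova(times)
print(f"F({df1},{df2}) = {F:.3f}")
```

With 4 groups of 4 participants, the degrees of freedom are (3, 12), matching the F3,12 statistics reported in this chapter.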
[Bar chart of mean times: Mouse 16.0, Pen1 17.2, Pen2 18.5, Pen3 23.3 minutes]
Figure 3-9. Mean time for all constrained tasks per group. (Note: error bars in all graphs are 95% confidence intervals.) (Pen1-TabletExperts, Pen2-ConventionalExperts, Pen3-ConventionalNovices)
Errors
We annotated 1276 errors across all 16 participants in all 47 tasks. This includes non-interaction errors (experiment instruction, visual search, and application usability), which we briefly discuss before focusing on interaction errors, which are most relevant.
Non-Interaction Errors
We found 72 application usability errors, 151 experiment instruction errors, and 41 visual search errors overall. The mean number per group does not appear to form a pattern, with the possible exception of application usability errors appearing higher for Pen3-ConventionalNovices (Figure 3-10). However, no significant differences were found. The large variance for experiment instructions in the Mouse group was due to participant 3, who often clarified task instructions (that person had 25 experiment instruction errors compared to 15 for the next highest participant in the study, a Pen2-ConventionalExpert).
[Grouped bar chart of mean non-interaction errors per group (Mouse, Pen1, Pen2, Pen3) for Application Usability, Experiment Instructions, and Visual Search errors]
Figure 3-10. Mean non-interaction errors per group. (Pen1-TabletExperts, Pen2-ConventionalExperts, Pen3-ConventionalNovices)
A breakdown of specific application usability errors (which we identified during the coding process, see above) shows that a large proportion occurred when attempting to select text in the textbox (49%) or when a marquee selection missed the “invisible” textbox bounding box (17%).
Interaction Errors
Recall that interaction errors occur when a click or drag interaction is attempted. The mean number of interaction errors in each group suggests a pronounced difference between mouse and pen groups (Figure 3-11). A one-way analysis of variance found a significant main effect of group on interaction errors (F3,12 = 10.496, p = .001). Post-hoc analysis found the Mouse group lower than both Pen2-ConventionalExperts and Pen3-ConventionalNovices, and Pen1-TabletExperts lower than Pen3-ConventionalNovices (all p < .05, using the Bonferroni adjustment).
[Bar chart of mean interaction errors: Mouse 15, Pen1 51, Pen2 73, Pen3 114]
Figure 3-11. Mean interaction errors per group. (Pen1-TabletExperts, Pen2-ConventionalExperts, Pen3-ConventionalNovices)
From a breakdown of interaction error type for each group (Figure 3-12), target selection appears to be most problematic with pen users, especially those with less experience, the Pen3-ConventionalNovices group. For the mouse group, unintended actions and target selection were similar, but all errors were low. Wrong clicks, missed clicks, and unintended actions are roughly comparable across pen groups, with a slight increase with less experience. Not surprisingly, there were no missed click errors with the mouse group – a dedicated button makes this type of error unlikely. We analyze each interaction error type below, with an emphasis on target selection, wrong clicks, unintended actions, and missed clicks.
[Grouped bar chart of mean interaction errors per group (Mouse, Pen1, Pen2, Pen3) by error type: Target Selection, Wrong Click, Unintended Action, Missed Clicks, All Other]
Figure 3-12. Mean interaction errors by error type. All other errors include hesitation, inefficient operation, and repeated invocation. (Pen1-TabletExperts, Pen2-ConventionalExperts, Pen3-ConventionalNovices)
Target Selection Error Location
All Tablet PC participants had trouble selecting targets, with the number of errors increasing with lower computer experience. In reviewing the data logs, we noted that target selection errors may be related to location. A heat map plot (a 2-D histogram using color intensity to represent quantity in discrete bins) compares the concentration of all taps and clicks for all tasks for all pen participants (Figure 3-13a) with the relative error rate, computed by taking the ratio of the number of errors to the number of taps/clicks per bin (Figure 3-13b). The concentration of taps/clicks is somewhat centralized (aside from the peak in the upper right, where up-down widgets in tasks 15, 21, 23, and 34 required many sequential clicks). The error rate has a different distribution, with higher concentrations near the mid-upper-left and along the right side of the display compared to all taps/clicks.

There may also be an interaction with target size, which we could not control for. We could have carefully annotated actual target size based on the screen capture log, but this would have required considerable effort for relatively little analytical gain. A better approach would be to conduct a controlled experiment in the future to validate this initial finding.
[Heat map plots: (a) number of taps/clicks, scale 0 to 300; (b) error rate, scale 0.0 to 0.9]
Figure 3-13. Pen participant heat map plots for taps/click and errors. (a) all taps/clicks; and (b) target selection error rate, the number of errors over taps/clicks. A heat map is a 2-D histogram using color intensity to represent quantity in discrete bins; in this case, each bin is 105 x 100 pixels.
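The binning behind these heat maps can be sketched as follows. This is a minimal illustration, not the study's analysis script; the bin size matches the caption (105 × 100 pixels), but the tap and error coordinates are made up.

```python
# Sketch of the Figure 3-13 computation: bin every tap/click into a grid,
# then divide the per-bin error count by the per-bin tap count.
# Coordinates below are hypothetical.
from collections import Counter

BIN_W, BIN_H = 105, 100   # bin size from the Figure 3-13 caption

def to_bin(x, y):
    return (x // BIN_W, y // BIN_H)

def error_rate_map(taps, errors):
    """taps, errors: lists of (x, y) pixel positions. Returns {bin: rate}."""
    tap_counts = Counter(to_bin(x, y) for x, y in taps)
    err_counts = Counter(to_bin(x, y) for x, y in errors)
    # The rate is only defined for cells that received at least one tap.
    return {b: err_counts[b] / n for b, n in tap_counts.items()}

taps = [(50, 50), (60, 40), (800, 120), (820, 130)]   # hypothetical
errors = [(60, 40), (800, 120)]                       # hypothetical
rates = error_rate_map(taps, errors)
print(rates)   # e.g. {(0, 0): 0.5, (7, 1): 0.5}
```

Plotting these per-bin values with a color scale gives the two panels of Figure 3-13.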
Wrong Click Errors
We observed four types of wrong click errors which occurred with seven or more Tablet PC participants (Table 3-3).

Wrong Click Type | Tablet PC | Mouse | Frequent Pen Contexts
a) right-click instead of single-click | 32 occurrences, 9 participants, 0.9% rate¹ | none | up-down (15), drop-down (7), button (4), menu (2)
b) right-click instead of drag | 26 occurrences, 9 participants, 2.7% rate² | none | slider (12), scrollbar (7)
c) click instead of double-click | 24 occurrences, 8 participants, 18% rate³ | 1 occurrence, 1 participant, 2% rate³ | file object (all)
d) click instead of right-click | 14 occurrences, 7 participants, 7.3% rate⁴ | none | context menu invocation (13)

1. Occurrence rate calculated using 300 estimated single-clicks (Table 3-2).
2. Occurrence rate calculated using 79 estimated drags (Table 3-2).
3. Occurrence rate calculated using 11 estimated file operations (Table 3-1).
4. Occurrence rate calculated using 16 estimated right-clicks (Table 3-2).

Table 3-3. Wrong click errors.
Participants had problems accidentally invoking a right-click when attempting to single-click or drag (Table 3-3a,b). Right-clicking instead of clicking occurred most often with the up-down widget: in some cases participants held the increment and decrement buttons down, expecting this to be an alternate way to adjust the value. Other common contexts were long button presses and, in menus, dwelling on a top-level item to open nested items. Right-clicking instead of dragging occurred most often with scrollbars and sliders. With these widgets, pressing and holding the scrolling handle triggered a right-click if the drag motion did not begin before the right-click dwell period.

A similar problem occurred when a slow double-click was recognized as two single clicks in file navigation (Table 3-3c). This often put the file or folder object in rename mode, and subsequent actions could corrupt the text. It appears to be a matter of timing and location: if the two clicks were not performed quickly enough and near enough to each other, they were not registered as a double-click. Accot and Zhai (2002) and Zuberec (2000, sec. 5.12) note that the rapid successive clicking actions required by double-clicking can be difficult to perform
while keeping a consistent cursor position, and our data supports this intuition. This is a symptom of pen movement while tapping, which we explore in more detail below.

Clicking instead of right-clicking was less frequent (Table 3-3d), but when this error occurred the results were often costly. For example, when invoking a context menu over a text selection, several participants clicked on a hyperlink by accident. This was not only disorienting, but also required them to navigate back from the new page and select the text a second time before trying again. We also noted cases of dragging or right-dragging instead of clicking or right-clicking. Accidental right-dragging was also disorienting: the end of even a short right-drag on many objects opens a small context menu, and since this occurred most often when participants were opening a different context menu, they easily became confused.
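The timing-and-location rule for double-click recognition described above can be sketched as follows. This is an illustration of the general mechanism, not the study system's actual recognizer; the 500 ms and 4 px thresholds are illustrative desktop-style defaults, not values measured from the Tablet PC.

```python
# Sketch of double-click recognition: two taps register as a double-click
# only if they are close enough in both time and space. The 500 ms / 4 px
# thresholds are illustrative assumptions, not the study system's values.
import math

MAX_INTERVAL_MS = 500   # assumed time threshold
MAX_DISTANCE_PX = 4     # assumed distance threshold

def is_double_click(t1, p1, t2, p2):
    """t1, t2 in ms; p1, p2 are (x, y) pixel positions of the two taps."""
    close_in_time = (t2 - t1) <= MAX_INTERVAL_MS
    close_in_space = math.dist(p1, p2) <= MAX_DISTANCE_PX
    return close_in_time and close_in_space

# A slow second tap, or pen drift between taps, yields two single clicks
# instead -- which on a file object puts it into rename mode.
print(is_double_click(0, (100, 100), 300, (101, 100)))  # True
print(is_double_click(0, (100, 100), 700, (100, 100)))  # False: too slow
print(is_double_click(0, (100, 100), 300, (108, 100)))  # False: pen moved
```

Because tapping with a pen tends to move the tip between contacts (see the Pen Tapping Stability section below), the spatial condition fails more often for pen users than for mouse users.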
Unintended Action Errors
We observed three types of unintended actions which occurred with seven or more Tablet PC participants (Table 3-4): one caused by an erroneous single-click when navigating files and folders, one occurring when completing a drag operation, and one occurring when resizing an object with a corner handle.

Unintended Action Type | Tablet PC | Mouse | Frequent Pen Contexts
a) attempt to open file or folder with single-click instead of double-click | 28 occurrences, 10 participants, 21% rate¹ | 1 occurrence, 1 participant, 2% rate¹ | file folder navigation (all)
b) movement on drag release | 27 occurrences, 10 participants, 2.8% rate² | 1 occurrence, 1 participant, 0.3% rate² | dragging a drawing object’s rotation or translation handle (22), or text selection (4)
c) dragging corner resize handle locks aspect ratio, unable to resize in one dimension only | 9 occurrences, 7 participants, 0.9% rate² | 1 occurrence, 1 participant, 0.3% rate² | PowerPoint image object (all)

1. Occurrence rate calculated using 11 estimated file interactions (Table 3-1).
2. Occurrence rate calculated using 79 estimated drags (Table 3-2).

Table 3-4. Unintended action errors.
All three unintended action error types had a wide distribution across pen participants. One common error was attempting to open a file or folder with a single-click instead of a double-click (Table 3-4a). 10 out of 12 pen participants attempted at least twice to open a file
by single clicking rather than using a conventional double-click, and 2 of these participants made this mistake 4 or more times. It seems that the affordance with the pen is to single-click objects, not double-click, as one participant commented:

“I almost want this to be single-click which I would never use in normal windows, but with this thing it seems like it would make more sense to have a single click.” (P9-Pen1-TabletExperts 33:30)¹²

Unintended movement when lifting the pen at the end of a drag action was another commonly occurring error (Table 3-4b). This occurred most often when manipulating the rotation handle of a drawing object or when selecting text. The participant would position the rotation handle in the desired orientation, or drag the caret to select the desired text, but as they lifted the pen to end the drag, the handle or caret would unexpectedly shift. A third unintended action error occurred when participants tried to resize a PowerPoint object in one dimension using the corner handle, but the corner handle constrained the aspect ratio. This may have been better classified as an application usability problem.

There were two unintended actions with which we expected to find more problems; however, this did not turn out to be the case. We found only one occurrence of a premature end to a dragging action, which we expected to be more common since dragging requires maintaining a threshold pressure while moving (Ramos et al., 2004). Also, we intentionally did not disable the Tablet PC hardware buttons during the study, yet found only a single instance of an erroneous press.
Missed Click Errors
Due to how we classified missed click errors, 67 out of 75 (89%) of these errors occurred when single-clicking (tapping with the pen): if the system failed to recognize both taps of an attempted double-click, we classified it as a wrong click error instead. All pen participants had at least one missed click, and four participants missed more than nine. The most common contexts were button (25%), menu (21%), and image (13%).
12 The numeric text 33:30 is the time of the quote in the synchronized log, expressed as minutes and seconds separated by a colon. In this example, the quote was recorded at 33 minutes and 30 seconds. We use this convention for times throughout this dissertation.
The cause appears to be too little or too much force. Tapping too lightly is a symptom of a tentative tapping motion: we noted that when participants had trouble targeting small items (such as menu buttons, check-boxes, and menu items) they sometimes hovered above the target to verify the cursor position, but the subsequent tap down motion was short, making it difficult to strike the display with enough force. Tapping too hard¹³ seemed to be a strategy used by some Pen1-TabletExperts participants as a (not always successful) error avoidance technique; see our discussion below.
Repeated Invocation, Hesitation, Inefficient Operation, and Other Errors
These errors had 184 occurrences in total across all participants. There were only 3 occurrences of other errors; the remainder were repeated invocation, hesitation, and inefficient operation errors (Table 3-5).
Error Type | Tablet PC | Mouse | Frequent Pen Contexts
a) Inefficient Operation | 68 occurrences, 12 participants | 25 occurrences, 4 participants | scrollbar (31), up-down (4)
b) Hesitation | 57 occurrences, 11 participants | 3 occurrences, 3 participants | file folder navigation (all)
c) Repeated Invocation | 27 occurrences, 10 participants | none | PowerPoint image object (all)

Table 3-5. Repeated invocation, hesitation, and inefficient operation errors.
We noted 68 cases of obvious inefficient operation across all 12 pen participants (Table 3-5a). More than half of these cases involved the scrollbar (31) or up-down (9). With the scrollbar, some participants chose to repeatedly click to page down for very long pages instead of dragging the scroll box. With the up-down, we noted several participants overshooting the desired value and having to backtrack. Of the 25 occurrences of inefficient
13 Recall that our pen event logger did not log pressure. Thus, our observations concerning soft or hard taps are based on interpretation of the motion capture and user view logs capturing pen movement as it strikes the display, as well as the loudness of the physical tap as recorded by the user view camera microphone.
operation with mouse participants, 14 involved the mouse scroll-wheel: three out of four mouse participants would scroll very long pages instead of dragging the scroll box, resulting in a similar type of inefficient operation as pen participants repeatedly clicking page down.

We noted 57 cases of hesitation in all three pen groups, but only 3 in the mouse group (Table 3-5b). There seemed to be two main causes. One is the user visually confirming the position of the cursor, rather than trusting the pen tip, when selecting a small button or opening a small drop-down target. The other is many tooltips popping up and visually obscuring parts of the target; we discuss this type of “hover junk” in more detail below.

We identified 27 cases of repeated invocation across 10 pen participants (Table 3-5c). Although a small number in total, it suggests some distrust when making selections. 11 of these occurred when tapping objects to select them (images, charts, textboxes and windows), and 9 occurred when pressing a button or tab. Recall that a repeated invocation error is only logged if the first tap was successful. This is likely a symptom of subtle or hidden visual feedback, which we discuss below, and of the fact that the extra cost of repeated invocation “just in case” is not that high (Sellen, Kurtenbach, & Buxton, 1992).
Error Recovery and Avoidance Techniques
We have already discussed error avoidance through repeated invocation, but there were two other related observations with the experienced Tablet PC group. These participants seemed to recover from errors much more quickly, almost as though they expected a few errors to occur. For example, when they encountered a missed click error, there was no hesitation before repeating the selection action; one participant rapidly tapped a second time if the result did not instantly appear, which caused several repeated invocation errors. Sellen et al. (1992) also observed this type of behaviour with experts using other devices. In contrast, participants in the Pen3-ConventionalNovices and Pen2-ConventionalExperts groups tended to pause for a moment if an error occurred; they seemed to be processing what had just happened before trying again.

A related strategy used by three Pen1-TabletExperts participants was to tap the pen very hard and very fast, almost as though they were using a very hard leaded pencil on paper. When asked about this behaviour, one participant commented that it helped avoid missed clicks (suggesting they may have felt the digitizer was not very sensitive). However, we
noted cases where a click was missed even with this hard tap; in fact, the speed and force seemed to be the cause. A better mental model for Tablet PC users is to think of the digitizer as being as sensitive as an ink pen on paper: this requires a medium speed (to allow the ink to momentarily flow) and a medium pressure (more than a felt tip marker, but less than a very hard leaded pencil).
Interaction Error Context
By examining the most common widget or action contexts for interaction errors overall, we can get a sense of which are most problematic with pen input. Recall that since our study tasks follow a realistic scenario, the frequencies of interaction contexts are not balanced. For example, we expected 52 button interactions, but only 6 text selection interactions (see Table 3-1). Thus, for relative comparison, a normalized error rate is appropriate.

Ideally, we would normalize across the actual number of interactions per widget, but this has inherent difficulties: participants use different widget interaction strategies resulting in an unbalanced distribution, and we were not able to automatically log which widget was being used, much less know which widget was intended (in the case of a target selection error). However, since most interaction errors occur when an interaction is attempted, we can normalize against the ideal number of interactions per widget and calculate an estimated interaction error rate. It is important to acknowledge that the ideal number of interactions is a minimum, reflecting an optimal widget manipulation strategy. Thus, although our estimated error rate may be useful for relative comparison between widgets, the actual rates may be somewhat exaggerated if a widget was used more times than expected or manipulated with an inefficient strategy.

If we accept this normalized error rate estimation, an ordered plot of widgets and actions reveals differences between mouse and pen (Figure 3-14). We only include widgets or actions with more than 3 expected interactions. The relative ordering is different between mouse and pen, with the highest pen error rate for the scrollbar and the highest mouse error rate for marquee selection.
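The estimated error rate computation described above can be sketched as follows. This is an illustration, not the study's script; the button (52) and text selection (6) ideal counts come from the text above, while the scrollbar and marquee counts and all error counts are made-up placeholders.

```python
# Sketch of the normalized "estimated error rate": observed interaction
# errors per widget divided by the *ideal* number of interactions for that
# widget (as derived from the task script, Table 3-1). Widgets with 3 or
# fewer expected interactions are excluded, as in Figure 3-14.

def estimated_error_rates(errors, ideal_interactions, min_interactions=3):
    """Return {widget: errors / ideal count} for sufficiently used widgets."""
    rates = {}
    for widget, ideal in ideal_interactions.items():
        if ideal > min_interactions:
            rates[widget] = errors.get(widget, 0) / ideal
    return rates

# button=52 and "text select"=6 are from the text; the rest are placeholders.
ideal = {"button": 52, "scrollbar": 14, "text select": 6, "marquee": 4}
errors = {"button": 5, "scrollbar": 7, "text select": 2}   # placeholder counts
for widget, rate in sorted(estimated_error_rates(errors, ideal).items(),
                           key=lambda kv: kv[1], reverse=True):
    print(f"{widget}: {rate:.2f}")
```

Sorting by rate, as done here, produces the ordered comparison plotted in Figure 3-14.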
[Bar chart of estimated error rates (0.0 to 0.5) for Pen and Mouse across contexts: file, tab, radio, slider, menu, image, button, handle, window, text-box, scrollbar, up-down, tree-view, marquee*, check-box, drop-down, text select*, context-menu]
Figure 3-14. Estimated interaction error rate for widget and action contexts. Error rate is calculated using the ideal number of interactions (see Table 3-1). Only widgets or actions with more than 3 expected interactions are shown. (*actions are denoted with an asterisk, all other contexts are widgets)
The top error contexts for the pen range from complex widgets like sliders, tree-views, and drop-downs to simple ones like files, handles, and buttons. Note that three of the top five contexts involve dragging exclusively: slider, handle, and text select.

The high error rate for the scrollbar is likely due to inefficient usage. Above, we noted that many inefficient usage errors occurred when participants manipulated a scrollbar with a sequence of page down taps, rather than a single drag. Since in most cases our ideal interaction count expected a single drag, this creates many errors relative to the normalizing factor. Below, we examine the scrollbar in more detail and establish a more accurate error rate. We noted above that the slider and handle both had small targets, and that the dragging action caused wrong click errors by occasionally triggering right-clicks instead of drags. The high error rate for file objects is largely due to the high number of unintended action errors where participants attempted to open files with a single tap, and a high number of wrong click errors when double taps were recognized as a single tap.

Note that although many errors with the text-box were coded as application usability errors, it still produced a 16% estimated error rate. This is partly because even after participants avoided the application usability error of attempting to reposition a text-box by dragging from the centre, once they tapped it to select, the text-box must be repositioned by
dragging a very narrow 7 px (1.2 mm) border target: 39% of text-box interaction errors were target selection.

Mouse error rates are consistently much lower, with the exception of the marquee. Several participants had a different mental model of marquee selection: they expected objects to be selected as soon as the marquee touched them. This resulted in marquees missing desired objects for fear of touching an undesired object. There are also two mouse-specific action contexts not shown: scroll-wheel and keys. These contexts were not estimated by our script, so a normalized error rate cannot be computed. Mouse users often used the scroll-wheel to scroll very long documents, instead of simply dragging the scrollbar thumb, and often overshot the target position, both resulting in inefficient action errors. In addition, mouse users sometimes missed keys when using shortcuts.
Movements
In addition to errors, when reviewing the logs we could see differences in movement characteristics between mouse and pen. Using the pen seemed to require larger movements involving the arm, whereas the mouse had a more subtle movement style, often using only the wrist and fingers. Due to the number of target selection and missed click errors with the pen, we also wished to investigate the movement characteristics of taps.
Overall Device Movement Amount
To measure movements, we first had to establish a threshold movement distance per data sample that would signal a moving or stationary frame. To find this threshold, we graphed Euclidean movement distance by time for different 10 second movement samples taken from four participants. Based on these graphs, we established a dead band cut-off at 0.25 mm of movement per frame (a velocity of 7.5 mm/s). With this threshold, we calculated the total amount of pen and mouse movement across all constrained tasks. Since participants completed the study in different times, we calculated a mean movement amount per minute for comparison, and calculated the mean for each group (Figure 3-15). A one-way analysis of variance found a significant main effect of group on device movement (F3,12 = 19.488, p < .001). The total movement for the mouse group was significantly shorter than all pen groups (p < .001, using the Bonferroni adjustment). We
found that these statistics supported our observations: pen users moved more than 4.5 times farther than mouse users per minute.
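The dead band movement measure described above can be sketched as follows. This is an illustration, not the study's analysis code; the 0.25 mm threshold is from the text, but the sample path and frame rate are hypothetical.

```python
# Sketch of the movement measure: a frame counts as "moving" only if the
# Euclidean distance from the previous sample exceeds a 0.25 mm dead band;
# moving distances are summed and normalized per minute.
# The sample path and 60 Hz frame rate below are hypothetical.
import math

DEAD_BAND_MM = 0.25   # threshold from the text (velocity 7.5 mm/s)

def movement_per_minute(positions, frame_rate_hz):
    """positions: list of (x, y, z) tracker samples in mm, one per frame."""
    total_mm = 0.0
    for prev, cur in zip(positions, positions[1:]):
        d = math.dist(prev, cur)
        if d > DEAD_BAND_MM:      # ignore sub-threshold jitter
            total_mm += d
    minutes = (len(positions) - 1) / frame_rate_hz / 60.0
    return total_mm / minutes     # mm per minute

# Two 1 mm steps plus one 0.1 mm jitter frame (ignored), over 3 frames.
path = [(0, 0, 0), (1, 0, 0), (1, 0.1, 0), (2, 0.1, 0)]
print(movement_per_minute(path, frame_rate_hz=60))
```

Normalizing per minute, rather than summing raw totals, is what makes the groups comparable despite different completion times.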
[Bar chart of average movement distance: Mouse 0.8, Pen1 3.4, Pen2 3.6, Pen3 4.1 m/min]
Figure 3-15. Average pen or mouse movement distance per minute for all constrained tasks. (Pen1-TabletExperts, Pen2-ConventionalExperts, Pen3-ConventionalNovices)
Since we tracked the movements of the forearm in addition to the pen, we can examine the proportion of forearm movement relative to wrist and finger movements, with the assumption that if the pen or mouse moves without forearm movement, the motion must be the result of the wrist or fingers (Figure 3-16). We found that the pen had a much greater proportion of combined wrist, finger, and forearm movements than the mouse (67 to 72% compared to 36%). A one-way analysis of variance found a significant main effect of group on combined wrist, finger, and forearm movements (F3,12 = 71.926, p < .001), with less movement in the Mouse group compared to all pen groups (p < .001, using the Bonferroni adjustment).

This was most evident when pen participants selected items at the top or left side of the display: to do this they always had to move both their arm and hand, whereas mouse users were often able to use only their wrist and fingers. The reason is partly pointer acceleration with the mouse, but also that mouse participants had a more central home position, which we discuss later in the Posture section. The larger forearm-only proportion in the Mouse group is likely attributable to the fact that, unlike the pen, the mouse is not supported by the hand, so the forearm can more easily move independently.
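The per-frame attribution described above can be sketched as follows. This is an illustration of the classification rule, not the study's script; the 0.25 mm dead band matches the threshold used for Figure 3-16, but the sample trajectories are hypothetical.

```python
# Sketch of the movement attribution behind Figure 3-16: if the pen moves
# without the forearm moving, the motion is attributed to wrist/fingers.
# Sample trajectories below are hypothetical.
import math
from collections import Counter

DEAD_BAND_MM = 0.25

def classify_frames(pen, forearm):
    """pen, forearm: parallel lists of (x, y, z) positions in mm per frame."""
    labels = []
    for i in range(1, len(pen)):
        pen_moved = math.dist(pen[i - 1], pen[i]) > DEAD_BAND_MM
        arm_moved = math.dist(forearm[i - 1], forearm[i]) > DEAD_BAND_MM
        if pen_moved and arm_moved:
            labels.append("wrist, fingers and forearm")
        elif pen_moved:
            labels.append("wrist and fingers only")
        elif arm_moved:
            labels.append("forearm only")
        else:
            labels.append("stationary")
    return Counter(labels)

pen = [(0, 0, 0), (1, 0, 0), (2, 0, 0), (2, 0, 0)]   # hypothetical
arm = [(0, 0, 0), (0, 0, 0), (1, 0, 0), (2, 0, 0)]   # hypothetical
print(classify_frames(pen, arm))
```

Dividing each label's count by the total number of moving frames yields the proportions plotted in Figure 3-16.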
[Stacked bar chart of movement proportions per group. Combined wrist, fingers and forearm: Mouse 36%, Pen1 67%, Pen2 72%, Pen3 72%. Wrist and fingers only: Mouse 48%, Pen1 29%, Pen2 24%, Pen3 24%. Remainder: forearm only.]
Figure 3-16. Proportion of movements greater than 0.25 mm per frame (velocity greater than 7.5 mm/s); Euclidean distance in 3-D space. (Pen1-TabletExperts, Pen2-ConventionalExperts, Pen3-ConventionalNovices)
Pen Tapping Stability
When selecting small objects such as handles and buttons, most GUIs (like Windows Vista used in our study) deem a selection successful only when both pen down and pen up events occur within the target region of the widget (Ahlstroem, Alexandrowicz, & Hitz, 2006). Our observations suggest that this is a problem when using the pen. To disambiguate single clicks from drags, we only included a pen down and pen up event pair if the pen up event occurred within 200 ms of the pen down. Using this threshold, we found that the mean distance between pen down and up events was 1.3 to 1.8 pixels (Figure 3-17). This is similar to results reported by Ren and Moriya (2000) and discussed by Zuberec (2000, sec. 5.12). Although a relatively small movement, this can make a difference when selecting near the edge of objects. In contrast, the mean distance between mouse down and up events was only 0.2 pixels.
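The tap stability measure described above can be sketched as follows. This is an illustration, not the study's log-processing code; the 200 ms threshold is from the text, but the event stream and coordinates are hypothetical.

```python
# Sketch of the tap-stability measure: pair each pen-down with the following
# pen-up, keep only pairs separated by less than 200 ms (to exclude drags),
# and measure the pixel distance between the two events.
# The event stream below is hypothetical.
import math

def tap_distances(events, max_ms=200):
    """events: list of (time_ms, kind, (x, y)) with kind 'down' or 'up'."""
    distances = []
    pending = None                  # most recent unmatched pen-down
    for t, kind, pos in events:
        if kind == "down":
            pending = (t, pos)
        elif kind == "up" and pending is not None:
            t0, p0 = pending
            if t - t0 < max_ms:     # quick down/up pair: a tap, not a drag
                distances.append(math.dist(p0, pos))
            pending = None
    return distances

events = [
    (0,   "down", (100, 100)), (80,  "up", (101, 101)),   # tap, tip drifted
    (500, "down", (200, 200)), (900, "up", (260, 200)),   # drag: excluded
]
d = tap_distances(events)
print([round(x, 2) for x in d])   # [1.41]
```

Averaging these distances per group gives the means plotted in Figure 3-17, where even a 1 to 2 pixel drift can push a tap off the edge of a small target.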
[Bar chart of mean down/up distance: Mouse 0.2, Pen1 1.3, Pen2 1.6, Pen3 1.8 pixels]
Figure 3-17. Mean Euclidean distance between down and up click (for down and up click pairs separated by less than 200 ms). (Pen1-TabletExperts, Pen2-ConventionalExperts, Pen3-ConventionalNovices)
We observed that after participants recognized that they were suffering from frequent selection errors, they began to visually confirm the physical pen position by watching the small dot cursor. One participant commented:

“It is a little finicky, eh. It's like I'm looking for the cursor instead of just assuming that I'm clicking on the right place, so it is a little bit slower than normal.” (P9-Pen1-TabletExperts 27:39)

Participants also encountered problems with visual parallax, which made selecting targets difficult at the extreme top of the tablet, supporting the arguments of Ward and Phillips (1987) and Ramos et al. (2007):

“It seems that when I get to the a ... um ... portions like the menu, ... the dot doesn't seem to be in place of exactly where I'm pressing it. It seems to be rocking a little bit to the right where I need to be more precise.” (P16-Pen3-ConventionalNovices 19:59)

One of the purported global pen input improvements introduced with Windows Vista is visual “ripple” feedback to confirm all single and double taps. However, not one participant commented on the ripple feedback or appeared to use it as a visual confirmation to identify missed clicks. In fact, this additional visual feedback seems only to add to the general “visual din” that seems more prevalent with the pen compared to mouse input.
Figure 3-18. Obtrusive tooltip hover visualizations, “hover junk” (P15-Pen3-ConventionalNovices, task 11). See also Video 3-2.
Video 3-2. Obtrusive tooltip hover visualization examples. (Vogel_Daniel_J_201006_PhD_video_3_2.mp4, 00:25)
Pen users seemed to be more adversely affected by tooltips and other information triggered by hover: 20% of pen hesitation errors occurred when tooltips were popping up near a desired target. When compensating for digitizer tracking errors by watching the cursor, participants tended to hover more often before making a selection, which in turn triggered tooltips more often. At times the area below the participant's pen tip seemed to become filled with this type of obtrusive hover visualization, which we nicknamed “hover junk” (Figure 3-18).

“Why doesn't this go away?” [referring to a tooltip blocking the favourites tree-view] (P13-Pen3-ConventionalNovices, 33:05)

When there were many tooltip hover visualizations, participants appeared to limit their target selection to the area outside the tooltip, which decreased the effective target size. We did not see this behaviour with the mouse. The nature of direct interaction seems to change how people perceive the depth ordering of device pointer, target, and tooltips. With the pen, the stacking order from the topmost object down becomes pointer → tooltip → target, compared to tooltip → pointer → target with mouse users. This is supported by comments from Hinckley et al. regarding the design of InkSeine (2007). They tried using tooltips to explain gestures, but found they were revealed too late, could be mistaken for buttons, and could be occluded altogether by the user's hand.
Tablet Movement
Previous work has investigated user preference for setting the input frame-of-reference with the non-dominant hand (Fitzmaurice et al., 1999). However, unlike Fitzmaurice et al., whose participants used an indirect pen tablet on a desk, our participants held the Tablet PC on their lap. We expected that the lap would create a more malleable support surface (perhaps the legs would assist in orienting and tilting the tablet), so there may be more tablet movement. Using the same 0.25 mm dead band threshold as before, we calculated the mean amount of device movement per minute (Figure 3-19). Not surprisingly, we saw almost no movement in the mouse condition in spite of participants' ability to adjust the display angle or move the entire laptop on the desk. The amount of tablet movement in the pen condition was also small compared to the amount of movement of the pen itself (Figure 3-15): less than 7%. There was no significant difference between pen groups due to high variance: some participants re-positioned the tablet more than others.
[Bar chart of tablet or laptop movement per minute: Mouse near 0.0; pen groups between 0.1 and 0.3 m/min]
Figure 3-19. Tablet or laptop movement per minute for all constrained tasks. (Pen1-TabletExperts, Pen2-ConventionalExperts, Pen3-ConventionalNovices)
Posture
Related to movements are the types of postures that participants adopted to rest between remote target interactions and to avoid occlusion.
Home Position
We observed that at the beginning and end of a sequence of interactions, pen participants tended to rest their hand near the lower right of the tablet (recall that all of our participants were right-handed). A heat map plot of rest positions for forearm and pen
illustrates this observation (Figure 3-20a). Pen rest positions approximate the distribution of clicks and taps (see also Figure 3-13). Forearm rest positions are concentrated near the bottom right of the tablet and do not follow the same distribution as the pen. In contrast, the mouse distributions are more compact, with peaks near their centre of mass, suggesting a typical rest point in the centre of the interaction space (Figure 3-20b). The greater spread follows from the greater overall movement with the pen (Figure 3-15).
[Heat maps omitted: pen/mouse and forearm rest positions for (a) Tablet PC (185 × 247 mm display) and (b) mouse (general movement area on the desk); colour scale 0.0–1.0, each cell 8 × 8 mm.]
Figure 3-20. Heat map plot of forearm and pen/mouse rest positions. Generated across all participants for (a) Tablet PC participants; (b) mouse participants. Rest positions are defined as movement less than 0.25 mm per frame (velocity less than 7.5 mm/s). Tablet positions are relative to the plane of the display; mouse positions are relative to the plane of the laptop keyboard. The dashed areas represent the tablet display and the general mouse movement area on the desk, where all pen or mouse positions lie.
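The heat map binning just described can be sketched as follows. The cell size and rest threshold follow Figure 3-20; the per-frame (x, y) sample format and the normalization to the peak cell are our assumptions about how such a plot would be generated.

```python
import math
from collections import Counter

CELL_MM = 8.0             # heat map cell size (8 x 8 mm)
REST_MM_PER_FRAME = 0.25  # per-frame movement below this counts as resting

def rest_heatmap(samples):
    """Count resting frames per cell, normalized so the peak cell is 1.0."""
    counts = Counter()
    x0, y0 = samples[0]
    for x, y in samples[1:]:
        if math.hypot(x - x0, y - y0) < REST_MM_PER_FRAME:
            # bin the resting sample into its 8 mm grid cell
            counts[(int(x // CELL_MM), int(y // CELL_MM))] += 1
        x0, y0 = x, y
    peak = max(counts.values(), default=1)
    return {cell: n / peak for cell, n in counts.items()}
```

Normalizing to the peak cell gives the 0.0–1.0 colour scale shown in the figure regardless of recording length.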
Occlusion Contortion
We observed all pen users adopting a “hook posture” during tasks in which they had to adjust a parameter while simultaneously monitoring its effect on another object, such as adjusting HSL colour values in task 17 or image brightness and contrast levels in task 25 (Figure 3-21, Video 3-3). The shape of the hook almost always arched away from the participant, although participant 12 began with an inverted hook and then switched to the typical style midway. We could also see this hook posture when participants were selecting
text, although it was more subtle. This type of posture was also observed by Inkpen et al. (2006) when left-handed users were accessing a right-handed scrollbar. Adopting such a hook posture may reduce accuracy and increase fatigue: since it forces the user to lift their wrist above the display, it reduces pen stability. One participant commented that they found it tiring to keep the image unobstructed when adjusting the brightness and contrast. In task 25, where the hook posture was most obvious, participants had the option to move the image adjustment window below the image, which would have eliminated the need for contortion. However, we observed only one pen participant do this (P5-Pen1-TabletExperts), which suggests that the overhead of manually adjusting a dialog position to eliminate occlusion may be considered too high. Yet previous work found that adjusting the location and orientation of paper to reduce occlusion is instinctive (Fitzmaurice et al., 1999).
Figure 3-21. Examples of occlusion contortion: the “hook posture.” See also Video 3-3.
Video 3-3. Occlusion contortion examples: the “hook posture.” (Vogel_Daniel_J_201006_PhD_video_3_3.mp4, 00:43)
3.5 Interactions of Interest
Based on our analysis of errors, movement, and posture we identified widgets and actions for more detailed analysis. We selected widgets and actions that are used frequently (button), highly error prone (scrollbar), presented interesting movements (text selection), or highlighted differences between the pen and mouse (drawing, handwriting, the Office MiniBar, and keyboard use).
Button
The single, isolated button is one of the simplest widgets and also the most ubiquitous. We expected participants to use a button (not including buttons that were part of other larger widgets) 52 times during our study, which constitutes 21% of all widget occurrences. Although simple, we found an interaction error rate of 16% for pen participants compared to less than 1.5% for mouse participants (using the expected number of button interactions as a normalizing factor, Table 3-1). 55% of these errors were target selection errors, 17% missed clicks, 11% hesitations, and 6% repeated invocations. We already discussed problems with target selection with pen taps above, so in this section we concentrate on other errors and distinctive usage patterns. Repeated invocation errors occurred when the user missed the visual feedback of the button press and pressed it a second time. This was most evident when the button was small and the resulting action delayed or subtle. There were three commonly occurring cases in our scenario: opening an application using the quick launch bar in the lower left, pressing the save file button in the upper left, and closing an application by pressing the “x” at the top right. Participants did not appear to see, or perhaps did not trust, the standard button invocation feedback, or the visual feedback for all taps introduced with Windows Vista. Depending on the timing of the second press, the application could simply ignore the second click, save the file a second time, or introduce more substantial errors like opening a second application. Missing a click on a button could result in more severe mistakes. When saving a file, the confirmation of a successful save was conveyed by a message and progress bar located in the bottom right. Since the save button is located at the top left, this meant that the
participant’s arm was almost always blocking this confirmation, making it easy to miss (Figure 3-22). We observed 3 participants who thought they had saved their file when a missed click or target selection error prevented the save action.
Figure 3-22. Example of occluded status message when pressing save button. P5, Pen1-TabletExperts, task 20, 18:40
Sometimes the location of buttons (and other widgets) prevented participants from realizing that the current application state did not match their understanding. For example, in task 6, we saw 4 pen users go through the steps of selecting the font size and style for a text-box when, in fact, they did not have the text-box correctly selected, and missed applying these changes to some or all of the text. We did not see this with any of the mouse users. Since these controls are most often accessed in the menu at the top of the display, this is likely because the formatting preview was occluded by the arm. After making this mistake several times, one participant asked: “How do I know if that's bold? Like I keep pressing the bold button.” (P16, Pen3-ConventionalNovices, 18:27) Although the bold button was visually indicating the bold state, he failed to realize the text he wished to make bold was not selected. While reviewing the logs of button presses, we could occasionally see a distinctive motion which interrupted a smooth trajectory towards the target, creating a hesitation error. As the user moved the pen towards the button, they would sometimes deviate away to re-establish visual confirmation of the location, and then complete the movement (Figure 3-23, point 1). In some cases this was a subtle deviation; other times it could be quite pronounced, as the pen actually circled back before continuing. We saw this happen most
often when the button was located in the upper left corner, and the deviation was most apparent with our novice group.
[Figure omitted: pen tip 3D path and shadow, with path start/end markers and numbered points (1)–(3).]
Figure 3-23. Button trajectory example when selecting the save button in the upper left corner. (1) movement deviation to reduce occlusion and sight the target; (2) corrective movements near the target; (3) return to home position. (P15, Pen3-ConventionalNovices, task 26). See also Video 3-4.
Video 3-4. Button trajectory example. (Vogel_Daniel_J_201006_PhD_video_3_4.mp4, 00:09)
Scrollbar
In our study, we found that pen participants made an average of 10 errors while using a scrollbar. With 20 expected scrollbar interactions, our estimated scrollbar error rate is 50% (Table 3-2). However, we suspected this error rate is inflated due to participants using scrollbars more often than expected (e.g., repeated invocation due to previous task errors), and participants using an inefficient and error-prone scrollbar usage strategy (e.g., multiple paging taps instead of a single drag interaction). For a more detailed examination, we coded scrolling interactions in tasks 2, 6, 21, 23, 25, and 27. According to our script, we expected 15 scrolling interactions within these six tasks, breaking down into four types: four when using drop-downs, four during web browsing, two during file browsing, and five while paging slides. We found an average of 14.8 scrolling interactions (SD 0.6), which suggests that, at least in these tasks, our estimated number of scrollbar interactions was reasonable. All pen participants used different scrollbar interaction strategies, except for two Pen2-ConventionalExperts participants: one of these participants always clicked the paging region, and the other always dragged the scroll box (see Figure 3-24 for scrollbar parts). When participants changed their strategy, they often clicked in the paging region for short scrolls and dragged the scroll box for long scrolls, but this was not always the case. We observed only four cases where participants used the scrollbar button: one participant used it to increment down, and three participants held it down to scroll continuously as part of a mixed strategy. Overall, we counted 91 occurrences of dragging and 54 occurrences of paging.
Figure 3-24. Scrollbar parts.
There were 17 occurrences of pen participants using a mixed scrolling strategy – where combinations of scrollbar manipulation techniques were used together for one scroll. Six participants used such a mixed strategy two or more times, and all did so exclusively for long scrolls in drop-downs or web browsing. Most often a scroll box drag was followed by one or more paging region clicks, or vice versa. Regarding errors, we found two patterns. First, for scrollbar strategy, we found error rates of 16% for paging, 9% for dragging, and 44% for mixed strategies (the rate of strategy occurrences with at least one error). A mixed strategy often caused multiple errors, with an average of 1.6 errors per occurrence; pure paging and dragging strategies had 0.5 and 0.2 errors per occurrence. Participants often moved to a mixed strategy after an error occurred – for example, if repeated paging was inefficient or resulted in errors, they would switch to dragging. Many errors were target selection related, suggesting that the repetitive nature of clicking in the paging region creates more opportunities for error. Second, regarding errors and location, we found that 77% of scrollbar errors occurred in the 100 right-most pixels of the display (i.e. when the scrollbar was located at the extreme right), but only 61% of scrollbar interactions were in the 100 right-most pixels. Although not dramatic, this pattern is in agreement with our observation of error locations (Figure 3-13). We also noted a characteristic trajectory when acquiring a scrollbar with the pen, which we call the “ramp.” When acquiring a scrollbar to the right of the hand, we observed several users moving down, to the right, and then up to acquire the scroll box (Figure 3-26, Video 3-5). Based on the user view video log, we could see that much of the scrollbar was occluded, and that this movement pattern was necessary to reveal the target before acquiring it (Figure 3-25).
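The two scrollbar error measures used above – the share of strategy occurrences with at least one error, and the mean errors per occurrence – can be sketched as follows. The record format, a hypothetical (strategy, error count) pair per scroll occurrence, is our assumption.

```python
def strategy_stats(occurrences):
    """Per strategy: share of occurrences with >= 1 error, and mean errors."""
    totals = {}
    for strategy, errors in occurrences:
        n, with_err, err_sum = totals.get(strategy, (0, 0, 0))
        totals[strategy] = (n + 1, with_err + (errors > 0), err_sum + errors)
    return {
        strategy: {
            "error_rate": with_err / n,            # occurrences with an error
            "errors_per_occurrence": err_sum / n,  # mean error count
        }
        for strategy, (n, with_err, err_sum) in totals.items()
    }
```

Keeping the two measures separate matters here: a strategy can have a moderate error rate yet a high error count per occurrence, as the mixed strategy did.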
Figure 3-25. Example of scrollbar occlusion causing “ramp” movement. (a) the scrollbar is initially occluded; (b) hand moves beyond scrollbar; (c) hand moves back and up to select scrollbar. P16, Pen3-ConventionalNovices, task 21
[Figure omitted: pen tip 3D paths and shadows for panels (a), (b), and (c), with path start/end markers and numbered points.]
Figure 3-26. Pen tip trajectories during scrollbar interaction. (a) short drag scroll, centre of display, P16, Pen3-ConventionalNovices, task 21; (b) long drag scroll, edge of display, P7, Pen1-TabletExperts, task 21; (c) paging scroll, P9, Pen2-ConventionalExperts, task 23. In each case, (1) denotes the characteristic “ramp” movement and (2) denotes a repetitive paging motion segment. See also Video 3-5.
Video 3-5. Scrollbar trajectory examples. (Vogel_Daniel_J_201006_PhD_video_3_5.mp4, 00:21)
The mouse scroll-wheel can act as a hardware alternative to the scrollbar widget, and we observed three out of four mouse participants use the scroll-wheel for short scrolls. These three participants never clicked to page up or down during short scrolls. All mouse participants used the scroll-wheel at least once for longer scrolls, but we observed each of them also abandoning it at least once – and continuing the scroll interaction by dragging the scroll box (see Figure 3-24 for scrollbar parts). The scroll-wheel does not always appear to have an advantage over the scrollbar widget, corroborating evidence from Zhai, Smith, and Selker (1997). However, this may be due to the scrolling distance and the standard scroll-wheel acceleration function (Hinckley, Cutrell, Bathiche, & Muss, 2002). In fact, all of our mouse participants encountered one or more errors with the scroll-wheel, but there were only two mouse errors with the scrollbar widget.
Text Selection
There are 6 expected text selections in the script, in tasks 12, 13, 21, and 23: three involve selecting a sentence in a web page and three involve selecting a bullet. We coded all text selections performed by participants in these tasks and found a mean of 6.3 (SD 0.6). The slightly higher number is due to errors requiring participants to repeat the selection action. We found high error rates for text selection with the pen: at 40%, this is three times the mouse error rate of 13%. Text selection errors were either target selection errors or unintended actions, such as accidentally triggering a hyperlink. Most of these errors seem to be related to the style of movement. An immediately obvious characteristic of text selection is the direction of movement, from right-to-left or left-to-right. Across pen participants, we found 43 left-to-right selections and 36 right-to-left (Figure 3-27). Given that our participants were right-handed, a right-to-left selection should in theory have an advantage, since the end target is not occluded by the
hand. Instead, we found that all of our expert pen users performed left-to-right selection 2 or more times, with one expert participant (P6) only selecting left-to-right. Five pen participants exclusively performed left-to-right text selections, while three exclusively used right-to-left selections. Surprisingly, the latter were all in the Pen2-ConventionalExperts group, not the Pen1-TabletExperts group as one might expect. The persistence of left-to-right motion in spite of occlusion is likely due to the reading direction in Western languages, which creates a habitual left-to-right selection direction. Indeed, we found that mouse participants most often used a left-to-right direction, with two participants doing this exclusively. However, even mouse users performed the occasional right-to-left selection, suggesting that there are cases when this is more advantageous even in the absence of occlusion. One participant stated: “People write left to right, not right to left so my hand covers up where they're going." (P14-Pen3-ConventionalNovices, 38:49)
Figure 3-27. Proportion of left-to-right and right-to-left text selection directions. Pen1-TabletExperts, Pen2-ConventionalExperts, Pen3-ConventionalNovices
We observed three characteristic pen trajectory patterns which suggest problems with left-to-right selection and occlusion. Note that we did not code for these patterns to enable a quantitative analysis14; instead, we offer a description of what we feel are characteristic examples of these behaviours.
14 Our observations of these behaviours emerged after the coding process was complete.
We observed expert pen users intentionally moving the pen well beyond the bounds of the desired text during a left-to-right selection movement (Figure 3-28c). The most likely reason for this deviation is that it moved their hand out of the way so that they could see the end text location target. A related movement is backtracking (Figure 3-28b), which more often occurred with novice participants. Here, the selection overshoots the end target and backtracks; this appears to be more accidental, but may be the behaviour that leads to the intentional deviation movement we saw with expert users. Another, sometimes more subtle, behaviour is a “glimpse”: a quick wrist roll downwards to momentarily reveal the occluded area above (Figure 3-28a). We also noted a characteristic trajectory when participants invoked the context menu for copying with a right-click. We observed many pen participants routinely introducing an extra movement to invoke it near the centre of the selection, rather than in the immediate area (Figure 3-28d). Since the right-click has to be invoked on the selection itself, this may be to minimize the possibility of missing the selection when opening the context menu. However, this extra movement was most often observed with right-to-left selection. This may be a symptom of participants needing to move their hand to visually verify the selection before copying.
[Figure omitted: forearm and pen tip 3D paths and shadows for panels (a)–(d), with path start/end markers and numbered points (1)–(4).]
Figure 3-28. Pen tip (and selected wrist) trajectories during text selection. (a) left-to-right selection with forearm glimpse at (1), P15, Pen3-ConventionalNovices, task 12; (b) left-to-right selection with backtrack at (2), P9, Pen2-ConventionalExperts, task 23; (c) left-to-right selection with deviation at (3), P6, Pen1-TabletExperts, task 21; (d) right-to-left selection with central right-click invocation at (4), P11, Pen2-ConventionalExperts, task 23. See also Video 3-6.
Video 3-6. Text selection trajectory examples. (Vogel_Daniel_J_201006_PhD_video_3_6.mp4, 00:38)
Common errors with text selection were small target selection errors, such as missing the first character, clicking instead of dragging and triggering a hyperlink, or an unintended change of the selection when releasing. While the first two are related to precision with the pen, the latter is a symptom of stability. As the pen is lifted from the display, a small movement causes the caret to shift slightly: this can be as subtle as dropping the final character or, if it moves down, selecting a whole additional line. We noticed this happening often when releasing the handle, another case of precise dragging. One participant commented: "When I'm selecting text I'm accidentally going to the next line when I'm lifting up" (P7-Pen1-TabletExperts, 16:40)
Writing and Drawing
Although we avoided formal text entry and handwriting recognition, we did include a small sample of freehand handwriting and drawing. In theory, these are tasks to which the pen is better suited. Tasks 39 and 41 asked participants to make ink annotations on an Excel chart (see Figure 3-3e). In task 39, they traced one of the chart lines using the yellow highlighter as a very simple drawing exercise. In task 41, they wrote “effects of fur trade” and drew two arrows pointing to low points on the highlighted line. In the post study interview, many pen participants said that drawing and writing were the easiest tasks. After finishing tasks 39 and 41, one participant commented: "You know this is the part that is so fun to work with, you know, using a tablet, but all the previous things are so painful to use. I mean, it's just like a mixture of things ..." (P8-Pen1-TabletExperts 38:01)
Handwriting
We expected to see a large difference in the quality of mouse and pen writing but, aside from pen writing appearing smaller and smoother, a visual comparison suggests this is not the case (Figure 3-29). We did see some indication of a difference in mean times, with pen and mouse participants taking an average of 27.3 and 47.3 seconds respectively (SD 3.5 and 26.2). In terms of style, all mouse handwriting has a horizontal baseline, whereas four of the pen participants wrote on an angle. This supports Fitzmaurice et al.’s (1999) work on workspace orientation with pen input.
Figure 3-29. Handwriting examples. (a) mouse and (b) pen. (approx. 70% actual size)
Tracing
When comparing participants’ highlighter paths in task 39, we could see little difference (Figure 3-30). Pen tracing appears slightly smoother, but not necessarily more accurate. There also appears to be no noticeable difference in task time, with pen and mouse participants taking an average of 15.8 and 13.5 seconds respectively (SD 6.4 and 2.0). Half the mouse participants traced from right-to-left, as opposed to left-to-right, yet only 3 out of 12 pen participants traced from right-to-left. As explained above, with the pen, tracing right-to-left has a distinct advantage for a right-handed person since it minimizes pen
occlusion. Across all participants, all except one (a Pen2-ConventionalExperts participant) traced the entire line with one drag motion.
Figure 3-30. Tracing examples. (a) Mouse; (b) Tablet PC, discontinuous points highlighted. (approx. 70% actual size)
Office MiniBar
PowerPoint has an alternate method of selecting frequently used formatting options: a floating tool palette called the MiniBar, which appears when selecting an object that can be formatted, like a text-box (Harris, 2005). It is invoked by moving the pointer towards an initially “ghosted” version of the MiniBar; moving the pointer away makes it disappear. The behaviour has some similarities to Forlines et al.’s Trailing Widget (2006), except that the MiniBar remains at a fixed position. In theory, the MiniBar should be well suited to pen input since it reduces the need to reach. However, in practice it was difficult for some pen users to invoke. The more erratic movements of the pen often resulted in its almost immediate disappearance, preventing several participants from even noticing it and making it difficult for others to understand how to reliably invoke it. We observed one of our expert Tablet PC users try to use the MiniBar more than five times before finally giving up and returning to the conventional application toolbar.
Figure 3-31. Occlusion resulting from MiniBar floating palette. The text-box preview is occluded by the hand when using the MiniBar. (P12-Pen2-ConventionalExperts, task 6)
The other problem is that the location of the MiniBar is such that when using it, the object receiving the formatting is almost always occluded by the hand (Figure 3-31). We observed participants select multiple formatting options without realizing that the destination text was not selected properly: hand occlusion prevented them from noticing that the text formatting was not changing during the operation. A lesson here is that as widgets become more specialized they may not be suitable for all input devices, at least without some parameter tuning.
Keyboard Usage
Although we gave no direct instructions regarding keyboard usage to the mouse group, all participants automatically reached for the keyboard for common keyboard shortcuts like ctrl-Z, ctrl-C, ctrl-V, and ctrl-TAB, and often to enter numerical quantities. In task 6, two mouse participants (P1 and P3) accelerated their drop-down selection by typing the first character. However, they each did this only a single time, in spite of this task requiring them to access the same choice, in the same drop-down, four times. Yet we saw that keyboard use can also lead to errors. For example, P1 accidentally hit a function key instead of closing a dialog with the Esc key – this abruptly opened a large help window and broke the rhythm of the task as they paused to understand what had happened before closing it and continuing. Three pen participants explicitly commented on the lack of accelerator keys when using the pen, with comments like: "Where's CTRL-Z?" (while making a key press action with the left hand), then again later, "I can't tell you how much I wish I could use a keyboard..." (P9-Pen2-ConventionalExperts, 24:43 and 29:50)
However, not one pen participant commented on what the Tablet PC hardware keys were for, or whether they could use them. Yet we suspect they were conscious of their existence, since only one participant pressed one of these keys by accident.
3.6 Discussion
The goal of our study was to observe direct pen input in a realistic GUI task involving representative widgets and actions. In the analysis above, we presented findings for various aspects: time, errors, movements, posture, and visualization; as well as an examination of specific widgets and actions. While we did not find a significant difference in overall completion times between mouse users and experienced Tablet PC users, this null result does not mean that direct pen input is equivalent to mouse input (especially considering increased variance due to differing participant levels, task completion strategies, and inclusion of think-aloud comment times). Yet we found that pen participants made more errors, performed inefficient movements, and expressed frustration. Moreover, widget error rates had a different relative ordering between mouse and pen: the highest number of pen errors occurred with the scrollbar, and the highest number of mouse errors with text selection. The top error contexts for the pen range from complex widgets like scrollbars, drop-downs, and tree-views to simple ones like buttons and handles.
Overarching Problems with Direct Pen Input
When examined as a whole, our quantitative and qualitative observations reveal overarching problems with direct pen input: poor precision when pointing or tapping; problems caused by hand occlusion; instability and fatigue due to ergonomics; cognitive differences between pen and mouse usage; and frustration due to limited input capabilities. We believe these to be the primary causes of non-text errors, and that they contribute to user frustration when using a pen with a conventional GUI.
Precision
Selecting objects by tapping the pen tip on the display serves the same purpose as pushing a button on a mouse, but the two actions are radically different. The most obvious difference is that tapping allows only one type of “click”, unlike pressing different buttons on a
mouse. To get around this issue, right-click and single-click are disambiguated with a time delay, overloading the tapping action to represent more than one action. Although participants did better than we expected, we found that the pen group was not always able to invoke a right-click reliably, and either unintentionally single-clicked or simply missed the click. A related problem occurred with drag-able widgets like scrollbars and sliders: when performing a slow, precise drag, users could unintentionally invoke a right-click. We found these problems affected expert pen users as well as novices. The second difference between mouse and pen selection may not be as immediately obvious: tapping with the pen simultaneously specifies the selection action and position, unlike clicking with the mouse, where the button press and mouse movement are designed such that they do not interfere with each other. The higher number of target selection errors with the pen compared to the mouse suggests that this extra coordination is a problem. Our findings also reveal subtle selection and pointing coordination issues: unintended action errors due to movement when releasing a drag-able widget, such as the handle, were non-existent with the mouse, but affected 10 out of 12 pen participants; on average, the distance between pen-down and pen-up events was 6 to 9 times greater than with the mouse; and there were problems with pen double-clicks, either missed altogether or interpreted as two single clicks. We also found problems with missed taps when the tapping motion was too hard or too soft. This could be an issue with hardware sensitivity but, given our other observations, it may also be a factor of the tapping motion. We found that some participants did not notice when they missed a click, leading to potentially serious errors such as not saving a document.
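The time-delay disambiguation described above can be sketched as follows. This is a minimal illustration with made-up thresholds, not the actual values used by any shipping system.

```python
# Illustrative thresholds (assumptions, not real system values):
HOLD_DELAY_S = 0.8   # press held at least this long => right-click
MOVE_TOL_MM = 2.0    # movement beyond this tolerance => drag, not a click

def classify(pen_down_s, pen_up_s, distance_mm):
    """Classify one pen-down/pen-up pair as a click, right-click, or drag."""
    if distance_mm > MOVE_TOL_MM:
        return "drag"
    held = (pen_up_s - pen_down_s) >= HOLD_DELAY_S
    return "right-click" if held else "left-click"
```

In a real system the hold timer runs while the pen is still down, which is exactly why the slow, precise drags observed in the study could be misread as right-clicks: the press stays within the movement tolerance long enough for the timer to fire.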
The tactile feedback from tapping the pen tip does not seem to be enough, especially when compared to the sensation of pressing and releasing the micro switch on a mouse button. Although Windows Vista displays a “ripple” feedback for clicks, no participants seemed to make use of this. Surprisingly, we did not observe a large difference in the quality of pen writing and tracing compared to mouse input: pen handwriting appeared smaller and smoother, and pen tracing appeared slightly smoother. In the same way that hardware sensitivity is likely contributing to the number of missed clicks, other hardware problems such as lag and parallax (J. Ward & M. Phillips, 1987) also affect performance. When using a pen, any lag or parallax seems to have an amplified effect
since the visual feedback and input are coincident. When users become aware of these hardware problems, they begin to focus on the cursor, rather than trusting the physical position of the pen. This may reduce errors, but the time taken to process visual feedback will hurt performance.
Occlusion
Occlusion from the pen tip, hand, or forearm can make it difficult to locate a target, verify the success of an action, or monitor real-time feedback, which may lead to errors, inefficient movements, and fatigue. We observed participants missing status updates and visual feedback because of occlusion. This can lead to serious errors, such as a user assuming they successfully pressed the save button when they had not – hand occlusion of the “file is being saved” message at the bottom of the display prevented verification. Other frustrating moments occurred when users assumed the system was in a certain state when it was not. For example, we saw more than one case of a wasted interaction in the top portion of the display to adjust formatting because the object to be formatted had been unintentionally de-selected. This occurred in spite of the application previewing the formatting on the object itself. Unfortunately, when the destination object is occluded and the user assumes the system is in a desired state (the object is selected), the previews do not help and the error is not prevented. To reduce the effect of occlusion, we observed users adopting unnatural postures and making inefficient movements. For example, when adjusting a parameter that simultaneously requires visual feedback, we noticed participants changing the posture of their hand rather than adjusting the location of the parameter window. This posture, which we call the “hook,” did enable them to monitor an area of the display that would otherwise be occluded, but unfortunately this type of posture is neither stable nor comfortable. We also found that occlusion can lead to inefficient movements such as glimpses, backtracks, and ramps. Glimpses and backtracks tend to occur during dragging operations.
Since dragging uses a kinaesthetic quasi-mode (Sellen et al., 1992) with the pen pressed against the display, the user cannot momentarily lift their hand to perform a quick visual search for an occluded target in mid-drag. To work around this limitation, we observed expert users intentionally deviating from the known selection area while drag selecting, to visually
acquire the target and complete the selection. We call this a glimpse. We also observed novice users backtracking after they accidentally passed the intended target and had to move back again – in some ways an unintentional glimpse. Our tendency to drag and select from left-to-right, matching how text is read in Western languages, seems to make glimpse and backtrack movements more common. Note that this does not only occur when selecting text: 9 out of 12 pen participants chose to trace lines from left-to-right, in spite of commenting that occlusion made this more difficult. The ramp is a characteristic movement which adjusts the movement path to reveal a greater portion of the intended target area. When the hand is in mid movement, it can occlude the general visual search area and require a deviation to visually scan a larger space. We observed ramp movements most often when locating the scrollbar widget on the extreme right side of the display – the pen moves down as it moves to the right, maximizing the non-occluded portion of the scrollbar and increasing the chance that the target is visible. We also saw ramp movements, sometimes with helical paths, when moving to other targets, most often when the target was located at the upper left. Finally, pen users tend to adopt a consistent home position that provides an overview of the display when they are not currently interacting. Participants would return their hand to this position between tasks and even after command phrases, such as waiting for a dialog to appear after pressing a toolbar button. For right-handed pen users, the home position is near the lower right corner, just beyond the display.
Ergonomics
Although the display of a Tablet PC is relatively small, there still appear to be ergonomic issues when reaching targets near the top or at extreme edges. We found that pen movements covered a greater distance with more limb coordination compared to the mouse. Not only can this lead to more repetitive strain disorders and fatigue, but studies have shown that coordinated limb movements lead to decreased performance and accuracy (Balakrishnan & I. S. MacKenzie, 1997). In support of this, we found a different distribution of target selection error rate compared to the location of all taps/clicks: more errors seem to occur in the mid-upper-left and on the right side. However, as we discuss above, there may be an influence of target size which we did not control for.
Possible explanations for the extra distance covered by the pen compared to the mouse include deviating movements made to reduce occlusion, the pen tip moving more frequently in three dimensions to produce taps, and arcs above the display when travelling between distant targets. However, the main contributing factors are most likely the unavailability of any control-display gain manipulation with the pen, since it is a direct input device, and the tendency for pen users to return to a home position between tasks. By frequently returning to this home position, there are more round trip movements compared to mouse participants, who simply rest at the location of the last interaction. Although the home position allows an unoccluded overview of the display, it may also serve to rest the arm muscles to avoid fatigue and eliminate spurious errors that could occur if a relaxed hand accidentally rests the pen tip on the display.

Another issue with reach may be the size and weight of the tablet. Not surprisingly, we found that tablet users moved the device more than mouse users, but they moved it less than 7% of the distance moved by the pen (in spite of the tablet resting on their lap, which we expected would make it easier to move and tilt using the legs and body). Further support can be seen in the characteristic slant of some tablet participants’ written text – these people elected to write in a direction that was most comfortable for their hand, regardless of the position of the tablet. This suggests that the pen is more often moved to the location on the display, rather than the non-dominant hand bringing the display to the pen to set the context. Note that the latter has been shown to be a more natural movement with pen and paper or an indirect tablet on a desk (Fitzmaurice et al., 1999). Our speculation is that the problem may be due to the size and weight of the device.
Cognitive Differences
Cognitive differences between the pen and mouse are difficult to measure, but our observations suggest high level trends: pen users prefer to single-click instead of double-click, and hover visualizations appear more distracting. These may reveal a difference in the conceptual model of the GUI when using a pen compared to a mouse. Pen users preferred to single click, even for objects which are conventionally activated by a double-click, such as file and folder objects. The difficulty of double-clicking also leads to errors such as accidental file rename or duplication, making this an important problem to address.
We observed differences when pen users interacted with objects which displayed tooltips and other information triggered by hover. There seemed to be more tooltips appearing and disappearing with the pen group compared to the mouse group, an effect we refer to as “hover junk.” Not only can this be visually distracting, but pen participants also seemed to more consciously avoid tooltips when selecting the underlying target compared to mouse users. It was as though pen users perceived the depth ordering of the pointer, tooltip, and target differently: the direct input nature of the pen seemed to bring the display pointer above the tooltip instead of below. Mouse users did not seem to be bothered by tooltips and simply clicked “through” them.
Limited Input
It is perhaps obvious, but the lack of a keyboard appears to be a serious handicap for pen users. The main problem of entering text with only a pen has been an active research area for decades: refinements to handwriting recognition, gesture-based text input, and soft keyboards continue. However, even though text entry was not part of our task, several pen participants noted the lack of a keyboard and even mimed pressing common keyboard shortcuts like copy (CTRL-C) and undo (CTRL-Z). We observed mouse users instinctively reaching for the keyboard to access command shortcut keys and list accelerator keys, and to enter quantitative values. Although all the tasks in our study could be completed without pressing a single key, this is not the way that users work with a GUI. Recent command-line-like trends in GUIs, such as full text search and keyboard application launchers, will further contribute to the problem.
Study Methodology
Our hybrid study methodology incorporates aspects of traditional controlled HCI experimental studies, usability studies, and qualitative research. Our motivation was to enable more diverse observations involving a variety of contexts and interactions – hopefully approaching how people might perform in real settings.
Degree of Realism
Almost any study methodology will have some effect on how participants perform. In our study we asked participants to complete a prepared set of tasks on a device we supplied, instrumented them with 3-D tracking markers and a head-mounted camera, and ran the study in our laboratory. These steps were necessary to have some control over the types of interactions they performed and to provide us with rich logging data to analyze. It is important to note that our participants were not simply going about their daily tasks as they would in a pure field study. However, given that our emphasis is on lower level widget interactions, rather than application usability or larger working contexts, we feel that we achieved an appropriate degree of realism for our purposes.
Analysis Effort and Importance of Data Logs
Synchronizing, segmenting, and annotating the logs to get multi-faceted qualitative and quantitative observations felt like an order-of-magnitude increase in effort beyond conducting a usability study or a controlled experiment. Our custom built software helped, but it did not eliminate long hours spent reviewing the actions of our participants. Qualitative observations from multiple rich observation logs are valuable, but not easy to achieve.

Using two raters proved to be very important. Training a second coder forced us to iterate and formalize our coding decision process significantly. We feel this contributed greatly to a consistent assignment of codes to events and a high level of agreement between raters. In addition, with two raters we were able to identify a greater number of events to annotate. Regardless of training and perseverance, raters will miss some events.

We found that each of the observational logging techniques gave us a different view of a particular interaction and enabled a different aspect to analyze. The combination of the pen event log, screen capture video, and head-mounted user view was invaluable for qualitative analysis. The pen event log and screen capture video are the easiest to instrument and have no impact on the participant. The head-mounted camera presents a mild intrusion, but observations regarding occlusion and missed clicks would have been very difficult to make without it. For quantitative analysis, we relied on the pen event log and the motion capture logs. Although the motion capture data enabled the analysis of participant
movements and posture, and the visualization of 3-D pen trajectories, it required the most work to instrument, capture, and process.
3.7 Summary
We have presented and discussed results from our study of direct pen interaction with realistic tasks and common software applications. Our findings reveal five overarching issues when using direct pen input with a conventional GUI: lack of precision, hand occlusion, ergonomics when reaching, cognitive differences, and limited input. We feel that these issues can be addressed by improving hardware, base interaction, and widget behaviour without sacrificing the consistency of current GUIs and applications. Moreover, previous research has focused on issues other than occlusion, yet our results suggest that occlusion also has a profound effect on the usability of direct pen input.
Improving Direct Pen Input with Conventional GUIs
Ideally, addressing the overarching issues identified above should be done without radical changes to the fundamental behaviour and layout of the conventional GUI and applications. This would enable a consistent user experience regardless of usage context – for example, when a Tablet PC user switches between slate mode with pen input and laptop mode with mouse input – and ease the burden on software developers in terms of design, development, testing, and support15. With this in mind, we feel improvements should be made at three levels: hardware, base interaction, and widget behaviour.

Hardware improvements which reduce parallax and lag, increase input sensitivity, and reduce the size and weight of the tablet are ongoing and will likely continue. Other improvements which increase the input bandwidth of the pen, such as pressure, tilt, and rotation sensing, may provide additional utility – but past experience with adding buttons and wheels to pens has not been encouraging. More innovative pen form factors may provide
15 Techniques which automatically generate user interface layouts specific to an input modality (Gajos & Weld, 2004) may ease the design and development burden in the future, but increased costs for testing and support will likely remain.
new directions entirely: for example, a pen-like device which operates more like a mouse as the situation requires.

Base interaction improvements target basic input such as pointing, tapping, and dragging, as well as adding enhancements to address aspects such as occlusion, reach, and limited input. Conceptually these function like a pen-specific interaction layer which sits above the standard GUI. A technique can become active with pen input, but without changing the underlying behaviour of the GUI or altering the interface layout. Windows Vista includes examples of this strategy: the ripple visualization for taps, tap-and-hold for right-clicking, and “pen flicks” for invoking common commands with gestures. However, the success of these specific techniques is dubious since we did not observe any experienced Tablet PC users using them.

The behaviour of individual widgets can also be tailored for pen input, but this should be done without altering their initial size or appearance, to maintain GUI consistency. Windows operating systems have contained a simple example of this for some time: an explicit option to cause menus to open to the left rather than the right (to reduce occlusion for right-handed users). This illustrates how a widget’s behaviour can be altered without changing its default layout – the size and appearance of an inactive menu remains unchanged.
Applications to Other Paradigms and Input Contexts
Our work emphasizes direct pen interaction with a conventional GUI. However, many of our results apply to other interaction paradigms and input contexts as well. During any type of direct manipulation (be it tapping, crossing, gestures, etc.), there are times when a target must be selected on the display. When this is necessary, then many of the same issues we identify above apply. Techniques such as crossing may eliminate the type of tapping errors we observed, and gestures will help reduce problems resulting from limited input, but issues such as occlusion are inherently part of any form of direct input.
Next Step: Base Level Techniques which Address Occlusion
After hardware improvements, which will continue to occur with engineering advancements, well-designed base interaction techniques have the greatest capability to improve direct input overall. In fact, researchers have already addressed many of the issues we identified, and in most cases their techniques appear to be compatible with base level improvements to conventional GUIs (though this aspect is not always demonstrated). Ren and Moriya’s enhanced selection techniques (2000), Accot and Zhai’s crossing paradigm (2002), Ramos et al.’s Pointing Lenses (2007), and Ren et al.’s Adaptive Hybrid Cursor (2007) are all designed to enhance precision. Grossman et al.’s Hover Widgets (2006), Ramos and Balakrishnan’s pressure-based combined command and lasso selection (2007), and Forlines et al.’s HybridPointing (2006) address limited input and ergonomic reach. Cognitive differences, such as click versus double-click for file selection, could be easily addressed.

One aspect missing from previous efforts is techniques that address occlusion. Recall that we found that occlusion from the hand and forearm makes it difficult to locate targets, verify the success of actions, and monitor real-time feedback. This often leads to errors, inefficient movements, and fatigue. Our hypothesis is that if we can make improvements that address occlusion at a base level, the usability of pen input can be improved. However, before designing new interaction techniques, a thorough understanding of occlusion is needed.
4 Investigating Occlusion
In the previous chapter, we found that occlusion likely contributed to user errors, led to fatigue, and created inefficient movements. These results are based on an observational study with realistic tasks and common GUI software applications. The next logical step is to further validate and quantify these fundamental aspects of direct pen occlusion in a controlled setting. Certainly, any designer can simply look down at their own hand while they operate a Tablet PC and attempt to take the perceived occlusion into account, but this type of ad hoc observation is unlikely to yield sound scientific findings or universal design guidelines. To study occlusion properly, we need to employ experimental methods. In this chapter we present three experiments which explore the area and shape of occlusion, how occlusion affects target selection performance, and ways in which users contort their hand posture to minimize its effect. Our first experiment, Experiment 4-1, uses a novel combination of video recording, computer vision marker tracking, and image processing techniques to capture images of the hand and arm as they appear from the point-of-view of the user. We call these images occlusion silhouettes. Analyses of these silhouettes found that the hand and arm can occlude as much as 47% of a 12 inch display and that the shape of the occluded portion of the display varied across participants according to the style and size of their pen grip. The second experiment, Experiment 4-2, examines the effect of occlusion when performing three fundamental GUI interactions: tapping, dragging, and tracing. Our results show that although it is difficult to control for occlusion within a single direct input context,
there is reasonable evidence that occlusion has an effect on these fundamental GUI interactions. The third experiment, Experiment 4-3, investigates how participants contort their hand posture to minimize occlusion while performing a simultaneous monitoring task. We found that this posture contortion reduces performance and discuss how different participants use different posture contortion strategies. Based on the results of these experiments, we propose a small set of simple guidelines for designers and researchers regarding how to avoid the occluded area.
4.1 Related Work
Few researchers have investigated occlusion directly, but many have speculated on its effect and use this as motivation for the design of interaction techniques or to explain unexpected behaviour during usability studies and experiments. In pen computing, the design of Ramos and Balakrishnan’s (2003) Twist Lens, a sinusoidal shaped slider, is partially motivated by reducing occlusion from the user’s hand; Apitz and Guimbretière’s (2004) CrossY uses predominant right-to-left movement to counteract occlusion with right-handed users; and Schilit, Golovchinsky, and Price’s pen-based XLibris ebook reader (1998) places a menu bar at the bottom of the display to avoid occlusion when navigating pages.

In touch screen and tabletop interaction, occlusion is also cited as motivation. Shen et al. (2006) discuss tabletop techniques to combat occlusion, including remote manipulation of objects and visual feedback that expands beyond the area typically occluded by a finger. Other strategies include placing the hand behind (Wigdor, Forlines, Baudisch, Barnwell, & Shen, 2007) or under the display (Wigdor et al., 2006), and shifting a copy of the intended selection area up and out of the area occluded by the finger (Vogel & Baudisch, 2007).

Other researchers have cited problems with occlusion in non-occlusion specific experiments and usability studies. Grossman et al. (2006) found that users sometimes moved away from the experimental target to invoke their hover widget, which they attribute to hand occlusion. Hinckley et al. (2007) discovered that conventional GUI tooltips could be easily blocked by the hand. Hinckley et al. (2006) found that users needed a chance to lift their hand to view the screen and verify progress when making a lasso selection. Dixon,
Guimbretière, and Chen (2008) located a start button below their main experimental stimulus to counteract hand occlusion. Ramos et al. (2007) argue that accuracy is impaired when using a direct pen because of pen tip occlusion, but provide no evidence. However, these occlusion-related design decisions and observations are based on an ad hoc understanding. To our knowledge, there are only five examples of researchers studying occlusion explicitly.
Brandl et al.
Brandl et al. (2009) use a simple paper-based experiment to determine which slices of a circular pie menu are most often occluded. A circle was drawn on paper with 12 slices identified by letters (Figure 4-1a). Participants placed a pen at the centre of the circle and self-reported which slices they could see. Based on results from 15 right-handed and 3 left-handed participants, the authors calculated the mean number of times each slice was reported as visible (Figure 4-1b).
Figure 4-1. Brandl et al.’s occlusion area experiment. (a) paper based experiment where participants self-reported visibility of 12 labelled pie slices; (b) results showing frequency of occluded pie slice. (from Brandl et al., 2009)
For right-handed participants, they found that four pie slices are occluded more than 50% of the time and that these slices emanate to the right of the pen position. Left-handed results are almost mirrored, with most occluded pie slices rotating clockwise one position. Their results provide adequate justification for the design of their occlusion-aware pie menu, but they are difficult to generalize since they only cover 12 discrete areas in the immediate vicinity of the pen.
Bieber, Rahman, and Urban
Bieber et al. (2007) used a simplified analytical model to quantify the amount of hand occlusion with a hand-held pen computer. They use a single shape to represent the occluded area (Figure 4-2b), and used this shape to calculate the amount of occlusion across all possible pen positions on the PDA’s display (a 160 × 160 px, 5.7 × 5.7 cm display). Based on this analysis, they found that 55% of the display can be occluded when pointing at the top-left corner, and that the bottom-right corner is almost always occluded. However, they do not describe how they created the occlusion shape in the first place, making reproduction and validation of their results difficult.
(a) occlusion with PDA (b) occlusion shape
Figure 4-2. Bieber, Rahman, and Urban’s analytic study. The authors created a single shape (b) which is intended to represent the area occluded when using a PDA at point p (a). (diagram based on Bieber et al., 2007)
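Bieber et al.’s style of analysis is straightforward to reproduce for any candidate occlusion shape: anchor the shape at each pen position, clip it to the display bounds, and compute the covered fraction. The sketch below uses a hypothetical rectangular shape (their actual shape is irregular and unpublished, so the sizes here are illustrative only):

```python
def occluded_fraction(pen_x, pen_y, disp_w=160, disp_h=160,
                      shape_w=120, shape_h=120):
    """Fraction of a disp_w x disp_h px display covered by an occlusion
    shape anchored at the pen tip (pen_x, pen_y).  The shape is a
    placeholder rectangle extending right of and below the pen, roughly
    mimicking a right-handed grip; shape_w/shape_h are made-up sizes."""
    x1 = min(pen_x + shape_w, disp_w)   # clip shape to display bounds
    y1 = min(pen_y + shape_h, disp_h)
    if x1 <= pen_x or y1 <= pen_y:
        return 0.0
    return (x1 - pen_x) * (y1 - pen_y) / float(disp_w * disp_h)

# Sweeping every pen position reproduces their whole-display analysis;
# for this shape the worst case is pointing at the top-left corner.
worst = max(occluded_fraction(x, y)
            for x in range(160) for y in range(160))
```

With a real silhouette in place of the rectangle, the same sweep yields figures comparable to their reported 55% worst case.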
Forlines and Balakrishnan
Forlines and Balakrishnan (2008) investigate the interaction of tactile pen feedback and occlusion with tapping and crossing selection methods. They use the same 1-D tasks as Accot and Zhai (2002), but in addition to the tactile pen condition, they also include a between-subjects factor for direct and indirect input (the authors argue that a within-subjects design for input type may have asymmetric skill transfer). Their assumption is that direct and indirect input controls for occlusion. Based on results from a crossing task, the authors argue that tactile feedback can make up for loss of visual feedback caused by pen and hand occlusion – the implicit finding is that
occlusion reduces the benefit of visual feedback. In addition, they suggest that occlusion is less problematic for discrete crossing tasks, where the pen is lifted during the movement, compared to continuous crossing tasks, where the pen tip maintains contact with the display (Figure 4-3b). They explain that the user can survey the display when their hand is lifted and visually acquire the target more quickly. However, these results are based on crossing selection, not tapping. With indirect input and tapping, the authors found no significant differences (Figure 4-3a) when augmenting visual feedback with the tactile pen.
(a) experiment one tasks (b) experiment two tasks
Figure 4-3. Experimental tasks used by Forlines and Balakrishnan. (from Forlines & Balakrishnan, 2008)
For the tapping task, the authors found a surprising interaction of target width and height with direct and indirect input: at the largest target width of 5.6 mm, direct input was faster than indirect input; and at the largest target distance of 179.8 mm, direct input was faster than indirect input. The authors suggest that this is because users can more easily track the position of the physical pen tip when using direct input, compared to tracking a cursor. Perhaps most alarming is that this interaction shows that using direct versus indirect input as experimental control for occlusion is problematic, especially as the target size and distance increase.
Hancock and Booth
Hancock and Booth (2004) investigate 2-D tapping selection performance with direct and indirect pen input. Like Forlines and Balakrishnan, they use direct versus indirect input as an experimental control for occlusion, but as a within-subjects factor. In their experimental task, a 6.1 mm end target is located at one of 12 radial positions 35 mm from the starting pen
position (the task ID is 2.8)16. They divided the display into four quadrants to control for different starting positions, but found no statistical differences. Since their motivation is the performance of context menu invocation, the end target remains hidden until the pen first taps the display. Overall, they found direct input faster than indirect (with mean times of 0.7 s and 1.0 s respectively) – a pronounced difference for such a simple task. Note that the authors used identical hardware for input (a 1024 × 768 px, 21 × 16 cm Tablet PC), but for the indirect condition, they rendered the targets on a much larger vertical display (a 1024 × 768 px, 141 × 102 cm SmartBoard), requiring a control-display ratio of 0.15:1. This may have introduced a confounding effect as participants reconciled the large visual output with the much smaller input. There may also have been an asymmetric transfer effect of the type suggested by Forlines and Balakrishnan.

Within the direct or indirect input conditions, there were significant effects for target direction. Post-hoc analysis revealed that, in the direct input condition, there was one target direction significantly slower than three or more other directions for right-handed participants (Figure 4-4b), and three target directions significantly slower than three or more other directions for left-handed participants (Figure 4-4a). There were no directions in the indirect input condition slower than three or more other directions. Since the physical hand movement is identical in either condition, the authors argue that the greater significant differences in the direct condition could be attributed to occlusion. However, the difference at the -60° direction for left-handed participants is unexpected given where we would expect the hand to be located.
16 Note that the target size and distance are reported in the paper as the unbelievably large dimensions of 61 mm and 350 mm respectively. In a personal communication with the first author (June 30, 2009), we confirmed that this was a typographical error.
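The quoted task ID follows from the Shannon formulation of Fitts’ index of difficulty; a quick check of the reported value:

```python
import math

def index_of_difficulty(distance, width):
    """Fitts' index of difficulty in bits, Shannon formulation
    (MacKenzie, 1992): ID = log2(D / W + 1)."""
    return math.log2(distance / width + 1)

# Hancock and Booth's task: a 35 mm movement to a 6.1 mm target
print(round(index_of_difficulty(35, 6.1), 1))  # → 2.8
```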
Figure 4-4. Hancock and Booth’s results for direct and indirect input task time. (a) left-handed users; (b) right-handed users. Longitudinal axes in seconds, latitudinal axes compass direction. Data points which are reported as significantly greater than three or more other directions in the same input condition are circled. (data from Hancock & Booth, 2004)
The authors also asked participants which quadrant they preferred to have a context menu located relative to the pen. For right-handed participants, the least preferred location was the bottom-right and the most preferred the bottom-left. Left-handed participants responded with the exact same rankings, but vertically mirrored.
Inkpen et al.
Inkpen et al. (2006) conducted a series of four quantitative and qualitative experiments studying how left-handed users interacted with right- and left-aligned scrollbars on hand-held pen computers. In scrolling tasks in which participants were asked to acquire a specific icon or line of text in a list, participants were faster using a left-aligned scrollbar. Participants also reported that it was difficult to use the right-aligned scrollbar without blocking the display with their hand (Figure 4-5a). The authors observed some participants raise their grip on the pen so they could keep their hand below the display, or arch their hand over the screen, to reduce occlusion (Figure 4-5b,c). The authors conclude that occlusion makes it difficult to visually scan the list while simultaneously scrolling, increasing fatigue and decreasing performance. The scrolling task is an example of a simultaneous monitoring task, but without a strong control for the position of the content to be monitored.
Figure 4-5. Inkpen et al.’s left-handed users and right-aligned scrollbars. (a) occlusion when using right-aligned scrollbar; (b) and (c) compensating gestures used to counteract occlusion.
Summary
These examples suggest three avenues for investigating occlusion: its area and shape; its effect on target selection performance; and compensatory postures to minimize its effect. However, the results remain inconclusive. Brandl et al. and Bieber et al.’s findings with regard to the area and shape of the occluded area are incomplete. Forlines and Balakrishnan use a 1-D target task which is unlikely to adequately capture the effect of occlusion on selection performance according to target direction. Hancock and Booth only investigate one type of target selection, when the target is initially hidden. Inkpen et al. noted compensatory postures, but their experimental tasks only enable comparison across left- and right-aligned scrollbars.
4.2 Experiment 4-1: Area and Shape
The goal of our first experiment is to measure the size and shape of the occluded area of display. To accomplish this, we record the participant’s view of their hand with a head- mounted video camera as they select targets at different locations on the display. We then extract key frames from the video and isolate an occlusion silhouette of the participant’s hand and pen as they appear from their vantage point.
Participants
22 people (8 female, 14 male) with a mean age of 26.1 (SD 8.3) participated. All participants were right-handed and pre-screened for color blindness. Participants had little or no experience with direct pen input, but this is acceptable since we are observing a lower level physical behaviour. At the beginning of each session, we measured the participant’s hand and forearm since anatomical dimensions likely influence the amount of occlusion (Figure 4-6).
EL elbow to fingertip length
SL shoulder to elbow length
UL upper limb length including hand
FL forearm length, elbow to crease of wrist, EL − HL
HL hand length, crease of the wrist to the tip of the finger
HB hand breadth, maximum width of palm
Figure 4-6. Anthropometric measurements. Diagram adapted from Pheasant & Haslegrave (2006).
We considered controlling for these dimensions, but recruiting participants to conform to certain anatomical sizes proved to be difficult, and the ranges for each control dimension were difficult to define.
Apparatus
The experiment was conducted using a Wacom Cintiq 12UX direct input pen tablet. It has a 307 mm (12.1 inch) diagonal display, a resolution of 1280 × 800 pixels, and a device resolution of 4.9 px/mm (125 DPI). We chose the Cintiq because it provides pen tilt information, which is unavailable on current Tablet PCs. We fixed the tablet in portrait orientation and supported it such that it remained at an angle of 12 degrees off the desk, oriented towards the participant. Participants were seated in an adjustable office chair with the height adjusted so that the elbow formed a 90 degree angle when the forearm was on the desk. This body posture is the most ergonomically sound according to Pheasant & Haslegrave (2006).

To capture the participant’s point-of-view, we use a small head-mounted video camera which records the entire experiment at 640 × 480 px resolution and 15 frames-per-second (Figure 4-9). The camera is attached to the head harness using hook-and-loop strips. This made it easy to move it up or down so that it could be aligned as close as possible to the center of the eyes without interfering with the participants’ line of sight. In pilot experiments, we found that we could position the camera approximately 40 mm above and forward of the line of sight, and the resulting image was very similar to what the participant saw.

We considered mounting a pair of miniature cameras above each eye, and then warping the image using stereo vision to achieve the exact participant perspective. However, we were concerned about additional error introduced by stereo reconstruction. Perhaps more importantly, the characteristics of human stereo vision may horizontally shrink the actual occluded area compared to the area captured by a monocular, centrally located camera. Human eyes are separated by an interocular distance of about 60 mm (depending on population), which horizontally shifts the image perceived by each eye by a factor of that distance (Steinman & Garzia, 2000).
As an example, consider a hand-like shape 40 mm wide located 40 mm above the surface of the tablet, with the eye (or eyes) centrally located 500 mm from the tablet (Figure 4-7). In this example, the occluded area produced by the hand would be 37.5 mm wide with stereo vision and 45 mm wide with monocular vision. The actual error will vary according to the height and width of the occluding object and the relative location of the eyes. This is a limitation of single camera capture, but we felt this was acceptable and adequate for our purposes, especially considering it would be simpler to process, synchronize, and store.
Figure 4-7. Estimated error introduced by monocular versus stereo view. Example with eye(s) located 500 mm from the tablet, occluding object such as a hand located 40 mm above the tablet and 40 mm wide. The width of the occluded area would be 37.5 mm for stereo vision and 45.0 mm for monocular. Note: diagram is not to scale.
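The monocular versus binocular occlusion widths follow from similar-triangle geometry, and can be checked with a short sketch. All names here are ours, and the sketch assumes a simplified perpendicular, centered view; with the stated 40 mm / 500 mm / 60 mm values it yields about 43.5 mm monocular and 38.3 mm binocular, close to the figure's 45.0 mm and 37.5 mm, whose exact values depend on the precise viewing geometry.

```python
def shadow_interval(eye_x, eye_h, edges, obj_h):
    """Project the occluder's edges from one viewpoint onto the tablet
    plane (height 0) and return the shadow interval (left, right), in mm."""
    scale = eye_h / (eye_h - obj_h)  # similar triangles
    xs = [eye_x + (x - eye_x) * scale for x in edges]
    return min(xs), max(xs)

def occluded_widths(width=40.0, obj_h=40.0, eye_h=500.0, sep=60.0):
    """Occluded width for a monocular camera between the eyes, and for
    binocular vision (intersection of both eyes' shadows)."""
    edges = (-width / 2, width / 2)
    l, r = shadow_interval(0.0, eye_h, edges, obj_h)
    mono = r - l
    l1, r1 = shadow_interval(-sep / 2, eye_h, edges, obj_h)
    l2, r2 = shadow_interval(sep / 2, eye_h, edges, obj_h)
    # only the region hidden from BOTH eyes is occluded binocularly
    stereo = max(0.0, min(r1, r2) - max(l1, l2))
    return mono, stereo
```
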
Fiducial markers were attached around the bezel of the tablet to enable us to transform the point-of-view frames to a standard, registered image perspective for analysis (Figure 4-8b). These are printed paper markers designed to work with the augmented reality marker tracking toolkit we used. Details of the image analysis steps are in the next section.
Figure 4-8. Experiment apparatus. (a) seated at desk with head-mounted camera to capture participants' point-of-view; (b) fiducial markers attached around the tablet bezel (image from head-mounted camera frame).
Figure 4-9. Head mounted camera.
Task and Stimuli
Participants were presented with individual trials consisting of an initial selection of a home target, followed by selection of one of two types of measurement targets. The 13.0 × 26.3 mm (64 × 128 px) home target was consistently located at the extreme right edge of the tablet display, 52.0 mm from the display bottom. This controlled the initial position of the hand and forearm at the beginning of each trial. We observed participants instinctively returning to this rest position in our initial observational study. The location of the measurement target was varied across trials at positions inscribed by a 7 × 11 unit invisible grid (Figure 4-10a). This created 77 different locations with target centers spaced 24.9 mm (122 px) horizontally and 25.1 mm (123 px) vertically.

We observed two primary styles of pen manipulation in our initial observational study: localized interactions where the participant rested the palm of their hand on the display (such as adjusting a slider), and singular, short interactions performed without resting the hand (such as pushing a button). Based on this observation, our task had two types of measurement target selection: tap – selection of a 13.0 mm (64 px) square target with a single tap (Figure 4-10b); and circle – selection of a circular target by circling within a 5.7 mm (28 px) tolerance band between a 0.8 mm (4 px) inner and 6.5 mm (32 px) outer radius (Figure 4-10c). The circle selection is designed to encourage participants to rest their palm, while the tap selection can be quickly performed with the palm in the air. The different shapes for the two selection tasks were intended to serve as a mnemonic to the user as to what action was required. The circle selection used an ink trail visualization to indicate progress. Errors occurred when the pen tip moved beyond the inner or outer radius. We wanted circling to be difficult enough to require a palm plant, but not tedious. In practice, participants took at least half a second to circle the target, which seemed to be enough to plant the palm.

At the beginning of each trial, a red home target and a gray measurement target were displayed. After successfully selecting the home target, the measurement target turned red and the participant selected it to complete the trial. We logged all aspects of pen input, including pressure and tilt. Video 4-1 provides a demonstration of the experiment apparatus and task.
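The circling criterion can be sketched as follows. The thesis does not publish its implementation, so the function and structure here are a hypothetical reconstruction: the pen must stay within the 4–32 px tolerance band around the target centre while sweeping a full revolution.

```python
import math

# Tolerance band radii from the experiment description (in px).
INNER_PX, OUTER_PX = 4.0, 32.0

def circle_selection(path, cx, cy):
    """path: list of (x, y) pen samples. Returns 'done' once a full
    revolution is swept inside the band, 'error' if the pen leaves the
    band, and 'incomplete' otherwise."""
    swept, prev = 0.0, None
    for x, y in path:
        r = math.hypot(x - cx, y - cy)
        if not (INNER_PX <= r <= OUTER_PX):
            return "error"
        a = math.atan2(y - cy, x - cx)
        if prev is not None:
            d = a - prev
            # unwrap the angle step into (-pi, pi]
            while d <= -math.pi:
                d += 2 * math.pi
            while d > math.pi:
                d -= 2 * math.pi
            swept += d
        prev = a
        if abs(swept) >= 2 * math.pi:
            return "done"
    return "incomplete"
```
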
Figure 4-10. Experiment 4-1 stimuli. (a) 7 × 11 grid used to place the measurement target; red start target is located near the bottom right; (b) square measurement target for tapping; (c) circle measurement target showing ink trail (to encourage a longer interaction with the palm resting).
Video 4-1. Area and shape experiment demonstration (Vogel_Daniel_J_201006_PhD_video_4_1.mp4, 00:27).
Design
We presented 3 blocks of trials for each of the two tasks. A block consisted of 77 trials covering each target position in the grid, making 3 repetitions for each grid position and task type. Trials were presented in randomized order within a block and the presentation order of tasks was balanced across participants. Before beginning the first block of a task, the participant completed 40 practice trials. In summary, the experimental design was: 2 Tasks (Tap, Circle) × 77 Target Positions × 3 Blocks = 462 data points per participant
Image Processing
To transform the point-of-view video into a series of occlusion silhouettes, we performed the following steps with custom built software (Figure 4-12):
Frame Extraction
We extracted video frames taken between the successful down and up pen events selecting the square target, or just before the circular target was completely circled. To do this, we had to synchronize the video with the data log. We used a visual time marker which functions like a movie clapperboard. The time marker is a large red square containing a unique number. When this square is tapped, it disappears and a timestamp is saved to our data log. After the experiment, we scrubbed through the video and found the video time where the time marker disappeared. Then, using linear interpolation between bounding time marks, we located the corresponding video frame for a given log time. In most cases, the frame captured the pen at the intended target location, but occasional lags during video capture produced a frame with the pen separated from the target location.
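The interpolation step can be sketched as below; `frame_for_log_time` and the marker format are our own hypothetical names, not from the thesis.

```python
import bisect

def frame_for_log_time(log_t, markers, fps=15):
    """Map an experiment-log timestamp to a video frame index by linear
    interpolation between bounding time markers.

    markers: sorted list of (log_time, video_time) pairs, one per visual
    time marker (clapperboard tap). fps: video frame rate."""
    log_times = [m[0] for m in markers]
    i = bisect.bisect_right(log_times, log_t)
    (l0, v0) = markers[max(i - 1, 0)]
    (l1, v1) = markers[min(i, len(markers) - 1)]
    if l1 == l0:
        video_t = v0
    else:
        video_t = v0 + (log_t - l0) / (l1 - l0) * (v1 - v0)
    return round(video_t * fps)
```
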
Rectification
We used the ARToolkitPlus augmented reality library (Wagner & Schmalstieg, 2007) to track the fiducial markers in each frame and determine the location of the four corners of the display. In practice, this sometimes required hand tuning when the markers were occluded by the hand or were out of frame due to head position. Using the four corner positions, we un-warped the perspective using the Java Advanced Imaging (Sun Microsystems, n.d.) functions PerspectiveTransform and WarpPerspective with bilinear interpolation, and cropped the result to a final 267 × 427 px image.

Note that due to our single camera set-up, the rectification step will shift the image of the hand down slightly relative to the actual eye view. As an example, if the eye position is at the end of a vector 500 mm and 50° from the centre of the tablet, and the camera is located 40 mm above and forward of the eye, the rectified image of a point on the hand 40 mm above the tablet will be shifted down by 6.2 mm (about 4 px in our rectified image) (Figure 4-11). The exact error will vary according to participant size and grip style, but the values above are typical. Rather than try to compensate for this slight shift and possibly introduce additional errors, we accepted this as a reasonable limitation of our technique.
Figure 4-11. Estimated rectification error from head-mounted camera. Example with eye located 500 mm from the centre of the tablet at an angle of 50°, camera located 40 mm above and forward of the eye, and point on hand located 40 mm above the tablet surface. The rectified position of the point on the hand as captured by the camera will be shifted down by 6.2 mm from p to p′. Note: diagram is not to scale.
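The projected shift can be approximated with a small 2-D side-view sketch. The orientation of the 40 mm camera offset is our assumption (here: 40 mm vertically above and 40 mm horizontally toward the tablet), so the result, roughly 8 mm, is of the same order as, but not identical to, the 6.2 mm reported in Figure 4-11.

```python
import math

def project_to_tablet(view_x, view_y, px, py):
    """Project point (px, py) from viewpoint (view_x, view_y) onto the
    tablet plane y = 0; returns the x intercept (2-D side view, mm)."""
    t = view_y / (view_y - py)
    return view_x + t * (px - view_x)

# Eye 500 mm from the tablet centre at 50 degrees elevation (side view).
eye = (500 * math.cos(math.radians(50)), 500 * math.sin(math.radians(50)))
# Assumed camera offset: 40 mm above (+y) and 40 mm toward the tablet (-x).
cam = (eye[0] - 40.0, eye[1] + 40.0)
hand = (0.0, 40.0)  # point on the hand, 40 mm above the tablet centre

p = project_to_tablet(eye[0], eye[1], hand[0], hand[1])
p_prime = project_to_tablet(cam[0], cam[1], hand[0], hand[1])
shift = abs(p - p_prime)  # roughly 8 mm under these assumptions
```
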
Isolation
We used simple image processing techniques to isolate the silhouette of the hand. First, we applied a light blur filter to reduce noise. Then we extracted the blue color channel and applied a threshold to create an inverted binary image. We were able to use the blue channel to isolate the hand because the camera’s color balance caused the display background to appear blue (it was actually white). Since the color space of skin is closer to red, this made isolating the hand relatively easy. To remove any edge pixels from the display bezel, we applied standard dilation and erosion morphological operations (Dougherty, 1992). Finally, we filled holes based on the connectivity of pixels to produce the final silhouette.
Figure 4-12. Image processing steps. (a) frame extraction; (b) rectification; (c) silhouette isolation.
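The isolation steps above can be reconstructed roughly as follows, using SciPy in place of the thesis's unspecified custom software; the threshold, filter size, and morphology order are our guesses.

```python
import numpy as np
from scipy import ndimage

def isolate_silhouette(rgb, blue_thresh=128):
    """Isolate the hand silhouette from a rectified point-of-view frame.

    rgb: H x W x 3 uint8 array. The camera's colour balance made the
    (white) display background appear blue, and skin is closer to red,
    so a LOW blue-channel value indicates the hand."""
    # 1. light blur to reduce sensor noise
    blue = ndimage.uniform_filter(rgb[:, :, 2].astype(float), size=3)
    # 2. threshold the blue channel; invert so the hand is foreground
    mask = blue < blue_thresh
    # 3. erosion then dilation (one plausible reading of the text's
    #    "dilation and erosion") to remove stray bezel edge pixels
    mask = ndimage.binary_dilation(ndimage.binary_erosion(mask))
    # 4. fill holes based on pixel connectivity
    return ndimage.binary_fill_holes(mask)
```
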
Results
Unfortunately, lighting and video problems corrupted large portions of data for participants 7, 14, 21, and 22, making isolation of their occlusion silhouettes unreliable. Capture problems with participant 8 corrupted the first block, but we kept this participant and their remaining blocks. In the end, our analysis included 18 out of the original 22 participants (6 female, 12 male) with a mean age of 26.3 (SD 8.4). These types of problems are typical when using video capture to generate empirical data: it is difficult to produce the same kind of "clean" data generated by experiments recording straightforward variables such as performance time and errors. Researchers attempting similar work should recruit extra participants and run multiple trials, as we did, to ensure a reasonable number of clean trials can be obtained.
Participants occasionally produced errors (mean error rate 4.4%), but we included those silhouettes regardless. Since each target had to be successfully tapped or circled before continuing, the final video frame of an error trial does not differ from that of a successful one. Also, the logged pen tilt values were very noisy, in spite of silhouette images suggesting tilt should be more uniform. Our attempts to filter them post hoc were unsuccessful, and we were forced to leave tilt out of our analysis.
Occlusion Ratio
We define the occlusion ratio as the percentage of occluded pixels within all possible display pixels. We used a ratio, rather than actual area, for unit independence; the actual area can be computed using the display area of 42,654 mm². Since the occlusion ratio varies according to pen location, we calculate it for each X-Y target location in the 7 × 11 grid. Not surprisingly, we found the highest occlusion ratios when the pen was near the top left of the display. However, the highest value did not occur at the extreme top, but rather a short distance below (Figure 4-13). The highest values did not differ greatly by task, with 38.6% for circle (SD 6.2) and 38.8% for tap (SD 14.2). Participant 1 had the highest occlusion ratio, with 47.4% for tap and 46.3% for circle. These mean ratios may reflect a sampling bias among our participants, since controlling for aspects such as anatomical size and pen grip style is difficult to do a priori. To help address this, we compare occlusion ratios given participant size.
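The ratio and its conversion back to physical area can be expressed as, for example:

```python
import numpy as np

DISPLAY_AREA_MM2 = 42_654  # Cintiq 12UX display area, from the text

def occlusion_ratio(silhouette):
    """silhouette: 2-D boolean array over display pixels, True = occluded.
    Returns the fraction of display pixels that are occluded."""
    return float(np.mean(silhouette))

def occluded_area_mm2(silhouette):
    # the unit-independent ratio scaled back to physical display area
    return occlusion_ratio(silhouette) * DISPLAY_AREA_MM2
```
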
Figure 4-13. Mean occlusion ratio. Plotted by X-Y display location for: (a) circle task; (b) tap task.
Influence of Participant Size
We established a simple size metric S to capture the relative size of each participant's arm and hand compared to the general population. S is the mean of three ratios between a participant measurement and 50th percentile values from a table of anthropometric statistics17. We use measurements for shoulder length (SL), hand length (HL), and hand breadth (HB). Since tables of anthropometric statistics are divided by gender, we compute S for men and women using different 50th percentile values. We found mean S values of 0.99 (SD 0.04) and 1.01 (SD 0.06) for men and women respectively, indicating that the size of our participants was representative. We expected to see a relationship between S and the maximum occlusion ratio, since larger hands and forearms should cover more of the display. However, a plot of S vs. maximum occlusion ratio does not suggest a relationship (Figure 4-14).
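The metric S can be sketched as below; the 50th-percentile values are hypothetical placeholders, not the actual figures from Pheasant & Hastlegrave (2006).

```python
# Hypothetical 50th-percentile reference values (mm), split by gender as
# in the thesis: shoulder length (SL), hand length (HL), hand breadth (HB).
P50 = {
    "male":   {"SL": 170.0, "HL": 190.0, "HB": 87.0},
    "female": {"SL": 152.0, "HL": 175.0, "HB": 76.0},
}

def size_metric(sl, hl, hb, gender):
    """S = mean of the three participant-to-50th-percentile ratios."""
    ref = P50[gender]
    return (sl / ref["SL"] + hl / ref["HL"] + hb / ref["HB"]) / 3.0
```

A participant matching the reference values exactly gets S = 1.0; a participant 10% larger on all three measurements gets S = 1.1.
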
Figure 4-14. Participant size (S) vs. maximum occlusion ratio.
Occlusion Shape
Although the occlusion ratio gives some sense of the scope of occlusion, it is the shape of the occluded pixels relative to the pen position that is most useful to designers. Figure 4-15 illustrates the mean shapes for participants for circling and tapping tasks. Since the captured image of the forearm and hand is increasingly cropped as the pen moves right and downward, we illustrate shapes for positions sampled near the middle-left portion of the display.

It is immediately apparent that occlusion shape varies between participants. There are differences which are likely due to anatomical size, possibly related to gender: compare how slender female participant 4 appears relative to male participant 5. Some participants adopt a lower hand position, occluding fewer pixels above the target: contrast the height of participant 8 with participant 9. The forearm angle also often varies: for example, participant 20 has a much higher angle than participant 10. A few participants grip the pen far away from the tip, occluding fewer pixels around the target: participant 18 in the tapping task is one example. When comparing individual participant shapes between the tap and circle tasks, the visual differences are more subtle and inconsistent. For example, we expected the higher speed of the tapping task to create a more varied posture resulting in blurry mean shapes. This seems to be the case for participants 2, 8, and 17, but there are contrary examples where circling shapes are more blurred: see participants 6 and 20. Only participants 2 and 12 seemed to adopt very different postures for tapping (low) and circling (high).

17 Anthropometric statistics for U.S. adults 19 to 65 years old (Pheasant & Hastlegrave, 2006).
Figure 4-15. Occlusion shape silhouettes for each participant. (a) tap task; (b) circle task. Generated from 9 samples from 3 pen positions at the middle-left portion of the display; see text for discussion of participant and task comparison highlights.
The combined participant mean silhouette gives an overall picture of occluded pixels near the pen position across all participants (Figure 4-16). As with individual participants, differences between tasks are subtle. The tapping task mean silhouette appears slightly larger, higher (Figure 4-16a), and sharper compared to the circling task (Figure 4-16b). In both cases many pixels above the horizontal position of the pen tip are typically occluded. Note that fewer pixels are occluded in the immediate vicinity of the pen's position.
Figure 4-16. Mean occlusion silhouettes. (a) tap task; (b) circle task. The lower row is visually augmented to show silhouette areas with greater than 50% concentration.
Pixels Most Likely to be Occluded
Another way to view occlusion shape is to look at which display pixels are most likely to be occluded given a distribution of pen positions. To create a simple baseline for analysis, we assume that the probability of accessing any position on the display is uniform. Under this distribution, commonly occluded display pixels across participants and target positions form a cluster of frequently occluded pixels emanating from the lower two-thirds along the right edge (Figure 4-17). There appears to be no difference between the circle and tap tasks. A uniform distribution of pen positions is not representative of common application layouts: consider the frequency of accessing menus and toolbars located along the top of the display. With this in mind, the often occluded pixels near the bottom right are even more likely to be occluded.
Figure 4-17. Pixels most likely to be occluded. Given a uniform distribution of pen positions: (a) tap task; (b) circle task. Darker pixels are occluded more often; the lower row is visually augmented to show areas with greater than 50% concentration.
Discussion
The results of this experiment reveal four main findings:
1. A large portion of the display can be occluded depending on pen position; with our participants it was typically as high as 38%, but could range up to 47%.
2. The pixels immediately below the pen position are not occluded by the hand as much as we expected, but more pixels are occluded above the pen tip horizon than previously thought.
3. Individuals typically have a signature occlusion silhouette, but silhouettes for different individuals can have large differences.
4. There appears to be no simple relationship between the size of the occluded area and anatomical size.
The Impact of Grip Style
The difference in occlusion silhouettes is due to a combination of different pen grip styles (Figure 4-18), with some contribution from anatomical size. In chapter 2, we discussed research investigating and classifying pen grips with traditional pen and paper (Figure 2-9), as well as Wu and Luo's (2006a) observations of grips used when operating a Tablet PC (Figure 2-18). We did not see the extreme style of grip Wu and Luo describe, but instead saw the more traditional grip styles used in writing. While we could classify our participants' grips as variants of the tripod grip, or perhaps even inefficient power grips, these classifications alone do not provide a relationship to the area and shape of the occlusion silhouette. Instead, describing pen grip according to three basic dimensions would be more useful: the size of the fist, the angle at which the pen is held, and the height of the grip on the shaft of the pen. We believe it is these characteristics of grip style that interact with anatomical measurements and ultimately govern occlusion area.
Figure 4-18. Video stills of observed grip styles. (a) loose fist, low angle, medium grip height; (b) tight fist, high angle, high grip height; (c) loose fist, straight angle, low grip height.
Left-handed Users
We conducted a small follow-up study with two left-handed users. Similar to Hancock and Booth's (2004) finding with performance, we found that the left-handed data mirrored that of the right-handed individuals (Figure 4-19).
Figure 4-19. Left-handed participant results. Individual occlusion shapes on left, pixels most likely to be occluded on right for (a) tap task; (b) circle task. Darker pixels are occluded more often; the right column is visually augmented to show areas with greater than 50% concentration.

Influence of Clothing
We gathered our data from sleeveless participants to maintain a consistent baseline, but we recognize that the occlusion silhouette could be much larger when clothed (consider using a tablet outside while wearing a parka and mittens, or even a loose-fitting sweater or jacket). As a general rule, Pheasant and Hastlegrave (2006) suggest adding 25 mm to all anatomical dimensions for men and 45 mm for women to account for the thickness of clothing.
4.3 Experiment 4-2: Performance
In this experiment, we wish to measure the effect of occlusion on performance with three fundamental GUI selection tasks: tapping, dragging, and tracing. We extend the work of Hancock and Booth (2004) with the addition of dragging and tracing, multiple target distances, a visible condition for the end target, and controlling for occlusion in a single direct input context. In the language of Forlines and Balakrishnan (2008), tapping is a discrete task, and dragging and tracing are continuous tasks.
Previous work using indirect devices has already established that tapping (or "clicking") is faster than dragging (I. S. MacKenzie et al., 1991), and dragging is faster than tracing (also referred to as a "trajectory task") (Accot & Zhai, 1997, 1999). This is because of the decreased freedom of movement as one goes from tapping to dragging and from dragging to tracing: when tapping, the location of the pen between targets has no effect on the success or failure of the task; when dragging, the pen must stay pressed against the surface of the display, but the intermediate 2-D location has no effect; and when tracing, the pen must not only be pressed down, but the 2-D path between targets is also constrained.

Hancock and Booth (2004) and Forlines and Balakrishnan (2008) argue for an effect of occlusion based on a comparison of direct and indirect input conditions. However, decoupling input and display space, especially when display orientation is changed, may introduce other hidden variables which confound a strict control for occlusion. We approach this differently, by controlling for occlusion in a single direct input context using a cross-hair visual augmentation (on the end target) intended to circumvent the effect of occlusion. We include two additional conditions for target visibility: one where the end target is hidden until the start target is selected, and one where the end target is always visible. Hancock and Booth (2004) only include the hidden condition since their motivation was to simulate a context menu. Our experiment has three main goals:
1. Investigate the effect of end target direction and distance on performance:
- we expect that when the end target is occluded, or when an occluded portion of the display separates the current location from the end target, performance will suffer;
- we expect this effect will be most pronounced with tracing, followed by dragging, and then tapping;
- we expect this effect will be most pronounced when the end target is initially hidden.
2. Determine if the cross-hair visual augmentation mitigates the effect of occlusion:
- we expect the cross-hair augmentation will have higher performance than the other target visibilities.
3. Verify the relative performance of tapping, dragging, and tracing with a direct input pen device:
- we expect lowest performance with tracing, followed by dragging, and then tapping;
- we expect the ordering to be the same regardless of target visibility.
Participants and Apparatus
18 people (7 female, 12 male) with a mean age of 25.3 (SD 7.0) participated18. All participants wrote with their right hand and were pre-screened for color blindness. As in Experiment 4-1, participants had little or no experience with direct pen input, but this is acceptable since we are observing a lower-level physical behaviour. The apparatus was identical to Experiment 4-1: a Wacom Cintiq 12UX direct input pen tablet for input and display, fiducial markers attached around the tablet bezel, and a head-mounted video camera to capture the participant's point-of-view (Figure 4-9).
Task and Stimuli
Participants were asked to complete three types of fundamental target selection tasks: Tapping, Dragging, and Tracing. Each of these tasks has a start target and an end target which begin and end the task respectively. We introduced three variations for displaying the end target: Hidden, where the end target only appeared after the start target was selected; Visible, where the end target was visible from the beginning; and Crosshair, which is the same as Visible except for the addition of a large crosshair to visually augment the target position. Note that the distinction between Visible and Hidden is based on when the target is rendered on the display, not according to whether the target was visible or hidden due to hand occlusion. Controlling for hand occlusion a priori is difficult given the different occlusion silhouettes we observed in Experiment 4-1.

We chose the Visible and Hidden types of end target reveal based on common situations we observed in our observational study in chapter two. The hidden condition models cases such as selecting a tab which in turn reveals a palette of widgets for subsequent selection. The visible condition models cases such as pressing a button on a visible toolbar. We added the Crosshair visibility condition in an effort to minimize the effect of occlusion within a single direct input context. The crosshair is composed of vertical and horizontal gray lines that extend the full length and width of the display and intersect behind the end target. In theory, with the addition of this crosshair, participants should locate the end target more quickly even when it is occluded by their hand (Figure 4-20).

18 In fact, we ran 22 participants. We used three participants at the beginning as a pilot to fine tune the experimental conditions. After running 18 more participants for the main study, initial analysis found that one participant was an outlier. Their overall mean time was almost twice as slow as all other participants, such that during outlier removal, 37% of their non-error trials would have been removed. This may have been due to an emphasis on accuracy rather than speed (their mean error rate was 1.2% overall, compared to an overall participant mean closer to 5%), or because they were our oldest participant at 50 years of age, a factor shown to reduce performance in pointing tasks (Worden, Walker, Bharat, & Hudson, 1997). We decided to remove this participant and run one additional person in their place.
Figure 4-20. Crosshair to minimize effect of occlusion. (a) the end target is initially occluded (indicated with a dashed outline), but visually augmented by the crosshair; (b) the end target is selected.
Details of Tasks
The Tapping task requires selecting a start target and then an end target in rapid succession, both 6.1 mm (30 px) square (Figure 4-21a).

The Dragging task required the participant to first tap the pen down on a red, drag-able start target which was 6.1 mm (30 px) square, then, with the pen pressed down, drag it so that it was positioned completely inside the bounds of a 12.2 mm (60 px) square docking area end target displayed as a red dashed outline (Figure 4-21b). The centre of the start target snaps to the pen position, so the 12.2 mm dock and 6.1 mm drag-able target required the same precision as tapping a 6.1 mm target. There was no constraint on the path between start target and dock. When an error occurred, the start target remained at the last position instead of forcing the participant to repeat the trial again from the beginning. Like tapping, there are two visibility variants: the dock is visible before the start target is dragged, or the dock is hidden until after the drag has begun.

The Tracing task required the participant to draw a line from a start position to an end position while keeping the pen within an 8.2 mm (40 px) wide, straight path (Figure 4-21c). Progress was displayed as an ink trail, and errors occurred when the pen crossed the path boundary lines or continued past the end position. If an error occurred, a new start position was displayed at the farthest point traced thus far, and the participant could continue from there. The visibility variant applies to tracing as well, with the hidden variant showing the path only after the participant begins drawing at the start position.
Figure 4-21. Tapping, Dragging, and Tracing tasks. Examples are for the Visible end target condition. For each task, the initial state is on the top row with the start target in red and the end target in blue. The final state after task completion is shown on the bottom row, with an in-progress state in between. The Hidden condition is similar, except the blue end target is not rendered in the initial state.
Note that only a single target size was used for each task since, unlike previous work examining precision (Ren & Moriya, 2000) or modeling a device with Fitts' law (I. S. MacKenzie et al., 1991), we are primarily interested in the effect of occlusion given target distance and direction. The distance and direction of the end target relative to the start target was experimentally controlled to cover eight radial directions and four distances (see Figure 4-22). The position of the start target was randomly placed to prevent participants from anticipating a trial.
Video 4-2. Performance experiment demonstration (Vogel_Daniel_J_201006_PhD_video_4_2.mp4, 00:35).
Figure 4-22. Target directions and distances. Eight directions (N, NE, E, SE, S, SW, W, NW) and four distances (20.4, 61.2, 102.0, 142.9 mm; 100, 300, 500, 700 px).
Design
A repeated measures within-participant factorial design was used. The independent variables were: Task (Tapping, Dragging, Tracing); Visibility (Crosshair, Visible, Hidden); Direction (N, NE, E, SE, S, SW, W, NW); and Distance (20.4, 61.2, 102.0, 142.9 mm) (100, 300, 500, 700 px).
The presentation of the 3 Tasks was counterbalanced across participants, and the 3 Visibilities were presented in order from most visible to least visible (first Crosshair, then Visible, and last Hidden). Ideally, both Task and Visibility would have been fully counterbalanced, but this would have required 9 participant orders. We felt that a progression of Visibility was acceptable since any transfer effect would most likely reduce the strength of significant effects, making our analysis more conservative. All 8 Directions and 4 Distances were presented in random order as a block of trials. The position of the start target was also randomized. In early pilots, we tried placing the start target at a constant location at the centre of the display like Ren and Moriya (2000), but participants were able to more easily anticipate the length and direction of the end target. In addition, Hancock and Booth (2004) did not find an effect for start position on their larger display, so a fully random position is reasonable. There were 3 blocks for each Task and Visibility combination, creating 3 repetitions of each direction and distance trial. Before each new Task, the experimenter gave a demonstration and the participant completed 9 practice trials. In summary, the experimental design was: 3 Tasks (Tapping, Dragging, Tracing) × 3 Visibility styles (Crosshair, Visible, and Hidden) × 3 Blocks × 4 Distances (20.4, 61.2, 102.0, 142.9 mm) × 8 Directions (N, NE, E, SE, S, SW, W, NW) = 864 data points per participant
Data Preparation
Response time measurements tend to be positively skewed, since it is possible for a single trial time to be many times longer than the mean, but virtually impossible for the inverse to occur. This can be due to loss of concentration, momentary hesitation, environmental distraction, or hardware problems. We compensate by removing outliers (I. S. MacKenzie & Buxton, 1992). We removed 574 error-free trials (3.7% of 15,552 total trials) which had a selection time more than 3 SD away from the cross-participant mean for the corresponding combination of Task, Visibility, Direction, and Distance.
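This per-condition 3 SD filter can be sketched as follows; the trial record layout is our own assumption.

```python
from collections import defaultdict
import statistics

def remove_outliers(trials, k=3.0):
    """Drop trials whose selection time is more than k SDs from the mean
    of their Task x Visibility x Direction x Distance condition.

    trials: list of dicts with keys 'task', 'visibility', 'direction',
    'distance', and 'time' (seconds); hypothetical record layout."""
    groups = defaultdict(list)
    for t in trials:
        key = (t["task"], t["visibility"], t["direction"], t["distance"])
        groups[key].append(t["time"])
    kept = []
    for t in trials:
        key = (t["task"], t["visibility"], t["direction"], t["distance"])
        times = groups[key]
        mu = statistics.mean(times)
        sd = statistics.pstdev(times)
        # keep trials within k SDs of the condition mean
        if sd == 0.0 or abs(t["time"] - mu) <= k * sd:
            kept.append(t)
    return kept
```
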
Results
Repeated measures analysis of variance (ANOVA) showed that order of presentation of Task had no significant effect on time or errors, indicating that a within-subjects design was appropriate.
Learning and/or Fatigue Effects
A 3 × 3 (Block × Task) within-subjects ANOVA found a significant interaction between Block and Task on selection time (F2,30 = 4.352, p = .022, Greenhouse-Geisser adjustment). A post hoc analysis19 found one significant difference, between Blocks 2 and 3 for Tapping (p = .001). The mean selection time was 22 ms slower in Block 3 compared to Block 2, with mean times of 663 ms (SE 18.0) and 641 ms (SE 18.3) respectively. Another 3 × 3 (Block × Task) within-subjects ANOVA also found a significant interaction between Block and Task on error rate (F4,60 = 2.570, p = .047). A post hoc analysis found significant differences between Block 3 and Blocks 1 and 2 for Tapping (p < .04). The mean error rate for Block 3 was .020 higher than Block 1 and .023 higher than Block 2, with mean rates of .043 (SE 0.8), .040 (SE 0.9), and .063 (SE 1.1) for Blocks 1, 2, and 3 respectively. These results suggest no learning effects were present, but that participants experienced fatigue at Block 3 during the Tapping task. Although the effect is relatively small, we decided to remove this block from subsequent analysis.
Error and Time by Visibility
To determine if the cross-hair visibility augmentation was successful in mitigating occlusion, we examine its overall effect in terms of error rate and selection time. We aggregated the data by Task and Visibility and performed two 3 × 3 (Task × Visibility) within-subjects ANOVAs using error rate and selection time as dependent variables. For error rate, most values were reasonably low, between 3% and 6%. We did not find a main effect for Visibility, but did find a significant Task × Visibility interaction (F4,68 = 11.776, p < .001). A post hoc multiple means comparison found that for Tracing, Hidden had a higher error rate than Visible; for Dragging, Hidden had a slightly lower error rate than Visible; and for Tapping, Hidden had a slightly lower error rate than Crosshair (Figure 4-23). There were no Task differences between Crosshair and Visible.

19 All post hoc analyses use the conservative Bonferroni adjustment.
Figure 4-23. Error rate by Task and Visibility. Error bars are 95% confidence interval.
For selection time, we found a main effect for Visibility (F1.5,25.1 = 528.359, p < .001, Greenhouse-Geisser adjustment) and a significant Task × Visibility interaction (F4,68 = 28.060, p < .001). A post hoc multiple means comparison found all Visibility conditions significantly different overall, with Crosshair 14 ms slower than Visible, and Hidden 227 ms slower than Crosshair. Within each Task, a post hoc multiple means comparison found a similar pattern (Figure 4-24): Hidden was slower than Crosshair and Visible (by at least 160 ms, 209 ms, and 310 ms for Tapping, Dragging, and Tracing respectively). Crosshair and Visible were only significantly different in the Dragging Task, with Crosshair 30 ms slower in that case. Note that although it is tempting to compare performance between Tasks in Figure 4-24, this does not consider the relative task index of difficulty (ID). Later, we compare Tasks by modelling them with Fitts’ law based on ID.
Figure 4-24. Selection time by Task and Visibility. Error bars are 95% confidence interval.
Our analysis of selection times and error rates for the Crosshair condition for each Task by Angle, and by Angle × Distance, did not reveal any obvious differences from the Visible condition. This, together with the results discussed in the preceding paragraphs, suggests that the Crosshair visibility condition did not provide any significant benefit beyond the Visible condition. Therefore, to simplify reporting, we removed Crosshair from subsequent analysis, and focus on differences between Visible and Hidden.
Angle × Distance Interactions for Selection Time and Error Rate
The data was aggregated for each Task and Visibility by Direction and Distance across blocks for subsequent analysis. We analyze each Task and Visibility combination individually with six separate 8 × 4 (Direction × Distance) within subjects ANOVAs. The Direction × Distance interaction is the most useful for interpretation. For selection time, the Direction × Distance interaction was significant for all Task and
Visibility combinations (all F21,357 > 3.3, p < .001). Post hoc multiple means comparisons found many differences. Figure 4-25 illustrates Directions which are significantly slower than three or more other Directions within the same Distance: this highlights more pronounced trends. For the most part, more significant differences were found as Task becomes more complex (Tapping to Dragging to Tracing) and when the end target is Hidden. With the exception of Tapping, Hidden (Figure 4-25b), the shortest Distance has the fewest significantly slower Directions. Most interesting is that overall, the most common Directions within a Distance to be significantly slower are NW, E, and SE.
Figure 4-25. Mean selection time by Distance, Direction, Task, and Visibility. Longitudinal axes in seconds, latitudinal axes compass direction. Direction data points which are significantly greater (p < .05) than three or more Directions in the same Distance band are circled. Note that the most common Directions within a Distance to be significantly slower are NW, E, and SE.
For error rate, the Direction × Distance interaction was only significant for the Tracing and Hidden combination (F21,357 = 1.794, p = .018). A post hoc multiple means comparison found that the E Direction had a higher error rate than NW, N, and W at the shortest Distance, and a higher error rate than S at the longest Distance (Figure 4-26).
Figure 4-26. Error rate by Distance and Direction for Tracing and Hidden Visibility. Longitudinal axis is error rate, latitudinal axes compass direction. Direction data points which are significantly greater (p < .05) than one or more Directions in the same Distance band are circled.
To explore error in the Tracing task in more detail, we examine the type of error which occurred. During the experiment, we logged three types of errors for the Tracing task: a premature pen lift before the entire line was traced (Up error); a stroke beyond the width of the tracing boundary (Tolerance error); and an overshoot past the end of the line (Overshoot error). Note that only a single error type can occur with the Tapping and Dragging tasks. We analyze each of the Tracing task error types with each Visibility individually using six separate 8 × 4 (Direction × Distance) within subjects ANOVAs. The only significant Direction × Distance interaction was with the Hidden Visibility and the Overshoot error type (F5.5,91.1 = 3.117, p = .01, Greenhouse-Geisser adjustment). A post hoc multiple means comparison did not find any differences, most likely due to the conservative Bonferroni adjustment and the large number of comparisons. However, a plot suggests a possible trend towards more Overshoot errors in the E Direction (Figure 4-27).
Figure 4-27. Overshoot errors with Tracing Task and Hidden Visibility. Longitudinal axis is mean number of overshoot errors, latitudinal axes compass direction. Only the shortest and longest Distances are shown, since only they had significant differences with overall error rate.
Other Performance Measures
To examine movement characteristics during each trial, we use two accuracy measures proposed by I. S. MacKenzie, Kauppinen, and Silfverberg (2001) and a third measure we devised which is more specific to pen input. Movement Direction Change (MDC) captures the rate at which the movement path changes direction (Figure 4-28b) relative to the ideal task path (Figure 4-28a). To normalize across distances, we report this statistic as the number of changes per cm. Movement Error (ME) captures how far the movement path deviates from the ideal task path between the start and end targets (Figure 4-28c):

ME = (Σi |yi|) / n    (4-1)

where yi is the distance from each of the n movement sample points to an ideal left-to-right task trajectory along y = 0. For calculation purposes, we transform all movement samples and target locations to this simplified left-to-right task trajectory reference frame. We report ME in mm. Out-of-range (OOR) captures the rate at which the pen is lifted out of the digitizer’s tracking range (Figure 4-28d). This statistic is only relevant to the Tapping task, since a pen lift in Dragging or Tracing would cause an error. We report this statistic as the number of out-of-range events per cm.
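The three measures can be sketched from logged pen samples as follows. This is our illustrative reading of the definitions above, not code from the thesis; the function and parameter names are ours, and MDC is approximated here as sign changes of the cross-axis velocity in the task reference frame.

```python
import math

def to_task_frame(path, start, end):
    """Rotate/translate (x, y) samples so the ideal task trajectory runs
    left-to-right along y = 0, as described for the ME calculation."""
    dx, dy = end[0] - start[0], end[1] - start[1]
    angle = math.atan2(dy, dx)
    cos_a, sin_a = math.cos(-angle), math.sin(-angle)
    out = []
    for x, y in path:
        tx, ty = x - start[0], y - start[1]
        out.append((tx * cos_a - ty * sin_a, tx * sin_a + ty * cos_a))
    return out

def movement_error(path, start, end):
    """ME (Equation 4-1): mean absolute deviation from the task axis."""
    ys = [abs(y) for _, y in to_task_frame(path, start, end)]
    return sum(ys) / len(ys)

def movement_direction_change(path, start, end, length_cm):
    """MDC per cm: count sign changes of the cross-axis velocity."""
    ys = [y for _, y in to_task_frame(path, start, end)]
    deltas = [b - a for a, b in zip(ys, ys[1:]) if b != a]
    changes = sum(1 for a, b in zip(deltas, deltas[1:]) if (a > 0) != (b > 0))
    return changes / length_cm

def out_of_range(lift_events, length_cm):
    """OOR per cm: out-of-range pen lifts normalized by path length."""
    return lift_events / length_cm
```

For example, a path that weaves once above and once below a horizontal task axis yields one direction change per half of its length, and an ME equal to the mean of the absolute weave amplitudes.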
Figure 4-28. Illustration of other performance measures: (a) idealized movement trajectory; (b) Movement Direction Change (MDC); (c) Movement Error (ME); (d) Out-of-range (OOR). Illustration based on Figures 1, 3, and 4 in I. S. MacKenzie et al. (2001).
We chose these measures to capture qualities of movement which are most likely caused by occlusion. MDC and ME should capture movement deviation strategies used to visibly acquire an occluded end target during the Tapping and Dragging tasks. MDC should also capture occlusion-related hesitation or contortion symptoms during the Tracing task. As before, the Direction × Distance interaction is perhaps most useful for interpretation. For this analysis, as with the selection time analysis, the data was aggregated for each Task and Visibility by Direction and Distance across blocks. We analyze each Task and Visibility combination individually with six separate 8 × 4 (Direction × Distance) within subjects ANOVAs. To characterize overall differences between Tasks, we aggregated Task and Visibility across all Directions, Distances, and blocks, and performed a single 3 × 2 (Task × Visibility) within subjects ANOVA. Although this level of analysis is specific to our experimental design, it does indicate a possible trend of a Task and Visibility effect with these performance measures.
Movement Direction Change (MDC)
Overall, there was a significant Task × Visibility interaction on MDC (F2,34 = 52.315, p < .001). Post hoc multiple means comparisons found differences between all Tasks in the Visible condition and between Tracing and the other two Tasks in the Hidden condition (Figure 4-29). The slower, constrained movement while Tracing may introduce more direction changes as the user adjusts their posture to see beyond their hand.
Figure 4-29. Movement Direction Change (MDC) by Task and Visibility. Error bars are 95% confidence interval.
The Direction × Distance interaction with MDC was significant for all three Tasks with
the Hidden Visibility and the Tracing Task with Visible Visibility (all F21,357 > 1.8, p < .02). Post hoc multiple means comparisons found several differences, but only in the Tracing Task were there Directions which are significantly greater than three or more other Directions within the same Distance (Figure 4-30). These cases appear exclusively in the E Direction.
Figure 4-30. Movement Direction Change (MDC) by Distance, Direction, Task, and Visibility. Longitudinal axes in number per cm, latitudinal axes compass direction. Direction data points which are significantly greater (p < .05) than three or more Directions in the same Distance band are circled. Note that Tapping and Dragging with Visible Visibility are not shown since no statistical differences were observed.
Movement Error (ME)
Overall, there was a significant Task × Visibility interaction on ME (F2,34 = 34.111, p < .001). Post hoc multiple means comparisons found differences between all Tasks in the Visible condition and between Tracing and the other two Tasks in the Hidden condition (Figure 4-31). In both Visibility conditions, Tracing has lower ME, which is expected given the constrained nature of the task; in fact, Tapping and Dragging had ME four times greater in the Hidden condition. Dragging has slightly lower ME than Tapping in the Visible condition.
Figure 4-31. Movement Error (ME) by Task and Visibility. Error bars are 95% confidence interval.
For ME, the Direction × Distance interaction was significant for all Task and Visibility combinations except the Tracing Task with Hidden Visibility (all F21,357 > 2.008, p < .001). Post hoc multiple means comparisons found several differences, but only a few Directions which are significantly greater than three or more other Directions within the same Distance (Figure 4-32). With the Hidden Visibility there are pronounced increases at certain Directions and certain Distances for Tapping and Dragging, concentrated around the E Direction. With Dragging these differences occur at the 102.0 mm and 142.9 mm Distances, but with Tapping, they occur at the 61.2 mm and 102.0 mm Distances. With the Visible Visibility, there are fewer pronounced increases, with almost all occurring with Dragging: at the SE Direction with the 61.2 mm and 102.0 mm Distances. Unlike the Tracing Task, the intermediate X-Y pen positions (and therefore ME) with Tapping and Dragging are unconstrained – ME will of course be much smaller and more regular for the constrained Tracing task.
Figure 4-32. Movement Error (ME) by Distance, Direction, Task, and Visibility. Longitudinal axes in mm, latitudinal axes compass direction. Direction data points which are significantly greater (p < .05) than three or more Directions in the same Distance band are circled. The E Direction for the 102.0 mm Distance of the Dragging Task with Hidden Visibility is significantly different from all other Directions at that Distance.
Out-of-Range (OOR)
For OOR, we only compare Visibility within the Tapping Task using a one-way within subjects ANOVA. An overall effect of Visibility was found (F1,17 = 14.884, p = .001), with higher OOR (more pen lifts) in the Hidden condition (see Figure 4-33 for values).
Figure 4-33. Out-of-Range (OOR) by Task and Visibility. Error bars are 95% confidence interval. Note that because Dragging and Tracing require that the pen contact the display throughout, OOR is 0 for those tasks.
The Direction × Distance interaction was significant for the Tapping Task with both Visibility conditions (all F21,357 > 2.297, p < .001). Post hoc multiple means comparisons found three cases, all in the E Direction, which were significantly greater than three or more other Directions within the same Distance (Figure 4-34).
Figure 4-34. Out-of-Range (OOR) by Distance, Direction, and Visibility for the Tapping Task. Longitudinal axes in number per cm, latitudinal axes compass direction. Direction data points which are significantly greater (p < .05) than three or more Directions in the same Distance band are circled.
Discussion
Recall that our experiment had three main goals: investigate the effect of end target direction and distance on performance; determine if the cross-hair visual augmentation mitigates the effect of occlusion; and verify the relative performance of tapping, dragging, and tracing with a direct input pen device. We now discuss how our findings relate to these goals.
The Effect of Target Direction and Distance and Occlusion
Overall, across dependent variables, there is a pattern where most significant differences are in the E and SE Directions. A comparison of our Tapping and Hidden Visibility selection results with Hancock and Booth (2004) reveals the same pattern, with most frequent significant differences at the E Direction.
Figure 4-35. Comparison with Hancock and Booth’s results: (a) Hancock et al.’s results for the right-handed, direct horizontal condition; (b) our results for similar distances and our equivalent Tapping and Hidden Visibility condition; (c) a combination of both graphs for comparison. Hancock et al. use the same 6.1 mm target. Direction data points which are significantly greater (p < .05) than three or more Directions in the same Distance band are circled. (Data from Hancock & Booth, 2004.)
These common significant differences are typically more frequent with target Distances greater than 20.4 mm, and with Hidden Visibility. Based on selection time alone, Tracing has more numerous significant differences in the E and SE Directions across Distances compared to Dragging or Tapping. Recalling our results in Experiment 4-1 for the mean occlusion shape (Figure 4-16), these differences seem to coincide with the occluded area. To explore this observation more specifically, we created a mean occlusion shape using the participants in Experiment 4-2. We processed occlusion silhouettes captured at the moment participants successfully selected the start target20. Unlike Experiment 4-1, we did not have a strict control for this position. Instead, we selected all starting positions that were within 31 mm of the left side of the display, and between 82 mm and 123 mm from the top of the display. These bounds were chosen such that the majority of the hand and forearm would be inside the display area, and are similar to the positions sampled in Experiment 4-1. There were about 20 samples from each participant, roughly spread across Task and Visibility conditions. We used the same image processing steps for frame extraction, rectification, and isolation described above in Experiment 4-1.
20 We had to exclude participants 5 and 6 due to lighting problems which made silhouette isolation difficult.
The results match our expectation that the E Direction is most often occluded, with some occlusion evident in the SE and NE Directions as well (Figure 4-36). Focusing on the area with greater than 50% concentration (more than half of the silhouettes cover this area), we see that the 61.2 and 102.0 mm Distances are centrally located within this often-occluded area in the E Direction. It should follow that targets located at these positions have the greatest chance of being occluded. Referring back to our results for selection time (Figure 4-25), we see that the 102.0 mm Distance has more than three significant differences in the E Direction for all Task and Visibility combinations except Tracing with Visible Visibility, and the 61.2 mm Distance has more than three significant differences in the E Direction for all Task and Visibility combinations except Tapping and Dragging with Visible Visibility. Moreover, for Movement Error (ME), Tapping and Dragging have pronounced significant spikes in the E Direction at the 102.0 mm Distance.
Figure 4-36. Comparison of target position and mean occlusion silhouette: (a) target positions with mean occlusion silhouette superimposed and visually augmented to show silhouette areas with greater than 50% concentration; (b) mean occlusion silhouette. Note the similarity of this mean occlusion shape with the ones reported in Experiment 4-1 (Figure 4-16).
To further illustrate these differences in movement time and ME, and to visualize a possible effect of occlusion, we generated motion paths for Dragging and Tapping at the 61.2 mm (Figure 4-37) and 102.0 mm (Figure 4-38) Distances. Note the greater variety of paths when the target was Hidden at the E Direction for Tapping and Dragging, especially at the 102.0 mm Distance. This matches our observations of inefficient dragging movements into an occluded area in the observational study of chapter 3.
Figure 4-37. Motion paths by Direction for the 61.2 mm Distance: (a) Tapping, Visible; (b) Tapping, Hidden; (c) Dragging, Visible; (d) Dragging, Hidden. All error-free trials shown across all participants. Directions which have significantly greater selection time, ME, MDC, or OOR than three or more other Directions are circled.
Figure 4-38. Motion paths by Direction for the 102.0 mm Distance: (a) Tapping, Visible; (b) Tapping, Hidden; (c) Dragging, Visible; (d) Dragging, Hidden. All error-free trials shown across all participants. Directions which have significantly greater selection time, ME, MDC, or OOR than three or more other Directions are circled.
Based on the comparison with the mean occlusion silhouette, the general pattern of motion paths for the E Direction, and the quantitative results, there does appear to be evidence that when the end target is occluded, or when an occluded portion of the display separates the current location from the end target, performance will suffer due to occlusion.
We do not illustrate Tracing motion paths since the constrained nature of the task makes them visually identical. Indeed, Tracing has much lower ME overall (Figure 4-31). However, for MDC, all three significant differences are in the E Direction, and the curves across Distances bulge somewhat in the E Direction, most pronounced with Hidden Visibility (Figure 4-30). In addition, only Tracing with Hidden Visibility had a significant Distance × Direction interaction for error rate, with significantly greater error rates in the E Direction only (Figure 4-26). Although not significant, there appears to be a possible trend towards more Overshoot errors in the E Direction (Figure 4-27). After reviewing the video logs for a sampling of Tracing trials, we believe this is due to hesitation and posture changes to view the occluded area mid-task. These often cause the pen position to momentarily pause and backtrack slightly before continuing in the intended direction. Another symptom of this behaviour is the significant interaction of error rate and Direction for Tracing in the NE Direction. The astute reader will notice that the NW and W Directions also have several significant differences in selection time and ME. Hancock and Booth (2004) also report a similar pattern. These cannot be due to occlusion, since targets at these locations are far from the occluded area (based on the mean silhouette in Figure 4-36). Moreover, the movement paths (Figure 4-37 and Figure 4-38) do not reveal any obvious visual differences in these Directions. For selection time, there are actually more significant differences in the NW and W with Visible Tapping and Dragging at the 142.9 mm Distance, suggesting that this non-occlusion effect is related to target distance.
Cross-Hair Augmentation
The cross-hair visual augmentation had no positive effect on selection time or error, behaving virtually the same as the standard visible target condition. There are two possible reasons for this: occlusion has little or no effect when the goal target is already visible on the display; or the visualization failed to counteract the effect of occlusion. We believe it is, in fact, a combination of these reasons.
Compared to Hidden, there were fewer significant differences in the Visible condition in the Directions most likely to be occluded (E, SE, and NE). This suggests that occlusion may be a small factor in performance when the end target is Visible. However, the motion paths in the Visible conditions still show more variance in the E, SE, and NE Directions, indicating that although it may be small, an effect of occlusion is present. If the effect of occlusion is small in this case, any further compensation for occlusion can only make a small, and perhaps undetectable, improvement.
The simple cross-hair visualization may not have provided enough assistance to overcome an effect of occlusion. To use it effectively, the user needs to solve the visual intersection of the horizontal and vertical components quickly and move accordingly. The mental overhead of doing this may counteract any possible benefit. We originally designed a halo visualization, which has been shown to be effective for locating off-screen targets (Baudisch & Rosenholtz, 2003). To make this effective, the halo had to be larger than the largest hand, and pilot participants found this distracting. In fact, even with the minimal 1 pixel grey cross-hair visualization, several participants said that it was distracting.
Relative Performance of Tracing, Dragging, and Tapping
To compare the relative performance of Tapping, Dragging, and Tracing, it is most useful to perform a Fitts’ law linear regression analysis and compare the resulting model parameters. Since we use a single target size and remove error trials, we use target width rather than computing effective width (I. S. MacKenzie, 1992) when calculating ID. This simplifies analysis and interpretation and makes a comparison with the tracing task more reasonable since it has no effective width equivalent. We performed the regression on aggregated data points for each Task and Visibility by Direction and Distance across blocks. Plots of mean selection time by ID visually suggest good fits to Fitts’ law (Figure 4-39), but the computed r2 values (Table 4-1) are lower than those previously published. The low r2 values are a symptom of using many data points across participants in the regression, rather than single overall mean times by ID as is often done. Recall also that our experiment design only includes a single target width, and relatively small range of task IDs (at least for Tapping and Dragging), so these results are useful for comparing within our range of IDs, but are not definitive models. Our intent is not to perform a rigorous verification of Fitts’ law, but to use Fitts’ law to characterize the relative differences between Tasks under different Visibility conditions.
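The regression described above can be sketched as follows. This is an illustrative closed-form least-squares fit of MT = a + b·ID using the Shannon ID formulation with nominal target width, as stated in the text; the thesis does not specify its fitting code, and the function name is ours.

```python
import math

def fitts_regression(distances_mm, times_ms, width_mm):
    """Least-squares fit of MT = a + b * ID, with ID = log2(D/W + 1).

    Uses the nominal target width W rather than effective width, as in
    the analysis above. Returns (intercept a, slope b, r^2).
    """
    ids = [math.log2(d / width_mm + 1) for d in distances_mm]
    n = len(ids)
    mean_id = sum(ids) / n
    mean_t = sum(times_ms) / n
    sxx = sum((i - mean_id) ** 2 for i in ids)
    sxy = sum((i - mean_id) * (t - mean_t) for i, t in zip(ids, times_ms))
    b = sxy / sxx
    a = mean_t - b * mean_id
    # r^2: proportion of selection-time variance explained by ID.
    ss_tot = sum((t - mean_t) ** 2 for t in times_ms)
    ss_res = sum((t - (a + b * i)) ** 2 for t, i in zip(times_ms, ids))
    r2 = 1 - ss_res / ss_tot
    return a, b, r2
```

Fitting aggregated per-condition data points (rather than one overall mean per ID) leaves more unexplained variance, which is why the r2 values in Table 4-1 are lower than typically published Fitts' law fits.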
Figure 4-39. Relationship of selection time to Index of Difficulty (ID): (a) Visible Visibility; (b) Hidden Visibility.
Visibility   Task       Model           r2
Visible      Tapping    152 + 122 ID    .53
Visible      Dragging    42 + 205 ID    .71
Visible      Tracing    228 +  73 ID    .79
Hidden       Tapping    501 +  70 ID    .24
Hidden       Dragging   467 + 154 ID    .36
Hidden       Tracing    568 +  72 ID    .73

Table 4-1. Linear regression values for Time (ms) from Index of Difficulty (ID).
The relative ordering of Task in each Visibility condition is somewhat surprising. Dragging is slower than Tapping and Tracing for equivalent IDs in both Visible and Hidden Visibilities. Tapping is slower than Tracing in the Visible condition, but faster in the Hidden condition. This may be partially explained by how the Tracing Task is rendered on the display – the user needs to simply follow the path from start to finish, whereas Dragging and Tapping provide no intermediate visual guidance. However, in the Hidden condition, since Tapping is a discrete task, the user can lift their hand to survey the display which may provide a greater benefit than following the Tracing path (Forlines & Balakrishnan, 2008).
A comparison of the time intercept values (the constant value in each model, Table 4-1) shows that Hidden has a large constant time overhead of about 0.5 s, regardless of Task. This is most likely due to an initial visual search time to find the initially hidden target. Both Dragging and Tapping have lower slope values with Hidden Visibility (the coefficient of ID in each model, Table 4-1), showing that the increase of selection time with increasing ID is more gradual. Over our range of IDs, Dragging and Tapping in the Hidden condition are slower, but according to our regression models, Visible Tapping would become slower than Hidden Tapping at an ID of 6.75, and Visible Dragging would become slower than Hidden Dragging at an ID of 8.3. This seems very unlikely to occur in practice, and may be symptomatic of performing a regression on a narrow range of IDs.
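The crossover IDs quoted above follow from equating the two linear models. A minimal check using the rounded coefficients from Table 4-1 gives approximately 6.7 for Tapping and 8.3 for Dragging; the small gap from the 6.75 reported above is presumably rounding of the published coefficients.

```python
def crossover_id(a_visible, b_visible, a_hidden, b_hidden):
    """ID at which the Visible model's predicted time overtakes the Hidden
    model's: solve a_v + b_v * ID = a_h + b_h * ID for ID."""
    return (a_hidden - a_visible) / (b_visible - b_hidden)

# Rounded coefficients from Table 4-1:
id_tap = crossover_id(152, 122, 501, 70)    # Tapping: ~6.7
id_drag = crossover_id(42, 205, 467, 154)   # Dragging: ~8.3
```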
4.4 Experiment 4-3: Influence on Hand and Arm Posture
In Experiment 4-1, we examined the shape and area of occlusion resulting from a typical, neutral posture. While these may account for the majority of interactions, users may occasionally contort their hand and wrist to alter the naturally occluded area. For example, Inkpen et al. (2006) observed left-handed users arching their hand when using a right-hand scrollbar, and we observed users adopting a high arching “hook” posture when adjusting image parameters using nearby slider widgets during the observational study reported in chapter 3. These two examples of posture contortion occur during simultaneous monitoring tasks: the user adjusts a parameter while at the same time monitoring feedback which occurs in a naturally occluded area. The goal of this qualitative experiment is to observe how participants adjust their posture to counteract occlusion while performing a simultaneous monitoring task.
Participants and Apparatus
20 people (12 female, 8 male) with a mean age of 26 (SD 9) participated. All were pre-screened for color blindness. As in Experiments 4-1 and 4-2, participants had little or no experience with direct pen input, but this is acceptable since we are observing a lower level physical behaviour. We measured and recorded the same anatomical dimensions used in Experiment 4-1 (Figure 4-6). The apparatus is identical to Experiments 4-1 and 4-2: a Wacom Cintiq 12UX direct input pen tablet for input and display, fiducial markers attached around the tablet bezel, and a head-mounted video camera to capture the participant’s point-of-view (Figure 4-9).
Task
The high level task is to adjust a numeric value using a slider so that it matches a target value. To begin the task, the participant tapped a 13.0 × 26.3 mm (64 × 128 px) home target positioned the same as in Experiment 4-1: 52.0 mm (255 px) from the bottom of the display at the extreme right side (Figure 4-41b). The slider is located at the centre left side of the display, oriented horizontally, and 41.0 × 4.9 mm (200 × 24 px) in size with a 4.0 mm (20 px) wide drag-able thumb (Figure 4-40a). The participant’s objective is to drag the slider thumb to adjust the numeric value so that it matches a single target value displayed beside the current slider value in a feedback box (Figure 4-40b), both set in a 40 pt font. After the target value is matched, the feedback box turns red with an equality sign (Figure 4-40d), and an 8.1 × 4.9 mm (40 × 24 px) red continue button appears beside the slider (Figure 4-40d). The participant taps this button to continue to the next task.
Figure 4-40. Simultaneous monitoring task. (a) Slider with continue button; (b) feedback box with current value and target value; (c) in-progress state where the participant drags the red slider thumb to change the current value of -4 to match the target value of 10; (d) satisfied state after the target value of 10 has been reached and held constant for 250 ms: the continue button is now red and the thumb grey.
Video 4-3. Simultaneous monitoring demonstration. (Vogel_Daniel_J_201006_PhD_video_4_3.mp4, 00:27)
We took steps to ensure that participants actually monitored the displayed value while simultaneously manipulating the slider. First, the slider’s numeric scale, direction, target, and start position were randomized before each trial to prevent participants from using the slider thumb position to visually locate the target value. Second, the slider value had to rest at the target value for more than 250 ms before correct match feedback appeared. This prevented participants from watching for a red flicker on the continue button to locate the target value. Finally, target values were selected so they were never the minimum or maximum slider value. This prevented locating the target value by simply pushing the thumb all the way to the left or right. Note that during the experiment we did not observe any participants “cheating” – all performed the simultaneous monitoring task as intended. The specific steps to select values for the task were:
1. A left value for the slider was randomly selected in the range [-100, 100].
2. A right value for the slider was randomly set to be one of left + 50 or left – 50, creating a random direction.
3. A random start value was selected in the range [left, right].
4. A random target value was selected in the range [left, right], such that | target – start | > 4, | target – left | > 4, and | target – right | > 4.
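The four steps above can be sketched as follows. This is an illustrative reimplementation, not code from the thesis: the function and variable names are ours, and step 4 is realized with simple rejection sampling (a valid target always exists, since the exclusion zones cover well under the 51 possible values).

```python
import random

def generate_slider_trial():
    """Randomly generate slider endpoint, start, and target values
    following the four steps described above (illustrative sketch)."""
    left = random.randint(-100, 100)            # step 1: random left value
    right = left + random.choice([50, -50])     # step 2: random direction
    lo, hi = min(left, right), max(left, right)
    start = random.randint(lo, hi)              # step 3: random start value
    while True:                                 # step 4: rejection sampling
        target = random.randint(lo, hi)
        if (abs(target - start) > 4 and
                abs(target - left) > 4 and
                abs(target - right) > 4):
            return left, right, start, target
```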
Figure 4-41. Simultaneous monitoring task positioning. (a) the target value is shown in a feedback box which is placed at one of 8 different positions around the slider; (b) the positioning of the slider, feedback box, and start target relative to the display.
Design
The target value display was placed at 8 different positions (Figure 4-41a) on a 31.0 mm (150 px) radius arc from -90° to 120°. These positions were chosen from pilot experiments and cover a wide enough range to include what we expect to be normally occluded and not occluded regions. Three blocks were presented; each block presented all positions in random order. In summary, the experimental design was: 8 target value Positions × 3 Blocks = 24 observations per participant.
Data Preparation
We removed 7 outlier trials (1.5% of 480 total trials) which had a task time more than 3 SD away from the cross-participant mean for corresponding Positions.
Results
There are two primary dependent variables:
Errors: Since a participant could encounter multiple errors during a single trial, our error measure is the mean number of error occurrences per trial.
Completion Time: This is the total time from successful selection of the start target until selection of the continue button.
Note that completion time includes all trials regardless of whether errors are encountered. Unlike experiments measuring low level movements such as Fitts’ law target selection, our task is inherently more complex and the time to recover from errors is a natural part of task completion.
Learning and/or Fatigue Effects
A one-way within subjects ANOVA found a significant main effect for Block on number of errors (F2,38 = 5.933, p = .006). A post hoc multiple means comparison²¹ found that Block 3 had .24 more errors than Block 2 (but not Block 1), with means of .47 (SE .06) for Block 1, .40 (SE .06) for Block 2, and .63 (SE .1) for Block 3. A one-way within subjects ANOVA did not find a significant main effect for Block on completion time. Since completion time includes error recovery time and the mean number of errors for Block 3 was not significantly different than Block 1, we felt the fatigue effect was slight and decided not to remove any blocks in the subsequent analysis.
Errors
We aggregated errors by Position across blocks to perform a one-way within subjects ANOVA. There was no significant main effect for Position (using Greenhouse-Geisser adjustment).
21 All post-hoc analyses use the conservative Bonferroni adjustment.
Completion Time
Recall that we used random start and end values for each trial. This was done to prevent our participants from completing the task using the visual slider thumb distance to “cheat” – we wanted them to rely on the current value shown in the feedback box. However, after running the experiment, we realized that this may introduce a confounding effect on completion time. According to Fitts’ law, changing the distance between start and end thumb positions while the target width is held constant will produce different times. Our random start and end values could result in Fitts’ law IDs between 2.3 and 5.4 (given the small 4 px target). However, because the end target position is essentially hidden throughout, the movement cannot be pre-planned as in a classic Fitts’ law task. Yet, the Pearson product-moment correlation coefficient (Pearson’s r) between ID and completion time was .301 and found to be significant in a two-tailed test (p < .001). This is a relatively low correlation, and since the target position is intentionally hidden, the confounding effect is relatively weak. Thus, we continue with our analysis of completion time, but with some caution. In chapter 6, we report our results for a refined version of this experiment using a constant distance between start and end values, and confirm the pattern of these results.

We aggregated completion time by Position across blocks to perform a one-way within subjects ANOVA. There was a significant main effect for Position (F7,133 = 5.640, p < .001). A post hoc analysis found Positions -30°, 0°, and 30° to be significantly slower than at least one other Position: -30° was 707 ms slower than 120°; 0° was 1110 ms slower than 90° and 1240 ms slower than 120°; and 30° was 482 ms slower than 120° (Figure 4-42).
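For reference, the ID range above follows from the Shannon formulation of the index of difficulty. The sketch below is ours; the mapping of slider units to pixels (about 4 px per value unit on the 200 px slider, so the just-over-4-unit minimum distance is roughly 16 px) is an inference from the task description, not a figure from the thesis.

```python
import math

def fitts_id(distance_px, width_px):
    """Shannon formulation of the Fitts' law index of difficulty (bits)."""
    return math.log2(distance_px / width_px + 1)

# With the 4 px effective target, the shortest permitted movement
# (about 16 px) gives an ID near the lower bound reported above:
print(round(fitts_id(16, 4), 1))  # 2.3
```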
Figure 4-42. Completion time by target box Position. Position data points which are significantly greater (p < .05) than one or more Positions are highlighted with a circle.
Pen Azimuth Angle
Since this task used slower, more consistent pen movements which were centrally located on the device, the pen tilt data exhibited much less noise than in the previous two experiments. We were able to extract the azimuth angle of the pen at the moment the values were successfully matched. This data reveals that participants hold the pen such that the azimuth angle samples form two clusters, one above and one below 0° (Figure 4-43a). The mean azimuth angle reveals that participants move from below 0° to above 0° (Figure 4-43b), roughly opposite to the progression of target box Positions (Figure 4-41a).
Figure 4-43. Pen azimuth angle by target box Position. (a) all pen azimuth angles across participants; (b) mean pen azimuth angle in thick red, standard deviation shown on either side in magenta.
Occlusion Silhouettes
To explore the pattern of hand postures suggested by mean pen azimuth angle, we created occlusion silhouettes for all 480 video frames captured at the moment a participant successfully matched the values. To compensate for different end positions of the slider thumb, we registered all silhouettes using the pen location. We used similar image processing steps for frame extraction, rectification, and isolation described above in Experiment 4-1. However, instead of using the blue channel to isolate the hand, we transformed each RGB image into YCbCr colour space, and used the Cr channel (the red-difference chroma component). This had the advantage of capturing areas of the hand and forearm outside the display, but did not include the pen as part of the silhouette. Since we are not calculating quantitative occlusion statistics as in Experiment 4-1, this trade-off is acceptable – in fact, it is somewhat advantageous since the area left out by the pen can be seen in the silhouette.

The mean silhouettes for each Position form two separate shapes, one above and one below (Figure 4-44a). By visually augmenting the mean silhouettes to show areas with greater than 70% concentration (Figure 4-44b), one can see the same “predominantly below 0°” to “predominantly above 0°” trend as seen with the mean pen azimuth angle (Figure 4-43).
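The Cr-channel isolation step might look like the following sketch. The ITU-R BT.601 conversion constants and the threshold value are our assumptions; the thesis does not give the exact values used.

```python
import numpy as np

def silhouette_from_rgb(rgb, threshold=140):
    """Isolate skin-toned pixels via the Cr (red-difference) channel of
    YCbCr. Sketch only: the BT.601 full-range conversion and the
    threshold of 140 are assumptions, not values from the thesis."""
    r = rgb[..., 0].astype(float)
    g = rgb[..., 1].astype(float)
    b = rgb[..., 2].astype(float)
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b  # BT.601 Cr channel
    return cr > threshold  # boolean occlusion silhouette mask

# usage: mask = silhouette_from_rgb(frame)  # frame is an H x W x 3 uint8 array
```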
Figure 4-44. Mean occlusion silhouette by Position. (a) mean occlusion silhouettes; (b) mean occlusion silhouettes with visual augmentation showing areas with greater than 70% concentration.
Discussion
Our results show that the posture contortion used to compensate for occlusion does have an effect on performance. Although there was no effect on number of errors, task time was longer when the target box was at Positions -30°, 0°, and 30°, Positions which are likely to be occluded when compared with a mean occlusion silhouette (Figure 4-45).
Figure 4-45. Comparison of target box position and mean occlusion silhouette. The mean occlusion silhouette from the circle task in Experiment 4-1 (Figure 4-16) is superimposed and visually augmented to show silhouette areas with greater than 50% concentration. Based on this comparison, Positions -60°, -30°, 0°, and 30° are fully occluded; and Positions -90° and 60° are partially occluded.
Analysis of occlusion silhouettes and pen azimuth angles reveals a pattern of compensatory hand contortion to minimize occlusion. To explore why contortion affects performance times, we reviewed video segments for commonly occluded tasks. We observed two likely reasons: the deviation from a neutral posture reduced participants’ ability to comfortably control the pen; and extra time was needed to plan a posture at the beginning of the task, or to adjust it during the task.
Contortion Strategies
The clusters of pen azimuth angles (Figure 4-43a) and occlusion silhouettes (Figure 4-44) suggest that participants are using different contortion strategies. To explore this further, we selected four individual participants who illustrate different strategies, and created mean occlusion silhouettes for each (Figure 4-46). Since each participant repeated a target box Position three times, there are three silhouettes per Position.
Participants 8 and 18 used a mixed strategy, in which they arched their hand above for some target box positions and below for others (Figure 4-46a,b). These two participants exhibited different cross-over points, where they moved from a below to an above strategy: participant 8 crossed over between 0° and 30°, and participant 18 crossed over between -30° and 0°. Participants 20 and 3 used a uniform strategy where they adopted nearly the same posture for all positions (Figure 4-46c,d). Participant 20 kept their hand arched above the target box nearly all the time, while participant 3 kept their hand below. With all of these participants, there are examples in which they use a different strategy at different times for the same position. These varied permutations of strategy corroborate our argument that some extra time is needed for posture planning.
Figure 4-46. Different occlusion contortion strategies. (a) participant 8 using mixed strategy with cross-over between 0° and 30°; (b) participant 18 using mixed strategy with cross-over between -30° and 0°; (c) participant 20 using near-consistent “above” strategy; (d) participant 3 using near-consistent “below” strategy.
4.5 Design Implications
The results of these experiments suggest basic implications for the design of direct pen input interfaces. Experiments 4-2 and 4-3 demonstrate that occlusion has an effect on performance, and Experiment 4-1 provides guidance regarding how to avoid the occluded area. These can be summarized as the following design implications for right-handed users:
1. Avoid displaying status messages or document previews in the right third and bottom four-fifths of the display, since this area may often be occluded by the hand (Figure 4-47b).
2. Avoid showing simultaneous visual feedback or related widgets in the occluded area relative to the pen (Figure 4-47a).
3. When designing for occlusion, be aware that real users have a wide range of pen grips and postures.
The third implication seems to contradict the first two. Universal rules which govern which pixels are likely occluded can be problematic in light of individual differences. We could make the above rules more conservative to cover a wide range of individual grip styles, but this will reduce the space of available pixels to use. Additional factors such as loose clothing would also affect the area being occluded. Note that our initial left-handed results reported above suggest that implications 1 and 2 can be mirrored about the vertical axis when designing for left-handed users.
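Implication 1 could be encoded as a coarse layout check, for example when deciding where to place a status message. The sketch below is our interpretation only: reading the guideline as the intersection of the right third and the bottom four-fifths, with coordinates measured from the top-left in millimetres, and the function name is hypothetical.

```python
def in_often_occluded_region(x_mm, y_mm, display_w_mm, display_h_mm):
    """Rough check against design implication 1 for right-handed users.
    Assumes the guideline describes the intersection of the right third
    and the bottom four-fifths of the display (our interpretation)."""
    in_right_third = x_mm > display_w_mm * (2 / 3)
    in_bottom_four_fifths = y_mm > display_h_mm / 5  # y grows downward
    return in_right_third and in_bottom_four_fifths
```

A layout engine could call this for candidate positions and prefer the first that falls outside the region (mirrored horizontally for left-handed users).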
Figure 4-47. Design guidelines for avoiding occluded areas. (a) mean occlusion silhouette thresholded to 50% concentration (based on circle task in Figure 4-16); (b) areas most likely to be occluded given a uniform distribution of pen positions (based on circle task in Figure 4-17). For left-handed users, reflect these guidelines across the vertical.
4.6 Summary
The experiments described above provide relevant findings regarding characteristics of occlusion, but they also demonstrate novel ways to perform experiment logging and analysis. We extend the utility of recording video taken from the participant’s point-of-view from simply acting as an observational tool in chapter 2, to a useful quantitative measurement tool. By combining simple computer vision marker tracking and image processing steps, we demonstrate how to rectify and isolate images of the hand and arm relative to the display to produce occlusion silhouettes. When the video is synchronized with our traditional experiment event log, we can perform these steps for frames captured at critical moments
during experiment tasks. We show how the occlusion silhouettes can be used for quantitative analysis and visualization in Experiment 4-1, to illustrate an interaction between the hand and arm and target positions in Experiment 4-2, and to relate different posture strategies to task stimulus in Experiment 4-3. In the next chapter, we continue using occlusion silhouettes for analytical testing.

In Experiment 4-1, our results show that the shape of occlusion varies among participants largely due to grip style rather than simply anatomical measurements. Yet, individuals tend to adopt a consistent posture for long and short localized interactions. Moreover, the general shape of occluded pixels is higher relative to the pen than previously thought, and given a uniform distribution of pen positions, a large part of the display is often occluded. When a right-handed user is accessing pen positions near the upper-left of the display, as much as 47% of a 12 inch display is occluded.

In Experiment 4-2, based on an analysis of time, error, movement paths, and other performance metrics, we find evidence that occlusion has an effect on three fundamental GUI interactions. Our experiment confirms Hancock and Booth’s (2004) findings for tapping when the end target is initially hidden at a single distance. However, we expand their study design significantly by adding two more fundamental GUI interactions, dragging and tracing; introducing a baseline condition where the end target is initially visible; and incorporating four target distances. Our findings provide evidence that when selecting occluded targets, participants typically perform more slowly. This effect is most pronounced when dragging, where the resulting movement paths and movement error metric show large deviations from an optimal path.
Detailed analysis suggests that participants are more likely to raise their pen out-of-range when moving to tap on an occluded target, and more likely to encounter errors when tracing into an occluded area, possibly caused by less consistent pen movements, as suggested by more frequent movement direction changes. Overall, we find that the effect of occlusion appears more subtle when the end target is not initially hidden. However, it is important to note that our attempt to explicitly control for occlusion using a visual augmentation did not appear to work, so our results (like Hancock and Booth’s) likely include an interaction effect between target direction and characteristics of hand movement.

The third experiment, Experiment 4-3, found that the posture contortion used to minimize occlusion during a simultaneous monitoring task reduces performance. Moreover,
different participants use different posture contortion strategies. Some use a predominantly constant strategy with slight adjustments, such as keeping their hand primarily below or above the area to be monitored. Others use a mix of above and below postures, and switch posture at different locations of the target to be monitored. One possible side-effect of this type of posture contortion could be discomfort and, if repeated many times over a long period, physical damage from repetitive strain. Recall that Haider, Luczak, and Rohmert (1982) found their participants had high amounts of muscle strain when using a light pen on a vertical display, so even conventional pen use may be tiring. Unlike Haider et al., we did not measure muscle strain directly, nor did we ask participants to self-report on this issue. Without this data, we cannot speculate on possible short or long term physical effects.

Although we provide guidelines for avoiding the occluded area in Figure 4-47, in practice these may have two potential problems. First, we attempt to quantify the guidelines by providing an illustration of the mean occluded area with an overlaid measurement grid. This may be adequate as a coarse set of rules when designing static layouts or to guide widget behaviours, but it would be difficult to implement in software. Second, and most important, these general guidelines do not accommodate the wide range of pen grips and postures which we observed. In the next chapter, we describe a model of the occluded area which addresses these two problems by expressing the occluded area relative to the pen in a simplified geometric way suitable for real-time software applications, and by providing a mechanism to adapt to specific user grips and postures.
5 Modelling Occlusion
Our observational study in chapter 3 found that occlusion likely contributed to error and fatigue, and Experiments 4-2 and 4-3 presented in the previous chapter suggest that occlusion can hurt performance. If system designers had at their disposal a representation of the currently occluded area, could this be used to counteract these issues? In addition, if researchers had this information, could this aid experimental analysis?

The most accurate representation of the occluded area would be a literal image of the hand and arm as seen from the user’s point of view (Figure 5-1a); we introduced this in the previous chapter as an occlusion silhouette. We already described a methodology for extracting and un-warping video frames captured from a head-mounted video camera to produce these silhouettes off-line. However, asking users to wear a head-mounted camera at all times is obtrusive to say the least, and relying on a fixed camera in the environment (e.g., Cotting & Gross, 2006) is not practical for mobile contexts. Capturing the occluded area unobtrusively would require a multi-touch device capable of tracking objects above the surface (Echtler, Huber, & Klinker, 2008; Hilliges et al., 2009), but these devices are still being developed and they typically require a strong above-surface light source. An alternative is to use a simplified representation to model the actual occluded area (Figure 5-1b). Although a model is only an approximation of the actual occluded area, it can be configured and positioned without capturing a literal occlusion silhouette, perhaps requiring only input such as pen position and pen tilt.
Figure 5-1. Approximating the actual occluded area with a model. (a) literal representation of occluded area, an occlusion silhouette image taken from the point- of-view of a user and rectified; (b) an approximate model representation capturing the general shape of the occluded area.
Previous examples exist where design or implementation rules suggest underlying simple models. Some researchers and designers use a simple rule to avoid occlusion which implicitly suggests a bounding-rectangle model (Figure 5-2a), where all pixels below and to the right of the pen position are considered occluded. Brandl et al. (2009) and Hancock and Booth (2004) employ specialized rules, which can be thought of as using underlying model-like representations (Figure 5-2b,c). While these examples can all function with limited input, aside from handedness, there is no consideration for different shapes produced by different user grips. In addition, the latter two examples are designed specifically for menu placement, and thus only attempt to describe what is occluded in the immediate vicinity of the pen. A more accurate, more user-specific, and more complete description of the occluded area would be desirable for more general applications.

In this chapter, we describe a configurable model of occlusion which fulfills these goals. At its core is a five parameter geometric model, comprised of an offset circle and pivoting rectangle (Figure 5-1b), capturing the general shape of the occluded area. To adapt the model to different user grips and anatomical differences, we introduce a simple configuration process. To assess our model’s fidelity, we conduct analytical tests using occlusion silhouettes gathered in Experiment 4-1 from the previous chapter. To evaluate the usability of our configuration process and test the performance of a real model configuration outcome, we conducted a short user study. We found all participants
completed configuration successfully, but that further refinements could make the process easier and more self-explanatory. Finally, we briefly discuss future improvements to the model, including adaptations for large displays, multi-user contexts, and direct touch interaction.
5.1 Related Work
There has been no previous work explicitly presenting models for tracking occlusion. However, researchers have developed systems to capture literal representations for related purposes, and interface techniques have been designed and implemented using rules that suggest underlying model-like representations.
Literal Representations
Cotting and Gross (2006) demonstrate a tabletop which detects shadows of objects that occlude the beam of a ceiling mounted projector. They do this using a video camera mounted near the projector to capture the shadows, and then distort the display contents to avoid those areas. Their goal is to address problems resulting from front-projector occlusion, but in theory, the system could be adapted to detect hand and arm occlusion from the user’s point-of-view. This would require mounting the camera on or near the user’s head. Of course, a head-mounted camera is intrusive, and mounting a camera near a user’s point-of-view in a mobile Tablet PC context would be challenging.

An alternative is to use a multi-touch device (e.g., Han, 2005) to capture a view of the hand and arm as seen from below, and then warp the image to approximate the user’s point-of-view. However, most multi-touch devices can only track the hand and arm when they are touching the surface. Researchers have presented ways to track objects above the surface as well (Echtler et al., 2008; Hilliges et al., 2009), but these typically require a strong above-surface light source which would be difficult to control in a mobile Tablet PC context. Another consideration for literal representations is that capturing and warping can be processor intensive and the result is often noisy. Regardless, such a high level of fidelity may not be needed for most applications anyway.
Rules-of-Thumb and Model Like Representations
Rather than capture a literal image, designers and researchers have used rules-of-thumb, which can be expressed as simple models, to guide their designs in avoiding occluded areas. The simplest model, which we call the bounding-rectangle model, is often used implicitly by researchers and designers (Figure 5-2a). In this model, all pixels below and to the right of the pen position are considered to be occluded. For example: the Twist Lens slider (Ramos & Balakrishnan, 2003) displays relevant information to the left and above the pen position, avoiding the area to the right due to occlusion; CrossY (Apitz & Guimbretière, 2004) uses a predominantly right-to-left movement to avoid displaying subsequent targets in a sequence to the right of the pen position; and XLibris (Schilit et al., 1998) places a menu bar at the bottom of the display to avoid occlusion when navigating pages. Although simple to use and implement, this model considers too much below and to the right of the pen to be occluded, and misses any occlusion above and to the right.

Hancock and Booth (2004) use an experimentally validated rule-of-thumb which can be expressed as a four-quadrant model centred at the pen position (Figure 5-2b). Their rule considers movement time as well as occlusion, and recommends menu placement in the bottom-left quadrant (for a right-handed person). They monitor pen location and orientation over a period of time, and use a classifier to determine if the user is left- or right-handed. This enables the rule to be automatically flipped to accommodate left- and right-handed users.

Brandl et al. (2009) demonstrate an occlusion-aware pie menu which uses an underlying model (although the authors do not refer to it as a model) to label pie slices near the pen as typically occluded or not, given a reference hand orientation (Figure 5-2c). Based on where the hand and pen each contact the surface, the pie menu is rotated to minimize occlusion.
Although their implementation uses a multi-touch table, pen tilt could provide similar hand to pen orientation information. However, like Hancock and Booth, Brandl et al. only identify occlusion in the immediate vicinity of the pen as it pertains specifically to positioning a menu widget.
(a) bounding rectangle (b) Hancock & Booth (c) Brandl et al. Figure 5-2. Previous implicit and explicit occlusion models. (a) simple bounding rectangle used implicitly; (b) Hancock and Booth’s four-quadrant rule-based model for context menu placement; (c) Brandl et al.’s radial pie slice representation (2009) for pie menu orientation.
Aside from adjustments for handedness, all of these are “one-size-fits-all” models, in that they do not compensate for individual grip styles. They also do not attempt to model the whole occluded area of the hand or the forearm. Our aim is to provide a more flexible model for designers and researchers, one that adapts to specific users beyond accommodating handedness, and provides a complete representation of the occluded area on the display.
5.2 Geometric Model for Occlusion Shape
Experiment 4-1 in the previous chapter revealed that the shape of the occluded area is quite uniform within each participant, and across participants there were high-level similarities. We wondered if a geometric model could be created to predict the complete shape and location of the occluded area, given relatively sparse inputs such as the pen position and aspects such as the user’s anatomical size and grip style. This model would essentially function as a binary classifier, labelling each pixel on the display as occluded or not occluded. There are many possible approaches to representing the complete shape of the occluded area. We have already discussed perhaps the most straightforward approach, the bounding rectangle model (Figure 5-2a). This model is constant relative to the pen’s position and requires no other input, but the accuracy is poor. At the other end of the spectrum, we could create a model with a flexible shape such as one composed of Bézier spline segments (Figure 5-3d). While this would certainly yield a very accurate representation of the occluded area, the huge number of parameters would make creating and using the model difficult and less practical for real interface design. Our aim then is to create a relatively simple model with a small number of parameters that still produces a reasonable degree of accuracy, ensuring it is viable for practical use.
(a) oriented rectangle (b) circle & rectangle (c) rectangle, ellipse, trapezoid (d) spline Figure 5-3. Different geometric models of occlusion. (a) oriented rectangle; (b) offset circle and pivoting rectangle; (c) rectangle for pen, ellipse for hand, trapezoid for forearm; (d) highly detailed Bézier spline.
We noticed that occlusion silhouettes often resembled a lopsided circle for the fist, a thick narrowing rectangle sticking out the bottom for the arm, and, with some participants, a thinner rectangle puncturing the top of the circle for the pen. The irregularity of shape suggested that a single oriented rectangle (Figure 5-3a) would be unlikely to capture all grip styles accurately, but that a composition of simple shapes may suffice. Our first approach was to create a geometric model using an ellipse for the fist, an isosceles trapezoid for the arm, and an oriented rectangle for the pen (Figure 5-3c). This model captures most aspects of a typical silhouette, but even this relatively simple model required 11 parameters to position. Instead, we simplified our representation further to an offset circle and a rectangle with only 5 parameters (Figure 5-3b). This ignores some details like the protruding pen and the widening forearm, but with so few parameters, we expected that it could be configured easily and implemented to function in real time with very few inputs. The five model parameters are illustrated in Figure 5-4 and described below:
1. q is the offset distance from the pen position p to the edge of the circle,
2. r is the radius of the circle over the fist area,
3. Φ is the rotation angle of the circle around p (expressed in degrees where Φ = 0° when the centre is due East, Φ = -45° for North-East, and Φ = 45° for South-East, see Figure 5-4b),
4. Θ is the angle of rotation of the rectangle around the centre of the circle (using the same angle configuration as Φ),
5. w is the width of the rectangle representing the forearm.
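As a sketch of how these five parameters classify a pixel, the function below tests a point against the offset circle and the infinitely long forearm rectangle. The geometric details (screen coordinates with y growing downward, the angle convention of Figure 5-4, and the rectangle extending from the circle centre in direction Θ) are our reading of the figures, not code from the thesis.

```python
import math

def is_occluded(x, y, pen, q, r, phi_deg, theta_deg, w):
    """Classify a display point (mm) as occluded under the five
    parameter circle-and-rectangle model (illustrative sketch).
    pen is the (x, y) pen position; y grows downward."""
    phi = math.radians(phi_deg)
    theta = math.radians(theta_deg)
    # circle centre c: q is the offset from p to the circle edge,
    # so the centre lies at distance q + r from the pen
    cx = pen[0] + (q + r) * math.cos(phi)
    cy = pen[1] + (q + r) * math.sin(phi)
    # inside the fist circle?
    if (x - cx) ** 2 + (y - cy) ** 2 <= r ** 2:
        return True
    # inside the (infinitely long) forearm rectangle pivoted by theta?
    dx, dy = math.cos(theta), math.sin(theta)
    along = (x - cx) * dx + (y - cy) * dy          # distance along forearm axis
    across = abs(-(x - cx) * dy + (y - cy) * dx)   # distance from the axis
    return along >= 0 and across <= w / 2
```

For example, with the pen at the origin, q = 5, r = 20, Φ = 45°, Θ = 90°, and w = 30, the circle centre sits down-right of the pen and the forearm band extends straight down from it.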
Figure 5-4. Offset circle and pivoting rectangle model parameters. (a) model parameters; (b) illustration of angular convention used by Θ and Φ.
For convenience, we refer to the centre of the circle in the geometric model as c, and for device independence, all non-angular parameters are recorded in millimetres. Note that the length of the rectangle is infinite for our purposes. If we were building a model for larger displays, this may become another parameter, but at present we are concerned with smaller displays such as the portable Tablet PC.
Analytic Evaluation of Performance
To test the fidelity and performance of our geometric model, we perform a series of analytical evaluations using the occlusion silhouette data from Experiment 4-1 in chapter 4 (see Table 5-1 a,b,c) and new silhouette data gathered in Experiment 5-1 from this chapter (see Table 5-1 d,e,f).
(a) Purpose: What is the theoretical upper bound for geometric approximation? Model tested: Fitted Models. Test data: Experiment 4-1. Notes: use optimization to fit the model to each silhouette; also test a simple bounding box for baseline comparison.

(b) Purpose: How well does a mean model perform? Model tested: Mean Model. Test data: Experiment 4-1. Notes: the mean model uses mean parameter settings from the fitted models in test (a).

(c) Purpose: How well could a configured model perform? Model tested: Participant Mean Models. Test data: Experiment 4-1. Notes: use per-participant mean model parameters from test (a) to approximate an actual configuration.

(d) Purpose: Re-test the theoretical upper bound for geometric approximation. Model tested: Fitted Models. Test data: Experiment 5-1. Notes: use optimization to fit the model to each silhouette as in test (a), but using different data.

(e) Purpose: How well does a mean model perform with different test data? Model tested: Mean Model. Test data: Experiment 5-1. Notes: use the exact same mean model parameters as in test (b), but different test data.

(f) Purpose: How does the model perform with actual user configurations? Model tested: Participant Configured Models. Test data: Experiment 5-1. Notes: use actual model configurations performed by participants in an experiment.

Table 5-1. Overview of model tests.
Note that the purpose of these tests is not to build a model (in the machine learning sense). Our five-parameter geometric model has already been built, and the purpose of these tests is to evaluate its performance. For example, by fitting our model to each occlusion silhouette using optimization, we can test how well it captures the real silhouette data (see tests a and d in Table 5-1). Testing how well a user-configured model performs is similar, but instead of fitting the model to each silhouette, we use only a single model configuration per participant (see tests c and f in Table 5-1). The mean model could be thought of as the result of very naive learning, so testing it with the same data that generated the mean values (Experiment 4-1 data, see b in Table 5-1) will likely exhibit some over-fitting (Bishop, 1995). However, we validate the performance of these same mean model parameters using Experiment 5-1 test data, so any over-fitting is minimal.
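To make the fitting procedure concrete, here is a minimal sketch under simplifying assumptions of ours: it fits only a circle (cx, cy, radius) rather than all five model parameters, uses exhaustive grid search rather than a proper optimizer, and maximizes Jaccard overlap between the model's pixels and a tiny synthetic silhouette.

```python
# Sketch: fit a circle to a binary silhouette by exhaustive grid search.
# The thesis fits all five model parameters with an optimizer; this
# simplified example fits only a circle on a tiny synthetic grid,
# maximizing the Jaccard overlap between model and silhouette pixels.
W, H = 40, 40

# Synthetic "silhouette": a filled circle at (25, 22) with radius 8.
silhouette = {(x, y) for x in range(W) for y in range(H)
              if (x - 25) ** 2 + (y - 22) ** 2 <= 8 ** 2}

def model_pixels(cx, cy, r):
    """Pixels covered by a candidate circle."""
    return {(x, y) for x in range(W) for y in range(H)
            if (x - cx) ** 2 + (y - cy) ** 2 <= r ** 2}

def jaccard(a, b):
    """Overlap measure: |intersection| / |union|."""
    return len(a & b) / len(a | b) if a | b else 1.0

# Grid search over candidate centres and radii.
score, cx, cy, r = max(
    (jaccard(silhouette, model_pixels(cx, cy, r)), cx, cy, r)
    for cx in range(21, 30)
    for cy in range(18, 27)
    for r in range(5, 12))
# The search recovers the generating circle: (25, 22, 8) with overlap 1.0.
```

In the thesis's actual evaluation, the objective is computed against each participant's recorded silhouette and all five parameters are searched simultaneously; the structure of the procedure is the same.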
F Scores, Precision, and Recall
For quantitative assessment, we use precision-recall plots and F scores, standard measures used in information retrieval (Van Rijsbergen, 1979). This is justified by considering the geometric model as a binary classifier which labels each pixel as occluded or not occluded. In this context, precision is the number of pixels correctly classified as occluded over all pixels classified as occluded. Recall is the number of pixels correctly classified as occluded over all pixels that are actually occluded. Precision is a measure of the model's exactness, whereas recall is a measure of its completeness. As an example, the model parameters can easily be configured to achieve perfect precision (Figure 5-5a) or perfect recall (Figure 5-5c); in both cases, however, the other measure will be low. The challenge is to find a reasonable balance between precision and recall (Figure 5-5b).
Figure 5-5. Illustration of precision and recall: (a) perfect precision, low recall; (b) good precision, good recall; (c) perfect recall, low precision. In (a), perfect precision is achieved because all pixels covered by the model are actually occluded, but recall is low because the model misses many occluded pixels; (b) shows a balance between precision and recall; in (c), perfect recall is achieved because the model covers all pixels that are actually occluded, but precision is low because the model also covers many non-occluded pixels.
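The per-pixel computation, together with the weighted F measure discussed below, can be sketched in Python (a minimal illustration of ours, treating the model and the silhouette as sets of occluded pixels; function names are hypothetical):

```python
def precision_recall(model, actual):
    """Precision and recall of a model's occluded-pixel set against
    the actual silhouette's occluded-pixel set."""
    true_pos = len(model & actual)  # pixels correctly classified as occluded
    precision = true_pos / len(model) if model else 1.0
    recall = true_pos / len(actual) if actual else 1.0
    return precision, recall

def f_score(precision, recall, beta=2.0):
    """Standard weighted F measure (Van Rijsbergen, 1979).
    beta = 2 (the F2 score) weights recall more heavily than precision;
    beta = 1 gives the familiar harmonic mean (F1)."""
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)

# Example: the model covers 8 of 10 actually occluded pixels, plus
# 2 pixels that are not occluded.
actual = {(x, 0) for x in range(10)}
model = {(x, 0) for x in range(2, 12)}
p, r = precision_recall(model, actual)   # p = 0.8, r = 0.8
```

Because the F2 weighting favours recall, `f_score(0.5, 0.9)` scores higher than `f_score(0.9, 0.5)`, matching the design preference for conservative (over-inclusive) occlusion estimates.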
In our case, we would prefer slightly higher recall by including more occluded pixels, even if it means losing some precision. Misclassifying non-occluded pixels as occluded will result in a more conservative design or layout, but misclassifying many pixels that are actually occluded may lead to lower performance since occlusion will still be a problem.

An F score expresses precision and recall in one value. The inherent trade-off between precision and recall is captured by a weight, which can be adjusted to emphasise one measure or the other more strongly. We use a weighting known as the F2 score, which emphasises recall more than twice as much as precision: