University of Nevada, Reno

Non-Visual Natural User Interfaces

A dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Computer Science and Engineering

by

Anthony Morelli

Dr. Eelke Folmer/Dissertation Advisor

December, 2011


Abstract

Natural user interfaces (NUIs) have recently become popular due to characteristics that capitalize on a user's ability to utilize skills acquired through real world experiences. NUIs provide a method of device interaction that may be easier than a traditional graphical user interface because users can interact using gestures and motions that mimic motions used outside of device interaction. One example of NUIs is the recent trend in exergames: video games that require actions such as running in place, moving the arm in a bowling motion, or swinging the arm as if to swing a tennis racquet. Unfortunately, most of the cues given to the player to perform these natural motions are visual cues, which makes it difficult for people who have visual impairments to participate in these activities. This dissertation investigates non-visual natural user interfaces. Three exergames were created to show techniques for using a combination of haptic and audio cues in order to promote physical activity for people who are visually impaired. A method, Real Time Sensory Substitution, was developed which allowed people who are visually impaired to participate in a commercially available exergame by introducing additional haptic cues. The exergames and Real Time Sensory Substitution were effective in promoting physical activity for temporal challenges; however, they lacked the information to assist users in complex spatial challenges. Techniques were developed and compared using proprioception (the body's ability to sense its own position) to assist users in finding targets in one, two, and three dimensions without using any visual cues. Proprioception was also used as a cell phone interface where the phone can convey information to the user without using its graphical display, using the human body as the display mechanism instead. The techniques developed in these studies set the stage for enhanced access to technology without the use of a video display.

Acknowledgments

First, I would like to thank John Foley and Lauren Lieberman. Their assistance and collaboration were very important to this research. I would also like to thank the members of my committee, Sergiu Dascalu, Kostas Bekris, Dwight Egbert, and Nora Constantino. I would also like to thank the kids at Camp Abilities for their inspiring way of life. The Northern Nevada Chapter of the National Federation of the Blind has been a huge help with testing our software. Finally, I would like to thank my advisor, Eelke Folmer, whose guidance has made this research possible. November 30, 2011

Contents

Abstract

List of Tables

List of Figures

1 Introduction
  1.1 Background and Related Work
  1.2 NV NUI Overview

2 Exergames
  2.1 Background and Related Work
  2.2 VI Tennis Methodology
    2.2.1 Wii Tennis Gameplay & Feedback
    2.2.2 Gameplay
    2.2.3 Feedback
  2.3 VI Tennis User Study
    2.3.1 Participants
    2.3.2 Instrumentation
    2.3.3 Experimental Trial
    2.3.4 Results
    2.3.5 Qualitative Analysis
  2.4 VI Tennis Discussion
    2.4.1 Sensory Substitution
    2.4.2 Active Energy Expenditure
  2.5 VI Tennis - Future Work
    2.5.1 Whole Body Exercise
    2.5.2 Sensorimotor Skills
    2.5.3 Motor Learning
    2.5.4 Barriers to Physical Activity
  2.6 VI Tennis Conclusion
  2.7 VI Bowling Design
    2.7.1 Controls
    2.7.2 Sensory Substitution

    2.7.3 Tactile Dowsing
  2.8 VI Bowling User Study
    2.8.1 Participants
    2.8.2 Instrumentation & Experimental Trial
    2.8.3 Results
    2.8.4 Qualitative Analysis
  2.9 VI Bowling Discussion and Future Work
    2.9.1 Active Energy Expenditure
    2.9.2 Tactile Dowsing Based Motor Learning
    2.9.3 Temporal-Spatial Challenges
    2.9.4 Barriers to Physical Activity
  2.10 VI Bowling Conclusion
  2.11 Pet-N-Punch Game Design
    2.11.1 Game Play
    2.11.2 Technical Implementation
  2.12 Pet-N-Punch User Study
    2.12.1 Participants
    2.12.2 Physical Activity Measurement
  2.13 Pet-N-Punch Results
    2.13.1 Error Rates
    2.13.2 Physical Activity
    2.13.3 Player Survey Results
  2.14 Pet-N-Punch Discussion
    2.14.1 Visual Observations
    2.14.2 Physical Activity
    2.14.3 Accuracy
    2.14.4 Maximizing Results
    2.14.5 Socialization
  2.15 Pet-N-Punch Future Work
    2.15.1 Higher Activity Intensity
    2.15.2 Health Benefits
    2.15.3 Socialization
  2.16 Pet-N-Punch Conclusion

3 Real Time Sensory Substitution
  3.1 Introduction
  3.2 Background and Related Work
  3.3 Real Time Sensory Substitution
    3.3.1 How it Works
    3.3.2 Runtime Video Analysis
  3.4 User Study 1 - Sighted Players
    3.4.1 User Study 2 - Players with Visual Impairments

  3.5 Results
    3.5.1 Sighted Player Performance Results
    3.5.2 Player Performance Results - Players with VI
  3.6 Discussion
    3.6.1 Limitations
  3.7 Future Work
  3.8 Conclusion

4 Proprioceptive Displays
  4.1 Introduction
  4.2 Discrete Proprioceptive Display Background
  4.3 Discrete Proprioceptive Display
    4.3.1 Scanning
    4.3.2 Auto-Semaphoring
  4.4 Twist-N-Lock
  4.5 User Studies
  4.6 Discrete Proprioceptive Display Discussion and Future Work
  4.7 Discrete Proprioceptive Display Conclusion
  4.8 2D Target Selection - Background and Related Work
  4.9 Tactile-Proprioceptive Displays
    4.9.1 Information Space
    4.9.2 Gesture Based Interaction
  4.10 2D Target Acquisition Study 1: Target Acquisition
    4.10.1 Instrumentation
    4.10.2 Participants
    4.10.3 Procedure
    4.10.4 Results
  4.11 2D Target Acquisition Study 2: Performing Directed Gestures
    4.11.1 Instrumentation
    4.11.2 Participants
    4.11.3 Procedure
    4.11.4 Results
  4.12 2D Target Acquisition Discussion
  4.13 2D Target Acquisition Future Work
  4.14 2D Target Acquisition Conclusion
  4.15 3D Target Acquisition Related Work
  4.16 3D Target Selection Prior Work and Motivation
  4.17 3D Scanning
  4.18 3D Target Selection Methods
    4.18.1 Instrumentation
    4.18.2 Participants
    4.18.3 Procedure
  4.19 3D Target Selection Results

  4.20 3D Target Selection Discussion and Future Work

5 Conclusions and Future Work

Bibliography

List of Tables

2.1 Participants' characteristics
2.2 Average Active Energy Expenditure Kcal/Kg/Min
2.3 Total time spent in MVPA
2.4 Participants' characteristics and results
2.5 Participants' characteristics

3.1 Kinect Hurdles combined results for both studies
4.1 Results for eight orientations
4.2 Average Aiming Error in Euler Angles
4.3 Mean corrected search time (and stdev) for each axis (mm/ms)

List of Figures

2.1 Children who are blind playing Pet-N-Punch.
2.2 Wii Sports Tennis: (left) Player serving the ball; (right) Player returning the ball.
2.3 Primary and secondary audio, visual and tactile cues in Wii Tennis (top) and the resulting primary and secondary cues implemented in VI Tennis (bottom).
2.4 Children who are blind playing VI Tennis.
2.5 Computer controlled player level (CCPL) for tactile/audio and audio for both trials over time.
2.6 Wii Sports Bowling: The red line (i) indicates the direction in which the ball will be thrown and users can shift this direction using the arrow keys on their controller. (ii) shows the grouping of the pins and (iii) shows the current score.
2.7 Sensory substitution map which shows the primary and secondary cues for audio, visual and tactile modalities in Wii Bowling (top) and VI Bowling (bottom). Dashed events indicate alternative cue events.
2.8 Tactile dowsing: the player moves the Wii remote in the horizontal plane (left); the closer the Wii remote points to the target direction, the more continuous the perception of vibrotactile feedback will feel (right).
2.9 Combined graph showing average dowsing time and average number of pins hit per frame.
2.10 User with visual impairment performing dowsing (left); and throwing (right) in VI Bowling.
2.11 Delay/Response Time vs. Level.
2.12 Level vs. Errors.
2.13 Dominant Arm Error Types vs. Level.
2.14 Two Arm Error Types vs. Level.
2.15 Average Activity Intensity
2.16 Percent Heart Rate Increase Over Resting Rate

3.1 A legally blind player (right) playing the Kinect Hurdles game (left) where visual cues that indicate when to jump are detected using real time video analysis and substituted with vibrotactile cues that are provided with a handheld controller.

3.2 Individuals who are blind playing VI Tennis (left); and VI Bowling (right).
3.3 RTSS System Setup.
3.4 Runtime video analysis of the Kinect Hurdles game. The yellow box indicates the area in which we look for the visual cue that indicates to the player to jump.
3.5 Sample XML Configuration File
3.6 Kinect Sports Javelin throw. The area within the defined box turns yellow indicating the player must throw their javelin.
3.7 Jump Accuracy

4.1 Users rotating a smartphone to find one of six disjunct targets in space [UP, DOWN, FORWARD, BACK, LEFT, RIGHT] that are rendered using vibrotactile feedback.
4.2 (Left) orientation is outside the tactile window. (Right) orientation is inside the tactile window and vibrotactile feedback is provided.
4.3 Average scanning time for 6 orientations
4.4 Average scanning time per orientation
4.5 Average scanning time for 4 orientations
4.6 Average scanning time per orientation
4.7 User scans for a target in a plane using a handheld controller with the target location rendered using vibrotactile feedback. Direction of the target is conveyed using proprioception, upon which the user performs a directed gesture towards the target (right).
4.8 Search strategies for linear scanning (left) and multilinear scanning (right)
4.9 Search time plotted against index of difficulty (distance to target) for both scanning techniques. Upper figures represent all the data and the lower figures have 30% of the least representative items filtered out.
4.10 Average search time plotted against target location
4.11 Distance from target prior to performing gesture plotted against the target distance from center.
4.12 Directional vibrotactile feedback guides the user to move their controller to the 3D target. Green frustum indicates the display size.
4.13 Left: multilinear scanning, where the user moves the controller along the X-, Y- and Z-axis as indicated using directional vibrotactile feedback. Right: projected scanning, where the user first rotates the controller along its X- and Y-axis to point it at the target, upon which the user moves along the projected axis to select the target.
4.14 Typical scanning strategies for multilinear scanning (left) and projected scanning (right).

Chapter 1

Introduction

Natural user interfaces (NUIs) have become increasingly popular as they capitalize on the innate abilities that users have acquired through interactions with the real world. NUIs define novel input modalities, such as touch, gestures, motions and speech, that model natural human interactions with the goal of getting intermediating hardware, such as a keyboard and pointing device, "out of the way" [30] so as to facilitate an invisible and -presumably- non-impeding interface that users may perceive as more intuitive and natural to use.

Gestural interfaces, such as multi-touch, have become the de facto standard for mobile interaction, but these are still firmly rooted in the domain of the graphical user interface (GUI), as gestures directly manipulate on-screen content. For NUIs to truly move beyond the confines of traditional desktop and GUI based environments, one can question why graphical displays are still a part of this equation, as NUI designers can draw from a myriad of interaction options that are only constrained by the physical capabilities of the human body.

There are several contexts where the use of graphical displays is not feasible or even possible. More and more people are adopting mobile devices, such as smartphones or tablets, for their computing needs. The portability of mobile devices not only curbs screen real estate but -more importantly- means users try to interact with their mobile device while they are active, when the use of a screen may severely impede their safety, for example, when they are driving or walking. Users with visual impairments cannot use a graphical display at all, and the emergence of NUIs is raising new barriers for them.

One specific use of non-visual NUIs could be to assist those with visual impairments (VI) in exercising. It has been suggested that individuals with VI do not have the same opportunities to exercise as the general population due to the barriers to physical activity they face, such as social barriers (e.g., the need for an exercise partner) [95], safety barriers (e.g., concerns from teachers, parents, and loved ones) [60], and self-imposed barriers (e.g., not knowing what to do or fear of being ridiculed) [96], and thus are at a greater risk of developing serious health problems [60]. Although these barriers exist, people with VI can exercise through open and closed adapted sports. Open sports are those where the variables in the sport change often, such as beep baseball or goalball. Closed sports are those where variables remain constant, such as guided running or running on a treadmill. Studies have shown that people with VI prefer open sports to closed sports [62]; however, open sports are more difficult to make accessible due to the number of variables in the environment.

Although video games have been identified as a contributing factor to obesity, a new genre of video games called exergames has been found to stimulate greater energy expenditure than playing sedentary video games [94]. Exergames are games that require large motions as input, often mimicking a real world activity such as swinging a tennis racquet or running in place.
Our research identified that exergames may have some unique properties that could allow individuals with VI to overcome some of the barriers to physical activity that they face because: (1) exergames can lead to greater independence as they do not require an exercise partner or sighted guide to be present; (2) exergames may be safer to perform than existing physical activities as they are performed in place; and (3) being able to play the same games as their sighted peers and family could increase socialization. Non-visual natural interfaces could allow a person with VI to play exergames and potentially live a more active lifestyle.

In addition to exergames, non-visual natural user interfaces may be able to control cell phones or other electronic devices in an ear-free and eye-free manner. Using proprioception (the ability to sense the position of one's body), the user's body can be used as an output mechanism to inform the user of information traditionally revealed by visual means. Proprioception has been used as an input device: in Skinput [43], an armband containing an array of sensors analyzed mechanical vibrations from finger taps on the arm or hand, and these taps were used as a method to provide input to an electronic device. This research demonstrates that proprioception can also extend the output of an electronic device by using the human body to represent information normally found on a visual display.

1.1 Background and Related Work

This research focuses on the use of non-visual natural user interfaces in two different areas: exergames and proprioceptive displays. The first section of this research analyzes their use in exergames. Prior to this work, no exergames existed for people who are blind or visually impaired, although several video games had been created or modified for users with VI. These games primarily use additional sounds to represent the necessary visuals in a different modality. Some examples of video games modified or created for people with VI are described below.

AudiOdyssey [38] is a music game in which the player creates complex musical tracks. Players use a Wii Remote to respond to audio instructions by making gestures. Blind Hero [102] makes use of additional haptic cues instead of additional audio cues; haptic cues refer to the sense of touch. In Blind Hero, players play by feeling vibrations directed to the finger that needs to press the guitar buttons. Haptics are a good choice because many games already rely on audio cues, and adding more sounds could either make the game difficult to play because the additional sounds are lost in the midst of the original sounds, or make the original sounds, which in some cases are vitally important to the gameplay experience, harder to perceive.

To assess the feasibility of exergames as a viable health intervention method that could aid people with VI in overcoming barriers to physical activity, exergames need to be made accessible to players with VI. Existing exergames rely predominantly on visual stimuli to provide information about what input to provide and when. Modifying commercial exergames is difficult as the source code is proprietary and unavailable. Thus, the VI Fit platform (www.vifit.org) was developed to make it easier to create exergames directed at people with VI. VI Fit is an open source project: all games built on this platform are free to download and use, and modifying existing games or creating new games is encouraged. VI Fit runs on a Windows PC and contains support for Wii remotes, which are inexpensive motion sensing controllers, to capture the body movements required by the games. Wii remotes are readily available and contain methods to communicate directly to the player through non-visual cues: audio through the built-in speaker and haptics through the rumble capability. Games created for the VI Fit platform and modeled after commercial titles must go through a process known as sensory substitution [19]: the process of converting required visuals to haptic and/or audio cues while trying not to affect the fun factor of the original game. The exergames created for this research were all created using the VI Fit platform and are available for free download.

The exergames created on the VI Fit platform contained mainly temporal challenges. For example, a player needed to swing a tennis racquet at the correct time; the location of the swing did not matter. Most games such as tennis contain both spatial and temporal challenges, i.e., where to swing and when to swing. In order to add these additional spatial challenges, a method for communicating spatial information to the user in both two and three dimensions needed to be developed. The method used was proprioception.
The second use of non-visual natural user interfaces investigated in this research is their use in proprioceptive displays. Proprioception is the body's ability to sense its position. For example, if a person raises his hand over his head, his sense of proprioception tells him that his hand is up in the air. Leveraging proprioception to locate targets in a horizontal plane as an output technique was recently explored in the following approaches.

Sweep-Shake [90] is a mobile phone application that can point out geo-located information. The phone's compass and GPS are used to determine the user's location and the direction in which the phone is pointing. Directional vibrotactile feedback conveys the location of an object of interest. A study with four sighted users found that they were able to find targets in a 360° circle around the user in 16.5 seconds on average.

Magnusson [71] evaluates a system similar to Sweep-Shake [90], where a non-directional audio cue indicates whether the user is pointing the phone within a window that contains a beacon that users must physically approach. Vibrotactile feedback is increased as the user gets closer to the beacon. Different sized target windows were evaluated with 15 sighted users, where a window of size 30° to 60° was found to be most efficient. PointNav [70] is an extension of the previous system, but modified so as to provide a 50ms non-directional vibrotactile cue when the phone points within a 30° window of the object of interest.

Ahmaniemi [13] explored finding targets using a mobile device that consists of a high precision inertial tracker (gyroscope, compass and accelerometer) and a C2 vibrotactor. Two types of vibrotactile cues were explored for rendering targets: (1) an on-target cue (260Hz sine wave mixed with a 30Hz envelope signal); and (2) a directional cue using a tactile window of 10° around the target (using the same on-target cue where the frequency and amplitude of the envelope shape increase linearly). Targets were rendered randomly at eight different locations on a 90° horizontal line with varying widths. A user study with eight sighted users found that users were able to find targets on average in 1.8 seconds using a scanning velocity of 45.1°/s. No significant difference was found between the vibrotactile feedback conditions for efficiency and target size, though smaller targets took longer to find than larger targets. Target sizes larger than 15° were most effective. Directional vibrotactile cues are more efficient than non-directional cues when the target distance is furthest, but they negatively affect finding targets that are close. They also make it harder to distinguish targets that are close to each other, as distinguishing the edges of a target becomes harder.
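The directional rendering these systems share can be summarized in a short sketch. The Python fragment below is a minimal illustration of a tactile window, not code from any of the cited systems; the window size, pulse-rate mapping, and function names are assumptions made for illustration.

```python
TACTILE_WINDOW = 15.0  # assumed half-width of the tactile window, in degrees

def angular_error(pointing_deg, target_deg):
    """Smallest signed angle between the device heading and the target."""
    return (target_deg - pointing_deg + 180.0) % 360.0 - 180.0

def pulse_rate(pointing_deg, target_deg):
    """Map angular error to a vibrotactile pulse rate (pulses per second).
    Outside the tactile window no feedback is given; inside it, the closer
    the device points to the target, the more continuous the feedback feels."""
    error = abs(angular_error(pointing_deg, target_deg))
    if error > TACTILE_WINDOW:
        return 0.0
    return 2.0 + (1.0 - error / TACTILE_WINDOW) * 18.0  # 2 Hz at the edge, 20 Hz on target

# Example: sweeping the device toward a target at 40 degrees
for heading in (0, 20, 30, 38, 40):
    print(heading, "->", round(pulse_rate(heading, 40.0), 1), "pulses/s")
```

The trade-off noted above falls out of this mapping: a wider window makes distant targets easier to pick up during a sweep, but blurs the edges of targets that lie close together.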

1.2 NV NUI Overview

This dissertation is organized as follows. Chapter 2 investigates the use of non-visual natural user interfaces as a method for individuals with visual impairments to exercise. Chapter 3 demonstrates how non-visual techniques can be used to provide an audio and haptic closed-captioning style extension which can make commercial video games accessible to people with visual impairments. Chapter 4 demonstrates how non-visual cues can exploit the sense of proprioception and allow people to search for and find targets in one, two, and three dimensions as well as to use their body as a display device in cell phone interaction.

The NV NUI research seeks to answer the following questions:

(Q1) Can exergames be created using non-visual modalities?
(Q2) Does a combination of audio/haptic cues result in better performance when compared to audio cues alone?
(Q3) Can non-visual modalities be used to orient a person in a proper direction?
(Q4) Do exergames utilizing both arms provide more energy expenditure than games using the dominant arm?
(Q5) Are exergames utilizing both arms significantly more error prone than games utilizing the dominant arm?
(Q6) Can exergames using non-visual modalities provide a fun gaming experience?
(Q7) Can a commercially available exergame be enhanced with an external device such that a person with VI can play the same game as their sighted peers?
(Q8) Can an off-the-shelf cell phone be used as a proprioceptive display?
(Q9) Is multilinear scanning or linear scanning faster when searching for a target in 2D space using non-visual cues?
(Q10) Is multilinear scanning or projected scanning faster when searching for a target in 3D space using non-visual cues?

This research has resulted in the following publications:

(1) Tony Morelli. Haptic/audio based exergaming for visually impaired individuals. ACM SIGACCESS Accessibility and Computing, Issue 96 (January 2010), Pages 50-53. [Chapter 2]

(2) John Foley, Eelke Folmer, Tony Morelli, Meghan Morningstar, Nicole Corcoran, Lauren Lieberman. Comparison of Vibrotactile/Audio and Audio Cues While Playing an Exergame for Users Who Are Blind. 57th Annual Meeting and inaugural World Congress on Exercise is Medicine of the American College of Sports Medicine, 2010. [Chapter 2]

(3) Tony Morelli and Eelke Folmer. Whole Body Exergaming for Users who are Visually Impaired (Workshop). ACM CHI 2010. [Chapter 2]

(4) Tony Morelli, John Foley, Luis Columna, Lauren Lieberman and Eelke Folmer. VI Tennis: a Vibrotactile/Audio Exergame for Users who are Visually Impaired. Proceedings of Foundations of Digital Games 2010, Pages 147-154, Monterey, California, June 2010. [Chapter 2.2]

(5) Tony Morelli, John Foley and Eelke Folmer. VI Bowling: A Tactile Spatial Exergame for Individuals with Visual Impairments. Proceedings of the 12th International ACM SIGACCESS Conference on Computers and Accessibility, Pages 179-186, Orlando, Florida, October 2010. [Chapter 2.7]

(6) Tony Morelli, John Foley, Lauren Lieberman and Eelke Folmer. Pet-N-Punch: Upper Body Tactile/Audio Exergame to Engage Children with Visual Impairments into Physical Activity. Proceedings of Graphics Interface 2011, Pages 223-230, St. John's, Newfoundland, Canada, May 2011. [Chapter 2.11]

(7) Tony Morelli and Eelke Folmer. Real-time Sensory Substitution to Enable Players who are Blind to Play Gesture based Videogames. Proceedings of Foundations of Digital Games 2011, Pages 147-153, Bordeaux, France, June 2011. [Chapter 3]

(8) Tony Morelli, John Foley, Lauren Lieberman and Eelke Folmer. Improving the lives of youth with VI through exergames. INSIGHT: Research and Practice in Visual Impairment and Blindness, accepted for publication August 2011. Allen Press. [Chapter 2]

(9) Eelke Folmer and Tony Morelli. A Non-Visual NUI: 2D Target Acquisition and Performing Directed Gestures using a Tactile-Proprioceptive Display. Proceedings of the Sixth International Conference on Tangible, Embedded, and Embodied Interaction, To Appear, Kingston, ON, Canada, February 2012. [Chapter 4.8]

(10) Tony Morelli and Eelke Folmer. A Discrete Tactile-Proprioceptive Display for Eye and Ear Free Output on Mobile Devices. Proceedings of the IEEE Haptics Symposium 2012, To Appear, Vancouver, Canada, March 2012. [Chapter 4.2]

Chapter 2

Exergames

2.1 Background and Related Work

Video games have been identified as a contributing factor to children's increasingly sedentary behavior and associated higher levels of obesity [99]; however, a new genre of video games, called exergames, has the potential to turn couch potatoes into jumping beans [79]. Exergames are video games that use upper and/or lower-body gestures, such as steps, punches, and kicks, to provide their players with an immersive experience that engages them in physical activity and gross motor skill development [58]. Studies with exergames show that they stimulate greater energy expenditure than playing sedentary video games [40, 53, 54]. Exergames vary significantly in the amount of physical activity that is required to play them, and a recent meta analysis [22] of exergame studies found that exergames that involve whole-body movements, such as dance-based games, yield significantly higher energy expenditures than exergames that only involve dominant upper limb movements, such as Wii games [3]. Because exergames simulate real physical activities [23], they involve spatial-temporal challenges that rely upon the visual sense [63]. The motions that players need to provide, such as kicks, punches, steps or swings, and when to provide them, are typically indicated in the game with visual cues. Players who are blind are unable to see these cues and are therefore unable to play exergames.

Figure 2.1: Children who are blind playing Pet-N-Punch.

When compared with regular physical activities, exergames have some attractive properties for individuals with visual impairments because: (1) exergames can be played independently; and (2) exergames are performed in place, which significantly minimizes the risk of injury.

Exergames are strongly related to video games; however, whereas video games typically only involve fine motor skills of the hands, exergames involve gross motor skills [11] such as running, kicking, hitting and throwing that involve whole body motions. Popular commercial exergames include: (1) Konami's Dance Dance Revolution, in which players match visual cues that scroll on screen to positions on a pressure sensitive dance pad; (2) Nintendo's Wii Sports, in which players emulate playing tennis, bowling, boxing, golf, and baseball through gesture recognition using a handheld motion sensitive controller called a Wii remote; and (3) Sony's EyeToy Kinetic, which superimposes animated objects to be punched and kicked over a video image of the player captured with a camera. Though exergames have been criticized for yielding energy expenditures not as high as traditional forms of exercise, recent studies point out that exergames can achieve moderate-to-vigorous physical activity (MVPA) [94, 32], the amount of physical activity that yields health benefits.

Exergames are often classified by the technology that is used to recognize motion, such as cameras, accelerometers, and custom controllers such as dance mats, exercise bikes, and heart rate monitors. Exergames are distinguished by the type of gameplay:

• Pattern-matching exergames such as Dance Dance Revolution and EyeToy Kinetic require the player to match lower or upper body movements, such as dance steps or punches, to visual cues [84]. Music plays an important role in these games as a round of gameplay is tied to a song and patterns in the music are used to facilitate pattern matching.

• Sports based exergames, such as Wii Sports, emulate playing sports using motions (swinging, punching) that resemble the way these sports are played.

• Pervasive exergames such as Fish n Steps [66] decouple the physical activity provided by the player from directly manipulating something in a game; instead, they accumulate the physical activity and allow the player to expend it in the game at a later time. Typically such games are played competitively.

Playing video games is a challenge for players who are visually impaired as video games provide predominantly visual feedback that the player must interpret to determine what input to provide [103]. Though games may provide audio or tactile cues, these generally do not contain sufficient information to determine what input to provide and when [103]. Because games are typically entirely visual, they lack any textual representation that can be read with assistive technology such as a screen reader or tactile display. These limitations affect players with the most severe forms of visual impairment (legally blind and totally blind) more than players who are partially sighted or who have low vision. For these players it may still be possible to play the game on a large display or to make the game accessible using operating system supported accessibility features such as a magnifier or high contrast color schemes.

Players who are legally blind or have low vision may be able to play existing commercially available exergames. Increasing the contrast, or displaying the game on a large projected screen, may provide enough of a visual that a player with low vision would be able to sense the visual cues and perform the desired actions. However, a player with no vision cannot rely on visuals at all. Exergames also contain sounds relevant to the gameplay, but these are typically played as a result of something that has happened, not as a cue to the player to do something.

Video games have been adapted for people with visual impairments. Typically these games replace the necessary visuals with audio. Sound effects can supplement the graphics, but they can also be enhanced to provide more detailed spatial audio to assist those with VI. Using the left/right/front/back speakers can enhance the virtual representation such that a person with VI can picture the state of the game (a minimal sketch of such stereo panning is given after the game list below). In addition to spatial cues, game state cues can be represented as additional audio cues, as shown in Battleship SV [93], a Battleship game modified for players who are visually impaired. This game utilized speech synthesis to provide spoken information to the user about the results of each turn and in-game navigation. An audio technique called sonification uses earcons or sound radar to assist those with VI, as demonstrated in the game AudioQuake [17].

Substituting visuals with audio can be invasive, as exergames are often played in social contexts [78] where players are playing with friends or family, and the reliance on audio can interfere with socialization. Also, music is present in most exergames [46] and adding additional sounds might interfere with the music [102]. Removing the music or any other original sounds may subtract from the gameplay experience. Games such as Blind Hero [102] and Rock Vibe [15] replace visuals with vibrations in adapted versions of Guitar Hero and Rock Band.

In the past decade there has been an active movement to create games that can be played without visual feedback [103], for example using audio feedback. An extensive overview of such games can be found at http://audiogames.net. No exergames exist that can be played by players who are blind, though we identified four games that have elements of gameplay that are closely related to exergames:

• Finger Dance [73] is a music game that uses the pattern matching found in Dance Dance Revolution. Finger Dance uses four different audio cues that the player must match by pressing the corresponding keys on the keyboard in sequence with the music that is playing.

• AudiOdyssey [38] is a music game in which players receive audio instructions and provide gestures with a Wii Remote, to create and record musical beats. The player can then layer these recordings to create complex musical tracks.

• Blind Hero [102] is an accessible version of Guitar Hero, a pattern matching music game in which players emulate the playing of rock music using a guitar shaped controller. Blind Hero uses vibrotactile cues provided with a haptic glove to overcome the limited ability to use audio due to the presence of music.

• Rock Vibe [15] is an accessible version of Rock Band, a pattern matching music game. Rock Vibe allows for playing drums using tactile cues provided to the arms.
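As a small illustration of the spatial-audio approach mentioned before this list, the sketch below maps a sound source's horizontal position to left/right channel gains using constant-power panning. This is a generic technique, not code from any of the games above; the function names and the court metaphor are assumptions for illustration.

```python
import math

def pan_gains(x):
    """Constant-power stereo panning: x in [-1.0 (left), 1.0 (right)]
    returns (left_gain, right_gain) so perceived loudness stays constant."""
    angle = (x + 1.0) * math.pi / 4.0  # map [-1, 1] onto [0, pi/2]
    return math.cos(angle), math.sin(angle)

# Example: a ball moving from the left edge to the right edge of the court
for x in (-1.0, -0.5, 0.0, 0.5, 1.0):
    left, right = pan_gains(x)
    print(f"x={x:+.1f}  L={left:.2f}  R={right:.2f}")
```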

A survey [103] of strategies used in games accessible to players who are blind revealed that most games use a combination of audio techniques, ranging from speech and audio cues to sonification based techniques, e.g., the modulation of acoustic properties such as volume, frequency or timbre. Only recently has the use of vibrotactile feedback been successfully explored, for a memory game [87] and for music games [102, 15]. Exergames and video games have significant differences, and it is important to elicit these in order to understand what strategies are feasible to employ to make exergames accessible to players who are visually impaired:

• Music plays a dominant role in pattern matching based exergames [84], but its presence limits the use of audio based feedback, as this may interfere with the music and players may find this to be detrimental to the game experience [102].

• Socialization is an important aspect of exergaming [21]. Exergames are often played with friends or family. Being able to talk with other players is difficult when players must focus on interpreting audio-based forms of feedback such as audio cues.

• Moderate exercise has a facilitating effect on sensory and motor processes [83]. This effect could be exploited to facilitate exergaming using non-visual forms of feedback. However, studies with players performing choice response tasks while exercising show significantly higher error rates when using audio cues [18]. As an exergame is a form of choice response task, this may indicate that the use of audio cues could be detrimental to players' performance.

There is considerable evidence that overweight and obesity rates are higher among persons with disabilities than among the general population [26]. Compared to children and adolescents in other disability groups, those with visual impairments have been identified as the most inactive, with 39% classified as sedentary and only 27% classified as active [68].

Individuals with visual impairments do not have the same opportunities as their sighted peers to participate in physical activities that yield adequate fitness and a healthy standard of living [61]. Children with visual impairments have limited access to physical education, recreation, and athletic programs because of: (1) limited social opportunities, such as a lack of exercise partners or sighted guides with whom to exercise [95]; (2) fear of injury while exercising [61] and safety concerns of parents and teachers [67]; and (3) self barriers, such as fear of being made fun of while exercising [80], and a general lack of exercise opportunities [62].

People with visual impairments (VI) can exercise by participating in adapted physical activities. Physical activity is typically a combination of eye-body (sensorimotor) and muscle (cerebellar) control [27]. Tennis and baseball are typically sensorimotor based activities as they require the player to sense the state of the game (i.e., the position of the ball and the opponents) and use that information to properly perform an action (swing and hit the ball to the proper location). These sports are also known as open sports by adapted sports researchers [65]. Open sports contain many variables that change often within the game; tennis, for example, involves the positions of the ball and the opponent. Running on a treadmill and cycling on a stationary bicycle are primarily cerebellar as the muscle movements are constant. Also known as closed sports, these activities do not contain many variables that change often. A person with VI has difficulty participating in most sensorimotor activities as they rely heavily on vision. Activities such as cycling or running can be performed by a person with VI; however, this usually requires a sighted guide (as in assisted running) or a sighted pilot (as in tandem cycling) to perform the sensorimotor piece of the exercise while the person with VI performs the cerebellar activity (running or pedaling). Few unaided sensorimotor activities exist for those with VI. Games such as beep baseball and goalball utilize special balls and equipment that emit sounds in order to substitute for the missing visuals and rely on ear-body coordination. Studies have shown that people with VI prefer adapted sensorimotor activities to cerebellar activities [62]; however, they are more difficult to make accessible. The ability to participate in sensorimotor activities contributes to normalization [81].

Exergames rely on non-traditional input devices. Exergames such as Dance Dance Revolution utilize dance mats, where a large mat is placed on the floor and players must place their feet on symbols on that mat which match the corresponding symbols displayed on the screen. Wii Fit utilizes motion sensing controllers, and players must mimic the motions displayed on the screen by their virtual character. Xbox Kinect utilizes a video camera that places the actual player in the middle of the game, and it is the player's responsibility to move his body in such a way as to interact with the virtual objects on the screen by kicking or punching.
In order to assess the feasibility of exergames as an exercise mechanism for people with VI, three new games were created from scratch. The games include VI Tennis, VI Bowling and Pet-N-Punch. All three games and their associated user studies are described throughout the rest of this chapter.

2.2 VI Tennis Methodology

The active energy expenditure that a player can achieve with an exergame is directly related to the gameplay experience [23]. Consequently, an exergame that is not fun to play is unlikely to engage the player in physical activity for long periods of time.

Exergames differ in their types of stimuli, rules, and behavioral requirements - factors that contribute to the game experience [11]. These properties are intrinsically determined by the nature of the exergame, such as the sport or activity it simulates, but also by reinforcement mechanisms such as rewards, points, and positive visual and audio feedback [11]. To insulate our research on developing an accessible interface for exergames from such intricate dependencies, we modified an existing exergame rather than developing a new accessible exergame with unproven gameplay. Wii Sports is a popular exergame that emulates playing tennis, bowling, boxing, golf, and baseball through a handheld motion sensitive controller called a Wii Remote. The Wii remote is an inexpensive controller that, in addition to its pointing and motion-sensing abilities, includes a speaker, a rumble (vibrotactile) feature, and an expansion port for additional input devices. Using a handheld Wii remote, players play each Wii Sports game using motions similar to those with which the simulated sport is played; for example, in Wii Baseball, players swing their arms holding the Wii Remote like a baseball bat. Studies with adolescents playing Wii Sports games [40] show that players' energy expenditure is significantly higher than when playing sedentary video games. Of the five Wii Sports games, the tennis game achieves the highest energy expenditure. Though other exergames such as Dance Dance Revolution have yielded higher levels of energy expenditure in studies [32], these games typically involve whole body motions and/or pattern matching. Rather than solving three problems at the same time, we limit our research to exploring non-visual forms of feedback that allow a user who is blind to engage in physical activity using upper limb motion. This type of feedback could provide the basis for developing exergames that allow their players to engage in higher levels of activity using pattern matching or whole body motions.

2.2.1 Wii Tennis Gameplay & Feedback

Wii Sports tennis (Wii Tennis) is played as follows. Players participate in a game of doubles, where four tennis players are visible on the screen (see Figure 2.2). Up to four players can play this game, where each player controls a tennis player. One or two players can also team up against computer controlled players. The player simulates hitting a tennis ball by swinging their Wii remote, similar to swinging a tennis racket, at the appropriate time. Players either serve or return the ball depending on whose turn it is to serve (see Figure 2.2, left and right). Wii Tennis registers forehands, backhands, volleys, lobs, slices, spin and power depending on how fast the user swings and at what angle. Though players can aim the ball in a particular direction, Wii Tennis does not offer a spatial challenge but only a temporal challenge, as Wii Tennis automatically moves players into position and players only control the swinging of the racket.

Figure 2.2: Wii Sports Tennis: (left) Player serving the ball; (right) Player returning the ball.

Wii Tennis provides visual, audio and tactile feedback. For example, the speed and distance of the tennis ball are displayed; players hear the sound of the ball bouncing and feel vibrotactile feedback after they successfully hit the ball. Two types of feedback [103] are distinguished: (1) primary cues require the player to respond in a certain way as they indicate what to do and when; for example, the visualization of an approaching ball indicates what to do (prepare to return the ball) and when (when the ball is close to you); and (2) secondary cues, such as reinforcement feedback [11], indicate whether the player's provided response was correct, such as a cheering crowd after scoring a point or a vibrotactile buzz after hitting the ball successfully. When players learn to play a game, a cognitive model of the game is created, where in-game actions are mapped to preceding cues. To be able to play a game successfully one must: (1) have a mental model of how to play the game; and (2) be able to perceive primary cues [103]. Cues can further be discrete, such as the sound of the ball bouncing, or continuous, such as the visualization of the ball. Figure 2.3 (top) shows an event graph with all cues that Wii Tennis provides.

2.2.2 Gameplay

Because Wii Sports itself cannot be modified, we created a PC game called Visually Impaired (VI) Tennis using Microsoft's XNA framework, which communicates with a Wii remote over Bluetooth. The gameplay of VI Tennis is modeled after Wii Tennis. Players participate in a game of singles where each tennis player is positioned at the baseline. Players can use forehand or backhand strokes, but no volleys, slices, spins or lobs. To keep things simple, the tennis ball moves at a constant speed and no direction can be given to it. Players can play against another player or against a computer-controlled opponent. Similar to Wii Tennis, VI Tennis implements dynamic difficulty adjustment (DDA) [31] to keep the ability of the computer-controlled player at the same level as the player, which accommodates varying abilities, as players who are blind may never have played a video game or tennis before. DDA adjusts difficulty every 5 points: if the difference in score is greater than 3, the ability of the computer-controlled player is either increased or decreased. The level of the computer-controlled player is set between 1 and 9 (starting at 6) and represents the probability out of 10 that the computer will return the ball. VI Tennis further implements a tally scoring system, which is easier to understand. Players can only score points when serving the ball, and if the player does not swing, or swings too late, the player loses a point. Players are not penalized for swinging too early, i.e., if players swing too early, they can still swing again as long as VI Tennis detects a swing in the allotted time frame. The sensitivity and timing of the controls of VI Tennis are based on Wii Tennis.

Figure 2.3: Primary and secondary audio, visual and tactile cues in Wii Tennis (top) and the resulting primary and secondary cues implemented in VI Tennis (bottom).
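The DDA rule described above is simple enough to reconstruct in a few lines. The sketch below is a minimal Python rendering of the stated rules (the actual game was written in C# on the XNA framework); the function and variable names are ours, not the game's, and the check for "every 5 points" is one plausible interpretation.

```python
import random

CCPL_MIN, CCPL_MAX, CCPL_START = 1, 9, 6

def adjust_ccpl(ccpl, player_score, computer_score):
    """Dynamic difficulty adjustment as described for VI Tennis: every 5
    points, if the score difference exceeds 3, the computer-controlled
    player's level (CCPL) is raised or lowered, clamped to [1, 9]."""
    total = player_score + computer_score
    if total > 0 and total % 5 == 0:
        diff = player_score - computer_score
        if diff > 3:
            ccpl = min(CCPL_MAX, ccpl + 1)  # player is ahead: strengthen the opponent
        elif diff < -3:
            ccpl = max(CCPL_MIN, ccpl - 1)  # player is behind: weaken the opponent
    return ccpl

def computer_returns_ball(ccpl):
    """The CCPL is the probability out of 10 that the computer returns the ball."""
    return random.randint(1, 10) <= ccpl
```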

2.2.3 Feedback

Figure 2.3 (top) shows all primary and secondary visual, tactile and audio cues involved in the sequence of: (1) the player serving; (2) the opponent returning; and (3) the player returning the ball. The event of the opponent serving is identical to the opponent returning. Dashed events indicate alternative cues where two different events can happen; for example, the player either successfully returns the ball or the player misses it, in which case different feedback is provided. To be able to successfully play a game, players must be able to perceive primary cues [103]. We identified that all primary cues in Wii Tennis are visual and/or audio, and tactile feedback is only used as a secondary cue. An event such as the bouncing of the ball may be encoded in two modalities, e.g., a visual and an audio cue. When a cue is encoded in two modalities, the cue from one modality could be left out, as it will still allow for playing the game -though possibly with a lower performance- because multimodal cues presented simultaneously can be detected at lower thresholds, faster and more accurately than when presented separately in each modality [74].

Primary visual cues must be substituted with non-visual cues to allow a player who is blind to play video games [102]. Sensory substitution is challenging due to the limited spatial and temporal resolutions of the audio and tactile modalities [19]. Patterns for encoding information tactually generally include intensity, duration, temporal patterns, and spatial locations [29]. Encoding information with audio may include speech, audio cues or sonification. Video games further require quick responses from their players, which limits the use of complex encoding schemes as players may not be able to distinguish these fast enough. Earlier we identified that the application of audio feedback is limited because of socialization or music constraints. Constraints we accommodated during the design of VI Tennis were: (1) preserve the gameplay of Wii Tennis as much as possible; (2) minimize the use of additional audio and therefore prefer tactile encoding of cues; and (3) keep the number of tactile/audio cues small, or facilitate multimodal encoding of primary cues to increase their successful recognition. The following four steps were taken, and we play tested our VI Tennis prototype between each step with an adult who was blind.

1. Implement primary and secondary audio and tactile cues. We first ported Wii Tennis' primary and secondary audio and tactile cues. This prototype turned out to be somewhat playable. Though there is no cue indicating exactly when to hit the ball, after some time our subject was able to deduce this from the timing between the audio cues that indicate the ball's speed and distance, such as the ball bouncing and the opponent serving or returning the ball. Ball speed depends on how hard it is hit. A problem occurs when the player hits the ball harder, as this disrupts the timing between cues, which makes it harder to deduce when to hit the ball.

2. Substitute primary unimodal visual cues. Visual multimodal cues, such as the bouncing of the ball, are not substituted as these are encoded as audio already. We explored conveying the ball's speed and distance through modulation of the pitch of a tone. Though this worked, the subject found this to interfere too much with the ability to perceive the other audio cues. We then explored the use of vibrotactile cues, but had to remove existing secondary tactile cues to do so. A Wii Remote provides feedback with a frequency of 250Hz, whose intensity can be varied only by pulsing the motor activation. It is difficult to provide a continuous cue, as perceiving changes in intensity becomes difficult when the subject's hand becomes numb from continuous exposure to vibrotactile feedback. Instead, two discrete vibrotactile cues were used: (1) a short (250ms) cue indicating the ball bouncing; and (2) a long (2000ms) cue indicating the timeframe in which the ball must be returned.

3. Ensure multimodal encoding. Some cues in Wii Tennis are encoded in two modalities (see Figure 2.3, top). To allow for the same rate of recognition, the speed of the ball was made constant, allowing players to also deduce when to hit the ball from the timing between the preceding audio cues. We explored providing an audio cue that tells when to hit the ball, but this simplified the gameplay too much.

4. Substitute secondary unimodal visual cues. Speech cues were added that indicate to the player whose turn it is to serve, after play testing revealed that our subject found this difficult to determine. The score is displayed visually after each point in Wii Tennis; in VI Tennis this is implemented using speech cues.
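A minimal sketch of the resulting cue timeline, assuming a rumble(duration_ms) primitive that drives the Wii remote's vibration motor (VI Tennis itself was written in C#/XNA; the Python below and its flight-time constant are illustrative assumptions):

```python
import time

BOUNCE_CUE_MS = 250    # discrete cue: the ball has bounced
RETURN_CUE_MS = 2000   # discrete cue: the window in which the ball must be returned
FLIGHT_TIME_S = 0.5    # assumed constant interval between bounce and return window

def rumble(duration_ms):
    """Stand-in for the Wii remote rumble call; simulated here with a print."""
    print(f"rumble for {duration_ms} ms")
    time.sleep(duration_ms / 1000.0)

def incoming_ball():
    """Play the two discrete vibrotactile cues for one incoming ball."""
    rumble(BOUNCE_CUE_MS)      # ball bounces on the player's side
    time.sleep(FLIGHT_TIME_S)  # constant ball speed keeps this interval fixed
    rumble(RETURN_CUE_MS)      # swinging during this window returns the ball

incoming_ball()
```

Because the ball moves at a constant speed, the fixed interval between the two cues is itself informative, which is what allows players to anticipate the return window.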

Figure 2.4: Children who are blind playing VI Tennis.

2.3 VI Tennis User Study

VI Tennis was evaluated at Camp Abilities, a developmental sports camp for children who are visually impaired, blind, or deaf/blind, which is held annually at the College at Brockport. The goal of this user study was to evaluate whether VI Tennis allows children to engage in MVPA. We were specifically interested in evaluating the effectiveness of providing vibrotactile cues in addition to audio cues, motivated by the observation that multimodal cues are recognized at higher rates [74]. Two versions of VI Tennis were used in the study: one providing audio/tactile cues and one with the tactile cues turned off. Both versions were play tested before the study to ensure playability. We defined the following hypotheses:

H0: Players who are blind achieve the same performance with tactile/audio cues as with audio cues.

H1: Players who are blind achieve the same levels of active energy expenditure with tactile/audio as with audio cues.

2.3.1 Participants

Table 2.1: Participants' characteristics

Characteristic               All, n = 13 (σ)
Gender (M/F)                 9/4
Age (years)                  12.6 (2.5)
Height (m)                   1.54 (0.1)
Weight (kg)                  53.2 (17)
Body mass index (kg/m²)      22.0 (5.4)

Children were recruited prior to arriving at Camp Abilities and were classified according to the sports classifications of the U.S. Association of Blind Athletes: B1 athletes are totally blind with no functional vision, B2 athletes have travel vision, and B3 athletes are legally blind. Four girls and nine boys from the B1 category were selected. Participants with orthopedic impairments were excluded from our study. Parents and adolescents consented to the study prior to participation. Children's height and weight were measured using standard anthropometric techniques. Three children were classified as obese and three were overweight. Table 2.1 provides a summary of the participants' characteristics.

2.3.2 Instrumentation

Active energy expenditure was captured through an Actical omnidirectional accelerometer worn on the child's wrist. Accelerometers have been successfully used to estimate the energy expenditure of activity [101]; they do not impede the user's ability to play the game, and they are more suitable for capturing the energy expenditure of the arm than hip-positioned capturing techniques [40]. VI Tennis tracks the player's score and the computer-controlled player's level (CCPL) in a log file for each play session.
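As a worked example of the MVPA measure used later in this study: the Actical software performs this classification itself, but the sketch below shows the general shape of the computation, counting one-minute epochs whose estimated MET value reaches the conventional 3-MET threshold for moderate activity. The threshold constant and the sample data are assumptions for illustration, not values from the study.

```python
MVPA_MET_THRESHOLD = 3.0  # moderate activity is conventionally defined as >= 3 METs

def minutes_in_mvpa(met_per_minute):
    """Count one-minute epochs at or above the MVPA threshold."""
    return sum(1 for met in met_per_minute if met >= MVPA_MET_THRESHOLD)

# hypothetical per-minute MET estimates for a 10-minute play session
session = [2.1, 3.4, 4.0, 3.8, 5.2, 4.9, 3.1, 2.8, 4.4, 3.6]
print(minutes_in_mvpa(session), "of", len(session), "minutes in MVPA")
```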

2.3.3 Experimental Trial

User studies were held over two days. Children were randomly assigned to either group A or B. Group A (n=6) played VI Tennis on day one (T1) and the audio only version on day two (T2). Group B (n=7) played the games in the reverse order. Prior to playing the game, children were allowed to familiarize themselves with the controller while receiving a verbal tutorial on how to play the game. Children played the game in a quiet room. Children were allowed to play the game for 5 minutes before they were equipped with an accelerometer on their dominant arm. Children then played the game for 10 minutes against a computer-controlled player, after which the accelerometer was removed for analysis.

2.3.4 Results

Players' performance was analyzed using the CCPL. Due to a crash, data for one participant playing the tactile/audio version during trial 2 was lost. Figure 2.5 shows an overview of the CCPL for both versions and trials for each minute that the game was played, including the error rate. Both versions start out at level 6. For both trials, the audio version initially shows a decrease in CCPL, which only starts to increase after 6 minutes. The tactile/audio version shows an increase throughout the ten minutes that the game is played for both trials. Due to the nonparametric nature of the CCPL data, a series of Wilcoxon Signed Ranks Tests was performed to analyze whether these results were significantly different. To adjust for multiple testing, α was set a priori at 0.01. Data was analyzed in one-minute increments. The tests show that the first significant divergence between the tactile/audio and the audio versions of VI Tennis appeared after 3 minutes of game play (Z2,12 = 2.83, p < 0.01). Because of these results, H0 was rejected.

Previous research has shown little difference in physical activity levels between males and females [101]. This observation, along with the limited sample sizes typical in disability research, motivated us to collapse gender into one group. Table 2.2 shows the average active energy expenditure for participants and Table 2.3 shows how many minutes participants were engaged in moderate to vigorous physical activity (MVPA). MVPA and minutes in MVPA were calculated by the Actical software and are based on estimated Metabolic Equivalent (MET) values [44]. No significant difference (T2,12 = 0.179, p > 0.01) was detected between the two versions, which required us to accept H1.

Table 2.2: Average Active Energy Expenditure (Kcal/Kg/Min)

               T1 (σ)       T2 (σ)       Mean (σ)
Audio          3.56 (1.1)   4.49 (2.0)   3.99 (1.6)
Audio+Tactile  4.70 (2.3)   3.47 (1.0)   4.03 (1.8)

Table 2.3: Total time spent in MVPA (minutes)

               T1 (σ)       T2 (σ)       Mean (σ)
Audio          9.71 (0.5)   9.5 (0.8)    9.62 (0.7)
Audio+Tactile  9.83 (0.4)   9.71 (0.5)   9.77 (0.4)

Figure 2.5: Computer controlled player level (CCPL) for tactile/audio and audio for both trials over time.

2.3.5 Qualitative Analysis

Qualitative data was collected through interviews held after the second trial. Likert scales for children are recommended to have a limited number of points on the scale or to use a visual scale, neither of which is applicable in our study. Because we are performing a comparative analysis, children were asked to participate in an informal one-on-one interview after the second trial. We used non-directed interviews with open-ended questions; for example, we asked children to describe the games they played, whether they liked playing each game, which version of the game they preferred, and how each game could be made easier for them to play. Interviews lasted between 5 and 15 minutes and were recorded. We later transcribed these recordings and analyzed them to identify recurring themes.

All children were very excited to participate in our user study. Many children had heard about the Wii games from friends. Five children had played video games before, and one child told us she had played Wii Bowling before with the help of her parent, who provided her with verbal cues on how to play the game. All (n=13) children expressed feelings of a positive experience playing both versions of VI Tennis. Children stated: "It was really great!" and "I want to play tennis again!". After asking what they liked about the game, several children mentioned health benefits such as: "it is good as you get a lot of strength in your arms" or "you get to exercise". All children preferred the tactile/audio version, as they said: "with sounds is like we have to pay more attention, but with vibrations I just feel it and just hit or swing the remote", "I liked the one that vibrates, because it would do a nice long vrrrrrr and tell you when to hit." and "I like the one that buzzes because I can actually feel this one". Overall children enjoyed playing VI Tennis as they told us: "We can have fun with other people" and "it is really good to play with friends or whoever!" and "it's like having fun, and we can do things they can do, like even though we cannot see", which emphasizes the social and normalization [81] aspects of playing games and exergames in particular. Children did not make any suggestions on how the game could be made easier for them to play, but two children suggested we should add the ability to play VI Tennis with two players (which had been implemented already).

2.4 VI Tennis Discussion

2.4.1 Sensory Substitution

Children scored significantly better with the tactile/audio version of VI Tennis and expressed enjoying this version more than the audio version. A plausible explanation is that the tactile/audio version was easier to play, as the primary cues indicating when to hit the ball are encoded in multiple modalities (tactile/audio), and therefore have a higher chance of being recognized than when they are encoded in one modality [74]. Interviews with children confirmed that it was easier to recognize tactile cues. Audio cues have been found to yield higher error rates [18] in studies with choice response tasks while exercising; though children only engaged in brief periods of exercise, this could be of influence. Because the tactile/audio version provides a more engaging experience, it is more likely to engage players in physical activity for longer bouts of time, which may yield larger health benefits.

We found multimodal cues to be more effective than unimodal cues; however, for more complex exergames it may not be possible to facilitate multimodal encoding of primary cues due to the limited temporal and spatial resolution of audio and tactile feedback. A Wii remote contains only one vibrotactor, and adding more vibrotactors, such as using a tactile suit [59], would be impractical or too costly, which precludes large-scale deployment. For VI Tennis a number of tradeoffs were made, such as not being able to hit the ball in a direction or not being able to hit the ball with different speeds. Though some of these features could be added in newer versions of VI Tennis, for this study we found these tradeoffs to be acceptable, as the lack of such features was not deemed significantly detrimental to the game experience during the 10 minute trials. The four steps that we used in the sensory substitution will be followed in the adaptation of other exergames so as to identify their general validity. Substituting visual primary cues is an essential step to making games accessible, even if this requires removing secondary cues from the audio or tactile modalities. We recommend play testing often to identify whether such tradeoffs are detrimental to the game experience. We did not evaluate VI Tennis in exergaming contexts where the use of audio is limited, for example, when music is present or when players engage in social interaction, though it should be evident that vibrotactile feedback is more usable in these contexts.

2.4.2 Active Energy Expenditure

No significant difference in active energy expenditure could be detected between both versions. It is more difficult to play the game with audio cues; we observed that children playing the audio only version often started hitting the ball too early (and then did not try to swing again) or swung too late, and therefore failed to return the ball. Though both versions start at level 6, Figure 2.5 shows that CCPL for audio only initially goes down and only starts to increase after 6 minutes, whereas CCPL for tactile/audio increases throughout the 10 minute trial. Because the computer immediately starts to serve again when the player misses the ball, no significant difference in active energy expenditure between a successful rally and consecutively missing the ball can be detected, as in both cases children will still swing their arm. As in real tennis, a significant difference in energy expenditure may be detected if we introduce a brief pause before the computer player starts serving again.

2.5 VI Tennis - Future Work

2.5.1 Whole Body Exercise

In our study with VI Tennis, children were able to engage in MVPA, but closer analysis revealed that the majority of this time was spent in moderate, rather than vigorous, activity. Because children have higher metabolic rates, the daily recommended amount of MVPA is 60 minutes for children and 30 minutes for adults [1], although it is advised that children engage in vigorous activity at least 3 days per week. The amount of physical activity achieved with VI Tennis may be sufficient for adults but may not be high enough for children. Because exercising at an early age has lifelong benefits [100], future work will focus on researching tactile/audio exergames that yield higher active energy expenditure.

Studies with pattern matching exergames such as EyeToy Kinetic and Dance Dance Revolution have yielded higher levels of MVPA [32, 101]. There are two possible explanations: (1) Wii remote based exergames only involve motions of the dominant arm, whereas pattern matching exergames involve whole body movements; and (2) sports based exergames such as Wii Tennis and Wii Golf are self-paced, whereas pattern matching exergames, due to their reliance on music, are externally paced and typically performed at a higher pace than sports based exergames.

We seek to facilitate whole body exergaming and pattern matching through whole body tactile/audio motion instructions by using multiple Wii remotes. Players will have a Wii remote strapped to each leg and hold one in each hand. As such, an EyeToy Kinetic based exergame can be developed that provides motion instructions specific to each limb that a Wii remote is strapped to; for example, when a cue is provided to a leg, the player must kick that leg forward to destroy an object. Pattern matching may be facilitated if players can successfully memorize sequences of tactile/audio rhythms that correspond to patterns in the music.
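As a purely illustrative sketch of this proposed whole-body cueing (future work that was not implemented in the study above), the dispatch of a cue to the remote strapped to a particular limb might look as follows. All names here (Limb, RemoteHandle, cueLimb) are hypothetical.

#include <array>
#include <cstdio>

enum class Limb { LeftArm = 0, RightArm = 1, LeftLeg = 2, RightLeg = 3 };

struct RemoteHandle { int id; };  // stand-in for a controller-library handle

// One Wii remote per limb, indexed by the Limb enum.
using RemoteMap = std::array<RemoteHandle, 4>;

// Cue the player to move a specific limb by rumbling only that limb's remote;
// e.g., a cue to a leg means the player must kick that leg forward.
void cueLimb(const RemoteMap& remotes, Limb limb, int durationMs) {
    const RemoteHandle& r = remotes[static_cast<int>(limb)];
    std::printf("rumble remote %d for %d ms\n", r.id, durationMs);
    // a real implementation would invoke the controller library's rumble call
}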

2.5.2 Sensorimotor Skills

A number of physical activities have been adapted for users with visual impairments [59]. Many of these activities pair a user with a visual impairment with a sighted guide. For example, in tandem cycling, a sighted guide performs the sensorimotor skill (steering) and the user with a visual impairment performs the strength skill (pedaling). Few physical activities exist where individuals with visual impairments can participate in sensorimotor skills, and adapted physical education researchers have therefore called for the inclusion of users with visual impairments in sensorimotor skill activities [59]. Tennis is a sensorimotor challenge, but in Wii Tennis this is reduced to a timing based challenge, as the game automatically moves the player into position. We seek to explore how a spatial challenge can be added to VI Tennis, for example by indicating with tactile/audio cues whether the ball needs to be returned backhand or forehand. As a larger number of tactile/audio cues will be required to indicate spatial motion instructions, our current research focuses on determining the number of vibrotactile/audio cues that players with visual impairments can comfortably distinguish using one Wii remote. The technology developed for tactile/audio exergaming may eventually be used to make real sports accessible; for example, a tennis racket providing vibrotactile/audio cues could indicate where and when to hit the ball.

2.5.3 Motor Learning

Children were verbally instructed how to play VI Tennis, but we found that some children did not know or understand how to swing a tennis racket. Verbal instructions are abstract and rely upon a mental model [59], and players with visual impairments may lack such mental models. Though we assisted some children in performing the desired motions during the tutorial, during the trial we observed them developing completely new ways to swing their racket. The sensitivity of the controls of VI Tennis was based on Wii Tennis, but this did not preclude children from using motions that had lower active energy expenditure than the motions that we taught them. Though it is debatable whether players who are blind should perform the exact same motions as sighted individuals to achieve the same amount of active energy expenditure, it may be useful to adjust the sensitivity of VI Tennis so as to take into account the size of the motion made with the controller. Increasing the threshold for detecting a stroke in VI Tennis could prevent players from developing such shortcuts. We are also investigating a technique for motor learning that can teach players who are visually impaired how to perform the motions in the exergame, without the aid of a sighted guide.

2.5.4 Barriers to Physical Activity

Section 2.1 discusses some of the barriers to physical activity that individuals with visual impairments face. Once more exergames have been developed, we aim to conduct user studies where individuals with visual impairments can play exergames at their home over a longer period of time. This will allow for evaluating whether exergames can overcome these barriers and whether exergames are viable alternatives to existing physical exercise, if it can be determined that they are safer while yielding amounts of active energy expenditure that are considered healthy. In our qualitative analysis of VI Tennis, children were only able to play VI Tennis for a total of 30 minutes. Attitudes may be different if our exergames are played over a longer period of time.

2.6 VI Tennis Conclusion

This section presents VI Tennis, an exergame for players who are visually impaired which provides tactile and audio feedback. Four steps for sensory substitution are provided which game developers can use to adapt their exergames to the abilities of players who are visually impaired.

The effectiveness of providing multimodal versus unimodal cues was evaluated through a user study with 13 children who are blind. We found that children were able to achieve moderate to vigorous levels of physical activity with both the version of VI Tennis that provides tactile/audio cues and the version that provides audio only cues. No significant difference in active energy expenditure was found between both versions, though children scored significantly better with the tactile/audio version and also expressed enjoying playing this version more, which emphasizes the potential of tactile/audio feedback for engaging players in physical activity for longer periods of time.

Future work will focus on: (1) exploring whole body tactile/audio cues to achieve higher active energy expenditure; (2) adding spatial challenges to VI Tennis to investigate how sensorimotor sports can be made accessible; (3) developing a technique for motor learning; and (4) investigating whether exergames can overcome the barriers to physical activity that individuals with visual impairments face.

2.7 VI Bowling Design

Bowling is an anaerobic physical activity in which players score points by rolling a bowling ball along a flat surface into standing wooden objects called pins. This sport is primarily visual-spatial, as players aim and throw their bowling ball in a particular direction to strike the pins. Bowling is self paced and its temporal challenge is relatively small, which makes bowling a good candidate physical activity for exploring exclusively how spatial challenges in exergames can be made accessible to users with visual impairments.

The amount of active energy expenditure that a player can achieve with an exergame is directly related to the gameplay experience [23]. An exergame that is not fun to play is unlikely to engage the player in physical activity for long periods of time. Factors that have been identified to contribute to the gameplay experience are different types of stimuli, rules, and behavioral requirements [62]. These factors are intrinsically determined by the nature of the exergame, such as the sport it emulates, but also by the reinforcement mechanisms used, such as rewards, points, and positive visual and audio feedback. To insulate the research on exploring techniques for making spatial challenges accessible from these intricate game dependencies, the gameplay of an existing exergame was implemented, rather than developing a new game from scratch with unproven gameplay.

Ten-pin bowling exists as an exergame as part of the popular Nintendo Wii Sports exergame (see Figure 2.6), which is played with a Wii remote. Wii Bowling has been found to yield an average active energy expenditure of 11.7 kJ/min [40], which is of high enough intensity to contribute towards the recommended daily amount of exercise in adults [1]. Because Wii Bowling was found to be particularly successful in promoting regular exercise among senior citizens [48], an accessible version of this exergame could be attractive to individuals with visual impairments, as the elderly constitute a significant portion of the visually impaired population. The core gameplay mechanics of Wii Bowling consist of the following two discrete consecutive steps:

• 1. Players aim where to throw their ball by manipulating a visual cue with the arrow keys on their Wii remote (see Figure 2.6). This is primarily a spatial challenge that depends on the visual feedback that the game provides on how many pins are still standing up.

• 2. Players throw the ball using the Wii remote with a motion that resembles how bowling is played. The player raises the Wii remote up in front of them while holding the trigger button on the controller with their index finger. The player then moves the Wii remote down alongside their body, backwards, and then swings the controller forward. Players then release their bowling ball by releasing the trigger button on their controller. This is primarily a temporal challenge, as the game does not take into account the direction in which the controller is moved. Players can add spin to their ball by twisting the Wii remote to the left or to the right as they release the ball.

Players repeat these two steps twice (two balls per frame) and a game typically consists of ten frames. Players can play against up to 8 friends, with Wii remotes passed around between players. Once the ball has been thrown, it can end up as a gutter ball or successfully knock down a number of pins.

Figure 2.6: Wii Sports Bowling: The red line (i) indicates the direction in which the ball will be thrown and users can shift this direction using the arrow keys on their controller. (ii) shows the grouping of the pins and (iii) shows the current score.

An initial analysis of the gameplay shows that the player primarily engages in physical activity during the second step, where the player throws the ball. To make this game accessible it is important to make the first step efficient, because the faster players are able to aim their throw, the faster they can play the game, which may yield a higher total AEE. Because Wii Sports is closed source, a PC game called Visually Impaired (VI) Bowling was created, which implements the gameplay of Wii Bowling.

2.7.1 Controls

Similar to Wii Bowling, VI Bowling uses a Wii remote. The Wii remote is an inexpensive controller which features an infrared (IR) optical sensor, a 3-axis linear accelerometer, a speaker, a rumble (vibrotactile) feature, and an expansion port for additional input devices. The timing of the controls of VI Bowling was based on Wii Bowling, yet the threshold for detecting a throw was set higher than in Wii Bowling. This was done based on previous experiences with VI Tennis. VI Tennis's controls were modeled after Wii Tennis, and in the user study some children developed shortcut motions with a lower AEE than the motions that were taught to them. The sensitivity of the controls in Wii Sports is relatively forgiving to appeal to a mass audience, yet players engage in more physical activity if a larger detected motion is required for throwing the ball, which also prevents shortcut motions.

2.7.2 Sensory Substitution

To analyze how Wii Bowling could be made playable using non-visual feedback, the modalities and types of feedback that Wii Bowling provides to the player were identified (see Figure 2.7). Wii Bowling provides visual, audio, and tactile feedback. For example, visual information includes the visualization of the pins (see Figure 2.6), the ball, and the lane. Audio feedback includes the sound of the ball rolling and the ball striking the pins. Different sounds are played depending on the number of pins hit. Two types of feedback [103] are distinguished:

• Primary cues require the player to respond in a certain way and as such indicate what to do and when. In Wii Bowling this is the visualization of the pins, as it indicates to the player where to throw the ball.

• Secondary cues are not essential to perceive in order to play the game; they include reinforcement feedback [12] used to state the outcome of a particular player action, e.g., the sound of the ball rolling as well as a vibrotactile buzz confirming to the player that the ball was thrown.

Cues can further be discrete, such as the sound of the ball striking the pins, or continuous, such as the visualization of the pins. An overview of all feedback is provided in Figure 2.7. Dashed events indicate alternative cues where two different events can be provided; for example, a strike versus a gutter ball have different feedback associated with them.

Figure 2.7: Sensory substitution map which shows the primary and secondary cues for audio, visual and tactile modalities in Wii Bowling (top) and VI Bowling (bottom). Dashed events indicate alternative cue events.

To successfully play a game, players must be able to perceive primary cues [103]. Consequently, to make a game accessible to someone with a sensory disability, primary cues presented in an absent modality need to be substituted with cues from a compensatory modality [19]. It is hypothesized that when a primary cue is presented in multiple modalities, a cue from the absent modality can be omitted without affecting the ability to play the game [77]. The game may be played with lower performance, because multimodal cues presented simultaneously can be detected at lower thresholds, faster, and more accurately than when presented separately in each modality [74]. Sensory substitution is especially challenging for visual impairments due to the limited spatial and temporal resolution of the audio and tactile modalities [19]. Video games further require quick responses from their players, which limits the use of complex encoding schemes, as players may not be able to distinguish such cues fast enough.
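To make the cue taxonomy concrete, the following is a small data-structure sketch encoding the distinctions described above (primary vs secondary, discrete vs continuous, per modality). The types and example entries are illustrative, not taken from the VI Bowling source.

#include <string>
#include <vector>

enum class Modality { Visual, Audio, Tactile };
enum class Role     { Primary, Secondary };    // must perceive vs reinforcement
enum class Shape    { Discrete, Continuous };  // e.g., pin strike vs pin display

struct Cue {
    std::string event;  // what the cue communicates to the player
    Modality modality;
    Role role;
    Shape shape;
};

// A few example rows for Wii Bowling drawn from the text above.
std::vector<Cue> wiiBowlingCues() {
    return {
        {"pins standing (where to aim)", Modality::Visual,  Role::Primary,   Shape::Continuous},
        {"ball rolling",                 Modality::Audio,   Role::Secondary, Shape::Continuous},
        {"ball striking the pins",       Modality::Audio,   Role::Secondary, Shape::Discrete},
        {"throw confirmation buzz",      Modality::Tactile, Role::Secondary, Shape::Discrete},
    };
}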

From the abundance of visual cues that games typically present, often only a small subset of cues in a different modality can be feasibly interpreted by a player with a visual impairment. Often significant tradeoffs with regard to the gameplay must be made, which may be detrimental to the overall game experience [103]. Because the game of bowling is self-paced, no fast responses are required from its players; however, the use of audio is limited due to the socialization constraints that were discussed earlier. VI Bowling was iteratively developed, incorporating feedback from one individual with a visual impairment who play tested our game. VI Bowling was implemented using the systematic approach that was developed for VI Tennis [77]:

• 1. Implement primary and secondary audio and tactile cues. The existing audio and tactile cues of Wii Bowling were implemented (see Figure 2.7, top). For the audio cues, creative commons licensed sounds were used. As the spatial challenge part is entirely visual, this part was excluded from the game and instead the computer randomly aimed for the player. Due to the multimodal encoding of cues such as the location and speed of the ball, the number of pins hit, and what type of ball the player has rolled (strike/spare), leaving their visual representation out was not found to significantly affect players' ability to play the game. Play testing revealed that although different audio cues are used to indicate different numbers of pins being hit, it was difficult to determine exactly how many pins were hit.

• 2. Substitute primary unimodal visual cues. Because the player's score is only represented visually, speech cues were added that convey the number of pins hit after each throw and the score after each frame (see Figure 2.7, bottom). Though the use of audio should be minimized due to the social constraints of exergaming, the use of speech is motivated as score and number of pins hit are most easily interpreted as speech rather than other forms of audio or haptic feedback. When the player throws a strike or spare, the number of pins hit is not conveyed, as the player can already deduce this.

After the second iteration the temporal part of Wii Bowling was found to be playable. However, playing the game using random aim offers very little challenge to the player, as the player does not exert any control over how many pins are being hit, which is the most important challenge of bowling. This emphasizes the importance of exploring techniques for making spatial challenges accessible, as these often affect the elements of gameplay that are the most fun to perform.

Figure 2.8: Tactile dowsing: the player moves the Wii remote in the horizontal plane (left); The closer the Wii remote points to the target direction the more continuous the perception of vibrotactile feedback will feel (right).

2.7.3 Tactile Dowsing

The spatial challenge of Wii Bowling consists of two steps:

• Determine where to throw the ball, which is a cognitive step. Based on the pins that are still standing, players will most likely aim towards the location where the majority of the remaining pins are, or, in case of a split (only the outer pins remain standing), they may try to throw an effect ball directed towards one of the remaining pins.

• Align a visual cue, using the arrow keys on the controller, with the direction in which the player wants to throw their ball, which is a physical step (see Figure 2.6).

These steps together take a few seconds. After aiming, the player throws the ball. Wii Bowling does not take into account the direction in which the Wii remote is swung, only that a motion of significant magnitude was detected before releasing the trigger on the controller. Though several strategies could be employed to make each step individually accessible using non-visual feedback, a novel technique called tactile dowsing was developed that provides a tactile-spatial challenge, combining both steps into one efficient step that can point out a direction to the player.

Tactile devices for indicating a direction have primarily been explored for navigation of the visually impaired; some examples include belts [33], vests [91], and handheld devices [16]. None of these solutions are commercially available, nor can they be easily used while exercising, due to their size and weight. Instead, the tactile capabilities of the Wii remote were explored, as it contains a vibrotactor that provides feedback with a fixed frequency of 250 Hz, which can be varied only by pulsing the motor activation. Though the tactile spatial resolution of a Wii remote is limited, an abundance of tactile receptors makes fingertips the most sensitive to vibrotactile feedback [19]. Due to its low cost, this solution has potential for large-scale implementation.

Tactile dowsing combines the features of an infrared optical sensor and a vibrotactor in one integrated handheld device, i.e., the Wii remote. Players use the Wii remote like a dowsing rod to find the target direction. A form of haptification is used; a tactile window of 19.3° is defined on both sides of the target direction. Players move the Wii remote in the horizontal plane until they enter the tactile window. VI Bowling will then start pulsing the vibrotactor for 100 ms with 2500 ms delays. This delay is decreased linearly by 125 ms for every 1° of error. Players find the direction of error by moving the Wii remote in a direction to determine whether the pulse delay is increasing or decreasing. To find the target direction, the player needs to point the Wii remote so as to maximize the continuity of the vibrotactile signal.

A wireless IR emitter peripheral (Wii Sensor Bar) is used to detect the direction in which the Wii remote is pointing. This sensor bar has five IR LEDs at each end of the bar. The light emitted from each end of the Sensor Bar is focused onto the Wii remote's optical sensor as two dots on a 1024x768 canvas, if it is visible within 16 ft of the sensor. The distance between the sensed dots can be used to calculate the distance and the orientation of the Wii remote relative to the sensor bar. For VI Bowling, the IR emitter bar was modified to light only one block of IR LEDs, which is enough to determine the direction in which the Wii remote is pointing and makes it easier to disambiguate which dot is sensed. After tactile dowsing, the player proceeds with throwing the ball.

In Wii Bowling, no sensor bar is used and the direction the ball will go is only determined by twisting the Wii remote at the end of the throw. VI Bowling significantly deviates from Wii Bowling, as the direction the ball will travel is determined by the angle between the target direction and the direction the Wii remote is pointing at when the player releases the ball. This was implemented specifically to investigate whether tactile dowsing can be used to support a simple form of motor learning. The procedure for motor learning in bowling works as follows: (1) the player finds a target direction using tactile dowsing by holding the controller in front of them; (2) the player then moves the Wii remote down alongside their body, backwards, and then swings the controller forward until it points in the target direction (see Figure 2.10). When the player releases the trigger key, VI Bowling records the X value of the sensed dot. Depending on how far the recorded value is off the target, the ball will go either left or right.

For the first throw of the frame, the target direction is at the center of the sensor bar. A strike is any throw within 3.8° of the target. To get one pin down, the player must throw within 19.3° of the target, or else the player will throw a gutter ball. The number of pins knocked down is determined by the accuracy of the throw and is distributed linearly from 1 to 10 throughout the tactile window. These values for the degrees were determined through play testing. In Wii Bowling the magnitude of the throw affects how many pins are hit; to evaluate the accuracy of tactile dowsing, the speed of the ball was made constant, to avoid any inaccuracies due to vigorous motions. The target direction for the second throw of the frame is adjusted to point out the error of the player in the first throw. If the player's first throw was to the right of the target, the target position for the second throw is to the left, representing the center of the remaining pins. Due to the sensory substitution, certain gameplay elements were simplified; e.g., in Wii Bowling it is possible to have a space between the remaining pins, i.e., a split. VI Bowling ensures remaining pins are clustered, as it would be difficult to indicate the location of two disjoint groups of pins using tactile dowsing. To accommodate dowsing, the tactile cue that indicates a successful throw (see Figure 2.7) was removed.
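The following is a minimal sketch of the tactile dowsing pulse timing and the pin scoring described above, using the quoted constants (19.3° window, 3.8° strike zone, 100 ms pulses, 2500 ms maximum delay, 125 ms per degree). The function names are illustrative, and the delay formula is one reading of the description: the delay shrinks as the error shrinks, so the feedback feels most continuous when pointing at the target.

#include <algorithm>
#include <cmath>

const double kWindowDeg   = 19.3;   // tactile window on each side of the target
const double kStrikeDeg   = 3.8;    // throws within this error score a strike
const double kMaxDelayMs  = 2500.0; // delay between 100 ms pulses at the window edge
const double kDelayStepMs = 125.0;  // delay reduction per degree closer to the target

// Delay between vibrotactor pulses for a given angular error (degrees);
// returns a negative value outside the tactile window (no feedback).
double pulseDelayMs(double errorDeg) {
    errorDeg = std::fabs(errorDeg);
    if (errorDeg > kWindowDeg) return -1.0;
    return std::max(0.0, kMaxDelayMs - kDelayStepMs * (kWindowDeg - errorDeg));
}

// Pins knocked down as a linear function of throwing error: a strike within
// 3.8 degrees, a gutter ball beyond the window, and 1..10 pins in between.
int pinsForThrow(double errorDeg) {
    errorDeg = std::fabs(errorDeg);
    if (errorDeg <= kStrikeDeg) return 10;  // strike
    if (errorDeg > kWindowDeg)  return 0;   // gutter ball
    double t = (kWindowDeg - errorDeg) / (kWindowDeg - kStrikeDeg);  // 0..1
    return 1 + static_cast<int>(t * 9.0);   // 1..10, more pins for smaller error
}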

Table 2.4: Participants' characteristics and results

User ID         P1     P2     P3     P4     P5     P6     MEAN    σ
Gender          M      F      F      M      M      M      -       -
Age (years)     30     43     61     77     81     45     58      21
Impairment      L      T      T      L      L      L      -       -
Height (cm)     168    160    154    182    177    183    169     11.3
Weight (kg)     79.3   49.9   60.3   79.3   74.3   81.6   70.8    12.8
Score           86     94     130    122    219    143    132     47
Time (s)        532    573    250    281    322    273    371     142
AEE (kJ/min)    6.63   4.97   3.82   6.30   3.10   2.80   4.61    1.62

2.8 VI Bowling User Study

The goal of the user study was to assess: (1) whether VI Bowling is a fun activity to perform; (2) whether it can engage its players in physical activity; and (3) whether tactile dowsing can be used for motor learning.

2.8.1 Participants

VI Bowling was evaluated with six adults with visual impairments, who were recruited using the email list of the Reno/Tahoe chapter of the National Federation of the Blind. Participants' height and weight were measured using standard anthropometric techniques. Two participants were totally blind (T) and the rest legally blind (L). Three participants were classified as overweight according to the Centers for Disease Control and Prevention's (CDC) definitions. Table 2.4 lists an overview of the participants' characteristics.

2.8.2 Instrumentation & Experimental Trial

Active energy expenditure was captured through an Actical omnidirectional accelerometer worn on the participant's wrist. Accelerometers have been successfully used to estimate the energy expenditure of activity [101] and they do not impede players' ability to play the game [77]. VI Bowling records the player's score and throwing accuracy in a log file. Prior to playing, participants could familiarize themselves with the controller and learn how to play the game through a brief five-minute tutorial, which included a practice session. Players were then equipped with an accelerometer on their dominant arm, which was initialized using their weight, height, age, and sex. Players were positioned about 5 ft from the sensor bar and played ten frames.

2.8.3 Results

Figure 2.9: Combined graph showing average dowsing time and average number of pins hit per frame.

Participants achieved an average active energy expenditure (AEE) of 4.61 (σ=1.62) kJ/min, which is significantly lower than the 8.12 kJ/min achieved in previous studies with children playing Wii Bowling [39, 40]. These values are not MVPA, which is defined as higher than 9 kJ/min for adults [1]. The lower AEE is explained by the different population, as well as by the spatial challenge in VI Bowling taking longer to perform than in Wii Bowling. Figure 2.9 shows a combined graph of the dowsing time, i.e., the time it took to find the target direction, and the average number of pins hit per frame. The overall average dowsing time per frame is 8.78 (σ=8.34) seconds, and though no comparative data was collected for Wii Bowling, this visual-spatial challenge typically takes a few seconds. Though dowsing time appears to decrease over the ten frames, no significant difference between the times of the first three and the last three frames could be detected (T2,34 = 0.02, p > 0.05). A contributing factor could be that players did not feel pressured to quickly find the target direction, as bowling is self-paced. If the total tactile dowsing time is subtracted from the total playing time, and AEE is reassessed using the times participants were actually physically active (throwing), an AEE of 9.19 (σ=5.68) kJ/min is found, which is considered MVPA for adults. The average aiming error measured at the end of each throw was 9.76° (σ=6.23). Though the direction found during the tactile dowsing phase was not recorded, all participants were observed to always find the maximum continuous intensity before throwing.

The average number of pins hit each frame is a function of throwing accuracy; therefore this score was used to assess the accuracy of tactile dowsing based motor learning. Due to the non-parametric nature of the data, the number of pins hit was dichotomized (1 if the number of pins was 10, and 0 otherwise). A McNemar test using a 2x2 contingency table compared the first three frames versus the last three frames and showed that participants became more accurate with throwing, as they significantly increased their ability to finish a frame with all pins knocked down over the course of the game (χ2,34 = 0.28, p > 0.05).
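The reassessed AEE above follows from a simple rate adjustment. The helper below is an illustrative back-of-the-envelope version of that calculation, not code from the study.

// Illustrative helper (not from the study): re-estimate AEE over only the
// physically active portion of play by subtracting the dowsing time from
// the total playing time.
double activeAEE(double aeeKJPerMin, double totalMin, double dowsingMin) {
    double totalEnergyKJ = aeeKJPerMin * totalMin;  // energy over the whole session
    return totalEnergyKJ / (totalMin - dowsingMin); // rate while actually throwing
}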

2.8.4 Qualitative Analysis

Participants were interviewed using a questionnaire after the test to determine the usability and playability of VI Bowling. First, participants' exercise behavior and experience with bowling were identified; they were then asked to rate the features of VI Bowling on a 5-point Likert scale (ranging from strongly disagree (1) to strongly agree (5)), after which they could make suggestions for improving VI Bowling.

Figure 2.10: User with visual impairment performing dowsing (left); and throwing (right) in VI Bowling.

All participants had played bowling before, but none had played Wii Bowling or any Wii game. P2 and P3 did not exercise regularly and the other participants primarily engaged in walking. P2 and P3 did not consider themselves fit and P1 did not know if he was fit. All participants liked VI Bowling (M=5.0, σ=0.0), found the game easy to play (M=4.6, σ=0.52), and found the tutorial easy to follow (M=4.8, σ=0.41). The tactile dowsing was found to be challenging (M=4.5, σ=0.55) and all participants except one believed this game could help them exercise (M=4, σ=1.55). Participants made the following suggestions for improving VI Bowling: (P1, P2) adding a multiplayer option; (P5) using spatial audio to indicate where the pins were hit; (P5) adding more environmental sounds, such as a cheering crowd when the player hits a strike or spare; and (P6) providing tactile feedback more gradually over a larger range.

2.9 VI Bowling Discussion and Future Work

2.9.1 Active Energy Expenditure

The CDC recommends that adults engage in MVPA for 150 minutes a week to remain fit [1]. VI Bowling provides light physical activity that is comparable to walking and hence does not significantly contribute to this recommendation. These findings corroborate results from an earlier study conducted with Wii Bowling with children [40], though the physical activity levels that define MVPA for children are higher than for adults, as children have higher metabolic rates. Due to the sensory substitution, it is debatable whether the visual-spatial challenge in Wii Bowling and the tactile-spatial challenge in VI Bowling are equivalent; however, the latter takes longer to perform. Over ten frames a significant increase in accuracy was detected, but no significant decrease in dowsing time. Two solution strategies can be explored to increase AEE: (1) make it easier to find the target direction by increasing the tactile range of the sweet spot; and (2) incorporate the size of the motion made with the Wii remote to affect the number of pins hit. The latter may yield more vigorous throws, but these may be more inaccurate. Regardless, VI Bowling stimulates positive activity behavior; players are on their feet performing basic motor control and movement skills, which, considering the limited existing exercise opportunities that individuals with visual impairments have, should be encouraged over inactive sedentary behavior.

2.9.2 Tactile Dowsing Based Motor Learning

Using tactile dowsing, players were able to make a motion with their arm in the target direction with an average error of 9.76°. In VI Bowling, tactile dowsing implements a relatively simple form of motor learning, because players do not receive feedback on whether the motion that was made was correct; e.g., players can make a thrust motion or a swinging motion with their arm and both have the same result. For motor learning, tactile dowsing could potentially be used to point out different stages of the desired motion. Depending on the type of motion, this approach cannot be implemented using one sensor bar, as it requires a line of sight. Multiple sensor bars would be required, which introduces a new problem, as sensor bars are difficult to distinguish from each other. An alternative is to facilitate tactile dowsing using the Wii remote's built-in accelerometer, but as low cost accelerometers are prone to noisy input, this may not be accurate enough for tactile dowsing.

No qualitative comparison was made between a version of VI Bowling with and without tactile dowsing, but participants liked the challenge tactile dowsing provides. None of the participants found tactile dowsing too hard or too easy. The target direction for the second throw (if there was any) only varied within a relatively small range, which may have allowed for some memorization of the target direction for the first throw. Future work will evaluate the ability of tactile dowsing to point out varying target directions within a full 360° range. Tactile dowsing could also be used for indoor navigation or to make other physical activities accessible that have some aiming component, such as baseball.

2.9.3 Temporal-Spatial Challenges

VI Bowling contains only a tactile-spatial challenge and VI Tennis [77] contains only a tactile-temporal challenge. As most physical activities are a combination of spatial and temporal challenges, future work will combine elements of gameplay to create an exergame with a temporal-spatial challenge, which may be more challenging to play. For example, VI Tennis could be extended with a spatial challenge where players use tactile dowsing to determine whether they must hit a forehand or a backhand. VI Tennis can engage players with visual impairments in MVPA, and as moderate exercise has a facilitating effect on sensory and motor processes, higher tactile dowsing performance may be observed.

2.9.4 Barriers to Physical Activity

Individuals with visual impairments face barriers to participation in physical activities. VI Bowling and VI Tennis are available for free on our website http://www.vifit.org.

A long-term study will evaluate whether access to exergames can overcome these barriers and whether exergames are viable alternatives to existing physical activities. In our user study participants were only able to play VI Bowling for a total of ten frames. Attitudes towards VI Bowling may be different if individuals with visual impairments are able to play this game over a longer period of time.

2.10 VI Bowling Conclusion

This section presents VI Bowling, an exergame for players who are visually impaired. VI Bowling explores tactile dowsing, a novel technique for making spatial sensorimotor challenges in exergames accessible. Players use a low-cost motion-sensing controller as a dowsing rod to find the direction in which to throw their bowling ball using vibrotactile feedback, which can be used to support a simple form of motor learning. VI Bowling was evaluated with six adults with visual impairments. All participants enjoyed playing VI Bowling, and the amount of physical activity achieved with VI Bowling is comparable to walking. Solutions to increase physical activity include: (1) making it easier to find the target direction; (2) combining the tactile-spatial challenge with a temporal challenge; and (3) incorporating the size of the motion made with the controller to affect the number of pins hit. The user study demonstrated the feasibility of tactile dowsing for motor learning, and future work will continue exploring tactile dowsing for other kinds of spatial activities.

2.11 Pet-N-Punch Game Design

In VI Tennis [76], players played a virtual game of tennis based on Wii Sports Tennis with no display, and sent input to the game by swinging a motion sensing controller in a tennis-like fashion. Two versions of VI Tennis were evaluated: (1) a game that provided only audio cues, and (2) a game that provided both haptic and audio cues. Player performance and player preference sided with the game that provided both haptic and audio cues. VI Tennis was evaluated with children with VI and was shown to encourage physical activity. The average energy expenditure (AEE) was found to be 16.9 kJ/min (σ=7.4). According to the United States Centers for Disease Control and Prevention, participants in the study were on average able to achieve enough physical activity to be considered healthy for adults, but not enough to be considered healthy for children.

Several deficiencies were noted in the VI Tennis study. Players were only utilizing their dominant arm, which leaves the possibility that more physical activity is possible if a player uses more of his or her body. A second issue was errors: players were not penalized for swinging too early. As long as a swing occurred in the required time frame, the ball would be hit. For players who desired to do well but did not quite understand the required timing, this resulted in players just swinging constantly. Although this technique created physical activity, it did not create an enjoyable game. In order for the physical activity to be meaningful, it must be performed over a long period of time; a game that can be won by swinging wildly would probably not have much replay value, and thus may not be the best choice when promoting physical activity.

We seek to design a game for people with VI that promotes physical activity. To advance on previous studies by encouraging more physical activity, the difference in physical activity between utilizing both arms, as opposed to just the player's dominant arm, will be evaluated. Also, response times will be evaluated to determine the optimum rate at which a player can respond to non-visual cues in an exergame.

2.11.1 Game Play

In order to analyze the accuracy and energy expenditure of an exergame utilizing only the dominant arm versus an exergame that utilizes motions performed by both arms, a game was created from scratch. The goal was to create a game that was fun to play and provided a means to encourage physical activity for a person with VI. Based on the results from previous work, the game needed to utilize both haptic and audio cues in order to have a higher success rate of performing the motions. It also needed to be non-self-paced, to encourage physical activity at a known rate. And finally, it should create a fun gameplay experience with the potential for a high replay rate.

The game created, called Pet-N-Punch, is a VI accessible version of a game similar to the amusement game Whac-A-Mole. In Whac-A-Mole, a player swings a large padded hammer onto the heads of moles as they pop out of a playing field consisting of five holes. Players are awarded points if the moles are hit on the head when they are out of their holes. This game is difficult to play for a person with VI, as the outcome of the game is directly connected to the player's ability to visually sense where the mole is on the playing field and then successfully hit it on the head. Pet-N-Punch is a virtual representation of Whac-A-Mole adapted to be playable by a person with VI. No graphical interface was needed, as the players interact with the game by means of sounds and vibrations only. Two modes of play were available: one with one hammer held in the player's dominant hand, and a second mode where a player holds a hammer in each hand.

Players were asked to help a farmer rid his farm of rodents by smacking them on the head with their hammer(s), which were motion sensing controllers. Players were alerted to the presence of rodents by two modalities: (1) the sound of a rodent, and (2) tactile feedback through the rumble in the controller. In order to avoid players simply swinging wildly, cats were also present within the playing field and players were penalized if cats were hit on the head. Collectively, cats and rodents will be referred to as creatures.

The game begins with an in-game tutorial informing the players of the basic situation (a farmer with a rodent problem) and how the player can help (hit the rodents on the head). The farmer asks the player to swing down hard with the controller to hit the rodent on the head. The threshold for detecting a rodent hit was set high in order to encourage large motions. The player cannot start the game until the farmer is satisfied with the intensity of the player's swing; this allows the player to get a feel for how big and fast a motion is needed to succeed in the game. The required motion for successfully hitting a rodent is two quick downward swings in succession. The farmer also notifies the player about his cats that are in the fields as well. The farmer instructs the player not to smack the cats on the head when they are encountered, but to gently pet them. Once again, the player must perform the correct motion of petting the cat prior to moving on to the real game.

The farmer also informs the player that the rodents are eating his carrots and it is the player's mission to prevent the carrots from being eaten. The fields start with 100 carrots. Every time a player does not correctly hit a rodent, or hits a cat on the head instead of petting it, one carrot is deducted from the score. At the conclusion of the game, a higher number of carrots remaining represents a more successful run through the game. If a player did not correctly hit the rodent, audio feedback of a carrot being eaten was played in order to notify the player that the rodent was not successfully hit. If a player swung hard when a cat was present, the sound of a cat in pain was played, once again signaling that an incorrect motion was performed. The lengths of the audio cues were 1.5 seconds for a rodent and 0.5 seconds for the cat; although the rodent sound was 1.5 seconds long, it was initially loud and then faded out.
In the event of a timeout occurring while the creature sound was still being played, the sound was immediately stopped. The sounds used in the game were either recorded specifically for this game or taken from royalty free audio repositories. Players performing correct moves were presented with haptic and audio feedback as well. A player correctly hitting a rodent on the head would feel the rumble stop and would hear a sound indicating the rodent was hit correctly. Petting a cat correctly would result in the sound of a cat purring. No tactile feedback was given for correctly petting a cat, as the tactile cue to pet the cat was a short buzz, as opposed to the constant buzzing of the rodents. Players were also presented with audio feedback in the form of a whipping sound when the motion of a hard downward swing was detected. This helped players determine their progress towards defeating a rodent, because they would need to hear the swing sound twice. The sound of the second hit was at a slightly higher pitch than the sound of the first hit, to allow players to get into a rhythm of hitting the rodents. There was no audio feedback for the motion of petting a cat, except for the sound of the cat purring when the motion was successfully completed.
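The following is a minimal sketch of the scoring and feedback rules just described. The enums, names, and structure are illustrative; only the rules themselves (two hard swings for a rodent, a gentle pet for a cat, one carrot lost per mistake out of a starting field of 100) come from the text.

#include <cstdio>

enum class Creature { Rodent, Cat };
enum class Motion   { None, HardDoubleSwing, SoftPet };

struct GameState { int carrots = 100; };  // the fields start with 100 carrots

// Resolve one creature encounter given the motion the player performed
// (Motion::None covers a timeout with no recognized motion).
void resolveEncounter(GameState& s, Creature c, Motion performed) {
    if (c == Creature::Rodent) {
        if (performed == Motion::HardDoubleSwing) {
            std::puts("stop rumble; play rodent-hit sound");
        } else {
            --s.carrots;  // rodent not hit: a carrot is eaten
            std::puts("play carrot-being-eaten sound");
        }
    } else {  // cat
        if (performed == Motion::SoftPet) {
            std::puts("play cat-purring sound");
        } else if (performed == Motion::HardDoubleSwing) {
            --s.carrots;  // cat hit on the head instead of petted
            std::puts("play cat-in-pain sound");
        }
    }
}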

2.11.2 Technical Implementation

The motion sensing controller used in this study is the Nintendo WiiMote. This inexpensive controller contains a 3-axis accelerometer, which was used to determine motion. These controllers are readily available and were chosen not only for their technical abilities but also for their low cost and availability to the general public. At the conclusion of the study, the game was made available as a free download, and cost and ease of use were taken into consideration when choosing an input device. Players held one or two WiiMotes in their hands, based on which version of the game they were playing. The software runs on a computer running Windows XP, Vista, or Windows 7 equipped with Bluetooth in order to communicate with the WiiMotes. All cues are either audio or haptic based, so no display is needed to play the game. The software was written in C++ using MS Visual Studio Express and the WiiYourself open source library [10] to communicate with the WiiMotes.

The determination of a correct or incorrect motion was based on a running analysis of the z-axis measurements from the accelerometer in the WiiMote. The WiiMote was polled every 33 milliseconds and the z-axis values were recorded. Kalman filtering helps remove noise from accelerometer readings; however, the idea was to look for large motions, so these types of filters were not used. The technique for determining motion is as follows (a sketch of this logic is given below). The WiiYourself library returns the value of each of the three accelerometers in the range of +/- 3g. The last five reads (165 ms) of the z-axis accelerometer were stored for analysis to determine the swing type. If the current value was less than the previous value, a downward direction was detected. Once a value was larger than the previous value, a complete swing was detected, as the direction had changed. At that point, the type of swing was determined to be a hard swing, a soft swing, or neither. In order to ignore small fluctuations due to a person's natural movements, once a change in direction was detected, the total range of motion since the last change in direction was analyzed. If the summation of the deltas of the last five reads was greater than 0.9375g, then a downward movement was detected. Next, the highest value within the last five reads was checked against hard coded thresholds. If the highest value was between 0.1875g and 1.125g, a soft hit was detected. If it was higher than 1.183g, a hard hit was detected. Any motion outside of these ranges was ignored. Initially an autocalibration mode was introduced to customize the game to each player's technique, but it was dropped because players could swing softly in the calibration section of the game and in turn make the rest of the game easier. The hard coded thresholds were determined by making the desired motions and recording the accelerometer values.

When a player was presented with a rodent, the controller would rumble continuously until either the player performed the correct motion and successfully hit the rodent, or the time window for hitting the rodent expired. A cat's arrival into the playing field was represented by a short 250 millisecond rumble on the player's controller. The duration of the haptic cue was varied, as opposed to varying the haptic intensity, due to the capabilities of the vibrotactile motor contained within the WiiMote: the WiiMote is only capable of providing a constant vibration at 250 Hz.
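The sketch below illustrates the swing classification just described, assuming z-axis samples in g arriving every 33 ms (as returned by WiiYourself). The thresholds are the ones quoted above; the class structure and names are illustrative, not the game's actual code.

#include <algorithm>
#include <cmath>
#include <cstddef>
#include <deque>

enum class Swing { None, Soft, Hard };

class SwingDetector {
    std::deque<double> z_;      // last five z-axis reads (a 165 ms window)
    double prev_ = 0.0;
    bool goingDown_ = false;
public:
    // Feed one z-axis sample; a swing is classified when a change of
    // direction completes a downward motion, otherwise Swing::None.
    Swing update(double z) {
        z_.push_back(z);
        if (z_.size() > 5) z_.pop_front();

        Swing result = Swing::None;
        if (z < prev_) {
            goingDown_ = true;              // still moving downward
        } else if (goingDown_) {
            goingDown_ = false;             // direction changed: swing complete
            double range = 0.0, peak = 0.0;
            for (std::size_t i = 1; i < z_.size(); ++i) {
                range += std::fabs(z_[i] - z_[i - 1]);
                peak = std::max(peak, std::fabs(z_[i]));
            }
            if (range > 0.9375) {           // large enough downward movement
                if (peak > 1.183)
                    result = Swing::Hard;
                else if (peak >= 0.1875 && peak <= 1.125)
                    result = Swing::Soft;
                // anything between 1.125g and 1.183g is ignored, as in the text
            }
        }
        prev_ = z;
        return result;
    }
};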
In the two arm version of the game, the haptic cue was targeted to the arm that was required to perform the motion. Each creature encountered was also paired with an audio cue. In the two arm version of the game, these cues were directional, meaning the sound was played through either the right or the left speaker at the same time the corresponding controller would rumble. In order to be sure the players were holding the controllers in the correct hands, the controller in the right hand would be used to start the game, and the tutorial would expect the player to swing with the right controller. The controllers contain 4 LEDs that were also encoded with unique identifiers for the right and left hands, which could be observed and corrected if necessary by the game administrators.

The game was broken down into two parts to determine how the players would react to different response requirements. The first part of the game gradually decreased the time between creatures. The second part of the game kept the time between creatures at a constant rate, but gradually decreased the allowed reaction time. The actual values for each level can be seen in Figure 2.11, and a structure-only sketch follows below. The levels of the game were designed this way to encourage physical activity within timing constraints and also to determine the minimum response time of which the players were capable. The average reaction time could have been measured by recording the reaction times of the players without requiring a response within a certain amount of time; however, using this method the player would have no incentive to respond as fast as possible. Thus the declining reaction time requirement was introduced, as it provided an incentive to the player to respond as quickly as possible, and the error rates would identify the timing constraints at which it became more difficult to respond. The time between creatures was the time the game waited after either the player successfully completed the required action or the reaction timeout expired. The reaction timeout timer was started as soon as the audio and haptic cues were started, not when they were completed.

The total game play consisted of 10 minutes of play through 11 levels, each one progressively harder than the previous. The timing of events on each level was consistent between both versions of the game. The only difference between the two versions was that the two arm version alternated between creatures appearing on the left and right side of the body in a predetermined, randomly generated sequence. Although the sequence was generated randomly, all players played the same random sequence.

With the exception of the first two levels, each level was 60 seconds in duration. At the completion of each level, the farmer would announce the number of carrots left in his fields to keep the player up to date on the progress being made. The farmer would also audibly note the level number before the start of each level. Levels 1 and 2 were just 30 seconds each. These levels can be considered introductory levels where the player learns exactly how to play the game. In level 1, the player would only encounter rodents, giving the player an opportunity to fine tune the swing technique for hitting rodents. Level 2 consisted entirely of cats, allowing the player to practice the petting motion. The remaining levels mixed both cats and rodents at a rate of 20% cats and 80% rodents.
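The following is a structure-only sketch of the two-part level progression. The actual per-level timings are the ones plotted in Figure 2.11; every number below is a placeholder chosen purely for illustration, except that the final reaction timeout shrinks toward the 500 ms discussed in the results.

struct LevelTiming {
    int delayBetweenCreaturesMs;  // pause after the previous action resolves
    int reactionTimeoutMs;        // time allowed to perform the required motion
};

LevelTiming timingForLevel(int level) {
    const int kPartBoundary = 6;  // placeholder split between the two parts
    if (level <= kPartBoundary) {
        // Part one: shrink the time between creatures, fixed reaction timeout.
        return { 4000 - 500 * level, 3000 };
    }
    // Part two: fixed time between creatures, shrinking reaction timeout.
    return { 1000, 3000 - 500 * (level - kPartBoundary) };
}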

Figure 2.11: Delay/Response Time Vs Level.

Table 2.5: Participants' characteristics

Characteristic                 All (n = 12) (σ)
Gender (M/F)                   8/4
Age (years)                    12.2 (2.1)
Height (m)                     1.49 (0.11)
Weight (kg)                    53.6 (23)
Resting heart rate (bpm)       85.6 (6.15)
Body mass index (kg/m²)        23.52 (7.25)

2.12 Pet-N-Punch User Study

Pet-N-Punch was evaluated in July 2010 at Camp Abilities (Figure 2.1), a developmental sports camp for children who are visually impaired, blind, or deaf/blind, held annually at the College at Brockport in New York. The goal of this study was to promote physical activity in a fun way and to identify the differences in accuracy and physical activity between a one handed VI accessible exergame and a VI accessible exergame requiring motions from both arms. This study seeks to analyze the following:

• H0: Error rates will be significantly higher in a game utilizing both arms as opposed to a game utilizing only the dominant arm.

• H1: The energy expenditure will be significantly higher in a game utilizing movements of both arms as opposed to a game utilizing only the dominant arm.

2.12.1 Participants

Twelve children (Table 2.5), identified as B1 by the US Association of Blind Athletes, participated in the study over the course of two days. B1 athletes are completely blind with no functional vision, B2 athletes have travel vision, and B3 athletes are legally blind. Parents and participants consented to the study prior to participation. To prevent errors in data recording due to familiarity with the game, the full group was broken down into two smaller groups, and students were randomly placed into each group. Group A played the two arm version on day one and the dominant arm version on day two. Group B played the games in the reverse order, with the dominant arm game being played on day one and the two arm version on day two. Participants played the game for ten minutes in a closed off room with only their camp counselor and a game administrator present. Prior to the study commencing on day one, the players' height, weight, and age were recorded. After the study, participants were escorted to an isolated room. In this room, they sat with only their camp counselor present and rested for 10 minutes. The heart rate monitor remained active, and their resting heart rate was measured and used as a baseline to determine the heart rate increase. The resting heart rate was taken after the study because camp schedules did not provide an opportunity to acquire the data prior to the study. Players also completed a subjective survey at the conclusion of the study.

Figure 2.12: Level Vs Errors.

2.12.2 Physical Activity Measurement

Players were equipped with an accelerometer worn on each wrist, a placement that monitors the amount of motion of each arm without interfering with gameplay. Although the players were not required to swing their non-dominant arm in the dominant arm version of the game, they were still equipped with an accelerometer on each wrist to provide a comparison. A measure of physical activity intensity level is known as a MET (metabolic equivalent intensity level), and the Actical accelerometers computed this number by using their validated children's energy expenditure algorithm [5] after analyzing the motions of the players. Players also wore a wireless heart rate monitor attached via a small harness across their chest to measure their heart rate throughout the game. The data from the heart rate monitor was collected in real time and contained the current beats per minute (bpm) of the player at one second intervals.

2.13 Pet-N-Punch Results

2.13.1 Error Rates

In order to determine the accuracy of a game using both arms when compared to a game using only the dominant arm, success rates were calculated. The success rate is the number of correct motions performed divided by the total number of motions required. The data (Figure 2.12) shows the difference between the success rates of the dominant arm and the two arm versions of this game. A Wilcoxon signed-rank test showed there was a significant difference (Z(2,12) = 2.325, p < 0.05) in decline between the two modes. This causes us to accept H0. Although the results were similar, the success rates for the two arm version were consistently lower. The additional complexity of deciding which arm to move created a more difficult game. As the required response time decreased to 500 milliseconds, error rates jumped to 25% and 50% for the dominant arm and the two arm versions of the game, respectively. Errors in level 2 were higher than in other levels. This level was 100% cats, and the increase in errors can be attributed to the learning curve of players finding the correct range of motion for petting the cats: too slow a motion would not be detected, and too quick a motion would be considered too hard, in contrast to the rodent motion, which could be performed by simply swinging very fast.

Figures 2.13 and 2.14 analyze the different kinds of errors encountered during game play. A non-successful round was the result of one of two scenarios: errors and timeouts. An error is represented by a player either swinging soft when he should have been swinging hard, swinging hard when he should have been swinging soft, or swinging with the wrong hand. A timeout is an error condition where the required action was not completed within the time allotted. Figures 2.13 and 2.14 show the breakdown of the types of errors for both the dominant arm and the two arm versions of the game. Timeouts were a higher percentage of the errors on the two handed version when compared with the percentage of timeouts on the one handed version. A paired 2 sample t-test with α set to 0.05 showed there was a significant difference (T(2,11) = 5.81, p < 0.05) between the two.

Figure 2.13: Dominant Arm Error Types Vs Level.

This could be a result of the difference between needing to decide both which arm to move and how to move it, as opposed to just how to move it. Both versions saw an increase in timeouts in the last round, largely due to the player not being able to perform the required actions within the shortened time constraints.

To see if the drop in performance in the last level was significant, the performance of levels 1-11 was compared to the performance of level 11 alone. The dominant arm version of the game showed a success rate of 91.7% (σ=6.3) for levels 1-11 and a success rate of 75% for the last level. A similar trend was observed for the two arm version of the game, where the success rate was 87.4% (σ=14) for levels 1-11 and 51.2% for the last level. A paired 2 sample t-test with α set to 0.05 showed a significant difference in performance in both the one handed (T(2,11) = 8.71, p < 0.05) and two handed (T(2,11) = 9.30, p < 0.05) versions of the game for level 11 when compared to all levels. Looking a little closer at the jump in error rates for the last level of the two arm version, 47% of the timeouts occurred after the player had successfully performed the first of the two required hits, and 53% of the timeouts occurred with the player not successfully performing even the first of the two hits. This shows that a response time of 500 milliseconds may not be enough time for a person to correctly interact with the game.

Figure 2.14: Two Arm Error Types Vs Level.

2.13.2 Physical Activity

The activity intensity data was determined directly from the accelerometers worn on the wrists. Two sets of data were recorded (one for each wrist) and that data was averaged. The Actical accelerometers directly provide an activity intensity value in METs. The accelerometers are initialized with the players' height, weight, sex, and age, and those parameters are taken into consideration within the Actical software when generating the data. The MET value is assigned a number between 0.9 (sleeping) and 18 (running) [5]. General MET categories consist of light intensity (1.1-2.9 METs), moderate (3.0-5.9 METs), and vigorous (6.0+ METs). The data (Figure 2.15) shows an increase of activity intensity until the later levels in the game, which may be related to accuracy. Although outliers in the study did achieve physical activity considered to be vigorous, a majority of the participants fell within the light to moderate range. The AEE was found to be 11.72 kJ/min (σ=1.07) for the dominant arm version and 10.79 kJ/min (σ=1.05) for the two arm version.
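For reference, the MET bands cited above can be expressed as a small helper. The mapping below is a sketch of our own, not part of the Actical software, and the label for values below 1.1 is our own.

    // Maps a MET value to the intensity bands cited above (our own sketch).
    static string MetCategory(double met)
    {
        if (met < 1.1) return "below light (e.g., sleeping)"; // label is our own
        if (met < 3.0) return "light";                        // 1.1 - 2.9 METs
        if (met < 6.0) return "moderate";                     // 3.0 - 5.9 METs
        return "vigorous";                                    // 6.0+ METs
    }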

The amount of physical activity generated in this study can be equated to the range of activity between walking and playing volleyball [14]. There was a significant increase (Z(2,12) = 2.28, p < 0.05) in energy expenditure between the start and the end of the game. The energy expenditure difference between the dominant arm version of the game and the two arm version showed no significant difference (Z(2,11) = 0.53, p > 0.05). This causes us to reject H1.

According to the American Heart Association (AHA), the resting heart rate of a child should be between 70 and 100 beats per minute (bpm) [3]. Participants in this study had their resting heart rate recorded at the conclusion of the study, and all had resting heart rates that fell within this range. Since all participants had different resting heart rates, the data was normalized as a percentage increase over the resting heart rate. All of the heart rate readings for each one minute interval were averaged together and compared to the resting heart rate. As shown in Figure 2.16, participants demonstrated heart rate increases throughout both versions of the game; however, the average variation between the dominant arm and the two arm versions of the game was only 2%. The heart rate showed a higher increase after the five minute mark. This suggests that the quicker required response times had a direct effect on the heart rate.
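In our notation, the normalization applied to each one minute interval can be written as:

    \[ \text{increase} = \frac{\overline{HR}_{\text{interval}} - HR_{\text{rest}}}{HR_{\text{rest}}} \times 100\% \]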

2.13.3 Player Survey Results

Participants in the study were asked 18 questions designed to find an enjoyment level based on the Physical Activity Enjoyment Scale (PACES) [50]. The questions were answered on a 1 (lowest) to 8 (highest) Likert scale; the maximum total score is 144 and the minimum is 18. Some of the question pairs included 'It is no fun' vs 'It is a lot of fun', 'I enjoy it' vs 'I hate it', and 'I feel bored' vs 'I feel interested'. The results of the survey found the average score to be 131.3 (σ=13.42), which indicates a strong interest in playing the games. This is an important result because any long term health benefits attributed to repeated plays of the game can only be realized if the player has a desire to play the game repeatedly.

2.14 Pet-N-Punch Discussion

2.14.1 Visual Observations

One of the goals of this study was to remove the ability for a player to constantly swing wildly and succeed in the game. The placement of cats into the mix was one way this was achieved. Players did not want to hit the cats hard because they would lose points in the game. This was probably enough of an incentive, but the children also appeared to form relationships with the cats. By observation only, the kids would wince or verbally announce sadness when a cat was hit too hard. Whether the threat of losing points or the threat of hitting a cat was more of an incentive is unknown, but the combination of the two seemed to work well.

2.14.2 Physical Activity

The physical activity based on the accelerometer data went down in the last levels. This can be attributed to the game moving too quickly for the players to correctly keep up. As the players were making error after error, they would pause to get back in sequence with the game. The activity intensity was not significantly different between the dominant arm version and the two arm version of the game. Although both arms were in motion, they were never in motion at the same time; in effect, the same number of motions was required for each version of the game. Another possible explanation for the similarity of the energy expenditure between the dominant arm and the two arm versions of the game is the concept of rocking and self stimulation. A study [72] has shown that 29% of children with visual impairments exhibit some form of motion through body rocking or hand movements that are not necessary to perform the task at hand. These extra motions could create more physical activity for the dominant arm version of the game when compared with a user who does not exhibit these behaviors.

The AEE from this study was larger than that of VI Bowling and smaller than that of VI Tennis. Since Pet-N-Punch was not self paced, having a larger AEE than VI Bowling was expected.

Figure 2.15: Average Activity Intensity

Having a smaller AEE than VI Tennis could be because of several factors. In VI Tennis, players could succeed by swinging constantly, whereas in Pet-N-Punch they would be penalized for such a play style. Another difference could have been the style of swing. Both games had similar thresholds for determining motion; however, players in the VI Tennis game may have made larger motions due to previous knowledge of how to swing a real tennis racquet. The Pet-N-Punch in game tutorial taught the player how to swing, and players may have learned a shorter swing style than that used in VI Tennis. Finally, the cats randomly placed throughout the levels required players to use less motion when they were encountered.

In contrast to the accelerometer data, the heart rate data indicates a constant increase throughout the game. This could be related to the players' mental intensity, as a strong desire to succeed keeps the heart rate up, while the motions go down as the player attempts to correct his mistakes.

2.14.3 Accuracy

The accuracy of the game showed a significant difference between the two versions of the game. The accuracy of the two arm version was significantly lower than the accuracy of the dominant arm version.

Figure 2.16: Percent Heart Rate Increase Over Resting Rate

Accuracy in the last level was significantly lower than in the other levels. This shows that whether the player is required to decide which arm to move (two arm version) or already knows which arm to move (dominant arm version), 500 milliseconds is an unreasonable amount of time to expect the player to react and perform the desired actions. Errors (performing the incorrect motion) made up a higher percentage of the unsuccessful rounds than timeouts (not performing the correct motion in the required time) in the one handed version of the game. With the exception of the last level, the two arm version saw the inverse, with timeouts being more common than errors. This could be an effect of the player waiting to react instead of already being ready. With the dominant arm version of the game, the player already knew which arm to move next; the player simply had to wait to determine the type of motion. The two arm version of the game required the player to determine both the motion and the arm. This decision time could have resulted in a slower reaction time. An analysis of the timeouts in the last level shows that they were roughly split between timeouts where the player had successfully performed the first of the two required hits and timeouts where the player had not performed even the first hit. This suggests that 500 ms is not enough time for a player to perceive and recognize a cue and then perform the desired action, even if that action is only to swing down quickly once. Games requiring action from a player using only non-visual modalities should give the player more than 500 ms to react to a cue.

2.14.4 Maximizing Results

One of the goals of this study was to find the point where accuracy and physical activity were both at their peaks. By maximizing both, an exergame can be created that is fun to play and achieves high levels of physical activity. Looking at all the data combined, there appears to be a point where physical activity and accuracy are both maximized. This point occurred at the 8 minute mark for both the dominant arm and the two arm versions of the game. This suggests that for this game the optimum delay between events is 500 milliseconds and the optimum required response time is around 2500 milliseconds. At this point, the physical activity was highest, as were the success rates. Although faster response times are possible, at this value players were not required to pause and wait to get in sequence with the game, and they were able to exert more physical activity than in previous levels.

2.14.5 Socialization

Although socialization was not specifically measured, it was observed. This game was only a one player game, but the score (number of carrots remaining in the fields) was announced at the end of the game. After playing the game, the children could be heard discussing their scores. They wanted to have the highest score and to hear who had the highest score.

2.15 Pet-N-Punch Future Work

2.15.1 Higher Activity Intensity

Players were able to achieve light to moderate physical activity. In addition to using both arms at once, utilizing the legs or requiring larger motions may help promote more physical activity. Another opportunity for higher activity intensity would be to require more complicated movements. In this study the player was required to perform two quick movements for each cue perceived; if the player were required to make more movements for each cue, the activity intensity could be higher.

2.15.2 Health Benefits

The results indicate that exergames may be a viable form of exercise for people with visual impairments. They do show an increase in physical activity; however, it is unknown what the long term effects of utilizing exergames as an exercise method for people with visual impairments will be. The study shows that exergames may be a method to overcome the three barriers associated with lack of physical activity for people with visual impairments (dependence, safety, and self-imposed barriers). The game contains elements to increase its playability, and a long term study will determine the overall health benefits as well as the replay value.

2.15.3 Socialization

Video games may contribute to the socialization of people with visual impairments. A multiplayer exergame may also increase the physical activity as players will compete against each other. In addition to achieving a high score, players may have the incentive of competing and defeating their opponent. This could result in a higher energy expenditure as the player exerts more energy in order to win. A multiplayer exergame for people with visual impairments should be created to examine how the social aspect will affect energy expenditure. It was observed that the participants in this study shared their scores with each other, and that could create competition.

2.16 Pet-N-Punch Conclusion

This section presents Pet-N-Punch, an approach that uses tactile/audio cues indicating motions to one or both arms in order to engage children with visual impairments in physical activity. A study with 12 children who are visually impaired found they were, on average, able to achieve light to moderate levels of physical activity; however, a version of the game involving both arms showed no significant difference in physical activity when compared to a version of the game involving the dominant arm only. Although the energy expenditure was not high, the game stimulates active behaviors; players were on their feet performing basic motor control and movement skills, which, considering the limited exercise opportunities available for children who are blind, should be encouraged over inactive sedentary behavior. Optimum values for accuracy and physical activity were found to be consistent and will be used in future game designs. Subjective surveys showed a very strong interest in this game, which could contribute to a higher replay rate and to long term benefits.

Chapter 3

Real Time Sensory Substitution

3.1 Introduction

Figure 3.1: A legally blind player (right) playing Kinect Hurdles game (left) where visual cues that indicate when to jump are detected using real time video analysis and substituted with vibrotactile cues that are provided with a handheld controller.

The way we interact with software is increasingly modeled after how we interact with the real world, as such interaction is most natural to us. The emergence of more immersive and healthier forms of interaction through the use of whole-body gestures has propelled video gaming to the cutting edge of human computer interaction design. All console manufacturers offer gesture-based interaction that uses computer vision (Sony EyeToy, Microsoft Kinect) or computer vision combined with inertial sensing using a handheld controller (Nintendo Wii, Playstation Move).

Because gesture based games are intuitive to play, they have successfully attracted elderly players [48], and they also facilitate more social forms of gaming [92]. Despite their appeal to non-traditional gamers, gesture-based video games are inaccessible to players who are blind, as they require the player to perceive visual cues that indicate what gesture to provide and when. Access to gesture based games, exergames in particular, could create new exercise [76, 77] and socialization opportunities for users who are blind, which is important as: (1) they suffer from higher levels of obesity due to fewer opportunities to be physically active [60]; and (2) users who are blind are often isolated and lonely [96]. Video games can be made accessible to players who are blind using sensory substitution [19], e.g., replacing visual cues with non-visual, i.e., audio or tactile, cues [20]. Previous research found that gesture based exergames can be made accessible using vibrotactile cues [76, 77]; however, computer vision based gesture recognition systems are controller-less, and implementing sensory substitution requires access to the source code, which is not attainable for commercial games. This chapter presents real-time sensory substitution, a technique for sensory substitution that does not require modifying the source code of a game.

This chapter is organized as follows. Section 3.2 provides background and related work, Section 3.3 discusses real-time sensory substitution, Section 3.4 presents the results of a user study, Section 3.5 discusses these results, Section 3.6 outlines future work, and the chapter is concluded in Section 3.7.

3.2 Background and Related Work

Gesture-based video games typically simulate real physical activities because they use whole body gestures. A physical activity such as tennis typically involves spatial challenges (where to hit the ball) and temporal challenges (when to hit the ball). Performing spatial and temporal challenges relies upon a combination of sensorimotor (senses and motor coordination) and cerebellar (muscle) control [37, 85]. Sensorimotor-based physical activities are difficult to perform for players with visual impairments, as they are mostly visuo-spatial [64]. For example, in tennis, successfully hitting the ball is a combination of spatial and temporal challenges in which the location of the ball is predominantly acquired visually, which, for someone who is visually impaired, may be difficult or impossible to achieve.

Gesture-based games typically involve temporal and spatial challenges similar to those of the physical activity that they simulate. Some gesture-based games only involve a temporal challenge; for example, Wii Sports Tennis [4] only involves swinging the Wii remote at the right time, and the direction in which the Wii remote is swung is not taken into account, to keep the game simple to play. Other exergames such as Eye Toy Kinetic [6] involve spatio-temporal challenges, as this game superimposes virtual objects at random locations to be punched and kicked over a video image of the player captured using an external camera. Targets are defined in space in front of the player, and the player has to provide directed gestures aimed at these targets.

Because video games and gesture based video games primarily use visual cues to indicate what input and what gesture to provide, they are inaccessible to players with visual impairments. This limitation affects players who are legally blind and totally blind more than players who are partially sighted or those with low vision, as the latter have been found to be able to play existing exergames through small modifications, such as increasing the contrast or volume of the game [36].

Gesture-based interaction using non-visual modalities has been explored in the following approaches. Finger gestures have been defined that allow users with visual impairments to interact with mobile touch screens [49, 97], where primarily audio feedback is provided, but these approaches don't use whole body gestures or involve games where fast responses are required. AudiOdyssey [38] is a music game in which players receive audio instructions that indicate what gestures to provide using a motion-sensing controller (Wii Remote), as to create and record musical beats. The player can then layer these recordings to create complex musical tracks. Providing the right gesture as indicated using an audio cue is primarily a temporal challenge.

In previous work we explored how to make gesture based games, specifically exergames, playable using non-visual feedback. VI Tennis [76] (see Figure 3.2:left) implements the gameplay of a popular upper-body tennis exergame (Wii Sports Tennis) that is played with a Wii remote. Wii Sports Tennis only involves performing a temporal challenge: when the ball approaches, the player has a few seconds to provide the upper-body gesture, e.g., swing the Wii remote from back to forward like a tennis racket to return the ball. VI Tennis implements the same audio feedback as Wii Sports Tennis, though vibrotactile cues provided with a Wii remote are used to convey the location of the ball. Because this game requires fast responses, simple vibrotactile patterns were used to increase their correct identification. A 250 ms vibrotactile cue, provided with a fixed frequency of 250 Hz, indicates the bouncing of the ball to tell the player to prepare for returning the ball, and a 2000 ms cue indicates the time frame in which the player must provide their gesture.
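Expressed as code, this cue protocol amounts to two fixed-length rumble patterns; the WiiRemote type and Rumble call below are illustrative assumptions, not the game's actual API.

    // Illustrative rendering of the two VI Tennis cues described above. The Wii
    // remote's vibrotactor runs at a fixed 250 Hz, so duration alone distinguishes the cues.
    void PlayBounceCue(WiiRemote remote) => remote.Rumble(250);   // 250 ms: ball bounced, prepare
    void PlaySwingCue(WiiRemote remote)  => remote.Rumble(2000);  // 2000 ms: window in which to swing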

Figure 3.2: Individuals who are blind playing VI Tennis (left); and VI Bowling (right).

In user studies with this game, vibrotactile feedback showed significantly better performance than audio cues alone. A reason for this improved performance is that cues presented simultaneously in multiple modalities can be detected at lower thresholds, faster, and more accurately than when presented separately in each modality [74]. Vibrotactile feedback has the benefit over audio feedback that it doesn't interfere with being able to talk with other players while playing the game or with listening to music. Gesture-based games such as exergames are often played in social contexts [78], or they feature music [84] and the input the player needs to provide is matched with the rhythm and beats of a song.

VI Bowling [77] implements the gameplay of Wii Sports Bowling (see Figure 3.2:right). The sport of bowling is self-paced, as it consists predominantly of a spatial challenge, e.g., throwing the ball at the pins. In Wii Sports Bowling, players aim their ball by adjusting a visual marker on screen using the arrow keys on their Wii remote and by twisting the Wii remote while throwing (the direction in which the Wii remote is moved is not taken into account). In VI Bowling, the longitudinal shape of the Wii remote and the direction in which it is pointing are used to convey the location of the pins, which is indicated using directional vibrotactile feedback. Players scan their remote along the horizontal plane, like a dowsing rod, and the pins' location is rendered using a directional vibrotactile cue of 250 Hz. When the remote points within 20◦ of the target, the vibrotactor is pulsed for 100 ms with 2,500 ms delays, and this delay is decreased linearly by 125 ms for every 1◦ of error. When the player points the remote precisely in the target direction, the vibrotactile cue feels continuous and the player's arm conveys the direction of the pins. A user study with six blind adults found that the users were able to find the target direction on average in 8.78 seconds and were able to throw their ball with an average error of 9.76◦.
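The dowsing cue maps angular error to pulse spacing; a minimal sketch of that mapping, under our own naming, is:

    // VI Bowling's tactile dowsing, as described above: inside a 20 degree window,
    // 100 ms pulses are spaced 125 ms per degree of error, so the cue feels
    // continuous when the remote points exactly at the pins. Names are ours.
    double? PulseDelayMs(double errorDegrees)
    {
        double error = Math.Abs(errorDegrees);
        if (error > 20.0) return null;   // outside the tactile window: no feedback
        return error * 125.0;            // 2500 ms at 20 degrees, 0 ms (continuous) at 0 degrees
    }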

3.3 Real Time Sensory Substitution

One of the biggest barriers towards making games accessible for players who are blind is that access to the source code of the game is required to implement sensory substitution, which for most commercial games is not possible. Though several games were identified in the related work section that use gesture-based interaction and are accessible to players who are blind, these games are often clones of existing games or novel games that have been developed from scratch. The goal of Real-Time Sensory Substitution (RTSS) is to allow for sensory substitution of visual cues with audio or haptic cues without requiring any modifications to the game, which may allow accessible games to be created more cost effectively.

Figure 3.3: RTSS-System Setup.

3.3.1 How it Works

In a previous study with children with visual impairments playing unmodified exergames, such as Dance Dance Revolution and Wii Sports, it was found that children were able to play these games when a human observer provided verbal cues indicating what input to provide and when [36], for example 'step left'. This observer watches the visual cues provided by the game on the screen and analyzes these to determine what verbal cues to provide to the visually impaired player. Based on these experiences, we developed a solution that uses real time video analysis to identify particular visual cues on screen and substitute these with audio or vibrotactile cues, similar to how a human observer would do this.

Certain games lend themselves to RTSS more than others. Additional audio cues can make a game more accessible, but at the same time such cues could potentially interfere with existing audio or music in the game [102]. For gesture based games, vibrotactile feedback was determined to be a more feasible modality for sensory substitution than audio [76, 77], as it doesn't interfere with existing audio, such as music, and it allows players to socialize with each other while playing the game. Vibrotactile cues can be provided with most controllers, as most of them contain a rumble feature. However, a controller can only be connected to one host, e.g., the console that runs the game, and additional vibrotactile cues can therefore only be provided by modifying the game, which is not possible. Alternatively, a player who is blind could hold or wear an additional controller, but this setup was deemed not to be very practical. Recent gesture based input systems, such as Microsoft Kinect, have become controller-less, which opens up the opportunity for the player who is blind to hold a controller that provides vibrotactile feedback.

Consequently, to investigate RTSS, we use the setup displayed in Figure 3.3. A USB video capture unit (Dazzle Video Creator Plus HD) was used to provide a video stream to an external laptop. Although the XBOX 360 supports high definition video, the non-high definition NTSC composite video signal was used to limit the video processing time. The video feed was split, with the same feed being displayed on a flatscreen TV and sent to the video capture unit simultaneously. To provide vibrotactile feedback we used a Nintendo Wii remote, which contains a vibrotactor that provides feedback with a fixed frequency of 250 Hz and can be varied only by pulsing the motor activation. Though the tactile spatial resolution of a Wii remote is limited, an abundance of tactile receptors makes fingertips the most sensitive to vibrotactile feedback [19]. Other benefits of Wii remotes are their low cost and the ease with which they can be connected to any computer that supports Bluetooth.

3.3.2 Runtime Video Analysis

Kinect Sports was used for our initial study. This game was chosen as it was one of the first games to utilize the controller-less gesture based input system called Kinect. Players can play six motion-controlled sports such as bowling, boxing, and track and field. We specifically focused on the track and field games, which include hurdles, sprint, javelin, discus throw, and long jump, because these games utilize visual cues that can easily be identified through video analysis. For example, in the hurdles game, the player makes their character run by jogging in place. When the player needs to jump over a hurdle, a visual cue of a yellow cloud appears at the location of the upcoming hurdle to notify the player to prepare. When it is time to jump, the yellow cloud briefly changes to green (see Figure 3.4).

Figure 3.4: Runtime video analysis of the Kinect Hurdles game. The yellow box indicates the area in which we look for the visual cue that indicates to the player to jump.

Similar cues were used in the other track and field games, which made them an easy target to find within a captured video frame. The software for the video analysis was written in C# and ran on Windows XP. Rather than tailor the video analysis to one game, we opted for developing a generic approach that could be used for different games. An XML based configuration file describes what the video analysis software should do in certain situations for a specific game. The video feed from the video capture card was processed at a rate of 24 frames per second (fps) and had a resolution of 640 x 480 pixels. The software runs in a constant loop, analyzing the video stream for key video sections defined within the XML configuration file. If one of the key video sections is found within the current frame, the software performs the specified action as defined in the configuration file (playing a certain audio cue, a haptic cue, or both). There was a small amount of lag between analyzing the frame and starting the appropriate cue; however, this lag did not affect the game play. The actual amount of lag could vary based on the amount of time between when the key video section appeared and when the next frame was grabbed for analysis, but usually the appropriate cue could be started in less than 50 ms.

Figure 3.5 shows a sample configuration file used by the RTSS software for one search area. A rectangular space within each frame is defined in the configuration file, along with a color range defined in RGB values. Since the video capture was analog, the colors were not always going to be an exact value; to address this, maximum and minimum values for each RGB component have to be entered. For each frame, we test if any pixel in the section is within the defined color range. If this is the case, we stop analyzing the current frame and one or more actions are executed as described in the configuration file. Subsequent frames are tested to check if the cue is still visible. The system can play an audio cue, provide a vibrotactile cue using the Wii remote, or both. The WiiMote tag defines which Wii remote will activate its rumble motor (0-3) and for how long, which allows for the use of multiple Wii remotes or vibrotactile patterns of different lengths to indicate different types of inputs. A value of -1 for the duration indicates to keep the rumble running as long as a pixel with the specified color is detected. Alternatively, a value can be provided that indicates the number of milliseconds that the vibrotactile cue should be provided. The software can also play an audio cue through the laptop speakers when the test is true. The audio cue can be looped when the value is set to 1. If this value is 0 then the audio cue will be played to completion once, and if this value is -1 it will loop until the test fails.

During the development of RTSS, a legally blind individual play tested three different Kinect Sports track and field games (hurdles, javelin, and long jump) using RTSS and provided us with feedback. These games are somewhat accessible as they use a form of sonification, e.g., the frequency of an audio cue increases the closer the player gets to the moment the input needs to be provided. However, in previous studies with exergames we found players to perform better when audio and vibrotactile cues are provided simultaneously [13]. We instructed our participant to face the camera, start jogging in place, and jump or throw (depending on the game) when a vibrotactile cue was felt. Different configuration files were created for each of these games. For example, for the hurdles game, once the yellow cloud was detected over the approaching hurdle, we would provide a vibrotactile cue with the Wii remote to indicate that the player should jump. Because the player only has a few seconds to jump, a simple vibrotactile pattern was used to indicate to the player when to provide the input. Once the cloud turned green, the rumble was stopped, as it was too late for the player to react and jump within the allotted time. For both the javelin throw and the long jump, the part at the end of the track the player is running on turns yellow when the player needs to throw their javelin or jump (see Figure 3.6).

Search area (rectangle, pixels): 100 300 150 300
Color range, per-channel minimum/maximum (R, G, B): 230 999, 230 999, 0 100
WiiMote: 0, duration: -1

Figure 3.5: Sample XML Configuration File

After tuning some parameters, our blind test subject was able to play each one of the games with good performance. To facilitate creating XML files for different games and to help define areas and color ranges, players can capture a frame by pressing the trigger button on their Wii remote. A sighted person could do this; once the frames have been saved, they can be opened in a graphics editor where the color of the visual cue can be determined, as well as the area where it is visible.
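A condensed sketch of the per-frame test described in this section is shown below; the Frame and SearchSection types and their members are illustrative stand-ins for the actual RTSS data structures.

    // For each captured frame, test whether any pixel inside the configured
    // rectangle falls within the configured RGB minimum/maximum range; the scan
    // stops at the first matching pixel, after which the configured actions
    // (rumble and/or audio) are fired. Types are illustrative.
    bool SectionMatches(Frame frame, SearchSection s)
    {
        for (int y = s.Top; y < s.Bottom; y++)
            for (int x = s.Left; x < s.Right; x++)
            {
                var p = frame.GetPixel(x, y);
                if (p.R >= s.RMin && p.R <= s.RMax &&
                    p.G >= s.GMin && p.G <= s.GMax &&
                    p.B >= s.BMin && p.B <= s.BMax)
                    return true;
            }
        return false;
    }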

Figure 3.6: Kinect Sports Javelin throw. The area within the defined box turns yellow indicating the player must throw their javelin.

3.4 User Study 1 - Sighted Players

To assess the effectiveness of RTSS, we conducted a user study with 28 sighted participants. The XBOX 360, Kinect, a flatscreen TV, and a laptop running the RTSS application were placed in a public area and made available to anyone who wanted to play the Kinect games during a video game party. This video game party was organized to allow computer science students to demo the games they developed for various game development courses.

The XBOX was set up to run Kinect Sports and was always configured to play the one player hurdles mini game. The RTSS software was configured for the hurdles mini game as well. The demo was set up for 5 hours. A game administrator was present at all times to ensure the XBOX game was configured to play the hurdles game and that the RTSS software was running correctly. The game administrator also recorded the results of each game.

Participants were hence tested and recruited anonymously, an approach for doing user studies that has recently been explored [7] and which allows for better understanding of users, as it tests them in a more natural, non-test environment. No personal information was collected about the participants other than that they consisted of students and faculty members. This study took place less than 2 weeks after the release of the Kinect, and it can therefore be safely assumed that this was the first time playing Kinect Sports Hurdles for 95% of the participants.

Two versions of the game were tested. The first version was the standard Kinect hurdles game using visual and audio feedback. This version was played by 12 participants. The second version, played by 16 participants, used RTSS and was played using haptic and audio feedback. During the five hours of the party, after testing 3 participants we switched between versions by turning the flatscreen TV off or on. We ensured participants only played one version of the game. As a player approached our demo, instructions were given based on the version of the game. For the regular version of the game, players were told to run in place and jump over the hurdles when their on screen character approached them. For the RTSS version, players were given a Wii remote and told to run in place and jump when they felt a vibrotactile cue. A single race consisted of players running around a virtual track and encountering four hurdles. Not correctly jumping over a hurdle would cause the virtual player to stumble and slow down, which would result in a slower overall time. Players only ran one race, and their time and number of hurdles successfully jumped were recorded. The laptop screen was not visible to players.

3.4.1 User Study 2 - Players with Visual Impairments

We conducted a follow up study that included users with visual impairments. We recruited seven users (3 female, average age 40.9, σ=24.9) to participate in our user study. Participants were recruited through the local National Federation of the Blind chapter. All participants were visually impaired (3 totally blind, 2 legally blind, 2 low vision). None had any self-reported impairments in mobility or hearing. None had played Kinect before. All participants played the same RTSS version of the Kinect Hurdles game as in the first user study. An XBOX 360 with a Kinect, a laptop running the RTSS software, and external speakers hooked up to the XBOX 360 were placed in a secluded room free of any nearby obstacles. A game administrator was present during the test, and participants were given a brief tutorial of the game. We used the same procedure as in the first study, e.g., players were not given any practice laps and were instructed to start running in place once the audio cue was given by the commercial Kinect Hurdles game. Players only ran one race, and the number of hurdles successfully jumped and their time were recorded. Based on the results from our previous study, we put a rope on the floor indicating the visible area of the camera, in an attempt to prevent players from getting out of range of the camera.

3.5 Results

3.5.1 Sighted Player Performance Results

A single factor ANOVA showed no significant difference (F(1,24) = 0.03, p > 0.05) in player performance between the success rates of the standard version and the RTSS version. The RTSS version had a higher success rate, with a 39% (σ=26) chance of the player successfully jumping a hurdle, whereas for the standard version the chance was 36% (σ=32). There was also no significant difference (F(1,24) = 0.33, p > 0.05) in the total time of the race between the two versions. Running a lap using the RTSS version took on average 28.67 (σ=8.41) seconds, compared to the standard version, which took on average 26.76 (σ=4.76) seconds. For the analysis we used a single factor ANOVA as opposed to a multivariate ANOVA, as time is to a larger extent determined by the pace the user is jogging at and to a smaller extent by whether a hurdle is jumped correctly.

Figure 3.7: Jump Accuracy

There was a significant difference in the types of unsuccessful hurdle attempts (see Figure 3.7). An unsuccessful attempt was due either to jumping too early or too late. Although the error rates were similar between the two games, the errors in the RTSS version of the game were mostly due to the player jumping too early, while in the standard version they were due to the player jumping too late. The RTSS version contained an average of 2.13 (σ=1.02) unsuccessful attempts due to jumping too early, compared to the standard version, which showed an average of 0.31 (σ=0.75) unsuccessful attempts due to jumping too early. The results of a single factor ANOVA show this is significant (F(1,24) = 27.6, p < 0.05).

Jumping too late also showed a difference: the RTSS version contained 0.31 (σ=0.45) unsuccessful attempts due to jumping late, compared to the standard version, which had an average of 2.23 (σ=1.36) unsuccessful attempts per race due to jumping late. This is a significant difference (F(1,24) = 20.95, p < 0.05) according to a single factor ANOVA.

Another significant difference between the two versions was in the critical errors. Critical errors were errors that suspended the play of the game, and in this case were either false starts (starting too early) or moving outside of the Kinect camera's acceptable range. Players were to wait until a gunshot signaled the start of the race. Two of the players playing the standard version of the game created a false start by running too early, compared to the RTSS version, where none of the players created a false start. This could be attributed to the players playing the RTSS version being completely focused on sounds, while the players playing the standard version of the game could be distracted by visuals contained within the display. Three of the players playing without the display ran outside of the acceptable range of the camera, while none of the players playing the standard version did. The Kinect sensor was always located in the same place, but players playing without the display had no reason to pay attention to where they were physically located. Players playing the standard version were focused on the display, which may have created an environment where a player could truly run in place without moving. Players playing the RTSS version had no display to look at and therefore had no ability to judge where they were in relation to the Kinect sensor.

3.5.2 Player Performance Results - Players with VI

Table 3.1 provides an overview of the combined results for both studies. Players with visual impairments had a slightly lower average success rate (32%) and took longer to run the track (34.08 seconds), but a single factor ANOVA found no significant difference in success rate (F(2,33) = 0.169, p > 0.05) or time (F(2,33) = 2.867, p > 0.05) compared with the results from the sighted players who participated in our first study using the standard and RTSS versions.

Table 3.1: Kinect Hurdles combined results for both studies

                    Standard (σ)     RTSS sighted (σ)    RTSS VI (σ)
Time (s)            26.78 (4.76)     28.67 (8.41)        34.08 (3.79)
Success rate        0.36 (0.32)      0.39 (0.26)         0.32 (0.12)
Avg early jump      0.31 (0.75)      2.13 (1.02)         1.57 (0.78)
Avg late jump       2.23 (1.36)      0.313 (0.48)        0 (0)

None of the visually impaired participants jumped too late, and an average of 1.57 (σ=0.79) hurdles per session were missed due to jumping too early. On average, 1.14 (σ=1.07) hurdles were missed due to the player not jumping high enough; this behavior was not observed at all in the previous study. A single factor ANOVA found a significant difference between the three groups for jumping too early (F(2,33) = 15.141, p < 0.05) and jumping too late (F(2,33) = 17.072, p < 0.05). A Tukey post-hoc test only found an additional significant difference between blind players and sighted players using the standard version for jumping too late (p = 0.00). Critical errors for people with VI were consistent with those of the RTSS version in the first study. Five out of the seven players drifted outside of the acceptable range of the camera, in all cases as a result of the player drifting too close to the Kinect sensor. None of the players with VI started the race too early.

3.6 Discussion

The biggest difference between the games was how the hurdles were unsuccessfully encountered. The RTSS version tended to have players jumping too early, while the standard version had players jumping too late. The haptic cue given to the player in the RTSS version was constant, and the player had no idea where the virtual player was in relation to the hurdle; the player would simply jump when the controller started to vibrate. If the player was running fast, the correct time to jump was as soon as the vibration started. If the player was running at a slower pace, the player should have waited to jump. The player had no knowledge of how long the haptic cue would last. Playing the RTSS version of the game, players were able to determine the success or failure of jumping over a hurdle by listening to the sounds; however, these sounds did not indicate whether the player jumped too early or too late. The standard, video enabled version of the game, in contrast, gave the player a visual which would indicate whether the jump was too early or too late, information the player could use to adjust the timing of future jumps. The haptic cue would start when the visual cue turned from nothing to a yellow cloud, so a player running at a slow pace would jump too soon. Varying the haptic pulse in order to give the player a sense of the distance between the hurdle and his position on the screen may help a player determine more accurately when to jump, regardless of the speed at which the virtual player is running.

3.6.1 Limitations

Although the results showed that visual and haptic cues performed equally well, there were several limitations with the current RTSS software.

Video Event Detection. Currently, a rectangular area and a color range within that area are supported. This works well for games such as Kinect Sports, where action areas are easily identifiable, different from the regular background, and always in the same place. If any of those characteristics are not present, RTSS will not work. There are several games where RTSS is possible in its present form, such as Eyetoy Kinetic and the other mini games contained within Kinect Sports; however, there are others that will not function properly because the action area is not easily identifiable, not different from the background, or not always in the same place.

Ability to learn from mistakes. The focus of this system was to substitute important visual cues with other modalities in order to play the game without visuals. The important cues substituted in this study were the visual cues that told the player it was time to perform an action. Nothing was added to notify the player that an action was done incorrectly. For example, when a player would move out of the acceptable range for the Kinect sensor, the game would play an error tone and put up a message box explaining, through visuals, what the error was and how to fix it. Without this visual being substituted, a player without sight would have to rely on a person with sight to explain what the error was. Also, the RTSS system gave no immediate feedback as to how the player was performing the jogging motion. Players using the display could see their character moving in sync with their feet, and they could adjust their running style if needed. This feedback was lost when playing without the display.

Text Prompts. Text prompts are present throughout video games, and in order to navigate through the menus, they need to be substituted with other modalities, for example audio prompts. Without these, players without sight need to rely on players with sight to set the game up and navigate to the game. Although the game play visuals are substituted to make the game playable, substituting the in game prompts as well could lead to more independence for people without sight.

Sensory Substitution Time. Visually impaired players showed the same performance as their sighted peers; however, sighted and visually impaired players playing with RTSS both showed a significantly higher chance of jumping too early when compared with the standard version. Kinect Sports gives advance notice of an important event that is about to take place; however, depending on the duration of that advance notice, the substituted cue may or may not come at the correct time. In this specific case, the haptic cue would start when the hurdle was highlighted in yellow. The duration of the yellow visual cue is based on the rate at which the player is running. If the player is running slowly, the cue will be longer and the player may jump too soon. The opposite is also true: if a player is running quickly, the duration of the visual cue may be short, which results in shorter notice in the substituted cue. In such cases, combined with the delay in being able to provide vibrotactile feedback, sensory substitution may be too slow to allow for playing the game. This problem can only be fixed by building a delay into the game for when it expects input, but this requires modifications to the source code of the game.

3.7 Future Work

Measure Energy Expenditure. This study was performed to show the feasibility of an RTSS system, and the results show that the approach is feasible. We seek to measure the energy expenditure of a person with visual impairments while playing an exergame equipped with an RTSS module to see if vigorous levels of physical activity can be achieved. If the combination of RTSS and Kinect games can create vigorous levels of physical activity, this could be a viable form of daily exercise for people with visual impairments.

Menu System. Although the RTSS approach worked during gameplay, it did nothing to assist a player with visual impairments in navigating the menus. An additional component that uses a screen reader for games that do not have an additional audio track, or where the in game instructions are not audible, would be a great addition to the RTSS system and would make the experience more usable for a person with visual impairments. It would also create an environment where a person with visual impairments could exercise his independence by playing games without the need for assistance from a sighted peer.

Object Recognition. The RTSS system outlined here relied on simple cues available in Kinect Sports. Many of the games contained blobs that were always in the same place and always the same color. This simplified the object recognition, as only a specific color in a specific location was monitored. An image signature approach, where a blob of image data in the captured frame is compared to a known image and determined to likely be the same data, should be incorporated into the RTSS system. This will allow for more complex sensory substitution, such as identifying moving objects. If this piece is implemented and functions correctly, it will open the door to making many different kinds of existing standard games accessible to people with visual impairments. There are potential social benefits from this as well, as a person who is blind will be able to play the same popular games as his friends.
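One plausible realization, sketched under our own assumptions, compares a stored reference blob against the same-sized region of the captured frame using a mean absolute difference; the types and the threshold below are ours, not part of the described system.

    // Illustrative image-signature test: a small mean absolute difference between
    // a stored reference blob and the corresponding frame region suggests the
    // frame likely contains the same data. Threshold and types are assumptions.
    bool LikelySameBlob(Frame frame, ReferenceBlob reference, double threshold = 12.0)
    {
        long diff = 0;
        for (int y = 0; y < reference.Height; y++)
            for (int x = 0; x < reference.Width; x++)
            {
                var a = frame.GetPixel(reference.Left + x, reference.Top + y);
                var b = reference.GetPixel(x, y);
                diff += Math.Abs(a.R - b.R) + Math.Abs(a.G - b.G) + Math.Abs(a.B - b.B);
            }
        double meanDiff = diff / (3.0 * reference.Width * reference.Height);
        return meanDiff < threshold; // smaller mean difference = more likely the same blob
    }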

Open Interface. One potential area for future work would be an interface present in all commercially available titles. If games contained the information required by the configuration files and were able to get that information to the RTSS system, this approach could become more widely used. It would not burden the processing power required by the console to run the game, as all the post processing would be done by the RTSS system, and supplemental cues would be directed to the player without any involvement of the game console except to periodically update the RTSS system with the visual cues to look for and the actions to perform when those cues are discovered.

3.8 Conclusion

This chapter presents a technique called real time sensory substitution (RTSS) that allows players who are blind to play gesture-based video games without any modifications to the game. A user study with 28 sighted players and 7 players with VI playing Kinect Hurdles found no significant difference in performance, based on accuracy or time to play the game, between a version with visual and audio cues and a version with audio and haptic cues. Players using the RTSS system tended to perform the action too early, whereas players using the standard version tended to perform the action too late. Players playing without visual cues tended to wander outside of the playing area defined by the Kinect system, whereas players playing with visual cues tended to miss important audio cues. Future work will measure energy expenditure, with the hope of generating enough energy expenditure to be considered vigorous physical activity. We will also support the menu system to encourage independence for people who are blind, and add better object recognition, which will make more games accessible to people with visual impairments.

Chapter 4

Proprioceptive Displays

4.1 Introduction

Interaction capabilities of mobile devices are typically restricted by their weight and size, which curbs their functionality, usability [41], and accessibility. For example, mobile devices increasingly feature touch screens to optimize available input and output space, but interacting with onscreen keyboards often proves to be error prone due to their small buttons. The lack of tactile feedback is also an important barrier towards their use by users with visual impairments.

Alternative interaction techniques have been developed that seek to increase the input space of mobile devices without compromising their portability. Internal approaches transform the mobile device itself into an input device either by: (1) sensing its orientation [45, 57]; (2) sensing its position on a flat surface [41]; or (3) sensing gestures made with the device [52, 51] or against the device [47]. External approaches seek to borrow input space from the user's direct environment [42] or the user's body [41]. Most of these techniques require either the mobile device or the user to be enhanced with additional sensing capabilities.

Most research has focused on increasing available input space, but we argue that output capabilities of mobile devices are also constrained, not only because of their small screens and limited audio capabilities, but also because of their specific contexts of use: (1) users often interact with their mobile devices when they are active, and the use of a display or audio may impede the users' safety when they are walking or driving; (2) the use of audio feedback may be limited due to noisy environments and safety and privacy issues, or simply because users prefer to listen to music on their mobile devices; and (3) mobile screens may be difficult to view in outside environments with direct sunlight.

Tactile feedback has superior temporal discrimination, and because mobile devices are often held close to our skin, tactile feedback lends itself well to the design of mobile interfaces [86] to achieve eyes and ears free interaction. However, tactile feedback on current mobile devices remains relatively underutilized [69]. Most mobile devices feature only one low cost vibrotactor capable of providing feedback with a fixed frequency that our skin is most sensitive to (250 Hz). Consequently, the temporal resolution of tactile feedback on mobile devices is typically restricted to a limited number of patterns.

Proprioception is an interoceptive sensory modality, distinct from exteroceptive sensory modalities such as sight, touch, and hearing, that has remained largely unexplored as a modality of feedback. In mobile navigation, single analog proprioceptive displays have recently been explored for acquiring the direction to a target in a horizontal plane by having users scan a horizontal line with their mobile phone. A vibrotactile cue indicates when the phone is pointed at the target, with the direction of the target conveyed to the user using proprioception.

Natural user interfaces (NUI) have become increasingly popular as they capitalize on the innate abilities that users have acquired through interactions with the real world.
NUIs define novel input modalities, such as touch, gestures, motion, and speech, that model natural human interactions, with the goal of getting intermediating hardware, such as a keyboard and pointing device, "out of the way" [30] so as to facilitate an invisible and, presumably, non-impeding interface that users may perceive as more intuitive and natural to use. Gestural interfaces, such as multi-touch, have become the de facto standard for mobile interaction, but these are still firmly grounded in the domain of the graphical user interface (GUI), as gestures directly manipulate on-screen content. For NUIs to truly move beyond the confines of traditional desktop and GUI based environments, one can question why graphical displays are still part of the equation at all, as NUI designers can draw from a myriad of interaction options that are constrained only by the physical capabilities of the human body.

There are several contexts where the use of graphical displays is not feasible or even possible. More and more people are adopting mobile devices, such as smartphones or tablets, for their computing needs. The portability of mobile devices not only curbs screen real estate but, more importantly, means users often interact with their mobile device while they are active, when the use of a screen may severely impede their safety, for example, when they are driving or walking. Users with visual impairments cannot use a graphical display, and the emergence of NUIs is raising new barriers for them.

Only recently has this fourth modality of feedback (proprioception) been explored to appropriate the human body as a display device, by augmenting an exteroceptive modality, such as haptic feedback, with proprioceptive information. For example, tactile proprioceptive displays [13, 77, 90] have been explored that allow users to acquire the direction to a target in an ear and eye free manner. Users scan their environment with a handheld orientation aware device. A vibrotactile cue indicates when the device is pointed at the target, upon which the target direction is conveyed to the user through her/his own arm. Tactile proprioceptive displays require only a small amount of haptic feedback to facilitate a much larger information space that can be accessed in an ear and eye free manner. Such interaction has been found useful in mobile contexts for pointing out an object of interest [90] or for engaging a blind user in gesture based interaction [77].

To date, proprioception has been explored for non-visual 1D target acquisition. The work in this chapter extends preliminary work in this area to the domain of natural user interfaces, with the goal of facilitating non-visual gesture based interaction. The first section of this chapter seeks to extend or replace the display of a cell phone by using haptic cues and the orientation of the device. The second and third studies in this chapter explore the use of tactile-proprioceptive displays for 2D target acquisition and evaluate users' ability to perform directed gestures aimed at these acquired targets. The last study of this chapter explores 3D target acquisition. Such a technique could have useful assistive or augmented reality applications; for example, it could point out the exact location of an object rather than a direction to it, which could be useful in a navigation system for environments with no visibility or for users unable to see.

4.2 Discrete Proprioceptive Display Background

Proprioception has been explored for developing robust ear and eye free forms of input for mobile devices [41, 55, 56, 51] and virtual environments [75, 28, 29]. Leveraging proprioception to locate targets in a horizontal plane as an output technique was recently explored in the following approaches:

Sweep-Shake [90] is a mobile phone application that can point out geo-located information by sweeping the phone in the user's environment. The phone's compass and GPS are used to determine the user's location and the direction in which the phone is pointing. Directional vibrotactile feedback (increasing in magnitude) conveys the location of an object of interest. The authors do not provide information about the size of the targets or the window around the target within which vibrotactile feedback is provided. Users can perform gestures to interact with the object of interest, but these are not directed at the found target. A study with four sighted users found that they were able to find targets in a 360° circle around the user in 16.5 seconds.

As shown in Chapter 2, VI Bowling is a tactile/audio exergame for users who are blind [77]. This game is played using a motion-sensing controller (Wii Remote) with a built-in vibrotactor and optical sensor. The orientation and longitudinal shape of the Wii Remote are used for pointing out the direction in which the bowling pins are located and in which direction the user must aim their throw. The orientation and position of the Wii Remote are tracked using a wireless IR emitter peripheral. A process called tactile dowsing allows the player to find the location of the pins. A tactile window of 38.6° is defined around the target, and the vibrotactor starts pulsing for 100 ms with 2500 ms delays when the user enters this window; the delay is then decreased linearly by 125 ms for every degree of error. Vibrotactile feedback is used to accommodate specific contexts of exergaming, for example, players socializing with each other. A user study with six blind adults found that the users were able to find the target direction on average in 8.78 seconds and were able to throw their ball with an average error of 9.76°.

Ahmaniemi [13] explored finding targets using a mobile device that consists of a high precision inertial tracker (gyroscope, compass and accelerometer) and a C2 vibrotactor. Two types of vibrotactile cues were explored for rendering targets: (1) an on-target cue (a 260 Hz sine wave mixed with a 30 Hz envelope signal); and (2) a directional cue using a tactile window of 10° around the target (the same cue as on target, but with the frequency and amplitude of the envelope shape increased linearly). Targets were rendered randomly on a 90° horizontal line with varying widths. A user study with eight sighted users found they were able to find targets on average in 1.8 seconds, with no significant differences between vibrotactile feedback provisions for efficiency and target size, though smaller targets took longer to find than larger targets. Directional feedback makes distinguishing targets that are close to each other more difficult, as the edges of a target become harder to distinguish.

Magnusson [71] evaluates a system similar to Sweep-Shake [90], where a non-directional audio cue indicates whether the user is pointing the phone within a window that contains a beacon that users must physically approach. The phone's location and the direction it is pointing are acquired using GPS and a compass. Vibrotactile feedback is increased when the user gets closer to the beacon. Different sized target windows were evaluated with 15 sighted users, where a window size of 30° to 60° was found to be most efficient. PointNav [71] is an extension of the previous system, modified to provide a 50 ms non-directional vibrotactile cue when the phone points within a 30° window of the object of interest. PointNav was evaluated qualitatively with five visually impaired users, where all users were able to find all beacons.

All these approaches restrict themselves to finding targets that are rendered using tactile feedback on a horizontal line using a handheld device.

4.3 Discrete Proprioceptive Display

Figure 4.1: Users rotating a smartphone to find one of six disjunct targets in space [UP, DOWN, FORWARD, BACK, LEFT, RIGHT] that are rendered using vibrotactile feedback.

Proprioceptive displays exploit the distinct rectangular shape of a mobile device to point out to the user a direction to a target in space, using: (1) proprioception, the ability to sense the position and orientation of the body and its parts; and (2) stereognosis, the perception of the shape of 3D objects using touch. Users find the direction of the target by scanning, and unique orientations of the device are used to communicate information using auto-semaphoring.

4.3.1 Scanning

Mobile devices increasingly feature accelerometers [47], which measure linear acceleration of the device along three axes. The accelerometer's axes are aligned with the mobile device's chassis, and because the earth's gravity is an acceleration, this can be used for determining the device's orientation (see Figure 4.2). An orientation of the mobile device can be specified as Φ = (φ, θ, ω)^T, where φ is roll, θ is pitch, and ω is yaw. A target vector is specified as Ψ = (u, v, w)^T. A tactile window of β degrees is defined around the target vector. Users change the orientation of their mobile device by rotating their hand until the angle between Φ and Ψ is smaller than β. When the device points inside the tactile window, directional vibrotactile feedback is provided. Tactile feedback may better accommodate particular mobile contexts of use, such as noisy environments. The vibrotactile capabilities of mobile phones are limited to providing tactile feedback at a fixed frequency of 250 Hz, and they don't support sophisticated drive signals. However, directional vibrotactile feedback can be provided by pulsing motor activation with a certain delay, where the delay is decreased linearly for every degree of error between Ψ and Φ. Users can find the direction of error by moving their device in a particular direction to determine whether the pulse delay is increasing or decreasing; the target is found when the vibrotactile cue feels continuous.
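To make the pulsing scheme concrete, the following sketch (in Java, the language of our Android prototype) computes the delay between fixed-length pulses from the angular error between the device's pointing vector Φ and the target vector Ψ. The window size and timing constants are borrowed from the VI Bowling tactile dowsing parameters quoted in Section 4.2; the class and method names are illustrative and not part of any SDK.

    // Sketch: directional vibrotactile feedback via pulse-delay modulation.
    // Timing constants follow the tactile dowsing scheme (100 ms pulses,
    // 2500 ms delay at the window's edge, 125 ms less delay per degree
    // closer to the target); names are illustrative.
    public final class PulseDelayFeedback {

        static final double WINDOW_DEG = 38.6 / 2.0; // half-width of window
        static final long PULSE_MS = 100;            // fixed pulse length
        static final long EDGE_DELAY_MS = 2500;      // delay at window edge
        static final long DELAY_STEP_MS = 125;       // removed per degree

        /** Angle in degrees between the pointing vector and the target. */
        static double errorDeg(double[] phi, double[] psi) {
            double dot = 0, n1 = 0, n2 = 0;
            for (int i = 0; i < 3; i++) {
                dot += phi[i] * psi[i];
                n1 += phi[i] * phi[i];
                n2 += psi[i] * psi[i];
            }
            return Math.toDegrees(Math.acos(dot / Math.sqrt(n1 * n2)));
        }

        /** Delay between pulses in ms, or -1 outside the tactile window. */
        static long pulseDelayMs(double errDeg) {
            if (errDeg > WINDOW_DEG) return -1;  // no feedback: keep scanning
            long delay = EDGE_DELAY_MS
                    - Math.round(DELAY_STEP_MS * (WINDOW_DEG - errDeg));
            return Math.max(0, delay);           // near-continuous on target
        }

        public static void main(String[] args) {
            double[] phi = {0, 0, 1};            // device points along +Z
            double[] psi = {0.1, 0, 1};          // target slightly off-axis
            double e = errorDeg(phi, psi);
            System.out.printf("error %.1f deg -> delay %d ms%n",
                    e, pulseDelayMs(e));
        }
    }

With these constants the delay shrinks from 2500 ms at the edge of the window to under 100 ms on target, so the 100 ms pulses blur into the continuous cue described above.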

4.3.2 Auto-Semaphoring

A short vibrotactile cue could indicate a state change in the mobile device, upon which the user can start scanning to identify the exact state change, e.g., the phone positioned forward could mean a missed call and positioned up could mean a new text message. State changes of a mobile device can then be retrieved entirely through the tactile and proprioceptive modalities, leaving ears and eyes free for sensing the immediate environment. Proficient drivers are able to manually shift gears without taking their eyes off the road [55]; it can therefore be assumed that the motor operations required for rotating the hand produce little cognitive load, which may allow proprioceptive displays to be used in conjunction with other activities, such as walking. Users may be able to memorize the different orientations, which may allow for quickly cycling through the orientations to find the target one, as these can be recalled from the user's kinesthetic memory. As such, no directional vibrotactile feedback is required, only an on-target cue.

Figure 4.2: (Left) the orientation is outside the tactile window. (Right) the orientation is inside the tactile window and vibrotactile feedback is provided.

4.4 Twist-N-Lock

Games can be powerful motivators, and by evaluating the temporal resolution of discrete proprioceptive displays using a game (Twist-N-Lock), we may measure optimal performance. Accelerometers are limited in their ability to report the exact orientation of the device. When the device is positioned flat on a table, rotations about the length (pitch) and width (roll) of a mobile device can be retrieved from its accelerometer, but not rotations around the center (yaw). Yaw can be retrieved by combining accelerometer readings with either a magnetometer or a gyroscope. Magnetometers may be subject to interference in the presence of large metal objects, and gyroscopes have only recently become available in mobile devices. For proprioceptive displays this is a problem when the longitudinal aspect of the mobile device is used to point out a vector to the user, so we restrict our research to measuring unique orientations using one accelerometer.

We developed Twist-N-Lock in Java using the Android SDK for the HTC Evo smartphone. In a previous study with an analog proprioceptive display, Ahmaniemi [13] used an expensive [9] high precision inertial tracker (accelerometer, magnetometer, gyroscope) with an accuracy of +/- 0.05°. For this study we use a commercially available smartphone to illustrate that a discrete proprioceptive display can be implemented using its existing features. A potential risk with using inexpensive accelerometers is fuzzy input [88], though Kalman filters have been employed successfully to increase their accuracy [98]. Because targets are rendered with a fixed orientation along a different axis, we do not require the precision that is required for rendering analog targets on a line. When the phone is held relatively still, a vector indicating earth's gravity can be retrieved from its 3-axis accelerometer, as gravity is decomposed along each axis. When an axis is horizontal this value is 0.0, and when vertical it is -10.0 or 10.0. When the value on any axis is maximal or minimal, six unique orientations can be distinguished, e.g., [UP, DOWN, FORWARD, BACK, LEFT, RIGHT]. Each orientation is shown in Figure 4.1. More disjunct orientations could possibly be measured, but we restrict ourselves to six, as an easy mnemonic for the user to recall each orientation is to visualize the mobile device held inside a cube and then have the front of the mobile device face each of the inside faces of the cube.

A target orientation is indicated as a positive or negative axis. Ahmaniemi [13] found target sizes larger than 15° to be most effective for an analog proprioceptive display conveying targets on a horizontal line. Based on early trials, a target threshold of 9.0 was found feasible (i.e., the axis value must be within 1.0 of either the minimum or the maximum). The vibrotactor is activated to provide an on-target cue when the phone is pointed at the target vector (see Figure 4.2); no directional vibrotactile feedback is used because we used target vectors with fixed orientations. Preliminary trials and previous studies [77, 13] found no significant interference of vibrotactile feedback with readings from the accelerometer when the phone is held in the hand. The goal of the game is to find the target orientation, which is one of the six orientations. This target orientation is determined at random, as it can be assumed that the type of message to be communicated using auto-semaphoring is not known a priori.
If the phone is moved vigorously, false positives may be generated, but these can be avoided by requiring the player to hold the phone in the tactile window for 1 second. When the player holds the phone for 1 second in the target direction, a sound is played and a new target orientation (different from the current one) is determined at random. This is implemented to mitigate the effect of the initial orientation of the device in our evaluation, as in mobile contexts users may hold the device in any orientation. Players score more points the faster they find the target. For each game, we record: (1) the target orientation; and (2) the scanning time, i.e., the time it takes to find the target orientation, from which we subtract the 1-second holding time.
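A minimal sketch of this detection logic follows, assuming accelerometer values in m/s² on three axes as described above. The 9.0 threshold (within 1.0 of the ±10.0 extremes) and the 1-second hold come from the text; the axis-to-orientation mapping and all names are illustrative, since the actual mapping depends on how the accelerometer axes are aligned with the chassis. On Android the readings would come from accelerometer sensor events.

    // Sketch: classifying the six disjunct orientations from a 3-axis
    // accelerometer (values in m/s^2) and requiring a 1 second hold to
    // suppress false positives from vigorous movement. The mapping of
    // axes to orientations is illustrative.
    public final class TwistNLockDetector {

        enum Orientation { UP, DOWN, FORWARD, BACK, LEFT, RIGHT, NONE }

        static final double THRESHOLD = 9.0; // within 1.0 of -10.0 or 10.0
        static final long HOLD_MS = 1000;    // dwell before a target counts

        private Orientation held = Orientation.NONE;
        private long heldSince = 0;

        static Orientation classify(double ax, double ay, double az) {
            if (az >= THRESHOLD)  return Orientation.UP;
            if (az <= -THRESHOLD) return Orientation.DOWN;
            if (ay >= THRESHOLD)  return Orientation.BACK;
            if (ay <= -THRESHOLD) return Orientation.FORWARD;
            if (ax >= THRESHOLD)  return Orientation.LEFT;
            if (ax <= -THRESHOLD) return Orientation.RIGHT;
            return Orientation.NONE; // between orientations: keep scanning
        }

        /** Feed one reading; returns true once the target is held for 1 s. */
        boolean onReading(Orientation target, double ax, double ay, double az,
                          long nowMs) {
            Orientation o = classify(ax, ay, az);
            if (o != held) {         // orientation changed: restart the timer
                held = o;
                heldSince = nowMs;
            }
            return o == target && o != Orientation.NONE
                    && nowMs - heldSince >= HOLD_MS;
        }

        public static void main(String[] args) {
            TwistNLockDetector d = new TwistNLockDetector();
            boolean found = false;
            // simulate: phone held flat with the screen up for 1.2 seconds
            for (long t = 0; t <= 1200 && !found; t += 100) {
                found = d.onReading(Orientation.UP, 0.0, 0.0, 9.8, t);
            }
            System.out.println("target found: " + found);
        }
    }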

Figure 4.3: Average scanning time for 6 orientations

Figure 4.4: Average scanning time per orientation

4.5 User Studies

Twelve able-bodied participants were involved in a user study (3 female, mean age 34.9). Participants were taught all six orientations using the cube mnemonic and were told that the phone would vibrate when held in the correct orientation. This orientation would have to be held until they heard a sound. They were told that orientations would be picked at random and that the faster the correct orientation was found, the higher their score would be. Participants played the game for 5 minutes using their dominant hand while they were sitting down. To avoid interference, the screensaver and the autorotate features on the phone were disabled and the buttons at the bottom of the screen were covered with tape. Players received $5 each for their participation.

Users found a total of 599 orientations (M=49.9, σ=16.8). The mean scanning time was 5,597 ms (σ=4,490 ms), and the minimum and maximum scan times were 1,395 ms and 34,143 ms. Figure 4.3 shows the scanning time for all participants for data points below 8,000 ms during the 300 seconds that the game was played. The variance of scanning time is large (σ² = 20.2 s²), which is attributed to probability, as target orientations were picked at random and players scanned orientations randomly to find the target. Figure 4.3 displays the means of each minute as bars. A slight decrease in means can be observed (not significant), which is explained either through chance or because players were able to memorize the available orientations and could cycle through them faster. If the first two minutes are considered a learning phase, the mean scanning time is 5,387 ms. Figure 4.4 shows the mean scanning time per orientation, but no significant difference between them was detected. The LEFT and DOWN orientations show large standard errors, which could indicate that they were often explored last when trying to find the target orientation, as they may have been slightly more difficult to achieve physically. Probability significantly affects the performance of scanning, and this effect increases with the number of distinct orientations used (bits in the information space). Our study showed that with an information space of 2.58 bits (i.e., log2(6) for six distinct orientations) a target orientation could be found on average in less than 6 seconds, with a standard deviation of 4.5 seconds. The strategy players adopt to scan through the orientations also affects performance. When targets are rendered in a space, e.g., on the inside faces of a cube, they can't be scanned through with constant velocity. For comparison, Ahmaniemi [13] (see related work) found in their study that users could find a random target on average in 1.8 seconds. However, these targets were located on a 90° line, which allows users to use linear scanning with a constant velocity.
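As a rough sanity check on the role of probability (a sketch under the assumptions that targets are uniformly random and that a player visits orientations in a random order without revisiting any), the information content and the expected number of orientations visited per target are:

    H = \log_2 6 \approx 2.58 \text{ bits}

    E[\text{orientations visited}] = \sum_{k=1}^{6} k \cdot \frac{1}{6} = \frac{6+1}{2} = 3.5

At roughly one to two seconds per orientation visited, 3.5 visits per target is consistent with the observed mean of about 5.5 seconds and with the large variance.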

Figure 4.5: Average scanning time for 4 orientations

Figure 4.6: Average scanning time per orientation

A follow-up study was performed in which we used an information space of 2 bits and targets were rendered in a vertical plane using the following four orientations: [FORWARD, UP, BACK, DOWN]. These orientations can be found faster, as they can be scanned through with a constant velocity by rotating the hand holding the phone 270° counter-clockwise when the phone is initially held forward.

Eight able-bodied participants were involved in this user study (2 female, mean age 32.3), in which the same protocol was used as for the user study with six orientations. Participants were taught the four orientations using the 'hand rotation' mnemonic. Users found a total of 749 orientations (M=96.4, σ=15.9). The mean scanning time was 3,373 ms (σ=1,357 ms), and the minimum and maximum scan times were 1,278 ms and 9,459 ms. This performance is significantly better (t(1346) = 12.84, p < 0.001) than using six orientations, though the information space used is smaller. Because targets can be scanned through more easily, the variance of scanning time is much smaller (σ² = 1.84 s²); see Figure 4.5. Figure 4.6 shows the mean scanning time per orientation for the four orientations, where it can be observed that the standard error is now much lower and there is less deviation between the standard errors of each orientation.

A third study was performed in which users scanned through 8 orientations. Six users participated in this study and were shown all eight positions prior to the start of the study. The eight orientations were DOWN, DOWN-LEFT, LEFT, LEFT-UP, UP, UP-RIGHT, RIGHT, and RIGHT-DOWN. The results are shown in Table 4.1.

Table 4.1: Results for eight orientations

Orientation       Frequency   Time (ms)   Stdev (ms)
DOWN (1)          330         2,067       759
DOWN-LEFT         44          5,242       3,535
LEFT (2)          54          2,982       1,421
LEFT-UP           46          3,735       1,262
UP (3)            46          3,441       1,289
UP-RIGHT          41          3,824       1,216
RIGHT (4)         45          5,393       4,115
RIGHT-DOWN        59          4,804       2,625
Total & Average   655         3,936       2,175

4.6 Discrete Proprioceptive Display Discussion and Future Work

Proprioceptive displays may not be as efficient as more sophisticated forms of haptic feedback provision. For example, Chan et al. [24] present seven haptic icons, or tactons, that can be recognized on average in 2.5 seconds with 95% accuracy while the user is engaged in a visual/auditory task. Though we did not assess the accuracy of associating an orientation with a particular message using auto-semaphoring, proprioception is an output modality distinct from modalities such as sight, touch, and hearing, and it can potentially be used to augment those modalities of feedback so as to create richer forms of feedback or to gain a multimodal representation. For example, a proprioceptive display with 6 orientations could be combined with the seven tactons to create an information space of 5.39 bits (42 states). An important drawback of using tactons is that they cannot be implemented with the vibrotactors typically found in current mobile devices. Mobile devices feature low cost vibrotactors that provide vibrotactile feedback at a fixed frequency, and they don't support sophisticated drive signals. Proprioceptive displays, on the other hand, can be implemented using an accelerometer and a simple vibrotactor, features that have become omnipresent on current mobile devices, which may lead to a greater adoption and dissemination of proprioceptive displays. For a complete analysis of the efficacy of proprioceptive displays, we must also evaluate the users' ability to perform self-semaphoring, i.e., whether users are able to associate a particular orientation of their device, sensed using proprioception, with a message, which will be the focus of future research.

In our experiment we assume the type of message to be communicated is determined at random, which may not be a realistic assumption. For example, in a mobile phone application it may be more likely for the user to receive a text message than a voicemail. Such information can be used to optimize scanning. Rather than having the user perform a random scan, users could perform a directed scan, e.g., by exploring first those orientations that are associated with messages that are more likely to occur. For example, the sequence [FORWARD, UP, BACK, DOWN] could correspond to: new message, new email, new voicemail, low battery, which may decrease the average time to communicate a message. This was explored in the 8 orientation study, where the user was forced to start scanning in the down position. The data shows that the positions furthest from the down position were the hardest to find, but also that the second position often took longer than the other positions. This could be due to a user moving the phone too quickly and skipping over the first position.

By combining readings from an accelerometer with those from a gyroscope, the exact orientation of a mobile device can be determined. This opens up the opportunity to use a proprioceptive display for having the mobile device point out a direction relative to the user. Users may find it easier to point the device in different directions than to arrange their mobile device in one of the unique orientations, which we seek to investigate in future work. Proprioceptive displays could also be used in conjunction, where a user holds a device in each hand. This may have useful applications in the domain of assistive technology; for example, semaphores composed of combinations of orientations, similar to American Sign Language, could indicate different letters, which could be a useful display technique for individuals who are deaf-blind. Using two proprioceptive displays increases the number of bits in the information space significantly (i.e., approximately 5.17 bits if 6 orientations per mobile device are used, which is large enough to hold a simple alphabet). Using two proprioceptive displays in conjunction may increase the cognitive load significantly, though users with visual impairments may be more proficient in the use of proprioceptive displays due to the plasticity effect [25].
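The arithmetic behind these state counts is straightforward; as a worked check of the figures quoted above:

    6 \text{ orientations} \times 7 \text{ tactons} = 42 \text{ states}, \quad \log_2 42 \approx 5.39 \text{ bits}

    6 \times 6 = 36 \text{ states for two devices}, \quad \log_2 36 \approx 5.17 \text{ bits}

Since 36 states exceed the 26 letters of the alphabet (log2(26) ≈ 4.70 bits), a simple alphabet indeed fits in the two-device information space.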

4.7 Discrete Proprioceptive Display Conclusion

Proprioceptive displays combine kinesthetic input with vibrotactile cues to create a novel ear and eye free display technique. Different orientations of a mobile device can be used to communicate messages to the user using auto-semaphoring. Using a game (Twist-N-Lock) we evaluated the efficacy of proprioceptive displays. User studies with 12 and 8 participants found that, for six and four orientations respectively, users could find the target orientation on average in 5,597 ms and 3,373 ms. When targets are aligned in a plane, they can be scanned through with constant velocity and can be found faster. The efficiency of proprioceptive displays is lower than that of more sophisticated tactile displays, though proprioceptive displays have the benefit that they can be constructed from features often already present in mobile devices.

Figure 4.7: User scans for a target in a plane using a handheld controller with the target location rendered using vibrotactile feedback. The direction of the target is conveyed using proprioception, upon which the user performs a directed gesture towards the target (right).

4.8 2D Target Selection - Background and Related Work

Leveraging proprioception to locate targets in a horizontal plane as an output technique was recently explored in the following approaches. Sweep-Shake [90] is a mobile phone application that can point out geo-located information. The phone's compass and GPS are used to determine the user's location and the direction in which the phone is pointing. Directional vibrotactile feedback conveys the location of an object of interest. The authors do not provide information about the size of the targets or the window around the target within which vibrotactile feedback is provided. A study with four sighted users found that they were able to find targets in a 360 degree circle around the user on average in 16.5 seconds.

Magnusson [71] evaluates a system similar to Sweep-Shake [90], where a non-directional audio cue indicates whether the user is pointing the phone within a window that contains a beacon that users must physically approach. Vibrotactile feedback is increased when the user gets closer to the beacon. Different sized target windows were evaluated with 15 sighted users, where a window size of 30 to 60 degrees was found to be most efficient. PointNav [70] is an extension of the previous system, modified to provide a 50 ms non-directional vibrotactile cue when the phone points within a 30 degree window of the object of interest.

Ahmaniemi [13] explored finding targets using a mobile device that consists of a high precision inertial tracker (gyroscope, compass and accelerometer) and a C2 vibrotactor. Two types of vibrotactile cues were explored for rendering targets: (1) an on-target cue (a 260 Hz sine wave mixed with a 30 Hz envelope signal); and (2) a directional cue using a tactile window of 10 degrees around the target (using the same on-target cue, where the frequency and amplitude of the envelope shape are increased linearly). Targets were rendered randomly at eight different locations on a 90 degree horizontal line with varying widths. A user study with eight sighted users found users were able to find targets on average in 1.8 seconds using a scanning velocity of 45.1 degrees/s. No significant difference was found between vibrotactile feedback provisions for efficiency and target size, though smaller targets took longer to find than larger targets. Target sizes larger than 15 degrees were most effective. Directional vibrotactile cues are more efficient than non-directional cues when the target distance is large, but they negatively affect finding targets that are close. They also make it harder to distinguish targets that are close to each other, as the edges of a target become harder to distinguish.

VI Bowling [77] is a tactile/audio exergame for users who are blind that is played using a commercial off-the-shelf motion sensing controller with an integrated vibrotactor. Players are guided to point their controller at the pins using directional vibrotactile feedback, where the position of the controller is tracked using a wireless IR emitter peripheral. Proprioception is used to facilitate a basic form of motor learning as users throw a virtual bowling ball at the sensed location of the pins. With a close-to-target window of 38.6 degrees and a target size of 7.2 degrees, a user study with six legally blind adults found that users were able to find the target on average in 8.78 seconds and were able to perform a directed gesture (throw) towards the target with an average error of 9.76 degrees.

4.9 Tactile-Proprioceptive Displays

Proprioception is the human ability to sense the position and orientation of the limbs and their extremities without the use of vision [35]. We define a proprioceptive display to be an output technique that turns the space of allowable and distinguishable positions and orientations of a limb into an information space that the user has access to in an ear and eye free manner using proprioception. Proprioceptive displays should be considered "hybrid" displays that rely upon a small amount of feedback in another modality, for example haptic feedback, for the arrangement of the limb into a specific orientation or position, so as to facilitate a significantly larger information space. All techniques discussed in the related work section implement proprioceptive displays that facilitate target acquisition using a handheld orientation aware mobile device, for example, a mobile phone with an integrated compass. Initially, users sense the direction their device is pointing using stereognosis, i.e., the perception of the shape of a 3D object using touch. Mobile devices are typically elongated and have identifiable tactile features, such as buttons, which further facilitates this process. Users scan their environment by adjusting the direction in which their forearm [13, 90] or stretched arm [77] is pointing. A vibrotactile cue indicates when the mobile device is pointed at the target, upon which the direction of the target is conveyed to the user through proprioception, using the current position of the arm or forearm. Though sonification, for example, could also be used to guide a limb, haptic feedback facilitates ear and eye free interaction, which is useful in mobile contexts.

4.9.1 Information Space

The information space of a proprioceptive display is constrained by the amount of available rotation of a limb. Adjusting the direction in which the arm or forearm is pointing involves rotations of the shoulder (glenohumeral) joint. With approximately 135 degrees of flexion, 40 degrees of extension, 170 degrees of abduction and 35 degrees of adduction, the shoulder joint is the most mobile joint in the body [16]. The arm or forearm can freely rotate within a hemisphere defined in front of or to the side of the user, which creates an information space that is able to point out a 3D vector relative to the user. Existing proprioceptive displays have only explored a subset of this space, as the user scans a horizontal line to find a target, which only conveys a 2D vector to the user.

4.9.2 Gesture Based Interaction

In this study we focus on natural user interfaces that require the user to perform upper body gestures directed towards targets defined in 3D space. Exercise games such as Sony's Eye Toy Kinetic [6] or Microsoft Kinect Adventures [2] superimpose virtual objects, to be touched, punched or kicked, at locations in a space in front of the user, over a video image of the player or an avatar representation of the player, both of which are created using an external camera. Players acquire the loci of targets using the visual modality. Because targets are typically rendered at arm's length to stimulate larger physical activity, target acquisition only involves finding a point in a plane defined in front of the player. Morelli [77] explores gesture based interaction and target acquisition using a tactile-proprioceptive display, but their target has a fixed location on a horizontal line.

4.10 2D Target Acquisition Study 1: Target Acquisition

Our first study evaluated two different scanning strategies for acquiring a target in a plane using a proprioceptive display.

4.10.1 Instrumentation

In previous work, Ahmaniemi [13] used an expensive high precision inertial tracker (accelerometer, magnetometer, gyroscope) with an accuracy of 0.05 degrees to implement a proprioceptive display. However, inertial trackers are subject to drift over time when the player is actively moving [98], which may be the case when users are performing gestures. We therefore combine inertial tracking with camera based tracking, both of which can be facilitated using a recently introduced commercial off-the-shelf controller called the Sony Move [7]. The Move controller features an internal accelerometer, a gyroscope, and a magnetometer. Though no specifications were provided, using reverse engineering we found that it features a vibrotactor whose frequency can be adjusted to vary between 91 Hz and 275 Hz. The Move controller features an LED that serves as an active marker. The uniform spherical shape and known size of the LED allow the controller's position to be tracked in three dimensions with high precision (1 millimeter) [8], using an external camera, e.g., the PlayStation Eye [7], which captures video at 640x480 (60 Hz).

Because games are considered powerful motivators, using one may allow for measuring optimal performance. A game called NV-INVADERS was created using the Microsoft XNA framework. In NV-INVADERS players use a proprioceptive display to acquire the location of an alien and shoot it with the Move controller. The accuracy with which the player aims the controller at the alien determines whether it is destroyed, and the faster aliens are destroyed, the more points a user scores. No visual feedback is provided, only sounds and haptic feedback. NV-INVADERS runs on a laptop and communicates with an application called Move.me running on a PlayStation 3 to retrieve the current position and orientation of the Move controller and to communicate with the controller to adjust the vibrotactile feedback.

For their proprioceptive display, Ahmaniemi [13] found target sizes larger than 15 degrees to be most effective for a range of 90 degrees. Because the Move controller's LED is tracked using the camera at a 640x480 resolution, we use a target size of 100 pixels to have the same relative target size (at least for the X coordinate). A target was generated randomly within the 640x480 range, excluding a 20 pixel border, to eliminate the need to scan near the camera's boundary. Two different types of scanning were developed based on the tactile capabilities of the Move controller.

In linear scanning the user acquires the X and Y coordinates of the target sequentially. A horizontal band of 100 pixels is defined around the target's X coordinate, within which the controller provides vibrotactile feedback. Directional vibrotactile feedback is used to indicate the Y coordinate. The frequency of the vibrotactile feedback provided by the Move controller can be made to vary between 91 Hz and 275 Hz by setting a value in the range of 1 to 255 (0 = off). In a related study on target selection using a haptic mouse, Oron-Gilad et al. [82] found that targets can be found significantly faster when the difference between the on-target cue and close-to-target cues is larger at the border of a target. Ahmaniemi [13] implemented this as well in their proprioceptive display. A close-to-target cue was implemented by varying the frequency linearly based on the Y target error, with a maximum value of 200 at the edge of the target, which was boosted to 255 when the player was on target (i.e., within 50 pixels).
If the user moved the controller outside of the camera's resolution, vibrotactile feedback would stop. To acquire a target, the user scans the X-axis with the controller until vibrotactile feedback is felt, upon which the user moves up or down to find the direction of error and thereby the Y coordinate. In multilinear scanning, in addition to using frequency to encode the error on the Y-axis, directional feedback was implemented on the X-axis to indicate error using pulse duration. The haptic pulse lasted 200 milliseconds, followed by a delay that decreased linearly at a rate of 3 milliseconds per pixel of X target error. We increased the difference between the on-target cue and close-to-target cues by setting the pulse delay to 0 when the player was on target. To acquire a target, users can move their controller diagonally to find the direction of error on both axes. A sound of a missile firing was played whenever a player pulled the trigger on the Move controller. To hit the alien, the player needed to be within 150 pixels of the target when the trigger was pulled; we determined this value through play testing so the game would not be too hard to play. When the alien was hit, vibrotactile feedback stopped and a sound effect of the alien dying was played. A new alien would be generated after a score was announced using synthetic speech, where score = 20 - t, with t the number of seconds it took to eliminate the alien. Background music was played to mask the sound produced by the vibrotactor in the controller.
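The sketch below captures the two directional channels used by multilinear scanning, with the constants given above (an on-target radius of 50 pixels, a drive value of 200 at the target's edge boosted to 255 on target, and 200 ms pulses with 3 ms of delay per pixel of X error). The ramp's starting point far from the target (MAX_ERR_PX) and all names are our own illustrative choices, not the Move.me API.

    // Sketch: directional feedback for multilinear scanning in NV-INVADERS.
    // Y error -> vibration frequency, encoded as a motor drive value
    // (1-255, mapping to roughly 91-275 Hz on the Move controller).
    // X error -> pulse delay (200 ms pulses, 3 ms less delay per pixel).
    // Constants mirror the values in the text; the ramp start (MAX_ERR_PX)
    // and all names are illustrative.
    public final class MultilinearFeedback {

        static final int ON_TARGET_PX = 50;      // on-target radius
        static final int EDGE_DRIVE = 200;       // drive value at target edge
        static final int ON_TARGET_DRIVE = 255;  // boosted on-target cue
        static final int MAX_ERR_PX = 480;       // assumed start of the ramp

        /** Motor drive value (0-255) encoding the Y error via frequency. */
        static int yDrive(int yErrPx) {
            int e = Math.abs(yErrPx);
            if (e <= ON_TARGET_PX) return ON_TARGET_DRIVE;
            if (e >= MAX_ERR_PX) return 1;
            // linear ramp: 1 far away, EDGE_DRIVE at the target's edge
            double t = (double) (MAX_ERR_PX - e) / (MAX_ERR_PX - ON_TARGET_PX);
            return (int) Math.round(1 + t * (EDGE_DRIVE - 1));
        }

        static final long PULSE_MS = 200;        // fixed pulse length (X axis)
        static final long MS_PER_PX = 3;         // delay removed per pixel

        /** Delay between pulses encoding the X error; 0 when on target. */
        static long xPulseDelayMs(int xErrPx) {
            return MS_PER_PX * Math.max(0, Math.abs(xErrPx) - ON_TARGET_PX);
        }

        public static void main(String[] args) {
            System.out.println(yDrive(40) + " " + yDrive(60) + " " + yDrive(400));
            System.out.println(xPulseDelayMs(10) + " " + xPulseDelayMs(300));
        }
    }

Moving diagonally makes both channels change at once, which is what lets players resolve the direction of error on the two axes simultaneously.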

4.10.2 Participants

We recruited 14 users (4 female, 10 male, average age 28.64, σ=6.51) to participate in a user study. Participants were randomly divided into two groups of seven (A and B), where group A played the game using linear scanning and group B using multilinear scanning. All were right handed and none had any impairments in tactile perception or motor control. We also measured players' height (M=69.8 inches, σ=1.86) and arm length (M=23.3 inches, σ=1.92).

4.10.3 Procedure

User studies took place in a small office room. Participants played the game using their dominant arm while standing. An observer was present for all user studies. Because players had different heights and arm lengths, we used an application that was part of the Move SDK to calibrate the position of the player such that their full horizontal and vertical arm movements would match the entire range of the camera. After calibration, we prevented players from moving by requiring them to stand on a piece of paper. Players were then instructed on the goal of the game and how to play it using either linear or multilinear scanning.

Players could try the game to familiarize themselves with the scanning technique. Players then played the game until 40 aliens were hit. This number was chosen to keep the study to around 10 minutes per participant. Participants received a $5 gift certificate for their participation. The game would record in a log file: (1) all target locations; (2) all locations of the controller (sampled at 60 fps), including time stamps; and (3) all shots fired and the distance from the target when each shot was fired.

Figure 4.8: Search strategies for linear scanning (left) and multilinear scanning (right)

4.10.4 Results

The average search time for a target was 10,423 ms (σ=11,295) for linear and 8,204 ms (σ=7,990) for multilinear scanning. However, this is an unfair comparison, as targets were generated at random and the distance between targets could vary significantly. Search time corrected for distance yields 52.17 ms/pixel (σ=63.83) for linear scanning and 38.22 ms/pixel (σ=35.81) for multilinear scanning. Error was defined as the number of shots fired by the player that failed to hit the target. Linear scanning yielded an average error of 1.257 (σ=2.91), whereas multilinear scanning yielded an average error of 1.275 (σ=2.31). A one-way MANOVA found a statistically significant difference between linear and multilinear scanning on performance (i.e., error and corrected search time) (F(2, 557) = 6.919, p < .001; Wilks' λ = .976, partial η² = .024). Post hoc tests using the Bonferroni correction revealed no statistically significant difference between error rates (p=.936) for the two scanning types, but multilinear scanning was faster, as the difference in corrected search time was statistically significant (p < .001). Figure 4.8 shows typical target searching for both scanning techniques, which shows that players who used multilinear scanning actually scan along two axes simultaneously, which is faster than linear scanning.

Fitts's law models human movement, predicting the time it takes to move to a target as a function of the distance to the target and the size of the target. In related work, Ahmaniemi [13] found the performance data for their one-dimensional proprioceptive display to fit poorly with Fitts's law [34]. We repeated their analysis for two dimensional target finding. Because our target had a fixed size, we used only target distance as the index of difficulty. Figure 4.9 displays search time as a function of the index of difficulty. The Pearson correlation was 0.18 for linear scanning and 0.26 for multilinear scanning. There are many data points where the search time was far above the baseline. There are two situations where the player will move away from a target: (1) when a new target is generated, the player unknowingly moves away while figuring out the direction of error; or (2) the player unknowingly skips over the target while figuring out the direction of error. Moving diagonally allows for finding the direction of error on both axes, but there is only a 25% chance the player moves closer to the target. The correlation can be improved slightly by removing potential outliers. Similar to Ahmaniemi [13], we filtered out the 30% slowest search times, only to see the correlation improve to 0.30 (linear) and 0.46 (multilinear), which is still considered a poor fit.

Figure 4.9 shows average search time plotted against the location of the target on the screen as distance from the center. The lack of directional feedback on the X-axis for linear scanning is the reason why players took longer to find targets that were defined close to the edge, as the standard deviation of search time for linear scanning increases significantly for those targets. While scanning for the target, players may not have scanned the full range of the display, upon which they would move from left to right and then go back left to explore the edge of the display.
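For reference, the Shannon formulation of Fitts's law [34] predicts movement time MT from an index of difficulty ID:

    MT = a + b \cdot ID, \qquad ID = \log_2\left(\frac{D}{W} + 1\right)

With our target width W fixed at 100 pixels, ID varies monotonically with the distance D alone, which is why raw target distance served as our index of difficulty; the poor correlations suggest scan time is dominated by the trial-and-error search for the direction of error rather than by ballistic movement toward the target.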

4.11 2D Target Acquisition Study 2: Performing Directed Gestures

The first study provided useful insights into the efficacy of different scanning mechanisms. The results of this study were used to inform a second study, which evaluates users' ability to perform upper body gestures directed at targets that are acquired using a proprioceptive display.

4.11.1 Instrumentation

A second game was created called NV-POP. In NV-POP players use a proprioceptive display to acquire the location of a balloon, but instead of shooting it (as in NV-INVADERS), players are required to make a directed gesture (thrust motion) at the acquired location of the balloon to pop it. Multilinear scanning was implemented. The accuracy of the gesture was measured using the controller's gyroscope. The Move.me application running on the PlayStation 3 provided the Euler angles (pitch, yaw, and roll) of the orientation of the controller. A gesture was detected if the position of the controller moved closer to the camera by 6 inches, which we measured using the controller's Z-axis values reported by the camera. At the start of the gesture, the current X and Y values, as well as pitch, yaw, and roll, were recorded in a log file. Once no more forward motion along the Z-axis was detected, the same values were recorded to determine the accuracy of the gesture. The differences between these values at the beginning and the end of the gesture were taken into consideration to determine whether the balloon was popped. We used the same target size (100 pixels) for the balloon and considered it a hit if the player started their gesture within 150 pixels of the target and the orientation of the controller remained within 5% error on each of the Euler angles throughout the gesture. These values were determined using play testing. If the balloon was hit successfully, the sound of a popping balloon was played. If the player missed the balloon, the sound of a hand sweeping through the air was played. Background music was played to mask the sound produced by the vibrotactor in the controller.
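A sketch of the hit test follows, using the thresholds described above (a 6 inch forward motion on the Z axis, a 150 pixel start radius, and a 5% tolerance on each Euler angle). The text leaves the reference for the 5% tolerance implicit; here we take it as 5% of a full 360 degree revolution, which is an assumption, as are all names and the Z-axis sign convention.

    // Sketch: NV-POP's thrust-gesture hit test. A gesture begins once the
    // controller has moved >= 6 inches (152.4 mm) toward the camera on the
    // Z axis; pitch/yaw/roll are sampled at the start and end. The balloon
    // pops if the gesture started within 150 px of the target and each
    // Euler angle stayed within tolerance (taken here as 5% of 360 deg,
    // an assumption). Names are illustrative.
    public final class ThrustHitTest {

        static final double THRUST_MM = 152.4;         // 6 inches
        static final double HIT_RADIUS_PX = 150.0;
        static final double ANGLE_TOL_DEG = 0.05 * 360.0;

        /** True if the forward Z displacement qualifies as a thrust.
         *  Sign convention assumed: Z shrinks as the controller nears
         *  the camera. */
        static boolean isThrust(double zStartMm, double zEndMm) {
            return (zStartMm - zEndMm) >= THRUST_MM;
        }

        /** Hit test from start-of-gesture position error and Euler angles. */
        static boolean isHit(double xErrPx, double yErrPx,
                             double[] eulerStartDeg, double[] eulerEndDeg) {
            if (Math.hypot(xErrPx, yErrPx) > HIT_RADIUS_PX) return false;
            for (int i = 0; i < 3; i++) {              // pitch, yaw, roll
                double drift = Math.abs(eulerEndDeg[i] - eulerStartDeg[i]);
                if (drift > ANGLE_TOL_DEG) return false;
            }
            return true;
        }

        public static void main(String[] args) {
            double[] start = {10, 5, 0}, end = {12, 6, 1};
            System.out.println(isThrust(400, 240));        // 160 mm forward
            System.out.println(isHit(40, 30, start, end)); // within tolerance
        }
    }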

4.11.2 Participants

We recruited 8 users (1 female, 7 male, average age 28.13, σ=3.56) to participate in a user study. None of these participants had participated in the first user study. All were right handed and none had any impairments in tactile perception or motor control. Players’ height (M=70.6 inches, σ=3.73) and arm length (M=23.8 inches σ=2.43) were measured.

4.11.3 Procedure

The same procedure was used as for NV-INVADERS. Whereas in the first study players held the controller with their arm stretched, for this study we required them to scan the plane using their forearm, so as to be able to make a thrust gesture. Players were instructed to hold the controller in their hand like a dagger and to make a thrust gesture to pop the balloon while keeping the controller aimed at the balloon throughout the gesture. Players played the game until 40 balloons were popped.

Table 4.2: Average Aiming Error in Euler Angles

Angle     Pitch   Yaw    Roll   Average
average   3.35    3.50   4.27   3.71
stdev     3.84    3.50   5.62

4.11.4 Results

The average search time for a target was 10,847 ms (σ=11,268), the average number of misses per target was 1.51 (σ=2.44), and the corrected search time was 53.6 ms/pixel (σ=116.6). Table 4.2 shows the average aiming error of each successful gesture per Euler angle; no significant difference (p > .001) was found between them. Figure 4.11 shows how close users were to the target before they started their gesture, based on the distance of the target from the center of the display. This shows a slight but not significant increase as targets get farther away from the center, which corroborates the results using multilinear scanning from our first study.

4.12 2D Target Acquisition Discussion

Because proprioception is a novel output modality distinct from other modalities, we contrast our results with previous results achieved with proprioceptive displays. Ahmaniemi [13] found that users could detect a target using a proprioceptive display in a 90 degree horizontal range within 1.8 seconds. Morelli [77] used a horizontal display of 38.4 degrees, where users were able to find a target in 8.78 seconds and could perform a directed gesture in 2D with an error of 9.76 degrees. Because we calibrated our proprioceptive display for each player to utilize their full range of rotation of the arm at the shoulder joint, our display has an effective width of 180 degrees and a height of 135 degrees. The average search time for multilinear scanning was four times as large as Ahmaniemi's and comparable with Morelli's, though our display was considerably larger in size as well as in dimension. A comparison of aiming error is difficult, as Morelli only measures error in 2D.

A previous study found 1D target acquisition using a proprioceptive display to fit Fitts's model poorly [34]. Our study, which expanded the use of proprioceptive displays to two dimensions, found this correlation to become even worse. Fitts's model is derived from visual target acquisition, where a target is acquired instantly, whereas with a proprioceptive display the user must scan for a target. Fitts's model could possibly be extended with a probabilistic component that models a brief trial-and-error phase in which the user figures out the direction of error to find the target.

Some players spent a significant amount of time searching for the target outside the range of the camera; an improvement would be to indicate to the player, for example using audio, that s/he is outside of the range, rather than simply interrupting the haptic feedback provision. Alternatively, techniques for implementing a proprioceptive display could be used that are not confined to a limited range. Inertial sensing could be used, but it is subject to drift over time, where the error in the estimate of the controller's orientation may grow unbounded [98] when users are actively performing gestures.

For the second study players scanned with their forearm as opposed to their whole arm, so as to be able to make a gesture. Though the forearm has the same range of motion as the arm, e.g., a hemisphere, this hemisphere is smaller because the forearm is part of the arm. Though this smaller hemisphere may be scanned through faster, as it involves smaller motions, the spatial resolution of its information space is lower, and it may become harder to distinguish targets that are close to each other.

4.13 2D Target Acquisition Future Work

Proprioceptive displays allow a user to scan to a target in a plane, conveying a direction to a target in a space. We used haptic feedback (frequency, pulse length) to convey the X and Y coordinates of a target, but another type of haptic feedback (amplitude) could be used to convey the distance (Z coordinate) to this target. Gestures with a different length or magnitude could then be provided, which could have applications in non-visual augmented reality environments where users would use different gestures to grab virtual objects that are close by or far away. This implementation exceeds the haptic capabilities of the controller we used in our studies. Potential application areas for proprioceptive displays are low cost approaches for motor rehabilitation in developing countries and exercise games for individuals with visual impairments. There is considerable evidence that overweight and obesity rates are higher among persons with visual impairments than among the general population [89]. Some preliminary research has been performed in this area [76], but of interest would be to explore the conjunct use of two proprioceptive displays so as to stimulate larger physical activity. This setup may allow for more efficient target finding, as a proprioceptive display can be divided into two smaller displays, and it would also allow for detecting multiple targets, as players can scan the space using two proprioceptive displays simultaneously.

4.14 2D Target Acquisition Conclusion

We present two studies that explore the use of a tactile-proprioceptive display for non-visual 2D target acquisition and for performing directed gestures aimed at the acquired targets. The first study explored two different scanning strategies, i.e., multilinear scanning, with directional vibrotactile feedback provided using frequency and pulse duration, and linear scanning, with directional vibrotactile feedback using frequency only. Multilinear scanning was found to be significantly faster, with users being able to find a target on average in 8,204 milliseconds. No significant difference in error rate between the two scanning techniques was found. A second study evaluated the accuracy with which users can perform an upper body gesture aimed at a target that was acquired using a proprioceptive display, where we found that users could provide a gesture with an average aiming error of 3.51 degrees for each Euler angle.

4.15 3D Target Acquisition Related Work

Sweep-Shake [90] is a mobile application that points out geolocated information using a tactile-proprioceptive display. The user's location and the phone's orientation are determined using a compass and GPS. Directional vibrotactile feedback using pulse delay is used to render targets. A study with four users found users could locate a target on a 360 degree horizontal circle in 16.5 seconds. Ahmaniemi [13] explored target acquisition using a mobile device that consists of a high precision inertial tracker (gyroscope, compass and accelerometer). Directional and non-directional vibrotactile feedback (frequency and amplitude) were explored for rendering targets with varying sizes on a 90 degree horizontal line. A user study with eight sighted users found they were able to find targets on average in 1.8 seconds. Target sizes larger than 15 degrees were most effective. Directional feedback was found to be more efficient than non-directional feedback when the target distance is large, but it negatively affects finding targets that are close.

Virtual Shelves [55] is an input technique where users trigger shortcuts by orienting a motion sensing controller with an integrated vibrotactor and IR camera (Wii Remote) within a circular hemisphere in front of them. This technique includes a training phase where a non-directional on-target vibrotactile cue is provided to identify targets and to facilitate the development of a spatial memory. VI Bowling [77] is a tactile/audio exergame for users who are blind that is played using a Wii Remote. A tactile-proprioceptive display with directional vibrotactile feedback (pulse delay) reveals the location of the pins to the user, which allows the player to perform a directed gesture at the acquired target. With a close-to-target window of 38.6 degrees and a target size of 7.2 degrees, a user study with six legally blind adults found that a target could be found on average in 8.78 seconds and users could perform a throwing gesture with an average aiming error of 9.76 degrees.

4.16 3D Target Selection Prior Work and Motivation

Because existing work with tactile-proprioceptive displays has all involved 1D target acquisition, in the last section we explored 2D target acquisition. A tactile-proprioceptive display was implemented using a commercially available game controller that is used for gesture-based interaction (Sony Move). This controller's position and orientation can be tracked with high accuracy using an external camera and inertial sensing. Its integrated vibrotactor is capable of providing directional vibrotactile feedback using pulse delay and frequency (91-275 Hz). A display was implemented whose size is constrained by the length of the user's arm that holds the controller and which defines a plane in front of the player.

An augmented reality game was developed that involves players scanning to a random target and shooting it by pressing the controller's trigger. The faster the player finds the target, the more points the player scores. Two different target-scanning strategies were developed. Linear scanning involves finding the target's X coordinate using a non-directional on-target vibrotactile cue, upon which the Y coordinate can be found using directional vibrotactile feedback (frequency). Multilinear scanning uses directional vibrotactile feedback on both axes (pulse delay for X, frequency for Y). Preliminary experiences suggested multilinear scanning was more difficult to perform, but a user study with 14 participants (4 female, average age 28.64) found multilinear scanning to be significantly faster than linear scanning. Users could scan to a target on average in 8.20 seconds (σ=7.99). Based on these results we explored gesture-based interaction. A user study with 8 participants using multilinear scanning found that users could perform a thrust gesture in the direction the controller was pointing when pointed at the target, with an average aiming error of 20.74 degrees.

4.17 3D Scanning

For this study, instead of a plane, the user scans a frustum defined in front of them, whose depth is defined by the length of the user's arm and the location of the camera (see Figure 4.12). The back plane has a width that covers the entire horizontal range of the user's arm as it rotates at the shoulder joint, and its height is restricted by the camera's resolution. Based on our prior experiences we identified two scanning strategies to explore for 3D target scanning.

Multilinear scanning uses directional vibrotactile feedback on each axis to indicate the target's location (see Figure 4.13). Prior studies found users could do this on two axes simultaneously, and this approach adds directional vibrotactile feedback on the Z-axis. The user can find the direction of error in one gesture by moving the controller in any of the eight diagonal directions defined by a positive or negative displacement along each of the X, Y, and Z axes. Once the direction of error on all axes is known, the user can scan directly to the target.

Projected scanning exploits stereognosis, i.e., the human perception of the shape of a 3D object using touch. Our prior study found users were able to provide a thrust gesture in the direction of the found target. Projected scanning involves two steps: (1) the direction to the target is acquired in 2D by rotating the controller along its X and Y axes, as indicated using directional vibrotactile feedback; and (2) the user then moves the controller along a projected axis (P) that is defined by the controller's elongated shape and its current orientation (see Figure 4.13). Directional vibrotactile feedback indicates how far to move along the P axis. Because rotations can be performed faster than movements along axes, we assume this technique to be faster.
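A sketch of the geometry behind projected scanning: once the 2D direction has been acquired, the remaining error along P is the projection of the controller-to-target vector onto the controller's unit pointing direction, which in our setup would be derived from the Move controller's orientation. Names and the coordinate frame are illustrative.

    // Sketch: error along the projected axis P for projected scanning.
    // 'dir' is a unit vector along the controller's elongated shape
    // (derived from its orientation); the returned value is the signed
    // distance (same units as the positions, e.g. mm) the user still has
    // to move along P. Names are illustrative.
    public final class ProjectedScanning {

        static double projectedErrorMm(double[] controllerPos,
                                       double[] dir, double[] targetPos) {
            double p = 0;
            for (int i = 0; i < 3; i++) {
                p += (targetPos[i] - controllerPos[i]) * dir[i];
            }
            return p; // positive: move further along P; negative: pull back
        }

        public static void main(String[] args) {
            double[] pos = {0, 0, 0};
            double[] dir = {0, 0, 1};          // pointing along the Z axis
            double[] target = {0, 0, 350};
            System.out.println(projectedErrorMm(pos, dir, target)); // 350.0
        }
    }

This signed distance is what the directional vibrotactile feedback along P would encode.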

4.18 3D Target Selection Methods

4.18.1 Instrumentation

Our tactile-proprioceptive display is facilitated using the Sony Move [7] platform. The user holds a controller in their dominant hand to scan the frustum. This controller provides directional vibrotactile feedback to indicate the target's X (pulse delay) and Y (frequency) position. Because a Move controller can provide only two types of directional feedback, a second controller is held in the user's other hand to indicate the target's Z position (multilinear) or P position (projected) using frequency modulation. For both types of directional vibrotactile feedback we boosted the on-target cue by 25% compared with the close-to-target cue, based on a previous study with target selection [82].

A game involving 3D target selection was developed, inspired by the scene from Star Wars in which Luke Skywalker learns to sense the location of an orb and destroy it without using vision or hearing. Players must touch the target to destroy it, and the faster players destroy it, the more points they score. The game runs on a laptop and communicates with a PlayStation 3 to retrieve the current position and orientation of a Move controller and to control its vibrotactor. The camera has a 75-degree field of view with a 640x480 resolution (60 fps). The controller's X and Y are reported in pixels, its Z in millimeters, and its orientation in quaternions.

The size of our target was informed by a previous study investigating target size for 1D scanning, which found a target size of 15 degrees to be most effective for a range of 90 degrees (i.e., 16% of the search space). A target size of 200 pixels was chosen for the X and Y axes, and for the P/Z axis a target size of 200 mm was used. As such, the volume of the target occupied approximately 13% of the search space, based on the average length of an adult arm. Targets were defined at random locations within the frustum. If the user moves (multilinear) or points (projected) the controller out of the frustum, vibrotactile feedback on all axes is stopped. Due to the camera's aspect ratio, the user is more likely to move or point outside of the Y-range, which motivated us to use frequency for that axis. If the controller's coordinates are within 100 pixels (X, Y) or 100 mm (Z) of the target for one second, the target is destroyed, a sound effect is played, the score is announced, and a new random target is generated. Background music was played to mask the sound produced by the vibrotactor.
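To make the selection loop concrete, the following is a minimal sketch of the behavior just described, not the actual implementation; frustum.contains, controller.vibrate, controller.stop_vibration, and close_cue are hypothetical helpers, and state is a dictionary carried across frames.

import time

TARGET_HALF = 100   # within 100 px (X, Y) or 100 mm (Z) of the target
DWELL = 1.0         # seconds the controller must stay on target

def update(controller, target, frustum, state):
    """Advance one frame; return True when the target is destroyed."""
    if not frustum.contains(controller.pos):
        controller.stop_vibration()        # feedback stops outside the frustum
        state["entered"] = None
        return False
    errors = [t - c for t, c in zip(target, controller.pos)]
    if all(abs(e) <= TARGET_HALF for e in errors):
        # The on-target cue is boosted by 25% over the close-to-target cue.
        controller.vibrate(intensity=1.25 * close_cue(errors))
        if state.get("entered") is None:
            state["entered"] = time.time()
        elif time.time() - state["entered"] >= DWELL:
            return True                    # destroyed: play sound, announce score
    else:
        controller.vibrate(intensity=close_cue(errors))
        state["entered"] = None
    return False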

4.18.2 Participants

We recruited 12 students (6 male, average age 29.50, σ=3.26) for our user study. All were right-handed and none had any self-reported impairments in tactile perception or motor control. We measured players' height (M=170.81 cm, σ=7.59) and arm length (M=59.21 cm, σ=2.98).

4.18.3 Procedure

A between-subjects study was performed with six randomly selected participants playing the game using multilinear scanning and the others using projected scanning. User studies took place in a small office room. Participants played the game using their dominant arm while standing. Because players differ in height and arm length, an application that is part of the Move SDK was used to calibrate the position of the player and define the size of the frustum. Players were placed facing the camera. Using a visual task displayed on the laptop screen, players were positioned such that the full horizontal range of their arm at the shoulder joint matched the horizontal resolution of the camera, i.e., the display ranges 180 degrees by 135 degrees (due to the camera's aspect ratio). The player would then stretch their arm forward and press the trigger on the controller to define the depth of the frustum. Once the position of the player was calibrated, we required players to stand on a piece of paper and the display was turned off. Players were then instructed on the goal of the game, how to play it using projected or multilinear scanning, and how to find the direction of error on the axes. For projected scanning players were advised to start their search with the controller close to their chest. For multilinear scanning no specific start position was suggested. When they felt familiar with their scanning technique and could successfully scan to a target, players played the game until 20 targets were hit. Players received $5 for their participation. All targets and all locations and orientations of the controller were recorded in a log file.
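A minimal sketch of this calibration geometry follows, assuming the values reported in the instrumentation section (640x480 camera, 180-degree horizontal arm sweep); it is an illustration, not the Move SDK calibration tool.

import math

CAM_W, CAM_H = 640, 480
H_RANGE_DEG = 180.0
V_RANGE_DEG = H_RANGE_DEG * CAM_H / CAM_W   # = 135 degrees via aspect ratio

def px_per_degree():
    return CAM_W / H_RANGE_DEG              # ~3.6 px per degree of rotation

def px_to_mm(px, arm_length_mm):
    """Approximate an on-screen pixel distance as arc length at the
    player's reach, the same pixel-to-millimeter conversion used to
    correct search times in the results below."""
    return math.radians(px / px_per_degree()) * arm_length_mm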

Table 4.3: Mean corrected search time (and stdev) for each axis (mm/ms)

AXIS    PROJECTED      MULTILINEAR
X       .162 (.411)    .213 (1.15)
Y       .179 (.444)    .296 (1.628)
P/Z     .049 (.097)    .151 (.608)

4.19 3D Target Selection Results

The average search time for a target was 16,682 ms (σ=19,151) for multilinear and 20,747 ms (σ=20,412) for projected scanning. To accommodate for targets being generated at random locations, we computed search time corrected for target distance. X and Y are reported in pixels, but we convert from pixels to millimeters using the user's arm length, which yields corrected search times of .100 mm/ms (σ=.182) for multilinear scanning and .056 mm/ms (σ=.049) for projected scanning. This difference was found to be statistically significant (t(126) = 2.458; p < .05). We analyzed search performance for each axis by calculating the corrected search time (for target distance) based on the last time the target border was crossed. In 13.75% of the search attempts the player was already within or extremely close to the target range and remained in the target range, which results in significant outliers for that axis. We cap outliers for a single axis at 1.0; Table 4.3 lists the results.

A one-way MANOVA found a statistically significant difference between projected and multilinear scanning for corrected search time on each axis (F(2,236) = 2.739, p < .05, Wilks' λ = .966, partial η² = .034). Post hoc tests using the Bonferroni correction revealed no statistically significant difference between the two scanning types in corrected search times for the X (p=.484) and Y (p=.301) axes, but for the P/Z axis multilinear scanning was significantly faster than projected scanning (p < .05).
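For concreteness, a small worked sketch of the distance-corrected metric used in these results is shown below, under the calibration assumptions stated in the procedure section.

import math

PX_PER_DEG = 640 / 180.0   # camera width over the arm's 180-degree sweep

def corrected_search_time(dx_px, dy_px, dz_mm, time_ms, arm_mm):
    """Distance-corrected search time in mm/ms (larger = faster):
    straight-line distance to the target divided by the time taken,
    with X/Y pixel errors first converted to arc length in mm."""
    dx_mm = math.radians(dx_px / PX_PER_DEG) * arm_mm
    dy_mm = math.radians(dy_px / PX_PER_DEG) * arm_mm
    distance_mm = math.sqrt(dx_mm ** 2 + dy_mm ** 2 + dz_mm ** 2)
    return distance_mm / time_ms

# Example: covering 500 mm in 5,000 ms gives .100 mm/ms, the average
# reported above for multilinear scanning.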

We then analyzed corrected search time for each axis within each scanning type. A Welch's ANOVA found a significant difference in corrected search times between axes for projected scanning (F(2,173) = 8.755, p < .05). A Games-Howell post hoc test revealed a significant difference in corrected search times between the X and P axes (p = .011) and the Y and P axes (p = .006). No significant difference between corrected search times for each axis was found for multilinear scanning (F(2,357) = .442, p > .05), which also indicates that the type of directional vibrotactile feedback had no significant effect on scanning performance.

We then analyzed all collected traces of the controller's position. For multilinear scanning we found users spent an average of 426 ms (σ=39) moving the controller outside of the frustum. For projected scanning users pointed their controller outside the frustum for 2,175 ms (σ=1,482). This difference was statistically significant (t(5.007) = 2.830; p < .05). Closer analysis found that for projected scanning this predominantly occurred on the Y-axis, and for multilinear scanning on both axes.

Figure 4.14 shows a typical trace for each technique. For projected scanning (right), we found that users had problems following the projected axis, especially when the target was defined at the edge of the frustum. After finding the X,Y coordinate of the target, users would often deviate from either of these while following the projected axis P, which explains its significantly higher corrected search time. For multilinear scanning, users would initially find the direction of error on all three axes, but they would often scan the X- and Y-axis simultaneously before scanning the Z-axis. Towards the end of the game some users were able to move to the target on all axes simultaneously, which could indicate that being able to do so requires practice.

Regressing search time against the index of difficulty, where only target distance is included due to our target's fixed width, found a Pearson correlation of .201 for projected scanning and .011 for multilinear scanning. This indicates a poor fit with Fitts's law [34], which is understandable as acquiring a target using a tactile-proprioceptive display is fundamentally different from traditional visual-based target selection tasks.
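For reference, Fitts's law [34] in its common Shannon formulation is shown below; with our fixed target width W only the distance D varies, which is why the index of difficulty above reduces to a function of target distance alone.

MT = a + b \cdot ID, \qquad ID = \log_2\!\left(\frac{D}{W} + 1\right)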

4.20 3D Target Selection Discussion and Future Work

Contrary to our expectations, multilinear scanning was faster than projected scanning. No significant difference in corrected search times for X and Y between the techniques was found, which contradicts our initial assumption that scanning for the X and Y target coordinates using rotations could be performed faster than moving the controller along axes. For projected scanning, in addition to users taking more time to follow the projected axis, users spent four times more time searching for the target outside of the frustum. With multilinear scanning the search space is physically delimited by the length of the user's arm, which may have allowed these users to develop a better feel for the size of the search space.

Contrasting our results with previous work and our own prior results with tactile-proprioceptive displays, we make the following observation. Ahmaniemi [13] found an average target search time of 1.8 seconds for a 90-degree horizontal display. Extrapolating this result to match the range of the display that we used in our studies (180 degrees), we observe that search time roughly doubles each time an axis is added to the target search, e.g., 3.6 s, 8.2 s, 16.7 s. This observation is not entirely valid, as the search space on each axis was not entirely equal. The length of the user's arm and the resolution of the camera define the size of our search space. As a result, the search space on the X- and Y-axes is almost twice the size of the search space on the Z-axis. Though we corrected search time for target distance, this does not really allow for a fair comparison of search efficiency between axes (especially considering the Z-axis). A benefit of a frustum-sized search space is that it is naturally constrained by the length of the user's arm, which avoids users searching outside of the search space; this is not the case for uniform search spaces.

Future research will investigate different sizes for the search space as well as explore different target sizes, which may be useful in determining how many unique targets can be rendered on the display, so as to determine its effective resolution. Another area that we deem interesting to explore is whether users can use two proprioceptive displays simultaneously, which may allow for increasing the horizontal range of the display to 270 degrees, nearly doubling the size of the search space. Occlusion problems with tracking each controller could be avoided by using two cameras, but a different solution has to be developed for providing three types of directional haptic feedback due to the current limitations of the Move controller. We implemented our tactile-proprioceptive display using a camera, which does not make it very useful for mobile contexts where ear- and eye-free interaction is often desired, for example, as part of a navigation system. Current smartphones often feature gyroscopes, which allow for accurately determining their exact orientation. Future work will also focus on exploring camera-less facilitation of a tactile-proprioceptive display using inertial sensing.
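The doubling pattern noted in the discussion above can be restated as a rough rule of thumb, where d is the number of scanned axes and 3.6 s is Ahmaniemi's 1.8 s result extrapolated from a 90- to a 180-degree display; it is only an approximation of the observed values.

T(d) \approx 3.6 \cdot 2^{\,d-1}\ \text{s}, \qquad T(1) = 3.6\,\text{s}, \quad T(2) = 7.2\,\text{s}\ (\text{observed } 8.2), \quad T(3) = 14.4\,\text{s}\ (\text{observed } 16.7)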

Figure 4.9: Search time plotted against index of difficulty (distance to target) for both scanning techniques. Upper figures represent all the data and the lower figures have 30% of the least representative items filtered out.

Figure 4.10: Average search time plotted against target location

Figure 4.11: Distance from target prior to performing gesture plotted against the target distance from center.

Figure 4.12: Directional vibrotactile feedback guides the user to move their controller to the 3D target. Green frustum indicates the display size.

Figure 4.13: Left: multilinear scanning, where the user moves the controller along the X-, Y- and Z-axis as indicated using directional vibrotactile feedback. Right: projected scanning, where the user first rotates the controller along its X- and Y-axis to point it at the target, upon which the user moves along the projected axis to select the target.

Figure 4.14: Typical scanning strategies for multilinear scanning (left) and projected scanning (right).

Chapter 5

Conclusions and Future Work

This research shows that NV NUIs are possible and have many benefits. Ten research questions were posed at the beginning of this dissertation, and the following results were found. Exergames were created that allowed people who are blind to achieve moderate levels of physical activity (Q1). A combination of audio and haptic cues was found to be more efficient than audio cues alone when communicating to a user how to interact with an exergame (Q2). Non-visual modalities, specifically haptic vibrations through the use of a game controller, were used to orient a player while playing a virtual game of bowling (Q3). No significant difference was found in the energy expenditure of an exergame involving one arm when compared to an exergame involving both arms (Q4). However, an exergame involving both arms did show a significant increase in error rates when compared to an exergame involving the dominant arm only (Q5). Player surveys showed a very high enjoyment level for the Pet-N-Punch game (Q6). Kinect Sports was made accessible to a person with VI through Real Time Sensory Substitution (Q7). Cell phones were shown to serve as a proprioceptive device in the Twist-N-Lock game (Q8). Multilinear scanning was found to be more effective when searching for a target in 2D space while being limited to non-visual modalities (Q9). Multilinear scanning also proved to be the most effective technique when searching for a target in 3D space (Q10).

This research has laid the groundwork for the following future research.

1. Increase Energy Expenditure. It has been shown that people were able to achieve moderate levels of physical activity; however, vigorous levels of physical activity were never achieved. Using the techniques found in the target acquisition studies, it may be possible to create an exergame that utilizes the entire body, which may create more physical activity. For example, the 2D target finding example may allow people to find a target and then vigorously punch at that target. The 3D target finding example could allow players to move forwards and backwards searching for a specific location in 3D space in which to perform a motion. Using all three dimensions could get the legs as well as the arms involved in the exercise routine.

2. Closed-Captioning-style haptic interfaces to augment commercially available video games with extra haptic and audio cues to assist those with VI. Although closed captioning has become standard in helping those with hearing impairments watch television, there has been little progress toward giving those with visual impairments equal access to video games. Real Time Sensory Substitution showed that an unmodified, commercially available game can be augmented with additional vibrations or sounds in order to make it playable by a person with VI when it was previously unplayable.

3. Non-visual electronic device interaction. The Twist-N-Lock game showed that players were able to scan and locate 4, 6, and 8 orientations of the phone, and as a result used their bodies as a proprioceptive output device when the display was not available. This could be enhanced to provide more than 8 pieces of information. If 4 different patterns were available at each of the 8 orientations, there would be 32 combinations of vibrations and positions, enough information to encode the entire English alphabet.
These vibrations and positions could also be reversed: the user could orient the device and tap a haptic pattern onto the screen, which would create an eyes-free and ears-free input method combining proprioception with touch input. A minimal sketch of such an orientation-and-pattern encoding follows.
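The sketch below illustrates the 8-orientation by 4-pattern encoding suggested above; the pattern names and the assignment of leftover codes are illustrative assumptions.

import string

PATTERNS = 4   # e.g., short, long, short-short, long-short (assumed)

def code_to_letter(orientation, pattern):
    """Map an (orientation 0-7, pattern 0-3) pair to a letter; the six
    codes beyond 'z' are left free for punctuation or control codes."""
    index = orientation * PATTERNS + pattern   # 0..31
    return string.ascii_lowercase[index] if index < 26 else None

# Example: orientation 3 with pattern 1 encodes 3*4 + 1 = 13 -> 'n'.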

Bibliography

[1] Center for Disease Control. How much activity do you need? http://www.cdc.gov/physicalactivity/everyone/guidelines/children.html.
[2] Microsoft Kinect. http://www.xbox.com/en-US/kinect.
[3] Nintendo Wii. http://www.nintendo.com/wii.
[4] Nintendo Wii Sports. http://www.nintendo.com/games/detail/1OTtO06SP7M52gi5m8pD6CnahbW8CzxE.
[5] Philips Respironics. http://actical.respironics.com/.
[6] Sony EyeToy Kinetic. http://www.us.playstation.com/PS2/Games/EyeToy_Kinetic.
[7] Sony PlayStation Move controller. http://www.us.playstation.com/ps3/playstation-move/.
[8] Sony reveals what makes PlayStation Move tick. CBS Interactive. http://gdc.gamespot.com/story/6253435/sony-reveals-what-makes-playstation-move-tick.
[9] Xsens MVN inertial motion capture suit. http://www.xsens.com/en/general/mvn.
[10] WiiYourself! http://wiiyourself.gl.tter.org/, 2010.
[11] M. A. Adams, S. J. Marshall, L. Dillon, S. Caparosa, E. Ramirez, J. Phillips, and G. J. Norman. A theory-based framework for evaluating exergames as persuasive technology. In Proc. of Persuasive '09, pages 1-8, 2009.
[12] Marc A. Adams, Simon J. Marshall, Lindsay Dillon, Susan Caparosa, Ernesto Ramirez, Justin Phillips, and Greg J. Norman. A theory-based framework for evaluating exergames as persuasive technology. In Persuasive '09: Proceedings of the 4th International Conference on Persuasive Technology, pages 1-8, New York, NY, USA, 2009.
[13] Teemu Ahmaniemi and Vuokko Lantz. Augmented reality target finding based on tactile cues. In Proceedings of the 2009 International Conference on Multimodal Interfaces, ICMI-MLMI '09, pages 335-342, 2009.

[14] B. E. Ainsworth, W. L. Haskell, M. C. Whitt, and M. L. Irwin. Compendium of physical activities: an update of activity codes and MET intensities. Med. Sci. Sports Exerc., 32(9):498-516, 2000.
[15] Troy Allman, Rupinder K. Dhillon, Molly A. E. Landau, and Sri H. Kurniawan. Rock Vibe: Rock Band computer games for people with no or limited vision. In Assets '09: Proceedings of the 11th International ACM SIGACCESS Conference on Computers and Accessibility, pages 51-58, New York, NY, USA, 2009. ACM.
[16] Tomohiro Amemiya and Hisashi Sugiyama. Haptic handheld wayfinder with pseudo-attraction force for pedestrians with visual impairments. In Assets '09: Proceedings of the 11th International ACM SIGACCESS Conference on Computers and Accessibility, pages 107-114, New York, NY, USA, 2009. ACM.
[17] Matthew T. Atkinson, Sabahattin Gucukoglu, Colin H. C. Machin, and Adrian E. Lawrence. Making the mainstream accessible: redefining the game. In Sandbox '06: Proceedings of the 2006 ACM SIGGRAPH Symposium on Videogames, pages 21-28, New York, NY, USA, 2006. ACM.
[18] Michel Audiffren, Phillip D. Tomporowski, and James Zagrodnik. Acute aerobic exercise and information processing: energizing motor processes during a choice reaction time task. Acta Psychol, 129(3):410-9, Nov 2008.
[19] Paul Bach-y-Rita and Stephen W. Kercel. Sensory substitution and the human-machine interface. Trends Cogn Sci, 7(12):541-6, Dec 2003.
[20] Bei Yuan, Manjari Sapre, and Eelke Folmer. Seek-n-Tag: a game for labeling and classifying virtual world objects. In Graphics Interface. Canadian Information Processing Society, to appear, 2010.
[21] Nadia Bianchi-Berthouze, Whan Woong Kim, and Darshak Patel. Does body movement engage you more in digital game play? and why? In ACII '07: Proceedings of the 2nd International Conference on Affective Computing and Intelligent Interaction, pages 102-113, Berlin, Heidelberg, 2007. Springer-Verlag.
[22] Elaine Biddiss and Jennifer Irwin. Active video games to promote physical activity in children and youth: a systematic review. Arch Pediatr Adolesc Med, 164(7):664-72, Jul 2010.
[23] I. Bogost. The rhetoric of exergaming. In Digital Arts and Cultures (DAC) Conference. IT University Copenhagen, December 2005.
[24] Andrew Chan, Karon MacLean, and Joanna McGrenere. Learning and identifying haptic icons under workload. In Proceedings of the First Joint Eurohaptics Conference and Symposium on Haptic Interfaces for Virtual Environment and Teleoperator Systems, WHC '05, pages 432-439, Washington, DC, USA, 2005. IEEE.

[25] L. G. Cohen, P. Celnik, A. Pascual-Leone, B. Corwell, L. Falz, J. Dambrosia, M. Honda, N. Sadato, C. Gerloff, M. D. Catalá, and M. Hallett. Functional relevance of cross-modal plasticity in blind humans. Nature, 389(6647):180-3, Sep 1997.
[26] R. A. Cooper, L. A. Quatrano, P. W. Axelson, W. Harlan, M. Stineman, B. Franklin, J. S. Krause, J. Bach, H. Chambers, E. Y. Chao, M. Alexander, and P. Painter. Research on physical activity and health among people with disabilities: a consensus statement. J Rehabil Res Dev, 36(2):142-54, Apr 1999.
[27] Chris Crawford. Chris Crawford on Game Design. New Riders Games, June 2003.
[28] Joan De Boeck, Erwin Cuppens, Tom De Weyer, Chris Raymaekers, and Karin Coninx. Multisensory interaction metaphors with haptics and proprioception in virtual environments. In Proceedings of the Third ACM Nordic Conference on Human-Computer Interaction (NordiCHI 2004), pages 189-197, New York, NY, USA, 2004. ACM.
[29] Joan De Boeck, Chris Raymaekers, and Karin Coninx. Exploiting proprioception to improve haptic interaction in a virtual environment. Presence: Teleoperators and Virtual Environments, pages 627-636, 2006.
[30] Paul Dourish. Where the Action Is: The Foundations of Embodied Interaction (Bradford Books). The MIT Press, 2004.
[31] Eelke Folmer, Martijn van Welie, and Jan Bosch. Bridging patterns: an approach to bridge gaps between SE and HCI. Journal of Information and Software Technology, 48:69-89, 2006.
[32] Leonard H. Epstein, Meghan D. Beecher, Jennifer L. Graf, and James N. Roemmich. Choice of interactive dance and bicycle games in overweight and nonoverweight youth. Ann Behav Med, 33(2):124-31, Apr 2007.
[33] Jan B. F. Van Erp, Hendrik A. H. C. Van Veen, Chris Jansen, and Trevor Dobbins. Waypoint navigation with a vibrotactile waist belt. ACM Trans. Appl. Percept., 2(2):106-117, 2005.
[34] P. M. Fitts. The information capacity of the human motor system in controlling the amplitude of movement. J Exp Psychol, 47(6):381-391, 1954.
[35] E. Gardner and J. Martin. Coding of sensory information. In Principles of Neural Science, fourth edition, pages 411-429, 2000.
[36] Barbara Gasperetti, Matthew Milford, Danielle Blanchard, Stephen P. Yang, Lauren Lieberman, and John T. Foley. Dance Dance Revolution and EyeToy Kinetic modifications for youth with visual impairments. The Journal of Physical Education, Recreation and Dance, 2010.

[37] M. Glickstein, S. Buchbinder, and J. L. May, 3rd. Visual control of the arm, the wrist and the fingers: pathways through the brain. Neuropsychologia, 36(10):981-1001, Oct 1998.
[38] Eitan Glinert and Lonce Wyse. AudiOdyssey: an accessible video game for both sighted and non-sighted gamers. In Future Play '07: Proceedings of the 2007 Conference on Future Play, pages 251-252, New York, NY, USA, 2007. ACM.
[39] Diana L. Graf, Lauren V. Pratt, Casey N. Hester, and Kevin R. Short. Playing active video games increases energy expenditure in children. Pediatrics, 124(2):534-40, Aug 2009.
[40] Lee Graves, Gareth Stratton, N. D. Ridgers, and N. T. Cable. Comparison of energy expenditure in adolescents when playing new generation and sedentary computer games: cross sectional study. BMJ, 335(7633):1282-1284, December 2007.
[41] Chris Harrison and Scott Hudson. Minput: enabling interaction on small mobile devices with high-precision, low-cost, multipoint optical tracking. In CHI '10: Proceedings of the 28th International Conference on Human Factors in Computing Systems, pages 1661-1664, New York, NY, USA, 2010. ACM.
[42] Chris Harrison and Scott E. Hudson. Scratch input: creating large, inexpensive, unpowered and mobile finger input surfaces. In UIST '08: Proceedings of the 21st Annual ACM Symposium on User Interface Software and Technology, pages 205-208, New York, NY, USA, 2008. ACM.
[43] Chris Harrison, Desney Tan, and Dan Morris. Skinput: appropriating the body as an input surface. In CHI '10: Proceedings of the 28th International Conference on Human Factors in Computing Systems, pages 453-462, New York, NY, USA, 2010. ACM.
[44] Daniel P. Heil. Predicting activity energy expenditure using the Actical activity monitor. Res Q Exerc Sport, 77(1):64-80, Mar 2006.
[45] Ken Hinckley, Jeff Pierce, Mike Sinclair, and Eric Horvitz. Sensing techniques for mobile interaction. In UIST '00: Proceedings of the 13th Annual ACM Symposium on User Interface Software and Technology, pages 91-100, New York, NY, USA, 2000. ACM.
[46] Johanna Hoysniemi. International survey on the Dance Dance Revolution game. Comput. Entertain., 4(2):8, 2006.
[47] Scott Hudson and Chris Harrison. Whack gestures: inexact and inattentive interaction with mobile devices. In TEI '10: Proceedings of the Fourth International Conference on Tangible, Embedded, and Embodied Interaction, pages 109-112, New York, NY, USA, 2010. ACM.
[48] Wijnand Ijsselsteijn, Henk Herman Nap, Yvonne de Kort, and Karolien Poels. Digital game design for elderly users. In Future Play '07: Proceedings of the 2007 Conference on Future Play, pages 17-22, New York, NY, USA, 2007. ACM.

[49] Shaun K. Kane, Jeffrey P. Bigham, and Jacob O. Wobbrock. Slide Rule: making mobile touch screens accessible to blind people using multi-touch interaction techniques. In Proceedings of the 10th International ACM SIGACCESS Conference on Computers and Accessibility, Assets '08, pages 73-80, New York, NY, USA, 2008. ACM.
[50] D. Kendzierski and K. J. DeCarlo. Physical activity enjoyment scale: two validation studies. Journal of Sport and Exercise Psychology, (13):50-64, 1991.
[51] Jaakko Keranen, Janne Bergman, and Jarmo Kauko. Gravity Sphere: gestural audio-tactile interface for mobile music exploration. In Proceedings of the 27th International Conference on Human Factors in Computing Systems, CHI '09, pages 1531-1534, New York, NY, USA, 2009. ACM.
[52] Jungsoo Kim, Jiasheng He, Kent Lyons, and Thad Starner. The Gesture Watch: a wireless contact-free gesture based wrist interface. In ISWC '07: Proceedings of the 2007 11th IEEE International Symposium on Wearable Computers, pages 1-8, Washington, DC, USA, 2007. IEEE Computer Society.
[53] Lorraine Lanningham-Foster, Teresa B. Jensen, Randal C. Foster, Aoife B. Redmond, Brian A. Walker, Dieter Heinz, and James A. Levine. Energy expenditure of sedentary screen time compared with active screen time for children. Pediatrics, 118(6):e1831-5, Dec 2006.
[54] Scott T. Leatherdale, Sarah J. Woodruff, and Stephen R. Manske. Energy expenditure while playing active and inactive video games. Am J Health Behav, 34(1):31-5, 2010.
[55] Frank Chun Yat Li, David Dearman, and Khai N. Truong. Virtual Shelves: interactions with orientation aware devices. In UIST '09: Proceedings of the 22nd Annual ACM Symposium on User Interface Software and Technology, pages 125-128, New York, NY, USA, 2009. ACM.
[56] Frank Chun Yat Li, David Dearman, and Khai N. Truong. Leveraging proprioception to make mobile phones more accessible to users with visual impairments. In Proceedings of the 12th International ACM SIGACCESS Conference on Computers and Accessibility, ASSETS '10, pages 187-194, New York, NY, USA, 2010. ACM.
[57] W. Li, G. Bebis, and N. Bourbakis. 3D object recognition using 2D views. IEEE Transactions on Image Processing, 2009 (in press).
[58] Debra Lieberman. Dance games and other exergames: what the research says. http://www.comm.ucsb.edu/faculty/lieberman/exergames.htm, 2007.
[59] J. Lieberman and C. Breazeal. TIKL: development of a wearable vibrotactile feedback suit for improved human motor learning. IEEE Transactions on Robotics, 23(5):919-926, 2007.
[60] L. Lieberman and E. McHugh. Health-related fitness of children who are visually impaired. Journal of Visual Impairment & Blindness, 95:272-287, 2001.

[61] L. J. Lieberman, C. Houston-Wilson, and F. Kozub. Perceived barriers to including students with visual impairments in general physical education. Adapted Physical Activity Quarterly, 19(3):364-377, 2002.
[62] L. J. Lieberman, B. Robinson, and Heidi Rollheiser. Youth with visual impairments: experiences within general physical education. Rehabilitation Education for Blindness and Visual Impairment, 38(1):35-48, 2006.
[63] Lauren J. Lieberman and Cathy Houston. Strategies for Inclusion: A Handbook for Physical Educators (2nd Edition). Wilson Publishing Company, 2009.
[64] L. J. Lieberman. Adapted Physical Education and Sport, chapter Visual Impairments. Human Kinetics, Champaign, IL, 2005.
[65] L. J. Lieberman. Adapted Physical Education and Sport (5th ed.). Chicago: Human Kinetics, 2011.
[66] James J. Lin, Lena Mamykina, Silvia Lindtner, Gregory Delajoux, and Henry B. Strub. Fish 'n' Steps: encouraging physical activity with an interactive computer game. In UbiComp 2006: Ubiquitous Computing, pages 261-278, 2006.
[67] P. Longmuir. Considerations for fitness appraisal, programming, and counselling of individuals with sensory impairments. Can J Appl Physiol, 23(2):166-84, Apr 1998.
[68] P. E. Longmuir and O. Bar-Or. Factors influencing the physical activity levels of youths with physical and sensory disabilities. Adapted Physical Activity Quarterly, 17:40-53, 2000.
[69] Joseph Luk, Jerome Pasquero, Shannon Little, Karon MacLean, Vincent Levesque, and Vincent Hayward. Texture displays: a passive approach to tactile presentation. In CHI '06: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pages 171-180, New York, NY, USA, 2006. ACM.
[70] Charlotte Magnusson, Miguel Molina, Kirsten Rassmus-Gröhn, and Delphine Szymczak. Pointing for non-visual orientation and navigation. In Proceedings of the 6th Nordic Conference on Human-Computer Interaction: Extending Boundaries, NordiCHI '10, pages 735-738, 2010.
[71] Charlotte Magnusson, Kirsten Rassmus-Gröhn, and Delphine Szymczak. Scanning angles for directional pointing. In Proceedings of the 12th International Conference on Human Computer Interaction with Mobile Devices and Services, MobileHCI '10, pages 399-400, 2010.
[72] B. E. McHugh and L. J. Lieberman. The impact of developmental factors on incidence of stereotypic rocking among children with visual impairments. Journal of Visual Impairment & Blindness, 97:453-474, 2003.
[73] Daniel Miller, Aaron Parecki, and Sarah A. Douglas. Finger Dance: a sound game for blind people. In Assets '07: Proceedings of the 9th International ACM SIGACCESS Conference on Computers and Accessibility, pages 253-254, New York, NY, USA, 2007. ACM.

[74] J. Miller. Divided attention: evidence for coactivation with redundant signals. Cogn Psychol, 14(2):247-79, Apr 1982.
[75] Mark Mine, Frederick Brooks, and Carlo Sequin. Moving objects in space: exploiting proprioception in virtual-environment interaction. In Proceedings of the 24th Annual Conference on Computer Graphics and Interactive Techniques, SIGGRAPH '97, pages 19-26, New York, NY, USA, 1997. ACM.
[76] Tony Morelli, John Foley, Luis Columna, Lauren Lieberman, and Eelke Folmer. VI-Tennis: a vibrotactile/audio exergame for players who are visually impaired. In Proceedings of Foundations of Digital Interactive Games (FDG '10), Monterey, California, 2010.
[77] Tony Morelli, John Foley, and Eelke Folmer. VI Bowling: a tactile spatial exergame for individuals with visual impairments. In Assets '10: Proceedings of the 12th International ACM SIGACCESS Conference on Computers and Accessibility, Orlando, Florida, 2010.
[78] Florian Mueller, Stefan Agamanolis, and Rosalind Picard. Exertion interfaces: sports over a distance for social bonding and fun. In CHI '03: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pages 561-568, New York, NY, USA, 2003. ACM.
[79] Cliona Ni Mhurchu, Ralph Maddison, Yannan Jiang, Andrew Jull, Harry Prapavessis, and Anthony Rodgers. Couch potatoes to jumping beans: a pilot study of the effect of active video games on physical activity in children. Int J Behav Nutr Phys Act, 5:8, 2008.
[80] Magda Nikolaraizi and Neil De Reybekiel. A comparative study of children's attitudes towards deaf children, children in wheelchairs and blind children in Greece and in the UK. European Journal of Special Needs Education, 16(2):167-182, 2001.
[81] B. Nirje. The basis and logic of the normalization principle. Journal of Intellectual & Developmental Disability, 11(2):65-68, 1985.
[82] Tal Oron-Gilad, J. L. Downs, R. D. Gilson, and P. A. Hancock. Vibrotactile guidance cues for target acquisition. IEEE Transactions on Systems, Man, and Cybernetics, Part C, 5:993-1004, 2007.
[83] F. G. Paas and J. J. Adam. Human information processing during physical exercise. Ergonomics, 34(11):1385-97, Nov 1991.
[84] J. R. Parker and J. Heerema. Musical interaction in computer games. In Future Play '07: Proceedings of the 2007 Conference on Future Play, pages 217-220, New York, NY, USA, 2007. ACM.
[85] Michael W. Parsons, Deborah L. Harrington, and Stephen M. Rao. Distinct neural systems underlie learning visuomotor and spatial representations of motor skills. Hum Brain Mapp, 24(3):229-47, Mar 2005.

[86] Ivan Poupyrev, Shigeaki Maruyama, and Jun Rekimoto. Ambient touch: designing tactile interfaces for handheld devices. In UIST '02: Proceedings of the 15th Annual ACM Symposium on User Interface Software and Technology, pages 51-60, New York, NY, USA, 2002. ACM.
[87] Roope Raisamo, Saija Patomäki, Matias Hasu, and Virpi Pasto. Design and evaluation of a tactile memory game for visually impaired children. Interact. Comput., 19(2):196-205, 2007.
[88] Bertis Rasco. Where's the Wiimote? Using Kalman filtering to extract accelerometer data. http://www.gamasutra.com/view/feature/1494/wheres_the_wiimote_using_kalman_.php, 2007.
[89] James A. Rimmer and Jennifer L. Rowland. Physical activity for youth with disabilities: a critical need in an underserved population. Dev Neurorehabil, 11(2):141-8, 2008.
[90] Simon Robinson, Parisa Eslambolchilar, and Matt Jones. Sweep-Shake: finding digital resources in physical environments. In Proceedings of the 11th International Conference on Human-Computer Interaction with Mobile Devices and Services, MobileHCI '09, pages 1-10, 2009.
[91] David A. Ross and Bruce B. Blasch. Development of a wearable computer orientation system. Personal Ubiquitous Comput., 6(1):49-63, 2002.
[92] Allison Sall and Rebecca E. Grinter. Let's get physical! In, out and around the gaming circle of physical gaming at home. Comput. Supported Coop. Work, 16:199-229, April 2007.
[93] Jaime Sánchez, Nelson Baloian, Tiago Hassler, and Ulrich Hoppe. AudioBattleship: blind learners collaboration through sound. In CHI '03 Extended Abstracts on Human Factors in Computing Systems, pages 798-799, New York, NY, USA, 2003. ACM.
[94] Katie Sell, Tia Lillie, and Julie Taylor. Energy expenditure during physically interactive video game playing in male college students with different playing experience. J Am Coll Health, 56(5):505-11, 2008.
[95] D. R. Shapiro, L. Moffett, L. Lieberman, and G. M. Drummer. Perceived competence of children with visual impairments. Journal of Visual Impairment & Blindness, 99(1):15-25, 2005.
[96] M. E. Stuart, L. J. Lieberman, and K. E. Hand. Beliefs about physical activity among children who are visually impaired and their parents. Journal of Visual Impairment & Blindness, 100(4):223-234, 2006.
[97] Sylvie Vidal and Grégoire Lefebvre. Gesture based interaction for visually-impaired people. In Proceedings of the 6th Nordic Conference on Human-Computer Interaction: Extending Boundaries, NordiCHI '10, pages 809-812, New York, NY, USA, 2010. ACM.

[98] Daniel Vlasic, Rolf Adelsberger, Giovanni Vannucci, John Barnwell, Markus Gross, Wojciech Matusik, and Jovan Popović. Practical motion capture in everyday surroundings. In SIGGRAPH '07: ACM SIGGRAPH 2007 Papers, page 35, New York, NY, USA, 2007. ACM.
[99] Elizabeth Wack and Stacey Tantleff-Dunn. Relationships between electronic game play, obesity, and psychosocial functioning in young men. Cyberpsychol Behav, 12(2):241-4, Apr 2009.
[100] Stuart J. Warden, Robyn K. Fuchs, Alesha B. Castillo, Ian R. Nelson, and Charles H. Turner. Exercise when young provides lifelong benefits to bone structure and strength. J Bone Miner Res, 22(2):251-9, Feb 2007.
[101] Stephen Yang and John T. Foley. Comparison of MVPA while playing DDR and EyeToy. Research Quarterly for Exercise and Sport, 79(1):17, 2008.
[102] Bei Yuan and Eelke Folmer. Blind Hero: enabling Guitar Hero for the visually impaired. In Assets '08: Proceedings of the 10th International ACM SIGACCESS Conference on Computers and Accessibility, pages 169-176, New York, NY, USA, 2008. ACM.
[103] Bei Yuan, Eelke Folmer, and Frederick C. Harris. Game accessibility: a survey. Universal Access in the Information Society, 10(1), 2010.