Camera Phone Based Motion Sensing: Interaction Techniques, Applications and Performance Study
Jingtao Wang♦   Shumin Zhai§   John Canny♦
♦ Computer Science Division, UC Berkeley, 387 Soda Hall, Berkeley, CA, U.S.A.  {jingtaow, jfc}@cs.berkeley.edu
§ IBM Almaden Research Center, 650 Harry Road, San Jose, CA, U.S.A.  [email protected]

ABSTRACT
This paper presents TinyMotion, a pure software approach for detecting a mobile phone user's hand movement in real time by analyzing image sequences captured by the built-in camera. We present the design and implementation of TinyMotion and several interactive applications based on it. Through both an informal evaluation and a formal 17-participant user study, we found that: 1. TinyMotion can detect camera movement reliably under most background and illumination conditions. 2. Target acquisition tasks based on TinyMotion follow Fitts' law, and Fitts' law parameters can be used to measure TinyMotion-based pointing performance. 3. Users can enter sentences faster with Vision TiltText, a TinyMotion-enabled input method, than with MultiTap after a few minutes of practice. 4. Using a camera phone as a handwriting capture device and performing large-vocabulary, multilingual, real-time handwriting recognition on the phone are feasible. 5. TinyMotion-based gaming is enjoyable and immediately available for the current generation of camera phones. We also report user experiences and problems with TinyMotion-based interaction as resources for future design and development of mobile interfaces.

ACM Classification: H5.2 [Information interfaces and presentation]: User Interfaces - Graphical user interfaces; Input devices and strategies; Theory and methods

General terms: Design, Human Factors, Performance

Keywords: Input Techniques and Devices, Mobile Devices, Computer Vision, Mobile Phones, Camera Phones, Motion Estimation, Fitts' Law, Human Performance, Handwriting Recognition, Gesture Recognition

INTRODUCTION
Mobile phones have become an indispensable part of our daily life. Their compact form has the advantage of portability, but it also imposes limitations on the interaction methods that can be used. While the computing and display capabilities of mobile phones have increased significantly in recent years, the input methods on phones largely remain button based. For basic voice functions, a button-based keypad is quite adequate. But mobile phones are rapidly moving beyond voice calls into domains such as gaming, web browsing, personal information management, location services, and image and video browsing. Many of these functions can greatly benefit from a usable analog input device. Mobile phones may take on even more computing functions in the future if higher-performance user interfaces can be developed. We show in this paper that the built-in camera in mobile phones can be utilized as an input sensor enabling many types of user interaction.

Figure 1: Using TinyMotion-enabled applications outdoors (left) and indoors (right)

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
UIST'06, October 15–18, 2006, Montreux, Switzerland.
Copyright 2006 ACM 1-59593-313-1/06/0010...$5.00.

Various technologies have been proposed and tested to improve interaction on mobile devices by enhancing expressiveness [11, 12, 18, 23, 24] or by sensing contextual features of the surrounding environment [12]. Accelerometers [12, 18, 23, 24], touch sensors [11, 12] and proximity sensors [12] have all been used. While some of these technologies may eventually make their way inside the phone, they are rarely seen in phones today.

On the other hand, camera phones are already popular and pervasive. The global cell phone shipment in 2005 was 795 million units, 57% of which (about 455 million units) were camera phones. It is predicted that 85% of mobile phones will be camera phones by 2008, with a shipment of 800 million units [21].

We have developed a technique called TinyMotion (Figure 1) for camera phones. TinyMotion detects the movements of a cell phone in real time by analyzing image sequences captured by its built-in camera. Typical movements that TinyMotion detects include horizontal and vertical movements, rotational movements, and tilt movements. In contrast to earlier work, TinyMotion does not require additional sensors, special scenes, or backgrounds. A key contribution of the paper is an experimental validation of the approach on a wide variety of background scenes, and a quantitative performance study of TinyMotion on standard target acquisition tasks and in real-world applications.

RELATED WORK
Related work falls into three categories: emerging camera phone applications, new interaction techniques for mobile devices, and computer vision based interactive systems.

Emerging Camera Phone Applications
Inspired by the success of CyberCode [19] from SONY, several researchers [20] and companies have designed customized 2D barcodes that can be recognized easily by camera phones. Most of these systems measure the size, position and angle of the barcode relative to the camera's optical axis, which can be used to infer the camera's 3D position relative to the barcode and thus provide an alternative spatial input channel. SemaCode (http://semacode.org) is positioned as a general-purpose tagging solution, while SpotCode (a.k.a. ShotCode) has a feature that maps 2D barcodes to online URLs. On top of their barcode system, Visual Codes [20] have also built UI widgets to achieve desktop-level interaction metaphors. Visual Codes have also been used to manipulate objects on large public displays interactively [1].

Hansen and colleagues [4] proposed the idea of a "mixed interaction space" to augment camera phone interaction. Their method relies on camera imaging of a light, uniform background with a circular marker; this image needs to be laid out on a suitable flat surface. 2D barcodes [1], orthogonal axis tags [16], high-gradient still objects [3] and human faces can all be used as markers to facilitate the tracking task. In contrast, TinyMotion provides general motion sensing from whatever scene the camera is pointed at, near or far.

New Interaction Techniques for Mobile Devices
Previous work has proposed many compelling interaction techniques based on physical manipulation of a small-screen device, including contact, pressure, tilt, motion and implicit biometric information. Specifically with regard to navigation, Rekimoto [18] used tilt input for navigating menus, maps, and 3D scenes, and Harrison et al. [11] and Hinckley et al. [12] used tilt for scrolling through documents and lists. Peephole Displays [25] explored pen interaction on spatially aware displays on a PDA. Before the current generation of phones and PDAs, Fitzmaurice et al. [5] explored spatially aware 'Palmtop VR' on a miniature handheld TV monitor.

Computer Vision in Interactive Systems
Considering the amount of information captured by human eyes, using computer vision in interactive systems has long been a popular topic [7]. Much previous research in this category covers multimodal interaction [6], gesture recognition, face tracking, body tracking [17], etc. There are also numerous systems that map certain types of user movements, for example body, gesture, finger, face, and mouth movements, into computer input. Please refer to [6, 17], which include extensive surveys of these directions, and to [7] for some commonly used basic algorithms. However, most of those applications are built on powerful desktop computers in controlled lab environments.

THE TINYMOTION ALGORITHM
Computer vision techniques such as edge detection [9], region detection [7] and optical flow [13] can be used for motion sensing. Ballagas et al. [1] implemented an optical flow based interaction method, "sweep", on camera phones. However, optical flow [13] is a gradient-based approach: it uses local gradient information and the assumption that the brightness pattern varies smoothly to detect a dense motion field with a vector at each pixel. Because of the additional assumptions about gradient distribution and smoothness of illumination, such methods are usually less robust than direct methods based on correlation or image difference. The latter are used in optical mice, video codecs, etc., and we follow suit. TinyMotion uses both image differencing and correlation of blocks [8, 14] for motion estimation.

If we make assumptions about the texture/gradient distribution of the surrounding environment, other vision based approaches such as Projection Shift Analysis [2] or gradient feature matching [9] could also provide real-time motion detection on mobile devices. However, these approaches fail when their strong assumptions do not hold. For example, an image projection based approach will not work on backgrounds with repeating patterns [2], and many everyday backgrounds with little gradient, e.g. floors, carpets and the sky, make gradient feature detection infeasible.

The TinyMotion algorithm consists of four major steps: 1. Color space conversion. 2. Grid sampling. 3. Motion estimation. 4. Post processing. All of these steps are realized efficiently with integer-only operations. To use TinyMotion, the camera is set in preview mode, capturing color images at a resolution of 176x112 pixels at a rate of 12 frames/sec. (Without displaying the captured image and without additional computation, the camera phones in our experiments can capture images at a maximal rate of 15.2 frames/sec.)

Color Space Conversion
After a captured image arrives, we use a bit-shifting method (Equation 2, an arithmetic approximation of Equation 1) to convert the 24-bit RGB color image (20 effective bits per pixel on the specific camera phone we used) to an 8-bit grayscale image.
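Equations 1 and 2 themselves are not reproduced in this excerpt, so the sketch below is illustrative rather than the paper's exact implementation: it pairs the standard luminance weights (Y = 0.299R + 0.587G + 0.114B) with one common shifts-only integer approximation of those weights, in the spirit of the integer-only conversion described above.

```python
def rgb_to_gray_float(r, g, b):
    # Reference luminance formula (Equation 1 is presumably of this form):
    # standard Rec. 601 weights for converting RGB to gray.
    return 0.299 * r + 0.587 * g + 0.114 * b

def rgb_to_gray_shift(r, g, b):
    # Integer-only approximation using bit shifts (illustrative stand-in
    # for Equation 2; the paper's actual shift constants may differ).
    #   (r>>2)+(r>>5) ~ 0.281*r
    #   (g>>1)+(g>>4) ~ 0.562*g
    #   (b>>4)+(b>>5) ~ 0.094*b
    # No multiplication or floating point, suitable for phone CPUs of
    # that era; truncation biases results slightly low.
    return (r >> 2) + (r >> 5) + (g >> 1) + (g >> 4) + (b >> 4) + (b >> 5)
```

For a mid-gray pixel (128, 128, 128) the float formula gives 128 while the shift version gives 120, within a few percent; a small uniform bias of this kind is harmless for block-based motion estimation, which compares relative intensities between frames.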