HandSee: Enabling Full Hand Interaction on Smartphones with Front Camera-based Stereo Vision
CHI 2019 Paper CHI 2019, May 4–9, 2019, Glasgow, Scotland, UK

Chun Yu 1,2,3, Xiaoying Wei 1,2, Shubh Vachher 1, Yue Qin 1, Chen Liang 1, Yueting Weng 1, Yizheng Gu 1,2, Yuanchun Shi 1,2,3
1 Department of Computer Science and Technology, Tsinghua University, Beijing, China
2 Key Laboratory of Pervasive Computing, Ministry of Education, China
3 Global Innovation eXchange Institute, Tsinghua University, Beijing, China
{chunyu, shiyc}@tsinghua.edu.cn, {wei-xy17, vachhers10, y-qin15, c-liang15, guyz17}@mails.tsinghua.edu.cn

Figure 1: (a) The right angle prism mirror placed on the front camera. (b) The space above the touchscreen that can be covered. (c) A sample image captured by the front camera. (d) The derived depth image of the touching hand and the gripping hand.

ABSTRACT
We present HandSee, a novel sensing technique that can capture the state and movement of the user's hands touching or gripping a smartphone. We place a right angle prism mirror on the front camera to achieve a stereo vision of the scene above the touchscreen surface. We develop a pipeline to extract the depth image of hands from a monocular RGB image, which consists of three components: a stereo matching algorithm to estimate the pixel-wise depth of the scene, a CNN-based online calibration algorithm to detect hand skin, and a merging algorithm that outputs the depth image of the hands. Building on this output, a substantial set of valuable interaction information, such as fingers' 3D location, gripping posture, and finger identity, can be recognized concurrently. Due to this unique sensing ability, HandSee enables a variety of novel interaction techniques and expands the design space for full hand interaction on smartphones.

CCS CONCEPTS
• Human-centered computing → Smartphones; Touch screens; Gestural input;

KEYWORDS
Smartphone interaction; full hand sensing; front camera; stereo vision; touching hand; gripping fingers

ACM Reference Format:
Chun Yu, Xiaoying Wei, Shubh Vachher, Yue Qin, Chen Liang, Yueting Weng, Yizheng Gu, Yuanchun Shi. 2019. HandSee: Enabling Full Hand Interaction on Smartphones with Front Camera-based Stereo Vision. In CHI Conference on Human Factors in Computing Systems Proceedings (CHI 2019), May 4–9, 2019, Glasgow, Scotland, UK. ACM, New York, NY, USA, 13 pages. https://doi.org/10.1145/3290605.3300935

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]. © 2019 Association for Computing Machinery. ACM ISBN 978-1-4503-5970-2/19/05. $15.00. https://doi.org/10.1145/3290605.3300935

1 INTRODUCTION
Today, smartphone interaction is largely confined to the capacitive surface of the touchscreen. Numerous works have explored methods of overcoming this limitation, ranging from enhancing the expressivity of touch (e.g. finger identification [19] and posture [21, 55]), to leveraging grip posture as an interaction context (e.g. interface shifting [8, 33]), as well as expanding the input space beyond the touchscreen surface (e.g. on the back [10, 54], the side [6, 8, 9], or around the device [7, 22, 24]). However, most of these works focus on interaction design. They either use dedicated sensing systems (e.g. OptiTrack) or equip smartphones or users' hands with additional hardware sensors to provide specific and limited functionality.

In this paper, we present HandSee, a compact sensing technique that can capture rich information about hands and fingers interacting with a smartphone. We re-purpose the front camera by mounting a hypotenuse-coated right angle prism mirror on it, and direct it to look down along the screen's surface. As shown in Fig. 1, the field of view (FOV) covers the gripping fingers and the entire touching hand. The prism mirror provides two optical paths through which the front camera can look outward. This creates two virtual cameras that form a stereo vision system, as shown in Fig. 1a. The stereo vision adds depth information that further augments the sensing ability.

To capture depth images of hands, we develop a pipeline of computer vision algorithms, which consists of three components: an efficient skin segmentation with online threshold calibration, stereo matching that reconstructs the depth image of the scene over the touchscreen, and a merging algorithm that derives the depth image of hands. Based on the output, a set of valuable interaction information such as fingers' 3D location, gripping posture, and finger identity can be derived, which enables a wide range of hand/finger interaction techniques on smartphones.

HandSee expands the space of full hand interaction on smartphones, carrying forward the idea of interpreting users' intent beyond signals from the 2D capacitive screen [24]. We re-outline the interaction space into three sub-spaces: Touching Hand Only, Gripping Hand Only, and Hand-to-Hand interaction. We propose a number of novel interaction techniques that fill in this space and demonstrate the power of HandSee. Our user study shows that these techniques are well received by users: they are easy to learn, convenient, and fun to use.

Specifically, our contributions are threefold:
(1) A novel sensing scheme that captures both the touching hands and gripping fingers on a smartphone. We achieve stereo vision by placing a prism mirror on top of the front camera.
(2) A real-time pipeline to validate our setup's computational feasibility and compute the depth map of the user's hands, based on which valuable interaction information can be derived.
(3) An expanded design space for full hand interaction on smartphones, as well as a number of novel interaction techniques.

In the remainder of this paper, we first review prior literature on hand/finger interaction and sensing. We then describe the hardware design of HandSee, followed by our algorithm pipeline. We move on to outline the design space and describe novel interaction techniques with feedback from a preliminary user study. We conclude this research with a discussion on the practicality, limitations, and directions for future work.

2 RELATED WORK
In this section, we first review literature about enhancing hand/finger interaction on smartphones, discussing the sensing solutions in those works along the way. We then give a brief introduction to general hand/finger sensing techniques, with a focus on camera-based ones.

Enhancing Hand/Finger Interaction on Smartphones
Expanding Expressivity of Finger Touch on Screen. A straightforward way to increase the expressivity of touch is to leverage the state of the touching finger. TapSense [21] recognizes the different parts of a human finger (e.g. tip, pad, nail, and knuckle) tapping on the screen by analyzing sounds resulting from the tapping impact. Xiao et al. [55] describe a method that estimates the pitch and yaw of fingers relative to a touchscreen's surface based on the raw capacitive sensor data. DualKey [19] instruments the index finger with a motion sensor. It enables selection of letters on a miniature ambiguous software keyboard (e.g. on a smartwatch) with different fingers.

In addition, a few works explored leveraging the above-screen space to improve interaction. Air+Touch [7] describes the concept of interweaving on-screen touch and in-air gestures to increase the expressivity of touch. The authors built a prototype system with a depth camera. Thumbs-Up [22] presents a similar idea that is specific to thumb input for one-handed interaction. Pre-touch [24] researches the potential of leveraging the status of the approaching finger by increasing the sensing range of the capacitive touchscreen. SegTouch [52] instruments the index finger with a touchpad, and allows users to perform thumb slides on it to define various touch purposes.

Interaction Beyond the Touchscreen. Researchers have explored extending smartphone interaction beyond the touchscreen surface. Some of these works enable input on the side or the back of the device. [35] detects finger taps on the sides of a smartphone using the built-in motion sensors. BackXPress [10] places a pressure-sensitive layer on the back of the device, which allows pressure input on the back to augment the interaction with the remaining fingers on the front. Back-Mirror uses a mirror to reflect the back surface to the rear-facing camera of the phone, and recognizes hand gestures based on the visual pattern on the back surface.

Others explored the 3D space around the device. HoverFlow [28] uses infrared proximity sensors to track hands in the device's proximity. It can sense coarse movement-based gestures, as well as static position-based gestures. SideSight [4] embeds infrared (IR) proximity sensors along the side of a small device and supports single and multi-touch gestures in the space around the device.

Hand/Finger Tracking for General Purpose
Accurate hand/finger tracking is of great significance for human-computer interaction. Various tracking techniques have been researched, such as using capacitive sensors [31], infrared signals [20], ultrasound [40], millimeter wave radar (i.e. Soli [32]), and monocular RGB camera [38] or depth camera.
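The stereo-matching stage described above admits a compact illustration. The prism mirror yields two virtual views of the scene; matching a patch in one view against horizontally shifted patches in the other gives a per-pixel disparity d, and the pinhole stereo relation Z = f·B/d converts disparity to depth. The paper does not publish its implementation, so the brute-force SSD matcher, the block size, and the focal/baseline values below are illustrative assumptions, not the authors' code:

```python
import numpy as np

def disparity_map(left, right, max_disp=6, block=3):
    """Brute-force SSD block matching between two rectified grayscale
    views (an illustrative sketch, not the paper's algorithm)."""
    h, w = left.shape
    half = block // 2
    disp = np.zeros((h, w), dtype=np.float32)
    for y in range(half, h - half):
        for x in range(half, w - half):
            patch = left[y - half:y + half + 1, x - half:x + half + 1]
            best, best_d = np.inf, 0
            # try candidate disparities; the match in the right view
            # sits to the left of the pixel in the left view
            for d in range(0, min(max_disp, x - half) + 1):
                cand = right[y - half:y + half + 1,
                             x - d - half:x - d + half + 1]
                cost = np.sum((patch.astype(np.float32) - cand) ** 2)
                if cost < best:
                    best, best_d = cost, d
            disp[y, x] = best_d
    return disp

def depth_from_disparity(disp, focal_px, baseline_mm):
    """Pinhole stereo relation Z = f * B / d (zero disparity -> infinity)."""
    with np.errstate(divide="ignore"):
        z = focal_px * baseline_mm / disp
    z[disp == 0] = np.inf
    return z

# Synthetic check: a textured image shifted left by 2 px should yield
# disparity 2 away from the borders, hence depth f*B/2.
h, w = 12, 24
left = ((np.arange(h)[:, None] * 7 + np.arange(w)[None, :] * 13) % 17).astype(np.float32)
right = np.roll(left, -2, axis=1)
disp = disparity_map(left, right)
z = depth_from_disparity(disp, focal_px=500.0, baseline_mm=10.0)
```

A production matcher would add rectification, sub-pixel refinement, and a smoothness prior (as in semi-global matching); the quadratic loop above is only meant to make the disparity-to-depth geometry concrete.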
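The remaining two pipeline stages, skin detection with online calibration and the merge into a hand-only depth image, can also be sketched. The paper uses a CNN-based online calibration; as a simple stand-in, the sketch below re-estimates per-channel color bounds from pixels known to be skin (for example, sampled around a confirmed capacitive touch point) and keeps depth only where skin is detected. The class name, the margin, and the smoothing factor are hypothetical, chosen only for illustration:

```python
import numpy as np

class OnlineSkinSegmenter:
    """Toy stand-in for the paper's CNN-based online calibration:
    maintains running per-channel color bounds of known skin pixels and
    labels any pixel falling inside those bounds as skin."""

    def __init__(self, margin=10.0):
        self.lo = None
        self.hi = None
        self.margin = margin  # tolerance around observed skin colors

    def calibrate(self, frame, known_skin_mask):
        """Update bounds from pixels known to be skin."""
        samples = frame[known_skin_mask]          # (N, 3) skin colors
        lo = samples.min(axis=0) - self.margin
        hi = samples.max(axis=0) + self.margin
        if self.lo is None:
            self.lo, self.hi = lo, hi
        else:                                     # smooth across frames
            self.lo = 0.8 * self.lo + 0.2 * lo
            self.hi = 0.8 * self.hi + 0.2 * hi

    def segment(self, frame):
        """Boolean skin mask: all three channels within the bounds."""
        return np.all((frame >= self.lo) & (frame <= self.hi), axis=-1)

def merge_hand_depth(depth, skin_mask):
    """Merging stage: keep depth only where skin was detected."""
    out = np.full_like(depth, np.inf)
    out[skin_mask] = depth[skin_mask]
    return out

# Minimal demo: skin-colored top half, dark background bottom half.
frame = np.zeros((4, 4, 3), dtype=np.float32)
frame[:2] = [200.0, 140.0, 120.0]
frame[2:] = [30.0, 30.0, 30.0]
seg = OnlineSkinSegmenter()
known = np.zeros((4, 4), dtype=bool)
known[0] = True                       # e.g. pixels under a confirmed touch
seg.calibrate(frame, known)
skin = seg.segment(frame)
merged = merge_hand_depth(np.full((4, 4), 50.0), skin)
```

The real system must cope with varying lighting and skin tones, which is precisely why the authors calibrate online per user and per session rather than using fixed thresholds; this sketch only shows the data flow between the segmentation and merging stages.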