Extended Multitouch: Recovering Touch Posture, Handedness, and User Identity using a Depth Camera

Sundar Murugappan (a), Vinayak (a), Niklas Elmqvist (b), Karthik Ramani (a)
(a) School of Mechanical Engineering, (b) School of Electrical Engineering
Purdue University, West Lafayette, IN 47907
[email protected]

Figure 1. Extracting finger and hand posture from a depth camera (left) and integrating with a pen+touch sketch interface (right).

ABSTRACT
Multitouch surfaces are becoming prevalent, but most existing technologies are only capable of detecting the user's actual points of contact on the surface and not the identity, posture, and handedness of the user. In this paper, we define the concept of extended multitouch interaction as a richer input modality that includes all of this information. We further present a practical solution to achieve this on tabletop displays based on mounting a single commodity depth camera above a horizontal surface. This will enable us to not only detect when the surface is being touched, but also recover the user's exact finger and hand posture, as well as distinguish between different users and their handedness. We validate our approach using two user studies, and deploy the technology in a scratchpad application as well as in a sketching tool that integrates pen and touch for creating beautified sketches.

Author Keywords
Multitouch, tabletop displays, depth cameras, pen + touch, user studies.

ACM Classification Keywords
H.5.2 Information Interfaces and Presentation: User Interfaces—Interaction styles; I.3.6 Computer Graphics: Methodology and Techniques—Interaction techniques

General Terms
Design, Algorithms, Human Factors

INTRODUCTION
Multitouch surfaces are quickly becoming ubiquitous: from wristwatch-sized music players and pocket-sized smartphones to tablets, digital tabletops, and wall-sized displays, virtually every surface in our everyday surroundings may soon come to life with digital imagery and natural touch input. However, achieving this vision of ubiquitous [28] surface computing requires overcoming several technological hurdles, key among them being how to augment any physical surface with touch sensing. Conventional multitouch technology relies on either capacitive sensors embedded in the surface itself, or rear-mounted cameras capable of detecting the touch points of the user's hand on the surface [9]. Both approaches depend on heavily instrumenting the surface, which is not feasible for widespread deployment.

In this context, Wilson's idea [31] of utilizing a front-mounted depth camera to sense touch input on any physical surface is particularly promising for the following reasons:

• Minimal instrumentation: Because they are front-mounted, depth cameras integrate well with both projected imagery as well as conventional screens;

• Surface characteristics: Any surface, digital or inanimate, flat or non-flat, horizontal or vertical, can be instrumented to support touch interaction;

• On and above: Interaction both on as well as above the surface can be detected and utilized for input;

• Low cost: Depth cameras have recently become commodity products with the rise of the Microsoft Kinect, originally developed for full-body interaction with the game console, but usable with any computer.

In this paper, we define a richer touch interaction modality that we call extended multitouch (EMT), which includes not only sensing multiple points of touch on and above a surface using a depth camera, but also (1) recovering finger, wrist and hand posture of the user, as well as (2) detecting user identity and (3) user handedness. We have implemented a MS Windows utility that interfaces with a Kinect camera, extracts the above data from the depth image in real-time, and publishes the information as a global hook to other applications, or using the TUIO [15] protocol. To validate our approach, we have also conducted two empirical user studies: one for measuring the tracking accuracy of the touch sensing functionality, and the other for measuring the finger and hand posture accuracy of our utility.

The ability to not only turn any physical surface into a touch-sensitive one, but also to recover full finger and hand posture of any interaction performed on the surface, opens up an entirely new design space of interaction modalities. We have implemented two separate touch applications for the purpose of exploring this design space. The first is a prototype scratchpad where multiple users can draw simple vector graphics using their fingers, each finger on each user being separately identified and drawing a unique color and stroke. The second application is called 'uSketch', and is an advanced sketching application for creating beautified mechanical engineering sketches that can be used for further finite element analysis. 'uSketch' integrates pen-based input that we use for content creation with our touch sensing functionality that we use for navigation and mode changes.

RELATED WORK
Our work is concerned with touch interaction, natural user interfaces, and the integration of touch with other input modalities. Below we review relevant work in these areas.

Multitouch Technologies
Several different technologies exist for sensing multiple touches on a surface, including capacitance, RFID, and vision-based approaches. Vision-based approaches are particularly interesting for our purposes since they are suitable for surfaces with large form factor, such as tabletops and walls [16, 21, 30, 19, 9, 22]. These systems rely on the presence of a dedicated surface for interaction.

Recently, depth-sensing cameras are beginning to be used in various interactive applications where any surface can be made interactive. Wilson [31] explored the use of depth-sensing cameras to sense touch on any surface. OmniTouch [10] is a wearable depth-sensing and projection system that enables interactive multitouch applications on everyday surfaces. Lightspace [32] is a small room installation that explores interactions on, above and between surfaces by combining multiple depth cameras and projectors.

Sensing Touch Properties
Going beyond merely sensing multiple touch points and deriving properties of the touches requires more advanced approaches. We broadly classify existing approaches into vision-based techniques, instrumented glove-based tracking, and specialized hardware.

Vision-based Techniques
Malik et al. [19] used two cameras mounted above the surface to distinguish which finger of which hand touched a surface, but their system requires a black background for reliable recognition. Wang et al. [26] detect the directed finger orientation vector from contact information in real time by considering the dynamics of the finger landing process. Dang et al. [4] mapped fingers to their associated hand by making use of constraints applied to the touch position with finger orientation. Freeman et al. [6] captured the user's hand image on a Microsoft Surface to distinguish hand postures through vision techniques. Finally, Holz et al. [13] proposed a generalized perceived input point model in which fingerprints were extracted and recognized to provide accurate touch information and user identification.

Glove-based Techniques
Gloves are widely popular in virtual and augmented reality environments [24]. Wang et al. [27] used a single camera to track a hand wearing an ordinary cloth glove imprinted with a custom pattern. Marquardt et al. [20] used fiduciary-tagged gloves on interactive surfaces to gather input about many parts of a hand and to discriminate between one person's or multiple people's hands. Although simple and robust, these approaches require users to wear additional instrumented gear, which is not always desirable compared to using bare hands.

Specialized Hardware
Specialized hardware is capable of deriving additional touch properties beyond the mere touch points. Benko et al. [1] sensed muscle activity through forearm electromyography to provide additional information about hand and finger movements away from the surface. Lopes et al. [17] integrated touch with sound to expand the expressiveness of touch interfaces by supporting recognition of different body parts, such as fingers and knuckles. TapSense [11] is a similar acoustic sensing system that allows conventional surfaces to identify the type of object being used for input, such as fingertip, pad and nail. The above systems require additional instrumentation of either the user or the surface and take a significant amount of time for setup. The presence of background noise is also a deterrent to using acoustic sensing techniques.

Pen and Touch-based Interaction
A few research efforts have explored the combination of pen and touch [34, 7, 2, 35]. Hinckley et al. [12] described techniques for direct pen + touch input for a prototype centered on note-taking and scrapbooking of materials. Lopes et al. [18] assessed the suitability of combining pen and multitouch interactions for 3D sketching. Through experimental evidence, they demonstrated that using bimanual interactions simplifies work flows and lowers task times when compared to the pen-only interface.

OVERVIEW: EXTENDED MULTITOUCH INTERACTION
Traditional multitouch interaction [5] is generally defined as the capability to detect multiple points of touch input on a surface. Here, we enrich this concept into what we call extended multitouch interaction (EMT), defined as follows:

• Detecting multiple touch points on and above a surface;

• Recovering finger, wrist and hand postures as well as the handedness (i.e., whether the user is using the left or right hand); and

• Identifying and distinguishing between different users interacting with the surface.

While previous work has explored various combinations of the above, we are aware of no single work that has addressed all three constraints together as a unified framework. To this extent, we propose a two-dimensional hand model that uniquely maps the touch points to the fingers, hands and the users from just the depth data. In the following text, we describe our approach to supporting extended multitouch using low-cost depth cameras.

Figure 2. The physical setup showing (from top to bottom) depth camera, projector, pen tracker, and physical surface.

TOUCH SENSING USING A DEPTH CAMERA
Inspired by Wilson [31], our approach to sensing touch using a depth camera simply builds on analyzing the depth image to detect when objects (i.e., fingers) come in close proximity with the physical surface. Our implementation uses a depth camera mounted above a horizontal surface to transform user gestures on and above the tabletop to multitouch events that can be used by a client application. Fully emulating existing multitouch technology using this setup gives rise to several concrete requirements for our system:

• Determine when the surface is being touched and generate the corresponding touch points;

• Track touch points over time when the user is dragging fingers across the surface;

• Map touch points from the physical surface to the coordinate space of the display area; and

• Export multitouch events to clients, either using a global Windows hook, or using the TUIO protocol [15].

Below we discuss the physical setup and each of the above components in more detail.

Physical Setup
Our low-cost setup (Figure 2) consists of (1) a horizontal surface, on and above which interactions take place, (2) a projector, which projects onto the surface providing visualization and feedback, (3) a depth sensing camera (like the Microsoft Kinect) for capturing touch input, and (4) a computer for processing the input data from the Kinect and projecting visual output using the projector. Again, the surface to be made interactive need not be empty nor flat.

While our focus in this paper is on horizontal (tabletop) setups, there is nothing to prevent the same technique from being used for vertical (wall-mounted) surfaces. The algorithms described below are independent of the configuration, and the touch results are not affected as long as a clear line of sight is maintained between the hands and the Kinect.

Basic Touch Sensing
We detect touches by analyzing the image produced by the depth camera. The intensity of every pixel in the depth image corresponds to the distance from the camera. This allows us to define a "touch input region" as a virtual volume located within a lower and upper distance above the surface. This volume is approximately a cuboid (assuming the surface is flat) offset from the surface with the height equal to the difference of the bounds. All pixels with distance values between these bounds correspond to the contact region(s) of the finger(s) with the surface.

Estimating the underlying surface is important for the purpose of obtaining reliable and robust touches using our method. We model the surface as a depth image that can simply be subtracted from the current image to yield the difference. This difference map is then used to detect whether any physical objects in the scene fall within the touch input region. We create a binary image such that the pixels corresponding to these elements are set to white and all the other pixels to black. Figure 3(a) shows the raw depth image with two hands touching the surface and Figure 3(b) is the binary image showing groups of white pixels corresponding to contacts of fingers with the surface.

We use connected component analysis to identify the clusters of pixels from the touch images, yielding touch "blobs." To differentiate between actual touches and noise, we discard the blobs that are below or above a certain area threshold. From experimentation, when the Kinect is at a height of 1.2 m from the surface, for a blob to be a candidate for a touch point, the area of the blob must be greater than 20 square millimeters and less than 220 square millimeters, accommodating a wide variety of finger sizes. We use the center of gravity of the white blobs as the location of touch points. Figure 3(c) shows the result of determining the touch points from the raw depth data and the binary touch image. Note that this approach also lends itself to defining additional volumes at other distances from the surface, allowing us to detect interaction above the surface as well.

Figure 3. Identifying touch points from the depth data: (a) depth data; (b) binary image; (c) touch points.
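As a concrete illustration of the pipeline above, the following minimal Python sketch turns a single depth frame into touch points: subtract the background surface model, threshold the difference into the touch input region, label connected components, reject blobs outside the area bounds reported above, and keep the centroids. This is an illustrative sketch rather than the actual Windows/Kinect implementation; the band limits, the millimeter-per-pixel conversion, and all names are assumptions chosen to mirror the values reported in the text.

```python
import numpy as np
from scipy import ndimage

# Touch input region: pixels this far above the surface count as contact.
# (Illustrative bounds; the text only states that a lower and upper
# distance above the surface are used.)
LOWER_MM, UPPER_MM = 4.0, 14.0
MIN_AREA_MM2, MAX_AREA_MM2 = 20.0, 220.0   # blob area thresholds from the text

def detect_touches(depth, surface, mm2_per_pixel):
    """Return (row, col) centroids of candidate touch blobs in one depth frame.

    depth, surface -- 2D arrays of distances from the camera (mm);
                      `surface` is the background model captured once.
    """
    height = surface - depth                              # height above the surface
    binary = (height > LOWER_MM) & (height < UPPER_MM)    # touch input region

    labels, n = ndimage.label(binary)                     # connected component analysis
    touches = []
    for blob_id in range(1, n + 1):
        blob = labels == blob_id
        area_mm2 = np.count_nonzero(blob) * mm2_per_pixel
        if MIN_AREA_MM2 < area_mm2 < MAX_AREA_MM2:        # reject noise and large objects
            touches.append(ndimage.center_of_mass(blob))  # center of gravity of the blob
    return touches
```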

Tracking Touch Points
Tracking touch points over time is critical for developing interactive applications that make use of dragging gestures. Without this information, each new depth frame would treat touch points as being new, and the application would not be able to support smooth dragging operations.

In our implementation, we associate touch points in consecutive depth frames by finding the nearest neighbor between points in the frames. Since the Kinect can record at a maximum rate of thirty frames per second, not all depth frames are necessarily used in tracking touches. Our system intelligently discards outlier depth frames by considering the touch points in a sliding window that includes several frames before the current frame, thereby increasing robustness.
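A minimal sketch of this frame-to-frame association follows (illustrative only, not the actual implementation): each previous touch carries a persistent integer ID and is matched greedily to the nearest new point within a distance gate. The gate value is an assumption, and the sliding-window outlier rejection described above is omitted for brevity.

```python
import itertools
import numpy as np

MAX_JUMP_MM = 40.0            # assumed gate: larger jumps start a new touch
_next_id = itertools.count(1)

def associate(prev, current):
    """Greedy nearest-neighbour matching of touch points between frames.

    prev    -- dict {touch_id: (x, y)} from the previous frame
    current -- list of (x, y) centroids detected in this frame
    returns -- dict {touch_id: (x, y)} for this frame
    """
    assigned, unused = {}, dict(prev)
    for point in current:
        if unused:
            tid, ref = min(unused.items(),
                           key=lambda kv: np.hypot(kv[1][0] - point[0],
                                                   kv[1][1] - point[1]))
            if np.hypot(ref[0] - point[0], ref[1] - point[1]) < MAX_JUMP_MM:
                assigned[tid] = point        # continue an existing touch
                del unused[tid]
                continue
        assigned[next(_next_id)] = point     # no close predecessor: new touch
    return assigned
```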

Mapping Touches to Display Space
Traditional multitouch surfaces have a one-to-one mapping between the physical location that is being touched on the surface and the digital location on the underlying display. This provides users with the feeling of directly touching and manipulating the digital content underneath their fingers.

To facilitate a similar interactive experience in our depth camera-based approach, we need to map the depth camera scene to the projected display area (Figure 4(a)). We employ a technique very similar to how regular touch screen sensors are calibrated with LCD displays using a nine-point calibration grid. This gives us the coefficients needed to transform from camera coordinates to display coordinates while taking scaling, rotation, and translation errors into account.

Figure 4. Touch sensing mechanism: (a) calibration; (b) scratchpad.
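With nine or more matched camera/display points, such a calibration can be expressed as a least-squares affine fit, which absorbs scaling, rotation, and translation. The sketch below is illustrative; the exact transform model used in the actual system is not specified here, and all names are ours.

```python
import numpy as np

def fit_affine(camera_pts, display_pts):
    """Least-squares affine map from camera to display coordinates.

    camera_pts, display_pts -- (N, 2) arrays of matched points, N >= 3
    returns -- 2x3 matrix A such that display ~= A @ [x, y, 1]
    """
    cam = np.asarray(camera_pts, dtype=float)
    disp = np.asarray(display_pts, dtype=float)
    M = np.hstack([cam, np.ones((cam.shape[0], 1))])   # (N, 3) homogeneous camera points
    A, *_ = np.linalg.lstsq(M, disp, rcond=None)       # (3, 2) least-squares solution
    return A.T                                         # (2, 3)

def to_display(A, point):
    """Map one camera-space point into display space."""
    x, y = point
    return A @ np.array([x, y, 1.0])
```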

Generating Multitouch Events
The final step in our implementation is to actually generate multitouch events that a client application can handle in order to support (extended) multitouch interaction. We provide two ways to do this: either through a Windows global hook, or using the TUIO protocol [15]. This enables our implementation to be entirely standalone, and external client applications simply need to interface to the system to receive touch events. Figure 4(b) shows a scratchpad application that demonstrates multitouch tracking using these events.
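As an illustration (ours, not part of the original implementation), the sketch below derives down/move/up events from two consecutive sets of tracked touches; a TUIO bridge would translate the same information into /tuio/2Dcur messages. The event names and tuple format are assumptions.

```python
def diff_events(prev, current):
    """Turn two {touch_id: (x, y)} dicts into (event, id, point) tuples."""
    events = []
    for tid, pt in current.items():
        if tid not in prev:
            events.append(("down", tid, pt))      # new touch appeared
        elif prev[tid] != pt:
            events.append(("move", tid, pt))      # existing touch moved
    for tid in prev:
        if tid not in current:
            events.append(("up", tid, prev[tid])) # touch lifted
    return events
```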

BEYOND STANDARD MULTITOUCH INTERACTION
Our work so far has concerned itself with detecting the two-dimensional (x and y) locations of touch points on as well as above (i.e., with a z component) the physical surface. However, fully supporting our extended multitouch (EMT) model requires being able to recover the finger and hand posture, handedness, and user identity of all touch points. More powerful and rich interaction techniques can be built if such additional information can be leveraged.

Figure 5. Additional touch regions for extended multitouch.

Extracting Palm and Wrist Data
Similar to the touch input region defined earlier for detecting fingers coming into contact with the physical surface, we can extract the position of the user's palm and wrists from the same depth data. We simply include two additional virtual volumes: the "fingers-palm region" and the "wrist region" (Figure 5). Based on the depth camera image, we construct binary images for each of these regions (Figure 6). These images are then used to recover the touch posture.

Figure 6. Binary depth images for the three virtual volumes used in our prototype implementation.

A connected component analysis is then performed on the binary image corresponding to the fingers-palm region to extract the blobs. For each blob, we compute its (a) center of gravity, (b) principal directions using principal component analysis, and (c) convex hull. The center of the palm is approximately the center of gravity of the blob, which is sufficient to map fingers and hands to touch points.

The binary image corresponding to the wrist region is processed in the same manner as above. For each extracted blob, we compute its (a) center of gravity and (b) principal directions. Wrist orientation can be determined from the principal directions. Tabletop surfaces can take advantage of this information to eliminate the inherent problems that arise due to orientation dependency. Furthermore, this position and orientation information provides an estimate of the hand occlusion region, which can be used to build occlusion-aware interfaces that minimize the impact of occlusion [14, 25].
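The per-blob quantities above (center of gravity and principal directions) amount to a two-dimensional PCA over the blob's pixel coordinates; a minimal sketch with illustrative names follows. The convex hull computation is omitted here.

```python
import numpy as np

def blob_statistics(mask):
    """Centroid and principal directions of one binary blob.

    mask -- 2D boolean array for one blob (e.g., fingers-palm or wrist region)
    returns -- (centroid, directions), where directions[0] is the first
               principal component (unit vector) and directions[1] the second
    """
    ys, xs = np.nonzero(mask)
    pts = np.column_stack([xs, ys]).astype(float)
    centroid = pts.mean(axis=0)                 # center of gravity of the blob
    cov = np.cov((pts - centroid).T)            # 2x2 covariance of pixel coordinates
    evals, evecs = np.linalg.eigh(cov)          # eigenvalues in ascending order
    order = np.argsort(evals)[::-1]
    directions = evecs[:, order].T              # rows: principal directions
    return centroid, directions
```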

A Two-Dimensional Hand Model
After processing the three binary images, we combine the results and create a two-dimensional model of the hand. Each hand now has a wrist, fingers-palm, the touch points, and their associations. It can also be noted that the distance value of touches, wrist-center, and palm-center can be obtained from the depth data, which, when added to the two-dimensional model, gives us a three-dimensional model.

Figure 7. From left to right, each image shows the binary image computed in the touch region, the fingers-palm region, and the hand model with touch points, wrist and palm.

Figure 7 shows results of touches from two hands on the table. The red dots on the far right images correspond to the touch points, the green dot is the palm center, and the violet dot is the wrist. The blue closed polygon is the convex hull of the fingers-palm blob, and the green line represents the first principal component of the wrist, showing the orientation of the hand. The principal components of the fingers-palm are not shown to avoid clutter. Figure 1 shows 50 touch points from 10 hands and 5 users.

Modeling Touch Posture
To deduce the mapping between the touch points and the fingers and hand, we take advantage of the different geometric properties that can be computed from the abstract and simplistic hand model derived above. In Figure 8, the purple dot is the center position of the wrist and the purple line represents its first principal component. Similarly, the green dot and green line represent the center position and the first principal component of the fingers-palm region. The yellow arrow (V3) represents the direction of the fingers from the wrist passing through the palm. For every instance of the 2D model, we compute (a) the angles subtended by each touch point with respect to V3, and (b) the distance of each touch point from the wrist.

Figure 8. The angles and distances of the hand geometry.

The order of the touch points can be deduced from the computed signed angles. We arrange the touch points on a circle based on their subtended angles with V3. For example, when the input is five touch points from the hand, we can see that the thumb is at either end, with both the angle and distance between the thumb and index fingers greater than between the little and ring fingers. The other fingers can now be labeled logically.

Handedness, i.e., whether it is a right or left hand, can be determined from the location of the thumb, i.e., its signed angle with respect to the orientation vector. If the signed angle is negative, then the input pattern is a right hand, and vice versa.
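Putting these quantities together, the five-finger case can be sketched as follows (a simplified illustration, not the actual implementation; all names are ours): compute the signed angle of each touch point about the wrist relative to V3, sweep the points in angular order, place the thumb at the end with the larger angular gap to its neighbour, and read handedness off the sign of the thumb's angle.

```python
import numpy as np

FINGERS = ["thumb", "fore", "middle", "ring", "little"]

def signed_angle(v_ref, v):
    """Signed angle (radians) from reference vector v_ref to v."""
    return np.arctan2(v_ref[0] * v[1] - v_ref[1] * v[0],
                      v_ref[0] * v[0] + v_ref[1] * v[1])

def label_five_fingers(touches, wrist, v3):
    """Label a five-point posture and guess handedness.

    touches -- list of five (x, y) touch points
    wrist   -- (x, y) wrist centre
    v3      -- unit vector from the wrist through the palm (Figure 8)
    """
    angles = [signed_angle(v3, np.subtract(t, wrist)) for t in touches]
    order = np.argsort(angles)                       # sweep across the hand
    # The thumb sits at the end with the larger angular gap to its neighbour.
    gap_low = angles[order[1]] - angles[order[0]]
    gap_high = angles[order[-1]] - angles[order[-2]]
    sequence = order if gap_low > gap_high else order[::-1]
    labels = {FINGERS[i]: tuple(touches[idx]) for i, idx in enumerate(sequence)}
    # Sign rule from the text: negative thumb angle indicates a right hand.
    handedness = "right" if angles[sequence[0]] < 0 else "left"
    return labels, handedness
```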

Wobbrock et al. [33] outlined 27 different gestural commands performed on a tabletop surface. From studying these hand poses, we collected the following observations:

• The majority of gestures are performed with one, two, three, or five finger touch points;

• One-finger interactions are always performed using the fore-finger (100%);

• For two-finger interactions, the fingers used are thumb-fore and fore-middle;

• For three-finger interactions, the finger combinations used are thumb-fore-middle and fore-middle-ring; and

• For four-finger interactions, the finger combinations are thumb-fore-middle-ring and fore-middle-ring-little.

Based on these observations, we propose 16 different touch postures (Figure 9) to be recognized by our system. Figure 10 shows a sample of the results generated by our system for these 16 different postures. Poses that do not belong to the above classification (for example, Figure 7a) are ambiguous, and hence our system cannot identify the fingers reliably. Furthermore, the presence of the thumb and fore finger is crucial in our method for the correct mapping between fingers and touch points, and the thumb is crucial for deducing handedness. Hence, the accuracy of the mapping is reduced for hand poses that do not include the thumb. In such cases, the distance and angular measures alone do not give conclusive evidence of the correct touch posture and handedness.

Figure 9. Hand poses during multitouch recognized by our system.

Figure 10. Results (actual snapshots) after mapping touch points to fingers and hands for the poses in Figure 9.

Detecting User Identity
Finally, being able to match touch inputs to individual users is crucial for more advanced collaborative applications where multiple participants are interacting with the tabletop. Our solution builds on the fact that the users' arms are also partially visible to the depth camera mounted above the tabletop. We use this information to heuristically model estimated user positions that are off-screen of the depth image. However, this approach depends on users maintaining their relative positions around the tabletop surface, as well as not approaching too closely to each other.

A future improvement to this solution would integrate a second vision-based tracking system that could follow the participants themselves instead of merely their touch points.
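This heuristic is described only at a high level; one possible realization, entirely our own construction and not necessarily the approach used in the actual system, is to trace each arm blob to the border of the depth image and bin the entry direction into a fixed number of assumed seating positions around the table:

```python
import numpy as np

def assign_user(arm_mask, n_seats=4):
    """Guess which seated user an arm blob belongs to.

    arm_mask -- boolean image of one arm/wrist blob
    n_seats  -- assumed number of fixed positions around the table
    Returns a seat index in [0, n_seats), or None if the arm never
    reaches the image border.
    """
    h, w = arm_mask.shape
    ys, xs = np.nonzero(arm_mask)
    # Blob pixels lying on (or next to) the image border: where the arm enters.
    on_border = (ys <= 1) | (ys >= h - 2) | (xs <= 1) | (xs >= w - 2)
    if not np.any(on_border):
        return None
    ey, ex = ys[on_border].mean(), xs[on_border].mean()
    angle = np.arctan2(ey - h / 2.0, ex - w / 2.0)       # direction of entry
    return int(((angle + np.pi) / (2 * np.pi)) * n_seats) % n_seats
```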

USER STUDIES
To validate our approach, we conducted two empirical user studies: one for measuring the tracking accuracy of the touch sensing functionality, and the other for measuring the finger and hand posture accuracy of our utility.

Touch Sensing Accuracy
The accuracy of touch contacts depends on (a) the parameters computed in calibrating the interactive area (projected imagery) to the Kinect scene, (b) the algorithm to detect touch contacts based on depth thresholds, (c) the orientation and size of the finger(s) making the contact, and (d) the user's perception of where they are touching the surface. The above factors cumulatively contribute towards the accuracy of the detected touches, and as some of them might not be controlled, these factors model the real-world performance of our system.

Participants
We recruited 14 volunteers (11 males and 3 females) between the ages of 21 and 32 years for the experiment. Among them, 12 were right-handed and 2 were left-handed.

Task

The task was to touch 15 randomly generated locations that were projected on the surface. Each location was visually displayed as a circle with a cross-hair of size 20×20 mm. Each participant was asked to touch every point and hold for two seconds using their index finger. For every location, the system acquired the calibrated touch points repeatedly, between a finger-down and a finger-up event. The system showed only one location at a time, and on every finger-up event, a new point was displayed until the end of the experiment.

Result
In all, 3715 data points were collected from the 14 participants. We analyzed the targeting performance using two independent measures: mean offset and spread [10, 13]. Figure 11 shows the average spread for the data collected. The average spread in the best case is 4 mm and in the worst case is 13 mm. The average spread, taking into account all data points collectively from all users, is 8 mm, which is better than previous findings in the literature [10]. The mean offset for each user is shown in Figure 13(a), with User-10 having the farthest (largest) offset and User-2 being the nearest (smallest offset) to the target.

Figure 11. Touch sensing accuracy with average spread for individual users and all users collectively.

Figure 13(b) shows the distribution of user clicks for all 14 users with compensating offsets. Here, the origin represents the target locations and the red markers represent the acquired touch points. There was no significant difference in results between the handedness of the users. We also calculated the accuracy of hitting a circular virtual button by accommodating 95% of the touches for each user with compensating offsets. Figure 12 shows the minimum button diameter for every user, with the lowest (diameter = 12 mm) for User-7 and the highest (diameter = 21 mm) for User-1, the average diameter being 18 mm. In comparison, the size of a finger pad is between 10-14 mm; the fingertip is 8-10 mm [3]; the recommended button size for Microsoft Surface is 12 mm and for conventional touch input is 15 mm [13]. We posit that the accuracy of the touches will decrease with increasing height of the Kinect above the surface. Future studies have to be done with regard to this aspect.

Figure 12. Accuracy of hitting a circular virtual button.

Touch Posture and Handedness Accuracy
We conducted a second user study to evaluate the accuracy of identifying the touch posture and handedness from the depth data.

Participants
We recruited 9 volunteers, 7 males and 2 females, between the ages of 22 and 34 years for this experiment.

Task
Each participant was asked to perform 8 touch-poses (Figure 9) on the surface with both their left and right hands, one after the other. Each pose was to be held for approximately 10 seconds. The participants were not restricted to any position or orientation around the table. No visual feedback was provided to the participant during the study.

Result
A total of 4971 data points were collected during the experiment. Among them, the system computed a hand posture for only 4918 data points (approximately 99%). The possible reason for this is that the users moved their hands either too fast or the hands were not in the line of sight of the Kinect.

Figure 14 shows the identification accuracies of touch posture and handedness for the 16 (8 poses times 2 hands each) different touch poses. The overall handedness identification accuracy is 93.2%. We discarded the one-finger index pose, as our current implementation cannot distinguish the hands for this particular pose. While the two-finger and three-finger "no thumb" poses had an accuracy of 77% and 86% respectively, their corresponding "thumb" poses had an accuracy of 98% and 97% respectively. Thus, the presence of the thumb is critical in identifying the handedness for two-finger and three-finger poses using our algorithm. Our hand model can be improved by reasoning with the contour of the posture and also convexity defects to better identify the posture. However, this will involve more computation and it is important to consider its trade-offs in real-time applications.

Figure 14. Touch posture and handedness accuracy.

The overall finger identification accuracy was 97.65%. At the end of the user study, each participant was asked to rate their least favorable touch-poses. Among all, seven participants voted the four-finger poses ("thumb", "no-thumb", left and right hands) and five participants voted the "three-finger-no-thumb" pose as their least favorites.

Figure 13. Distribution diagrams: (a) mean offsets from target (all users); (b) distribution of user clicks with compensating mean offsets (mm).

DESIGN IMPLICATIONS
Surfaces supporting extended multitouch (EMT) enable a richer and more diverse interaction design space than those merely supporting standard multitouch. Here we briefly review some of these designs in more detail.

Interaction Above and Beyond the Surface. Existing research has already explored interaction on and above a surface, but we think there is more work to be done here. Touch screens traditionally do not support hovering, but this is certainly one of the straightforward ideas to explore. For example, hovering over a surface could bring up a tool palette, a context menu, or simply a tooltip, similar to how hovering is handled in current mouse-based interfaces. More complex modalities are certainly possible, including detecting volumetric and 3D gestures, as suggested by Wigdor et al. [29].

Identifying Users. The ability to match touches to users opens the door for personalized touch interaction. For example, instead of relying on social protocols [23], an EMT application can keep track of object ownership and explicitly prevent users lacking permission from interacting with the object. In a command and control setting, this could be utilized to assign different roles to different participants, such as a fire chief having permission to give orders to fire trucks, whereas a police chief may only control squad cars.

Recognizing Posture. Our implementation has proved the validity of recovering finger and hand posture using a very simple and low-cost hardware setup. This in turn brings several of the posture-based ideas devised by other researchers into the realm of possibility: examples include context-sensitive touch menus, complex gestures on and above the surface, and providing explicit gestural feedback during training.

EXAMPLE: PENS WITH EXTENDED MULTITOUCH
Though natural, touch input is not suited for tasks such as writing and sketching. These inking tasks are particularly suited for pen-based input, which has high precision, whereas our multitouch accuracy is approximately 8 mm. Multitouch input in general, in all its forms and modalities, suffers from low precision, and our system is no exception. The pen, on the other hand, is a familiar tool that exploits the users' experience of having been trained to use a pen from their early days.

However, a digital pen allows for only one input point for interaction, constraining the input bandwidth. Augmenting pen-based input with extended multitouch capabilities increases the users' input bandwidth, which can be leveraged by designers to create rich and natural interactions for a variety of applications. We outline a division of labor between pen and touch input based on the theme "the pen writes and touch manipulates" [12]. We present 'uSketch', an advanced sketching application for creating beautified mechanical engineering sketches that can be used for further finite element analysis.

Combining Pen and Touch Input
For enabling bimanual pen and touch interactions, we added the Mimio pen to the setup described in Figure 2. Mimio pen technology tracks the 2D position of a digital stylus using ultrasound and infrared, and wirelessly transmits the location to the computer when the stylus is pressed against any flat surface. The pen can be calibrated to any surface size and works both horizontally and vertically. The pen input can then be processed and visually displayed back on the surface using the projector. This process allows the users to write and sketch on any surface, thus making it interactive.

Figure 15. Palm (left) and pen (right) rejection during touch capture.

Any object that touches the surface is identified as a touch point by the Kinect sensor. Hence, the system must be able to avoid detecting the contact point of the pen as a touch input (Figure 15). To achieve this, we take advantage of how the Kinect senses touch. When the pen is in contact with the surface, it communicates its coordinates to the computer. We take advantage of this coordinate information to negate any touch points detected by the system within a small radius of the pen contact. Similarly, when the whole hand rests against the surface, the system detects a big white blob as the palm rests completely on the surface (Figure 15). The system discards this input as it does not correspond to a touch point. Hence, both the pen and the palm are conveniently discarded by the design of how the system processes the depth data from the Kinect.
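This rejection logic reduces to two filters over the candidate blobs: drop anything within a small radius of the last reported pen coordinate, and drop blobs far larger than a fingertip. The sketch below is an illustrative reading of that description; the exclusion radius is an assumption, and the area bound reuses the 220 mm² threshold introduced earlier.

```python
import numpy as np

PEN_RADIUS_MM = 15.0        # assumed exclusion radius around the pen tip
MAX_TOUCH_AREA_MM2 = 220.0  # same upper bound used for finger blobs

def filter_touches(blobs, pen_xy=None):
    """Drop pen contacts and resting palms from candidate touch blobs.

    blobs  -- list of dicts {"center": (x, y), "area_mm2": float}
    pen_xy -- last reported pen coordinate, already mapped into the same
              coordinate space as the touch points (None if the pen is up)
    """
    kept = []
    for blob in blobs:
        if blob["area_mm2"] > MAX_TOUCH_AREA_MM2:        # resting palm: too big
            continue
        if pen_xy is not None and np.hypot(blob["center"][0] - pen_xy[0],
                                           blob["center"][1] - pen_xy[1]) < PEN_RADIUS_MM:
            continue                                     # this blob is the pen tip
        kept.append(blob)
    return kept
```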

'uSketch'
Figure 1 shows a snapshot of the 'uSketch' application on the interactive surface taken from above the table, and Figure 16 shows the 'uSketch' interface with (a) a freehand sketch of a 2D-bracket, (b) the beautified sketch with interpreted symbols, and (c) the deformation results in ANSYS.

Figure 16. The 'uSketch' interface showing (a) a freehand sketch of a 2D-bracket, (b) the beautified sketch with interpreted symbols, and (c) the deformation results in ANSYS.

Guiard [8] proposed general principles for asymmetric bimanual actions comprising three rules: (a) the dominant hand (DH) moves within the frame of reference defined by the non-dominant hand (NDH); (b) the sequence of motion generally sees the NDH setting the reference frame prior to actions with the DH being made within that context; and (c) the DH works at a higher level of precision than the NDH in both spatial and temporal terms. Hence, the pen is preferably held in the DH, and both the NDH and the DH can be used for touch input with the pen tucked in the DH. Here, we outline the bimanual pen and touch interactions that we built into 'uSketch' (see Figure 17).

Figure 17. Bimanual pen and touch interactions for 'uSketch'. The actions and their corresponding gestures are:

• Sketch geometry, input symbols and write text: only pen input; the other hand can lie flat on the surface.

• Switch between geometry and symbol modes: two-finger swipe (one hand), with visual feedback showing the active mode.

• Change primitive type after recognition (geometry mode): select the primitive with a one-finger tap and pull/push with the same finger or with the pen.

• Change symbol interpretation: select the symbol with a one-finger tap and pull/push with the same finger or with the pen.

• Cycle between visualizations (raw stroke, beautified geometry, both) in geometry mode: three-finger (fore-middle-ring) tap or swipe with the NDH.

• Cycle between visualizations (raw strokes, interpretations, both) in symbol mode: three-finger (fore-middle-ring) tap or swipe with the NDH.

• Delete a primitive in geometry mode: three fingers down (thumb-fore-middle) on the NDH and a squiggle gesture with the pen or a finger on the DH.

• Delete symbols in symbol mode: three fingers down (thumb-fore-middle) on the NDH and a squiggle gesture with the pen or a finger on the DH.

• Zoom: one finger from each hand, or one finger and the pen, moving towards or away from each other.

Using bimanual interactions with pen and touch eliminates the need for constant switching between modes and also the need for user interface elements for changing visualizations and other interactions. We posit that using such interactions simplifies work flows and lowers task times when compared to a pen-only or touch-only interface.

CONCLUSIONS AND FUTURE WORK
We have presented a multitouch sensing approach based on low-cost depth cameras where any physical surface, flat or non-flat, horizontal or vertical, projected or screen, can be instrumented with touch sensing. Our contribution goes beyond existing depth camera touch sensing approaches into what we call extended multitouch, which we define as the ability to not only sense multiple touches, but also to detect interaction above the touch surface, recover the user's finger and hand posture, and distinguish between the users interacting with the surface. We validate the work using two user studies gauging the accuracy of touch and posture recognition, as well as using two sketch applications where we begin to explore the design space of this novel idea.

We have only scratched the surface of this extended multitouch interaction approach and, as discussed in the design implications above, we can see many opportunities for future work in this area. In particular, we envision continuing our investigations into combining multiple interaction modalities for early engineering design and sketching.

REFERENCES
1. Benko, H., and Wilson, A. DepthTouch: Using depth-sensing camera to enable freehand interactions on and above the interactive surface. In Proceedings of the IEEE Workshop on Tabletops and Interactive Surfaces (2009), vol. 8.
2. Brandl, P., Forlines, C., Wigdor, D., Haller, M., and Shen, C. Combining and measuring the benefits of bimanual pen and direct-touch interaction on horizontal interfaces. In Proceedings of the ACM Conference on Advanced Visual Interfaces (2008), pp. 154–161.
3. Dandekar, K., Raju, B. I., and Srinivasan, M. A. 3-D finite-element models of human and monkey fingertips to investigate the mechanics of tactile sense. Journal of Biomechanical Engineering 125, 5 (2003), 682–691.
4. Dang, C. T., Straub, M., and André, E. Hand distinction for multi-touch tabletop interaction. In Proceedings of the ACM Conference on Interactive Tabletops and Surfaces (2009), pp. 101–108.
5. Dietz, P., and Leigh, D. DiamondTouch: a multi-user touch technology. In Proceedings of the ACM Symposium on User Interface Software and Technology (2001), pp. 219–226.

6. Freeman, D., Benko, H., Morris, M. R., and Wigdor, D. ShadowGuides: visualizations for in-situ learning of multi-touch and whole-hand gestures. In Proceedings of the ACM Conference on Interactive Tabletops and Surfaces (2009), pp. 165–172.
7. Frisch, M., Heydekorn, J., and Dachselt, R. Investigating multi-touch and pen gestures for diagram editing on interactive surfaces. In Proceedings of the ACM Conference on Interactive Tabletops and Surfaces (2009), pp. 149–156.
8. Guiard, Y. Asymmetric division of labor in human skilled bimanual action: The kinematic chain as a model. Journal of Motor Behavior 19, 4 (1987), 486–517.
9. Han, J. Y. Low-cost multi-touch sensing through frustrated total internal reflection. In Proceedings of the ACM Symposium on User Interface Software and Technology (2005), pp. 115–118.
10. Harrison, C., Benko, H., and Wilson, A. D. OmniTouch: wearable multitouch interaction everywhere. In Proceedings of the ACM Symposium on User Interface Software and Technology (2011), pp. 441–450.
11. Harrison, C., Schwarz, J., and Hudson, S. E. TapSense: enhancing finger interaction on touch surfaces. In Proceedings of the ACM Symposium on User Interface Software and Technology (2011), pp. 627–636.
12. Hinckley, K., Yatani, K., Pahud, M., Coddington, N., Rodenhouse, J., Wilson, A., Benko, H., and Buxton, B. Manual deskterity: an exploration of simultaneous pen + touch direct input. In Extended Abstracts of the ACM Conference on Human Factors in Computing Systems (2010), pp. 2793–2802.
13. Holz, C., and Baudisch, P. The generalized perceived input point model and how to double touch accuracy by extracting fingerprints. In Proceedings of the ACM Conference on Human Factors in Computing Systems (2010), pp. 581–590.
14. Javed, W., Kim, K., Ghani, S., and Elmqvist, N. Evaluating physical/virtual occlusion management techniques for horizontal displays. In Human-Computer Interaction – Proceedings of INTERACT (2011), vol. 6948 of Lecture Notes in Computer Science, Springer, pp. 391–408.
15. Kaltenbrunner, M. reacTIVision and TUIO: A tangible tabletop toolkit. In Proceedings of the ACM Conference on Interactive Tabletops and Surfaces (2009), pp. 9–16.
16. Krueger, M. W., Gionfriddo, T., and Hinrichsen, K. VIDEOPLACE: an artificial reality. In Proceedings of the ACM Conference on Human Factors in Computing Systems (1985), pp. 35–40.
17. Lopes, P., Jota, R., and Jorge, J. A. Augmenting touch interaction through acoustic sensing. In Proceedings of the ACM International Conference on Interactive Tabletops and Surfaces (2011), pp. 53–56.
18. Lopes, P., Mendes, D., Araújo, B., and Jorge, J. A. Combining bimanual manipulation and pen-based input for 3D modelling. In Proceedings of the Eurographics Symposium on Sketch-Based Interfaces and Modeling (2011), pp. 15–22.
19. Malik, S., and Laszlo, J. Visual touchpad: a two-handed gestural input device. In Proceedings of the ACM International Conference on Multimodal Interfaces (2004), pp. 289–296.
20. Marquardt, N., Kiemer, J., Ledo, D., Boring, S., and Greenberg, S. Designing user-, hand-, and handpart-aware tabletop interactions with the TouchID toolkit. In Proceedings of the ACM International Conference on Interactive Tabletops and Surfaces (2011), pp. 21–30.
21. Matsushita, N., and Rekimoto, J. HoloWall: designing a finger, hand, body, and object sensitive wall. In Proceedings of the ACM Symposium on User Interface Software and Technology (1997), pp. 209–210.
22. Microsoft Corporation. Microsoft Surface, 2012. http://www.microsoft.com/surface/ (accessed March 2012).
23. Morris, M. R., Ryall, K., Shen, C., Forlines, C., and Vernier, F. Beyond 'social protocols': multi-user coordination policies for co-located groupware. In Proceedings of the ACM Conference on Computer Supported Cooperative Work (2004), pp. 262–265.
24. Sturman, D. J., and Zeltzer, D. A survey of glove-based input. IEEE Computer Graphics and Applications 14, 1 (Jan. 1994), 30–39.
25. Vogel, D., and Balakrishnan, R. Occlusion-aware interfaces. In Proceedings of the ACM Conference on Human Factors in Computing Systems (2010), pp. 263–272.
26. Wang, F., Cao, X., Ren, X., and Irani, P. Detecting and leveraging finger orientation for interaction with direct-touch surfaces. In Proceedings of the ACM Symposium on User Interface Software and Technology (2009), pp. 23–32.
27. Wang, R. Y., and Popović, J. Real-time hand-tracking with a color glove. ACM Transactions on Graphics 28, 3 (2009).
28. Weiser, M. The computer for the twenty-first century. Scientific American 265, 3 (Sept. 1991), 94–104.
29. Wigdor, D., Benko, H., Pella, J., Lombardo, J., and Williams, S. Rock & rails: extending multi-touch interactions with shape gestures to enable precise spatial manipulations. In Proceedings of the ACM Conference on Human Factors in Computing Systems (2011), pp. 1581–1590.
30. Wilson, A. D. TouchLight: an imaging touch screen and display for gesture-based interaction. In Proceedings of the ACM International Conference on Multimodal Interfaces (2004), pp. 69–76.

31. Wilson, A. D. Using a depth camera as a touch sensor. In Proceedings of the ACM Conference on Interactive Tabletops and Surfaces (2010), pp. 69–72.
32. Wilson, A. D., and Benko, H. Combining multiple depth cameras and projectors for interactions on, above and between surfaces. In Proceedings of the ACM Symposium on User Interface Software and Technology (2010), pp. 273–282.
33. Wobbrock, J. O., Morris, M. R., and Wilson, A. D. User-defined gestures for surface computing. In Proceedings of the ACM Conference on Human Factors in Computing Systems (2009), pp. 1083–1092.
34. Yee, K.-P. Two-handed interaction on a tablet display. In Extended Abstracts of the ACM Conference on Human Factors in Computing Systems (2004), pp. 1493–1496.
35. Zeleznik, R., Bragdon, A., Adeputra, F., and Ko, H.-S. Hands-on math: a page-based multi-touch and pen desktop for technical work and problem solving. In Proceedings of the ACM Symposium on User Interface Software and Technology (2010), pp. 17–26.
