Web GIS in Practice X: a Microsoft Kinect Natural User Interface for Google Earth Navigation
Total Page:16
File Type:pdf, Size:1020Kb
Kamel Boulos et al. International Journal of Health Geographics 2011, 10:45 INTERNATIONAL JOURNAL http://www.ij-healthgeographics.com/content/10/1/45 OF HEALTH GEOGRAPHICS EDITORIAL Open Access Web GIS in practice X: a Microsoft Kinect natural user interface for Google Earth navigation Maged N Kamel Boulos1*, Bryan J Blanchard2, Cory Walker2, Julio Montero2, Aalap Tripathy2 and Ricardo Gutierrez-Osuna2 Abstract This paper covers the use of depth sensors such as Microsoft Kinect and ASUS Xtion to provide a natural user interface (NUI) for controlling 3-D (three-dimensional) virtual globes such as Google Earth (including its Street View mode), Bing Maps 3D, and NASA World Wind. The paper introduces the Microsoft Kinect device, briefly describing how it works (the underlying technology by PrimeSense), as well as its market uptake and application potential beyond its original intended purpose as a home entertainment and video game controller. The different software drivers available for connecting the Kinect device to a PC (Personal Computer) are also covered, and their comparative pros and cons briefly discussed. We survey a number of approaches and application examples for controlling 3-D virtual globes using the Kinect sensor, then describe Kinoogle, a Kinect interface for natural interaction with Google Earth, developed by students at Texas A&M University. Readers interested in trying out the application on their own hardware can download a Zip archive (included with the manuscript as additional files 1, 2, &3) that contains a ‘Kinnogle installation package for Windows PCs’. Finally, we discuss some usability aspects of Kinoogle and similar NUIs for controlling 3-D virtual globes (including possible future improvements), and propose a number of unique, practical ‘use scenarios’ where such NUIs could prove useful in navigating a 3-D virtual globe, compared to conventional mouse/3-D mouse and keyboard-based interfaces. Background anything [4,5] (cf.Sony’s PlayStation Move and Nitendo What is Kinect? Wii Remote controllers). PrimeSense also teamed up Launched in November 2010, Kinect is a motion sensing with ASUS to develop a PC-compatible device similar to USB (Universal Serial Bus) input device by Microsoft Kinect, which they called ASUS Xtion and launched in that enables users to control and naturally interact with the second quarter of 2011 [2,6]. games and other programs without the need to physi- Kinect is a horizontal bar connected to a small base cally touch a game controller or object of any kind. with a motorised pivot (to follow the user around, as Kinect achieves this through a natural user interface by needed), and is designed to be positioned lengthwise tracking the user’s body movement and by using ges- above or below the computer or TV screen (Figure 1). tures and spoken commands [1,2]. Kinect holds the The device features an RGB (Red Green Blue) colour Guinness World Record as the fastest selling consumer camera, a depth sensor (using an infrared–IR projector electronics device, with sales surpassing 10 million units and an IR camera), and a noise-cancelling, multi-array as of 9 March 2011 [3]. microphone (made of four microphones, which can also Kinect uses technology by Israeli company PrimeSense contribute to detecting a person’slocationin3-D that generates real-time depth, colour and audio data of (three-dimensional) space) [5,7]. Kinect also incorpo- the living room scene. Kinect works in all room lighting rates an accelerometer (probably used for inclination conditions, whether in complete darkness or in a fully and tilt sensing, and possibly image stabilisation [7]). lit room, and does not require the user to wear or hold Running proprietary firmware (internal device software), these components together can provide full-body 3-D * Correspondence: [email protected] motion capture, gesture recognition, facial recognition, 1 Faculty of Health, University of Plymouth, Drake Circus, Plymouth, Devon and voice recognition capabilities [2,4]. (Functions, accu- PL4 8AA, UK Full list of author information is available at the end of the article racy and usability will also greatly depend on device © 2011 Kamel Boulos et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. Kamel Boulos et al. International Journal of Health Geographics 2011, 10:45 Page 2 of 14 http://www.ij-healthgeographics.com/content/10/1/45 Figure 1 Anatomy of Microsoft Kinect. drivers and associated software running on the host PC driver support machine (which can be a Windows, Mac or Linux PC, In December 2010, OpenNI and PrimeSense released or an Xbox 360 game console) and used to access the their own Kinect open source drivers and motion track- Kinect hardware–see discussion about drivers below.) ing middleware (called NITE) for PCs running Windows Kinect is capable of simultaneously tracking two active (7, Vista and XP), Ubuntu and MacOSX [9,10]. FAAST users [2,8]. For full-body, head to feet tracking, the (Flexible Action and Articulated Skeleton Toolkit) is a recommended user distance from the sensor is approxi- middleware developed at the University of Southern mately 1.8 m for a single user; when there are two peo- California (USC) Institute for Creative Technologies that ple to track at the same time, they should stand aims at facilitating the integration of full-body control approximately 2.5 m away from the device. Kinect with virtual reality applications and video games when requires a minimum user height of 1 m (standing dis- using OpenNI-compliant depth sensors and drivers tance and user height figures are according to informa- [11,12]. tion printed on the Microsoft Kinect retail box). In June 2011, Microsoft released a non-commercial Because Kinect’s motorised tilt mechanism requires Kinect Software Development Kit (SDK) for Windows more power than what can be supplied via USB ports, that includes Windows 7-compatible PC drivers for the the device makes use of a proprietary connector and Kinect device (Microsoft’s SDK does not support older ships with a special power supply cable that splits the Windows versions or other operating systems) [13]. connection into separate USB and power connections, Microsoft’s SDK allows developers to build Kinect- with power being supplied from the mains by way of an enabled applications in Microsoft Visual Studio 2010 AC/DC adapter (Figure 1). using C++, C# or Visual Basic (Figure 2). Microsoft is Kinect can be bought new in the UK for less than planning to release a commercial version of the Kinect £100 per unit (http://Amazon.co.uk consumer price in for Windows SDK with support for more advanced GBP, including VAT, as of July 2011). device functionalities [2]. Kamel Boulos et al. International Journal of Health Geographics 2011, 10:45 Page 3 of 14 http://www.ij-healthgeographics.com/content/10/1/45 Figure 2 Microsoft Kinect for Windows SDK beta (non-commercial, June 2011 release). There is also a third set of Kinect drivers for Win- teleconferencing (e.g., the work done by Oliver Kreylos dows, Mac and Linux PCs by the OpenKinect (libFree- at the University of California Davis combining two Nect) open source project [14]. Code Laboratories’ CL Kinect devices [23,24], and the ‘Kinected Conference’ NUI Platform offers a signed driver and SDK for multi- research by Lining Yao, Anthony DeVincenzi and collea- ple Kinect devices on Windows XP, Vista and 7 [15]. gues at MIT Media Lab to augment video imaging with The three sets of drivers vary greatly in their affor- calibrated depth and audio [25,26]); using Kinect to dances. For example, the official Microsoft Kinect for assist clinicians in diagnosing a range of mental disor- Windows SDK does not need a calibration pose; the ders in children (research by Nikolaos Papanikolopoulos ‘Shape Game’ demo supplied with the SDK works and colleagues at the University of Minnesota [27]); and impressively well (for one or two active users) immedi- a more practical use of the device to control medical ately after installing the SDK, without the need to cali- imaging displays during surgery without having to physi- brate user pose [8]. This contributes to an excellent cally touch anything, thus reducing the chance of hand Kinect digital Out-Of-Box Experience (OOBE) for PC contamination in operating theatres (e.g., the work con- users that is comparable to the OOBE offered to Kinect ducted within the Virtopsy Project at the Institute of Xbox 360 game console users [16]. However, Microsoft Forensic Medicine, University of Bern, Switzerland [28], Kinect for Windows SDK beta (June 2011 release) does and the system currently in use by Calvin Law and his not offer finger/hand gesture recognition or hands-only team at Toronto’s Sunnybrook Health Sciences Centre tracking and is limited to skeletal tracking. OpenNI dri- [29], as well as the demonstration of a similar concept vers are more flexible in this respect, but on the nega- by InfoStrat, a US-based IT services company [30]). tive side, they require a calibration pose and lack the PC-based Kinect applications have also been devel- advanced audio processing (speech recognition) that is oped for controlling non-gaming 3-D virtual environ- provided by Microsoft’s official SDK [17-20]. ments (3-D models, virtual worlds and virtual globes), e. g., [31,32]. These environments share many common Applications beyond playing games features with 3-D video games, the original target Many developers and research groups around the world domain of the Kinect sensor. Evoluce, a German com- are exploring possible applications of Kinect (and similar pany specialising in natural user interfaces, offers a com- devices such as ASUS Xtion [6]) that go beyond the ori- mercial solution for multi-gesture interaction using ginal intended purpose of these sensors as home enter- Kinect’s 3-D depth sensing technology under Windows tainment and video game controllers [2,21,22].