Web GIS in Practice X: a Microsoft Kinect Natural User Interface for Google Earth Navigation

Kamel Boulos et al. International Journal of Health Geographics 2011, 10:45. http://www.ij-healthgeographics.com/content/10/1/45

EDITORIAL (Open Access)

Web GIS in practice X: a Microsoft Kinect natural user interface for Google Earth navigation

Maged N Kamel Boulos1*, Bryan J Blanchard2, Cory Walker2, Julio Montero2, Aalap Tripathy2 and Ricardo Gutierrez-Osuna2

Abstract

This paper covers the use of depth sensors such as Microsoft Kinect and ASUS Xtion to provide a natural user interface (NUI) for controlling 3-D (three-dimensional) virtual globes such as Google Earth (including its Street View mode), Bing Maps 3D, and NASA World Wind. The paper introduces the Microsoft Kinect device, briefly describing how it works (the underlying technology by PrimeSense), as well as its market uptake and application potential beyond its original intended purpose as a home entertainment and video game controller. The different software drivers available for connecting the Kinect device to a PC (personal computer) are also covered, and their comparative pros and cons briefly discussed. We survey a number of approaches and application examples for controlling 3-D virtual globes using the Kinect sensor, then describe Kinoogle, a Kinect interface for natural interaction with Google Earth, developed by students at Texas A&M University. Readers interested in trying out the application on their own hardware can download a Zip archive (included with the manuscript as Additional files 1, 2 and 3) that contains a 'Kinoogle installation package for Windows PCs'. Finally, we discuss some usability aspects of Kinoogle and similar NUIs for controlling 3-D virtual globes (including possible future improvements), and propose a number of unique, practical 'use scenarios' where such NUIs could prove useful in navigating a 3-D virtual globe, compared to conventional mouse/3-D mouse and keyboard-based interfaces.

Background

What is Kinect?

Launched in November 2010, Kinect is a motion sensing USB (Universal Serial Bus) device by Microsoft that enables users to control and naturally interact with games and other programs without the need to physically touch a game controller or object of any kind. Kinect achieves this through a natural user interface by tracking the user's body movement and by using gestures and spoken commands [1,2]. Kinect holds the Guinness World Record as the fastest selling consumer electronics device, with sales surpassing 10 million units as of 9 March 2011 [3].

Kinect uses technology by Israeli company PrimeSense that generates real-time depth, colour and audio data of the living room scene. Kinect works in all room lighting conditions, whether in complete darkness or in a fully lit room, and does not require the user to wear or hold anything [4,5] (cf. Sony's PlayStation Move and Nintendo Wii Remote controllers). PrimeSense also teamed up with ASUS to develop a PC-compatible device similar to Kinect, which they called ASUS Xtion and launched in the second quarter of 2011 [2,6].

Kinect is a horizontal bar connected to a small base with a motorised pivot (to follow the user around, as needed), and is designed to be positioned lengthwise above or below the computer or TV screen (Figure 1). The device features an RGB (Red Green Blue) colour camera, a depth sensor (using an infrared (IR) projector and an IR camera), and a noise-cancelling, multi-array microphone (made of four microphones, which can also contribute to detecting a person's location in 3-D (three-dimensional) space) [5,7]. Kinect also incorporates an accelerometer (probably used for inclination and tilt sensing, and possibly image stabilisation [7]). Running proprietary firmware (internal device software), these components together can provide full-body 3-D motion capture, gesture recognition, facial recognition, and voice recognition capabilities [2,4]. (Functions, accuracy and usability will also greatly depend on the device drivers and associated software running on the host machine, which can be a Windows, Mac or Linux PC, or an Xbox 360 game console, and used to access the Kinect hardware; see the discussion about drivers below.)

* Correspondence: [email protected]
1 Faculty of Health, University of Plymouth, Drake Circus, Plymouth, Devon PL4 8AA, UK. Full list of author information is available at the end of the article.

© 2011 Kamel Boulos et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Figure 1 Anatomy of Microsoft Kinect.

Kinect is capable of simultaneously tracking two active users [2,8]. For full-body, head to feet tracking, the recommended user distance from the sensor is approximately 1.8 m for a single user; when there are two people to track at the same time, they should stand approximately 2.5 m away from the device. Kinect requires a minimum user height of 1 m (standing distance and user height figures are according to information printed on the Microsoft Kinect retail box).

Because Kinect's motorised tilt mechanism requires more power than can be supplied via USB ports, the device makes use of a proprietary connector and ships with a special power supply cable that splits the connection into separate USB and power connections, with power being supplied from the mains by way of an AC/DC adapter (Figure 1). Kinect can be bought new in the UK for less than £100 per unit (http://Amazon.co.uk consumer price in GBP, including VAT, as of July 2011).

Kinect driver support

In December 2010, OpenNI and PrimeSense released their own Kinect open source drivers and motion tracking middleware (called NITE) for PCs running Windows (7, Vista and XP), Ubuntu and MacOSX [9,10]. FAAST (Flexible Action and Articulated Skeleton Toolkit) is a middleware developed at the University of Southern California (USC) Institute for Creative Technologies that aims at facilitating the integration of full-body control with applications and video games when using OpenNI-compliant depth sensors and drivers [11,12].

In June 2011, Microsoft released a non-commercial Kinect Software Development Kit (SDK) for Windows that includes Windows 7-compatible PC drivers for the Kinect device (Microsoft's SDK does not support older Windows versions or other operating systems) [13]. Microsoft's SDK allows developers to build Kinect-enabled applications in Microsoft Visual Studio 2010 using C++, C# or Visual Basic (Figure 2). Microsoft is planning to release a commercial version of the Kinect for Windows SDK with support for more advanced device functionalities [2].

Figure 2 Microsoft Kinect for Windows SDK beta (non-commercial, June 2011 release).

There is also a third set of Kinect drivers for Windows, Mac and Linux PCs by the OpenKinect (libFreeNect) open source project [14]. Code Laboratories' CL NUI Platform offers a signed driver and SDK for multiple Kinect devices on Windows XP, Vista and 7 [15].

The three sets of drivers vary greatly in their affordances. For example, the official Microsoft Kinect for Windows SDK does not need a calibration pose; the 'Shape Game' demo supplied with the SDK works impressively well (for one or two active users) immediately after installing the SDK, without the need to calibrate user pose [8]. This contributes to an excellent Kinect digital Out-Of-Box Experience (OOBE) for PC users that is comparable to the OOBE offered to Kinect Xbox 360 game console users [16]. However, the Microsoft Kinect for Windows SDK beta (June 2011 release) does not offer finger/hand gesture recognition or hands-only tracking and is limited to skeletal tracking. OpenNI drivers are more flexible in this respect, but on the negative side, they require a calibration pose and lack the advanced audio processing (speech recognition) that is provided by Microsoft's official SDK [17-20].

Applications beyond playing games

Many developers and research groups around the world are exploring possible applications of Kinect (and similar devices such as ASUS Xtion [6]) that go beyond the original intended purpose of these sensors as home entertainment and video game controllers [2,21,22]. These novel applications include 3-D and enhanced video teleconferencing (e.g., the work done by Oliver Kreylos at the University of California Davis combining two Kinect devices [23,24], and the 'Kinected Conference' research by Lining Yao, Anthony DeVincenzi and colleagues at MIT Media Lab to augment video imaging with calibrated depth and audio [25,26]); using Kinect to assist clinicians in diagnosing a range of mental disorders in children (research by Nikolaos Papanikolopoulos and colleagues at the University of Minnesota [27]); and a more practical use of the device to control medical imaging displays during surgery without having to physically touch anything, thus reducing the chance of hand contamination in operating theatres (e.g., the work conducted within the Virtopsy Project at the Institute of Forensic Medicine, University of Bern, Switzerland [28], and the system currently in use by Calvin Law and his team at Toronto's Sunnybrook Health Sciences Centre [29], as well as the demonstration of a similar concept by InfoStrat, a US-based IT services company [30]).

PC-based Kinect applications have also been developed for controlling non-gaming 3-D virtual environments (3-D models, virtual worlds and virtual globes), e.g., [31,32]. These environments share many common features with 3-D video games, the original target domain of the Kinect sensor. Evoluce, a German company specialising in natural user interfaces, offers a commercial solution for multi-gesture interaction using Kinect's 3-D depth sensing technology under Windows 7 [33]. Thai Phan at the University of Southern California Institute for Creative Technologies wrote software based on the OpenNI toolkit to control a user's avatar in the 3-D virtual world of Second Life® and transfer the user's social gestures to the avatar in a natural way [34]. FAAST [11] has been used with suitable 'key bindings' (to map the user's movements and gestures to appropriate keyboard and mouse actions) to navigate Google Earth and Street View (Figure 3) [35,36]. InfoStrat recently demonstrated the use of their Motion Framework [37] to control Bing Maps with a Kinect sensor [38]. They also used custom Bing Maps-powered GIS (Geographic Information System) and data visualisation applications to showcase their Motion Framework's ability to work with multi-modal input from motion sensing (such as Microsoft Kinect), multi-touch, speech recognition, stylus, and mouse devices [39,40]. Building on InfoStrat's work (which brought multi-touch gestures such as pinch and zoom to Kinect), Response Ltd, a small Hungarian company, developed and demonstrated an alternative solution to navigate Bing Maps using the Kinect sensor, which, they claim, offers a more 'Kinect-like experience' by allowing the user to use his/her whole body to control the map [41,42].

Kinoogle: a Kinect interface for natural interaction with Google Earth

In this paper, we introduce Kinoogle, a natural interaction interface for Google Earth using Microsoft Kinect. Kinoogle allows the user to control Google Earth through a series of hand and full-body gestures [43,44]. We begin by describing the software design and modules underlying Kinoogle, then offer detailed user instructions for readers interested in trying it out.

Hardware and third-party software

The main hardware used in Kinoogle is Microsoft Kinect. Kinect generates a depth map in real time, where each pixel corresponds to an estimate of the distance between the Kinect sensor and the closest object in the scene at that pixel's location. Based on this map, the Kinect system software allows applications such as Kinoogle to accurately track different parts of a human body in three dimensions. Kinoogle has been designed around a specific setup for operation that includes the user standing about one metre in front of Kinect, with shoulders parallel to the device, and the sensor placed at the height of the user's elbows.

Kinoogle makes extensive use of third party software, which includes OpenNI drivers and NITE Middleware [9,10], OpenGL [45], and Google Earth [46]. OpenNI and NITE Middleware are both used for receiving and manipulating data obtained from the Kinect sensor, whereas OpenGL and Google Earth provide low-level graphics functionality and the front-end mapping application, respectively. OpenNI is a framework for utilising Natural User Interface (NUI) devices, and has abstractions which allow for the use of middleware to process images and depth information. This allows for user tracking, skeleton tracking, and hand tracking in Kinoogle. OpenNI is designed such that applications can be used independent of the specific middleware, and it therefore allows much of Kinoogle's code to interface directly with OpenNI while using the functionality from NITE Middleware.
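To make this division of labour concrete, the following minimal sketch shows how an application of this kind can initialise OpenNI, read the Kinect depth map, and start a NITE hand tracking session. It assumes the OpenNI 1.x C++ wrapper and NITE middleware described above; the loop structure and the centre-pixel probe are our own illustration, not Kinoogle's actual source code.

    // Minimal OpenNI/NITE bring-up sketch (OpenNI 1.x C++ API).
    #include <XnCppWrapper.h>      // OpenNI C++ wrapper: xn::Context, xn::DepthGenerator
    #include <XnVSessionManager.h> // NITE session manager for hand tracking

    int main()
    {
        xn::Context context;
        context.Init();                 // error handling omitted for brevity

        xn::DepthGenerator depth;       // produces the per-pixel distance map
        depth.Create(context);
        context.StartGeneratingAll();

        // A NITE session gains focus on a "Wave" gesture and quickly
        // refocuses on "RaiseHand"; it then tracks hand points per frame.
        XnVSessionManager session;
        session.Initialize(&context, "Wave", "RaiseHand");

        for (;;) {
            context.WaitAndUpdateAll(); // block until Kinect delivers a new frame
            session.Update(&context);   // NITE extracts and tracks hand points

            // Each depth pixel estimates the sensor-to-object distance (mm)
            // at that pixel; 640x480 is Kinect's default depth output mode.
            const XnDepthPixel* map = depth.GetDepthMap();
            XnDepthPixel centreMm = map[240 * 640 + 320];
            (void)centreMm;             // a real app would feed this to its tracker
        }
    }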

Figure 3 Using FAAST with suitable 'key bindings' to navigate Google Earth and Street View.

The main purpose of NITE Middleware is image processing, which allows for both hand-point tracking and skeleton tracking. NITE is responsible for sending the location (i.e., 3-D coordinates) of the hand points in every frame. Kinoogle also utilises OpenGL to create a graphical user interface (GUI) in the form of a menu bar at the top of the screen to provide visual feedback to the user (Figure 4). As a front-end, Kinoogle uses Google Earth, which provides visual imagery of Earth locations on a 3-D globe that the user is able to manipulate by panning, zooming, rotating, or tilting with an ordinary mouse, 3-D mouse (such as SpaceNavigator [47]) or keyboard.

Kinoogle system software

At the top level, the system consists of four major objects, which serve as the connection to each of the third party software components; these four objects are: KinectControl, Kinoogle, EarthControl, and Kinoogle GUI (Figure 5). The system is based on hardware input, so it is designed to be event-driven (see the 'code and documentation' folder in Additional files 1, 2).

KinectControl

The first object, KinectControl, interfaces with NITE and OpenNI to collect relevant data from Kinect. It acts as a "plumber", placing certain objects into place to collect and forward point or skeleton data to Kinoogle. It also controls data flow from the Kinect hardware, such as halting incoming point/skeleton data.

The main loop in Kinoogle continuously calls the KinectControl.kinectRun() function, which in turn makes three essential calls:

• context.WaitAndUpdateAll(), which forces OpenNI to get a new frame from Kinect and send this frame to NITE for processing;
• skeletonUpdater.run(), which looks at all the skeleton data contained in the UserGenerator node. If a skeleton is active, it sends all the joint position data to Kinoogle; and
• sessionManager.Update(), which forces sessionManager to process the hand point data contained in the current context and update all the point-based controls with callbacks.

The additional modules GestureListener and HandsUpdater register callbacks with PointControls and SessionManager. Thus, when Update() is called, the callbacks are activated and the thread travels through these two modules on to Kinoogle and eventually to EarthControl.

Included in KinectControl is also one of its most important components, the FistDetector, which looks at the shape of the hand around the hand points and determines whether the hand is closed or open.
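As a rough illustration of the three-call sequence just described, the per-frame loop of a KinectControl-like object might look as follows. Only WaitAndUpdateAll() and Update() are actual OpenNI/NITE API calls; the class layout and the SkeletonUpdater stand-in are our own sketch, not Kinoogle's published code.

    #include <XnCppWrapper.h>
    #include <XnVSessionManager.h>

    // Stand-in for Kinoogle's skeleton module: it would read joint positions
    // from the OpenNI UserGenerator and forward them to Kinoogle.
    struct SkeletonUpdater {
        void run() { /* forward joint data of any active skeleton */ }
    };

    class KinectControl {
    public:
        // Called continuously by the application's main loop.
        void kinectRun() {
            // 1. Force OpenNI to fetch a new frame from Kinect and pass it to NITE.
            context.WaitAndUpdateAll();

            // 2. Inspect the skeleton data held by the UserGenerator node; if a
            //    skeleton is active, its joint positions go on to Kinoogle.
            skeletonUpdater.run();

            // 3. Let NITE process the hand points in the current context; this
            //    fires the callbacks registered by GestureListener and
            //    HandsUpdater, reaching Kinoogle and eventually EarthControl.
            sessionManager.Update(&context);
        }

    private:
        xn::Context       context;
        XnVSessionManager sessionManager;
        SkeletonUpdater   skeletonUpdater;
    };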

Figure 4 Screenshot of Google Earth with Kinoogle's menu bar at the top of the screen.

Figure 5 Block diagram demonstrating the flow of data between the objects and third party software.

This information is then used by Kinoogle to engage and interact with Google Earth. FistDetector is based on calculating the area of the convex hull of the hand, and comparing it to the area of the hand. (The convex hull of a geometric shape is the smallest convex shape that contains all of the points of that shape.) In this case, FistDetector allows us to quantify the area between open fingers, as if the hand were webbed. If the ratio of the hand to the hull is near one-to-one, the hand is most likely to be closed; if it is not one-to-one, the hand is most likely open.

Kinoogle

Kinoogle's primary function is to interpret point data from KinectControl in order to determine whether the user has produced specific gestures and poses for interacting with EarthControl and Kinoogle GUI. Kinoogle is responsible for the communication among the other three objects that control the third party software; this is achieved through using KinoogleMessage, which is contained in the Message object.

Kinoogle also handles the detection of stationary poses, with which the user switches between various map modes (e.g., panning, zooming). Whenever the user activates his/her hands, an average location of the hands is calculated over the last 100 frames. An imaginary box (300 × 200 pixels) is then set around this average location. There are then three quadrants where the user can place his/her hands for pose detection: above the box, to the left, or to the right of the box. The various poses are activated when both hands are placed in corresponding quadrants around the average location. For example, placing one hand above the average location and the other hand to the right of the average location will be detected as the pose for tilt. Four map modes have been implemented (panning, zooming, rotation, and tilt), as follows:

• Panning is based on detecting the (x, y) position of either single hand when engaged (Figure 6(a)). A velocity variable is used to eliminate any slight movements that would result in flickering.
• Zooming requires a two-hand gesture (Figure 6(b)), and is performed by moving the hands either closer together or farther apart. Detection is based on calculating the distance between the hands to determine if they are moving together or apart.
• Rotation (in plane) also requires a two-hand gesture (Figure 6(c)), and detection is based on determining whether the hands move in opposite directions along the 'y' axis (i.e., vertical axis).
• Tilt (out-of-plane rotation) is also a two-hand gesture (Figure 6(d)) and is based on detecting whether the hands move in opposite directions along the 'z' axis (i.e., depth axis).

Finally, Kinoogle is also responsible for interpreting skeleton data, which allows users to interact with Google Street View by means of intuitive "walking" and "turning" gestures. For walking detection, we focus on the speed with which the user swings his/her arms (Figure 7(a)); namely, we compute the average speed of the right and left arms based on the position of the elbows and compare it to a fixed threshold. Likewise, to control the camera angle in Street View, we detect whether the user is twisting his/her shoulders (Figure 7(b)); specifically, we determine which shoulder is closer to the Kinect sensor and then turn the camera view based on the difference between the two shoulders.
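To make the hand-gesture rules above concrete, the sketch below restates the fist test and the three two-hand detectors in code. It is our own illustrative reconstruction: the HandPoint type, function names, thresholds and dead zones are assumptions, not Kinoogle's actual source. In Kinoogle each rule would run only inside its own sub-mode, so no combined classifier is needed.

    #include <cmath>

    struct HandPoint { double x, y, z; };   // a NITE hand point (sensor coordinates)

    // Fist test used to 'engage' a hand: compare the hand's contour area with
    // the area of its convex hull. A closed fist leaves almost no webbing area
    // between fingers, so the hand-to-hull ratio approaches one-to-one.
    bool isFist(double handArea, double hullArea, double closedRatio = 0.9) {
        return handArea >= closedRatio * hullArea;   // threshold is an assumption
    }

    // Pan/Zoom sub-mode: signed change of the inter-hand distance between two
    // consecutive frames; hands moving apart zoom in, moving together zoom out.
    double zoomAmount(const HandPoint& l0, const HandPoint& r0,
                      const HandPoint& l1, const HandPoint& r1) {
        double d0 = std::hypot(r0.x - l0.x, r0.y - l0.y);
        double d1 = std::hypot(r1.x - l1.x, r1.y - l1.y);
        return d1 - d0;                              // >0: apart, <0: together
    }

    // Rotate sub-mode: hands moving in opposite directions along the y
    // (vertical) axis, like turning a steering wheel.
    bool isRotating(const HandPoint& l0, const HandPoint& r0,
                    const HandPoint& l1, const HandPoint& r1, double eps = 5.0) {
        double dyL = l1.y - l0.y, dyR = r1.y - r0.y;
        return dyL * dyR < 0 && std::fabs(dyL - dyR) > eps;  // eps: dead zone
    }

    // Tilt sub-mode: the same opposite-motion test along the z (depth) axis,
    // like rotating a crankset.
    bool isTilting(const HandPoint& l0, const HandPoint& r0,
                   const HandPoint& l1, const HandPoint& r1, double eps = 5.0) {
        double dzL = l1.z - l0.z, dzR = r1.z - r0.z;
        return dzL * dzR < 0 && std::fabs(dzL - dzR) > eps;
    }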
EarthControl

EarthControl uses simulated mouse and keyboard actions to control Google Earth. The object receives EarthMessages from Kinoogle, e.g., earth.interpretMessage(&EarthMessage(Pan, panX, panY)), which it then interprets to determine the correct sequence of mouse or keyboard actions. By moving the pointer in certain axes and holding down certain mouse buttons, EarthControl effects the appropriate changes onto the map view. Other functions are executed by simulating the pressing of various keyboard buttons. The actual mouse and keyboard functions are performed through a series of functions that are grouped together in MouseLib. Specifically, MouseLib uses the Windows API (Application Programming Interface) call SendInput.

Figure 6 Gesture movement used for (a) pan, (b) zoom, (c) rotate, and (d) tilt.

Figure 7 Demonstration of (a) arms swinging and (b) shoulder twisting for Street View.
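For readers who want the skeleton-based rules of Figure 7 in code form, here is an illustrative sketch. It is our own simplification: the joint values would come from OpenNI's skeleton tracking, and the Joint type, function names and threshold values are assumptions rather than Kinoogle's actual code.

    #include <cmath>

    struct Joint { double x, y, z; };   // skeleton joint position, metres

    // Walking: the mean elbow speed between two frames exceeds a fixed threshold.
    bool isWalking(const Joint& lePrev, const Joint& leCur,
                   const Joint& rePrev, const Joint& reCur,
                   double dt, double speedThreshold = 0.6 /* m/s, hand-tuned */)
    {
        auto speed = [dt](const Joint& a, const Joint& b) {
            return std::sqrt((b.x - a.x) * (b.x - a.x) +
                             (b.y - a.y) * (b.y - a.y) +
                             (b.z - a.z) * (b.z - a.z)) / dt;
        };
        return 0.5 * (speed(lePrev, leCur) + speed(rePrev, reCur)) > speedThreshold;
    }

    // Turning: whichever shoulder is closer to the sensor decides the direction,
    // and the depth (z) difference between the shoulders scales the camera turn.
    double turnRate(const Joint& leftShoulder, const Joint& rightShoulder,
                    double gain = 1.0, double deadZone = 0.05 /* m */)
    {
        double dz = rightShoulder.z - leftShoulder.z; // >0: right shoulder further
        return (std::fabs(dz) < deadZone) ? 0.0 : gain * dz;
    }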

SendInput takes INPUT structures and inserts them into the operating system's (OS) input stream. These structures indicate either a mouse movement or a key press. MouseLib is separate from EarthControl in order to ensure future OS portability of the Kinoogle system. As an example, the entire code that is specific to the Windows API is kept in MouseLib, so that an X-Window or MacOSX version could be easily substituted in.
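A minimal MouseLib-style wrapper over SendInput might look as follows. This is an illustrative sketch of the mechanism, not Kinoogle's published code; keyboard keys are injected the same way using INPUT_KEYBOARD and a KEYBDINPUT structure.

    #include <windows.h>

    // Relative mouse move: injects a MOUSEINPUT event into the OS input stream.
    void moveMouse(LONG dx, LONG dy) {
        INPUT in = {};
        in.type = INPUT_MOUSE;
        in.mi.dx = dx;
        in.mi.dy = dy;
        in.mi.dwFlags = MOUSEEVENTF_MOVE;   // relative motion
        SendInput(1, &in, sizeof(INPUT));
    }

    // Press or release the left button. A Google Earth pan is a drag:
    // button down, a series of moveMouse() calls, then button up once the
    // hand disengages (or kept down while the "brake" described below holds).
    void leftButton(bool down) {
        INPUT in = {};
        in.type = INPUT_MOUSE;
        in.mi.dwFlags = down ? MOUSEEVENTF_LEFTDOWN : MOUSEEVENTF_LEFTUP;
        SendInput(1, &in, sizeof(INPUT));
    }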
EarthControl also has two other noteworthy characteristics, which were created in response to specific issues with Google Earth: a mouse brake system, and a method for preventing a quick succession of single clicks. The mouse brake system prevents unwanted drift of the map view while panning. Initially, the left click button was released immediately when the hand became disengaged, causing the map to continue to scroll, which made it almost impossible to leave the map in a stationary position. The solution to this problem was to measure the speed of the mouse pointer when the hand is disengaged: if the mouse pointer is moving under a certain speed, the "brake" engages, leaving the mouse button clicked down and the pointer stationary for a certain amount of time. The second method prevents accidental double-clicks of the left mouse button by starting a timer after the first left mouse click and not allowing another click until the timer has completed.

Kinoogle GUI

Kinoogle GUI provides the user interface to the system, and is implemented as a menu bar on top of the application (Figure 4). Kinoogle GUI runs as a separate thread from the main Kinoogle thread, from which it receives commands through GUIMessages, which is part of the Message object. The menu interface (window and buttons) is implemented with an OpenGL render object.

There are three sections of information drawn on the menu bar for the user (Figure 8(a)). The first section shows the different features that are available in the current mode, as well as arrows corresponding to the pose to activate a given feature. The second section is a single button used to show the user the current mode. The third section contains two diamonds representing the user's hands. Additional details about the menu interface are provided below under 'User manual: Operating instructions'.

User manual

Hardware and software installation

Kinoogle requires a Microsoft Kinect, which is currently available worldwide for purchase from multiple retailers. The Kinect package contains the Kinect sensor, in addition to a power adapter with an electrical plug and a USB connector (Figure 1). Kinect requires more power than a USB port can provide, so the electrical plug must be inserted into an electrical outlet before using the device. The USB connector must be inserted into the computer where Kinoogle will run (Kinect should be plugged in to the PC's USB port only after OpenNI/NITE driver installation is complete; see below). Kinoogle also requires third party drivers and software to be installed on the host computer. The download links for the Windows OS versions of these software packages are: Google Earth [48], OpenNI [49], and NITE [50] (the x86 (32-bit) versions of OpenNI and NITE work well under both 32-bit and 64-bit (x64) versions of Microsoft Windows). Additional files 1, 2 and 3 are a compressed archive containing Kinoogle's installation package for Windows. It has been tested on several machines to make sure it works robustly.

Operating instructions

The user must ensure that Google Earth is running before starting Kinoogle. Once Kinoogle is started, the status indicator shown in Figure 8(a) will become visible at the top of the screen. Two coloured diamonds on the right-hand side indicate the status of the program's hand tracking: (i) red indicates that the corresponding hand has not been detected, (ii) yellow indicates that the hand has been detected but is not engaged, and (iii) green indicates that the hand is detected and also engaged; see Figure 8(b). Hands are engaged when clenched into a fist.

To initialise hand tracking, the user waves one hand until the status indicator for HAND1 becomes yellow, followed by waving the other hand near the location of the first hand until the indicator for HAND2 also becomes yellow. The user may use either hand to initialise tracking. Once hand tracking has been initialised, the program will automatically enter Map Mode, and the menu bar will change to reflect this; see Figure 8(c).

Map Mode has four sub-modes, each one representing a different form of map manipulation:

1. Pan/Zoom: This is the default sub-mode when first entering Map Mode. This sub-mode allows the user to scroll the map in any direction, as well as zoom in and out. To scroll the map, the user engages one hand, and moves that hand as if he/she were dragging the map on a surface; see Figure 9(a). To zoom in, the user un-engages both hands and brings them together, then engages both hands and pulls them apart; see Figure 9(b, c). Zooming out is performed by un-engaging both hands and moving them apart, then engaging both hands and bringing them together; see Figure 9(d, e).

2. Rotate: This sub-mode allows the user to adjust the rotation angle of the map on the x, y plane. To rotate the map, the user engages both hands, and moves them in opposite vertical directions, i.e., using a motion similar to turning a steering wheel; see Figure 9(f).

Figure 8 Kinoogle GUI bar. (a) The Kinoogle status window, as seen immediately at program start, before hand tracking is initialised. (b) The colour changes between red, yellow, and green based on hand detection. (c) The Kinoogle status window, as seen in Map Mode without any sub-mode selected. (d) Progress bar charging while a pose is detected. (e) The Kinoogle status window, as it appears while in Street View.

3. Tilt: This sub-mode allows the user to rotate the map out-of-plane. To tilt the map, the user engages both hands and moves them in opposite z directions, i.e., using a motion similar to rotating a crankset (chainset); see Figure 9(g, h).

4. Reset Map/Historical Maps: The map can be reset by un-engaging both hands and bringing them up to the same level as the user's head; see Figure 10(d). This will also set up the Historical Maps sub-mode by moving the cursor to the time period selector. To change the time period, the user engages one hand and moves the slider to the desired time period. Note that the available time periods will change depending on the area being viewed. When entering the Historical Maps sub-mode, the map's tilt and rotation angles will be reset.

To switch between sub-modes, the user must make and hold a specific pose for a short time (Figure 10). The arrows to the right of each box in Figure 8(c) indicate how the user must hold his/her hands in order to enter that mode. A progress bar for the target mode will fill as the user holds the pose; see Figure 8(d).

Kinoogle also allows the user to interact with Google Earth in the Street View mode. This mode is entered by using the Pan/Zoom mode to zoom in as much as possible into a street that is enabled for Street View. Google Earth will automatically zoom down to the street. Once the map is zoomed onto the street, the user should make a touchdown pose to calibrate Kinoogle for Street View; see Figure 10(e). The status indicator labels will also change, as illustrated in Figure 8(e). While in Street View, all the controls for Map Mode are disabled. Street View mode has three main sub-modes that allow the user to interact with the map:

• Walking: To move forward, the user swings his/her arms while standing in place. The user does not have to move his/her legs, although he/she may walk in place if desired; see Figure 11(a, b).
• Turning: The user is able to change camera views by twisting his/her shoulders towards or away from the camera. The user must ensure that he/she is only twisting his/her shoulders, and not turning his/her entire body; see Figure 11(c-e).
• Exit: To exit the Street View mode, the user extends both arms straight out horizontally; see Figure 11(f). After exiting Street View, the map will zoom out. The user must then reinitialise the hand detection in order to re-enter Map Mode and continue manipulating the map.

Discussion and conclusions

Nowadays, online 3-D virtual globes are commonly used to visualise and explore spatial epidemiology and public health data [51-55]. Users normally employ one or more conventional input devices (mouse, 3-D mouse and/or keyboard) to navigate online virtual globes. Hands-free gesture and speech recognition, e.g., as offered by the Kinect sensor, are expected to change our human-computer interaction with online and desktop interfaces over the coming years [56,57].

Figure 9 Procedures for panning, zooming in and out, rotating and tilting the map. (a) Procedure for panning. The map can be panned in both directions, vertically and horizontally. (b, c) Procedure for zooming in: (b) the user moves hands together and engages them, then (c) moves them apart. (d, e) Procedure for zooming out: (d) the user moves hands apart and engages them, then (e) brings them together. (f) Procedure for rotating the map. (g, h) Procedure for tilting the map: (g) the user engages the hands and then (h) moves one forward, and one to the back. See Figure 6(d) for an alternative view.

Figure 10 Poses required to switch among sub-modes. (a) To enter Rotate Mode, the user holds his/her right hand up and moves his/her left hand to his/her left. (b) To enter Tilt Mode, the user holds his/her left hand up and moves his/her right hand to his/her right. (c) To enter Pan/Zoom Mode, the user holds both hands to the side. (d) To reset the map, the user holds both hands up at the same level as his/her head. (e) The touchdown pose is used to calibrate Kinoogle for Street View.

Improvements in the way we interface with virtual globes are always welcome, provided that compelling 'use scenarios' can be conceived to justify recommending and investing in them over or alongside existing PC input methods.

Kinoogle offers a good example of a natural interface with Google Earth and Street View using only body movements and gestures, without having to touch any physical input device (the user's body becomes the controller). Kinoogle could be greatly improved if it could also make use of the advanced speech recognition functionality afforded by the Kinect device as an alternative method for navigating Google Earth using voice commands besides hand and body gestures (e.g., where a person lacks the necessary physical agility and dexterity to perform certain "awkward" gestures, or where finer or quicker control of the interface is required than what can be achieved with body gestures alone).

Figure 11 Poses in Street View. (a, b) Walking forward. (c-e) Turning right, facing straight and turning left, respectively. (f) Exiting Street View.

Other possible Kinoogle improvements include adding a Kinect-controlled virtual on-screen keyboard for text input (e.g., to type in the 'Fly To' box of Google Earth), similar to the one provided by KinClick [58] for the KinEmote package that runs under Windows [59], and perhaps developing special versions of 3-D virtual globes with larger interface elements, big icons, and Kinect-friendly dialog boxes or tabs to make it easier for Kinect users to navigate and select/de-select on-screen items and options (e.g., virtual globe 'Layers' in Google Earth).

The hands-free convenience of gesture and speech recognition can prove extremely useful in a number of practical and exclusive 'use scenarios' where mouse or multi-touch screen inputs are difficult, such as when delivering presentations involving 3-D virtual globes on a large screen on stage to a large audience, e.g., this presentation of Microsoft's WorldWide Telescope using the Kinect sensor: [60]. (Kinect can also support two simultaneous active users/presenters.)

Collaborative GIS in virtual situation rooms involving distributed teams of users [61] is another 'use scenario' that can benefit from Kinect's unique features, such as its headset-free 3-D and enhanced video teleconferencing functionalities [23-26], in addition to its use as a 3-D motion sensing/gesture and speech recognition NUI for controlling a shared 3-D virtual globe during networked spatial data presentations [39,40].

Using the Kinect sensor to create 3-D maps of real world locations and objects [62,63] is still not very refined for serious use, but might soon get added to the GIS professional's toolkit as the technology evolves and matures.

Kinect can be an excellent, very usable and entertaining way for navigating Google Street View and exploring new places and cities (virtual tourism). The visual feedback in the form of the changing street scenery as the user walks (virtually, as on a treadmill [64]) in Street View can be used to encourage people to do physical exercise and burn calories with the Kinect device for prolonged periods of time, without getting quickly bored (e.g., as part of obesity management and prevention programmes). This use of Kinect for physical fitness purposes is already implemented as a game for the Xbox 360 game console [65].

Hands-free gesture and speech recognition NUIs are not a full replacement for more conventional interaction methods; for example, not all users have the necessary muscular agility and dexterity, physical room space, or large enough screen size (for comfortable viewing from the user's minimum distance away from the sensor) to use a Kinect sensor, and more conventional navigation of virtual globes with a 3-D mouse [47] or a multi-touch screen can sometimes offer more precise and smooth control of the 3-D map. However, depth sensors such as Microsoft Kinect, ASUS Xtion PRO/PRO Live [66], and Lenovo iSec [67] remain an interesting alternative or complementary method of interaction with 3-D virtual globes, with some exclusive applications where these devices cannot be easily matched (e.g., hands-free virtual globe presentations to large audiences) and other applications where the NUI affordances of these sensors can greatly improve a user's experience (e.g., when interacting with large and stereoscopic screens at a distance [36,68]).

Additional material

Additional file 1: Installation package for Kinoogle (part 1 of 3). Compressed (zipped) archive containing Kinoogle's installation package for Microsoft Windows operating systems. Download and unzip the contents of Additional file 1, Additional file 2, and Additional file 3 to the same hard drive location, then run 'Additional_file.part1.exe' from that location.

Additional file 2: Installation package for Kinoogle (part 2 of 3). Compressed (zipped) archive containing Kinoogle's installation package for Microsoft Windows operating systems. Download and unzip the contents of Additional file 1, Additional file 2, and Additional file 3 to the same hard drive location, then run 'Additional_file.part1.exe' from that location.

Additional file 3: Installation package for Kinoogle (part 3 of 3). Compressed (zipped) archive containing Kinoogle's installation package for Microsoft Windows operating systems. Download and unzip the contents of Additional file 1, Additional file 2, and Additional file 3 to the same hard drive location, then run 'Additional_file.part1.exe' from that location.

Author details
1 Faculty of Health, University of Plymouth, Drake Circus, Plymouth, Devon PL4 8AA, UK. 2 Department of Computer Science and Engineering, Texas A&M University, College Station, TX 77843-3112, USA.

Authors' contributions
MNKB conceived and drafted the manuscript with contributions (description of Kinoogle) from BJB, CW, JM, AT, and RG-O. MNKB also conducted the literature review for this article, proposed the 'use scenarios' for a Kinect-based NUI for 3-D virtual globe navigation, and wrote all of the 'Background' and 'Discussion and conclusions' sections. BJB, CW, and JM conceived and coded Kinoogle as their CSCE 483 course project at Texas A&M University, with input from AT (CSCE 483 course Teaching Assistant) and under the supervision of RG-O (CSCE 483 course Instructor). All authors read and approved the final manuscript. Commercial products and company/brand names mentioned in this paper are trademarks and/or registered trademarks of their respective owners.

Competing interests
The authors declare that they have no competing interests.

Received: 12 July 2011 Accepted: 26 July 2011 Published: 26 July 2011

References
1. Kinect Official Web Site. [http://www.xbox.com/kinect/].
2. Kinect (Wikipedia, 21 June 2011 22:26). [http://en.wikipedia.org/w/index.php?title=Kinect&oldid=435540952].
3. Kinect Sales Surpass Ten Million (9 March 2011). [http://www.xbox.com/en-US/Press/Archive/2011/0308-Ten-Million-Kinects].
4. Product Technology: The PrimeSensor™ Technology. [http://www.primesense.com/?p=487].
5. PrimeSensor™ Reference Design. [http://www.primesense.com/?p=514].
6. Asus WAVI Xtion. [http://event.asus.com/wavi/Product/WAVI_Pro.aspx].
7. Microsoft Kinect Teardown (iFixit, 4 November 2010). [http://www.ifixit.com/Teardown/Microsoft-Kinect-Teardown/4066/1].
8. Demo of Shape Game Sample Included with Kinect for Windows SDK (video, 15 June 2011). [http://research.microsoft.com/apps/video/default.aspx?id=150286].
9. OpenNI downloads. [http://www.openni.org/downloadfiles].
10. NITE Middleware. [http://www.primesense.com/?p=515].
11. Flexible Action and Articulated Skeleton Toolkit (FAAST). [http://projects.ict.usc.edu/mxr/faast/].
12. Suma EA, Lange B, Rizzo A, Krum DM, Bolas M: FAAST: The Flexible Action and Articulated Skeleton Toolkit. Proceedings of Virtual Reality Conference (VR): 19-23 March 2011; Singapore. IEEE; 2011, 247-248. [http://dx.doi.org/10.1109/VR.2011.5759491].
13. Kinect for Windows SDK from Microsoft Research. [http://research.microsoft.com/en-us/um/redmond/projects/kinectsdk/].
14. OpenKinect (libFreeNect). [http://openkinect.org/].
15. Code Laboratories CL NUI Platform - Kinect Driver/SDK. [http://codelaboratories.com/nui/].
16. Solaro J: The Kinect Digital Out-of-Box Experience. Computer 2011, 44(6):97-99. [http://dx.doi.org/10.1109/MC.2011.190].
17. Driver Quick Facts (Which Driver is Right for Me?) (Kinect@RIT, 10 May 2011). [http://www.rit.edu/innovationcenter/kinectatrit/driver-quick-facts-which-driver-right-me].
18. Microsoft Kinect SDK vs. PrimeSense OpenNI (Brekel.com, 18 June 2011). [http://www.brekel.com/?page_id=671].
19. Hinchman W: Kinect for Windows SDK beta vs. OpenNI (Vectorform Labs, 20 June 2011). [http://labs.vectorform.com/2011/06/windows-kinect-sdk-vs-openni-2/].
20. Tscherrig J: Activity Recognition with Kinect - State of art (30 March 2011). [http://www.tscherrig.com/documents/ActivityRecognition_%20StateOfArt.pdf].

21. Anonymous (Editorial): The Kinect revolution. The New Scientist 2010, 208(2789):5. [http://dx.doi.org/10.1016/S0262-4079(10)62966-1].
22. Giles J: Inside the race to hack the Kinect. The New Scientist 2010, 208(2789):22-23. [http://dx.doi.org/10.1016/S0262-4079(10)62989-2].
23. Kreylos O: 3D Video Capture with Kinect (video, 14 November 2010). [http://www.youtube.com/watch?v=7QrnwoO1-8A].
24. Kreylos O: 2 Kinects 1 Box (video, 28 November 2010). [http://www.youtube.com/watch?v=5-w7UXCAUJE].
25. Kinected Conference - MIT Media Lab. [http://kinectedconference.media.mit.edu/].
26. DeVincenzi A, Yao L, Ishii H, Raskar R: Kinected conference: augmenting video imaging with calibrated depth and audio. CSCW 2011: Proceedings of the ACM 2011 Conference on Computer Supported Cooperative Work: 19-23 March 2011; Hangzhou, China. New York, NY: ACM; 2011, 621-624. [http://dx.doi.org/10.1145/1958824.1958929].
27. Bi F: Minnesota Prof. Uses Xbox Kinect For Research (The Minnesota Daily, 14 March 2011). [http://minnesota.cbslocal.com/2011/03/14/minnesota-prof-uses-xbox-kinect-for-research/].
28. Virtopsy - Potential use of gesture control in medicine using the Microsoft Kinect camera (video, 19 December 2010). [http://www.youtube.com/watch?&v=b6CT-YDChmE].
29. Moretti S: Doctors use Xbox Kinect in cancer surgery (22 March 2011). [http://tinyurl.com/4ym6866].
30. Kinect and Medical Imaging (video, 4 May 2011). [http://www.youtube.com/watch?v=U67ESHV8f_4].
31. Kinect Google Earth Interface by Ryerson University's Interactive Computer Applications and Design (ICAD) students (video, 24 May 2011). [http://www.youtube.com/watch?v=2IdqY3HtJJI].
32. Multi-Gesture Software for Kinect by Evoluce (video, 1 June 2011). [http://www.youtube.com/watch?v=GtOaxykFUb8].
33. Evoluce AG. [http://www.evoluce.com/en/].
34. Phan T: Using Kinect and OpenNI to Embody an Avatar in Second Life: Gesture & Emotion Transference. [http://ict.usc.edu/projects/gesture_emotion_transference_using_microsoft_kinect_and_second_life_avatars/].
35. Kissko J: Teacher's Guide to Kinect and Gesture-Based Learning (kinectEDucation.com, 19 May 2011). [http://www.kinecteducation.com/blog/2011/05/19/teachers-guide-to-innovating-schools-with-gesture-based-learning/].
36. Kinect + FAAST 0.06 + Google Earth 3D (TriDef) + 3D TV LG Circularly Polarised (video, 25 February 2011). [http://www.youtube.com/watch?v=aQgASc5i_3U].
37. InfoStrat Motion Framework. [http://motionfx.codeplex.com/].
38. Bing Maps controlled with Kinect 3D-sensing technology (InfoStrat.com, 12 January 2011). [http://www.infostrat.com/home/company_information/news_room/Bingkinect.htm].
39. Kinect and GIS (InfoStrat.com video, 4 May 2011). [http://www.youtube.com/watch?v=0pceMwJMumo].
40. Kinect and Data Visualization (InfoStrat.com video, 4 May 2011). [http://www.youtube.com/watch?v=COKnegfZWho].
41. Introducing Response's Kinect + Bing Maps application (VBandi's blog, 9 May 2011). [http://dotneteers.net/blogs/vbandi/archive/2011/05/09/introducing-kinect-bing-maps.aspx].
42. Kinect + Bing Maps - the gestures (VBandi's blog, 10 May 2011). [http://dotneteers.net/blogs/vbandi/archive/2011/05/10/kinect-bing-maps-the-gestures.aspx].
43. Kinoogle project Web site. [https://sites.google.com/site/kinoogle483/].
44. Kinoogle Demo Finale (video, 3 May 2011). [http://www.youtube.com/watch?v=kk5jH-7V3hc].
45. OpenGL. [http://www.opengl.org/].
46. Google Earth. [http://www.google.com/earth/index.html].
47. 3Dconnexion SpaceNavigator. [http://www.3dconnexion.com/products/spacenavigator.html].
48. Google Earth Download. [http://www.google.com/earth/download/ge/].
49. OpenNI Modules - Stable. [http://openni.org/downloadfiles/opennimodules/openni-binaries/21-stable].
50. NITE - OpenNI Compliant Middleware Binaries (note NITE installation key on download page). [http://openni.org/downloadfiles/opennimodules/12-openni-compliant-middleware-binaries].
51. Kamel Boulos MN, Scotch M, Cheung KH, Burden D: Web GIS in practice VI: a demo playlist of geo-mashups for public health neogeographers. Int J Health Geogr 2008, 7:38.
52. Kamadjeu R: Tracking the polio virus down the Congo River: a case study on the use of Google Earth in public health planning and mapping. Int J Health Geogr 2009, 8:4.
53. Chang AY, Parrales ME, Jimenez J, Sobieszczyk ME, Hammer SM, Copenhaver DJ, Kulkarni RP: Combining Google Earth and GIS mapping technologies in a dengue surveillance system for developing countries. Int J Health Geogr 2009, 8:49.
54. Kamel Boulos MN, Robinson LR: Web GIS in practice VII: stereoscopic 3-D solutions for online maps and virtual globes. Int J Health Geogr 2009, 8:59.
55. Cinnamon J, Schuurman N: Injury surveillance in low-resource settings using Geospatial and Social Web technologies. Int J Health Geogr 2010, 9:25.
56. Clarke P: 10 technologies to watch in 2011: Gesture recognition for hands-free convenience. EE Times 2010, 22. [http://www.eetimes.com/electronics-news/4211508/10-technologies-to-watch-in-2011?pageNumber=1].
57. Ackerman E, Guizzo E: 5 technologies that will shape the Web: 5. Voice and Gestures Will Change Human-Computer Interaction. IEEE Spectrum 2011, 48(6):45. [http://dx.doi.org/10.1109/MSPEC.2011.5779788].
58. KinClick - A Virtual Keyboard Interface for KinEmote (31 January 2011). [http://kinect.dashhacks.com/kinect-news/2011/01/31/kinclick-virtual-keyboard-interface-kinemote].
59. KinEmote - Free XBox Kinect Software for Windows. [http://www.kinemote.net/].
60. Hollister S: Microsoft's Kinect navigates the universe thanks to Windows SDK (Engadget video, 13 April 2011). [http://www.engadget.com/2011/04/13/microsofts-kinect-navigates-the-universe-thanks-to-windows-sdk/].
61. Kamel Boulos MN: Novel Emergency/Public Health Situation Rooms and More Using 4-D GIS. Proceedings of ISPRS WG IV/4 International Workshop on Virtual Changing Globe for Visualisation & Analysis (VCGVA2009): 27-28 October 2009; Wuhan University, Wuhan, Hubei, China (ISPRS Archives, Volume XXXVIII, Part 4/W10, ISSN No: 1682-1777). [http://www.isprs.org/proceedings/XXXVIII/4-W10/papers/VCGVA2009_03608_Boulos.pdf].
62. Szarski M: Real World Mapping with the Kinect (23 January 2011). [https://github.com/mszarski/KinectMapper].
63. Brown M: Kinect hack builds 3D maps of the real world (Wired.co.uk, 24 January 2011). [http://www.wired.co.uk/news/archive/2011-01/24/3d-kinect-map].
64. Treadmill jogging in Second Life® (video, 27 June 2007). [http://www.youtube.com/watch?v=c1wtAlAYiUE].
65. Your Shape: Fitness Evolved - Game for Kinect. [http://www.xbox.com/YourShapeFE].
66. Sakr S: Asus updates Xtion Pro motion sensor, makes it even more like Kinect (Engadget, 18 July 2011). [http://www.engadget.com/2011/07/18/asus-updates-xtion-pro-motion-sensor-makes-it-even-more-like-ki/].
67. Schiesser T: Lenovo's Kinect-like "iSec" console launching in China this year (Neowin.net, 7 May 2011). [http://www.neowin.net/news/lenovos-kinect-like-isec-console-launching-in-china-this-year].
68. Kinect controls Windows 7 - WIN&I (video, 31 March 2011). [http://www.youtube.com/watch?v=o4U1pzVf9hY].

doi:10.1186/1476-072X-10-45
Cite this article as: Kamel Boulos et al.: Web GIS in practice X: a Microsoft Kinect natural user interface for Google Earth navigation. International Journal of Health Geographics 2011, 10:45.