
RoGuE: Robot Gesture Engine

Rachel M. Holladay and Siddhartha S. Srinivasa
Robotics Institute, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213

Abstract

We present the Robot Gesture Engine (RoGuE), a motion-planning approach to generating gestures. Gestures improve robots' communication skills, strengthening them as partners in a collaborative setting. Previous work maps from environment scenario to gesture selection; this work maps from gesture selection to gesture execution. We create a flexible and common language by parameterizing gestures as task-space constraints on robot trajectories and goals. This allows us to leverage powerful motion planners and to generalize across environments and robot morphologies. We demonstrate RoGuE on four robots: HERB, ADA, CURI and the PR2.

Figure 1: Robots HERB, ADA, CURI and PR2 using RoGuE (Robot Gesture Engine) to point at the fuze bottle.

1 Introduction

To create robots that seamlessly collaborate with people, we propose a motion-planning based method for generating gestures. Gestures augment robots' communication skills, thus improving the robot's ability to work jointly with humans in their environment (Breazeal et al. 2005; McNeill 1992). This work maps existing gesture terminology, which is generally qualitative, to precise mathematical definitions as task-space constraints.

By augmenting our robots with nonverbal communication methods, specifically gestures, we can improve their communication skills. In human-human collaborations, gestures are frequently used for explanation, teaching and problem solving (Lozano and Tversky 2006; Tang 1991; Reynolds and Reeve 2001; Garber and Goldin-Meadow 2002). Robots have used gestures to improve their persuasiveness and understandability while also improving the efficiency and perceived workload of their human collaborators (Chidambaram, Chiang, and Mutlu 2012; Lohse et al. 2014).

While gestures improve a robot's skills as a partner, their use also positively impacts people's perception of the robot. Robots that use gestures are viewed as more active, likable and competent (Salem et al. 2013; 2011). Gesturing robots are more engaging in game play, storytelling and over long-term interactions (Carter et al. 2014; Huang and Mutlu 2014; Kim et al. 2013).

As detailed in Sec. 2.1, there is no standardized classification for gestures. Each taxonomy describes gestures in a qualitative manner, as a method for expressing intent and emotion. Previous robot gesture systems, further detailed in Sec. 2.2, often concentrate on transforming ideas, such as inputted text, into speech and gestures.

These gesture systems, some of which use canned gestures, are difficult to use in cluttered environments with object-centric gestures. It is critical to consider the environment when executing a gesture: even given a gesture-object pair, the gesture execution might vary due to occlusions and obstructions in the environment. Additionally, many previous gesture systems are tuned to a specific platform, limiting their generalization across morphologies.

Our key insight is that many gestures can be translated into task-space constraints on the trajectory and the goal, which serve as a common language for gesture expression. This language is intuitive to express and consumable by powerful motion planning algorithms that plan in clutter. It also enables generalization across robot morphologies, since the robot-specific details are handled by each robot's planner, not the gesture system.

Our key contribution is a planning engine, RoGuE (Robot Gesture Engine), that allows a user to easily specify task-space constraints for gestures. Specifically, we formalize gestures as instances of Task Space Regions (TSRs), a general constraint representation framework (Berenson, Srinivasa, and Kuffner 2011). This engine has already been deployed to several robot morphologies, specifically HERB, ADA, CURI and the PR2, seen in Fig. 1. The source code for RoGuE is publicly available and detailed in Sec. 6. RoGuE is presented as a set of gesture primitives that parametrize the mapping from gestures to motion planning; we do not explicitly assume a higher-level system that autonomously calls these gesture primitives.

We begin by presenting an extensive view of related work, followed by the motivation and implementation of each gesture. We conclude with a description of each robot platform using RoGuE and a brief discussion.
2 Related Work

In creating a taxonomy of collaborative gestures we begin by examining existing classifications and previous gesture systems.

2.1 Gesture Classification

Despite extensive work in gestures, there is no existing common taxonomy or even a standardized set of definitions (Wexelblat 1998). Kendon presents a historical view of the terminology, demonstrating the lack of agreement, in addition to adding his own classification (Kendon 2004). Karam examines forty years of gesture usage in human-computer interaction research and establishes five major categories of gestures (Karam et al. 2005). Other methodological gesture classifications name a similar set of five categories (Nehaniv et al. 2005).

Our primary focus is gestures that assist in collaboration, and there has been prior work with a similar focus. Clark categorizes coordination gestures into two groups: directing-to and placing-for (Clark 2005). Sauppé presents a series of deictic gestures that a robot would use to communicate information to a human partner (Sauppé and Mutlu 2014). Sauppé's taxonomy includes pointing, presenting, touching, exhibiting, sweeping and grouping. We implemented four of these, omitting touching and grouping.

2.2 Gesture Systems

Several gesture systems, unlike our own, integrate body and facial features with a verbal system, generating the voice and gesture automatically based on textual input (Tojo et al. 2000; Kim et al. 2007; Okuno et al. 2009; Salem et al. 2009). BEAT, the Behavior Expression Animation Toolkit, is an early system that allows animators to input text and outputs synchronized nonverbal behaviors and synthesized speech (Cassell, Vilhjálmsson, and Bickmore 2004).

Some approach gesture generation as a learning problem, either learning via direct imitation or through Gaussian Mixture Models (Hattori et al. 2005; Calinon and Billard 2007). These methods focus on being data-driven or knowledge-based as they learn from large quantities of human examples (Kipp et al. 2007; Kopp and Wachsmuth 2000).

Alternatively, gestures are framed as constrained inverse kinematics problems, either concentrating on smooth joint accelerations or on synchronizing head and eye movement (Bremner et al. 2009; Marjanovic, Scassellati, and Williamson 1996). Other gesture systems focus on collaboration and embodiment or on attention direction (Fang, Doering, and Chai 2015; Sugiyama et al. 2005). Regardless of the system, it is clear that gestures are an important part of an engaging robot's skill set (Severinson-Eklundh, Green, and Hüttenrauch 2003; Sidner, Lee, and Lesh 2003).
3 Library of Collaborative Gestures

As mentioned in Sec. 2.1, in creating our list we were inspired by Sauppé's work (Sauppé and Mutlu 2014). Zhang studied a teaching scenario and found that only five gestures made up 80% of the gestures used and that, of these, pointing dominated (Zhang et al. 2010). Motivated by both of these works we focus on four key gestures: pointing, presenting, exhibiting and sweeping. We further define and motivate the choice of each of these gestures below.

3.1 Pointing and Presenting

Across language and culture we use pointing to refer to objects on a daily basis (Kita 2003). Simple deictic gestures ground spatial references more simply than complex referential descriptions (Kirk, Rodden, and Fraser 2007). Robots can, as effectively as human agents, use pointing as a referential cue to direct human attention (Li et al. 2015).

Previous pointing systems focus on understandability, either by simulating cognition regions or by optimizing for legibility (Hato et al. 2010; Holladay, Dragan, and Srinivasa 2014). Pointing in collaborative virtual environments concentrates on improving pointing quality by verifying that the information was understood correctly (Wong and Gutwin 2010). Pointing is also viewed as a social gesture that should balance social appropriateness and understandability (Liu et al. 2013).

The natural human end effector shape when pointing is to close all but the index finger (Gokturk and Sibert 1999; Cipolla and Hollinghurst 1996), which serves as the pointer. Kendon formally describes this position as 'Index Finger Extended Prone'. We adhere to this style as closely as the robot's morphology will allow. Presenting achieves a similar goal, but is done with, as Kendon describes, an 'Open Hand Neutral' or 'Open Hand Supine' hand position.

3.2 Exhibiting

While pointing and presenting can refer to an object or spatial region, exhibiting is a gesture used to show off an object. Exhibiting involves holding an object and bringing emphasis to it by lifting it into view. Specifically, exhibiting deliberately displays the object to an audience (Clark 2005). Exhibiting and pointing are often used in conjunction in teaching scenarios (Lozano and Tversky 2006).

3.3 Sweeping

A sweeping gesture involves making a long, continuous curve over the objects or spatial area being referred to. Sweeping is a common technique used by teachers, who sweep across various areas to communicate abstract ideas or regions (Alibali, Flevares, and Goldin-Meadow 1997).

4 Gesture Formulation as TSRs

Each of the four main gestures is implemented on top of a task space region, described below. The implementation of each, as described, is nearly identical for every robot listed in Sec. 5, with minimal changes made necessary by each robot's morphology.

The gestures pointing, presenting and sweeping can reference either a point in space or an object, while exhibiting can only be applied to an object, since something must be physically grasped. For ease, we define R as the spatial region or object being referred to. Within our RoGuE framework, R can be an object pose, a 4x4 transform or a 3-D coordinate in space.
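As an illustration of this interface, the helper below normalizes all three accepted forms of R into a single 4x4 transform. This is a minimal sketch rather than RoGuE's actual code; the function name and the get_transform() accessor are assumptions.

    import numpy as np

    def target_to_transform(R):
        # Object pose: assume the object exposes its pose via a
        # get_transform() accessor (hypothetical name).
        if hasattr(R, "get_transform"):
            return np.asarray(R.get_transform(), dtype=float)
        R = np.asarray(R, dtype=float)
        # Already a 4x4 homogeneous transform.
        if R.shape == (4, 4):
            return R
        # A bare 3-D coordinate: identity rotation, translation at the point.
        if R.shape == (3,):
            T = np.eye(4)
            T[:3, 3] = R
            return T
        raise ValueError("R must be an object pose, a 4x4 transform or a 3-D point")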
4.1 Task Space Region

A task space region (TSR) is a constraint representation proposed in (Berenson, Srinivasa, and Kuffner 2011); a summary is given here. We consider the pose of a robot's end effector as a point in SE(3), the space of rigid transformations in three dimensions. Therefore we can describe end effector constraint sets as subsets of SE(3).

A TSR is composed of three components: T^0_w, T^w_e and B^w. T^0_w is the reference transform for the TSR, giving the TSR frame w in world coordinates. T^w_e is the offset transform that gives the end effector's pose in the coordinates of the TSR frame. Finally, B^w is a 6x2 matrix that bounds the coordinates of the constrained world: the first three rows declare the allowable translations in terms of x, y, z, and the last three rows give the allowable rotations in roll-pitch-yaw notation. Hence B^w, which describes the end effector's constraints, is of the form

B^w = \begin{bmatrix} x_{min} & y_{min} & z_{min} & \psi_{min} & \theta_{min} & \phi_{min} \\ x_{max} & y_{max} & z_{max} & \psi_{max} & \theta_{max} & \phi_{max} \end{bmatrix}^T    (1)
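Read directly from these definitions, a simplified containment test can be sketched as follows. This is an illustrative reading of the TSR components, not code from (Berenson, Srinivasa, and Kuffner 2011), and it ignores the equivalent roll-pitch-yaw representations that the full formulation considers.

    import numpy as np

    def rotation_to_rpy(R):
        # Roll-pitch-yaw angles for R = Rz(yaw) @ Ry(pitch) @ Rx(roll).
        roll = np.arctan2(R[2, 1], R[2, 2])
        pitch = np.arctan2(-R[2, 0], np.hypot(R[0, 0], R[1, 0]))
        yaw = np.arctan2(R[1, 0], R[0, 0])
        return np.array([roll, pitch, yaw])

    def in_tsr(T0_e, T0_w, Tw_e, Bw):
        # Pull the end-effector pose T0_e back into the TSR frame,
        # express it as (x, y, z, roll, pitch, yaw) and compare it
        # against the 6x2 bound matrix Bw.
        Tw = np.linalg.inv(T0_w) @ T0_e @ np.linalg.inv(Tw_e)
        d = np.concatenate([Tw[:3, 3], rotation_to_rpy(Tw[:3, :3])])
        return bool(np.all(d >= Bw[:, 0]) and np.all(d <= Bw[:, 1]))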

For a more detailed description of TSRs, please refer to (Berenson, Srinivasa, and Kuffner 2011).

Each of the four main gestures is composed of one or more TSRs and some wrapping motions. In describing each gesture below, we first describe the necessary TSR(s), followed by the actions taken.

4.2 Pointing

Our pointing TSR is composed of two TSRs chained together; the details of TSR chains can be found in (Berenson et al. 2009). The first TSR uses a T^w_e that aligns the z-axis of the robot's end effector with R. T^w_e varies from robot to robot.

We next construct B^w. With respect to rotation, R can be pointed at from any angle. This creates a sphere of pointing configurations, where each configuration is a point on the surface of the sphere with the robot's z-axis, the axis coming out of the palm, aligned with R. We constrain the pointing to not come from below the object, thus yielding a hemisphere, centered at the object, of pointing configurations. The corresponding B^w is therefore

B^w = \begin{bmatrix} 0 & 0 & 0 & -\pi & 0 & -\pi \\ 0 & 0 & 0 & \pi & \pi & \pi \end{bmatrix}^T    (2)

While this creates a hemisphere, a pointing configuration is not constrained to be at some minimum or maximum distance from the object. This translates into an allowable range along the z-axis, the axis coming out of the end effector's palm, in the hand frame. We achieve this with a second TSR that is chained to the initial rotation TSR. With this allowable range along the z-axis, our pointing region as defined by the TSR is a hemisphere centered on the object with varying radius. This visualized TSR can be seen in Fig. 2.

Figure 2: Pointing. The left side shows many of the infinite solutions to pointing. The right side shows the execution of one.

In order to execute a point, we plan to a solution of the TSR and change our robot's end effector's preshape to whatever is closest to the Index Finger Point. This also varies from robot to robot, depending on the robot's end effector.
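To make this concrete, the sketch below samples one candidate pointing pose from the rotation bounds of Eq. (2), approximating the chained distance TSR with a sampled standoff along the palm axis. It is a minimal illustration, not the RoGuE implementation; the roll-pitch-yaw convention, the helper names and the standoff range are assumptions.

    import numpy as np

    def rpy_to_matrix(roll, pitch, yaw):
        # Rotation about x (roll), then y (pitch), then z (yaw).
        cr, sr = np.cos(roll), np.sin(roll)
        cp, sp = np.cos(pitch), np.sin(pitch)
        cy, sy = np.cos(yaw), np.sin(yaw)
        Rx = np.array([[1, 0, 0], [0, cr, -sr], [0, sr, cr]])
        Ry = np.array([[cp, 0, sp], [0, 1, 0], [-sp, 0, cp]])
        Rz = np.array([[cy, -sy, 0], [sy, cy, 0], [0, 0, 1]])
        return Rz @ Ry @ Rx

    # Rotation bounds of the first pointing TSR, per Eq. (2):
    # rows are (x, y, z, roll, pitch, yaw), columns are (min, max).
    Bw_rotation = np.array([[0.0, 0.0], [0.0, 0.0], [0.0, 0.0],
                            [-np.pi, np.pi], [0.0, np.pi], [-np.pi, np.pi]])

    def sample_pointing_pose(T0_w, Tw_e, standoff_range=(0.1, 0.5)):
        # Sample a displacement inside the rotation bounds.
        lo, hi = Bw_rotation[:, 0], Bw_rotation[:, 1]
        x, y, z, roll, pitch, yaw = np.random.uniform(lo, hi)
        Tw = np.eye(4)
        Tw[:3, :3] = rpy_to_matrix(roll, pitch, yaw)
        Tw[:3, 3] = [x, y, z]
        # Chained distance TSR: back the palm off along its z-axis so the
        # palm axis still points at R (standoff range is an assumed value).
        Toffset = np.eye(4)
        Toffset[2, 3] = -np.random.uniform(*standoff_range)
        return T0_w @ Tw @ Toffset @ Tw_e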

Accounting for Occlusion

As described in (Holladay, Dragan, and Srinivasa 2014), we want to ensure that our pointing configuration not only refers to the desired R but also does not refer to any other candidate R. Stated another way, we do not want R to be occluded by other candidate items. The intuitive idea is that you cannot point at a bottle if there is a cup in the way, since it would look as though you were pointing at the cup instead.

We use a tool, offscreen render (https://github.com/personalrobotics/offscreen_render), to measure occlusion. A demonstration of its use can be seen in Fig. 3. In this case there is a simple scene with a glass, a plate and a bottle that the robot wants to point at. There are many different ways to achieve this point, and here we show two perspectives from two different end effector locations that point at the bottle, Fig. 3a and Fig. 3b.

Figure 3: Demonstrating how the offscreen render tool can be used to measure occlusion. For both scenes the robot is pointing at the bottle, shown in blue. (a) No other objects occlude the bottle, thus this pointing configuration is occlusion free. (b) Other objects occlude the bottle such that only 27% of the bottle is visible from this angle.

In the first column we see the robot's visualization of the scene: the target object, the bottle, is in blue, while the clutter objects, the plate and glass, are in green. In the second column we remove the clutter, effectively measuring how much the clutter blocks the bottle. The third column shows how much of the bottle we would see if the clutter did not exist in the scene. We can then take the ratio, in terms of pixels, of the second column with respect to the third column to see what percentage of the bottle is viewable given the clutter (a sketch of this computation follows below).

For the pointing configuration shown in Fig. 3a the bottle is entirely viewable, which represents a good pointing configuration. In the pointing configuration shown in Fig. 3b the clutter occludes the bottle significantly, such that only 27% is viewable. Thus we would prefer the pointing configuration of Fig. 3a over that of Fig. 3b, since more of the bottle is visible.

The published RoGuE implementation, as detailed in Sec. 6, does not include occlusion checking; however, we plan to incorporate it in the near future, with more details in Sec. 7.1.
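The ratio itself is simple once the renderer provides per-pixel masks of the target. The sketch below assumes boolean mask images as input; it illustrates the computation and is not the offscreen render API.

    import numpy as np

    def occlusion_ratio(target_mask_with_clutter, target_mask_alone):
        # Both masks are boolean images rendered from the same camera
        # pose: True where a pixel shows the target bottle. The first is
        # rendered with the clutter present, the second with the target
        # alone.
        visible = np.count_nonzero(target_mask_with_clutter)
        total = np.count_nonzero(target_mask_alone)
        return visible / total if total > 0 else 0.0

    # A candidate pointing configuration is preferred when this ratio is
    # high; the viewpoint of Fig. 3b would score roughly 0.27.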
4.3 Presenting

Presenting is a much more constrained gesture than pointing. While we can refer to R from any angle, we constrain the gesture to be level with the object and within a fixed radius. Thus our B^w is

B^w = \begin{bmatrix} 0 & 0 & 0 & 0 & 0 & -\pi \\ 0 & 0 & 0 & 0 & 0 & \pi \end{bmatrix}^T    (3)

This visualized TSR can be seen in Fig. 4. Once again we execute presenting by planning to a solution of the TSR and changing the preshape to the Open Hand Neutral pose.

Figure 4: Presenting. The left side shows a visualization of the TSR solutions. On the right side, the robot has picked one solution and executed it.

4.4 Exhibiting

Exhibiting involves grasping the reference object, lifting it to be displayed and placing the object back down. First the object is grasped using any grasp primitive, in our case a grasp TSR. We then use the lift TSR, which constrains the motion to have epsilon rotation in any direction while allowing it to translate upwards to the desired height (a sketch of this constraint follows below). The system then pauses for the desired wait time before performing the lift action in reverse, thereby placing the object back at its original location. This process can be seen in Fig. 5. Optionally, the system can un-grasp the object and return its end effector to the pose it was in before the exhibiting gesture occurred.

Figure 5: Exhibiting. The robot grasps the object, lifts it up for display and places it back down.
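A minimal sketch of such a lift constraint is shown below; the epsilon slack and the display height are assumed values rather than RoGuE's stored numbers. Playing the sampled lift trajectory in reverse then returns the object to its original pose, as described above.

    import numpy as np

    EPS = 1e-3           # assumed rotational/translational slack
    LIFT_HEIGHT = 0.25   # assumed display height in metres

    # Lift bounds relative to the grasp pose: rows are
    # (x, y, z, roll, pitch, yaw), columns are (min, max).
    Bw_lift = np.array([
        [-EPS, EPS],          # x: essentially fixed
        [-EPS, EPS],          # y: essentially fixed
        [0.0, LIFT_HEIGHT],   # z: free to translate straight up
        [-EPS, EPS],          # roll: epsilon rotation only
        [-EPS, EPS],          # pitch: epsilon rotation only
        [-EPS, EPS],          # yaw: epsilon rotation only
    ])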

4.5 Sweeping

Sweeping is motioning from one area across to another in a smooth curve. We first plan our end effector to hover over the starting position of the curve. We then modify the hand to a preshape that faces the end effector's palm downwards.

Next we want to translate to the end of the curve smoothly along a plane. We execute this via two TSRs that are chained together. We define one TSR above the end position, allowing for epsilon movement in x, y, z and pitch; this constrains our curve to end at the end of our bounding curve. Our second TSR constrains the movement itself, allowing us to translate in the x-y plane to the end location while giving some leeway by allowing epsilon movement in all other directions. By executing these TSRs we smoothly curve from the starting position to the ending position, thus sweeping out our region. This is displayed in Fig. 6.

Figure 6: Sweeping. The robot starts on one end of the reference area, moving its downward-facing palm to the other end of the area.
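A sketch of those two bound matrices makes the chain explicit: the first pins down the end of the curve, the second permits the planar translation that produces the sweep. The epsilon slack and planar extent are assumed values, not RoGuE's stored numbers.

    import numpy as np

    EPS = 1e-3            # assumed slack
    SWEEP_EXTENT = 0.6    # assumed planar reach of the sweep in metres

    # TSR 1: anchored above the end position; the curve must terminate
    # (almost) exactly there. Rows are (x, y, z, roll, pitch, yaw).
    # (The text lists epsilon slack in x, y, z and pitch; roll and yaw
    # are also kept tight in this sketch.)
    Bw_goal = np.tile([-EPS, EPS], (6, 1))

    # TSR 2 (chained): translation in the x-y plane toward the end
    # location, with only epsilon slack in every other direction.
    Bw_sweep = np.array([
        [-SWEEP_EXTENT, SWEEP_EXTENT],   # x: free along the sweeping plane
        [-SWEEP_EXTENT, SWEEP_EXTENT],   # y: free along the sweeping plane
        [-EPS, EPS],                     # z: stay on the plane
        [-EPS, EPS],                     # roll
        [-EPS, EPS],                     # pitch
        [-EPS, EPS],                     # yaw
    ])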

4.6 Additional Gestures

In addition to the four key gestures described above, on some of the robot platforms described in Sec. 5 we implement nodding and waving.

Wave: We created a pre-programmed wave (Fig. 7). We first record a wave trajectory and then execute it by playing the trajectory back. While this is not customizable, we have found that people react very positively to this simple motion.

Figure 7: Wave.

Nod: Nodding is useful in collaborative settings (Lee et al. 2004). We provide the ability to nod yes, by servoing the head up and down, and to nod no, by servoing the head from side to side, as seen in Fig. 8a and Fig. 8b.

Figure 8: (a) Nodding no, by shaking the head from side to side. (b) Nodding yes, by nodding the head up and down.

5 Systems Deployed

RoGuE has been deployed on four robots, listed below; the details of each robot are given. At the time of this publication the authors only have physical access to HERB and ADA, thus the gestures for CURI and the PR2 were tested only in simulation. All of the robots can be seen in Fig. 1, pointing at a fuze bottle.

HERB: HERB (Home Exploring Robot Butler) is a bi-manual mobile manipulator with two Barrett WAMs, each with 7 degrees of freedom (Srinivasa et al. 2012). HERB's Barrett hands have three fingers, one of which is a thumb, allowing for the canonical index out-stretched pointing configuration. While HERB does not specifically look like a human, its arm and head configuration is human-like.

ADA: ADA (Assistive Dexterous Arm) is a Kinova MICO2 6-degree-of-freedom commercial arm, wheelchair-mounted or workstation-mounted, with an actuated gripper. ADA's gripper has only two fingers, so the pointing configuration consists of the two fingers closed together. Since ADA is a free-standing arm there is little anthropomorphism.

CURI: Curi is a humanoid mobile robot with two 7-degree-of-freedom arms, an omnidirectional mobile base, and a socially expressive head. Curi has human-like hands and therefore can also point through the outstretched index finger configuration. Curi's head and hands are particularly human-like.

PR2: The PR2 is a bi-manual mobile robot with two 7-degree-of-freedom arms and gripper hands (Bohren et al. 2011). Like ADA, the PR2 has only two fingers and therefore points with the fingers closed. The PR2 is similar to HERB in that it is not a humanoid but has a human-like configuration.

6 Source Code and Reproducibility

RoGuE is composed largely of two components, the TSR Library and the Action Library. The TSR Library holds all the TSRs, which are then used by the Action Library, which executes the gestures. Between the robots the files are nearly identical, with minor morphology differences such as different wrist-to-palm distance offsets. The TSR Library and Action Library for HERB are available at https://github.com/personalrobotics/herbpy/blob/master/src/herbpy/tsr/generic.py and https://github.com/personalrobotics/herbpy/blob/master/src/herbpy/action/rogue.py.

As an example, below is a comparison of the simplified source code for the present gesture on HERB and ADA. The focus is the location of the object being presented. The only difference is that HERB and ADA have different hand preshapes.

    def Present_HERB(robot, focus, arm):
        # Look up the present TSR for this focus and arm, plan to it,
        # then move the hand into HERB's three-fingered open preshape.
        present_tsr = robot.tsrlibrary('present', focus, arm)
        robot.PlanToTSR(present_tsr)
        preshape = {'finger1': 1, 'finger2': 1, 'finger3': 1, 'spread': 3.14}
        robot.arm.hand.MoveHand(preshape)

    def Present_ADA(robot, focus, arm):
        # Identical structure; only the two-fingered preshape differs.
        present_tsr = robot.tsrlibrary('present', focus, arm)
        robot.PlanToTSR(present_tsr)
        preshape = {'finger1': 0.9, 'finger2': 0.9}
        robot.arm.hand.MoveHand(preshape)
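Continuing the pattern of the listings above, a pointing action would plausibly look like the sketch below. The 'point' TSR name and the index-finger preshape values are assumptions for illustration, not the published Action Library code.

    def Point_HERB(robot, focus, arm):
        # Plan to a pose sampled from the pointing TSR (Sec. 4.2), then
        # curl all fingers except the index to approximate Kendon's
        # 'Index Finger Extended Prone' shape (preshape values assumed).
        point_tsr = robot.tsrlibrary('point', focus, arm)
        robot.PlanToTSR(point_tsr)
        preshape = {'finger1': 0.0, 'finger2': 2.4, 'finger3': 2.4, 'spread': 3.14}
        robot.arm.hand.MoveHand(preshape)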
7 Discussion

We presented RoGuE, the Robot Gesture Engine, for executing several collaborative gestures across multiple robot platforms. Gestures are an invaluable communication tool for our robots as they become engaging and communicative partners.

While existing systems map from situation to gesture selection, we focused on the specifics of executing those gestures on robotic systems. We formalized a set of collaborative gestures and parametrized them as task-space constraints on the trajectory and goal location. This insight allowed us to use powerful existing motion planners that generalize across environments, as seen in Fig. 9, and across robots, as seen in Fig. 1.

Figure 9: Four pointing scenarios with varying clutter. In each case the robot HERB is pointing at the fuze bottle.

7.1 Future Work

The gestures and their implementations serve as primitives that can be used as stand-alone features or as components of a more complicated robot communication system. We have not yet incorporated these gestures in conjunction with head motion or speech. Taken all together, speech, gestures, head movement and gaze should be synchronized; hence the gesture system would have to account for timing.

In the current implementation of RoGuE, the motion planner randomly samples from the TSR. Instead of picking a random configuration, we want to bias our planner towards pointing configurations that are more clear and natural. As discussed towards the end of Sec. 4.2, we can score our pointing configurations based on occlusion, therefore picking a configuration that minimizes occlusion.

RoGuE serves as a starting point for formalizing gestures as motion. Continuing this line of work, we can step closer towards effective, communicative robot partners.

Acknowledgments

This work was funded by a SURF (Summer Undergraduate Research Fellowship) provided by Carnegie Mellon University. We thank the members of the Personal Robotics Lab for helpful discussion and advice. Special thanks to Michael Koval and Jennifer King for their gracious help regarding implementation details, to Matthew Klingensmith for writing the offscreen render tool, and to Sinan Cepel for naming 'RoGuE'.

References

Alibali, M. W.; Flevares, L. M.; and Goldin-Meadow, S. 1997. Assessing knowledge conveyed in gesture: Do teachers have the upper hand? Journal of Educational Psychology 89(1):183.

Berenson, D.; Chestnutt, J.; Srinivasa, S. S.; Kuffner, J. J.; and Kagami, S. 2009. Pose-constrained whole-body planning using task space region chains. In Humanoid Robots, 2009. Humanoids 2009. 9th IEEE-RAS International Conference on, 181–187. IEEE.

Berenson, D.; Srinivasa, S. S.; and Kuffner, J. 2011. Task space regions: A framework for pose-constrained manipulation planning. The International Journal of Robotics Research 0278364910396389.

Bohren, J.; Rusu, R. B.; Jones, E. G.; Marder-Eppstein, E.; Pantofaru, C.; Wise, M.; Mösenlechner, L.; Meeussen, W.; and Holzer, S. 2011. Towards autonomous robotic butlers: Lessons learned with the PR2. In Robotics and Automation (ICRA), 2011 IEEE International Conference on, 5568–5575. IEEE.

Breazeal, C.; Kidd, C. D.; Thomaz, A. L.; Hoffman, G.; and Berlin, M. 2005. Effects of nonverbal communication on efficiency and robustness in human-robot teamwork. In Intelligent Robots and Systems, 2005 (IROS 2005). IEEE/RSJ International Conference on, 708–713. IEEE.

Bremner, P.; Pipe, A.; Melhuish, C.; Fraser, M.; and Subramanian, S. 2009. Conversational gestures in human-robot interaction. In Systems, Man and Cybernetics, 2009. SMC 2009. IEEE International Conference on, 1645–1649. IEEE.

Calinon, S., and Billard, A. 2007. Incremental learning of gestures by imitation in a humanoid robot. In Proceedings of the ACM/IEEE International Conference on Human-Robot Interaction, 255–262. ACM.

Carter, E. J.; Mistry, M. N.; Carr, G. P. K.; Kelly, B.; Hodgins, J. K.; et al. 2014. Playing catch with robots: Incorporating social gestures into physical interactions. In Robot and Human Interactive Communication, 2014 RO-MAN: The 23rd IEEE International Symposium on, 231–236. IEEE.

Cassell, J.; Vilhjálmsson, H. H.; and Bickmore, T. 2004. BEAT: the behavior expression animation toolkit. In Life-Like Characters. Springer. 163–185.

Chidambaram, V.; Chiang, Y.-H.; and Mutlu, B. 2012. Designing persuasive robots: how robots might persuade people using vocal and nonverbal cues. In Proceedings of the Seventh Annual ACM/IEEE International Conference on Human-Robot Interaction, 293–300. ACM.

Cipolla, R., and Hollinghurst, N. J. 1996. Human-robot interface by pointing with uncalibrated stereo vision. Image and Vision Computing 14(3):171–178.

Clark, H. H. 2005. Coordinating with each other in a material world. Discourse Studies 7(4-5):507–525.

Fang, R.; Doering, M.; and Chai, J. Y. 2015. Embodied collaborative referring expression generation in situated human-robot interaction. In Proceedings of the Tenth Annual ACM/IEEE International Conference on Human-Robot Interaction, 271–278. ACM.

Garber, P., and Goldin-Meadow, S. 2002. Gesture offers insight into problem-solving in adults and children. Cognitive Science 26(6):817–831.

Gokturk, M., and Sibert, J. L. 1999. An analysis of the index finger as a pointing device. In CHI'99 Extended Abstracts on Human Factors in Computing Systems, 286–287. ACM.
Hato, Y.; Satake, S.; Kanda, T.; Imai, M.; and Hagita, N. 2010. Pointing to space: modeling of deictic interaction referring to regions. In Proceedings of the 5th ACM/IEEE International Conference on Human-Robot Interaction, 301–308. IEEE Press.

Hattori, Y.; Kozima, H.; Komatani, K.; Ogata, T.; and Okuno, H. G. 2005. Robot gesture generation from environmental sounds using inter-modality mapping.

Holladay, R. M.; Dragan, A. D.; and Srinivasa, S. S. 2014. Legible robot pointing. In Robot and Human Interactive Communication, 2014 RO-MAN: The 23rd IEEE International Symposium on, 217–223. IEEE.

Huang, C.-M., and Mutlu, B. 2014. Multivariate evaluation of interactive robot systems. Autonomous Robots 37(4):335–349.

Karam, et al. 2005. A taxonomy of gestures in human computer interactions.

Kendon, A. 2004. Gesture: Visible Action as Utterance. Cambridge University Press.

Kim, H.-H.; Lee, H.-E.; Kim, Y.-H.; Park, K.-H.; and Bien, Z. Z. 2007. Automatic generation of conversational robot gestures for human-friendly steward robot. In Robot and Human Interactive Communication, 2007. RO-MAN 2007. The 16th IEEE International Symposium on, 1155–1160. IEEE.

Kim, A.; Jung, Y.; Lee, K.; and Han, J. 2013. The effects of familiarity and robot gesture on user acceptance of information. In Proceedings of the 8th ACM/IEEE International Conference on Human-Robot Interaction, 159–160. IEEE Press.

Kipp, M.; Neff, M.; Kipp, K. H.; and Albrecht, I. 2007. Towards natural gesture synthesis: Evaluating gesture units in a data-driven approach to gesture synthesis. In Intelligent Virtual Agents, 15–28. Springer.

Kirk, D.; Rodden, T.; and Fraser, D. S. 2007. Turn it this way: grounding collaborative action with remote gestures. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, 1039–1048. ACM.

Kita, S. 2003. Pointing: Where Language, Culture, and Cognition Meet. Psychology Press.

Kopp, S., and Wachsmuth, I. 2000. A knowledge-based approach for lifelike gesture animation. In ECAI 2000: Proceedings of the 14th European Conference on Artificial Intelligence.

Lee, C.; Lesh, N.; Sidner, C. L.; Morency, L.-P.; Kapoor, A.; and Darrell, T. 2004. Nodding in conversations with a robot. In CHI'04 Extended Abstracts on Human Factors in Computing Systems, 785–786. ACM.

Li, A. X.; Florendo, M.; Miller, L. E.; Ishiguro, H.; and Saygin, A. P. 2015. Robot form and motion influences social attention. In Proceedings of the Tenth Annual ACM/IEEE International Conference on Human-Robot Interaction, 43–50. ACM.

Liu, P.; Glas, D. F.; Kanda, T.; Ishiguro, H.; and Hagita, N. 2013. It's not polite to point: generating socially-appropriate deictic behaviors towards people. In Proceedings of the 8th ACM/IEEE International Conference on Human-Robot Interaction, 267–274. IEEE Press.

Lohse, M.; Rothuis, R.; Gallego-Pérez, J.; Karreman, D. E.; and Evers, V. 2014. Robot gestures make difficult tasks easier: the impact of gestures on perceived workload and task performance. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, 1459–1466. ACM.

Lozano, S. C., and Tversky, B. 2006. Communicative gestures facilitate problem solving for both communicators and recipients. Journal of Memory and Language 55(1):47–63.

Marjanovic, M. J.; Scassellati, B.; and Williamson, M. M. 1996. Self-taught visually-guided pointing for a humanoid robot. From Animals to Animats: Proceedings of.

McNeill, D. 1992. Hand and Mind: What Gestures Reveal about Thought. University of Chicago Press.

Nehaniv, C. L.; Dautenhahn, K.; Kubacki, J.; Haegele, M.; Parlitz, C.; and Alami, R. 2005. A methodological approach relating the classification of gesture to identification of human intent in the context of human-robot interaction. In Robot and Human Interactive Communication, 2005. ROMAN 2005. IEEE International Workshop on, 371–377. IEEE.
Okuno, Y.; Kanda, T.; Imai, M.; Ishiguro, H.; and Hagita, N. 2009. Providing route directions: design of robot's utterance, gesture, and timing. In Human-Robot Interaction (HRI), 2009 4th ACM/IEEE International Conference on, 53–60. IEEE.

Reynolds, F. J., and Reeve, R. A. 2001. Gesture in collaborative mathematics problem-solving. The Journal of Mathematical Behavior 20(4):447–460.

Salem, M.; Kopp, S.; Wachsmuth, I.; and Joublin, F. 2009. Towards meaningful robot gesture. In Human Centered Robot Systems. Springer. 173–182.

Salem, M.; Rohlfing, K.; Kopp, S.; and Joublin, F. 2011. A friendly gesture: Investigating the effect of multimodal robot behavior in human-robot interaction. In RO-MAN, 2011 IEEE, 247–252. IEEE.

Salem, M.; Eyssel, F.; Rohlfing, K.; Kopp, S.; and Joublin, F. 2013. To err is human(-like): Effects of robot gesture on perceived anthropomorphism and likability. International Journal of Social Robotics 5(3):313–323.

Sauppé, A., and Mutlu, B. 2014. Robot deictics: How gesture and context shape referential communication.

Severinson-Eklundh, K.; Green, A.; and Hüttenrauch, H. 2003. Social and collaborative aspects of interaction with a service robot. Robotics and Autonomous Systems 42(3):223–234.

Sidner, C. L.; Lee, C.; and Lesh, N. 2003. Engagement rules for human-robot collaborative interactions. In IEEE International Conference on Systems, Man and Cybernetics, volume 4, 3957–3962.

Srinivasa, S. S.; Berenson, D.; Cakmak, M.; Collet, A.; Dogar, M. R.; Dragan, A. D.; Knepper, R. A.; Niemueller, T.; Strabala, K.; Vande Weghe, M.; et al. 2012. HERB 2.0: Lessons learned from developing a mobile manipulator for the home. Proceedings of the IEEE 100(8):2410–2428.

Sugiyama, O.; Kanda, T.; Imai, M.; Ishiguro, H.; and Hagita, N. 2005. Three-layered draw-attention model for humanoid robots with gestures and verbal cues. In Intelligent Robots and Systems, 2005 (IROS 2005). IEEE/RSJ International Conference on, 2423–2428. IEEE.

Tang, J. C. 1991. Findings from observational studies of collaborative work. International Journal of Man-Machine Studies 34(2):143–160.

Tojo, T.; Matsusaka, Y.; Ishii, T.; and Kobayashi, T. 2000. A conversational robot utilizing facial and body expressions. In Systems, Man, and Cybernetics, 2000 IEEE International Conference on, volume 2, 858–863. IEEE.

Wexelblat, A. 1998. Research challenges in gesture: Open issues and unsolved problems. In Gesture and Sign Language in Human-Computer Interaction. Springer. 1–11.

Wong, N., and Gutwin, C. 2010. Where are you pointing?: the accuracy of deictic pointing in CVEs. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, 1029–1038. ACM.

Zhang, J. R.; Guo, K.; Herwana, C.; and Kender, J. R. 2010. Annotation and taxonomy of gestures in lecture videos. In Computer Vision and Pattern Recognition Workshops (CVPRW), 2010 IEEE Computer Society Conference on, 1–8. IEEE.