A Gesture Based Interface for Human-Robot Interaction

A Gesture Based Interface for Human-Rob ot Interaction Stefan Waldherr Computer Science Department Carnegie Mel lon University Pittsburgh, PA, USA Roseli Romero Instituto de Ci^encias Matem aticas e de Computac~ao UniversidadedeS~ao Paulo S~ao Carlos, SP, Brazil Sebastian Thrun Computer Science Department Carnegie Mel lon University Pittsburgh, PA, USA Abstract. Service rob otics is currently a pivotal research area in rob otics, with enormous so cietal p otential. Since service rob ots directly interact with p eople, nding \natural" and easy-to-use user interfaces is of fundamental imp ortance. While past work has predominately fo cussed on issues suchasnavigation and manipulatio n, relatively few rob otic systems are equipp ed with exible user interfaces that p ermit controlling the rob ot by \natural" means. This pap er describ es a gesture interface for the control of a mobile rob ot equipp ed with a manipulator. The interface uses a camera to track a p erson and recognize gestures involvin g arm motion. A fast, adaptive tracking algorithm enables the rob ot to track and follow a p erson reliably through oce environments with changing lighting conditions. Two alternative metho ds for gesture recognition are compared: a template based approach and a neural network approach. Both are combined with the Viterbi algorithm for the recognition of gestures de ned through arm motion in addition to static arm p oses. Results are rep orted in the context of an interactive clean-up task, where a p erson guides the rob ot to sp eci c lo cations that need to b e cleaned and instructs the rob ot to pick up trash. 1. Intro duction The eld of rob otics is currently undergoing a change. While in the past, rob ots were predominately used in factories for purp oses suchas manufacturing and transp ortation, a new generation of \service rob ots" has recently b egun to emerge Schraft and Schmierer, 1998. Service rob ots co op erate with p eople, and assist them in their everyday tasks. Sp eci c examples of commercial service rob ots include the Helpmate rob ot, which has already b een deployed at numerous hospitals world- wide King and Weiman, 1990, an autonomous cleaning rob ot that has c 2000 Kluwer Academic Publishers. Printed in the Netherlands. 2 successfuly b een deployed in a sup ermarket during op ening hours En- dres et al., 1998, and the Rob o-Caddy www.icady.com/homefr.htm, a rob ot designed to make life easier by carrying around golf clubs. Few of these rob ots can interact with p eople other than byavoiding them. In the near future, similar rob ots are exp ected to app ear in various branches of entertainment, recreation, health-care, nursing, etc., and it is exp ected that they interact closely with p eople. This up coming generation of service rob ots op ens up new research opp ortunities. While the issue of robot navigation has b een researched quite extensively Cox and Wilfong, 1990; Kortenkamp et al., 1998; Borenstein et al., 1996, considerably little attention has b een paid to issues of human-robot interaction see Section 5 for a discussion on related literature. However, many service rob ots will b e op erated by non-exp ert users, who might not even b e capable of op erating a computer keyb oard. It is therefore essential that these rob ots b e equipp ed with \natural" interfaces that facilitate the interaction b etween rob ots and p eople. Nevertheless, the need for more e ectivehuman-rob ot interfaces has well b een recognized by the research community.For example, Torrance develop ed in his M.S. thesis a natural language interface for teaching mobile rob ots names of places in an indo or environmentTorrance, 1994. Due to the lack of a sp eech recognition system, his interface still required the user to op erate a keyb oard; nevertheless, the natural language comp onent made instructing the rob ot signi cantly easier. More recently Asoh and colleagues develop ed an interface that inte- grates a sp eech recognition system into a phrase-based natural language interface Asoh et al., 1997. The authors successfully instructed their \oce-conversant" rob ot to navigate to oce do ors and other signif- icant places in their environment through verbal commands. Among p eople, communication often involves more than sp oken language. For example, it is far easier to p oint to an ob ject than to verbally describ e its exact lo cation. Gestures are an easy way to give geometrical informa- tion to the rob ot. Hence, other researchers have prop osed vision-based interfaces that allow p eople to instruct mobile rob ots via arm gestures see Section 5. For example, b oth Kortenkamp and colleagues Ko- rtenkamp et al., 1996 and Kahn Kahn, 1996 have develop ed mobile rob ot systems instructed through arm p oses. Most previous mobile rob ot approaches only recognize static arm p oses as gestures, and they cannot recognize gestures that are de ned through sp eci c temp oral patterns, suchaswaving. Motion gestures, which are commonly used for communication among p eople, provide additional freedom in the design of gestures. In addition, they reduce the chances of accidentally classifying arm p oses as gestures that were not intended as such. Thus, 3 they app ear b etter suited for human rob ot interaction than static p ose gestures alone. Mobile rob ot applications of gesture recognition imp ose several requirements on the system. First of all, the gesture recognition system needs to b e small enough to \ t" on the rob ot, as pro cessing p ower is generally limited. Since b oth the human and the rob ot maybemoving while a gesture is shown, the system may not assume a static background or a xed lo cation of the camera or the human who p erforms a gesture. In fact, some tasks mayinvolve following a p erson around, in which case the rob ot must b e able to recognize gestures while it tracks a p erson and adapt to p ossibly drastic changes in lighting conditions. Additionally, the system must work at an acceptable sp eed. Naturally, one would want that a `Stop' gesture would immediately halt the rob ot|and not ve seconds later. These requirements must b e taken into consideration when designing the system. This pap er presents a vision-based interface that has b een designed to instruct a mobile rob ot through b oth p ose and motion gestures Waldherr et al., 1998. At the lowest level, an adaptive dual-color tracking algorithm enables the rob ot to track and, follow a p erson around at sp eeds of up to 30 cm p er second while avoiding collisions with obstacles. This tracking algorithm quickly adapts to di erent lighting conditions, while segmenting the image to nd the p erson's p osition relative to the center of the image, and using a pan/tilt unit to keep the p erson centered. Gestures are recognized in two phases: In the rst, individual camera images are mapp ed into a vector that sp eci es likeliho o d for individual p oses. We compare two di erent approaches, one based on neural networks, and one that uses a graphical correlation-based template matcher. In the second phase, the Viterbi algorithm is employed to dynamically match the stream of image data with pre-de ned temp oral gesture templates. The work rep orted here go es b eyond the design of the gesture interface. One of the goals of this researchistoinvestigate the usability of gesture interface in the context of a realistic service rob ot appli- cation. The interface was therefore integrated into our existing rob ot navigation and control software. The task of the rob ot is motivated by the \clean-up-the-oce" task at the 1994 mobile rob ot comp etition Simmons, 1995. There, a rob ot had to autonomously search an oce for ob jects scattered at the o or, and to dep osit them in nearby trash bins. Our task di ers in that wewanta human to guide the rob ot to the trash, instructed with gestures. The rob ot should than pick up the trash and carry it and dump it into a trash bin. The remainder of the pap er is organized as follows. Section 2 describ es our approach to visual servoing, followed by the main gesture 4 recognition algorithm describ ed in Section 3. Exp erimental results are discussed in Section 4. Finally, the pap er is concluded by a discussion of related work Section 5 and a general discussion Section 6. 2. Finding and Tracking People Finding and tracking p eople is the core of any vision-based gesture recognition system. After all, the rob ot must know where in the image the p erson is. Visual tracking of p eople has b een studied extensively over the past few years Darrel et al., 1996, Crowley, 1997, and Wren et al., 1997. However, the vast ma jority of existing approaches as- sumes that the camera is mounted at a xed lo cation. Such approaches typically rely on a static background, so that human motion can b e detected, e.g., through image di erencing. In the case of rob ot-based gesture recognition, one cannot assume that the background is static. While the rob ot tracks and follows p eople, background and lighting conditions often change considerably.

Load more