Forewords

The eNTERFACE'12 workshop was organized by the Metz campus of Supélec and co-sponsored by the ILHAIRE and Allegro European projects. The previous workshops in Mons (Belgium), Dubrovnik (Croatia), Istanbul (Turkey), Paris (France), Genoa (Italy), Amsterdam (The Netherlands) and Plzen (Czech Republic) had an impressive success record and had proven the viability and usefulness of this original workshop. eNTERFACE'12, hosted by Supélec in Metz (France), took this line of fruitful collaboration one step further. Previous editions of eNTERFACE have already inspired competitive projects in the area of multimodal interfaces, have secured the contributions of leading professionals and have encouraged the participation of a large number of graduate and undergraduate students.

We received high-quality project proposals, among which the following 8 projects were selected:

1. Speech, gaze and gesturing - multimodal conversational interaction with Nao robot
2. Laugh Machine
3. Human motion recognition based on videos
4. M2M - Socially Aware Many-to-Machine Communication
5. Is this guitar talking or what!?
6. CITYGATE, The multimodal cooperative intercity Window
7. Active Speech Modifications
8. ArmBand: Inverse Reinforcement Learning for a BCI driven robotic arm control

All the projects produced promising results and demonstrations, which are reported in the rest of this document. The workshop gathered more than 70 attendees coming from 16 countries across Europe and beyond. We welcomed 4 invited speakers (Laurent Bougrain, Thierry Dutoit, Kristiina Jokinen and Anton Batliner), whose talks were greatly appreciated. The workshop was held in a brand new 800 m² building in which robotics equipment as well as many sensors were available to the attendees. This is why we proposed a special focus of this edition on topics related to human-robot and human-environment interaction. This event was a unique opportunity for students and experts to meet and work together, and to foster the development of tomorrow's multimodal research community.

All this has been made possible thanks to the good will of many of my colleagues who volunteered before and during the workshop. In particular, I want to address many thanks to Jérémy, who did a tremendous job making this event as enjoyable and fruitful as possible. Thanks a lot to Matthieu, Thérèse, Danièle, Jean-Baptiste, Senthil, Lucie, Edouard, Bilal, Claudine, Patrick, Michel, Dorothée, Serge, Calogero, Yves, Eric, Véronique, Christian, Nathalie and Elisabeth. Organizing this workshop was a real pleasure for all of us and we hope we made it a memorable moment of work and fun.

Olivier Pietquin
Chairman of eNTERFACE'12

The eNTERFACE'12 Sponsors

We want to express our gratitude to all the organizations which made this event possible.
The eNTERFACE'12 Scientific Committee

Niels Ole Bernsen, University of Southern Denmark - Odense, Denmark
Thierry Dutoit, Faculté Polytechnique de Mons, Belgium
Christine Guillemot, IRISA, Rennes, France
Richard Kitney, University College London, United Kingdom
Benoît Macq, Université Catholique de Louvain, Louvain-la-Neuve, Belgium
Cornelius Malerczyk, Zentrum für Graphische Datenverarbeitung e.V., Germany
Ferran Marques, Universitat Politècnica de Catalunya, Spain
Laurence Nigay, Université Joseph Fourier, Grenoble, France
Olivier Pietquin, Supélec, Metz, France
Dimitrios Tzovaras, Informatics and Telematics Institute, Greece
Jean-Philippe Thiran, Swiss Federal Institute of Technology Lausanne (EPFL), Switzerland
Jean Vanderdonckt, Université Catholique de Louvain, Louvain-la-Neuve, Belgium

The eNTERFACE'12 Local Organization Committee

General chair: Olivier Pietquin
Co-chair: Jérémy Fix
Web management: Claudine Mercier
Technical support: Jean-Baptiste Tavernier
Social activities: Matthieu Geist
Administration: Danielle Cebe, Thérèse Pirrone

eNTERFACE 2012 - Project reports

P1  Speech, gaze and gesturing - multimodal conversational interaction with Nao robot
P2  Laugh Machine
P3  Human motion recognition based on videos
P5  M2M - Socially Aware Many-to-Machine Communication
P6  Is this guitar talking or what!?
P7  CITYGATE, The multimodal cooperative intercity Window
P8  Active Speech Modifications
P10 ArmBand: Inverse Reinforcement Learning for a BCI driven robotic arm control

ENTERFACE'12 SUMMER WORKSHOP - FINAL REPORT; PROJECT P1: MULTIMODAL CONVERSATIONAL INTERACTION WITH NAO ROBOT

Speech, gaze and gesturing: multimodal conversational interaction with Nao robot

Adam Csapo, Emer Gilmartin, Jonathan Grizou, JingGuang Han, Raveesh Meena, Dimitra Anastasiou, Kristiina Jokinen, and Graham Wilcock

A. Csapo is with Budapest University of Technology and Economics. E. Gilmartin and J. Han are with Trinity College Dublin. J. Grizou is with INRIA, Bordeaux. R. Meena is with KTH, Stockholm.

Abstract—The report presents a multimodal conversational interaction system for the Nao humanoid robot, developed by project P1 at eNTERFACE 2012. We implemented WikiTalk, an existing spoken dialogue system for open-domain conversations, on Nao. This greatly extended the robot's interaction capabilities by enabling Nao to talk about an unlimited range of topics. In addition to speech interaction, we developed a wide range of multimodal interactive behaviours by the robot, including face-tracking, nodding, communicative gesturing, proximity detection and tactile interrupts. We made video recordings of user interactions and used questionnaires to evaluate the system. We further extended the robot's capabilities by linking Nao with Kinect.

Index Terms—human-robot interaction, spoken dialogue systems, communicative gesturing.

I. INTRODUCTION

The report presents a multimodal conversational interaction system for the Aldebaran Nao humanoid robot, developed by project P1 at eNTERFACE 2012. Our project's starting point was a speech-based open-domain knowledge access system. By implementing this system on the robot, we greatly extended Nao's interaction capabilities by enabling the robot to talk about an unlimited range of topics. In addition to speech interaction, we developed a wide range of multimodal interactive behaviours by the robot, including face-tracking, nodding, communicative gesturing, proximity detection and tactile interrupts, to enhance naturalness, expressivity and user-friendliness, and to add liveliness to the interaction.

As the basis for speech interaction, we implemented on Nao the WikiTalk system [1], [2], which supports open-domain conversations using Wikipedia as a knowledge source. Earlier work with WikiTalk had used a robotics simulator. This report describes the multimodal interactive behaviours made possible by implementing "Nao WikiTalk" on a real robot.

Based on the above, the Nao robot with Nao WikiTalk can be regarded as a cognitive robot, since it can reason about how to behave in response to the user's actions. However, in the broader sense, the combination of Nao and WikiTalk is also viewed as a cognitive infocommunication system, as it allows users to interact via the robot with Wikipedia content that is remote and maintained by a wider community. This report was published at CogInfoCom 2012 [3].

The report is structured as follows. Section II explains the multimodal capabilities that we developed for Nao, focussing on communicative gesturing and its integration with speech interaction. Section III describes the system architecture and Section IV presents an evaluation of the system based on questionnaires and video recordings of human-robot interactions. Finally, Section V introduces the use of Kinect with Nao to further extend interaction functionality.

II. MULTIMODAL CAPABILITIES

Human face-to-face interaction is multimodal, involving several input and output streams used concurrently to transmit and receive information of various types [4]. While propositional content is transmitted verbally, much additional information can be communicated via non-verbal and paralinguistic audio ('um's and 'ah's in filled pauses, prosodic features) and visual channels (eye-gaze, gesture, posture). These non-verbal signals and cues play a major part in the management of turn-taking, communicating speaker and listener affect, and signaling understanding or breakdown in communication. During interaction, speakers and listeners produce bodily movements which, alone or in tandem with other audio and visual information, constitute cues or signals which aid understanding of linguistic information, signal comprehension, or display participants' affective state. Movements include shifts in posture, head movements, and hand or arm movements. We take 'gesture' to include head and hand or arm movements.

A. Gestures

Nao WikiTalk was designed to incorporate head, arm and body movements to approximate gestures used in human conversation. This section describes the motivation for adding gestures to Nao, and their design and synthesis. A more comprehensive description of enhancing Nao with gestures and posture shifts can be found in [5].

Gestures take several forms and perform different functions. Following [6], we can distinguish command and communicative gestures, and the latter can be categorized further as speech-independent (emblems, such as the 'ok' sign) or speech-dependent (gestures accompanying speech). Speech-dependent gestures may be iconic or metaphoric - "the fish was this big" with hands apart to show dimension, or a palm-upward 'giving' gesture at the start of narration. They may also be deictic.