A Cloud-Based Robust Semaphore Mirroring System for Social Robots
CONFIDENTIAL. Limited circulation. For review only.

Nan Tian*,1,2, Benjamin Kuo*,2,3, Xinhe Ren1, Michael Yu2,4, Robert Zhang2, Bill Huang2, Ken Goldberg1, and Somayeh Sojoudi1,4,5

Abstract— We present a cloud-based human-robot interaction system that automatically controls a humanoid robot to mirror a human demonstrator performing flag semaphores. We use a cloud-based framework called Human Augmented Robotic Intelligence (HARI) to perform gesture recognition of the human demonstrator and gesture control of a local humanoid robot named Pepper. To keep the system real-time, we design it to maximize the cloud's contribution to the deep-neural-network-based gesture recognition system, OpenPose [1], and to minimize communication costs between the cloud and the robot. A hybrid control system is used to hide latency caused by routing or physical distance. We conducted real-time semaphore mirroring experiments in which both the robots and the demonstrator were located in Tokyo, Japan, while the cloud server was deployed in the United States. The total latency was 400 ms for video streaming to the cloud and 108 ms for robot commanding from the cloud. Further, we measured the reliability of our gesture-based semaphore recognition system with two human subjects, achieving 90% and 76.7% recognition accuracy, respectively, in an open-loop setting in which the subjects were not allowed to see the recognition results.

Fig. 1. Block diagram: (Top) Architecture of the cloud-based semaphore mirroring system built on HARI, which supports AI-assisted teleoperation. We deploy the semaphore recognition system in the cloud and use a discrete-continuous hybrid control system to control Pepper, a humanoid robot, to imitate semaphore gestures. (Bottom) An example of our system as the demonstrator performs semaphore "R" and the robot imitates him.
We achieved 100% recognition accuracy when both subjects were allowed to adapt to the recognition system in a closed-loop setting. Lastly, we showed that we can support two humanoid robots with a single server at the same time. With this real-time cloud-based HRI system, we illustrate that we can deploy gesture-based human-robot interaction globally and at scale.

I. INTRODUCTION

Humanoid social robots are available commercially, including Softbank's Pepper [2] and Nao [3], AvatarMind's iPal, and iCub from RobotCub [4]. These robots have appearances similar to humans and have potential in professional services such as retail, hospitality, and education. Pepper, a humanoid social robot designed by Softbank [2], is well liked because it has a human voice, an Astro Boy-like body design, and generous full-body movements. It is also well equipped with cameras, LIDAR, ultrasound sensors, and a chest input pad, integrated through either Pepper's Android SDK or a ROS-based system.

In this work, we focus on using Pepper to mirror a person performing semaphore, a hand-held flag language. Semaphore is a system for conveying information at a distance by visual signals with hand-held flags: alphabetical letters are encoded by the positions of the flags with respect to the body. It is still used by lifeguards around the world.

To perform semaphore recognition, we use the 2D video stream that is common in low-cost humanoid robots. However, it is impossible to deploy a real-time gesture-based recognition system locally on humanoid robots due to limited computation power. Deep-learning-based gesture recognition systems such as OpenPose [1] can only achieve real-time performance on state-of-the-art deep learning GPU servers. At the same time, compared to Microsoft Kinect based gesture recognition [5], OpenPose is preferable because it works outdoors, can track more people, and is camera agnostic. To use OpenPose for robot gesture mirroring in real time, one option is to stream videos from local robots to the cloud, perform OpenPose gesture recognition there, and send control signals back to the local robots.

To illustrate this concept, we use CloudMinds Inc.'s HARI, Human Augmented Robot Intelligence, to support large-scale cloud service robot deployment. HARI is a hybrid system that combines the power of artificial intelligence (AI) and human intelligence to control robots from the cloud.

The AUTOLab at Berkeley (automation.berkeley.edu)
1 Department of Electrical Engineering and Computer Science, University of California, Berkeley, USA; {neubotech, jim.x.ren, goldberg, sojoudi}@berkeley.edu
2 CloudMinds Technology Inc.; {robert, bill}@cloudminds.com
3 Carnegie Mellon University, The Robotics Institute
4 Department of Mechanical Engineering, University of California, Berkeley, USA
5 Tsinghua-Berkeley Shenzhen Institute
* Both authors contributed equally to this work

Preprint submitted to 14th IEEE International Conference on Automation Science and Engineering. Received March 18, 2018.

Mitigating network latency is essential to make cloud-based motion control robust, because network latency is unavoidable at times due to routing and physical distance, even with a reliable network. We pre-generate semaphore motion trajectories and store them in the local robot command unit (RCU), and we execute these trajectories based on commands sent from the cloud. This way, the cloud AI only needs to send alphabetical letters to command the robot to perform semaphore, which hides the network latency while the robot moves from one semaphore to another.
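The trajectory pre-generation scheme reduces the cloud-to-robot protocol to single letters: the cloud sends one character, and the robot replays a locally stored trajectory. The sketch below is a minimal illustration of this idea, not the actual RCU code; the waypoint format, placeholder joint values, and the `send_joint_command` callback are all hypothetical.

```python
import time

# Hypothetical RCU-side sketch: letter -> list of (timestamp_s, joint_dict)
# waypoints, pre-generated offline (e.g., by a motion planner) and stored on
# the robot. The joint values here are placeholders, not Pepper's real angles.
TRAJECTORIES = {
    "A": [(0.0, {"RShoulderPitch": 1.2}), (0.5, {"RShoulderPitch": 0.6})],
    "R": [(0.0, {"RShoulderPitch": 1.2}), (0.5, {"RShoulderPitch": 0.0})],
}

def execute_letter(letter, send_joint_command=print):
    """Replay the stored trajectory for one semaphore letter.

    Only the one-byte letter crosses the network; the continuous motion is
    generated locally, so network latency is hidden while the arm moves.
    """
    waypoints = TRAJECTORIES.get(letter)
    if waypoints is None:
        return False  # unknown letter: ignore the command
    last_t = 0.0
    for t, joints in waypoints:
        time.sleep(t - last_t)  # pace playback to the stored timestamps
        send_joint_command(joints)
        last_t = t
    return True
```

A cloud-side loop would then simply call `execute_letter` (via the robot's command channel) for each letter the recognizer emits, never needing to stream joint-level commands.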
To test network speeds, we conducted cross-Pacific experiments controlling a Pepper robot in Japan from a HARI server located in the United States, while the human subject stood in front of the robot in Japan. We further showed that one HARI server can control two robots in two different locations simultaneously, which demonstrates scalability.

II. RELATED WORK

An anthropomorphic robot, or humanoid robot, is a robot whose shape resembles the human body. Many famous robots are humanoids, including Atlas from Boston Dynamics [6], Honda Research's Asimo [7], and the first space humanoid robot, NASA's Robonaut 2 [8], to name a few. Socially interactive robots [9], though not necessarily anthropomorphic, often take humanoid forms, since many researchers believe humans must be able to interact with robots naturally, or in as human-like a way as possible [10], [11]. Others find such robots easier to control by emulating natural human-robot interactions [9]. Therefore, many works have been dedicated to imitating human gestures [12]–[16], human demonstrations [14], [17], and human social interactions [18], [19] using humanoid robots.

Fig. 2. Programming Single Side Semaphore: A. Semaphore positions. (Top) Seven different positions (cross up, cross down, top, up, flat, down, home) demonstrated with cartoons. (Bottom) Selected examples of single side semaphore positions implemented on Pepper. B. Semaphore Motion. Motion trajectories between different semaphore positions are generated first using MoveIt! and OMPL in ROS, and then stored on the local RCU.

Imitating a full-body human gesture requires a 3D positioning camera such as Microsoft Kinect and a gesture recognition system such as the Kinect SDK [5] or the deep-neural-network-based OpenPose [1]. The imitation controller maps position and orientation estimates of human joints into a series of robot joint angles, i.e., a trajectory. These generated gesture trajectories are then used to control humanoids directly [5], indirectly after editing [20], or with a model learned via machine learning [12], [14].

In Goodrich's HRI survey [21], he separates HRI into two types: remote interaction and proximate interaction. In either case, perception feedback (visual, audio, verbal, or even tactile) is important for controlling and programming a robot for interaction. In this work, we focus on large-scale, cloud-based social human-robot interaction using a humanoid robot named Pepper [2]. We enable both remote interaction (teleoperation) and proximate interaction by deploying both visual perception (gesture recognition) and high-level robot commands (semaphore) in the cloud. A similar cloud robotic system was built before by Kehoe et al. [22], in which an object recognition system was deployed in the cloud to support the PR2 robot grasping system.

James Kuffner coined the term "Cloud Robotics" to describe the increasing number of robotics and automation systems that rely on remote data or code for effective operation [23]. A wide variety of models for connecting robots to the cloud have been developed [24]. Some studies use a Software as a Service (SaaS) model that moves a robot motion controller entirely into the cloud and uses it for manipulation planning [25], while others, such as RoboEarth's Rapyuta system [26], follow a Platform as a Service (PaaS) model so that others can easily program on the platform. A third kind of cloud robotic system, aiming for greater energy efficiency, distributes the computation of robot motion planning between the robot's embedded computer and a cloud-based compute service [27]. HARI is a combination of the three and aims to provide both AI and human intelligence control for large-scale deployment of service robots.

III. SYSTEM DESIGN

We describe the three basic components of our cloud-based semaphore mirroring system: the humanoid robot, the robot command unit (RCU), and cloud intelligence (HARI). We then explain how the cloud-based semaphore system was implemented, including semaphore motion generation in ROS, trajectory execution on the RCU, the command-line interface on HARI, semaphore recognition, and semaphore mirroring with hybrid control.

A. Humanoid Robot Pepper

We use Pepper, a humanoid social robot designed by Softbank, for all implementations. Pepper has two arms. Each arm has five degrees of freedom (DoF), with one more DoF for hand closing and opening. Further, Pepper has two DoFs for head movements, two DoFs for hip movements, and one additional DoF for knee movements.
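To make concrete how the seven single-side positions of Fig. 2 can be related to 2D pose keypoints, the sketch below quantizes the shoulder-to-wrist direction into 45° bins. This is our own illustrative sketch, not the paper's recognition code: the angle conventions (image coordinates with y growing downward, 0° meaning the arm straight out to the side, angles beyond ±90° meaning across the body) and the bin-to-name mapping are assumptions.

```python
import math

# Names follow the seven single-side positions in Fig. 2 (cross up, cross
# down, top, up, flat, down, home); the bin angles are our assumption.
POSITION_BY_BIN = {
    -90: "home",         # arm hanging straight down
    -45: "down",
    0:   "flat",
    45:  "up",
    90:  "top",
    135: "cross up",     # raised across the body
    -135: "cross down",  # lowered across the body
}

def classify_arm(shoulder_xy, wrist_xy):
    """Quantize the shoulder->wrist direction into one of the 7 positions.

    Keypoints are (x, y) in image coordinates (y grows downward), as
    produced by a 2D pose estimator such as OpenPose.
    """
    dx = wrist_xy[0] - shoulder_xy[0]
    dy = shoulder_xy[1] - wrist_xy[1]  # flip y so "up" is positive
    angle = math.degrees(math.atan2(dy, dx))
    # pick the bin with the smallest circular angular distance
    nearest = min(POSITION_BY_BIN,
                  key=lambda b: abs((angle - b + 180) % 360 - 180))
    return POSITION_BY_BIN[nearest]
```

Combining the quantized positions of both arms would then index into a letter table (e.g., one arm "flat" and the other "home"), which is the general structure a semaphore decoder over pose keypoints can take.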