Ararabots@home 2019 Team Description Paper

Edson T. Matsubara, Davi F. Santo, Kenzo M. Sakiyama, Leonardo O. Esperança, Mauro B. de Paula, Marina P. Barboza, Marllon L. R. Rosa, Walter E. S. S. Filho
Laboratório de Inteligência Artificial & Faculdade de Computação
Universidade Federal de Mato Grosso do Sul
Campo Grande, Brasil
[email protected]

Abstract—In this paper we provide a full description of the in-progress AraraBox project, an autonomous system for domestic use developed by a new team of the Artificial Intelligence Lab of the Federal University of Mato Grosso do Sul. The paper introduces our team and our work over the year to develop an autonomous robot qualified for the tasks in this category. We discuss several functions, implemented or to be implemented, such as object and facial recognition, voice recognition and emulation, object manipulation, and navigation.

I. INTRODUCTION

Our research group is starting a new project: building a low-cost domestic robot with all the functions proposed by RoboCup for the RoboCup@Home [1] category. Although this is a new category for our team and we have less than six months of preparation, participating in the Brazilian phase of the RoboCup competition, hosted at the Latin American Robotics Competition (LARC/CBR), is needed to help develop robotics, artificial intelligence and related areas by sharing knowledge and learning from other competitors.

AraraBox is our first attempt at joining this category, created in 2006, where we can apply not only our previous robotics knowledge but also machine learning algorithms, computer vision and natural language processing, the latter being research areas of our laboratory. In this paper we discuss and show the partial results and expectations of projects under development and to be developed. First we give a brief history of our team and workplace, then we explain the features and software included in this project.

II. ARARABOTS TEAM BRIEF BACKGROUND

Created in 2010, the Ararabots team emerged inside our AI research lab, LIA (Laboratório de Inteligência Artificial) [2], hosted at UFMS, as a way to put into practice techniques used inside the lab and to expand our work into robotics. The main goal was to develop new technologies in programming and hardware development. Since 2011 our team has participated in almost every LARC/CBR competition hosted around our country, mostly in categories such as IEEE VSSS and IEEE SEK, achieving third place in both in 2012.

Fig. 1. Trophies acquired at the 2012 LARC/CBR

We continued to participate in VSSS in the following years; in 2018, along with VSSS, we returned to the SEK category and finished 9th and 15th in the two categories. These results show that we try to improve at every competition. This year (2019) we will also compete in RoboCup@Home, trying to use and develop the newest technologies inside our lab.

III. ROBOT FEATURES

This section describes each module of our project and its features and peculiarities. Initially the project was divided into six parts: vision system, navigation, hardware design, voice recognition, voice emulation and object manipulation. Each of them is specified below.

A. AraraBox System

We decided to use Python as the base programming language, since we already have previous knowledge of it and still use it in another category, along with the ROS system [3] [4]. We plan for ROS to manage the resources of every vital part of our project, avoiding deadlocks and transmission problems by using the tools available through the system. It receives camera images for object and face recognition, a point cloud from the Kinect in order to map the environment, and voice commands; it outputs values to move the motors and the gripper accordingly, as well as voice emulation data in response to commands. A minimal sketch of such a central node is given at the end of this subsection.

It is a challenge to continue our goal of developing a low-cost robot; we are looking for the best ways to perform the tasks so that they are cost-effective and work well, helping to make these technologies more accessible.
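To make this layout concrete, the following is a minimal sketch of how such a central node could look in Python with rospy. The topic names (/camera/image_raw, /speech/text, /cmd_vel, /gripper/command) and the trivial command handling are assumptions for illustration only, not the actual AraraBox implementation.

#!/usr/bin/env python
# Hypothetical sketch of a central AraraBox node; topic names and the
# command logic are assumptions, not the team's final implementation.
import rospy
from std_msgs.msg import String
from sensor_msgs.msg import Image
from geometry_msgs.msg import Twist


class AraraBoxNode(object):
    def __init__(self):
        rospy.init_node('ararabox_core')
        # Outputs: base velocity and a simple gripper command.
        self.cmd_pub = rospy.Publisher('/cmd_vel', Twist, queue_size=1)
        self.grip_pub = rospy.Publisher('/gripper/command', String, queue_size=1)
        # Inputs: camera frames and recognized speech.
        rospy.Subscriber('/camera/image_raw', Image, self.on_image)
        rospy.Subscriber('/speech/text', String, self.on_speech)

    def on_image(self, msg):
        # Frames would be forwarded to the vision subsystem here.
        pass

    def on_speech(self, msg):
        # Very small command mapping, just for illustration.
        if 'stop' in msg.data.lower():
            self.cmd_pub.publish(Twist())  # zero velocities = stop


if __name__ == '__main__':
    AraraBoxNode()
    rospy.spin()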

B. Vision System

The vision system uses a high-definition 720p camera to acquire frames for a Python subsystem. The subsystem reads the camera on a separate thread, to keep blocking I/O operations from freezing the rest of the system, and runs a series of processing steps to perform tasks such as object detection and face recognition.
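As an illustration of the threaded capture just described, the sketch below (an assumed implementation, not the exact AraraBox code) keeps only the most recent frame available to the processing pipeline:

import threading
import cv2


class ThreadedCamera(object):
    """Reads frames on a background thread so slow I/O does not block processing."""

    def __init__(self, index=0):
        self.cap = cv2.VideoCapture(index)
        self.frame = None
        self.running = True
        self.lock = threading.Lock()
        threading.Thread(target=self._update, daemon=True).start()

    def _update(self):
        while self.running:
            ok, frame = self.cap.read()
            if ok:
                with self.lock:
                    self.frame = frame

    def read(self):
        # Return a copy of the latest frame, or None before the first capture.
        with self.lock:
            return None if self.frame is None else self.frame.copy()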

1) Object Detection: The object detection task was implemented using the OpenCV [5] DNN module, which can load many pre-trained, deployed deep neural network models and use them for recognition-like tasks.

We selected some classes in addition to the ones present in the @Home rulebook and constructed a labeled dataset to train a neural network model for the task, which is then used in our system as an inference model. The model is an implementation of the RetinaNet object detector created by Facebook Research [7]. The network is divided into two parts: a feature-extractor backbone and two subnetworks that perform object classification and bounding box regression.

Fig. 2. Object detection sample from the RetinaNet implementation [6]
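A minimal inference sketch using the keras-retinanet package [6] is given below. The model path, class names and score threshold are placeholders, and our actual training and conversion pipeline may differ.

import cv2
import numpy as np
from keras_retinanet import models
from keras_retinanet.utils.image import preprocess_image, resize_image

# Placeholder path to a converted inference model trained on our classes.
model = models.load_model('inference_model.h5', backbone_name='resnet50')

image = cv2.imread('sample.jpg')
image = preprocess_image(image)
image, scale = resize_image(image)

boxes, scores, labels = model.predict_on_batch(np.expand_dims(image, axis=0))
boxes /= scale  # map boxes back to the original image size

for box, score, label in zip(boxes[0], scores[0], labels[0]):
    if score < 0.5:  # illustrative confidence threshold
        continue
    print('class %d with score %.2f at %s' % (label, score, box.astype(int)))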

2) Face Recognition: The facial recognition task will be implemented using OpenCV and deep learning. The main goal is to adapt a fast and efficient face detection algorithm so that it can also be used for facial recognition. We train on labeled faces, using many examples of each face plus examples of a generic unknown face, so that the module recognizes each member of our team and labels any other person as unknown.

Fig. 3. Face detection sample from a Jurassic Park frame

Keeping object detection and facial recognition as fully separate pipelines can become costly if some care is not taken. With that in mind, we will implement a way to reduce this cost, so that the two components can communicate and cooperate.
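Returning to the recognition step, one common OpenCV deep-learning recipe, which we take here only as an assumed sketch and not as the finished AraraBox module, is to compute an embedding for each detected face and compare it with stored embeddings of team members, labeling anything beyond a distance threshold as unknown:

import cv2
import numpy as np

# Pre-trained OpenFace embedding model loaded through OpenCV's DNN module;
# the file name is a placeholder for whichever embedding model we adopt.
embedder = cv2.dnn.readNetFromTorch('nn4.small2.v1.t7')


def embed(face_bgr):
    """Return an embedding vector for a cropped face image."""
    blob = cv2.dnn.blobFromImage(face_bgr, 1.0 / 255, (96, 96),
                                 (0, 0, 0), swapRB=True, crop=False)
    embedder.setInput(blob)
    return embedder.forward().flatten()


def identify(face_bgr, known, threshold=0.6):
    """known: dict mapping member name -> stored embedding. Returns a name or 'unknown'."""
    query = embed(face_bgr)
    best, best_dist = 'unknown', threshold
    for name, ref in known.items():
        dist = np.linalg.norm(query - ref)
        if dist < best_dist:
            best, best_dist = name, dist
    return best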
C. Navigation

Navigating a robot in an unknown environment is a well-known problem in the literature, addressed by the family of techniques known as SLAM (Simultaneous Localization and Mapping) [8]. Since ROS is already the nervous system of the robot, it was natural to use it for SLAM as well. ROS offers several ready-made tools that perform this task efficiently; one of them, gmapping, was our first approach. However, the ROS version chosen at the time (Melodic Morenia) did not yet support gmapping, so we moved to another tool, RTAB-Map [9].

Fig. 4. RTAB-Map mapping tool example

RTAB-Map (Real-Time Appearance-Based Mapping) is an RGB-D SLAM approach based on a global loop closure detector with real-time constraints. The loop closure detector uses a bag-of-words approach to determine how likely a new image is to come from a previously visited location or a new one. When a loop closure hypothesis is accepted, a new constraint is added to the map's graph, and a graph optimizer then minimizes the errors in the map. It has proved efficient, given the mapping and localization functions already implemented, and it is fully compatible with our ROS version. To obtain the point cloud used by RTAB-Map we use a Kinect, more specifically the Xbox 360 version.

Fig. 5. The Kinect RGB-D sensor
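For reference, the sketch below shows how other AraraBox nodes could consume the mapping output in Python. The topic names (/rtabmap/grid_map and /camera/depth_registered/points) are common defaults and are assumptions here rather than verified settings of our launch files.

import rospy
from nav_msgs.msg import OccupancyGrid
from sensor_msgs.msg import PointCloud2


def on_map(grid):
    # grid.data is a row-major int8 array: -1 unknown, 0 free, 100 occupied.
    rospy.loginfo('map %dx%d, resolution %.3f m/cell',
                  grid.info.width, grid.info.height, grid.info.resolution)


def on_cloud(cloud):
    rospy.loginfo('point cloud of size %d x %d', cloud.width, cloud.height)


rospy.init_node('map_listener')
rospy.Subscriber('/rtabmap/grid_map', OccupancyGrid, on_map)
rospy.Subscriber('/camera/depth_registered/points', PointCloud2, on_cloud)
rospy.spin()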

D. Hardware Design and Implementation

This section covers the robot hardware; the components are described briefly below. AraraBox has an Arduino Uno equipped with the ATMEGA328P [10] microcontroller, which is responsible for the two BLHM015K-30 [11] (1/50 HP brushless DC, 30:1) motors used to move the robot; the motors have encoders and a BLHD15K controller board. The circuit is supplied by external power from three Makita lithium-ion batteries of 18 V and 3.0 Ah each. The robot also carries a Lenovo ThinkPad notebook with an Intel Core i5 and 8 GB of RAM; an Xbox 360 Kinect as RGB-D sensor; a Samsung Galaxy Tab 4 tablet to emulate a face and give better human-computer interaction; an external microphone to capture voice commands; a speaker to reproduce voice; and a LifeCam Cinema 720p HD camera.

Fig. 6. AraraBox BLHM015K-30 motor and BLHD15K controller

The motors have four controller pins that will be used to control the movement of the robot: start/stop, brake, rotation and speed setting. Their functions are described below:
• start/stop: this signal starts and stops the motor controller.
• brake: when active (ON), the motor completely stops its rotation, acting as a brake.
• rotation: this signal indicates the rotation direction of the motor.
• speed setting: this signal is driven with PWM (Pulse Width Modulation) to control the motor's current speed.
The start/stop, brake and rotation signals are active-low (a level of 0 to 0.5 V indicates the signal is ON, and 4 to 5 V indicates it is OFF).

An Arduino Uno board will be used to send signals to the controller pins. It will receive packets with commands encoded as bytes from the main computer, decode them, and translate each message into the electric signals that control the motors. All board-PC communication will be done over a serial link; a sketch of the PC side is given after Fig. 7.

Since we are not using a pre-assembled base and structure, we made a design that is cheap, simple and functional, with room for improvements. The base assembly is completed and working, and we are building the remaining structure.

Fig. 7. AraraBox WIP base
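The following is a sketch of the PC side of the serial protocol described above. The byte layout (a command identifier followed by one argument byte, e.g. a PWM duty value), the port name and the baud rate are assumptions used for illustration, since the packet format is still being defined.

import serial

# Port name and baud rate are placeholders; the Arduino firmware must match them.
link = serial.Serial('/dev/ttyACM0', baudrate=115200, timeout=1)

CMD_START, CMD_STOP, CMD_BRAKE, CMD_SET_SPEED = 0x01, 0x02, 0x03, 0x04


def send(command, value=0):
    """Send a two-byte packet: command id followed by an 8-bit argument."""
    link.write(bytes([command, value & 0xFF]))


send(CMD_START)
send(CMD_SET_SPEED, 128)  # roughly half speed via PWM
send(CMD_BRAKE)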

E. Voice Recognition

For the speech recognition submodule we intend to use the CMU PocketSphinx recognition engine [12]. If needed, and if possible, we can also use machine learning algorithms applied to Natural Language Processing (NLP). The idea is to split this part into two modules: the first is responsible for translating speech to text using PocketSphinx, and the second is responsible for recognizing the action the robot must take, following the data produced by the first module.
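As an illustration of the two-module split, the sketch below uses the pocketsphinx Python bindings with their default English models. The keywords and the action mapping are placeholders, not the final command grammar.

from pocketsphinx import LiveSpeech

# Module 1: continuous speech-to-text with the default acoustic and language models.
# Module 2: a very small keyword-to-action mapping, standing in for the real action recognizer.
ACTIONS = {
    'follow': 'FOLLOW_OPERATOR',
    'stop': 'STOP',
    'grab': 'GRASP_OBJECT',
}

for phrase in LiveSpeech():
    text = str(phrase).lower()
    for keyword, action in ACTIONS.items():
        if keyword in text:
            print('heard: %s -> action: %s' % (text, action))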

F. Robot Voice Emulation

This part is still being studied; we intend to use a text-to-speech ROS package or the Google text-to-speech API to emulate voice dialogues and produce better human-robot interaction. This task will use a ROS node that translates phrases into voice, answering received voice commands.

G. Robot Manipulator

The robot manipulator will be built from servomotors, stainless steel pieces produced with a CNC machine, and 3D-printed parts. The current design is a robotic arm with 5 degrees of freedom (DOF) and a gripper able to manipulate objects, using techniques that imitate the human arm as closely as possible.

Fig. 8. 3D-printed robot arm, used in some gripping projects with RL in our lab

For gripping tasks the robot can use inverse kinematics [13] algorithms to calculate the joint positions needed to reach a point in space; these target points are computed by the other robot systems. For motion and control we should use an Arduino Uno with a servo motor shield.

IV. CONCLUSION

Using what was described in this TDP, AraraBox, alongside the Ararabots@home members, will try to solve the problems proposed in the RoboCup@Home rulebook for 2019. Development is incremental; as future work we plan to fine-tune what was developed during the year and to start building our own ML algorithms for the robot.

V. ARARABOX HARDWARE AND SOFTWARE DESCRIPTION SUMMARY

The hardware and software used are briefly described in the subsections below. We intend to use algorithms that cover functionalities such as navigation and global task planning and the recognition of voices, objects and faces, together with sensors, control, brushless motors and other techniques and apparatus. We cover both low- and high-level control together with human-robot interaction.

A. Hardware Description

AraraBox has two drive wheels and two free wheels, positioned at the front and rear for balance. The body is still in development; all the definitive electronic parts will be protected and well built to avoid short circuits, and modular so they can be easily handled.
• Base: one Arduino Mega 2560; two BLHM015K-30 (1/50 HP brushless DC 30:1) motors with encoders; one Lenovo ThinkPad notebook with an Intel Core i5 and 8 GB of RAM; three Makita lithium-ion batteries of 18 V and 3.0 Ah each.
• Torso: one analog switch button; one Xbox 360 Kinect; one Microsoft LifeCam Cinema 720p HD.
• Arm: undefined.
• Head: one Samsung Galaxy Tab 4 tablet; one microphone; two speakers.

B. Software Description

Firmware (low-level control) runs on an Arduino Mega board, and the AraraBox software system is based on the Robot Operating System (ROS) together with machine learning techniques.
• Operating system: Debian 9.
• Middleware: ROS Kinetic Kame.
• Localization, navigation and mapping: SLAM; ROS RTAB-Map.
• Object detection: OpenCV library and RetinaNet.
• Speech recognition: undefined.
• Speech generation: undefined.

ACKNOWLEDGMENT

The team gratefully thanks professors Dr. Edson Takashi Matsubara, Dr. Eraldo Rezende and Dr. Glauder Guimarães for the support and the space assigned to this work during the year, and the Faculty of Computing and the Federal University of Mato Grosso do Sul for the opportunity to represent them through Ararabots. We also thank all the students named as authors for the work in progress on this project.

REFERENCES

[1] RoboCup, "RoboCup@Home page," 2019. [Online]. Available: http://www.robocupathome.org/
[2] LIA, "LIA FACOM - homepage," 2018. [Online]. Available: http://lia.facom.ufms.br/ararabots/
[3] ROS, "ROS - Robot Operating System," 2018. [Online]. Available: https://www.ros.org/
[4] M. Quigley, K. Conley, B. Gerkey, J. Faust, T. Foote, J. Leibs, R. Wheeler, and A. Y. Ng, "ROS: an open-source robot operating system," in ICRA Workshop on Open Source Software, vol. 3, Kobe, Japan, 2009, p. 5.
[5] "OpenCV - Open Source Computer Vision," 2000. [Online]. Available: http://www.opencv.org
[6] Fizyr, "Keras RetinaNet," 2019. [Online]. Available: https://github.com/fizyr/keras-retinanet
[7] T. Lin, P. Goyal, R. B. Girshick, K. He, and P. Dollár, "Focal loss for dense object detection," CoRR, vol. abs/1708.02002, 2017. [Online]. Available: http://arxiv.org/abs/1708.02002
[8] R. C. Smith and P. Cheeseman, "On the representation and estimation of spatial uncertainty," The International Journal of Robotics Research, vol. 5, no. 4, pp. 56-68, 1986.
[9] IntRoLab, "RTAB-Map," 2019. [Online]. Available: http://introlab.github.io/rtabmap/
[10] Atmel, "ATmega328/P datasheet complete," 2016. [Online]. Available: http://www.atmel.com/Images
[11] Oriental Motor, "BLHD15K datasheet," 2019. [Online]. Available: https://www.orientalmotor.com/products/pdfs/2012-2013/D/usa_bl_blh.pdf
[12] CMU PocketSphinx, "Open source speech recognition tool," 2019. [Online]. Available: https://cmusphinx.github.io/
[13] A. D'Souza, S. Vijayakumar, and S. Schaal, "Learning inverse kinematics," in Proceedings 2001 IEEE/RSJ International Conference on Intelligent Robots and Systems, vol. 1. IEEE, 2001, pp. 298-303.