
Enhancing geometric maps through environmental interactions

SERGIO S. CACCAMO

Doctoral Thesis
Robotics, Perception and Learning
School of Electrical Engineering and Computer Science
KTH Royal Institute of Technology
SE-100 44 Stockholm, Sweden

TRITA-EECS-AVL-2018:26
ISBN 978-91-7729-720-8

Stockholm, Sweden 2018

Copyright © 2018 by Sergio S. Caccamo except where otherwise stated.

Tryck: Universitetsservice US-AB 2018

Abstract

The deployment of rescue robots in real operations is becoming increasingly common thanks to recent advances in AI technologies and high-performance hardware. Rescue robots can now operate for extended periods of time, cover wider areas and process larger amounts of sensory information, making them considerably more useful in real life-threatening situations, including both natural and man-made disasters.

In this thesis we present the results of our research, which focuses on investigating ways of enhancing visual perception for Unmanned Ground Vehicles (UGVs) through environmental interactions using different sensory systems, such as tactile sensors and wireless receivers. We argue that a geometric representation of the surroundings built upon vision data only may not suffice to overcome challenging scenarios, and show that robot interactions with the environment can provide a rich layer of new information that needs to be suitably represented and merged into the cognitive world model.

Visual perception for mobile ground vehicles is one of the fundamental problems in rescue robotics. Phenomena such as rain, fog, darkness, dust, smoke and fire heavily influence the performance of visual sensors, and often result in highly noisy data, leading to unreliable or incomplete maps. We address this problem through a collection of studies and structure the thesis as follows: Firstly, we give an overview of the Search & Rescue (SAR) robotics field, and discuss scenarios, hardware and related scientific questions. Secondly, we focus on the problems of control and communication. Mobile robots require stable communication with the base station to exchange valuable information. Communication loss often presents a significant mission risk, and disconnected robots are either abandoned or autonomously try to back-trace their way to the base station. We show how non-visual environmental properties (e.g. the WiFi signal distribution) can be efficiently modeled using probabilistic active perception frameworks based on Gaussian Processes, and merged into geometric maps so as to facilitate the SAR mission. We then show how to use tactile perception to enhance mapping. Implicit environmental properties, such as terrain deformability, are analyzed through strategic glances and touches and then mapped into probabilistic models. Lastly, we address the problem of reconstructing objects in the environment. We present a technique for simultaneous 3D reconstruction of static regions and rigidly moving objects in a scene that enables on-the-fly model generation.

Although this thesis focuses mostly on rescue UGVs, the concepts presented can be applied to other mobile platforms that operate under similar circumstances. To make sure that the suggested methods work, we have put effort into the design of user interfaces and their evaluation in user studies.

Sammanfattning

The use of rescue robots in accidents and disasters is becoming increasingly common, thanks to advances in AI and other information technology. These advances allow rescue robots to operate over longer periods of time, cover larger areas and process larger amounts of sensory information than before, which is useful in many different scenarios.

In this thesis we present results on complementing the visual sensors that unmanned ground vehicles (UGVs) often depend on. Examples of complementary sensors are tactile sensors, such as pressure sensors at the end of a robot arm, and measurements of the signal strength of a wireless connection. We show that a geometric representation of the robot's surroundings based only on visual information can be insufficient in difficult missions, and that interaction with the surroundings can provide crucial new information that needs to be represented and integrated into the robot's cognitive world model.

The use of visual information to build a world model has long been one of the central problems in rescue robotics. The conditions of a rescue mission can, however, change quickly, through e.g. rain, fog, darkness, dust, smoke and fire, which drastically degrades the quality of camera images and of the maps the robot builds from them. We analyze these problems in a set of studies, and structure the thesis as follows:

First we present an overview of the rescue robotics field, and discuss scenarios, hardware and related research questions. We then focus on the problems of control and communication. In most applications, the communication link between operator and rescue robot is crucial for carrying out the mission. A lost connection means either that the robot is abandoned, or that it autonomously tries to regain contact by retracing the path it came from. We show how non-visual information about the surroundings, such as WiFi signal strength, can be modeled with probabilistic frameworks such as Gaussian Processes, and incorporated into a geometric map to enable the completion of the mission. We then show how tactile information can be used to improve the world model. Properties such as deformability are examined by touching the surroundings at carefully selected locations, and the information is aggregated into a probabilistic model. Finally, we show how objects in the surroundings can be reconstructed, using a method that works on both static objects and moving non-deformable bodies.

Although this thesis is built around scenarios and needs of rescue robots, most of the methods can also be applied to the many other types of robots that face similar problems in the form of unstructured and changing environments. To ensure that the proposed methods really work, we have furthermore put great emphasis on the choice of interfaces and evaluated these in user studies.

Acknowledgments

Thanks to my supervisors, Dani and Petter. Dani, thank you for your precious advice, which has helped me in countless situations. You have taught me the art of being constructively self-critical, a not-so-obvious skill that has taken me through academic life and helped me become a researcher. Petter, your knowledge and your enthusiasm have been an inspiration in both my professional and my personal life. I have enjoyed our conversations; you have the power of making any subject fun, easing it up even in the toughest of times.

Thanks to all of the RPL professors for their useful insights and for making the lab such a pleasant, fair and stimulating environment. Thank you John for your feedback and assistance in improving this thesis.

My sincere gratitude goes to all of the great people at RPL and CVAP, with whom I have shared amazing trips, fruitful conversations and countless laughs. Fredrik, your generosity and genuineness helped me a lot during these years; I am happy to have shared the office with you.

Thank you to all my colleagues and friends at ETH, CTU, Fraunhofer, DFKI, TNO, TU Delft, University La Sapienza, Ascending Technologies, and the fire departments of Italy (VVF), Dortmund (FDDO) and Rozenburg (GB), for making TRADR such a fun, wonderful and successful experience.

I am grateful to MERL and to all of the wonderful and generous people I have met in Boston, especially Esra and Yuichi for recruiting and welcoming me into their lab in Cambridge.

Thank you dear innebandy fellows for all of those awesome and very much needed games. My friends, thank you all for being there when I needed you most.

My family, Giovanni, Maria and Daniele, and my grandmas: thank you from the bottom of my heart. With your unconditional love you are my strength and my refuge in difficult times.

Finally, I want to express my deepest gratitude to Carlotta. I would never have gotten where I am without your constant support, your advice, your affection, your strength and that wonderful gift you have of always managing to smile and see the good in things, whatever the circumstances. I admire you and I will be forever grateful to you.

I gratefully acknowledge funding under the European Union's Seventh Framework Programme (FP7), under grant agreement FP7-ICT-609763 TRADR.

This thesis is dedicated to my grandpas, Salvo and Vincenzo. I miss you dearly.

Sergio Caccamo
Stockholm, March 2018

List of Papers

The thesis is based on the following papers:

[A] Fredrik Båberg, Sergio Caccamo, Nanja Smets, Mark Neerincx and Petter Ögren. Free Look UGV Teleoperation Control Tested in Game Environment: Enhanced Performance and Reduced Workload. In Proceedings of the 2016 IEEE International Symposium on Safety, Security and Rescue Robotics (SSRR'16), Lausanne, Switzerland, October 2016.

[B] Sergio Caccamo, Ramviyas Parasuraman, Fredrik Båberg and Petter Ögren. Extending a UGV Teleoperation FLC Interface with Wireless Network Connectivity Information. In Proceedings of the 2015 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS'15), Hamburg, Germany, September 2015.

[C] Ramviyas Parasuraman, Sergio Caccamo, Fredrik Båberg, Petter Ögren and Mark Neerincx. A New UGV Teleoperation Interface for Improved Awareness of Network Connectivity and Physical Surroundings. In Journal of Human-Robot Interaction (JHRI), December 2017.

[D] Sergio Caccamo, Ramviyas Parasuraman, Luigi Freda, Mario Gianni and Petter Ögren. RCAMP: Resilient Communication-Aware Motion Planner and Autonomous Repair of Wireless Connectivity in Mobile Robots. In Proceedings of the 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS'17), Vancouver, Canada, September 2017.

[E] Sergio Caccamo, Yasemin Bekiroglu, Carl Henrik Ek and Danica Kragic. Active Exploration Using Gaussian Random Fields and Gaussian Process Implicit Surfaces. In Proceedings of the 2016 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS'16), Daejeon, Korea, October 2016.

[F] Sergio Caccamo, Püren Güler, Hedvig Kjellström and Danica Kragic. Active Perception and Modeling of Deformable Surfaces using Gaussian Processes and Position-based Dynamics. In Proceedings of the 2016 IEEE/RAS International Conference on Humanoid Robots (HUMANOIDS'16), Cancun, Mexico, November 2016.

[G] Sergio Caccamo, Esra Ataer-Cansizoglu and Yuichi Taguchi. Joint 3D Reconstruction of a Static Scene and Moving Objects. In Proceedings of the 2017 International Conference on 3D Vision (3DV'17), Qingdao, China, October 2017.

Other relevant publications not included in the thesis:

[A] Wim Abbeloos, Esra Ataer-Cansizoglu, Sergio Caccamo, Yuichi Taguchi and Yukiyasu Domae. 3D Object Discovery and Modeling Using Single RGB-D Images Containing Multiple Object Instances. In Proceedings of the 2017 International Conference on 3D Vision (3DV'17), Qingdao, China, October 2017.

[B] Wim Abbeloos*, Sergio Caccamo*, E. Ataer-Cansizoglu, Y. Taguchi, C. Feng and Teng-Yok Lee. Detecting and Grouping Identical Objects for Region Proposal and Classification. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR'17), Workshop on Deep Learning for Robotic Vision, Honolulu, Hawaii, July 2017.

[C] Ramviyas Parasuraman, Sergio Caccamo, Petter Ögren and Byung-Cheol Min. An Approach to Retrieve from Communication Loss in Field Robots. In Robotics: Science and Systems (RSS'17), Workshop on Robot Communication in the Wild: Meeting the Challenges of Real-World Systems, Cambridge, Massachusetts, USA, July 2017.

[D] D. Almeida, R. Ambrus, S. Caccamo, X. Chen, S. Cruciani, J. F. Pinto B. De Carvalho, J. Haustein, A. Marzinotto, F. E. Viña B., Y. Karayiannidis, P. Ögren, P. Jensfelt and D. Kragic. Team KTH's Picking Solution for the Amazon Picking Challenge 2016. In the 2017 IEEE International Conference on Robotics and Automation (ICRA'17), Warehouse Picking Automation Workshop 2017: Solutions, Experience, Learnings and Outlook of the Amazon Picking Challenge, Singapore, May 2017.

[E] Fredrik Båberg, Yaquan Wang, Sergio Caccamo and Petter Ögren. Adaptive Object Centered Teleoperation Control of a Mobile Manipulator. In the 2016 IEEE International Conference on Robotics and Automation (ICRA'16), Stockholm, Sweden, May 2016.

* Indicates equal contribution to this work.

List of Acronyms

AP Wireless Access Point; Active Perception
AUV Autonomous Underwater Vehicle
CAMP Communication Aware Motion Planner
CNN Convolutional Neural Network
DARPA Defense Advanced Research Projects Agency
DNN Deep Neural Network
DoA Direction of Arrival (of a radio signal)
FLC Free Look Control
FEM Finite Element Method
GP Gaussian Process
GPR Gaussian Process for Regression
GPIS Gaussian Process Implicit Surface
GRF Gaussian Random Field
HMI Human Machine Interface
IP Interactive Perception
MSM Meshless Shape Matching
OCU Operator Control Unit
PBD Position-based Dynamics
RCAMP Resilient Communication Aware Motion Planner
ROV Remotely Operated Vehicle
RSME Rating Scale Mental Effort
RSS Radio Signal Strength
S&R Search And Rescue
SA Situation Awareness
SAR Search And Rescue
SCE Situated Cognitive Engineering
SLAM Simultaneous Localization And Mapping
TRADR Long-Term Human-Robot Teaming for Disaster Response (EU project)
TC Tank Control
UAV Unmanned Aerial Vehicle
UGV Unmanned Ground Vehicle
USV Unmanned Surface Vehicle
USAR Urban Search And Rescue
UUV Unmanned Underwater or Undersea Vehicle
WMG Wireless Map Generator

Contents

I Introduction

1 Introduction
1.1 Robotic systems that save lives
1.2 Vision in SAR robotics
1.3 Contributions and thesis outline

2 Disaster Robotics
2.1 Overview
2.2 Historical Perspective
2.3 Types of Rescue Robots
2.4 The User's Perspective
2.5 Notable SAR Research Projects, Groups and Challenges
2.6 SAR Problems Addressed in This Thesis
2.6.1 Control
2.6.2 Communication
2.6.3 Human-robot interaction
2.6.4 Mapping

3 Enhancing perception capabilities
3.1 Visual Perception System
3.1.1 RGBD cameras and LiDARs
3.1.2 Limitation of vision sensors
3.2 Benefits of Active and Interactive Perception
3.3 Geometric Representations
3.4 A Probabilistic Approach to Interactive Mapping
3.4.1 Gaussian Random Fields
3.4.2 Gaussian Processes Implicit Surfaces
3.5 Strategies for training and interacting
3.6 Mapping the environment and moving objects


4 Conclusions
4.1 Future Work

5 Summary of Papers
A Free Look UGV Teleoperation Control Tested in Game Environment: Enhanced Performance and Reduced Workload
B Extending a UGV Teleoperation FLC Interface with Wireless Network Connectivity Information
C A New UGV Teleoperation Interface for Improved Awareness of Network Connectivity and Physical Surroundings
D RCAMP: Resilient Communication-Aware Motion Planner and Autonomous Repair of Wireless Connectivity in Mobile Robots
E Active Exploration Using Gaussian Random Fields and Gaussian Process Implicit Surfaces
F Active Perception and Modeling of Deformable Surfaces using Gaussian Processes and Position-based Dynamics
G Joint 3D Reconstruction of a Static Scene and Moving Objects

Bibliography

Part I

Introduction

Chapter 1

Introduction

For most living beings, the ability to sense and understand the surroundings is of imperative importance for both survival and evolution. Perception of the environment is a core concept in the field of robotics too. A robot must be able to sense, process and organize sensory information in order to acquire new knowledge, comprehend the surroundings and represent them. A good perception system allows a robot to refine its own cognitive model and make more accurate behavioral predictions for a specific task or tool.

Computer Vision is the science that studies visual perception for machines. For the most part, computer vision scientists have focused on visual recognition, image enhancement or mapping problems that exploit passive observations of the environment (i.e. pictures or videos). In contrast to a computer, though, a robot is an embodied agent, able to perform actions and interact with the environment.

In this thesis we argue that mobile robotic systems, especially the ones involved in disaster response, should create and use a world representation that is not built merely upon vision data but on a collection of rich sensory signals resulting from the interaction of the robot with the environment. We present a series of studies that revolve around the idea that enhancing a geometric map using non-visual sensor data leads to a more exhaustive world representation that greatly helps in overcoming difficult scenarios.

To achieve this, we will initially give a short introduction to the field of Search and Rescue Robotics, highlighting the open problems and difficulties that arise when using complex robotic systems in disaster scenarios and what solutions we propose to mitigate them. We then discuss in more detail the science and technology at the base of our work, and conclude with a presentation of our scientific contributions.

In this thesis we use the terms disaster robots, rescue robots and SAR (Search and Rescue) robots interchangeably to refer to all kinds of robots designed to assist humans in disaster response efforts, although some may argue that rescue robots should refer solely to those robotic systems used during rescue operations.


1.1 Robotic systems that save lives

The size, complexity and dangerousness of many man-made and natural disasters have motivated the development of more intelligent and robust robotic systems, able to increase the chances of finding and saving victims and to facilitate the post-disaster recovery of the affected areas. The specific nature of the disaster motivates the use of a particular rescue robot over another. Large natural disasters require Unmanned Aerial Vehicles (UAVs) to fly over the affected area and generate high-resolution top-view maps, whereas bomb disposal operations require arm-equipped remotely operated vehicles (ROVs) to manipulate the explosive from a safe distance. Collapsed buildings are usually hard to traverse for wheeled robots, and tracked or legged platforms are largely preferred; floods require Unmanned Surface Vehicles (USVs) to navigate and inspect the disaster zone. A rescue robot can assist a human rescue team by exploring areas difficult to reach, manipulating dangerous substances, providing medical support and removing rubble in what is often a race against time to find survivors.

Figure 1: A squad of unmanned ground and aerial vehicles used in the European Project TRADR.

EM-DAT (the international database of natural disasters) reports that between 1994 and 2013, 6,873 natural disasters were registered worldwide, which claimed 1.34 million lives. From a broader perspective, approximately 218 million people per annum were affected by natural disasters during this period [72]. In 2016, the number of people affected by natural disasters was higher than average, reaching 564.4 million [29]. Rescue robots can greatly mitigate the effects of such catastrophic events by supporting rescue teams in the first hours of the crisis (victims are most likely to be found alive in the first 72 hours) and in the aftermath (days or weeks after the event).

Disasters have an enormous impact not only in terms of loss of human lives but also on the long-term economy of the affected area, which can take up to 30 years to recover, resulting in billions of dollars in economic losses [84]. The Centre for Research on the Epidemiology of Disasters (CRED) reports that in 2016 alone, worldwide disasters caused US$ 154 billion in economic damage [29]. Slightly reducing the initial disaster response time can effectively bring down the overall recovery time of the affected region, saving billions of dollars. This further motivates governments to invest generous resources in the development of technologies to assist in disaster response. In Section 2.5 we describe some of the many international research projects and robotics challenges that have been funded in the past two decades.

1.2 Vision in SAR robotics

Many open issues are still present in the field of SAR robotics, spanning from ensuring an easy and effective control mode, to improving tools for human-robot interaction (HRI), to reducing communication losses and power consumption. UGVs must satisfy a large number of strict requirements to be able to operate in harsh environments. They must be sturdy, water resistant and have a communication interface (either tethered or wireless) to exchange vital information with the base station. Very importantly, they must be equipped with a robust, possibly redundant, visual perception system. This represents a particularly challenging problem in SAR scenarios.

Figure 2: Two examples of problematic environmental conditions that challenge the vision system of mobile robots. In (A) the robot must traverse an obstacle that is occluded by dense smoke. In (B) darkness makes object detection and navigation difficult.

The vast majority of UGVs, autonomous or teleoperated, build their world representation by relying on sensors using visible or infrared light (which we refer to as vision sensors in this thesis). LiDARs (Light Detection And Ranging), RGB cameras, stereo cameras, RGB-D cameras, omnidirectional cameras and infrared cameras are excellent sensors capable of feeding the robotic system with a rich flow of information useful for detecting objects, building maps and localizing the robot (SLAM, Simultaneous Localization and Mapping).

To build its 3D representation of the surroundings, the SAR robot uses its vision system to collect multiple observations and combine them into a consistent world map. Such passive observations are obtained over the duration of one or multiple missions [33]. Sometimes the generated maps are the result of a joint effort of multiple robotic systems exploring different areas of the same disaster scenario [42].

Following the recent trend of the robotics field, SAR robots are becoming more intelligent and reaching higher levels of autonomy. Teleoperation remains by far the most commonly adopted operational modality, but in order to reduce the cognitive workload of the operator and increase his/her Situational Awareness [35], the rescue robot must show a certain level of autonomy and feature high-level functionalities. As an example, an unmanned ground vehicle can autonomously patrol and map a target area, allowing the operator to focus on other aspects of the mission rather than manually controlling the movements of the robot's chassis. Autonomous actions are planned and executed based on the inner world representation that the robot creates from the analysis and understanding of the environment, and are therefore highly dependent on the perception strategy.

Basing the decision-making process of the robot on the analysis of an uncontrolled stream of pictures or video (i.e. passive observations) is a common situation in SAR robotics systems. With passive observations we refer to the process of perceiving the environment without purposely changing the intrinsic or extrinsic camera parameters. In this thesis we highlight two major limitations of this approach.

Firstly, environmental phenomena such as rain, fog, smoke, darkness and dust, which largely influence the performance of many vision sensors, may generate highly noisy data and potentially lead the robot to make wrong assumptions about the nature of the surroundings. These phenomena are frequently observed in all kinds of rescue scenarios and represent a serious problem for many modern vision algorithms. For example, an algorithm that relies on thermovision for victim detection fails in the presence of fire, giving false positives or false negatives. Grey dust, which generally covers large parts of the interior of the rubble, hides object shapes, textures and colors, making object detection and mapping difficult. In Figure 2 we show two of the limitations of visual perception in the presence of smoke and darkness encountered during a field exercise done in the context of the EU project TRADR [62].

Secondly, implicit geometric properties, such as the deformability of the terrain, cannot be deduced from vision sensor data. Neither can non-visual environmental properties such as the distribution of a wireless radio signal, which requires a different sensor system (i.e. wireless receivers) to be perceived.
In our work we use the term geometric representation of the environment to refer to the point cloud representation of the robot's surroundings constructed using a mapping algorithm, such as ICP (Iterative Closest Point) [9] applied to multiple scans [22]. In Section 3.3 we discuss other kinds of world representation.
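To make the registration step concrete, the snippet below sketches how consecutive range scans could be aligned with ICP and fused into a single point-cloud map. It is only a minimal illustration using the Open3D library, not the exact pipeline of [9, 22]; the voxel size, correspondence distance and the use of the previous pose as the initial guess are assumptions made for the example.

```python
import open3d as o3d
import numpy as np

def build_map(scans, voxel=0.05, max_dist=0.2):
    """Incrementally register a list of o3d.geometry.PointCloud scans into one
    map using point-to-plane ICP. Illustrative sketch, not the thesis pipeline."""
    world = scans[0]
    pose = np.eye(4)  # pose of the current scan in the map frame
    for scan in scans[1:]:
        src = scan.voxel_down_sample(voxel)
        tgt = world.voxel_down_sample(voxel)
        for pc in (src, tgt):
            pc.estimate_normals(
                o3d.geometry.KDTreeSearchParamHybrid(radius=4 * voxel, max_nn=30))
        # Align the new scan against the map built so far, seeded with the last pose
        result = o3d.pipelines.registration.registration_icp(
            src, tgt, max_dist, pose,
            o3d.pipelines.registration.TransformationEstimationPointToPlane())
        pose = result.transformation
        world += scan.transform(pose.copy())    # fuse the aligned scan into the map
        world = world.voxel_down_sample(voxel)  # keep the map compact
    return world
```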

1.3 Contributions and thesis outline

This thesis focuses primarily on mobile ground vehicles and the enhancement of their visual perception system. We claim, and will demonstrate, that an enhanced multi-sensory perception system greatly helps in overcoming major open problems of the rescue robotics field. This assertion is the result of a slow and gradual analysis of the field that moves from the study of control methodologies to the development of interactive perception strategies for better communication and navigation. The contributions listed at the end of the thesis target four main open issues related to SAR robotics: control, HRI, communication, and 3D mapping of objects and the environment. The diagram in Figure 3 summarizes the listed contributions with respect to the problem they address.

Figure 3: Conceptual organization of our scientific contributions into four categories.

Paper A. In this work we compare, through a user study, two control modes: the classic TANK control and a computer-gaming-inspired control mode called Free Look Control (FLC). The control of a multi-degree-of-freedom system is a tedious task that heavily affects the mental workload of an operator [106]. The gaming community has put considerable effort into designing control modes that enhance the performance of players on platforms (screen and gamepad) that share many similarities with the classic Operator Control Units (OCUs) of rescue robots. We want to investigate whether the use of FLC, a control mode used in First Person Shooter (FPS) games that decouples camera translations and rotations, can increase mission performance and reduce operator stress. In order to do so, we create several virtually simulated urban search and rescue (USAR) scenarios and design use cases to systematically compare the two control modes in typical USAR tasks such as exploration and navigation. We then collect quantitative and qualitative data from the simulator and from evaluation questionnaires given to volunteers. The results show that FLC has a number of advantages compared to TANK control.

Paper B. Wireless communication represents one of the most problematic aspects of any USAR operation. In spite of its obvious advantages compared to tethered communication, a wireless connection suffers from continuous drops in signal quality that can result in the robot losing contact with the base station. The signal loss interrupts the information exchange between the robot and its operator and can seriously compromise the outcome of the mission. Regardless of its importance, a very small portion of a typical OCU interface is dedicated to representing the quality of the signal, which is often reported merely in terms of the radio signal strength (RSS) perceived at the robot location, as shown in Figure 4.

In this work we address the problem above by extending an FLC operator interface with a system capable of estimating and representing the direction of arrival of a radio signal (DoA). In our approach, the DoA, which represents the RSS gradient at the robot location, is robustly extracted using commercial off-the-shelf directional antennas. This information allows the operator to drive the robot away from areas with low wireless connectivity, reducing the risk of signal drops. The graphical interface that shows the DoA is carefully designed to exploit the peripheral vision of the operator, who can in this way focus solely on the video stream, thereby improving mission performance. A simple sketch of how such a DoA estimate could be formed from directional-antenna readings is given below.
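As an illustration of the idea, and not the exact estimator used in Paper B, the following sketch combines RSS readings from four directional antennas mounted at known headings into a single bearing estimate, treating each reading as a weighted vote along the antenna's pointing direction. The antenna count, mounting angles and the dBm-to-weight conversion are assumptions made for the example.

```python
import numpy as np

def estimate_doa(rss_dbm, antenna_yaw_deg):
    """Estimate the direction of arrival (DoA) of a radio signal, in the robot
    frame, from per-antenna received signal strength (RSS) readings.

    rss_dbm         : list of RSS values in dBm, one per directional antenna
    antenna_yaw_deg : mounting yaw of each antenna, degrees, robot frame
    Returns the DoA angle in degrees and a crude confidence in [0, 1].
    """
    rss = np.asarray(rss_dbm, dtype=float)
    yaw = np.deg2rad(np.asarray(antenna_yaw_deg, dtype=float))

    # Convert dBm to linear power so that stronger antennas dominate the vote.
    weights = 10.0 ** (rss / 10.0)
    vec = np.array([np.sum(weights * np.cos(yaw)),
                    np.sum(weights * np.sin(yaw))])

    doa = np.rad2deg(np.arctan2(vec[1], vec[0]))
    # Confidence: how concentrated the votes are (1 = all power from one side).
    confidence = np.linalg.norm(vec) / np.sum(weights)
    return doa, confidence

# Example: four antennas facing front/left/back/right; the front-left side is strongest.
print(estimate_doa([-52.0, -55.0, -78.0, -80.0], [0.0, 90.0, 180.0, 270.0]))
```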

Figure 4: Classic teleoperation interface showing limited information regarding the robot connec- tivity.

Paper C. The effectiveness of the aforementioned approach, in comparison to the classic representation of the radio signal strength (i.e. an intensity bar), is evaluated in this work through a user study. Participants were asked to operate a real UGV in a simulated USAR scenario and perform tasks such as looking for symbols and exploring the map while avoiding connection loss. The evaluation showed that users were able to significantly increase their performance using the new DoA interface and hardware.

Paper D. Despite the operator's skill in driving the UGV in connection-safe areas, unpredictable events such as hardware failures and stochastic elements in radio signal propagation remain a concrete risk for communication loss. USAR scenarios are characterized by concrete walls and metallic segments that influence the wireless signal propagation, leading to delays and sudden drops in the quality of the signal. It is not uncommon for disconnected USAR robots to be abandoned when this happens; see e.g. the UGV SOLEM, in Figure 5, deployed after the World Trade Center (WTC) terrorist attack in September 2001 [19]. In less dramatic circumstances, different rescue strategies can be adopted to retrieve the disconnected robot. For instance, the platform may attempt to backtrack its way to the base station, or it can be pulled using cables. Both of these solutions are unfruitful if the environment changes, making it impossible for the robot to travel the same path backwards or causing the cable to break (e.g. a fire that propagates into a room, a wall that collapses or a water leakage). Furthermore, the wireless access point can be relocated many times during a mission, thwarting attempts to reestablish connection [61].

In this work we introduce an active perception framework capable of estimating and

Figure 5: (A) The Foster-Miller robot SOLEM that was teleoperated (wirelessly) into the WTC during the 9/11 terrorist attack. (B) View from the robot. (Courtesy of the Center for Robot-Assisted Search and Rescue (CRASAR)).

registering the radio signal distribution onto the geometric map, and propose a Resilient Communication-Aware Motion Planner (RCAMP) that integrates the wireless distribution with a motion planner. RCAMP enables a self-repair strategy by considering the environment, the physical constraints of the robot, the connectivity and the goal position when driving to a connection-safe position in the event of a communication loss. We demonstrate the effectiveness of the planner in a set of simulations in single- or multi-channel communication scenarios.

Paper E. The use of different sensory signals not only permits obtaining a more feature-rich world representation but also improves the geometric map built from visual analysis. In this work we propose an online active perception framework for surface exploration that allows building a compact 3D representation of the environment surrounding a robot. The use of tactile signals enables the robot to refine the geometric map (point cloud) of the environment, which may contain occlusions or incomplete areas. A mobile robot autonomously identifies regions of interest in its proximity and uses its robotic arm, equipped with force sensors, to touch the surface and obtain tactile sensory data. The new information is used to train probabilistic models and reconstruct the missing information in the geometric map. A challenging aspect of any interactive perception framework is the difficulty of defining a proper perceptive strategy, which in our case translates to how, where and, if ever, when to perform the tactile exploration. In SAR operations, reducing the number of physical interactions with the environment may determine the successful outcome of the mission: too many interactions may take too much time and consume too much energy. The novelty of the method is its ability to quickly detect incomplete regions in the point cloud and limit the number of environmental interactions needed, while still ensuring a satisfactory reconstruction of the target area. An example of usage of the proposed system in a real scenario is shown in Figure 2(A), where the robot may activate its mobile arm to investigate the obstacle occluded by the thick smoke and add its estimated shape to the world map. We then proceed to demonstrate how to apply this online probabilistic framework to object detection and terrain classification problems.
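Both the wireless mapping in Paper D and the tactile exploration in Paper E rest on the same probabilistic building block: a Gaussian Process regressor that predicts a quantity of interest (RSS, surface height, deformability) over the map together with its uncertainty, so that the next measurement or touch can be placed where the model is least certain. The sketch below illustrates that loop with scikit-learn; the kernel, its hyperparameters and the toy signal model are assumptions made for the example, not the values used in the papers.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel, ConstantKernel

def simulated_rss(p):
    """Stand-in for a real measurement: RSS (dBm) decaying with distance
    from a hypothetical access point at (2, 3)."""
    d = np.linalg.norm(p - np.array([2.0, 3.0])) + 0.1
    return -40.0 - 20.0 * np.log10(d) + np.random.normal(0.0, 1.0)

# Candidate probe locations on a grid covering the (known) traversable area.
xs, ys = np.meshgrid(np.linspace(0, 10, 25), np.linspace(0, 10, 25))
candidates = np.column_stack([xs.ravel(), ys.ravel()])

kernel = ConstantKernel(1.0) * RBF(length_scale=2.0) + WhiteKernel(noise_level=1.0)
gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True)

# Start from a few measurements taken along the robot's path.
X = np.array([[0.0, 0.0], [1.0, 0.5], [2.0, 1.0]])
y = np.array([simulated_rss(p) for p in X])

for step in range(10):
    gp.fit(X, y)
    mean, std = gp.predict(candidates, return_std=True)
    # Active perception: probe where the predictive uncertainty is largest.
    nxt = candidates[np.argmax(std)]
    X = np.vstack([X, nxt])
    y = np.append(y, simulated_rss(nxt))

mean, std = gp.predict(candidates, return_std=True)
print("max residual uncertainty (dBm):", std.max())
```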

Paper F. The ability to model heterogeneous elastic surfaces is of great importance for both navigation and manipulation. A robot that is able to perceive the deformability distribution of a surface can decide whether or not to traverse a terrain, and whether or not to manipulate objects on a surface. Exploring and modeling heterogeneous elastic surfaces requires multiple interactions with the environment and a complicated selection of physical material parameters, which is commonly done through a set of passive observations using computationally expensive force-based simulators. In this work we present an online probabilistic framework for the estimation of heterogeneous deformability distribution maps that allows a mobile robot to autonomously model the elastic behavior of a portion of the environment from few interactions. This work demonstrates how to extract useful implicit environmental properties and merge them into a geometric map. We show experimental results using a robotic arm equipped with tactile sensors investigating different deformable surfaces.

Paper G. Strategies of interactive perception are not only useful for map enhancement but also for object reconstruction. Mapping and object reconstruction methods commonly coexist in two separate pipelines or processes. The robot explores the area and collects observations that are merged together to create the geometric map. Object discovery and reconstruction is usually done offline after the map has been created. Following this approach, objects of interest may appear as part of the static world map, and an offline pruning operation has to be done in order to separate them and clean the map. Even more problematic is the ghost effect created by undetected moving objects. In this work we propose a technique for simultaneous 3D reconstruction of static regions and rigidly moving objects in a scene. The system is based on a multi-group RANSAC-based registration approach [39] that separates input features into two maps (an object map and a static map) which we grow separately. The system is designed to overcome situations where large scenes are dominated by static regions, making object tracking more difficult, and where moving objects have a larger pose variation between frames compared to the static regions. The approach allows simultaneous mapping and on-the-fly object model generation, enabling an arm-equipped UGV to create models of objects of interest (e.g. samples, valves, etc.) through physical interaction while mapping the environment. A minimal sketch of the static/moving feature separation idea follows below.
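To give a flavor of the multi-group registration idea in Paper G, in simplified form rather than as the actual system, the sketch below uses RANSAC with a rigid Kabsch fit to explain matched 3D feature pairs between two frames: inliers of the dominant transform are treated as the static map, and the remaining matches are fit again and treated as one rigidly moving object. Thresholds and iteration counts are illustrative placeholders.

```python
import numpy as np

def kabsch(P, Q):
    """Least-squares rigid transform (R, t) mapping points P onto Q."""
    cp, cq = P.mean(axis=0), Q.mean(axis=0)
    H = (P - cp).T @ (Q - cq)
    U, _, Vt = np.linalg.svd(H)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    R = Vt.T @ D @ U.T
    return R, cq - R @ cp

def ransac_rigid(P, Q, iters=200, thresh=0.03, rng=np.random.default_rng(0)):
    """RANSAC over matched point pairs; returns the inlier mask of the best model."""
    best = np.zeros(len(P), dtype=bool)
    for _ in range(iters):
        idx = rng.choice(len(P), size=3, replace=False)
        R, t = kabsch(P[idx], Q[idx])
        err = np.linalg.norm((P @ R.T + t) - Q, axis=1)
        inliers = err < thresh
        if inliers.sum() > best.sum():
            best = inliers
    return best

def split_static_and_moving(prev_pts, curr_pts):
    """Separate matched features into a static group and one rigidly moving group."""
    static = ransac_rigid(prev_pts, curr_pts)           # dominant motion = static scene
    rest = ~static
    moving = np.zeros_like(static)
    if rest.sum() >= 3:                                  # fit a second rigid motion
        sub = ransac_rigid(prev_pts[rest], curr_pts[rest])
        moving[np.flatnonzero(rest)[sub]] = True
    return static, moving
```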

Chapter 2

Disaster Robotics

In this chapter we provide a brief overview of the field of Search & Rescue robotics. Several surveys and books have been written that cover the topic extensively [74, 84, 85, 107]. It is outside the scope of this thesis to provide a similar level of detail; instead we aim to give the reader a basic understanding of the scenarios, hardware and challenges involved in this broad and fascinating scientific field that strongly motivated our research efforts. We will put particular emphasis on the analysis of ground vehicles, although some of the technologies introduced in this thesis can be applied to other robotic systems too.

Figure 1: (A) A Thermite RS1-T3 Robotic Fire Fighter used during USAR operations (Courtesy of Howe & Howe Technologies). (B) Italian Rescuers (Vigili del Fuoco - VVF). (C) An Italian canine rescue unit.

2.1 Overview

Search & Rescue robotics is still a fairly new area of research, which has made significant progress since its first steps in the late '90s. A common misconception, likely enforced by modern sci-fi pop culture, pictures the rescue robot as a fully autonomous multi-functional system, capable of substituting human

and canine rescue teams in any circumstance. The reality is quite different. A disaster robot is not meant to replace human or canine rescuers but to complement their set of skills and help them deal with complex and perilous scenarios. Rescue robots do not tire; they can operate uninterruptedly for a long time, bearing high heat, radiation and toxic atmospheres, and operating in areas difficult to reach for humans or dogs. Most importantly, they are expendable. The range of applications where rescue robots can be used is very broad. Typically, disaster scenarios can be subsumed under two broad categories:

Natural disasters such as earthquakes, tsunamis, hurricanes or floods are major and devastating consequences of natural planetary processes. The impact of climate change and irresponsible urbanization is alarmingly amplifying the magnitude of these catastrophic events [55], making the intervention of human rescue teams more difficult and sometimes prohibitive. In 2017 alone, three major hurricanes (Harvey, Irma and Maria in the Southern US and the Caribbean), over 1500 earthquakes above magnitude 5.0 (Italy, Iraq, Mexico, Costa Rica, etc.), and multiple floods and landslides (Philippines, Sierra Leone, Colombia, Bangladesh, etc.) caused billions of dollars in damages, millions of evacuees and tens of thousands of deaths.

Man-made disasters, contrarily to natural disasters, usually occur on a smaller scale. Typical examples are terrorist attacks, mining accidents, bomb disarming operations, leakages of toxic or radioactive substances, explosions or any other kind of industrial accident. Man-made disasters are more common in urban areas, which led to the creation of the acronym USAR (Urban Search & Rescue). Collapsed structures increase the complexity of the environment, making human or canine intervention arduous (Figure 1(B-C)). In USAR scenarios, the presence of nearby and potentially live power lines facilitates the set-up of a base camp, which is essential for prompt logistics. Figure 2 illustrates an example of a German USAR simulation area where firefighters recreate a small-scale man-made disaster (an industrial accident) to be used for rescue robotics research.

Figure 2: Example of an urban industrial scenario used for training (an old furnace in Dortmund, Germany). The complexity of the structure makes mapping difficult, and the metallic components of the old furnace attenuate the wireless signal propagation.

2.2 Historical Perspective

In the '80s the scientific robotics community moved its focus towards the study of more complex systems, capable of performing intelligent tasks in seemingly unconstrained environments. This new paradigm extends the classic definition of the robot, designed to help humans with the so-called dirty, dull and dangerous jobs. The possibility of having robotic systems capable of working outside the ad-hoc designed assembly line of a factory broadened the spectrum of applications the robot could be used in. In the same years, researchers started considering the use of mobile platforms for emergency situations, including remote surveillance and firefighting. One of the first recorded cases of robots used for disaster response dates back to 1979, when three robots built by researchers at Carnegie Mellon University (CMU), led by Professor William L. Whittaker, helped to clean up and survey a meltdown at the Three Mile Island (TMI) nuclear power plant in Pennsylvania. The robots built at CMU required multiple operators; they were big, heavy, and outfitted with lights, cameras and tools for handling radioactive materials. Later models kept this distinctive bulky design, which made them unable to explore small voids and navigate the rubble without the risk of causing further collapse [84]. It was only in 1995 that research into search and rescue robotics started at Kobe University in Japan. Experience from successive real deployments [70] made it clear that robots needed to be redesigned to avoid aggravating the situation, causing progressive collapse of unstable structures or triggering explosions in gas-saturated spaces. The ideal rescue robot needed to be smaller, easier to operate, more agile and have a certain degree of autonomy [81]. The first time a SAR robot was used in a real operation was in 2001, in response to the terrorist attack at the World Trade Center in NY, US. Two civilian planes were crashed into the North and South towers, causing severe structural damage to the buildings. Within two hours, the floors of the towers began to pancake down on one another, creating extremely difficult operational conditions for the rescuers and killing thousands. A team of researchers from the Center for Robot-Assisted Search and Rescue (CRASAR), led by Professor Robin Murphy, deployed ground robots to assist rescuers in detecting survivors. Although no survivors were found, the mission was considered a success, as the robots extended the senses of the responders far beyond the reach of classic search cameras and managed to find 10 sets of human remains [19].

“The effectiveness of robots for man-made disaster stems from their potential to extend the senses of the responders into the interior of the rubble.” [84].

The evolution of SAR robots touches not only the mechanical design of the robot and its sensors, but also its operational architecture. Initial deployments of rescue robots involved one or multiple operators piloting a single UGV in order to gather information from a safe distance. A team leader would assign one or multiple micro-tasks (e.g. collect a radioactive sample) to team members. As of 2018, disaster robotics is following the modern trend of robotics in moving from a task-specific perspective to a human-centric perspective, where one or multiple robots collaborate with human teams to gradually develop their understanding of the disaster scenario [60].

Figure 3: UGVs and UAVs inspect an old church in Amatrice (Italy), severely damaged by the magnitude 6.2 earthquake in 2016.

An example of real use of this operational architecture is the recent deployment of rescue UGVs and UAVs in August 2016, when a strong earthquake of magnitude 6.2 struck the Italian town of Amatrice, causing severe damage to the old town and the death of 299 people. Two ground robots and an aerial vehicle were sent to assist the Italian Vigili del Fuoco (VVF), generating 3D models of the Basilica of Saint Francis of Assisi and the Church of Sant'Agostino [61]. The 3D models were then used to assess the structural stability of the old buildings, and several cracks were found. During the mission the robots suffered from communication issues: the access point, used to exchange information between the base station and the robots, had to be relocated multiple times around the church to ensure good connectivity due to the thick stone walls of the structure.

This more intricate conceptualization is possible only thanks to recent advances in communication technologies and AI, and to the increased computational power which allows robots to perform more complex tasks in an unsupervised or semi-supervised fashion. The advances in AI raise numerous moral questions and concerns [31], naturally contextualized in SAR robotics. The ethical aspect is in fact becoming an important aspect of this field. In this regard, Winfield et al. [119] investigated the consequences of introducing ethical action selection mechanisms in robots. In their experiments, the authors used hockey-puck-sized mobile robots, two of them representing humans in danger (the robots were dangerously wandering on a surface filled with holes) and one being the rescue robot (the ethical machine) in charge of saving them. The rescuer could perceive the environment (e.g. the position of the holes) and predict the actions of the other dynamic actors; the robot was also forced to follow a minimalist ethical rule analogous to Asimov's first law. After many runs, the experiments showed that the rescue robot often successfully managed to save one or both human agents. This oversimplified cognitive design, though, made the robot wander in confusion 50% of the time when both robots were simultaneously in danger, leading to the questions: "Who should the robot prioritize? And in what circumstances?"

2.3 Types of Rescue Robots

Several taxonomies have been proposed to categorize disaster robots. It is possible to classify rescue robots according to their size (e.g. packable robots), operational mode (e.g. aerial robots) or the specifics of the task (e.g. explorers). A widely accepted nomenclature distinguishes between: Unmanned Ground Vehicles (UGVs), Unmanned Aerial Vehicles (UAVs), Unmanned Undersea or Underwater Vehicles (UUVs) and Unmanned Surface Vehicles (USVs). Sometimes UUVs are also called Autonomous Underwater Vehicles (AUVs). Subclasses exist within each category; for instance, underwater gliders are particular kinds of UUVs that change their buoyancy to navigate the water, and Remotely Operated Vehicles (ROVs) are the teleoperated-only subclass of UGVs (not to be confused with the Remotely Operated Underwater Vehicle, ROUV, sometimes also called ROV). Inspired by nature, rescue robots can have peculiar shapes that facilitate specific tasks, such as the snake robot [120] in Figure 4, capable of infiltrating rubble and providing an immediate video feed to the operator.

Figure 4: CMU Snake robot sent to assist rescuers days after a 7.1-magnitude earthquake struck Mexico City in 2017. (Courtesy of CMU Robotics, photo by Evan Ackerman)

Unmanned Ground Vehicles (UGVs) are commonly used to navigate the environment in search of victims or hazardous goods. They can have different sizes, be packable or very large, and are used both in the first intervention phase (e.g. extinguishing a fire, searching for survivors, collecting samples) and in the post-disaster phase (e.g. structural inspections or recovery of a victim's body). A vision sensor, such as a thermal camera, a laser scanner or an RGB camera, is always present on board to allow the robot to navigate the environment and provide visual feedback to the operator. UGVs can be autonomous or teleoperated, work alone or in squads [8], and their operational time can extend over single or multiple missions [62]. The connection with the users, be it tethered or wireless, is often a problematic aspect of any UGV rescue operation, as we discuss in papers B, C and D. A large number of UGVs, such as the one in Figure 5, are equipped with a robotic manipulator that allows them to perform physical tasks such as collecting samples, closing valves or removing debris.

Figure 5: A PackBot UGV inspects a barrel containing dangerous chemicals. (Courtesy of Endeavor Robotics, formerly known as the Defense & Security unit at iRobot)

Unmanned Aerial Vehicles (UAVs) are revolutionizing SAR operational strategies by providing rescue units with the possibility to consult updated top-view maps of the hot zone. The operator can manually set waypoints or specify a geographical region that the unmanned aerial vehicle, such as the one in Figure 6, will visit and map. UAVs can also be operated close to unstable structures so as to obtain a close look at areas that are otherwise difficult to access. Larger categories of UAVs are used to study severe meteorological phenomena such as hurricanes. Their sophisticated sensory systems collect atmospheric data which permits generating reliable mathematical models that provide precious hints about the future trajectory and severity of the storm. This technology enables authorities to prepare for large-scale evacuations or find the best option for disaster response.

Figure 6: AscTec Falcon UAV used in the EU project TRADR.

Unmanned Underwater Vehicles (UUVs) are able to substantially speed up underwater search operations. Their operational time and reach are considerably higher than those of a scuba diver [121]. Visibility in turbid water and underwater darkness remain among the major challenges for this kind of rescue robot. Remotely operated UUVs have been enormously useful in containing environmental disasters, such as the 2010 oil spill in the Gulf of Mexico [7]. They were used to turn valves and connect pipes in extremely deep waters where no scuba diver could ever venture.

Figure 7: A Double Eagle SAROV AUV. (Courtesy of SAAB)

Unmanned Surface Vehicles (USVs) have an important role in assisting rescue teams in locations situated in proximity to water and during floods. They are particularly useful in the inspection of damaged bridges or piers, as they allow structural engineers to quickly obtain a detailed look at the affected structure without endangering human rescuers [82]. Their use can, however, be difficult in the presence of swift waters.

2.4 The User’s Perspective

The cognitive workload of the operator defines the mental effort needed to operate and supervise the UGVs during the lifetime of the mission [106]. Studies have shown that a good portion of the rescue mission is used by the operator to navigate the environment [14]. The design of proper USAR HRI strategies [102], including interfaces and control systems, allows an operator to focus more on the resolution of the specific situation rather than on the robot itself. Reduction of the operator's cognitive workload can be achieved in different ways and can reach different levels of complexity. Low-level control of the robot (e.g. direct control of the manipulator or tracks) can be improved by introducing new control paradigms. Controlling a multi-degree-of-freedom system is in fact a complex task which requires dexterity and concentration [53], [102]. In response, the authors of [128] propose to use a reinforcement learning framework to train a system to autonomously control parts of a UGV, reducing its teleoperation to mere directional control. A relatively new trend in SAR robotics is to use computer game control modes adapted to SAR scenarios [75], [26]. This approach brings several benefits, such as allowing an operator to become familiar with the robot's movements faster. Gaming control modes are in fact designed for the general public and are less cumbersome to use compared to classic control modes, as we discuss in paper A.

The design of proper user interfaces (UIs) facilitates control of the UGV and simplifies the way information is presented to the user. In papers B and C we put great emphasis on the user evaluation and on the design of an interface that conveys readable information to the robot operator.

Figure 8: Teleoperation in line of sight (LOS) with a Thermite RS1-T3 Robotic Fire Fighter during a USAR operation. (Courtesy of Howe & Howe Technologies).

High-level functionalities, such as automatic victim detection or autonomous patrolling and mapping of target areas, lower the number of simultaneous tasks the user needs to perform and therefore his/her cognitive workload. Full autonomy is the ultimate condition for such a system: it enables an operator to focus entirely on the mission rather than on the robot. The increasing level of autonomy brings new challenges and ethical questions to the table, for instance the design of a proper AI that, instead of using a first-come, first-served rule, makes decisions based on the specifics of the scenario in order to prioritize the victims to be saved.

2.5 Notable SAR Research Projects, Groups and Challenges

A large variety of research projects have been funded in past years, focusing on different aspects of rescue robotics. A notable example is the EU project TRADR (Long-Term Human-Robot Teaming for Disaster Response) [62]. Started in 2014 as a continuation of the EU project NIFTi, TRADR aims to develop science and technologies for teams of humans and robots that collaborate over multiple sorties in urban rescue scenarios. Missions in TRADR have a time span of days or weeks and reach a high level of complexity. The novelty of this project is its human-oriented perspective, where teams of robots (UAVs and UGVs) jointly build interactive maps that can be easily consulted by human rescue teams.

Similarly, the EU project ICARUS [50] focuses on building tools for the detection and localization of humans in disaster scenarios, aiming to fill the gap between technologies developed in research labs and the usage of such technologies during emergency situations and crises. The project SHERPA [71], contrarily to the previous ones, focuses mostly on UAVs in alpine rescue scenarios. In SHERPA the human and the robot work in the same team, exploiting each other's individual features and capabilities towards a common goal. Different in its concept is the project CENTAURO [20], which aims to develop a centaur-like robot teleoperated by a human operator. This human-robot symbiotic system should be able to perform dexterous manipulation (e.g. closing valves) in austere conditions.

A historical and important contributor to the development of SAR technologies in the past decades is the Center for Robot-Assisted Search and Rescue (CRASAR) [21] in Texas, US. CRASAR has been actively supporting international rescue teams in a multitude of real operations, which include mining accidents, terrorist attacks, hurricanes and floods. CRASAR was also part of the Rescue Robot for Research and Response Program (R4), sponsored by the National Science Foundation's Computer and Information Science and Engineering (CISE) directorate, which aimed to facilitate research in SAR robotics by making expensive equipment and realistic datasets available to researchers [83].

Other notable projects linked to SAR robotics are GUARDIANS [45], SmokeBot [105] and COCORO [23]. In 2002, the Ministry of Education, Culture, Sports, Science and Technology in Japan promoted the Special Project for Earthquake Disaster Mitigation in Urban Areas, which aimed to significantly mitigate the impact that a strong earthquake could have on big cities, such as the Tokyo metropolitan area.

The well-known DARPA Robotics Challenge (DRC) [27] was an international robotics challenge that aimed to accelerate the R&D of robots that could help humans in disaster response missions. It was launched after the accident at the Fukushima Daiichi nuclear plant in 2011, as a wake-up call for the robotics community. Robots scored points by performing an intricate sequence of tasks in a simulated industrial disaster scenario. Another important SAR robotics challenge is the ERL (European Robotics League) Emergency Robots, previously known as euRathlon [36]. This challenge was also inspired by the Fukushima accident in 2011 and focuses mostly on cooperative teams of UAVs, UGVs and UUVs that survey the simulation area, collect data and identify critical hazards. Finally, the RoboCup Rescue League [99] is one of the oldest rescue robot challenges, and aims to provide researchers with standardized environments and benchmarks for urban rescue robots.
It was created in the aftermath of the disastrous Kobe earthquake that hit Japan in 1995 and consists of two main subcategories: the Rescue Robot League, for mobile robots that search and map a physically simulated rescue environment, and the Rescue Simulation League, which tackles the problem of multi-robot collaboration in large, virtually simulated disaster scenarios. There is still a need for proper benchmark protocols in this field, but this issue will not be addressed in this thesis.

2.6 SAR Problems Addressed in This Thesis

The lack of proper evaluation methodologies and the limited number of real deployments have made it difficult to identify the foremost causes of failure in SAR missions. The authors of [19, 61, 107] highlighted some of the open issues encountered during real deployments of robots in rescue operations.

Rescue robots in general are very complex systems that require careful design [81]. The spectrum of problems that follows the creation and adoption of SAR robots in real applications is very broad and touches many interconnected fields. For instance, from the mechanical point of view, the robot must ensure great reliability and be sturdy enough to withstand adverse environmental conditions and abrasive substances. At the same time, the robot must be sufficiently small and agile to improve its mobility in constrained areas and small voids in the rubble. Depending on the mission requirements, the robot should also be water resistant and shielded from radiation or very high temperatures. Its sensory system must be durable and able to face extreme weather conditions, extreme temperatures, water, dust and dirt.

Figure 9: (A) USAR training facility in Rotterdam, NL. (B) OCUs for multi-robot USAR missions. (C) Robots collaborating in a simulated scenario.

In this thesis we tackle mainly the problems of Control, Communication, HRI and Mapping. Compelling discussions of other fundamental problems can be found in [81], [85].

2.6.1 Control

The teleoperation strategy of a rescue robot has deep implications for the outcome of the SAR mission. The design of a proper control mode, which is frequently dictated by the mission requirements and the robot's mechanical complexity (e.g. tracked versus wheeled platforms, or the presence of a manipulator), must ensure reactive maneuverability of the UGV with minimal cognitive effort for the operator. This means that when the mission demands rapid intervention and maneuverability in confined areas, the operator must be able to intuitively perform the correct sequence of commands that leads the robot to overcome the situation. The effectiveness of the control layer relies on a closed-loop architecture that includes the robot's sensory system and actuators, the communication means and the operator control unit (OCU). Teleoperation is also heavily affected by other environmental phenomena, such as poor wireless connectivity, which causes strong delays in the video feed, hampering the maneuverability of the robot and stressing the operator [122].

2.6.2 Communication

The robot Solem, depicted in Figure 5, is a good example that shows how important it is to ensure good connectivity during a SAR mission. The UGV lost wireless communication while exploring the dense rubble of the WTC terrorist attack. The communication loss interrupted the video feed, preventing any possibility of remote control. Rescuers unsuccessfully tried to retrieve the robot from the rubble using a safety rope, which broke. A more recent example of the importance of communication occurred during the inspection of the Fukushima Daiichi nuclear disaster in 2011. In order to protect the nuclear power plant personnel from radiation, the building was shielded with thick walls, which made it difficult to guarantee robust propagation of radio waves for transmission. The problem of communication was then extensively examined by a team of engineers from TEPCO (Tokyo Electric Power Company Holdings), which evaluated the feasibility of using wireless connectivity for their Quince rescue robot [87]. In the experiments, 2.4 GHz transmitters and receivers were placed at different locations in order to assess the signal quality. The results of the studies suggested the adoption of a hybrid tethered/wireless communication architecture. Nevertheless, the communication cable broke during one of the missions, and the robot Quince was abandoned on the third floor of the Fukushima reactor building [123]. In many cases, the use of wireless mesh networks [100] for multi-robot teleoperation is considered a promising direction for ensuring reliable remote control in GPS- and wireless-denied environments. In these architectures the robots behave as mobile beacons and extend the signal propagation into poorly covered areas. Continuous relocation of the access point represents another important communication-related issue, as experienced in the USAR robot deployment in Amatrice (IT) described in Section 2.2.

2.6.3 Human-robot interaction

Human-robot interaction (HRI) in SAR robotics aims to reduce the workload and stress the robot operator is subjected to during a mission [102]. The strict time constraints, the difficulty of operating the robot, the pressure of the life-saving urgency and the inherently chaotic character of the hot zone heavily impact the performance of the operator, who is likely to make poor choices and errors. A proper user interface (UI) must coherently and intuitively show world information to the operator. Some of the robots brought by CRASAR to the WTC were not used because their interfaces were too complex [19]. The UI should be easy to learn, so as to shorten operator training time and facilitate immediate deployment of the UGV. A TEPCO spokesman reported that it took several weeks for crews to learn how to operate the complex devices, and this delayed the mission start considerably [109]. The design of user interfaces for single- or multi-robot operation, such as the one shown in Figure 10(C), and the study of user-robot interactions are therefore important problems that we consider and discuss in our work.

Figure 10: (A) TRADR UGVs patrolling and mapping an industrial disaster zone. (B) TRADR UAV providing an indoor third-person view to the UGV. (C) Collaborative mapping and patrolling interface for human supervision.

2.6.4 Mapping

The complexity of a typical SAR scenario raises numerous questions regarding the effectiveness of common mapping algorithms. Highly complex structures such as the ones shown in Figure 9(A) or in Figure 2 require careful navigation and high-resolution mapping, which is achieved through continuous scanning using high-performance sensory systems such as LiDAR. Mapping unstructured environments can be even harder, and proper mapping strategies must be adopted. Several solutions have been proposed to speed up the mapping process during SAR operations, such as collaborative mapping [77] (shown in Figure 10(A)) or aerial mapping [78], [40]. The problem of enriching geometric maps by integrating other sensor data, which we cover in this thesis, tackles mapping from a transversal perspective, as we will discuss in the next chapter.

Chapter 3

Enhancing perception capabilities

3.1 Visual Perception System

Most of the robots designed to interact with the environment are equipped with a vision system capable of generating a descriptive representation of the surroundings. The need for advanced visual perception systems became imperative when robots started moving away from the controlled environment of an industrial production line. As of 2018, vision remains by far the most studied and effective method of perception for a robot that needs to map and investigate the world. More recently, researchers have started investigating ways of enhancing vision through different sensory signals. For instance, sound has been demonstrated to enhance visual perception [115] in humans as well as in robots [114]. The authors of [104] proposed a method to enhance object recognition by letting a robot observe the changes in its proprioceptive and auditory sensory streams while performing five exploratory behaviors (lifting, shaking, dropping, crushing and pushing) on several different objects.

3.1.1 RGBD cameras and LiDARs

A sensor capable of estimating the true 3D geometry of the environment opens the door to a variety of robot applications such as navigation, mapping and object reconstruction. Laser scanners have been the de-facto standard solution for 3D sensing in mobile robots over the last decades. A LiDAR sensor is generally composed of three main parts: a laser, a photodetector and optics [57]. The sensor emits a pulsed laser beam directed towards a moving mirror that steers the beam across the scene. Measurements of the laser return time and wavelength permit the sensor to calculate the distance of the hit points [116] and consequently obtain a point cloud representation of the object. Modern LiDARs can reach sub-millimeter precision at long range, making them widely used for building high-resolution maps. They are, however, rather expensive and do not capture the colors of the reconstructed surfaces. Their high scanning frequency (30-100 Hz) and the density of the output data (30,000-180,000 points/scan) make them ideal for modern autonomous driving applications [126], [65], [13].

The increasing availability of cheap time-of-flight (ToF) and structured-light cameras [125] in recent years has made it possible to use 3D sensors in virtually any robotic application. Initially developed as a gaming interface, the Kinect [59] is one of the most successful examples. A structured-light camera uses an infrared (IR) projector to produce a pattern of light (usually parallel stripes) that is projected onto a 3D surface, which distorts the pattern depending on the geometry of the scene and the viewing perspective. A monochrome CMOS sensor, conveniently placed to the side of the projector, analyzes the distortions of the light pattern and uses geometric feature matching to compute the point cloud representation of the object's surface or landscape. This method is very versatile and fast, and permits capturing the color of the 3D surfaces if an RGB camera pointing at the same scene is properly integrated. The accuracy of Kinect-like cameras for indoor mapping applications is evaluated in [56].
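To make the relation between a depth image and its point cloud representation concrete, the following minimal Python sketch back-projects a depth image through a pinhole camera model. The intrinsic parameters (fx, fy, cx, cy) and the depth scale are illustrative placeholders, not values tied to any specific sensor discussed in this thesis.

```python
import numpy as np

def depth_to_point_cloud(depth, fx, fy, cx, cy, depth_scale=0.001):
    """Back-project a depth image (H x W, raw sensor units) into an N x 3 point cloud.

    Assumes an ideal pinhole model: X = (u - cx) * Z / fx, Y = (v - cy) * Z / fy.
    Pixels with zero depth (no return) are discarded.
    """
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth.astype(np.float64) * depth_scale           # convert raw units to meters
    valid = z > 0
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.stack([x[valid], y[valid], z[valid]], axis=-1)

# Example with synthetic data: a flat surface 1 m away, VGA resolution,
# and Kinect-like (purely illustrative) intrinsics.
depth = np.full((480, 640), 1000, dtype=np.uint16)       # 1000 mm everywhere
cloud = depth_to_point_cloud(depth, fx=525.0, fy=525.0, cx=319.5, cy=239.5)
print(cloud.shape)                                        # (307200, 3)
```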

3.1.2 Limitations of vision sensors

Sensors that rely on light propagation and reflection to compute the underlying 3D surface of a scene are affected by several phenomena that hamper their ability to extrapolate distances. Figure 1 illustrates some of the most common illumination problems that affect the performance of structured-light 3D sensors. Volumetric scattering is generally caused by phenomena such as thin fog or smoke. The scattered rays produce shafts of light and volumetric shadows cast from geometric objects in the scene. These effects are observed by the sensor and generate artifacts in the 3D reconstruction. Sub-surface scattering, a result of the characteristics of the material, is another factor that yields systematic distortions in the depth map. Translucent objects severely degrade vision sensor performance, as observed by the authors of [66], who proposed a novel 3D reconstruction method based on high dynamic range imaging techniques [97], [30]. The same problem was studied by the authors of [103], who developed a distance representation that captures the depth distortion induced by translucency for time-of-flight cameras. To deal with errors caused by global illumination, researchers in [46] proposed novel resilient structured-light patterns that use simple logical operations and tools from combinatorial mathematics. Reflective surfaces, such as mirror-type objects, are particularly difficult to detect and reconstruct, as they do not possess their own appearance and the reflections from the environment are view-dependent. The authors of [112] proposed a technique for recovering small- to medium-scale mirror-type objects. The technique adopts a novel ray coding scheme that uses a two-layer liquid crystal display (LCD) setup to encode the illumination directions. Differently, the authors of [67] propose a way to reconstruct specular or transparent objects by using a frequency-based 3D reconstruction approach, which incorporates the frequency-based matting method [127]. The same problem was investigated in [47] and [95]. Other phenomena, such as vignetting (radial falloff) and exposure (gain) variations caused by the RGB camera, contribute to limiting the quality of the 3D reconstruction, and their removal was investigated in [43] and [2].

Figure 1: Unexpected illumination is a source of error for structured-light cameras.

In this thesis we take a transversal approach to deal with the limitations of these vision systems and introduce active and interactive perception frameworks to investigate the regions of the scene affected by distortion and occlusion.

3.2 Benefits of Active and Interactive Perception

Computer vision applied to disaster robotics mostly focuses on processing and interpreting static images or video streams to feed the cognitive model of the robot. A cognitive model, as defined in behavioral science and neuroscience, describes how a person's thoughts and perceptions influence his or her decision making process and, consequently, his or her life [92]. Humans and animals, however, build their understanding of the world through non-static observations of their surroundings. Researchers in [37] show compelling evidence that the exploratory process at the very base of the perception mechanisms of all living organisms, including the simplest ones, generates a rich, informative set of sensory signals fundamental to the development of a functional and responsive cognitive model. Robots can mimic these mechanisms and use their body to physically interact with the environment in order to generate a more insightful collection of visual sensory signals. Interactive Perception (IP) is the field of robotics that studies ways of purposely interacting with (and possibly modifying) the environment in order to gain a better understanding of the robot's surroundings. The difference between Active Perception (AP) [5] and IP is that AP can be seen as a special case of IP that aims to change only the sensor intrinsics and extrinsics, without any physical interaction with the environment. An intuitive example of Interactive Perception is when a subject uses his/her hands to move an object that occludes a second one, hence revealing new unobserved features; Active Perception, instead, is when the subject moves his/her head to change viewpoint and observe the occluded element without altering the state of the scene. When interacting with an object, the agent can obtain information not easily or even feasibly achieved through passive observation, such as the object's weight, material, shape, rigidity and degrees of freedom [88]. In order to be an interactive or active perceiver, an agent must know what to perceive and how, when and where to perform the perception actions [6]. The authors of [11] present a thorough investigation of IP and its applications.
In the works presented in this thesis we show several applications of AP and IP that help robots in SAR tasks. We use an arm-equipped mobile robot to haptically explore the terrain surrounding the platform. A probabilistic model, based on Gaussian Process Regression (GPR), is used to detect regions of the map that require physical investigation. Tactile signals, which are able to enrich the robot's capabilities in many applications such as manipulation [10], are used to refine the geometric representation of the world built using vision sensors alone (Paper E). A similar approach is used to estimate implicit environmental properties such as the deformability of the terrain (Paper F). In our applications AP is not restricted to vision; we use an active perception framework to generate wireless distribution maps and guide a robot inside a disaster scenario (Paper D). The robot generates a new layer of information that is added to the geometric world representation. In Section 3.3 we discuss how to remove ambiguity in the geometric world representation (the map) by inferring an expressive probabilistic model continuously trained on active observations.

3.3 Geometric Representations

Object representation in all its flavors (Figure 2) seeks to provide an informative visual description of the object that preserves its underlying structures and properties. A reasonable reaction is to wonder which representation best suits a specific problem and makes it more tractable. Ideally, a 3D representation suitable for an interactive perception framework should be:
− well suited for learning;
− flexible, so that it can model a wide variety of shapes;
− geometrically manipulable, therefore deformable, interpolable and convenient to impose structural constraints on.

Different representations, such as the ones illustrated in Figure 2, bring different advantages and disadvantages. For example, volumetric occupancy is optimal for modeling a large variety of shapes but is unsuitable for learning (expensive to compute) and unfit for flexible geometric manipulation. In certain research fields the problem of finding the proper representation becomes less tricky for the 2D case, and standards have been released that include a limited number of metric representations for planar environments [1].
Recent years have witnessed breathtaking advances in the field of Deep Learning (DL) for the analysis of 2D images, audio streams, videos and also 3D geometric shapes [38]. A 3D representation for DL applications must be easily formulated as the output of the neural network, ensuring fast forward/backward propagation [94]. Similarly to the work in [94], in our frameworks we use point clouds (PC) as the geometric representation. A point cloud, although wasteful in terms of memory, is geometrically flexible, can be obtained directly from a variety of sensory systems (e.g. RGB-D cameras, LiDARs, stereo cameras) and is compatible with the vast majority of 3D mapping algorithms. Most importantly, the PC representation is well suited for learning, as it adapts well to the probabilistic models used in our methods. A point in a PC is a feature vector whose elements are the coordinates of the point in a certain reference frame. Our frameworks expand the feature vector with properties of the environment under analysis, as we discuss in the next section.

Figure 2: Different representations for objects, grouped into regular and irregular grids. (Courtesy of Hao Su, Stanford University)

3.4 A Probabilistic Approach to Interactive Mapping

Particularly suitable for our problem are Gaussian Processes for Regression (GPR) [96], which have been widely used for surface reconstruction problems [58], [113]. The problem of reconstructing a surface, which we consider as a "compact, connected, orientable two-dimensional manifold, possibly with boundary, embedded in R^3" (cf. O'Neill [91]), has been thoroughly studied in [49]. GPRs enable us to extend the definition of an object's or terrain's surface to include other environmental properties. In our work we merge multiple sensory signals into probabilistic active and interactive perception frameworks using two- and three-dimensional GPR, known respectively as Gaussian Random Fields (GRF) and Gaussian Process Implicit Surfaces (GPIS). We demonstrate how to exploit GPRs for modeling a surface's appearance, its deformability or other environmental properties such as the wireless signal distribution. We also demonstrate how to obtain compact representations of the environment from sparse point clouds and tactile information.

3.4.1 Gaussian Random Fields

A 2.5D surface can be described by a function f : R^2 → R where each vector of xy-coordinates generates a single height. A Gaussian Process Regression shaped over a bi-dimensional Euclidean set is commonly referred to as a Gaussian Random Field (GRF). GRFs are powerful statistical tools able to smoothly fit values while ensuring the best linear unbiased prediction (BLUP), as opposed to a common piecewise-polynomial spline. This interpolation process is widely used in spatial analysis and geostatistics, where it is more frequently known as Kriging [90]. In its simplest form, the Kriging prediction at a location s0 can be formulated as a weighted sum of N measurements Z(si) at spatial locations si, where the weights φi are computed according to a model fitted to the measurements.

\hat{Z}(s_0) = \sum_{i=1}^{N} \phi_i Z(s_i)   (3.1)

In our work we make use of GRF for terrain reconstruction (Paper E), terrain deformability estimation (Paper F) and wifi distribution estimation (Paper D). The methodology used to exploit GRF varies considerably between the papers. We leave the details of the mathematical formulation of GRF to the attached papers and describe here only the data sets used to define our perception strategies.
Given a set of observation points (i.e. a point cloud) P_S = {p_1, p_2, ..., p_N}, p_i ∈ R^3, we can define a training set D_RF = {x_i, y_i}_{i=1}^N where x_i ∈ X ⊂ R^2 are the xy-coordinates of the points in P_S (axes follow the reference frame shown in Figure 3). Depending on the application, y_i contains the environmental property of interest measured or estimated at the location x_i, namely:
− in Paper E, y_i contains the z-coordinate (height);
− in Paper F, y_i contains the deformability parameter (β);
− in Paper D, y_i contains the radio signal strength (RSS).
Note that this formulation of f does not allow us to model complex terrain shapes that require multiple heights for a single xy-coordinate, such as the example shown in Figure 3. GRF allow predicting the behavior of a 2.5D surface in unobserved regions of the space.
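As a minimal illustration of how a GRF interpolates a 2.5D surface, the Python sketch below fits a Gaussian Process to a few (x, y) → height samples and predicts height and uncertainty at unobserved locations. It uses scikit-learn's generic GP regressor with an RBF kernel as a stand-in; the kernels, hyperparameters and data used in Papers D-F differ, and the synthetic terrain function is purely illustrative.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

# Sparse terrain observations: x_i = (x, y) coordinates, y_i = measured height z.
rng = np.random.default_rng(0)
X_train = rng.uniform(0.0, 5.0, size=(40, 2))
y_train = np.sin(X_train[:, 0]) * np.cos(X_train[:, 1])     # synthetic ground truth

# GRF: a GP over the 2D Euclidean set of xy-coordinates (kernel choice is illustrative).
gp = GaussianProcessRegressor(kernel=RBF(length_scale=1.0) + WhiteKernel(1e-3),
                              normalize_y=True)
gp.fit(X_train, y_train)

# Predict the surface (and its uncertainty) on a regular grid of unobserved locations.
gx, gy = np.meshgrid(np.linspace(0, 5, 50), np.linspace(0, 5, 50))
X_test = np.column_stack([gx.ravel(), gy.ravel()])
z_mean, z_std = gp.predict(X_test, return_std=True)

# High predictive variance marks regions that may need further (e.g. tactile) investigation.
print("most uncertain location:", X_test[np.argmax(z_std)])
```

The predictive standard deviation computed in the last step is the kind of quantity our frameworks use to flag regions worth investigating, as discussed in Section 3.5.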

Figure 3: (A) Planar cut of a 2.5D surface with a single height for each coordinate point x_i. (B) Planar cut of a surface with multiple heights for a coordinate point x_i.

3.4.2 Gaussian Process Implicit Surfaces

Many applications in computer vision and computer graphics require the definition of curves and surfaces [32]. Implicit surfaces are a popular choice for this because they are smooth, can be appropriately constrained by known geometry, and require no special treatment for topology changes. In this section we briefly describe Gaussian Process Implicit Surfaces (GPIS) [118] and their ability to model highly complex shapes. GPIS allow us to model a function f : R^3 → R whose zero level set defines an implicit surface. While their mathematical formulation maintains the same form as for GRF, the training and test sets change considerably: D_IS = {x_i, y_i}_{i=1}^N, where x_i ∈ X ⊆ P_S and y_i ∈ R are defined as in [118]:

y_i = \begin{cases} -1, & \text{if } x_i \text{ is below the surface} \\ 0, & \text{if } x_i \text{ is on the surface} \\ +1, & \text{if } x_i \text{ is above the surface} \end{cases}   (3.2)

For clarity, Figure 4 shows example training points, lying on a curve traversing an implicit surface, colored according to Equation 3.2. GPIS do not allow us to directly obtain the shape of the surface of interest. Whereas the output of the GRF maps the explicit behavior of the 2.5D surface z = f(x, y), GPIS map the target set of the implicit function F(x, y, z), which cannot be solved for x, y or z. In order to obtain the surface, we need to define a large set of 3D test points, e.g. a dense cubic grid centered on a region of interest, and then find the isosurface of value 0 in the output mean function. This operation is computationally expensive, depending on the size of the test set X*. GPIS nevertheless bring two main benefits compared to GRF:
1. they allow us to model very complex surfaces (e.g. objects);
2. training points do not need to lie on the isosurface.
Point (2), in particular, plays a significant role in defining the training strategy of our active perception frameworks, as described in Paper E.
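The pipeline above can be sketched in a few lines of Python. In this illustrative example the training labels of Equation 3.2 are generated from a synthetic sphere, a GP with a fixed RBF kernel stands in for the GPIS formulation of [118], and the zero-valued isosurface of the predicted mean is extracted with marching cubes; the kernel, grid resolution and data are assumptions made for the sake of the example.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF
from skimage.measure import marching_cubes

rng = np.random.default_rng(1)

# Synthetic training data around a unit sphere: on-, below- and above-surface points.
dirs = rng.normal(size=(60, 3))
dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
X_train = np.vstack([dirs * 1.0, dirs * 0.6, dirs * 1.4])
y_train = np.concatenate([np.zeros(60), -np.ones(60), np.ones(60)])   # labels of Eq. (3.2)

# Fixed kernel (optimizer disabled) and a small jitter for numerical stability.
gpis = GaussianProcessRegressor(kernel=RBF(length_scale=0.5),
                                optimizer=None, alpha=1e-6).fit(X_train, y_train)

# Evaluate the GP mean on a dense cubic set of test points centered on the object.
n = 40
axis = np.linspace(-1.6, 1.6, n)
gx, gy, gz = np.meshgrid(axis, axis, axis, indexing="ij")
F = gpis.predict(np.column_stack([gx.ravel(), gy.ravel(), gz.ravel()])).reshape(n, n, n)

# The reconstructed surface is the isosurface of value 0 of the predicted mean.
verts, faces, _, _ = marching_cubes(F, level=0.0, spacing=(axis[1] - axis[0],) * 3)
print(verts.shape, faces.shape)
```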

3.5 Strategies for Training and Interacting

The problems addressed in this thesis required the implementation of online interactive perception frameworks, whose basic mechanisms are outlined in this section.

Figure 4: Illustrative representation of above-surface (blue), on-surface (green) and below-surface (red) training points along a trajectory.

The challenges that arise when using GPR in a closed-loop online perception framework (Figure 5) are (1) to define a correct training set, which must be small; (2) to make it possible to feed and retrain the model with data coming from new environmental interactions; and (3) to identify unexplored regions of the map that need to be investigated. The problems of power consumption, strict time constraints and the risk of causing further collapses, discussed in Chapter 2, force us to limit environmental interactions as much as possible.

Defining the most appropriate training set X is crucial for a correct understanding of the environmental properties. Using the entire sensory input as training set is often prohibitive in terms of the time and computational power needed, so subsets are usually preferred. For GPs specifically, a large set X leads to a large covariance matrix that needs to be inverted [96]. Standard methods for inverting an n×n positive-definite symmetric matrix require O(n^3) time, making GPs unsuitable for online tasks when the training set grows large. The application itself helps define the strategies for training and data collection. For example, modeling an object's surface with a high level of detail requires a dense set of training points and millimetric precision, whereas modeling the radio signal distribution over a large area is a more forgiving application in terms of the quantity and quality of training data needed. The probabilistic model used plays a big role too: GRF usually require smaller amounts of training data than GPIS, which need a large number of on- and off-surface points.

In our work we define different perception strategies that follow the general system outline illustrated in Figure 5. The robot uses visual cues to build the initial geometry of the environment, which works as a foundation for the other sensor data. Looking at the variance of the GPR model, the robot is able to isolate areas of the map that require further investigation. Geometric analysis, done using the Delaunay triangulation (the dual of the Voronoi diagram) [41], can also give precious hints regarding the sparsity of the data, as shown in our work and in [86]. Measurements are then sequentially collected in the isolated regions by either touching the surface or collecting RSS data (an RSS dataset, collected during a real UGV deployment, is available in [93]). After each interaction the training set is reshaped to accommodate the new measurements while preserving the old ones.

Figure 5: Generic active-interactive perception framework.

As an alternative to the closed-loop perception scheme of Figure 5 described above, it is possible to define an open-loop perception strategy where multiple environmental interactions are triggered all at once, after all the regions of interest have been detected and the actions selected. Observations made during the interactions are stored in a separate training set which is ultimately fed to the model. This approach saves processing time, as the model does not need repeated updating steps after each interaction, but it is less reactive to environmental changes and discoveries that could reduce the number of exploration steps required. For instance, a USAR UGV that tries to traverse rough terrain may decide to poke the terrain surface to estimate its stability. Given the geometric shape of the terrain, the robot could decide to touch the environment multiple times. If after the first touch a large chunk of terrain moves, giving evident hints about the precarious stability of the area to be traversed, the robot will nonetheless continue touching the surface, because its cognitive model is not updated after the first interaction.
Figure 6 shows the results of an active exploration process obtained using one of our frameworks. Here we let the robot autonomously explore the environment. In this case the action selection and region detection steps are triggered and executed by a Communication Aware Motion Planner [79] that drives the robot in the map according to the predicted quality of the wireless signal (represented as the height and color of the bell overlapping the geometric map) and its uncertainty. This perceptive paradigm shares similarities with a mapping technique called active SLAM [64], which we briefly mention in the next section.
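A minimal sketch of the closed-loop strategy of Figure 5 is given below, assuming a hypothetical touch_surface() routine that returns a tactile height measurement at a requested location (here replaced by a synthetic function): at each iteration the GRF is queried for its most uncertain candidate location, one measurement is taken there, and the model is retrained with the augmented training set. The kernel, the candidate grid and the interaction budget are all illustrative choices.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def touch_surface(xy):
    """Hypothetical environmental interaction: probe the terrain height at (x, y)."""
    return np.sin(xy[0]) * np.cos(xy[1])                  # stand-in for a real tactile reading

# Initial training set from sparse visual observations (coordinates and heights).
rng = np.random.default_rng(2)
X = rng.uniform(0.0, 5.0, size=(15, 2))
y = np.array([touch_surface(p) for p in X])

# Candidate locations the robot could investigate (e.g. cells of the geometric map).
candidates = np.column_stack([g.ravel() for g in
                              np.meshgrid(np.linspace(0, 5, 30), np.linspace(0, 5, 30))])

for step in range(5):                                     # limited budget of interactions
    gp = GaussianProcessRegressor(kernel=RBF(length_scale=1.0)).fit(X, y)
    _, std = gp.predict(candidates, return_std=True)
    target = candidates[np.argmax(std)]                   # most uncertain region of the map
    measurement = touch_surface(target)                   # physical interaction
    X = np.vstack([X, target])                            # augment the training set
    y = np.append(y, measurement)
    print(f"step {step}: touched {target.round(2)}, max std was {std.max():.3f}")
```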

3.6 Mapping the Environment and Moving Objects

Robotic mapping [111] is one of the core concepts of this thesis. More specifically, we will briefly discuss Simultaneous Localization and Mapping (SLAM), defined as the problem of combining sensor observations to jointly estimate the map and the robot pose. The term SLAM, originally affiliated with a family of mapping algorithms based on Kalman filters [76], now describes a large variety of different mapping techniques and is often used to refer to robotic mapping in general.

Figure 6: A virtually simulated UGV explores and maps a disaster area (grey points indicate traversable areas, red points are buildings or obstacles). Simultaneously, the robot estimates the distribution of the wireless signal in the environment and overlaps this information onto the geometric map (blue-green points in the wireless bell indicate good signal, orange-red points are poorly connected areas).

Depending on the application and the kind of sensors available, the mapping process can take advantage of different ways to model geometry. Dense representations attempt to explicitly model the appearance of the surface with high quality and a large amount of data. Approaches such as KinectFusion [89] or ElasticFusion [117] have pioneered the field of low-level raw dense mapping using Kinect-like sensors. These techniques generally use point clouds, polygons or other low-level geometric primitives such as surfels (i.e. small disks) to encode the scene's geometry. Other dense approaches use function descriptors to generate compact representations of the environment. The aforementioned implicit surface is a popular choice of representation among these techniques. Notable works use distinctive implicit surface descriptive functions such as the signed-distance function [25], radial basis functions [16] and the more recently introduced truncated signed-distance function (TSDF) [124].
A more tractable approach to mapping represents the environment as a collection of sparse 3D features, in what is commonly referred to as landmark-based sparse representations. Often, the large number of landmarks in an environment lowers the performance of the mapping process due to the high computational cost of Kalman-filter-based algorithms.

To address this problem, the authors of [80] propose a Bayesian approach named FastSLAM that scales logarithmically with the number of landmarks in the map. Faster and more robust sparse mapping approaches for 3D cameras were proposed in [108] and [4] where, in contrast to classic approaches that mainly use 3D points for registration, the authors also used planes and 2D projected points (i.e. SIFT [69] features computed in the RGB camera frame and projected onto the scene using the camera's intrinsics and extrinsics) as a minimal set of primitives in a RANSAC [39] framework to robustly compute correspondences and estimate the sensor pose.
One interesting approach is the object-oriented method named SLAM++, proposed by researchers in [101]. Their framework takes advantage of the prior knowledge that observations in an environment contain repeated, domain-specific objects and structures, which the authors exploit to create robust camera-object constraints. The constraints are stored in a graph of objects that is constantly refined by an efficient pose-graph optimization.
More closely related to our work is the set of robotic mapping techniques denoted as active SLAM [64]. Active SLAM investigates how to leverage a robot's motion to improve the mapping result [110]. The ideas at the base of this mapping problem share common ancestors with our active and interactive perception frameworks [6], whose decision making process is based on the predicted map uncertainty [17], [18]. Other notable approaches to active SLAM include the use of Bayesian Optimization [73] and Partially Observable Markov Decision Processes [54].
Researchers have also been studying the mapping problem from a temporal perspective. For instance, works done in the context of the EU project STRANDS [48], such as [12] and [3], focus on observing the behavior of an environment over time, trying to isolate the static parts of the environment and studying objects' dynamics and contextualization (i.e. the meaning behind the presence of an object in a specific location at a specific time). More compelling discussions on robotic SLAM are found in [15] and [34].
At the time of this writing, the robotic mapping literature shows large gaps in aspects related to SLAM for mobile vehicles, and novel research questions are arising [15]. Among them are the problems of SLAM on resource-constrained platforms, robust distributed mapping and map representation, which closely touch the disaster robotics field.
In this regard, part of this thesis focused on developing a feature-based mapping method for simultaneous reconstruction of the static and moving parts of a scene. The main motivation behind this work is to enable a mobile platform (e.g. a UGV with limited processing power on-board) to map an area while tracking and modeling rigidly moving objects in an online and unsupervised fashion.
Differently from other works [3], [98], [52], we avoid using observations of the environment taken at different times (i.e. long-term operations), as these may be unavailable in many USAR operations, and instead grow two different maps, one for the static scene and one for the rigidly moving objects. We designed our SLAM framework to be well suited for interactive perception, thus enabling a mobile manipulator to model objects in the environment (e.g. samples) by interacting with them (i.e. poking and rotating the object) while limiting the risk of contaminating the static scene.
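To make the RANSAC-based registration idea used by these sparse, feature-based approaches (and by the method summarized in Paper G) concrete, the sketch below estimates a rigid transform between two sets of putative 3D point correspondences: minimal three-point samples are fitted with the SVD-based Kabsch solution, and the hypothesis with the most inliers wins. The thresholds, iteration count and synthetic data are illustrative assumptions, not the settings of any cited system.

```python
import numpy as np

def kabsch(P, Q):
    """Least-squares rigid transform (R, t) mapping points P onto Q (Kabsch algorithm)."""
    cp, cq = P.mean(axis=0), Q.mean(axis=0)
    H = (P - cp).T @ (Q - cq)
    U, _, Vt = np.linalg.svd(H)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])     # avoid reflections
    R = Vt.T @ D @ U.T
    return R, cq - R @ cp

def ransac_registration(P, Q, iters=200, inlier_thresh=0.05, rng=np.random.default_rng(3)):
    """Robustly estimate (R, t) from outlier-contaminated correspondences P[i] <-> Q[i]."""
    best_inliers = np.zeros(len(P), dtype=bool)
    for _ in range(iters):
        idx = rng.choice(len(P), size=3, replace=False)              # minimal sample
        R, t = kabsch(P[idx], Q[idx])
        residuals = np.linalg.norm((P @ R.T + t) - Q, axis=1)
        inliers = residuals < inlier_thresh
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    return kabsch(P[best_inliers], Q[best_inliers]) + (best_inliers,)

# Synthetic test: a known rotation/translation plus 20% gross outlier correspondences.
rng = np.random.default_rng(4)
P = rng.uniform(-1, 1, size=(100, 3))
R_true = np.array([[0, -1, 0], [1, 0, 0], [0, 0, 1]], dtype=float)
Q = P @ R_true.T + np.array([0.5, -0.2, 1.0])
Q[:20] += rng.uniform(-1, 1, size=(20, 3))
R_est, t_est, inliers = ransac_registration(P, Q)
print(inliers.sum(), "inliers; estimated translation:", t_est.round(2))
```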

Chapter 4

Conclusions

We have investigated major problems in the search and rescue robotics field and proposed ways of enhancing visual perception through the use of non-visual sensory systems and environmental interactions. The ideas at the base of the papers were gradually developed as a result of experience gained during the deployment of robots in real scenarios, and as a consequence of several fruitful conversations with the brilliant colleagues and researchers with whom I have shared this journey. The fundamental questions explored are:

• How can a rescue robot cope with the lack of visual sensor information when mapping an environment?
• What are the benefits of interaction with the environment for a rescue robot?
• How do we define the proper interactive perception strategy, and how do we represent and exploit the new information?

We started by comparing two different control modes, Free Look Control (FLC) and Tank control, and showed that FLC could substantially improve mission performance while reducing the operator's cognitive load. The outcome of this study led us to implement FLC on the UGV shown in Figure 10 and to use it during real robot deployments. We then proposed a method to estimate the direction of arrival (DoA) of a radio signal and combined it with an FLC interface to reduce the risk of signal loss during remote control of a mobile robot. The effectiveness of this new interface compared to the classic signal strength bar indicator was later demonstrated through a comparative user study. This line of study gave rise to the introduction of a Resilient Communication-Aware Motion Planner that takes into account the environment geometry, together with a novel signal mapping method that increases the robot's awareness of radio signal propagation, allowing it to autonomously navigate to connection-safe positions.
The decision to focus on the development of frameworks for enhancing geometric maps by means of interactions with the environment was initially inspired by observing the operational mechanisms human responders use to explore a disaster area when poor visibility, caused by smoke or darkness, hinders mobility.

In conditions of limited visibility, rescuers use their hands to touch and feel the geometry of the environment, and speak to communicate their location. Similarly, a rescue robot could use its manipulator to touch and better perceive its surroundings when environmental conditions weaken its vision system. For this reason we introduced a tactile probabilistic perception framework, based on Gaussian Process Regression, that enables a rescue robot to cope with the lack of visual information when mapping an environment.
Using a fast Position-based Dynamics simulator, we then redesigned and extended the framework to estimate implicit environmental properties such as terrain deformability. The last contribution of this thesis is a SLAM algorithm that allows a mobile robot to simultaneously map the environment and rigidly moving objects. The algorithms and methods presented define perception systems that:

• enable a mobile robot to map and exploit non-visual information in order to improve the mission outcome;
• continuously trade off different sensor modalities, limiting the number of environmental interactions as much as possible in favor of fast exploration;
• efficiently identify regions of interest in the robot's surroundings that need to be investigated;
• track and model rigidly moving objects while generating a static map of the environment.

With this thesis we aim to raise awareness of the many gaps found in the disaster robotics literature, and we hope to inspire the reader to propose novel solutions to the open problems that still affect this field.

4.1 Future Work

The diversity of the presented contributions gives rise to several roads of development one can pursue. We hereby highlight possible directions of research:

Need for benchmarking and datasets - Although some of the robotics challenges and projects discussed in Section 2.5 try to define benchmark protocols for specific rescue scenarios (e.g. a complex sequence of operations that defines a rescue mission), the lack of datasets for active perception frameworks in rescue robotics makes fair comparisons with other methods difficult.

Deep Learning - Deep Learning (DL) is revolutionizing the approaches taken to many robotics and computer vision problems and is steadily raising the bar in many low- and high-level perceptual supervised learning problems, such as visual recognition tasks or speech and video analysis [63], [44]. The use of DL has been extended to reinforcement learning (RL) problems that involve visual perception and to more traditional robotics problems such as SLAM. In this regard, [24] shows how to improve visual odometry by using deep networks to directly estimate the frame-to-frame motion sensed by a moving robot, thus bypassing traditional geometric approaches for camera pose estimation. Also interesting is the work in [68], which shows how to use deep networks to estimate the depth map from single monocular images. More recently, Deep Learning has been making significant inroads into active perception applications [28]. For instance, [51] shows how it is possible to perform end-to-end learning of motion policies using a Recurrent Neural Network (RNN) suited for active recognition tasks. On the other hand, most of these advances require large datasets to be effective and tend to be restricted to the supervised learning setting, which provides clear error signals. These factors effectively limit their usefulness in active perception frameworks for USAR applications. Nevertheless, we strongly believe that this is a promising direction of development for the frameworks presented in this thesis. DL could be adopted to learn to forecast the effects of the agent's actions on its internal representation of the environment, conditional on all past motions and observations, so as to define the most effective exploration strategy.

Use of different sensors - Other sensor modalities can be adopted to either complement vision or extract further useful environmental properties to enhance the geometric world representation.

Multi-robot collaboration - Teams of robots can be used to minimize the overall perceptual exploration time or to perform more complex interactions capable of unveiling specific environmental properties. The key problem of collaborative multi-robot perception is robot task allocation, or in simpler terms, choosing which robot has to perform a certain action at a designated location. For instance, multiple UGVs can be used to quickly generate the wireless signal distribution over a large map. In case of signal loss, the robots can elaborate sophisticated self-reconnection strategies, in which certain robots drive back to designated connection-safe locations and act as repeaters for the operating agents.

Use of more complex high-level actions for tactile exploration - To facilitate the exploration process, we used simple actions such as "poking" and "pushing" the surface.
A richer set of actions, such as "sliding on the surface", "rotating the object", or any in-hand manipulation, may generate more informative tactile signals that could potentially reduce the number of interactions even further. A larger action set, however, complicates the action selection process, raising the questions: "Which sequence of actions should be selected? When and where?"

Chapter 5

Summary of Papers


A Free Look UGV Teleoperation Control Tested in Game Environment: Enhanced Performance and Reduced Workload

Abstract. Concurrent telecontrol of the chassis and camera of an Unmanned Ground Vehicle (UGV) is a demanding task for Urban Search and Rescue (USAR) teams. The standard way of controlling UGVs is called Tank Control (TC), but there is reason to believe that Free Look Control (FLC), a control mode used in games, could reduce this load substantially by decoupling, and providing separate controls for, camera translation and rotation. The general hypothesis is that FLC (1) reduces robot operators' workload and (2) enhances their performance for dynamic and time-critical USAR scenarios. A game-based environment was set up to systematically compare FLC with TC in two typical search and rescue tasks: navigation and exploration. The results show that FLC improves mission performance in both exploration (search) and path following (navigation) scenarios. In the former, more objects were found, and in the latter shorter navigation times were achieved. FLC also caused lower workload and stress levels in both scenarios, without inducing a significant difference in the number of collisions. Finally, FLC was preferred by 75% of the subjects for exploration, and 56% for path following.

Figure 1: The virtually simulated UGV is controlled using two control modes (FLC and Tank control) in exploration and navigation USAR scenarios.

B Extending a UGV Teleoperation FLC Interface with Wireless Network Connectivity Information

Abstract. Teleoperated Unmanned Ground Vehicles (UGVs) are expected to play an important role in future search and rescue operations. In such tasks, two factors are crucial for a successful mission completion: operator situational awareness and robust network connectivity between operator and UGV. In this paper, we address both these factors by extending a new Free Look Control (FLC) operator interface with a graphical representation of the Radio Signal Strength (RSS) gradient at the UGV location. We also provide a new way of estimating this gradient using multiple receivers with directional antennas. The proposed approach allows the operator to stay focused on the video stream providing the crucial situational awareness, while controlling the UGV to complete the mission without moving into areas with dangerously low wireless connectivity. The approach is implemented on a KUKA youBot using commercial off-the-shelf components. We provide experimental results showing how the proposed RSS gradient estimation method performs better than a difference approximation using omnidirectional antennas and verify that it is indeed useful for predicting the RSS development along a UGV trajectory. We also evaluate the proposed combined approach in terms of accuracy, precision, sensitivity and specificity.

Figure 2: The mobile robot can perceive the Direction of Arrival (DoA) of a radio signal. The user interface shows the DoA in the video feed consistently with the orientation of the camera w.r.t. the world frame.

C A New UGV Teleoperation Interface for Improved Awareness of Network Connectivity and Physical Surroundings

Abstract. A reliable wireless connection between the operator and the teleoperated unmanned ground vehicle (UGV) is critical in many urban search and rescue (USAR) missions. Unfortunately, as was seen in, for example, the Fukushima nuclear disaster, the networks available in areas where USAR missions take place are often severely limited in range and coverage. Therefore, during mission execution, the operator needs to keep track of not only the physical parts of the mission, such as navigating through an area or searching for victims, but also the variations in network connectivity across the environment. In this paper, we propose and evaluate a new teleoperation user interface (UI) that includes a way of estimating the direction of arrival (DoA) of the radio signal strength (RSS) and integrating the DoA information in the interface. The evaluation shows that using the interface results in more objects found, and fewer missions aborted due to connectivity problems, compared to a standard interface. The proposed interface is an extension of an existing interface centered on the video stream captured by the UGV. But instead of just showing the network signal strength in terms of percent and a set of bars, the additional information of DoA is added in terms of a color bar surrounding the video feed. With this information, the operator knows which movement directions are safe, even when moving in regions close to the connectivity threshold.

Figure 3: The YouBot is teleoperated in a small map partially covered by a wireless radio signal. Participants in a user study must find symbols and explore the environment without losing the signal, using different user interfaces.

D RCAMP: Resilient Communication-Aware Motion Planner and Autonomous Repair of Wireless Connectivity in Mobile Robots

Abstract. Mobile robots, be they autonomous or teleoperated, require stable communication with the base station to exchange valuable information. Given the stochastic elements in radio signal propagation, such as shadowing and fading, and the possibilities of unpredictable events or hardware failures, communication loss often presents a significant mission risk, both in terms of probability and impact, especially in Urban Search and Rescue (USAR) operations. Depending on the circumstances, disconnected robots are either abandoned, or attempt to autonomously back-trace their way to the base station. Although recent results in Communication-Aware Motion Planning can be used to effectively manage connectivity with robots, there are no results focusing on autonomously re-establishing the wireless connectivity of a mobile robot without back-tracing or using detailed a priori information of the network. In this paper, we present a robust and online radio signal mapping method using Gaussian Random Fields, and propose a Resilient Communication-Aware Motion Planner (RCAMP) that integrates the above signal mapping framework with a motion planner. RCAMP considers both the environment and the physical constraints of the robot, based on the available sensory information. We also propose a self-repair strategy using RCAMP, that takes both connectivity and the goal position into account when driving to a connection-safe position in the event of a communication loss. We demonstrate the proposed planner in a set of realistic simulations of an exploration task in single or multi-channel communication scenarios.

Figure 4: A virtually simulated UGV explores a disaster area and builds its geometric representation. Simultaneously, the system estimates the distribution of the wireless signal in the environment and overlaps this information onto the geometric map.

E Active Exploration Using Gaussian Random Fields and Gaussian Process Implicit Surfaces

Abstract. In this work we study the problem of exploring surfaces and building compact 3D representations of the environment surrounding a robot through active perception. We propose an online probabilistic framework that merges visual and tactile measurements using Gaussian Random Field and Gaussian Process Implicit Surfaces. The system investigates incomplete point clouds in order to find a small set of regions of interest which are then physically explored with a robotic arm equipped with tactile sensors. We show experimental results obtained using a PrimeSense camera, a Kinova Jaco2 robotic arm and Optoforce sensors on different scenarios. We then demonstrate how to use the online framework for object detection and terrain classification.

Figure 5: The geometric representation of the scenario based on vision alone can lead to ambiguity in the case of occlusions (A) or photometric effects. Physical interactions with the environment allow the representation to be refined and the ambiguity removed (B).

F Active Perception and Modeling of Deformable Surfaces using Gaussian Processes and Position-based Dynamics

Abstract. Exploring and modeling heterogeneous elastic surfaces requires multiple interactions with the environment and a complex selection of physical material parameters. The most common approaches model deformable properties from sets of offline observations using computationally expensive force-based simulators. In this work we present an online probabilistic framework for autonomous estimation of a deformability distribution map of heterogeneous elastic surfaces from few physical interactions. The method takes advantage of Gaussian Processes for constructing a model of the environment geometry surrounding a robot. A fast Position-based Dynamics simulator uses focused environmental observations in order to model the elastic behavior of portions of the environment. Gaussian Process Regression maps the local deformability on the whole environment in order to generate a deformability distribution map. We show experimental results using a PrimeSense camera, a Kinova Jaco2 robotic arm and an Optoforce sensor on different deformable surfaces.

Figure 6: Conceptual representation of the deformability analysis of a surface through physical interactions.

G Joint 3D Reconstruction of a Static Scene and Moving Objects

Abstract. We present a technique for simultaneous 3D reconstruction of static regions and rigidly moving objects in a scene. An RGB-D frame is represented as a collection of features, which are points and planes. We classify the features into static and dynamic regions and grow separate maps, static and object maps, for each of them. To robustly classify the features in each frame, we fuse multiple RANSAC-based registration results obtained by registering different groups of the features to different maps, including (1) all the features to the static map, (2) all the features to each object map, and (3) subsets of the features, each forming a segment, to each object map. This multi-group registration approach is designed to overcome the following challenges: scenes can be dominated by static regions, making object tracking more difficult; and moving objects might have larger pose variations between frames compared to the static regions. We show qualitative results from indoor scenes with objects in various shapes. The technique enables on-the-fly object model generation to be used for robotic manipulation.

Figure 7: Example of online object reconstruction (C) from environmental observations. The object is tracked (A) and a feature map is created (B). Simultaneously, the algorithm maps the environment and localizes both the robot and the object.

References

[1] IEEE standard for robot map data representation for navigation. 1873-2015 IEEE Standard for Robot Map Data Representation for Navigation, pages 1–54, Oct 2015.

[2] S. V. Alexandrov, J. Prankl, M. Zillich, and M. Vincze. Calibration and correction of vignetting effects with an application to 3d mapping. In 2016 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 4217–4223, Oct 2016.

[3] R. Ambrus, J. Ekekrantz, J. Folkesson, and P. Jensfelt. Unsupervised learning of spatial-temporal models of objects in a long-term autonomy scenario. In 2015 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 5678–5685, Sept 2015.

[4] E. Ataer-Cansizoglu, Y. Taguchi, and S. Ramalingam. Pinpoint slam: A hybrid of 2d and 3d simultaneous localization and mapping for rgb-d sensors. In 2016 IEEE International Conference on Robotics and Automation (ICRA), pages 1300–1307, May 2016.

[5] R. Bajcsy. Active perception. Proceedings of the IEEE, 76(8):966–1005, Aug 1988.

[6] Ruzena Bajcsy, Yiannis Aloimonos, and John K. Tsotsos. Revisiting active perception. Autonomous Robots, Feb 2017. URL https://doi.org/10.1007/s10514-017-9615-3.

[7] Robot vessels used to cap gulf of mexico oil leak, 2018. URL http://news.bbc.co.uk/2/hi/americas/8643782.stm. "[Online; accessed 23-January-2018]".

[8] Zoltan Beck. Collaborative search and rescue by autonomous robots. PhD thesis, University of Southampton, December 2016. URL https://eprints.soton.ac.uk/411031/.

[9] Paul J Besl and Neil D McKay. Method for registration of 3-d shapes. In Sensor Fusion IV: Control Paradigms and Data Structures, volume 1611, pages 586–607. International Society for Optics and Photonics, 1992.

[10] A. Bicchi, J. K. Salisbury, and P. Dario. Augmentation of grasp robustness using intrinsic tactile sensing. In Proceedings, 1989 International Conference on Robotics and Automation, pages 302–307 vol.1, May 1989.

[11] J. Bohg, K. Hausman, B. Sankaran, O. Brock, D. Kragic, S. Schaal, and G. S. Sukhatme. Interactive perception: Leveraging action in perception and perception in action. IEEE Transactions on Robotics, 33(6):1273–1291, Dec 2017.

[12] Nils Bore, Johan Ekekrantz, Patric Jensfelt, and John Folkesson. Detection and tracking of general movable objects in large 3d maps. arXiv preprint arXiv:1712.08409, 2017.

[13] Michael Bosse and Robert Zlot. Continuous 3d scan-matching with a spinning 2d laser. In Robotics and Automation, 2009. ICRA’09. IEEE International Conference on, pages 4312–4319. IEEE, 2009.

[14] Jennifer L. Burke, Robin R. Murphy, Michael D. Coovert, and Dawn L. Riddle. Moonlight in miami: A field study of human-robot interaction in the context of an urban search and rescue disaster response training exercise. Hum.-Comput. Interact., 19(1):85–116, June 2004. URL http://dx.doi.org/10.1207/s15327051hci1901&2_5.

[15] C. Cadena, L. Carlone, H. Carrillo, Y. Latif, D. Scaramuzza, J. Neira, I. Reid, and J. J. Leonard. Past, present, and future of simultaneous localization and mapping: Toward the robust-perception age. IEEE Transactions on Robotics, 32(6):1309– 1332, Dec 2016.

[16] Jonathan C Carr, Richard K Beatson, Jon B Cherrie, Tim J Mitchell, W Richard Fright, Bruce C McCallum, and Tim R Evans. Reconstruction and representation of 3d objects with radial basis functions. In Proceedings of the 28th annual conference on Computer graphics and interactive techniques, pages 67–76. ACM, 2001.

[17] H. Carrillo, Y. Latif, J. Neira, and J. A. Castellanos. Fast minimum uncertainty search on a graph map representation. In 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pages 2504–2511, Oct 2012.

[18] H. Carrillo, I. Reid, and J. A. Castellanos. On the comparison of uncertainty criteria for active slam. In 2012 IEEE International Conference on Robotics and Automation, pages 2080–2087, May 2012.

[19] J. Casper and R. R. Murphy. Human-robot interactions during the robot-assisted urban search and rescue response at the world trade center. IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), 33(3):367–385, June 2003.

[20] Centauro: Robust mobility and dexterous manipulation in disaster response by fullbody telepresence in a centaur-like robot, 2018. URL http://www.centauro-project.eu/. "[Online; accessed 18-January-2018]".

[21] Center for robot-assisted search and rescue (crasar), 2018. URL http://crasar.org/. "[Online; accessed 18-January-2018]".

[22] Yang Chen and Gérard Medioni. Object modelling by registration of multiple range images. Image and Vision Computing, 10(3):145–155, 1992. URL http://www.sciencedirect.com/science/article/pii/026288569290066C. Range Image Understanding.

[23] Cocoro: robot swarms use collective cognition to perform tasks, 2018. URL http://cordis.europa.eu/project/rcn/97473_en.html. "[Online; accessed 18-January-2018]".

[24] G. Costante, M. Mancini, P. Valigi, and T. A. Ciarfuglia. Exploring representation learning with cnns for frame-to-frame ego-motion estimation. IEEE Robotics and Automation Letters, 1(1):18–25, Jan 2016.

[25] Brian Curless and Marc Levoy. A volumetric method for building complex models from range images. In Proceedings of the 23rd annual conference on Computer graphics and interactive techniques, pages 303–312. ACM, 1996.

[26] J. d. Leon, M. Garzon, D. Garzon, E. Narvaez, J. d. Cerro, and A. Barrientos. From video games multiple cameras to multi-robot teleoperation in disaster scenarios. In 2016 International Conference on Autonomous Robot Systems and Competitions (ICARSC), pages 323–328, May 2016.

[27] Darpa robotics challenge (drc), 2018. URL https://www.darpa.mil/program/darpa-robotics-challenge. "[Online; accessed 18-January-2018]".

[28] Emmanuel Daucé. Toward predictive machine learning for active vision. CoRR, abs/1710.10460, 2017. URL http://arxiv.org/abs/1710.10460.

[29] Debarati Guha-Sapir, Philippe Hoyois, Pascaline Wallemacq, and Regina Below. Annual disaster statistical review 2016 - the numbers and trends. Technical report, Université catholique de Louvain, Brussels, Belgium, 01 2017. URL http://emdat.be/sites/default/files/adsr_2016.pdf.

[30] Paul E. Debevec and Jitendra Malik. Recovering high dynamic range radiance maps from photographs. In Proceedings of the 24th Annual Conference on Computer Graphics and Interactive Techniques, SIGGRAPH ’97, pages 369–378, New York, NY, USA, 1997. ACM Press/Addison-Wesley Publishing Co. ISBN 0-89791-896-7. URL http://dx.doi.org/10.1145/258734.258884.

[31] B. Deng. Machine ethics: The robot’s dilemma. Nature, 523:24–26, jul 2015. URL http://adsabs.harvard.edu/abs/2015Natur.523...24D.

[32] Manfredo P Do Carmo. Differential Geometry of Curves and Surfaces: Revised and Updated Second Edition. Courier Dover Publications, 2016.

[33] R. Dubé, A. Gawel, C. Cadena, R. Siegwart, L. Freda, and M. Gianni. 3d localization, mapping and path planning for search and rescue operations. In 2016 IEEE International Symposium on Safety, Security, and Rescue Robotics (SSRR), pages 272–273, Oct 2016.

[34] H. Durrant-Whyte and T. Bailey. Simultaneous localization and mapping: part i. IEEE Robotics Automation Magazine, 13(2):99–110, June 2006.

[35] Mica R. Endsley. Measurement of situation awareness in dynamic systems. Human Factors, 37(1):65–84, 1995. URL https://doi.org/10.1518/001872095779049499.

[36] European robotics league: Emergency robots, 2018. URL https://www.eu-robotics.net/robotics_league/. "[Online; accessed 18-January-2018]".

[37] Marc O Ernst and Heinrich H Bülthoff. Merging the senses into a robust percept. Trends in cognitive sciences, 8(4):162–169, 2004.

[38] Haoqiang Fan, Hao Su, and Leonidas Guibas. A point set generation network for 3d object reconstruction from a single image. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 652–660, 2017.

[39] Martin A Fischler and Robert C Bolles. Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography. In Readings in computer vision, pages 726–740. Elsevier, 1987.

[40] C. Forster, M. Pizzoli, and D. Scaramuzza. Air-ground localization and map augmentation using monocular dense reconstruction. In 2013 IEEE/RSJ International Conference on Intelligent Robots and Systems, pages 3971–3978, Nov 2013.

[41] Steven Fortune. Handbook of discrete and computational geometry. chapter Voronoi Diagrams and Delaunay Triangulations, pages 377–388. CRC Press, Inc., Boca Raton, FL, USA, 1997. ISBN 0-8493-8524-5. URL http://dl.acm.org/citation.cfm?id=285869.285891.

[42] D. Fox, J. Ko, K. Konolige, B. Limketkai, D. Schulz, and B. Stewart. Distributed multirobot exploration and mapping. Proceedings of the IEEE, 94(7):1325–1339, July 2006.

[43] D. B. Goldman and Jiun-Hung Chen. Vignette and exposure calibration and compensation. In Tenth IEEE International Conference on Computer Vision (ICCV’05) Volume 1, volume 1, pages 899–906 Vol. 1, Oct 2005.

[44] Ian Goodfellow, Yoshua Bengio, and Aaron Courville. Deep Learning. MIT Press, 2016. http://www.deeplearningbook.org.

[45] Guardians: Group of unmanned assistant robots deployed in aggregative navigation supported by scent detection, 2018. URL http://cordis.europa.eu/project/rcn/80210_en.html. "[Online; accessed 18-January-2018]".

[46] M. Gupta, A. Agrawal, A. Veeraraghavan, and S. G. Narasimhan. Structured light 3d scanning in the presence of global illumination. In CVPR 2011, pages 713–720, June 2011.

[47] K. Han, K. Y. K. Wong, and M. Liu. A fixed viewpoint approach for dense reconstruction of transparent objects. In 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 4001–4008, June 2015.

[48] N. Hawes, C. Burbridge, F. Jovan, L. Kunze, B. Lacerda, L. Mudrova, J. Young, J. Wyatt, D. Hebesberger, T. Kortner, R. Ambrus, N. Bore, J. Folkesson, P. Jensfelt, L. Beyer, A. Hermans, B. Leibe, A. Aldoma, T. Faulhammer, M. Zillich, M. Vincze, E. Chinellato, M. Al-Omari, P. Duckworth, Y. Gatsoulis, D. C. Hogg, A. G. Cohn, C. Dondrup, J. Pulido Fentanes, T. Krajnik, J. M. Santos, T. Duckett, and M. Hanheide. The strands project: Long-term autonomy in everyday environments. IEEE Robotics Automation Magazine, 24(3):146–156, Sept 2017.

[49] Hugues Hoppe, Tony DeRose, Tom Duchamp, John McDonald, and Werner Stuetzle. Surface reconstruction from unorganized points. In Proceedings of the 19th Annual Conference on Computer Graphics and Interactive Techniques, SIGGRAPH ’92, pages 71–78, New York, NY, USA, 1992. ACM. ISBN 0-89791-479-1. URL http://doi.acm.org/10.1145/133994.134011.

[50] Icarus - unmanned search and rescue, 2018. URL http://www.fp7-icarus.eu/project-overview. "[Online; accessed 20-January-2018]".

[51] Dinesh Jayaraman and Kristen Grauman. Look-ahead before you leap: End-to-end active recognition by forecasting the effect of motion. In Bastian Leibe, Jiri Matas, Nicu Sebe, and Max Welling, editors, Computer Vision – ECCV 2016, pages 489–505, Cham, 2016. Springer International Publishing. ISBN 978-3-319-46454-1.

[52] C. Jiang, D. P. Paudel, Y. Fougerolle, D. Fofi, and C. Demonceaux. Static-map and dynamic object reconstruction in outdoor scenes using 3-d motion segmentation. IEEE Robotics and Automation Letters, 1(1):324–331, Jan 2016.

[53] M. W. Kadous, R. K. M. Sheh, and C. Sammut. Controlling heterogeneous semi-autonomous rescue robot teams. In 2006 IEEE International Conference on Systems, Man and Cybernetics, volume 4, pages 3204–3209, Oct 2006.

[54] Leslie Pack Kaelbling, Michael L Littman, and Anthony R Cassandra. Planning and acting in partially observable stochastic domains. Artificial intelligence, 101(1-2):99–134, 1998.

[55] Eugenia Kalnay and Ming Cai. Impact of urbanization and land-use change on climate. Nature, 423:528 EP –, May 2003. URL http://dx.doi.org/10.1038/nature01675.

[56] Kourosh Khoshelham and Sander Oude Elberink. Accuracy and resolution of kinect depth data for indoor mapping applications. Sensors, 12(2):1437–1454, 2012. URL http://www.mdpi.com/1424-8220/12/2/1437.

[57] Dennis K Killinger and Aram Mooradian. Optical and laser remote sensing, volume 39. Springer, 2013.

[58] S. Kim and J. Kim. Occupancy mapping and surface reconstruction using local gaussian processes with kinect sensors. IEEE Transactions on Cybernetics, 43(5):1335–1346, Oct 2013.

[59] Kinect for microsoft windows, 2018. URL https://developer.microsoft.com/en-us/windows/kinect. "[Online; accessed 18-January-2018]".

[60] G. J. M. Kruijff, M. Janíček, S. Keshavdas, B. Larochelle, H. Zender, N. J. J. M. Smets, T. Mioch, M. A. Neerincx, J. V. Diggelen, F. Colas, M. Liu, F. Pomerleau, R. Siegwart, V. Hlaváč, T. Svoboda, T. Petříček, M. Reinstein, K. Zimmermann, F. Pirri, M. Gianni, P. Papadakis, A. Sinha, P. Balmer, N. Tomatis, R. Worst, T. Linder, H. Surmann, V. Tretyakov, S. Corrao, S. Pratzler-Wanczura, and M. Sulk. Experience in System Design for Human-Robot Teaming in Urban Search and Rescue, pages 111–125. Springer Berlin Heidelberg, Berlin, Heidelberg, 2014. ISBN 978-3-642-40686-7. URL https://doi.org/10.1007/978-3-642-40686-7_8.

[61] I. Kruijff-Korbayová, L. Freda, M. Gianni, V. Ntouskos, V. Hlaváč, V. Kubelka, E. Zimmermann, H. Surmann, K. Dulic, W. Rottner, and E. Gissi. Deployment of ground and aerial robots in earthquake-struck amatrice in italy (brief report). In 2016 IEEE International Symposium on Safety, Security, and Rescue Robotics (SSRR), pages 278–279, Oct 2016.

[62] Ivana Kruijff-Korbayová, Francis Colas, Mario Gianni, Fiora Pirri, Joachim de Greeff, Koen Hindriks, Mark Neerincx, Petter Ögren, Tomáš Svoboda, and Rainer Worst. Tradr project: Long-term human-robot teaming for robot assisted disaster response. KI - Künstliche Intelligenz, 29(2):193–201, Jun 2015. URL https://doi.org/10.1007/s13218-015-0352-5.

[63] Yann LeCun, Yoshua Bengio, and Geoffrey Hinton. Deep learning. nature, 521(7553):436, 2015.

[64] C. Leung, S. Huang, and G. Dissanayake. Active slam using model predictive control and attractor based exploration. In 2006 IEEE/RSJ International Conference on Intelligent Robots and Systems, pages 5026–5031, Oct 2006.

[65] Jesse Levinson, Jake Askeland, Jan Becker, Jennifer Dolson, David Held, Soeren Kammel, J Zico Kolter, Dirk Langer, Oliver Pink, Vaughan Pratt, et al. Towards fully autonomous driving: Systems and algorithms. In Intelligent Vehicles Symposium (IV), 2011 IEEE, pages 163–168. IEEE, 2011.

[66] H. Lin and Z. Song. 3d reconstruction of specular surface via a novel structured light approach. In 2015 IEEE International Conference on Information and Automation, pages 530–534, Aug 2015.

[67] D. Liu, X. Chen, and Y. H. Yang. Frequency-based 3d reconstruction of transparent and specular objects. In 2014 IEEE Conference on Computer Vision and Pattern Recognition, pages 660–667, June 2014.

[68] Fayao Liu, Chunhua Shen, and Guosheng Lin. Deep convolutional neural fields for depth estimation from a single image. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 5162–5170, 2015.

[69] David G. Lowe. Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, 60(2):91–110, Nov 2004. URL https://doi.org/10.1023/B:VISI.0000029664.99615.94.

[70] Catherine Manzi, Michael J Powers, and Kristina Zetterlund. Critical information flows in the Alfred P. Murrah Building bombing: a case study. Chemical and Biological Arms Control Institute, 2002.

[71] L. Marconi, C. Melchiorri, M. Beetz, D. Pangercic, R. Siegwart, S. Leutenegger, R. Carloni, S. Stramigioli, H. Bruyninckx, P. Doherty, A. Kleiner, V. Lippiello, A. Finzi, B. Siciliano, A. Sala, and N. Tomatis. The sherpa project: Smart collaboration between humans and ground-aerial robots for improving rescuing activities in alpine environments. In 2012 IEEE International Symposium on Safety, Security, and Rescue Robotics (SSRR), pages 1–4, Nov 2012.

[72] Margareta Wahlström and Debarati Guha-Sapir. The human cost of natural disasters 2015: a global perspective. Technical report, Université catholique de Louvain, Brussels, Belgium, 01 2015. URL https://reliefweb.int/report/world/human-cost-natural-disasters-2015-global-perspective.

[73] Ruben Martinez-Cantin, Nando de Freitas, Eric Brochu, José Castellanos, and Arnaud Doucet. A bayesian exploration-exploitation approach for optimal online sensing and planning with a visually guided mobile robot. Autonomous Robots, 27(2):93–103, 2009.

[74] F. Matsuno and S. Tadokoro. Rescue robots and systems in japan. In 2004 IEEE International Conference on Robotics and Biomimetics, pages 12–20, Aug 2004.

[75] Bruce A. Maxwell, Nicolas Ward, and Frederick Heckel. Game-based design of human-robot interfaces for urban search and rescue. In Computer-Human Interface Fringe, 2004.

[76] Peter S Maybeck. Stochastic models, estimation, and control, volume 3. Academic press, 1982.

[77] Nathan Michael, Shaojie Shen, Kartik Mohta, Yash Mulgaonkar, Vijay Kumar, Keiji Nagatani, Yoshito Okada, Seiga Kiribayashi, Kazuki Otake, Kazuya Yoshida, Kazunori Ohno, Eijiro Takeuchi, and Satoshi Tadokoro. Collaborative mapping of an earthquake-damaged building via ground and aerial robots. Journal of Field Robotics, 29(5):832–841, 2012. URL http://dx.doi.org/10.1002/rob.21436.

[78] Nathan Michael, Shaojie Shen, Kartik Mohta, Yash Mulgaonkar, Vijay Kumar, Keiji Nagatani, Yoshito Okada, Seiga Kiribayashi, Kazuki Otake, Kazuya Yoshida, Kazunori Ohno, Eijiro Takeuchi, and Satoshi Tadokoro. Collaborative mapping of an earthquake-damaged building via ground and aerial robots. Journal of Field Robotics, 29(5):832–841, 2012. URL http://dx.doi.org/10.1002/rob.21436.

[79] Magnus Minnema Lindhé. Communication-aware motion planning for mobile robots. PhD thesis, KTH Royal Institute of Technology, 2012.

[80] Michael Montemerlo, Sebastian Thrun, Daphne Koller, Ben Wegbreit, et al. Fastslam: A factored solution to the simultaneous localization and mapping problem. In AAAI/IAAI, pages 593–598, 2002.

[81] S. A. A. Moosavian, H. Semsarilar, and A. Kalantari. Design and manufacturing of a mobile rescue robot. In 2006 IEEE/RSJ International Conference on Intelligent Robots and Systems, pages 3982–3987, Oct 2006.

[82] R. R. Murphy, E. Steimle, M. Hall, M. Lindemuth, D. Trejo, S. Hurlebaus, Z. Medina-Cetina, and D. Slocum. Robot-assisted bridge inspection after hurricane ike. In 2009 IEEE International Workshop on Safety, Security Rescue Robotics (SSRR 2009), pages 1–5, Nov 2009.

[83] Robin R Murphy. National science foundation summer field institute for rescue robots for research and response (r4). AI Magazine, 25(2):133, 2004.

[84] Robin R. Murphy. Disaster Robotics. The MIT Press, 2014. ISBN 0262027356, 9780262027359.

[85] Robin R. Murphy, Satoshi Tadokoro, and Alexander Kleiner. Disaster Robotics, pages 1577–1604. Springer International Publishing, Cham, 2016. ISBN 978-3-319-32552-1. URL https://doi.org/10.1007/978-3-319-32552-1_60.

[86] K. Nagatani and H. Choset. Toward robust sensor based exploration by constructing reduced generalized voronoi graph. In Proceedings 1999 IEEE/RSJ International Conference on Intelligent Robots and Systems. Human and Environment Friendly Robots with High Intelligence and Emotional Quotients (Cat. No.99CH36289), volume 3, pages 1687–1692 vol.3, 1999.

[87] Keiji Nagatani, Seiga Kiribayashi, Yoshito Okada, Kazuki Otake, Kazuya Yoshida, Satoshi Tadokoro, Takeshi Nishimura, Tomoaki Yoshida, Eiji Koyanagi, Mineo Fukushima, et al. Emergency response to the nuclear accident at the fukushima daiichi nuclear power plants using mobile rescue robots. Journal of Field Robotics, 30(1):44–63, 2013.

[88] Lorenzo Natale, Giorgio Metta, and Giulio Sandini. Learning haptic representation of objects. In International Conference on Intelligent Manipulation and Grasping, July 2004.

[89] R. A. Newcombe, S. Izadi, O. Hilliges, D. Molyneaux, D. Kim, A. J. Davison, P. Kohli, J. Shotton, S. Hodges, and A. Fitzgibbon. Kinectfusion: Real-time dense surface mapping and tracking. In 2011 10th IEEE International Symposium on Mixed and Augmented Reality, pages 127–136, Oct 2011.

[90] M. A. Oliver and R. Webster. Kriging: a method of interpolation for geographical information systems. International Journal of Geographical Information Systems, 4(3):313–332, 1990. URL https://doi.org/10.1080/02693799008941549.

[91] Barrett O'Neill. Elementary differential geometry. Academic press, 2006.

[92] Thomas J. Palmeri, Bradley C. Love, and Brandon M. Turner. Model-based cognitive neuroscience. Journal of Mathematical Psychology, 76:59–64, 2017. URL http://www.sciencedirect.com/science/article/pii/S002224961630116X. Model-based Cognitive Neuroscience.

[93] Ramviyas Parasuraman, Sergio Caccamo, Fredrik Baberg, and Petter Ogren. Crawdad dataset kth/rss (v. 2016-01-05), 2016.

[94] Charles R Qi, Hao Su, Kaichun Mo, and Leonidas J Guibas. Pointnet: Deep learning on point sets for 3d classification and segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 652–660, 2017.

[95] Y. Qian, M. Gong, and Y. H. Yang. 3d reconstruction of transparent objects with position-normal consistency. In 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 4369–4377, June 2016.

[96] Carl Edward Rasmussen and Christopher K. I. Williams. Gaussian Processes for Machine Learning. The MIT Press, 2006. ISBN 978-0-262-18253-9. URL http://www.gaussianprocess.org/gpml/chapters/.

[97] Erik Reinhard, Greg Ward, Sumanta Pattanaik, and Paul Debevec. High Dynamic Range Imaging: Acquisition, Display, and Image-Based Lighting (The Morgan Kaufmann Series in Computer Graphics). Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 2005. ISBN 0125852630.

[98] C. Y. Ren, V. Prisacariu, D. Murray, and I. Reid. Star3d: Simultaneous tracking and reconstruction of 3d objects using rgb-d data. In 2013 IEEE International Conference on Computer Vision, pages 1561–1568, Dec 2013.

[99] The robocup rescue league, 2018. URL http://www.robocup2016.org/en/leagues/robocup-rescue/. "[Online; accessed 18-January-2018]".

[100] David Rodenas-Herraiz, Antonio-Javier Garcia-Sanchez, Felipe Garcia-Sanchez, and Joan Garcia-Haro. Current trends in wireless mesh sensor networks: A review of competing approaches. Sensors, 13(12):5958–5995, May 2013. URL http://dx.doi.org/10.3390/s130505958.

[101] Renato F. Salas-Moreno, Richard A. Newcombe, Hauke Strasdat, Paul H. J. Kelly, and Andrew J. Davison. Slam++: Simultaneous localisation and mapping at the level of objects. In Proceedings of the 2013 IEEE Conference on Computer Vision and Pattern Recognition, CVPR ’13, pages 1352–1359, Washington, DC, USA, 2013. IEEE Computer Society. ISBN 978-0-7695-4989-7. URL http://dx.doi.org/10.1109/CVPR.2013.178.

[102] J. Scholtz, J. Young, J. L. Drury, and H. A. Yanco. Evaluation of human-robot interaction awareness in search and rescue. In Robotics and Automation, 2004. Proceedings. ICRA ’04. 2004 IEEE International Conference on, volume 3, pages 2327–2332 Vol.3, April 2004.

[103] H. Shim and S. Lee. Recovering translucent objects using a single time-of-flight depth camera. IEEE Transactions on Circuits and Systems for Video Technology, 26(5):841–854, May 2016.

[104] Jivko Sinapov, Taylor Bergquist, Connor Schenck, Ugonna Ohiri, Shane Griffith, and Alexander Stoytchev. Interactive object recognition using proprioceptive and auditory feedback. The International Journal of Robotics Research, 30(10):1250–1262, 2011.

[105] Smokebot: Mobile robots with novel environmental sensors for inspection of disaster sites with low visibility, 2018. URL http://cordis.europa.eu/project/rcn/194282_en.html. "[Online; accessed 18-January-2018]".

[106] Aaron Steinfeld, Terrence Fong, David Kaber, Michael Lewis, Jean Scholtz, Alan Schultz, and Michael Goodrich. Common metrics for human-robot interaction. In Proceedings of the 1st ACM SIGCHI/SIGART Conference on Human-robot Interaction, HRI ’06, pages 33–40, New York, NY, USA, 2006. ACM. ISBN 1-59593-294-1. URL http://doi.acm.org/10.1145/1121241.1121249.

[107] Satoshi Tadokoro. Summary of DDT Project, Unsolved Problems, and Future Roadmap, pages 175–189. Springer London, London, 2009. ISBN 978-1-84882-474-4. URL https://doi.org/10.1007/978-1-84882-474-4_10.

[108] Yuichi Taguchi, Yong-Dian Jian, Srikumar Ramalingam, and Chen Feng. Point-plane slam for hand-held 3d sensors. In Robotics and Automation (ICRA), 2013 IEEE International Conference on, pages 5182–5189. IEEE, 2013.

[109] IEEE spectrum: Robots enter fukushima reactors, detect high radiation, 2012. URL https://spectrum.ieee.org/automaton/robotics/industrial-robots/robots-enter-fukushima-reactors-detect-high-radiation. "[Online; accessed 18-January-2018]".

[110] Sebastian Thrun. Probabilistic robotics. Communications of the ACM, 45(3):52–57, 2002.

[111] Sebastian Thrun et al. Robotic mapping: A survey. Exploring artificial intelligence in the new millennium, 1:1–35, 2002.

[112] S. K. Tin, J. Ye, M. Nezamabadi, and C. Chen. 3d reconstruction of mirror-type objects using efficient ray coding. In 2016 IEEE International Conference on Computational Photography (ICCP), pages 1–11, May 2016.

[113] Shrihari Vasudevan, Fabio Ramos, Eric Nettleton, and Hugh Durrant-Whyte. Gaussian process modeling of large-scale terrain. Journal of Field Robotics, 26(10):812–840, 2009. URL http://dx.doi.org/10.1002/rob.20309.

[114] Raquel Viciana-Abad, Rebeca Marfil, Jose M. Perez-Lorenzo, Juan P. Bandera, Adrian Romero-Garces, and Pedro Reche-Lopez. Audio-visual perception system for a humanoid robotic head. Sensors, 14(6):9522–9545, 2014. URL http://www.mdpi.com/1424-8220/14/6/9522.

[115] Jean Vroomen and Beatrice de Gelder. Sound enhances visual perception: cross-modal effects of auditory organization on vision. Journal of experimental psychology: Human perception and performance, 26(5):1583, 2000.

[116] Claus Weitkamp. Lidar: range-resolved optical remote sensing of the atmosphere, volume 102. Springer Science & Business Media, 2006.

[117] Thomas Whelan, Renato F Salas-Moreno, Ben Glocker, Andrew J Davison, and Stefan Leutenegger. Elasticfusion: Real-time dense slam and light source estimation. The International Journal of Robotics Research, 35(14):1697–1716, 2016. URL https://doi.org/10.1177/0278364916669237.

[118] O. Williams and A. Fitzgibbon. Gaussian process implicit surfaces. Gaussian Proc. in Practice, 2007.

[119] Alan F. T. Winfield, Christian Blum, and Wenguo Liu. Towards an ethical robot: Internal models, consequences and ethical action selection. In Michael Mistry, Aleš Leonardis, Mark Witkowski, and Chris Melhuish, editors, Advances in Autonomous Robotics Systems, pages 85–96, Cham, 2014. Springer International Publishing. ISBN 978-3-319-10401-0.

[120] C. Wright, A. Johnson, A. Peck, Z. McCord, A. Naaktgeboren, P. Gianfortoni, M. Gonzalez-Rivero, R. Hatton, and H. Choset. Design of a modular snake robot. In 2007 IEEE/RSJ International Conference on Intelligent Robots and Systems, pages 2609–2614, Oct 2007.

[121] Russell B. Wynn, Veerle A.I. Huvenne, Timothy P. Le Bas, Bramley J. Murton, Douglas P. Connelly, Brian J. Bett, Henry A. Ruhl, Kirsty J. Morris, Jeffrey Peakall, Daniel R. Parsons, Esther J. Sumner, Stephen E. Darby, Robert M. Dorrell, and James E. Hunt. Autonomous underwater vehicles (auvs): Their past, present and future contributions to the advancement of marine geoscience. Marine Geology, 352:451–468, 2014. URL http://www.sciencedirect.com/science/article/pii/S0025322714000747. 50th Anniversary Special Issue.

[122] Euijung Yang and Michael C. Dorneich. The emotional, cognitive, physiological, and performance effects of variable time delay in robotic teleoperation. International Journal of Social Robotics, 9(4):491–508, Sep 2017. URL https://doi.org/10.1007/s12369-017-0407-x.

[123] Tomoaki Yoshida, Keiji Nagatani, Satoshi Tadokoro, Takeshi Nishimura, and Eiji Koyanagi. Improvements to the Rescue Robot Quince Toward Future Indoor Surveillance Missions in the Fukushima Daiichi Nuclear Power Plant, pages 19–32. Springer Berlin Heidelberg, Berlin, Heidelberg, 2014. ISBN 978-3-642-40686-7. URL https://doi.org/10.1007/978-3-642-40686-7_2.

[124] Christopher Zach, Thomas Pock, and Horst Bischof. A globally optimal algorithm for robust tv-l1 range image integration. In Computer Vision, 2007. ICCV 2007. IEEE 11th International Conference on, pages 1–8. IEEE, 2007.

[125] Pietro Zanuttigh, Giulio Marin, Carlo Dal Mutto, Fabio Dominio, Ludovico Minto, and Guido Maria Cortelazzo. Time-of-flight and structured light depth cameras. Springer, 2016. URL https://doi.org/10.1007/978-3-319-30973-6.

[126] Wende Zhang. Lidar-based road and road-edge detection. In Intelligent Vehicles Symposium (IV), 2010 IEEE, pages 845–848. IEEE, 2010.

[127] Jiayuan Zhu and Yee-Hong Yang. Frequency-based environment matting. In 12th Pacific Conference on Computer Graphics and Applications, 2004. PG 2004. Proceedings., pages 402–410, Oct 2004.

[128] K. Zimmermann, P. Zuzanek, M. Reinstein, and V. Hlavac. Adaptive traversability of unknown complex terrain with obstacles for mobile robots. In 2014 IEEE International Conference on Robotics and Automation (ICRA), pages 5177–5182, May 2014.