Safe and Data-Efficient Learning for Robotics: a Control Theoretic Approach
Total Page:16
File Type:pdf, Size:1020Kb
Safe and Data-Efficient Learning for Robotics: A Control Theoretic Approach Somil Bansal Electrical Engineering and Computer Sciences University of California at Berkeley Technical Report No. UCB/EECS-2020-186 http://www2.eecs.berkeley.edu/Pubs/TechRpts/2020/EECS-2020-186.html November 24, 2020 Copyright © 2020, by the author(s). All rights reserved. Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission. Safe and Data-Efficient Learning for Robotics: A Control Theoretic Approach by Somil Bansal A dissertation submitted in partial satisfaction of the requirements for the degree of Doctor of Philosophy in Electrical Engineering and Computer Sciences in the Graduate Division of the University of California, Berkeley Committee in charge: Professor Claire Tomlin, Chair Professor Sanjit Seshia Professor Koushil Sreenath Fall 2020 The dissertation of Somil Bansal, titled Safe and Data-Efficient Learning for Robotics: A Control Theoretic Approach, is approved: Chair Date Date Date University of California, Berkeley Safe and Data-Efficient Learning for Robotics: A Control Theoretic Approach Copyright 2020 by Somil Bansal 1 Abstract Safe and Data-Efficient Learning for Robotics: A Control Theoretic Approach by Somil Bansal Doctor of Philosophy in Electrical Engineering and Computer Sciences University of California, Berkeley Professor Claire Tomlin, Chair For successful integration of autonomous systems such as drones and self-driving cars in our day-to-day life, they must be able to quickly adapt to ever changing environments, and actively reason about their safety and that of other users and autonomous systems around them. Even though control theoretic approaches have been used for decades now for the con- trol and safety analysis of autonomous systems, these approaches typically operate under the assumption of a known system dynamics model and the environment in which the system is operating. To overcome these challenges, machine learning approaches have been explored to operate autonomous systems intelligently and reliably in unpredictable environments based on prior data. However, learning techniques widely used today are extremely data inefficient, making it challenging to apply them to real-world physical systems. Moreover, unlike con- trol theoretic approaches, these techniques lack the necessary mathematical framework to provide guarantees on correctness, causing safety concerns as data-driven physical systems are integrated in our society. This dissertation aims to combine the control theoretic perspective with the modern learning approaches to enable autonomous systems to safely adapt to unknown environments. It first introduces a suite of core tools from dynamical system theory and robust control that permits the safety analysis of single and multi-agent autonomous systems under the assumption of known system model and environment. In the remainder of the dissertation, we discuss how we can go past these assumptions with the help of machine learning, while minimizing the data requirement for learning and maintaining the safety guarantees for the autonomous system. To that end, we first discuss how we can learn inaccuracies in the dynamics model of the system, such as unknown ground effects for quadrotors, and use the learned model along with optimal control tools to improve the control performance. Our particular focus is on learning task-specific models that allow for a quick adaptation for the task at hand; these techniques are showcased on physical quadrotors and robotic arms. We next present mod- ular architectures that use a learning-based perception module for the environment level reasoning and a dynamics model-based module for system level reasoning to solve challeng- 2 ing perception and control problems in a priori unknown environments in a data-efficient fashion. Moreover, due to their modularity, these architectures are amenable to simulation- to-real transfer, and can be used for different robotic systems without any retraining. These approaches are demonstrated on a variety of ground robots navigating in unknown buildings around humans based only on onboard visual sensors. Next, we discuss how we can use dynamics models not only for data-efficient learning, but also to monitor and recognize the learning system's failures, and to provide online corrective safe actions when necessary. This allows us to provide safety assurances for learning-enabled systems in unknown and human-centric environments, which has remained a challenge to date. Together these techniques enable autonomous systems to learn to operate in unknown environments, but do so in a data-efficient and safe fashion. The dissertation ends with a discussion of future challenges and opportunities at the intersection of learning and control, including the safety analysis in online learning settings and the need to close the loop between the design of learning systems and their safety analysis for developing resilient and continually improving autonomous systems. i To my parents, Sh. Raman and Punam Bansal, and to my grandma Sh. Kasturi Mahipal. ii Contents Contents ii List of Figures v List of Tables xv 1 Introduction 1 2 Background and Preliminaries 5 2.1 System Dynamics and Feedback Control . 5 2.2 Optimal Control and Dynamic Games . 9 2.3 Hamilton-Jacobi Reachability . 17 2.4 Reinforcement Learning . 26 2.5 Neural Networks . 27 2.6 Gaussian Processes and Bayesian Optimization . 30 I Safety Analysis for Robotic Systems 33 3 Scaling Safety Analysis: Algorithmic and Computational Fronts 34 3.1 Related Work . 35 3.2 Reachability Decomposition . 37 3.3 Run-Time Reachability in Dynamic Environments: Warm Start and Local Updates . 62 3.4 The Berkeley Efficient API in C++ for Level Set Methods (BEACLS) . 74 3.5 Chapter Summary . 77 4 Provably Safe and Scalable Multi-Vehicle Trajectory Planning 80 4.1 Related Work . 81 4.2 Safe Multi-Vehicle Trajectory Planning . 83 4.3 Sequential Trajectory Planning . 84 4.4 Sequential Trajectory Planning With An Adversarial Intruder . 103 4.5 Chapter Summary . 114 iii II Going Beyond Known Dynamics Models and Environments: Learning-Based Control for Unknown Models and Environ- ments 116 5 Learning for Unknown Dynamics Models: Indirect Learning-Based Con- trol 117 5.1 Related Work . 118 5.2 Problem Formulation . 121 5.3 Identification of Unmodeled Dynamics Using Deep Neural Networks . 122 5.4 Experiments: Learning Quadrotor Dynamics Using DNN . 123 5.5 Discussion . 134 5.6 Chapter Summary . 134 6 Learning for Unknown Dynamics Models: Direct Learning-Based Control136 6.1 Related Work . 137 6.2 Problem Formulation . 139 6.3 Goal-Driven Dynamics Learning (aDOBO) . 141 6.4 Going Beyond Linear Models: Learning Task-Driven Gaussian Process (GP) Models . 155 6.5 Combining Indirect and Direct Learning-Based Control . 163 6.6 Chapter Summary . 173 7 Learning for Unknown Environments: Perception-Based Control 174 7.1 Related Work . 175 7.2 Problem Formulation . 179 7.3 Learning-Based Perception with Model-Based Control for Visual Navigation 179 7.4 Visual Navigation in Dynamic, Human-Centric Environments . 195 7.5 Chapter Summary . 208 III Safety for Learning-Enabled Control Systems 209 8 Safety Analysis Using Learning-Based Dynamics Models 210 8.1 Related Work . 211 8.2 Problem Formulation . 213 8.3 SPEC: Specification Centric Simulation Metric . 217 8.4 Numerical Simulations . 226 8.5 Chapter Summary . 234 9 Safe Learning-Enabled Perception Components 235 9.1 Related Work . 236 9.2 Problem Formulation . 238 9.3 Reachability-Based Safety Monitor in Unknown Environments . 239 iv 9.4 Numerical Simulations . 244 9.5 Hardware Experiments . 248 9.6 Chapter Summary . 250 10 Towards a Synergistic Learning and Control Future 251 Bibliography 254 v List of Figures 2.1 The neural network architecture for a 3 layer feed-forward ReLU network. The NN consists of three layers: an input layer, a hidden ReLU layer, and an output layer. The parameters to be learned during the training process are θ = (W; w; B; b). 30 3.1 This figure shows the back-projection of sets in the x1-xc plane S1 and the x2- xc plane (S2) to the 3D space to form the intersection shown as the black cube (S). The figure also shows projection of a point x onto the lower-dimensional subspaces in the x1-xc and x2-xc planes. 41 3.2 Comparison of the Dubins Car BRS A(t = −0:5) computed using the full for- mulation and via decomposition. Left top: The BRSs in the lower-dimensional subspaces and how they are combined to form the full-dimensional BRS. Top right: The BRS computed via decomposition. Bottom left: The BRSs computed using both methods, superimposed, showing that they are indistinguishable. Bot- tom right: The BRS computed using the full formulation. 46 3.3 The Dubins Car BRS A(t = −0:5) computed using the full formulation and via decomposition, other view angles. 47 3.4 Computation times of the two methods in log scale for the Dubins Car. The time of the direct computation in 3D increases rapidly with the number of grid points per dimension. In contrast, computation times in 2D with decomposition are negligible in comparison. 48 3.5 Comparison of the R(t) computed using our decomposition method and the full formulation. The computation results are indistinguishable. Note that the surface shows the boundary of the set; the set itself is on the \near" side of the left subplot, and the left side of the right subplot. 48 3.6 3D slices of the 10D BRSs over time (colored surfaces) and the BRT (black surface) for the Near-Hover Quadrotor. The slices are taken at the 7D points at (vx; vy; vz) = (−1:5; −1:8; 1:2); θx = θy = !x = !y = 0 (Left) and (px; py; pz) = (−1:5; 0; 1); (vx; vy) = (1:2; −0:6);!x = !y = −0:5 (Right).