Game-Theoretic Safety Assurance for Human-Centered Robotic Systems by Jaime Fernández Fisac a Dissertation Submitted in Partial
Game-Theoretic Safety Assurance for Human-Centered Robotic Systems

by Jaime Fernández Fisac

A dissertation submitted in partial satisfaction of the requirements for the degree of Doctor of Philosophy in Engineering - Electrical Engineering and Computer Sciences in the Graduate Division of the University of California, Berkeley

Committee in charge:
Professor S. Shankar Sastry, Co-chair
Professor Claire J. Tomlin, Co-chair
Professor Anca D. Dragan
Professor Thomas L. Griffiths
Professor Ruzena Bajcsy

Fall 2019

Game-Theoretic Safety Assurance for Human-Centered Robotic Systems
Copyright 2019 by Jaime Fernández Fisac

Abstract

Game-Theoretic Safety Assurance for Human-Centered Robotic Systems
by Jaime Fernández Fisac
Doctor of Philosophy in Engineering - Electrical Engineering and Computer Sciences
University of California, Berkeley
Professor S. Shankar Sastry, Co-chair
Professor Claire J. Tomlin, Co-chair

In order for autonomous systems like robots, drones, and self-driving cars to be reliably introduced into our society, they must have the ability to actively account for safety during their operation. While safety analysis has traditionally been conducted offline for controlled environments like cages on factory floors, the much higher complexity of open, human-populated spaces like our homes, cities, and roads makes it unviable to rely on common design-time assumptions, since these may be violated once the system is deployed. Instead, the next generation of robotic technologies will need to reason about safety online, constructing high-confidence assurances informed by ongoing observations of the environment and other agents, in spite of models of them being necessarily fallible. This dissertation aims to lay down the necessary foundations to enable autonomous systems to ensure their own safety in complex, changing, and uncertain environments, by explicitly reasoning about the gap between their models and the real world.
It first introduces a suite of novel robust optimal control formulations and algorithmic tools that permit tractable safety analysis in time-varying, multi-agent systems, as well as safe real-time robotic navigation in partially unknown environments; these approaches are demonstrated on large-scale unmanned air traffic simulation and physical quadrotor platforms. After this, it draws on Bayesian machine learning methods to translate model-based guarantees into high-confidence assurances, monitoring the reliability of predictive models in light of changing evidence about the physical system and surrounding agents. This principle is first applied to a general safety framework allowing the use of learning-based control (e.g., reinforcement learning) for safety-critical robotic systems such as drones, and then combined with insights from cognitive science and dynamic game theory to enable safe human-centered navigation and interaction; these techniques are showcased on physical quadrotors (flying in unmodeled wind and among human pedestrians) and simulated highway driving. The dissertation ends with a discussion of challenges and opportunities ahead, including the bridging of safety analysis and reinforcement learning and the need to "close the loop" around learning and adaptation in order to deploy increasingly advanced autonomous systems with confidence.

To my parents, Concha and Curro, and to my sister Carmeluky.
¡Porque juntos somos geniales!

Contents

List of Figures
List of Tables

1 Introduction
  1.1 Central Challenges in Robotic Safety Assurance
  1.2 Thesis Overview and Contributions
2 Background and Preliminaries
  2.1 System Dynamics and Model Uncertainty
  2.2 Optimal Control and Dynamic Games
  2.3 Safety Analysis
  2.4 Learning-Based Control
  2.5 Cognitive Human Models

I Safety Analysis for Robotic Systems
3 Time-Varying Reach-Avoid Games
  3.1 Time-Varying Reach-Avoid Games
  3.2 The Double-Obstacle Isaacs Equation
  3.3 Numerical Implementation
  3.4 Numerical Examples
  3.5 Chapter Summary
4 Safe Multi-Robot Trajectory Planning
  4.1 Safe Multiagent Trajectory Planning
  4.2 Sequential Trajectory Planning Without Disturbances
  4.3 Robust Tracking of Committed Trajectories
  4.4 Least-Restrictive STP: Alternative Performance Objectives
  4.5 Chapter Summary
5 Safe Real-Time Robotic Navigation
  5.1 Fast Planning, Safe Tracking
  5.2 Recursive Safety and Liveness in Uncertain Environments
  5.3 Chapter Summary

II Safety Across the Reality Gap
6 Safe Learning under Uncertainty
  6.1 Problem Formulation
  6.2 Safety Analysis with Imperfect Model Error Bounds
  6.3 Bayesian Safety Assurance
  6.4 Experimental Results
  6.5 Chapter Summary
7 Confidence-Aware Planning with Human Models
  7.1 Safe Robot Trajectories under Uncertain Human Motion
  7.2 Confidence-Aware Human Motion Prediction
  7.3 Safe Probabilistic Planning and Tracking
  7.4 Demonstration with Real Human Trajectories
  7.5 Safe Multi-Human Multi-Robot Navigation
  7.6 Implications on Human Preference Inference and Value Alignment
  7.7 Chapter Summary
8 Game-Theoretic Autonomous Driving
  8.1 Driving as a Nonzero-Sum Dynamic Game
  8.2 Hierarchical Game-Theoretic Planning
  8.3 Simulation Results
  8.4 Chapter Summary

III Safe Steps Forward
9 Safety Analysis through Reinforcement Learning
  9.1 The Undiscounted Safety Problem
  9.2 The Discounted Safety Bellman Equation
  9.3 Results
  9.4 Chapter Summary
10 Towards a Safe Robotic Future

Bibliography

List of Figures

1.1 Thesis overview
2.1 Quadrotor in safe, unsafe, and failure state
3.1 Backward-time evolution of the reach-avoid set for a simple control problem
3.2 Analytic and numerical reach-avoid set
3.3 Convergence of numerical Hamilton-Jacobi scheme with grid resolution
3.4 Backward-time evolution of the reach-avoid set for a reach-avoid game
3.5 Reach-avoid set via state augmentation and time-varying method
4.1 Initial configuration of the four-vehicle example
4.2 Vehicle reach-avoid sets at departure time
4.3 Backward-time evolution of the reach-avoid set for a single vehicle
4.4 Planned trajectories of all vehicles
4.5 Initial configuration of the four-vehicle example in the presence of disturbances
4.6 Backward-time evolution of the robust reach-avoid set for a single vehicle
4.7 Robust Sequential Trajectory Planning simulation (4 vehicles)
4.8 Robust Sequential Trajectory Planning simulation (50 vehicles)
4.9 Robust Sequential Trajectory Planning simulation (200 vehicles)
5.1 Illustration of heuristic-margin motion planning and FaSTrack scheme
5.2 Analytic and numerical tracking error bound
5.3 Simulated autonomous flight in a cluttered environment
5.4 Robust tracking bound size vs. planner speed
5.5 Crazyflie quadrotor during FaSTrack demonstration
5.6 FaSTrack quadrotor trajectory
5.7 FaSTrack quadrotor trajectory with meta-planning
5.8 Illustration of unsafe motion plan due to lack of recursive feasibility
5.9 Outbound expansion and inbound consolidation of navigation graph
5.10 Schematic diagram of the heuristic exploration procedure
5.11 Relative states and tracking error between quadrotor and Dubins car
5.12 Recursively feasible exploration for a Dubins car model
6.1 Quadrotor learning to fly under an unmodeled disturbance
6.2 Evolution of disturbance probability under Gaussian process updates
6.3 Quadrotor altitude over time learning to fly with poor initialization
6.4 Quadrotor altitude over time flying with unreliable learned model
6.5 Safe sets computed online from quadrotor flight data and Gaussian process
6.6 Quadrotor altitude over time learning under unmodeled disturbance
7.1 Quadrotor flying near human with unexpected walking behavior
7.2 Human trajectory and probabilistic model predictions
7.3 Human motion predictions under modeled and unmodeled goals
7.4 Robot trajectories with different model confidence
7.5 Predicted human state distribution and forward-reachable set