Autonomous Vehicle Benchmarking Using Unbiased Metrics
2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), October 25-29, 2020, Las Vegas, NV, USA (Virtual)

David Paz¹*, Po-jung Lai¹*, Nathan Chan²*, Yuqing Jiang²*, Henrik I. Christensen²*

*This work was performed in collaboration with UC San Diego's Operations, Mailing Center, and Police Station.
¹*Department of Electrical and Computer Engineering, University of California, San Diego, 9500 Gilman Dr, La Jolla, CA 92093
²*Department of Computer Science and Engineering, University of California, San Diego, 9500 Gilman Dr, La Jolla, CA 92093

Abstract— With the recent development of autonomous vehicle technology, there have been active efforts to deploy this technology at different scales, including urban and highway driving. While many of the prototypes showcased have been shown to operate under specific cases, little effort has been made to better understand their shortcomings and generalizability to new areas. Distance, uptime, and the number of manual disengagements performed during autonomous driving provide a high-level idea of the performance of an autonomous system, but without proper data normalization, testing location information, and the number of vehicles involved in testing, the disengagement reports alone do not fully encompass system performance and robustness. Thus, in this study a complete set of metrics is applied for benchmarking autonomous vehicle systems in a variety of scenarios that can be extended for comparison with human drivers and other autonomous vehicle systems. These metrics have been used to benchmark UC San Diego's autonomous vehicle platforms during early deployments for micro-transit and autonomous mail delivery applications.

I. INTRODUCTION

Autonomous vehicle technology has been under active development for at least 30 years [1] [2] [3] [4]. Since the time the technology was first conceived [5], a wide range of applications has been explored, from micro-transit to highway driving, and the technology has more recently started to become commercialized. With this variety of use cases, one important topic is safety. This has received the attention of state officials, and in many cases, regulations and policies have been imposed.

In some states, the Department of Motor Vehicles requires a summary of disengagement reports from each entity performing tests on public roads to provide a better understanding of the number of annual interventions each self-driving car entity is generating. In the state of California alone, the Department of Motor Vehicles (DMV) requires autonomous vehicle companies with a valid testing permit to submit annual reports with a summary of system disengagements. At the time of writing, 66 tech entities hold a valid autonomous vehicle testing permit and only three hold a driverless testing permit.¹

Even though many of these reports include certain information to estimate the number of disengagements performed in an entire year, most of the publicly available disengagement reports² are not time and distance normalized: did the vehicle experience five disengagements during the course of 10 miles or 10,000 miles? Or did it experience five disengagements over the course of 10 minutes or 2,000 hours?

Given the lack of spatiotemporal information, in many cases these unnormalized reports make it impossible to quantify the performance and robustness of the autonomous systems and, most importantly, to quantify their overall safety with respect to other autonomous systems or human drivers.

This study aims to shed light on autonomous system technology performance and safety by leveraging spatiotemporal information and metrics geared towards benchmarking Level 3 to Level 5 autonomous vehicle systems.³ Our key contributions consist of three different parts:

• We introduce the concept of intervention maps for disengagement visualization and analysis. Additionally, the metrics we introduced in [6] have been extended in order to account for safety driver dependability.
• With spatial information as a function of time, we separate the results into different road types to provide more realistic and objective comparisons across different platforms without biasing the results.⁴
• A four-month data collection phase is performed using UCSD's Autonomous Vehicle Laboratory autonomous vehicles; the data is analyzed using the metrics proposed.

With the methods introduced in this study, our team plans on open sourcing an online tool for autonomous vehicle benchmarking to encourage autonomous vehicle entities to report their data in order to objectively quantify system safety and long term autonomy capabilities.

¹https://www.dmv.ca.gov/portal/dmv/detail/vr/autonomous/permit
²https://www.dmv.ca.gov/portal/dmv/detail/vr/autonomous/disengagement_report_2019
³https://www.nhtsa.gov/technology-innovation/automated-vehicles-safety
⁴We define unbiased in the context of making objective comparisons across different vehicle platforms without biasing towards a specific system.

II. RELATED WORK

The area of autonomous vehicle benchmarking has remained relatively unexplored. Prior related work in the area of benchmarking sheds light on performance measures for intelligent systems in off-road and on-road unmanned military applications [9]. While the performance measures proposed may serve for certain unmanned military applications, autonomous vehicle applications in public road conditions often require safety drivers to ensure the vehicles will not behave erratically and pose danger to road users if failure cases arise.

With road user safety and failure cases in mind, [10] focuses on estimating the number of miles a self-driving vehicle would have to be driven autonomously in order to demonstrate its reliability with respect to human drivers and to prove its safety. This study specifically shows that self-driving vehicles would take tens to hundreds of years to demonstrate considerable reliability over human drivers with respect to fatalities and injuries. In addition, this naturally leads to the question: how can autonomous vehicle progress made in the meantime be measured objectively?

While certain self-driving car entities have identified the flaws with the current disengagement data reported to the DMV [11] [12] [13], to the best of our knowledge, our team is the first to make objective comparisons of autonomous systems by studying their long term autonomy implications using real autonomous vehicle data collected from diverse and realistic urban scenarios.

III. METRICS

In this section, the metrics and tools used to benchmark an autonomous vehicle during a four-month study at UC San Diego are defined with the goal of fully characterizing the performance of the systems over time.

A. Direct System Robustness Characterization

For direct system robustness characterization, the metrics of choice are given by the Mean Distance Between Interventions (MDBI) and the Mean Time Between Interventions (MTBI). These metrics provide a normalized means of benchmarking system robustness over time by including temporal and spatial information. This makes them ideal for comparing performance against other systems. In contrast, unnormalized data cannot be used to perform objective comparisons across different autonomous vehicle systems, as one cannot estimate how often the disengagements are happening in terms of time and distance. For this reason, we do not use intervention counts alone to quantify performance. By definition, the MDBI and MTBI statistics can be computed as shown in Equations 1 and 2.

\mathrm{MDBI} = \frac{\text{Total Distance}}{\text{Number of Interventions}} \quad (1)

\mathrm{MTBI} = \frac{\text{Total Uptime}}{\text{Number of Interventions}} \quad (2)
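As a simple illustration of Equations 1 and 2, the sketch below (not code from the paper; the function and unit names are our own) shows how the same intervention count maps to very different normalized scores depending on the distance and uptime over which it was accumulated, echoing the 10-miles-versus-10,000-miles question raised in the introduction.

```python
# Minimal sketch of Equations 1 and 2 (names and units are illustrative).

def mdbi(total_distance_mi: float, num_interventions: int) -> float:
    """Mean Distance Between Interventions (Eq. 1)."""
    return total_distance_mi / num_interventions


def mtbi(total_uptime_hr: float, num_interventions: int) -> float:
    """Mean Time Between Interventions (Eq. 2)."""
    return total_uptime_hr / num_interventions


# Five disengagements mean very different things depending on exposure:
print(mdbi(10.0, 5))      # 2 miles per intervention
print(mdbi(10_000.0, 5))  # 2,000 miles per intervention
```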
Fig. 1. Enable/disable (disengagement) signal as a function of time.

In the figure, manual and autonomous driving segments are represented by orange and blue colors, respectively, where the separation is given by an intervention or a system re-enable signal. Given that a manual intervention could be performed for an arbitrary length of time, it is important to accurately measure the disengagement signals in real time by associating them with a system timestamp. While these measurements can be performed by manual annotation, this introduces human error. Therefore, for the measurements performed in this study, each autonomous vehicle was retrofitted with a logging device that records the enable and disable signals over time using Unix time. This device operates in an encapsulated environment and records serialized data for vehicle pose, speed, and enable signals, as well as their corresponding timestamps. Given this data, the time elapsed between a disengagement and a re-enable signal can be measured by the difference in timestamps. On the other hand, two methods can be employed for measuring the distance traveled between any two given timestamps t_i and t_{i+k}, where t_i < t_{i+k}, as shown in Equations 3 and 4, where the vehicle position at time t is given by X_t = [x_t, y_t, z_t]^⊤ and the speed is given by v_t. For the measurements performed in this study, Equation 3 was used for estimating distance, given that vehicle pose estimates are provided with a high degree of precision by the LiDAR-based Normal Distributions Transform localization algorithm [7]. The devices used for these measurements are also introduced in our previous work on the lessons learned from deploying autonomous vehicles [6], and a high-level description will be provided in the next section.

\sum_{t=i+1}^{i+k} \lVert X_t - X_{t-1} \rVert \quad (3)
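To make the bookkeeping concrete, the sketch below shows one way the logged pose, enable signal, and Unix timestamps could be reduced to MDBI and MTBI: distance is accumulated over engaged segments with Equation 3, uptime from timestamp differences, and an intervention is counted at each enable-to-disable transition. The record layout and field names are our assumptions for illustration, not the actual serialization format of the logging device described above.

```python
# Hedged sketch of the accumulation described in the text; the record layout
# is assumed, not taken from the paper's logging device.
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class LogRecord:
    stamp: float   # Unix time in seconds
    x: float       # vehicle position, e.g., from NDT localization
    y: float
    z: float
    enabled: bool  # True while the vehicle is driving autonomously


def summarize(records: List[LogRecord]) -> Tuple[float, float, int, float, float]:
    """Return (distance, uptime, interventions, MDBI, MTBI) over an ordered log."""
    distance = 0.0      # meters driven autonomously (Eq. 3 accumulation)
    uptime = 0.0        # seconds driven autonomously (timestamp differences)
    interventions = 0   # count of enabled -> disabled transitions
    for prev, curr in zip(records, records[1:]):
        if prev.enabled:
            # Eq. 3: sum of consecutive pose differences while engaged.
            distance += math.dist((prev.x, prev.y, prev.z), (curr.x, curr.y, curr.z))
            uptime += curr.stamp - prev.stamp
            if not curr.enabled:
                interventions += 1  # a disengagement (manual takeover)
    mdbi = distance / interventions if interventions else float("inf")
    mtbi = uptime / interventions if interventions else float("inf")
    return distance, uptime, interventions, mdbi, mtbi
```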