Enabling Artificial Intelligence Studies in Off-Road Mobility Through Physics-Based Simulation of Multi-Agent Scenarios
Total Page:16
File Type:pdf, Size:1020Kb
2020 NDIA GROUND VEHICLE SYSTEMS ENGINEERING AND TECHNOLOGY SYMPOSIUM Autonomy, Artificial Intelligence & Robotics (AAIR) Technical Session August 11-13, 2020 { Novi, Michigan ENABLING ARTIFICIAL INTELLIGENCE STUDIES IN OFF-ROAD MOBILITY THROUGH PHYSICS-BASED SIMULATION OF MULTI-AGENT SCENARIOS D. Negrut, R. Serban, A. Elmquist, J. Taves, A. Young University of Wisconsin { Madison Madison, WI A. Tasora, S. Benatti Dipartimento di Ingegneria ed Architettura University of Parma, Italy ABSTRACT We describe a simulation environment that enables the design and testing of control policies for off- road mobility of autonomous agents. The environment is demonstrated in conjunction with the design and assessment of a reinforcement learning policy that uses sensor fusion and inter-agent communi- cation to enable the movement of mixed convoys of conventional and autonomous vehicles. Policies learned on rigid terrain are shown to transfer to hard (silt-like) and soft (snow-like) deformable ter- rains. The enabling simulation environment, which is Chrono-centric, is used as follows: the training occurs in the GymChrono learning environment using PyChrono, the Python interface to Chrono. The GymChrono-generated policy is subsequently deployed for testing in SynChrono, a scalable, cluster- deployable multi-agent testing infrastructure that uses MPI. The Chrono::Sensor module simulates sensing channels used in the learning and inference processes. The software stack described is open source. Relevant movies: [1]. 1 INTRODUCTION perform more thorough testing, and produce more performant and safer designs. Simulation is not Computer simulation has been extensively used in a silver bullet as it has its limitations, first of all the design and analysis of various automation as- the issue of simulation-to-reality transfer [3], which pects tied to on-road mobility, see, for instance pertains to the failure of control policies derived in [2]. A similar statement cannot be made for off- simulation to work well in the real world. Fur- road mobility owing to a smaller market and a thermore, models are difficult to set up and cal- set of stiff challenges brought along by the task ibrate, the validation process can be tedious and at hand. However, a predictive simulation plat- time consuming, and open source simulation tools form for off-road mobility analysis of autonomous that are both predictive and expeditious are not agents (AAs) is very desirable since it can accel- readily available. This contribution addresses the erate the engineering design cycle, reduce costs, third point. It describes a simulation environment DISTRIBUTION STATEMENT A. Approved for public release; distribution unlimited. OPSEC#864 Proceedings of the 2020 Ground Vehicle Systems Engineering and Technology Symposium (GVSETS) whose stated purpose is to allow the practitioner but now including support for on-road traffic of to gain insights into the operation of AAs (robots AVs as well. Carla and AirSim rely on Unreal and autonomous wheeled or tracked vehicles) in Engine but several other engines are used for AA off-road conditions with an eye towards: improv- simulation, e.g. Unity [29] and TORCS [30]. For a ing mechanical designs of AAs; and, producing and survey of other solutions for on-road mobility the testing control policies that govern the operations reader is referred to [31, 32]. of the AAs. These are topical goals for the Army, The AA off-road mobility simulation platform see, for instance, [4, 5, 6, 7], or, for a historical discussed here is Chrono-centric [33, 34]. In its perspective on early contributions, [8]. purpose, it is similar to the ANVEL-VANE en- There are several ongoing efforts that seek to vironment as it seeks to simulate robots and address the AA simulation issue. In robotics, wheeled/tracked vehicles operating in off-road con- Gazebo [9, 10] is a widely used 3D multi-robot ditions. Compared to the ANVEL-VANE so- simulator with dynamics. It is not a simulation en- lution, the Chrono environment is different in gine per se, but a platform that exposes several en- several respects: it is open source and avail- gines: ODE [11], Bullet [12], DART [13], and Sim- able for unfettered use under a BSD3 license; body [14]. Unlike Gazebo, which is open source, it uses its own multi-physics engine; it is scal- CoppeliaSim (formerly V-REP) [15] is a commer- able and deployable on supercomputers, clusters, cial multi-robot simulation solution that also ex- or multi-core architectures owing to its reliance poses a set of simulation engines: MuJoCo [16], on the MPI message passing standard for paral- Vortex Dynamics [17], Bullet, and Newton Dy- lel computing [35]; and is under active develop- namics [18]. ROAMS [19] and ANVEL [20] are ment. Chrono is an ecosystem of modules and two other simulation engines for off-road AAs. The toolkits. It has support for rigid and flexible former is used for mission planning by NASA and body dynamics (Chrono::Engine), fluid-solid in- draws on an in-house dynamics engine [21]; the lat- teraction (Chrono::FSI), and granular dynamics ter relies on the ODE simulation engine and has (Chrono::Parallel and Chrono::Granular) applica- been used in the past for off-road military applica- tions. It has Python bindings, support for sensor tions [22] in combination with a sensor simulation simulation in Chrono::Sensor, an API for ROS [36] package [23]. MAVS is an off-road AA simulation bridging, as well as facilities for: rapid vehicle environment that is currently under active devel- modeling via parameterized templates [37]; con- opment [24]. It provides an in-house developed, trol policy design with GymChrono; and scalable sophisticated sensor simulation module [25, 26], control policy testing with SynChrono. Chrono re- has a ROS bridge, and uses Chrono as its dynam- lies on GPU computing for fluid-solid interaction ics engine. USARSim is an AA simulation plat- and select granular dynamics simulations, multi- form, not under active development, that draws core for most of the other modules, and MPI- on a game engine (Unreal Engine [27]), a choice enabled parallel computing for co-simulation when with pluses (scalability, ability to create complex handling large terramechanics applications or col- worlds) and minuses (the simulation engine is de- lections of AAs. Real-time simulation is not one of signed for plausibility rather than accuracy). For Chrono's priorities. Although for vehicle-on-rigid- autonomous vehicle (AV) simulation, Carla [2] and terrain simulation it provides faster than real-time AirSim [28] are two often used open-source simu- performance, there are numerous applications that lators, the former for on-road AV driving scenarios lead to long run times in Chrono, e.g., deformable simulation, the latter originally designed for drones terrain mobility, nonlinear flexible body dynamics, Enabling Artificial Intelligence Studies in Off-Road Mobility Through Physics-Based Simulation of Multi-Agent Scenarios, Negrut, et al. Page 2 of 27 Proceedings of the 2020 Ground Vehicle Systems Engineering and Technology Symposium (GVSETS) fording scenarios, etc. ber of samples to be analyzed [46]. Algorithms This contribution highlights the Chrono com- such as Dijkstra's, A-Star (A*), or the Rapidly- ponents that support the design and testing of exploring Random Tree-Star (RRT*), sample the control policies through simulation: PyChrono, state space either deterministically or stochasti- GymChrono, Chrono::Sensor, and SynChrono. To cally [45]. Depending on the complexity of the show these components at work, a Reinforcement traffic scenario, these algorithms can prove compu- Learning (RL) approach is used herein to produce tationally costly and provide sub-optimal results. a control policy. There is nothing special about Model Predictive Control (MPC) is another the RL approach; other techniques to design con- common AV control approach [47]. Using a dy- trol policies could be used equally well, a point namic model of a vehicle, the MPC algorithm touched upon in more detail in Section x2. Sec- computes trajectories over the state space and tion x3 describes the Chrono infrastructure that determines an optimal trajectory using gradient- facilitates artificial intelligence studies in off-road, descending optimization techniques [46, 48]. A multi-agent mobility scenarios. Section x4 covers limited time horizon is employed to reduce un- simulation experiments that highlight two aspects: needed computation for times too far out into the the scalability of the SynChrono testing environ- future. In comparison to sampling algorithms, the ment, and the process of designing the RL control MPC approaches show improved performance that policy along with an evaluation of the policy's ro- is tied to the use of the gradient fields in the opti- bustness. Concluding remarks and directions of mization problem that comes into play [47]. future work round off the contribution. The accuracy of the simulation platform plays a critical role both for MPC as well as sampling- 2 DERIVING CONTROL POLICIES based controllers. To adequately validate and sub- THROUGH SIMULATION sequently verify a controller such as MPC, the simulation must be of high enough definition to Derived using an accurate simulation framework, carry over successfully to reality [49]. When us- control algorithms have been shown to bridge the ing the more pedestrian PID controller solutions, sim-to-reality gap successfully [38, 39]. The use for which gains must be carefully selected, an in- of vehicles with Level 1 and Level 2 autonomy has accurate simulation platform could yield a poor grown considerably [40, 41] and the automotive in- design that leads to undesired consequences when dustry is making major strides in the transition to deployed on a real vehicle. Levels 3 and 4 autonomy [42, 43]. The use of simu- The design of a robust controller that per- lation for on-road AVs is an area of intense research forms adequately in complex environments using and development as this technology is seen as an the aforementioned strategies has proven difficult important catalyst of the aforementioned transi- when aiming for a generalized policy [50].