2020 NDIA GROUND VEHICLE SYSTEMS ENGINEERING AND TECHNOLOGY SYMPOSIUM Autonomy, Artificial Intelligence & (AAIR) Technical Session August 11-13, 2020 – Novi, Michigan

ENABLING ARTIFICIAL INTELLIGENCE STUDIES IN OFF-ROAD MOBILITY THROUGH PHYSICS-BASED SIMULATION OF MULTI-AGENT SCENARIOS

D. Negrut, R. Serban, A. Elmquist, J. Taves, A. Young University of Wisconsin – Madison Madison, WI A. Tasora, S. Benatti Dipartimento di Ingegneria ed Architettura University of Parma, Italy

ABSTRACT

We describe a simulation environment that enables the design and testing of control policies for off- road mobility of autonomous agents. The environment is demonstrated in conjunction with the design and assessment of a reinforcement learning policy that uses sensor fusion and inter-agent communi- cation to enable the movement of mixed convoys of conventional and autonomous vehicles. Policies learned on rigid terrain are shown to transfer to hard (silt-like) and soft (snow-like) deformable ter- rains. The enabling simulation environment, which is Chrono-centric, is used as follows: the training occurs in the GymChrono learning environment using PyChrono, the Python interface to Chrono. The GymChrono-generated policy is subsequently deployed for testing in SynChrono, a scalable, cluster- deployable multi-agent testing infrastructure that uses MPI. The Chrono::Sensor module simulates sensing channels used in the learning and inference processes. The software stack described is open source. Relevant movies: [1].

1 INTRODUCTION perform more thorough testing, and produce more performant and safer designs. Simulation is not Computer simulation has been extensively used in a silver as it has its limitations, first of all the design and analysis of various automation as- the issue of simulation-to-reality transfer [3], which pects tied to on-road mobility, see, for instance pertains to the failure of control policies derived in [2]. A similar statement cannot be made for off- simulation to work well in the real world. Fur- road mobility owing to a smaller market and a thermore, models are difficult to set up and cal- set of stiff challenges brought along by the task ibrate, the validation process can be tedious and at hand. However, a predictive simulation plat- time consuming, and open source simulation tools form for off-road mobility analysis of autonomous that are both predictive and expeditious are not agents (AAs) is very desirable since it can accel- readily available. This contribution addresses the erate the engineering design cycle, reduce costs, third point. It describes a simulation environment

DISTRIBUTION STATEMENT A. Approved for public release; distribution unlimited. OPSEC#864 Proceedings of the 2020 Ground Vehicle Systems Engineering and Technology Symposium (GVSETS) whose stated purpose is to allow the practitioner but now including support for on-road traffic of to gain insights into the operation of AAs (robots AVs as well. Carla and AirSim rely on Unreal and autonomous wheeled or tracked vehicles) in Engine but several other engines are used for AA off-road conditions with an eye towards: improv- simulation, e.g. Unity [29] and TORCS [30]. For a ing mechanical designs of AAs; and, producing and survey of other solutions for on-road mobility the testing control policies that govern the operations reader is referred to [31, 32]. of the AAs. These are topical goals for the Army, The AA off-road mobility simulation platform see, for instance, [4, 5, 6, 7], or, for a historical discussed here is Chrono-centric [33, 34]. In its perspective on early contributions, [8]. purpose, it is similar to the ANVEL-VANE en- There are several ongoing efforts that seek to vironment as it seeks to simulate robots and address the AA simulation issue. In robotics, wheeled/tracked vehicles operating in off-road con- Gazebo [9, 10] is a widely used 3D multi-robot ditions. Compared to the ANVEL-VANE so- simulator with dynamics. It is not a simulation en- lution, the Chrono environment is different in gine per se, but a platform that exposes several en- several respects: it is open source and avail- gines: ODE [11], Bullet [12], DART [13], and Sim- able for unfettered use under a BSD3 license; body [14]. Unlike Gazebo, which is open source, it uses its own multi-; it is scal- CoppeliaSim (formerly V-REP) [15] is a commer- able and deployable on supercomputers, clusters, cial multi-robot simulation solution that also ex- or multi-core architectures owing to its reliance poses a set of simulation engines: MuJoCo [16], on the MPI message passing standard for paral- Dynamics [17], Bullet, and Newton Dy- lel computing [35]; and is under active develop- namics [18]. ROAMS [19] and ANVEL [20] are ment. Chrono is an ecosystem of modules and two other simulation engines for off-road AAs. The toolkits. It has support for rigid and flexible former is used for mission planning by NASA and body dynamics (Chrono::Engine), fluid-solid in- draws on an in-house dynamics engine [21]; the lat- teraction (Chrono::FSI), and granular dynamics ter relies on the ODE simulation engine and has (Chrono::Parallel and Chrono::Granular) applica- been used in the past for off-road military applica- tions. It has Python bindings, support for sensor tions [22] in combination with a sensor simulation simulation in Chrono::Sensor, an API for ROS [36] package [23]. MAVS is an off-road AA simulation bridging, as well as facilities for: rapid vehicle environment that is currently under active devel- modeling via parameterized templates [37]; con- opment [24]. It provides an in-house developed, trol policy design with GymChrono; and scalable sophisticated sensor simulation module [25, 26], control policy testing with SynChrono. Chrono re- has a ROS bridge, and uses Chrono as its dynam- lies on GPU computing for fluid-solid interaction ics engine. USARSim is an AA simulation plat- and select granular dynamics simulations, multi- form, not under active development, that draws core for most of the other modules, and MPI- on a game engine (Unreal Engine [27]), a choice enabled parallel computing for co-simulation when with pluses (scalability, ability to create complex handling large terramechanics applications or col- worlds) and minuses (the simulation engine is de- lections of AAs. Real-time simulation is not one of signed for plausibility rather than accuracy). For Chrono’s priorities. Although for vehicle-on-rigid- autonomous vehicle (AV) simulation, Carla [2] and terrain simulation it provides faster than real-time AirSim [28] are two often used open-source simu- performance, there are numerous applications that lators, the former for on-road AV driving scenarios lead to long run times in Chrono, e.g., deformable simulation, the latter originally designed for drones terrain mobility, nonlinear flexible body dynamics,

Enabling Artificial Intelligence Studies in Off-Road Mobility Through Physics-Based Simulation of Multi-Agent Scenarios, Negrut, et al. Page 2 of 27 Proceedings of the 2020 Ground Vehicle Systems Engineering and Technology Symposium (GVSETS) fording scenarios, etc. ber of samples to be analyzed [46]. Algorithms This contribution highlights the Chrono com- such as Dijkstra’s, A-Star (A*), or the Rapidly- ponents that support the design and testing of exploring Random Tree-Star (RRT*), sample the control policies through simulation: PyChrono, state space either deterministically or stochasti- GymChrono, Chrono::Sensor, and SynChrono. To cally [45]. Depending on the complexity of the show these components at work, a Reinforcement traffic scenario, these algorithms can prove compu- Learning (RL) approach is used herein to produce tationally costly and provide sub-optimal results. a control policy. There is nothing special about Model Predictive Control (MPC) is another the RL approach; other techniques to design con- common AV control approach [47]. Using a dy- trol policies could be used equally well, a point namic model of a vehicle, the MPC algorithm touched upon in more detail in Section §2. Sec- computes trajectories over the state space and tion §3 describes the Chrono infrastructure that determines an optimal trajectory using gradient- facilitates artificial intelligence studies in off-road, descending optimization techniques [46, 48]. A multi-agent mobility scenarios. Section §4 covers limited time horizon is employed to reduce un- simulation experiments that highlight two aspects: needed computation for times too far out into the the scalability of the SynChrono testing environ- future. In comparison to sampling algorithms, the ment, and the process of designing the RL control MPC approaches show improved performance that policy along with an evaluation of the policy’s ro- is tied to the use of the gradient fields in the opti- bustness. Concluding remarks and directions of mization problem that comes into play [47]. future work round off the contribution. The accuracy of the simulation platform plays a critical role both for MPC as well as sampling- 2 DERIVING CONTROL POLICIES based controllers. To adequately validate and sub- THROUGH SIMULATION sequently verify a controller such as MPC, the simulation must be of high enough definition to Derived using an accurate simulation framework, carry over successfully to reality [49]. When us- control algorithms have been shown to bridge the ing the more pedestrian PID controller solutions, sim-to-reality gap successfully [38, 39]. The use for which gains must be carefully selected, an in- of vehicles with Level 1 and Level 2 autonomy has accurate simulation platform could yield a poor grown considerably [40, 41] and the automotive in- design that leads to undesired consequences when dustry is making major strides in the transition to deployed on a real vehicle. Levels 3 and 4 autonomy [42, 43]. The use of simu- The design of a robust controller that per- lation for on-road AVs is an area of intense research forms adequately in complex environments using and development as this technology is seen as an the aforementioned strategies has proven difficult important catalyst of the aforementioned transi- when aiming for a generalized policy [50]. An tion. One active area of research is focused on emerging approach that has gained momentum in sampling-based methods. These approaches gen- recent years is Machine Learning (ML) based [51]. erate many candidate trajectories a vehicle can ML has shown promise in producing efficient and follow, selecting and executing the controls asso- robust models that generalize well in a variety of ciated with the best candidate [44, 45]. Graph situations. The three pillars of ML include su- search methods are commonly associated with each pervised learning, unsupervised learning, and re- trajectory. The approach is real-time challenged, inforcement learning. In the AA problem, deep since achieving robust results requires a high num- reinforcement learning (DRL) has been very suc-

Enabling Artificial Intelligence Studies in Off-Road Mobility Through Physics-Based Simulation of Multi-Agent Scenarios, Negrut, et al. Page 3 of 27 Proceedings of the 2020 Ground Vehicle Systems Engineering and Technology Symposium (GVSETS) cessful, as it displays the ability to learn and re- the data needed to design a control policy. Sec- spond in complex scenarios without the need for ond, it is used for testing purposes. To this end, preprocessed or labeled data [39]. it exposes the control policy produced in a model- Since its introduction [52], DRL has proven suc- based or model-free approach to a battery of tests cessful in robotics applications [53, 54]. At its core, to gauge its correctness and robustness. This DRL is an iterative learning process in which an section outlines the components of this Chrono- agent interacts with an environment; at each iter- centric simulation environment. The emphasis is ation the agent collects an observation (or state), placed on four components: Chrono::Sensor, Py- then performs an action based on the previous ob- Chrono, GymChrono and SynChrono, that anchor servation and gets a reward which is a measure of the AA design and testing process and have not its performance. The goal of RL is to find a policy been discussed elsewhere in detail. More estab- that maximizes the sum of the collected reward. lished Chrono components or functionality will be RL allows for complex control policies viable touched upon in passing; more details are provided in unstructured and stochastic environments; it is in [34, 37, 58]. model-free in that it does not require a model that predicts environment transients; and can learn Chrono. Under active development for over two from scratch. RL’s major flaw is its need for a decades, Chrono [34] is a multi-physics simulation massive amount of training data to infer a robust engine distributed as open-source under a permis- policy. The role of simulation is to produce this sive BSD license. Its core module, Chrono::Engine, collection of samples. Policy Gradient Algorithms provides support for rigid multibody dynamics, are a subset of RL algorithms whose goal is to di- nonlinear finite element analysis, and frictional rectly learn an optimal stochastic policy πθ(a|s), contact dynamics. Chrono is modular, with op- where s, a, and θ are the state, action, and a set of tional modules providing support for additional learnable parameters, respectively. If π is a Neural classes of physics simulation (e.g., fluid-solid inter- Network (NN), θ are the weights and biases of the action or large-scale granular dynamics), for mod- NN, the state is the NN input, and the action is its eling and simulation of specialized mechanical sys- output. Proximal Policy Optimization (PPO) [55] tems (e.g., ground vehicles), for interfaces to exter- is one of the most widely used algorithms for con- nal solvers (e.g., sparse direct linear solvers), or for tinuous state and action environments and will be dedicated parallel algorithms targeting different the algorithm of choice in this contribution. PPO computing architectures (multi-core, distributed, is a policy gradient algorithm whose goal is to op- and GPU) for large-scale simulations. timize a stochastic policy. It is also an actor-critic Written almost entirely in C++, Chrono is mid- method since another NN is trained and used to dleware and therefore supports customized solu- estimate the Value Function [56] employed to es- tions that involve user code and potentially third- timate the Advantage Function [57]. The Advan- party software. The software is portable and can tage Function is used in the objective function to be built on different platforms, under different be maximized. operating systems, and using various compilers. Chrono is actively developed, has a continuous 3 SIMULATION INFRASTRUCTURE integration process, an active user forum, and is managed through GitHub [59]. It’s latest release The purpose of the simulation environment de- is 5.0.1, available as of March 2020. Chrono is used scribed is twofold. First, it is used to produce by academic, industrial, and government research

Enabling Artificial Intelligence Studies in Off-Road Mobility Through Physics-Based Simulation of Multi-Agent Scenarios, Negrut, et al. Page 4 of 27 Proceedings of the 2020 Ground Vehicle Systems Engineering and Technology Symposium (GVSETS) and development groups and projects.

Chrono::Vehicle. Chrono::Vehicle [37] is a spe- cialized Chrono module that makes available a col- lection of templates (fully parameterized models) for various topologies of both wheeled and tracked vehicle subsystems. It provides facilities for mod- eling rigid, deformable, and granular terrain; sup- port for closed-loop and interactive driver models; Figure 1: Chrono::Vehicle HMMWV with flexible and run-time and off-line visualization of simula- tires navigating granular terrain demonstrating vehicle tion results. Chrono::Vehicle leverages and works dynamics, flexible body dynamics, and parallel com- in tandem with other Chrono modules for run-time puting support in Chrono [61]. visualization or finite element, granular dynamics, and parallel computing support. Chrono::Vehicle provides vehicle subsystem tem- finite-element based soil representation. For simple plates for tires, suspensions, steering mechanisms, terramechanics simulations, Chrono::Vehicle im- drivelines, sprockets, track shoes, etc.; templates plements a customized Soil Contact Model (SCM), for external systems such as powertrains, drivers, based on Bekker theory, with extensions to al- and terrain models; and additional utility classes low non-structured triangular grids, adaptive mesh and functions for vehicle visualization, monitoring, refinement, and incorporation of bulldozing ef- and collection of simulation results. As a middle- fects [58]. Second, Chrono provides an FEA con- ware library, Chrono::Vehicle requires the user to tinuum soil model based on multiplicative plas- provide C++ classes for a concrete instantiation ticity theory with Drucker-Prager failure criterion of a particular template. An optional Chrono li- and specialized brick elements. Finally, leveraging brary provides complete sets of template instanti- Chrono support for large-scale granular dynamics ations for several concrete ground vehicles, both and for multi-core, GPU, and distributed paral- wheeled and tracked, which can serve as exam- lel computing, off-road vehicle simulations can be ples for developing more customized vehicle mod- conducted using fully-resolved, granular dynamics- els. An alternative mechanism for defining con- based complex terramechanics, using a Discrete El- crete instantiation of vehicle system and subsys- ement Method approach, see Fig. 1 [61, 62]. tem templates is based on input specification files in the JSON format [60]. For additional flexibil- Chrono::Sensor. Chrono::Sensor provides sen- ity and to allow integration of third-party software, sor simulation support for software-in-the-loop Chrono::Vehicle is designed to permit either mono- testing. Cameras, lidar, GPS, and IMU sensors lithic simulations or co-simulation where the vehi- can be placed within a Chrono simulation to gener- cle, powertrain, tires, driver, and terrain/soil in- ate synthetic data based on user-defined sensor pa- teraction can be simulated independently. rameters and attributes of the virtual world host- Chrono::Vehicle provides several classes of ter- ing the AA simulation experiment. The goal of the rain and soil models, of different fidelity and com- module is to allow realistic data generation based putational complexity, ranging from rigid, to semi- on sensor characteristics such as noise, distortion, empirical Bekker-Wong type models, to complex and lag. Sensors can be attached to objects within physics-based models based on either a granular or the simulation and configured to match corre-

Enabling Artificial Intelligence Studies in Off-Road Mobility Through Physics-Based Simulation of Multi-Agent Scenarios, Negrut, et al. Page 5 of 27 Proceedings of the 2020 Ground Vehicle Systems Engineering and Technology Symposium (GVSETS) sponding real sensors. For modeling convenience, generated Python wrappers for much of the sensors can be defined through a JSON file [60]. Chrono functionality. The purpose was twofold: to Additionally, custom sensors and post-processing provide a lower-entry point to Chrono simulations filtering can be implemented, leveraging the exist- for users less familiar with C++; and, provide a ing render framework or physics interface. bridge to various machine learning platforms, e.g. For interoceptive sensing (GPS and IMU), the TensorFlow [64], PyTorch [65], Theano [66], and module utilizes the internally computed physi- CAFFE [67]. cal quantities from the Chrono system and can The Python wrapping relies on using automated augment this ground truth with drift, Gaussian technology provided by SWIG [68], which gener- noise, lag, and filtering characteristics from fi- ates the interface between Python user-code and nite collection time. For exteroceptive sensors the underlying Chrono C++ libraries. Presently, that provide information about scene characteris- a large set of Chrono functionality is exposed to tics (camera and lidar), Chrono::Sensor leverages Python users, including the core multibody and hardware accelerated ray tracing through the Op- FEA module, the interface to CAD systems (like tiX library [63] and implements physically based SolidWorks), run-time visualization with Irrlicht, rendering techniques. The ray tracing approach etc. In particular, full support is available for both allows for the physical reconstruction of the light- modeling, simulation, and visualization of wheeled based data acquisition process and thus controls ground vehicles and use of the Chrono::Vehicle ex- the attributes of the synthetically generated sensor isting wheeled vehicle models, as well as using the data. For camera, lens models and post-processing sensor models provided by Chrono::Sensor. Py- noise augmentation are supported, with an inter- Chrono for Python 3 can be built from sources on face to extend or implement custom models. For any of the supported platforms (, Windows, lidar, the framework expands on work from [25] MacOS). Alternatively, pre-built conda PyChrono to provide a beam divergence model that supports packages are available on the project’s Anaconda multiple modes of lidar return and reduced inten- page [69] (note that Chrono::Sensor is not available sity during partial beam reflectance. The cam- yet via the conda PyChrono packages). era and lidar can also be parameterized by update rate, time over which to collect data, and lag. GymChrono. GymChrono is an extension of Chrono::Sensor is currently in development with OpenAI Gym [70]. It exposes a set of envi- planned expansion of sensor support and addi- ronments providing continuous control tasks for tional distortion model implementations. All sen- physics and sensor simulation run by the Chrono sors and capabilities are written in C++, but can backend. These environments inherit from Ope- also be accessed from Python through the Py- nAI Gym classes. As such, they can be used out Chrono interface. The entire module can be run of the box with any algorithm or DRL framework headless without the requirement of a render con- made for gym environments. They can also draw text, allowing for ease of deployment in machine on gym’s environment parallelization for learning learning applications on remote servers (in the acceleration. cloud). SynChrono. SynChrono is a software compo- PyChrono. While the main Chrono API is ex- nent that uses Chrono to implement a distributed- pressed in C++, in recent years a concerted ef- memory execution model when simulating sce- fort was dedicated to providing automatically- narios that include multiple AAs. By leverag-

Enabling Artificial Intelligence Studies in Off-Road Mobility Through Physics-Based Simulation of Multi-Agent Scenarios, Negrut, et al. Page 6 of 27 Proceedings of the 2020 Ground Vehicle Systems Engineering and Technology Symposium (GVSETS) ing the Message Passing Interface (MPI) stan- dard [71], SynChrono can manage multiple in- stances of Chrono running together one mobility analysis on a supercomputer, cluster, or multi-core setup thus supporting the scalable and distributed simulation of multiple agents (robots, tracked ve- hicles, wheeled vehicles, etc.) The paradigm em- braced is that of running the dynamics of one AA as one MPI rank, with the ranks/AAs communi- cating through MPI messages to maintain space and time coherent state for all agents participat- ing in the study. As an example, if there are two agents, SynChrono makes it possible to syn- Figure 2: Schematic of the SynChrono framework. chronously run the two agents on two different Dynamics simulations are done in separate Chrono compute nodes in a supercomputer. By the same systems and the outcome of the dynamics simulation token, if there are 50 agents and 50 compute nodes is synchronized between ranks using MPI. in a cluster, SynChrono provides the infrastructure to keep the 50 agents operating in a coherent (time- vary from complex algorithms that fuse sensor wise and spacewise) virtual world. Although each feeds/data streams, to controls based on empiri- agent represents a Chrono simulation, SynChrono cal models, and on to pre-recorded driver inputs makes it possible for ruts generated by agent 27 from a human or human-driven control in scenar- to be picked up on a camera sensor on agent 31 ios that are simple enough to allow real-time sim- if the camera points to agent 27. The time co- ulation. SynChrono supports human-in-the-loop herence aspect prevents some agents from racing experiments as well. into the future while other agents lag behind in Each SynChrono process is responsible for the the past. The global synchronization mechanism dynamics of a single agent. At a slow frequency in SynChrono ensures that all 50 agents march for- (relative to the simulation time-step), all Syn- ward in simulation time in a coherent fashion so Chrono processes communicate via the SynChrono that mutual interaction (a vehicle crossing the ruts daemon to exchange state information. State in- of a different one, a vehicle sensing another vehicle, formation is intended to be brief, sufficient to en- etc.) happens as in real life. able a SynChrono process to reconstruct a “ghost” The structure of SynChrono’s MPI framework version of outside agents in its own world for vi- is shown in Fig. 2. SynChrono manages multi- sualization and sensing purposes. In an example ple AAs as multiple processes via as many MPI where each agent is a vehicle, the state informa- ranks. Each AA runs in its own SynChrono pro- tion consists of the vehicle location and orienta- cess (an MPI rank) and interfaces with its ded- tion along with pose information for each wheel. icated control stack for software-in-the-loop con- This information is packaged via the FlatBuffers trol. The control stack is fed synthetic data gen- serialization library [72]. erated by Chrono::Sensor and acts upon the en- One limitation of the implementation is that vironment through Chrono::Vehicle control inputs currently two agents running as two SynChrono (throttle, steering, braking). The control algo- processes cannot crash with each other or partic- rithm for each agent is also configurable and can ipate in an operation that couples their dynam-

Enabling Artificial Intelligence Studies in Off-Road Mobility Through Physics-Based Simulation of Multi-Agent Scenarios, Negrut, et al. Page 7 of 27 Proceedings of the 2020 Ground Vehicle Systems Engineering and Technology Symposium (GVSETS) ics, e.g. lifting together a heavy object. Such a information is relevant when discussing real-time scenario should be run in Chrono, since the cou- performance and scalability aspects. pling is too tight to be handled by SynChrono. This will make the simulation longer to run since more agents will have to be handled within one 4.1 SynChrono scaling analysis Chrono process. However, if the agent coupling SynChrono uses N + 1 processes executing on a happens via sensing or the virtual world, e.g., one cluster/supercomputer to simulate the dynamics agent sensing another one, or one vehicle crossing of N agents handled as N independent Chrono over and being jolted by the ruts left by a differ- simulations. Each Chrono simulation is a pro- ent agent, then SynChrono can be relied on, thus cess; the extra process is involved in maintaining ensuring scalability. The goal of SynChrono is to the time and space coherence for the virtual world have virtually constant scaling up to a high thresh- shared by the N AAs. The SynChrono daemon old where the data passing overhead becomes the executes as the extra process and it coordinates simulation bottleneck. via MPI the dynamics of the N agents. The nu- merical experiments described here answer the fol- Interface to an external controller. For test- lowing questions: (i) How does the time to com- ing of control algorithms that are intended to be plete a simulation change as N increases? (ii) How easily transferred to real-life vehicles or robots, fast is SynChrono in mobility studies on rigid ter- the simulation platform provides an external con- rain? (iii) How fast is it when an SCM deformable trol interface that is exposed in SynChrono. An terrain model is considered in the simulation sce- agent in SynChrono can send messages (i.e. sensor nario? The Linux cluster used exposed 15 nodes. data packets) to the external autonomous controls Thus, one set of experiments was conducted with framework which can then send a message back up to N = 14 AAs, using rigid, SCM hard (silt- (i.e. control inputs). The control stack is indepen- like), and SCM soft (snow-like) terrain, with one dent of the SynChrono platform (e.g. a bridge has agent per compute node. However, the experiment been developed for ROS [36]), and can be tested also included scenarios with up to 56 AAs. In this with inputs replicating those from reality, such as case, four AAs ran on the same compute node. On sensor and/or V2X communication data. the upside, this allowed more AAs to participate in the benchmark; on the downside, the simula- 4 TECHNOLOGY DEMONSTRATION tion times went up since the hardware allocation per AA went down by a factor of four. All simulation scenarios considered in this sec- The handling of a virtual world that has SCM tion use a Chrono::Vehicle HMMWV model. terrain is challenging since each of the N vehicles Chrono::Vehicle was benchmarked as part of the changes the terrain at the same time and these Next Generation-NRMM (NG-NRMM) exercise changes must be space and time coherent. This is [73]; Chrono::Vehicle-specific benchmark findings facilitated and managed by the SynChrono dae- are detailed in [74, 75]. Note that: the NG-NRMM mon. The key component of the SCM terrain benchmarking effort involved Chrono::Vehicle but is the deformation of each vertex in the underly- no the particular model used in this study; and, all ing SCM mesh. All other terrain properties can results reported herein were obtained using a sim- be computed based on the height of each vertex ulation time step of ∆t = 2 × 10−3 seconds, both alone [58]. At each simulation step, a SynChrono for rigid and deformable terrain. This time step rank that is associated with an agent moving on

Enabling Artificial Intelligence Studies in Off-Road Mobility Through Physics-Based Simulation of Multi-Agent Scenarios, Negrut, et al. Page 8 of 27 Proceedings of the 2020 Ground Vehicle Systems Engineering and Technology Symposium (GVSETS)

SCM terrain collects a list of the deformed ver- tices that the agent produced during the course of that time step. Mesh deformation data may not be sent at every simulation time-step, so this collec- tion of mesh changes is in general persistent across simulation time steps. Once an agent reaches a SynChrono synchronization point, the cumulative mesh deformations produced by one agent are sent via the daemon to every other agent’s world. Each agent then applies the deformations to their own copy of the SCM mesh and resets their collection Figure 3: Environment used for SynChrono scaling of new mesh deformations. This means that two analysis for agents operating on SCM terrain. Two agents should not come close enough that they de- lines of vehicles move across a rectangular patch, form the same vertices during the same synchro- crossing orthogonally and making ruts in the SCM nization period, but this is just as restrictive as soil. Simulations were run on both soft and hard SynChrono’s assumption that any two agents will SCM terrain, as well as on rigid terrain. not interact by crashing. While Chrono’s SCM implementation allows for meshes that auto-refine, this is currently not supported in SynChrono due In a first sub-experiment, we assign each Syn- to the significant increase in algorithmic complex- Chrono process (which runs one vehicle) to dis- ity for relatively little gain in performance. The tinct cluster nodes. The scaling analysis results main concern for scalability is message size, since are presented in Fig. 4. The plot shows on the for synchronizing SCM terrain there is about 15 y-axis the log10 of the RTF. Thus, any dot below times as much data to send for each agent relative the horizontal x-axis is associated with an exper- to the rigid terrain case. iment that ran faster than real time. Simulations The scenario discussed herein is that of many on the rigid terrain ran with an RTF of 0.75 and vehicles crossing perpendicularly on a rectangular simulations on either type of SCM terrain (hard or patch of SCM terrain, see Fig. 3 and online movies soft) ran with an RTF of approximately 110. The [1]. In this setup, one can easily scale up the num- RTF value of 110 is independent of the of SCM ber of vehicles and verify that the SCM terrain de- soil parameters (i.e. soft vs. hard), but is highly formation is properly synchronized across multiple dependent on the processor performance, MPI im- ranks. The scaling metric used was the Real Time plementation, compiler, and compilation optimiza- Fraction (RTF), representing the amount of wall- tion. These results confirm weak scaling behavior: time required to finish a simulation divided by the no significant performance penalty is observed for amount of time simulated. Running in real-time including additional agents in the simulation when corresponds to a factor of 1, while slower than real- each SynChrono process is assigned to a different time corresponds to factors larger than one. The compute node. tests were run on the Euler computing cluster at Using the same scenario as above, scaling anal- the University of Wisconsin-Madison. Each node yses were also conducted for a larger number of has an eight core, Broadwell-generation Intel pro- agents on rigid terrain to ensure that the same cessor; inter-node communication is facilitated via weak scaling continued to hold. To this end, two a Gigabit Ethernet interconnect. other sub-experiments were conducted: two Syn-

Enabling Artificial Intelligence Studies in Off-Road Mobility Through Physics-Based Simulation of Multi-Agent Scenarios, Negrut, et al. Page 9 of 27 Proceedings of the 2020 Ground Vehicle Systems Engineering and Technology Symposium (GVSETS)

Figure 4: SynChrono scaling analysis for rigid, Figure 5: SynChrono scaling analysis on rigid ter- SCM hard, and SCM soft terrains. Logarithmic rain for a large number of agents. scale used.

voy vehicles use this policy while driving in a pla- Chrono processes were deployed per node; and, toon. Thus, the possible scenarios are: three lead four SynChrono processes were deployed per node. vehicles and one following vehicle (3L+1F), two Given the number of nodes made available on the lead and two followers (2L+2F), and one lead and cluster (15), for two-per-node, SynChrono handled three followers (1L+3F). The lead vehicles are pro- 28 AAs; for four-per-node, it managed a joint sim- grammed to follow a path defined by way-points; ulation of 56 AAs. Timing results are shown in for all purposes, these can be considered human Fig. 5, which shows that 56 AAs run in slightly driven. A follower vehicle is autonomous and uses more than 1.8× real time. Given a supercomputer the learned policy to follow the vehicle in front of or larger cluster, SynChrono opens the possibility it. In doing so, it should not crash in the vehi- of simulating swarms of AAs. cle ahead of it, and avoid hitting obstacles in the vicinity of the path. To this end, it relis on a cam- 4.2 Learning to drive in a convoy era sensor, location acquired through GPS sensing, and compass heading. Note that communication Chrono helps with two tasks: learning a control allows a vehicle to find out the GPS location and policy, and testing a policy that was designed in velocity of the vehicle in front. Given that four ve- Chrono or brought from outside. In this exam- hicles are involved in this platooning experiment, ple, PyChrono and GymChrono are used to design SynChrono is subsequently used to test the pol- a policy. SynChrono is subsequently used to test icy to reduce simulation times. Indeed, this vali- it. The RL-based learning is done on rigid ter- dation could be run in Chrono::Engine but would rain using a nondescript texture. The goal is to roughly require four times longer to complete it. enable a vehicle to move as part of a convoy. To The salient points of this experiment are as fol- test it, the policy is deployed on vehicles that are lows: although vehicles are run in different Syn- part of a four-vehicle convoy driving on rigid or Chrono processes, there is time and space coher- SCM deformable terrain. Up to three of the con- ence between them to the point where the vehicles

Enabling Artificial Intelligence Studies in Off-Road Mobility Through Physics-Based Simulation of Multi-Agent Scenarios, Negrut, et al. Page 10 of 27 Proceedings of the 2020 Ground Vehicle Systems Engineering and Technology Symposium (GVSETS) sense each other; the learning occurs using rigid terrain with nondescript texture but the policy is tested on deformable terrain that is white (snow- 40 like); this is an end-to-end policy that uses sensor data fusion to control both the steering and accel- 20 eration/braking of the vehicle.

0 Y [m] Designing a policy. The policy was obtained through training using a custom implementation 20 of the PPO reinforcement learning algorithm lever- aging PyTorch [65] as the Deep Learning frame- 40 work. The agent is a HMMWV vehicle modeled 75 50 25 0 25 50 75 in Chrono::Vehicle. The goal of the training pro- X [m] cess is to develop a control policy that enables an Figure 6: The double S and C paths used during agent (in this case the HMMWV) to drive in a training. Each one of these is randomly flipped and convoy. For training, to increase the randomness rotated, resulting in 16 different possible paths. of the path and thus the robustness of the con- The dashed lines represent the segment of the path trol, two different path types were used on a 90x90 in which the simulation episode can start. meters area. The first is S-shaped, starting from a corner and finishing in the opposite corner; the second is C-shaped, starting and finishing on the the NN update process. As such, large images that same side of the driving area. To further increase are contained in observations are typically avoided. the randomness, these paths can be flipped along As with any other RL environment, an observa- the y-axis (left-right) and rotated about the ver- tion and a reward is provided to the ML algorithm tical z-axis to obtain 16 different possible paths. at each timestep. Subsequently, the agent must The starting point is picked randomly within the perform an action prescribed by the ML algorithm first half of the path as shown in Fig. 6. Eight ob- in order to maximize the reward collected. The stacles placed near the path are randomly selected action is a two element array: the first is a steer- from various rock, tree, and bush assets. ing value; the second element is a combined throt- In order for the agent to accomplish its task, tling and braking value. The choice of collapsing the vehicle must be aware of its surroundings. To throttle and brake control into the same action was that end, the HMMWV used two sensors simulated taken to avoid simultaneous braking and throttling in Chrono::Sensor: a GPS sensor, and an RGB and because throttling and braking both directly camera placed on the front bumper. The camera, control the vehicle’s acceleration. which updates at 30 Hz, has a resolution of 80 × The agent is rewarded only when it successfully 45 pixels. This level of resolution suffices, since de- stays behind the leader, meaning a reward is pro- tailed features that can be extracted from higher vided when the angle between the heading of the resolution images are not needed by the control leader and the follower is in the range [−π/4, π/4]. policy. Furthermore, given that the dataset con- When this condition is met, the agent receives the tains one image per interaction, images of too high maximum reward if it keeps an optimal distance resolution can quickly deplete the available GPU (within a prescribed tolerance) from the lead vehi- memory due to the increased memory footprint of cle. In other words, the follower is rewarded when

Enabling Artificial Intelligence Studies in Off-Road Mobility Through Physics-Based Simulation of Multi-Agent Scenarios, Negrut, et al. Page 11 of 27 Proceedings of the 2020 Ground Vehicle Systems Engineering and Technology Symposium (GVSETS) it stays in a sector of an annulus centered at the soft (snow-like). This leads to a set of nine pla- leader. When the vehicle is outside the desired tooning scenarios. In the following discussion, the area, the reward decreases hyperbolically. leader and follower vehicles are numbered start- The learning draws on information from several ing from the head of the convoy; for example, the sensors, whose output is organized into two tuples. order of the vehicles in a 1L+3F configuration is: The first element is a 80 × 45 × 3 RGB image. The Leader, Follower 1, Follower 2, Follower 3. second is a vector of four values: the latitude and For each platooning scenario, data recorded longitude difference between the leader and the fol- from simulations included position, velocity, and lower; the heading according to the compass; and acceleration for each of the four vehicles. In addi- the speed at which the follower is approaching the tion, a high definition camera sensor was attached leader. This multi-sensor observation required the to the last vehicle in the convoy in order to vi- NN architecture to incorporate an input composed sualize the simulation. Still frames captured with of a 3D and a 1D tensor. The image is processed in this camera are shown in Fig. 9 and full-length a Convolutional Neural Network (CNN) as in [52]. videos of representative simulations are available Its output is then concatenated with the output of online [1]. Different ground textures and colors the one Fully Connected (FC) hidden layer Deep were used to further differentiate between the three Neural Network (DNN) which takes the 1D Tensor terrain types (rigid, SCM-Hard, and SCM-Soft). as input. Their concatenated output is then pro- Table 1 shows top-down views of the convoy tra- cessed by three FC hidden layers. The architecture jectories for each platooning scenario, with solid of the model is shown in Fig. 7. and dashed lines representing leader and follower RL-based training requires a very large number trajectories, respectively. As these images indi- of iterations. Three decisions helped speed up the cate, different simulation configurations lead to dif- learning process: (i) the lead vehicles were not sim- ferent reactions of the follower vehicles. The train- ulated, but only rendered at the correct location ing process described in §4.2 rewards a follower ve- and orientation (as this has no bearing for sens- hicle for being within a certain angle and distance ing purposes); (ii) a reduced-order model of the of the leader; as a result, this allows for deviations HMMWV vehicle was used in the first stage of between the paths of follower and leader vehicles as training, with the more computationally demand- well as deviations in the speed at which leader and ing full vehicle model substituted during the train- follower vehicles negotiate a certain path segment. ing process (see Fig. 8) to further refine the NN In an effort to quantify the performance and parameters; and (iii) the learning process was ac- robustness of the derived platooning policy, we celerated using the OpenAI [70] Baselines tool for present next results from a statistical analysis us- environment parallelization, thus allowing several ing ensemble convoy simulations. This study in- simulations running simultaneously to speed up cludes results from 128 independent simulations the collection of the dataset samples. for each one of the three terrain types mentioned above. In order to allow relative comparisons Testing of learned policy. The AA control among different terrain types as well as between policy derived in PyChrono and GymChrono was follower positions in the convoy, all simulations tested in SynChrono for various convoy setups used the 1L+3F convoy configuration and each one while operating on three terrain types. The pla- of the three sets of 128 simulation used the same toons were 3L+1F, 2L+2F, and 1L+3F. The ter- set of randomly-generated trajectories. The per- rains were rigid, SCM hard (silt-like), and SCM formance metrics used in this analysis measure the

Enabling Artificial Intelligence Studies in Off-Road Mobility Through Physics-Based Simulation of Multi-Agent Scenarios, Negrut, et al. Page 12 of 27 Proceedings of the 2020 Ground Vehicle Systems Engineering and Technology Symposium (GVSETS)

Figure 7: Sensor Fusion NN architecture.

Enabling Artificial Intelligence Studies in Off-Road Mobility Through Physics-Based Simulation of Multi-Agent Scenarios, Negrut, et al. Page 13 of 27 Proceedings of the 2020 Ground Vehicle Systems Engineering and Technology Symposium (GVSETS)

Table 1: 2D Positions of each vehicle in each simulation configuration

3L+1F 2L+2F 1L+3F Rigid SCM-Hard SCM-Soft

Enabling Artificial Intelligence Studies in Off-Road Mobility Through Physics-Based Simulation of Multi-Agent Scenarios, Negrut, et al. Page 14 of 27 Proceedings of the 2020 Ground Vehicle Systems Engineering and Technology Symposium (GVSETS)

constrained to produce trajectories that remained

14000 in the domain and did not intersect obstacles. Each of the 128 resulting paths shown in Fig. 11 12000 was then used as the prescribed trajectory for the 10000 path-follower PID-based lateral controller imple- mented on the leader vehicle, in conjunction with 8000 a PID-based longitudinal controller that prescribes 6000

Mean Sum of Reward a target speed linearly increasing in time (corre- 2 4000 sponding to a constant acceleration of 0.47 m/s ). To produce paths that are feasible for simu- 0 500 1000 1500 2000 lated HMMWV vehicles, the generation of the Updates corresponding environment setup in SynChrono Figure 8: Plot of the moving average of the sum involved scaling obstacle meshes (rocks, trees, of collected rewards with respect to the policy up- bushes) such that their bounding sphere allows 2 m dates (updated every 1500 interactions). The ver- of separation from the generated path centerline; tical dashed line represents the switch from the in other words, in each of the 128 environments, reduced to the full HMMWV model. the path prescribed for the leader vehicle ensures a minimum width of 4 m. An overhead view of the resulting SynChrono simulation environment path and speed deviation of a follower vehicle from is shown on the right in Figure 10. that of the convoy’s leader. In order to perform a statistical analysis of the The 128 distinct scenarios were created in a 100 performance of the platooning policy, we define a m × 100 m swath of flat terrain. A script gen- set of six performance metrics that measure the erated 30 randomly placed circular obstacles with deviations of a follower vehicle from that of the sizes uniformly distributed between 5 m and 7 m. convoy leader and encode both lateral path devia- The obstacle positions were drawn from a uniform tion and deviations in the vehicle speed at a given distribution with an additional restriction to al- location along the leader’s path. These metrics low overlap of no more than half their radius [76]. are defined in such a way as to allow comparisons Finally, to increase the complexity of the result- between the performance of followers at different ing paths and reduce the likelihood of a straight positions in the convoy, as well as across the three path, each configuration included four additional different terrain types considered here. fixed obstacles with radius of 8 m placed equidis- To eliminate differences due to the fact that ve- tant along the diagonal from start to end. hicles in a convoy are inherently staggered along A path was designed to get from start to fin- the path, the evaluation of the performance met- ish while avoiding the obstacles. The path start rics is based on a common path segment. Figure 12 and end positions were fixed to the south-west and illustrates this process for the same trajectory used north-east corners, respectively. A sample obsta- as an example before. The sample results used in cle field is shown on the left in Fig. 10. A Parti- this description are results of a simulation on rigid cle Swarm Optimization (PSO) algorithm [77] was terrain and consider the last vehicle in the convoy used to generate a shortest distance path connect- (Follower 3). The start clip point is defined as the ing the start and end locations, as shown in the point on the follower path closest to the leader’s middle image of Fig. 10. The path generation was initial location. Similarly, the end clip point is de-

Enabling Artificial Intelligence Studies in Off-Road Mobility Through Physics-Based Simulation of Multi-Agent Scenarios, Negrut, et al. Page 15 of 27 Proceedings of the 2020 Ground Vehicle Systems Engineering and Technology Symposium (GVSETS)

(a) (b) (c)

Figure 9: Still frames from attached third person camera: (a) rigid terrain; (b) SCM-Hard terrain; (c) SCM-Soft terrain.

Figure 10: Sample obstacle field, PSO-based path planning, and the corresponding SynChrono environ- ment setup.

Enabling Artificial Intelligence Studies in Off-Road Mobility Through Physics-Based Simulation of Multi-Agent Scenarios, Negrut, et al. Page 16 of 27 Proceedings of the 2020 Ground Vehicle Systems Engineering and Technology Symposium (GVSETS)

Figure 12: Path deviation metrics calculation Figure 11: Ensemble of the 128 paths used in the (rigid terrain, Follower 3). The segment of the statistical analysis. leader path used in calculations for this particular scenario (rigid terrain) was L = 158.2 m.

fined as the point on the leader’s path closest to the follower’s final location. The resulting segment on the path (and not at the same point in time); on the leader’s path is then sampled at intervals of this is also why, when calculating the subsequent equal arc-length and the closest point on the fol- speed deviation metrics, we discard the values cor- lower’s path to each such sampled point is identi- responding to the first 10 m of travel (to allow the fied. The distance between corresponding points vehicles to accelerate from rest to the desired con- on the leader and follower paths are then used to voy speed). With these we define two more pairs of avg max define a follower path deviation as function of dis- performance metrics: mas , mas , for the average tance traveled along the leader’s path (see Fig. 13). and maximum absolute speed deviation between avg max This allows us to define the first two performance leader and follower, and mrs , mrs , for the aver- avg metrics: mp , the average follower path deviation, age and maximum relative speed deviation of the max and mp , the maximum path deviation. follower. Next, we compare the vehicle speeds at corre- The six metrics defined above were evaluated sponding points on the follower and leader paths for each of the three follower vehicles in each sce- as shown in the left plot of Fig. 14 from which nario in the three sets of 128 environments on we derive the absolute and relative speed errors rigid, SCM-Hard, and SCM-Soft terrain, respec- (the latter being the speed difference scaled by tively. The resulting statistics are presented as the leader’s speed at that location). The under- box-and-whisker diagrams in Figs. 15, 16, and 17, lying assumption in these definitions is that a fol- providing measures of the variability in the three lower’s speed should match as closely as possible statistical populations without any assumption on the speed of the leader vehicle at the same location their underlying statistical distribution (which is

Enabling Artificial Intelligence Studies in Off-Road Mobility Through Physics-Based Simulation of Multi-Agent Scenarios, Negrut, et al. Page 17 of 27 Proceedings of the 2020 Ground Vehicle Systems Engineering and Technology Symposium (GVSETS)

performance on SCM-Hard and SCM-Soft terrains, with the latter showing consistent lower metrics values (i.e., better performance in terms of main- taining position in the convoy). The explanation for this behavior is likely a com- bination of several factors. First, even though the target speed profile for the leader vehicle was set identically for all three terrain types, the increased terrain resistance in the SCM-Hard and SCM-Soft cases resulted in the leader vehicle being unable to continuously increase its speed to the specified value; on both deformable terrain types, the vehi- cles were unable to shift in the higher gears and Figure 13: Deviation in path between Leader and their speed limited to lower levels than on rigid Follower 3. The lateral deviation metrics for this terrain (an effect more pronounced on SCM-Soft avg particular scenario (rigid terrain) were mp = terrain than on SCM-Hard). The ensuing over- max 0.473 m, mp = 2.484 m. all lower convoy speed results in driving scenarios where the control policy can adapt better. Sec- ond, as seen in Fig. 14 and typical of all simu- unknown, due to the manner in which the sample lations conducted as part of this analysis, current trajectories were constructed). For each metric, on deficiencies in the RL-based control policy result in each terrain type and for each of the three follower relatively jerky motion of the follower vehicles and vehicles in the 1L+3F configuration, these stan- noisy speed profiles. These spurious accelerations dard box plots provide information on their mean, and decelerations are inhibited on the SCM-Soft second and third quartiles, as well as minimum and due to the increased motion resistance. Finally, the maximum values. largest deviations in a follower’s path (see Fig. 12) We assume that a perfect control policy for a always occur at tight turns where the leader vehicle follower vehicle would result in a convoy in which must go around an obstacle. In these situations, each follower vehicle runs exactly in the tracks the tendency of the control policy is to “cut cor- of the vehicle preceding it and achieves the exact ners” and thus direct the vehicle to increase steer- same speed at any given location. Given their def- ing input. However, these control steering inputs inition, this ideal case corresponds to zero values are more difficult to follow in a deformable terrain for each of the six performance metrics. soft enough to result in deep ruts, thus resulting The results of the statistical analysis presented in the follower vehicles more closely matching the herein confirm the intuition that, in a 1L+3F con- leader’s vehicle path around obstacles. figuration, a less than perfect control policy will While the path planning procedure and the lead to worse performance for the trailing follower path-following PID-based control policy imple- vehicles and better performance on harder surfaces mented for the leader vehicles ensures that a leader (especially taking into account that the training vehicle always avoids obstacles, this is not the case was performed exclusively on rigid terrain). How- for the RL-based control policy implemented for ever, as the results for both path and speed de- the follower vehicles, which occasionally are unable viation show, this is not the case when comparing to avoid an obstacle (in few situations, a follower

Enabling Artificial Intelligence Studies in Off-Road Mobility Through Physics-Based Simulation of Multi-Agent Scenarios, Negrut, et al. Page 18 of 27 Proceedings of the 2020 Ground Vehicle Systems Engineering and Technology Symposium (GVSETS)

Figure 14: Deviation in speed between Leader and Follower 3. The vehicle speeds are evaluated at the same location along the leader’s path. The speed deviation metrics for this particular scenario (rigid avg max avg max terrain) where mas = 0.505 m/s, mas = 2.154 m/s and mrs = 12.2%, mrs = 69.2%. vehicle, especially one in position 2 or 3, may end physics-based simulation engine; has templates for up going on the opposite side of an obstacle). This wheeled and tracked vehicles; enforces space and behavior has multiple compounding causes, includ- time coherence; allows for human-in-the-loop sce- ing the particular reward system used in the cur- narios; provides sensor simulation capabilities; has rent training as well as configurations where per- a bridge to ROS; can simulate mobility on fully re- ception of the leader vehicle is obstructed by an solved, continuum, or SCM representations of the obstacle or the leader vehicle is out of the camera terrain; is open source; and is cluster deployable sensor’s field of view while negotiating a tighter to support multi-AA mobility studies. This soft- turn. The paths of all vehicles on each terrain ware framework is used here to design an RL-based type are shown in Fig. 18. The path information control policy that allows AAs to follow in a con- therein is used to gauge the robustness of the con- voy formation. The learning took place on rigid trol policy in terms of obstacle avoidance. Fig- terrain but was demonstrated to work when de- ure 19 provides the cumulative statistics in terms ployed on AAs that operate on deformable SCM of number of obstacles hit, over all ensemble sim- soils. The virtual environments used in testing dif- ulations, for all three terrain types and for each of fered in textures and colors from the ones used in the three follower vehicles. These results show the the training, thus demonstrating robustness of the same relative performance trends observed before, inferred policy that relies on inputs from an RGB with the trailing follower on SCM-Hard terrain ex- camera sensor. Unsurprisingly, the fewer AAs in hibiting worst performance. the platoon, the tighter it managed to follow a pre- scribed path. Looking ahead, we plan to speed up the SCM implementation; augment the sensing 5 CONCLUSIONS. FUTURE WORK simulation support; improve scalability; and use this simulation infrastructure to derive new con- This contribution discussed a Chrono-centric sim- trol policies for off-road AA mobility. ulation platform designed to facilitate the design and testing of control policies for AAs operating in off-road conditions. The platform draws on a

Enabling Artificial Intelligence Studies in Off-Road Mobility Through Physics-Based Simulation of Multi-Agent Scenarios, Negrut, et al. Page 19 of 27 Proceedings of the 2020 Ground Vehicle Systems Engineering and Technology Symposium (GVSETS)

(a) (b)

Figure 15: Statistics of average and maximum follower path deviations.

(a) (b)

Figure 16: Statistics of average and maximum follower absolute speed deviations.

Enabling Artificial Intelligence Studies in Off-Road Mobility Through Physics-Based Simulation of Multi-Agent Scenarios, Negrut, et al. Page 20 of 27 Proceedings of the 2020 Ground Vehicle Systems Engineering and Technology Symposium (GVSETS)

(a) (b)

Figure 17: Statistics of average and maximum follower relative speed deviations.

(a) (b) (c)

Figure 18: Paths of every vehicle on each terrain type: (a) Rigid; (b) SCM-Hard; (c) SCM-Soft.

(a) (b) (c)

Figure 19: Number of obstacles hit by each vehicle on the three terrain types: (a) Rigid; (b) SCM-Hard; (c) SCM-Soft.

Enabling Artificial Intelligence Studies in Off-Road Mobility Through Physics-Based Simulation of Multi-Agent Scenarios, Negrut, et al. Page 21 of 27 Proceedings of the 2020 Ground Vehicle Systems Engineering and Technology Symposium (GVSETS)

Acknowledgments vehicle in an off-road scenario using integrated sensor, controller, and multi-body dynamics. The Euler supercomputer used in this work Technical report, Army Tank Automotive Re- contains hardware procured through the Army search Development and Engineering Ceter, Research Office DURIP instrumentation grant Warren-MI, 2011. W911NF1810476. Support for terramechanics re- search is provided through US Army Research Of- [5] Paramsothy Jayakumar, Abhinandan Jain, fice project W911NF1910431. Ongoing support James Poplawski, Marco Quadrelli, Jonathan for core Chrono development is provided by Na- Cameron, and Joseph Raymond. Ad- tional Science Foundation project CISE1835674. vanced mobility testbed for dynamic semi- Ongoing support for SynChrono development is autonomous unmanned ground vehicles. provided by National Science Foundation project Technical report, Army Tank Automotive CPS1739869. Support for the development of Research Development and Engineering Chrono::Vehicle was provided by U.S. Army GVSC Ceter, Warren-MI, 2015. grant W56HZV-08-C-0236. Support for the de- [6] David J Gorsich, Paramsothy Jayakumar, velopment of Chrono::Sensor and SynChrono has Michael P Cole, Cory M Crean, Abhinandan been provided by the SAFER-SIM program, which Jain, and Tulga Ersal. Evaluating mobility vs. is funded through a grant from the U.S. Depart- latency in unmanned ground vehicles. Journal ment of Transportation’s University Transporta- of Terramechanics, 80:11–19, 2018. tion Centers Program (69A3551747131). [7] Michael Cole, Cesar Lucas, Kumar B Kulka- rni, Daniel Carruth, Christopher Hudson, and REFERENCES Paramsothy Jayakumar. Are M&S tools ready [1] . GVSETS paper simu- for assessing off-road mobility of autonomous lations. https://uwmadison.box.com/s/ vehicles? In Ground Vehicle Systems En- glbpqxpomgyiomt2ydctpe35avrh44vd, 2020. gineering and Technology Symposium, pages 13–15, 2019. [2] Alexey Dosovitskiy, German Ros, Felipe [8] Philip Frederick, Mike Del Rose, David Codevilla, Antonio Lopez, and Vladlen Pirozzo, Robert Kania, Bernard Theisen, and Koltun. CARLA: An open urban driving sim- Stephanie Roth. Implementing robotic con- ulator. In Proceedings of the 1st Annual Con- trol algorithms in open source and govern- ference on Robot Learning, pages 1–16, 2017. ment virtual environments. In NDIA Ground [3] Nick Jakobi, Phil Husbands, and Inman Har- Vehicle Systems Engineering and Technology vey. Noise and the reality gap: The use of sim- Symposium, 2014. ulation in evolutionary robotics. In European [9] Open-Source-Robotics-Foundation. A 3D Conference on Artificial Life, pages 704–720. multi-robot Simulator with Dynamics. http: Springer, 1995. //gazebosim.org/. Accessed: 2015-03-09. [4] Paramsothy Jayakumar, William Smith, [10] Nathan P Koenig and Andrew Howard. De- Brant A Ross, Rohit Jategaonkar, and Krys- sign and use paradigms for Gazebo, an open- tian Konarzewski. Development of high fi- source multi-robot simulator. In IROS, vol- delity mobility simulation of an autonomous ume 4, pages 2149–2154. Citeseer, 2004.

Enabling Artificial Intelligence Studies in Off-Road Mobility Through Physics-Based Simulation of Multi-Agent Scenarios, Negrut, et al. Page 22 of 27 Proceedings of the 2020 Ground Vehicle Systems Engineering and Technology Symposium (GVSETS)

[11] Russell L. Smith. . aerospace conference proceedings (IEEE Cat. Available online at http://www.ode.org/, No. 04TH8720), volume 2, pages 861–876. 2015. IEEE, 2004.

[12] Erwin Coumans and Yunfei Bai. Pybullet, [20] Quantum Signals. Anvel:AE - ANVEL UGV a python module for physics simulation for simulator. https://quantumsignalai.com/. games, robotics and machine learning. http: Accessed: 2020-05-13. //pybullet.org, 2016–2019. [21] Abhi Jain. DARTS dynamics algorithms [13] Jeongseok Lee, Michael X Grey, Sehoon for real-time simulation. https://dartslab. Ha, Tobias Kunz, Sumit Jain, Yuting Ye, jpl.nasa.gov/DARTS/index.php. Siddhartha S Srinivasa, Mike Stilman, and C Karen Liu. DART: Dynamic animation and [22] Phillip Durst, Christopher Goodin, Chris robotics toolkit. The Journal of Open Source Cummins, Burhman Gates, Burney Mckin- Software, 3(22):500, 2018. ley, Taylor George, Mitchell Rohde, Matthew Toschlog, and Justin Crawford. A real-time, [14] Michael A Sherman, Ajay Seth, and Scott L interactive simulation environment for un- Delp. Simbody: multibody dynamics for manned ground vehicles: The autonomous biomedical research. Procedia IUTAM, 2:241– navigation virtual environment laboratory 261, 2011. (ANVEL). In 2012 Fifth International Con- [15] Eric Rohmer, Surya PN Singh, and Marc ference on Information and Computing Sci- Freese. V-REP: A versatile and scalable robot ence, pages 7–10. IEEE, 2012. simulation framework. In 2013 IEEE/RSJ In- [23] C. Goodin, T. George, C. Cummins, P. Durst, ternational Conference on Intelligent Robots B. Gates, and G. McKinley. The virtual au- and Systems, pages 1321–1326. IEEE, 2013. tonomous navigation environment: High fi- [16] Emanuel Todorov, Tom Erez, and Yuval delity simulations of sensor, environment, and Tassa. Mujoco: A physics engine for model- terramechanics for robotics. In Earth and based control. In 2012 IEEE/RSJ Interna- Space, pages 1441–1447, 2012. tional Conference on Intelligent Robots and [24] D. W. Carruth. Simulation for training and Systems, pages 5026–5033. IEEE, 2012. testing intelligent systems. In 2018 World [17] CM-Labs. Vortex Studio. https://www. Symposium on Digital Intelligence for Sys- cm-labs.com, 2020. tems and Machines (DISA), pages 101–106, 2018. [18] Newton Dynamics. Newton Dynamics: a cross-platform life-like physics simulation li- [25] Christopher Goodin, Matthew Doude, brary. http://newtondynamics.com/forum/ Christopher Hudson, and Daniel Carruth. newton.php, 2020. Enabling off-road autonomous navigation- simulation of lidar in dense vegetation. [19] A Jain, J Balaram, J Cameron, J Guineau, Electronics, 7(9):154, 2018. C Lim, M Pomerantz, and G Sohl. Re- cent developments in the ROAMS planetary [26] Christopher Goodin, Daniel Carruth, rover simulation environment. In 2004 IEEE Matthew Doude, and Christopher Hudson.

Enabling Artificial Intelligence Studies in Off-Road Mobility Through Physics-Based Simulation of Multi-Agent Scenarios, Negrut, et al. Page 23 of 27 Proceedings of the 2020 Ground Vehicle Systems Engineering and Technology Symposium (GVSETS)

Predicting the influence of rain on lidar in – Lecture Notes in Computer Science, pages adas. Electronics, 8(1):89, 2019. 19–49. Springer, 2016.

[27] Epic Games. Unreal Engine. https://www. [35] Message Passage Interface Forum. MPI: A unrealengine.com, 2020. message-passing interface standard. Technical report, University of Tennessee, Knoxville, [28] Shital Shah, Debadeepta Dey, Chris Lovett, TN, USA, 1994. http://www.ncstrl.org: and Ashish Kapoor. Airsim: High-fidelity vi- 8900/ncstrl/servlet/search?formname= sual and physical simulation for autonomous detail&id=oai%3Ancstrlh%3Autk_cs% vehicles. In Field and service robotics, pages 3Ancstrl.utk_cs%2F%2FUT-CS-94-230. 621–635. Springer, 2018.

[29] Unity3D. Main Website. https://unity3d. [36] Morgan Quigley, Ken Conley, Brian Gerkey, com/, 2016. Accessed: 2016-06-09. Josh Faust, Tully Foote, Jeremy Leibs, Rob Wheeler, and Andrew Y Ng. ROS: an open- [30] Eric Espi´e, Christophe Guionneau, Bern- source Robot . In ICRA hard Wymann, and Christos Dimitrakakis. workshop on open source software, volume 3, TORCS – The Open Racing Car Simula- page 5. Kobe, Japan, 2009. tor. https://sourceforge.net/projects/ torcs/, 2020. [37] R. Serban, M. Taylor, D. Negrut, and A. Tasora. Chrono::Vehicle Template-Based [31] Yue Kang, Hang Yin, and Christian Berger. Ground Vehicle Modeling and Simulation. Test your self-driving algorithm: An overview Intl. J. Veh. Performance, 5(1):18–39, 2019. of publicly available driving datasets and vir- tual testing environments. IEEE Transactions [38] A. Amini, I. Gilitschenski, J. Phillips, J. Mo- on Intelligent Vehicles, 4(2):171–185, 2019. seyko, R. Banerjee, S. Karaman, and D. Rus. Learning robust control policies for end-to- [32] Francisca Rosique, Pedro J Navarro, Carlos end autonomous driving from data-driven Fern´andez,and Antonio Padilla. A system- simulation. IEEE Robotics and Automation atic review of perception system and simula- Letters, 5(2):1143–1150, 2020. tors for autonomous vehicles research. Sen- sors, 19(3):648, 2019. [39] Xinlei Pan, Yurong You, Ziyan Wang, and [33] Project Chrono. Chrono: An Open Source Cewu Lu. Virtual to real reinforcement learn- Framework for the Physics-Based Simu- ing for autonomous driving. 01 2017. lation of Dynamic Systems. http:// projectchrono.org, 2020. Accessed: 2020- [40] SAE. Taxonomy and Definitions for Terms 03-03. Related to On-Road Motor Vehicle Automated Driving Systems, Jan 2014. [34] A. Tasora, R. Serban, H. Mazhar, A. Pa- zouki, D. Melanz, J. Fleischmann, M. Tay- [41] Adit Joshi. Hardware-in-the-loop (hil) imple- lor, H. Sugiyama, and D. Negrut. Chrono: mentation and validation of sae level 2 auto- An open source multi-physics dynamics en- mated vehicle with subsystem fault tolerant gine. In T. Kozubek, editor, High Perfor- fallback performance for takeover scenarios. mance Computing in Science and Engineering 1:13–32, 07 2018.

Enabling Artificial Intelligence Studies in Off-Road Mobility Through Physics-Based Simulation of Multi-Agent Scenarios, Negrut, et al. Page 24 of 27 Proceedings of the 2020 Ground Vehicle Systems Engineering and Technology Symposium (GVSETS)

[42] A. Folkers, M. Rick, and C. Bskens. Control- [49] W. Huang, Kunfeng Wang, Yisheng Lv, and ling an autonomous vehicle with deep rein- FengHua Zhu. Autonomous vehicles testing forcement learning. In 2019 IEEE Intelligent methods review. In 2016 IEEE 19th Interna- Vehicles Symposium (IV), pages 2025–2031, tional Conference on Intelligent Transporta- 2019. tion Systems (ITSC), pages 163–168, 2016. [43] Murat Dikmen and Catherine M. Burns. Au- [50] Sampo Kuutti, Richard Bowden, Y. Jin, Phil tonomous driving in the real world: Ex- Barber, and Saber Fallah. A survey of deep periences with tesla autopilot and summon. learning applications to autonomous vehicle In Proceedings of the 8th International Con- control. ArXiv, abs/1912.10773, 2019. ference on Automotive User Interfaces and Interactive Vehicular Applications, Automo- [51] Jens Kober, Andrew Bagnell, and Jan Pe- tiveUI 16, page 225228, New York, NY, USA, ters. Reinforcement learning in robotics: A 2016. Association for Computing Machinery. survey. The International Journal of Robotics Research, 32(11):1238–1274, 2013. [44] X. Qian, I. Navarro, A. de La Fortelle, and F. Moutarde. Motion planning for urban [52] Volodymyr Mnih, Koray Kavukcuoglu, David autonomous driving using bzier curves and Silver, Alex Graves, Ioannis Antonoglou, mpc. In 2016 IEEE 19th International Con- Daan Wierstra, and Martin A. Riedmiller. ference on Intelligent Transportation Systems Playing atari with deep reinforcement learn- (ITSC), pages 826–833, 2016. ing. CoRR, abs/1312.5602, 2013. [45] Y. Kuwata, J. Teo, G. Fiore, S. Karaman, [53] Sergey Levine, Peter Pastor, Alex Krizhevsky, E. Frazzoli, and J. P. How. Real-time motion and Deirdre Quillen. Learning hand-eye coor- planning with applications to autonomous ur- dination for robotic grasping with deep learn- ban driving. IEEE Transactions on Control ing and large-scale data collection. CoRR, Systems Technology, 17(5):1105–1118, 2009. abs/1603.02199, 2016. [46] X. Qian, F. Altch, P. Bender, C. Stiller, and [54] Marcin Andrychowicz, Bowen Baker, Ma- A. de La Fortelle. Optimal trajectory plan- ciek Chociej, Rafal Jozefowicz, Bob McGrew, ning for autonomous driving integrating logi- Jakub Pachocki, Arthur Petron, Matthias cal constraints: An miqp perspective. In 2016 Plappert, Glenn Powell, Alex Ray, Jonas IEEE 19th International Conference on Intel- Schneider, Szymon Sidor, Josh Tobin, Pe- ligent Transportation Systems (ITSC), pages ter Welinder, Lilian Weng, and Wojciech 205–210, 2016. Zaremba. Learning dexterous in-hand manip- ulation. The International Journal of Robotics [47] Francesco Borrelli, Paolo Falcone, Tam´asKe- Research, 39(1):3–20, 2020. viczky, Jahan Asgari, and Davor Hrovat. Mpc-based approach to active steering for au- [55] John Schulman, Filip Wolski, Prafulla Dhari- tonomous vehicle systems. 2005. wal, Alec Radford, and Oleg Klimov. Prox- imal policy optimization algorithms. CoRR, [48] Alexander Liniger, Alexander Domahidi, and abs/1707.06347, 2017. Manfred Morari. Optimization-based au- tonomous racing of 1: 43 scale rc cars. ArXiv, [56] Richard S. Sutton and Andrew G. Barto. In- abs/1711.07300, 2017. troduction to Reinforcement Learning. MIT

Enabling Artificial Intelligence Studies in Off-Road Mobility Through Physics-Based Simulation of Multi-Agent Scenarios, Negrut, et al. Page 25 of 27 Proceedings of the 2020 Ground Vehicle Systems Engineering and Technology Symposium (GVSETS)

Press, Cambridge, MA, USA, 1st edition, tracing engine. ACM Transactions on Graph- 1998. ics, August 2010.

[57] Sham Kakade and John Langford. Ap- [64] Martin Abadi, Paul Barham, Jianmin Chen, proximately optimal approximate reinforce- Zhifeng Chen, Andy Davis, Jeffrey Dean, ment learning. In IN PROC. 19TH INTER- Matthieu Devin, Sanjay Ghemawat, Geof- NATIONAL CONFERENCE ON MACHINE frey Irving, Michael Isard, Manjunath Kud- LEARNING, pages 267–274, 2002. lur, Josh Levenberg, Rajat Monga, Sherry Moore, Derek G. Murray, Benoit Steiner, Paul [58] A. Tasora, D. Magnoni, D. Negrut, R. Ser- Tucker, Vijay Vasudevan, Pete Warden, Mar- ban, and P. Jayakumar. Deformable soil tin Wicke, Yuan Yu, and Xiaoqiang Zheng. with adaptive level of detail for tracked and Tensorflow: A system for large-scale machine wheeled vehicles. Intl. J. Veh. Performance, learning. In 12th USENIX Symposium on Op- 5(1):60–76, 2019. erating Systems Design and Implementation (OSDI 16), pages 265–283, 2016. [59] Project Chrono Development Team. Chrono: An Open Source Frame- [65] Adam Paszke, Sam Gross, Soumith Chintala, work for the Physics-Based Simula- Gregory Chanan, Edward Yang, Zachary De- tion of Dynamic Systems. https: Vito, Zeming Lin, Alban Desmaison, Luca //github.com/projectchrono/chrono. Antiga, and Adam Lerer. Automatic differ- Accessed: 2019-12-07. entiation in pytorch. 2017. [60] ECMA. The JSON data interchange format. Technical Report ECMA-404, ECMA Inter- [66] Theano Development Team. Theano: A national, 2013. Python framework for fast computation of mathematical expressions. arXiv e-prints, [61] A. M. Recuero, R. Serban, B. Peterson, abs/1605.02688, 2016. H. Sugiyama, P. Jayakumar, and D. Negrut. A high-fidelity approach for vehicle mobility [67] Yangqing Jia, Evan Shelhamer, Jeff Don- simulation: Nonlinear finite element tires op- ahue, Sergey Karayev, Jonathan Long, Ross erating on granular material. Journal of Ter- Girshick, Sergio Guadarrama, and Trevor ramechanics, 72:39 – 54, 2017. Darrell. Caffe: Convolutional architecture for fast feature embedding. arXiv preprint [62] R. Serban, D. Negrut, A. M. Recuero, and arXiv:1408.5093, 2014. P. Jayakumar. An integrated framework for high-performance, high-fidelity simulation of [68] David M Beazley. Automated scientific soft- ground vehicle-tyre-terrain interaction. Intl. ware scripting with SWIG. Future Generation J. Vehicle Performance, 5(3):233–259, 2019. Computer Systems, 19(5):599–609, 2003.

[63] Steven G. Parker, James Bigler, Andreas [69] Project Chrono Development Team. Py- Dietrich, Heiko Friedrich, Jared Hoberock, Chrono: A Python wrapper for the Chrono David Luebke, David McAllister, Morgan multi-physics library. https://anaconda. McGuire, Keith Morley, Austin Robison, and org/projectchrono/pychrono. Accessed: Martin Stich. OptiX: A general purpose ray 2020-04-29.

Enabling Artificial Intelligence Studies in Off-Road Mobility Through Physics-Based Simulation of Multi-Agent Scenarios, Negrut, et al. Page 26 of 27 Proceedings of the 2020 Ground Vehicle Systems Engineering and Technology Symposium (GVSETS)

[70] Greg Brockman, Vicki Cheung, Ludwig Pet- 70348-draw-randomly-centered-circles-of-various-sizes. tersson, Jonas Schneider, John Schulman, Jie Accessed: 2020-06-17. Tang, and Wojciech Zaremba. OpenAI Gym. CoRR, abs/1606.01540, 2016. [77] Yarpiz. Path planning using PSO in MATLAB. https://www.mathworks. [71] Message Passing Interface Forum. MPI: A com/matlabcentral/fileexchange/ Message-Passing Interface Standard Version 53146-path-planning-using-pso-in-. 3.0, 09 2012. Chapter author for Collective Accessed: 2020-06-17. Communication, Process Topologies, and One Sided Communications.

[72] Google. Flatbuffers white paper: https://google.github.io/flatbuffers/ flatbuffers_white_paper.html.

[73] O. Balling, M. McCullough, H. Hodges, R. Pulley, and P. Jayakumar. Tracked and wheeled vehicle benchmark – a demonstration of simulation maturity for next generation NATO Reference Mobility Model. In Ground Vehicle Systems Engineering and Technology Symposium, Novi, MI, 2018.

[74] R. Serban, R. Gerike, A. Elmquist, and D. Negrut. NG-NRMM Phase II Bench- marking: Chrono Wheeled-Vehicle Platform Simulation Results Summary,. Technical Re- port TR-2017-05: http://sbel.wisc.edu/ documents/TR-2017-05.pdf, Simulation- Based Engineering Laboratory, University of Wisconsin-Madison, 2017.

[75] R. Serban, M. Taylor, D. Melanz, and D. Negrut. NG-NRMM Phase I Bench- marking: Chrono Tracked Vehicle Simula- tion Results Summary. Technical Report TR-2016-08: http://sbel.wisc.edu/ documents/TR-2016-08.pdf, Simulation- Based Engineering Laboratory, University of Wisconsin-Madison, 2016.

[76] Adam Danz. Draw randomly centered circles of various sizes. https://www.mathworks. com/matlabcentral/fileexchange/

Enabling Artificial Intelligence Studies in Off-Road Mobility Through Physics-Based Simulation of Multi-Agent Scenarios, Negrut, et al. Page 27 of 27