AI Accelerator Latencies in Hybrid Vehicular Simulation

AI Accelerator Latencies in Hybrid Vehicular Simulation Jussi Hanhirova Matias Hyyppä Abstract Aalto University Aalto University We study the use of accelerators for vehicular AI (Artifi- Espoo, Finland Espoo, Finland cial Intelligence) applications. Managing the computation jussi.hanhirova@aalto.fi juho.hyyppa@aalto.fi is complex as vehicular AI applications call for high performance computations in a real-time distributed environment, in which low and predictable latencies are essential. We have used the CARLA simulator together with machine learning based on CNNs (Convolutional Neural Networks) in our research. In this paper, we present the latency be- Anton Debner Vesa Hirvisalo havior with GPU acceleration for CNN processing. Our ex- Aalto University Aalto University perimentation is motivated by using the simulator to find the Espoo, Finland Espoo, Finland corner cases that are demanding for the accelerated CNN anton.debner@aalto.fi vesa.hirvisalo@aalto.fi processing. Author Keywords Computation acceleration; GPU; deep learning ACM Classification Keywords D.4.8 [Performance]: Simulation; I.2.9 [Robotics]: Autonomous vehicles; I.3.7 [Three-Dimensional Graphics and Realism]: Virtual reality Introduction In this paper, we address the usage of accelerators in vehicular AI (Artificial Intelligence) systems and in the simulators that are needed in the development of such systems. The recent development of AI system is enabling many new Convolutional Neural Net- applications including autonomous driving of motor vehi- software [5] together with deep learning based inference on works (CNN) are a specific cles on public roads. Many of such systems process sen- TensorFlow [6]. Our measurements show the basic viability class of neural networks that sor data related to environment perception in real-time, be- of the hybrid simulation approach, but they also underline are often used in deep form cause they trigger actions which have latency requirements. the jitter problem that makes the design of vehicle AI sys- containing several “hidden” The typical sensor data for autonomous driving consists of tems (and also realistic hybrid simulation) challenging. layers in addition to the input videos and LiDAR streams, which is a form of continuous The structure of this paper is the following. We begin by and output ones. 3D imaging data. explaining our approach in making hybrid vehicular simula- Object recognition out- Our research problem is to understand the latency prop- tors. Central in our solution is embedding the TensorFlow puts labels corresponding to erties of deep learning AI accelerators and their usage platform into the simulator to simultaneously run both the classes of recognized objects in vehicular real-time systems. Autonomous and semi- traffic simulation and real vehicle AI algorithms. We con- along with some confidence autonomous vehicles typically need several AI systems to tinue our presentation by describing a concrete set-up and levels when given an input enable their operation. The computational requirements of a set of latency measurements with acceleration for both image. the AI systems can differ significantly but usually they have the simulation and the vehicular AI. We end the paper with to co-operate and share the same computational platforms our conclusions. Object detectors are able together with many other (non-AI) systems. to tell also where in the pic- System description ture the objects reside by In many cases, fully orchestrated synchronous operation is Instead of using real vehicles, we use hybrid simulation to outputting bounding box co- not practical with AI systems and asynchronous operation understand the computation needed in vehicular systems. ordinates in addition to class is preferred. Asynchronous operation is also an option for Managing the computations is complex as vehicular AI ap- labels. hybrid vehicular simulators. In our research on latency be- plications call for high performance computations in a real- haviour, we have been focused on how the asynchronous time distributed environment, in which low and predictable operations affect the real-time properties of the systems. latencies are essential. The basis for computer vision is object recognition and de- Our system is illustrated in Figure1 (left). The system has tection from images and videos. In the recent years, deep three main parts: corner case search, traffic simulation and learning, particularly in the form of deep Convolutional Neu- vehicle control. We use the CARLA software [5] as the ba- ral Networks (CNN), has become the dominant approach sis of our simulator. The simulator runs the simulation loop for implementing computer vision algorithms [12]. Novel and advances the state of the world model. Example of hardware architectures and the emergence of fog/edge CARLA simulation model view is presented in Figure4. computing offer multiple options for further improvements of machine learning inference systems.[3, 13, 14] The vehicle control is built on top of a subsystem that can connect to multiple different systems, such as inference Our contributions are two-fold. First, we present the accel- services. erator usage in a hybrid vehicular simulator. Second, we present our measurements on the latency behaviour of the Our implementation of the vehicular part of the system is accelerators. Our hybrid simulator [8] is based on CARLA hybrid. The vehicles, their physics, and their overall set-up Corner case search tries to find abnormal operating situations of a system. Usually, this means situations outside the typical combination of operating parameters. In Figure 1: The overall processing consisting of three major parts (corner case search, traffic simulation and vehicle control) is illustrated on the our system, the corner case left. The computational load is projected onto the hardware that be composed of several nodes with several accelerators, which is illustrated on search subsystem receives the right. state information from the simulator and the connected AI clients. is simulated. However, the AI subsystem of the vehicles is Modern AI systems often apply the machine learning meth- By comparing the ground real: it is implemented by using real software running on ods that are based on having extensive computational re- truth state to observed real hardware. The AI subsystem uses TensorFlow [6] as its sources [12]. This poses a challenge for embedded sys- state mismatches can be software basis. tems. Offloading parts of the AI computation into the fog found. or the edge is one option in getting the extensive computa- Traditionally, vehicle software has been embedded into the tional resources organized. Vehicle AI systems also need Criteria and tolerance for on-board software. The usual way to configure a vehicle services that are typically organized on the cloud-side. mismatches is simulation system is to have a middleware (e.g., AUTOSAR-based scenario specific. [2]) that utilizes and integrates the vehicle subsystems that With our hybrid simulation system, we study how the com- appear as ECUs (Electronic Control Unit) in the system putation is mapped onto a platform. This is illustrated in If an AI system estimate architecture. Modern cars have plenty of software on-board Figure1 (right). A typical computation platform consists of differs from the simulator (see Figure2 for typical challenges for modern software). nodes connected by some interconnect (which can be a ground truth then a mis- bus, a cluster network, or some wider network – including match is found. In our sys- Modern vehicular software extends beyond the embedded the Internet). The computing power for AI applications are tem, simulation model states on-board software. Not only there are open application plat- often implemented by using accelerators, which often are and vehicle assumed states forms in IVI systems (In-Vehicle Infotainment), but more computational subsystems coordinated by the node CPUs. leading to the mismatch are importantly for our purpose, there are cloud-based sys- saved to a database for later tems for understanding the environment of the vehicles. Our system uses TensorFlow to map the computational analysis. The emergence of fog/edge computing [1] makes cloud-like load onto the platform. TensorFlow is an open-source li- resources available for vehicular AI systems. brary for dataflow programming and its especially suited for machine learning applications that use neural networks. TensorFlow has tools for projecting the load on multiple nodes and accelerators and it includes protocols for inter- node communication. CARLA is an open-source simulator for autonomous driving research [5]. CARLA simulation is based on having a server keeping on the world state and a number of clients that cor- respond to the actors in the world. Simulation physics ad- Figure 2: In this image, a traffic vance independently without waiting for client input. situation is observed from a vehicle in motion. Some of the objects As we use hybrid simulation, we must understand the tim- detected by the AI system are ing of the simulation in addition to the timing behaviour of marked with bounding boxes in the vehicular software. Further, a typical vehicle AI system con- image. Only one of the other sist of several parts and their timing requirements may differ vehicles is actually moving and (e.g., collision avoidance versus road weather conditions). actively participating in the traffic As illustrated in Figure1 (right) these real-time computation situation. Considering the

AI Accelerator Latencies in Hybrid Vehicular Simulation

Accenture AI Inferencing in Action

GPU Developments 2018

Persistent Memory for Artificial Intelligence

Unified Inference and Training at the Edge

Low-Power Ultra-Small Edge AI Accelerators for Image Recog- Nition with Convolution Neural Networks: Analysis and Future Directions

Survey and Benchmarking of Machine Learning Accelerators

Efficient Management of Scratch-Pad Memories in Deep Learning

Infiltration of AI/ML in Particle Physics Aspects of Data Handling

Ai Accelerator Ecosystem: an Overview

MYTHIC MULTIPLIES in a FLASH Analog In-Memory Computing Eliminates DRAM Read/Write Cycles

Driving Intelligence at the Edge: Axiomtek's Edge AI Solutions

Low-Power Ultra-Small Edge AI Accelerators for Image Recognition with Convolution Neural Networks: Analysis and Future Directions