Consolidating High-Integrity, High-Performance, and Cyber-Security Functions on a Manycore Processor

Consolidating High-Integrity, High-Performance, and Cyber-Security Functions on a Manycore Processor Benoît Dupont de Dinechin Kalray S.A. [email protected] Figure 1: Overview of the MPPA3 processor. ABSTRACT CCS CONCEPTS The requirement of high performance computing at low power can • Computer systems organization → Multicore architectures; be met by the parallel execution of an application on a possibly Heterogeneous (hybrid) systems; System on a chip; Real-time large number of programmable cores. However, the lack of accurate languages. timing properties may prevent parallel execution from being appli- cable to time-critical applications. This problem has been addressed KEYWORDS by suitably designing the architecture, implementation, and pro- manycore processor, cyber-physical system, dependable computing gramming models, of the Kalray MPPA (Multi-Purpose Processor ACM Reference Format: Array) family of single-chip many-core processors. We introduce Benoît Dupont de Dinechin. 2019. Consolidating High-Integrity, High- the third-generation MPPA processor, whose key features are mo- Performance, and Cyber-Security Functions on a Manycore Processor. In tivated by the high-performance and high-integrity functions of The 56th Annual Design Automation Conference 2019 (DAC ’19), June 2– automated vehicles. High-performance computing functions, rep- 6, 2019, Las Vegas, NV, USA. ACM, New York, NY, USA, 4 pages. https: resented by deep learning inference and by computer vision, need //doi.org/10.1145/3316781.3323473 to execute under soft real-time constraints. High-integrity functions are developed under model-based design, and must meet hard 1 INTRODUCTION real-time constraints. Finally, the third-generation MPPA processor Cyber-physical systems are characterized by software that interacts integrates a hardware root of trust, and its security architecture with the physical world, often with timing-sensitive safety-critical is able to support a security kernel for implementing the trusted physical sensing and actuation [10]. Emerging applications such as execution environment functions required by applications. aircraft pilot support or automated driving systems require more than what classic cyber-physical systems (CPSs) can provide. In particular, application functionality relies on pervasive application of machine learning techniques, while the cyber-security requirements have become significantly more stringent. We refer to the Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed CPSs enhanced with advanced machine learning capabilities and for profit or commercial advantage and that copies bear this notice and the full citation strong cyber-security support as “intelligent systems”. on the first page. Copyrights for components of this work owned by others than ACM Given the state of CMOS computing technology [8], providing must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a the processing performances required by intelligent systems while fee. Request permissions from [email protected]. meeting the Size, Weight, and Power (SWaP) constraints of embed- DAC ’19, June 2–6, 2019, Las Vegas, NV, USA ded systems can only be achieved by parallel computing and by © 2019 Association for Computing Machinery. ACM ISBN 978-1-4503-6725-7/19/06...$15.00 the specialization of processing elements. For instance, automated https://doi.org/10.1145/3316781.3323473 driving systems around year 2022 are estimated to require over 100 DAC ’19, June 2–6, 2019, Las Vegas, NV, USA B. D. de Dinechin Defense Avionics Automotive Hardware root of trust X X X Physical attack protection X X Encrypted boot firmware X X Application code decryption X X X Event data record encryption X X Table 1: Security requirements by application area. Conversely, vehicle control algorithms, as well as sensor & actua- tor management, must be delegated to micro-controller processors specifically designed to host ASIL-D functions. Figure 2: Autoware automated driving system functions. Similarly, a targeted unmanned aerial vehicle application is composed of two functional domains, one being safety-critical and TOPS of deep learning inference in the vehicle perception functions, the other non safety-critical. The two domains are segregated by while the motion planning functions would require more than 50 physical isolation mechanisms, which ensures no execution re- FP32 TFLOPS (Figure 2). sources can be shared between them. The safety-critical domain In order to address the challenges of high-performance embed- hosts the trajectory control partition (DO-178C DAL-A/B), the de- ded computing with time-predictability, Kalray has been refining tection and avoidance partition (DAL-C), and payload applications a manycore architecture called MPPA (Massively Parallel Proces- such as video streaming (DAL-C). The non-critical domain hosts a sor Array) across three generations. The first-generation MPPA data management partition (DAL-E) and a secured communication processor was primarily targeting accelerated computing [2], but partition (ED-202 SAL-3). Of interest is the fact that the secured implemented the first key architectural features for time-critical partition is located in the non safety-critical domain, as the avail- computing [4]. Kalray further improved the second-generation ability requirements of functional safety appear incompatible with MPPA processor for better time-predictability [12], providing an the integrity requirements of cyber-security. excellent target for model-based code generation [11] and enabling Finally, embedded applications in the areas of defense, avionics, accurate analysis of the network-on-chip (NoC) service guarantees and automotive have common requirements in the area of cyber- through a Deterministic Network Calculus formulation [3]. security (Table 1). The main one is the availability of a hardware In this report, we introduce the third-generation MPPA proces- root of trust (RoT), that is, a secured component that can be inher- sor manufactured in 16FFC CMOS technology, whose manycore ently trusted. Such RoT can be provided as an external hardware architecture is significantly improved over the previous ones in security module (HSM), or integrated as a central security module the areas of performance, programmability, functional safety, and (CSM). In both cases, this security module maintains the critical cyber-security. These features are motivated by application cases security parameters (CPS) such as public authentication keys, de- in defense, avionics and automotive where the high-performance, vice identity and master encryption keys in non-volatile secured high-integrity, and cyber-security functions must be consolidated area. The security module embeds a TRNG, hashing, symmetric onto a single or dual processor configuration. In Section 2, we in- and public-key cryptographic accelerators in order to extend the troduce the target applications and discuss how existing manycore chain of trust, starting with boot firmware signature verification. computing platforms appear unsatisfactory. In Section 3, we present the main features of the MPPA processor in relation with the target 2.2 Manycore Processors application requirements. A multicore processor refers to a computing device that contains 2 MOTIVATIONS multiple software programmable processing units (cores with caches). Multicore processors deployed in desktop computers or datacenters 2.1 Target Applications feature homogeneous cores, and a memory hierarchy composed The third-generation MPPA (MPPA3) processor is designed for of coherent caches. Conversely, a manycore processor can be char- building embedded intelligent systems in the areas of defense, avion- acterized by the architecturally visible grouping of cores inside ics, and automotive. Applications in these areas rely on some form compute units: cache coherence may not extend beyond the com- of dependable computing, that is, the ability to achieve prescribed pute unit, or the compute unit may provide scratch-pad memory. A levels of reliability, availability, functional safety, and cyber-security. multicore processor scales by replicating its cores, while a many- They also require that the different application components operate core processor scales by replicating its compute units. A manycore at different levels of functional safety and cyber-security. architecture may thus be scaled to 100s if not 1000s of cores. In case of automated driving applications (Figure 2), the percep- The GPGPU architecture introduced by NVidia with Fermi is a tion and the decision functions require high performances that can mainstream manycore architecture, whose compute units are called only be met with parallel computing techniques. These techniques Streaming Multiprocessors (SMs). Each SM comprises 32 Stream- entails significant execution resource sharing, which negatively ing Cores (SCs) that share a local memory, caches and a global impacts time predictability [13]. As a result, the functional safety of memory system. Threads are scheduled and executed atomically by these perception and decision functions targets ISO 26262 ASIL-B. ’warps’, where SCs execute the same instruction or are inactive at Consolidating Functions on a Manycore Processor DAC ’19, June 2–6, 2019, Las Vegas, NV, USA any

Consolidating High-Integrity, High-Performance, and Cyber-Security Functions on a Manycore Processor

Increasing Memory Miss Tolerance for SIMD Cores

Tousimojarad, Ashkan (2016) GPRM: a High Performance Programming Framework for Manycore Processors. Phd Thesis

Multi-Core Processors and Systems: State-Of-The-Art and Study of Performance Increase

Understanding and Guiding the Computing Resource Management in a Runtime Stacking Context

Parallel Processing with the MPPA Manycore Processor

A Scalable Manycore Simulator for the Epiphany Architecture

Mapping of Large Task Network on Manycore Architecture Karl-Eduard Berger

Power and Energy Characterization of an Open Source 25-Core Manycore Processor

MALT Is a Truly Manycore Processor  [email protected] ✆ +7 (495) 133 6248

Enhancing Programmability in Noc-Based Lightweight Manycore Processors with a Portable MPI Library

Parallel Programming Model for the Epiphany Many-Core Coprocessor Using Threaded MPI James A

Manycore Platforms