Heterogeneous Multicore OpenAMP

Felix Baum, Mentor Embedded

Trends Create Opportunities
[Figure: three trends with associated items. Trends: Competitive Market; Consolidation (Reuse, Space, Weight, Power); SoC Technology Advancement. Items: Global Competition, Increasing Pace of Innovation, Multiple OSes, Complex Configurations, Mixed Core Types, Complexity.]

Heterogeneous Computing for Power/Performance
• Systems that use more than one kind of processor
• Allows improved efficiency for various workloads
• Increases the programming challenge

SoC FPGA: Heterogeneous Processing Choices
Hard processor system: ARM Cortex-A9, A53
• Dual- and quad-core variants
• System management and control
Soft CPU(s): Nios II
• Configurable (performance, number of cores)
• From C-programmable state machine to Linux
Hardware accelerators
• Application-specific hardware in FPGA
• OpenCL, MATLAB/Simulink, custom-designed RTL
• Managed by CPU or autonomous

Techniques to Parallelize the Processing: OpenMP
OpenMP (Open Multi-Processing) is an application programming interface (API) that supports multi-platform shared-memory multiprocessing programming in C, C++, and Fortran.
• It consists of a set of compiler directives, library routines, and environment variables that influence runtime behavior
• The user identifies the supported constructs and manually inserts directives that help the compiler restructure them into parallel code
• The user does not need to create threads or decide how work is assigned to them
• OpenMP lets users without deep knowledge of parallel computing explore it

Techniques to Parallelize the Processing: OpenCL
OpenCL (Open Computing Language) is the industry's open standard for writing data-parallel code in heterogeneous computers.
• OpenCL is promoted as a primary approach towards programming Intel parallel computing hardware offerings.
  OpenCL requires a degree of low-level understanding and competence to write efficient parallel software.
• OpenCL is primarily targeted at leveraging the data parallelism of devices, and additional considerations must be made to use the multiple cores available on the CPUs in the system, for example offloading CPU-intensive algorithms to GPUs.
The multicore framework listed in this paper differs from OpenCL as it does not try to parallelize processing for the sake of performance but tries to isolate and separate execution blocks based on performance or power consumption requirements.

Techniques to Parallelize the Processing: OpenAMP
The multicore framework listed in this paper differs from OpenCL and OpenMP as it does not try to parallelize processing for the sake of performance, but tries to isolate and separate execution blocks based on performance or power consumption requirements.
OpenAMP defines how operating systems interact:
• Focused on IPC and booting frameworks
• In particular between Linux and RTOS/bare-metal
• In particular in heterogeneous systems

Multicore Configurations
[Figure: configurations built from Cortex-A cores. Column labels: Homogeneous, Heterogeneous; SMP, sAMP, sAMP. Guest environments shown: Linux (SMP), Linux (master), Linux, RTOS (SMP), RTOS (master), RTOS, Bare Metal Env. A Hypervisor layer appears beneath three of the configurations.]
SMP: Symmetric MultiProcessing
uAMP: unsupervised Asymmetric MultiProcessing
sAMP: supervised AMP, aka virtualized

Complexity Skyrockets
Extreme complexity is introduced with general purpose development
• System architecture
• Configuration
• Booting
• Debugging
• Separation
• Device sharing
• Inter-processor communication
• Security
[Chart: Development Complexity plotted across Single Core, Dual Cores, Multiple Cores, Heterogeneous Multiple Cores]

The OpenAMP Standard
• OpenAMP is an open source project managed by the Multicore Association
• OpenAMP standardizes how operating systems interact in heterogeneous systems
  - Focused on inter-process communications (IPC) and booting frameworks
  - In particular between Linux and RTOS/bare metal
• Guiding principles:
  - Open source implementations for Linux and RTOSes
  - Prototype and prove in open source before standardizing
  - Business-friendly APIs and implementations to allow proprietary solutions

The OpenAMP Collaboration
Mentor Embedded
• Participates on MCA Board of Directors
• Serves in a Technical Advisor role on the MCA OpenAMP Committee
• Contributed most of the initial code to the OpenAMP git tree
• Mentor Embedded Multicore Framework is a first commercial implementation of the standard

OpenAMP: Core Management
• Used by master OS to boot remote OSes on remote CPUs
• remoteproc user API for processor lifecycle management
• Conformance to upstream Linux remoteproc implementation
• Standalone, OS-agnostic clean-room implementation of remoteproc API
Where A could be Cortex A7/A9/A15/A53…

OpenAMP: Inter Process Communication
• For inter-processor communications between OS/software contexts
• rpmsg user API for inter-processor communication
• Conformance to upstream Linux rpmsg implementation
• Standalone, OS-agnostic clean-room implementation of virtio and rpmsg for bare-metal or other non-Linux OS

OpenAMP: Use Cases
• Separation of applications, for example user interface tasks from critical processing tasks.
• Offloading of computationally intensive operations, such as processing or encryption of secure, sensitive data

OpenAMP: Medical Use Case
[Figure. Label text, in extraction order: Real-Time Display Vitals Data Acquisition Web View]

Summary
• Heterogeneous multicore is here to stay and has great potential to help with both consolidation and isolation
• Making multiple and varied operating environments run side by side is hard work; OpenAMP makes it possible while minimizing schedule risk
• OpenAMP unleashes the 'silicony goodness' by providing developers with the framework and tools to optimize the processing and extract the most out of the underlying hardware

Next Steps
• SoC FPGA Overview - portfolio, tools, ecosystem, road map https://www.altera.com/socfpga
• Low-cost Development Platform - evaluation kit https://www.altera.com/atlassoc
• Mentor Embedded Multicore Framework www.mentor.com/embedded-software/multicore
• White Paper: Architecting Success with Heterogeneous Systems
• Webinar: Challenges of Debugging Heterogeneous Multicore SoCs
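
Example: OpenMP directive
The OpenMP slide earlier in the deck says the user inserts directives and leaves thread creation and work assignment to the compiler and runtime. A minimal sketch of that idea in C follows; the function name sum_of_squares is an illustration, not something from the presentation.

```c
#include <assert.h>
#include <stddef.h>

/* Sum of the squares of n values.
 * The pragma asks an OpenMP-enabled compiler (for example one invoked with
 * -fopenmp) to split the loop iterations across threads and to combine the
 * per-thread partial sums into `total`. A compiler without OpenMP support
 * ignores the pragma and runs the loop serially with the same result.
 * sum_of_squares is a hypothetical name chosen for this sketch. */
long sum_of_squares(const int *values, size_t n)
{
    long total = 0;

    assert(values != NULL || n == 0);

    #pragma omp parallel for reduction(+:total)
    for (long i = 0; i < (long)n; ++i) {
        total += (long)values[i] * values[i];
    }
    return total;
}
```

Calling sum_of_squares on the array {1, 2, 3, 4} returns 30 whether or not OpenMP is enabled; only the way the iterations are scheduled changes.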
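
Example: message ring between two contexts
The Inter Process Communication slide describes rpmsg running over virtio between OS contexts. The sketch below is not the OpenAMP or rpmsg API. It is a simplified single-producer, single-consumer ring, under the assumption that both contexts can see the same memory. It is meant only to show the kind of mechanism such a transport builds on. All names (msg_ring, ring_send, ring_recv) and sizes are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <string.h>

#define RING_SLOTS 8u   /* hypothetical slot count; must be a power of two */
#define SLOT_BYTES 64u  /* hypothetical maximum payload size */

struct msg_slot {
    size_t len;
    unsigned char data[SLOT_BYTES];
};

/* One direction of traffic: exactly one producer and one consumer. */
struct msg_ring {
    atomic_uint head;   /* count of messages written by the producer */
    atomic_uint tail;   /* count of messages read by the consumer */
    struct msg_slot slots[RING_SLOTS];
};

void ring_init(struct msg_ring *r)
{
    atomic_init(&r->head, 0u);
    atomic_init(&r->tail, 0u);
}

/* Returns 0 on success, -1 if the ring is full or the message is too big.
 * buf must be a valid pointer. */
int ring_send(struct msg_ring *r, const void *buf, size_t len)
{
    unsigned head = atomic_load_explicit(&r->head, memory_order_relaxed);
    unsigned tail = atomic_load_explicit(&r->tail, memory_order_acquire);

    if (len > SLOT_BYTES || head - tail == RING_SLOTS)
        return -1;

    struct msg_slot *s = &r->slots[head % RING_SLOTS];
    memcpy(s->data, buf, len);
    s->len = len;
    /* Publish the slot only after its contents are written. */
    atomic_store_explicit(&r->head, head + 1u, memory_order_release);
    return 0;
}

/* Returns the payload length, or -1 if the ring is empty or cap is too
 * small for the pending message. */
long ring_recv(struct msg_ring *r, void *buf, size_t cap)
{
    unsigned tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
    unsigned head = atomic_load_explicit(&r->head, memory_order_acquire);

    if (head == tail)
        return -1;

    const struct msg_slot *s = &r->slots[tail % RING_SLOTS];
    size_t len = s->len;
    if (len > cap)
        return -1;
    memcpy(buf, s->data, len);
    /* Hand the slot back to the producer. */
    atomic_store_explicit(&r->tail, tail + 1u, memory_order_release);
    return (long)len;
}
```

In a real heterogeneous system the ring would live in a memory region shared by both cores, and the design would also need cache maintenance and a notification mechanism such as an inter-processor interrupt. The sketch leaves those out.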
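
Example: remote processor lifecycle
The Core Management slide says the master OS uses remoteproc to boot remote OSes and manage the processor lifecycle. The sketch below models such a lifecycle as a small state machine. It is an illustration under assumed states (offline, loaded, running) and does not reproduce the remoteproc API; every name in it is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed lifecycle states for one remote CPU (hypothetical). */
enum remote_state { REMOTE_OFFLINE, REMOTE_LOADED, REMOTE_RUNNING };

struct remote_cpu {
    enum remote_state state;
    const char *firmware;   /* image name; NULL until an image is loaded */
};

void remote_init(struct remote_cpu *cpu)
{
    cpu->state = REMOTE_OFFLINE;
    cpu->firmware = NULL;
}

/* Each transition returns 0 on success and -1 if it is not allowed
 * from the current state. */
int remote_load(struct remote_cpu *cpu, const char *firmware)
{
    if (cpu->state != REMOTE_OFFLINE || firmware == NULL)
        return -1;
    cpu->firmware = firmware;
    cpu->state = REMOTE_LOADED;
    return 0;
}

int remote_start(struct remote_cpu *cpu)
{
    if (cpu->state != REMOTE_LOADED)
        return -1;
    cpu->state = REMOTE_RUNNING;
    return 0;
}

int remote_stop(struct remote_cpu *cpu)
{
    if (cpu->state != REMOTE_RUNNING)
        return -1;
    cpu->firmware = NULL;
    cpu->state = REMOTE_OFFLINE;
    return 0;
}
```

A master would call remote_load, then remote_start, and later remote_stop; calls made out of order are rejected. On real hardware each transition would also copy the image into the remote core's memory and release or hold that core in reset.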
