Introduction to IBM Cell/B.E. SDK v3.1 Programming, IBM PowerXCell 8i / QS22
IBM Systems & Technology Group
Introduction to IBM Cell/B.E. SDK v3.1 Programming
IBM PowerXCell 8i / QS22
PRACE Winter School, 10-13 February 2009, Athens, Greece
PRACE Winter School 2/16/2009 © 2009 IBM Corporation

Objectives
Introduce you to …
– Cell Software Development Kit (SDK) for Multicore Acceleration Version 3.1
– Programming the Cell/B.E. (libspe2, MFC, SIMD, …)
– Programming Models: DaCS, ALF, OpenMP
– Programming Tips & Tricks
– Performance tools
Trademarks
– Cell Broadband Engine and Cell Broadband Engine Architecture are trademarks of Sony Computer Entertainment, Inc.

Cell/B.E. Programming Approaches are Fully Customizable!
[Figure: a spectrum of three approaches. Toward approach 1, programmer control over Cell/B.E. resources and programming effort increase; toward approach 3, programmer attention to architectural details decreases and hardware abstraction increases.]
1. "Native" Programming: intrinsics, DMA, etc.
2. Assisted Programming: libraries, frameworks
3. Development Tools: user tool-driven

Advantages
– 1. "Native" Programming: best performance possible; best use of "Native" resources
– 2. Assisted Programming: greatly reduced development time over "Native" Programming; still allows some custom use of "Native" resources
– 3. Development Tools: minimum development time required; some degree of platform independence
Limitations
– 1. "Native" Programming: requires the most coding work of the three options; requires the highest level of "Native" expertise
– 2. Assisted Programming: performance gains may not be as great as with "Native" Programming; confined to the limitations of the frameworks and libraries chosen
– 3. Development Tools: performance gains may not be as great as with "Native" Programming; the CASE tool determines debugging capabilities and platform support choices
Where it is most useful
– 1. "Native" Programming: embedded hardware / real-time applications; hardware resources / power / space / cost are at a premium
– 2. Assisted Programming: the vast majority of all applications
– 3. Development Tools: simultaneous deployments across multiple hardware architectures; programmer pool / skill base is restricted to high-level skills

Cell Multi-core Programming Effort Roadmap
Requires mostly the same effort to port to any multi-core architecture.
[Figure: Cell BE Porting & Optimizing roadmap]
1. Port app to Linux, if needed
2. Port app to Power, run on PPE
3. Begin moving function to SPEs
4. Optimizing function on SPEs
5. Optimizing function: Local Store Management
6. Tune SIMD
• Exploit parallelism at task level
• Exploit parallelism at instruction / data level
• Data and instruction locality tuning
Writing for Cell BE speeds up code on all multi-core architectures because it uses the same parallel best practices; the Cell architecture just gains more from them because of its design.
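The "exploit parallelism at instruction / data level" and "Tune SIMD" steps of the roadmap can be illustrated with a small sketch. It is not taken from the slides: the function names `saxpy_scalar` and `saxpy_simd` are hypothetical, and the code uses the portable GCC/Clang `vector_size` extension as a stand-in for SPU code, which would instead use the `vector float` type and intrinsics such as `spu_madd` from `spu_intrinsics.h`. The sketch assumes the element count is a multiple of four.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Four packed floats in one 128-bit value, the same width as an SPU register.
 * vector_size is a GCC/Clang extension used here as a portable stand-in. */
typedef float vec4f __attribute__((vector_size(16)));

/* Scalar baseline: one multiply-add per loop iteration. */
void saxpy_scalar(float a, const float *x, float *y, size_t n)
{
    for (size_t i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}

/* Data-parallel version: four multiply-adds per loop iteration.
 * Assumption: n is a multiple of 4. memcpy is used for the loads and
 * stores so the arrays do not need to be 16-byte aligned. */
void saxpy_simd(float a, const float *x, float *y, size_t n)
{
    assert(n % 4 == 0);
    const vec4f va = { a, a, a, a };   /* broadcast the scalar to all lanes */
    for (size_t i = 0; i < n; i += 4) {
        vec4f vx, vy;
        memcpy(&vx, x + i, sizeof vx);
        memcpy(&vy, y + i, sizeof vy);
        vy = va * vx + vy;             /* element-wise on all four lanes */
        memcpy(y + i, &vy, sizeof vy);
    }
}
```

Both functions compute the same result; the second simply processes four elements per step, which is the pattern the SPE's 128-bit SIMD unit is designed for.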
IBM SDK for Multicore Acceleration and related tools
The IBM SDK is a complete tools package that simplifies programming for the Cell Broadband Engine Architecture.
– Eclipse-based IDE
– Simulator
– IBM XL C/C++ compiler*: optimized compiler for use in creating Cell/B.E. optimized applications. Offers:
  • improved performance
  • automatic overlay support
  • SPE code generation
– Performance Tools
– GNU tool chain
– Libraries and frameworks:
  • Data Communication and Synchronization (DaCS)
  • Accelerated Library Framework (ALF)
  • Basic Linear Algebra Subroutines (BLAS)
  • Standardized SIMD math libraries
* The XLC compiler is a complementary product to the SDK.
Figure legend: marked components are software components included in the SDK for Multicore Acceleration.

Cell BE SDK for Multicore Acceleration v3.1 Overview
– Runtime Environment
– Program Development Tools
– Programming Models
– Development Libraries
– Performance Tools
IBM Cell SW Environment
[Figure: software stack grouped into Development Environment, Runtime Environment and End-User Experience]
Development Environment
– IDE (Integrated Dev Env), VPA (Visual Perf Analyzer), PTP (Parallel Tools Platform)
– Performance Tools: spu_timing, asmVis, PDT/PDTR, FDPR-Pro, Cell Perf Counter, Oprofile, Code Analyzer
– gdb: combined debugger
– Compilers: gnu C/C++, Fortran, Ada; XL C/C++, Fortran, single source compiler
– GNU binutils
– IBM Full System Simulator
Runtime Environment
– Prog Env: ALF, DaCS
– Security SDK
– Application Libraries: SIMD math, MASS/MASSV, crypto, Monte Carlo RNG, FFT, BLAS, LAPACK
– SPE Runtime Management Library (libspe2)
– SPU system library (C99/posix, __ea cache, spu_timers)
– Enhanced Linux: RHEL 5.2/5.3, Fedora 9
– Hardware: QS21, QS22, Soma CAB
End-User Experience
– Examples, Demos, Benchmarks
STANDARDS
– HW (CBEA), SW (ABI, Language, Assembly, SIMD math, libspe2)

Cell BE SDK for Multicore Acceleration v3.1 Overview
– Runtime Environment
  • Linux Kernel
  • SPE Runtime Management Library
  • System Simulator
– Program Development Tools
– Programming Models
– Development Libraries
– Performance Tools

Linux Kernel
Fedora 9
– Patches made to the Linux 2.6.25 kernel provide the services required to support the Cell BE hardware facilities
– Patches and pre-built kernel binaries are distributed by the Barcelona Supercomputing Center (BSC-CNS): http://www.bsc.es/projects/deepcomputing/linuxoncell
RHEL 5.2/5.3
– Patches are included in the kernel distribution.
For the QS21/QS22:
– the kernel is installed into the /boot directory
– yaboot.conf is modified
– a reboot is needed to activate this kernel

SPE Runtime Management Library
The SPE runtime management library (libspe2)
– contains an SPE thread programming model for Cell BE applications
– is used to control SPE program execution from the PPE program
– handles SPEs as virtual objects called SPE contexts: SPE programs are loaded and executed by operating on SPE contexts
– is licensed under the GNU LGPL
Fedora 9
– Packages available at the Barcelona Supercomputing Center (BSC-CNS): http://www.bsc.es/plantillaH.php?cat_id=581
RHEL 5.2/5.3
– Packages available on the RHEL extras ISO image.

IBM Full-System Simulator
– Emulates the behavior of a full system that contains a Cell BE processor.
– Can start Linux on the simulator and run applications on the simulated operating system.
– Supports the loading and running of statically linked executable programs and standalone tests without an underlying operating system.
Simulation models
– Functional-only simulation: models the program-visible effects of instructions without modeling the time it takes to run these instructions.
  → For code development and debugging.
– Performance simulation: models internal policies and mechanisms for system components, such as arbiters, queues, and pipelines. Operation latencies are modeled dynamically to account for both processing time and resource constraints.
  → For system and application performance analysis.
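The SPE context model from the libspe2 slide above (create a context, load an SPE program into it, run it, destroy it) can be sketched from the PPE side as follows. This is a minimal illustration, not code from the slides: the function name `run_spe_program` and the embedded program handle `hello_spu` are hypothetical, and the `__has_include` guard is added only so the file still compiles on a host without libspe2, where the function reports failure instead of running anything.

```c
#include <stdio.h>

/* Use libspe2 only when its header is present (i.e. on a Cell/B.E. system). */
#if defined(__has_include)
#  if __has_include(<libspe2.h>)
#    include <libspe2.h>
#    define HAVE_LIBSPE2 1
#  endif
#endif

#ifdef HAVE_LIBSPE2
/* Hypothetical SPE program image, embedded in the PPE executable at link time. */
extern spe_program_handle_t hello_spu;

/* Runs the SPE program once. Returns 0 on success, -1 on any error. */
int run_spe_program(void)
{
    unsigned int entry = SPE_DEFAULT_ENTRY;
    spe_stop_info_t stop_info;

    /* 1. Create an SPE context (a virtual SPE, not tied to a physical one). */
    spe_context_ptr_t ctx = spe_context_create(0, NULL);
    if (ctx == NULL) {
        perror("spe_context_create");
        return -1;
    }
    /* 2. Load the SPE program image into the context's local store. */
    if (spe_program_load(ctx, &hello_spu) != 0) {
        perror("spe_program_load");
        spe_context_destroy(ctx);
        return -1;
    }
    /* 3. Run it; this call blocks until the SPE program stops. */
    if (spe_context_run(ctx, &entry, 0, NULL, NULL, &stop_info) < 0) {
        perror("spe_context_run");
        spe_context_destroy(ctx);
        return -1;
    }
    /* 4. Release the context. */
    spe_context_destroy(ctx);
    return 0;
}
#else
/* Fallback for hosts without libspe2: nothing can be run, so report failure. */
int run_spe_program(void)
{
    fputs("libspe2 not available on this host\n", stderr);
    return -1;
}
#endif
```

Because `spe_context_run` blocks the calling thread, PPE programs that drive several SPEs typically call it from one POSIX thread per context.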
Simulator Structure and Windows
[Figure: three windows and the layers behind them]
– Command Window: simulator prompt (systemsim%)
– GUI Window
– Console Window: shell prompt of Linux on the simulated machine ([user@bringup /]#)
– Simulated system: Linux on the simulated machine, running on the Cell simulated machine
– Simulator: IBM Full System Simulator
– Base simulator hosting environment: Linux operating system on the base processor

Cell BE SDK for Multicore Acceleration v3.1 Overview
– Runtime Environment
– Program Development Tools
  • gcc and GNU Toolchain
  • XL C/C++ Compilers
  • Eclipse IDE
– Programming Models
– Development Libraries
– Performance Tools

GNU Toolchain
Contains the GCC compiler for the PPU and the SPU.
– ppu-gcc, ppu-g++, ppu32-gcc, ppu32-g++, spu-gcc, spu-g++
– For the PPU, GCC replaces the native GCC on PPC platforms and is a cross-compiler on x86. The GCC for the PPU is preferred, and the makefiles are configured to use it when building the libraries and samples.
– For the