Draft Document for Review February 15, 2008 4:59 pm
SG24-7575-00

Programming the Cell Broadband Engine: Examples and Best Practices

Practical code development and porting examples included
Make the most of SDK 3.0 debug and performance tools
Understand and apply different programming models and strategies

Abraham Arevalo
Ricardo M. Matinata
Maharaja Pandian
Eitan Peri
Kurtis Ruby
Francois Thomas
Chris Almond

ibm.com/redbooks

International Technical Support Organization
December 2007

Note: Before using this information and the product it supports, read the information in “Notices”.

First Edition (December 2007)
This edition applies to Version 3.0 of the IBM Cell Broadband Engine SDK, and the IBM BladeCenter QS-21 platform.

© Copyright International Business Machines Corporation 2007. All rights reserved.
Note to U.S. Government Users Restricted Rights -- Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp.

Contents

Preface
  The team that wrote this book
  Acknowledgements
  Become a published author
  Comments welcome
Notices
  Trademarks

Part 1. Introduction to the Cell Broadband Engine

Chapter 1. Cell Broadband Engine Overview
  1.1 Motivation
  1.2 Scaling the three performance-limiting walls
    1.2.1 Scaling the power-limitation wall
    1.2.2 Scaling the memory-limitation wall
    1.2.3 Scaling the frequency-limitation wall
    1.2.4 How the Cell Broadband Engine overcomes performance limitations
  1.3 Hardware Environment
    1.3.1 The Processor Elements
    1.3.2 The Element Interconnect Bus
    1.3.3 Memory Interface Controller
    1.3.4 Cell Broadband Engine Interface Unit
  1.4 Programming Environment
    1.4.1 Instruction Sets
    1.4.2 Storage Domains and Interfaces
    1.4.3 Bit Ordering and Numbering
    1.4.4 Runtime Environment

Chapter 2. IBM SDK for Multicore Acceleration
  2.1 Compilers
    2.1.1 GNU Toolchain
    2.1.2 IBM XL C/C++ Compiler
    2.1.3 GNU ADA Compiler
    2.1.4 IBM XL Fortran for Multicore Acceleration for Linux
  2.2 IBM Full System Simulator
    2.2.1 System root image for Simulator
  2.3 Linux Kernel
  2.4 Cell BE Libraries
    2.4.1 SPE Runtime Management Library
    2.4.2 SIMD Math Library
    2.4.3 Mathematical Acceleration Subsystem (MASS) libraries
    2.4.4 Basic Linear Algebra Subprograms (BLAS)
    2.4.5 ALF Library
    2.4.6 Data Communication and Synchronization library (DaCS)
  2.5 Code examples and example libraries
  2.6 Performance Tools
    2.6.1 SPU Timing Tool
    2.6.2 OProfile
    2.6.3 Cell-perf-counter tool
    2.6.4 Performance Debug Tool (PDT)
    2.6.5 Feedback Directed Program Restructuring (FDPR-Pro)
    2.6.6 Visual Performance Analyzer (VPA)
  2.7 IBM Eclipse IDE for the SDK
  2.8 Hybrid-x86 programming model

Part 2. Programming Environment

Chapter 3. Enabling applications on the Cell BE
  3.1 Concepts and terminology
    3.1.1 The computation kernels
    3.1.2 Important Cell BE features
    3.1.3 The parallel programming models
    3.1.4 The Cell BE programming frameworks
  3.2 Does the Cell BE fit the application requirements?
    3.2.1 Higher performance/watt
    3.2.2 Opportunities for parallelism
    3.2.3 Algorithm match
    3.2.4 Ready to make the effort?
  3.3 Which parallel programming model?
    3.3.1 Parallel programming models basics
    3.3.2 Chip or board level parallelism
    3.3.3 More on the host-accelerator model
    3.3.4 Summary
  3.4 Which Cell BE programming framework to use?
  3.5 The application enablement process
    3.5.1 Performance tuning for Cell BE programs
  3.6 A few scenarios
  3.7 Design patterns for Cell BE programming
    3.7.1 Shared queue
    3.7.2 Indirect addressing
    3.7.3 Pipeline
    3.7.4 Multi-SPE software cache
    3.7.5 Plugin

Chapter 4. Cell BE programming
  4.1 Task parallelism and PPE programming
    4.1.1 PPE architecture and PPU programming
    4.1.2 Task parallelism and managing SPE threads
    4.1.3 Creating SPEs affinity using gang
  4.2 Storage domains, channels and MMIO interfaces
    4.2.1 Storage domains
    4.2.2 MFC channels and MMIO interfaces and queues
    4.2.3 SPU programming methods to access MFC’s channel interface
    4.2.4 PPU programming methods to access MFC’s MMIO interface
  4.3 Data transfer
    4.3.1 DMA commands
    4.3.2 SPE initiated DMA transfer between LS and main storage
    4.3.3 PPU initiated DMA transfer between LS and main storage
Details
File type: PDF
Content language: English
File pages: 662