Software Development Kit for Multicore Acceleration Version 3.0
Programming Tutorial
SC33-8410-00

Note: Before using this information and the product it supports, read the information in "Notices" on page 153.

Edition notice

This edition applies to version 3, release 0 of the IBM Software Development Kit for Multicore Acceleration (Product number 5724-S84) and to all subsequent releases and modifications until otherwise indicated in new editions.

© Copyright International Business Machines Corporation, Sony Computer Entertainment Incorporated, Toshiba Corporation 2005, 2007

Preface

About this book

This tutorial is written for programmers who are interested in developing applications or libraries for the Cell Broadband Engine™ (Cell BE). It is not intended for programmers who want to develop device drivers, compilers, or operating systems for the Cell Broadband Engine.

The descriptions and examples in this tutorial are from the Software Development Kit for Multicore Acceleration, Version 3.0. The examples are chosen to highlight the general principles required for Cell Broadband Engine programming, so that an experienced programmer can apply this knowledge to other environments.

Who should read this book

This document is intended for system and application programmers who wish to develop Cell Broadband Engine applications.

Prerequisites

It is assumed that you are an experienced C/C++ programmer and are familiar with the basic concepts of single-instruction, multiple-data (SIMD) vector instruction sets, such as the PowerPC® Architecture™ Vector/SIMD Multimedia Extensions, Intel® MMX™, SSE, 3DNOW!, or x86-64 instruction sets. It is also assumed that you have the Software Development Kit (SDK) for Multicore Acceleration, which includes a Cell BE-specific, 64-bit PowerPC Linux operating system, SDK code examples, and the IBM Full System Simulator for Cell BE.

Related documentation

The following is a list of reference and supporting materials for the Cell Broadband Engine. Additional documentation for specific SDK components is generally provided with that component.

- C/C++ Language Extensions for Cell Broadband Engine Architecture
- Cell Broadband Engine, Architecture
- Cell Broadband Engine Linux Reference Implementation, Application Binary Interface Specification
- Cell Broadband Engine, Programming Handbook
- Cell Broadband Engine, Registers
- Accelerated Library Framework, Programmer's Guide and API Reference
- Data Communication and Synchronization, Programmer's Guide and API Reference
- PowerPC Microprocessor Family: The Programming Environments Manual for 64-bit Microprocessors
- PowerPC Microprocessor Family: Vector/SIMD Multimedia Extension Technology Programming Environments Manual, Version 2.06c
- PowerPC Operating Environment Architecture, Book III, Version 2.02
- PowerPC User Instruction Set Architecture, Book I, Version 2.02
- PowerPC Virtual Environment Architecture, Book II, Version 2.02
- SIMD Math Library Specification for Cell Broadband Engine
- Software Development Kit, Programmer's Guide
- SPE Runtime Management Library (Version 2)
- SPU Application Binary Interface Specification
- SPU Assembly Language Specification
- Synergistic Processor Unit, Instruction Set Architecture

Contents

Preface . . . iii
Figures . . . vii
Tables . . . ix

Chapter 1. Overview of the Cell Broadband Engine . . . 1
  Introduction . . . 1
  Background and motivations . . . 1
  Scaling the three performance-limiting walls . . . 3
  Architecture overview . . . 4
  The PowerPC Processor Element . . . 6
  Synergistic Processor Elements . . . 7
  Programming Overview . . . 9
  Byte ordering and bit numbering . . . 9
  SIMD vectorization . . . 10
  SIMD C-language intrinsics . . . 11
  Threads and tasks . . . 12
  The runtime environment . . . 13
  Application partitioning . . . 13
  The software development kit . . . 16

Chapter 2. The PPE and the programming process . . . 19
  PPE registers . . . 19
  PPE instruction sets . . . 21
  PowerPC instructions . . . 22
  Vector/SIMD Multimedia Extension instructions . . . 24
  C/C++ language extensions (intrinsics) . . . 25
  Programming with Vector/SIMD Multimedia Extension intrinsics . . . 33
  The PPE and the SPEs . . . 35
  Storage Domains . . . 35
  Issuing DMA commands from the PPE . . . 37
  Creating threads for the SPEs . . . 38
  Communication between the PPE and SPEs . . . 40
  Developing code for the Cell Broadband Engine . . . 41
  Producing a simple multi-threaded CBE program . . . 42
  Running the program in the simulator . . . 44
  Debugging programs . . . 48

Chapter 3. Programming the SPEs . . . 49
  SPE configuration . . . 49
  Synergistic Processor Unit . . . 50
  Memory flow controller . . . 54
  Channels . . . 55
  SPU instruction set . . . 60
  Data layout in registers . . . 60
  Instruction types . . . 62
  SPU C/C++ language extensions (intrinsics) . . . 64
  Assembly language versus intrinsics comparison: an example . . . 65
  Intrinsic classes . . . 66
  Promoting scalar data types to vector data types . . . 71
  Differences between PPE and SPE SIMD support . . . 72
  Compiler directives . . . 75
  MFC commands . . . 76
  DMA-command tag groups . . . 79
  Synchronizing DMA transfers . . . 80
  MFC input and output macros . . . 80
  Coding methods and examples . . . 83
  DMA transfers . . . 83
  DMA-list transfers . . . 84
  Moving double-buffered data . . . 86
  Vectorizing a loop . . . 88
  Reducing the impact of branches . . . 89
  Porting SIMD code from the PPE to the SPEs . . . 92
  Code-mapping considerations . . . 93
  Simple macro translation . . . 94
  Example 1: Euler particle-system simulation . . . 96
  Performance analysis . . . 106
  Performance issues . . . 106
  Example 1: Tuning SPE performance with static and dynamic timing analysis . . . 106
  General SPE programming tips . . . 115

Chapter 4. Programming models . . . 117
  Function-Offload Model . . . 117
  Remote procedure call . . . 118
  Device-Extension Model . . . 118
  Computation-Acceleration Model . . . 119
  Streaming model . . . 119
  Shared-Memory Multiprocessor Model . . . 119
  Asymmetric-Thread Runtime Model . . . 120
  User-mode thread model . . . 120
  Cell application frameworks . . . 120
  SPE overlays . . . 121

Chapter 5. The simulator . . . 123
  Simulator basics . . . 124
  Operating-system modes . . . 124
  Interacting with the simulator . . . 124
  Command-line interface . . . 125
  Graphical User Interface . . . 126
  The simulation panel . . . 127
  GUI buttons . . . 135
  Performance monitoring . . . 140
  Displaying performance statistics . . . 141
  SPE performance profile checkpoints . . . 144
  Example program: tpa1 . . . 146
  Emitters . . . 147
  SPU performance and semantics . . . 149

Notices . . . 153
  Edition notices . . . 155
Trademarks . . . 157
Index . . . 171

Figures

1. Overview of Cell Broadband Engine architecture . . . 5
2. PowerPC Processor Element (PPE) block diagram . . . 6
3. Synergistic Processor Element (SPE) block diagram . . . 8
4. Big-endian byte and bit ordering . . . 10
5. Four concurrent Add operations . . . 11
6. Byte-shuffle operation . . . 11
7. Application partitioning model . . . 14
8. PPE-centric multistage pipeline model and parallel stages model . . . 15
9. PPE-centric services model . . . 15
10. PPE user-register set . . . 20
11. Concurrent execution of integer, floating-point, and vector units . . . 24
12. Running the Vector/SIMD Multimedia Extension sample program . . . 34
13. Storage domains defined in the Cell Broadband Engine . . . 36
14. Sample project directory structure and makefiles . . . 42
15. Windows visible after starting the simulator GUI . . . 45
16. Console window on completion of Linux boot . . . 46
17. Loading the program into the simulation environment . . . 47
18. Running the sample program . . . 48
19. SPE architectural block diagram . . . 50
20. SPE user-register set . . . 51
21. Big-endian ordering supported by the SPE . . . 61
22. Register layout of data types and preferred (scalar) slot . . . 61
23. SIMD floating-point Add instruction function . . . 63
24. Array-of-structures data organization for one triangle . . . 63
25. Structure-of-arrays data organization for four triangles . . . 64
26. DMA transfers using a double-buffering method . . . 87
27. Example of the Function-Offload (or RPC) Model . . . 118
28. Simulation stack . . . 123
29. Simulator structures and screens . . . 125
30. Main Graphical User Interface for the simulator . . . 127
31. Project and processor folders . . . 128
32. PPE General-Purpose Registers window . . . 129
33. PPE Floating-Point Registers window . . . 129
34. PPE Core window . . . 130
35. SPE MFC window . . . 131
36. SPE MFC Address Translation window . . . 132
37. SPE Channels window . . . 133
38. SPE statistics . . . 134
39. Debug Controls window . . . 137
40. SPE Visualization window . . . 138
41. Track All PCs window . . . 139
42. SPU Modes window . . . 140
43. tpa1 statistics for SPE 0 . . . 143
44. tpa1 statistics for SPE 2 . . . 144
45. Profile checkpoint output for SPE 2 . . . 146
46. Emitters . . . 148
47. Emitter architecture . . . 148

Tables

1. PPE and SPE intrinsic classes . . . 12
2. Definition of threads and tasks . . . 12
3. PPE-specific scalar intrinsics . . . 26
4. Vector/SIMD Multimedia Extension data types . . . 29
5. Vector/SIMD Multimedia Extension specific and generic intrinsics . . . 29
6. Vector/SIMD Multimedia Extension predicate intrinsics . . . 32
7. MFC command-parameter registers for PPE-initiated DMA transfers . . . 37
8. Mailbox channels and MMIO registers . . . 40
9. Signal notification channels and MMIO registers . . . 41
18. Generic SPU Intrinsics . . . 69
19. Composite SPU intrinsics . . . 71
20. Intrinsics for Changing Scalar and Vector Data Types . . . 72
21. PPE and SPE Architectural Comparison . . . 72
22. PPE versus SPU Vector Data Types . . . 73
23. Single-Token Vector Keyword Data Types . . . 74
24. MFC DMA Command . . . 76
25. MFC Command Suffixes . . . 78
26. MFC Synchronization Commands . . . 79
27. MFC Atomic Commands . . . 79
28. MFC Input and Output Macros . . . 80