Cell Broadband Engine
Spencer Dennis, Nicholas Barlow

The Cell Processor
  ◦ Objective: “[to bring] supercomputer power to everyday life”
  ◦ Bridge the gap between conventional CPUs and high-performance GPUs

History
Original patent application in 2002
Generations
  ◦ 90 nm - 2005
  ◦ 65 nm - 2007 (PowerXCell 8i)
  ◦ 45 nm - 2009

Design
Cost $400 million to develop
Team of 400 engineers
STI Design Center
  ◦ Sony
  ◦ Toshiba
  ◦ IBM

PS3
Employed as CPU
  ◦ Clocked at 3.2 GHz
  ◦ Theoretical maximum performance of 230.4 GFLOPS (single precision)
Utilized alongside NVIDIA RSX 'Reality Synthesizer' GPU
  ◦ Complemented graphical performance

Architecture Overview
  ◦ 8 Synergistic Processing Elements (SPEs)
  ◦ Single dual-issue Power Processing Element (PPE)
  ◦ Element Interconnect Bus (EIB)
  ◦ Memory Interface Controller (MIC)
  ◦ Bus Interface Controller (BIC)
SPU/SPE - Synergistic Processing Unit/Element
SXU - Synergistic Execution Unit
LS - Local Store
SMF - Synergistic Memory Flow Controller
EIB - Element Interconnect Bus
PPE - Power Processing Element
MIC - Memory Interface Controller
BIC - Bus Interface Controller

Synergistic Processing Element (SPE)
128-bit dual-issue SIMD dataflow
  ◦ “Single Instruction Multiple Data”
  ◦ Optimized for data-level parallelism
  ◦ Designed for vectorized floating-point calculations
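
The SIMD idea on the SPE slide can be sketched in a few lines. This is an illustrative model only, not SPU code: it assumes a 128-bit register holds four 32-bit single-precision lanes, and the names `LANES`, `simd_add` and `vector_add` are hypothetical helpers invented for this sketch. It shows one simulated instruction operating on four values at once, so an 8-element addition needs 2 vector instructions rather than 8 scalar ones.

```python
# Illustrative model only: a 128-bit register viewed as four 32-bit float lanes.
# All names here are hypothetical; real SPE code would use compiler intrinsics.
LANES = 128 // 32  # 4 single-precision lanes per 128-bit register


def simd_add(a, b):
    """One simulated instruction: add all four lanes of two registers at once."""
    assert len(a) == LANES and len(b) == LANES
    return [x + y for x, y in zip(a, b)]


def vector_add(xs, ys):
    """Add two equal-length arrays, LANES elements per simulated instruction.

    Returns (result, number_of_simulated_simd_instructions).
    """
    assert len(xs) == len(ys) and len(xs) % LANES == 0
    out, issued = [], 0
    for i in range(0, len(xs), LANES):
        out.extend(simd_add(xs[i:i + LANES], ys[i:i + LANES]))
        issued += 1
    return out, issued


xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
ys = [10.0] * 8
result, issued = vector_add(xs, ys)
print(result)
print(issued, "SIMD instructions instead of", len(xs), "scalar adds")
```

The sketch requires the array length to be a multiple of four; code written for real SIMD hardware typically pads the data or handles the leftover elements separately.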

SPE Continued
  ◦ Workhorses of the processor
  ◦ Handle most of the computational workload
  ◦ Each contains its own instruction + data memory
  ◦ “Local Store”
    ▫ Embedded SRAM

Power Processor Element (PPE)
Responsible for governing SPEs
  ◦ “Extensions” of the PPE
Shares main memory with the SPEs
  ◦ Can initiate accesses for SPE cores
Power Architecture
  ◦ Implements Power Architecture Hypervisor
    ▫ Can run multiple operating systems concurrently
Memory (1st generation)
  ◦ 32 KB split L1 instruction & data cache
  ◦ Unified 512 KB L2 cache

Element Interconnect Bus
High-bandwidth internal bus
1st generation: 96 bytes/cycle
4 16-byte rings
  ◦ Each ring can handle up to 3 simultaneous data transfers
12 on and off ramps
  ◦ Each SPE + PPE
  ◦ Memory controller
  ◦ 2 off-chip I/O interfaces

Memory Flow Controller
Asynchronous memory controller
Retrieves data from main memory to the SPEs' local stores & the PPE's cache
Supports two Rambus XDR memory banks

Bus Interface Controller
Provides asynchronous interface between EIB and I/O interfaces
Two flexible I/O interfaces to rest of system
  ◦ One interface can be reconfigured to provide a Symmetric Multiprocessing (SMP) interface
Contains pervasive unit
  ◦ Provides test, debug and monitoring functionality
    ▫ Chip-level error checking
  ◦ Provides clock generation & distribution control
  ◦ Power on Reset Unit (POR)
    ▫ Responsible for unit initialization
  ◦ Performance monitoring
Power Management Unit (PMU)
  ◦ Allows software-controlled power reduction
Thermal Management Unit (TMU)

Developing for Cell
Octopiler
  ◦ Takes high-level sequential code and parallelizes it to optimize it for a multiprocessor system
    ▫ High-level languages
  ◦ Divides code nine ways
    ▫ 8 sets of instructions are written for the SPEs
    ▫ The final set is written for the PowerPC PPE
GCC
  ◦ IBM-sourced plugins for Cell PPU/SPU development

SPU ISA

SPU ISA (cont’d)

Applications (In Depth)
Console Gaming
  ◦ PS3
    ▫ PPE controls 6 SPEs, delegating tasks
    ▫ 1 SPE is OS reserved, 1 SPE is redundant
Supercomputing
  ◦ IBM BladeCenter QS Series
    ▫ Easy scalability
Password cracking
  ◦ High parallelism allows for high floating-point brute-force performance

Conclusion
Discontinued in 2009
  ◦ Difficult development environment
    ▫ Programmer-managed SPE memory
    ▫ Explicit parallelism
    ▫ Two separate ISAs
Idea still lives on…
  ◦ General Purpose GPU
    ▫ Intel Larrabee Architecture
      · Intel Many Integrated Core Architecture
    ▫ AMD FireStream
    ▫ Nvidia Tesla

References
  ◦ https://www-01.ibm.com/chips/techlib/techlib.nsf/techdocs/76CA6C7304210F3987257060006F2C44/$file/SPU_ISA_v1.2_27Jan2007_pub.pdf
  ◦ http://en.wikipedia.org/wiki/SIMD
  ◦ http://en.wikipedia.org/wiki/Cell_(microprocessor)
  ◦ ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=1564359
  ◦ http://arstechnica.com/uncategorized/2006/02/6265-2/
  ◦ http://www2.lbl.gov/Science-Articles/Archive/sabl/2006/Jul/CellProcessorPotential.pdf
  ◦ http://en.wikipedia.org/wiki/Symmetric_multiprocessing
  ◦ http://researcher.watson.ibm.com/researcher/view.php?person=us-mkg/papers/2006_ieeemicro.pdf