Cell Broadband Engine™ Processor
IBM STG
Cell Broadband Engine™ Processor
(A multi-core design based on Power Architecture™ technology)
Michael Paolini, [email protected]
Solutions Architect (SWG Master Inventor)
IBM Systems & Technology Group
Nov 13, 2006 © 2006 IBM Corporation

Performance Limiters and Challenges in Conventional Microprocessors (Single Thread Throughput)
- Memory Wall – Latency-induced bandwidth limitations
- Power Wall – Must improve efficiency and performance equally
- Frequency Wall – Diminishing returns from deeper pipelines (can be negative if power is taken into account)

What's Causing The Problem?
- P = (1/2)CV²f (dynamic power P for switched capacitance C, supply voltage V, and frequency f)
- Gate dielectric approaching a fundamental limit: atomic defects matter!
[Figure: gate stack, labeled "10S Tox=11A" and "65 nm"]
[Figure: power density (W/cm²) versus gate length (microns), 1994 to 2004, showing active power and passive power against the air-cooling limit]
[Figure: "Frequency Increase vs Power Consumption": relative power and frequency versus voltage (0.9 to 1.3)]

The Discontinuity
Then (~2003)                    | Now
Scaling drove performance       | Innovation drives performance
Scaling drove down cost         | Scaling drives down cost
Performance constrained         | Power constrained
Active power dominates          | Standby power dominates
Focus on processor performance  | Focus on system performance

Collaborative Innovation Drivers
- Economics – Investment unaffordable even for large entities
- Technology – Extent of invention and innovation required
- Perspective – Breadth of expertise and knowledge required
- Creativity – Collaboration truly brings great ideas forward
- Focus – Allows companies to best leverage core skills
- Time – Rapid team assembly and execution

Cell Broadband Engine History
- IBM, SCEI/Sony, Toshiba Alliance formed in 2000
- Design Center opens March 2001
- ~$400M investment, 5 years, 600 people
- February 7, 2005: First technical disclosures
- January 12, 2006: Alliance extended 5 more years
[Map labels: YKT, EFK, Burlington, Endicott, Rochester, Boeblingen, Tokyo, San Jose, RTP, Austin, India, Israel]

Introducing Cell BE
- Cell BE is an accelerator extension to Power
  – Built on a Power ecosystem
  – Used best known system practices for processor design
- Sets a new performance standard
  – Exploits parallelism while achieving high frequency
  – Supercomputer attributes with extreme floating point capabilities
  – Sustains high memory bandwidth with smart DMA controllers
- Designed for natural human interaction
  – Photo-realistic effects
  – Predictable real-time response
  – Virtualized resources for concurrent activities
- Designed for flexibility
  – Wide variety of application domains
  – Highly abstracted to highly exploitable programming models
  – Reconfigurable I/O interfaces
  – Virtual trusted computing environment for security
- Cell BE is the chip powering the Sony PlayStation 3
  – Ships in volume in the US in Nov '06

First Generation Cell BE
- 90 nm
- 241M transistors
- 235 mm²
- 9 cores, 10 threads
- >200 GFlops (SP)
- >20 GFlops (DP)
- Up to 25 GB/s memory B/W
- Up to 75 GB/s I/O B/W
- >300 GB/s EIB
- Top frequency >4 GHz (observed in lab)

The Cell BE Concept
- Compatibility with 64b Power Architecture™
  – Builds on and leverages IBM investment and community
- Increased efficiency and performance
  – Attacks on the "Power Wall"
    • Non-homogeneous coherent multiprocessor
    • High design frequency at a low operating voltage with advanced power management
  – Attacks on the "Memory Wall"
    • Streaming DMA architecture
    • 3-level memory model: Main Storage, Local Storage, Register Files
  – Attacks on the "Frequency Wall"
    • Highly optimized implementation
    • Large shared register files and software-controlled branching to allow deeper pipelines
- Interface between user and networked world
  – Image-rich information, virtual reality, shared reality
  – Flexibility and security
- Multi-OS support, including RTOS / non-RTOS
  – Combine real-time and non-real-time worlds
(Slide note: Add words around ECC etc.)

Heterogeneous Multi-core Architecture

1 PPE core:
- VMX unit
- L1, L2 cache
- 2-way SMT

8 SPEs:
- 128-bit SIMD instruction set
- Register file – 128 x 128-bit
- Local store – 256 KB
- Dedicated asynchronous DMA engine
- Isolation mode
(Slide note: Add words around ECC etc.)

Element Interconnect Bus (EIB):
- 96 B / cycle bandwidth

Debug Bus

SIMD Architecture
- SIMD = "single-instruction multiple-data"
- SIMD exploits data-level parallelism
  – A single instruction can apply the same operation to multiple data elements in parallel
- SIMD units employ "vector registers"
  – Each register holds multiple data elements
- SIMD is pervasive in the BE
  – PPE includes VMX (SIMD extensions to PPC architecture)
  – SPE is a native SIMD architecture (VMX-like)
- SIMD in VMX and SPE
  – 128-bit-wide datapath
  – 128-bit-wide registers
  – 4-wide fullwords, 8-wide halfwords, 16-wide bytes
  – SPE includes support for 2-wide doublewords

Specialized Purpose Processor vs. Traditional General Purpose Processor (roughly to scale, 65 nm)
- Specialized purpose processor: roughly half the size & power @ the frequency, 9 cores, ~230 SP GFlops
- Traditional general purpose processor: 349 mm², 2 cores, 3.4 GHz @ 150 W, ~54.4 SP GFlops

Ideal Cell BE Software Target Areas

[Left column]
- Data Manipulation
  – Digital media
  – Image processing
  – Video processing
  – Visualization of output
  – Compression/decompression
  – Encryption/decryption
  – DSP
  – Audio processing, language translation?
- Graphics
  – Transformation between domains (viewpoint transformation; time vs space; 2D vs 3D)
  – Lighting
  – Ray tracing / ray casting
- Floating Point Intensive Applications (SP)
  – Single precision physics
  – Single precision HPC
  – Sonar
- Pattern Matching
  – Bioinformatics
  – String manipulation (search engine)
  – Parsing, transformation, translation (XSLT)
  – Audio processing, language translation?
  – Filtering & pruning
- Offload Engines
  – TCP/IP
  – Compiler for gaming applications
  – XML
  – Network security, virus scan and intrusion

[Right column]
- Structured
  – Easier for memory fetch & SIMD operations
  – Data prefetch possible
  – Non-branchy instruction pipeline
  – Data more tolerant, but has the same caution
- Multiple Operations on Data
  – Many operations on same data before reloading
- Easy to Parallelize and SIMD
  – Little or no collective communication required
  – No global or shared memory or nested loops
- Compute Intense
  – Determined by ops per byte
- Fits Streaming Model
  – Small computation kernel through which you stream a large body of data
  – Algorithms that fit Graphics Processing Units
  – GPUs are being used for more than just graphics today thanks to PCI Express

Cell BE Processor Isn't Just for Games.
Innovative Chip is best high-performance embedded processor of 2005

"We chose the Cell BE as the best high-performance embedded processor of 2005 because of its innovative design and future potential.... Even if the Cell BE accumulates no more design wins, the PlayStation 3 could drive sales to nearly 100 million units over the likely five-year lifespan of the console. That would make the Cell BE one of the most successful microprocessors in history."

"…Cell BE could power hundreds of new apps, create a new video-processing industry and fuel a multibillion-dollar build out of tech hardware over ten years."
-- Forbes

"It was originally conceived as the microprocessor to power Sony's [PS3], but it is expected to find a home in lots of other broadband-connected consumer items and in servers too."
-- IEEE Spectrum

Cell Broadband Engine Architecture™ (CBEA) Technology Competitive Roadmap
[Figure: roadmap chart, timeline 2006 to 2010; track labels "Cost Reduction" and "Performance Enhancements/Scaling"]
- Cell BE (1+8), 90nm SOI
- Enhanced Cell BE (1+8 eDP SPE), 65nm SOI
- Next Gen (2PPE'+32SPE'), 45nm SOI, ~1 TFlop (est.)
All future dates and specifications are estimations only; subject to change without notice. Dashed outlines indicate concept designs.
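The peak figures quoted on the specification, comparison, and roadmap slides can be cross-checked with simple arithmetic. The sketch below is not from the deck. It assumes each core completes one 4-wide single-precision multiply-add per cycle (8 flops) and that Cell BE runs at 3.2 GHz, a clock the slides do not state. It also reads the EIB's "96B / cycle" as bytes per processor clock cycle, which is likewise an assumption. The helper names `peak_sp_gflops` and `bandwidth_gb_per_s` are hypothetical.

```python
# Back-of-the-envelope check of the peak numbers quoted on the slides.
# Assumptions (NOT stated in the deck):
#   - every core issues one 4-wide single-precision multiply-add per cycle
#   - Cell BE clock is 3.2 GHz
#   - the EIB "cycle" is a processor clock cycle

SP_LANES = 128 // 32          # 4 fullwords fit in one 128-bit register
FLOPS_PER_LANE = 2            # a multiply-add counts as two floating point ops
FLOPS_PER_CYCLE_PER_CORE = SP_LANES * FLOPS_PER_LANE  # 8

ASSUMED_CELL_CLOCK_GHZ = 3.2  # assumption, see lead-in


def peak_sp_gflops(cores: int, clock_ghz: float) -> float:
    """Peak single-precision GFlops for `cores` identical SIMD cores."""
    return cores * FLOPS_PER_CYCLE_PER_CORE * clock_ghz


def bandwidth_gb_per_s(bytes_per_cycle: int, clock_ghz: float) -> float:
    """Peak bandwidth in GB/s for a bus moving `bytes_per_cycle` each cycle."""
    return bytes_per_cycle * clock_ghz


if __name__ == "__main__":
    clock = ASSUMED_CELL_CLOCK_GHZ
    print(f"8 SPEs:            {peak_sp_gflops(8, clock):.1f} GFlops")
    print(f"1 PPE + 8 SPEs:    {peak_sp_gflops(9, clock):.1f} GFlops")
    print(f"2 cores @ 3.4 GHz: {peak_sp_gflops(2, 3.4):.1f} GFlops")
    print(f"2 PPE' + 32 SPE':  {peak_sp_gflops(34, clock):.1f} GFlops")
    print(f"EIB, 96 B/cycle:   {bandwidth_gb_per_s(96, clock):.1f} GB/s")
```

Under these assumptions the results line up with the slide figures: about 204.8 GFlops for the 8 SPEs (quoted as >200), 230.4 GFlops for all 9 cores (quoted as ~230), 54.4 GFlops for the 2-core comparison part, and 307.2 GB/s for the EIB (quoted as >300). The 34-core Next Gen configuration would come to roughly 870 GFlops at the same clock, in the neighborhood of the roadmap's ~1 TFlop estimate.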
[Roadmap slide footer: "Cell BE Roadmap", dated Aug-2006 (version and day illegible); IBM Confidential]

Cell Broadband Engine™ Blade
  – The first in a line of planned offerings using Cell Broadband Engine technology
[Figure: blade offerings plotted against performance; labels as extracted: "Target Availability: 1H08", "Enhanced Cell BE-based Blade", "Target Availability:"]