Cell Processor and Cell Processor Based Devices

May 31, 2005 SPXXL Barry Bolding, WW Deep Computing

Thanks to Ashwini Nanda, IBM T.J. Watson Research Center, New York

Pathway to the Digital Media Revolution

Incremental Technology Innovations Provide Stepping Stones of Progress to the Future

[Figure: roadmap from 2004 to 20xx of applications that need enormous computing power: Online Games, Real-Time Engineering Design Collaboration, HD Content Creation & Delivery, Real-time Creation and Modification of Content, Virtual Travel & eCommerce, Immersive Web Portals, Virtual Communities, Virtual Tokyo, Immersive Environment, "Matrix"]

Incremental Technology Innovations on the Horizon:
• On-Demand Computing & Communication Infrastructures
• Application Optimized Processor and System Architectures
• Leading Edge Communication Bandwidth/Storage Capacities
• Immersion HW and SW Technologies
• Rich Media Applications, Middleware and OS

Cell Processor Overview

Cell History

• IBM, SCEI/Sony, Toshiba Alliance formed in 2000
• Design Center opened in March 2001
• Based in Austin, Texas
• February 7, 2005: First technical disclosures

Cell Highlights

• Multi-core microprocessor (9 cores)
• The possibilities…
  – Supercomputer on a chip
  – Digital home to distributed computing and supercomputing
  – >10x performance potential for many kernels/algorithms
• Current prototypes running <3 GHz clock frequency
• Demonstrated Beehive at Electronic Entertainment Expo Meeting
  – TRE (Terrain Rendering Engine application)

Introducing Cell

• Sets a new performance standard
  – Exploits parallelism while achieving high frequency
  – Supercomputer attributes with extreme floating point capabilities
  – Sustains high memory bandwidth
• Designed for natural human interaction
  – Photo-realistic effects
  – Predictable real-time response
  – Virtualized resources for concurrent activities
• Designed for flexibility
  – Wide variety of application domains
  – Programming models ranging from highly abstracted to highly exploitable
  – Reconfigurable I/O interfaces
  – Autonomic power management

Major Challenges in Microprocessor Architecture

• Power
  – Power efficiency limits performance
• Memory wall
  – Even with on-chip memory controllers, main memory is hundreds to thousands of cycles away
• New streaming applications
  – Current cache designs are non-optimal for some applications

Power Crisis Forces Rethinking the Fundamentals
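The power components listed below can be tied together with the standard first-order CMOS power model. This is a textbook approximation, not a formula taken from the slide: α is the switching activity factor, C the switched capacitance, V_dd the supply voltage, f the clock frequency, and I_gate and I_sub the gate and sub-threshold leakage currents.

```latex
P_{\text{total}} = P_{\text{active}} + P_{\text{passive}}, \qquad
P_{\text{active}} \approx \alpha\, C\, V_{dd}^{2}\, f, \qquad
P_{\text{passive}} \approx V_{dd}\,\bigl(I_{\text{gate}} + I_{\text{sub}}\bigr)
```

Active power grows with frequency and with the square of the supply voltage, while passive power is drawn whether or not the transistors switch. As leakage rises at small gate lengths, a growing share of the power budget buys no computation, which is why further performance has to come from efficiency rather than from clock rate alone.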

• Power components:
  – Active power
  – Passive power
    • Gate leakage
    • Sub-threshold leakage (source-drain leakage)

[Figure: power density (W/cm², log scale from 0.001 to 100) versus gate length (microns, from 1 down to 0.01), 1994 to 2004, showing active power and passive power; inset: gate stack, Tox = 11 Å]

NET: INCREASING PERFORMANCE REQUIRES INCREASING EFFICIENCY

Gate dielectric approaching a fundamental limit (a few atomic layers)

System Trends toward Integration

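[Figure: a traditional system (Processor, Northbridge with Memory, Southbridge with IO) beside an integrated design in which the processor takes on memory, accelerator (Accel) and IO functions]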

• Increased integration is driving processors to take on many functions typically associated with systems
  – Integration forces processor developers to address off-load and acceleration in the design of the processor
  – Integration of bridge chip functionality
• Virtualization technology is used to support non-homogeneous environments

Cell Processor Overview: “Supercomputer-on-a-Chip”

[Figure: the PU (with L1 and L2) and SPU 1 … SPU N (each with a Local Store (LS) and an MFC) on the internal interconnect, which connects to Memory and I/O]

Processing Unit (PU):
• General Purpose, 64-bit RISC Processor (PowerPC 2.0)
• 2-Way Hardware Multithreaded
• L1: 32 kB I; 32 kB D
• L2: 512 kB
• Coherent load/store
• VMX
• TBD GHz

Synergistic Processing Units (SPU):
• 8 per chip
• 128-bit wide SIMD Units
• Integer and Floating Point capable
• 256 KB Local Store
• Up to 32 GF/s per SPU; 256 GF/s total*

Internal Interconnect:
• 256 GB/s total internal interconnect bandwidth
• DMA control to/from SPUs supports >100 outstanding memory requests

External Interconnects:
• 25.6 GB/s BW memory interface
• 2 Configurable I/O Interfaces
  – Coherent interface (graphics)
  – Normal I/O interface
  – Total BW configurable between interfaces
  – Up to 35 GB/s out
  – Up to 25 GB/s in

Memory Management & Mapping:
• SPU Local Store aliased into PU system memory
• MFC/MMU controls SPU DMA accesses
• Compatible with PowerPC Virtual Memory architecture
• S/W controllable from PU MMIO
• QoS memory is pinned system memory with BW and latency guarantees
• Access to I/O devices protected by PU MMIO
• SPU DMA access protected by MFC/MMU

* At targeted clock speed of 4 GHz

“Outward Facing” Aspects of Cell

• Cell is designed to be responsive
  – … to the human user
    • Real-time response
    • Supports rich visual interfaces
  – … to the network
    • Flexible, can support new standards
    • High-bandwidth
    • Content protection, privacy & security

Key Attributes

• Cell is Multi-Core
  – Contains 64-bit Power Architecture™
  – Contains 8 Synergistic Processor Elements (SPE)
• Cell is a Flexible Architecture
  – Multi-OS support (including Linux) with Virtualization technology
  – Path for OS, legacy apps, and software development
• Cell is a Broadband Architecture
  – SPE is RISC architecture with SIMD organization and Local Store
  – 128+ concurrent transactions to memory per processor
• Cell is a Real-Time Architecture
  – Resource allocation (for Bandwidth Measurement)
  – Locking Caches (via Replacement Management Tables)
• Cell is a Security Enabled Architecture
  – Isolatable SPE for flexible security programming

Key Features

Synergistic Processor Elements for High (Fl)ops / Watt

• The first generation CELL processor consists of:
  – A Power Processor Element (PPE)
  – 8 Synergistic Processor Elements (SPE)
  – A high bandwidth Element Interconnect Bus (EIB)
  – Two configurable non-coherent IO interfaces (BIC)
  – A Memory Interface Controller (MIC)
  – A Pervasive unit that supports extensive test, monitoring, and debug functions

[Figure: eight SPEs, each an SPU with its Local Store (LS), attached at 16B/cycle to the EIB (up to 96B/cycle); the PPE (PPU with L1 and L2; 64-bit Power Architecture w/VMX for Traditional Computation) attached at 16B/cycle (2x), with internal 32B/cycle and 16B/cycle links; the MIC connects to dual XDR™ memory and the BIC to RRAC I/O]

Application and Programming of Cell Processor Based Blade Devices

Potential Programming Models: Support for Data Partitioning, Synchronization and Communication

• Native Cell execution models
  – SIMD
  – SPMD
  – MIMD
  – Pipelined
  – Long Vector
• Multi-Cell execution models
  – Message Passing
  – Shared Memory
  – Software Shared Memory

Tools and Software Environments:

• Still a primitive environment
• CDE work ongoing with SONY
• Native Cell execution libraries
• gcc compiler prototype
• xlc compiler prototype
• MPI prototype
• & C with OpenMP prototype

CPBS Target Applications

• Digital Media
  – Consumer: on-line games, rich media content creation, content distribution (video, audio, image), interactive broadcasting, on-line shopping, video chat, digital animation and special effects (used in both interactive products and films)
  – Enterprise: video conferencing, eSeminar/eLearning, secure content distribution, surveillance
• Scientific and Technical Computing
  – Life Sciences: medical imaging and analysis, secure digital medical data distribution, collaborative surgery, distance learning, scientific visualization for data analysis and drug simulation
  – Government and Defense: war simulation and training, weapons simulation, secure communication, seismic data processing, video surveillance
  – Industrial Engineering: collaborative engineering design, virtual reality, distributed virtual environments, aerospace design and simulation, oil exploration, sensor networks
• Communications
  – Network processing
  – XML and SSL acceleration, DSP, voice and pattern recognition

CPBS Online Game Architecture

[Figure: compute-intensive Cell Blades 0 … n, each with a Scene Manager and a Zone Simulation Engine (Zone 0 … Zone n); a Game Framework, Game Database (IBM DB2, etc.), Simulation Network Manager and Local Database alongside IA-32/Power Blades; clients such as Cell Phone, PDA and Client Renderer on IA-32, PPC, Cell or ARM]

CPBS Video Surveillance Architecture

[Figure: a Surveillance Engine with Capture Modules 1 … n (Live Video Encoder, H264), Video Decoder, Video Objects/Events plug-in and MILS Engine on compute-intensive Cell Blades; a Database Server (IBM DB2), Application Server (WebSphere) hosting APP 1 and APP 2, and a Video Manager (IBM DB2/CM) storing videos as smart clips on IA-32/Power Blades; Surveillance Clients]

Communication Accelerator Architecture

OpenSSL Acceleration and Apache Server Prototype Development for Cell Processor System

[Figure: three Server Blades (IA-32/Power Blades) connected through a Communication Blade (Communication (edge) Cell Blades)]

CPBS Rendering Architecture

[Figure: Rendering Engines on compute-intensive Cell Blades 0 … n; Rendering Frame Manager, Rendering Mgr., Shader Database, File Database, Modeling and Application Server on IA-32/Power Blades; clients for Review and Video Editing]

Rigid Body Physics Demo

• Rigid body dynamics based on semi-implicit integration, with the calculations accelerated by the Cell's SPU vector units
• Next-generation video games will require real-time, physically based simulation to give players the behavioral realism needed to support new levels of game play

Cloth Simulation

• Physically based simulation of soft body dynamics requires complex implicit integration and collision detection
• Well suited to take advantage of the vector units provided by the Cell SPUs

3D Rendering Demo

• Real-time visualization of digital elevation data (e.g., satellite, mobile GPS, military)
• Shows how the enormous computational power of Cell could change content creation or the nature of applications

[Figure labels: Data, Wireless VR Glove, Transcoding, PSP?]

Summary

• Cell ushers in a new era of leading edge processors optimized for digital media, entertainment and supercomputing
• Desire for realism is driving a convergence between supercomputing and entertainment
• New levels of performance and power efficiency beyond what is achieved by PC processors or traditional supercomputing vector processors
• Responsiveness to the human user and to the network is a key driver for Cell
• Can Cell change the way we do ultra high-end supercomputing?