IBM Systems and Technology Group

Cell/B.E. processor-based systems and software offerings
IBM BladeCenter® QS22 and SDK 3.0

IBM CONFIDENTIAL © 2008 IBM Corporation IBM Systems and Technology Group

The challenge today

For many years, organizations have relied on performance gains from increasing clock speeds of “traditional” architectures

This approach has been challenged by the physical limitations of semiconductors and by traditional processor architecture implementations

High performance computing (HPC) applications need a fundamentally new technology and approach to the system-level architecture to achieve the desired level of performance.

Sales Conference © 2008 IBM Corporation

Cell Broadband Engine™ (Cell/B.E.) Technology
For a higher level of absolute performance and efficiency

 IBM, Sony, Toshiba alliance formed in 2000

 March, 2001 – STI Design Center opened in Austin, TX

 April, 2004 - Single Cell BE operational

 July, 2004 - 2-way SMP operational

 February, 2005 - first technical disclosures at ISSCC

 May, 2005 - first public demonstration of Cell/B.E. processor-based system at E3

 August, 2005 - published technical details of Cell/B.E. architecture

 November, 2005 - published open source SDK & Cell/B.E. simulator

 August, 2006 - introduced the very first Cell/B.E. processor-based server to the market


IBM commitment to innovation

2006: BladeCenter QS20
Create initial platforms for experimentation

2007: BladeCenter QS21 and IBM SDK for Multicore Acceleration 3.0
Produce systems for early adoption and solution enablement

2008: IBM BladeCenter QS22 with the PowerXCell™ 8i processor
Produce robust, production-ready systems for targeted industry applications. Extraordinary double precision floating point performance. Large memory capability. Ready for the most demanding production applications.

Cell Broadband Engine Architecture™ (CBEA) Technology Roadmap

Compatible code and security base across entire line

[Roadmap chart, 2006 to 2010; legend: Committed, Concept]

Performance Enhancements/Scaling:
– IBM PowerXCell™ 8i (1+8eDP SPE), 65nm SOI
– IBM PowerXCell 32ii, 45nm SOI

Cost Reduction:
– Cell/B.E. (1+8), 90nm SOI
– Cell/B.E. (1+8), 65nm SOI
– Cell/B.E. (1+8), 45nm SOI

All future dates and specifications are estimations only; Subject to change without notice. Dashed outlines indicate concept designs.


IBM PowerXCell™ 8i processor benefits

The new PowerXCell 8i processor builds on the Cell Broadband Engine Architecture and combines a general-purpose Power Architecture™ core of modest performance with eight enhanced synergistic processing elements optimized for extreme double precision and single precision computational performance.

 Sets a new performance standard
– Accelerates computationally intense workloads such as analytics, multimedia and vector processing
– Efficient computation per watt
 Designed for flexibility
– Wide variety of application domains
– Cell can cover a wide range of application space with its capabilities in floating point operations, integer operations, data streaming / throughput support, and real-time support
– Exploits C/C++ programming models
 Enhanced security capability
– Virtual trusted computing environment for security

PowerXCell 8i processor
 65 nm
 9 cores, 10 threads
 230.4 GFlops peak (SP) at 3.2GHz
 108.8 GFlops peak (DP) at 3.2GHz
 Up to 25 GB/s memory bandwidth
 Up to 75 GB/s I/O bandwidth
 92 Watts @ 3.2GHz
 Top frequency >4GHz (observed in lab)
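The two peak figures quoted for the processor can be reproduced with simple arithmetic. The sketch below assumes that each of the eight SPEs completes one 4-wide single precision or one 2-wide double precision fused multiply-add per cycle, and that the Power core contributes 8 SP and 2 DP floating point operations per cycle. These per-cycle widths are assumptions that the slides do not state, and the function and constant names are illustrative only.

```python
# Peak-throughput arithmetic for a 1 + 8 core design at 3.2 GHz.
# The flops-per-cycle values are assumptions, not figures from the slides.
CLOCK_GHZ = 3.2
NUM_SPES = 8


def peak_gflops(spe_flops_per_cycle, ppe_flops_per_cycle,
                clock_ghz=CLOCK_GHZ, num_spes=NUM_SPES):
    """Peak GFlops = (flops per cycle summed over all cores) * clock in GHz."""
    flops_per_cycle = num_spes * spe_flops_per_cycle + ppe_flops_per_cycle
    return flops_per_cycle * clock_ghz


# Single precision: 4 lanes * 2 flops (multiply + add) = 8 per SPE per cycle.
sp_peak = peak_gflops(spe_flops_per_cycle=8, ppe_flops_per_cycle=8)
# Double precision: 2 lanes * 2 flops = 4 per SPE per cycle.
dp_peak = peak_gflops(spe_flops_per_cycle=4, ppe_flops_per_cycle=2)

print(f"SP peak: {sp_peak:.1f} GFlops")
print(f"DP peak: {dp_peak:.1f} GFlops")
```

Under these assumptions the results match the quoted 230.4 and 108.8 GFlops, which suggests the quoted peaks count the Power core as well as the SPEs.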


PowerXCell 8i uses ½ the space & power and delivers more than 2.3x the GFlops of traditional architecture

Example Server (Dual Core): 349 mm², 3.4 GHz @ 150W, 2 cores, ~27.2 SP GFlops, 1.3b transistors @ 65nm
Example Desktop (Quad Core): 214 mm², 3 GHz @ 130W, 4 cores, ~96 SP GFlops, 820m transistors @ 45nm
PowerXCell 8i (Nine Core): 109 mm², 3.2 GHz @ 75W, 9 cores, ~230 SP GFlops, 250m transistors @ 65nm

On any traditional processor, the ratio of cores to cache, prediction, and related items illustrated here remains at ~50% of the processor chip area.

Intel’s x86 Quad Core processors are Dual Chip Modules (DCMs): 2 of these stacked vertically and packaged together.
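The headline ratios can be checked against the figures on this slide. The sketch below is a minimal comparison, assuming the headline refers to the quad core desktop part (the slide does not say which baseline is meant); the dictionary layout and function name are illustrative only.

```python
# Figures quoted on the comparison slide (SP GFlops values are approximate).
chips = {
    "Example Server (Dual Core)": {"area_mm2": 349, "watts": 150, "sp_gflops": 27.2},
    "Example Desktop (Quad Core)": {"area_mm2": 214, "watts": 130, "sp_gflops": 96.0},
    "PowerXCell 8i (Nine Core)": {"area_mm2": 109, "watts": 75, "sp_gflops": 230.0},
}


def ratios(name, baseline_name):
    """Return area, power and SP GFlops of `name` relative to `baseline_name`."""
    chip, base = chips[name], chips[baseline_name]
    return {key: chip[key] / base[key] for key in chip}


vs_desktop = ratios("PowerXCell 8i (Nine Core)", "Example Desktop (Quad Core)")
vs_server = ratios("PowerXCell 8i (Nine Core)", "Example Server (Dual Core)")

for label, r in (("vs quad core", vs_desktop), ("vs dual core", vs_server)):
    print(f"{label}: area x{r['area_mm2']:.2f}, power x{r['watts']:.2f}, "
          f"SP GFlops x{r['sp_gflops']:.2f}")
```

Against the quad core part the area ratio is close to one half and the GFlops ratio is a little under 2.4, consistent with the headline; the power ratio is nearer 0.58 than 0.5 with the 75W figure used here.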

BladeCenter® QS22 – PowerXCell 8i

 Core Electronics
– Two 3.2GHz PowerXCell 8i Processors
– SP: 460 GFlops peak per blade
– DP: 217 GFlops peak per blade
– Up to 32GB DDR2 800MHz
– Standard blade form factor
– Support BladeCenter H chassis
 Integrated features
– Dual 1Gb Ethernet (BCM5704)
– Serial/Console port, 4x USB on PCI
 Optional
– Pair 1GB DDR2 VLP DIMMs as I/O buffer (2GB total) (46C0501)
– 4x SDR InfiniBand adapter (32R1760)
– SAS expansion card (39Y9190)
– 8GB Flash Drive (43W3934)

[Block diagram labels: DDR2; two PowerXCell 8i processors; Rambus® FlexIO™; two IBM South Bridges; Flash, RTC & NVRAM; UART, SPI; Legacy Con; PCI-X; PCI; PCI-E x16; PCI-E x8; 4x HSC*; 2x USB 2.0; HSDC; 1GbE; optional IB x4 HCA (2 port); Flash Drive; USB to BC mid plane; IB-4x to BC-H high speed fabric/mid plane; GbE to BC mid plane]

*The HSC interface is not enabled on the standard products. This interface can be enabled on “custom” system implementations for clients by working with the Cell services organization in IBM Industry Systems.


Performance highlights

 Performance is an order of magnitude better than general purpose processors (GPP) for media and certain applications that can take advantage of its Single Instruction Multiple Data (SIMD) capability
– Performance of its simple Power Processor Element (PPE) is comparable to that of a traditional GPP
– Each Synergistic Processor Element (SPE) performs about the same as a GPP running at the same frequency
– Key performance advantage comes from its eight de-coupled SPE engines with dedicated resources including large register files and DMA channels
 Accelerates targeted applications with extraordinary processing capabilities
– Floating-point operations
– Integer operations
– Data streaming / throughput support
– Real-time support
 Open architecture allows for optimization at compiler and application level
– Performance gains from tuning compilers and applications can be significant
– Tools/simulators are provided to assist in performance optimization efforts


IBM BladeCenter QS22 Premier blade for HPC workloads

 QS22 is the RIGHT choice for intensive streaming and/or single and double precision floating point workloads

 QS22 is OPEN – based on Power Architecture and running Linux® OS

 QS22 is EASY to deploy and to integrate into the existing IT infrastructure and/or workloads:
– Co-exists with and complements all other blade server offerings (Intel®, AMD®, POWER®)
– Ready to scale out and deploy in production environments

 QS22 is GREEN – more than 1.7 SP (or 0.8 DP) GFLOPS per watt.
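The efficiency figures can be related to the per-blade peaks with a short calculation. The sketch below assumes that GFLOPS per watt is computed as peak blade GFLOPS divided by blade power, which the slides do not state, and it uses the per-processor peaks of 230.4 (SP) and 108.8 (DP) GFlops quoted for the PowerXCell 8i. The resulting power bound is an inference under that assumption, not a published specification.

```python
# Relating "more than 1.7 SP (or 0.8 DP) GFLOPS per watt" to blade peaks.
# Assumption: efficiency = peak blade GFLOPS / blade power in watts.
PROCESSORS_PER_BLADE = 2
SP_PEAK_PER_PROCESSOR = 230.4  # GFlops, quoted for the PowerXCell 8i
DP_PEAK_PER_PROCESSOR = 108.8  # GFlops, quoted for the PowerXCell 8i
SP_GFLOPS_PER_WATT = 1.7       # quoted lower bound
DP_GFLOPS_PER_WATT = 0.8       # quoted lower bound

sp_blade_peak = PROCESSORS_PER_BLADE * SP_PEAK_PER_PROCESSOR
dp_blade_peak = PROCESSORS_PER_BLADE * DP_PEAK_PER_PROCESSOR

# "More than X GFLOPS per watt" implies power below peak / X.
sp_power_bound = sp_blade_peak / SP_GFLOPS_PER_WATT
dp_power_bound = dp_blade_peak / DP_GFLOPS_PER_WATT

print(f"Blade peak: {sp_blade_peak:.1f} SP / {dp_blade_peak:.1f} DP GFlops")
print(f"Implied blade power: below {sp_power_bound:.0f} W (SP), "
      f"below {dp_power_bound:.0f} W (DP)")
```

Both precisions give nearly the same bound, which is what one would expect if the two efficiency figures were derived from a single blade power value.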

IBM SDK for Multicore Acceleration and related tools

The IBM SDK is a complete tools package that simplifies programming for the Cell Broadband Engine Architecture

– Eclipse-based IDE
– Simulator
– Performance Tools
– GNU tool chain
– IBM XL C/C++ compiler*

Libraries and frameworks
– Data Communication and Synchronization Library (DaCS)
– Accelerated Library Framework (ALF)
– Basic Linear Algebra Subroutines (BLAS)
– Standardized SIMD math libraries

*IBM XL C/C++ compiler: optimized compiler for use in creating Cell/B.E. optimized applications. Offers improved performance, automatic overlay support, and SPE code generation. The XLC compiler is a complementary product to the SDK.

Denotes software components included in the SDK for Multicore Acceleration

IBM SDK for Multicore Acceleration value

 Designed to be highly reliable, simple to acquire and easy to use
– Complete, integrated kit
– Production-ready tools from IBM
– IBM warranty and support

 Based on industry standards to ease the transition to the Cell/B.E.
– Eclipse-based Integrated Development Environment
– Standard, base libraries
– Third-party libraries can be plugged in

 Designed to make it easy to port and optimize applications for the QS21 and QS22
– Enhancements to enable new features in QS22
– Performance tuning tools to help optimize algorithms without re-writing the entire application
– Tools designed to help you partition an application across a hybrid Cell/B.E. and x86 platform


Cell Programming Approaches are fully customizable!

1. “Native” Programming
 Compilers, Intrinsics, DMA, etc.
2. Assisted Programming
 Libraries, Frameworks
3. Case Tools / Complete Hardware Abstraction
 User tool-driven

From 1 to 3: decreasing programmer attention to architectural details. From 3 to 1: increasing programmer control over Cell/B.E. resources.


Workloads ideal for PowerXCell 8i and QS22

Extreme Stream Computation and Bandwidth requirements

– Real-time Analytics (Processing of Data): Information Synthesis, Analysis
– Image/Video Creation/Mgt (Presentation of Data): Visualization, Imaging
– Unstructured Data: Multimodal Search, Data Transforms, Pattern Matching

Market & Solution Specific Assets

– Digital Media
– Home Media Consumer Electronics
– Financial Services Sector
– Information Based Medicine
– Chemicals & Petroleum
– Electronic Design Automation
– Digital Video Surveillance
– Aerospace and Defense

PowerXCell 8i is suited for applications which demand extraordinary floating point performance

Public sector HPC solutions
Enable government labs, agencies, and academic research centers to run high performance codes faster, less expensively, and with lower power consumption than existing computing architectures

 IBM components:
– IBM BladeCenter QS21 & QS22
– IBM SDK for Multicore Acceleration
– IBM Cell/B.E. math libraries
– IBM hybrid computing solution (custom offering)
– PXCAB
 ISV applications:
– Development tools from RapidMind, Gedae, Wind River, etc.
– A growing number of university and government research labs with external collaborative missions are exercising existing and emerging science codes
 The solution is designed to offer:
– Petaflop scalability and reliability
– Lower power and space footprint
– Lower total cost of ownership
 Performance advantages:
– Science codes such as SPaSM, VPIC, Milagro, and Sweep3D run up to 4-9X faster than on a single AMD Opteron™ core (Source: LANL - www.lanl.gov/roadrunner)

*See Notes on Benchmarks, charts 46 and 47


Aerospace & defense solutions
Enhance competitiveness, demonstrate innovation and capture significant government contracts through dramatic performance improvements in real time signal and image processing

 IBM components:
– IBM BladeCenter QS21 & QS22
– IBM SDK for Multicore Acceleration
– IBM Cell/B.E. math libraries
– IBM hybrid computing solution (custom offering)
– PXCAB
 ISV applications:
– Gedae stream, image and signal programming environment
– RapidMind development tools
– Wind River VxWorks RTOS and WorkBench Tools
 Performance advantages:
– FFT workloads up to 7.7x faster than 3.0 GHz 2-core Woodcrest x2*
– Double Precision Matrix Multiplication up to 2.6x faster than 2.66GHz 4-core Clovertown*

“As a time-served radar architect, I can say that Cell/Gedae is something of a dream and should rightly impact the new design market… it is an opportunity that the DoD should not fail to grasp.”
- John Roulston, SCImus Solutions, March 2007

*See Notes on Benchmarks, charts 46 and 47

Digital content creation solutions
IBM solutions enable Media and Entertainment companies to produce the next generation of animated feature films, games, and advertising content

 IBM components:
– IBM BladeCenter QS21 & QS22
– IBM SDK for Multicore Acceleration
– IBM Cell/B.E. math libraries
– IBM hybrid computing solution (custom offering)
– PXCAB
– IBM iRT scalable real-time ray tracer
 ISV applications:
– RapidMind development tools
 The solution is designed to offer:
– Rapid turn around of digital assets
– More realistic simulation
– An open and flexible solution based on standards
– Scalability and reliability
 Performance advantages:
– 1080p Ray-traced images computed in milliseconds*
– 1080p Ambient Occlusion images computed in seconds*

*See Notes on Benchmarks, charts 46 and 47

Digital video surveillance solutions
Solutions deliver hardware and enablement for high-density, highly scalable encoding, transcoding, and compositing for digital video surveillance

 IBM components:
– IBM BladeCenter QS21/QS22
– IBM Total Storage
– IBM DVS ADK
 ISV applications:
– Codec libraries
– Video distribution software
– Network-based surveillance
 The solution is designed to offer:
– H.264 encoding
– Encoders for analog cameras
– Transcoding to save storage and network costs
– Decoding acceleration to reduce workstation costs and improve robustness
– Better management and scalability
– Compute density - with two processors per blade, 14 blades to a chassis, and two chassis to a rack, it is possible to have as many as 672 H.264 encoders in the rack
 Performance advantage:
– One Cell/B.E. processor running at 3.2 GHz can encode 12 channels of standard definition video at 30 fps to H.264 (main profile, including CABAC) [1]

[Diagram labels: 672 encoders in a rack; PTZ Aggregation Unit; 16 camera inputs (Coax); 14 card slots; IBM BladeCenter; IBM BladeCenter-H QS21/QS22; IBM Total Storage]

[1] Source: IBM Research benchmark
*See Notes on Benchmarks, charts 46 and 47
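The 672-encoder figure follows directly from the density numbers and the per-processor benchmark quoted on this slide. The short sketch below multiplies them out; the constant and function names are illustrative only, and it assumes every processor in the rack sustains the benchmarked 12 channels.

```python
# Rack-level H.264 encoder count from the figures quoted on the slide.
PROCESSORS_PER_BLADE = 2
BLADES_PER_CHASSIS = 14
CHASSIS_PER_RACK = 2
CHANNELS_PER_PROCESSOR = 12  # SD video at 30 fps, per the quoted benchmark


def encoders_per_rack(processors_per_blade=PROCESSORS_PER_BLADE,
                      blades_per_chassis=BLADES_PER_CHASSIS,
                      chassis_per_rack=CHASSIS_PER_RACK,
                      channels_per_processor=CHANNELS_PER_PROCESSOR):
    """Total encoder channels = processors in the rack * channels per processor."""
    processors = processors_per_blade * blades_per_chassis * chassis_per_rack
    return processors * channels_per_processor


print(encoders_per_rack())  # 56 processors * 12 channels = 672
```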

EDA solutions
Accelerate computational lithography workload to address turnaround time challenges and at the same time reduce total cost of the computing infrastructure

 IBM components:
– Cell/B.E. hybrid cluster
– IBM BladeCenter QS21
– IBM System x / IBM BladeCenter
– IBM Cluster 1350 integrated cluster
– Storage: DS4000, N series, DCS9550
 ISV applications:
– Mentor Graphics® Calibre® nmOPC and OPCVerify™
 The solution is designed to offer:
– Significant run time acceleration
– Leverages Cell/B.E. strengths to offer significant speed-up when compared to existing solutions in the market, reducing design turnaround time
– Scalability and reliability
– Blade form factor improves scalability, compute density and reliability

Financial market analytics solutions
Enable financial market professionals to perform highly complex analytics with the required speed and accuracy to support trade execution and improve their firms’ competitive position

 IBM components:
– IBM BladeCenter QS22
– IBM SDK for Multicore Acceleration
– Dynamic Application Virtualization
– Cell/B.E. math libraries
 ISV applications:
– NAG - Math & Stat Software
– Platform Symphony - Grid Computing Environment
– Encirq – Event Processing Platform
 The solution is designed to offer:
– Flexibility and scalability
– IBM BladeCenter QS22 integrates with other BladeCenter products
– IBM SDK, DAV, third party applications for ease of adoption within existing infrastructure
– Technical Services with skilled programming expertise and subject matter experts
– Power, space and cooling advantages
 Performance advantage:
– Collateralized Debt Obligation (CDO) - 7.5X faster than 2.8 GHz 4-core Harpertown*
– 650 million European options/sec using Monte Carlo simulations on QS22 blade*

*See Notes on Benchmarks, charts 46 and 47

Medical imaging solutions
Improve the efficiency, productivity, and quality of patient care through dramatic performance improvements in the transmission and analysis of medical images

 IBM components:
– IBM BladeCenter QS21 & QS22
– IBM SDK for Multicore Acceleration
– IBM Cell/B.E. math libraries
– IBM hybrid computing solution (custom offering)
– PXCAB
 ISV applications:
– Advanced image and text analytics
– High-performance image compression
 The solution is designed to offer:
– 3D image reconstruction, registration, volume rendering, segmentation
– On-demand compression/decompression
 Performance advantage:
– 16x improvement on MRI image reconstruction over Opteron system
– 11x improvement on CT image reconstruction over 3.0GHz Xeon system
– 48x improvement on image registration over 3GHz Pentium 4
– 200x shear-warp volume visualization over TI TMS320C80 processor
– 40:1 CT study data compression
(Source for all above: Mayo Clinic - http://www.mayoclinic.org/news2007-rst/3996.html)*

*See Notes on Benchmarks, charts 46 and 47


Seismic solutions
Improve the speed and accuracy of geologic visualization to reduce the cost of evaluating potential targets for oil and gas yielding potential

 IBM components:
– IBM BladeCenter QS22
– IBM SDK for Multicore Acceleration
– IBM Cell/B.E. math libraries
– IBM hybrid computing solution (custom offering)
– PXCAB
– Standard math, vector math, FFT, BLAS, MPI and tridiagonal solver
 ISV applications:
– Simudyne
– Customers’ own proprietary code
 The solution is designed to offer:
– High-performance, highly accurate rendering of geologic structures
– Cost effective HPC environment that has significant performance increases
– Scalability and reliability
 Performance advantages:
– FFT workloads up to 7.7x faster than 3.0 GHz 2-core Woodcrest x2*
– Double Precision Matrix Multiplication up to 2.6x faster than 2.66GHz 4-core Clovertown*

*See Notes on Benchmarks, charts 46 and 47


QS22 summary Premier blade for HPC workloads

 The QS22 is based on the new PowerXCell 8i processor – built on an enhanced version of the Cell Broadband Engine Architecture

 The QS22 offers the capabilities you need for your most demanding computational requirements
– Offers extraordinary double precision and single precision floating point performance
– Supports up to 32GB of processor memory

 IBM is working with ISVs and customers to accelerate workloads on the QS22 in targeted application areas

 The QS22 is extremely efficient, offering more than 1.7 SP (or 0.8 DP) GFLOPS per watt of energy

BladeCenter QS22 is Right, Open, Easy and Green


IBM SDK for Multicore Acceleration summary

 Designed to be highly reliable, simple to acquire and easy to use
– Complete, integrated kit
– Production-ready tools from IBM
– IBM warranty and support

 RHEL 5.2 Enterprise support

 Based on industry standards to ease the transition to the Cell/B.E. architecture
– Eclipse-based Integrated Development Environment
– Standard, base libraries
– Third-party libraries can be plugged in

 Designed to make it easy to port and optimize applications for the QS22
– Performance tuning tools to help optimize algorithms without re-writing the entire application
– Tools designed to help you partition an application across a hybrid Cell/B.E. and x86 platform

Cell/B.E. architecture reaches wide and deep – from consumer products to high performance computing

[Chart: Cell/B.E. products by market segment (Consumer, Business, Enterprise, High Performance Computing), with increasing support for scale and data center]

– SCE PS3 (Cell/B.E. + GPU)
– Toshiba SpursEngine (SPUs + Host)
– Sony Cell/B.E. Computing Unit (Cell/B.E. + GPU + AV I/O)
– PowerXCell 8i PCI card (Cell/B.E. + Host)
– Mercury 1u Dual Cell
– IBM BladeServer (2 Cell/B.E. or PowerXCell 8i)
– Custom
– Mini-Roadrunner
– Roadrunner (16,000 PowerXCell 8i + AMD)

Common OS’s, Infrastructure, Tools, Libraries, Code… the SAME SPE code runs from end to end
