Cell/BE Processor-Based Systems and Software Offerings IBM BladeCenter

IBM Systems and Technology Group

Cell/B.E. processor-based systems and software offerings
IBM BladeCenter® QS22 and SDK 3.0

IBM CONFIDENTIAL
Sales Conference
© 2008 IBM Corporation

The challenge today

– For many years, organizations have relied on performance gains from increasing clock speeds of “traditional” microprocessor architectures.
– This approach has been challenged by the physical limitations of semiconductors and by traditional processor architecture implementations.
– High performance computing (HPC) applications need a fundamentally new technology and approach to the system-level architecture to achieve the desired level of performance.

Cell Broadband Engine™ (Cell/B.E.) Technology
For a higher level of absolute performance and efficiency

– 2000: IBM, Sony, Toshiba Alliance formed
– March 2001: STI Design Center opened in Austin, TX
– April 2004: Single Cell BE operational
– July 2004: 2-way SMP operational
– February 2005: first technical disclosures at ISSCC
– May 2005: first public demonstration of Cell/B.E. processor-based system at E3
– August 2005: published technical details of Cell/B.E. architecture
– November 2005: published open source SDK & Cell/B.E. simulator
– August 2006: introduced the very first Cell/B.E. processor-based server to the market

IBM commitment to innovation

IBM BladeCenter QS22: extraordinary double precision floating point performance. Large memory capability.
Ready for the most demanding production applications. PowerXCell™ 8i processor.

– 2006: BladeCenter QS20. Create initial platforms for experimentation.
– 2007: BladeCenter QS21 and IBM SDK for Multicore Acceleration 3.0. Produce systems for early adoption and solution enablement.
– 2008: Produce robust production ready systems for targeted industry applications.

Cell Broadband Engine Architecture™ (CBEA) Technology Roadmap

Compatible code and security base across entire line.

[Roadmap chart, 2006 to 2010. Performance enhancements/scaling track: IBM PowerXCell™ 8i (1+8eDP SPE), 65nm SOI; IBM PowerXCell 32ii, 45nm SOI. Cost reduction track: Cell/B.E. (1+8) at 90nm SOI, 65nm SOI, and 45nm SOI. Legend: Committed, Concept. Dashed outlines indicate concept designs.]

All future dates and specifications are estimations only; subject to change without notice.

IBM PowerXCell™ 8i processor benefits

The new PowerXCell 8i processor builds on the Cell Broadband Engine Architecture and combines a general-purpose Power Architecture™ core of modest performance with eight enhanced synergistic processing elements optimized for extreme double precision and single precision computational performance.

Sets a new performance standard
– Accelerates computationally intense workloads such as analytics, multimedia and vector processing.
– Efficient computation per watt

Designed for flexibility
– Wide variety of application domains
– Cell can cover a wide range of application space with its capabilities in
  – floating point operations, integer operations
  – data streaming / throughput support
  – real-time support
– Exploits C/C++, Fortran programming models

Enhanced security capability
– Virtual trusted computing environment for security

PowerXCell 8i processor
– 65 nm
– 9 cores, 10 threads
– 230.4 GFlops peak (SP) at 3.2GHz
– 108.8 GFlops peak (DP) at 3.2GHz
– Up to 25 GB/s memory bandwidth
– Up to 75 GB/s I/O bandwidth
– 92 Watts @ 3.2GHz
– Top frequency >4GHz (observed in lab)

PowerXCell 8i uses ½ the space & power and delivers more than 2.3x the GFlops of traditional architecture

                  Example Server     Example Desktop    PowerXCell 8i
Cores             Dual Core (2)      Quad Core (4)      Nine Core (9)
Die area          349 mm²            214 mm²            109 mm²
Clock and power   3.4 GHz @ 150W     3 GHz @ 130W       3.2 GHz @ 75W
Peak SP           ~27.2 SP GFlops    ~96 SP GFlops      ~230 SP GFlops
Transistors       1.3b @ 65nm        820m @ 45nm        250m @ 65nm

On any traditional processor, the ratio of cores to cache, prediction, & related items illustrated here remains at ~50% of the processor chip area.

Intel’s x86 Quad Core processors are Dual Chip Modules (DCMs), 2 of these stacked vertically & packaged together.

BladeCenter® QS22 – PowerXCell 8i

Core Electronics
– Two 3.2GHz PowerXCell 8i Processors
– SP: 460 GFlops peak per blade
– DP: 217 GFlops peak per blade
– Up to 32GB DDR2 800MHz
– Standard blade form factor
– Support BladeCenter H chassis
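The peak figures quoted for the processor (230.4 SP and 108.8 DP GFlops at 3.2GHz) and for the blade (460 SP and 217 DP GFlops) can be reproduced with simple arithmetic. The sketch below is illustrative only. The per-core breakdown is an assumption that the slides do not state: each of the 8 SPEs is taken to complete one fused multiply-add (2 flops) per SIMD lane per cycle, with 4 single precision or 2 double precision lanes, and the PPE is taken to contribute 8 SP or 2 DP flops per cycle. The function and constant names are hypothetical.

```python
# Peak-GFlops arithmetic for one PowerXCell 8i chip and one two-chip QS22 blade.
# ASSUMPTION (not from the slides): per-cycle throughput of each SPE and the PPE.

CLOCK_GHZ = 3.2        # clock frequency quoted on the slides
SPE_COUNT = 8          # "eight enhanced synergistic processing elements"
FLOPS_PER_FMA = 2      # a fused multiply-add counts as two floating point ops
CHIPS_PER_BLADE = 2    # "Two 3.2GHz PowerXCell 8i Processors"


def peak_gflops(spe_lanes, ppe_flops_per_cycle, clock_ghz=CLOCK_GHZ):
    """Return peak GFlops for one chip under the assumed per-cycle throughput."""
    spe_total = SPE_COUNT * spe_lanes * FLOPS_PER_FMA * clock_ghz
    ppe_total = ppe_flops_per_cycle * clock_ghz
    return spe_total + ppe_total


sp_chip = peak_gflops(spe_lanes=4, ppe_flops_per_cycle=8)  # single precision
dp_chip = peak_gflops(spe_lanes=2, ppe_flops_per_cycle=2)  # double precision

print(f"chip  SP: {sp_chip:.1f} GFlops, DP: {dp_chip:.1f} GFlops")
print(f"blade SP: {CHIPS_PER_BLADE * sp_chip:.1f} GFlops, "
      f"DP: {CHIPS_PER_BLADE * dp_chip:.1f} GFlops")
```

Under these assumptions the chip totals come out at 230.4 and 108.8 GFlops, and doubling them gives 460.8 and 217.6 GFlops, which the slides appear to quote truncated as 460 and 217 per blade.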
Integrated features
– Dual 1Gb Ethernet (BCM5704)
– Serial/Console port, 4x USB on PCI

Optional
– Pair 1GB DDR2 VLP DIMMs as I/O buffer (2GB total) (46C0501)
– 4x SDR InfiniBand adapter (32R1760)
– SAS expansion card (39Y9190)
– 8GB Flash Drive (43W3934)

[Block diagram. Labels recovered: two PowerXCell 8i processors with DDR2 memory; Rambus® FlexIO™; two IBM South Bridge chips with DDR2; Flash, RTC & NVRAM; UART, SPI; Legacy Con; PCI-X; PCI; PCI-E x16; PCI-E x8; 4x HSC*; 2x USB 2.0; 2x 1GbE; HSDC; optional IB x4 HCA (2 port); Flash Drive. Connections: USB to BC mid plane, IB-4x to BC-H high speed fabric/mid plane, GbE to BC mid plane.]

*The HSC interface is not enabled on the standard products. This interface can be enabled on “custom” system implementations for clients by working with the Cell services organization in IBM Industry Systems.

Performance highlights

Performance is an order of magnitude better than general purpose processors (GPP) for media and certain applications that can take advantage of its Single Instruction Multiple Data (SIMD) capability
– Performance of its simple Power Processor Element (PPE) is comparable to that of a traditional GPP
– Each Synergistic Processor Element (SPE) performs roughly the same as a GPP running at the same frequency
– Key performance advantage comes from its eight de-coupled SPE engines with dedicated resources including large register files and DMA channels

Accelerates targeted applications with extraordinary processing capabilities
– Floating-point operations
– Integer operations
– Data streaming / throughput support
– Real-time support

Open architecture allows for optimization at compiler and application level
– Performance gains from tuning compilers and applications can be significant
– Tools/simulators are provided to assist in performance optimization efforts

IBM BladeCenter QS22
Premier blade for HPC workloads

– QS22 is the RIGHT choice for intensive streaming and/or single and double precision floating point workloads
– QS22 is OPEN: based on Power Architecture and running Linux® OS
– QS22 is EASY to deploy and to integrate into the existing IT infrastructure and/or workloads:
  – Co-exists with and complements all other blade server offerings (Intel®, AMD®, POWER®)
  – Ready to scale out and deploy in production environments
– QS22 is GREEN: more than 1.7 SP (or 0.8 DP) GFLOPS per watt.

IBM SDK for Multicore Acceleration and related tools

The IBM SDK is a complete tools package that simplifies programming for the Cell Broadband Engine Architecture.

Tools
– Eclipse-based IDE
– Simulator
– IBM XL C/C++ compiler*: optimized compiler for use in creating Cell/B.E. optimized applications. Offers:
  – improved performance
  – automatic overlay support
  – SPE code generation
– Performance Tools
– GNU tool chain

Libraries and frameworks
– Accelerated Library Framework (ALF)
– Data Communication and Synchronization (DaCS)
– Basic Linear Algebra Subroutines (BLAS)
– Standardized SIMD math libraries

*XLC compiler is a complementary product to SDK. The original slide marks which software components are included in the SDK for Multicore Acceleration.

IBM SDK for Multicore Acceleration value

Designed to be highly reliable, simple to acquire and easy to use
– Complete, integrated kit
– Production-ready tools from IBM
– IBM warranty and support

Based on industry standards to ease the transition to the Cell/B.E.
– Eclipse-based Integrated Development Environment
– Standard, base libraries
– Third-party libraries can be plugged in

Designed to make it easy to port and optimize applications for the QS21 and QS22
– Enhancements to enable new features in QS22
– Performance tuning tools to help optimize algorithms without re-writing the entire application
– Tools designed to help you partition an application across a hybrid Cell/B.E.
and x86 platform

Cell Programming Approaches are fully customizable!

1. “Native” Programming: Intrinsics, DMA, etc.
2. Assisted Programming: Libraries, Compilers, Frameworks
3. Case Tools / Complete Hardware Abstraction: User tool-driven

[Diagram axes: increasing programmer control over Cell/B.E. resources in one direction; decreasing programmer attention to architectural details in the other.]

Workloads ideal for PowerXCell
