IBM STG Deep Computing

IBM Systems Group

Deep Computing with IBM Systems

Barry Bolding, Ph.D., IBM Deep Computing, SciComp 2005

Deep Computing Components

• High Performance Computing Leadership
• Research and Innovation
• Systems Expertise
  – pSeries
  – xSeries
  – Storage
  – Networking
• Innovative Systems

Deep Computing Focus

• Government Research Labs
  – Energy and Defense
• Weather/Environmental
  – Weather Forecasting Centers
  – Climate Modeling
• Higher Education/Research Universities
• Life Sciences
  – Pharma, BioTech, Chemical
• Aero/Auto
• Petroleum
• Business Intelligence, Digital Media, Financial Services, On Demand HPC

IBM STG Deep Computing

Deep Computing Teams and Organization

Deep Computing Technical Team

Kent Winchell – Technical Team, Deep Computing

Barry Bolding – Technical Manager, Public Sector
Jeff Zais – Technical Manager, Industrial Sector

Team members and focus areas:
• Farid Parpia, John Bauer, Martin Feyereisen, Doug Petesch – HPC Applications, HPC Storage, Auto/Aero, Life Sciences, Government, HPC Business Intelligence
• "Suga" Sugavanam, Wei Chen, Charles Grassl, Guangye Li – EDA, Government, HPC Applications, Auto/Aero, Asia Pacific HPC, Higher Ed., BlueGene/L
• Stephen Behling, Ray Paden, Si MacAlester, Harry Young – Higher Ed, GPFS, CFD, HPC Storage, Digital Media
• Joseph Skovira, James Abeles, Scott Denham, Janet Shiu – Schedulers, Weather/Environment, CSM, Petroleum, Visualization
• Marcus Wagner, Len Johnson – Government, Life Sciences, Digital Media/Storage

IBM STG Deep Computing

IBM Deep Computing Summary of Technology Directions

HPC Cluster System Direction: Segmentation Based on Implementation

(Chart: HPC cluster segments plotted against implementation, 2004 to 2007)
• Off Roadmap segment
• High End / High Value segment
• Midrange Systems / High Volume "Good Enough" segment
• Blades (density) segment

HPC Cluster Directions

(Roadmap chart, 2004 to 2010)
• Capability machines, limited configurability (memory size, bisection): 100TF Power/AIX/Federation systems, BlueGene, and a PF-class PERCS system (Power w/accelerators?, Linux, HCAs)
• Performance Linux clusters, extended configurability: Power, Linux, HCAs
• Capacity clusters, less demanding communication: Power, Intel, BG nodes, Blades

Deep Computing Architecture

(Architecture diagram: the user community reaches the system through gateways, webservers, firewalls and on-demand access; an HPC network backbone connects large-memory, high-density, and emerging bandwidth-driven computing technologies; a storage network and SAN switch connect them to shared storage.)

Deep Computing Architecture (Multicluster GPFS)

(Architecture diagram: as above, the user community connects through gateways, webservers, firewalls and on-demand access to the HPC network backbone; with multicluster GPFS the storage network gives the large-memory, high-density, and emerging bandwidth-driven computing technologies access to multiple shared storage pools.)

IBM Offerings are Deep and Wide

• pSeries / eServer 1600: HPC Clusters, IBM Power4 and Power5 chips, AIX/Linux, Grids, Blades
• xSeries / eServer 1350: Workstations, Intel Xeon, AMD Opteron, BladeCenter, Linux, Server 2003
• Storage, Networking, Systems Management, Tools
• Software, expertise and Business Partners "to tie it all together for your HPC solution"

Processor Directions

• Power Architectures
  – Power4 → Power5 → Power6 → …
  – PPC970 → Power6 technology
  – BlueGene/L → BlueGene/P
  – Cell Architectures (Sony, Toshiba, IBM)
• Intel
  – IA32 → EM64T (Nocona) → …
• AMD Opteron
  – Single-core → dual-core

System Design

• Power Consumption (not heat dissipation)
• Chips might only be 10-20% of the power on a system/node
• New metrics
  – Power/ft^2
  – Performance/ft^2
  – Total cost of ownership (including power/cooling)
• Power5 clusters (p575) = 96 cpu/rack
• 1U rack optimized clusters = 128 cpu/rack
• BladeCenter (PPC/Intel/AMD) = 168 cpu/rack (dual core will increase this)
• BlueGene = 2048 cpu/rack
The rack densities are compared in the sketch below.
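A minimal sketch (Python) of the per-square-foot metrics, using the CPU-per-rack figures above. The 10 sq ft rack footprint and the per-CPU peak GFLOPS values are placeholder assumptions of mine, not figures from this presentation.

```python
# Back-of-the-envelope look at the rack densities listed above and at the
# "performance per square foot" metric. The rack floor space (10 sq ft,
# including service clearance) and the per-CPU peak GFLOPS are placeholder
# assumptions, not numbers from this presentation.

RACK_FLOOR_SQFT = 10.0

cpus_per_rack = {
    "p5-575 cluster":     96,
    "1U rack-optimized": 128,
    "BladeCenter":       168,
    "BlueGene":         2048,
}

assumed_peak_gflops_per_cpu = {   # hypothetical per-CPU peak values
    "p5-575 cluster":    7.6,
    "1U rack-optimized": 7.2,
    "BladeCenter":       7.2,
    "BlueGene":          2.8,
}

for system, cpus in cpus_per_rack.items():
    density = cpus / RACK_FLOOR_SQFT
    gflops_per_sqft = density * assumed_peak_gflops_per_cpu[system]
    print(f"{system:18s} {density:7.1f} CPUs/sq ft   ~{gflops_per_sqft:7.0f} GFLOPS/sq ft")
```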

Systems Directions

• Optimizing Nodes
  – 2, 4, 8, 16 CPU nodes
  – Large SMPs
  – Rack Optimized Servers and BladeCenter
• Optimizing Interconnects
  – Higher Performance Networks – HPS, Myrinet, InfiniBand, Quadrics, 10GigE
  – Utility Networks – Ethernet, Gigabit, 10GigE
• Optimizing Storage
  – Global Filesystems (MultiCluster GPFS)
  – Avoiding Bottlenecks (NFS, spindle counts, FC adapters and switches)
• Optimizing Grid Infrastructure

Systems Directions

• pSeries
  – Power4 Systems (p-6xx)
  – 2, 4, 8, 16-way Power5 clusters (p-5xx, OpenPower-7xx)
  – 32, 64-way Power5 SMPs (p-595)
  – BladeCenter cluster (JS20)
• xSeries
  – Intel EM64T, Rack Optimized and BladeCenter – x335, x336, HS20, HS40
  – AMD Opteron Rack Optimized – x325, x326, LS20
• BlueGene/L
• Interconnects
  – HPS, Myrinet, IB, GigE, 10GigE

Software Directions

• System Software
  – Unix (AIX, Solaris)
  – Linux
  – Linux on POWER
  – Linux on Intel and Opteron
  – Windows
• HPC Software
  – Same Software on AIX and Linux on POWER
  – Compilers, Libraries, Tools
  – Same HPC Infrastructure on Linux/Intel/Opteron and POWER
  – GPFS, LoadLeveler, CSM
  – MultiCluster GPFS
  – Grid Software
  – Backup and Storage Management

Linux Software Matrix

• Kernels (not even considering distros)
  – 2.4, 2.6
• Interconnects
  – IB (3 different vendors), Myrinet, Quadrics, GigE (MPICH and LAM)
• 32 and 64-bit binaries and libraries
• Compiler options (Intel, Pathscale, PGI, gcc)
• Geometric increase in the number of binaries and sets of libraries that any code developer might need to support (counted in the sketch below)
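A short sketch (Python) of the combinatorics behind that last bullet, using only the option lists named on this slide; counting the three IB vendor stacks and the two GigE MPI stacks as separate interconnect targets is my own simplification.

```python
# Counting the build matrix implied by the options above. The option lists come
# straight from this slide; treating every combination as one that must be
# built and tested is of course the worst case.

from itertools import product

kernels       = ["2.4", "2.6"]
interconnects = ["IB vendor 1", "IB vendor 2", "IB vendor 3",
                 "Myrinet", "Quadrics", "GigE + MPICH", "GigE + LAM"]
word_sizes    = ["32-bit", "64-bit"]
compilers     = ["Intel", "Pathscale", "PGI", "gcc"]

builds = list(product(kernels, interconnects, word_sizes, compilers))
print(f"{len(kernels)} kernels x {len(interconnects)} interconnect stacks x "
      f"{len(word_sizes)} ABIs x {len(compilers)} compilers = {len(builds)} builds")
# -> 112 distinct binary/library combinations, before even considering distros
```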

There are passengers and there are drivers!

• IBM is a Driver
  – POWER (www.power.org)
  – Linux on Power and Intel/Opteron, LTC
  – BlueGene/L
  – STI Cell Architectures
  – Open Platform Support
• HP, SGI, SUN, Cray are passengers
  – Rely primarily on external innovations

Introducing IBM’s Deep Computing Organization

• Clear #1 position in High Performance Computing (Top500, Gartner, IDC, …)
• “Our goal is to solve consistently larger and more complex problems more quickly and at lower cost.”
• Application areas: Government, Weather Forecasting, Crash Analysis, Financial Services, Petroleum Exploration, Drug Discovery, Chip Design, Digital Media

The CAE World is in Flux

• Hardware vendors
• Software vendors
• Operating systems
• Cluster computing
• Microprocessors

Most users are seeing dramatic changes in their CAE environment.


Evolution of Hardware: drive towards commonality

Timeline: Mainframes (~1979) → Vectors (~1983) → RISC SMPs (~1994) → Clusters (~2002)

• Mainframe era: mostly MSC.Nastran; beginning in 1986, crash simulation drove CAE compute requirements.
• SMP architecture was often first introduced in the CFD department and helped push parallel computing.
• Cluster architecture (Unix & Linux) now dominates crash and CFD environments.

Transition of the CAE environment

(Charts: percent of workload for Structural Analysis, Crash Simulation, and CFD Simulation from 1998 to 2004, broken down into Serial, SMP, 4-30 CPUs, and >30 CPUs.)

Recent Trends – Top 20 Automotive Sites

(Chart: percent of installed GigaFLOPS at the top 20 automotive sites by processor architecture (MIPS, POWER, Alpha, PA-RISC, SPARC, IA-32, IA-64, Vector, Other), 1997 to 2003.)

Source: TOP500 website http://www.top500.org/lists/2003/11/

IBM STG Deep Computing

IBM Power Technology and Products

POWER: The Most Scaleable Architecture

• Servers: POWER2 → POWER3 → POWER4 → POWER4+ → POWER5 (binary compatibility)
• Desktops and games: PPC 603e, 750, 750CXe, 750FX, 750GX, 970FX
• Embedded: PPC 401, 405GP, 440GP, 440GX

IBM powers Mars exploration

IBM returns to MARS

• PowerPC is at the heart of the BAE Systems RAD6000 Single Board Computer, a specialized system enabling the Mars Rovers — Spirit and Opportunity — to explore, examine and even photograph the surface of Mars.
• In fact, a new generation of PowerPC-based space computers is ready for the next trip to another planet. The RAD750, also built by BAE Systems, is powered by a licensed radiation-hardened PowerPC 750 microprocessor that will power space exploration and Department of Defense applications in the years to come.

IBM OpenPower / eServer™ p5 Server Product Line – No Compromises

(Product line diagram)
• High-end POWER5 systems: p5-590, p5-595 (Std & Turbo)
• Midrange POWER5 systems: p5-570 (Express, Std & Turbo), p5-575
• Entry POWER5 towers and racks (Express & Std): p5-510, p5-520, p5-550
• Linux on POWER / OpenPower: OP 710, OP 720
• Workstations: IntelliStation Model 275
• Blades: JS20+
• IBM Cluster 1600; POWER4+ and PPC970+ systems

POWER5 Technology Bottom to Top

• p5-510: 19-inch rack; 1 or 2 CPUs; 1.5/1.65 GHz; 0.5-32 GB memory; 587.2 GB internal storage; 3 PCI-X slots; 0 I/O drawers; 20 LPARs; max rPerf 9.86; Cluster 1600: yes; HACMP (AIX 5L V5.2): 2Q05
• p5-520: 19-inch rack or deskside; 1 or 2 CPUs; 1.5/1.65 GHz; 0.5-32 GB memory; 8.2 TB internal storage; 6-34 PCI-X slots; 4 I/O drawers; 20 LPARs; max rPerf 9.86; Cluster 1600: yes; HACMP: yes
• p5-550: 19-inch rack or deskside; 1, 2 or 4 CPUs; 1.5/1.65 GHz; 0.5-64 GB memory; 15.2 TB internal storage; 5-60 PCI-X slots; 8 I/O drawers; 40 LPARs; max rPerf 19.66; Cluster 1600: yes; HACMP: yes
• p5-570: 19-inch rack; 2, 4, 8, 12 or 16 CPUs; 1.5/1.65/1.9 GHz; 2-512 GB memory; 38.7 TB internal storage; 6-163 PCI-X slots; 20 I/O drawers; 160 LPARs; max rPerf 77.45; Cluster 1600: yes; HACMP: yes
• p5-575: 24-inch frame, packaged by node; 8 CPUs per node; 1.9 GHz; 1-256 GB memory; 1.4 TB internal storage; 0-24 PCI-X slots; 1 I/O drawer; 80 LPARs; max rPerf 46.36; Cluster 1600: yes; HACMP: yes
• p5-590: 24-inch frame; 8 to 32 CPUs; 1.65 GHz; 8-1,024 GB memory; 9.3 TB internal storage; 20-160 PCI-X slots; 8 I/O drawers; 254 LPARs; max rPerf 151.72; Cluster 1600: yes; HACMP: yes
• p5-595: 24-inch frame; 16 to 64 CPUs; 1.65/1.9 GHz; 8-2,048 GB memory; 14.0 TB internal storage; 20-240 PCI-X slots; 12 I/O drawers; 254 LPARs; max rPerf 306.21; Cluster 1600: yes; HACMP: yes
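A minimal sketch (Python) of what the model data above implies per processor: dividing each model's maximum rPerf by its maximum CPU count. This is my own reading of the figures, not an IBM sizing method.

```python
# Per-CPU rPerf at maximum configuration, taken from the model data above.
# rPerf is IBM's relative performance metric, so this simple division is only
# meant to show how evenly the range scales from the smallest to the largest model.

max_config = {            # model: (max CPUs, max rPerf)
    "p5-510": (2, 9.86),
    "p5-520": (2, 9.86),
    "p5-550": (4, 19.66),
    "p5-570": (16, 77.45),
    "p5-575": (8, 46.36),
    "p5-590": (32, 151.72),
    "p5-595": (64, 306.21),
}

for model, (cpus, rperf) in max_config.items():
    print(f"{model}: {rperf / cpus:.2f} rPerf per CPU at {cpus} CPUs")
# The 1.9 GHz p5-575 stands out near 5.8 rPerf/CPU; the 1.65 GHz high-end
# models stay close to 4.7-4.8 even at 32 and 64 CPUs.
```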


POWER5 architecture

POWER5 design: 1.5, 1.65 and 1.9 GHz; 276M transistors; 0.13 micron

POWER5 enhancements

• Simultaneous multi-threading
• Hardware support for Micro-Partitioning
  – Sub-processor allocation
• Enhanced distributed switch
• Enhanced memory subsystem
  – Larger L3 cache: 36MB
  – Memory controller on-chip
• Improved High Performance Computing [HPC]
• Dynamic power saving
  – Clock gating

(Chip diagram: two POWER5 cores, 1.9 MB L2 cache, L3 directory/control, on-chip memory controller, and enhanced distributed switch, with chip-chip, MCM-MCM, SMP link and GX+ interfaces.)


eServer p5: Simultaneous multi-threading

POWER4 (single-threaded) vs. POWER5 (simultaneous multi-threading)

(Diagram: per-cycle occupancy of the execution units FX0, FX1, LS0, LS1, FP0, FP1, BRX and CRL, showing slots where thread 0 is active, thread 1 is active, or no thread is active. In SMT mode the chip appears as 4 CPUs to the operating system (AIX 5L V5.3 and Linux), raising system throughput relative to single-threaded (ST) mode.)

• Utilizes unused execution unit cycles (a toy model of this effect follows below)
• Presents symmetric multiprocessing (SMP) programming model to software
• Natural fit with superscalar out-of-order execution core
• Dispatches two threads per processor: “It’s like doubling the number of processors.”
• Net result:
  – Better performance
  – Better processor utilization
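The first and last points above are easy to see with a toy utilization model. This is an illustrative sketch under an independence assumption of mine, not a description of the POWER5 pipeline.

```python
# Toy model (my own, not from this presentation) of why SMT can recover unused
# execution-unit cycles: if one thread keeps an issue slot busy with
# probability p, a second independent thread can fill part of the idle time.

def slot_utilization(p: float, threads: int) -> float:
    """Probability that at least one of `threads` threads uses a given slot."""
    return 1.0 - (1.0 - p) ** threads

for p in (0.3, 0.5, 0.7):
    st, smt = slot_utilization(p, 1), slot_utilization(p, 2)
    print(f"per-thread busy fraction {p:.1f}: ST {st:.0%} -> SMT {smt:.0%}")
# Real gains are smaller and workload-dependent (the two threads share caches,
# register resources and issue bandwidth), but the direction matches the
# "better processor utilization" point above.
```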

p5-575 design innovations

• CPU/Memory Module
  – Single-core POWER5 chips support high memory bandwidth
  – Packaging designed to accommodate dual-core technology
• Power Distribution Module (DCA)
  – Distinctive, high-efficiency, intelligent DC power conversion and distribution subsystem
• I/O Module
  – Versatile I/O and service processor
  – Designed to easily support changes in I/O options
• Cooling Module
  – High-capacity 400 CFM impellers
  – High-efficiency motors with intelligent control

POWER5 p5-575 compared with the POWER4+ p655 (p5-575 value listed first):
• Drawers per rack: 12 vs. 16
• CPUs: 8/16-way vs. 4/8-way
• Architecture: POWER5 vs. POWER4+
• L3 cache: 36 MB per chip vs. 32 MB per chip, shared in 8-way config
• Memory: 1 GB-256 GB vs. 4 GB-64 GB
• Packaging: 42U (24" rack) for both
• DASD / bays: two for both
• I/O expansion: 4 PCI-X vs. 2 PCI-X
• Integrated SCSI: two vs. one
• Integrated Ethernet: four 10/100/1000 vs. two 10/100
• RIO2 drawers: 0, ½, or 1 drawer for both
• Dynamic LPAR: yes for both
• Redundant power: yes (frame) for both
• Redundant cooling: yes for both

p5-575 system: 42U rack chassis; 2U drawers; 12 drawers per rack.

p5-575 and Blue Gene

p5-575:
• 64-bit AIX 5L/Linux cluster node suitable for applications requiring high memory bandwidth and large memory (32 GB) per 64-bit processor
• Scalable systems: 16 to 1,024 POWER5 CPUs (more by special order)
• "Off-the-shelf" and custom configurations
• Standard IBM service and support
• 1,000s of applications supported
• Largest p5-575 configuration: 12,000+ CPUs (ASCI Purple, LLNL)

Blue Gene®:
• 32-bit Linux cluster suitable for highly parallel applications with limited memory requirements (256 MB per 32-bit processor) and limited or highly parallelized I/O
• Very large systems: up to 100,000+ PPC440 CPUs
• Custom configurations
• Custom service and support
• Highly effective in highly specialized applications
• Blue Gene/L configuration: 131,000 CPUs (LLNL)

(Diagram: Blue Gene/L build-up from chip (2 processors, 2.8/5.6 GF/s) to compute card (2 chips, 2x1x1), node board (32 chips, 4x4x2, 16 compute cards), and the full system at 180/360 TF/s with 16 TB DDR.)

IBM STG Deep Computing

IBM HPC Clusters: Power/Intel/Opteron

Cluster 1350 - Value

• Leading edge Linux Cluster technology
  – Employs high performance, affordable Intel®, AMD® and IBM PowerPC® processor-based servers
  – Capitalizes on IBM’s decade of experience in clustering
• Thoroughly tested configurations / components
  – Large selection of industry standard components
  – Tested for compatibility with major Linux distributions
• Configured and tested in our factories
  – Assembled by highly trained professionals, tested before shipment to client site
• Hardware setup at client site included (except 11U)
  – Enables rapid, accurate deployment
• Single point of contact for entire Linux Cluster, including third-party components
  – Warranty services provided/coordinated for entire system, including third-party components
  – Backed by IBM’s unequalled worldwide support organization

Or would you rather deal with this?

Cluster 1350 - Overview

• Integrated Linux cluster solution
  – Factory integrated & tested (in Greenock for EMEA), delivered and supported as one product
  – Complemented by 3-year IBM warranty services including OEM parts (Cisco, Myrinet, ...)
• Broad solution stack portfolio
  – Servers: xSeries x336/x346 (Xeon EM64T), eServer 326 (Opteron), Blades
  – Storage: TotalStorage DS4100/4300/4400/4500
  – Networking: Cisco / SMC / Force10 Gigabit Ethernet = commodity networks; Myrinet / InfiniBand = high performance, low latency (< 5 µs) networks
  – Software: Cluster Systems Management 1.4 (CSM) = cluster installation & admin; General Parallel File System 2.3 (GPFS) = optional cluster file system
  – Services: factory integration & testing, onsite hardware setup (included); SupportLine, software installation (both optional)
  – Currently supported and recommended Linux distributions: SUSE Linux Enterprise Server (SLES) 8 & 9, Red Hat Enterprise Linux (RHEL) 3
  – More options available via 'special bid', e.g. other networking gear
• For more info have a look at http://www-1.ibm.com/servers/eserver/clusters/

Cluster 1350 - Node Choices

BladeCenter blades (14 blades per 7U chassis):
• BladeCenter with HS20: dual processor support, Nocona (12/04); 8 GB maximum memory per blade; integrated systems management
• BladeCenter with LS20*: AMD Opteron-based 2-socket blade, single or dual core; 8 GB maximum memory; SFF SCSI drives and daughter cards; integrated systems management
• BladeCenter with JS20: POWER4-based BladeCenter blade, 2.2 GHz PPC 970, 2-way; 4 GB maximum memory; IDE drives: 2 x 60 GB; 3 daughter cards available (Ethernet, Fibre Channel with boot support, Myrinet); integrated systems management

Rack-optimized nodes:
• xSeries 346: high availability node for application serving; dual processor support (Nocona/Irwindale); 16 GB maximum memory (with option), 8 RDIMMs; 6 hot-swap SCSI HDDs; integrated system management; 2U
• xSeries 336: highly manageable rack-dense node; dual processor support (Nocona/Irwindale); 16 GB maximum memory, 8 RDIMMs; 2 hot-swap SCSI hard disk drives; integrated system management; 1U
• eServer 326: high performance rack-dense node; dual processor support, Opteron; 16 GB maximum memory*; 2 hot-swap SCSI or 2 fixed SATA HDDs; integrated system management; 1U

Blade portfolio continues to build

• HS20 (2-way Xeon)
  – Features: Intel Xeon DP, EM64T, mainstream rack-dense blade, supports Windows, Linux and NetWare, optional HS HDD
  – Target apps: edge and mid-tier workloads, collaboration, web serving
• HS40 (4-way Xeon)
  – Features: Intel Xeon MP, 4-way SMP capability, similar feature set to HS20, high availability apps
  – Target apps: back-end workloads, large mid-tier apps, UNIX server consolidation
• JS20 (PowerPC)
  – Features: two PowerPC® 970 processors, 32-bit/64-bit solution for Linux & AIX 5L™, performance for deep computing clusters
  – Target apps: 32- or 64-bit HPC, VMX acceleration, high memory bandwidth apps
• LS20 (AMD Opteron)
  – Features: two-socket AMD processors, single and dual core
  – Target apps: 32- or 64-bit HPC

Common chassis and infrastructure

Introducing the AMD Opteron LS20

• HPC performance with “enterprise” availability feature set

(Blade callouts)
• Two sockets, 68W processors; single and dual core
• Ultra 320 non-hot-swap disk with RAID1
• 4 DDR VLP (very low profile) DIMM slots
• Broadcom dual-port Ethernet
• Supports SFF and legacy I/O expansion cards

Planned OS support

• RHEL 4 for 32-bit and x64
• RHEL 3 for 32-bit and x64
• SuSE Linux ES 9 for 32-bit and x64
• RHEL 2.1 (not at announce)
• …

How much can you fit in one rack? Your choice!

• IBM eServer xSeries 336 (Xeon DP 3.6 GHz)
  – IA-32, up to 84 CPUs (8.7 KW) / rack
  – price/performance (1058.4 total SPECfp_rate)
  – $268.9k list price (604.8 GFLOP peak)
• IBM eServer 326 (Opteron 250)
  – x86-64, up to 84 CPUs (7.5 KW) / rack
  – memory bandwidth (1432.2 total SPECfp_rate)
  – $241.9k list price (403.2 GFLOP peak)
• IBM eServer BladeCenter HS20 (Xeon DP 3.6 GHz)
  – IA-32, up to 168 CPUs (17.3 KW) / rack
  – footprint, integration (2116.4 total SPECfp_rate)
  – $574.7k list price (1209.6 GFLOP peak)
• IBM eServer BladeCenter JS20 (PPC970 2.2 GHz)
  – PPC-64, up to 168 CPUs (10.1 KW) / rack
  – performance, footprint (1680 total SPECfp_rate)
  – $389.7k list price (1478 GFLOP peak)

A rough price/performance comparison based on these figures is sketched below.

*Prices are current as of (the date) and subject to change without notice
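A rough sketch (Python) that reduces the four racks above to two ratios, price per peak GFLOP and SPECfp_rate per thousand dollars; the list prices are the ones quoted on this slide and carry the same disclaimer.

```python
# Rough price/performance comparison using the per-rack list prices, peak
# GFLOPS, and SPECfp_rate totals quoted above. Dividing them is my own
# simplification; it ignores interconnect, storage, power, and software.

racks = {   # system: (list price in $k, peak GFLOPS, total SPECfp_rate)
    "x336 (Xeon DP 3.6 GHz)": (268.9,  604.8, 1058.4),
    "e326 (Opteron 250)":     (241.9,  403.2, 1432.2),
    "HS20 (Xeon DP 3.6 GHz)": (574.7, 1209.6, 2116.4),
    "JS20 (PPC970 2.2 GHz)":  (389.7, 1478.0, 1680.0),
}

for name, (price_k, peak_gflops, specfp) in racks.items():
    print(f"{name:24s} ${price_k * 1000 / peak_gflops:5.0f} per peak GFLOP   "
          f"{specfp / price_k:4.1f} SPECfp_rate per $k")
```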

Cluster 1350 - Compute Node Positioning

• e326 nodes – leading price/performance for memory-intensive applications in a server platform that supports both 32-bit and 64-bit applications

• x336 and x346 nodes – leading performance and manageability for processor-intensive applications in an IA platform that supports both 32-bit and 64-bit applications

• HS20 blades – performance density, integration, and investment protection in an IA platform that supports both 32-bit and 64-bit applications

• JS20 blades – leading 64-bit price/performance in a POWER™ processor-based blade architecture, or for applications that can exploit the unique capabilities of VMX


Cluster 1350 - Storage Selections

• DS4500 (FAStT 900), FC: 3U chassis; up to 32 TB Fiber; up to 56 TB SATA
• DS4400 (FAStT 700), FC/SATA: 3U chassis; up to 32 TB Fiber; up to 56 TB SATA
• DS4300 / DS4300 Turbo (FAStT 600), FC: 3U chassis; up to 8 TB (4300) or 16 TB (4300 Turbo); up to 28 TB SATA; single or dual controllers
• DS4100 (FAStT 100), SATA: 3U chassis; single or dual controllers; up to 3.5 TB single, up to 28 TB dual
• DS400, FC-SCSI: up to 2 TB
• DS300, iSCSI-SCSI: 3U; single or dual 1Gb controllers; up to 2 TB

IBM STG Deep Computing

Interconnect options of e1350 (Intel/Opteron)

Cluster 1350 - Network Selections / Ethernet

• Cisco 6509
  – Used as core switch or aggregation switch
  – 8 slots for configuration
  – Up to 384 1Gb copper ports
  – Up to 32 10Gb fiber ports
  – Non-blocking
• Cisco 4006
  – Used as core switch or aggregation switch in very large clusters
  – 5 slots for line cards only
  – Up to 240 1Gb copper ports
  – Max of 3.75:1 oversubscribed
• Cisco 6503
  – Used as an aggregation switch or as a small cluster core switch
  – 2 slots for configuration
  – Up to 96 1Gb copper ports
  – Up to 8 10Gb fiber ports
  – Max of 5:4 oversubscribed ("near line rate")
• SMC 8648T
  – Used for core switch in small clusters or aggregation switch in mid-size clusters
  – 1U form factor / 48 ports
  – Non-blocking by itself
  – At best 5:1 blocking in distributed mode (see the worked example below)
• Force10 E600
  – Used as core switch or aggregation switch
  – Up to 324 ports
  – Non-blocking
  – Alternative to 6509
• SMC 8624T
  – Used for core switch in small clusters or aggregation switch in small clusters
  – 1U form factor / 48 ports
  – Non-blocking by itself
  – At best 5:1 blocking in distributed mode
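The blocking ratios quoted for the 48-port edge switches follow from simple port arithmetic. A minimal sketch, with the particular host/uplink splits assumed by me:

```python
# Worked example of the oversubscription ratios quoted above, for a
# "distributed" design in which 48-port edge switches uplink into a core
# switch. The particular host/uplink splits are my own illustration.

EDGE_PORTS = 48

for uplinks in (4, 8, 12):
    hosts = EDGE_PORTS - uplinks
    print(f"{hosts} hosts over {uplinks} uplinks -> {hosts / uplinks:.1f}:1 oversubscription")
# 40 hosts sharing 8 uplinks gives the "at best 5:1 blocking in distributed
# mode" figure; a non-blocking core switch gives every host a line-rate path.
```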

Some more Cisco Components

• Catalyst 3750G-24TS 24-port GigE 1U switch
  – 32Gbit backplane, stackable
  – Used for small clusters & distributed switch networks
• Catalyst 4500 Series Switches
  – 3-slot & 6-slot versions
  – Lower cost, over-subscribed
  – Up to 384 GigE ports
• Catalyst 6500 Series Switches
  – 3-slot & 9-slot versions
  – Higher cost, non-blocking (720Gbit backplane)
  – Up to 384 GigE ports
  – 10GigE ports available

Gigabit Ethernet Details – Force10 Overview

Why Force10?

– High port count GigE switches capable of non-blocking throughput are hard to find; the Force10 series is one of the few.

– E600 specifications:
  – 900 Gbps non-blocking switch fabric
  – 1/3 rack chassis (19" rack width)
  – 500 million packets per second
  – 7 line card slots
  – 1+1 redundant RPMs
  – 8:1 redundant SFMs
  – 3+1 & 2+2 redundant AC power supplies
  – 1+1 redundant DC Power Entry Modules

New Myrinet switches for large clusters

(Diagram: Myrinet spine built from Clos256 and Clos256+256 switches; configurations for 256, 512, 768, 1024, and 1280 hosts; only 320 cables; all inter-switch cabling on quad ribbon fiber.)

TopSpin InfiniBand Switch Portfolio

• Topspin 120: 1U fixed chassis; max 12X ports: 8; max 4X ports: 24; fixed configuration (rear); port options: 8-port 12X (optical or copper) or 24-port 4X (copper); popular configs (4X/12X): 24/0, 0/8; high availability: redundant power, dual-box fault tolerance; embedded fabric manager; available Q1CY04
• Topspin 270: 6U modular chassis; max 12X ports: 32; max 4X ports: 96; 8 horizontal slots (rear); interface module options: 4 by 12X (optical or copper), 12 by 4X (copper), hybrid 9 by 4X + 1 by 12X; popular configs (4X/12X): 96/0, 64/8, 48/16, 0/32; high availability: redundant power/cooling, redundant control, hot-swap interfaces, dual-box fault tolerance; embedded fabric manager; available Q2CY04
• Topspin 720*: 8U modular chassis; max 12X ports: 64 (32 if dual fabric config); max 4X ports: 192 (96 if dual fabric config); 16 vertical slots (rear); interface module options: 4 by 12X (optical or copper), 12 by 4X (copper), hybrid 9 by 4X + 1 by 12X; popular configs (4X/12X), single fabric: 192/0, 128/16, 96/32, 0/64, dual fabric: 96/0, 64/8, 48/16, 0/32; high availability: redundant power/cooling, redundant cooling, redundant control, hot-swap interfaces, dual fabric or dual-box fault tolerance; embedded fabric manager; available Q4CY04

Voltaire InfiniBand Switch Router 9288

• Voltaire’s largest InfiniBand switch
  – 288 4X or 96 12X InfiniBand ports
  – Non-blocking bandwidth
  – Ideal for clusters ranging from tens to thousands of nodes
• Powerful multi-protocol capabilities for SAN/LAN connectivity
  – Up to 144 GbE ports or FC ports
• No single point of failure
  – Redundant and hot-swappable Field Replaceable Units (FRUs)
• Non-disruptive software update, processor fail-over

Cluster 1350 … Today and Tomorrow

Cluster 1350 will continue to expand client choice and flexibility by offering leading-edge technology and innovation in a reliable, factory-integrated and tested cluster system.

High Performance Nodes
  – Today: x336, x346, e326, HS20 and JS20
  – Tomorrow: dual core technology; expanded blade-based offerings; new PowerPC technology

High Speed Switches, Interconnects, and Storage
  – Today: Gigabit Ethernet, Myrinet, and InfiniBand; high performance local and network storage solutions
  – Tomorrow: emerging technologies; expanded 3rd party offerings; focus on both commercial and HPC environments

Leading OS & Cluster Management Software
  – Today: Red Hat and SUSE enterprise offerings; LCIT, CSM, GPFS, SCALI
  – Tomorrow: leading Linux distributions; enhanced cluster and HPC software

Worldwide Service and Support
  – Today: factory hardware integration; single point of contact for warranty service; custom IGS services
  – Tomorrow: custom hardware, OS and DB integration services; enhanced cluster support offerings

Thank you very much for your attention!

• Links for more detailed information and further reading:
  – IBM eServer Clusters: http://www-1.ibm.com/servers/eserver/clusters/
  – IBM eServer Cluster 1350: http://www-1.ibm.com/servers/eserver/clusters/hardware/1350.html
  – Departmental Supercomputing Solutions: http://www-1.ibm.com/servers/eserver/clusters/hardware/dss.html
  – IBM eServer Cluster Software (CSM, GPFS, LoadLeveler): http://www-1.ibm.com/servers/eserver/clusters/software/
  – IBM Linux Clusters Whitepaper: http://www-1.ibm.com/servers/eserver/clusters/whitepapers/linux_wp.html
  – Linux Clustering with CSM and GPFS Redbook: http://publib-b.boulder.ibm.com/Redbooks.nsf/RedbookAbstracts/sg246601.html?Open

IBM STG Deep Computing

Innovative Technologies

PURPLE

Over 460 TF Total IBM Solution
• 1.5X the total power of the Top 500 List

IBM's proven capability to deliver the world's largest production quality supercomputers
• ASCI Blue (3.9 TF) & ASCI White (12.3 TF)
• ASCI Pathforward (Federation 4GB Switch)

Three IBM Technology Roadmaps
• 100 TF eServer 1600 pSeries Cluster
  – 12,544 POWER5 based processors
  – 7 TF POWER4+ system in 2003
• 9.2 TF eServer 1350 Linux Cluster
  – 1,924 Intel Xeon processors
• 360 TF Blue Gene/L (from IBM Research)
  – 65,536 PowerPC based nodes

Blue Gene/L

• Chip (2 processors): 2.8/5.6 GF/s, 4 MB
• Compute Card (2 chips, 2x1x1): 5.6/11.2 GF/s, 0.5 GB DDR
• Node Board (32 chips, 4x4x2; 16 compute cards): 90/180 GF/s, 8 GB DDR
• Cabinet (32 node boards, 8x8x16): 2.9/5.7 TF/s, 256 GB DDR
• System (64 cabinets, 64x32x32): 180/360 TF/s, 16 TB DDR
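A quick arithmetic check (Python) of the build-up above; reading the paired figures as one-processor/two-processors-per-chip is my interpretation of the slide's notation.

```python
# Multiplying out the Blue Gene/L packaging hierarchy above. The two numbers in
# each pair (e.g. 2.8/5.6 GF/s) correspond to using one or both processors per
# chip; small differences from the slide (367 vs. 360 TF/s) are rounding.

GF_ONE, GF_BOTH = 2.8, 5.6     # per chip, one / both processors
CARD_MEM_GB = 0.5              # DDR per 2-chip compute card

levels = [                     # name, total chips at that level
    ("compute card", 2),
    ("node board", 2 * 16),
    ("cabinet", 2 * 16 * 32),
    ("system", 2 * 16 * 32 * 64),
]

for name, chips in levels:
    mem_gb = CARD_MEM_GB * chips / 2
    print(f"{name}: {GF_ONE * chips:g}/{GF_BOTH * chips:g} GF/s, {mem_gb:g} GB DDR")
# node board -> 89.6/179.2 GF/s and 8 GB; cabinet -> ~2.9/5.7 TF/s and 256 GB;
# system -> ~183/367 TF/s and 16,384 GB (16 TB), matching the build-up above.
```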

Blue Gene/L - The Machine

• 65,536 nodes interconnected with three integrated networks

Ethernet
• Incorporated into every node ASIC
• Disk I/O
• Host control, booting and diagnostics

3-Dimensional Torus
• Virtual cut-through hardware routing to maximize efficiency
• 2.8 Gb/s on all 12 node links (total of 4.2 GB/s per node)
• Communication backbone
• 134 TB/s total torus interconnect bandwidth
• 1.4/2.8 TB/s bisectional bandwidth

Global Tree
• One-to-all or all-to-all broadcast functionality
• Arithmetic operations implemented in tree
• ~1.4 GB/s of bandwidth from any node to all other nodes
• Latency of tree less than 1 µsec
• ~90 TB/s total binary tree bandwidth (64k machine)
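The per-node and total torus bandwidths above are consistent with simple multiplication; a minimal check (Python), with the link-sharing and unit conventions assumed by me:

```python
# Back-of-the-envelope check of the torus numbers above. Counting 6 physical
# links per node (each 2.8 Gb/s link is shared by two nodes) and using binary
# terabytes are my assumptions for reproducing the slide's totals.

GBIT_PER_LINK = 2.8      # Gb/s per link, as quoted above
LINKS_PER_NODE = 12      # 6 dimensions x 2 directions
NODES = 65_536

per_node_GBps = LINKS_PER_NODE * GBIT_PER_LINK / 8
print(f"per-node torus bandwidth: {per_node_GBps:.1f} GB/s")            # ~4.2 GB/s

total_GBps = NODES * (LINKS_PER_NODE // 2) * GBIT_PER_LINK / 8
print(f"total torus bandwidth: {total_GBps / 1024:.0f} TB/s (binary)")  # ~134 TB/s
```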

The BlueGene computer as a central processor for radio telescopes: LOFAR

Bruce Elmegreen, IBM Watson Research Center, 914 945 2448, [email protected]

LOFAR = Low Frequency Array
LOIS = LOFAR Outrigger in Sweden

BlueGene/L at ASTRON: 6 racks, 768 I/Os, 27.5 Tflops

(Map: LOFAR and LOIS sites)


Enormous Data Flows from Antenna Stations

• LOFAR will have 46 remote stations and 64 stations in the central core
• Each remote station transmits:
  – 32,000 channels/ms in one beam, or 8 beams with 4,000 channels
  – 8+8 bit (or 16+16 bit) complex data samples
  – 2 polarizations
  – 1-2 Gbps from each station
• Each central core station transmits the same data rate in several independent sky directions (for the epoch of recombination experiment)
• 110 - 300 Gbps input rates to central processor

(Figure: LOFAR station array and antenna array for each station)
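A quick sanity check (Python) of the per-station rate, under my reading of the sampling figures above:

```python
# Rough check of the LOFAR station data rates above. Reading "32000
# channels/ms" as 32,000 complex samples per millisecond per polarization, and
# summing plain per-station rates for the total, are my assumptions.

channels_per_ms = 32_000
samples_per_s = channels_per_ms * 1_000      # per polarization
polarizations = 2

for bits_per_complex in (8 + 8, 16 + 16):
    gbps = samples_per_s * polarizations * bits_per_complex / 1e9
    print(f"{bits_per_complex}-bit complex samples: {gbps:.2f} Gb/s per station")
# -> about 1 and 2 Gb/s, matching "1-2 Gbps from each station"

stations = 46 + 64                           # remote + central core stations
print(f"simple sum over {stations} stations: {stations * 1.024:.0f}-{stations * 2.048:.0f} Gb/s")
# Central core stations transmitting several sky directions push the input
# rate toward the 110-300 Gbps range quoted above.
```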

BlueGene Replaces Specialized Processors

32-processor Mark IV digital correlator (MIT, Jodrell Bank, ASTRON) vs. 32-node BlueGene/L board with:
1. 64x64 bit comp. prod every 2 clock cycles
2. Four Gbps Ethernet I/Os
3. One chip type (dual core PowerPC)
4. A Linux "feel"

Thank You!

for your time & attention

Questions?
