Status and Trends in High Performance Computing & Grids

Selected Viewpoints

Dr. Frank Baetke, HPCD, Richardson, TX & Munich

© 2004 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.

Trends

The Landscape of Processor Technologies

[Diagram: processor families arranged from conservative to innovative approaches (CISC → RISC → EPIC).
CISC: IA-32 family, EM64T/x86-64 (64 bit), /390 (31 bit), z (64 bit).
RISC (OOO / superscalar): PA-8700, PA-8800, PA-8900, POWER4, POWER5, Alpha EV68, Alpha EV7, MIPS 16K, UltraSPARC.
EPIC (Explicitly Parallel Instruction Computing): Itanium 2, Itanium 2 9M.
New features: SMT(?), multiple cores & integrated interconnects.
UltraSPARC is the only remaining 'non-OOO' processor.]

CISC, RISC and EPIC Instructions

CISC (IA-32, x86-64, /390, z):

  | Instruction 0 | Instruction 1 | Instruction 2 | Instruction 3 |   (variable length)

RISC (Alpha, MIPS, PA-RISC, POWER, SPARC):

  | Instruction 0 | Instruction 1 | Instruction 2 | Instruction 3 |   (constant length)

EPIC (Itanium):

  127                                                        0
  | Instruction 2 | Instruction 1 | Instruction 0 | Template |   (constant length with template field)

Details – not widely known: IA-64 vs. x86-64

Itanium vs. Opteron (Addressing/Registers)

                                     Opteron       Itanium 2 (Madison)
Process technology                   0.13 µm       0.13 µm
Clock (for this comparison)          2.2 GHz       1.5 GHz
Physical address space               40 bit        50 bit
Virtual address space                48 bit        64 bit
Integer registers (GRs)              16            128
NaT bits (for speculation)           -             128
Predicate registers (branch elim.)   -             64
Floating-point registers             8             128
SSE2 registers (SIMD only)           16            -
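As a sanity check on the address-space rows, a minimal sketch of the implied sizes (an n-bit address reaches 2^n bytes):

#include <stdio.h>
#include <math.h>

int main(void) {
    /* Sizes implied by the bit widths in the table above. */
    const double TB = ldexp(1.0, 40);                                        /* 2^40 bytes = 1 TB    */
    printf("Opteron physical (40 bit): %12.0f TB\n", ldexp(1.0, 40) / TB);   /*        1 TB          */
    printf("Itanium physical (50 bit): %12.0f TB\n", ldexp(1.0, 50) / TB);   /*     1024 TB  (1 PB)  */
    printf("Opteron virtual  (48 bit): %12.0f TB\n", ldexp(1.0, 48) / TB);   /*      256 TB          */
    printf("Itanium virtual  (64 bit): %12.0f TB\n", ldexp(1.0, 64) / TB);   /* 16777216 TB  (16 EB) */
    return 0;
}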

Itanium vs. Opteron (Address Translation)

                                     Opteron       Itanium
TLB entries (instructions)           512           128
I-TLB associativity                  4             128 (!)
TLB entries (data)                   512           128
D-TLB associativity                  4             128 (!)
Supported page sizes                 4 KB, 2 MB    4 KB … 4 GB
Resulting address range with
no TLB miss/fault                    1 GB          512 GB

Comment: the x86-64 heritage is again visible in the supported memory page sizes of 4 KB and 2 MB (as in IA-32). Itanium supports variable page sizes from 4 KB up to 4 GB (!), giving a 'fault-less' address range of 512 GB!
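The last row of the table is simply TLB entries multiplied by the largest supported page size; a minimal sketch of that arithmetic:

#include <stdio.h>

int main(void) {
    /* TLB reach = number of TLB entries x largest supported page size.
       Figures taken from the comparison table above. */
    unsigned long long KB = 1024ULL, MB = KB * KB, GB = MB * KB;

    unsigned long long opteron_reach = 512 * (2 * MB);  /* 512 entries, 2 MB pages */
    unsigned long long itanium_reach = 128 * (4 * GB);  /* 128 entries, 4 GB pages */

    printf("Opteron TLB reach: %llu GB\n", opteron_reach / GB);  /*   1 GB */
    printf("Itanium TLB reach: %llu GB\n", itanium_reach / GB);  /* 512 GB */
    return 0;
}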

TOP500 Trends

Top500 Supercomputers – Nov 2004: Trends in Vendor Slots

[Chart: Top500 system counts per vendor (HP, IBM, SGI, Sun, Cray, Dell, NEC) across the Nov-02, Jun-03, Nov-03, Jun-04 and Nov-04 lists; source: www.top500.org]

Top500 Supercomputers – Nov 2004: Trends in Processors

[Chart: Top500 system counts per processor family (Xeon, HP PA, Power4, IA64, Alpha, MIPS, Power, Opteron, SPARC) across the Nov-02, Jun-03, Nov-03, Jun-04 and Nov-04 lists; source: www.top500.org]

Industry Standard Architectures are the only ones growing!

Top500 Supercomputers – Jun 02 to Nov 04: Trends in Processors

[Chart: stacked Top500 counts per processor family (IA64, PA-RISC, Xeon, Power4, Alpha, Opteron, MIPS, Power, SPARC, Other) from Nov-02 through Nov-04]

OE Trends (The Matrix)

System vendors, software vendors, and customers all face the same question:

Which operating environments will survive?

Operating Systems vs. Operating Environments II – Nomenclature:

A = available and supported by ISVs
a = available but (very) limited support

P = planned and announced

M = migration announced

A, P = endangered environments (color-coded on the original slides)

Processors / Operating Systems

• Let's have a look at the special-purpose systems ...

              OS/400  AIX  HP-UX  Linux-64  WIN-64  Linux-32  WIN-32  z/OS  Solaris  BlueGene
/390+         -       -    -      -         -       -         -       a     -        -
Power400      a       -    -      -         -       -         -       -     -        -
PowerBG       -       -    -      -         -       -         -       -     -        a
Power         -       A    -      a         -       -         -       -     -        -
USparc        -       -    -      -         -       -         -       -     A        -
x86-64        -       -    -      A         A       A         A       -     -        -
Itanium       -       -    A      A         A       -         -       -     -        -
x86           -       -    -      -         -       A         A       -     a        -

OE-Matrix – currently available (A)

          AIX  Solaris  Tru64  HP-UX  Linux-64  WIN-64  Linux-32  WIN-32
Power     A    -        -      -      a         -       -         -
USparc    -    A        -      -      -         -       -         -
Alpha     -    -        A      -      -         -       -         -
PA-RISC   -    -        -      A      -         -       -         -
Itanium   -    -        -      A      A         A       -         -
x86(64)   -    -        -      -      A         -       A         A
x86       -    a        -      -      -         -       A         A

OE-Matrix – planned OEs (P)

          AIX  Solaris  Tru64  HP-UX  Linux-64  WIN-64  Linux-32  WIN-32
Power     A    -        -      -      a         -       -         -
Sparc64   -    A        -      -      -         -       -         -
Alpha     -    -        A      -      -         -       -         -
PA-RISC   -    -        -      A      -         -       -         -
Itanium   -    -        -      A      A         A       -         -
x86(64)   -    P        -      -      A         P       A         A
x86       -    a        -      -      -         -       A         A

OE-Matrix – changes / migrations ...

          AIX  Solaris  Tru64  HP-UX  Linux-64  WIN-64  Linux-32  WIN-32
Power     A    -        -      -      a         -       -         -
Sparc64   -    A        -      -      -         -       -         -
Alpha     -    -        M      -      -         -       -         -
PA-RISC   -    -        -      M      -         -       -         -
Itanium   -    -        -      A      A         A       -         -
x86(64)   -    a        -      -      A         a       A         A
x86       -    a        -      -      -         -       A         A

OE-Matrix – HPC Node/Server Futures (32-bit environments become obsolete)

          AIX  Solaris  Tru64  HP-UX  Linux-64  WIN-64  Linux-32  WIN-32
Power     A    -        -      -      a         -       -         -
Sparc64   -    A        -      -      -         -       -         -
Alpha     -    -        -      -      -         -       -         -
PA-RISC   -    -        -      -      -         -       -         -
Itanium   -    -        -      A      A         A       -         -
x86(64)   -    a        -      -      A         A       A         A
x86       -    -        -      -      -         -       A         A

OE-Matrix (64-bit): next steps …

          AIX  Solaris  Tru64  HP-UX  Linux-64  WIN-64  Linux-32  WIN-32
Power     A    -        -      -      a         -       -         -
Sparc64   -    A        -      -      -         -       -         -
Itanium   -    -        -      A      A         A       -         -
x86(64)   -    a        -      -      A         A       -         -
x86       -    -        -      -      -         -       -         -

… leading to a potential consolidation

          Other  AIX  Solaris  Tru64  HP-UX  Linux-64  WIN-64  Linux-32  WIN-32  VMS  ....
Power     -      A    -        -      -      -         -       -         -       -
Sparc64   -      -    -        -      -      -         -       -         -       -
Itanium   -      -    -        -      A      A         A       -         -       -
x86(64)   -      -    -        -      -      A         A       -         -       -
x86       -      -    -        -      -      -         -       -         -       -

Server Trends

Fast! Good! Cheap!

Pick any two, but only two!

Servers (HP Examples)

HPC Platforms

• Choice • Performance • Manageability

HP Integrity: Superdome, rx8620, rx7620, rx4640, rx2620, rx1620
HP ProLiant: DL585 (Version 2), DL380 G4, DL360 G4p, DL385, DL145 G2, DL140 G2, BL20p G3, BL25p, BL30p, BL35p

Blades for HPC – the benefits of blades applied to HPC clusters

• Simplified management
• Designed for performance and scalability
• Reduced interconnect and network complexity
• High density
• Centralized
• New challenges …

[Photos: front and rear views of a blade enclosure]

CFD Modeling of the Machine Room

HP used CFD and 3D modeling to understand the thermal characteristics of data centers.

Static Smart Cooling Analysis – Planning Failure Mitigation (CRAC #1 failure)

[Plot: room temperature after CRAC #1 fails, rising to an unacceptable level]

Develop strategies for failure mitigation:
• Workload migration
• "Smart" redundancy of AC equipment

** It took 80 seconds to reach an unacceptable temperature level.

Cluster Trends

A Generic Cluster Vision … strong tendency towards standards!

[Diagram: a generic cluster. A high-speed interconnect links compute nodes O(1000+), log-in and admin nodes, service nodes O(100s), and visualization nodes O("dozens"). Lustre servers provide Object Storage Servers (OSTs) and Metadata Servers (MDS), backed by scalable HA storage farms. Users reach the log-in nodes via inbound connections; a single process space spans all nodes; the visualization nodes drive a multi-panel display device over a pixel network.]

SSCK – new 10+ TFLOP/s System (University of Karlsruhe)

[Diagram: thin nodes for MPI applications and fat nodes for multithreaded applications, coupled by a fast interconnect; nodes for login, service and pre-/postprocessing attach via a network; all nodes share a parallel file system.]
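The thin/fat split maps onto the two levels of parallelism in application codes: message passing across nodes, threads within a fat node. A minimal hybrid MPI + OpenMP sketch of that pattern (illustrative only, not the actual SSCK software stack):

/* One MPI rank per node, OpenMP threads within a node. Thin nodes
   would instead run many single-threaded ranks; fat nodes exploit
   the threaded inner level. */
#include <mpi.h>
#include <omp.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int provided, rank, nranks;
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nranks);

    double local = 0.0;
    #pragma omp parallel reduction(+:local)
    {
        /* each thread contributes a partial result */
        local += 1.0;
    }

    double total = 0.0;
    MPI_Reduce(&local, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
    if (rank == 0)
        printf("total threads across %d ranks: %.0f\n", nranks, total);

    MPI_Finalize();
    return 0;
}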

Clusters will dominate: what else is of interest?

… A Scalable Cluster Filesystem

[Diagram: a compute farm on a system area network, with network I/O, system administration and support nodes, and dedicated resources for metadata service and lock management.]

SFS runs on IA-32, Itanium and Opteron-based nodes.

Advancing filesystem technology to meet scalability and performance metrics

[Chart: aggregate bandwidth (GB/s, 0 to 3.5) versus number of clients (1, 2, 4, 8) for Lustre over Elan4, Lustre over Elan3, aggregate local IO, and NFS over GigE]

• "I have over a year's experience running a data center using Lustre for aggressive IO from a supercomputer.
  − Highly stable
  − Still hard to manage
  660 MB/s from a single client with a simple "dd" is faster than any local or global filesystem we have tested." (Scott Studham, PNNL)

We are finally in the era where global filesystems provide faster access.
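The single-client figure quoted above comes from a plain sequential-write test of the kind dd performs. A minimal C sketch of such a measurement; the mount point /mnt/lustre and the 1 GB transfer size are placeholder assumptions, not values from the talk:

/* Sequential-write throughput test, similar in spirit to
   `dd if=/dev/zero of=<file> bs=1M count=1024`. */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/time.h>
#include <unistd.h>

int main(void) {
    const size_t block = 1 << 20;   /* 1 MB per write */
    const size_t count = 1024;      /* 1 GB in total  */
    char *buf = calloc(1, block);
    int fd = open("/mnt/lustre/testfile", O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0 || !buf) { perror("setup"); return 1; }

    struct timeval t0, t1;
    gettimeofday(&t0, NULL);
    for (size_t i = 0; i < count; i++)
        if (write(fd, buf, block) != (ssize_t)block) { perror("write"); return 1; }
    fsync(fd);                       /* flush so the timing is honest */
    gettimeofday(&t1, NULL);

    double secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_usec - t0.tv_usec) / 1e6;
    printf("%.1f MB/s\n", (double)(block * count) / secs / 1e6);
    close(fd);
    free(buf);
    return 0;
}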

Making Lustre a Product

HP SFS: delivers value by providing an unmatched combination of features

The Lustre-based HP SFS product is the result of a joint research and development project (Hendrix) between HP, the US DoE and Cluster File Systems, Inc.

                                     SFS   AFS   NFS
Network-neutral                      Yes   No    No
Cluster-wide POSIX semantics         Yes   No    Yes
No single points of failure          Yes   Yes   No
High-performance direct I/O          Yes   No    No
Separate metadata and data servers   Yes   No    No
Scalable to thousands of nodes       Yes   Yes   No

Grid Trends

Many definitions of Grid ...

[Images dated Sep 2004 and Jan 2005]

24 Jan 2005: HP, IBM, Intel and Sun Launch Globus Consortium

http://www.globusconsortium.org/

(some of) HP's Grid Collaborations

HP Collaboration and Competency Network

HP CCN is a forum to facilitate collaboration, innovation, discovery, and competency sharing between HP and our HPC customers and partners

Collaboration areas:
− Computational and data grids
− Global file system for Linux (Lustre – HP SFS)
− Scientific visualization (SEPIA)
− Linux SMP scaling
− Algorithms & Methods

Why HP CCN?

• Persistent framework for partnership with high-performance computing customers
• Enable customers to interact with HP's R&D community and with each other in a structured manner
• Facilitate exchange of expertise and experience among like-minded people world-wide
• Avoid the disadvantages of creating ad-hoc relationships on a case-by-case basis

HP CCN Members

KTH Stockholm, Univ. of Bristol, Univ. of Birmingham, HLBS, Uni Karlsruhe, Uni Calgary, Tech. U of Berlin (TUB), SharcNet, PSNC, PNNL, CEA/DAM, Seagate, Stony Brook, Uni Utah, Santa Clara U, Indiana U, Johnson & Johnson, HP, Oak Ridge Natl Labs, CESCA, Uni Lecce, Uni Houston, CalTech, CILEA, Rice Univ, NCMIR, ASCC, UTexas-Austin, Uni Calabria, Uni Naples, National Grid Office of Singapore, CASPAR, Universidade Federal de Campina Grande

HP CCN Grid Collaboration

• Objective: advance the state of the art in creating, running, and using grids

• Member Engagement
  − Regular technical teleconferences and discussions
  − The HP CCN Grid, an operational grid that includes systems at member sites and at HP
  − Exchange of applications, …, and tools
  − Experiment with and test new grid technology
• Expected Outcome: a worldwide community of leading-edge research institutions working together to produce a body of understanding and best practices for grids

Recent Teleconference Topics

• Use of Realm Specific IP (RSIP) in conjunction with MPICH-G2 – National Grid Office of Singapore and HP
• "OurGrid", a grid for smaller and mid-size research labs – Universidade Federal de Campina Grande
• Monitoring of Grid environments – University of Calgary Grid Research Center
• Data replication management system using WSRF – University of Calgary Grid Research Center
• Requirements for Grid management – HP

Daegu 25.5.2005 SEPAC South European Partnership for Advanced

ComDaegup 25.5.2005uting SEPAC Partner Sites & Logical Grid Structure Partners ETH/UniZH CSCSCILEA SPACI

User-level:    Unicore UI, ????, EngineFrame, GRB (Lecce)
Apps-level:    GEAs (Grid-Enabled Applications)
Grid-level:    MDS, MPICH-G2, AAA, GLOBUS 2.4.3 (-> 4.0.x), Unicore
OS-level:      AIX, Tru64, HP-UX, Linux-64, Linux-32, SuperUX
Systems-level
WAN-level

Deploying the LHC Grid Service

Today:
• > 30 countries, 130 sites
• > 14,000 CPUs
• > 5 PB storage

Today:
− 2 HP sites – Bristol and Puerto Rico
− HP 1st commercial member of the LHC grid
− IA64 resources on LCG ... 2nd to Poland

Roadmap to HP Labs Grid Vision

[Chart: roadmap from 2005 to 2015 – IT as a cost centre → IT as a profit centre → IT as a revenue generator → public utility/grid. Large enterprises move from intragrids to being both service providers and consumers of utility services; small and medium enterprises become primarily consumers of services.]

Closing Remarks

• HPC applications are the major driver of – and the best fit for – the early adoption of a utility computing business model

• As usual, HPC learnings and advances will influence mainstream enterprise IT

• Broad industry agreement on the vision of utility/grid/service-oriented/adaptive/on-demand computing

• The roadmap to the vision is unfolding: lots of different starting points, similar destinations

Thank You
