Application of General Purpose HPC Systems in HPEC

David Alexander Director of Engineering SGI, Inc.

dca 8/2003 SGI HPC System Architecture

dca 8/2003 SGI Scalable ccNUMA Architecture Basic Node Structure and Interconnect

C C C C A A A A C C C C H CPU CPU H H CPU CPU H E E E E Processor Interface Processor Interface (>6GB/sec) NUMAlink Interconnect (>6GB/sec) (>6GB/sec) Interface Interface Chip Chip Memory Interface Memory Interface (>8GB/sec) (>8GB/sec)

Physical Memory Physical Memory (2-16GB) (2-16GB)

Physical Memory SGI Scalable ccNUMA Architecture Scaling to Large Node Counts

CPU CPU CPU CPU CPU CPU CPU CPU

Interface Interface Interface Interface Chip Chip Chip Chip

Memory Memory Memory Memory

System Interconnect Fabric • Scaling to 100’s of processors • RT Memory Latency < 600ns worst case (64p config) • Bi-section bandwidth >25GB/sec (64p config) SGI Scalable ccNUMA Architecture SGI® NUMAflex™ Modular Design

System Performance:Performance: HighHigh--bandwidthbandwidth Interconnect interconnectinterconnect withwith veryvery lowlow latencylatency

CPU & Flexibility:Flexibility: TailoredTailored configurationsconfigurations forfor Memory differentdifferent dimensionsdimensions ofof scalabilityscalability InvestmentInvestment protection:protection: AddAdd newnew System I/O technologiestechnologies asas theythey evolveevolve Scalability:Scalability: NoNo centralcentral busbus oror switch;switch; justjust Standard I/O modulesmodules andand NUMAlink™NUMAlink™ cablescables Expansion High BW I/O Expansion Graphics Expansion

Storage Expansion

Modules from SGI® Onyx® 3000 Series SGI Scalable ccNUMA Architecture The Benefits of Shared Memory

Traditional Clusters SGI® ™ 3000 Commodity interconnect Fast NUMAflex™ interconnect mem mem mem mem mem ... mem Global shared memory node node node node node node node node node node + + + + + + + + + ... + OS OS OS OS OS OS OS OS OS OS What is shared memory? • All nodes operate on one large shared memory space, instead of each node having its own small memory space Shared memory is high-performance • All nodes can access one large memory space efficiently, so complex communication and data passing between nodes aren’t needed • Big data sets fit entirely in memory; less disk I/O is needed Shared memory is cost-effective and easy to deploy • The SGI Altix 3000 family supports all major parallel programming models • It requires less memory per node, because large problems can be solved in big shared memory • Simpler programming means lower tuning and maintenance costs SGI® Altix™ 3000 HPC Product

dca 8/2003 SGI® Altix™ 3000 Overview

First Linux® node with 64 CPUs in single-OS image First clusters with global shared memory across multiple nodes First Linux solution with HPC system- and data-management tools World-record performance for floating- point calculations, memory performance, I/O bandwidth, and real technical applications SGI® Altix™ 3000 Fusion of Powerhouse Technologies

SGI® supercomputing technology, Intel’s most advanced processors, and open-source computing

Global shared memory (64-bit shared memory across cluster nodes) SGI ProPack™ and supercomputing enhancements SGI® NUMAlink™ (Built-in high-bandwidth, low-latency interconnect) SGI® NUMAflex™ (Third-generation modular supercomputing architecture) Intel® ® 2 processor family

Industry-standard Linux® SGI® Altix™ 3000 Family Extreme Power, Extreme Potential

Model 3700 Superclusters

Scalable performance offering Model 3300 Servers 4–64 Itanium 2 processors in a single node 1.30 GHz/3MB L3 cache Single-node entry offering 1.50 GHz/6MB L3 cache 4–12 1.30 GHz Itanium® 2 Shared memory across nodes processors, 3MB L3 cache Scalable to 2,048 processors, 16TB memory Up to 96GB memory Nodes up to 64P, 4TB memory Multi-Paradigm Architecture

dca 8/2003 Multi-Paradigm Architecture Overview

IO IO R • NUMAlink system interconnect MSEC MSEC C C C • General-purpose compute nodes R R C C IO • Peer-attached general purpose I/O V C MSE • Mission-specific accel. and/or I/O (MSE) C R R C

V • Integrated graphics/visualization MSEC MSEC IO IO Multi-Paradigm Architecture Mission Specific Element (MSE)

Memory

System >5GB/s FPGA, I/O Interconnect TIO ASIC DSP, or Fabric ASIC NL4 Ports (>5GB/s x 2) MSE

Implementation variants: • Customer supplied/loaded FPGA algorithms and/or ASIC/DSP • Subroutine or standard library acceleration • Specific use “appliance” Embedded High Performance Computing (EHPC)

dca 8/2003 Embedded High Performance Computing Unified Development/Deployment Environment EHPC Today EHPC Vision Different Development and Deployment Environments Unified Development and Deployment Enhanced software development productivity Superior performance and HW utilization Demonstration to deployment in days Benefits from mainstream HPC advances Development Environment HPC methods, tools, hardware Scalable ccNUMA Architecture provides freedom to develop new approaches System Architecture Embedded Systems

Platform Port Packaging Months of work SGI Millions of dollars EHPC Significant schedule risk System Significant architectural/performance risk Development Model

Seamless HPC Application Mission-Specific Processing

Deployment Proprietary hardware and software “Islands of memory” narrows algorithm options Embedded High Performance Computing EHPC Platform

Scalable ccNUMA

System Architecture Embedded Systems

• Unified software environment for development, SGI Packaging prototype and deployment EHPC System Development Model • Full multi-paradigm computing support Mission-Specific Seamless HPC Application • Field-upgradeable mission specific processing Processing acceleration • Highly optimized performance per watt and performance per slot • Design to fit into established environmentals, form factors, and interfaces • 6U Eurocard form factor • Passive, slot configurable backplane • PMC module connectivity to standard interfaces • Able to address oceanic, ground and airborne environmentals Embedded High Performance Computing Family of 6U Form Factor Modules

6U Form Factor 6U Form Factor

Cache Cache

PMC Module Site uP uP

PIC

Memory Memory PMC Module Site DDR SDRAM DDR SDRAM NUMAlink NUMAlink4 Controller Memory Ports Memory DDR SDRAM DDR SDRAM NUMAlink4 Memory NUMAlink Memory Ports DDR SDRAM DDR SDRAM Controller Memory System Diagnostic & Memory DDR SDRAM Control L1 Ports Diagnostic & DDR SDRAM System Control L1 Ports

Scalable General Purpose General Purpose Processor/Memory Blade I/O Blade Embedded High Performance Computing Family of 6U Form Factor Modules

6U Form Factor 6U Form Factor Memory Memory

6.4GB/s NUMAlink4 NUMAlink4 FPGA TIO Port Port ASIC TIO FPGA

External I/O NL4 System Diagnostic & External I/O NL4 System Diagnostic & (Optional) Port Control L1 Ports (Optional) Port Control L1 Ports

6.4GB/s NUMAlink4 NUMAlink4 FPGA TIO Port ASIC TIO Port FPGA

Memory Memory

Mission-Specific Accelerator and/or I/O Blades www.sgi.com

©2002 , Inc. All rights reserved. Silicon Graphics, SGI, IRIX, Challenge, Origin, Onyx, and the SGI logo are registered trademarks and OpenVault, FailSafe, Altix, XFS, and CXFS are trademarks of Silicon Graphics, Inc., in the U.S. and/or other countries worldwide. MIPS is a registered trademark of MIPS Technologies, Inc., used under license by Silicon Graphics, Inc. UNIX is a registered trademark of The Open Group in the U.S. and other countries. Intel and Itanium are registered trademarks of Intel Corporation. Linux is a registered trademark of Linus Torvalds. Mac is a registered trademark of Apple Computer, Inc. Windows is a registered trademark or trademark of Microsoft Corporation in the United States and/or other countries.Red Hat is a registered trademark of Red Hat, Inc. All other trademarks are the property of their respective owners. (1/03)