Welcome to the outer limits

Budapest, March 19, 2013

M.Sc. Jiří Hlaváč, HPC consultant + sales manager for CEE, [email protected]

Jiri Hlavac (Czech: Jiří Hlaváč), 51 years old, 4 children …

• M.Sc. Computers (1986)
• Development of PC OSs for Tesla Czech (1986-1989)
• Own SW company (1986-1991)
• Owner, SGI distributor in Czechoslovakia (1991-1995)
• Employee @ SGI Czech office (1995-now): Technical Director, Academic Sales, Enterprise Sales
• HPC Consultant (2001-2011)
• Sales Manager for Central + East Europe (2005-now)

©2012 Silicon Graphics International Corp. / Presented Only Under Non-Disclosure Agreement

SGI = Experts @ HPC

Structural Mechanics (Implicit) • Structural Mechanics (Explicit) • Computational Fluid Dynamics • Electro-Magnetics

Computational Chemistry (Quantum Mechanics) • Computational Chemistry (Molecular Dynamics) • Computational Biology • Seismic Processing

Reservoir Simulation • Rendering / Ray Tracing • Climate / Weather / Ocean Simulation • Data Analytics

SGI = Focus on Every Detail (here: Power Consumption)

SGI = Frontier @ Research

SGI = Winner of the latest HPCwire Readers' Choice Award (Nov 2012) for "Top Supercomputing Achievement," for SGI's contribution to the NASA Ames Pleiades system.

SGI = Winner of the latest HPCwire Editors' Choice Award (Nov 2012) for "Best Use of HPC in an 'Edge HPC' Application," for Wikipedia historical mapping and exploration on UV 2000.

SGI stock is growing


Advanced Energy Exploration and Production
Total: World's Largest Commercial HPC System (2.3 PF), SGI ICE
(Image courtesy NOAA)

Measuring Wind Velocity on a Mast
Irish Centre for High-End Computing, SGI ICE
(Image courtesy ICHEC)

UN Chief Calls for Urgent Action on Climate Change
NASA Advanced Supercomputing Division, SGI ICE
Images taken by the Thematic Mapper sensor aboard Landsat 5
(Source: USGS Landsat Missions Gallery, U.S. Department of the Interior / U.S. Geological Survey)

Finding Life in the Universe
NASA Advanced Supercomputing Division, SGI ICE
(Image courtesy NASA)

Understanding the Interaction of Solar Wind With the Magnetosphere
NASA Advanced Supercomputing Division, SGI ICE
(Image courtesy NASA)

Modeling the World's Climate
Nat'l. Oceanic & Atmospheric Administration, SGI ICE
(Image courtesy NOAA)

Studying the Formation of Star Clusters
University of Exeter, SGI ICE
(Image courtesy Matthew Bate, University of Exeter)

COSMOS, Cambridge University – UV1 & UV2

• 768 cores @ UV1000 + 1,856 cores @ UV2000
• 2 TB shared memory @ UV1 + 14 TB @ UV2
• 63 Tflop/s peak performance
• 64 TB IS4100 storage, CXFS
• MOAB Grid suite
• Water cooled
(Photo: Professor Stephen Hawking inspecting the final UV installation)

• Strategic projects, such as Planck satellite analysis codes
• Large simulation requirements of theoretical ideas and comparison with vast new observational data sets

ZAMG – Austrian Meteorological Institute

Altix ICE-X [4 racks] + SGI storage [2 racks]
4,000 processor cores, 8 TB memory, 1 PB storage

1.5x to 2x application performance advantage over all competitors

(based on official RFP offers submitted for evaluation)

Škoda Auto: ICE 8200+8400, ICE X, UV1000, UV2000, SMP nodes, storage, …

• SGI UV (>1,000 cores)
• SGI ICE (>5,000 cores, ~60 Tflop/s peak performance, ~10 TB memory)
• SGI as integrator of >1 PB storage infrastructure for HPC (Nexis 9000, ISS3500, CXFS, NAS servers)
• 20x pre/post-processing SMP nodes (each 144 GB memory)

• Grid engine + other HPC technologies

Agenda
• Examples of SGI customers' research
• SGI HPC systems available in Hungary
– SGI UV1000 – SMP system (Pécs)
– SGI ICE 8400 – MPP/cluster system (Debrecen)
– Comparison of UV and ICE capabilities
– HW accelerators available inside each system
• How to Succeed Together

• Invitation to the next SGI training (software tools, HPC libraries, application optimization, etc.)

Architecture of a Typical Large HPC Center in Europe

SAN and Shared File System

• Large-node (SMP) systems, e.g. SGI UV
• Small-node, highly parallel (MPP) systems, e.g. SGI ICE

InfiniBand Network

SGI® ICE™

Altix ICE at NIIF / Debrecen

NIIF / Debrecen system:
• 2 tall racks (water cooled), up to dual IB channels per node
• 2,048 Xeon cores @ 3.33 GHz (6-core Xeon X5680)
• 8 TB memory (4 GB/core)
• Dual-rail QDR InfiniBand (all-to-all topology)
• >500 TB Lustre filesystem (in another 2 racks)

SGI ICE platform:
1. World's fastest distributed memory computer, based on SPECmpiL
2. Scalable: supports up to 131,072 CPU nodes, 1 million cores
3. Open: runs standard Linux, uses Intel Xeon (Sandy Bridge or Westmere) or AMD Opteron 6200 CPUs
4. New topologies: hypercube, enhanced hypercube, all-to-all, fat tree
5. Differentiated capabilities vs. Dell M1000e (single plane), Bullx (95 W), Sun Blade 6048 (95 W), Altix ICE 8400, Appro 5000 Series, HP c7000, IBM H Chassis blades, Cray XT6

Common InfiniBand Topologies

Hypercube • Enhanced Hypercube • All-to-All • Fat Tree (Clos networks)

All Supported on SGI ICE X

SGI ICE Topology Study: Geometric Mean Over All Applications

[Chart: performance relative to single-rail standard hypercube (SHC 1r); series: SHC, EHC, all-to-all, and fat tree in 1-rail and 2-rail configurations; x-axis: 16, 32, 64, 128 nodes; y-axis: 0-120%]

SGI ICE Topology Study: Geometric Mean for Global Interconnect Kernels
(HPCC PTRANS, HPCC MPIFFT, HPCC RR BW, IMB Bisection BW, IMB All-to-All)

[Chart: performance relative to SHC 1r; series: SHC, EHC, all-to-all, and fat tree in 1-rail and 2-rail configurations; x-axis: 16, 32, 64, 128 nodes; y-axis: 0-450%]

ICE Differentiation: OS Noise Synchronization
• OS system noise: CPU cycles stolen from a user application by the OS to do periodic or asynchronous work (monitoring, daemons, garbage collection, etc.)
• A management interface allows users to select what gets synchronized
• Performance boost on larger-scale systems

[Diagram: Unsynchronized OS noise → wasted cycles. System overhead interrupts each node at a different time, so every node wastes cycles waiting for the barrier to complete. With synchronized OS noise, the overhead occurs at the same time on all nodes and the barrier completes sooner.]

Synchronized OS noise → faster results

Another Key SGI Differentiation: Premier Software Environment for Technical Computing

• Complete, integrated Linux® environment across all SGI systems
• Highest level of scalability and performance while maintaining ISV compatibility
• Best-of-breed solutions delivered with industry-leading partners

©2012 Silicon Graphics International Corp. / Presented Only Under Non-Disclosure Agreement 25 ANSYSAdvantages FLUENT of the SGI® -Ultimate MPI Scalability Library Throughala SGI Application SGI® Benchmarking PerfBoost Expertise World Record Scalabili ty!

• Previously reported FLUENT scaling with HP-MPI did not exceed 512 cores

• SGI ICE, powered by SGI MPI PerfBoost, scaled FLUENT to 3072 cores with a rating of 1333.3 jobs per day

• Proved the ability to run FLUENT on all 4,092 cores in the ICE system

SGI® UV™

Altix UV 1000 at NIIF / Pécs

NIIF / Pécs system:
• 3 tall racks with water cooling
• 1,152 Xeon cores @ 2.6 GHz (6-core Intel Xeon X5680)
• 6 TB memory (4 GB/core)
• SGI NUMAlink 5 interconnect (all-to-all, >100 GB/s bisection bandwidth)
• XFS filesystem, >500 TB direct attached (in another 2 racks)

SGI UV platform:
1. World's fastest shared memory computer, based on SPECint and SPECfp
2. Scalable: single system image up to 2,048 cores and 16 TB memory
3. Open: runs standard Linuxes + Windows (Intel Xeon processors)
4. New markets: HPC, large databases, scalable I/O, RISC replacement
5. Differentiated capabilities vs. Bullx Mesca (128c), HP Superdome 2 (Itanium), Sun SPARC M9000 (256c), IBM Power 780 (POWER7), Intel 8-socket designs

SGI Altix UV Shared Memory Architecture vs. Cluster Architecture

ICE + Commodity Clusters (InfiniBand or Gigabit Ethernet; each node: ~64 GB memory + system + OS):
• Each system has its own memory and OS
• Nodes communicate over commodity interconnect
• Inefficient cross-node communication creates bottlenecks
• Coding required for parallel code execution

SGI® Altix® UV Platform (SGI® NUMAlink™ 5 interconnect; global shared memory to 16 TB, one OS):
• All nodes operate on one large shared memory space
• Eliminates data passing between nodes
• Big data sets fit entirely in memory
• Less memory per node required
• Simpler to program
• High performance, low cost, easy to deploy

Customer Challenges (and SGI UV Answers)

• Need faster simulation on large datasets – fastest CPUs available + world-record benchmarks
• Need to simulate very large datasets – large shared memory (16 TB today, 32 TB tomorrow)
• Fast delivery to production and results – standard OS + SW stack, factory-integrated box
• Ability to scale the system as requirements increase – scalable without limits
• Simple programming & management – natively supports OpenMP & MPI (with HW acceleration) = simple like a BIG PC!

Key SGI UV Characteristics

World's Fastest Supercomputer
• World-record SPECint_rate, SPECfp_rate, SPECjbb2005
• High-speed NUMAlink® 5 interconnect (15 GB/sec)
• MPI offload engines maximize efficiency (HW acceleration of MPI)
• Direct access to global data sets up to 16 TB
• Compelling performance regardless of type of application

Scalable
– Single system image scales up to 2,048 cores & 16 TB memory
– Investment protection: start with four sockets, scale up over time

Open Platform
– Leverages latest Intel® Xeon® processors (Westmere, Sandy Bridge)
– Runs industry-standard x86 operating systems & application code

External Flow over Truck Body (111M cells)

Additional Performance Acceleration

Barrier latency < 1 µsec (4,096 threads)

• Altix UV offers up to 3x improvement in MPI reduction processes
• Barrier latency is dramatically better than competing platforms

HPCC Benchmarks

• HPCC benchmarks show substantial improvement possible with the MPI Offload Engine (MOE)

[Chart: HPCC benchmark results for UV with MOE vs. UV with MOE disabled]

Source: SGI Engineering projections

Globally Shared Memory System

• NUMAlink® 5 is the glue of Altix® UV 100/1000

[Diagram: a NUMAlink router connects Altix UV blades; each blade has a hub, two CPUs, and two 64 GB memory banks; four blades form 512 GB of shared memory, scaling up to 16 TB of global shared memory]

Altix UV Compute Blade

Each blade: 8-16 Intel® Xeon® 7500 cores, up to 128 GB DDR3

I/O risers provide choice of expansion slot capabilities

• SGI® NUMAlink® 5 = 15.0 GB/s
• Intel® QuickPath Interconnect (QPI) = 25.6 GB/s aggregate (6.4 GT/s)
• Directory FBD1 = 6.4 GB/s read + 3.2 GB/s write (800 MHz DIMMs)
• Intel 7500 Scalable Memory Buffers with 4 channels of DDR3 DIMMs
• Intel® Scalable Memory Interconnect (SMI)

Altix UV is Ideal for a Wide Range of Applications

• Ideal application characteristics include:
– I/O-bound and memory-bound apps
– Inter-processor communication-intensive apps
– In-memory and large (VLDB) databases
– Graph traversal, sort, and inference
– MapReduce
– Apps with asymmetric computational patterns

• A Single System Image (SSI) system like Altix® UV is often the perfect complement to large scale-out clusters with Altix UV being the “simulation supernode”

Full Spectrum of HPC Applications

SGI Solution Enables Complete Workflow Under "One Roof"

Hybrid-enabled workflow across the typical CAE workflow: SGI® UV™, SGI® Rackable®, SGI® ICE™ X
• CAD model creation
• Mesh generation
• Model decomposition
• Running solvers
• Viewing results
• Adjust and repeat

Infiniband or GigE Fabric

SGI® CXFS™, Lustre™, Gluster, and Panasas®
Same OS and system management across the entire CAE workflow!

SGI: Application Experts! (ANSYS Fluent Example)

ANSYS® FLUENT on SGI Architectures
http://www.sgi.com/pdfs/4309.pdf

• ANSYS FLUENT software enables modeling of flow, turbulence, heat transfer, and reactions for industrial applications ranging from air flow over an aircraft wing to combustion in a furnace, from blood flow to semiconductor manufacturing, and from clean-room design to wastewater treatment plants.

• Explores MPI performance, core frequency, choice of network topology, memory speed, and use of hyper-threading to establish guidelines for running ANSYS FLUENT on advanced SGI computer hardware systems.

ANSYS FLUENT Benchmarking Results

Number of benchmark runs in 24 hr.

Truck models have ~14M and ~111M compute elements (cells) to simulate turbulence.

Unparalleled scalability across SGI systems and processors!

Agenda
• Examples of SGI customers' research
• SGI HPC systems available in Hungary
– SGI UV1000 – SMP system (Pécs)
– SGI ICE 8400 – MPP/cluster system (Debrecen)
– Comparison of UV and ICE capabilities
– HW accelerators available inside each system
• How to Succeed Together

• Invitation to the next SGI training (software tools, HPC libraries, application optimization, etc.)

SGI Technical Computing: Premier Software Environment for Technical Computing

• Complete, integrated Linux® environment across all SGI systems
• Highest level of scalability and performance while maintaining ISV compatibility
• Best-of-breed solutions delivered with industry-leading partners

Operating Systems

RHEL 6.3: scales to 2,048 cores, 16 TB
• Strong government and enterprise market adoption
• KVM hypervisor support
• SELinux – Common Criteria security certification
• Red Hat software maintenance and technical support

SLES 11: scales to 2,048 cores / 4,096 threads, 16 TB
• Top 500 systems
• KVM hypervisor support
• SGI technical support and SUSE software maintenance

CentOS 6.3: scalability similar to Red Hat
• Community-supported, open source distribution
• Suited for educational, commercial, and research organizations with skilled staff
• SGI software stack supported on CentOS

SGI® Management Center

Key Features:
• Premier system management console with remote server monitoring and control
• Ease of use, full system management including GUI and CLI
• Policy-driven, fine-grained power control and monitoring
• Advanced fault, event, and alert management for improved reliability
• Advanced capabilities including GPU management, BIOS management, and high availability

Comprehensive Operational Management Application for Technical Computing

SGI Performance Suite

SGI Accelerate
• Accelerate applications with optimized software libraries and tools
• Tune applications without recompiling
• Optimize performance with specialized algorithms

SGI MPI
• SGI's scalable, high performance MPI environment
• More than just an MPI library
• Includes runtime MPI acceleration, profiling, checkpoint/restart and more

SGI REACT
• Hard real-time performance for Linux
• Only hard real-time solution for standard distribution Linux
• No custom Linux kernel needed
• Top 500-level performance for standard distribution Linux

SGI UPC
• SGI's optimized Unified Parallel C compiler environment
• Scales across SGI NUMAlink and InfiniBand

Q&A

©2012 SGI – Company Confidential
