Cray XK6 Redefining Supercomputing

CRAY XK6 REDEFINING SUPERCOMPUTING
Sanjana Rakhecha, Nishad Nerurkar

CONTENTS
| Introduction
| History
| Specifications
| Cray XK6
| Architecture
| Performance
| Industry acceptance and applications
| Summary

INTRODUCTION
| The Cray XK6 supercomputer is a trifecta of scalar, network and many-core innovation.
| Hybrid supercomputer
| Combination of Cray’s Gemini interconnect, AMD's multi-core scalar processors and NVIDIA’s many-core GPU processors
| Enhanced version of the Cray XE6
| Uses the same blade architecture as the Cray XE6
| Capable of scaling to 500,000 scalar processors and 50 petaflops of hybrid peak performance

HISTORY
| In 1988, Cray Research introduced the Cray Y-MP, the first supercomputer to sustain over 1 gigaflop on many applications.
| Fujitsu's Numerical Wind Tunnel supercomputer used 166 vector processors to gain the top spot in 1994, with a peak speed of 1.7 gigaflops per processor.
| The Hitachi SR2201 reached a peak performance of 600 gigaflops in 1996 by using 2048 processors.
| The Intel Paragon had 1000 to 4000 Intel i860 processors and was ranked the fastest in the world in 1993.

SUPER-COMPUTER STATISTICS
[Slide graphic not captured in the text]

COMPARISON WITH THE PRESENT CRAY SUPERCOMPUTERS
[Slide graphic not captured in the text]

CRAY XK6 - ARCHITECTURE
| Four nodes per blade
| Adaptive hybrid computing
| Scalable compute nodes and I/O
| Gemini mezzanine
| Plug compatible with the Cray XE6 blade
| Configurable processor, memory and SXM GPU
| AMD Opteron 6200 Series processor:
  | Highly associative on-chip data cache supports aggressive out-of-order execution
  | Integrated memory controller
  | Significant performance advantage to algorithms
| NVIDIA Tesla 20-series: based on the next-generation CUDA GPU architecture codenamed “Fermi”

NODE ARCHITECTURE
[Slide graphic not captured in the text]

XK6 ACCELERATOR BLADE
[Slide graphic not captured in the text]

GEMINI INTERCONNECTION NETWORK
| Each Gemini ASIC acts as two nodes on a 3D torus
| Each node is served by a high-radix YARC router supporting up to 168 Gbps
| Parallel electrical and optical paths
  | High bandwidth and lower latency for both long and short messages
  | Low cost of integration
| Gemini mezzanine card to avoid bottlenecks between memory and the interconnection network (ICN)

NVIDIA TESLA X2090
| Special embedded version of the Tesla M2090
| Provides high-performance computing for highly parallel applications
| 512 cores with 6 GB of GDDR5 memory; can support 600+ GFLOPs
| High bandwidth to the host for quick master-slave communication
| CUDA capable for easy programmability

CRAY XK6 CABINETS
| Each cabinet has up to 96 processors
| Two processors wrapped in the form of a “blade” (XE6 compatible)
| With 1536 cores, a cabinet can deliver 70+ TFLOPs

SPECIFICATIONS
[Slide graphic not captured in the text]

PERFORMANCE - LUDWIG
| 10 cabinets of Cray XK6
| 936 GPUs (nodes)
| Only 4% deviation from perfect scaling between 8 and 936 GPUs
| Application sustaining 40+ Tflop/s and still scaling
| Strong scaling also very good, but physicists want to simulate larger systems

PERFORMANCE - HIMENO
| Parallel 3D Poisson equation solver benchmark
| Iterative loop evaluating a 19-point stencil
| Co-Array Fortran version of the code
| Fully ported to accelerators using 27 directive pairs
| Strong scaling
| Use asynchronous GPU data transfers and kernel launches to help avoid this

INDUSTRIAL ACCEPTANCE
| Oak Ridge National Laboratory Jaguar/Titan
  | High computation capacity for scientific research
  | 200 cabinets with more than 18,000 nodes
  | Estimated 10 to 20 PFLOPs
  | Currently upgrading from the XT5-based Jaguar system to the XK6-based Titan system with increased performance
| CSCS, Swiss National Supercomputing Centre
  | Cray XE6: 402 Tflops, 1496 nodes, Gemini interconnect
  | Cray XK6: 176 nodes with one AMD and one GPU element each

SUMMARY
| Higher supercomputing potential with GPU-accelerated computing
| Better inter-node communication with the Gemini optical interconnects
| Backward compatible with XE6 cabinets and can be merged with XE6 systems
| Highly suited to scientific research computations requiring computational power on the order of hundreds of TFLOPs

REFERENCES
| http://www.cray.com/Products/XK6/XK6.aspx
| CrayXK6Brochure.pdf
| http://en.wikipedia.org/wiki/Supercomputer
| http://i.top500.org/stats
| Applications on Cray XK6, Roberto Ansaloni
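
As a rough cross-check of the cabinet and LUDWIG figures quoted in the slides, the arithmetic can be sketched in Python. This is a back-of-the-envelope sketch, not a Cray sizing tool: the function names are hypothetical, the "600+" and "40+" values are treated as lower bounds, and one GPU per processor in a cabinet is an assumption (the slides state that pairing only for the CSCS nodes).

```python
# Figures quoted in the slides.
PROCESSORS_PER_CABINET = 96       # "up to 96 processors" per cabinet
CORES_PER_CABINET = 1536          # "1536 cores" per cabinet
GPU_PEAK_GFLOPS = 600.0           # "600+ GFLOPs", used here as a lower bound
LUDWIG_GPUS = 936                 # GPUs (nodes) in the LUDWIG run
LUDWIG_SUSTAINED_TFLOPS = 40.0    # "40+ Tflop/s", used here as a lower bound
LUDWIG_SCALING_DEVIATION = 0.04   # "4% deviation from perfect scaling"

# Assumption (not stated on the cabinet slide): one GPU per processor.
GPUS_PER_CABINET = PROCESSORS_PER_CABINET


def cores_per_processor(cores, processors):
    """Scalar cores per processor implied by the cabinet totals."""
    return cores // processors


def gpu_peak_tflops(gpus, gflops_per_gpu):
    """Aggregate GPU peak in TFLOPs for a given number of GPUs."""
    return gpus * gflops_per_gpu / 1000.0


def sustained_gflops_per_gpu(total_tflops, gpus):
    """Average sustained GFLOPs per GPU for an application run."""
    return total_tflops * 1000.0 / gpus


def scaling_efficiency(deviation):
    """Parallel efficiency implied by a fractional deviation from perfect scaling."""
    return 1.0 - deviation


if __name__ == "__main__":
    print("cores per processor:",
          cores_per_processor(CORES_PER_CABINET, PROCESSORS_PER_CABINET))
    print("GPU peak per cabinet (TFLOPs, lower bound):",
          gpu_peak_tflops(GPUS_PER_CABINET, GPU_PEAK_GFLOPS))
    print("LUDWIG sustained GFLOPs per GPU (lower bound):",
          round(sustained_gflops_per_gpu(LUDWIG_SUSTAINED_TFLOPS, LUDWIG_GPUS), 1))
    print("LUDWIG scaling efficiency:",
          scaling_efficiency(LUDWIG_SCALING_DEVIATION))
```

Under these assumptions the GPUs alone account for at least 57.6 TFLOPs per cabinet. The slides do not break down the remainder of the quoted 70+ TFLOPs.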
