Cray XK6 Redefining Supercomputing

Sanjana Rakhecha, Nishad Nerurkar

CONTENTS
  • Introduction
  • History
  • Specifications
  • Cray XK6
  • Architecture
  • Performance
  • Industry acceptance and applications
  • Summary

INTRODUCTION
  • The Cray XK6 supercomputer is a trifecta of scalar, network and many-core innovation.
  • Hybrid supercomputer
  • Combination of Cray's Gemini interconnect, AMD's leading multi-core scalar processors and NVIDIA's powerful many-core GPU processors
  • Enhanced version of the Cray XE6
  • Uses the same blade architecture as the Cray XE6
  • Capable of scaling to 500,000 scalar processors and 50 petaflops of hybrid peak performance

HISTORY
  • In 1988, Cray Research introduced the Cray Y-MP, the world's first supercomputer to sustain over 1 gigaflop on many applications.
  • Fujitsu's Numerical Wind Tunnel supercomputer used 166 vector processors to gain the top spot in 1994, with a peak speed of 1.7 gigaflops per processor.
  • The Hitachi SR2201 reached a peak performance of 600 gigaflops in 1996 by using 2048 processors.
  • The Intel Paragon had 1000 to 4000 Intel i860 processors and was ranked the fastest in the world in 1993.

SUPER-COMPUTER STATISTICS

COMPARISON WITH THE PRESENT CRAY SUPERCOMPUTERS

CRAY XK6 - ARCHITECTURE
  • Four nodes per blade
  • Adaptive hybrid computing
  • Scalable compute nodes, I/Os
  • Gemini Mezzanine
  • Plug-compatible with the Cray XE6 blade
  • Configurable processor, memory and SXM GPU
  • AMD Opteron 6200 Series processor:
      • Highly associative on-chip data cache supports aggressive out-of-order execution
      • Integrated memory controller
      • Significant performance advantage to algorithms
  • The NVIDIA Tesla 20-series: based on the next-generation CUDA GPU architecture codenamed “Fermi”

NODE ARCHITECTURE

XK6 ACCELERATOR BLADE

GEMINI INTERCONNECTION NETWORK
  • Each node acts as 2 nodes on a 3D torus
  • Each node is provided with a high-radix YARC router supporting up to 168 Gbps
  • Parallel electrical and optical paths:
      • High bandwidth and lower latency for both long and short messages
      • Low cost of integration
  • Gemini Mezzanine card to avoid bottlenecks between memory and the interconnection network (ICN)

NVIDIA TESLA X2090
  • Special embedded version of the Tesla M2090
  • Provides high-performance computing for highly parallel applications
  • 512 cores with 6 GB of GDDR5 memory; supports 600+ GFLOPS
  • High bandwidth to the host for quick master-slave communication
  • CUDA capable for easy programmability

CRAY XK6 CABINETS
  • Each cabinet has up to 96 processors
  • Two processors wrapped in the form of a “blade” (XE6 compatible)
  • With 1536 cores, a cabinet can deliver 70+ TFLOPS

SPECIFICATIONS

PERFORMANCE - LUDWIG
  • 10 cabinets of Cray XK6
  • 936 GPUs (nodes)
  • Only 4% deviation from perfect scaling between 8 and 936 GPUs
  • Application sustaining 40+ TFLOPS and still scaling
  • Strong scaling is also very good, but physicists want to simulate larger systems

PERFORMANCE - HIMENO
  • Parallel 3D Poisson equation solver benchmark
  • Iterative loop evaluating a 19-point stencil
  • Co-Array Fortran version of the code
  • Fully ported to accelerators using 27 directive pairs
  • Strong scaling
  • Asynchronous GPU data transfers and kernel launches are used to help hide transfer overhead

INDUSTRIAL ACCEPTANCE
  • Oak Ridge National Laboratory: Jaguar/Titan
      • High computation capacity for scientific research
      • 200 cabinets with more than 18,000 nodes
      • Estimated 10 to 20 PFLOPS
      • Currently upgrading from the XT5-based Jaguar system to the XK6-based Titan system with increased performance
  • CSCS, Swiss National Supercomputing Centre
      • Cray XE6: 402 TFLOPS, 1496 nodes, Gemini interconnects
      • Cray XK6: 176 nodes with one AMD and one GPU element each

SUMMARY
  • Higher supercomputing potential with GPU-accelerated computing
  • Better inter-node communication with the Gemini optical interconnects
  • Backward compatible with XE6 cabinets and can be merged with XE6 systems
  • Highly suited to scientific research computations requiring computational power on the order of hundreds of TFLOPS

REFERENCES
  • http://www.cray.com/Products/XK6/XK6.aspx
  • CrayXK6Brochure.pdf
  • http://en.wikipedia.org/wiki/Supercomputer
  • http://i.top500.org/stats
  • Roberto Ansaloni, Applications on Cray XK6.
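
The Ludwig result quoted in the performance section ("only 4% deviation from perfect scaling between 8 and 936 GPUs") reads as a weak-scaling figure: work per GPU stays fixed, so the ideal runtime per step is constant as GPUs are added. A minimal sketch of how such a deviation could be computed is given below. The function names and the two timings are hypothetical and chosen only for illustration; they are not measurements from the slides.

```python
def weak_scaling_efficiency(t_base: float, t_scaled: float) -> float:
    """Weak-scaling efficiency.

    With fixed work per GPU, perfect scaling means the time per step
    does not change, so efficiency is the ratio t_base / t_scaled.
    """
    if t_base <= 0 or t_scaled <= 0:
        raise ValueError("timings must be positive")
    return t_base / t_scaled


def deviation_from_perfect(t_base: float, t_scaled: float) -> float:
    """Fractional shortfall from perfect weak scaling (0.0 means perfect)."""
    return 1.0 - weak_scaling_efficiency(t_base, t_scaled)


if __name__ == "__main__":
    # Hypothetical seconds per time step; assumed values, not data from the deck.
    t_8_gpus = 1.00
    t_936_gpus = 1.04
    dev = deviation_from_perfect(t_8_gpus, t_936_gpus)
    print(f"deviation from perfect scaling: {dev:.1%}")
```

For strong scaling (fixed total problem size), the analogous quantity would instead compare the measured speedup with the ratio of GPU counts.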
