Realizing GPU Computation at Scale
CS-Storm: Building a Productive & Reliable GPU Platform

John K. Lee, Vice President, Cluster Products
Maria Iordache, PhD, Product Management Director

Legal Disclaimer

Information in this document is provided in connection with Cray Inc. products. No license, express or implied, to any intellectual property rights is granted by this document. Cray Inc. may make changes to specifications and product descriptions at any time, without notice. All products, dates and figures specified are preliminary based on current expectations, and are subject to change without notice. Cray hardware and software products may contain design defects or errors known as errata, which may cause the product to deviate from published specifications. Current characterized errata are available on request. Cray uses codenames internally to identify products that are in development and not yet publicly announced for release. Customers and other third parties are not authorized by Cray Inc. to use codenames in advertising, promotion or marketing and any use of Cray Inc. internal codenames is at the sole risk of the user. Performance tests and ratings are measured using specific systems and/or components and reflect the approximate performance of Cray Inc. products as measured by those tests. Any difference in system hardware or software design or configuration may affect actual performance. The following are trademarks of Cray Inc. and are registered in the United States and other countries: CRAY and design, SONEXION, URIKA and YARCDATA. The following are trademarks of Cray Inc.: ACE, APPRENTICE2, CHAPEL, CLUSTER CONNECT, CRAYPAT, CRAYPORT, ECOPHLEX, LIBSCI, NODEKARE, THREADSTORM. The following are system family marks and trademarks of Cray Inc.: CS, CX, XC, XE, XK, XMT and XT. The registered trademark LINUX is used pursuant to a sublicense from LMI, the exclusive licensee of Linus Torvalds, owner of the mark on a worldwide basis. Other names and brands may be claimed as the property of others. Other product and service names mentioned herein are the trademarks of their respective owners. Copyright 2015 Cray Inc.

Agenda

● About Cray

● Building a Productive & Reliable dense GPU Platform – CS-Storm
● Realizing GPU computation at scale

K40 or K80 inside

About Cray

Seymour Cray founded Cray Research in 1972
• 1972-1996: Cray Research grew to leadership in supercomputing
• 1996-2000: Cray was a subsidiary of SGI
• 2000-present: Cray Inc., growing to $561M in revenue in 2014

Cray Inc.
• NASDAQ: CRAY
• Over 1,000 employees across 30 countries
• Headquartered in Seattle, WA

Three Focus Areas
• Computation
• Storage
• Analytics

Seven Major Development Sites:
• Austin, TX
• San Jose, CA
• Chippewa Falls, WI
• Seattle, WA
• Pleasanton, CA
• Bristol, UK
• St. Paul, MN

Cray’s Vision: The Fusion of Supercomputing and Big & Fast Data

Modeling The World: Cray is solving “grand challenges” in science, engineering and analytics

• Math Models: Modeling and simulation augmented with data to provide the highest fidelity virtual reality results
• Data Models: Integration of datasets and math models for search, analysis, predictive modeling and knowledge discovery
• Data-Intensive Processing: High throughput event processing & data capture from sensors, data feeds and instruments

Compute Store Analyze

Supercomputing Leadership: Top 500 Supercomputers in the World, November 2014

                Top 50    Top 100      Top 500
Cray Systems    16        28           62
Vendor Rank     #1        #1 (tied)    #3

Cray has vast experience building very large scale, GPU-based HPC systems

● Cray has more accelerated systems in the Top500 than anyone
● 75 of the Top500 are systems with accelerators
● 14 of these are made by Cray

● Cray systems have more GPU performance (Rmax) on Top500 than all others combined!

● GPUs supported on both Cray’s cluster systems (CS) and supercomputer (XC) lines – multiple of each in Top500

● CS-Storm dense-GPU computing platform, with 8 GPUs / server, launched in Aug 2014:
   ● #10 on Top500, Nov 2014
   ● #4 on Green500, Nov 2014

Background on CS-Storm

● Started as a custom engineering project in late 2013
● Customer-funded project to build the highest performing PCIe-attached accelerator platform
   ● Able to run the host processors at peak performance
   ● Able to run accelerators at peak performance for cards up to 300 W
● Successfully delivered a production system with over 4,480 K40 GPUs in a single system
   ● #10 on last year’s Top500 list
   ● Another system is #4 on the Green500 list
● Designed to support any full-height, double-width PCIe accelerators
● Currently supports K40 and K80 GPUs

CS-Storm Design: What makes it special?

Efficient air-cooled 2U design, with enough power and cooling capacity to operate eight current and future GPUs and CPUs at peak level without capping their performance
• Pull/push fan architecture efficiently cools the chassis, providing a consistent temperature across the system and its accelerators

Most efficient use of PCIe bandwidth to operate the eight GPUs at full performance

• Cray R&D performed a comprehensive signal integrity study to determine the electrical signaling capabilities of the PCIe bus

Innovative, easy-to-maintain server design, providing easy access to all components
• 3 x 1,630 W power supplies capable of 2+1 redundancy (a rough power-budget check is sketched below)
• Native 480 V and 208 V server power supplies – increasing power efficiency

Room-neutral cooling via optional rear-door heat exchangers
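As a rough sanity check on the 2+1 power-supply claim, the arithmetic below estimates a worst-case node power budget. The host-side figure is an illustrative assumption, not a Cray specification; only the 300 W card limit and the 3 x 1,630 W supplies come from the slides.

```python
# Rough 2+1 redundancy check for a CS-Storm node (illustrative numbers only).
GPU_COUNT = 8
GPU_MAX_W = 300          # slide: power/cooling sized for cards up to 300 W
HOST_EST_W = 700         # assumption: CPUs + DIMMs + SSDs + fans (not a Cray spec)

PSU_W = 1630
PSUS_TOTAL = 3
PSUS_AVAILABLE = PSUS_TOTAL - 1   # 2+1 redundancy: one supply may fail

node_load = GPU_COUNT * GPU_MAX_W + HOST_EST_W
redundant_capacity = PSUS_AVAILABLE * PSU_W

print(f"Estimated node load: {node_load} W")                     # 3100 W with these assumptions
print(f"Capacity with one PSU failed: {redundant_capacity} W")   # 3260 W
print("2+1 redundancy holds" if node_load <= redundant_capacity else "over budget")
```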

CS-Storm: Innovative Design

• Six local SSD drives
• Host processors: IVB / HSW (Intel Ivy Bridge / Haswell)

• 512 GB / 1,024 GB max, 16 DIMMs

• 2 x 4 NVIDIA K40 or K80 GPUs, 11.4 or 15 TF/node
• 2U form factor: 22 nodes / 48U rack, 176 GPUs / 48U rack
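The per-node and per-rack figures above follow from the per-card base-clock DP numbers quoted later in the deck (1.43 Tflops for K40, 1.87 Tflops for K80); a small sketch of that arithmetic, counting GPU flops only:

```python
# Node- and rack-level arithmetic behind the density figures (GPU contribution only).
CARDS_PER_NODE = 8          # 2 cages x 4 cards
NODES_PER_RACK = 22

dp_tf_per_card = {"K40": 1.43, "K80": 1.87}   # base-clock DP peak, from the spec table

for card, tf in dp_tf_per_card.items():
    node_tf = CARDS_PER_NODE * tf
    print(f"{card}: ~{node_tf:.1f} TF/node, "
          f"{NODES_PER_RACK * CARDS_PER_NODE} GPUs per 48U rack")
# K40: ~11.4 TF/node; K80: ~15.0 TF/node; 176 GPUs per rack
```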

CS-Storm: Server PCIe Layout (Haswell motherboard)

[Diagram: server PCIe layout showing two high-speed network connections (Network 1 and Network 2), the QPI link between host sockets, and performance-optimized vs. GPU-optimized PCIe configurations]

GPU Cage Serviceability

24”-wide chassis provides more room than standard 19” chassis to effectively service and cool the GPUs

Individual GPU Access

● Each cage holds 4 GPUs
● Each GPU assembly can be removed easily through a blind-mate quick disconnect

[Figure: GPU assembly]

Future-Proof Design

PCIe Gen3 standard size

K40 / K80

Maximum PCIe card dimensions supported: 39.06 mm x 132.08 mm x 313.04 mm

CS-Storm System: Cooling Performance

CS-Storm’s pull/push fan architecture efficiently cools the chassis, providing a consistent temperature across the system and its coprocessors. CS-Storm is designed to provide enough air flow to keep all GPUs cool under the most challenging compute workloads.

GPU #   Temperature (C)   Power Consumption
1       63                213 W
2       64                222 W
3       63                214 W
4       61                218 W
5       53                218 W
6       53                215 W
7       57                218 W
8       51                213 W

. Measured at the Chippewa Falls manufacturing facility, with 27 C ambient temperature
. DGEMM was running on all GPUs at the time of record capture
. The table shows the “hottest” of the 22 nodes in the rack

CS-Storm System: Efficient, Room-Neutral Cooling
Customized 48U RDHx with 64.2 kW max cooling

Rear-door heat exchangers are available in both 42U and 48U heights

Software at Scale: Partnerships deliver a complete software ecosystem
Essential software and management tools needed to build a powerful, flexible and highly available supercomputer

Development & Performance Tools: Intel® Parallel Studio XE Cluster Edition, PGI Cluster Development Kit®, NVIDIA® CUDA®, GNU toolchain, Cray PE on CS*

HPC Programming Tools:
• Application Libraries: Intel® MPI, MKL; Platform MPI; MVAPICH2; OpenMPI; Cray LibSci, LibSci_ACC
• Debuggers: Allinea DDT, MAP; Intel® IDB; PGI PGDBG®; GNU GDB; Rogue Wave TotalView®

Schedulers, File Systems and Management:
• Resource Management / Job Scheduling: SLURM; Adaptive Computing MOAB® / Maui / Torque; Altair PBSPro; Grid Engine; IBM Platform™ LSF®
• File Systems: Lustre®; NFS; GPFS; PanFS®; Local (ext3, ext4, XFS)
• Cluster Management: Cray® Advanced Cluster Engine (ACE™) Management Software**

Drivers and Operating Systems:
• Network Mgmt. Drivers: Accelerator software stack & drivers; OFED™
• Operating Systems: Linux® (RedHat, CentOS)**

Legend:
* Cray® PE on CCS v1.0 includes Cray Compiling Environment, Cray Scientific and Math Libraries and Cray Performance Measurement and Analysis Tools
** ACE Management Servers are delivered with Red Hat Linux (compute nodes: all operating systems; management nodes: RedHat only)

CS-Storm Performance… 8 x K80 or 8 x K40 at system level

What to expect from K80 vs K40?

Features                          Tesla K80 (1)                          Tesla K40
GPU                               2x Kepler GK210                        1 Kepler GK110B
Peak DP Flops                     2.91 Tflops (GPU Boost clocks)         1.66 Tflops (GPU Boost clocks)
                                  1.87 Tflops (base clocks)              1.43 Tflops (base clocks)
Peak SP Flops                     8.74 Tflops (GPU Boost clocks)         5 Tflops (GPU Boost clocks)
                                  5.6 Tflops (base clocks)               4.29 Tflops (base clocks)
Memory bandwidth (ECC off) (2)    480 GB/sec (240 GB/sec per GPU)        288 GB/sec
Memory size (GDDR5)               24 GB (12 GB per GPU)                  12 GB
CUDA cores                        4992 (2496 per GPU)                    2880
Max power per GPU                 300 W                                  235 W

From http://www.nvidia.com/object/tesla-servers.html and NVIDIA presentations
(1) Tesla K80 specifications are shown as the aggregate of two GPUs.
(2) With ECC on, 6.25% of the GPU memory is used for ECC bits. For example, 6 GB total memory yields 5.25 GB of user-available memory with ECC on.

● K80 is made of 2 GPU ASICs; the form factor is the same as K40, power capped at 300 W (each GPU is capped at 150 W)

Based on the K80 vs K40 (card) performance:

Good:
● 2x the memory in the same form factor is great for embarrassingly parallel applications that store data on the GPUs
● Some applications that our CS-Storm customers run take full advantage of the larger memory; 2x performance is possible (e.g. GIS)
● Single precision performance with K80 is great (for O&G seismic)
● NVIDIA auto boost technology in K80 vs manual boost with K40

TBD:
● For compute-intensive applications, the overall performance with K80 is expected to be ~30-70% higher than with K40 (not 2x!), depending on the application
● Not clear yet how applications that need lots of communication between GPUs and CPU / memory will perform
● Not enough experience with GPUDirect RDMA to understand its performance implications for specific applications
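To make the numbers concrete, here is a short sketch of the ECC memory overhead and the card-level ratios implied by the table; the 6.25% figure is NVIDIA's quoted ECC reservation, and everything else is taken directly from the spec values above.

```python
# ECC overhead and K80-vs-K40 card ratios, using the spec-table numbers above.
ECC_RESERVED = 0.0625     # NVIDIA: 6.25% of GDDR5 is reserved for ECC bits when ECC is on

def usable_memory_gb(total_gb, ecc_on=True):
    """Memory left for the application once the ECC reservation is taken out."""
    return total_gb * (1.0 - ECC_RESERVED) if ecc_on else total_gb

print(f"K80, ECC on: {usable_memory_gb(24):.2f} GB of 24 GB usable")   # 22.50 GB
print(f"K40, ECC on: {usable_memory_gb(12):.2f} GB of 12 GB usable")   # 11.25 GB

# Card-level ratios (base clocks) behind the "~30-70%, not 2x" expectation.
k80_dp, k40_dp = 1.87, 1.43
k80_bw, k40_bw = 480, 288
print(f"DP peak ratio (base clocks): {k80_dp / k40_dp:.2f}x")    # ~1.31x
print(f"Memory bandwidth ratio:      {k80_bw / k40_bw:.2f}x")    # ~1.67x
```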

[Charts: single precision and double precision test results]

What to expect from K80 vs K40 at node level?

● Understand GPU performance with and without boost
● Run HPL on a node w/ 8 GPU cards with boost, analyze the results… try boost off…
● Set run parameters and optimize HPL at node level… (it takes a lot of fiddling…)

Boost on (baseline: default settings, no clock setting):
            1 GPU card:          1 server = 8 GPUs:
            Peak TF w/ boost     R_max     R_peak    Efficiency
K40         1.66                 8.98      11.94     75.2%
K80         2.91                 12.62     15.46     81.6%
K80 vs K40  175%                 141%      129%

Boost off (fixed clock):
            1 GPU card:          1 server = 8 GPUs:
            Peak TF, no boost    R_max     R_peak    Efficiency
K40         1.43                 9.65      11.94     80.8%
K80         1.87                 13.10     15.46     84.7%
K80 vs K40  131%                 136%      129%

Observations:
● Can we do better? We know that w/ K40 we can…
● Lots of iterations to find the optimal settings at node level
● Looks very good at node level; iterate to get the best efficiency
● May be expected for an app that runs mostly on the GPU when the cores are not loaded equally; at this point it makes sense, in line with the peak ratio
● Driving more GPUs and sharing a path, so not expecting it to be the same number all the time – a “known secret” in performance reporting…
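The efficiency columns in the tables are simply R_max / R_peak; a minimal sketch of how they and the K80-vs-K40 ratios are derived from the measured values above:

```python
# HPL efficiency and K80/K40 ratios at node level, from the measured numbers above.
# Rmax = measured HPL TF for one 8-GPU server; Rpeak = theoretical node peak TF.
nodes = {
    # card: (Rmax boost-on defaults, Rmax boost-off fixed clock, Rpeak)
    "K40": (8.98, 9.65, 11.94),
    "K80": (12.62, 13.10, 15.46),
}

for card, (rmax_boost, rmax_fixed, rpeak) in nodes.items():
    print(f"{card}: boost-on efficiency = {rmax_boost / rpeak:.1%}, "
          f"boost-off efficiency = {rmax_fixed / rpeak:.1%}")

ratio = nodes["K80"][1] / nodes["K40"][1]
print(f"K80 vs K40 node Rmax (boost off): {ratio:.0%}")   # ~136%
```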

What to expect from K80 vs K40 at system level?

● Use the node-level parameters and expand: 4 nodes, 8 nodes… to system level
● We did have results for a similar 22-node CS-Storm system w/ K40s – our Green500 submission from November, “Storm1”

Boost off (fixed clock):
            1 GPU card:         1 server = 8 GPUs:               1 full rack = 22 servers = 176 GPUs:
            Peak TF, no boost   R_max    R_peak    Efficiency    R_max    R_peak    Efficiency
K40         1.43                9.65     11.94     80.8%         180      259       69.5%
K80         1.87                13.1     15.46     84.7%         233      339       68.7%
K80 vs K40  131%                136%     129%                    129%     131%

Note: expect somewhere around this rack-level efficiency if using the same HPL code/settings…

● Comparable system performance:
   ● reflects the difference in peak GPU card performance: ~30% more w/ K80 at node and at system level
   ● efficiency stays in the expected range
● HPL tuning makes a difference – we started with an HPL tuned for Cray XC40 with K20s (for the Kepler architecture / generation)
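A simple way to set expectations before a full-rack run is to scale the node-level result and compare against the efficiency actually achieved on the 22-node system. The sketch below reproduces the rack-level numbers from the tables above; the node-to-rack retention factor is derived from those measurements, not a separate input.

```python
# Projecting a 22-node rack from node-level HPL results (numbers from the tables above).
NODES = 22

racks = {
    # card: (node Rmax, node Rpeak, measured rack Rmax, rack Rpeak) in TF
    "K40": (9.65, 11.94, 180.0, 259.0),
    "K80": (13.10, 15.46, 233.0, 339.0),
}

for card, (node_rmax, node_rpeak, rack_rmax, rack_rpeak) in racks.items():
    ideal_rack = NODES * node_rmax              # perfect scaling of the node result
    retention = rack_rmax / ideal_rack          # fraction kept when going to 22 nodes
    print(f"{card}: ideal 22-node Rmax {ideal_rack:.0f} TF, "
          f"measured {rack_rmax:.0f} TF ({retention:.0%} retained), "
          f"rack efficiency {rack_rmax / rack_rpeak:.1%}")
# K40: ~85% of the ideal node scaling retained, 69.5% rack efficiency
# K80: ~81% retained, 68.7% rack efficiency
```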

CS-Storm: Compute Density and Efficiency Leader

Performance per rack:
. 176 x NVIDIA Tesla GPUs, 22 servers
. 180 TF Linpack performance w/ K40
. 233 TF Linpack performance w/ K80 (1)
. 3.96 TF/kW w/ K40 – #4 Green500 @ Nov ’14

Power and space efficiency:
. Performance density (flops / sqft):
   • ~30% more w/ K80 than K40 (2)
   • 6x compute-only blades (4)
. Power efficiency (flops / Watt):
   • ~14% more w/ K80 than K40 (3)
   • 4x compute-only blades (4)

For the right GPU-optimized applications, the performance improvements and power and floor space savings offered by the CS-Storm system are leading the industry

(1) Early result with K80; (2) based on measured Linpack performance w/ a 22-node rack; (3) estimated; (4) estimated using IVB processors

How do we start understanding a system’s behavior at the system sizes Cray deals with?

● Build up from what you know: use HPL, which has a “built-in correct answer”, to understand how the system behaves…
● Start at node level w/ default settings and measure the airflow, temperature variations, clock speeds and power variations on the GPUs within a node
● Understand at node level how to set parameters for HPL, then increase the size of the simulation…
● Look for the expected scaling behavior of HPL – normally the efficiency for GPU systems should stay constant / smooth as size increases if tuned correctly

● So… we build a very large system in manufacturing, and run the “safe” type of test on the whole system…
   ● What do you do if the performance is terrible? How do you find where the issue is?
   ● Stepwise: test nodes and pairs of nodes and look for any anomalies (a toy version of this scan is sketched below)
   ● Within nodes vs across nodes – study the patterns in the results
   ● You need good cluster management software with a GUI that lets you see what is happening in the system during the test, in real time (Cray ACE is good for this!)

● 560-node system after one night of work (not much time for tuning here!)…  #10 on Top500, Nov ‘14
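A hedged illustration of the "stepwise" debugging idea: run single-node HPL everywhere, then flag nodes whose efficiency falls noticeably below the pack. This is a generic sketch with hypothetical per-node results; it is not Cray ACE functionality or the actual bring-up scripts used on the 560-node system.

```python
# Toy anomaly scan over per-node HPL results: flag nodes well below the median.
# Generic illustration only; not the ACE tooling or the real bring-up scripts.
from statistics import median

NODE_RPEAK_TF = 11.94   # node R_peak from the K40 table above

def flag_slow_nodes(node_rmax_tf, tolerance=0.05):
    """Return nodes whose HPL efficiency is more than `tolerance` below the median."""
    eff = {node: rmax / NODE_RPEAK_TF for node, rmax in node_rmax_tf.items()}
    med = median(eff.values())
    return {node: e for node, e in eff.items() if e < med - tolerance}

# Hypothetical single-node HPL results (TF) for a handful of nodes:
results = {"node001": 9.61, "node002": 9.58, "node003": 8.02, "node004": 9.64}
print(flag_slow_nodes(results))   # node003 stands out and gets looked at first
```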

Many GPUs in Oil and Gas Applications
Almost linear scaling on the GPUs – CS-Storm, two-node test

SPECFEM3D Strong Scaling, using a complex model on multi-GPU CS-Storm servers

SPECFEM3D:
• Seismology community code, proxy for seismic applications
• CUDA version developed by Daniel Peter, ETH
• Data courtesy of BP & Princeton (3D elastic, isotropic model)

[Chart: SPECFEM3D speed-up vs. number of K40 GPUs (1, 2, 4, 8, 16) for the simple_model and BP Demo cases, compared against ideal scaling]

SpecFEM3D linear scaling on K40 and K80

SpecFEM3D Strong Scaling: K40 to K80 Performance Improvements

[Chart: wall clock time (sec) vs. number of GPU cards, for K40 and K80 nodes; linear scaling with the number of GPUs; ~½ the run time per node by using K80 nodes]

• Almost perfect scaling by adding more GPUs, and with K80 vs K40
• K80 nodes showing a clear performance advantage: ~2x the performance
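For readers reproducing the scaling plots, speed-up and parallel efficiency are just ratios of wall-clock times. A minimal sketch follows; the timings used are hypothetical placeholders, not the measured SPECFEM3D data shown in the charts.

```python
# Strong-scaling speed-up and parallel efficiency from wall-clock times.
# Timings below are hypothetical placeholders, not the measured SPECFEM3D results.
def strong_scaling(times_by_gpus):
    base_gpus = min(times_by_gpus)            # smallest GPU count is the baseline
    t_base = times_by_gpus[base_gpus]
    for n, t in sorted(times_by_gpus.items()):
        speedup = t_base / t
        efficiency = speedup / (n / base_gpus)
        print(f"{n:2d} GPUs: speed-up {speedup:5.2f}x, parallel efficiency {efficiency:.0%}")

strong_scaling({1: 400.0, 2: 205.0, 4: 104.0, 8: 53.0, 16: 28.0})
```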

Real-time Geospatial Visualization on CS-Storm
Use Case: GIS Federal – GPUdb

Challenge: Visualize fast changing geospatial data in response to a wide range of ad-hoc user queries.

Solution: GPUdb leverages CS-Storm by
● Holding partitioned geo- and time-coded data in GPU memory
● Leveraging the brute power of GPUs to respond to ad-hoc queries – without query optimization
● Scaling across multiple GPUs per node and multiple Cray CS-Storm nodes to handle billions of entries

For GPUdb, processing speed is proportional to the available GPU memory – K80 will be an advantage

[Chart: billions of Tweets handled – generic 2-GPU server vs. a single CS-Storm server (2U) and a 22-server rack]

“CS-Storm’s dense, tightly integrated GPU architecture enables us to process 4 times more data compared with commodity servers, with relatively low power consumption.”
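To illustrate the "brute force, no query optimizer" idea, here is a minimal sketch of an ad-hoc geo/time filter over data resident in GPU memory, written with CuPy. It is only an analogy: GPUdb's actual API, storage layout and kernels are not described in this deck, and all names and data below are made up for the example.

```python
# Brute-force ad-hoc query over geo/time-coded points held in GPU memory (CuPy sketch).
# Illustrative analogy only; this is not GPUdb's API or data layout.
import cupy as cp

N = 10_000_000
# Synthetic partition of records kept resident on one GPU: lon, lat, unix timestamp.
lon = cp.random.uniform(-180, 180, N)
lat = cp.random.uniform(-90, 90, N)
ts = cp.random.randint(1_400_000_000, 1_420_000_000, N)

def adhoc_query(lon_min, lon_max, lat_min, lat_max, t_min, t_max):
    # No index, no planner: evaluate the predicate over every record in parallel.
    mask = ((lon >= lon_min) & (lon <= lon_max) &
            (lat >= lat_min) & (lat <= lat_max) &
            (ts >= t_min) & (ts <= t_max))
    return int(mask.sum())   # count of matches; a real system would feed them to the viz layer

print(adhoc_query(-123, -121, 37, 38, 1_410_000_000, 1_411_000_000))
```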

What industries and applications would benefit from such a high density GPU system?

Industry – Applications:
• Defense / Security / Cybersecurity: Information and image processing, geospatial intelligence, pattern recognition (radar)
• Financial Markets: High-frequency trading (HFT)
• Oil and Gas: Seismic processing, simulation and modeling
• Life Sciences: Structural biology, medical imaging, genomics, biomarker discovery/analysis
• Media and Entertainment: Image rendering
• National Labs, Large Research Institutions (science): Modeling and analytics; physics and astrophysics; computer science and information science
• University Research Centers: Workloads limited to the CS-Storm application fit; research in computer science
• Weather, Climate and Remote Sensing: New or redeveloped climate and weather models developed specifically to use GPUs; remote sensing data processing applications
• Business Intelligence: Information processing, machine learning
• Across Industries: Signal processing (data/signals, voice, images, text), machine learning / “deep learning”

Note: Cray already has customers in these areas (all items in this column…)

CS-Storm systems shipped - Pictures from Cray Manufacturing @ Chippewa Falls, WI

• #10 system in Top500: 560 nodes w/ K40, shipped Oct 2014
• Financial services system: nodes w/ K40, shipped Dec 2014

• Newest system, w/ K80 nodes: just shipped last week to a university customer in the Bay Area!

. High density, accelerated system
. Power and space efficient
. Built and supported by Cray

Many thanks to:

• Kevin McMahon from Cray’s performance team

• NVIDIA alliances and benchmarking team

Cray CS-Storm: Uncompromising Performance

Powerful and Efficient
• Maximum performance in a single rack
• Power and cooling to spare
• Allows GPUs to run at full power; future-proof

Performance by Design
• Optimized for scalable GPU applications
• Full-system solution featuring Cray management software and Cray Programming Environment
• Designed for upgradeability to protect your investment

Cray Service and Reliability
• Reliability, redundancy and serviceability
• Cray expertise

Simply, CS-Storm was designed to be the best GPU system for the most demanding customers

Safe Harbor Statement

This presentation may contain forward-looking statements that are based on our current expectations. Forward-looking statements may include statements about our financial guidance and expected operating results, our opportunities and future potential, our product development and new product introduction plans, our ability to expand and penetrate our addressable markets and other statements that are not historical facts. These statements are only predictions, and actual results may materially vary from those projected. Please refer to Cray's documents filed with the SEC from time to time concerning factors that could affect the Company and these forward-looking statements.
