The Arm Ecosystem Approach to HPC

Current Products and Future Outlook

Dan Ernst, PhD Advanced Technology , Inc.

Applications to Architectures An Observation…

4

3.5

3

2.5

2

Speedup 1.5

1

0.5

0 1 2 3 4 56 7 Anonymized CPUs

Memory Bandwidth "CrayMarks"

Copyright 2018 Cray Inc. 2 An Observation…

16

14

12 Industry Roadmap 10 Trajectory

8

Speedup 6 Trajectory for 4 Applications

2

0 1 2 3 4 5 6 7 Anonymized CPUs

Memory Bandwidth "CrayMarks" Peak FP Rate

Copyright 2018 Cray Inc. 3 Application-Focused Design: The Ecosystem Way

1. Decide you want to build hardware to address a certain market 2. Design an ISA with the right features 3. Design/Build Cores 4. Design memories & interfaces 5. Test Everything 6. Write an OS 7. Write a (good) compiler 8. Port all software to run to your ISA 9. Go to Market! 10.Go back to step 1 Could this ecosystem allow more entrants into Server and HPC?

Copyright 2018 Cray Inc Copyright 2018 Cray Inc Supercomputing …Made Possible by Cray

CRAY CATAPULTS ARM-BASED PROCESSORS INTO SUPERCOMPUTING Cray Adds Arm Processors with Complete Software Stack to the Cray XC50

Seattle, WA – November 13, 2017 – Global supercomputer leader Cray Inc. (Nasdaq: CRAY) today announced the Company is creating the world’s first production-ready, Arm®-based supercomputer with the addition of Cavium (Nasdaq: CAVM) ThunderX2™ processors, based…

Programming Environment

Copyright 2018 Cray Inc Open Source is Ready

From SC17 Panel: “The Arm Software Ecosystem: Are We There Yet?” Data Collected by CJ Newburn, Nvidia Copyright 2018 Cray Inc Cray Programming Environment for XC Systems

Programming Programming Programming Libraries Development Development Languages Models Environments Tools Tools

Distributed Memory PrgEnv- Scientific Libraries Environment setup Performance Analysis

Fortran Cray MPI Cray Compiling LAPACK Modules CrayPAT Environment SHMEM PrgEnv-cray ScaLAPACK C Cray Apprentice2 BLAS Debuggers GNU TotalView C++ Iterative Refinement Shared Memory / GPU PrgEnv-gnu Porting Toolkit DDT OpenMP Chapel Reveal OpenACC 3rd Party compilers FFTW gdb4hpc (Intel, PGI) Global Arrays CCDB Python PrgEnv-intel, etc. I/O Libraries Debugging Support PGAS & Global View R NetCDF UPC Abnormal Termination Fortran coarrays Processing (ATP) ML Frameworks HDF5 Coarray C++ Cray PE ML Plugin Chapel STAT for TensorFlow

Cray Developed 3rd party packaging Cray added value to 3rd party Licensed ISV SW

Copyright 2018 Cray Inc Cray Programming Environment for XC50 with ARM

Programming Programming Programming Libraries Development Development Languages Models Environments Tools Tools

Distributed Memory PrgEnv- Scientific Libraries Environment setup Performance Analysis

Fortran Cray MPI Cray Compiling LAPACK Modules CrayPAT Environment SHMEM PrgEnv-cray ScaLAPACK C Cray Apprentice2 BLAS Debuggers GNU TotalView C++ Iterative Refinement Shared Memory / GPU PrgEnv-gnu Porting Toolkit DDT OpenMP Chapel Reveal OpenACC FFTW 3rd Party compiler gdb4hpc Global Arrays PrgEnv-allinea CCDB Python I/O Libraries Debugging Support PGAS & Global View R NetCDF UPC Abnormal Termination Fortran coarrays Processing (ATP) ML Frameworks HDF5 Coarray C++ Cray PE ML Plugin Chapel STAT for TensorFlow

Cray Developed 3rd party packaging Cray added value to 3rd party Licensed ISV SW

Copyright 2018 Cray Inc Getting from observation to application gains requires all ecosystem pieces!

Copyright 2018 Cray Inc Forward-Looking Ecosystem for Future Technologies

Copyright 2018 Cray Inc Future Technology Investigations

● Began long-term R&D partnership with Arm, Vulcan Team, and customers in 2012 ● DOE FastForward 2: Cray/Arm/Broadcom Partner collaborators: ● Exploring architectures to most effectively address HPC applications Eric Van Hensbergen Nigel Stephens ● Covered areas from architecture, software, and management Matt Horsnell Alejandro Rico ● Actively working with Cavium on feature Jose Joao sets for upcoming HPC processors ● DOE PathForward: Cray/Arm/Cavium Rabin Sugumar Giri Chukkapalli David Hass

… and MANY others! Copyright 2018 Cray Inc FF2: Partnership on Future Architectures

Cray, Arm, and Cavium teams have collaborated on future HPC-related technologies:

● Cray contributed to the development of the Arm Scalable Vector Extension (SVE) ● Leverages ISA elements pioneered by Cray ● Provided guidance on compiler-friendliness of ISA ● Explored and developed the ecosystem of HPC- relevant future technologies, including off-chip interfaces, memories, and software infrastructure ● Focus on open standards to maintain ecosystem benefits ● Partnered with US DOE to understand impact on end-user applications ● Application-driven architecture REQUIRES understanding!

Copyright 2018 Cray Inc Cray Collaboration Evident in ARM SVE

Feature Benefit Scalable vector length (VL) Increased parallelism while allowing implementation choice of VL Supports a programming paradigm of write-once, run-anywhere VL agnostic (VLA) programming scalable vector code Enables vectorization of complex data structures with non-linear Gather-load & Scatter-store access patterns Enables vectorization of complex, nested control code containing Per-lane predication side effects and avoidance of loop heads and tails (particularly for VLA) Predicate-driven loop control and management Reduces vectorization overhead relative to scalar code Vector partitioning and SW managed speculation Permits vectorization of uncounted loops with data-dependent exits Extended integer and floating-point horizontal Allows vectorization of more types of reducible loop-carried reductions dependencies Supports vectorization of loops containing complex loop-carried Scalarized intra-vector sub-loops dependencies

Nigel Stephens – https://community.arm.com/processors/b/blog/posts/technology-update-the-scalable-vector-extension-sve-for-the-armv8-a-architecture Copyright 2018 Cray Inc FF2: Prototyping SVE Compiler Support

● First pass: Transliteration of Cray X2

● Low overhead for experimentation Vectorization Capability of Compilers and ISAs (LULESH) ● Cases where mappings are inexact are painful 100 18

90 16

80 14 ● Capabilities: 70 12 ● SVE Vectors 60 10 ● Predicated execution, with various predication 50 8 schemes 40 6 ● Widths from 128 bits to 2048 bits 30 % Vector Floating Point Ops Point Floating % Vector 4 Clock per FLOPs Algorithmic ● OpenMP 4.0 capabilities, including SIMD 20 directives 10 2 0 0 ● Explicit memory hierarchy support CCE - NEON WIC - SVE CCE - SVE PROTO Compiler - ISA ● Followed by prototype of native SVE CCE %VFP AlgF/C

Copyright 2018 Cray Inc Analysis of Applications

To reach the goal of producing architectures well-suited to HPC applications…

… you must understand the applications

Applications to Architectures

Copyright 2018 Cray Inc Cray Modeling and Simulation Capabilities

● Cray developed an infrastructure under FF2 provides to analyze applications at the granularity that matters for HPC ● Some capabilities enabled: ● MPI+OpenMP up to at least 256 total threads ● Instrumented emulation of SVE (as it evolved) for any vector width ● Modeling of SVE implementation details ● Full memory model, including DDR, HBM, tiered configs, migration, online placement analysis ● Evaluation of high bandwidth prefetchers and other ISA enhancements ● Implemented NoC models along with modeling/enabling acceleration technologies ● Protobuf-based trace format for interaction with other simulators ● Periodic performance validation against hardware and partner simulations ● Also included prototype runtime support for: ● Memory hierarchy management (placement, migration, staging) ● Software thread mapping to cores ● Analysis of full node-sized applications

Copyright 2018 Cray Inc 1 7 FF2: Data Movement Optimized Computing

Cray: ● Explored HPC-appropriateness of a broad range of memory technologies ● Worked to make sure HBM standard was appropriate for HPC ● Characterized apps in terms of memory behavior ● Explored system-level design space Copyright 2018 Cray Inc Enabling the Future of HPC Systems

The End of Moore’s Law is bringing diversification to silicon

Enabling these ecosystems with HPC technologies will pay dividends in the coming era

Copyright 2018 Cray Inc Collaboration Opportunities

We are looking for others interested in future Arm HPC technologies

Do you have applications we should be analyzing?

Do you work with bringing Arm HPC technologies (e.g. SVE) into open source?

Copyright 2018 Cray Inc Thank You! [email protected]

This work supported in part by the US Department of Energy

Copyright 2018 Cray Inc Backup

Copyright 2018 Cray Inc Benchmarks: CCE vs Alternative Arm Compilers

ARM

Copyright 2018 Cray Inc