“Future Technologies” (WP8) Prototypes
Iris Christadler, Dr. Herbert Huber
Leibniz Supercomputing Centre, Germany

Prototype Overview (1/2)

CEA “GPU/CAPS”
• Hardware/software: 1U Tesla Server T1070 (CUDA, CAPS, DDT), Intel Harpertown nodes
• Objectives: Take advantage of accelerators more easily. Compare HMPP with other approaches to programming accelerators.

CINECA
• Hardware/software: I/O Subsystem (SSD, Lustre, pNFS)
• Objectives: Assess the applicability of new file system and storage technologies.

CINES-LRZ “LRB/CS”
• Hardware/software: Hybrid SGI ICE/UV/Nehalem-EP & Nehalem-EX/ClearSpeed/Larrabee
• Objectives: Evaluate a hybrid system architecture containing thin nodes, fat nodes and compute accelerators with a shared file system.

CSCS “UPC/CAF”
• Hardware/software: Prototype PGAS language compilers (CAF + UPC for Cray XT systems)
• Objectives: Understand the usability and programmability of PGAS languages.

EPCC “FPGA”
• Hardware/software: Maxwell – FPGA prototype (VHDL support & consultancy + software licenses (e.g., Mitrion-C))
• Objectives: Assess the potential of high-level languages for using FPGAs in HPC. Compare energy efficiency with other solutions.

SC09, “Future Technologies” (WP8) Prototypes, Outlook Deliverable D8.3.2

Prototype Overview (2/2)

FZJ “ & FPGA interconnect”
• Hardware/software: eQPACE (PowerXCell 8i cluster with special network processor)
• Objectives: Gain deep expertise in communication issues. Extend the application domain of the QPACE system.

LRZ “RapidMind”
• Hardware/software: RapidMind Multi-Core Development Platform (automatic code generation for x86, GPUs and Cell)
• Objectives: Assess the potential of data stream languages. Compare RapidMind with other approaches for programming accelerators or multi-core systems.

NCF “ClearSpeed”
• Hardware/software: ClearSpeed CATS 700 units
• Objectives: Evaluate ClearSpeed accelerator hardware for large-scale applications.

SNIC-KTH
• Hardware/software: Air cooled blade system from Supermicro with AMD Istanbul processors & QDR IB
• Objectives: Evaluate and optimize energy efficiency and packing density of commodity hardware.

Experiences with the prototypes will be reported (subject to EC approval) in Deliverable D8.3.2 [http://www.prace-project.eu/documents/public-deliverables-1/]

The teaser
A SELECTION OF RESULTS

Rinf

Euroben results - accelerator languages

[Figure: Accelerator Languages (absolute performance). Mflops (log scale) for the Euroben kernels mod2am, mod2as and mod2f, with peak perf for reference. Series: MKL (8 Nehalem cores), CUDA (1 C1060), CellSs (1 PowerXCell8i), Cn (1 CSX700).]

[Figure: Accelerator Languages (% peak perf). % of peak (log scale) for mod2am, mod2as and mod2f. Series: MKL, CUDA, CellSs, Cn. Data labels on the plots: 94, 81, 79, 78, 30, 6, 4.5, 3.3, 2, 0.9, 0.04, 0.03. Note on the plots: mod2f/MKL: single‐threaded only.]
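The "% of peak" panel relates a measured kernel rate to the theoretical peak of the device. A minimal sketch of that conversion, assuming both values are given in Mflop/s; the function name and the sample numbers are hypothetical and are not taken from the slide:

```python
def percent_of_peak(measured_mflops: float, peak_mflops: float) -> float:
    """Return the measured rate as a percentage of the theoretical peak."""
    if peak_mflops <= 0:
        raise ValueError("peak_mflops must be positive")
    return 100.0 * measured_mflops / peak_mflops


if __name__ == "__main__":
    # Hypothetical numbers for illustration only (not read from the plot).
    share = percent_of_peak(measured_mflops=39_000.0, peak_mflops=78_000.0)
    print(f"{share:.1f} % of peak")
```

Reporting both the absolute rate and the share of peak keeps devices with very different peak rates comparable.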

Euroben results - GPGPU languages

[Figure: Performance Comparison (dense matrix‐matrix mul.) on Nvidia C1060. Performance (axis from 0 to 100) versus matrix size (m). Series: CUDA, CAPS, CUDA+MPI 4x4, RapidMind, OpenCL, MKL (8 cores Nehalem).]
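Rates for dense matrix-matrix multiplication are conventionally derived from the operation count divided by the run time. The sketch below assumes square m x m operands and the usual count of 2·m³ floating-point operations. It uses a naive pure-Python triple loop as a stand-in for the tuned kernels, so the absolute numbers it prints say nothing about the systems compared here. All function names are hypothetical, and this is not the Euroben mod2am source.

```python
import time


def matmul(a, b):
    """Naive dense matrix-matrix product C = A * B on lists of rows."""
    n, k, p = len(a), len(b), len(b[0])
    c = [[0.0] * p for _ in range(n)]
    for i in range(n):
        row_c = c[i]
        for l in range(k):
            a_il = a[i][l]
            row_b = b[l]
            for j in range(p):
                row_c[j] += a_il * row_b[j]
    return c


def matmul_flops(m):
    """Operation count of a square m x m product: m^3 multiplies plus m^3 adds."""
    return 2 * m ** 3


def measure_gflops(m):
    """Time one m x m product and return the achieved rate in Gflop/s."""
    a = [[float(i + j) for j in range(m)] for i in range(m)]
    b = [[float(i - j) for j in range(m)] for i in range(m)]
    start = time.perf_counter()
    matmul(a, b)
    elapsed = time.perf_counter() - start
    return matmul_flops(m) / elapsed / 1e9


if __name__ == "__main__":
    for m in (16, 32, 64):  # small sizes only; pure Python is slow
        print(f"m={m:4d}  {measure_gflops(m):.4f} Gflop/s")
```

Sweeping m, as the loop at the bottom does, gives one point per matrix size, which is the shape of a rate-versus-matrix-size curve.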

Euroben results - productivity

[Figure: Development Time versus Performance (dense matrix-matrix mul.). Left axis: Performance in Mflops (log scale, 1 to 100000). Right axis: Development Time in Days (0 to 20). Series: Performance, total time, first version.]

* OpenCL and CUDA+MPI port based on existing CUDA port
** RapidMind developer included time for benchmarking

First IO Results

A glimpse of what you will find in Deliverable D8.3.2
PROTOTYPES

eQPACE

Extend communication capabilities of eQPACE to make it suitable for a wider range of applications. Reach a top position in the Green500 list (FZJ).
• Hardware: PowerXCell8i processor nodes with custom 3D-torus interconnect.
• Benchmarks: HPL, Euroben kernels, torus network benchmark, applications & iterative solvers.
• Programming environments: Cell SDK & CellSs

RapidMind

Evaluation of the RapidMind programming model (LRZ).
• Hardware:
– CPUs (Nehalem EP, AMD Opteron)
– GPUs ( and Quadro FX)
– Cell (QS22-blade cluster)
• Software: RapidMind allows writing code that runs on x86 cores as well as on accelerators like GPUs and Cell.
– Evaluate ease-of-use & portability
– Assess RapidMind performance on different architectures
– Compare RapidMind with other accelerator languages

[Figure: RapidMind mod2am. Gflops versus matrix size (m). Series: x86‐dp (8 cores nehalem), cuda‐dp (c1060), glsl‐sp (FX 5800).]

LRZ-CINES

Evaluation of a hybrid system architecture containing thin nodes, fat nodes and compute accelerators with a shared file system (CINES, LRZ).
• Hardware:
– SGI ICE (Nehalem EP)
– SGI UV (Nehalem EX)
– ClearSpeed CSX700
• Benchmarks:
– Euroben kernels
– Synthetic BMs: HPL, Rinf, Intel MPI Benchmark, Apex-MAP
– Application BMs: Gadget, RAxML, SPECFEM3D_GLOBE

Hybrid technology demonstrator

Evaluating GPGPU with CAPS HMPP (CEA).
• Hardware: Tesla servers connected to Bull servers via PCI-E.
• Software: CAPS HMPP makes it possible to exploit the potential of GPGPUs by simply adding preprocessor directives to legacy Fortran and C codes.

[Figure: CAPS hmpp mod2am. Gflops versus matrix size (m).]
[Figure: CUDA mod2am. Gflops versus matrix size (m).]

Maxwell FPGA

Evaluate the performance and usability of the HARWEST Compiling Environment (EPCC).
• Hardware: FPGA prototype “Maxwell” (32 FPGAs) from both Alpha Data Ltd and Nallatech Ltd using Virtex-4 FPGAs supplied by Xilinx Corp.
• Benchmarks: 4 Euroben kernels
• Languages:
– VHDL
– HCE

PGAS languages

Evaluate ease of use of the PGAS programming model (CSCS).
• Hardware: Cray XT5
• Compiler: Cray Compiler Environment (CCE)
• Evaluation of the compiler:
– Functional correctness
– Conformance with language standards
– Usability for existing CAF and UPC benchmarks/applications
• Benchmarks from Rice University, George Washington University and the Lawrence Berkeley National Laboratory

ClearSpeed/PetaPath

Evaluate ClearSpeed-Petapath system (NCF).
• Hardware: 114 ClearSpeed CSX700 cards
• Language: Cn
• Benchmarks:
– 4 Euroben kernels
– 4 Applications
  • Astronomy
  • Geophysics
  • Numerical mathematics
  • Medical tomography

XC4-IO

• Compare storage infrastructure access performance across different hardware configurations and file system architectures (CINECA).


SNIC-KTH

Evaluate energy efficiency of high density commodity parts (SNIC-KTH).
• Hardware: AMD Istanbul
• Benchmarks: Euroben, STREAM, IMB, Gromacs, CFD
• Measure power consumption per component
• Adjust fan speed and fan power
• Assess energy management features of AMD Istanbul (control of voltage and frequency of components)

[Figure: Preliminary Results (Gromacs).]

Results will be reported in Deliverable D8.3.2.
RESEARCH ACTIVITIES

Parallel GPU

Evaluation of GPGPU programming languages (CSC).
• Languages:
– CUDA+MPI
– OpenCL
• Benchmarks:
– GPU-HMMER
– Euroben Kernels
• Hardware:
– Tesla
– AMD Firestream
– CEA WP8 Prototype

Advanced PGAS Programming

Evaluate usability of PGAS
upc_barrier; upc_forall (sc=0; sc

Research on power efficiency

Evaluate power consumption of components (STFC, PSNC).
• Hardware: ClearSpeed, Tesla, Firestream, Cell, Power6.
• Different workloads: stand-by, neutral, real life, artificial stress.
• Assess CPU, memory, accelerators, HDDs, cooling fans, backplane, power supply.
• Power measurements with: clamp meters, PDUs with built-in ammeters, values from system management software
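Power readings taken over the run of a workload can be turned into energy to solution and into a performance-per-watt figure. A minimal sketch, assuming the readings are available as (time in seconds, power in watts) pairs, for example logged from a PDU; it integrates them with the trapezoidal rule. The function names and the sample values are hypothetical and do not come from the measurements described here.

```python
def energy_joules(samples):
    """Integrate (time_s, power_w) samples with the trapezoidal rule."""
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    total = 0.0
    for (t0, p0), (t1, p1) in zip(samples, samples[1:]):
        if t1 <= t0:
            raise ValueError("timestamps must be strictly increasing")
        total += 0.5 * (p0 + p1) * (t1 - t0)
    return total


def mflops_per_watt(flop_count, samples):
    """Average Mflop/s per watt, which equals flop_count / energy / 1e6."""
    return flop_count / energy_joules(samples) / 1e6


if __name__ == "__main__":
    # Hypothetical trace: (seconds, watts) readings during one benchmark run.
    trace = [(0.0, 200.0), (10.0, 300.0), (20.0, 300.0)]
    print(f"energy to solution: {energy_joules(trace):.0f} J")
    print(f"efficiency: {mflops_per_watt(1.1e12, trace):.1f} Mflops/W")
```

Dividing the operation count by the energy gives the same result as dividing the average rate by the average power, so the run time does not need to be tracked separately.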

Contact information:
Dr. Herbert Huber (WP8 Leader), [email protected]
Iris Christadler (WP8 Co-Leader), christadler@lrz.de
Leibniz Supercomputing Centre, Germany

THANK YOU FOR YOUR ATTENTION! COMMENTS? QUESTIONS?
