Hpcg Update: Isc'17

hpcg-benchmark.org 1 HPCG UPDATE: ISC’17 Jack Dongarra Michael Heroux Piotr Luszczek hpcg-benchmark.org 2 HPCG Update: Agenda • Mike Heroux, HPCG Update. • Jack Dongarra, Awards & Results. • Erich Strohmaier, HPCG Analysis. • Lin Gan, NSCC Wuxi • Open discussion. hpcg-benchmark.org 3 HPCG Snapshot • High Performance Conjugate Gradients (HPCG). • Solves Ax=b, A large, sparse, b known, x computed. • An optimized implementation of PCG contains essential computational and communication patterns that are prevalent in a variety of methods for discretization and numerical solution of PDEs • Patterns: • Dense and sparse computations. • Dense and sparse collectives. • Multi-scale execution of kernels via MG (truncated) V cycle. • Data-driven parallelism (unstructured sparse triangular solves). • Strong verification (via spectral properties of PCG). hpcg-benchmark.org Model Problem Description • Synthetic discretized 3D PDE (FEM, FVM, FDM). • Zero Dirichlet BCs, Synthetic RHS s.t. solution = 1. (n × n × n ) • Local domain: x y z (npx × npy × npz ) • Process layout: (nx *npx ) × (ny *npy ) × (nz *npz ) • Global domain: • Sparse matrix: • 27 nonzeros/row interior. • 8 – 18 on boundary. • Symmetric positive definite. hpcg-benchmark.org 5 Merits of HPCG • Includes major communication/computational patterns. • Represents a minimal collection of the major patterns. • Rewards investment in: • High-performance collective ops. • Local memory system performance. • Low latency cooperative threading. • Detects/measures variances from bitwise reproducibility. • Executes kernels at several (tunable) granularities: • nx = ny = nz = 104 gives • nlocal = 1,124,864; 140,608; 17,576; 2,197 • ComputeSymGS with multicoloring adds one more level: • 8 colors. • Average size of color = 275. • Size ratio (largest:smallest): 4096 • Provide a “natural” incentive to run a big problem. • Full performance discussion: 5 • http://www.hpcg-benchmark.org -> “Performance Overview” tab. hpcg-benchmark.org 6 HPL vs. HPCG: Bookends • Some see HPL and HPCG as “bookends” of a spectrum. • Applications teams know where their codes lie on the spectrum. • Can gauge performance on a system using both HPL and HPCG numbers. hpcg-benchmark.org 7 HPCG Status hpcg-benchmark.org 8 HPCG 3.0 Release, Nov 11, 2015 • Available on GitHub.com • Using GitHub issues, pull requests, Wiki. • Optimized 3.0 version: • Vendor or site developed. • Intel, Nvidia, IBM: Available to their customers. • Need: Contacts with ARM providers. • All results require HPCG 3.0 use. hpcg-benchmark.org 9 Main HPCG 3.0 Features See http://www.hpcg-benchmark.org/software/index.html for full discussion • Problem generation is timed. • Memory usage counting and reporting. • Memory bandwidth measurement and reporting • Provides 2.4 rating and 3.0 rating in output. • Command line option (--rt=) to specify the run time. • "Quick Path" option to make obtaining results on production systems easier. • Important for filling TOP500 gaps. hpcg-benchmark.org 10 Other Items • Reference version on GitHub: • https://github.com/hpcg-benchmark/hpcg • Website: hpcg-benchark.org. • Mail list [email protected] • HPCG & Student Cluster Competitions. • Used in SC15/16/17, ASC • ISC17: First time HPCG is used for SCC. • HPCG-optimized kernels in vendor libraries. hpcg-benchmark.org 11 Summary • HPCG is • Addressing original goals. • Rewarding vendor investment in features we care about. • HPCG is integrated into TOP500. • Will work to fill in results gaps. • Version 3.X is the final planned major version. • Version 3.1: • Minor bug fixes to API. • Better support for heterogeneous systems. • More rigorous, automated policy checks. hpcg-benchmark.org 12 HPCG RANKINGS JUNE 2017 hpcg-benchmark.org 13 http://tiny.cc/hpcg 14 We Have a Tie for Third Place • Sunway TaihuLight – Sunway MPP, SW26010 260C 1.45GHz, Sunway NRCPC • Piz Daint – Cray XC50, Intel Xeon E5-2690v3 12C 2.6GHz, Aries interconnect, NVIDIA Tesla P100 Cray • Both systems “clock” in at 0.48… Pflop/s for HPCG • Less than 1% difference in the performance hpcg-benchmark.org 15 hpcg-benchmark.org 16 hpcg-benchmark.org 17 hpcg-benchmark.org 18 HPCG Results, Jun 2017, top-10 Rmax HPCG HPCG % of Rank Site Computer Cores Pflops Pflops /HPL Peak RIKEN Advanced Institute for K computer, SPARC64 VIIIfx 2.0GHz, Tofu 705,024 1 Computational Science, Japan interconnect 10.5 0.60 5.7% 5.3% NSCC / Guangzhou Tianhe-2 NUDT, Xeon 12C 2.2GHz + Intel 3,120,000 2 China Xeon Phi 57C + Custom 33.8 0.58 1.7% 1.1% National Supercomputing Sunway TaihuLight – Sunway MPP, 3 Center in Wuxi SW26010 260C 1.45GHz, Sunway, 10,649,600 93.0 0.48 0.5% 0.4% China NRCPC Swiss National Piz Daint – Cray XC50, Xeon E5-2690v3 12C Supercomputing Centre 2.6GHz, Aries interconnect, NVIDIA P100, 3 (CSCS) 361,760 19.6 0.48 2.4% 1.9% Switzerland Cray Joint Center for Advanced Oakforest-PACS – PRIMERGY CX600 5 HPC M1, Intel Xeon Phi 7250 68C 1.4GHz, 557,056 24.9 0.39 2.8% 1.5% Japan Intel OmniPath, Fujitsu DOE/SC/LBNL/NERSC Cori – XC40, Intel Xeon Phi 7250 68C 632,400 6 USA 1.4GHz, Cray Aries, Cray 13.8 0.36 2.6% 1.3% DOE/NNSA/LLNL Sequoia – IBM BlueGene/Q, PowerPC 1,572,864 7 USA A2 16C 1.6GHz, 5D Torus, IBM 17.2 0.33 1.9% 1.6% Titan - Cray XK7 , Opteron 6274 16C DOE/SC/Oak Ridge National 8 Lab 2.200GHz, Cray Gemini interconnect, 560,640 17.6 0.32 1.8% 1.2% NVIDIA K20x Trinity - Cray XC40, Intel E5-2698v3, Aries DOE/NNSA/LANL/SNL 301,056 9 custom, Cray 8.10 0.18 2.3% 1.6% Pleiades - SGI ICE X, Intel E5-2680, E5- 10 NASA / Mountain View 2680v2, E5-2680v3, E5-2680v4, Infiniband 243,008 6.0 0.17 2.9% 2.5% FDR, HPE Bookends: Peak and HPL OnlyBookends: Peak and HPL Pflop/s 0.001 1000 0.01 100 0.1 10 1 1 2 3 4 5 6 7 8 9 10 14 15 17 18 19 20 21 22 23 25 26 31 34 38 39 40 44 50 55 57 58 59 60 63 64 71 72 73 80 85 87 90 94 95 112 113 123 131 164 203 209 214 220 226 247 280 282 292 370 380 430 442 472 Rpeak Rmax Bookends: Peak, HPL, and HPCG Pflop/s 0.001 1000 0.01 100 0.1 10 1 1 2 3 4 5 6 7 8 9 10 14 15 17 18 19 20 21 22 23 25 26 31 34 38 39 40 44 50 55 57 58 59 60 63 64 71 72 73 80 85 87 90 94 95 112 113 123 131 164 203 209 214 220 226 247 280 282 292 370 380 430 442 472 HPCG Rpeak Rmax HPCG Lists over Time 1.0000 0.1000 Jun'17 Nov'16 0.0100 Jun'16 Nov'15 HPCG Performance (Pflop/s) Jun '15 Nov '14 0.0010 Jun '14 0.0001 111 108 105 102 99 96 93 90 87 84 81 78 75 72 69 66 63 60 57 54 51 48 45 42 39 36 33 30 27 24 21 18 15 12 9 6 3 HPCG Rank Performance by Region Top500 Americas: 34% Asia: 41% Europe Americas Europe:17% 29% 29% Asia 42% Performance by Country Top500 USA: 34% Brazil Belgium Canada China China: 34% 0% 0% 0% Japan: 8% USA 16% Croatia Germany: 5% 29% 0% France: 3% France 6% UK 3% Germany 9% The Netherlands Italy Switzerland 0% 2% 7% Saudi Arabia Poland Japan Sweeden Russia 2% 1% 0% 1% 24% Performance by Network Type Top500 Infiniband: 30% Custom: 53% GigE: 17% Infiniband 32% Ethernet Custom 0% 68% Performance by Network Details Tianhe Express Blue Gene 10% 11% Sunway Other 8% 0% NEC 3% Intel Omni-Path 9% Cray 31% Infiniband QDR 3% Gigabit Ethernet Infiniband FDR Infiniband EDR 0% 20% 5% Performance by Processor Type Vector NVIDIA GPU 3% 16% Intel Phi CPU 22% 59% Performance by Processor Details Sunway IBM Power Sparc FX 7% 10% 13% Intel Phi 22% NVIDIA 16% NEC SX Intel Xeon 3% 29% Performance by Segment Top500 Industry: 50% Research: 21% Vendor Academic 19% 1% Academic Govern: 7.6% 30% Classified Research 0% 59% Industry Government 4% 6% hpcg-benchmark.org 30 • https://www.top500.org/statistics/sublist/.

Hpcg Update: Isc'17

CS 110 Computer Architecture Lecture 5: Intro to Assembly Language, MIPS Intro

Power and Energy Characterization of an Open Source 25-Core Manycore Processor

Eithne: a Framework for Benchmarking Micro-Core Accelerators

Hironori Kasahara Professor, Dept

Update on International HPC Activities (Mostly Asia)

論文 / 著書情報 Article / Book Information

Future High Performance Computing Capabilities Summary Report of The

George A. Matheou Phd.Pdf

An Optimization Framework for Codes Classification and Performance Evaluation of RISC Microprocessors

Pdf Download

ISC 2017 Conference Guide

Introduction to Parallel Processing