Evaluation of AMD EPYC
Chris Hollowell <[email protected]>
HEPiX Fall 2018, PIC, Spain

What is EPYC?
EPYC is a new line of x86_64 server CPUs from AMD based on their Zen microarchitecture
  The same microarchitecture is used in their Ryzen desktop processors
  Released June 2017
First new high-performance series of server CPUs offered by AMD since 2012
  The last were the Piledriver-based Opterons
  Steamroller-based Opteron products were cancelled
  AMD had focused on low-power server CPUs instead:
    x86_64 Jaguar APUs
    ARM-based Opteron A CPUs
Many vendors now offer EPYC-based servers, including Dell, HP and Supermicro

How Does EPYC Differ From Skylake-SP?
Intel's Skylake-SP Xeon x86_64 server CPU line was also released in 2017
Both Skylake-SP and EPYC CPU dies are manufactured using a 14 nm process
Skylake-SP introduced AVX512 vector instruction support in Xeon
  AVX512 is not available in EPYC
  The official HS06 GCC compilation options exclude autovectorization, so HS06 scores do not benefit from AVX512
  Stock SL6/7 GCC doesn't support AVX512; support was added in GCC 4.9+
  AVX512 is not heavily used (yet) in HEP/NP offline computing
Both have models supporting 2666 MHz DDR4 memory
  Skylake-SP: 6 memory channels per processor; up to 3 TB (2-socket system, extended-memory models)
  EPYC: 8 memory channels per processor; up to 4 TB (2-socket system)

How Does EPYC Differ From Skylake-SP? (Cont.)
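The memory-channel counts on the previous slide translate directly into theoretical peak DRAM bandwidth per socket. The following is a minimal Python sketch, assuming a 64-bit (8-byte) data path per DDR4 channel and ignoring ECC bits and real-world efficiency losses; the function name is illustrative only.

```python
# Theoretical peak DRAM bandwidth per socket = channels x transfer rate x bytes per transfer.
# Assumption: each DDR4 channel has a 64-bit (8-byte) data path; ECC bits are ignored.
BYTES_PER_TRANSFER = 8


def peak_bandwidth_gbs(channels: int, mega_transfers_per_s: int) -> float:
    """Return the theoretical peak bandwidth in GB/s (10**9 bytes per second)."""
    bytes_per_s = channels * mega_transfers_per_s * 1_000_000 * BYTES_PER_TRANSFER
    return bytes_per_s / 1e9


if __name__ == "__main__":
    # Channel counts from the slide; DDR4-2666 is supported by both CPU families.
    for family, channels in [("Skylake-SP", 6), ("EPYC", 8)]:
        bw = peak_bandwidth_gbs(channels, 2666)
        print(f"{family:10s} {channels} x DDR4-2666 -> {bw:6.1f} GB/s per socket")
```

This gives roughly 128 GB/s per Skylake-SP socket versus roughly 171 GB/s per EPYC socket, a 4:3 ratio. Sustained bandwidth measured with a benchmark such as STREAM will be lower than these peak figures.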
Some Skylake-SP processors include built-in Omni-Path networking or FPGA coprocessors
  Not available in EPYC
Both Skylake-SP and EPYC have SMT (Hyper-Threading) support
  2 logical cores per physical core (absent in some Xeon Bronze models)
Maximum core count (per socket)
  Skylake-SP: 28 physical / 56 logical (Xeon Platinum 8180M)
  EPYC: 32 physical / 64 logical (EPYC 7601)
Maximum socket count
  Skylake-SP: 8 (Xeon Platinum)
  EPYC: 2
Processor interconnect
  Skylake-SP: UltraPath Interconnect (UPI)
  EPYC: Infinity Fabric (IF)
PCIe lanes (2-socket system)
  Skylake-SP: 96
  EPYC: 128 (some used by SoC functionality); EPYC provides the same 128 lanes in a single-socket configuration

EPYC: MCM/SoC Design
EPYC utilizes an SoC design
  Many functions normally found in the motherboard chipset are on the CPU: SATA controllers, USB controllers, etc.
Each EPYC processor consists of four CPU dies, interconnected via Infinity Fabric
  Multi-Chip Module (MCM) architecture
  Each die contains two "CPU Complexes" (CCX)
  Each die is attached to its own memory: 2 memory channels per die
All Skylake-SP cores are on a single die
AMD claims MCM reduces cost by improving yields
  Believed to scale better than the monolithic-die approach as core counts continue to increase
  Drawback: higher memory latency for non-NUMA-aware applications

EPYC: MCM/SoC Design (Cont.)
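Because each EPYC die has its own memory, the dual-socket EPYC 7351 system in the lscpu listings on this slide reports eight NUMA nodes, versus two for the Skylake-SP system. The following Python sketch expands lscpu's "NUMA nodeN CPU(s)" ranges into explicit CPU lists, for example to decide where to pin jobs. The sample text is copied from the EPYC 7351 listing; the helper names are hypothetical, and the parser assumes the "key: value" line format shown here.

```python
from __future__ import annotations

# Excerpt of the dual-socket EPYC 7351 lscpu output shown on this slide.
EPYC_SAMPLE = """\
NUMA node(s):        8
NUMA node0 CPU(s):   0-3,32-35
NUMA node1 CPU(s):   4-7,36-39
NUMA node2 CPU(s):   8-11,40-43
NUMA node3 CPU(s):   12-15,44-47
NUMA node4 CPU(s):   16-19,48-51
NUMA node5 CPU(s):   20-23,52-55
NUMA node6 CPU(s):   24-27,56-59
NUMA node7 CPU(s):   28-31,60-63
"""


def parse_cpu_list(spec: str) -> list[int]:
    """Expand an lscpu-style CPU list such as '0-3,32-35' into individual CPU ids."""
    cpus: list[int] = []
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate a trailing comma
        if "-" in part:
            first, last = part.split("-")
            cpus.extend(range(int(first), int(last) + 1))
        else:
            cpus.append(int(part))
    return cpus


def numa_map(lscpu_text: str) -> dict[int, list[int]]:
    """Map NUMA node id -> logical CPUs, using the 'NUMA nodeN CPU(s):' lines."""
    nodes: dict[int, list[int]] = {}
    for line in lscpu_text.splitlines():
        key, sep, value = line.partition(":")
        key = key.strip()
        # Skip the 'NUMA node(s):' count line and any unrelated lines.
        if sep and key.startswith("NUMA node") and key.endswith("CPU(s)"):
            node_id = int(key[len("NUMA node"):].split()[0])
            nodes[node_id] = parse_cpu_list(value)
    return nodes


if __name__ == "__main__":
    for node, cpus in sorted(numa_map(EPYC_SAMPLE).items()):
        print(f"node{node}: {cpus}")
```

On a live system, the same parser could be fed the output of subprocess.run(["lscpu"], capture_output=True, text=True).stdout, and the resulting per-node CPU lists passed to tools such as numactl --physcpubind or taskset -c so that a job and its memory stay on one die.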
AMD EPYC 7351 (dual socket):
    # lscpu
    Architecture:        x86_64
    CPU op-mode(s):      32-bit, 64-bit
    Byte Order:          Little Endian
    CPU(s):              64
    On-line CPU(s) list: 0-63
    Thread(s) per core:  2
    Core(s) per socket:  16
    Socket(s):           2
    NUMA node(s):        8
    Vendor ID:           AuthenticAMD
    CPU family:          23
    Model:               1
    Model name:          AMD EPYC 7351 16-Core Processor
    Stepping:            2
    CPU MHz:             2400.000
    CPU max MHz:         2400.0000
    CPU min MHz:         1200.0000
    BogoMIPS:            4799.41
    Virtualization:      AMD-V
    L1d cache:           32K
    L1i cache:           64K
    L2 cache:            512K
    L3 cache:            8192K
    NUMA node0 CPU(s):   0-3,32-35
    NUMA node1 CPU(s):   4-7,36-39
    NUMA node2 CPU(s):   8-11,40-43
    NUMA node3 CPU(s):   12-15,44-47
    NUMA node4 CPU(s):   16-19,48-51
    NUMA node5 CPU(s):   20-23,52-55
    NUMA node6 CPU(s):   24-27,56-59
    NUMA node7 CPU(s):   28-31,60-63

Intel Xeon Gold 6150 (dual socket):
    # lscpu
    Architecture:        x86_64
    CPU op-mode(s):      32-bit, 64-bit
    Byte Order:          Little Endian
    CPU(s):              72
    On-line CPU(s) list: 0-71
    Thread(s) per core:  2
    Core(s) per socket:  18
    Socket(s):           2
    NUMA node(s):        2
    Vendor ID:           GenuineIntel
    CPU family:          6
    Model:               85
    Model name:          Intel(R) Xeon(R) Gold 6150 CPU @ 2.70GHz
    Stepping:            4
    CPU MHz:             2700.000
    BogoMIPS:            5404.41
    Virtualization:      VT-x
    L1d cache:           32K
    L1i cache:           32K
    L2 cache:            1024K
    L3 cache:            25344K
    NUMA node0 CPU(s):   0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38,40,42,44,46,48,50,52,54,56,58,60,62,64,66,68,70
    NUMA node1 CPU(s):   1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39,41,43,45,47,49,51,53,55,57,59,61,63,65,67,69,71

EPYC vs Skylake-SP NUMA configuration (Sub-NUMA Clustering (SNC) disabled on Skylake-SP)

Socket LGA 3647 & SP3
Skylake-SP: Socket LGA 3647
EPYC: Socket SP3
Both CPUs/sockets are quite large
Four quadrants are visible in the SP3 socket, one for each of the four CPU dies in the EPYC processor

Skylake and EPYC Model Lineup Comparison (Cores = per socket)
    Model                Base Frequency      Cores  SMT  TDP   Memory         Retail
    Xeon Bronze 3104     1.7 GHz (no turbo)  6      No   85W   2133 MHz DDR4  $213
    Xeon Silver 4110     2.1 GHz             8      Yes  85W   2400 MHz DDR4  $501
    Xeon Gold 5115       2.4 GHz             10     Yes  85W   2666 MHz DDR4  $1,221
    Xeon Gold 6130       2.1 GHz             16     Yes  125W  2666 MHz DDR4  $1,900
    Xeon Gold 6136       3.0 GHz             12     Yes  150W  2666 MHz DDR4  $2,460
    Xeon Gold 6148       2.4 GHz             20     Yes  150W  2666 MHz DDR4  $3,072
    Xeon Gold 6150       2.7 GHz             18     Yes  165W  2666 MHz DDR4  $3,358
    Xeon Platinum 8170   2.1 GHz             26     Yes  165W  2666 MHz DDR4  $7,405
    Xeon Platinum 8180M  2.5 GHz             28     Yes  205W  2666 MHz DDR4  $13,011
    EPYC 7251            2.1 GHz             8      Yes  120W  2400 MHz DDR4  $475
    EPYC 7351            2.4 GHz             16     Yes  170W  2666 MHz DDR4  $1,110 (uniprocessor "P" model: $750)
    EPYC 7401            2.0 GHz             24     Yes  170W  2666 MHz DDR4  $1,850 (uniprocessor "P" model: $1,075)
    EPYC 7451            2.3 GHz             24     Yes  180W  2666 MHz DDR4  $2,400
    EPYC 7551            2.0 GHz             32     Yes  180W  2666 MHz DDR4  $3,400
    EPYC 7601            2.2 GHz             32     Yes  180W  2666 MHz DDR4  $4,200

EPYC vs Skylake-SP: HEP/NP Performance Benchmarks
HEPSPEC06 (HS06)
  "all_cpp" subset of SPEC CPU2006, run in parallel
CERN Cloud Benchmark Suite
  Various benchmarks, run in parallel: DB12, Whetstone, ATLAS KV
Unless noted, memory was configured to use all 8 channels per CPU on EPYC and all 6 channels per CPU on Skylake-SP, with at least 2 GB RAM per logical core
  ~11% HS06 performance degradation was seen for the EPYC 7441 when only half of the memory channels were populated
All memory is 2666 MHz DDR4
  Where noted, dual-rank (DR) DIMMs were downclocked to 2400 MHz on EPYC
All benchmarks were run under SL/CentOS/RHEL 7
SMT/Hyper-Threading enabled unless otherwise indicated
Systems are dual-CPU unless noted

Figure: EPYC HEPSPEC06: SMT Off vs On (HS06, larger is better)
    CPU                        Threads (off / on)  SMT off  SMT on
    EPYC 7401P (uniprocessor)  24 / 48             368      489
    EPYC 7351                  32 / 64             541      780
    EPYC 7451 (DDR4-2400)      48 / 96             883      1101
    EPYC 7551                  64 / 128            872      1148
    EPYC 7601 (DDR4-2400)      64 / 128            1078     1296
Chart annotation: 25%+ HS06 performance improvement with SMT ("hyperthreading") enabled
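The per-model SMT gain can be computed directly from the chart values above. The following Python sketch uses values transcribed from the chart; the dictionary and function names are illustrative only.

```python
# HS06 values (SMT off, SMT on) transcribed from the "EPYC HEPSPEC06: SMT Off vs On" chart.
# Dual-socket systems except the EPYC 7401P, which is a uniprocessor system.
HS06_SMT = {
    "EPYC 7601": (1078, 1296),
    "EPYC 7551": (872, 1148),
    "EPYC 7451": (883, 1101),
    "EPYC 7351": (541, 780),
    "EPYC 7401P": (368, 489),
}


def smt_gain_percent(hs06_off: float, hs06_on: float) -> float:
    """Relative HS06 improvement from enabling SMT, in percent."""
    return (hs06_on / hs06_off - 1.0) * 100.0


if __name__ == "__main__":
    for cpu, (off, on) in HS06_SMT.items():
        print(f"{cpu:11s} {off:5d} -> {on:5d} HS06  (+{smt_gain_percent(off, on):.1f}%)")
```

By this arithmetic, the gain ranges from roughly 20% (EPYC 7601) to roughly 44% (EPYC 7351). The 25%+ figure therefore holds for most, but not all, of the models tested.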
Figure: EPYC vs Skylake-SP: HEPSPEC06 (HS06, larger is better)
    CPU                         Threads  HS06
    EPYC 7601 (DDR4-2400)       128      1296
    EPYC 7551                   128      1148
    EPYC 7451 (DDR4-2400)       96       1101
    EPYC 7351                   64       780
    EPYC 7401P (uniprocessor)   48       489
    Xeon Platinum 8170 *        104      1261
    Xeon Gold 6150              72       1035
    Xeon Gold 6148              80       1068
    Xeon Gold 6136              48       790
    Xeon Gold 6130              64       729
    Xeon Gold 5115 +            40       394
    + = system using only 3 memory channels per CPU
    * = value reported by CERN

EPYC vs Skylake-SP: HEPSPEC06 (Cont.)
Larger values are better
Similar maximum HS06 performance (~1,275) for the models tested
  Data are for the highest-end EPYC (7601), but not the highest-end Skylake-SP model (8180M)
  The Xeon Platinum 8180M can be assumed to perform better than the 8170 value listed: it has more cores (28 vs 26) and a higher base clock (2.5 GHz vs 2.1 GHz)
Mid-range model HS06 performance is also similar: ~700 to ~1,100 HS06
TDP is somewhat higher for EPYC CPUs than for Xeon Gold, in general
  165 W max for Xeon Gold vs 180 W max for EPYC
  EPYC can likely be expected to use a bit more power as a result

Figure: CERN Cloud Benchmark Suite results (larger is better)
    CPU               Threads  ATLAS KV (aggregate, events/s)  Whetstone (aggregate, BWIPS)  DIRAC Benchmark (DB12), HS06 est.
    EPYC 7551         128      120                             361                           1256
    EPYC 7351         64       67                              210                           733
    Xeon Gold 6150    72       65                              262                           998
    Xeon Gold 5115 +  40       15                              114                           …
    + = system using only 3 memory channels per CPU
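The HS06 results can be combined with the retail prices and TDPs from the model lineup table for a rough cost and power view. The following is a Python sketch under stated assumptions: it counts CPU list price only (no memory, board or chassis), uses TDP ratings rather than measured power, and assumes two sockets per system except for the uniprocessor EPYC 7401P. The class and function names are illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SystemResult:
    name: str
    hs06: float       # measured whole-system HS06 from the HEPSPEC06 chart
    cpu_price: float  # per-CPU retail price (USD) from the model lineup table
    cpu_tdp: float    # per-CPU TDP in watts: a rating, not measured wall power
    sockets: int = 2


RESULTS = [
    SystemResult("EPYC 7601", 1296, 4200, 180),
    SystemResult("EPYC 7551", 1148, 3400, 180),
    SystemResult("EPYC 7451", 1101, 2400, 180),
    SystemResult("EPYC 7351", 780, 1110, 170),
    SystemResult("EPYC 7401P", 489, 1075, 170, sockets=1),  # uniprocessor system
    SystemResult("Xeon Platinum 8170", 1261, 7405, 165),  # HS06 value reported by CERN
    SystemResult("Xeon Gold 6150", 1035, 3358, 165),
    SystemResult("Xeon Gold 6148", 1068, 3072, 150),
    SystemResult("Xeon Gold 6136", 790, 2460, 150),
    SystemResult("Xeon Gold 6130", 729, 1900, 125),
    SystemResult("Xeon Gold 5115", 394, 1221, 85),  # only 3 memory channels per CPU
]


def dollars_per_hs06(r: SystemResult) -> float:
    """CPU list price only (no memory, board or chassis) divided by system HS06."""
    return r.sockets * r.cpu_price / r.hs06


def hs06_per_tdp_watt(r: SystemResult) -> float:
    """System HS06 divided by the summed CPU TDP ratings."""
    return r.hs06 / (r.sockets * r.cpu_tdp)


if __name__ == "__main__":
    for r in sorted(RESULTS, key=dollars_per_hs06):
        print(f"{r.name:18s} {dollars_per_hs06(r):6.2f} $/HS06  "
              f"{hs06_per_tdp_watt(r):5.2f} HS06/W (TDP)")
```

Sorted by dollars per HS06, the uniprocessor EPYC 7401P comes first and the Xeon Platinum 8170 last. The top-end EPYC 7601 and the Xeon Gold 6150 come out almost identical at about $6.5 per HS06. The Xeon Gold 5115 entry is handicapped by its 3-channel memory configuration.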