IBM Power Systems HPC Solutions Brief
IBM Power Systems
August 2017
Solution Brief

IBM Power Systems HPC solutions
Faster time to insight with the high-performance, comprehensive portfolio designed for your workflows

Highlights
• IBM Power Systems HPC solutions are comprehensive clusters built for faster time to insight
• IBM POWER8 offers the performance, cache and memory bandwidth to drive the best results from high performance computing and high performance data analytics applications
• IBM HPC software is designed to capitalize on the technical features of IBM Power Systems clusters

Architecting superior high performance computing (HPC) clusters requires a holistic approach that responds to performance at every level of the deployment. IBM high performance computing solutions, built with IBM Power Systems™, IBM® Spectrum™ Computing, IBM Spectrum Storage™, and IBM Software technologies, provide an integrated platform to optimize your HPC workflows, resulting in faster time to insights and value.

The industry's most comprehensive, data-centric HPC solutions
Only IBM provides a total HPC solution, including optimized, best-of-breed components at all levels of the system stack. Comprehensive solutions ensure:
• Rapid deployment
• Clusters that deliver value immediately after acceptance

IBM HPC solutions are built for data-centric computing, and delivered with integration expertise targeting performance optimization at the workflow level. Data-centric design minimizes data motion, enables compute capabilities across the system stack, and provides a modular, scalable architecture that is optimized for HPC.

Data-centric HPC and CORAL
Data-centric design was a primary reason the Department of Energy selected IBM for the CORAL deployment. Summit (Oak Ridge National Laboratory) and Sierra (Lawrence Livermore National Laboratory) will become some of the largest, most groundbreaking, and most utilized installations in the world.
Bring that same data-centric design to your HPC cluster by partnering with IBM.

A total HPC solution
IBM HPC solutions offer industry-leading innovation within and across the system stack. From servers, accelerators, network fabric and storage, to compilers and development tools, cluster management software and cloud integration points, solution components are designed for superior integration and total workflow performance optimization. This comprehensive scope is unique among competitive technology providers and reflects IBM's deep expertise in data-centric system design and integration. Only IBM can deliver a data-centric system optimized for your workflows, realizing the fastest time to insight and value.

Beyond the server: Superior data management and storage
A pillar of data-centric system innovation, IBM Spectrum Scale™ software-defined storage offers scalable, high-performance, and reliable unified storage for files and data objects. It does so with parallel performance for HPC users.

Implementing the unique advantages of IBM Spectrum Scale (formerly GPFS), IBM Elastic Storage Server is a storage solution that provides persistent performance at any scale. It ensures fast access and availability of the right data, at the right time, across clusters. Built-in management and administration tools ensure ease of deployment and continual optimization.

Figure 1: IBM HPC portfolio. Design, integration and support tie the stack to the cloud: IBM Spectrum Computing; IBM Power Systems (RHEL LE, Ubuntu) with accelerators and InfiniBand/Ethernet fabric; IBM Storage (IBM FlashSystem™, IBM Elastic Storage Server, IBM Spectrum Scale, IBM tape solutions, HPSS, enterprise storage); and IBM plus partner middleware and development software (IBM XL, ESSL, IBM Spectrum MPI, PGI Compilers, NVIDIA CUDA, Parallel Performance Toolkit, LLVM, GCC, OpenMP, OpenACC).

IBM POWER8: Designed for the intersection of high performance computing and high performance data analytics
The IBM POWER8® processor delivers industry-leading performance for HPC and high performance data analytics (HPDA) applications, with multi-threading designed for fast execution of analytics algorithms (eight threads per core), multi-level cache for continuous data load and fast response (including an L4 cache), and a large, high-bandwidth memory workspace to maximize throughput for data-intensive applications.

New IBM Power Systems LC nodes for HPC and HPDA
The IBM Power Systems LC servers are designed for HPC workloads. They allow you to:
• Realize incredible speedups in application performance with accelerators
• Deploy a processor architecture designed for HPC performance
• Benefit from ecosystem innovation from the OpenPOWER Foundation
Table 1: Technical details for three Power Systems offerings

IBM Power Systems S822LC for High Performance Computing
• Processor: 2x POWER8 CPUs with NVLink, 10 cores each, 2.86-3.25 GHz
• Memory: up to 1 TB, 230 GB/s bandwidth
• Storage: 2x 2.5" drives (HDD or SSD); NVMe for ultra-fast I/O
• Acceleration: 4x NVIDIA Tesla P100 with NVLink GPU accelerators
• HPC use cases: built for the next wave of GPU acceleration

IBM Power Systems S822LC
• Processor: 2x POWER8 CPUs, 10 cores each, 2.9-3.3 GHz
• Memory: up to 1 TB, 230 GB/s bandwidth
• Storage: 2x 2.5" drives (HDD or SSD); NVMe for ultra-fast I/O
• Acceleration: optional CAPI-attached accelerators; optional Tesla K80
• HPC use cases: built for CPU performance

IBM Power Systems S812LC
• Processor: 1x POWER8 CPU, 10 cores, 2.9-3.3 GHz
• Memory: up to 1 TB, 115 GB/s bandwidth
• Storage: 14x 3.5" drives (84 TB; HDD, SSD)
• Acceleration: optional CAPI-attached accelerators
• HPC use cases: optimized for Hadoop, Spark

Figure 2: The POWER8 processor
• Cores: 8-12 cores (SMT8); enhanced prefetching
• Caches: 64 KB data cache (L1); 512 KB SRAM L2 per core; 96 MB eDRAM shared L3; up to 128 MB eDRAM L4 (off-chip)
• Memory: dual memory controllers; up to 230 GB/sec sustained bandwidth
• Designed for continuous data load, massive I/O bandwidth, superior parallel processing and large-scale memory processing

Leadership in HPC application performance
IBM HPC solutions are built for better HPC. They allow you to analyze faster, simulate better and process more through these attributes:

Architectural advantages matched to HPC applications, such as memory bandwidth:
• 60-79 percent greater memory bandwidth compared to competing servers

Figure 3: STREAM Triad (measured Triad bandwidth: IBM S822LC, 20c/160t, 32 DIMMs: 189 GB/sec; Intel Server System E5-2690 v3, 24c/48t, 2DPC: 118.2 GB/sec, +60% for S822LC; Intel Server System E5-2690 v3, 24c/48t: 105.4 GB/sec, +79% for S822LC)
• IBM Power Systems S822LC results are based on IBM internal measurements of STREAM Triad; 20 cores / 20 of 160 threads active. POWER8; 3.5 GHz, up to 1 TB memory.

Compelling application performance versus competing server architectures:
• CFD results 40 percent faster on OpenFOAM on IBM Power System S822LC compared to competing servers

Figure 4: OpenFOAM simpleFoam, 1 node (runtimes versus grid points, smaller values preferred, for IBM Power S822LC and Bull R424-E4)
• Results are based on IBM internal testing of systems running OpenFOAM version 2.3.0 code benchmarked on POWER8 systems. Individual results will vary depending on individual workloads, configurations and conditions.
• IBM Power Systems S822LC, POWER8, 3.5 GHz, 512 GB memory, 2x 10-core processors / 4 threads per core. Job size 128 GB memory per socket.
• BULL R424-E4, Intel Xeon E5-2680 v3, 2.3 GHz, 256 GB memory, 2x 10-core processors / 1 thread per core. Job size 128 GB memory per socket.
• Intel Xeon data is based on published data of Intel Server Systems R2208WTTYS running STREAM Triad; 24 cores / 24 of 48 threads active, E5-2690 v3; 2.3 GHz.

Compelling throughput on GPU computing applications and workloads:
• Up to a 7.3X improvement in NAMD performance by adding NVIDIA Tesla P100 GPUs to the workload
• Up to 2.7X the throughput of Kinetica Filter-by-Location queries through POWER8, Tesla P100, and NVIDIA NVLink, as compared to a competing server
• Up to 2.91X the realized CPU:GPU bandwidth of x86 servers featuring PCI-E x16 3.0, unleashing custom code

Additional Application Proof Points are available at: https://www.ibm.com/developerworks/linux/perfcol/

Figure 6: NAMD performance (relative performance on the STMV benchmark for the S822LC, 20c/2.86 GHz, with and without Tesla P100 GPUs: up to 7.3X)
• Results are based on IBM internal testing of systems running NAMD version 2.11 STMV code benchmarked on POWER8 systems with NVIDIA Tesla P100 GPUs. Individual results will vary depending on individual workloads, configurations and conditions.
• IBM Power Systems S822LC; 20 cores / 160 threads, POWER8 with NVLink; 2.86 GHz, 256 GB memory
• IBM Power Systems S822LC; 20 cores / 160 threads, POWER8 with NVLink; 2.86 GHz, 256 GB memory, 2 Tesla P100 GPUs

Figure 5: Kinetica accelerated database performance (queries per hour, in 1,000s: S822LC for HPC, 20c/2.86 GHz with 4x Tesla P100 GPUs, versus an x86 competitor, 20c/2.4 GHz with 4x Tesla P100 GPUs: up to 2.7X)
• Results are based on IBM internal testing of Kinetica Filter-by-Location query with 280,000,000 records. Individual results will vary depending on individual workloads, configurations and conditions.
• IBM Power Systems S822LC for HPC; 20 cores / 160 threads, POWER8 with NVLink, 2.86 GHz, 1024 GB memory, 4 Tesla P100 with NVLink GPUs
• x86 Competitor; 20 cores / 40 threads, Xeon E5-2640 v4, 2.4 GHz, 512 GB memory, 4 Tesla P100 PCI-E GPUs

CUDA H2D Bandwidth: IBM S822LC (POWER8 with NVLink, Tesla P100): 34.1 GB/sec versus x86 competitor (E5-2640 v4, Tesla K80 Device 0): 11.7 GB/sec, a 2.9X advantage
• IBM Power Systems S822LC results are based on IBM internal measurements of CUDA H2D BW Test; 20 cores, POWER8 with NVLink; 2.86 GHz, up to 256 GB memory, 1 Tesla P100
• Intel Xeon data is based on IBM internal testing; 20 cores / 40 threads active, Xeon E5-2640 v4, 2.4 GHz, Tesla K80 GPU Device 0.