IBM Power Systems August 2017 Solution Brief

IBM Power Systems HPC solutions Faster time to insight with the high-performance, comprehensive portfolio designed for your workflows

Architecting superior high performance computing (HPC) clusters Highlights requires a holistic approach that responds to performance at every level of the deployment. • IBM Power Systems HPC solutions are

comprehensive clusters built for faster time ® to insight IBM high performance computing solutions, built with IBM Power Systems™, IBM® Spectrum™ Computing, IBM Spectrum Storage™, and • IBM POWER8 offers the performance, IBM Software technologies, provide an integrated platform to optimize cache and memory bandwidth to drive the best results from high performance your HPC workflows, resulting in faster time to insights and value. computing and high performance data analytics applications The industry’s most comprehensive,

• IBM HPC software is designed to data-centric HPC solutions capitalize on the technical features Only IBM provides a total HPC solution, including optimized, of IBM Power Systems clusters best-of-breed components at all levels of the system stack. Comprehensive solutions ensure:

• Rapid deployment • Clusters that deliver value immediately after acceptance

IBM HPC solutions are built for data-centric computing, and delivered with integration expertise targeting performance optimization at the workflow level. Data-centric design minimizes data motion, enables compute capabilities across the system stack, and provides a modular, scalable architecture that is optimized for HPC.

Data-centric HPC and CORAL Data-centric design was a primary reason the Department of Energy selected IBM for the CORAL deployment. Summit (Oak Ridge National Labora- tory) and Sierra (Lawrence Livermore National Laboratory) will become some of the largest, most groundbreaking, and most utilized installations in the world. Bring that same data-centric design to your HPC cluster by partnering with IBM. IBM Power Systems August 2017 Solution Brief

A total HPC solution Beyond the server: Superior data IBM HPC solutions offer industry-leading innovation management and storage within and across the system stack. From servers, accel- A pillar of data-centric system innovation, IBM Spectrum erators, network fabric and storage, to compilers and Scale™ software-defined storage offers scalable, high- development tools, cluster management software and performance, and reliable unified storage for files and data cloud integration points, solution components are designed objects. It does so with parallel performance for HPC users. for superior integration and total workflow performance optimization. This comprehensive scope is unique among Implementing the unique advantages of IBM Spectrum Scale competitive technology providers and reflects IBM’s deep (formerly GPFS), IBM Elastic Storage Server is a storage expertise in data-centric system design and integration. Only solution that provides persistent performance at any scale. IBM can deliver a data-centric system optimized for your It ensures fast access and availability of the right data, at the right workflows, realizing the fastest time to insight and value. time, across clusters. Built in management and administration tools ensure ease of deployment and continual optimization.

Right tie to IBM Spectrum Computing the cloud

InfiniBand, Ethernet

rt IBM Power Systems

RHEL LE Accelerators egration, suppo t IBM Storage

IBM FlashSystem™ IBM Elastic Storage Server Design, in IBM Spectrum Scale IBM tape solutions HPSS Enterprise storage

IBM + Partner Middleware/Dev. Software

IBM XL ESSL IBM Spectrum MPI PGI Compilers CUDA Parallel Performance To olkit LLVM, GCC OpenMP, OpenACC

Figure 1: IBM HPC portfolio

2 IBM Power Systems August 2017 Solution Brief

IBM POWER8: Designed for the New IBM Power Systems LC nodes intersection of high performance for HPC and HPDA computing and high performance The IBM Power Systems LC servers are designed for data analytics HPC workloads. They allow you to: The IBM POWER8® processor delivers industry-leading • Realize incredible speedups in application performance for HPC and high performance data analytics performance with accelerators (HPDA) applications, with multi-threading designed for fast execution of analytics algorithms (eight threads per core), • Deploy a processor architecture designed for multi-level cache for continuous data load and fast response HPC performance (including an L4 cache), and a large, high-bandwidth • Benefit from ecosystem innovation from the memory workspace to maximize throughput for data- OpenPOWER Foundation intensive applications.

System Processor Memory Storage Acceleration HPC use cases

IBM Power Systems 2x POWER8 with Up to 1TB 2x 2.5” drives 4x NVIDIA Tesla P100 Built for the next wave S822LC for High Perfor- NVLink CPUs 230 GB/s bandwidth (HDD or SSD) with NVLink GPU ac- of GPU acceleration mance Computing celerators 10 cores each, 2.86- NVMe for ultra-fast I/O 3.25Ghz

IBM Power Systems 2x POWER8 CPUs Up to 1 TB 2x 2.5” drives Optional CAPI- Built for CPU S822LC 10 cores each, 230 GB/s bandwidth (HDD or SSD) attached accelerators performance 2.9-3.3 GHz NVMe for ultra-fast I/O Optional Tesla K80

IBM Power Systems 1x POWER8 CPU, Up to 1 TB 14x 3.5” drives Optional CAPI- Optimized for S812LC 10 cores each, 115 GB/s bandwidth (84TB, HDD, SSD) attached accelerators Hadoop, Spark 2.9-3.3 GHz

Table 1: Technical details for three Power Systems offerings

Cores

• 8-12 cores (SMT8) Ac SMP L • Enhanced prefetching Core Core Core celerat Core Core Core ink

L2 L2 L2 or s L2 L2 L2 Caches s

8M L3 Mem. Ct .

• 64K data cache (L1) rl Region • 512 KB SRAM L2 / core L3 cache & chip interconnect SMP L • 96 MB eDRAM shared L3 rl . • Up to 128 MB eDram L4 (off-chip) Mem. Ct

L2 L2 L2 inks P L2 L2 L2

Memory Core Core Core CIe Core Core Core • Dual memory controllers • Up to 230 GB/sec sustained bandwidth

Continuous Massive Superior parallel Large-scale data load IO bandwidth processing memory processing

Figure 2: The POWER8 processor 3 IBM Power Systems August 2017 Solution Brief

Leadership in HPC Compelling application performance versus application performance competing server architectures: • CFD results 40 percent faster on OpenFOAM on IBM IBM HPC solutions are built for better HPC. They allow Power System S822LC compared to competing servers you to analyze faster, simulate better and process more through these attributes:

Architectural advantages matched to HPC simpleFoam 1 Node applications, such as memory bandwidth: (smaller values preferred) • 60-79 percent greater memory bandwidth compared to competing servers 2000

es 15000 m i

nt 10000

200 +79% +60% Ru 5000 180 0 160 189 0.E+00 5.E+07 1.E+08 )

Grid points 140 se / B 120 IBM Power S822LC Bull R424 E4 (G

iad 100 • Results are based on IBM internal testing of systems running OpenFOAM 118.2 version 2.3.0 code benchmarked on POWER8 systems. Individual results 80 105.4 will vary depending on individual workloads, configurations and conditions.

AM Tr • IBM Power Systems S822LC, POWER8, 3.5 GHz, 512 GB memory, 2x 10 core processors/4 threads per core. Job size 128GB memory per socket. 60 • BULL R424-E4, Intel Xeon E5-2680v3, 2.3 GHz, 256 GB memory, 2x 10 TRE

S core processors/1 thread per core. Job size 128GB memory per socket. 40

20 Figure 4: OpenFOAM simpleFoam 1 node 0 IBM S822LC Intel Server Intel Server 20c/160t, System System 32 DIMMs E5-2690 v3 E5-2690 v3 24c/48t 2DPC 24c/48t

• IBM Power Systems S822LC results are based on IBM internal measurements of STREAM Triad; 20 cores / 20 of 160 threads active. POWER8; 3.5GHz, up to 1TB memory. • Intel Xeon data is based on published data of Intel Server Systems R2208WTTYS running STREAM Triad; 24 cores / 24 of 48 threads active, E5-2690 v3; 2.3GHz

Figure 3: STREAM Triad

4 IBM Power Systems August 2017 Solution Brief

Compelling throughput on GPU computing applications and workloads: • Up to a 7.3X improvement in NAMD performance 8 7.3X by adding NVIDIA Tesla P100 GPUs to the workload 7 • Up to 2.7X the throughput of Kinetica Filter-by-Location

queries through POWER8, Tesla P100, and NVIDIA e 6 NVLink, as compared to a competing server 5 manc or • f

Up to 2.91X the realized CPU:GPU bandwidth of x86 r 4 servers featuring PCI-E x16 3.0, unleashing custom code pe ve i 3

Additional Application Proof Points available at: at l e

https://www.ibm.com/developerworks/linux/perfcol/ R 2

1 ) 210 2.7X 0 180 1,000s

( STMV

150 S822LC 20c/2.86GHz /1x Te sla P100 GPUs r Hour 120 S822LC/20c/2.86GHz es Pe 90

ue ri • Results are based on IBM internal testing of systems running Q 60 NAMD version 2.11 STMV code benchmarked on POWER8 systems with NVIDIA Tesla P100 GPUs Individual results will vary depending on individual 30 workloads, configurations and conditions. • IBM Power Systems S822LC; 20 cores / 160 threads, POWER8 with NVLink; 2.86GHz, 256GB memory Number of 0 • IBM Power Systems S822LC; 20 cores / 160 threads, System Type POWER8 with NVLink, 2.86GHz, 256GB memory, 2 Tesla P100 GPUs

x86 Competitor, 20c/2.4GHz/4 Te sla P100 GPUs S822LC for HPC, 20c/2.86GHz/4 Te sla P100 GPUs Figure 6: NAMD performance

40

) 2.9X c

• Results are based on IBM internal testing of Kinetica Filter-by-Location se /

query with 280,000,000 records B 32 Individual results will vary depending on individual (G 34.1 workloads, configurations and conditions. • IBM Power Systems S822LC for HPC; 20 cores / 160 threads, 24 POWER8 with NVLink, 2.86GHz, 1024GB, 4 Te sla P100 with NVLink GPUs • x86 Competitor, 20 cores / 40 threads, Xeon E5-2640 v4; 2.4GHz, 512GB, 4 Tesla P100 PCI-E GPUs 16 A H2D Bandwidth UD C 8 11.7

0 Figure 5: Kinetica accelerated database performance IBM S822LC x86 Competitor POWER8 E5-2640 v4 with NVLink Te sla K80 Device 0 Tesla P100

• IBM Power Systems S822LC results are based on IBM internal measurements of CUDA H2D BW Test; 20 cores / POWER8 with NVLink; 2.86GHz, up to 256GB memory, 1 Tesla P100 • Intel Xeon data is based on IBM internal testing ; 20 cores / 40 threads active, Xeon E5-2640 v4 2.4GHz, Tesla K80 GPU Device 0. Te st executed measures bandwidth solely to Device 0 (of devices 0, 1).

Figure 7: CUDA H2D bandwidth for developers

5 IBM Power Systems August 2017 Solution Brief

Workflow-based design with software solutions — IBM Spectrum Computing workload and infra- defined infrastructure structure management, IBM Spectrum Scale storage, and Software defined infrastructure (SDI) provides a complete optimized HPC libraries — SDI delivers a flexible solution HPC software solution customizable based on your needs. for all cluster sizes, accommodating changing needs. Incorporating both community and IBM-supported software

High Performance Hadoop/Big Data High Performance Application Analytics Computing Frameworks (Low Latency Parallel) (Batch, Serial, MPI, Workflow) (Long-running Applications)

Hadoop Cloudera Hadoop NCBI BLAST Spark Algorithmics IBM InfoSphere ECL To mcat R BigInsights MongoDB OpenFOAM Jenkins Example Cassandra Applications Homegrown Homegrown IBM IBM Spectrum Workload IBM Spectrum Spectrum Symphony ® Conductor Engines Symphony IBM Spectrum LSF (MapReduce)

Resource Management IBM Spectrum Computing Data and Storagge Management IBM Spectrum Scale Infra. Mangemeng,g t, Cloud Busting IBM Spectrum Cluster Foundation IBM High Performance Services (Cloud Bursting)

Flash Tape Disk Power x86 on z Docker VM

On premises, On cloud, Hybrid Infrastructure (heterogeneous distributed computing and storage environment)

Figure 8: Investments in software defined infrastructure: Indicative of workflow-based design

6 IBM Power Systems August 2017 Solution Brief

IBM HPC software optimized MPI with IBM Spectrum MPI, drop-in acceleration of for Power Systems OpenMP applications on CPU or Tesla GPU with IBM IBM HPC software is designed to seamlessly exploit PESSL, and IBM XL C++/ compilers for parallel and deliver optimal performance of IBM Power Systems development. HPC clusters. Then, put your performance optimized applications to work with Libraries and development tools ensure you can easily reap maximum efficiency with IBM Spectrum workload management the performance benefits of specialized hardware and data- tools. Supply them with data through IBM Spectrum Scale: a centric system design, including support for CUDA-aware- scalable, reliable, high-performance parallel .

Products Client benefits

Systems management IBM Spectrum Cluster • Ease of use: web portal Foundation • Customizable: admin productivity xCAT • Faster time to system productivity • Robust monitoring

Application runtime IBM Spectrum MPI • Optimize parallel runtime ESSL/PESSL • Optimized LAPACK and ScaLPACK libraries • User-controlled workflow support CUDA runtime

Development productivity Parallel Performance Toolkit • Modern application development environment using Eclipse IBM XL Compiler Suite • Performance analysis tools to help analyze applications • Optimized complier for IBM Power Systems Rogue Wave TotalView debugger

Workload management IBM Spectrum LSF • Optimize utilization of resources • Policy-aware and resource-aware scheduling

Data management IBM Spectrum Scale • Scalable/reliable storage for parallel filesystem (Elastic Storage Server HPSS solution also available) • ILM for transparent migration of data from storage to tape and back IBM Spectrum Protect • Enhance availability with RAID-based ESS and tape

Application environment IBM Spectrum • Simplify job submission for repeatable workload Conductor • Customizable • Faster time to system productivity

Table 2: Benefits of IBM and partner technologies for various use cases

7 IBM Power Systems August 2017 Solution Brief

IBM Power Accelerated Computing Roadmap

Mellanox ConnectX-4 ConnectX-4 ConnectX-5 Interconnect EDR Infiniband EDR Infiniband HDR Infiniband PCIe Gen3 CAPI over PCIe Gen3 Enhanced CAPI over PCIe Gen4

NVIDIA Kepler Pascal Volta GPUs PCIe Gen3 NVLink Enhanced NVLink

POWER9

IBM CAPI and CPUs POWER8 NVLink

2015 2016 2017

Server

S822LC S822LC for HPC

Figure 9: IBM Power Accelerated Computing Roadmap

Differentiated acceleration Acceleration is critical to building leading HPC clusters. CAPI-attached accelerators IBM Power Systems offers choice and flexibility for hardware acceleration of HPC and HPDA workloads. Two different POWER8 options for differentiated acceleration are available: Coherence Bus FPGA or ASIC • CAPI (Coherent Accelerator Processor Interface): Memory CAPP and cache coherency, treating the accelerator as a peer- New ecosystems with CAPI processor with virtual addressing. For select network, • Technical and programming ease: virtual addressing, cache coherence compute, and storage accelerators. • Accelerator is hardware peer • NVIDIA NVLink: A broader, fatter pipe to NVIDIA GPUs than ever before, enabling the faster host-device, device- POWER8 with NVLink & NVIDIA Te sla P100 GPUs Tesla P100 Tesla P100 device communication many HPC applications require. 40+ 40 GB/s HBM2 HBM2 POWER8 with NVLink Available now in the Power Systems S822LC for HPC, 40+ 40 GB/s40+ 40 GB/s POWER8 with NVLink delivers a 2.5X faster CPU-to-GPU interface than PCI-E x16 3.0, enabling ultra-fast memory System POWER8 memory access between CPU and GPU when combined with Unified with NVLink Memory and NVIDIA Page Migration Engine. The platform Innovative Systems with NVLink also provides improved GPU-to- GPU link bandwidth. • Faster GPU-GPU communication • Breaks down barriers between CPU and GPU • New system architectures Previous barriers related to difficulty of data movement, memory capacity and the burden of custom coding for data management can now make way for GPU acceleration, Figure 10: Differentiated accelerator interfaces: CAPI and NVLink opening up new application classes to accelerated computing. 8 IBM Power Systems August 2017 Solution Brief

6 continents

300+ members 100+ innovations under way

30 countries 90+ technologies revealed

Figure 11: Revolutionizing computing through open innovation

Revolutionizing computing This brings the leading processor together with the best of through open innovation our partners and end users across the ecosystem — from HPC As a founding member of the OpenPOWER Foundation with and HPDA, to hyperscale data centers, to system designers NVIDIA, , and others, IBM has broadened access to worldwide. Learn more about the ecosystem at the Power architecture with accelerators. www.openpowerfoundation.org.

9 IBM Power Systems August 2017 Solution Brief

Delivering accelerated application performance for HPC Your applications run on the POWER8 platform, often with far superior performance and accelerated computing support. *GPU Supported. Talk to your IBM salesperson for the latest version of A sampling of HPC applications suited for IBM Power the IBM HPC Applications Summary (.biz/hpcapplications) Systems HPC servers: Hundreds of thousands of additional non-HPC packages are offered in ppcle Linux distros. Explore at ibm.biz/ospat-tool

Astrophysics

GADGET HACC p-GADGET Peasoup PLUTO

Bio and Life Sciences Genomics (Many Available/Bundled via BioBuilds)

ABySS CLARK GATK NovelSeq salmon SoapFuse ALLPATHS-LG CoNIFER GenomeFisher nose (Library) samblaster Spaces, Spades BALSA Cufflinks GenomicConsensus Illumina (ISAAC) Sam2Bam SplazerS bamkit cutadapt HISAT PairHMM Sambamba SQLite BarraCUDA Databiology HMMER PARASIGHT SAMtools Sratoolkig BEAST DELLY2 HTSeq PHYLIP scalpel STAR-Fusion bcftools diamond HTSlib PICARD scikit-bio Tabix BEDtools drFAST IGV Pindel seqtk Tablet BEDOPS DupMasker Juicer PLINK, plink-ng setuptools (Library) TASSEL BFAST ELSA Kraken PRADA SHRiMP T-C of fe e Bioconductor EMBOSS Lighter Primer3 SnpEff, SnpSift TMAP BLAST ESP LoFreq RAxML SNAP TopHat Boost (Library) FASTA /SW LUMPY RepeatMasker Snippy Trimmomatic Bowtie/Bowtie 2 FASTX-Toolkit Mothur R-EBSeq snpeff Trinity BWA FastQC MrBayes RNA-star SOAP3-dp VCFtools chimerascan FreeBayes MrFAST/MrsFAST RSEM SOAPaligner/SOAP2 Velvet/Oases Churchill GCTA MUSCLE Sailfish SOAPDenovo Zlib

Bio and Life Sciences Bioinformatics/Translational Medicine

ACUMI Bio Builds BioVelocity Galaxy LoFreq Zato Analytics bioPython IGV tranSMART Suite

Bio and Life Sciences Molecular Dynamics, Computational Chemistry

AMBER CHARMM GROMACS NAMD VMD QMCPACK CoMD CPMD MAFIA Nest Q-Box Quantum Espresso

CFD/CAE

AMG2013 Culises LBM D2Q37 LS-DYNA Ludwig OpenFOAM A LYA Code-Saturne (Lattice-Boltzmann) MiniGhost Nekbone SU2 AVUS Lattice -Boltzmann

10 IBM Power Systems August 2017 Solution Brief

Chemistry and Physics

B-CALM GAMESS KKRnano Lulesh LSQR SNAP DL-POLY Heat3d Lattice QCD, QUDA LSMS MCB UMT2013 VASP

Databases

Kinetica MapD

Deep Learning (Many frameworks in the PowerAI Software Distribution)

Caffe caffe-nv CNTK DIGITS Theano Torch caffe-ibm Chainer TensorFlow PowerAI

Finance and Math

Altimesh Hybridizer STAC-A2 STAC-M3 Julia

Libraries

AmgX cuBLAS cuDNN cuSOLVER OpenBLAS NPP AMG2013 CUDA Math Lib cuFFT FFTW (vectorized) NCCL SciPy Atlas cuRAND/cuSPARSE LIBLINEAR NumPy Thrust

Metadata

HOMP iRODS MODS Nirvana OpenARC PyReshaper

Geosciences, Oil and Gas

Stone Ridge Echelon heat3d RTM Kernel (IBM) SeisSol

Programming Tools, Specialized Languages

Allinea XL C/C++ MODS PGI Fortran Python (Supporting R GCC PGI Accelerator C/C++ XL Fortran OpenARC Library) R tidyverse, R cowplot

Utilities, Workload Orchestration

IBM ILOG® LuaJIT WSMP Spectrum Cluster Spectrum LSF Spectrum Conductor Panasas DirectFlow Foundation

Weather

AROME Cosmo SVN HYCOM MG2 MPAS-A RegCM CamSE JURASSIC LES Meso-NH POPPerf WRF

11 © Copyright IBM Corporation 2017

IBM Corporation IBM Systems Route 100 Somers, NY 10589

Produced in the United States of America August 2017

IBM, the IBM logo, ibm.com, , IBM Spectrum , IBM Elastic Storage, FlashSystem, ILOG, LSF, POWER8, Power Systems, Spectrum Scale, and Spectrum Storage are trademarks of International Business Machines Corp., registered in many jurisdictions worldwide. Other product and service names might be trademarks of IBM or other companies. A current list of IBM trademarks is available on the Web at “Copyright and trademark information” at www.ibm.com/legal/copytrade.shtml.

SoftLayer is a registered trademark of SoftLayer, Inc., an IBM Company.

Linux is a registered trademark of Linus Torvalds in the United States, other countries, or both.

This document is current as of the initial date of publication and may be changed by IBM at any time. Not all offerings are available in every country in which IBM operates.

THE INFORMATION IN THIS DOCUMENT IS PROVIDED “AS IS” WITHOUT ANY WARRANTY, EXPRESS OR IMPLIED, INCLUDING WITHOUT ANY WARRANTIES OF MERCHANT­ ABILITY, FITNESS FOR A PARTICULAR PURPOSE AND ANY WARRANTY OR CONDITION OF NON-INFRINGEMENT. IBM products are warranted according to the terms and conditions of the agreements under which they are provided.

Please Recycle

POS03149-USEN-09