IBM Power Systems Performance Report
Total Page:16
File Type:pdf, Size:1020Kb
Load more
										Recommended publications
									
								- 
												  Investigations on Hardware Compression of IBM Power9 ProcessorsInvestigations on hardware compression of IBM Power9 processors Jérome Kieffer, Pierre Paleo, Antoine Roux, Benoît Rousselle Outline ● The bandwidth issue at synchrotrons sources ● Presentation of the evaluated systems: – Intel Xeon vs IBM Power9 – Benchmarks on bandwidth ● The need for compression of scientific data – Compression as part of HDF5 – The hardware compression engine NX-gzip within Power9 – Gzip performance benchmark – Bitshuffle-LZ4 benchmark – Filter optimizations – Benchmark of parallel filtered gzip ● Conclusions – on the hardware – on the compression pipeline in HDF5 Page 2 HDF5 on Power9 18/09/2019 Bandwidth issue at synchrotrons sources Data analysis computer with the main interconnections and their associated bandwidth. Data reduction Upgrade to → Azimuthal integration 100 Gbit/s Data compression ! Figures from former generation of servers Kieffer et al. Volume 25 | Part 2 | March 2018 | Pages 612–617 | 10.1107/S1600577518000607 Page 3 HDF5 on Power9 18/09/2019 Topologies of Intel Xeon servers in 2019 Source: intel.com Page 4 HDF5 on Power9 18/09/2019 Architecture of the AC922 server from IBM featuring Power9 Credit: Thibaud Besson, IBM France Page 6 HDF5 on Power9 18/09/2019 Bandwidth measurement: Xeon vs Power9 Computer Dell R840 IBM AC922 Processor 4 Intel Xeon (12 cores) 2 IBM Power9 (16 cores) 2.6 GHz 2.7 GHz Cache (L3) 19 MB 8x 10 MB Memory channels 4x 6 DDR4 2x 8 DDR4 Memory capacity → 3TB → 2TB Memory speed theory 512 GB/s 340 GB/s Measured memory speed 160 GB/s 270 GB/s Interconnects PCIe v3 PCIe v4 NVlink2 & CAPI2 GP-GPU co-processor 2Tesla V100 PCIe v3 2Tesla V100 NVlink2 Interconnect speed 12 GB/s 48 GB/s CPU ↔ GPU Page 8 HDF5 on Power9 18/09/2019 Strength and weaknesses of the OpenPower architecture While amd64 is today’s de facto standard in HPC, it has a few competitors: arm64, ppc64le and to a less extend riscv and mips64.
- 
												  IBM Systems GroupIBM Systems Group IBM ~ p5 520 IBM ~ p5 550 Milan Patel Offering Manager, pSeries Entry Servers v 3.04 IBM Confidential © 2004 IBM Corporation This presentation is intended for the education of IBM and Business Partner sales personnel. It should not be distributed to clients. IBM Systems Group Agenda ❧Learning Objectives ❧Offering Description – p5-520 1.65GHz (2 way) – p5-550 1.65GHz (2-4way) ❧Selling Scenarios ❧Pricing and Positioning – The Value Paks New – Portfolio Positioning – Market Positioning – Retention Positioning ❧Target Sectors and Applications ❧Speeds and Feeds © 2004 IBM Corporation page 2 Template Version 3.04 IBM Systems Group Field Skills & Education IBM Confidential IBM Systems Group Learning Objectives At the conclusion of this material, you should be able to: ❧ Articulate the key messages and value-prop of ~ p5-520, ~ p5-550 ❧ Identify the opportunities and target sectors for the ~ p5-520, ~ p5-550 ❧ Articulate the enhancements and benefits of the ~ p5-520, ~ p5-550 ❧ Explain how Advanced POWER™ virtualization can help to reduce costs and simplify customer environments © 2004 IBM Corporation page 3 Template Version 3.04 IBM Systems Group Field Skills & Education IBM Confidential IBM Systems Group Offerings Overview p5-520 and p5-550 © 2004 IBM Corporation page 4 Template Version 3.04 IBM Systems Group Field Skills & Education IBM Confidential IBM Systems Group IBM ~ p5 520 What is p5-520? ❧ p5-520 is a 2 way entry server that complements the p5-550 in the entry space ❧ p5-520 delivers - – Outstanding performance –
- 
												  March 11, 2010 PresentationIBM Power Systems POWER7TM Announcement, The Next Generation of Power Systems Power your planet. February 25, 2010 IBM Power Systems 2 February 25, 2010 IBM Power Systems POWER7 System Highlights .Balance System Design - Cache, Memory, and IO .POWER7 Processor Technology - 6th Implementation of multi-core design - On chip L2 & L3 caches .POWER7 System Architecture - Blades to High End offerings - Enhances memory implementation - PCIe, SAS / SATA .Built in Virtualization - Memory Expansion - VM Control .Green Technologies - Processor Nap & Sleep Mode - Memory Power Down support - Aggressive Power Save / Capping Modes 600 500 .Availability 400 - Processor Instruction Retry 300 - Alternate Process Recovery 200 100 - Concurrent Add & Services 0 JS23 JS43 520 550 560 570/16 570/32 595 3 February 25, 2010 IBM Power Systems 4 February 25, 2010 IBM Power Systems Power Processor Technology IBM investment in the Power Franchise Dependable Execution for a decade POWER8 POWER7 45 nm Globalization and globally POWER6 available resources 65 nm •Performance/System Capacity POWER5 •4-5X increase from Power6 130 nm •Multi Core – Up to 8 POWER4 •SMT4 – 4 threads/core 180 nm . Dual Core •On-Chip eDRAM . High Frequencies • Energy . Dual Core . Virtualization + . Enhanced Scaling • Efficiency: 3-4X Power6 . Memory Subsystem + . SMT • Dynamic Energy . Dual Core . Altivec . Distributed Switch + Management . Chip Multi Processing . Instruction Retry . Distributed Switch . Core Parallelism + • Reliability + . Dyn Energy Mgmt . Shared L2 . FP Performance + . SMT + •Memory DIMM – DRAM . Dynamic LPARs (32) . Memory bandwidth + . Protection Keys Sparing . Virtualization •N+2 Voltage Regulator Redundancy •Protection Keys + 5 February 25, 2010 IBM Power Systems POWER6 – POWER7 Compare Wireless world Mobile platforms are developing as new means of identification.
- 
												  Memory Centric Characterization and Analysis of SPEC CPU2017 SuiteSession 11: Performance Analysis and Simulation ICPE ’19, April 7–11, 2019, Mumbai, India Memory Centric Characterization and Analysis of SPEC CPU2017 Suite Sarabjeet Singh Manu Awasthi [email protected] [email protected] Ashoka University Ashoka University ABSTRACT These benchmarks have become the standard for any researcher or In this paper, we provide a comprehensive, memory-centric charac- commercial entity wishing to benchmark their architecture or for terization of the SPEC CPU2017 benchmark suite, using a number of exploring new designs. mechanisms including dynamic binary instrumentation, measure- The latest offering of SPEC CPU suite, SPEC CPU2017, was re- ments on native hardware using hardware performance counters leased in June 2017 [8]. SPEC CPU2017 retains a number of bench- and operating system based tools. marks from previous iterations but has also added many new ones We present a number of results including working set sizes, mem- to reflect the changing nature of applications. Some recent stud- ory capacity consumption and memory bandwidth utilization of ies [21, 24] have already started characterizing the behavior of various workloads. Our experiments reveal that, on the x86_64 ISA, SPEC CPU2017 applications, looking for potential optimizations to SPEC CPU2017 workloads execute a significant number of mem- system architectures. ory related instructions, with approximately 50% of all dynamic In recent years the memory hierarchy, from the caches, all the instructions requiring memory accesses. We also show that there is way to main memory, has become a first class citizen of computer a large variation in the memory footprint and bandwidth utilization system design.
- 
												  Overview of the SPEC Benchmarks9 Overview of the SPEC Benchmarks Kaivalya M. Dixit IBM Corporation “The reputation of current benchmarketing claims regarding system performance is on par with the promises made by politicians during elections.” Standard Performance Evaluation Corporation (SPEC) was founded in October, 1988, by Apollo, Hewlett-Packard,MIPS Computer Systems and SUN Microsystems in cooperation with E. E. Times. SPEC is a nonprofit consortium of 22 major computer vendors whose common goals are “to provide the industry with a realistic yardstick to measure the performance of advanced computer systems” and to educate consumers about the performance of vendors’ products. SPEC creates, maintains, distributes, and endorses a standardized set of application-oriented programs to be used as benchmarks. 489 490 CHAPTER 9 Overview of the SPEC Benchmarks 9.1 Historical Perspective Traditional benchmarks have failed to characterize the system performance of modern computer systems. Some of those benchmarks measure component-level performance, and some of the measurements are routinely published as system performance. Historically, vendors have characterized the performances of their systems in a variety of confusing metrics. In part, the confusion is due to a lack of credible performance information, agreement, and leadership among competing vendors. Many vendors characterize system performance in millions of instructions per second (MIPS) and millions of floating-point operations per second (MFLOPS). All instructions, however, are not equal. Since CISC machine instructions usually accomplish a lot more than those of RISC machines, comparing the instructions of a CISC machine and a RISC machine is similar to comparing Latin and Greek. 9.1.1 Simple CPU Benchmarks Truth in benchmarking is an oxymoron because vendors use benchmarks for marketing purposes.
- 
												  Oracle Corporation: SPARC T7-1SPEC CINT2006 Result spec Copyright 2006-2015 Standard Performance Evaluation Corporation Oracle Corporation SPECint_rate2006 = 1200 SPARC T7-1 SPECint_rate_base2006 = 1120 CPU2006 license: 6 Test date: Oct-2015 Test sponsor: Oracle Corporation Hardware Availability: Oct-2015 Tested by: Oracle Corporation Software Availability: Oct-2015 Copies 0 300 600 900 1200 1600 2000 2400 2800 3200 3600 4000 4400 4800 5200 5600 6000 6400 6800 7600 1100 400.perlbench 192 224 1040 675 401.bzip2 256 224 666 875 403.gcc 160 224 720 1380 429.mcf 128 224 1160 1190 445.gobmk 256 224 1120 854 456.hmmer 96 224 813 1020 458.sjeng 192 224 988 7550 462.libquantum 416 224 7330 1190 464.h264ref 256 224 1150 956 471.omnetpp 255 224 885 986 473.astar 416 224 862 1180 483.xalancbmk 256 224 1140 SPECint_rate_base2006 = 1120 SPECint_rate2006 = 1200 Hardware Software CPU Name: SPARC M7 Operating System: Oracle Solaris 11.3 CPU Characteristics: Compiler: C/C++/Fortran: Version 12.4 of Oracle Solaris CPU MHz: 4133 Studio, FPU: Integrated 4/15 Patch Set CPU(s) enabled: 32 cores, 1 chip, 32 cores/chip, 8 threads/core Auto Parallel: No CPU(s) orderable: 1 chip File System: zfs Primary Cache: 16 KB I + 16 KB D on chip per core System State: Default Secondary Cache: 2 MB I on chip per chip (256 KB / 4 cores); Base Pointers: 32-bit 4 MB D on chip per chip (256 KB / 2 cores) Peak Pointers: 32-bit L3 Cache: 64 MB I+D on chip per chip (8 MB / 4 cores) Other Software: None Other Cache: None Memory: 512 GB (16 x 32 GB 4Rx4 PC4-2133P-L) Disk Subsystem: 732 GB, 4 x 400 GB SAS SSD
- 
												  POWER® Processor-Based SystemsIBM® Power® Systems RAS Introduction to IBM® Power® Reliability, Availability, and Serviceability for POWER9® processor-based systems using IBM PowerVM™ With Updates covering the latest 4+ Socket Power10 processor-based systems IBM Systems Group Daniel Henderson, Irving Baysah Trademarks, Copyrights, Notices and Acknowledgements Trademarks IBM, the IBM logo, and ibm.com are trademarks or registered trademarks of International Business Machines Corporation in the United States, other countries, or both. These and other IBM trademarked terms are marked on their first occurrence in this information with the appropriate symbol (® or ™), indicating US registered or common law trademarks owned by IBM at the time this information was published. Such trademarks may also be registered or common law trademarks in other countries. A current list of IBM trademarks is available on the Web at http://www.ibm.com/legal/copytrade.shtml The following terms are trademarks of the International Business Machines Corporation in the United States, other countries, or both: Active AIX® POWER® POWER Power Power Systems Memory™ Hypervisor™ Systems™ Software™ Power® POWER POWER7 POWER8™ POWER® PowerLinux™ 7® +™ POWER® PowerHA® POWER6 ® PowerVM System System PowerVC™ POWER Power Architecture™ ® x® z® Hypervisor™ Additional Trademarks may be identified in the body of this document. Other company, product, or service names may be trademarks or service marks of others. Notices The last page of this document contains copyright information, important notices, and other information. Acknowledgements While this whitepaper has two principal authors/editors it is the culmination of the work of a number of different subject matter experts within IBM who contributed ideas, detailed technical information, and the occasional photograph and section of description.
- 
												  Openpower AI CERN V1.PdfMoore’s Law Processor Technology Firmware / OS Linux Accelerator sSoftware OpenStack Storage Network ... Price/Performance POWER8 2000 2020 DRAM Memory Chips Buffer Power8: Up to 12 Cores, up to 96 Threads L1, L2, L3 + L4 Caches Up to 1 TB per socket https://www.ibm.com/blogs/syst Up to 230 GB/s sustained memory ems/power-systems- openpower-enable- bandwidth acceleration/ System System Memory Memory 115 GB/s 115 GB/s POWER8 POWER8 CPU CPU NVLink NVLink 80 GB/s 80 GB/s P100 P100 P100 P100 GPU GPU GPU GPU GPU GPU GPU GPU Memory Memory Memory Memory GPU PCIe CPU 16 GB/s System bottleneck Graphics System Memory Memory IBM aDVantage: data communication and GPU performance POWER8 + 78 ms Tesla P100+NVLink x86 baseD 170 ms GPU system ImageNet / Alexnet: Minibatch size = 128 ADD: Coherent Accelerator Processor Interface (CAPI) FPGA CAPP PCIe POWER8 Processor ...FPGAs, networking, memory... Typical I/O MoDel Flow Copy or Pin MMIO Notify Poll / Int Copy or Unpin Ret. From DD DD Call Acceleration Source Data Accelerator Completion Result Data Completion Flow with a Coherent MoDel ShareD Mem. ShareD Memory Acceleration Notify Accelerator Completion Focus on Enterprise Scale-Up Focus on Scale-Out and Enterprise Future Technology and Performance DriVen Cost and Acceleration DriVen Partner Chip POWER6 Architecture POWER7 Architecture POWER8 Architecture POWER9 Architecture POWER10 POWER8/9 2007 2008 2010 2012 2014 2016 2017 TBD 2018 - 20 2020+ POWER6 POWER6+ POWER7 POWER7+ POWER8 POWER8 P9 SO P9 SU P9 SO 2 cores 2 cores 8 cores 8 cores 12 cores w/ NVLink
- 
												  Multiply Your Power!Matthias Koechl Senior IT Architect IBM SAP Competence Center Walldorf, Germany Multiply Your Power! - combine the power of SAP® business solutions with IBM Power System™ and AIX™ strengths Power your planet Last Update 09/2010 by MK © 2010 IBM Corporation IBM SAP Alliance 2 ISICC Walldorf - MK, 09/2010 © 2010 IBM Corporation IBM SAP Alliance Agenda IBM Power System™ News Principles of POWER™ virtualization Benefits of POWER for SAP landscapes AIX for SAP Business Applications System Management 3 ISICC Walldorf - MK, 09/2010 © 2010 IBM Corporation IBM SAP Alliance Agenda IBM Power System™ News Principles of POWER™ virtualization Benefits of POWER for SAP landscapes AIX for SAP Business Applications System Management 4 ISICC Walldorf - MK, 09/2010 © 2010 IBM Corporation IBM SAP Alliance IBM’s 10 years march to Unix leadership UNIX Server Rolling Four Quarter Average Revenue Share 45% POWER6 40% Live Partition Mobility POWER7 35% POWER5 Workload Optimized Micro -Partitioning Leadership 30% POWER4 25% Dynamic LPARs 20% HP Sun IBM Source: IDC Quarterly Server Tracker May 2010 15% 0 1 10 101 30 307 109 110 Q Q300 Q Q Q102 Q302 Q103 Q303 Q104 Q304 Q105 Q305 Q106 Q306 Q107 Q Q108 Q308 Q Q309 Q For SAP new installs on UNIX platform Power System w/ AIX market share is in the 50% range Even better than overall Unix share of AIX SAP customers rely on the excellent Power platform! 5 ISICC Walldorf - MK, 09/2010 © 2010 IBM Corporation IBM SAP Alliance Power Systems: Only UNIX platform to grow Source: IDC Server Tracker Q104 - Q109 Server Tracker, 06/09, rolling four quarter average 6 ISICC Walldorf - MK, 09/2010 © 2010 IBM Corporation IBM SAP Alliance 2,100 successful Power Migration Factory migrations to date.
- 
												  Poweredge R940 (Intel Xeon Gold 5122, 3.60 Ghz) Specint Rate Base2006 = 1080 CPU2006 License: 55 Test Date: Jun-2017 Test Sponsor: Dell IncSPEC CINT2006 Result spec Copyright 2006-2017 Standard Performance Evaluation Corporation Dell Inc. SPECint_rate2006 = 1150 PowerEdge R940 (Intel Xeon Gold 5122, 3.60 GHz) SPECint_rate_base2006 = 1080 CPU2006 license: 55 Test date: Jun-2017 Test sponsor: Dell Inc. Hardware Availability: Jul-2017 Tested by: Dell Inc. Software Availability: Nov-2016 Copies 0 1000 2000 3000 4000 5000 6000 7000 8000 9000 10000 11000 12000 13000 14000 15000 16000 18000 917 400.perlbench 32 32 762 521 401.bzip2 32 32 487 808 403.gcc 32 32 802 1480 429.mcf 32 624 445.gobmk 32 2010 456.hmmer 32 32 1530 710 458.sjeng 32 32 658 17700 462.libquantum 32 1170 464.h264ref 32 32 1130 554 471.omnetpp 32 32 509 614 473.astar 32 1430 483.xalancbmk 32 SPECint_rate_base2006 = 1080 SPECint_rate2006 = 1150 Hardware Software CPU Name: Intel Xeon Gold 5122 Operating System: SUSE Linux Enterprise Server 12 SP2 CPU Characteristics: Intel Turbo Boost Technology up to 3.70 GHz 4.4.21-69-default CPU MHz: 3600 Compiler: C/C++: Version 17.0.3.191 of Intel C/C++ FPU: Integrated Compiler for Linux CPU(s) enabled: 16 cores, 4 chips, 4 cores/chip, 2 threads/core Auto Parallel: Yes CPU(s) orderable: 2,4 chip File System: xfs Primary Cache: 32 KB I + 32 KB D on chip per core System State: Run level 3 (multi-user) Secondary Cache: 1 MB I+D on chip per core Base Pointers: 32-bit L3 Cache: 16.5 MB I+D on chip per chip Peak Pointers: 32/64-bit Other Cache: None Other Software: Microquill SmartHeap V10.2 Memory: 768 GB (48 x 16 GB 2Rx8 PC4-2666V-R) Disk Subsystem: 1 x 960 GB SATA SSD Other Hardware: None Standard Performance Evaluation Corporation [email protected] Page 1 http://www.spec.org/ SPEC CINT2006 Result spec Copyright 2006-2017 Standard Performance Evaluation Corporation Dell Inc.
- 
												  IBM Power Systems Performance Report Apr 13, 2021IBM Power Performance Report Power7 to Power10 September 8, 2021 Table of Contents 3 Introduction to Performance of IBM UNIX, IBM i, and Linux Operating System Servers 4 Section 1 – SPEC® CPU Benchmark Performance 4 Section 1a – Linux Multi-user SPEC® CPU2017 Performance (Power10) 4 Section 1b – Linux Multi-user SPEC® CPU2017 Performance (Power9) 4 Section 1c – AIX Multi-user SPEC® CPU2006 Performance (Power7, Power7+, Power8) 5 Section 1d – Linux Multi-user SPEC® CPU2006 Performance (Power7, Power7+, Power8) 6 Section 2 – AIX Multi-user Performance (rPerf) 6 Section 2a – AIX Multi-user Performance (Power8, Power9 and Power10) 9 Section 2b – AIX Multi-user Performance (Power9) in Non-default Processor Power Mode Setting 9 Section 2c – AIX Multi-user Performance (Power7 and Power7+) 13 Section 2d – AIX Capacity Upgrade on Demand Relative Performance Guidelines (Power8) 15 Section 2e – AIX Capacity Upgrade on Demand Relative Performance Guidelines (Power7 and Power7+) 20 Section 3 – CPW Benchmark Performance 19 Section 3a – CPW Benchmark Performance (Power8, Power9 and Power10) 22 Section 3b – CPW Benchmark Performance (Power7 and Power7+) 25 Section 4 – SPECjbb®2015 Benchmark Performance 25 Section 4a – SPECjbb®2015 Benchmark Performance (Power9) 25 Section 4b – SPECjbb®2015 Benchmark Performance (Power8) 25 Section 5 – AIX SAP® Standard Application Benchmark Performance 25 Section 5a – SAP® Sales and Distribution (SD) 2-Tier – AIX (Power7 to Power8) 26 Section 5b – SAP® Sales and Distribution (SD) 2-Tier – Linux on Power (Power7 to Power7+)
- 
												  Towards a Portable Hierarchical View of Distributed Shared Memory Systems: Challenges and SolutionsTowards A Portable Hierarchical View of Distributed Shared Memory Systems: Challenges and Solutions Millad Ghane Sunita Chandrasekaran Margaret S. Cheung Department of Computer Science Department of Computer and Information Physics Department University of Houston Sciences University of Houston TX, USA University of Delaware Center for Theoretical Biological Physics, [email protected],[email protected] DE, USA Rice University [email protected] TX, USA [email protected] Abstract 1 Introduction An ever-growing diversity in the architecture of modern super- Heterogeneity has become increasingly prevalent in the recent computers has led to challenges in developing scientifc software. years given its promising role in tackling energy and power con- Utilizing heterogeneous and disruptive architectures (e.g., of-chip sumption crisis of high-performance computing (HPC) systems [15, and, in the near future, on-chip accelerators) has increased the soft- 20]. Dennard scaling [14] has instigated the adaptation of heteroge- ware complexity and worsened its maintainability. To that end, we neous architectures in the design of supercomputers and clusters need a productive software ecosystem that improves the usability by the HPC community. The July 2019 TOP500 [36] report shows and portability of applications for such systems while allowing how 126 systems in the list are heterogeneous systems confgured every parallelism opportunity to be exploited. with one or many GPUs. This is the prevailing trend in current gen- In this paper, we outline several challenges that we encountered eration of supercomputers. As an example, Summit [31], the fastest in the implementation of Gecko, a hierarchical model for distributed supercomputer according to the Top500 list (June 2019) [36], has shared memory architectures, using a directive-based program- two IBM POWER9 processors and six NVIDIA Volta V100 GPUs.