Power8 Quser Mspl Nov 2015 Handout


IBM Power Systems
Power Systems Hardware: Today and Tomorrow
November 2015
Mark Olson, [email protected]
© 2015 IBM Corporation

POWER8 Chip
[Die photo of the POWER8 chip.]

Processor Technology Roadmap
• POWER4 – 180 nm – 2001
• POWER5 – 130 nm – 2004
• POWER6 – 65 nm – 2007
• POWER7 – 45 nm – 2010
• POWER8 – 22 nm – 2014
• POWER9, POWER10, POWER11 (or whatever they are named) – future

Processor Chip Comparisons

                  POWER5      POWER6     POWER7           POWER7+          POWER8
Year              2004        2007       2010             2012             2014
Technology        130nm SOI   65nm SOI   45nm SOI, eDRAM  32nm SOI, eDRAM  22nm SOI, eDRAM
Compute cores     2           2          8                8                12
Threads           SMT2        SMT2       SMT4             SMT4             SMT8
On-chip cache     1.9MB (L2)  8MB (L2)   2+32MB (L2+L3)   2+80MB (L2+L3)   6+96MB (L2+L3)
Off-chip cache    36MB (L3)   32MB (L3)  None             None             128MB (L4)
Sust. memory BW   15GB/s      30GB/s     100GB/s          100GB/s          230GB/s
Peak I/O BW       6GB/s       20GB/s     40GB/s           40GB/s           96GB/s

Processor Designs

                  POWER5+       POWER6    POWER7            POWER7+            POWER8
Max cores         4             2         8                 8                  12
Technology        90nm          65nm      45nm              32nm               22nm
Die size          245 mm²       341 mm²   567 mm²           567 mm²            650 mm² *
Transistors       276 M         790 M     1.2 B             2.1 B              4.2 B *
Frequencies       1.9 GHz       4–5 GHz   3–4 GHz           Up to 4.4 GHz      Up to 4.1 GHz **
SMT (threads)     2             2         4                 4                  8
L2 cache          1.9MB shared  4MB/core  256KB/core        256KB/core         512KB/core
L3 cache          36MB          32MB      4MB/core on chip  10MB/core on chip  8MB/core on chip
L4 cache          --            --        --                --                 Up to 128MB
Sust. memory BW   15GB/s        30GB/s    100GB/s           100GB/s            230GB/s
Peak I/O BW       6GB/s         20GB/s    40GB/s            40GB/s             96GB/s

* with 12-core chip
** announced so far

Processor Designs (extended back to POWER4)

                  POWER4   POWER4+  POWER5        POWER5+      POWER6   POWER7            POWER7+            POWER8
Max cores         2        2        2             4            2        8                 8                  12
Technology        180nm    130nm    130nm         90nm         65nm     45nm              32nm               22nm
Die size          412 mm²           389 mm²       245 mm²      341 mm²  567 mm²           567 mm²            650 mm² *
Transistors       170 M    180 M    276 M         276 M        790 M    1.2 B             2.1 B              4.2 B *
Frequencies       1.3 GHz  1.9 GHz  1.65–1.9 GHz  1.9–2.2 GHz  3–5 GHz  3–4.25 GHz        Up to 4.4 GHz      3–4.35 GHz
SMT (threads)     1        1        2             2            2        4                 4                  8
L2 cache          1.4MB shared  1.5MB shared  1.9MB shared  1.9MB shared  4MB/core  256KB/core  256KB/core  512KB/core
L3 cache          32MB     32MB     36MB          36MB         32MB     4MB/core on chip  10MB/core on chip  8MB/core on chip
L4 cache          --       --       --            --           --       --                --                 Up to 128MB
Sust. memory BW   6GB/s    6GB/s    15GB/s        15GB/s       30GB/s   100GB/s           100GB/s            230GB/s
Peak I/O BW       2GB/s    2GB/s    6GB/s         6GB/s        20GB/s   40GB/s            40GB/s             96GB/s

* with 12-core chip
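As a quick consistency check on the cache figures (my arithmetic, not from the deck), the per-core L2/L3 sizes in the Processor Designs tables reproduce the chip totals in the Processor Chip Comparisons table. For the 12-core POWER8:

\[
12 \times 512\,\text{KB} = 6\,\text{MB of L2}, \qquad 12 \times 8\,\text{MB} = 96\,\text{MB of L3}
\]

which matches the "6 + 96MB (L2+L3)" entry; likewise for the 8-core POWER7+, 8 × 256 KB = 2 MB and 8 × 10 MB = 80 MB give "2 + 80MB (L2+L3)".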
Innovation Drives Performance
[Bar chart: relative % of improvement attributed to technology scaling vs. innovation, plotted for nodes from 180 nm down to 22 nm and future 14 & 7 nm.]
IBM plans for future technology are subject to change.

POWER8 Chip – Packaging Technology
• 22nm SOI with eDRAM, 650 mm² die
• 15 metal layers = great bandwidth
[Die diagram: 12 cores, each with its own L2, surrounding the shared L3 (one 8MB L3 region per core) and chip interconnect; dual memory controllers, SMP links, PCIe, and accelerators at the edges.]

World-Class 22nm Semiconductor Technology
• Silicon-on-insulator: faster transistors, less noise
• On-chip eDRAM: 6x latency improvement and 8x bandwidth improvement, with no off-chip signaling requirement; 3x less area and 5x less energy than SRAM
• 15-layer copper wire, dense interconnect: faster connections, low-latency distance paths, high-density complex circuits, 2x wire per transistor
[Images: 22nm eDRAM cell; 15-layer wiring cross-section.]

IBM 22nm Technology
• 15-layer metal stack: low-level wires used for dense local circuit interconnect; top-level wires used for power distribution, clocks, and off-chip signaling
• Compare: a 9-layer metal stack is typical in the industry
[Image: 15-layer vs. 9-layer metal stack cross-sections.]

POWER8 Chip
Cores
• 12 cores (SMT8)
• 8-wide dispatch, 10-wide issue, 16 execution pipes
• 2x internal data flows/queues
• Enhanced prefetching
Caches
• 64KB data cache (L1)
• 512KB SRAM L2 per core
• 96MB eDRAM shared L3
• Up to 128MB eDRAM L4 (off-chip)
Accelerators
• Crypto and memory expansion
• Transactional memory (a usage sketch follows this slide)
• Data move / VM mobility
Energy management
• On-chip power-management micro-controller
• Integrated per-core VRM
• Critical path monitors
Bus interfaces
• Integrated PCIe Gen3
• SMP interconnect
• CAPI
Memory
• Dual memory controllers
• 230 GB/s sustained memory bandwidth
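The transactional-memory feature above is exposed to software as POWER8's hardware transactional memory (HTM) facility. Below is a minimal lock-elision sketch, assuming GCC's PowerPC HTM built-ins (compile with -mcpu=power8 -mhtm); the counter, lock variable, and function names are illustrative, not from the deck:

```c
#include <htmintrin.h>   /* GCC's PowerPC HTM support; requires -mhtm */
#include <stdatomic.h>

static atomic_int fallback_lock;   /* 0 = free, 1 = held (illustrative) */
static long shared_counter;        /* illustrative shared state */

void increment_counter(void)
{
    if (__builtin_tbegin(0)) {
        /* Transactional path: reading the lock adds it to the transaction's
           read set, so another thread taking the lock aborts us automatically. */
        if (atomic_load(&fallback_lock))
            __builtin_tabort(0);   /* lock is held: bail out to the fallback */
        shared_counter++;
        __builtin_tend(0);         /* commit the transaction */
    } else {
        /* Transaction failed to start or aborted (conflict, capacity, ...):
           take the software lock instead. */
        while (atomic_exchange(&fallback_lock, 1))
            ;                      /* spin until the lock is free */
        shared_counter++;
        atomic_store(&fallback_lock, 0);
    }
}
```

Production code would bound the number of transactional retries and inspect the abort cause to distinguish persistent from transient failures before falling back to the lock.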
POWER8 Memory Buffer Chip
• A memory buffer chip sits between the POWER8 processor interface and the DRAM and provides an "L4 cache"
• Intelligence moved into memory: logic previously on the POWER7+ chip moves onto the buffer
• High-speed processor interface; performance value
[Diagram: POWER8 chips linked over a high-speed interface to buffer chips fronting DRAM.]

Memory Bandwidth per Socket
[Three bar charts that successively "reset the scale": memory bandwidth per socket for POWER5 through POWER8, first on a 0–70 GB/s axis, then 0–200 GB/s, then 0–250 GB/s to fit POWER8's 230 GB/s.]

POWER8 Integrated PCIe Gen3
• POWER7: a GX bus runs to an external I/O bridge, which drives PCIe Gen2 to PCI devices
• POWER8: PCIe Gen3 comes directly off the chip to PCI devices

Power 770/780 Node I/O Bandwidth (system node / processor enclosure / CEC drawer)
[Bar charts, again with a scale reset: POWER6 570, POWER7 770, and POWER7+ 770 fit on a 0–80 GB/s axis; adding the POWER8 E870 pushes the axis to 0–300 GB/s.]

I/O Bandwidth, Comparing 2-Socket Servers
[Bar chart comparing POWER6, POWER7, POWER7+, and POWER8 2-socket servers on a 0–200 GB/s axis.]

POWER8 CAPI (Coherent Accelerator Processor Interface)
• An FPGA or ASIC attaches to the POWER8 coherence bus, with PCIe Gen3 as the transport for encapsulated messages
• The accelerator acts like an "extra" core for the POWER8 chip
• Customizable hardware application accelerator: specific system software, middleware, or user applications, written to a durable interface

First CAPI Solution Example
• High-speed, fast-response social application (for example, Twitter), enabled by an in-memory NoSQL store using distributed hash tables
• Initially implemented on x86 servers, but limited DRAM per server meant LOTS of servers, resulting in a costly, complex infrastructure
• One 2-socket, 2U POWER8 server plus one FlashSystem drawer replaced 24 x86 servers
• Much lower cost of acquisition; much smaller footprint and less energy; much lower operational cost

Example: CAPI-Attached Flash Optimization
• Conventional I/O path: application read/write syscall → file system (strategy()/iodone()) → LVM (strategy()/iodone()) → disk and adapter device drivers (pin buffers, translate, map DMA, start I/O, interrupt, unmap, unpin, iodone scheduling) – roughly 20K instructions
• CAPI path: flash memory attaches to POWER8 via CAPI coherent attach; applications issue read/write commands through a POSIX-style async I/O API (aio_read()/aio_write()) into a user library with a shared-memory work queue – fewer than 500 instructions
• The CAPI flash controller operates in user space, eliminating 97% of the instruction path length
• Saves 10 cores per 1M IOPS

POWER8 Leapfrogs
• Leapfrogs in memory bandwidth and I/O bandwidth (integrated PCIe Gen3), recapping the earlier charts
• PLUS: CAPI, accelerators, transactional memory, scalability, smart use of energy, and more

Scale-out CPW Comparisons

720 POWER7+ (1 socket)               S814 (1 socket)
– 4-core 3.6 GHz: 28,400             – 4-core 3.0 GHz: 39,500     +40%
– 6-core 3.6 GHz: 42,400             – 6-core 3.0 GHz: 59,500
– 8-core 3.6 GHz: 56,300             – 8-core 3.7 GHz: 85,500     +50% at ~same GHz

740 POWER7+ (1 or 2 sockets)         S824 (1 or 2 sockets)
– 6-core 4.2 GHz: 49,000             – 6-core 3.8 GHz: 72,000
– 12-core 4.2 GHz: 91,700            – 12-core 3.8 GHz: 130,000   +40%
– 8-core 3.6 GHz: 56,300             – 8-core 4.1 GHz: 94,500
– 16-core 3.6 GHz: 106,500           – 16-core 4.1 GHz: 173,500   +60%
– 8-core 4.2 GHz: 64,500             – 12-core 1-socket not offered
– 16-core 4.2 GHz: 120,000           – 24-core 3.5 GHz: 230,500   +90%

Enterprise CPW Comparisons

POWER7+ 770                          E870
– 16-core 3.8 GHz: 110,000           – 32-core 4.02 GHz: 359,000
– 32-core 3.8 GHz: 191,500           – 64-core 4.02 GHz: 711,000  +87%
– 48-core 3.8 GHz: 290,500           – 40-core 4.19 GHz: 460,000
– 64-core 3.8 GHz: 379,300           – 80-core 4.19 GHz: 911,000
– 12-core 4.2 GHz: 90,000
– 24-core 4.2 GHz: 154,800
– 36-core 4.2 GHz: 242,600
– 48-core 4.2 GHz: 306,600

POWER7+ 780                          E880
– 16-core 4.4 GHz: 123,500           – 32-core 4.35 GHz: 381,000
                                     – 64-core 4.35 GHz: [rating cut off in the source text]
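For reference, the percentage callouts are simple ratios of the new CPW rating to the comparable POWER7+ rating (my arithmetic; the deck states only the rounded results):

\[
\frac{85{,}500}{56{,}300} \approx 1.52 \;\rightarrow\; \text{roughly the deck's "+50\%"}, \qquad
\frac{711{,}000}{379{,}300} \approx 1.87 \;\rightarrow\; \text{the deck's "+87\%"}
\]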