Architecting for Power Management: the IBM® POWER7™ Approach


Malcolm Ware*, Karthick Rajamani*, Michael Floyd§, Bishop Brock§, Juan C Rubio*, Freeman Rawson*, John B Carter*
§IBM Systems and Technology Group, *IBM Research Austin
{mware,karthick,mfloyd,bcbrock,rubioj,frawson,retrac}@us.ibm.com

987-1-4244-5659-8/09/$25.00 ©2009 IEEE

Abstract

The POWER7 processor is the newest member of the IBM POWER® family of server processors. With greater than 4X the peak performance and the same power budget as the previous generation POWER6®, POWER7 will deliver impressive energy-efficiency boosts. The improved peak energy-efficiency is accompanied by a wide array of new features in the processor and system designs that advance IBM’s EnergyScale™ dynamic power management methodology. This paper provides an overview of these new features, which include better sensing, more advanced power controls, improved scalability for power management, and features to address the diverse needs of the full range of POWER servers from blades to supercomputers. We also highlight three challenges that need attention from a range of systems design and research teams: (i) power management in highly virtualized environments, (ii) power (in)efficiency of systems software and applications, and (iii) memory power costs, especially for servers with large memory footprints.

1. Introduction

The POWER7 processor is the next generation server processor in the IBM POWER family. Each POWER7 chip is 567 mm², contains 1.2 billion transistors, and is built in IBM’s 45 nm (12s CMOS) Cu SOI technology. It is designed to offer better performance, more opportunities for parallelism, and higher levels of power efficiency than its predecessors. It is an 8-core design with each core having up to 4 SMT threads, where a core can run in single-threaded, SMT2 or full SMT4 mode. Cores are out-of-order, allowing them to maximize instruction-level parallelism.

Each core region, known as a “chiplet,” contains a 32KB 4-way set associative L1 I-cache and a 32KB 8-way set associative L1 D-cache, a private per-core 256KB L2 cache and a 4MB portion of the shared 32MB L3 cache. The L2 is fully inclusive of both the local D/I L1 caches. The L3 cache exploits embedded DRAM technology [8] to maximize area and power efficiency. The clock frequency of each core chiplet may be independently (and asynchronously to the fabric) controlled via an innovative new digital PLL circuit.

Figure 1: The IBM POWER7 Processor

Like POWER6, POWER7 integrates memory controllers on chip. It has full support for cache coherence for large SMP configurations and is designed to be used in a wide range of servers, ranging from blades to very large commercial configurations and supercomputers.

Starting with POWER6 [1], IBM’s POWER family machines offer an array of power management capabilities collectively known as EnergyScale [2]. These capabilities include collection of power and performance measurements and a selection of power management modes, including: (i) static power save, which reduces power at a fixed performance cost; (ii) dynamic power save, which constantly adjusts core frequencies to exploit opportunities to save power with a minimal impact on performance; and (iii) a maximum performance variant, which exploits available power and thermal headroom to boost performance. EnergyScale also supports power capping, which ensures the safe and reliable operation of the servers within user-set power limits regardless of workload transients.

EnergyScale relies on a combination of features provided by the components of the POWER7-based system. It is primarily controlled by a dedicated microcontroller, the Thermal and Power Management Device (TPMD), which operates under the control of the Flexible Support Processor (FSP) present on all POWER family servers. The TPMD implements the power management policies for the system. In larger, multi-node POWER7 machines, where each node is a processor-memory-power delivery complex in its own right, there is one TPMD for every node in the system. Hard real-time firmware running on the TPMD implements the selected power management policies and collects data for reporting purposes.

The POWER7 offers a variety of new features in support of EnergyScale that represent a dramatic improvement over those found on the POWER6. These new features allow POWER7 machines to offer more energy savings, better energy proportionality, and finer control over the various components of the system. This paper describes the features and their uses.

2. Architected Processor Idle Modes

The POWER architecture specifications [6] supported by POWER7 define four power-saving modes that can provide a continuum of power savings versus latency and software impact. POWER7 implements two of these modes, Nap and Sleep. Both Nap and Sleep are hypervisor-privileged modes that maintain few of the architected processor resources; exit from a power-saving instruction is similar to a thread-level reset. Power-saving instructions that trigger the entry to these modes also cause dynamic SMT mode switching. The core-level power-saving modes described below are only activated when every thread in the core has executed a Nap or Sleep instruction.

2.1. Nap

Nap is a processor low-power state designed for short processor idle periods. Nap in POWER7 is designed to have lower latency than in POWER6, where it was the only idle power mode supported. The Nap state is entered whenever the hypervisor has executed a power-save instruction on all threads, with at least one thread executing a Nap instruction. In the Nap state, all of the execution units in a core and the L1 cache are clocked off; however, the higher-level caches and certain timing facilities remain functional, allowing low-latency workload resumption in the event of timer or external interrupts. Core RAS and configuration registers remain accessible to firmware during Nap.

The Nap state by itself provides modest power reduction over a software idle loop. Further, the hardware supports the option of automatically lowering cache frequencies while in the Nap state. This feature provides a significant reduction in power for napping core chiplets, at the potential expense of increased access latency for shared data requested by a non-napping core from a napping cache (L2/L3). The latency from the presentation of an interrupt to a napping core to the first instruction completion after Nap exit is typically less than 5 µs. Instruction execution begins immediately upon wakeup regardless of whether the frequency was dropped while in Nap. If the frequency was dropped for Nap, it is slewed back up to the operating point set by the TPMD firmware while instruction execution resumes at Nap exit.

2.2. Sleep

Sleep is a new architectural feature introduced in POWER7. It is a lower-power, higher-latency standby state intended for cores that the hypervisor/OS predicts will be unused for an extended period of time. The Sleep state is entered when every thread on a core executes a Sleep instruction. Upon entering Sleep, hardware state machines purge all data from the core and caches before completely clocking off the entire chiplet. A small logic macro associated with the core chiplet remains awake to handle external interrupts that wake the core out of sleep.

When all cores in a POWER7 processor enter the Sleep state, the voltage supplied to the core chiplets can be automatically lowered to a retention level, a non-operational voltage sufficient only to maintain static configuration data in the latches and arrays. This mode provides the lowest standby power for a POWER7 processor. Note that firmware running on the FSP or TPMD can temporarily restore operational clocks and voltages to a Sleeping chiplet at any time for maintenance operations, e.g., to access RAS or configuration registers, without restarting instructions on that core.

The latency for entering Sleep varies based on the system design and workload configuration. Sleep exit latency for a single core is typically less than 1 ms, and is dominated by the time required to re-initialize the L3 cache eDRAM. Chip-level Processor Sleep exit is dominated by voltage change latency from the retention voltage, which varies by system depending on the voltage control scheme used in the system design.

Figure 2 shows the relative power reductions for Nap and Sleep as measured on a small, early sample set of POWER7 processors. For these measurements the minimum frequency used, fmin, was about 46% of the maximum frequency used, fmax. Vmax and Vmin refer to the set of voltages corresponding to those frequency levels and Vret to the set of voltages corresponding to the retention level.

[Figure 2: Comparison of processor idle modes. Average power in idle modes (normalized power) for: idle, nap, nap at fmin, sleep at Vmax, sleep at Vmin, sleep at Vret.]

3.

eDRAM) cells, while a slightly lower corresponding Vdd voltage significantly reduces leakage for the majority of logic circuits in the core chiplet that perform computation and maintain data coherence.

Table 1: Major POWER7 voltage domains

Rail   Type     Use
Vdd    Dynamic  CPU core, cache logic
Vcs    Dynamic  Cache arrays, other SRAM
Vio    Fixed    Interconnect logic and I/O
Vmem   Fixed    Memory controller I/O

3.1. DPLL (Digital Phase-Locked Loop)

[Figure 3: Better energy reduction from DPLL with per-core frequency scaling. Power normalized to the common-frequency case for each scenario, comparing Common Frequency with Per-core Frequency. Scenario 1: 1 core busy, 7 napping (energy savings using fmin for napping cores). Scenario 2: 1 core busy, 7 at low load (energy savings using fmin for cores with low load).]

POWER7 for the first time allows cores in a
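Returning to the idle modes of Section 2, the entry conditions for Nap and Sleep can be restated as a small decision procedure. The following Python sketch is illustrative only: the names ThreadState, CoreState, core_idle_state and chip_can_use_retention_voltage are hypothetical and do not correspond to any POWER7 hardware, firmware, or hypervisor interface. It encodes just the rules stated in the text, namely that a core-level mode requires every thread to have executed a power-save instruction, that Sleep requires every thread to have executed Sleep, and that the retention voltage applies only when all cores are in Sleep.

```python
from enum import Enum


class ThreadState(Enum):
    """Hypothetical per-thread view; not an actual POWER7 interface."""
    RUNNING = "running"
    NAP = "nap"      # thread has executed a Nap instruction
    SLEEP = "sleep"  # thread has executed a Sleep instruction


class CoreState(Enum):
    ACTIVE = "active"
    NAP = "nap"
    SLEEP = "sleep"


def core_idle_state(threads):
    """Resolve the core-level idle state from the per-thread states.

    Rules restated from Section 2 (the real decision is made in hardware):
      * a core-level mode is entered only when every thread has executed
        a power-save instruction;
      * Sleep is entered when every thread has executed Sleep;
      * otherwise at least one thread executed Nap, so the core naps.
    """
    if not threads or any(t is ThreadState.RUNNING for t in threads):
        return CoreState.ACTIVE
    if all(t is ThreadState.SLEEP for t in threads):
        return CoreState.SLEEP
    return CoreState.NAP


def chip_can_use_retention_voltage(cores):
    """The retention voltage applies only when all cores are in Sleep."""
    return bool(cores) and all(c is CoreState.SLEEP for c in cores)
```

Under these assumptions, a core with three threads in Sleep and one in Nap resolves to Nap rather than Sleep, and a single napping core is enough to keep the chip above the retention voltage.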