Memory Subsystem


Power 7
Dan Christiani, Kyle Wieschowski

History 1980 - 2000
● 1980 RISC Prototype
● 1990 POWER1 (Performance Optimization With Enhanced RISC) (1 um)
● 1993 IBM launches 66 MHz POWER2 (.35 um)
● 1997 POWER2 ‘Super Chip’
● 1998 POWER3 (.22 um) 64-bit (POWER2 + PowerPC)

History 2000 - 2010
● 2001 POWER4 (180 nm) - Dual Core
● 2004 POWER5 (130 nm) - SMT
● 2007 POWER6 (65 nm) - High Frequency
  ○ 4.7 GHz - Dual Core
  ○ First server to hold all major benchmark records
  ○ 3x faster than the comparable Intel Itanium processor
● 2010 POWER7 (45 nm) - Cores, eDRAM

Power 7 Architectural Focus
● Reduce core area and power
  ○ Frequency is lowered to reduce power
● Fit the chip in the same sockets as POWER6
● Utilize the same SMP and I/O buses
  ○ At higher frequencies
● Remove external L3 cache chips
● Double the floating-point capability of each core

Architecture Overview
● 8 Cores
  ○ 12 execution units
  ○ Four-way SMT
  ○ Integrated L2 cache
● 2 Memory Controllers
  ○ 4 channels of DDR3
● Shared L3 Cache
● 5 SMP Links
  ○ Allows up to 32 sockets

The Core
● 6 Primary Units
  ○ IFU, ISU, LSU, FXU, VSU, and decimal FPU
● 12 Execution Units
  ○ 2 fixed point, 2 load/store, 4 double-precision floating point, 1 vector, 1 branch, 1 decimal FP, 1 control register
● In a given cycle:
  ○ Fetch up to 8 instructions
  ○ Decode and dispatch up to 6 instructions
  ○ Issue and execute up to 8 instructions

Instruction Fetch Unit (IFU)
● Feeds the pipeline with the most likely instructions
  ○ Based on branch prediction
● Maintains balance of instruction execution
  ○ Based on software-defined thread priority
● Decodes and groups instructions
● Executes branch instructions

Instruction-Sequencing Unit (ISU)
● Dispatches instructions
  ○ As groups, each from a single thread
● Renames registers
● Completes instructions
  ○ Tracked in the Global Completion Table
  ○ Also as groups
● Handles exception conditions
● In charge of flushing the core

Load/Store Unit (LSU)
● 2 symmetric load/store execution pipelines (out of order)
  ○ 1 load or store operation each
● Dependencies:
  ○ 1 stall cycle between a load and a dependent FXU operation
  ○ 2 stall cycles between a load and a dependent VSU operation
● Also executes fixed-point add and logical instructions
● SRQ (store reorder queue) - up to 32 stores can be outstanding
● LRQ (load reorder queue) - up to 32 loads can be outstanding

Fixed-Point Unit (FXU)
● Two identical pipelines
● Each containing:
  ○ Multiport GPR file
  ○ ALU, Divider, and Multiplier
  ○ Rotator
  ○ Count-leading-zeros unit
  ○ Bit-select unit
  ○ Miscellaneous unit (executes population count, parity, and binary-coded decimal assist instructions)

Vector and Scalar Unit (VSU)
● Vector instructions for
  ○ Vector modification, e.g. Merge, Shift
  ○ Load/Store
  ○ Arithmetic - no divide
  ○ Floating-point arithmetic - no divide

Cache
● Private 32 KB Level 1 caches
  ○ Instruction cache integrated with the IFU
  ○ Data cache integrated with the LSU
● Private 256 KB Level 2 caches
  ○ 8-way set associative
● 32 MB Level 3 cache
  ○ 4 MB of Local L3 (composed of 32 eDRAM macros)
  ○ 28 MB of Global L3

Memory Subsystem
● 2 Memory Controllers
  ○ Synchronous Region:
    ■ Services reads and writes
    ■ Arbitrates among conflicting requests
    ■ Manages coherence directory information
  ○ Asynchronous Region:
    ■ Manages traffic through channels/buffer chips
    ■ Schedules reads, writes, and maintenance
    ■ Balances utilization of resources

The Future of Power
POWER8 (mid-2014)
● 22 nm Design
● SMT8
● 12 Cores:
  ○ 10 Issue
  ○ 16 Execution Pipes
    ■ 2 FXU, 2 LSU, 2 LU
    ■ 4 FPU, 2 VMX
    ■ 1 Crypto, 1 DFU
    ■ 1 CR, 1 BR
  ○ 64 KB L1 data cache, up to 128 MB external (off-chip) L4
  ○ 2x estimated performance at maximum SMT

OpenPower
“The OpenPOWER Consortium brings together an ecosystem of hardware, system software, and enterprise applications that will provide powerful computing systems based on NVIDIA GPUs and POWER CPUs”
POWER8 + Infrastructure + CUDA = NextGen DataCenter

Questions?
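Backup: Checking the Numbers
The POWER7 slides quote several figures that can be cross-checked with simple arithmetic. The sketch below is only an illustration: every constant is copied from the slides, the names are made up for this example, and the reading of "Global L3" as the part of the shared L3 that is not local to a given core is an assumption, not something the slides state.

```python
"""Arithmetic check of the POWER7 figures quoted in the slides."""

# All constants come from the slides; the names are illustrative only.
CORES_PER_CHIP = 8
SMT_WAYS = 4                 # four-way SMT
MAX_SOCKETS = 32             # reachable through the 5 SMP links
LOCAL_L3_MB_PER_CORE = 4

# Execution-unit breakdown from "The Core" slide.
EXECUTION_UNITS = {
    "fixed point": 2,
    "load/store": 2,
    "double-precision floating point": 4,
    "vector": 1,
    "branch": 1,
    "decimal FP": 1,
    "control register": 1,
}


def total_execution_units() -> int:
    """Sum the per-type counts; the slides claim 12 per core."""
    return sum(EXECUTION_UNITS.values())


def threads_per_chip() -> int:
    """Hardware threads on one chip: cores times SMT ways."""
    return CORES_PER_CHIP * SMT_WAYS


def max_system_threads() -> int:
    """Hardware threads in a maximally configured 32-socket system."""
    return MAX_SOCKETS * threads_per_chip()


def total_l3_mb() -> int:
    """Shared L3 capacity if every core contributes one local region."""
    return CORES_PER_CHIP * LOCAL_L3_MB_PER_CORE


def global_l3_mb() -> int:
    """Assumption: 'global' L3 is the shared L3 minus one core's local region."""
    return total_l3_mb() - LOCAL_L3_MB_PER_CORE


if __name__ == "__main__":
    print("execution units per core:", total_execution_units())
    print("threads per chip:", threads_per_chip())
    print("threads in a 32-socket system:", max_system_threads())
    print("L3 total / global (MB):", total_l3_mb(), "/", global_l3_mb())
```

Under these assumptions the breakdown adds up to the quoted 12 execution units, and eight 4 MB local regions account for the 32 MB L3 split into 4 MB local and 28 MB global.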
References
http://www-05.ibm.com/cz/events/febannouncement2012/pdf/power_architecture.pdf
https://www-950.ibm.com/events/wwe/grp/grp030.nsf/vLookupPDFs/Tour%20P8%20Charts/$file/Tour%20P8%20Charts.pdf
http://www.theregister.co.uk/2013/08/27/ibm_power8_server_chip/
http://studies.ac.upc.edu/ETSETB/SEGPAR/microprocessors/power2%20%28mpr%29.pdf