The IBM Power Micro-Architecture 1 Introduction

Total Page:16

File Type:pdf, Size:1020Kb

The IBM Power Micro-Architecture 1 Introduction The IBM Power Micro-architecture Report for COMP9244: Software View of Processor Architectures Godfrey van der Linden 2006-08-01 Abstract The IBM PowerPC instruction set architecture and the implementations of it have pow- ered many different computer systems. It is a second generation RISC design that incorpo- rates many instruction extensions designed to ease the generation of quality code by modern compilers. The RISC design lends itself to scaling from very small implementations designed for embedded applications, through super–computers, to standalone desktop and server ma- chines. Although the design is fundamentally RISC, the IBM designers have explored much of the landscape of modern computer architectures, from large super–scalar processors down to tiny single issue pipelined processor cells, suitable for use in custom designed ASICs. This paper explores the history of the architecture and some of the unusual instruction set choices that the original PowerPC designers made, before investigating some of the PowerPC im- plementations. Modern PowerPC processors, such as the POWER4, have very powerful and general supervisor and hypervisor mode functionality. The later sections of this paper discusses operating system design and some of the some of the issues posed by the POWER4 architecture. 1 Introduction as mobile phones. In this introduction we shall briefly discuss the development history of the PowerPC(PPC) Continuing with IBM’s long history of com- architecture and a quick survey of the PPC ex- puter design innovation, the PowerPC archi- tensions to traditional RISC architectures. In tecture and the various implementations of it, two middle sections will discuss the large main- such as the POWER5, are worthy representa- frame desktop implementations and followed tives of their world class design. IBM lead the by the smaller embedded processor variations. way with many performance innovations in the Finally, in Section 4, issues and design features 1950s and ’60s; it is interesting that so many of will be discussed from the viewpoint of an op- the technologies that they invented have been erating system designer. of direct use in their RISC architectures some In this document it is assumed that the 40 years later. reader is familiar with both modern RISC pro- Apple Computer shocked the personal com- cessor architectures and also with operating puter world when they announced their inten- system design issues. tion of switching to the Intel Core Duo line of processors. IBM and Freescale seem to be un- 1.1 History of the PowerPC concerned, certainly IBM will be shipping as many of these processors as their fabricators In 1985 IBM started working on a second can make attempting to satisfy the insatiable generation RISC architecture, known as the gaming console market. In a remarkable coup “AMERICA architecture”, [CM90]. In 1986 both MicroSoft’s XBox 360 and Sony PlaySta- the development on the RS/6000 was started, tion 3 will be powered by PowerPC chips. which was the first implementation of the Freescale is concentrating their design efforts AMERICA architecture, in February 1990, the with low power signal processing chips aim- first RS/6000 machines shipped, [IBM01] with ing for the handheld embedded market, such a renamed POWER architecture. At the end 1 of development of the RS/6000 series IBM had 1.2.1 Branch registers: Link and Count reduced the chip count of the CPU from eleven The PowerPC ISA uses a dedicated link regis- chips down to a single chip in the low end ma- ter, , to save the return address for a function chines. lr call. It is the callees responsibility to save the In the early ’90s Apple and IBM were col- lr if it isn’t a leaf function. Using a special reg- laborating on a number of projects. During ister for the link register can make the return this time IBM approached Apple to collabo- jump faster since the hardware may not need rate on the development of a family of single– to go through a register read pipeline stage for chip processors based on the POWER archi- function returns. tecture. Soon after, Apple asked Motorola to The ISA also has a dedicated register join the collaboration to leverage Motorola’s for controlling loops, the count register ctr. experience in manufacturing high–volume pro- This register is automatically decremented and cessors. This collaboration was known as the tested against 0 by the branch conditional in- 1 AIM alliance; the result was a modification of structions. By using this special register the POWER, known as the PowerPC instruction branch processor can determine the result of set architecture(ISA). PowerPC added single– branch early in the pipeline. precision floating point, general register–to– In the PowerPC architecture the branch register multiply and divide and removed some and instruction fetch units are closely associ- POWER instructions, such as MQ register ated, in fact he branch unit is split into two based multiplies. PowerPC also added a par- with the branch prediction unit very close — allel 64–bit instruction set mode. in terms of clock cycles — to the fetch unit. The 1993, the POWER 2 was released The lr and ctr registers can be implemented and the POWER3 was released in 1997. The in the part of the branch unit that is near the POWER3 was the first implementation of fetch unit and the instruction set can the use the full 32 and 64–bit PowerPC ISA. The these registers to implement indirect function POWER4 was released in 2001 and has be- calls. See [HP03, Appendix C] for details. come the basis for later development of rela- tively large systems. The PowerPC 970(2002) 1.2.2 Load and Store extensions and POWER5(2005) extending the POWER4 micro-architecture. Atomic update The Load and Re- serve(lwarx) and Store Condi- tional(stwcx) instructions provide the PowerPC shared–storage atomic–update 1.2 PowerPC instruction set archi- mechanism. Lwarx instruction loads a tecture value from a memory location and marks that location as reserved. The subse- The PowerPC instruction set architecture is quent stwcx stores a new value, if the part of a group of second generation RISC ISAs reservation is still valid. A reservation that were developed in the late ’80s and early is invalidated when another processor ’90s. This document will concentrate on those successfully commits a stwcx instruc- parts of the PowerPC ISA that are unconven- tion. Using this mechanism a general tional. In most ways the PowerPC ISA is con- atomic operation suite can be built such ventionally RISC in that only load and store as atomic increment, compare and swap operations are used to access memory; that and test and set. See [WSM+5b]. is, there are no memory operand addressing modes. However unlike many RISC instruction With update Indexed load and store instruc- sets, the PowerPC has some extended instruc- tions have a ’with update’ variant, which tions that are capable of modifying multiple updates the source index register with target registers or are inherently multi-cycle in the computed effective address after the nature. load or store is initiated. The load in- 1(A)pple, (I)BM and (M)otorola 2 struction will modify two registers; some- use of memory barriers. However the de- thing that is unusual for RISC instruc- tails are so complex and difficult to explain tion sets. This variation is essentially a that the average programmer is left behind. convenience to reduce the number of in- Which means that some system programmers structions executed inside a loop itera- may utilise sophisticated synchronisation tech- tion. See [WSM+5a]. niques internally in the operating system, but probably needs to publish quite conservative, Multiple Another pair of convenience in- i.e. slow, services to external non-kernel pro- structions are the load and store multi- grammers. ple register instructions lmw/stmw. These In PowerPC ISA there are four data instructions are intended for the pream- synchronisation instructions: isync, eieio, ble/postamble of functions to save and lwsync and sync. restore non-volatile registers, as de- fined by the application binary interface isync Instruction synchronise This instruc- (ABI ). These instruction do not really tion insures that all preceding instruc- increase program performance directly tions are completed before the isync as they still take one cycle per regis- completes and stops all subsequent in- ter to implement but rather are used to structions from scheduling until comple- increase the instruction stream density. tion. It also invalidates any instructions See [WSM+5a]. in the I-cache with respect to any out- String/misaligned A variation of the standing instruction cache block invali- Load/Store Multiple, the Load/Store dations, icbi, from the bus. In lock string instruction. These instructions acquisition, this instruction is used im- load a series of words from an arbitrary mediately after the stwcx complete test byte address in to or out of multiple reg- to guarantee that no later instructions isters. See [WSM+5a]. are executed speculatively until the re- sults are known. In general isync can Byte reversed The load/store word/halfword be thought of as a load barrier. instructions all support a byte swapped variant, where the (half–)word at the eieio Enforce in–order execution of I/O The specified location can be byte–swapped behavior of this instruction depends on on the way to or from memory. As many the type of caching enabled in preced- of the POWER implementations have ing loads and stores. Memory mapped switchable endianess these instructions device registers are mapped cache– are used to efficiently work with cross– inhibited/guarded (see section ??). For endian integers. See [WSM+5a]. this type of mapped register operation, this instruction guarantees load and store 1.2.3 Shared Storage ordering such that all preceding loads and stores complete to main memory be- The PowerPC specifies a weakly consistent fore the eieio completes.
Recommended publications
  • Wind Rose Data Comes in the Form >200,000 Wind Rose Images
    Making Wind Speed and Direction Maps Rich Stromberg Alaska Energy Authority [email protected]/907-771-3053 6/30/2011 Wind Direction Maps 1 Wind rose data comes in the form of >200,000 wind rose images across Alaska 6/30/2011 Wind Direction Maps 2 Wind rose data is quantified in very large Excel™ spreadsheets for each region of the state • Fields: X Y X_1 Y_1 FILE FREQ1 FREQ2 FREQ3 FREQ4 FREQ5 FREQ6 FREQ7 FREQ8 FREQ9 FREQ10 FREQ11 FREQ12 FREQ13 FREQ14 FREQ15 FREQ16 SPEED1 SPEED2 SPEED3 SPEED4 SPEED5 SPEED6 SPEED7 SPEED8 SPEED9 SPEED10 SPEED11 SPEED12 SPEED13 SPEED14 SPEED15 SPEED16 POWER1 POWER2 POWER3 POWER4 POWER5 POWER6 POWER7 POWER8 POWER9 POWER10 POWER11 POWER12 POWER13 POWER14 POWER15 POWER16 WEIBC1 WEIBC2 WEIBC3 WEIBC4 WEIBC5 WEIBC6 WEIBC7 WEIBC8 WEIBC9 WEIBC10 WEIBC11 WEIBC12 WEIBC13 WEIBC14 WEIBC15 WEIBC16 WEIBK1 WEIBK2 WEIBK3 WEIBK4 WEIBK5 WEIBK6 WEIBK7 WEIBK8 WEIBK9 WEIBK10 WEIBK11 WEIBK12 WEIBK13 WEIBK14 WEIBK15 WEIBK16 6/30/2011 Wind Direction Maps 3 Data set is thinned down to wind power density • Fields: X Y • POWER1 POWER2 POWER3 POWER4 POWER5 POWER6 POWER7 POWER8 POWER9 POWER10 POWER11 POWER12 POWER13 POWER14 POWER15 POWER16 • Power1 is the wind power density coming from the north (0 degrees). Power 2 is wind power from 22.5 deg.,…Power 9 is south (180 deg.), etc… 6/30/2011 Wind Direction Maps 4 Spreadsheet calculations X Y POWER1 POWER2 POWER3 POWER4 POWER5 POWER6 POWER7 POWER8 POWER9 POWER10 POWER11 POWER12 POWER13 POWER14 POWER15 POWER16 Max Wind Dir Prim 2nd Wind Dir Sec -132.7365 54.4833 0.643 0.767 1.911 4.083
    [Show full text]
  • Ray Tracing on the Cell Processor
    Ray Tracing on the Cell Processor Carsten Benthin† Ingo Wald Michael Scherbaum† Heiko Friedrich‡ †inTrace Realtime Ray Tracing GmbH SCI Institute, University of Utah ‡Saarland University {benthin, scherbaum}@intrace.com, [email protected], [email protected] Abstract band”) architectures1 to hide memory latencies, at exposing par- Over the last three decades, higher CPU performance has been allelism through SIMD units, and at multi-core architectures. At achieved almost exclusively by raising the CPU’s clock rate. Today, least for specialized tasks such as triangle rasterization, these con- the resulting power consumption and heat dissipation threaten to cepts have been proven as very powerful, and have made modern end this trend, and CPU designers are looking for alternative ways GPUs as fast as they are; for example, a Nvidia 7800 GTX offers of providing more compute power. In particular, they are looking 313 GFlops [16], 35 times more than a 2.2 GHz AMD Opteron towards three concepts: a streaming compute model, vector-like CPU. Today, there seems to be a convergence between CPUs and SIMD units, and multi-core architectures. One particular example GPUs, with GPUs—already using the streaming compute model, of such an architecture is the Cell Broadband Engine Architecture SIMD, and multi-core—become increasingly programmable, and (CBEA), a multi-core processor that offers a raw compute power CPUs are getting equipped with more and more cores and stream- of up to 200 GFlops per 3.2 GHz chip. The Cell bears a huge po- ing functionalities. Most commodity CPUs already offer 2–8 tential for compute-intensive applications like ray tracing, but also cores [1, 9, 10, 25], and desktop PCs with more cores can be built requires addressing the challenges caused by this processor’s un- by stacking and interconnecting smaller multi-core systems (mostly conventional architecture.
    [Show full text]
  • IBM System P5 Quad-Core Module Based on POWER5+ Technology: Technical Overview and Introduction
    Giuliano Anselmi Redpaper Bernard Filhol SahngShin Kim Gregor Linzmeier Ondrej Plachy Scott Vetter IBM System p5 Quad-Core Module Based on POWER5+ Technology: Technical Overview and Introduction The quad-core module (QCM) is based on the well-known POWER5™ dual-core module (DCM) technology. The dual-core POWER5 processor and the dual-core POWER5+™ processor are packaged with the L3 cache chip into a cost-effective DCM package. The QCM is a package that enables entry-level or midrange IBM® System p5™ servers to achieve additional processing density without increasing the footprint. Figure 1 shows the DCM and QCM physical views and the basic internal architecture. to I/O to I/O Core Core 1.5 GHz 1.5 GHz Enhanced Enhanced 1.9 MB MB 1.9 1.9 MB MB 1.9 L2 cache L2 L2 cache switch distributed switch distributed switch 1.9 GHz POWER5 Core Core core or CPU or core 1.5 GHz L3 Mem 1.5 GHz L3 Mem L2 cache L2 ctrl ctrl ctrl ctrl Enhanced distributed Enhanced 1.9 MB Shared Shared MB 1.9 Ctrl 1.9 GHz L3 Mem Ctrl POWER5 core or CPU or core 36 MB 36 MB 36 L3 cache L3 cache L3 DCM 36 MB QCM L3 cache L3 to memory to memory DIMMs DIMMs POWER5+ Dual-Core Module POWER5+ Quad-Core Module One Dual-Core-chip Two Dual-Core-chips plus two L3-cache-chips plus one L3-cache-chip Figure 1 DCM and QCM physical views and basic internal architecture © Copyright IBM Corp. 2006.
    [Show full text]
  • Power4 Focuses on Memory Bandwidth IBM Confronts IA-64, Says ISA Not Important
    VOLUME 13, NUMBER 13 OCTOBER 6,1999 MICROPROCESSOR REPORT THE INSIDERS’ GUIDE TO MICROPROCESSOR HARDWARE Power4 Focuses on Memory Bandwidth IBM Confronts IA-64, Says ISA Not Important by Keith Diefendorff company has decided to make a last-gasp effort to retain control of its high-end server silicon by throwing its consid- Not content to wrap sheet metal around erable financial and technical weight behind Power4. Intel microprocessors for its future server After investing this much effort in Power4, if IBM fails business, IBM is developing a processor it to deliver a server processor with compelling advantages hopes will fend off the IA-64 juggernaut. Speaking at this over the best IA-64 processors, it will be left with little alter- week’s Microprocessor Forum, chief architect Jim Kahle de- native but to capitulate. If Power4 fails, it will also be a clear scribed IBM’s monster 170-million-transistor Power4 chip, indication to Sun, Compaq, and others that are bucking which boasts two 64-bit 1-GHz five-issue superscalar cores, a IA-64, that the days of proprietary CPUs are numbered. But triple-level cache hierarchy, a 10-GByte/s main-memory IBM intends to resist mightily, and, based on what the com- interface, and a 45-GByte/s multiprocessor interface, as pany has disclosed about Power4 so far, it may just succeed. Figure 1 shows. Kahle said that IBM will see first silicon on Power4 in 1Q00, and systems will begin shipping in 2H01. Looking for Parallelism in All the Right Places With Power4, IBM is targeting the high-reliability servers No Holds Barred that will power future e-businesses.
    [Show full text]
  • Implementing Powerpc Linux on System I Platform
    Front cover Implementing POWER Linux on IBM System i Platform Planning and configuring Linux servers on IBM System i platform Linux distribution on IBM System i Platform installation guide Tips to run Linux servers on IBM System i platform Yessong Johng Erwin Earley Rico Franke Vlatko Kosturjak ibm.com/redbooks International Technical Support Organization Implementing POWER Linux on IBM System i Platform February 2007 SG24-6388-01 Note: Before using this information and the product it supports, read the information in “Notices” on page vii. Second Edition (February 2007) This edition applies to i5/OS V5R4, SLES10 and RHEL4. © Copyright International Business Machines Corporation 2005, 2007. All rights reserved. Note to U.S. Government Users Restricted Rights -- Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp. Contents Notices . vii Trademarks . viii Preface . ix The team that wrote this redbook. ix Become a published author . xi Comments welcome. xi Chapter 1. Introduction to Linux on System i platform . 1 1.1 Concepts and terminology . 2 1.1.1 System i platform . 2 1.1.2 Hardware management console . 4 1.1.3 Virtual Partition Manager (VPM) . 10 1.2 Brief introduction to Linux and Linux on System i platform . 12 1.2.1 Linux on System i platform . 12 1.3 Differences between existing Power5-based System i and previous System i models 13 1.3.1 Linux enhancements on Power5 / Power5+ . 14 1.4 Where to go for more information . 15 Chapter 2. Configuration planning . 17 2.1 Concepts and terminology . 18 2.1.1 Processor concepts .
    [Show full text]
  • From Blue Gene to Cell Power.Org Moscow, JSCC Technical Day November 30, 2005
    IBM eServer pSeries™ From Blue Gene to Cell Power.org Moscow, JSCC Technical Day November 30, 2005 Dr. Luigi Brochard IBM Distinguished Engineer Deep Computing Architect [email protected] © 2004 IBM Corporation IBM eServer pSeries™ Technology Trends As frequency increase is limited due to power limitation Dual core is a way to : 2 x Peak Performance per chip (and per cycle) But at the expense of frequency (around 20% down) Another way is to increase Flop/cycle © 2004 IBM Corporation IBM eServer pSeries™ IBM innovations POWER : FMA in 1990 with POWER: 2 Flop/cycle/chip Double FMA in 1992 with POWER2 : 4 Flop/cycle/chip Dual core in 2001 with POWER4: 8 Flop/cycle/chip Quadruple core modules in Oct 2005 with POWER5: 16 Flop/cycle/module PowerPC: VMX in 2003 with ppc970FX : 8 Flops/cycle/core, 32bit only Dual VMX+ FMA with pp970MP in 1Q06 Blue Gene: Low frequency , system on a chip, tight integration of thousands of cpus Cell : 8 SIMD units and a ppc970 core on a chip : 64 Flop/cycle/chip © 2004 IBM Corporation IBM eServer pSeries™ Technology Trends As needs diversify, systems are heterogeneous and distributed GRID technologies are an essential part to create cooperative environments based on standards © 2004 IBM Corporation IBM eServer pSeries™ IBM innovations IBM is : a sponsor of Globus Alliances contributing to Globus Tool Kit open souce a founding member of Globus Consortium IBM is extending its products Global file systems : – Multi platform and multi cluster GPFS Meta schedulers : – Multi platform
    [Show full text]
  • IBM Powerpc 970 (A.K.A. G5)
    IBM PowerPC 970 (a.k.a. G5) Ref 1 David Benham and Yu-Chung Chen UIC – Department of Computer Science CS 466 PPC 970FX overview ● 64-bit RISC ● 58 million transistors ● 512 KB of L2 cache and 96KB of L1 cache ● 90um process with a die size of 65 sq. mm ● Native 32 bit compatibility ● Maximum clock speed of 2.7 Ghz ● SIMD instruction set (Altivec) ● 42 watts @ 1.8 Ghz (1.3 volts) ● Peak data bandwidth of 6.4 GB per second A picture is worth a 2^10 words (approx.) Ref 2 A little history ● PowerPC processor line is a product of the AIM alliance formed in 1991. (Apple, IBM, and Motorola) ● PPC 601 (G1) - 1993 ● PPC 603 (G2) - 1995 ● PPC 750 (G3) - 1997 ● PPC 7400 (G4) - 1999 ● PPC 970 (G5) - 2002 ● AIM alliance dissolved in 2005 Processor Ref 3 Ref 3 Core details ● 16(int)-25(vector) stage pipeline ● Large number of 'in flight' instructions (various stages of execution) - theoretical limit of 215 instructions ● 512 KB L2 cache ● 96 KB L1 cache – 64 KB I-Cache – 32 KB D-Cache Core details continued ● 10 execution units – 2 load/store operations – 2 fixed-point register-register operations – 2 floating-point operations – 1 branch operation – 1 condition register operation – 1 vector permute operation – 1 vector ALU operation ● 32 64 bit general purpose registers, 32 64 bit floating point registers, 32 128 vector registers Pipeline Ref 4 Benchmarks ● SPEC2000 ● BLAST – Bioinformatics ● Amber / jac - Structure biology ● CFD lab code SPEC CPU2000 ● IBM eServer BladeCenter JS20 ● PPC 970 2.2Ghz ● SPECint2000 ● Base: 986 Peak: 1040 ● SPECfp2000 ● Base: 1178 Peak: 1241 ● Dell PowerEdge 1750 Xeon 3.06Ghz ● SPECint2000 ● Base: 1031 Peak: 1067 Apple’s SPEC Results*2 ● SPECfp2000 ● Base: 1030 Peak: 1044 BLAST Ref.
    [Show full text]
  • IBM Power Systems Performance Report Apr 13, 2021
    IBM Power Performance Report Power7 to Power10 September 8, 2021 Table of Contents 3 Introduction to Performance of IBM UNIX, IBM i, and Linux Operating System Servers 4 Section 1 – SPEC® CPU Benchmark Performance 4 Section 1a – Linux Multi-user SPEC® CPU2017 Performance (Power10) 4 Section 1b – Linux Multi-user SPEC® CPU2017 Performance (Power9) 4 Section 1c – AIX Multi-user SPEC® CPU2006 Performance (Power7, Power7+, Power8) 5 Section 1d – Linux Multi-user SPEC® CPU2006 Performance (Power7, Power7+, Power8) 6 Section 2 – AIX Multi-user Performance (rPerf) 6 Section 2a – AIX Multi-user Performance (Power8, Power9 and Power10) 9 Section 2b – AIX Multi-user Performance (Power9) in Non-default Processor Power Mode Setting 9 Section 2c – AIX Multi-user Performance (Power7 and Power7+) 13 Section 2d – AIX Capacity Upgrade on Demand Relative Performance Guidelines (Power8) 15 Section 2e – AIX Capacity Upgrade on Demand Relative Performance Guidelines (Power7 and Power7+) 20 Section 3 – CPW Benchmark Performance 19 Section 3a – CPW Benchmark Performance (Power8, Power9 and Power10) 22 Section 3b – CPW Benchmark Performance (Power7 and Power7+) 25 Section 4 – SPECjbb®2015 Benchmark Performance 25 Section 4a – SPECjbb®2015 Benchmark Performance (Power9) 25 Section 4b – SPECjbb®2015 Benchmark Performance (Power8) 25 Section 5 – AIX SAP® Standard Application Benchmark Performance 25 Section 5a – SAP® Sales and Distribution (SD) 2-Tier – AIX (Power7 to Power8) 26 Section 5b – SAP® Sales and Distribution (SD) 2-Tier – Linux on Power (Power7 to Power7+)
    [Show full text]
  • IBM's POWER5 Micro Processor Design and Methodology
    IBM's POWER5 Micro Processor Design and Methodology Ron Kalla IBM Systems Group © 2003 IBM Corporation IBM’s POWER5 Micro Processor Design and Methodology Outline POWER5 Overview Design Process Power UT CS352 , April 2005 © 2003 IBM Corporation IBM’s POWER5 Micro Processor Design and Methodology POWER Server Roadmap 2001 2002-3 2004* 2005* 2006* POWER4 POWER4+ POWER5 POWER5+ POWER6 90 nm 65 nm 130 nm 130 nm Ultra high 180 nm >> GHz >> GHz frequency cores Core Core 1.7 GHz 1.7 GHz > GHz > GHz L2 caches Core Core Core Core 1.3 GHz 1.3 GHz Shared L2 Advanced System Features Core Core Distributed Switch Shared L2 Shared L2 Distributed Switch Distributed Switch Shared L2 Simultaneous multi-threading Distributed Switch Sub-processor partitioning Reduced size Dynamic firmware updates Enhanced scalability, parallelism Chip Multi Processing Lower power High throughput performance - Distributed Switch Larger L2 Enhanced memory subsystem - Shared L2 More LPARs (32) Dynamic LPARs (16) Autonomic Computing Enhancements *Planned to be offered by IBM. All statements about IBM’s future direction and intent are* subject to change or withdrawal without notice and represent goals and objectives only. UT CS352 , April 2005 © 2003 IBM Corporation IBM’s POWER5 Micro Processor Design and Methodology POWER5 Technology: 130nm lithography, Cu, SOI 389mm2 276M Transistors Dual processor core 8-way superscalar Simultaneous multithreaded (SMT) core Up to 2 virtual processors per real processor UT CS352 , April 2005 © 2003 IBM Corporation IBM’s POWER5 Micro
    [Show full text]
  • Microprocessor
    MICROPROCESSOR www.MPRonline.com THE REPORTINSIDER’S GUIDE TO MICROPROCESSOR HARDWARE POWER5 TOPS ON BANDWIDTH IBM’s Design Is Still Elegant, But Itanium Provides Competition By Kevin Krewell {12/22/03-02} On large multiprocessing systems, often the dominant attributes needed to assure good incremental processor performance include memory coherency issues, aggregate memory bandwidth, and I/O performance. The Power4 processor, now shipping from IBM, nicely balances integration with performance. The Power4 has an processor speed, but are eight bytes wide, offering more than eight-issue superscalar core, 12.8GB/s of memory bandwidth, 4GB/s of bandwidth per bus. The fast buses are used to 1.5MB of L2 cache, and 128MB of external L3 cache. The enable what IBM terms aggressive cache-to-cache transfers. recently introduced Power5 steps up integration and per- This tightly coupled multiprocessing is designed to allow formance by integrating the distributed switch fabric between processing threads to operate in parallel over the processor memory controller and core/caches. (See MPR 10/14/03-01, array. With eight modules, Power5 supports up to 128-way “IBM Raises Curtain on Power5.”) The Power5 has an on-die multithreaded processing. memory controller that will support both DDR and DDR2 The bus structure and distributed switch fabric also SDRAM memory. The Power5 also improves system per- allow IBM to create a 64-way (processor cores) configuration. formance, as each processor core is now multithreaded. (See MPR 9/08/03-02, “IBM Previews Power5.”) IBM has packed four processor die (eight processor cores) and four L3 cache chips on one 95- × 95mm MCM, seen in Figure 1.
    [Show full text]
  • Chapter 1-Introduction to Microprocessors File
    Chapter 1 Introduction to Microprocessors Expected Outcomes Explain the role of the CPU, memory and I/O device in a computer Distinguish between the microprocessor and microcontroller Differentiate various form of programming languages Compare between CISC vs RISC and Von Neumann vs Harvard architecture NMKNYFKEEUMP Introduction A microprocessor is an integrated circuit built on a tiny piece of silicon It contains thousands or even millions of transistors which are interconnected via superfine traces of aluminum The transistors work together to store and manipulate data so that the microprocessor can perform a wide variety of useful functions The particular functions a microprocessor perform are dictated by software The first microprocessor was the Intel 4004 (16-pin) introduced in 1971 containing 2300 transistors with 46 instruction sets Power8 processor, by contrast, contains 4.2 billion transistors NMKNYFKEEUMP Introduction Computer is an electronic machine that perform arithmetic operation and logic in response to instructions written Computer requires hardware and software to function Hardware is electronic circuit boards that provide functionality of the system such as power supply, cable, etc CPU – Central Processing Unit/Microprocessor Memory – store all programming and data Input/Output device – the flow of information Software is a programming that control the system operation and facilitate the computer usage Programming is a group of instructions that inform the computer to perform certain task NMKNYFKEEUMP Introduction Computer
    [Show full text]
  • Introduction to the Cell Multiprocessor
    Introduction J. A. Kahle M. N. Day to the Cell H. P. Hofstee C. R. Johns multiprocessor T. R. Maeurer D. Shippy This paper provides an introductory overview of the Cell multiprocessor. Cell represents a revolutionary extension of conventional microprocessor architecture and organization. The paper discusses the history of the project, the program objectives and challenges, the design concept, the architecture and programming models, and the implementation. Introduction: History of the project processors in order to provide the required Initial discussion on the collaborative effort to develop computational density and power efficiency. After Cell began with support from CEOs from the Sony several months of architectural discussion and contract and IBM companies: Sony as a content provider and negotiations, the STI (SCEI–Toshiba–IBM) Design IBM as a leading-edge technology and server company. Center was formally opened in Austin, Texas, on Collaboration was initiated among SCEI (Sony March 9, 2001. The STI Design Center represented Computer Entertainment Incorporated), IBM, for a joint investment in design of about $400,000,000. microprocessor development, and Toshiba, as a Separate joint collaborations were also set in place development and high-volume manufacturing technology for process technology development. partner. This led to high-level architectural discussions A number of key elements were employed to drive the among the three companies during the summer of 2000. success of the Cell multiprocessor design. First, a holistic During a critical meeting in Tokyo, it was determined design approach was used, encompassing processor that traditional architectural organizations would not architecture, hardware implementation, system deliver the computational power that SCEI sought structures, and software programming models.
    [Show full text]