The IBM Power Micro-architecture

Report for COMP9244: Software View of Processor Architectures

Godfrey van der Linden

2006-08-01

Abstract

The IBM PowerPC instruction set architecture and its implementations have powered many different computer systems. It is a second generation RISC design that incorporates many instruction extensions designed to ease the generation of quality code by modern compilers. The RISC design lends itself to scaling from very small implementations designed for embedded applications, through super-computers, to standalone desktop and server machines. Although the design is fundamentally RISC, the IBM designers have explored much of the landscape of modern computer architectures, from large super-scalar processors down to tiny single-issue pipelined processor cells suitable for use in custom designed ASICs. This paper explores the history of the architecture and some of the unusual instruction set choices that the original PowerPC designers made, before investigating some of the PowerPC implementations. Modern PowerPC processors, such as the POWER4, have very powerful and general supervisor and hypervisor mode functionality. The later sections of this paper discuss operating system design and some of the issues posed by the POWER4 architecture.

1 Introduction

Continuing IBM's long history of computer design innovation, the PowerPC architecture and its various implementations, such as the POWER5, are worthy representatives of their world class design. IBM led the way with many performance innovations in the 1950s and '60s; it is interesting that so many of the technologies that they invented have been of direct use in their RISC architectures some 40 years later.

Apple Computer shocked the personal computer world when they announced their intention of switching to the Intel Core Duo line of processors. IBM and Freescale seem to be unconcerned; certainly IBM will be shipping as many PowerPC processors as their fabricators can make in an attempt to satisfy the insatiable gaming console market. In a remarkable coup, both Microsoft's Xbox 360 and Sony's PlayStation 3 will be powered by PowerPC chips. Freescale is concentrating their design efforts on low-power signal processing chips aimed at the handheld embedded market, such as mobile phones.

In this introduction we shall briefly discuss the development history of the PowerPC (PPC) architecture and give a quick survey of the PPC extensions to traditional RISC architectures. The two middle sections discuss the large mainframe and desktop implementations, followed by the smaller embedded processor variations. Finally, in Section 4, issues and design features are discussed from the viewpoint of an operating system designer.

In this document it is assumed that the reader is familiar with both modern RISC processor architectures and operating system design issues.

1.1 History of the PowerPC

In 1985 IBM started working on a second generation RISC architecture, known as the "AMERICA architecture" [CM90]. In 1986 development started on the RS/6000, the first implementation of the AMERICA architecture. In February 1990 the first RS/6000 machines shipped [IBM01], with the architecture renamed POWER. By the end of development of the RS/6000 series, IBM had reduced the chip count of the CPU from eleven chips down to a single chip in the low-end machines.

In the early '90s Apple and IBM were collaborating on a number of projects. During this time IBM approached Apple to collaborate on the development of a family of single-chip processors based on the POWER architecture. Soon after, Apple asked Motorola to join the collaboration to leverage Motorola's experience in manufacturing high-volume processors. This collaboration was known as the AIM alliance (Apple, IBM and Motorola); the result was a modification of POWER, known as the PowerPC instruction set architecture (ISA). PowerPC added single-precision floating point and general register-to-register multiply and divide, and removed some POWER instructions, such as the MQ register based multiplies. PowerPC also added a parallel 64-bit instruction set mode.

In 1993 the POWER2 was released, and the POWER3 followed in 1997. The POWER3 was the first implementation of the full 32 and 64-bit PowerPC ISA. The POWER4 was released in 2001 and has become the basis for later development of relatively large systems. The PowerPC 970 (2002) and POWER5 (2005) extend the POWER4 micro-architecture.

1.2 PowerPC instruction set architecture

The PowerPC instruction set architecture is part of a group of second generation RISC ISAs that were developed in the late '80s and early '90s. This document will concentrate on those parts of the PowerPC ISA that are unconventional. In most ways the PowerPC ISA is conventionally RISC in that only load and store operations are used to access memory; that is, there are no memory operand addressing modes. However, unlike many RISC instruction sets, the PowerPC has some extended instructions that are capable of modifying multiple target registers or are inherently multi-cycle in nature.

1.2.1 Branch registers: Link and Count

The PowerPC ISA uses a dedicated link register, lr, to save the return address for a function call. It is the callee's responsibility to save the lr if it isn't a leaf function. Using a special register for the link register can make the return jump faster, since the hardware may not need to go through a register read pipeline stage for function returns.

The ISA also has a dedicated register for controlling loops, the count register ctr. This register is automatically decremented and tested against 0 by the branch conditional instructions. By using this special register the branch processor can determine the result of the branch early in the pipeline.

In the PowerPC architecture the branch and instruction fetch units are closely associated; in fact the branch unit is split into two, with the branch prediction unit very close, in terms of clock cycles, to the fetch unit. The lr and ctr registers can be implemented in the part of the branch unit that is near the fetch unit, and the instruction set can then use these registers to implement indirect function calls. See [HP03, Appendix C] for details.

1.2.2 Load and Store extensions

Atomic update. The Load and Reserve (lwarx) and Store Conditional (stwcx) instructions provide the PowerPC shared-storage atomic-update mechanism. The lwarx instruction loads a value from a memory location and marks that location as reserved. The subsequent stwcx stores a new value if the reservation is still valid. A reservation is invalidated when another processor successfully commits a stwcx instruction. Using this mechanism a general suite of atomic operations can be built, such as atomic increment, compare and swap, and test and set. See [WSM+5b].

With update. Indexed load and store instructions have a 'with update' variant, which updates the source index register with the computed effective address after the load or store is initiated. The load instruction will therefore modify two registers, something that is unusual for RISC instruction sets. This variation is essentially a convenience to reduce the number of instructions executed inside a loop iteration. See [WSM+5a].

Multiple. Another pair of convenience instructions are the load and store multiple register instructions, lmw/stmw. These instructions are intended for the preamble/postamble of functions to save and restore non-volatile registers, as defined by the application binary interface (ABI). These instructions do not directly increase program performance, as they still take one cycle per register to execute; rather, they are used to increase the instruction stream density. See [WSM+5a].

String/misaligned. A variation of the load/store multiple instructions is the load/store string instructions. These instructions transfer a series of words between an arbitrary byte address and multiple registers. See [WSM+5a].

Byte reversed. The load/store word/halfword instructions all support a byte-swapped variant, where the (half-)word at the specified location can be byte-swapped on the way to or from memory. As many of the POWER implementations have switchable endianness, these instructions are used to work efficiently with cross-endian integers. See [WSM+5a].

1.2.3 Shared Storage

The PowerPC specifies a weakly consistent [...] use of memory barriers. However, the details are so complex and difficult to explain that the average programmer is left behind. This means that system programmers may use sophisticated synchronisation techniques internally in the operating system, but probably need to publish quite conservative, i.e. slow, services to external non-kernel programmers.

The PowerPC ISA has four data synchronisation instructions: isync, eieio, lwsync and sync.

isync (Instruction synchronise). This instruction ensures that all preceding instructions are completed before the isync completes, and stops all subsequent instructions from scheduling until completion. It also invalidates any instructions in the I-cache with respect to any outstanding instruction cache block invalidations, icbi, from the bus. In lock acquisition, this instruction is used immediately after the stwcx completion test to guarantee that no later instructions are executed speculatively until the results are known. In general isync can be thought of as a load barrier.

eieio (Enforce in-order execution of I/O). The behavior of this instruction depends on the type of caching enabled in preceding loads and stores. Memory mapped device registers are mapped cache-inhibited/guarded (see section ??). For this type of mapped register operation, this instruction guarantees load and store ordering such that all preceding loads and stores complete to main memory before the eieio completes.