The Design Space of Register Renaming Techniques
To boost processor and system performance, virtually all recent superscalars rename registers.

Dezsö Sima
Budapest Polytechnic

0272-1732/00/$10.00 © 2000 IEEE. September–October 2000.

Register renaming is a technique to remove false data dependencies, namely write after read (WAR) and write after write (WAW), that occur in straight-line code between register operands of subsequent instructions.[1-3] By eliminating related precedence requirements in the execution sequence of the instructions, renaming increases the average number of instructions that are available for parallel execution per cycle. This results in increased IPC (number of instructions executed per cycle).

The identification and exploration of the design space of register renaming lead to a comprehensive understanding of this intricate technique. As this article shows, the design space of register renaming is spanned by four main dimensions: the scope of register renaming, the layout of the rename buffers, the method of register mapping, and the rename rate. Relevant aspects of the design space give rise to eight basic alternatives for register renaming. In addition, the kind of operand fetch policy significantly affects how the processor carries out the rename process, which doubles the eight basic alternatives to 16 possible implementation schemes. The article indicates which basic implementation scheme is used in relevant superscalar processors.

As register renaming is usually implemented in conjunction with shelving, the underlying microarchitecture is assumed to employ shelving. (See the "Instruction shelving principle" box for a discussion of this technique.)

Register renaming

The principle of register renaming is straightforward. If the processor encounters an instruction that addresses a destination register, it temporarily writes the instruction's result into a dynamically allocated rename buffer rather than into the specified destination register. For instance, in the case of the following WAR dependency:

    i1: add …, r2, …;   [… ← (r2) + (…)]
    i2: mul r2, …, …;   [r2 ← (…) ∗ (…)]

the destination register of i2 (r2) is renamed, say to r33. Then, instruction i2 becomes

    i2′: mul r33, …, …; [r33 ← (…) ∗ (…)]

Its result is written into r33 instead of into r2. This resolves the previous WAR dependency between i1 and i2. In subsequent instructions, however, references to source registers must be redirected to the rename buffers allocated to them as long as this renaming remains valid.[3]

A precursor to register renaming was introduced for floating-point instructions in 1967 by Tomasulo in the IBM 360/91,[4] a scalar supercomputer of that time that pioneered both pipelining and shelving (dynamic instruction issue). The 360/91 renamed floating-point registers to preserve the logical consistency of the program execution rather than to remove false data dependencies.

Tjaden and Flynn[5] first suggested the use of register renaming for removing false data dependencies for a limited set of instructions that corresponds more or less to the load instructions. However, they didn't use the term "register renaming." Keller[6] introduced this designation in 1975 and extended renaming to cover all instructions including a destination register. He also described how to implement register renaming in processors. Even so, due to the complexity of this technique, almost two decades passed after its conception before register renaming came into widespread use in superscalars at the beginning of the 1990s.

Early superscalars such as the HP PA 7100, Sun SuperSparc, DEC Alpha 21064, MIPS R8000, and Intel Pentium typically didn't use renaming. Renaming appeared gradually, first in a restricted form called partial renaming (to be discussed in the next section), in the early 1990s in the IBM RS/6000 (Power1), Power2, PowerPC 601, and the NextGen Nx586 processors. See Figure 1. Full renaming emerged later, beginning in 1992, first in the high-end models of the IBM mainframe ES/9000, then in the PowerPC 603. Subsequently, renaming spread into virtually all superscalar processors with the notable exception of Sun's UltraSparc line. At present, register renaming is considered to be a standard feature of performance-oriented superscalar processors.

[Figure 1: timeline chart spanning 1990 to 1999 that marks each processor model as using partial or full renaming. Processor lines and models shown, with issue rates in parentheses:]

RISC processors
• Compaq Alpha: Alpha 21064 (2), Alpha 21164 (4), Alpha 21264 (4) [7]
• Motorola MC88000: MC88110 (2)
• HP PA: PA7100 (2), PA7200 (2) [8], PA8000 (4) [9], PA8200 (4) [10], PA 8500 (4) [11]
• IBM Power: Power1 (RS/6000) (4) [12], Power2 (6/4)*** [13], P2SC (6/4)***, Power3 (4) [20]
• PowerPC Alliance: PPC* 601 (3), PPC 602 (2), PPC 603 (3) [15], PPC 604 (4) [16], PPC 620 (4) [19]
• MIPS R: R8000 (4), R10000 (4) [21], R12000 (4) [22]
• Sun/Hal Sparc: SuperSparc (3), UltraSparc (4), UltraSparc-2 (4), UltraSparc-3 (4), PMI (4) (Sparc64) [23]

CISC processors
• Intel 80x86: Pentium (2), Pentium/MMX (2), Pentium Pro (3) [24], Pentium II (3), Pentium III (3) [26]
• IBM ES: ES/9000 (2)
• TRON Gmicro: Gmicro/500 (2)
• Cyrix M: M1 (2) [28], MII (2) [29]
• Motorola MC68000: MC68060 (3)
• AMD Nx/K: Nx586 (1/3)** [30], K5 (4) [31], K6 (3) [32], K7 (3) [33]

* PPC designates PowerPC.
** The Nx586 has scalar issue for CISC instructions but a 3-way superscalar core for converted RISC instructions.
*** The issue rate of the Power2 and P2SC is 6 along the sequential path while only 4 immediately after a branch.

Figure 1. Chronology of register renaming in commercial superscalar processors. The introduction date indicates the first year of volume production. Following the model designation is the issue rate of the processors (in parentheses). Note that for the issue rate of CISC processors, one x86 instruction is considered to be the equivalent of 1.3 to 1.9 RISC instructions.[34]

Instruction shelving principle

In early superscalars, decoded and executable instructions are issued immediately to the execution units. However, with this scheme, control and data dependencies, as well as busy execution units, cause issue bottlenecks. The basic technique used to remove an issue bottleneck is instruction shelving, also known as dynamic instruction issue.[3, 35, 45]

Shelving presumes the availability of dedicated buffers, called shelving buffers, in front of the execution units. The processor first issues instructions into available shelving buffers without checking for data or control dependencies, or for busy execution units. As data dependencies or busy execution units no longer restrict instruction issue, the issue bottleneck problem occurring in early superscalars is removed. In a second step, instructions held in the shelving buffers are dispatched for execution. During dispatching, instructions are checked for dependencies, and independent instructions are forwarded to free execution units.

At present, there's no consensus on the use of the terms instruction issue and instruction dispatch. Both terms are used in both possible interpretations.

Design space of register-renaming techniques

The main dimensions of register renaming are as follows:

• scope of register renaming,
• layout of the renamed registers,
• method of register mapping, and
• rename rate

as indicated in subsequent sections. Due to space limitations, we ignore additional dimensions that are related to the renaming process, such as recovery from a misprediction.

Scope of register renaming

To indicate how extensively the processor makes use of renaming, I distinguish between partial and full renaming. Partial renaming is restricted to one or only a few instruction types, for instance, only to floating-point instructions. Early processors typically employed this incomplete form of renaming; the Power1 (RS/6000), Power2, PowerPC 601, and the Nx586 are examples, as shown in Figure 2. Of these, the […] fixed-point instructions.

Full renaming covers all instructions including a destination register. As Figure 1 demonstrates, virtually all recent superscalar processors employ full renaming. Noteworthy exceptions are the Sun UltraSparc line and Alpha processors preceding the Alpha 21264.

[Figure 2: diagram with two alternatives and a trend arrow pointing from (a) to (b):]

(a) Partial renaming: renaming is restricted to particular instruction types. Examples are a few early superscalar processors: Power1 (RS/6000, 1990), Power2 (1993), PowerPC 601 (1993), and Nx586 (1994).
(b) Full renaming: renaming comprises all eligible instruction types. The indicated superscalar line examples begin with the PowerPC 603 (1993), PA 7200 (1995), Pentium Pro (1995), R10000 (1996), K5 (1995), and MII (1997). Most notable exceptions are Alpha processors preceding the Alpha 21264 and the Sun UltraSparc line.

Notes to (a): The Power1 renames only FP loads. The Power2 extends renaming to all FP instructions. The PowerPC 601 renames only the link and count register. Since the Nx586 has an integer core, it renames only FX instructions.

Figure 2. Register renaming scope: partial (a) and full (b).

Rename buffer layout

Rename buffers establish the actual framework for renaming. There are three essential design aspects in their layout:

• type of rename buffers,
• number of rename buffers, and
• number of read and write ports.

Rename buffer types

The choice of which type of rename buffers to use in a processor has far-reaching impact on the implementation of the rename process. Given its importance, designers must consider the various design options. To simplify this presentation, I initially assume a common architectural register file for all processed data types, and then extend the discussion to the split-register scenario that is commonly employed.

As Figure 3 illustrates, there are four fundamentally different ways to implement rename buffers. The range of choices includes a) using a merged architectural and rename register file, b) employing a stand-alone rename register file, c) keeping renamed values in the reorder buffer (ROB), or d) keeping them in the shelving buffers.

In the first approach, rename buffers are implemented along with the architectural registers in the same physical register file, called the merged architectural and rename register file or the merged register file for short.
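
The renaming principle described earlier (a destination register is redirected to a dynamically allocated rename buffer, and later source references follow that redirection) can be illustrated with a merged register file. The following is a minimal sketch under stated assumptions, not the scheme of any particular processor: the `Renamer` class, the mapping table, the free list, the sizes (32 architectural plus 8 additional physical registers), and the concrete operands chosen for i1, i2, and i3 are all hypothetical, and both reclaiming of rename buffers and misprediction recovery are omitted.

```python
class Renamer:
    """Toy rename stage over a merged register file (illustrative assumption)."""

    def __init__(self, num_arch=32, num_phys=40):
        # Initially, architectural register rN is held in physical register N.
        self.mapping = {r: r for r in range(num_arch)}
        # The remaining physical registers serve as free rename buffers.
        self.free = list(range(num_arch, num_phys))

    def rename(self, op, dest, sources):
        """Return (op, physical destination, physical sources)."""
        # Sources are looked up first, so they see the mapping that was
        # valid before this instruction's own destination is renamed.
        phys_sources = [self.mapping[s] for s in sources]
        if not self.free:
            raise RuntimeError("no free rename buffer; the rename stage would stall")
        # Allocate a fresh physical register for the destination.
        phys_dest = self.free.pop(0)
        self.mapping[dest] = phys_dest
        return (op, phys_dest, phys_sources)


renamer = Renamer()
# i1 reads r2; i2 writes r2 (WAR dependency); i3 reads the new r2.
i1 = renamer.rename("add", dest=1, sources=[2, 3])
i2 = renamer.rename("mul", dest=2, sources=[4, 5])
i3 = renamer.rename("sub", dest=6, sources=[2, 1])
print(i1)  # ('add', 32, [2, 3])
print(i2)  # ('mul', 33, [4, 5])
print(i3)  # ('sub', 34, [33, 32])
```

In this sketch, i2 writes physical register 33 while i1 still reads the old value of r2 from physical register 2, so the two instructions no longer conflict; i3 is redirected to register 33, as the text requires for subsequent source references.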