MIPS, HPS, Two-Level Branch Prediction, and Compressed Code RISC Processor
Total Page:16
File Type:pdf, Size:1020Kb
Awards ................................................................................................................................................................ Common Bonds: MIPS, HPS, Two-Level Branch Prediction, and Compressed Code RISC Processor ONUR MUTLU ETH Zurich RICH BELGARD ......We are continuing our series of of performance optimization on the com- sions made in the MIPS project that retrospectives for the 10 papers that piler and keep the hardware design sim- passed the test of time. The retrospective received the first set of MICRO Test of ple. The compiler is responsible for touches on the design tradeoffs made to Time (“ToT”) Awards in December generating and scheduling simple instruc- couple the hardware and the software, 2014.1,2 This issue features four retro- tions, which require little translation in the MIPS project’s effect on the later spectives written for six of the award- hardware to generate control signals to development of “fabless” semiconductor winning papers. We briefly introduce control the datapath components, which companies, and the use of benchmarks these papers and retrospectives and in turn keeps the hardware design simple. as a method for evaluating end-to-end hope that you will enjoy reading them as Thus, the instructions and hardware both performance of a system as, among much as we have. If anything ties these remain simple, whereas the compiler others, contributions of the MIPS project works together, it is the innovation they becomes much more important (and that have stood the test of time. delivered by taking a strong position in likely complex) because it must schedule the RISC/CISC debates of their decade. instructions well to ensure correct and High-Performance Systems We hope the IEEE Micro audience, espe- high-performance use of a simple pipe- The second retrospective addresses three cially younger generations, will find the line. Many modern processors continue related papers that received MICRO ToT historical perspective provided by these to take advantage of some of the design Awards in 2014. These papers introduce retrospectives invaluable. principles outlined in this paper, either in the High Performance Substrate (HPS) their ISA (the MIPS ISA still has consider- microarchitecture, examine its critical MIPS able use) or in their internal execution of design issues, and discuss the use of large The first retrospective is for the oldest instructions (many complex instructions hardware-supported atomic units to paper being discussed in this issue: are broken into simple RISC-style microin- enhance performance in HPS-style, dynam- “MIPS: A Microprocessor Architecture.”3 structions in modern processors). The ically scheduled microarchitectures. This work introduced MIPS (Microproces- MIPSISAanddesignisalsousedinmany The first paper, “HPS, A New Micro- sor without Interlocked Pipeline Stages) modern computer architecture courses architecture: Rationale and Introduction,”4 and its design philosophy, principles, hard- because of its simplicity. introduced the HPS microarchitecture, its ware implementation, related systems, Thomas Gross, Norman Jouppi, John design principles, and a prototype imple- and software issues. It is a concise over- Hennessy, Steven Przybylski, and Chris mentation. It also introduced the concept view of the MIPS design, which is one of Rowen provide in their retrospective a of restricted dataflow and its use in imple- the early reduced, or simple, instruction historical perspective on the development menting out-of-order instruction schedul- sets that have significantly impacted the of MIPS, the context in which they arrived ing and execution while maintaining microprocessor industry as well as com- at the design principles, and the develop- sequential execution and precise excep- puter architecture research and education ments that occurred after the paper. They tions from the programmer’s perspective for decades (and probably counting!). The also provide their reflections on what they (that is, without exposing the out-of-order- key design principle is to push the burden see as the design and evaluation deci- ness of dataflow execution to software). ....................................................... 70 Published by the IEEE Computer Society 0272-1732/16/$33.00 c 2016 IEEE This concept of out-of-order execution memory disambiguation problem, has on the realization that adding another level using a restricted dataflow engine that since been the subject of many works. of history to predict a branch’s direction supports precise exceptions has been the The third paper, “Hardware Support can greatly improve branch prediction hallmark of high-performance processor for Large Atomic Units in Dynamically accuracy, in addition to the single level designs since the success of the Intel Scheduled Machines,”7 likely provides that tracks which direction the same Pentium Pro,5 which implemented the the first treatment of large units of work static branch went the last N times it was concept. The paper describes how com- as atomic units that can enable higher executed (which had been established plexinstructionscanbetranslatedinhard- performance in an HPS-like dynamically practice since Jim Smith’s seminal ISCA ware to simple micro-operations and how scheduled processor. It discusses the 1981 paper11). In other words, the direc- dependent micro-operations are linked in benefits of enlarged blocks in an out-of- tion the branch took the last N times the hardware such that they are executed in order microarchitecture and describes same “history of branch directions” was a dataflow manner, in which a micro- how enlarged blocks can be formed at dif- encountered can provide a much better operation “fires” when its input operands ferent layers: architectural, compiler, and prediction for where the branch might go are ready. The mechanisms described in hardware. It introduces the notion of a fill the next time the same “history of branch this work paved the way for many future unit, which is a hardware structure that directions” is encountered. and current processor designs with com- enables the formation of large blocks of This insight formed the basis of many plex instruction sets to execute programs micro-operations that can be executed future branch predictor designs used in using out-of-order execution (and exploit atomically. This work has opened up new high-performance processors, most nota- some of the RISC principles) without dimensions and is considered a direct pre- bly starting with the Pentium Pro. Yeh requiring changes to the instruction set cursor to notions such as the trace cache, and Patt’s MICRO 1991 paper discusses architecture or the software. The retro- which is implemented in the Intel Pen- various design choices for this predictor spective by Yale Patt and his former stu- tium 4,8 and the block-structured ISA.9 and provides a detailed evaluation of their dents Wen-mei Hwu, Steve Melvin, and In their retrospective, Yale Patt and impact on prediction accuracy. For exam- Mike Shebanow beautifully describes the colleagues provide a historical perspec- ple, they examine the effect of keeping six major contributions of the original HPS tive of HPS, the environment and the global or local branch history registers paper. works that influenced the design deci- (many modern hybrid predictors use a The second paper, “Critical Issues sions behind it, its reception by the com- combination) and the effect of how the Regarding HPS, A High Performance munity and the industry, and what pattern history tables are organized. The Microarchitecture,”6 discusses what the happened afterward. We hope you enjoy retrospective provides the historical per- authors deem as critical issues in the reading this retrospective that describes spective as to how two-level branch pre- design of the HPS microarchitecture. The the inspirations and development of a diction came about, along with the paper is forward looking, discussing some research project that has heavily influ- contemporary works that followed it. of the problems that the authors thought enced almost all modern high- “would have to be solved to make HPS performance microprocessor designs, Compressed Code RISC viable,” as the retrospective puts it. The CISC or RISC. The authors also provide a Processor paper can be seen as a “problem book” glimpse of some of the future research The final retrospective in this issue is for that anticipates the many issues to be that has been enabled and influenced by “Executing Compressed Programs on an solved in designing a modern out-of-order the HPS project, of which the next paper Embedded RISC Architecture,”12 one of execution engine that translates a com- we discuss is a prime example. the first papers to tackle the problem of plex instruction set to a simple set of code compression to save physical mem- micro-operations internally. For example, Two-Level Branch Prediction ory space in cost-sensitive embedded it introduces the notion of a node cache, The third retrospective is for “Two-Level processors. It is also one of the first to an early precursor to a modern trace Adaptive Training Branch Prediction”10 by examine cost-related architectural issues cache, and discusses issues related to Tse-Yu Yeh and Yale Patt. This paper in a RISC-based system on chip (SoC), as control flow, design of the dynamic addresses a critical problem in the HPS Andy Wolfe’s retrospective nicely instruction scheduler, the memory units, microarchitecture, and in pipelined and describes. The basic idea is to