A Retrospective on “MIPS: A Microprocessor Architecture”



THOMAS R. GROSS ETH Zurich

NORMAN P. JOUPPI Google

STEVEN PRZYBYLSKI Verdande Group

CHRIS ROWEN Cadence Design Systems

IEEE Micro, July/August 2016 (Awards)

The MIPS project started in early 1981. At that time, mainstream architectures such as the VAX, IBM 370, Intel 8086, and Motorola 68000 were fairly complex, and their operation was controlled by microcode. Designers of high-performance implementations discovered that pipelining of complex-instruction-set computer (CISC) machines, especially the VAX, was hard; the VAX 11/780 required around 10 processor cycles to execute a single instruction, on average. Tradeoffs that had been made when there were few resources available for implementation (for example, dense instruction sets sequentially decoded over many cycles by microcode) ended up creating unnecessary constraints when more resources became available. For example, as transistors were replacing the core, the cost of memory dropped much faster than logic, favoring a slightly less dense program encoding over logic-intensive interpretation. Also, the design of instruction sets did not assume the use of an optimizing compiler. CISC instruction features such as arithmetic operations with operands in memory were especially difficult to pipeline and did not take advantage of compiler capabilities (such as register allocation) that could provide more efficient execution.

One area of architecture research in the early 1980s was to create even more complex CISC architectures. Notable examples of these include high-level language machines such as Lisp machines, and later the Intel 432. However, the approach taken in the MIPS project as well as the RISC project at the University of California, Berkeley, was to design simpler instruction sets that did not require execution of microcode. These simpler instructions were a better fit for an optimizing compiler’s output: more complex operations could be synthesized from several simple operations, but in the common case where only a simple operation was needed, the more complex aspects of a CISC instruction could be effectively optimized away. In the case of the MIPS project, we emphasized ease of pipelining and sophisticated register allocation, whereas the Berkeley RISC project included support for register windows in hardware.

The quarter before the MIPS project started, Stanford University had offered for the second time a (graduate) class on VLSI design, based on the approach developed by Carver Mead and Lynn Conway [1]. One key idea was that the design rules could be specified without close coupling to the details of a specific process. Most of the MIPS project team members had taken this class. Another difference was that in VLSI, tradeoffs differ from those made in multiboard machines built from hundreds of small-, medium-, and large-scale integration parts like the VAX 11/780.

In a multiboard machine, gate transistors were expensive (for example, only four 2-input gates per transistor-transistor logic package), whereas ROM transistors were relatively cheap (for example, kilobits of ROM per package). Much of this was driven by pinout and other limitations of DIP packaging. This naturally favored the design of microcoded machines, in which control and sequencing signals could be efficiently stored as many bits per package instead of being computed by low-density gates requiring many packages.

However, in VLSI, a transistor has approximately the same cost no matter what logical function it is used for, which favors the adoption of direct control as in early RISC architectures instead of microcode. Another difference with widespread adoption of VLSI was that the topology of the wiring was important. In 1981, we had only a single level of metal, so long wires had to be largely planar. This also favored simple and regular designs. Finally, in a VLSI design it was clearly zero-sum: we had a maximum chip size available, so any feature put in would have to justify its value and would displace a less valuable feature. These constraints brought a lot of clarity of focus to the design process.

Given this context, we decided to build a single-chip high-performance microprocessor that could be realized with the fabrication technology available to university teams: a 4-μm single-level metal nMOS process that put severe limitations on the design complexity and size. (The DARPA MOSIS program brokered access to silicon foundries so that students and researchers in universities could obtain real silicon parts.) High performance at that time meant that the processor could execute more than 1 million (meaningful) instructions per second (MIPS), and the design target was 2 MIPS (that is, a clock rate of 4 MHz). Relative to other processor designs of this era, a board powered by a MIPS processor could deliver the same performance as a cabinet-sized computer built with lower levels of integration. At the time we wrote our paper in the fall of 1982 [2], the architecture and microarchitecture were defined, compilers for C and Pascal were written, and several test chips covering different parts of the design had been sent out for fabrication.

Developments After Publication

The MIPS design was completed in the spring of 1983 and sent out for fabrication [3]. At about the time the design was sent for fabrication, the Center for Integrated Systems at Stanford University developed a 3-μm VLSI fabrication process and used the MIPS design (shrunk optically) to tune its process. In early 1984, we received working 3-μm MIPS processors from MOSIS that operated at the speed we had expected for the design at 4 μm. Figure 1 shows the die photo. These processors were run on a test board developed by our fellow grad student, Anant Agarwal, and on 20 February 1984, the first program (8queens) was run to completion. In mid-1984, MIPS Computer Systems was founded. It designed a completely new processor that was heavily influenced by the Stanford MIPS design and was sold as the MIPS R2000. A final evaluation of the Stanford MIPS architecture was published in 1988 [4].

Figure 1. Die photo of MIPS processor.

Reflections

Several papers described the implementation and testing of the MIPS processor [5] and discussed the design decisions that turned out to be right and those that deserve to be reconsidered. More than 30 years later, at a time when microprocessors contain multiple independent processing units (cores) and more than 5 billion transistors, a few of these have passed the test of time.

MIPS Design

As part of the MIPS project, we developed various CAD tools, such as tools for programmable logic array synthesis and timing verification [6]. These tools were driven by the designer’s requirements and allowed a small team of students and faculty to complete the processor design in a timely manner.

We also invested heavily in compilers beyond the immediate needs of the MIPS processor development. Fellow student Fred Chow designed and implemented a machine-independent global optimizer, and fellow student Peter Steenkiste developed a Lisp compiler to investigate dynamic type checking on a processor that did not provide any dedicated hardware support for this task.

Personal workstations were a novelty at that time, and the MIPS project members were the first to enjoy workstations in their offices. These workstations let us support the implementation with novel tools and interactively experiment with different compiler and hardware optimizations. RISC designs emphasize efficient resource usage and encourage designers to employ resources where they can contribute the most; this RISC strategy drove many of the major design decisions (for example, to abandon microcode in favor of simple instructions, or to use precious on-chip transistors for frequent operations and leave other operations to be dealt with by software).

Hardware/Software Coupling

The MIPS project invested in software early on; the compiler tool chain worked before the design was finalized. We made several key microarchitecture decisions (including the structure of the pipeline) after running benchmark programs and assessing the impact of proposed design changes.

The MIPS project also emphasized tradeoffs across layers as defined by then-prevalent industry practice. Two sample design decisions illustrate this aspect: the MIPS processor uses word addresses, not byte addresses. The reasoning was that byte addressing would complicate the memory interface and slow down the processor, that bytes aren’t accessed that often, and that an optimizing compiler could handle any programming language issues. This decision was probably correct for research processors (and saved us from debating if the processor should be big-endian or little-endian) and illustrates the freedom afforded by looking beyond a single layer. But all subsequent descendants supported byte addressing (of course, by that time, VLSI had advanced to CMOS with at least two levels of metal, so byte selection logic was easier to implement). However, word alignment of word accesses as in MIPS was still retained, unlike in minicomputers.

Sometimes the absence of interlocks on MIPS is seen as a defining feature of this project. However, it was a tradeoff, based on the capabilities of the implementation technology of the time. To meet the design goal of a 4-MHz clock, the designers had to streamline the processor, and still most of the critical paths involved the processor’s control component [7]. For the MIPS processor, it made sense to simplify the design as much as possible; later descendants, with access to better VLSI processes, made different decisions. The absence of hardware interlocks (to delay an instruction if one of the operands wasn’t ready) was a tradeoff between design complexity, critical path length, and software development costs. In addition to clever circuit design and novel tools, this ability to make hardware/software tradeoffs was a key factor for success.

The team wanted to pick a name for the project that emphasized performance. About nine months earlier, the RISC project at UC Berkeley had started, so we needed a catchy acronym. “Million instructions per second” (MIPS) sounded right, given the project’s goals, but this metric was also known as the “meaningless indicator of processor speed.” So, we settled on “microprocessor without interlocked pipeline stages.”

The Mead/Conway approach to VLSI design emphasized a decoupling of architecture and fabrication. No longer was a chip design tied to a specific (often proprietary, in-house) process. The MIPS project demonstrated that a high-performance design could be realized in this framework and supported the view that VLSI design could be done without close coupling to a proprietary process. Vertical integration (that is, design and fabrication in one company) might offer benefits, but so does the separation of fabrication and design. The MOSIS service was an early (nonprofit) experiment; a few years after the end of the MIPS project, commercial silicon foundries started to offer fabrication services and allowed the creation of “fabless” semiconductor companies.

Benchmarks

The MIPS project used a quantitative approach to decide on various features of the processor and therefore needed a collection of benchmark programs to collect the data needed for decision making. In retrospect, the benchmarks we used were tiny and did not include any significant operating systems code. Consequently, the designers focused on producing a design that delivered performance for compiled programs but paid less attention to the operating system interface and the need to connect the processor to a memory hierarchy. At conferences, benchmark results started to become important, and the MIPS paper (as well as papers by other design groups, such as the UC Berkeley RISC project [8]) presented empirical evidence. Stanford University published the set of benchmarks used by the MIPS project (the Stanford Benchmark Suite), and despite their limitations, they were in use (at least) 25 years later to explore array indexing and recursive function calls. About five years later came the founding of the SPEC consortium, which eventually produced a large body of realistic benchmarks. In 2011, about 30 years later, the first ACM conferences initiated a process that lets authors of accepted papers submit an artifact collection (the benchmarks and tools used to generate any empirical evidence presented in a paper) [9]. The MIPS project cannot claim credit for these developments, but it emphasized early on that end-to-end performance, from source program to executing machine instructions, is the metric that matters.

The Stanford MIPS project was an important evolutionary step. Later RISC architectures such as MIPS Inc. and DEC Alpha were able to learn from both our mistakes and successes, producing cleaner and more widely applicable architectures, including integrated floating-point and system support features such as TLBs. All new architectures that appeared afterward (since 1985) incorporated ideas from the RISC designs, and as the concern for resource and power efficiency continues to be important, we expect RISC ideas to remain relevant for processor designs. And, finally, this MIPS paper was also an important step in the transition of the SIGMICRO Annual Workshop on Microprogramming to the IEEE/ACM International Symposium on Microarchitecture.

References

1. C. Mead and L. Conway, Introduction to VLSI Systems, Addison-Wesley, 1980.

2. J. Hennessy et al., “MIPS: A Microprocessor Architecture,” Proc. 15th Ann. Microprogramming Workshop, 1982, pp. 17–22.
3. C. Rowen et al., “MIPS: A High Performance 32-bit NMOS Microprocessor,” Proc. Int’l Solid-State Circuits Conf., 1984, pp. 180–181.
4. T.R. Gross et al., “Measurement and Evaluation of the MIPS Architecture and Processor,” ACM Trans. Computer Systems, Aug. 1988, pp. 229–258.
5. S. Przybylski, “The Design Verification and Testing of MIPS,” Proc. Conf. Advanced Research in VLSI, 1984, pp. 100–109.
6. N. Jouppi, “TV: An NMOS Timing Analyzer,” Proc. 3rd CalTech Conf. VLSI, 1983, pp. 71–85.
7. S. Przybylski et al., “Organization and VLSI Implementation of MIPS,” J. VLSI and Computer Systems, vol. 1, no. 2, 1984, pp. 170–208.
8. D.A. Patterson and C.H. Sequin, “RISC-I: A Reduced Instruction Set VLSI Computer,” Proc. 8th Ann. Symp. Computer Architecture, 1981, pp. 443–457.
9. S. Krishnamurthi, “Artifact Evaluation for Software Conferences,” SIGPLAN Notices, vol. 48, no. 4S, 2013, pp. 17–21.

Thomas R. Gross is a faculty member in the Computer Science Department at ETH Zurich. Contact him at thomas.[email protected].

Norman P. Jouppi is a distinguished hardware engineer at Google. Contact him at [email protected].

John L. Hennessy is president emeritus of Stanford University. Contact him at [email protected].

Steven Przybylski is the president and principal consultant of the Verdande Group. Contact him at sp@verdande.com.

Chris Rowen is the CTO of the IP Group at Cadence Design Systems. Contact him at [email protected].

HPS Papers: A Retrospective

YALE N. PATT University of Texas at Austin

WEN-MEI W. HWU University of Illinois at Urbana–Champaign

STEPHEN W. MELVIN

MICHAEL C. SHEBANOW Samsung

HPS happened at a time (1984) when the computer architecture community was being inundated with the promises of RISC technology. Dave Patterson’s RISC at Berkeley and John Hennessy’s MIPS at Stanford were visible university research projects. Hewlett Packard had abandoned its previous instruction set architecture (ISA) in favor of the HP Precision Architecture (HP/PA) and had attracted a number of people from IBM to join their workforce. Sun Microsystems had moved from the Motorola 68020 to the Sparc ISA. Motorola itself was focusing on their 78K, later renumbered 88K. Even Intel was hedging its bets, pursuing the development of the i860 along with continued activity on x86.

The RISC phenomenon rejected the VAX and x86 architectures as far too complex. Both architectures had variable-length instruction sets, often with multiple operations in each instruction. The VAX Index instruction, for example, required six operands to assist in computing the memory location of a desired element in a multidimensional subscripted array. If you wanted the location of A[I,J,K], you could obtain it with three instantiations of the Index instruction. The Index instruction took one of the subscripted variables (say, I) and checked the upper and lower bounds; if they both passed, it used the size information of each element in the array and the array’s dimensions to compute part of the location of A[I,J,K]. More than a half-dozen operations were performed in carrying out the work of this instruction. Intel’s x86’s variable-length instruction had prefixes to override an instruction’s normal activity, and several bytes when necessary to locate the location of an operand.

RISC advocates argued that with simpler instructions requiring in general a single operation, wherein the signals needed to control the datapath were