Reduced Instruction Set Computers
Reduced instruction set computers aim for both simplicity in hardware and synergy between architectures and compilers. Optimizing compilers are used to compile programming languages down to instructions that are as unencumbered as microinstructions in a large virtual address space, and to make the instruction cycle time as fast as possible.

DAVID A. PATTERSON

As circuit technologies reduce the relative cost of processing and memory, instruction sets that are too complex become a distinct liability to performance. The designers of reduced instruction set computers (RISCs) strive for both simplicity in hardware and synergy between architecture and compilers, in order to streamline processing as much as possible. Early experience indicates that RISCs can in fact run much faster than more conventionally designed machines.

BACKGROUND

The IBM System/360, first introduced in 1964, was the real beginning of modern computer architecture. Although the computers in the System/360 "family" provided different levels of performance at different prices, all ran identical software. The System/360 originated the distinction between computer architecture (the abstract structure of a computer that a machine-language programmer needs to know in order to write programs) and the hardware implementation of that structure. Before the System/360, architectural trade-offs were determined by their effect on the price and performance of a single implementation; henceforth, architectural trade-offs became more esoteric, and the consequences for a single implementation would no longer be sufficient to settle an argument about instruction set design.

Microprogramming was the primary technological innovation behind this marketing concept. Microprogramming relied on a small control memory and was an elegant way of building the processor control unit for a large instruction set. Each word of control memory is called a microinstruction, and the contents of control memory are essentially an interpreter, programmed in microinstructions; a sketch of this interpreter appears below. The main memories of these computers were magnetic core memories, and the small control memories were usually 10 times faster than core.

Minicomputer manufacturers tend to follow the lead of mainframe manufacturers, especially when the mainframe manufacturer is IBM, and so microprogramming caught on quickly. The rapid growth of semiconductor memories also speeded this trend: in the early 1970s, for example, 8192 bits of read-only memory (ROM) took up no more space than 8 bits of register. Eventually, minicomputers using core main memory and semiconductor control memory became standard in the minicomputer industry.
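The idea that control memory holds an interpreter can be made concrete with a sketch. The toy machine below is my own illustration, not from the original article: its three-instruction set, its micro-operations, and their encodings are all invented. Each machine instruction is executed by stepping through a short microprogram held in a control-memory table, which is, in miniature, what a microcoded control unit does.

    #include <stdint.h>
    #include <stdio.h>

    /* Toy microcoded machine, invented for illustration: four
     * registers and a 16-word data memory. */
    static uint32_t reg[4], mem[16];

    /* Micro-operations: the primitive steps the datapath can perform. */
    typedef enum { U_READMEM, U_WRITEMEM, U_TOREG, U_ADDREG, U_END } uop_t;

    /* Machine-level opcodes of the toy instruction set. */
    enum { OP_LOAD, OP_ADD, OP_STORE, OP_HALT };

    /* Control memory: one microprogram (a sequence of micro-operations)
     * per opcode.  A real control memory holds wide words of parallel
     * control signals; a flat list of enum values stands in for that. */
    static const uop_t *const ucode[] = {
        [OP_LOAD]  = (const uop_t[]){ U_READMEM, U_TOREG,  U_END },
        [OP_ADD]   = (const uop_t[]){ U_READMEM, U_ADDREG, U_END },
        [OP_STORE] = (const uop_t[]){ U_WRITEMEM, U_END },
    };

    typedef struct { int op, r, addr; } insn_t;

    /* The control unit: an interpreter that executes each machine
     * instruction by stepping through its microprogram. */
    static void run(const insn_t *ip)
    {
        for (; ip->op != OP_HALT; ip++) {
            uint32_t mdr = 0;                  /* memory data register */
            for (const uop_t *u = ucode[ip->op]; *u != U_END; u++) {
                switch (*u) {
                case U_READMEM:  mdr = mem[ip->addr];        break;
                case U_WRITEMEM: mem[ip->addr] = reg[ip->r]; break;
                case U_TOREG:    reg[ip->r] = mdr;           break;
                case U_ADDREG:   reg[ip->r] += mdr;          break;
                default:                                     break;
                }
            }
        }
    }

    int main(void)
    {
        mem[1] = 20; mem[2] = 22;              /* operands B and C */
        const insn_t prog[] = {
            { OP_LOAD,  0, 1 },                /* r0 <- B */
            { OP_ADD,   0, 2 },                /* r0 += C */
            { OP_STORE, 0, 0 },                /* A <- r0 */
            { OP_HALT,  0, 0 },
        };
        run(prog);
        printf("A = %u\n", (unsigned)mem[0]);  /* prints A = 42 */
        return 0;
    }

In a real machine each microinstruction would be a wide word of parallel control signals rather than an enum value, and the microprogram for a complex instruction could run to dozens of words; that is why the control memories in Table I reach hundreds of kilobits.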
With the continuing growth of semiconductor memory, a much richer and more complicated instruction set could be implemented. The architecture research community argued for richer instruction sets. Let us review some of the arguments they advanced at that time:

1. Richer instruction sets would simplify compilers. As the story was told, compilers were very hard to build, and compilers for machines with registers were the hardest of all. Compilers for architectures with execution models based either on stacks or on memory-to-memory operations were much simpler and more reliable.

2. Richer instruction sets would alleviate the software crisis. At a time when software costs were rising as fast as hardware costs were dropping, it seemed appropriate to move as much function as possible to the hardware. The idea was to create machine instructions that resembled programming language statements, so as to close the "semantic gap" between programming languages and machine languages.

3. Richer instruction sets would improve architecture quality. After IBM differentiated architecture from implementation, the research community looked for ways to measure the quality of an architecture, as opposed to the speed at which implementations could run programs. The only architectural metrics then widely recognized were program size and the number of bits of instructions and bits of data fetched from memory during program execution (see Figure 1).

Memory efficiency was such a dominating concern in these metrics because main memory (magnetic core memory) was so slow and expensive. These metrics are partially responsible for the prevailing belief in the 1970s that execution speed was proportional to program size. It even became fashionable to examine long traces of executed instructions to see whether a pair or triple of old instructions could be replaced by a single, more powerful instruction. The belief that larger programs were invariably slower programs inspired the invention of many exotic instruction formats that reduced program size.

The rapid rise of the integrated circuit, along with these arguments from the architecture research community, led in the 1970s to certain design principles that guided computer architecture:

1. The memory technology used for microprograms was growing rapidly, so large microprograms would add little or nothing to the cost of the machine.

2. Since microinstructions were much faster than normal machine instructions, moving software functions to microcode made for faster computers and more reliable functions.

3. Since execution speed was proportional to program size, architectural techniques that led to smaller programs also led to faster computers.

4. Registers were old-fashioned and made it hard to build compilers; stacks or memory-to-memory architectures were superior execution models. As one architecture researcher put it in 1978, "One's eyebrows should rise whenever a future architecture is developed with a register-oriented instruction set."¹

¹ Myers, G.J. The case against stack-oriented instruction sets. Comput. Archit. News 6, 3 (Aug. 1977), 7-10.

    Register-to-Register                 I = 104b; D = 96b; M = 200b
        Load  rB,B
        Load  rC,C
        Add   rA,rB,rC
        Store rA,A

    Memory-to-Register                   I = 72b; D = 96b; M = 168b
        Load  B
        Add   C
        Store A

    Memory-to-Memory                     I = 56b; D = 96b; M = 152b
        Add   B,C,A

FIGURE 1. The statement A ← B + C translated into assembly language for three execution models: register-to-register, memory-to-register, and memory-to-memory. In this example, the three data words are 32 bits each; instruction fields are 8-bit opcodes, 4-bit register specifiers, and 16-bit addresses. The metrics research architects selected for deciding which architecture is best were the total size of executed instructions (I), the total size of executed data (D), and the total memory traffic (M), the sum of I and D. These metrics suggest that a memory-to-memory architecture is the "best" architecture and a register-to-register architecture the "worst." This study led one research architect in 1978 to suggest that future architectures should not include registers.
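Figure 1's arithmetic can be checked mechanically. The short program below is illustrative only: it assumes the field widths shown in the figure (8-bit opcodes, 4-bit register specifiers, 16-bit addresses, 32-bit data words) and simply adds up the bits each execution model fetches.

    #include <stdio.h>

    /* Field widths in bits, as assumed in Figure 1 (illustrative). */
    enum { OP = 8, REG = 4, ADDR = 16, WORD = 32 };

    int main(void)
    {
        /* Register-to-register: Load rB,B; Load rC,C; Add rA,rB,rC; Store rA,A */
        int i_rr = 2 * (OP + REG + ADDR)   /* two loads          */
                 + (OP + 3 * REG)          /* three-register add */
                 + (OP + REG + ADDR);      /* one store          */

        /* Memory-to-register: Load B; Add C; Store A (accumulator implied) */
        int i_mr = 3 * (OP + ADDR);

        /* Memory-to-memory: Add B,C,A -- one opcode, three addresses */
        int i_mm = OP + 3 * ADDR;

        int d = 3 * WORD;                  /* A, B, and C each moved once */

        printf("reg-reg: I = %3db  D = %db  M = %3db\n", i_rr, d, i_rr + d);
        printf("mem-reg: I = %3db  D = %db  M = %3db\n", i_mr, d, i_mr + d);
        printf("mem-mem: I = %3db  D = %db  M = %3db\n", i_mm, d, i_mm + d);
        return 0;
    }

It prints I = 104b, 72b, and 56b and M = 200b, 168b, and 152b, reproducing the ranking that made memory-to-memory architectures look "best" under these metrics.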
Computers that exemplify these design principles are the IBM 370/168, the DEC VAX-11/780, the Xerox Dorado, and the Intel iAPX-432. Table I shows some of the characteristics of these machines.

TABLE I. Four Implementations of Modern Architectures

                              IBM 370/168   VAX-11/780   Dorado    iAPX-432
    Year                      1973          1978         1978      1982
    Number of instructions    208           303          270       222
    Control memory size       420 Kb        480 Kb       138 Kb    64 Kb
    Instruction sizes (bits)  16-48         16-456       8-24      6-321
    Technology                ECL MSI       TTL MSI      ECL MSI   NMOS VLSI
    Execution model           reg-mem       reg-mem      stack     stack
                              mem-mem       mem-mem                mem-mem
                              reg-reg       reg-reg
    Cache size                64 Kb         64 Kb        64 Kb     0

These four implementations, designed in the 1970s, all used microprogramming. The emphasis on memory efficiency at that time led to the varying-sized instruction formats of the VAX and the 432. Note how much larger the control memories were than the cache memories.

Although computer architects were reaching a consensus on design principles, the implementation world was changing around them:

1. Semiconductor memory was replacing core, which meant that main memories would no longer be 10 times slower than control memories.

2. Since it was virtually impossible to remove all mistakes from 400,000 bits of microcode, control store ROMs were becoming control store RAMs.

3. Caches had been invented: studies showed that the locality of programs meant that small, fast buffers could make a substantial improvement in the implementation speed of an architecture. As Table I shows, caches were included in nearly every machine, though control memories were much larger than cache memories.

4. Compilers were subsetting architectures: simple compilers found it difficult to generate the complex new functions that had been included to help close the "semantic gap," and optimizing compilers subsetted architectures because they removed so many of the unknowns at compile time that they rarely needed the powerful instructions at run time.

One consequence of RAM control stores was the writable control store (WCS), which let users tailor a machine by writing their own microprograms. Microprogramming was the most tedious form of machine-language programming, and many researchers, including myself, built compilers and debuggers for microprogramming. This was a formidable assignment, for virtually no inefficiencies could be tolerated in microcode. These demands led to the invention of new programming languages for microprogramming and new compiler techniques.

Unfortunately for me and several other researchers, there were three more impediments that kept WCSs from being very popular. (Although a few machines offer WCS as an option today, it is unlikely that more than one in a thousand programmers take this option.) These impediments were:

1. Virtual memory complications. Once computers made the transition from physical memory to virtual memory, microprogrammers incurred the added difficulty of making sure that any routine could start over if any memory operand caused a virtual memory fault.

2. Limited address space. The most difficult programming situation occurs when a program must be forced to fit in too small a memory. With control memories of 4096 words or less, some unfortunate WCS developers spent more time squeezing space