
Symbolics Architecture

David A. Moon, Symbolics, Inc.

COMPUTER, January 1987. 0018-9162/87/0100-0043$01.00 © 1987 IEEE

This architecture enables rapid development and efficient execution of large, ambitious applications. An unconventional design avoids trading off safety for speed.

What is an architecture? In computer systems, an architecture is a specification of an interface. To be dignified by the name architecture, an interface should be designed for a long lifespan and should connect system components maintained by different organizations. Often an architecture is part of a product definition and defines characteristics on which purchasers of that product rely, but this is not true of everything that is called an architecture. An architecture is more formal than an internal interface between closely related system components, and has farther-reaching effects on system characteristics and performance.

A system typically contains many levels and types of architecture. This article discusses three architectures defined in Symbolics computers:

(1) System architecture: defines how the system appears to end users and application programmers, including the characteristics of languages, user interface, and operating system.

(2) Instruction architecture: defines the instruction set of the machine, the types of data that can be manipulated by those instructions, and the environment in which the instructions operate, for example, calling discipline, virtual memory management, interrupts and exception traps, etc. This is an interface between the compilers and the hardware.

(3) Processor architecture: defines the overall structure of the implementation of the instruction architecture. This is an interface between the firmware and the hardware, and is also an interface between the parts of the processor hardware.

System architecture

System architecture defines how the system looks to the end user and to the programmer, including the characteristics of languages, user interface, and operating system. System architecture defines the product that people actually use; the other levels of architecture define the mechanism underneath that implements it. System architecture is implemented by software; hardware only sets bounds on what is possible. System architecture defines the motivation for most of the design choices at the other levels of architecture. This section is an overview of Symbolics system architecture.

The Symbolics system presents itself to the user through a high-resolution bitmap display. In addition to text and graphics, the display contains presentations of objects. The user operates on the objects by manipulating the presentations with a mouse. The display includes a continuously updated reminder of the mouse commands applicable to the current context. Behind the display is a powerful symbol processor with specialized hardware and software. The system is dedicated to one user at a time and shares such resources as files, printers, and electronic mail with other Symbolics and non-Symbolics computers through both local-area and long-distance networks of several types. The local-area network is integral to system operation.

The system is designed for high-productivity software development both in symbolic languages, such as Common Lisp¹ and Prolog, and in nonsymbolic languages, such as Ada and Fortran. It is also designed for efficient execution of large programs, particularly in symbolic languages, and delivery of such programs to end users. The system is intended to be especially suited to complex, ambitious applications that go beyond what has been done before; thus it provides facilities for exploratory programming, complexity management, incremental construction of programs, and so forth. The operating system is written in Lisp and the architectural concept originated at the MIT Artificial Intelligence Laboratory. However, applications are not limited to Lisp and AI. Many non-AI applications that are complex enough to be difficult on an ordinary computer have been successfully implemented.

Meeting these needs requires an extraordinary system architecture; just another PC or clone won't do. The intended applications demand a lot of processor power, main and virtual memory size, and disk capacity. The system must provide as much performance as possible without exceeding practical limits on cost, and computing capacity must not be diluted by sharing it among multiple users. These purely hardware aspects are not sufficient, however. The system must also improve both the speed of software production and the quality of the resulting software by providing a more complete substrate on which to erect programs than has been customary. Programmers should not be handed just a language and an operating system and be forced to do everything else themselves.

At a high level, the Symbolics substrate provides many facilities that can be incorporated into user programs, such as user-interface management, embedded languages, object-oriented programming, and networking. At a low level, the substrate provides full run-time checking of data types, of array subscript bounds, of the number of arguments passed to a function, and of undefined functions and variables. Programs can be incrementally modified, even while they are running, and information needed for debugging is not lost by compilation. Thus the edit-compile-test program development cycle can be repeated very rapidly. Storage management, including reclamation of space occupied by objects that are no longer in use, is automatic so that the programmer does not have to worry about it; incremental so that it interferes minimally with response to the user; and efficient because it concentrates on ephemeral objects, which are the best candidates for reclamation.

The system never compromises safety for the sake of speed. (A notorious exception, the dynamic rather than indefinite extent of &rest arguments, is recognized as a holdover from the past that is not consistent with the system architecture and will certainly be fixed in the future.) In an ordinary architecture, such features would substantially diminish performance, requiring the introduction of switches to turn off the features and regain speed. Our system architecture deems this unacceptable, because complex, ambitious application programs are typically never finished to the point where it is safe to declare them bug-free and remove run-time error-checking. We feel it is essential for such applications to be robust when delivered to end users, so that when something unanticipated by the programmer happens, the application will fail in an obvious, comprehensible, and controlled way, rather than just supplying the wrong answer. To support such applications, a system must provide speed and safety at the same time.

Symbolics systems use a combination of approaches to break the traditional dilemma in which a programmer must choose either speed or safety and comfortable software development:

* The hardware performs low-level checking in parallel with computation and memory access, so that this checking takes no extra time.
* Machine instructions are generic. For example, the Add instruction is capable of adding any two numbers regardless of their data types. Programs need not know ahead of time what type of numbers they will be adding, and they need no declarations to achieve efficiency when using only the fastest types of numbers. Automatic conversion between data types occurs when the two operands of Add are not of the same type.
* Function calling is very fast, yet does not lose information needed for debugging and does not prevent functions from being redefined.
* Built-in substrate facilities are already optimized and available for programmers to incorporate into their programs.
* Application-specific control of virtual-memory paging is possible. Prepaging, postpurging, multipage transfers, and reordering of objects to improve locality are supported.²

These benefits are not without costs:

* Both the cost and the complexity of system hardware and software are increased by these additional facilities.
* Performance optimization is not always automatic. Programmers still must sometimes resort to metering tools. Declarations are available to optimize certain difficult cases, but their use is much less frequent than in conventional architectures.

Why [...]? This is really three questions:

(1) Why dedicate a computer to each user instead of time-sharing?
(2) Why use a symbolic system architecture?
(3) Why build a symbolic system architecture on unconventional lower-level architectures?

Why dedicate a computer to each user instead of time-sharing? This seemed like a big issue back in 1974 when Lisp machines were invented, but perhaps by now the battle has been won. A report from that era³ states these reasons for abandoning time-sharing:

* Time-sharing systems degrade under heavy load, so work on large, ambitious programs could only be conducted in off-peak hours. In contrast, a single-user system would perform consistently at any time of day.
* Performance was limited by the speed of the disk when running programs too large to fit in main memory. Dedicating a disk to each user would give better performance.

The underlying argument was that increasing program size and advancing technology, making capable processors much less expensive, had eliminated the economy of scale of time-sharing systems. The original purpose of time-sharing was to share expensive hardware that was only lightly used by any individual user. The serendipitous feature of time-sharing was interuser communication. Both of these purposes are now served by local-area networking. Expensive hardware units are still shared, but the processor is no longer among them.

These arguments apply to all types of dedicated single-user computers, even PCs, not only to symbolic architectures.

Why use a symbolic system architecture? Many users who need a platform for efficient execution of large symbolic programs, a high-productivity software development environment, or a system for exploratory programming and rapid prototyping have found symbolic languages such as Lisp and symbolic architectures such as this one very beneficial. Programs can be built more quickly, and fit more smoothly into an integrated environment, by incorporating such built-in substrate facilities as automatic storage management and the flexible display with its presentation-based user interface. The full error-checking saves time when developing new programs. The programmer can concentrate on the essential aspects of the program without fussing about minor mistakes, because the machine will catch them. The ability to change the program incrementally greatly speeds up development.

Once the initial exploration phase is over, it is possible to turn prototypes into products quickly. Good performance can be achieved without a lot of programmer effort and without sacrificing those development-oriented features that are also of value later in the program's life, during maintenance and enhancement.

Why build a symbolic system architecture on unconventional lower-level architectures? Conventional instruction architectures are optimized to implement system architectures very different from Symbolics'. For example, they have no notions of parallel error-checking and generic instructions; they often obstruct the implementation of a fast function call, especially one that retains error-checking, incremental compilation, and debugging information; and they usually pay great attention to complex indexing and memory addressing modes, which have little utility for symbolic languages. Implementing Symbolics' system architecture on a conventional instruction architecture would force a choice between safety and performance: we could not have both. The type of software we are interested in either could not run at all or would require much faster hardware to achieve the same performance. Later I will discuss the special aspects of Symbolics' instruction and processor architectures that make them more suitable to support a symbolic system.

Comparing the performance of machines with equivalent cycle times and different architectures can sometimes be illuminating. The 3640, VAX 11/780, and 10-MHz 68020 all have cycle times of about 200 ns. (The 68020 takes two clock cycles to perform a basic operation, so its 100-ns nominal cycle time is equivalent to the other two machines' 200 ns.) On a Fortran benchmark (single-precision Whetstone), the VAX is 1.8 times the speed of the 3640 (750 versus 400). With floating-point accelerators on each machine, the ratio is 2.1. On the Lisp benchmark Boyer,⁴ the 3640 is 1.75 times the speed of the VAX running [...], 3.9 times the speed of the VAX running DEC Common Lisp, and 2.1 times the speed of the 68020 running Lucid Common Lisp. (The 68020 time at 10 MHz was estimated by multiplying its 16-MHz time by 1.6, no doubt an inaccurate procedure.) The VAX and 68020 programs were compiled with run-time error checking disabled and safety compromised, while the 3640 was doing full checking as always.

Like any benchmark figures presented without a complete explanation of what was measured, how it was measured, what full range of cases was tested, and how it can be reproduced in another laboratory, these numbers should not be taken very seriously. However, they give some idea of the effect of optimizing the instruction architecture to fit the system architecture. One could say that the VAX is three times better at Fortran than at Lisp and that the 68020 and VAX are similar for Lisp. These figures also show the effect of different strategies on identical hardware.

This comparison was scaled to remove the effect of cycle time and show only the effect of architecture. This is not completely fair to the conventional machines, because in general they can be expected to have faster cycle times than a symbolic machine. Running the 68020 at full speed and using a newer model of the VAX would have improved their times. Hardware technology of conventional machines will always be a couple of years ahead of symbolic hardware, in cycle time and price/cycle, because of the driving force of their larger market. It's interesting to note that this hardware advantage applies only to the processor, which usually contributes less than 25 percent of system cost. Power supplies, sheet metal, and disk drives don't care whether the architecture is symbolic; they cost and perform the same for equivalent configurations of either type of machine.

This comparison is not completely fair to the symbolic machine, either. Software exploiting the full capabilities of the symbolic machine should have been compared, but this software won't run at all on the conventional machines. Software technology on symbolic machines will always be a couple of years ahead of conventional machines, because it is built on a more powerful substrate using more productive tools.

Performance. The best published analysis of performance of Lisp systems appears in Gabriel's work.⁴ The various 3600 models perform quite capably on these benchmarks, as can be seen from a perusal of the book. Some of the reasons for such good performance will become apparent as we proceed.

However, one must always ask exactly what a benchmark measures. A problem with Gabriel's benchmarks is that they are written in a least common denominator dialect that represents Lisp as it was in 1970. This makes it easier to benchmark a broad spectrum of machines, but makes the benchmarks less valid predictors of the performance of real-world programs.

Since 1970, there have been many advances in the understanding of symbolic processing and in the range of its applications. The basic operations measured by these benchmarks, such as function calling, small-integer arithmetic, and list processing, are still important today, but many other operations not measured are of equal importance. These benchmarks do not use the more modern features of Common Lisp (such as structures, sequences, and multiple values), do not use object-oriented programming, and are generally not affected by system-wide facilities such as paging and garbage collection. As predefined, portable programs, these benchmarks cannot benefit from the unusual aspects of Symbolics system architecture, such as large program support, full run-time safety, efficient storage management, substrate facilities, support for languages other than Lisp, and faster development of efficient programs.

Instruction architecture

Symbolics' philosophy is that different levels of architecture should be free to change independently, to satisfy different goals and constraints. Users see only the system architecture, leaving the lower levels, such as the instruction architecture, free to change to utilize available technology, maximize performance, or minimize cost. Most other computer families allow users to depend on the instruction architecture and therefore are not free to change it. It tends to be optimized for only the first member of the family. Later implementations using newer technology, as well as implementations at the high or low extremes of the price/performance curve, are penalized by the need for compatibility with an unsuitable instruction architecture.

Symbolics system architecture has been implemented on three different instruction architectures. The LM-2 machine, based on the original MIT Lisp Machine,³ was the first; it was discontinued in 1983. The 3600 family of machines uses a second instruction architecture and three different processor architectures. A third instruction architecture, appropriate for VLSI implementation, is being used in a line of future products now under development.

Figure 1. An object reference is a 34-bit quantity, consisting either of a 32-bit data word with a 2-bit data type tag, or of a 28-bit address with a 6-bit data type tag.

Figure 2. An array of three elements (FOO, 259, and BAR) consists of a header word defining the type and length of the array, followed by an object reference for each array element.

Figure 3. A string containing the seven characters "Example" stores each character in a single 8-bit byte. Bytes are [...]

Figure 4. An ordinary list of two elements requires four words of storage. Unlike arrays, lists do not have headers.

Figure 5. A compact list of two elements requires two words of storage. It uses the cdr code to eliminate two object references. (Reprinted from "Architecture of the Symbolics 3600," 12th Int'l Symp. Computer Architecture, © 1985 IEEE.)
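The two formats in the Figure 1 caption can be modeled in a few lines of Python. This is only a sketch: the field widths come from the article, but the numeric tag values, the constant and function names, and the rule that one major tag value selects the address format are assumptions made for illustration.

```python
# Illustrative model of the 34-bit object reference in Figure 1.
# Only the field widths come from the article; the tag values and the way
# the two formats are told apart are hypothetical.

DATA_BITS = 32      # data word
MAJOR_BITS = 2      # major data type tag
MINOR_BITS = 4      # minor data type tag, carved out of the data word
ADDRESS_BITS = 28   # word address

# Hypothetical major tag assignments.
MAJOR_INT, MAJOR_FLOAT, MAJOR_POINTER = 0, 1, 2
# Hypothetical minor tags, meaningful only under MAJOR_POINTER.
MINOR_SYMBOL, MINOR_CONS, MINOR_ARRAY = 0, 1, 2


def make_immediate(major, data):
    """Build an immediate reference: 2-bit tag plus a 32-bit data word."""
    assert 0 <= major < (1 << MAJOR_BITS)
    assert 0 <= data < (1 << DATA_BITS)
    return (major << DATA_BITS) | data


def make_pointer(minor, address):
    """Build a by-address reference: 2 + 4 tag bits plus a 28-bit address."""
    assert 0 <= minor < (1 << MINOR_BITS)
    assert 0 <= address < (1 << ADDRESS_BITS)
    return (MAJOR_POINTER << DATA_BITS) | (minor << ADDRESS_BITS) | address


def decode(ref):
    """Return (kind, tag, payload) for a 34-bit reference."""
    major = ref >> DATA_BITS
    data = ref & ((1 << DATA_BITS) - 1)
    if major != MAJOR_POINTER:
        return ("immediate", major, data)
    minor = data >> ADDRESS_BITS
    address = data & ((1 << ADDRESS_BITS) - 1)
    return ("pointer", minor, address)
```

Because the tag travels with every reference, code that receives a reference can decide how to treat it without any declaration, which is the property the generic instructions rely on.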

The following sections summarize the instruction and processor architectures of the 3600 family, discuss some of the design tradeoffs involved, and show how these architectures are especially effective at supporting the desired system architecture. Further details can be found elsewhere.⁵,⁶

Data are object references. The fundamental form of data manipulated by any Lisp system is an object reference, which designates a conceptual object. The values of variables, the arguments to functions, the results of functions, and the elements of lists are all object references. There can be more than one reference to a given object. Copying an object reference makes a new reference to the same object; it does not make a copy of the object.

Variables in Lisp and variables in conventional languages are fundamentally different. In Lisp, the value of a variable is an object reference, which can refer to an object of any type. Variables do not intrinsically have types; the type of the object is encoded in the object reference. In a conventional language, assigning the value of one variable to another copies the object, possibly converts its type, and loses its identity.

A typical object reference contains the address of the object's representation in storage. There can be several object references to a particular object, but it has only a single stored representation. Side-effects to an object, such as changing the contents of one element of an array, are implemented by modifying the stored representation. All object references address the same stored representation, so they all see the side-effect.

In addition to such object references by address, it is possible to have an immediate object reference, which directly contains the entire representation of the object. The advantage is that no memory needs to be allocated when creating such an object. The disadvantage is that copying an immediate object reference effectively copies the object. Thus, immediate object references can only be used for object types that are not subject to meaningful side-effects, have a small representation, and need very efficient allocation of new objects. Small integers (traditionally called fixnums) and single-precision floating-point numbers are examples of such types.

In the 3600 architecture, an object reference is a 34-bit quantity consisting of a 32-bit data word and a 2-bit major data type tag. The tag determines the interpretation of the data word. Often the data word is broken down into a 4-bit minor data type tag and a 28-bit address (see Figure 1). This variable-length tagging scheme accommodates industry-standard 32-bit fixed and floating-point numbers with a minimum of overhead for tagging. Addresses are narrower than numbers to make additional tag bits available for the many types of objects that Lisp uses.

Addresses are 28 bits wide and designate 36-bit words in a virtual memory with 256-word pages. The address granularity is a word, rather than a byte as in many other machines, because the architecture is object-oriented and objects are always aligned on a word boundary. This results in one gigabyte of usable virtual memory. It is interesting to note that the 3600's 28-bit address can actually access the same number of usable words as the VAX's 32-bit address, because the VAX expends two bits on byte addressing and reserves three-fourths of the remaining address space for the operating system kernel and the stack (neither of which is large).

In addition to immediate and by-address object references, the 3600 also uses pointers, a special kind of object reference that does not designate an object as such. A pointer designates a particular location within an object or a particular instruction within a compiled function. Pointers are used primarily for system programming.⁷

Stored representations of objects. The stored representation of an object is contained in some number of consecutive words of memory. Each word may contain an object reference, a header, a special marker, or a forwarding pointer. The data type tags distinguish these types of words. For example, an array is represented as a header word, containing such information as the length of the array, followed by one memory word for each element of the array, containing an object reference to the contents of that element (see Figure 2). An object reference to the array contains the address of the first memory word in the stored representation of the array.

A header is the first word in the stored representation of most objects. A header marks the boundary between the stored representations of two objects. It contains descriptive information about the object that it heads, which can be expressed as either immediate data or an address, as in an object reference.

A special marker indicates that the memory location containing it does not currently contain an object reference. Any attempt to read that location signals an error. The address field of a special marker specifies what kind of error should be signalled. For example, the value cell of an uninitialized variable contains a special marker that addresses the name of the variable. An attempt to use the value of a variable that has no value provokes an error message that includes the variable's name.

A forwarding pointer specifies that any reference to the location containing it should be redirected to another memory location, just as in postal forwarding. These are used for a number of internal bookkeeping purposes by the storage management software, including the implementation of extensible arrays.

Some objects include packed data in their stored representation. For example, character strings store each character in a single 8-bit byte (see Figure 3). For uniformity, the stored representation of an object containing packed data remains a sequence of object references. Each word is an immediate object reference to an integer, whose 32 bits are broken down into packed fields as required, such as four 8-bit bytes in the case of a character string.

A word in memory consists of 36 bits, of which I have already explained 34. When a memory word contains a header or a machine instruction, the remaining two bits serve as an extension of the rest of the word. When a memory word contains an object reference, a special marker, or a forwarding pointer, the remaining two bits are called the cdr code. The representation of conses and lists (Steele, p. 26)¹ saves one word by using the cdr code instead of a separate header to delimit the boundaries of these small objects. In addition, lists are represented compactly by encoding common values of the cdr in the cdr code instead of using an object reference (see Figures 4 and 5).

Tagging every word in memory produces these benefits:

* All data are self-describing and the information needed for full run-time checking of data types, array subscript bounds, and undefined functions and variables is always available.
* Hardware can process the tag in parallel with other hardware that processes the rest of a word. This makes it possible to optimize safety and speed simultaneously.
* Generic instructions alter their operation according to the tags of their operands.
* Automatic storage management is simple, efficient, and reliable. It can be assisted by hardware, since the data structures it deals with are simple and independent of context. The details appear elsewhere.⁸,⁵
* Data use less storage due to compact representations. Programs use less storage due to generic instructions and because tag checking is done in hardware, not software.

The cost of tagging is that more main memory and disk space are required to store numerical information. Each main memory word includes 7 bits for error detection and correction, so the 4 tag bits add 10 percent. Each 256-word disk sector includes about 128 bytes of formatting overhead, so the 4 tag bits per word add 11 percent. We feel that the benefits amply justify these costs.

Instruction set. The 3600 architecture includes an instruction set produced by the compilers and executed by a combination of hardware and firmware. All instructions are 17 bits long, consisting of a 9-bit operation field and an 8-bit argument field. Instructions are packed two per word, which is important for performance in two ways: (1) Dense code decreases paging overhead by making programs occupy fewer pages and (2) simplifies the memory system by decreasing the ratio of required instruction fetch bandwidth (in words/second) to processor speed (in instructions/second).

Every instruction is contained in a compiled function, which consists of some fixed overhead, a table of constants, and a sequence of instructions (see Figure 6). The table of constants contains object references to objects used by the instructions, including locative pointers to definition cells of functions called by this function. Indirection through the definition cell ensures that if a function is redefined its callers are automatically linked to the new definition.

Instructions operate in a stack machine model: Many instructions pop their operands off the stack and push their results onto the stack. In addition to these 0-address instructions, there are 1-address instructions, which can address any location in the current stack frame. In this way the slots of the current stack frame serve the same purpose as registers.
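The instruction format described in the text (a 9-bit operation field, an 8-bit argument field, two instructions per word) can be sketched as follows. The field order inside an instruction, the choice of which instruction occupies the low half of the word, and all function names are assumptions for illustration; the article does not specify them.

```python
# Illustrative packing of two 17-bit instructions into one memory word.
# Field widths and "two per word" come from the article; field order and
# the placement of the first instruction in the low bits are assumptions.

OPCODE_BITS = 9
ARG_BITS = 8
INSTRUCTION_BITS = OPCODE_BITS + ARG_BITS  # 17 bits


def encode_instruction(opcode, argument):
    """Combine a 9-bit operation field and an 8-bit argument field."""
    assert 0 <= opcode < (1 << OPCODE_BITS)
    assert 0 <= argument < (1 << ARG_BITS)
    return (opcode << ARG_BITS) | argument


def decode_instruction(instruction):
    """Split a 17-bit instruction into (opcode, argument)."""
    return instruction >> ARG_BITS, instruction & ((1 << ARG_BITS) - 1)


def pack_word(first, second):
    """Pack two instructions; `first` goes in the low 17 bits (assumed)."""
    return (second << INSTRUCTION_BITS) | first


def unpack_word(word):
    """Recover the two instructions from a packed word."""
    mask = (1 << INSTRUCTION_BITS) - 1
    return word & mask, (word >> INSTRUCTION_BITS) & mask
```

Two 17-bit instructions fill 34 bits, so a packed pair fits within a 36-bit memory word, and one word fetch supplies two instructions.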
Figure 6. A compiled function consists of four words of overhead, a table of constants and external references, and a sequence of 17-bit instructions, packed two per word.

Figure 7. A stack frame consists of the caller's copy of the arguments, five header words, the callee's copy of the arguments, local variables, and temporary storage. The frame-pointer (FP) and stack-pointer (SP) registers address the current stack frame. (Reprinted from "Architecture of the Symbolics 3600," 12th Int'l Symp. Computer Architecture, © 1985 IEEE.)
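The frame layout in the Figure 7 caption can be imitated with a toy control stack. This is a sketch under stated assumptions: the class and method names, the placeholder header contents, and the single-value return are hypothetical, and the real Call instruction and entry vector do considerably more. The one behavior taken from the article is the calling protocol, in which the arguments are pushed before the call and are replaced by the returned value afterward.

```python
# Toy control stack that mimics the frame layout in the Figure 7 caption.
# Header contents and the Python-level calling convention are assumptions.

HEADER_WORDS = 5


class ControlStack:
    def __init__(self):
        self.words = []
        self.fp = 0  # frame pointer: index of the callee's first argument

    def push(self, value):
        self.words.append(value)

    def call(self, function, nargs):
        """Call `function` on the top `nargs` words, leaving one value."""
        base = len(self.words) - nargs      # start of caller's copy of args
        args = self.words[base:]
        saved_fp = self.fp
        # Five header words; the contents here are placeholders.
        self.words += [function, "status", "saved-pc", base - 1, saved_fp]
        assert len(self.words) == base + nargs + HEADER_WORDS
        self.fp = len(self.words)
        self.words += args                  # callee's copy of the arguments
        result = function(self, *self.words[self.fp:self.fp + nargs])
        # Return: discard the whole frame, including the caller's copy of
        # the arguments, then push the returned value.
        del self.words[base:]
        self.fp = saved_fp
        self.push(result)


def add2(stack, a, b):
    return a + b


stack = ControlStack()
stack.push(3)
stack.push(4)
stack.call(add2, 2)   # the two arguments are replaced by the result
```

Locals and temporaries of the callee would be pushed above its copy of the arguments, which is why discarding everything from the caller's copy upward is enough to unwind the frame in this model.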

The 1-address instructions include multi-operand instructions, which pop all of their operands except the last off the stack and take their last operand from a location in the current stack frame.

There are several ways an instruction can use its argument field. Table 1 lists the ways to develop the address of an operand in the stack or in memory by adding argument to a base address. Table 2 lists non-address uses of argument. Each individual opcode only uses argument in a single way; there are no addressing modes. The motivation for implementing this particular set of arguments is to provide for constants (including small integers as a special case), all types of Lisp variables (local and nonlocal lexical, special, structure slot, instance), branching, and byte fields. Byte fields were included because they are heavily used in system programming.

Many instructions are simply Lisp functions directly implemented by hardware and firmware, rather than built up from other Lisp functions and implemented as compiled instructions. These Lisp-function instructions are known as built-ins. They take a fixed number of arguments from the stack and from their argument field. They return a fixed number of values on the stack. Examples of built-ins are eq, symbolp, logand (with two arguments), car, cons, member, and aref (with two arguments).¹ The criterion for implementing a Lisp function as a built-in instruction is that hardware is only used to optimize key performance areas. When a Lisp function is not critical to system performance, or hardware implementation of it cannot achieve a major speedup, it remains in software where it is easier to change, to debug, and to optimize.

Using an instruction set designed for Lisp rather than adapting one designed for Fortran or for a hand-crafted assembly language enhances safety and speed. 3600 instructions always check for errors and exceptions, so programs need not execute extra instructions to do that checking. Instructions operate on tagged data, so extra instructions to insert and remove tags are not needed. Instructions are generic, so declarations are not needed to tell the compiler how to select type-specific instructions and translate between data formats. In contrast, Lisp compilers for conventional machines⁹ must generate extra shifting or masking instructions to manipulate tags, must use multi-instruction sequences for simple arithmetic operations unless there are declarations, and are always having to compromise between safety and speed.

Unlike many machines, the 3600 does not have indexed and indirect addressing modes. Instead it has instructions that perform structured, object-oriented operations such as subscripting an array or fetching the car of a list. This fits the instruction set more closely to the needs of Lisp and at the same time simplifies the hardware by reducing the number of instruction formats to be decoded.

Function call. Storage whose lifetime is known to end when a function returns (or is exited abnormally) is allocated in three stacks, rather than in the main object storage heap, to increase efficiency. The control stack contains function-nesting information, arguments, local variables, function return values, and small stack-allocated temporary objects. The binding stack records dynamically bound variables.¹ The data stack contains stack-allocated temporary objects. This article concentrates on the control stack, which is the most critical to performance.

The protocol for calling a function is to push the arguments onto the stack, then execute a Call instruction that specifies the function to be called, the number of arguments, and what to do with the values returned by the function. When the function returns, the arguments have been popped off the stack and the values (if wanted) have been pushed on. Note the similarity in interface between functions and built-in instructions.

Every time a function is called, a new stack frame is built on the control stack. A stack frame consists of the caller's copy of the arguments, five header words, the callee's copy of the arguments, local variables, and temporary storage, including arguments being prepared for calling the next function (see Figure 7). The current stack frame is delimited by the frame-pointer (FP) and stack-pointer (SP) registers, which are available as base registers in instructions that use their argument field to address locations in the current stack frame.

A compiled function starts with a sequence of one or more instructions known as the entry vector. The first instruction in the entry vector, the entry instruction, describes how many arguments the function accepts, the layout of the entry vector, and the size of the function's constants table (see Figure 6), and tells the Call instruction where in the entry vector to transfer control. The Call instruction and the entry vector cooperate to copy the arguments to the top of the stack (creating the callee's copy), convert their arrangement in storage if required, supply default values for optional arguments that the caller does not pass, handle the &rest and Apply features of Common Lisp, and signal an error if too many or too few arguments were supplied. The details are beyond the scope of this article.

Function return. A function returns by executing a Return instruction whose operands are the values to be returned. The value disposition saved in the frame header by Call controls whether Return discards the values, returns one value on the stack, returns multiple values with a count on the stack, or returns all the values to the caller's caller.

Return removes the current frame from the stack and makes the caller's frame current, by restoring the saved FP, SP, and PC registers. If the cleanup bits in the frame header are nonzero, special action must be taken before the frame can be removed. Return takes this action, clears the bit, and tries again. Cleanup bits are used to pop corresponding frames from the binding and data stacks, for unwind-protect,¹ for debugging and metering purposes, and for stack buffer housekeeping.

Motivations of the function call discipline. The motivations for this particular function-calling discipline are
* to implement full Common Lisp function calling efficiently,
* to be fast, so that programmers will write clear programs,
* to retain complete information for the Debugger, and
* to be simple for the compiler.

To implement full Common Lisp function calling efficiently requires matching the arguments supplied by the caller (with normal function calling or with Apply) to the normal, &optional, and &rest parameters of the callee, and generating default values for unsupplied optional arguments. The entry vector takes care of this. Common Lisp's &key parameters are implemented by accepting an &rest parameter containing the keywords and values, then searching that list for each &key parameter. Multiple values are passed back to the caller on the stack, with a count. The caller reconciles the number of values returned with the number of values desired.

Function calling historically has been a major bottleneck in Lisp implementations, both on stock hardware and on specially-designed Lisp machines. It is important for function calling to be as fast as possible. If it is not, efficiency-minded programmers will distort their programming styles to avoid function calling, producing code that is hard to maintain, and will waste a lot of time doing optimization by hand that should have been done by the Lisp implementation itself. The 3600's function call mechanism attains good speed (fewer than 20 clock cycles for a one-argument function call and return when no exceptions occur) by using a stack buffer to minimize the number of memory references required, by optimizing the stack frame layout to maximize speed rather than to minimize space, by arranging for the checks for slower exception cases to be fast (for example, Return simply checks whether the cleanup bits are nonzero), and by using the entry vector mechanism to simplify run-time decision-making.

The information that the debugger can extract from a stack frame includes the address of the previous frame (from the saved FP in the header), the function running in that frame (from the header), the current instruction in that function (from the PC saved in the next frame), the arguments (from the stack; the header specifies the argument count and arrangement), the local variables (from the stack), and the names of the arguments and local variables (from a table created by the compiler and attached to the function).

The compiler is simple because there is only a single calling sequence. Any call can call any function, and the argument patterns are matched up at run time. Everything is in the stack and no register-saving conventions are required, since there are no general-purpose registers.

The principal costs of this function-calling discipline are the five-word header in each frame and the copying of arguments to the top of the stack. The time to create the header is not a problem, because it is overlapped with necessary memory accesses, but the space occupied by the header and by the extra copy of the arguments is a substantial fraction of the typical frame size. This extra space is not a major problem because the stack buffer is large enough (1024 words) that it rarely overflows.

Argument copying is necessary because Common Lisp functions do not take a fixed number of arguments. In a function with &optional parameters, some of the arguments are supplied by the caller while the others are defaulted by the entry vector. The location in the stack frame of an argument must not depend on whether it was supplied or defaulted, since this varies from one call to the next, but the compiler must know the location in order to generate code to access the argument. The entry vector could not put default values in the standard location if the arguments were not at the top of the stack, because the frame header would be in the way. In a function with an &rest parameter, the caller can supply an arbitrary number of arguments. If these arguments were at the top of the stack, they would make it impossible for the compiler to know the locations of the local variables, which are pushed after the arguments.

Copying the arguments that are not part of an &rest parameter to the top of the stack solves both these problems. It gives the function complete control over the arrangement of its stack frame and makes the stack depth constant. Argument copying takes extra time, but typically only one clock cycle per argument, which is faster than the run-time decision-making that would otherwise be necessary to access an optional argument or a local variable.

Processor architecture

Three processor architectures are used in three representative models of the 3600 family: 3640, 3675, and 3620. Since they all implement the same instruction architecture, there are substantial similarities among their processor architectures. They differ due to implementation in different technologies and choices of different cost/performance tradeoffs, but this overview largely glosses over the differences.

The main goal of each of these processor architectures is to implement the instruction architecture described earlier with the highest performance achievable within its particular cost budget. The costs are generally higher than most workstations but lower than most [...]. For high performance the number of clock cycles required to execute an instruction must be minimized; the goal is to execute a new instruction every cycle. Because the system architecture specifies that safety and convenience must not be compromised to increase performance, instructions typically make many checks for errors and exceptions. Minimizing the cycle count demands that these checks be performed in parallel, not each in a separate cycle.

Adequate bandwidth for access to operands is also required. In the 3600 instruction architecture, a simple instruction can read two stack locations and write one stack location. One of these is a location in the current stack frame specified by an address in the instruction, while the other two are at the top of the stack. Operands are supplied by the stack buffer, a 1K-word memory that holds up to four virtual-memory pages of the stack. The stack buffer contains all of the current frame plus as many older frames as happen to fit. When the stack buffer fills up (during Call), the oldest page spills into normal memory to make room for the new frame. When the stack buffer becomes empty (during Return), pages move from normal memory back into the stack buffer until the frame being returned to is entirely in the buffer. The maximum size of a stack frame is limited to what will fit in the stack buffer. A second stack buffer contains an auxiliary stack for servicing page faults and interrupts without disturbing the primary buffer.

Associated with the stack buffer are the FP and SP registers, which point to the current frame and to the top of the stack, and hardware for addressing locations in the current stack frame via the argument field of an instruction, which calculates a read address and a write address every clock cycle. The third operand access is provided by a duplicate copy of the top location in the stack, in a scratchpad memory, which can be read and written every clock cycle. The SP register is incremented or decremented by instructions that push or pop the stack.

The stack buffer provides the same operand bandwidth, two reads and one write every clock cycle, as in a typical register-oriented architecture. It has the advantage that register saving and restoring across subroutine calls is not required, since all registers already reside in the stack. As in a register-window design, overhead occurs only when the stack buffer overflows or underflows and requires a block transfer between stack buffer and main memory. Another advantage is that each instruction contains only one address instead of three, making the instructions smaller (so that they can be fetched from main memory more quickly and processed with less hardware) and allowing more registers to be addressed. A disadvantage of a stack architecture is that it requires address-calculation hardware, including a 10-bit (for a 1K-word buffer) adder. Since each instruction contains only one address instead of three, extra instructions are sometimes required to move data to the top of the stack so they can be addressed.

Instructions are processed by a four-stage pipeline (see Figure 8) under the control of horizontal microcode. Microcode is used as an engineering technique, not to create a general-purpose [emulator] that could implement alternate instruction architectures. Knowledge of the instruction architecture is built into hardware wherever that achieves a substantial performance improvement.

Figure 8. The instruction processing pipeline, with variations for three 3600 family models.

To achieve full performance, instructions must be supplied to the processor at an adequate rate. Each processor model has a different design, with different tradeoffs.

The 3640 uses a four-instruction buffer. When the buffer is exhausted, or a branch occurs, microcode reads two words from memory and refills the instruction buffer. This design uses much less hardware than the other two, but provides lower performance. Refilling the buffer takes five clock cycles, so in the worst case the performance penalty is about a factor of two. With a typical instruction mix, the observed slowdown is about 35 percent, because complex instructions such as function calls and memory references spend more than one cycle in the execute stage.

The 3675 uses a 2K-instruction cache. Program loops that fit in the cache execute at full speed, with no instruction fetching overhead. An autonomous instruction prefetch unit fills the cache with instructions before they are needed, in parallel with execution. At the cost of a substantial increase in hardware complexity over the 3640, this design ensures that the pipeline almost never has to wait for an instruction.

The 3620 uses a six-instruction buffer. An autonomous instruction prefetch unit fills the buffer in parallel with execution. The 3620 instruction stage is a compromise between the other two designs. Straight-line code executes at full speed, but branches execute at 3640 speed because they must refill the buffer.

The datapath contains several units that function in parallel (see Figure 9). Simple instructions such as data movement, arithmetic, logical, and byte-field instructions execute in a single clock cycle. For example, when executing an Add instruction the following activities all take place in parallel:
* The stack buffer fetches the two operands, one from a calculated address in the stack buffer memory and the other from the duplicate top-of-stack in the scratchpad memory.
* The fixed-point arithmetic unit computes the 32-bit sum of the operands and checks for overflow. This result is only used if both operands are fixnums.
* The optional floating-point accelerator, if present, starts computing the sum of the operands and checking for floating-point exceptions. This result is only used if both operands are single-floats.
* The tag processor checks the data types of the operands.
* The stack buffer accepts the result from the fixed-point arithmetic unit, adjusts the stack pointer, and in the write stage stores the result at the new top of the stack.
* The decode stage decodes the next instruction and produces the microinstruction that will control its execution.

If the type-checking unit or either arithmetic unit detects an exception, control is diverted to a microcode exception handler.

Figure 9. 3640 datapath, contained in the Execute and Write stages of the pipeline. Other 3600 family models have generally similar datapaths.

When the operands of Add are not both fixnums, executing the instruction takes more than one machine cycle and more than one microinstruction. In the case of adding two single-floats, the extra time is only required because the floating-point arithmetic unit is slower than the fixed-point arithmetic unit. In other cases, extra time is required to convert the operands to a common format, to perform double-precision floating-point operations, or to trap to a Lisp function to add numbers of less common types.

Memory-reference instructions such as the car and aref Lisp operations are limited mainly by the speed of the memory. Car, for example, takes four clock cycles. Complex instructions such as Call, Return, and the Common Lisp member function invoke microcode [subroutines]. A wide microinstruction word and fast microcode branching minimize the number of microinstructions that need to be executed. Simple and memory-reference instructions can be discovered to be complex at run time because of an exceptional condition such as the data type of the operands.

We have described here an unusual system architecture and presented an overview of the underlying architectures that implement it. When considering the type of applications that this system architecture targets, note how important to their success it is that we compromise neither safety nor speed. With this in mind, some of the unconventional design choices in these architectures were made based on rationales with varied benefits and costs. For example, a close fit between processor, instruction, and system architectures improves performance, but allowing users to depend on details of the instruction architecture can interfere with this. The lack of this close fit dissipates the hardware price/performance advantage of conventional architectures when measuring system-level performance on software suited to symbolic architectures.

References

1. G. L. Steele, Common Lisp, Digital Press, Burlington, MA, 1984.
2. D. L. Andre, Paging in Lisp Programs, Master's thesis, University of Maryland, 1986.
3. R. D. Greenblatt et al., "The LISP Machine," Interactive Programming Environments, eds. D. R. Barstow, H. E. Shrobe, and E. Sandewall, McGraw-Hill, Hightstown, NJ, 1984.
4. R. P. Gabriel, Performance and Evaluation of Lisp Systems, The MIT Press, Cambridge, MA, 1985.
5. D. A. Moon, "Architecture of the Symbolics 3600," 12th Int'l Symp. Computer Architecture, 1985, pp. 76-83.
6. Symbolics Technical Summary, Symbolics Inc., Cambridge, MA, 1985.
7. Symbolics Common Lisp: Language Dictionary, Symbolics Inc., Cambridge, MA, 1986.
8. D. A. Moon, "Garbage Collection in a Large Lisp System," Proc. 1984 ACM Symp. Lisp and Functional Programming, pp. 235-246.
9. R. A. Brooks et al., "Design of an Optimizing, Dynamically Retargetable Compiler for Common Lisp," Proc. 1986 ACM Conf. Lisp and Functional Programming, pp. 67-85.

David A. Moon is a technical director at Symbolics, Inc. Previously, he was a hardware designer, microprogrammer, and writer of manuals at Symbolics. His interests include advanced software development and architectures for symbolic processing.

Moon received the BS degree in mathematics from MIT in 1975.

Readers may write to the author at Symbolics, Inc., 11 Cambridge Center, Cambridge, MA 02142. His e-mail address is Moon@Stony-Brook.SCRC.Symbolics.COM on the ARPA Internet.
