ABSTRACT

Gordon Bell, William D. Strecker November 8, 1975

COMPUTER STRUCTURES: WHAT HAVE WE LEARNED FROM THE PDP-11?

Over the PDP-11's six year life about 20,000 specimens have been built based on 10 species (models). Although range was a design goal, it was unquantified; the actual range has exceeded expectations (500:1 in memory size and system price). The range has stressed the basic mini(mal) computer architecture along all dimensions. The main PMS structure, i.e. the UNIBUS, has been adopted as a de facto standard of interconnection for many micro and minicomputer systems. The architectural experience gained in the design and use of the PDP-11 will be described in terms of its environment (initial goals and constraints, technology, and the organization that designs, builds and distributes the machine).

1.0 INTRODUCTION

Although one might think that computer architecture is the sole determinant of a machine, it is merely the focal point for a specification. A computer is a product of its total environment. Thus to fully understand the PDP-11, it is necessary to understand its environment.

Figure Org. shows the various groups (factors) affecting a computer. The lines indicate the primary flow of information for product functional behavior and for product specifications. The physical flow of goods is nearly along the same lines, but more direct: starting with applied technology (e.g., semiconductor manufacturers), going through computer manufacturing, and finally to the service personnel before being turned over to the final user.

The relevant parts, as they affect the design are:

1. The basic technology--it is important to understand the components that are available to build from, as they directly affect the resultant designs.

2. The development organization--what is the fundamental nature of the organization that makes it behave in a particular way? Where does it get inputs? How does it formulate and solve problems?

3. The rest of the DEC organization--this includes applications groups associated with market groups, sales, service and manufacturing.

4. The user, who receives the final output.

Note, that if we assume that a product is done sequentially, and each stage has a gestation time of about two years, it takes roughly eight years for an idea from basic research to finally appear at the user's site. Other organizations also affect the design: competitors (they establish a design level and determine the product life); and government(s) and standards.

There are an ever increasing number of groups who feel compelled to control all products, bringing them all to a common norm: the government(s), testing groups such as Underwriters Laboratory, and the voluntary standards groups such as ANSI and CBEMA. Nearly all these groups affect the design in some way or another (e.g. by requiring time).

2.0 BACKGROUND

It is the nature of engineering projects to be goal oriented--the 11 is no exception, with much pressure on deliverable products. Hence, it is difficult to plan for a long and extensive lifetime. Nevertheless, the 11 evolved more rapidly and over a wider range than we expected, placing unusual stress on even a carefully planned system. The 11 family has evolved under market and implementation group pressure to build new machines. In this way the planning has been asynchronous and diffuse, with distributed development. A decentralized organization provides checks and balances since it is not all under a single control point, often at the expense of compatibility. Usually, the hardware has been designed, and the software is modified to provide compatibility.

Independent of the planning, the machine has been very successful in the marketplace, and with the systems programs written for it. In the paper (Bell et al, 1970) we are first concerned with market acceptance and use. Features carried to other designs are also a measure of how it contributes to computer structures and are of secondary importance.

The PDP-11 has been successful in the marketplace with over 20,000 computers in use (1970-1975). It is unclear how rigid a test (aside from the marketplace) we have given the design since a large and aggressive marketing and sales organization, coupled with software to cover architectural inconsistencies and omissions, can save almost any design. There was difficulty in teaching the machine to new users; this required a large sales effort. On the other hand, various machine and operating systems capabilities still are to be used.

2.1 GOALS AND CONSTRAINTS - 1970

The paper (Bell et al, 1970) described the design, beginning with weaknesses of minicomputers to remedy other goals and constraints. These will be described briefly in this section, to provide a framework, but most discussion of the individual aspects of the machine will be described later.

Weakness 1, that of limited address capability, was solved for its immediate future, but not with the finesse it might have been. Indeed, this has been a costly oversight in redundant development and sales.

There is only one mistake that can be made in a computer design that is difficult to recover from--not providing enough address bits for memory addressing and memory management. The PDP-11 followed the unbroken tradition of nearly every known computer. Of course, there is a fundamental rule of computer (and perhaps other) designs which helps to alleviate this problem: any well-designed machine can be evolved through at least one major change. It is extremely embarrassing that the 11 had to be evolved with memory management only two years after the paper was written outlining the goal of providing increased address space. All predecessor DEC designs have suffered the same problem, and only the PDP-10 evolved over a ten year period before a change was made to increase its address space. In retrospect, it is clear that since memory prices decline at 26% to 41% per year, and many users tend to buy constant dollar systems, then every two or three years another bit is required for the physical address space.

Weakness 2 of not enough registers was solved by providing eight 16-bit registers; subsequently six more 32-bit registers were added for floating point arithmetic. The number of registers has proven adequate. More registers would just increase the context switching time, and also perhaps the programming time by posing the allocation dilemma for a compiler or a programmer.

Lack of stacks (weakness 3) has been solved, uniquely, with the auto-increment/auto-decrement addressing mechanism. Stacks are used extensively in some languages, and generally by most programs.

Weakness 4, limited interrupts and slow context switching, has been generally solved by the 11 UNIBUS vectors which direct interrupts when a request occurs from a given I/O device.

Byte handling (weakness 5) was provided by direct byte addressing.

Read-only memory (weakness 6) can be used directly without special programming since all procedures tend to be pure (and reentrant) and can be programmed to be recursive (or multiply reentrant). Read-only memories are used extensively for bootstrap loaders, debugging programs, and now provide normal console functions (in program) using a standard terminal.

Very elementary I/O processing (weakness 7) is partially provided by a better interrupt structure, but so far, I/O processors per se have not been implemented.

Weakness 8 suggested that we must have a family. Users would like to move about over a range of models.

High programming costs (weakness 9) should be addressed because users program in machine language. Here we have no data to suggest improvement. A reasonable comparison would be programming costs on an 11 versus other machines. We built more complex systems (e.g., operating systems, computers) with the 11 than with simpler structures (e.g. PDP-8 or 15). Also, some systems programming is done using higher level languages.

Another constraint was that the word length had to be in multiples of eight bits. While this has been expensive within DEC because of our investment in 12, 18 and 36 bit systems, the net effect has probably been worthwhile. The notion of word length is quite meaningless in machines like the 11 and the IBM 360 because data-types are of varying lengths, and instructions tend to be in multiples of 16 bits. However, the addressing of memory for floating point is inconsistent.

Structural flexibility (modularity) was an important goal. This succeeded beyond expectations, and is discussed extensively in the part on PMS, in particular the UNIBUS section.

There was not an explicit goal of microprogrammed implementation. Since large read-only memories were not available at the time of the Model 20 implementation, microprogramming was not used. Unfortunately, all subsequent machines have been microprogrammed but with some additional difficulty and expense because the initial design had poorly allocated opcodes, but more important the condition codes behavior was overspecified.

Understandability was also stated to be a goal, one that seems to have been missed. The initial handbook was terse and as such the machine was only saleable to those who really understood computers. It is not clear what the distribution of first users was, but probably all had previous computing experience. A large number of machines were sold to extremely knowledgeable users in the universities and laboratories. The second handbook came out in 1972 and helped the learning problem somewhat, but it is still not clear whether a user with no previous computer experience can determine how to use a machine from the information in the handbooks. Fortunately, two computer science textbooks (Eckhouse, 75; and Stone and Siewiorek, 75) have been written based on the 11 to assist in the learning problem.

2.2 FEATURES THAT HAVE MIGRATED TO OTHER COMPUTERS AND OFFSPRINGS

A suggested test (Bell et al 1970) was the features that have migrated into competitive designs. We have not fully permitted this test because some basic features are patented; hence, non-DEC designers are reluctant to use various ideas. At least two organizations have made machines with similar bus and ISP structures (use of address modes, behavior of registers as program counter and stack); and a third organization has offered a plug-replacement system for sale.

The UNIBUS structure has been accepted by many designers as the PMS structure. This interconnection scheme is especially used in microprocessor designs. Also, as part of the UNIBUS design, the notion of mapping I/O data and/or control registers into the memory address space has been used often in the microprocessor designs since it eliminates instructions in the ISP and requires no extra control to the I/O section.

Finally, we were concerned in 1970 that there would be offsprings--clearly no problem; there have been about ten implementations. In fact, the family is large enough to suggest need of family planning.

3.0 TECHNOLOGY

The computers we build are strongly influenced by the basic electronic technology. In the case of computers, electronic information processing technology evolution has been used to mark the four generations.

3.1 Effects Of Semiconductor Memory On The PDP-11 Model Designs

The PDP-11 computer series design began in 1969 with the Model 20. Subsequently, 3 models were introduced as minimum cost, best cost/performance, and maximum performance machines. The memory technology in 1969 formed several constraints:

1. Core memory for the primary (program) memory with an eventual trend toward semiconductor memory.

2. A comparatively small number of high speed registers for processor state (i.e. general registers), with a trend toward larger, higher speed register files at lower cost. Note, only 16 word read-write memories were available at design time.

3. Unavailability of large, high speed read-only memories,

permitting a microprogrammed approach to the design of the control part. Note, read-only memory was unavailable, although slow, read-only MOS was available for character generators.

These constraints established the following design principles and attitudes:

1. It should be asynchronous and capable of accepting various configurations of memories in size and speed.

2. It should be expandable, to take advantage of an eventually larger number of registers for more data-types and improve context switching time. Also, more registers would permit eventually mapping memory to provide a virtual machine and protected multiprogramming.

3. It could be relatively complex, so that an eventual microprogrammed approach could be used to advantage. New data-types could be added to the instruction set to increase performance even though they added complexity.

4. The UNIBUS width would be relatively wide, to get as much performance as possible, since LSI was not yet available to encode functions.

3.2 Variations In PDP-11 Models Through Technology

Semiconductor memories (read-only and read-write) were used to trade off cost and performance across a reasonably wide range of models. Various techniques based on semiconductors are used in the tradeoff to provide the range. These include:

1. Improve performance through brute force with faster memories. The 11/45 and 11/70 use bipolar and fast MOS memory.

2. Microprogramming (see below) to improve performance through a more complex ISP (i.e., floating point).

3. Multiple copies of processor state (context) to improve time to switch context among various running programs.

4. Additional registers for additional data-types--i.e., floating point arithmetic.

5. Improve the reliability by isolating (protecting) one program from another.

6. Improve performance by mapping multiple programs into the same physical memory, giving each program a virtual machine. Providing the last two points requires a significant increase in the number of registers (i.e. at least 64 word fast memory arrays).

4.0 THE ORGANIZATION OF PEOPLE

Three types of design are based both on the technology and the cost and performance considerations. The nature of this tradeoff is shown in Figure DS. Note, that one starts at 0 cost and performance, proceeds to add cost, to achieve a base (minimum level of functionality). At this point, certain minimum goals are met: for the computer, it is simply that there is a program counter, and the simplest arithmetic operations can be carried out. It is easy to show (based on the Turing machine) that only a few instructions are required, and from these, any program can be written. From this minimal point, performance increases very rapidly in a step fashion (to be described later) for quite some time (due to fixed overhead of memories, cabinets, power, etc.) to a point of inflection where the cost-effective solution is found. At this point, performance continues to increase until another point where the performance is maximized. Increasing the size implies physical constraints are exceeded, and the machine becomes unbuildable, and the performance can go to 0. There is a general tendency of all designers to "n+1" (i.e., incrementally add to the design forever). No design is so complete that a redesign can't improve it.

The two usual problems of design are: inexperience and "second-systemitis". The first problem is simply a resources problem. Are there people available? What are their backgrounds? Can a small group work effectively on architectural definitions? Perhaps most important is the principle that, no matter who is the architect, the design must be clearly understood by at least one person.

Second-systemitis is the phenomenon of defining a system on the basis of past system history.

Invariably, the system solves all past problems...bordering on the unbuildable.

4.1 PDP-11 Experience

Some of the PDP-11 architecture was initially carried out at Carnegie-Mellon University (HM with GB). Two of the useful ideas: the UNIBUS, and the use of general registers in a substantially more general fashion (e.g. as stack pointers) came out of earlier work (GB) at CMU and was described in COMPUTER STRUCTURES (Bell and Newell, 1971). During the detailed design amelioration, 2 persons (HM, and RC) were responsible for the specification.

Although the architectural activity of the 11/20 proceeded in parallel with the implementation, there was less interaction than in previous DEC designs where the first implementation and architecture were carried out by the same person. As a result, a slight penalty was paid to build subsequent designs, especially vis a vis microprogramming.

As the various models began to be built outside the original PDP-11/20 group, nearly all architectural control (RC) disappeared, and the architecture was managed by more people, and design resided with no one person! A similar loss of control occurred in the design of the peripherals after the basic design.

The first designs for 16-bit computers came from a group placed under the PDP-15 management (a marketing person, with engineering background). It was called PDP-X, and did include a range. As a range architecture, it was better thought out than the later PDP-11, but didn't have the innovative aspects. Unfortunately, this group was intimidating, and some members lacked credibility. The group also managed to convince management that the machine was potentially as complex as the PDP-10 (which it wasn't); since no one wanted another large computer disconnected from the main business, it was a sure suicide. The (marketing) management had little understanding of the machine. Since the people involved in the design were apparently simultaneously designing Data General, the PDP-X was not of foremost importance.

As the PDP-X project folded and the DCM (for Desk Calculator Machine, for security) project started up, the design and planning were in disarray, since Data General had been formed and was competing with the PDP-8 using a very small 16-bit computer. Although the Product Line Manager, a former engineer (NM) for the PDP-8, had the responsibility this time, the new project manager was a mathematician/programmer followed by another manager (RC) who had managed the PDP-8. Work proceeded for several months based on the DCM and with a design review at Carnegie-Mellon University in late 1969. The DCM review took only a few minutes. Aside from a general dullness and a feeling that it was too little too late to compete, it was difficult to program (especially by compilers). However, its benchmark results were good. (We believe it had been tuned to the benchmarks, hence couldn't do other problems very well.) One of the designers (HM) brought along the kernel of an alternative, which turned out to be the PDP-11. We worked on the design all weekend, recommending a switch to the basic 11 design.

At this point, there were reviews to ameliorate the design, and each suggestion, in effect, amounted to an n+1; the implementation was proceeding in parallel (JO) and since the logic design was conventional, it was difficult to trade off extensions. Also, the design was constrained with boards and ideas held over from the DCM. (The only safe way to design a range is to simultaneously do both high and low end designs.) During the summer at DEC, we tried to free op code space, and increased (n+1'ed) the UNIBUS bandwidth (with an extra set of address lines), and outlined alternative models.

The advent of large, read-only memories made possible the various follow-on designs to the 11/20. Figure "Models" sketches the cost of various models versus time, with lines of consistent performance. This very clearly shows the design styles (ideologies). The 11/40 design was started right after the 11/20, although it was the last to come on the market (the low and high ends had higher priority to get into production as they extended the market). Both the 11/04 and 11/45 design groups went through extensive buy-in processes, as they came into the 11 by first proposing alternative designs. In the case of the 11/45, a larger, 11-like 18-bit machine was proposed by the 15 group; and later, the LINC engineering group proposed an alternative design which was subset compatible at the symbolic program level. As the groups considered

the software ramifications, buy-in was rapid. Figure Models shows the minimum cost-oriented group has two successors providing lower cost (yet higher performance) and the same cost with the ability to have larger memories and perform better. Note, both of these came from a backup strategy to the LSI-11. These come from larger read-only memories, and increased understanding of how to implement the 11.

The 11/70 is, of course, a natural follow on to extend the performance of the 11/45.

5.0 PMS STRUCTURE

In this section, we give an overview of the evolution of the PDP-11 in terms of its PMS structure, and compare it with expectations (Bell et al. 1970). The aspects include: the UNIBUS structure; UNIBUS performance; use for diagnostics; architectural control required; and multi-computer and multi-processor computer structures.

5.1 The UNIBUS - The Center Of The PMS Structure

In general, the UNIBUS has behaved beyond expectations, acting as a standard for intercommunication of peripherals. Several hundred types of memories and peripherals have been attached to it. It has been the principal PMS interconnection media of Mp-Pc and peripherals for systems in the range 3K dollars to 100K dollars (1975). For larger systems supplementary buses for Pc-Mp and Mp-Ms traffic have been added. For very small systems, like the LSI-11, a narrower bus (Q-bus) has been designed.

The UNIBUS by being a standard has provided us with a PMS architecture for easily configuring systems; any other organization can also build components which interface the bus...clearly ideal for buyers. Good busses (standards) make good neighbors (in terms of engineering), since people can concentrate design in a structured fashion. Indeed, the UNIBUS has created a complete secondary industry dealing in alternative sources of supply for memories and peripherals. Outside of the IBM 360 I/O Multiplexor/Selector bus, the UNIBUS is the most widely used computer interconnection standard. Although it has been difficult to fully specify the UNIBUS such that one can be certain that a given system will work electrically and without missed data, specification is the key to the UNIBUS. The bus behavior specification is a yet unsolved problem in dealing with complexity--the best descriptions are based on behavior (i.e., timing diagrams).

There are also problems with the design of the UNIBUS. Although parity was assigned as two of the bits on the bus (parity and parity is available), it has not been widely used. Memory parity was implemented directly in the memory, since checking required additional time. Memory and UNIBUS parity is a good example of the nature of engineering optimization. The tradeoff is one of cost and decreased performance versus decreased service cost and more data integrity for the user. The engineer is usually measured on production cost goals, thus parity transmission and checking are clearly a capability to be omitted from design...especially in view of lost performance. The internal Field Service organization has been unable to quantify the increase in service cost savings due to shorter MTTR by better fault isolation. Similarly, many of the transient errors which parity detects can be detected and corrected by software device drivers and backup procedures without parity. Lower cost for logic, and increased responsibility (scope) to include warranty costs as part of the product design cost, force much more checking into the design.

The interlocked nature of the transfers is such that there is a deadlock when two computers are joined together using the UNIBUS window. With the window a computer can map another computer's address space into its own address space in a true multiprocessor fashion. Deadlock occurs when the two computers simultaneously attempt to access the other's addresses through each window. A request to the window is in progress on one UNIBUS, and at the same time a request to the other UNIBUS is in progress on the requestee's UNIBUS, hence neither request can be answered, causing a deadlock. One or both requests are aborted and the deadlock is broken by having the UNIBUS time out since this is equivalent to a non-existent address (e.g., a memory). In this way the system recovers and requests can be reissued (which may cause deadlock). The UNIBUS window is confined to applications where there is likely to be a low deadlock rate.

5.2 UNIBUS and Performance Optimality

Although we always want more performance on one hand, there is an equal pressure to have lower cost. Since cost and performance are almost totally correlated the two goals perfectly conflict. The UNIBUS has turned out to be optimum over a wide dynamic range of products (argued below). However, at the lowest size system, the Q-bus has been introduced, which contains about 1/2 the number of conductors; and at the largest systems, the data path width for the processor and memory has been increased to 32-bits for added performance although the UNIBUS is still used for communication with most I/O controllers.

Since all interconnection schemes are highly constrained, it is clear that future lower and higher systems cannot be accomplished from a single design unless a very low cost, high performance communication media (e.g. optical) is found.

The optimality of the UNIBUS comes about because memory size (number of address bits) and I/O traffic are correlated with the processor speed. Amdahl's rule-of-thumb for IBM computers (including the 360) is that one byte of memory is required for each instruction/sec and one bit of I/O is required for each instruction executed. For our applications, we believe there is more computation required for each memory word, because of the bias toward control and scientific applications. Also, there has been less use of complex instructions typical of the IBM computers. Hence, we assume one byte of memory is required for each two instructions executed, and assume one byte of I/O is an upper bound (for real time applications) for each instruction executed. In the PDP-11, an average instruction accesses three to five bytes of memory, and with one byte of I/O, up to six bytes of memory are accessed for each instruction/sec. Therefore, a bus which can support two megabyte/sec traffic permits instruction execution rates of .33 to .5 mega instruction/sec. This imputes to memory sizes of .16 to .25 megabytes; the maximum allowable memory is .3 to .256 megabytes. By using a cache memory with a processor, the effective memory processor rate can be increased to further balance the processor. Alternatively, faster floating point operations will bring the balance to be more like the IBM data, requiring more memory.

5.3 Evolution Of Models: Predicted Versus Actual

The original prediction (Bell et al, 1970) was that models with increased performance would evolve using: increased path width for data; multi-processors; and separated bus structures for control and data transfers to secondary and tertiary memory. Nearly all of these forms have been used, though not exactly as predicted. (Again, this points to lack of overall architectural planning versus our willingness and belief that the suggestions and plans for the evolution must come from the implementation groups.)

In the earlier 11/45, a separate bus was added for direct access of either bipolar (300ns) or fast MOS (400ns) memory. In general, it was assumed that these memories would be small, and the user would move the important part of his algorithm to the fast memory for direct execution. The 11/45 provided a second UNIBUS for direct transmission of information to the fast memory without Pc interference. The 11/45 also increased performance by adding a second autonomous data operation unit called the Floating Point Processor (actually not a processor). In this way, both integer and floating point computation could proceed concurrently.

The 11/70, a cache based processor, is a logical extension of using fast, local memories, but without need for expert movement of data. It has a memory path width of 32-bits, and the control portion and data portion of I/O transfers have been separated as originally suggested. The performance limitations of the UNIBUS are removed, since the second Mp system permits data transfers of up to five megabytes/sec (2.5 times that of the UNIBUS). Note, that a peripheral memory map control is needed since Mp address space (two megabytes) exceeds the UNIBUS. In this way, devices on the UNIBUS transfer data into a mapped portion of the larger address space.

5.4 Multi-processor Computer Structures

Although it is not surprising that multi-processors have not been used except on a highly specialized basis, it is depressing. In Computer Structures (Bell and Newell, 71) we carried out an

analysis of the IBM 360, predicting a multi-processor design. The range of performance covered by the PDP-11 models is substantially worse than with the 360, although the competitive environment of the two companies is substantially different. For the 360, smaller models appear to perform worse than the technology would predict. The reasons why multiprocessors have not materialized may be:

1. The basic nature of engineering is to be conservative. This is a classical deadlock situation: we cannot learn how to program multiprocessors until such systems exist; a system cannot be built before programs are ready.

2. The market doesn't demand them. Another deadlock: how can the market demand them, since the market doesn't even know that such a structure could exist? IBM has not yet blessed the concept.

3. We can always build a better single, special processor. This design philosophy stems from local optimization of the designed object, and ignores global costs of spares, training, reliability and the ability of the user to adjust dynamically a configuration to his load.

4. There are more available designs for new processors than we can build already.

5. Planning and technology are asynchronous. Within DEC, not all products are planned and built at a particular time, hence, it is difficult to get the one right time when a multiprocessor would be better than an existing uniprocessor together with one or two additional new processors.

6. Incremental market demands require specific new machines. By having more products, a company can better track competitors by specific uniprocessors.

5.4.1 Existent Multiprocessors -

Figure MP gives some of the multiprocessor systems that have been built on the 11 base. The top most structure has been built using 11/05 processors, but because of improper arbitration in the processor, the performance expected based on memory contention didn't materialize. We would expect the following results for multiple 11/05 processors sharing a single UNIBUS:

                Pc      Pc      PRICE/   SYS     PRICE/
  #Pc   Mp      PERF    PRICE   PERF*    PRICE   PERF**
  1     .6      1       1       1        3       1
  2     1.15    1.85    1.23    .66      3.23    .58
  3     1.42    2.4     1.47    .61      3.47    .48
  40            2.25    1.35    .6       3.35    .49

  *  Pc cost only
  ** Total system, assuming 1/3 of system is Pc cost

From these results we would expect to use up to three processors, to give the performance of a model 40. More processors, while increasing the performance, are less cost-effective. This basic structure has been applied on a production basis in the GT4X series of graphics processors. In this scheme, a second P.display is added to the UNIBUS for display picture maintenance.

The second type of structure given in Figure MP is a conventional multiprocessor using multiple port memories. A number of these systems have been installed and operate quite effectively, however, they have only been used for specialized applications.

The most extensive multiprocessor structure, C.mmp, has been described elsewhere. Hopefully, convincing arguments will be forthcoming about the effectiveness of multiprocessors from this work in order to establish these structures on an applied basis.

6.0 THE ISP

Determining an ISP is a design problem. The initial 11 design was based substantially on benchmarks, and as previously indicated this approach yielded a predecessor (not built) that, though performing best on the six benchmarks, was difficult to program for other applications.

6.1 General ISP Design Problems

The guiding principles for ISP design in general have been especially difficult because:

1. The range of machines argues for different encoding over the range. At the smallest systems, a byte-oriented approach with small addresses

is optimum, whereas larger implementations require more operations and larger addresses, and encoding efficiency can be traded off to gain performance.

   The 11 has turned out to be applied (and hopefully effective) over a range of 500 in system price ($500 to $250,000) and memory size (8k bytes to 4 megabytes). The 360 by comparison varied over a similar range: from 4k bytes to 4 megabytes.

2. At a given time, a certain style of machine ISP is used because of the rapidly varying technology. For example, three address machines were initially used to minimize processor state (at the expense of encoding efficiency), and stack machines have never been used extensively due to memory access time and control complexity. In fact, we can observe that machines have evolved over time to include virtually all important operations on useful data-types.

3. The machine use varies over time. In the case of DEC, the initial users were sophisticated and could utilize the power at the machine language level. The 11 provided more fully general registers and was unique in the minicomputer marketplace, which at the time consisted largely of 1 or 2 accumulator machines with 0 or 1 index registers. Also, the typical minicomputer operation codes were small. The 11 extended data-typing to the byte and to reals. By the extension of the auto-indexing mode, the string was conveniently programmed, and the same mechanism provided for stack data-structures.

4. The machine is applied into widely different markets. Initially the 11 was used at the machine language level. The user base broadened by applications with substantially higher level languages. These languages initially were the scientific based register transfer languages such as BASIC, FORTRAN, and DEC's FOCAL; but the machine eventually began to be applied in the commercial marketplace for the RPG, COBOL, DIBOL, and BASIC-PLUS languages, which provided string and decimal data-types.

5. The criteria for a capability in an instruction set are highly variable, and border on the artistic. Ideal goals are thus to have a complete set of operations for a given basic data-type (e.g. integers) --completeness, and operations would be the same for varying length data-types --orthogonality. Selection of the data-types is totally a function of the application. That is, the 11 considers both bytes and full words to be integers, yet doesn't have a full set of operations for the byte; nor are the byte and word ops the same. By adhering to this principle, the compiler and human code generators are greatly aided.

   We would therefore ask that the machine appear elegant, where elegance is a combined quality of instruction formats relating to mnemonic significance, operator/data-type completeness and orthogonality, and addressing consistency. Having completely general facilities (e.g., registers) that are not context dependent helps minimize the number of instruction types, and greatly aids in increasing the understandability (and usefulness).

6. Techniques for generating code by the human and compiler vary widely. With the 11, more addressing modes are provided than in any other computer. The 8 modes for source and destination with dyadic operators provide what amounts to 64 possible instructions; and by associating the Program Counter and Stack Pointer registers with the modes, even more data accessing methods are provided. For example, 18 forms of the MOVE instruction can be seen (Bell et al, 1971) as the machine is used as a two-address, general registers and stack machine program forms. (The price for this generality is extra bits). In general, the machine has been used mostly as a general register machine.

7. Basic design can take the very general form or be highly specific, and design decisions can be bound in some combination of microcode or macrocode with no good criteria for tradeoff.
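The counting argument in item 6 can be sketched as follows. This is an illustration only: the mode names are the ones commonly used in PDP-11 documentation and are not listed in this paper, and the 3-bit register field (8 general registers) is likewise an assumption brought in to show where the "extra bits" go.

```python
from itertools import product

# Addressing-mode names as commonly given in PDP-11 documentation
# (an assumption; the paper itself only states that there are 8 modes).
MODES = [
    "register",
    "register deferred",
    "autoincrement",
    "autoincrement deferred",
    "autodecrement",
    "autodecrement deferred",
    "index",
    "index deferred",
]

# Every (source mode, destination mode) pair for a dyadic operator.
mode_pairs = list(product(MODES, repeat=2))

# Each operand specifier carries a mode field and a register field.
MODE_BITS = 3      # enough to select one of 8 modes
REGISTER_BITS = 3  # assumes 8 general registers
operand_bits = 2 * (MODE_BITS + REGISTER_BITS)

print(len(mode_pairs))  # number of source/destination mode combinations
print(operand_bits)     # bits spent on the two operand specifiers
```

Under these assumptions the two operand specifiers take 12 bits of a 16-bit instruction word, which is one way to read the remark that the price of this generality is extra bits.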

6.2 Problems In Extending The Machine Range

Several problems have arisen as the basic machine has been extended:

1. The operation-code extension problem--the initial design did not leave enough free opcode space for extending the machine to increase the data-types.

   At the time the 11/45 was designed (FPP was added), several extension schemes were examined: an escape mode to add the floating point operations; bringing the 11 back to a more conventional general register machine by reducing the modes; and finally, typing the data by adding a global mode which could be switched to select floating point (instead of byte operations).

2. Extending the addressing range--the UNIBUS limits the physical memory to 262,144 bytes (18 bits). In the implementation of the 11/70, the physical address was extended to 4 megabytes by providing a UNIBUS map so that devices in a 262K UNIBUS space could transfer into the 4 megabyte space by mapping registers.

   While the physical address limits are acceptable for both the UNIBUS and larger systems, the address for a single program is still confined to an instantaneous space of 16 bits, the user virtual address. The main method of dealing with relatively small addresses is via process-oriented operating systems that handle large numbers of smaller tasks. This is a trend in operating systems, especially for process control and transaction processing. It also enforces a structuring discipline in the (user) program organization. The RSX series operating systems are organized this way, and the need for large addresses, except for problems where large arrays are accessed, is minimized.

   The initial memory management proposal to extend the virtual memory was predicated on dynamic, rather than static, assignment of memory segment registers. In the current memory management scheme, the address registers are usually considered to be static for a task (although some operating systems provide functions to get additional segments).

7.0 SUMMARY

This paper has re-examined the PDP-11 and compared it with the initial goals and constraints. With hindsight, we now clearly see what the problems with the initial design were. Design faults occurred not through ignorance, but because the design was started too late. As we continue to evolve and improve the PDP-11 over the next five years, it will indeed be interesting to observe; however, the ultimate test is use.

BIBLIOGRAPHY

Ames, G.T., Drongowski, P.J. and Fuller, S.H. Emulating the Nova on the PDP-11/40: a case study. Proc. COMPCON (1975).

Bell, G., Cady, R., McFarland, H., Delagi, B., O'Loughlin, J., Noonan, R., and Wulf, W. A new architecture of minicomputers--the DEC PDP-11. Proc. SJCC (1970) Vol 36, pp 657-675.

Bell, C.G. and Newell, A. Computer Structures. McGraw Hill (1971).

Bell, J.R. Threaded code. Comm ACM (June 1973) Vol 16, No. 6, pp 370-372.

Eckhouse, R.H. Minicomputer Systems: organization and programming (PDP-11). Prentice-Hall (1975).

Fusfeld, A.R. The technological progress function. Technology Review (Feb. 1973) pp 29-38.

McWilliams, T., Sherwood, W., Fuller, S. PDP-11 Implementation using the Intel 3000 microprocessor chips. Submitted to NCC (May 1976).

O'Loughlin, J.F. Microprogramming a fixed architecture machine. Microprogramming and Systems Architecture, Infotech State of the Art Report 23, pp 205-224.

Ornstein, 1972? (page 28)

Stone, H.S. and Siewiorek, D.P. Introduction to computer organization and data structures: PDP-11 Edition. McGraw-Hill (1975).

Turn, R. Computers in the 1980's. Columbia University Press (1974).

Wulf, W.A., Bell, C.G. C.mmp: A multi-mini-processor. FJCC (1972).

Figure Org. Structure of organization affecting a computer design. (Stages shown: basic research; applied technology, e.g. semiconductors; advanced development; implementation, hardware/software; applications, marketing/sales; user. Also shown: architecture, operating systems, languages, competitors, and governments, standards, testing and professional societies. Arrows indicate the flow of information: specifications, ideas, etc.)

Fig. DS. Design styles (ideologies) in terms of cost and performance. (Axis: performance. Labels include physical constraints, maximum cost-effective, minimum level of functionality, fixed and variable costs, and unbuildable.)

Fig. Models. PDP-11 models: price versus time with lines of constant performance.

Figure MP. Multi-processor computer structures implemented using PDP-11.
   a. Multi-Pc structure using a single Unibus.
   b. Pc with P.display (used in GT4X series; alternatively, a specialized P, e.g. FFT).
   c. Multiprocessor using multiport Mp.
   d. C.mmp, CMU multi-mini-processor computer structure.