The Computer

Chapter 34 The engineering design of the Stretch computer 1 Erich Block The Stretch is an advanced scientific with This Summary computer computer paper reviews the engineering design of the Stretch System variable facilities for and floating-point, fixed-point, variable-field-length with primary concentration on the central computer as the main arithmetic and data-handling facilities. contributor to performance. In it, these new techniques, devices, The of 100 704 is achieved performance goal x speed by high-speed and instructions have been pushed to the limit set by the present circuits, multiplexing, and simultaneous-operation technique of instruction technology and, therefore, its analysis will convey best the prob- and data-fetching, as well as overlap within the execution units. This lems encountered and the solutions employed. massive overlap and multiplexing results in complicated recovery routines between the look-ahead and instruction units. These units are described in as are the detail, arithmetic units and significant algorithms used in the floating-point arithmetic. The Stretch system A flexible set of circuits using a current-switching technique with Early in the system design, it appeared evident that a six-fold overriding-level facility is described, as well as the packaging of circuits improvement in memory performance and a ten-fold on printed cards. The frame and gate concept is also shown. Performance improvement in basic circuit over the 704 was the best one could achieve. figures and hardware count illustrate the size, complexity, and performance speed To meet the of the system. proposed performance criteria, the system had to be in such a that it organized way took advantage of every possible of overlap systems function, multiplexing of the major portion of Introduction the of system, processing operations simultaneously, and anticipa- tion of wherever The Stretch computer [Dunwell, 1956] project was started in order occurrences, possible. The system had to be of based on the to achieve two orders of magnitude of improvement in perform- capable making assumptions probability that certain events and means had to be ance over the then existing 704. Although this computer, like the might occur, provided to is retrace the when the to be 704, aimed at scientific problems such as reactor design, hydro- steps assumption proved wrong. This and of dynamics problems, partial differential equation etc., its instruc- simultaneity multiplexing operations reflects itself in the Stretch at all tion set and organization are such that it can handle with ease System levels, from overall systems organization to the of data-processing problems normally associated with commercial cycle specific instructions. In the following descrip- tion, this will be discussed in more detail. applications, such as processing of alphanumeric fields, sorting, and If one considers the decimal arithmetic. Stretch System (Fig. 1) from an overall of view it becomes In order to achieve the stated goal of performance, all factors point apparent that the major parts of the can that go into the computer design must contribute towards the system operate simultaneously: performance goal; this includes the instruction set [Buchholz, the internal the data and instruction 1958], system organization, a The 2-jnsec, 16,384-word core memories are self-contained, word and features such as length, auxiliary status-monitoring with their own clocks, addressing circuits, data registers and devices, the circuits, packaging, and component technology. No checking circuits. The memories themselves are interleaved so that the first two memories have their addresses distrib- one of them by itself can give this hundred-fold increase in speed; the uted modulo 2 and the other four are interleaved modulo only by combining and interacting of these contributing 4. The modufo-2-interleaved memories are used factors can this performance be obtained. primarily for instruction storage; since, for high-performance instructions, halfword formats are used, the average rate of ob- l Proc. EJCC, 1959. is pp. 48-59, taining instructions one per % ^isec. Similarly, a 0.5-jusec 421 422 Part 5 The PMS level Section 2 Computers with one central processor and multiple input/output processors INSTRUCTION MEMORIES OPERAND MEMORIES (MOD 2 INTERLEAVED) (MOD 4 INTERLEAVED) 7=T SEC CORE SEC CORE 2/i SEC CORE 2/i SEC CORE 2/i SEC CORE 2/x SEC CORE 2/i 2/i MEMORY MEMORY MEMORY MEMORY MEMORY MEMORY (16K) (16K) (16K) 06K) (16K) (16K) MEMORY IN BUS MEMORY OUT BUS DISK SYNCH MEMORY CENTRAL UNIT I/O EXCHANGE BUS COMPUTER DISK CONSOLE READER PRINTER PUNCH TAPE TAPE CONTROL ADAPTER ADAPTER ADAPTER ADAPTER ADAPTER ADAPTER (CONSOLE)9 £IREADER tPUNCH 4xl0 6 WORDS Fig. 1. The Stretch system. data-word rate is achieved by the use of four modulo-4 Before discussing the computer organization, a few general organized memories. The addressing of the memories and features must be mentioned for completeness: the transfer of information from and to the memories by or both for checks and a memory bus permits new addresses, information, o Word length: 64 bits plus eight bits parity to pass through the bus every 200 mjusec. error-correction codes. words The simultaneously-operating Input/Output units are b Memory capacity and addressing: A possible 256,000 linked with the memories and the computer through the can be randomly addressed. These storage positions are all Exchange, which, after initial instruction by the computer, in external memory, except for the 32 first addresses. These time coordinates the starting of the I/O equipment, the checking positions consist of the internal registers (accumulators, and error-correction of the information, the arrangement clocks, index registers). of the information into memory words, and the fetching and c The instructions are single-address instructions with the storing of the information from and to memory. All these exception of a number of special codes that imply the functions are executed without the use of the computer, second address explicitly. so it can in the meantime continue its data and processing a The instruction set (Fig. 2) is generalized and contains computation. full set for single- and double-precision floating-point arith- The central computer processes and executes the stored metic, and a full set for variable-field-length integer arith- program. Here, now, the simultaneity and multiplexing of metic (binary and decimal). It also has a generalized set for functions has reached its ultimate. index modification and a branching set, as well as a set of Chapter 34 The engineering design of the Stretch computer 423 | instructions. All 765 of instructions I/O told, different types (accumulator). Both halves of the word are independently are used in the system. indexable. d The instruction format 3) makes use of both half and (Fig. A general monitoring device used for important status full words; half words accommodate indexing and floating- triggers is called the Interrupt [Brooks, 1957] System. This point instructions (for optimum performance these two sets system monitors the flip-flops which reflect internal mal- of instructions use a rigid format), and full-word formats functions, result significance (exponent range, mantissa zero, are used by the variable-field-length instructions. Notice overflow, underflow), program errors (illegal instruction, that the latter specifies the field the address of operand by protected memory area), and conditions (unit 1 input/output its left-most bit, the length of the field, and the byte size, not status ready, etc.). The of these flip-flops can cause a as well as the of the starting point (offset) implied operand break in the normal progression of the stored program for l Byte: a generic term to denote the number of bits to be operated on as fix-up purposes. Their status is automatically interrogated a unit by a variable-field-length instruction. at all times. INSTRUCTION CATEGORY 424 Part 5 The PMS level Section 2 Computers with one central processor and multiple input/output processors DATA FORMATS 1 1 1 1 1 1 1 ECC INTEGER BYTE 8 BYTE 7 BYTE 6 BYTE 5 BYTE 4 BYTE 3 BYTE 2 BYTE 1 PTY BITS 15 23 31 39 47 55 63 71 FLAG FLOATING ^ ECC EXPONENT MANTISSA (FRACTION) POINT PARITY 101112 59 63 71 INDEX 1+ WORD VALUE I/O CONTROL WORD Chapter 34 The engineering design of the Stretch computer 425 | time-sharing of the major computer organs is no longer possible. Data flow All in all, the computer has 3,000 register positions and about 450 adder positions. The data flow through the computer is shown in Fig. 5 and is Despite the multiplexing and simultaneous operation of suc- comparable to a pipeline which in a steady state (namely, once the if cessive instructions, result appears as sequential step-by-step filled) has a large output rate no matter what its length. The same internal operation were utilized. This has made the design of the is true here; after start-up the execution of the instructions is fast interlocks quite complex. and bears no relation at all to the stages it must progress through. DATA WORD INSTRUCTION INSTRUCTION FETCH 4 INSTRUCTIONS 4 DATA WORDS 1 1 J )L , INSTRUCTION INSTRUCTION UPDATING FETCH DATA WORD FETCH DATA WORD INSTRUCTION INSTRUCTION FETCH UPDATING EXECUTION INSTRUCTION EXECUTION 704 STRETCH Fig. 4. Comparison of Stretch and 704 organization. 426 Part 5 The PMS level Section 2 Computers with one central processor and multiple input/output processors INSTR FETCH ADR MEMORY BUS MEMORY IN BUS «=? FR EXCHANGE ADDRESS DATA OPERAND FETCH ADR "TZ RESULT STORE ADDRESS MEMORY OUT BUS TO EXCHANGE <J: LI CHECKER OUT BUS 1 INSTR WORD BUFFER OPERAND BUFFER INSTR WORD BUFFER ERROR OPERAND BUFFER INSTRUCTION 8 CORRECTOR OPERAND BUFFER CHECKER INDEXING UNIT OPERAND BUFFER LOOK-AHEAD CHECKER IN BUS 3" LA TRANSFER BUS L_E I ARITH CHECKER OUT BUS 2 WORD 2 WORD ACCUMULATOR OPERAND A,B REGISTER INTERRUPT C,D ARITHMETIC SYSTEM CHECK ARITH CHECKER IN BUS I 71 SERIAL PARALLEL ARITH UNIT ARITH UNIT Fig.

Load more