Chapter 27

The ILLIAC IV computer 1

George H. Barnes / Richard M. Brown / Maso Kato David J. Kuck / Daniel L. Slotnick / Richard A. Stokes

Summary The structure of ILLIAC IV, a parallel-array computer con- control unit so that a single instruction stream sequenced taining 256 processing elements, is described. Special features include the processing of many data streams. multiarray processing, multiprecision arithmetic, and fast data-routing 2 addresses and data common to all of the data 6 Memory interconnections. Individual processing elements execute 4 X 10 instruc- were broadcast from the central control. 9 processing tions per second to yield an effective rate of 10 operations per second. 3 amount of local control at the individual Index terms Array, computer structure, look-ahead, machine lan- Some processing thin-film element level was obtained each guage, parallel processing, speed, memory. by permitting element to enable or disable the execution of the common instructions according to local tests. Introduction 4 Processing elements in the array had nearest-neighbor con- The study of a number of well-formulated but computationally nections to provide moderate coupling for data exchange. massive problems is limited by the computing power of currently available or proposed computers. Some involve manipulations of Studies with the original SOLOMON computer indicated that such a both feasible very large matrices (e.g., linear programming); others, the solution parallel approach was and applicable to a sets of areas. of of partial differential equations over sizable grids (e.g., variety important computational The advent of LSI cir- weather models); and others require extremely fast data correlation cuitry, or at least medium-scale versions, with gate times of the techniques (phased array signal processing). Substantive progress order of 2 to 5 ns, suggested that a SOLOMON-type array of 9 in these areas requires computing speeds several orders of magni- potentially 10 word operations per second could be realized. In tude greater than conventional computers. addition, memory technology had advanced sufficiently to indicate 6 At the same time, signal propagation speeds represent a serious that 10 words of memory with 200 to 500-ns cycle times could barrier to increasing the speed of strictly sequential computers. be produced at acceptable cost. The ILLIAC IV Phase I design Thus, in recent years a variety of techniques have been introduced study during the latter part of 1966 resulted in the design discussed in in this to be fabricated the Defense to overlap the functions required sequential processing, e.g., paper. The machine, by Space multiphased memories, program look-ahead, and pipeline arith- and Special Systems Division of , Paoli, Pa., metic units. Incremental speed gains have been achieved but at is scheduled for installation in early 1970. considerable cost in hardware and complexity with accompanying

problems in machine checkout and reliability. Summary of the ILLIAC IV The use of explicit parallelism of operation rather than over- IV lapping of subfunctions offers the possibility of speeds which in- The ILLIAC main structure consists of 256 processing elements in four crease linearly with the number of gates, and consequently has arranged reconfigurable SOLOMON-type arrays of 64 been explored in several designs [Slotnick et al., 1962; Unger, 1958; processors each. The individual processors have a 240-ns ADD Holland, 1959; Murtha, 1966]. The SOLOMON computer [Slotnick time and a 400-ns MULTIPLY time for 64-bit operands. Each 4 et al., 1962], which introduced a large degree of overt parallelism processor requires approximately 10 ECL gates and is provided into its structure, had four principal features. with 2048 words of 240-ns cycle time thin-film memory.

Instruction and control 1 A large array of arithmetic units was controlled by a single addressing

The ILLIAC IV array possesses a common control unit which HEEE Trans., C-17, vol. 8, pp. 746-757, August, 1968. decodes the instructions and generates control signals for all

320 Chapter 27 The ILLIAC IV computer 321

in the This eliminates the cost and of constants, and (2) it overlap of common processing elements array. storage program permits in each element. fetches with other complexity for decoding and timing circuits operand operations. are In addition, an and address adder provided the final address Processor with each processing element, so that operand partitioning

for element i is determined as follows: Oj Many computations do not require the full 64-bit precision of the To make more efficient use of the hardware and speed . processors. a = a + (b) + (Cj) up computations, each processor may be partitioned into either where a is the base address in the instruction, (b) is the specified two 32-bit or eight 8-bit subprocessors, to yield 512 32-bit or contents of a central index in the control unit, and (c,) register 2048 8-bit subprocessors for the entire ILLIAC IV set. is the contents of the local index of the ele- register processing The subprocessors are not completely independent in that they effective ment i. This in is very independence operand addressing share a common index register and the 64-bit data routing paths. for rows and columns of matrices and other multidimen- handling The 32-bit subprocessors have separate enabled/disabled modes sional data structures [Kuck, 1968]. for indexing and data routing; the 8-bit subprocessors do not.

Mode control and data conditional operations Array partitioning is to be able to Although the goal of the ILLIAC IV structure The 256 elements of ILLIAC IV are grouped into four separate a control the processing of a number of data streams with single subarrays of 64 processors, each subarray having its own control instruction stream, it is sometimes necessary to exclude some data unit and capable of independent processing. The subarrays may streams or to process them differently. This is accomplished by be dynamically united to form two arrays of 128 processors or one whose value providing each processor with an ENABLE flip-flop array of 256 processors. The following advantages are obtained. controls the instruction execution at the processor level. The bit is of a test result in each ENABLE part register 1 Programs with moderately dimensioned vector or matrix holds the results of tests conditional on local data. processor which variables can be more efficiently matched to the array size. Thus in ILLIAC IV the data conditional jumps of conventional 2 Failure of any subarray does not preclude continued proc- computers are accomplished by processor tests which enable or essing by the others. disable local execution of subsequent commands in the instruction stream. This paper summarizes the structure of the entire ILLIAC IV system. Programming techniques and data structures for ILLIAC Routing IV are covered in a paper by Kuck [1968].

i has data Each processing element in the ILLIAC IV routing — connections to 4 of its neighbors, processors i + 1, i 1, i + 8, for and i — 8. End connection is end around so that, a single array, ILLIAC IV structure

63 connects to 0, 62, 7, and 55. processor processors The organization of the ILLIAC IV system is indicated in Fig. 1. data transmissions of arbitrary distance are ac- Interprocessor The individual processing elements (PEs) are grouped in four a sequence of routings within a single instruction. complished by arrays, each containing 64 elements and a control unit (CU). The of For a array the maximum number routing steps control to 64-processor four arrays may be connected together under program is 4. actual is 7; the average overall possible distances In required permit multiprocessing or single-processing operation. The system distance 1 is most common and distances programs, routing by program resides in a general-purpose computer, a Burroughs than 2 are rare. greater B 6500, which supervises program loading, array configuration changes, and I/O operations internal to the ILLIAC IV system Common operand broadcasting and to the external world. To provide backup memory for the disk 109 Constants or other operands used in common by all the processors ILLIAC IV arrays, a large parallel-access system (10 bits, bit access 40-ms maximum is are fetched and stored locally by the central control and broadcast per second rate, latency) directly to the There is also for real-time data to the processors in conjunction with the instruction using them. coupled arrays. provision for connections to the ILLIAC IV This has several advantages: (1) it reduces the memory used directly arrays. Section 2 Processors for data 322 Part 4 The instruction-set processor level: special-function processors array Chapter 27 The ILLIAC IV computer 323 324 Part 4 The instruction-set processor level: special-function processors Section 2 Processors for array data

words (16 instructions), fetch of the next block is initiated; the Column different is If the Array possibility of pending jumps to blocks ignored. next block is found to be already resident in the buffer, no further (12) action is taken; else fetch of the next block from the array memory is initiated. On arrival of the requested block, the instruction

buffer is cyclically filled; the oldest block is assumed to be the least required block in the buffer and is overwritten. Jump instruc- tions initiate the same procedures. Fetch of a new instruction block from memory requires a delay of approximately three memory cycles to cover the signal trans- mission times between the array memory and the control unit. On execution of a straight line program, this delay is overlapped with the execution of the 8 instructions remaining in the current block.

In a multiple-array configuration, instructions are fetched from the array memory specified by the , and broadcast simultaneously to all the participating control units. Instruction processing thereafter is identical to that for single-array operation, except that synchronization of the control units is necessary whenever information, in the form of either data or control signals, must cross array boundaries. CU synchronization must be forced at all fetches of new instruction blocks, upon all data routing operations, all conditional program transfers, and all configuration- changing instructions. With these exceptions, the CUs of the several arrays run independently of one another. This simplifies the control in the multiple-array operation; furthermore, it permits I/O transactions with the separate array memories without steal- ing memory cycles from the nonparticipating memories.

Memory addressing

Both data and instructions are stored in the combined memories

of the array. However, the CU has access to the entire memory, while each PE can only directly reference its own 2,048-word PEM. The memory appears as a two-dimensional array with CU access sequential along rows and with PE access down its own column. In multiarray configurations the width of the rows is increased by multiples of 64. The resulting variable-structure addressing problem is solved by generating a fixed-form 20-bit address in the CU as shown in Fig. 5. The lower 6 bits identify the PE column within a given

array. The next 2 bits indicate the array number, and the remain- ing higher-order bits give the row value. The row address bits actually transmitted to the PE memories are configuration- dependent and are gated out as shown. Addresses used by the PE's for local operands contain three components: a fixed address contained in the instruction, a CU Chapter 27 The ILLIAC IV computer 325

must be the and data and S as a general The addressing indicated by both CFC1 and CFC2 multiplicand routing register, else consistent with the actual configuration designated by CFCO, storage register. a configuration interrupt is triggered. 2 An adder/multiplier (MSG, PAT, CPA), a logic unit (LOG), and a barrel switch (BSW) for arithmetic, Boolean, and Trap processing shifting functions, respectively. the will be Because external demands on arrays preprocessed for 3 A 16-bit index register (BGX) and adder (ADA) memory the for the through the B 6500 system computer, interrupt system address modification and control. are control units is relatively straightforward. Interrupts provided 4 An 8-bit mode to hold the results of tests or faults register (BGM) to handle B 6500 control signals and a variety of CU array and the PE ENABLE/DISABLE state information. con- (undefined instructions, instruction parity error, improper under- control instruction, etc.). Arithmetic overflow and figuration As described earlier, the PEs may be partitioned into subproc- is detected and a flow in any of the processing elements produces essors of word lengths of 64, 2 X 32, or 8 X 8 bits. Figure 7 shows trap. biased rela- the data representations available. Exponents are and is an effective FOBK The strategy of response to an interrupt tive to base 2. Table 1 indicates the arithmetic and logical opera- Each saves its own status word to a single-array configuration. CU tions available for the three operand precisions. with which it automatically and independently of other CU's may mode control previously have been configured. PE base address Hardware consists of a interrupt or implementation Two bits of the mode register (BGM) control the enabling as a to (BIAB) which is dedicated pointer array storage 32-bit register disabling of all instructions; one of these is active only in the into which status information will be transferred. Upon receipt execution on the second precision mode and controls instruction of the counter and other of an interrupt, the contents program arithmetic operand. Two other bits of BGM are set whenever an status information and the contents of CAB are stored in the fault (overflow, underflow) occurs in the PE. The fault bits of all block to by the BIAB. In addition, CAB is set to contain pointed PEs are continuously monitored by the CU to detect a fault condi- the block address used by BIAB so that subsequent register saving tion and initiate a CU trap. may be programmed. Interrupt returns are accomplished through reloads the status word and Data a special instruction which previous paths CAB and clears the interrupt. Each PE has a 64-bit wide routing path to 4 of its neighbors (±1, enabled a mask word in a in such Interrupts are through special regis- ±8). To minimize the physical distances involved routing, is and not to a ter. The interrupt state general unique specific the PEs are grouped 8 to a cabinet (PUC) in the pattern shown or the no interior to a trigger trap. During interrupt processing, subsequent in Fig. 8. Bouting by distance ±8 occurs PUC; routing are their is in 2 distances. interrupts responded to, although presence flagged by distance ±1 requires no more than intercabinet the interrupt state word. CU data and instruction fetches require blocks of 8 words, of in the control unit an into a buffer The high degree overlap precludes which are accessed in parallel, 1 word per PUC, CU the instruction which immediate response to an interrupt during (CUB) 512-bit wide, distributed among the PUCs, 1 word per in some element. To generates an arithmetic fault processing control to force non- alleviate this it is possible under program Table 1 PE data operations definite fault overlapped instruction execution permitting access to information.

Processing element (PE) executes the data com- The processing element, shown in Fig. 6, fetches. It contains the putations and local indexing for operand following elements.

1 Four 64-bit registers (A, B, R, S) to hold operands and results. A serves as the , B as the operand register, R as 326 Part 4 The instruction-set processor level: special-function processors Section 2 Processors for array data

NEWS

CONTROL UNIT DRIVERS/ AND MIR CDB J L RECEIVERS DRIVERS MODE AND REGISTER RECEIVERS (RGM)

R REGISTER (RGR)

ADDRESS ill —\ ADDER OPERAND (ADA) S REGISTER SELECT (RGS) GATES (OSG) LC MULTIPLICAND SELECT GATES (MSG) 111

B REGISTER X REGISTER LLC (RGB) (RGX) PSEUDOADDER TREE (PAT) MEMORY ADDRESS • MEMORY REGISTERS (MAR) LLLii CARRY PROPAGATE ADDER (CPA)

LLL1

A REGISTER LOGIC UNJT (RGA) (LOG) MIR

LEADING ONES BARREL DETECTOR SWITCH (BSW) (LOO) c

Fig. 6. Processing-element block diagram. Chapter 27 The ILLIAC IV computer 327 328 Part 4 The instruction-set processor level: special-function processors Section 2 Processors for array data

cabinet. Data is transmitted to the CU from the CUB on a 512-line bus. CU Disk and on-line I/O data are transmitted on a 1024-line bus which can be switched among the arrays. Within each array, 16 of 64 2 parallel connection is made to a selected PEs, per PUC. 9 Maximum data rate is one I/O transaction per microsecond or 10 bits per second. The I/O path of 1024 lines is expandable to 4096 lines if required.

Processing element memory (PEM)

The individual memory attached to each processing element is a thin-film DRO linear select memory with a cycle time of 240 ns and access time of 120 ns. Each has a capacity of 2048 64-bit words. The memory is independently accessible by its attached PE, the CU, or I/O connections.

Disk-file subsystem

The computing speed and memory of the ILLIAC IV arrays re- data files quire a substantial secondary storage for program and as well as backup memory for programs whose data sets exceed Bur- fast memory capacity. The disk-file subsystem consists of six 8 roughs model II A storage units, each with a capacity of 1.61 X 10 bits and a maximum latency of 40 ms. The system is dual; each 8 half has a capacity of 5 X 10 bits and independent electronics capable of supporting a transfer rate of 500 megabits per second. The data path from each of the disk subsystems becomes 1024 bits wide at its interface with the array. Figure 9 shows the organization of the disk-file system.

B 6500 control computer

The B 6500 computer is assigned the following functions.

1 Executive control of the execution of array programs

2 Control of the multiple-array configuration operations

3 Supervision of the internal I/O processes (disk to arrays, etc.)

4 External I/O processing and supervision

5 Processing and supervision of the files on the disk file sub- system

6 Independent data processing, including compilation of ILLIAC IV programs

To control the array operations, there is a single interrupt line and a 16-bit data path both ways between the B 6500 and each of the control units. In addition, the B 6500 has a control and data Chapter 27 The ILLIAC IV computer 329

of in number of gates per system should be possible with comparable The remaining problems are (1) location the faulty subsys- location of in the reliability. tem, and (2) the faulty package subsystem. It is only by virtue of high-density integration (50- to 100-gate Location of the faulty subsystem assumes the B 6500 to be package) that the design of a three-million-gate system can be fault-free, since this can be determined by using the standard 256 contemplated. Reliability of the major part of the system, B 6500 maintenance routines. The steps to follow are shown in processing elements and 256 memory units, is expected to be in Fig. 10. 5 3 the range of 10 hours per element and 2 X 10 hours per memory The B 6500 tests the control units (CU) which in turn test all unit. PEs. PEMs are tested through the disk channel. This capability the as collection of identical The organization of ILLIAC IV a for functional partitioning of the subsystems simplifies the diag- ele- units simplifies its maintenance problems. The processing nostic procedure considerably. ments, the memories, and some part of power supplies are designed to be pluggable and replaceable to reduce system down time and References improve system availability. HollJ59; KuckD68; MurtJ66; SlotD62; UngeS58 330 Part 4 The instruction-set level: processor special-function processors Section 2 Processors for array data

APPENDIX 1 Skip on index portion of CAR (bits 40 Al. CLASSIFIED LIST OF CU INSTRUCTIONS H? ) through 63) equal to bits 40 through 63 of address contents. Al.l Data transmission CU memory The letters T, F, and A have the same meaning as in ALIT Add literal (24 bit) to CAR. CTSB above. BIN Block fetch to CU memory. 4 Instructions: EQLXT, EQLXTA, EQLXF, EQLXFA. BINX Indexed (by PE index) block fetch. fT, Al Skip on index part of CAR (bits 40 through BOUT Block store from CU memory. GRTR n 63) greater than bits 40 through 63 of CU BOUTX Indexed block store. memory address contents. The letters T, CLC Clear CAR. F, and A have the same meaning as in COPY Copy CAR into CAR of other quadrant. CTSB above. DUPI Duplicate inner half of CU memory ad- 4 Instructions: GRTRT, GRTRTA, GRTRF, GRTRFA. dress contents into both halves of CAR. fT, Al Skip on index part of CAR (bits 40 through DUPO Duplicate outer half of CU memory ad- LESS 63) less than bits 40 63 of CU dress contents into both halves of n through CAR. memory address contents. The letters T, F, EXCHL Exchange contents of CAR with CU mem- and A have the same meaning as in CTSB ory address contents. above. LDL Load CAR from CU memory address con- 4 Instructions: LESST, LESSTA, LESSF, LESSFA. tents. fT, Al Skip on CAR equal to all l's. The letters LIT Load CAR with 64-bit literal following the ONES T, F, and A have the same as in instruction. n meaning CTSB above. LOAD Load CU memory from contents of PE 4 Instructions: ONEST, ONESTA, ONESF, ONESFA. memory address found in CAR. A"! on bits 40 63 of CAR LOADX Load CU memory from contents of PE ONEX fT, Skip through equal to all l's. The letters and A have the address found in CAR, indexed n T, F, memory same as in CTSB above. by PE index. meaning 4 Instructions: ONEXT, ONEXTA, ONEXF, ONEXFA. OBAC OR all CARS in array and place in CAR. fT, AT on T-F set. The SLIT Load CAR with 24-bit literal. SKIP Skip flip-flop previously letters and A have the same STL Store CAR into CU n T, F, meaning memory. as in CTSB above. STORE Store CAR into PE memory. 4 Instructions: SKIPT, SKIPTA, SKIPF, SKIPFA. STOREX Store CAR into PE memory, indexed by SKIP PE index. Skip unconditionally. T, A, I on index of CAR 40 TCCW Transmit CAR counterclockwise between TXJ Skip portion (bits through 63) less than limit portion (bits 1 CUs in array. The letters and A have TCW Transmit CAR clockwise between CUs in through 15). T, F, the same meaning as in CTSB above. If / array. is present, the index portion of CAB is in- A 1. 2 and test Skip cremented by the increment portion of Al on nth bit of CAR. If Tis CAR 16 while the test is CTSB |T, Skip present, slap (bits through 39) if 1; if F is present, slap if 0. If A is pres- in progress; if / is not present, no incre- ent, AND together bits from all CUs in menting takes place.

array before testing; if absent, OR together 8 Instructions: TXLT, TXLTI, TXLTA, TXLTAI, TXLF,

bits from all CUs in array before testing. TXLFI, TXLFA, TXLFAI.

Instructions: CTSBT, CTSBTA, CTSBF, CTSBFA. Skip on index portion of CAR (bits 40 Al on CAR to CU ad- to limit of CAR (TfT, Al Skip equal memory through 63) equal portion eol(Iff ) dress contents. The letters T, F, and A (bits 1 through 15). See CTSB for the have the same meaning as in CTSB above. meaning of T, F, and A; see TXL above 4 Instructions: EQLT, EQLTA, EQLF, EQLFA. for the meaning of /. Chapter 27 The ILLIAC IV computer 331

8 Instructions: TXET, TXETI, TXETA, TXETIA, TXEF, CLC Clear CAR. TXEFI, TXEFA, TXEFIA. COR OR CU memory to CAR. 1II on index of CAR 40 CRB Reset bit TXG fT, A, Skip portion (bits of CAR. than limit of CROTL Rotate left. r ) through 63) greater portion CAR 1 for the CAR (bits through 15). See CTSB CROTR Rotate CAR right. meaning of T, F, and A; see TXL above CSB Set bit of CAR. for the meaning of /. CSHL Shift CAR left.

8 Instructions: TXGT, TXGTI, TXGTA, TXGTAI, TXGF, CSHR Shift CAR right. TXGFI, TXGFA, TXGFAI. LEADO Detect leading ONE in CAR of all quad- on CAR all 0's. See CTSB for the rants in ZER Skip array. meaning of T, F, and A. LEADZ Detect leading ZERO in CAR of all quad-

4 Instructions:ctions: ZERT, ZERTA, ZERF, ZERFA. rants in array. on index of CAR 40 ORAC OR all CARS in and in CAR. ZERX( Skip portion (bits array place through 63) all 0's. See CTSB for the A2. CLASSIFIED LIST OF PE INSTRUCTIONS meaning of T, F, and A. 4 Instructions: ZERXT, ZERXTA, ZERXF, ZERXFA. A2.1 Data transmission LDA A1.3 Transfer of control

EXEC Execute instruction found in bits 32 through 63 of CAR. EXCHL Exchange contents of CAR with contents of CU memory address. HALT Halt ILLIAC IV. JUMP Jump to address found in instruction. LOAD Load CU memory address contents from contents of PE memory address found in CAR. LOADX Load CU memory address contents from contents of PE memory address found in CAR, indexed by PE index. STL Store CAR into CU memory.

A1.4 Route

RTE Route. Routing distance is found in address

field (CAR indexable), and register con-

nectivity is found in the skip field.

A1.5 Arithmetic

ALIT Add 24-bit literal to CAR. CADD Add contents of CU memory address to CAR. CSUB Subtract contents of CU memory address from CAR. INCRXC Increment index word in CAR.

A1.6 Logical CAND AND CU memory to CAR. CCB Complement bit of CAR. 332 Part 4 The instruction-set processor level: special-function processors Section 2 Processors for array data

6 Instructions: IXL, IXLI, IXE, IXEI, IXG, IXGI. 6 Instructions: IXL, IXLI, IXE, IXEI, IXG, IXGI. Set / on of X and Set / on of X and | comparison register op- comparison register op- E, 1 erand. See above for of L, E, erand. See Section A2.2 for of 1 meaning G, ,X meaning L,

(LG J and I. |'I £, G, and I. 6 Instructions: JXL, JXLI, JXE, JXEI, JXG, JXGI. 6 Instructions: JXL, JXLI, JXE, JXEI, JXG, JXGI. fLl XI Increment PE index (X register) by bits 48 Set / on comparison of S register and op- IS E through 63 of operand. erand. See Section A2.2 for meaning of L, XIO Increment PE index of bits 48 through 63 IgJ E, and G. of operand plus one. 3 Instructions: ISL, ISE, ISG. fLl Set / on comparison of S register and op- A2.3 Mode setting/comparisons erand. See Section A2.2 for js|e meaning of L, EQB Test A and B for equality bytewise. Ig' E, and G.

GRB Test B register greater than A register 3 Instructions: JSL, JSE, JSG. J bytewise. ISN Set from the sign bit of A register. Test less LSB B register than A register bytewise. JSN Set / from the sign bit of A register.

CHWS Change word size. SETE Set £ bit as a logical function of other bits.

Set J if A register is less than operand. L SETEO Set El bit similarly.

L means test logical; A means test arithmetic; SETF Set F bit similarly.

'E) M means test mantissa. SETFO Set Fl bit similarly.

Instructions: ILL, IAL, IML. SETG Set G bit similarly.

if Set fLl Set J A register is equal to operand. See SETH H bit similarly. SETI Set 7 bit I A E above for meaning of L, A, and M. similarly. ImJ SETJ Set / bit similarly. Instructions: Set ILE, IAE, IME. SETCO Pth bit of CAR similarly. Set / if A register is greater than operand. SETC1 Set Pth bit of CAR 1 similarly. See for above meaning of L, A, and M. SETC2 Set Pth bit of CAR 2 similarly. Set SETC3 Pth bit of CAR 3 similarly. Instructions: ILG, IAG, IMG. IBA Set I from Mh bit of A register; bit num- Set 7 if A register is equal to all zeros. ber is found in address field. z JBA Set / from Mh bit of A register; bit num- 3 ber is found in address field. Instructions: ILZ, IAZ, IMZ. A2.4 Arithmetic Set J if A is to all fL| register equal ONES. I A O ADB Add bytewise.

ImJ SBB Subtract operand from A register bytewise. Instructions: ILO, IAO, IMO. ADD Add A register and operand as 64-bit fL LI Set / under conditions specified in set of operands. J A, E instructions immediately above. SUB Subtract operand from A register as 64-

Im gJ bit quantities. z AD{R, N, M, S} Add operand to A register. The fl, N, M, o S specify all possible variants of the arith- 15 Instructions: JLL, JAL, JML, JLE, JAE, JME, JLG, metic instruction. The meaning of each JAG, JMG, JLZ, JAZ, JMZ, JLO, JAO, letter, if present in the mnemonic, is JMO. fl round result

Set I on comparison of X register and op- N normalize result erand. See Section A2.2 for meaning of L, M mantissa only

eg, E, G, and /. S special treatment of signs. Chapter 27 I The ILLIAC IV computer 333

16 Instructions: ADM, ADMS, ADNM, ADNMS, ADN, 16 Instructions: AND, ANDN, ANDZ, ANDO, NAND, ADNS, ADRM, ADRMS, ADRM, NANDN, NANDZ, NANDO, ZAND, ADRNMS, ADRN, ADRNS, ADR, ADRS, ZANDN, ZANDZ, ZANDO, OAND, AD, ADS. OANDN, OANDZ, OANDO.

ADEX Add to exponent. CBA Complement bit of A register. DV{R, N, M, S} Divide by operand. See AD instruction for CHSA Change sign of A register. of M, and S. fNl meaning R, N, fNj 16 Instructions: DVM, DVMS, DVNM, DVNMS, DVN, Z EOR Z Exclusive OR A register with operand. DVNS, DVRM, DVRMS, DVRNM, loJ loJ DVRNS, DVRN, DVRNS, DVR, DVRS, 16 Instructions: EOR, EORN, EORZ, EORO, NEOR, DV, DVS. NEORN, NEORZ, NEORO, ZEOR,

EAD Extend precision after floating point ADD. ZEORN, ZEORZ, ZEORO, OEOR, OEORO. ESB Extend precision after floating point SUB- OEORN, OEORZ, TRACT. Load exponent of A register.

LEX Load exponent of A register. OR A with ML{R, N, M, S} Multiply by operand. See AD instruction register operand. for meaning of R, N, M, and S. 16 Instructions: MLM, MLMS, MLNM, MLNMS, MLN, OR, ORN, ORZ, ORO, NOR, NORN, MLNS, MLRM, MLRMS, MLRNM, NORZ, NORO, ZOR, ZORN, ZORZ, MLRNMS, MLRN, MLRNS, MLR, MLRS, ZORO, OOR, OORN, OORZ, OORO.

ML, MLS. RBA Reset bit A register to ZERO.

SAN Set A register negative. RTAL Rotate A register left. Rotate left. SAP Set A register positive. RTAML mantissa of A register SBEX Subtract exponent of operand from expo- RTAMR Rotate mantissa of A register right. Rotate A nent of A register. RTAR register right. Set A SB{R, N, M, S} Subtract operand from A register. See AD SAN register negative. instruction for meaning of R, N, M, and S. SAP Set A register positive. 16 Instructions: SBM, SBMS, SBNM, SBNMS, SBN, SBNS, SBA Set bit of A register to ONE.

SBRM, SBRMS, SBRNM, SBRNMS, SBRN, SHABL Shift A and B registers double-length left.

SBRNS, SBR, SB, SBS. SHABR Shift A and B registers double-length right. Shift left. NORM Normalize A register. SHAL A register MULT In 32-bit mode, perform MULTIPLY and SHAML Shift A register mantissa left. Shift A leave outer result in A register and inner SHAR register right. Shift A mantissa result in R register, with both results ex- SHAMR register right. tended to 64-bit format.

AND A register with operand. The left-

hand set of letters specifies a variant on

the A register, the right-hand set, on the operand. The meaning of these variants is not present use true N use complement Z use all ZEROS

O use all ONES.