CS2 Supplementary Notes 2000-01

1 SYSTEMS.

1.1 Digital Computers. A digital computer is a programmable digital system possessing the following elements (each may be present in multiplicity):

1. Memory Unit: used to store programs and operands (data); usually location addressable.

2. Execution Unit: consisting of at least an ALU, which performs all data processing functions (logical and arithmetic).

3. Control Unit: a synchronous stored program programmable controller.

4. I/O Unit: provides communication with other systems (and the outside world).

The ALU and CU are in intimate communication: the CU monitors condition codes, generates ALU function selects, and controls the movement of data. The CU and ALU, together with possibly a few storage registers, are therefore often considered as a single entity called a central processing unit (CPU). In the early 1970s, LSI technology (MOS) made it possible to fabricate an entire CPU (admittedly a rather small and simple one) on a single chip. An integrated CPU is called a microprocessing unit (MPU) or simply a microprocessor. Whether the CPU is a single chip, a board (e.g. in a minicomputer), or a number of boards, its connection to the rest of the machine is typically limited to one or two data buses (with associated address and control buses). Machines which have completely separate buses for instructions and operands are called Harvard architectures. An example is the first commercial microprocessor, the INTEL 4004 (1971), which had an 8-bit instruction bus and a 4-bit operand bus. Most machines, however, use only one bus to fetch instructions and read/write operands: this is the Princeton or von Neumann architecture.

The width in bits (or lines) of the data bus is an important characteristic of the CPU. In the 1970s it was possible to classify machines in the following way: large machines (mainframes) typically had 32 CPU data bus lines; minicomputers (e.g. DEC PDP 11) typically had 16; and microcomputers (e.g. the original IBM PC) typically 8. However, with VLSI, microprocessors with the performance and data bus width of minicomputers and small to medium mainframes are now in existence. The width of the address bus determines the maximum number of locations which can be addressed; the set of all possible locations constitutes the address space of the CPU. The address space can be very large even in microprocessors: e.g. the Pentium has a space of 2^32 bytes or 4G bytes (1M = 1K x 1K; 1G = 1K x 1M). Standard memory SIMMs have a capacity of only 128M bytes, so populating the full 4G bytes would take 32 such SIMMs. This would be expensive even in a large system, and usually there are large sections of unpopulated address space (no physical devices using addresses in these regions).

The memory addressed via the CPU address bus (i.e. that in the Memory Units) is referred to as primary memory. Most installations have a much larger backing store or secondary memory, which is treated as a subsidiary external system and is accessed via specialised I/O units called secondary storage controllers. The CPU can communicate not only with its primary memory, but also with its I/O units, which typically contain at least one or two registers addressable by it. Some CPUs have a separate address space for I/O devices. This can be done by the inclusion of one extra line, say I/O enable, but special I/O instructions are also needed (e.g. the Pentium family). Other CPUs simply place their I/O devices in the memory address space, so that they appear like memory locations to the CPU. This is called memory mapped I/O (e.g. the MC68000 family).
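
As a quick check on the address space arithmetic above, the C fragment below computes the size of a 32-bit address space and the number of 128M byte SIMMs needed to populate all of it. It is only a sketch of the calculation, assuming the bus width and module size quoted in the text.

    #include <stdio.h>
    #include <stdint.h>

    int main(void)
    {
        const unsigned address_lines = 32;              /* Pentium address bus    */
        const uint64_t space = 1ULL << address_lines;   /* 2^32 = 4G locations    */
        const uint64_t simm  = 128ULL * 1024 * 1024;    /* 128M byte module       */

        printf("address space: %llu bytes (%llu Gbytes)\n",
               (unsigned long long)space,
               (unsigned long long)(space >> 30));
        printf("SIMMs needed to populate it fully: %llu\n",
               (unsigned long long)(space / simm));     /* prints 32              */
        return 0;
    }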

1.2 Bus Communications. We have seen how various computer subsystems are interfaced to a system address/data bus. Data transfer on a bus is a transaction involving one source subsystem and, usually, one destination. Each such transfer or bus cycle is conducted under the control of the subsystem which is currently driving the address and control buses, the bus master for that cycle. We will consider only cycles of the type involving two subsystems: one bus master and one bus slave. Each bus master will, of course, be a subsystem with an internal controller, such as a CPU (or another device like a DMAC), which is capable of driving the address bus. In the simplest systems the only potential bus master is the (single) CPU. In more complex systems several subsystems may have the ability to be bus masters.

If several masters wish to use a shared bus, it is necessary to arbitrate between them. This is often done by a separate subsystem called a bus arbiter, from which each potential master can request bus control. The arbiter grants the bus to one master at a time. There are several algorithms which can be used to decide which master will be issued with a grant: e.g. masters can be prioritised. Also, the arbiter may allow one master to hold the bus until it has finished its transfer, or it may allow each master only one cycle at a time before forcing rearbitration.

Slaves are, by definition, addressable devices. The address bus defines a system address space (an n-bit address bus gives a 2^n word address space). Each slave appears in this space as one or more addressable locations (usually the locations within a slave are contiguous). A slave may be a memory mapped I/O device with a couple of locations, or a large memory unit with millions. It is necessary for an address to specify not only which location within a slave is being addressed, but also which slave is involved. Each slave must therefore be allocated a unique portion of the address space which it will occupy. The more significant bits of the address bus are usually used to select the slave in question. Often these bits are interpreted by a single central address decoder which then sends an enable signal to the slave required. This "decoder" is not usually as simple as the decoder circuit discussed earlier, since different slaves can occupy different amounts of address space. A memory unit may require tens of millions of addresses, any one of which will cause it to be enabled, while an I/O interface may have as few as one or two. Thus one output of the address decoder may need to go active for tens of millions of different inputs, while another responds to only one. In any case, at most one output should go active at any one time.

[Figure: Address Decoder. The address bus (usually the upper lines) feeds the decoder, which drives select lines to the different slaves; the active select enables the addressed slave.]
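
The decoder's job can be pictured as a function from an address (plus the address valid line) to a one-hot set of slave selects. The C sketch below uses an invented 24-bit address map, not that of any real machine, to show how one select can respond to millions of addresses (a RAM unit) while another responds to only a handful (a small I/O interface).

    #include <stdint.h>
    #include <stdio.h>

    /* One-hot slave selects driven by the decoder. */
    enum { SEL_NONE = 0, SEL_RAM = 1, SEL_ROM = 2, SEL_IO = 4 };

    /* Hypothetical 24-bit address map (illustrative only):
     *   0x000000 - 0x7FFFFF   8M byte RAM
     *   0xF00000 - 0xF0FFFF   64K byte ROM
     *   0xFFFF00 - 0xFFFF03   4-byte memory-mapped I/O interface */
    static int decode(uint32_t addr, int address_valid)
    {
        if (!address_valid)                 /* decoder disabled while the  */
            return SEL_NONE;                /* address lines are changing  */
        if (addr <= 0x7FFFFF)
            return SEL_RAM;
        if (addr >= 0xF00000 && addr <= 0xF0FFFF)
            return SEL_ROM;
        if (addr >= 0xFFFF00 && addr <= 0xFFFF03)
            return SEL_IO;
        return SEL_NONE;                    /* unpopulated address space   */
    }

    int main(void)
    {
        printf("%d %d %d %d\n",
               decode(0x001234, 1),   /* RAM  */
               decode(0xF00010, 1),   /* ROM  */
               decode(0xFFFF02, 1),   /* I/O  */
               decode(0x900000, 1));  /* none */
        return 0;
    }

Note that at most one select is returned for any address, which is the property required of the real decoder.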

Note that any master which cannot drive all the address lines will be unable to access large areas of the address space. Note also that a master in one cycle can sometimes be a slave in another. Subsystems which have master/slave capability must have I/O interfaces to the address and control buses as well as the data bus. When a bus master has control of the bus, it will activate the address bus, putting the address of the required location onto it. Various control lines are also necessary to manage the transfer. The master will usually need at least:

1) Read/Write line (say H=read, L=write), which indicates to the slave whether the master is going to read from or write to it.

2) Address Valid (say L=address valid), which indicates to all slaves when the value on the address bus is valid, to avoid ambiguities when, for example, the lines are in transition. The address decoder will usually be disabled (all outputs inactive) while address valid is inactive.

It cannot in general be assumed that the master knows how long the slave will need to identify its address, store data written to it, or present valid data requested from it. The problem is particularly acute with mass produced standard microprocessors, where a single type of device can be used in a myriad of different systems containing other subsystems of widely differing speeds. Several strategies are used to overcome this.

2 INSTRUCTIONS.

Instructions fetched and executed by the CPU vary from machine to machine. The collection of instructions which a particular CPU can execute comprises its instruction set. Although instructions are fetched, via the data bus, in binary form or machine code, it is normal to associate with each a mnemonic which describes its function: but this, of course, is for human consumption only. An instruction may be several CPU words long, and so may involve more than one memory cycle to fetch. For example, a 68000 instruction can be anything from 1 to 5 words long (each instruction word must be stored in a 16-bit memory word). Instructions in general can be classified as:

1) Control Instructions. Normally the CU executes instructions sequentially, but it is sometimes desirable to alter this. Branch or jump instructions direct the CU to begin executing at some location specified by the instruction itself. Simple branches may be unconditional or may depend on status inputs from e.g. the ALU (i.e. condition codes). Most CPUs save the condition codes from the last instruction in a special internal register called the status register (SR for short), also known as the condition code register (CCR for short). Other branching instructions may call subroutines or govern more sophisticated looping behaviour. Additionally, many machines have control instructions which can e.g. stop the CPU and wait for some external event, reset the rest of the system etc.

2) Data Processing Instructions tell the CPU to operate on data. They include: a) Data Movement Instructions: e.g. MOVE, to move data between registers or between a register and a memory location. b) Dyadic Operations: 2 operands, 1 result, e.g. logical AND, OR, XOR; arithmetic ADD, SUB, MUL, DIV, CMP. c) Monadic Operations: 1 operand, 1 result, e.g. logical COM (complement); arithmetic NEG (two's comp. inverse), INC, DEC; shifts and rotates; swaps. d) Complex Instructions: e.g. to support high level languages: LINK, CHK on the 68000; block moves on the Z80; polynomial evaluation on the VAX 11 etc.

All these instructions act on operands, which are code words of length n, n being a characteristic of the instruction. Some CPUs use several values of n over different instructions (for most, n is the same as, or related to, the CPU word length). A CPU with a word length of say 8 bits must take 2 memory cycles to load or store a 16-bit operand, 4 memory cycles for a 32-bit operand, and so on, with consequent speed impairment. Many insts. expect an operand to be in a certain code: e.g. a 16-bit ADD inst. combines two 16-bit code words, which it assumes to be in numeric (posn. or 2's comp.) form.

Control flow instructions must indicate where control is to be transferred, while information processing instructions must indicate the locations of operands and the destination of the result, if any. The first part of an inst. must specify the action involved, and is called the op-code. This is then followed by information about the whereabouts of the operands on which it is to act or the destination to which control is to be transferred. Most machines can specify operand whereabouts in several ways, called addressing modes. Simple and common examples include the following (real machines generally have other more complex modes as well):

1) Immediate Addressing: the operand is contained within the instruction itself, e.g. ADD 3 to ...

2) Absolute Addressing: the address of the memory location(s) containing the operand is part of the inst. (if the operand occupies several locations the CPU usually assumes sequential storage), e.g. ADD contents of loc. addr. 2000 (hex) to ...

3) Register Addressing: the operand is in a CPU internal register. Some (not all) CPUs have such internal data registers capable of holding inst. operands. These are faster than external memory, and, because there are few of them (usually not more than 16), only a few bits are needed to specify one: hence shorter insts.

4) Indirect Addressing: the main memory addr. of the operand is contained in a CPU storage register (addr. reg.) specified in the inst. The advantage is that the instruction need only specify a register (such a specification needs only a few bits) rather than a full memory address. Useful if several instructions are to reference the same area of memory. The address register acts as a pointer to memory.

An instruction must contain its op-code, a specification of which addressing mode(s) it will use, and any additional information (e.g. actual operands or addresses) needed by those modes. Most CPU operations involve one or two operands, and generate one result. Instructions can be classified according to the number of different independent primary memory addresses they can specify:

1) A 1-address instruction can only generate a single independent memory address, to fetch the first operand. Any second operand must be in a CPU data register (sometimes called an accumulator, especially if there are only 1 or 2 in the CPU) which will be overwritten by the result.

2) A 2-address instruction can generate 2 independent memory addresses to fetch the first and second operands, with the result overwriting one of them.

3) A 3-address instruction allows specification of independent memory locations for 2 operands and a result.

Notes: 1) Short instructions are generally preferable to long ones, as they are quicker to fetch and take up less room. 2) Instructions are decoded by the CU, which first examines the op-code and determines the addressing mode(s), then fetches any auxiliary addressing information. 3) Even on a given machine, instruction format can vary widely. 4) While programs reside in primary memory as machine code, they are normally written by human beings employing some sort of language which is then translated into machine code. The simplest programming language uses mnemonics which correspond to the op-codes, together with symbolic representations of the addressing modes. Languages which correspond directly to the machine code in this way are called low level or assembly languages. Each CPU type has its own assembly language, corresponding to its own instruction set. Assembly language can be translated manually into machine code, but this tedious process is normally performed by a service program called an assembler. Assembly languages suffer from the facts that they are not portable (i.e. a program written for one machine cannot be transferred to run on a different type) and that they do not support e.g. structured programming techniques. To overcome these problems, high-level languages such as PASCAL, FORTRAN, ADA etc. have been designed. Programs written in these standard languages are translated into machine code for specific machines by other service programs called compilers. Clearly, even for a given language, each CPU type requires a different compiler.
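
The addressing modes above can be made concrete with a toy operand-fetch routine. The C sketch below is not the encoding of any real CPU; it simply shows how the same op-code can obtain its source operand in four different ways, given a small register file and a memory array.

    #include <stdint.h>
    #include <stdio.h>

    enum mode { IMMEDIATE, ABSOLUTE, REGISTER, REG_INDIRECT };

    static uint16_t memory[65536];   /* toy primary memory (word addressed) */
    static uint16_t data_reg[8];     /* toy data registers D0-D7            */
    static uint16_t addr_reg[8];     /* toy address registers A0-A7         */

    /* Fetch a source operand according to the addressing mode.  'field' is
     * the extra information carried by the instruction: the operand itself,
     * an absolute address, or a register number. */
    static uint16_t fetch_operand(enum mode m, uint16_t field)
    {
        switch (m) {
        case IMMEDIATE:    return field;                     /* operand is in the instruction */
        case ABSOLUTE:     return memory[field];             /* instruction holds the address */
        case REGISTER:     return data_reg[field & 7];       /* operand is in a CPU register  */
        case REG_INDIRECT: return memory[addr_reg[field & 7]]; /* register points into memory */
        }
        return 0;
    }

    int main(void)
    {
        memory[0x2000] = 111;
        data_reg[3]    = 222;
        addr_reg[1]    = 0x2000;

        printf("%u %u %u %u\n",
               fetch_operand(IMMEDIATE,    3),        /* 3   */
               fetch_operand(ABSOLUTE,     0x2000),   /* 111 */
               fetch_operand(REGISTER,     3),        /* 222 */
               fetch_operand(REG_INDIRECT, 1));       /* 111 */
        return 0;
    }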


3 THE CPU.

A von Neumann CPU is a well defined subsystem communicating with others via a single data/address bus. There are 3 main internal subsystems connected by internal buses: 1) Control Unit; 2) Execution Unit (ALU etc.); 3) Storage Registers: some for operands and/or addresses, others invisible to the assembly language instruction set.

The control unit contains a subsystem called the program sequencer, which generates instruction cycles. Internal to the program sequencing logic is a special address register called the program counter (PC), which contains the address of the next instruction word to be fetched. The PC is updated as each word is fetched, to point to the next memory location. For example, a 68000, which fetches 2 bytes for each instruction word, increments the PC by 2 each time. Branches, of course, modify the PC more radically (by a +ve or -ve number for, respectively, branches forward and back). The basic instruction cycle of a typical CPU is now outlined:

1) Instruction Fetch: the CPU generates an external memory cycle (read) fetching the first word of the instruction, and the PC is amended accordingly. The instruction word is latched into another CPU register called the instruction register (IR).

2) Instruction Decode: the CU decodes the instruction, identifying the op-code and addressing modes involved.

3) Effective Address Calculation: the CU generates any further instruction fetches required by the addressing modes and computes the actual or effective address of the first data operand. The calculation of the EA may involve the use of the ALU.

4) Operand Fetch: the CU fetches the operand(s), if necessary generating external memory cycle(s). [In a 2-address inst. steps 3 and 4 may be repeated.]

5) Execution: the CU supplies operands to the execution unit and generates the required sequences of control outputs to cause processing to occur.

6) Store Result: the EA of the result is computed, if appropriate (3-address instructions), and another memory cycle generated, if required. Otherwise, the result is stored in an internal register.

The cycle now repeats. From examination of any instruction set, it is apparent that: a) there is considerable variation across instructions; b) each instruction can involve many complex CU state transitions. Since a machine code instruction is such a complex entity, there must be some mechanism for generating the state sequences required. Older machines have hardwired sequencers (counters) for each instruction or group of instructions. Modern machines tend to be microprogrammed, i.e. internal to the CU is a fast internal program store, containing, for each standard instruction, a routine of microinstructions. When a given machine code instruction is decoded, a microprogram sequencer executes the microinstructions corresponding to the macroinstruction. Each microinstruction performs a very simple operation (e.g. a register-to-register transfer), involving only a few state changes. Note, however, that it is not only permissible, but desirable, to allow a single microinstruction to initiate concurrent events in separate parts of the CPU: e.g. to initiate prefetching of instruction words while the execution unit is busy. This can lead to the different stages of the instruction cycle occurring in a manner involving more overlap than the above generalised description suggests. Microprogramming allows the designer flexibility, simplifying the design process and facilitating instruction set changes (augmentation or even redesign). Most microprogram stores are ROM; however, some machines use RAM and are therefore user microprogrammable.
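
The six-step instruction cycle can be summarised as a loop. The following C sketch is a deliberately minimal interpreter for an invented 1-address machine (nothing like a real microprogrammed CU); it exists only to show the fetch, decode, effective address, operand fetch, execute and store phases in order, with the PC advanced on each fetch.

    #include <stdint.h>
    #include <stdio.h>

    /* Invented 1-address instruction format: op-code in the high byte,
     * absolute operand address in the low byte.  The accumulator plays
     * the role of the implicit second operand and the result register. */
    enum { OP_HALT = 0, OP_LOAD = 1, OP_ADD = 2, OP_STORE = 3 };

    static uint16_t mem[256];
    static uint16_t acc;            /* accumulator          */
    static uint8_t  pc;             /* program counter      */
    static uint16_t ir;             /* instruction register */

    int main(void)
    {
        /* tiny program: acc = mem[20] + mem[21]; mem[22] = acc */
        mem[0] = OP_LOAD  << 8 | 20;
        mem[1] = OP_ADD   << 8 | 21;
        mem[2] = OP_STORE << 8 | 22;
        mem[3] = OP_HALT  << 8;
        mem[20] = 7; mem[21] = 35;

        for (;;) {
            ir = mem[pc++];                 /* 1. instruction fetch, PC updated */
            uint8_t opcode = ir >> 8;       /* 2. decode                        */
            uint8_t ea     = ir & 0xFF;     /* 3. effective address (absolute)  */

            if (opcode == OP_HALT)
                break;
            uint16_t operand = mem[ea];     /* 4. operand fetch                 */

            switch (opcode) {               /* 5. execute                       */
            case OP_LOAD:  acc = operand;        break;
            case OP_ADD:   acc = acc + operand;  break;
            case OP_STORE: mem[ea] = acc;        break;  /* 6. store result     */
            }
        }
        printf("mem[22] = %u\n", mem[22]);  /* prints 42 */
        return 0;
    }

A microprogrammed CU would, of course, realise each of these phases as a sequence of microinstructions rather than as lines of C, but the ordering is the same.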

In summary, a microprogrammed CU can be considered to be a system with a stored program: a programmable controller in its own right.

The CPU is generally driven by a clock which defines machine cycles. During each machine cycle, some basic activity occurs, e.g. the execution of a microinstruction in a microprogrammed CPU. A machine code instruction can take a number of machine cycles to complete, depending on the number of microinstructions in its defining microroutine. In some machines, performance can be increased by using a multiphase clock to provide an increased number of timing events per cycle.

Memory cycles. Reads and writes to memory are controlled by the CU, using 2 internal CPU registers: the Memory Address Register (MAR), which holds the address for the current cycle; and the Memory Buffer Register (MBR), which holds data during writes, and latches it in during reads. These registers are temporary holding registers used only during the cycle (e.g. the MAR can hold an EA which has just been computed; the MBR can hold incoming data or instruction words after external memory has been deselected). They allow correct values of data and address to be maintained on the bus after the µinst. initiating the access has completed. A memory access is nearly always considerably longer than the µcycle time. Sometimes it may be necessary to delay execution of the next µinst. while the CPU waits for a memory cycle to complete. Careful planning of the microroutines for individual instructions will minimise this.

For the assembly language programmer the internal operation of the CPU is largely hidden. Only those registers explicitly accessible to the (macro) instruction set need be considered, forming the programmer's model. The model will include all data and address registers, the program counter, and a status register which generally contains the condition code register (the SR is called the CCR on some machines) and some other status flip-flops holding information about the state of the CPU. Most CPUs also have a special address register called the stack pointer, which is used to maintain a LIFO data structure, or stack, in primary memory. The stack is used for automatic storage of the return addresses of subroutines etc.
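
The stack pointer simply pre-decrements (or post-increments) an address register around each access. The C sketch below models a full-descending stack in a small memory array, of the kind used to save subroutine return addresses; the push/pop convention shown is one common choice, not that of any particular CPU.

    #include <stdint.h>
    #include <stdio.h>

    #define MEM_WORDS 64

    static uint32_t memory[MEM_WORDS];
    static uint32_t sp = MEM_WORDS;         /* stack grows downwards from the top */

    /* Push: pre-decrement SP, then store at the new top of stack. */
    static void push(uint32_t value)
    {
        memory[--sp] = value;
    }

    /* Pop: read the top of stack, then post-increment SP. */
    static uint32_t pop(void)
    {
        return memory[sp++];
    }

    int main(void)
    {
        /* Saving and restoring return addresses across nested calls (LIFO). */
        push(0x1000);                                 /* return address of outer call */
        push(0x2000);                                 /* return address of inner call */
        printf("return to 0x%X\n", (unsigned)pop());  /* 0x2000: inner returns first  */
        printf("return to 0x%X\n", (unsigned)pop());  /* 0x1000: then the outer call  */
        return 0;
    }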


4 PROCESSES AND OPERATING SYSTEMS

4.1 Processes. A process is a program in execution. At any instant a process is in a given state. The state of a process at a certain instant contains all the information needed to restart it later, if its execution were stopped at that instant. A process stops when the CPU stops executing that process's program. We distinguish between stopping a process between instructions and stopping it while one of its instructions is still executing. However, in either case the process state includes: 1) the program; 2) all primary memory locations used for data; 3) the status and position of all I/O devices used; 4) the state of the CPU. To restart a process stopped between insts., item (4) need include only those registers which are accessible to the instruction set (the programmer's model). For many purposes the state defined in this way is adequate. When a process is stopped during instruction execution, before its internal registers (MAR, MBR, IR etc.) have been released, their contents must also be included in the state if restart is to be possible. Thus the state required to cope with intra-instruction breaks is larger than that needed for inter-instruction stoppages only. The changeable parts of a process's state are often called its state vector (i.e. items 2, 3 and 4 above). A process can be thought of as a program plus its state vector. A CPU can then be looked at as a device which alters the state vector of a process by executing state-changing instructions.
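
For a process stopped between instructions, the CPU part of the state vector is just the programmer's model. A sketch of such a saved context might look like the C declarations below; the register counts are illustrative (loosely 68000-like) and the state vector fields are simplified stand-ins for items 2 and 3, not a definitive layout.

    #include <stdint.h>

    /* CPU part of a process's state vector: the programmer's model only.
     * Registers such as MAR, MBR and IR are not included, which is why a
     * process stopped in mid-instruction needs more than this to restart. */
    typedef struct {
        uint32_t d[8];        /* data registers                          */
        uint32_t a[8];        /* address registers (a[7] often the SP)   */
        uint32_t pc;          /* program counter                         */
        uint16_t sr;          /* status register, including the CCR      */
    } cpu_context_t;

    /* The full state vector also records memory and I/O usage. */
    typedef struct {
        cpu_context_t cpu;    /* item 4: CPU state                       */
        void    *data_base;   /* item 2: data areas in primary memory    */
        uint32_t data_size;
        int      io_status;   /* item 3: status/position of I/O devices
                                 (a single flag here, purely illustrative) */
    } state_vector_t;

    int main(void) { return 0; }   /* declarations only; nothing to run */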

4.2 Exceptions. The CPU normally executes a program in sequence or in accordance with its control flow instructions. However, some events, called exceptions, cause a CPU to cease normal processing. When an exception occurs, the CPU must identify it and save the internal registers relevant to the current process state (in the hope that it will be able to resume later), usually on a stack maintained in a convenient area of primary memory. It will then load an exception vector into its PC, from some location(s) associated (by hardware) with the identified exception. The vector is the address at which the CPU should now begin execution. The program or subprogram stored at that address is called the exception handler routine. After the handler has executed, the CPU will, if possible, retrieve the original CPU state (always including the original PC and CCR contents) and begin executing the process at precisely the point where it left off.
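
The exception sequence (save state, fetch vector, run handler, restore state) can be sketched as follows. This is a software simulation with an invented vector table and saved-state format, not the hardware sequence of any particular CPU.

    #include <stdint.h>
    #include <stdio.h>

    #define NUM_VECTORS 4

    typedef struct { uint32_t pc; uint16_t ccr; } saved_state_t;

    typedef void (*handler_t)(void);

    static void bus_error_handler(void) { puts("handling bus error"); }
    static void trap_handler(void)      { puts("handling trap");      }

    /* Exception vector table: one handler entry point per exception number.
     * In a real machine this is a table of addresses at a fixed location. */
    static handler_t vector_table[NUM_VECTORS] = {
        0, 0, bus_error_handler, trap_handler
    };

    static saved_state_t stack[16];     /* simulated supervisor stack */
    static int           sp = 0;

    static void raise_exception(int number, uint32_t pc, uint16_t ccr)
    {
        stack[sp++] = (saved_state_t){ pc, ccr };  /* save state on the stack */
        vector_table[number]();                    /* vector to the handler   */
        saved_state_t s = stack[--sp];             /* restore state, resume   */
        printf("resuming at PC=0x%X, CCR=0x%X\n", (unsigned)s.pc, (unsigned)s.ccr);
    }

    int main(void)
    {
        raise_exception(3, 0x4000, 0x0005);        /* e.g. a TRAP instruction */
        return 0;
    }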

4.2.1 Exception Types.

1) Reset. CPU reset is the ultimate exception: there is no recovery. Most CPUs do not even bother saving the CPU state. Execution is transferred to the reset handler routine: precisely the point at which the CPU will start at power-on (power-on always initiates a reset). Reset is usually generated by a control panel pushbutton.

2) Memory Fault. This condition is recognised only by CPUs which handshake with their peripherals. When the CPU tries to access an address which is not populated, or has some fault, the system will negatively acknowledge the access. Simple CPUs like the 6800 or Z80 are not designed to detect this type of fault, but the 68000, for example, has a Bus Error input pin, which is asserted by the system whenever a memory fault is detected. A memory fault, by definition, will occur during rather than between instructions. Clearly, in many circumstances, recovery is inappropriate, since the occurrence of a memory fault indicates a hardware or software failure. However, it turns out that there are situations where it is essential that a process be able to survive a fault of this kind. The 68000, despite its ability to detect memory faults, cannot store the entire CPU internal state (MAR, MBR, IR etc.) and so a process suffering a Bus Error cannot be restarted. Motorola later introduced an enhanced 68000, designated the MC68010, which rectified this shortcoming.

3) Interrupts. Most modern CPUs have provision to accept hardware signalled exceptions, called interrupts, from external subsystems (usually I/O devices): e.g. an I/O subsystem can interrupt whenever it requires attention. Interrupts are usually detected by the CPU only between instructions, and the CPU can always recover (given that the programmer takes all the necessary steps). Sometimes the CPU may not wish to accept interrupts of a certain kind. To allow this, the status register is usually equipped with a disabling interrupt mask flip-flop. When the mask is set, the processor ignores assertion of the appropriate interrupt input. Many CPUs also have a separate non-maskable interrupt to be used for very high priority devices. Some CPUs have several levels of interrupt, normally prioritised. Such systems usually allow masking of all levels below level n, where n is chosen by the CPU at a particular time. Usually, when processing a level n interrupt, the CPU will mask all levels below n+1, preventing repeated interrupts by any impatient level n device.

4) Traps. Exceptions generated by the software itself are called traps. Some CPUs will trap on detecting an arithmetic overflow or a division by zero, for example. Other traps are generated by special TRAP or SOFTWARE INTERRUPT instructions. They are normally used to obtain some service from any operating system which might be present.
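
Prioritised interrupt masking amounts to a simple comparison between the level of the pending request and the mask held in the status register. The sketch below assumes a seven-level scheme with level 7 non-maskable, in the general style of the 68000, but the code is illustrative only and not that chip's actual behaviour.

    #include <stdio.h>

    /* Current interrupt mask, normally a field in the status register.
     * Requests at or below this level are ignored; level 7 is treated
     * as non-maskable. */
    static int interrupt_mask = 3;

    static int accept_interrupt(int level)
    {
        if (level == 7)                 /* non-maskable interrupt        */
            return 1;
        return level > interrupt_mask;  /* only higher priorities get in */
    }

    static void service(int level)
    {
        int old_mask = interrupt_mask;
        interrupt_mask = level;              /* mask this level and below  */
        printf("servicing level %d\n", level);
        interrupt_mask = old_mask;           /* restore on return          */
    }

    int main(void)
    {
        int pending[] = { 2, 5, 7 };
        for (int i = 0; i < 3; i++)
            if (accept_interrupt(pending[i]))
                service(pending[i]);    /* levels 5 and 7 are serviced    */
        return 0;
    }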

4.3 Operating Systems. Some systems run one infinite process forever. Such systems are dedicated, performing a single function. With other machines, however, processes of finite duration will be run. When such a process terminates, if the machine is not simply stopped, some other software must be present to, at the very least, transfer control to the appropriate next process. In interactive systems, this control software, or operating system, provides the interface to the user, which enables submission of processes for execution as required. Operating systems can provide a number of different types of service, of varying complexity. Here we provide only a brief introduction:

1) The simplest operating systems are found in single user systems which run one process at a time. Such systems have default control over all input/output devices (e.g. terminal, printer, disc) and exceptions. The user is provided with a number of system calls, which furnish facilities like "output message to terminal" or "terminate process" (which returns control to the OS after the user process has finished). Such calls are normally performed by using a TRAP or SWI instruction, which is then intercepted by the system. Since one user cannot interfere with others, such systems will normally permit control to be taken by user processes, so that, for example, a user can write his or her own exception handlers, or take responsibility for directly outputting to a printer or terminal without using a system call. Input and output is often file oriented, with terminals and printers being treated like special disc files. Operating systems without file handling capabilities are used on some low-level systems, mainly for debugging, and are called monitors.

2) When a process is interrupted, it is possible for the OS to return control, not to it, but to another process. The first process is said to have been pre-empted. Its state, however, is saved, so that it can be restarted later. The operating system is responsible for deciding which process gets to run when, and for determining which events (exceptions) cause which changes. This is called scheduling. A system which allows a CPU to be shared by several processes in this way is called multiprogrammed. In a multiprogrammed environment the OS has to be very careful about allocating resources like I/O devices to individual processes. In general it will maintain fairly tight control over such resources.

3) If a multiprogrammed system is interrupted at periodic intervals by a timer, and the processes are arranged in a circular queue which shifts round one place at each interrupt, an illusion can be created that several processes are actually running at the same time. This time-slicing can be used as the basis of a multi-user timeshared system like, for example, UNIX. Note that multiprogramming, and particularly time-sharing, causes some serious headaches in trying to prevent users, who are after all using the same primary memory, from corrupting each other's work or, even worse, from destroying the operating system. To allow workable time-shared systems, it is necessary to be able to protect the areas of memory assigned to different users, and to translate CPU addresses, so that users need not be aware of which blocks have been allocated to them. Address protection and translation comes under the general heading of memory management, and is performed automatically, in hardware, by a subsystem called a memory management unit (MMU).


5 INPUT/OUTPUT

The CPU deals with subsystems which are not randomly addressable via the address bus as external I/O devices. With each I/O device is associated one or more I/O ports, which may be serial or parallel. A parallel port is designed to send or receive parallel data, to or from an external device. To this end, an n-bit parallel port will have a set of n drivers and/or n receivers, which actually transfer data to or from the external device. With each such port is associated an n-bit addressable parallel buffer register, which is emptied (output) or filled (input) by the device. This register: a) protects the I/O device from the data bus, and vice versa; b) acts as a temporary store for outgoing or incoming data. Serial ports are very similar, but have only one driver/receiver and usually have parallel-to-serial and serial-to-parallel shift registers as buffer regs, to handle the required parallel/serial conversion.

Data entering the system is generally transferred to some area of primary memory allocated in advance for this (a memory buffer). Data for output is, likewise, usually held in a memory buffer prior to being sent to the port. In either case there must be a continual transfer of data between the memory buffer and the port buffer register, at a rate compatible with the speed of the I/O device. In simple systems, this transfer is accomplished by the CPU itself, which can access the port buffer register either in its normal address space (memory mapped) or, as in the case of the Z80, in a special I/O address space. For an output device (e.g. terminal, printer), the CPU's responsibility is to keep the buffer register full. It is important, however, that data is not written to the port faster than the I/O device can empty it: an unacceptable situation called overrun. Normally, a handshaking protocol is used between the device and the port, with the port asserting a "data ready" strobe and the device replying with "data accepted". The port will have an addressable status register associated with it, containing a flag which will be set whenever the buffer reg has been emptied. The CPU must not write to the register until this flag is set, or overrun will occur. With an input device (e.g. terminal, keyboard), the situation is similar, but it is the port which is in danger of being overrun, unless the CPU ensures that the buffer register is emptied quickly. A handshaking protocol is again used for device-to-port transfers, and a "buffer full" flag is required in the port status register.

In both cases, it is clear that an active port requires regular service from the CPU. On the other hand, many I/O devices are relatively slow compared to the speed at which the CPU executes instructions. A fast printer, for example, will print only a few hundred characters per second, whereas even a fairly slow microprocessor like the MC6800 will have no difficulty in executing 200,000 instructions in that time. It is thus often very wasteful for a CPU to occupy itself only with an I/O operation. There are two techniques which are commonly used to free the processor for other work.

1) Polling. Where a process is handling its own I/O, it is possible to have the CPU check, in turn, the port status flags of each active port, at regular intervals, to determine if service is required (a C sketch of polled output is given below). When many ports are involved, however, the technique can become very time-consuming in itself. Also, it is the CPU, not the port, which determines when service should occur.

2) Interrupt.
Quite simply, a port requiring service interrupts the CPU. Assuming the interrupt is not masked, the CPU will respond immediately, pausing only to save its state. Note that this interrupt latency time depends strongly on the number of internal registers that must be saved. Interrupt driven I/O is particularly suitable where the OS handles all I/O devices, but it is also useful for a process which contains its own interrupt handlers. In some cases, where several devices share a CPU interrupt line, the CPU may have to poll the relevant ports, on interrupt, to determine which was responsible.

In either case, it is apparent that a port is not a trivially simple subsystem. In addition to buffer and status registers, it will require an internal controller, to conduct handshaking between it and the external device and to determine when CPU interrupts should be asserted. The Motorola MC6821 Peripheral Interface Adapter (PIA) is a common integrated dual 8-bit parallel I/O port. The device has two 8-bit parallel ports, designated A and B, each line of which can be configured for input or output. Each port has its own handshake lines for device handling, and each can independently interrupt the CPU. The MC6850 Asynchronous Communications Interface Adapter (ACIA), in the same family, has similar features for a single serial port. More specialised chips are available for controlling I/O to particularly complex devices (e.g. floppy disc controllers, CRT controllers etc.).
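
Polled output reduces to a loop that tests the port's status flag before each write, which is what prevents overrun. The C sketch below simulates the port with an ordinary struct so that it runs on a hosted system; on real hardware the pointer would address the port's buffer and status registers directly, and the register layout used here is invented rather than that of any real port chip.

    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    /* Invented memory-mapped parallel output port: a buffer register and
     * a status register whose bit 0 means "buffer empty, ready for data". */
    typedef struct {
        volatile uint8_t buffer;
        volatile uint8_t status;
    } port_t;

    #define READY 0x01

    static port_t sim_port = { 0, READY };

    /* Stand-in for the external device: it "prints" the byte and then
     * signals that the buffer register has been emptied again. */
    static void device_consumes(port_t *p)
    {
        putchar(p->buffer);
        p->status |= READY;
    }

    static void polled_write(port_t *p, const char *msg)
    {
        for (size_t i = 0; i < strlen(msg); i++) {
            while ((p->status & READY) == 0)
                ;                            /* busy-wait: avoids overrun  */
            p->status &= (uint8_t)~READY;
            p->buffer = (uint8_t)msg[i];     /* fill the buffer register   */
            device_consumes(p);              /* simulated device handshake */
        }
    }

    int main(void)
    {
        polled_write(&sim_port, "hello from the polled port\n");
        return 0;
    }

With a real device the busy-wait loop is where the CPU time goes, which is exactly the waste that interrupt driven I/O and DMA are intended to remove.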

5.1.1 DMACs and I/O Processors. Some I/O devices can transmit/absorb a large amount of data in a short time (notably discs), and could tie the CPU up completely. Where this is the case, a common solution is to transfer data to or from the main memory without CPU intervention, a process called Direct Memory Access (DMA). DMA is normally handled by a DMA Controller or DMAC, a specialised subsystem which has the ability to supplant the main CPU as system bus master (via some bus arbitration system). DMACs are optimised for data transfer: as they do not fetch instructions, they can move large blocks of data faster than most CPUs. However, DMA is not without its cost. The CPU and a DMAC cannot both use the same bus at the same time, so the CPU, for example, must be held off while a DMAC is conducting a cycle. DMACs use two distinct strategies to deal with shared buses:

1) DMA cycles are interleaved between those of the CPU, but as there is often insufficient space for an extra cycle to be inserted, the CPU will have to wait when necessary. The DMAC is said to cycle steal, i.e. it "steals" cycles which could otherwise have been used by the CPU. Arbitration occurs on every cycle, so if it is not efficient, the technique can result in reduced performance. On the other hand, the CPU is never kept waiting for more than one cycle at a time.

2) In contrast, a DMAC may seize the bus for long enough to transfer a whole block of data. In this burst mode DMA, there is much less arbitration overhead, but the CPU can be paralysed for many cycles.

The technique chosen depends on the I/O device being serviced and on system-specific parameters. Where time consuming device management or complex protocols are required, most medium complexity machines now use I/O processors to control ports. These may be simply dedicated CPUs, with small specialised instruction sets, which fetch instructions from primary memory (shared with the main CPU), such as the IBM channel, or they may be complete microcomputers, with their own private memory and resources, like the Motorola MC68121 Intelligent Peripheral Controller (IPC). Such devices will normally incorporate, or be linked to, a DMAC, and will use DMA to effect data transfers to memory. The 68121 is a single chip which contains an 8-bit microprocessor (similar to the 6800), 128 bytes of RAM, 2K bytes of ROM, a timer, several I/O ports, and an interface to a 68000 bus. A 68000 main CPU can use this interface to read/write the on-chip RAM and use it to pass messages to the IPC.
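
The difference between cycle stealing and burst mode is essentially how long the DMAC keeps the bus once granted. The sketch below models a block transfer from a device buffer into a memory buffer and counts arbitrations under the two policies; the block size and the cycle accounting are purely illustrative.

    #include <stdio.h>

    #define BLOCK 16

    static char device_fifo[BLOCK + 1]   = "data from a disc";
    static char memory_buffer[BLOCK + 1] = { 0 };

    /* One DMA transfer of a block from device to memory buffer.  'burst_mode'
     * decides whether the bus is re-arbitrated after every word (cycle
     * stealing) or held for the whole block (burst mode). */
    static int dma_transfer(int burst_mode)
    {
        int arbitrations = 0;
        int i = 0;
        while (i < BLOCK) {
            arbitrations++;                        /* bus request + grant   */
            do {
                memory_buffer[i] = device_fifo[i]; /* one DMA bus cycle     */
                i++;
            } while (burst_mode && i < BLOCK);     /* burst: keep the bus   */
        }                                          /* cycle steal: release  */
        return arbitrations;
    }

    int main(void)
    {
        printf("cycle steal: %d arbitrations\n", dma_transfer(0));    /* 16 */
        printf("burst mode : %d arbitration(s)\n", dma_transfer(1));  /*  1 */
        printf("memory buffer now holds: %s\n", memory_buffer);
        return 0;
    }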
