<<

Chapter 12

Computer Organization LEVEL 2

APPLICATION LEVEL

HIGH-ORDER LANGUAGE LEVEL

ASSEMBLY LEVEL

OPERATING SYSTEM LEVEL

INSTRUCTION SET ARCHITECTURE LEVEL

MICROCODE LEVEL 2

LOGIC GATE LEVEL

9781284079630_CH12_689_782.indd 689 29/01/16 8:27 am Computer Systems F I F T H E D I T I O N • Data section ‣ Receives data from and sends data to the main memory subsystem and I/O devices • Control section ‣ Issues the control signals to the data section and the other components of the computer system

Copyright © 2017 by Jones & Bartlett Learning, LLC an Ascend Learning Company 692 CHAPTER 12 Computer Organization

T is f nal chapter shows the connection between the combinational and sequential circuits at Level LG1 and the machine at Level ISA3. It describes how hardware devices can be connected at the Mc2 level of abstraction to form black boxes at successively higher levels of abstraction to eventually construct the Pep/9 computer.

12.1 Constructing a Level-ISA3 Machine

FIGURE 12.1 is a block diagram of the Pep/9 computer. It shows the CPU divided into a data section and a control section. T e data section receives data from and sends data to the main memory subsystem and the disk. T e control section issues the control signals to the data section and to the other components of the computer. The CPU Data Section FIGURE 12.2 is the data section of the Pep/9 CPU. T e sequential devices in the f gure are shaded to distinguish them from the combinational devices. T e CPU registers at the top of the f gure are identical to the two-port register bank of Figure 11 . 52 . T e control section, not shown in the f gure, is to the right of the data section. T e control lines coming in from the right come from the control section. T e control lines are rendered as dashed lines in Figure 12 . 1 , but they are solid lines in Figure 12 . 2 . T ere are two kinds of control signals— combinational circuit controls and clock pulses. Names of the clock pulses all end in Ck and all act to clock data into a register or f ip-f op. For example,

FIGUREComputer 12.1 Systems F I F T H E D I T I O N Figure 12.1 Block diagram of the Pep/9 computer.

Main memory

Disk CPU Input Data Control Output section section

Data flow Control

Copyright © 2017 by Jones & Bartlett Learning, LLC an Ascend Learning Company

9781284079630_CH12_689_782.indd 692 29/01/16 8:27 am 12.1 Constructing a Level-ISA3 Machine 693

FIGURE 12.2 The data section of the Pep/9 CPU.

01 8 14 15 22 23 A IR T3 M1 0x00 0x01 LoadCk 23 991 100 16 17 24 25 Figure 12.2 X T4 M2 0x02 0x03

45 11 18 19 26 27 5 C SP T1 T5 M3 0x04 0x08 67 12 13 20 21 28 29 5 B PC T2 T6 M4 0xF0 0xF6 5 30 31 A CPU registers M5 0xFE 0xFF

CBus ABus BBus System MARB MARCk

MARA MDRCk MDR

MDRMux AMux AMux MDRMux

CMux ALU 4 ALU CMux Cin

CSMux CSMux

Cout S SCk Mem C CCk

Addr V VCk

0 AndZ Data 0 0 AndZ Z ZCk 0 Zout

N NCk

MemWrite MemRead

9781284079630_CH12_689_782.indd 693 29/01/16 8:27 am Computer Systems F I F T H E D I T I O N CPU components • 16-bit memory address register (MAR) ‣ 8-bit MARA and 8-bit MARB • 8-bit memory data register (MDR) • 8-bit ‣ AMux, CMux, MDRMux ‣ 0 on control line routes left input ‣ 1 on control line routes right input

Copyright © 2017 by Jones & Bartlett Learning, LLC an Ascend Learning Company Computer Systems F I F T H E D I T I O N Control signals • Originate from the control section on the right (not shown in Figure 12.2) • Two kinds of control signals ‣ Clock signals end in “Ck” to load data into registers with a clock pulse ‣ Signals that do not end in “Ck” set up the data flow before each clock pulse arrives

Copyright © 2017 by Jones & Bartlett Learning, LLC an Ascend Learning Company 12.1 Constructing a Level-ISA3 Machine 693

Figure 12.2 FIGURE 12.2Computer Systems F I F T H E D I T I O N (expanded) The data section of the Pep/9 CPU.

01 8 14 15 22 23 A IR T3 M1 0x00 0x01 LoadCk 23 991 100 16 17 24 25 X T4 M2 0x02 0x03

45 11 18 19 26 27 5 C SP T1 T5 M3 0x04 0x08 67 12 13 20 21 28 29 5 B PC T2 T6 M4 0xF0 0xF6 5 30 31 A CPU registers M5 0xFE 0xFF

CBus ABus BBus System bus MARB MARCk

MARA MDRCk MDR

Copyright © 2017 by Jones & Bartlett Learning, LLC an Ascend Learning Company MDRMux AMux AMux MDRMux

CMux ALU 4 ALU CMux Cin

CSMux CSMux

Cout S SCk Mem C CCk

Addr V VCk

0 AndZ Data 0 0 AndZ Z ZCk 0 Zout

N NCk

MemWrite MemRead

9781284079630_CH12_689_782.indd 693 29/01/16 8:27 am 12.1 Constructing a Level-ISA3 Machine 693

FIGURE 12.2 The data section of the Pep/9 CPU.

01 8 14 15 22 23 A IR T3 M1 0x00 0x01 LoadCk 23 991 100 16 17 24 25 X T4 M2 0x02 0x03

45 11 18 19 26 27 5 C SP T1 T5 M3 0x04 0x08 67 12 13 20 21 28 29 5 B PC T2 T6 M4 0xF0 0xF6 5 30 31 A CPU registers M5 0xFE 0xFF

CBus ABus BBus System bus MARB MARCk

MARA MDRCk MDR

MDRMux AMux AMux MDRMux

CMux ALU 4 ALU CMux Cin

CSMux CSMux

Cout S SCk Mem C CCk

Addr V VCk

0 AndZ Data 0 0 AndZ Z ZCk 0 Zout

N NCk

MemWrite MemRead

9781284079630_CH12_689_782.indd 693 29/01/16 8:27 am Computer Systems F I F T H E D I T I O N The status bits NZVC • Each status bit is a one-bit D flip-flop • Each status bit is available to the control section • The status bits can be sent as the low-order nybble to the left input of CMux and from there to the register bank

Copyright © 2017 by Jones & Bartlett Learning, LLC an Ascend Learning Company Computer Systems F I F T H E D I T I O N The shadow carry bit S • Invisible at level ISA3 • Visible at level Mc2 • Used for internal arithmetic operations that should not affect the C bit • Example: PC ← PC + 1 • Selected by CSMux = 1

Copyright © 2017 by Jones & Bartlett Learning, LLC an Ascend Learning Company Computer Systems F I F T H E D I T I O N Setting the status bits • C can be set directly from ALU Cout, and can be the Cin input to the ALU • V and N can be set directly from ALU • Z can be set in one of two ways ‣ If AndZ control signal is 0, Z is set directly from ALU Zout ‣ If AndZ control signal is 1, Z is set as the AND of ALU Zout and Z

Copyright © 2017 by Jones & Bartlett Learning, LLC an Ascend Learning Company 696 CHAPTER 12 Computer Organization

Computer FIGURE 12. 3 Systems F I F T H E D I T I O N Figure 12.3 The truth table for the AndZ combinational circuit in Figure 12.2.

Input AndZ Z Zout Output 0 0 0 0 0 0 1 1 0 1 0 0 0 1 1 1 1 0 0 0 1 0 1 0 1 1 0 0 1 1 1 1

through CMux, and f nally back to the bank of 32 registers via CBus. Data from main memory can be injected into the loopCopyright © 2017from by Jones & Bartlettthe Learning, system LLC an Ascend Learning bus, Company through the MDRMux to the MDR. From there, it can go through the AMux, the ALU, and the CMux to any of the CPU registers. To send the content of a CPU register to memory, you can pass it through the ALU via the ABus and AMux, through the CMux and MDRMux into the MDR. From there it can go to the memory subsystem over the system bus. T e control section has 34 control output lines and 12 input lines. T e 34 output lines control the f ow of data around the data section loop and specify the processing that is to occur along the way. T e 12 input lines come from the 8 lines of BBus plus 4 lines from the status bits, which the control section can test for certain conditions. Later in this chapter is a description of how the control section generates the proper control signals. In the following description of the data section, assume for now that you can set the control lines to any desired values for any cycle. The von Neumann Cycle T e heart of the Pep/9 computer is the von Neumann cycle. T e data section in Figure 12 . 2 implements the von Neumann cycle. It really is nothing more than plumbing. In the same way that water in your house runs through the pipes controlled by various faucets and valves, signals (electrons, literally) f ow through the wires of the buses controlled by various multiplexers. Along the way, the signals can f ow through the ALU, where they can be processed as required. T is section shows the control signals necessary to implement

9781284079630_CH12_689_782.indd 696 29/01/16 8:27 am Computer Systems F I F T H E D I T I O N von Neumann cycle • The cycle at level Asmb5 ‣ Fetch instruction at Mem[PC] ‣ Decode ‣ Increment PC ‣ Execute ‣ Repeat

Copyright © 2017 by Jones & Bartlett Learning, LLC an Ascend Learning Company Computer Systems F I F T H E D I T I O N Figure 4.31

Load the machine language program Initialize PC and SP do { Fetch the next instruction Decode the instruction specifier Increment PC Execute the instruction fetched } while (the stop instruction does not execute)

Copyright © 2017 by Jones & Bartlett Learning, LLC an Ascend Learning Company Computer Systems F I F T H E D I T I O N von Neumann cycle • The cycle at level LG1 ‣ The instruction could be unary or nonunary ‣ If nonunary, the instruction specifier must be fetched one byte at a time because the Pep/8 data bus is eight bits wide

Copyright © 2017 by Jones & Bartlett Learning, LLC an Ascend Learning Company Computer Systems F I F T H E D I T I O N Figure 12.4 do { Fetch the instruction specifier at address in PC PC ← PC + 1 Decode the instruction specifier if (the instruction is not unary) { Fetch the high-order byte of the operand specifier at address in PC PC ← PC + 1 Fetch the low-order byte of the operand specifier at address in PC PC ← PC + 1 } Execute the instruction fetched } while ((the stop instruction does not execute) && (the instruction is legal))

Copyright © 2017 by Jones & Bartlett Learning, LLC an Ascend Learning Company Computer Systems F I F T H E D I T I O N Control sequences • Each line is a clock cycle • Comma is the parallel separator • Semicolon is the sequential separator • Control signals before the semicolon are combinational signals set for the duration of the cycle • Control signals after the the semicolon are clock pulses at the end of the cycle

Copyright © 2017 by Jones & Bartlett Learning, LLC an Ascend Learning Company Computer Systems F I F T H E D I T I O N Control signals for the von Neumann cycle • To fetch from memory ‣ Put the address in the MAR (MARA and MARB) ‣ Assert MemRead for three consecutive cycles ‣ At the end of the third cycle, clock the data from the bus into the MDR

Copyright © 2017 by Jones & Bartlett Learning, LLC an Ascend Learning Company 12.1 Constructing a Level-ISA3 Machine 693

FIGURE 12.2 The data section of the Pep/9 CPU.

01 8 14 15 22 23 A IR T3 M1 0x00 0x01 LoadCk 23 991 100 16 17 24 25 Figure 12.2 X T4 M2 0x02 0x03

45 11 18 19 26 27 5 C SP T1 T5 M3 0x04 0x08 67 12 13 20 21 28 29 5 B PC T2 T6 M4 0xF0 0xF6 5 30 31 A CPU registers M5 0xFE 0xFF

CBus ABus BBus System bus MARB MARCk

MARA MDRCk MDR

MDRMux AMux AMux MDRMux

CMux ALU 4 ALU CMux Cin

CSMux CSMux

Cout S SCk Mem C CCk

Addr V VCk

0 AndZ Data 0 0 AndZ Z ZCk 0 Zout

N NCk

MemWrite MemRead

9781284079630_CH12_689_782.indd 693 29/01/16 8:27 am Computer Systems F I F T H E D I T I O N Figure 12.5

// Fetch the instruction specifier and increment PC by 1

UnitPre: IR=0x000000, PC=0x00FF, Mem[0x00FF]=0xAB, S=0 UnitPost: IR=0xAB0000, PC=0x0100

// MAR <- PC. 1. A=6, B=7; MARCk // Fetch instruction specifier. 2. MemRead 3. MemRead 4. MemRead, MDRMux=0; MDRCk // IR <- instruction specifier. 5. AMux=0, ALU=0, CMux=1, C=8; LoadCk

// PC <- PC plus 1, low-order byte first. 6. A=7, B=23, AMux=1, ALU=1, CMux=1, C=7; SCk, LoadCk 7. A=6, B=22, AMux=1, CSMux=1, ALU=2, CMux=1, C=6; LoadCk

Copyright © 2017 by Jones & Bartlett Learning, LLC an Ascend Learning Company Computer Systems12.1 ConstructingF I F T a H Level-ISA3 E D I T I O NMachine Figure 12.6 699

FIGURE 12.6 Timing diagram of the fi rst fi ve cycles of Figure 12.5.

T Cycle 1 Cycle 2 Cycle 3 Cycle 4 Cycle 5 LoadCk

MARCk

MDRCk

MemRead A=6 MemRead MemRead MemRead AMux=0 B=7 MDRMux=0 ALU=0 MARCk MDRCk CMux=1 C=8 LoadCk

register bank and for the content of the PC to be placed on ABus and BBus. T e period T is long enough to allow those signals to be set up and present Copyright © 2017 by Jones & Bartlett Learning, LLC an Ascend Learning Company at the inputs of MARA and MARB before the MARCk clock pulse occurs at the end of cycle 1 shown in Figure 12 . 6 . Af er the clock pulse at the end of cycle 1, the content of PC is in MARA and MARB. At the beginning of cycle 2, the address signals from MARA and Transferring a byte from MARB begin to propagate over the system bus to the memory subsystem. memory to the MDR Now the MemRead signal activates the OE line on the chip in the memory subsystem that is selected by the address decoding circuitry. T ere are so many propagation delays in the memory subsystem that it usually takes many CPU cycles before the data is available on the main system bus. T e Pep/9 computer models this fact by requiring MemRead on three consecutive cycles af er the address is clocked into the MAR. So the MemRead signal is asserted high in cycles 2, 3, and 4. At the end of cycle 4, because the same address was present in the memory address register (MAR) for three consecutive cycles and MemRead was asserted during those three cycles, the data from that address in memory is present on the system bus. In cycle 4, MDRMux=0 sets the MDR line to 0, which routes the data from the system bus through the multiplexer to the input of the memory data register (MDR). T e MDRCk pulse on cycle 4 clocks the instruction specif er into the MDR. Cycle 5 sends the instruction specif er from the MDR into the Sending data from the instruction register, IR, as follows. First, AMux=0 sets the AMux control line MDR to the register bank to 0, which routes the MDR to the output of the AMux multiplexer. Next, ALU=0 sets the ALU control line to 0, which passes the data through the

9781284079630_CH12_689_782.indd 699 29/01/16 8:27 am Computer Systems F I F T H E D I T I O N Combining cycles • Cycle 1 puts copy of PC in MAR • During cycles 2 and 3, can increment PC without disturbing MAR • Combine cycle 6 with cycle 2 • Combine cycle 7 with cycle 3 • Saved 2 cycles out of original 7 • Savings in time, 2/7 = 29%

Copyright © 2017 by Jones & Bartlett Learning, LLC an Ascend Learning Company Computer Systems F I F T H E D I T I O N Figure 12.7

// Fetch the instruction specifier and increment PC by 1

UnitPre: IR=0x000000, PC=0x00FF, Mem[0x00FF]=0xAB, S=0 UnitPost: IR=0xAB0000, PC=0x0100

// MAR <- PC. 1. A=6, B=7; MARCk // Fetch instruction specifier, PC <- PC + 1. 2. MemRead, A=7, B=23, AMux=1, ALU=1, CMux=1, C=7; SCk, LoadCk 3. MemRead, A=6, B=22, AMux=1, CSMux=1, ALU=2, CMux=1, C=6; LoadCk 4. MemRead, MDRMux=0; MDRCk // IR <- instruction specifier. 5. AMux=0, ALU=0, CMux=1, C=8; LoadCk

Copyright © 2017 by Jones & Bartlett Learning, LLC an Ascend Learning Company 702 CHAPTER 12 Computer Organization

of the MDR, and cycle 5 uses the content of the MDR. T erefore, cycle 5 must happen af er cycle 4. Hardware concurrency is an important issue in computer organization. Designers are always on the alert to use hardware concurrency to improve performance. T e seven-cycle sequence of Figure 12 . 5 would certainly not be used in a real machine, because combining cycles in Figure 12 . 7 gives a performance boost with no increase in circuitry. Although the details of the control section are not shown, you can imagine how it would test the instruction just fetched to determine whether it is unary. T e control section would set B to 8 to put the instruction specif er on BBus, which it could then test. If the fetched instruction is not unary, the control section must fetch the operand specif er, incrementing PC accordingly. T e control sequence to fetch the operand specif er and increment PC is a problem at the end of this chapter. A f er fetching an instruction, the control section tests the instruction specif er to determine which of the Pep/9 ISA3 instructions to execute. T e control signals to execute the instruction depend not only on the opcode, but on the register-r f eld and the addressing-aaa f eld also. FIGURE 12.8 shows the relationship between the operand and the operand specif er (OprndSpec) for each . A quantity in square brackets is a memory address. To execute the instruction, the control section must provide control signals to the data

FIGURE 12 . 8 F I F T H E D I T I O N Figure 12.8 Computer The addressing Systems modes for the Pep/9 computer.

Addressing mode Operand

Immediate OprndSpec

Direct Mem[OprndSpec]

Indirect Mem[Mem[OprndSpec]]

Stack-relative Mem[SP + OprndSpec]

Stack-relative deferred Mem[Mem[SP + OprndSpec]]

Indexed Mem[OprndSpec + X]

Stack-indexed Mem[SP + OprndSpec + X]

Stack-deferred indexed Mem[Mem[SP + OprndSpec] + X]

Copyright © 2017 by Jones & Bartlett Learning, LLC an Ascend Learning Company

9781284079630_CH12_689_782.indd 702 29/01/16 8:27 am Computer Systems F I F T H E D I T I O N Store byte • Assume operand specifier has been fetched and contains the address of the operand • To write byte to memory ‣ Put the address in the MAR and the data to to be written in the MDR ‣ Assert MemWrite for three consecutive cycles

byte Oprnd r 8..15 h i

Copyright © 2017 by Jones & Bartlett Learning, LLC an Ascend Learning Company 12.1 Constructing a Level-ISA3 Machine 693

FIGURE 12.2 The data section of the Pep/9 CPU.

01 8 14 15 22 23 A IR T3 M1 0x00 0x01 LoadCk 23 991 100 16 17 24 25 Figure 12.2 X T4 M2 0x02 0x03

45 11 18 19 26 27 5 C SP T1 T5 M3 0x04 0x08 67 12 13 20 21 28 29 5 B PC T2 T6 M4 0xF0 0xF6 5 30 31 A CPU registers M5 0xFE 0xFF

CBus ABus BBus System bus MARB MARCk

MARA MDRCk MDR

MDRMux AMux AMux MDRMux

CMux ALU 4 ALU CMux Cin

CSMux CSMux

Cout S SCk Mem C CCk

Addr V VCk

0 AndZ Data 0 0 AndZ Z ZCk 0 Zout

N NCk

MemWrite MemRead

9781284079630_CH12_689_782.indd 693 29/01/16 8:27 am Computer Systems F I F T H E D I T I O N Figure 12.9

// STBA there,d // RTL: byteOprnd <- A<8..15> // Direct addressing: Oprnd = Mem[OprndSpec]

UnitPre: IR=0xF1000F, A=0x00AB UnitPost: Mem[0x000F]=0xAB

// MAR <- OprndSpec. 1. A=9, B=10; MARCk // Initiate write, MBR <- A. 2. MemWrite, A=1, AMux=1, ALU=0, CMux=1, MDRMux=1; MDRCk 3. MemWrite 4. MemWrite

Copyright © 2017 by Jones & Bartlett Learning, LLC an Ascend Learning Company Computer Systems F I F T H E D I T I O N Memory read protocol • Address in MAR before first MemRead cycle • Data in MDR on or before third MemRead cycle • Cannot change MAR on third MemRead cycle

Copyright © 2017 by Jones & Bartlett Learning, LLC an Ascend Learning Company Computer Systems F I F T H E D I T I O N Memory write protocol • Address in MAR before first MemWrite cycle • Data in MDR on or before second MemWrite cycle • Can put next data in MDR on third MemWrite cycle • Cannot change MAR on third MemWrite cycle

Copyright © 2017 by Jones & Bartlett Learning, LLC an Ascend Learning Company Pep/9 RTL specification of the instruction set

Instruction Register transfer language specification STOP Stop execution RET PC Mem[SP] ; SP SP + 2 ← ← RETTR NZVC Mem[SP] 4..7 ; A Mem[SP + 1] ; X Mem[SP + 3] ; PC Mem[SP + 5] ; SP Mem[SP + 7] ← ⟨ ⟩ ← ← ← ← MOVSPA A SP ← MOVFLGA A 8..11 0 , A 12..15 NZVC ⟨ ⟩← ⟨ ⟩← MOVAFLG NZVC A 12..15 ← ⟨ ⟩ NOTr r r ; N r < 0 , Z r = 0 ← ¬ ← ← NEGr r r ; N r < 0 , Z r = 0 , V overflow ←− ← ← ← { } ASLr C r 0 , r 0..14 r 1..15 , r 15 0 ; N r < 0 , Z r = 0 , V overflow ← ⟨ ⟩ ⟨ ⟩← ⟨ ⟩ ⟨ ⟩← ← ← ← { } ASRr C r 15 , r 1..15 r 0..14 ; N r < 0 , Z r = 0 ← ⟨ ⟩ ⟨ ⟩← ⟨ ⟩ ← ← ROLr C r 0 , r 0..14 r 1..15 , r 15 C ← ⟨ ⟩ ⟨ ⟩← ⟨ ⟩ ⟨ ⟩← RORr C r 15 , r 1..15 r 0..14 , r 0 C ← ⟨ ⟩ ⟨ ⟩← ⟨ ⟩ ⟨ ⟩← BR PC Oprnd ← BRLE N = 1 Z = 1 PC Oprnd ∨ ⇒ ← BRLT N = 1 PC Oprnd ⇒ ← BREQ Z = 1 PC Oprnd ⇒ ← BRNE Z = 0 PC Oprnd ⇒ ← BRGE N = 0 PC Oprnd ⇒ ← BRGT N = 0 Z = 0 PC Oprnd ∧ ⇒ ← BRV V = 1 PC Oprnd ⇒ ← BRC C = 1 PC Oprnd ⇒ ← CALL SP SP 2;Mem[SP] PC ; PC Oprnd ← − ← ← NOPn Trap: Unary no operation NOP Trap: Nonunary no operation

DECI Trap: Oprnd decimal input ← { } DECO Trap: decimal output Oprnd { } ← HEXO Trap: hexadecimal output Oprnd { } ← STRO Trap: string output Oprnd { } ← ADDSP SP SP + Oprnd ← SUBSP SP SP Oprnd Computer Systems F I ←F T H E D I T− I O N ADDr r r + Oprnd ; N r < 0 , Z r = 0 , V overflow , C carry Store← word ← ← ← { } ← { } SUBr r r Oprnd ; N r < 0 , Z r = 0 , V overflow , C carry ← − ← ← ← { } ← { } •ANDrAssume operandr specifierr Oprnd has been; N fetchedr < 0 , Z r = 0 ← ∧ ← ← ORrand contains ther addressr Oprnd of the ;operandN r < 0 , Z r = 0 • To write word to← memory∨ ← ← CPWr‣ Write first byteT tor memoryOprnd ; N T < 0 , Z T = 0 , V overflow , C carry ; N N V ← − ← ← ← { } ← { } ← ⊕ CPBr‣ Increment OprndSpecT r 8.. by15 1 byte Oprnd ; N T < 0 , Z T = 0 , V 0 , C 0 ← ⟨ ⟩− ← ← ← ← LDWr r Oprnd ; N r < 0 , Z r = 0 ‣ Write second ←byte to memory← ← LDBr r 8..15 byte Oprnd ; N 0 , Z r 8..15 = 0 ⟨ ⟩← ← ← ⟨ ⟩ STWr Oprnd r ← STBr byte Oprnd r 8..15 Copyright← © 2017⟨ by Jones & Bartlett Learning,⟩ LLC an Ascend Learning Company Trap T Mem[FFF6] ; Mem[T 1] IR 0..7 ; Mem[T 3] SP ; Mem[T 5] PC ; Mem[T 7] X ; ← − ← ⟨ ⟩ − ← − ← − ← Mem[T 9] A ; Mem[T 10] 4..7 NZVC ; SP T 10 ; PC Mem[FFFE] − ← − ⟨ ⟩← ← − ←

2 12.1 Constructing a Level-ISA3 Machine 693

FIGURE 12.2 The data section of the Pep/9 CPU.

01 8 14 15 22 23 A IR T3 M1 0x00 0x01 LoadCk 23 991 100 16 17 24 25 Figure 12.2 X T4 M2 0x02 0x03

45 11 18 19 26 27 5 C SP T1 T5 M3 0x04 0x08 67 12 13 20 21 28 29 5 B PC T2 T6 M4 0xF0 0xF6 5 30 31 A CPU registers M5 0xFE 0xFF

CBus ABus BBus System bus MARB MARCk

MARA MDRCk MDR

MDRMux AMux AMux MDRMux

CMux ALU 4 ALU CMux Cin

CSMux CSMux

Cout S SCk Mem C CCk

Addr V VCk

0 AndZ Data 0 0 AndZ Z ZCk 0 Zout

N NCk

MemWrite MemRead

9781284079630_CH12_689_782.indd 693 29/01/16 8:27 am Computer Systems F I F T H E D I T I O N Figure 12.10

// STWA there,d // RTL: Oprnd <- A // Direct addressing: Oprnd = Mem[OprndSpec]

UnitPre: IR=0xE100FF, A=0xABCD, S=0 UnitPost: Mem[0x00FF]=0xABCD

// UnitPre: IR=0xE101FE, A=0xABCD, S=1 // UnitPost: Mem[0x01FE]=0xABCD

// MAR <- OprndSpec. 1. A=9, B=10; MARCk // Initiate write, MDR <- A. 2. MemWrite, A=0, AMux=1, ALU=0, CMux=1, MDRMux=1; MDRCk // Continue write, T2 <- OprndSpec + 1. 3. MemWrite, A=10, B=23, AMux=1, ALU=1, CMux=1, C=13; SCk, LoadCk 4. MemWrite, A=9, B=22, AMux=1, CSMux=1, ALU=2, CMux=1, C=12; LoadCk

// MAR <- T2. 5. A=12, B=13; MARCk // Initiate write, MDR <- A. 6. MemWrite, A=1, AMux=1, ALU=0, CMux=1, MDRMux=1; MDRCk 7. MemWrite 8. MemWrite

Copyright © 2017 by Jones & Bartlett Learning, LLC an Ascend Learning Company Computer Systems F I F T H E D I T I O N Add accumulator • Assume immediate addressing • The number to be added is in the operand specifier

r r + Oprnd ; N r < 0 , Z r = 0 ,

V overflow , C carry { } { }

Copyright © 2017 by Jones & Bartlett Learning, LLC an Ascend Learning Company 12.1 Constructing a Level-ISA3 Machine 693

FIGURE 12.2 The data section of the Pep/9 CPU.

01 8 14 15 22 23 A IR T3 M1 0x00 0x01 LoadCk 23 991 100 16 17 24 25 Figure 12.2 X T4 M2 0x02 0x03

45 11 18 19 26 27 5 C SP T1 T5 M3 0x04 0x08 67 12 13 20 21 28 29 5 B PC T2 T6 M4 0xF0 0xF6 5 30 31 A CPU registers M5 0xFE 0xFF

CBus ABus BBus System bus MARB MARCk

MARA MDRCk MDR

MDRMux AMux AMux MDRMux

CMux ALU 4 ALU CMux Cin

CSMux CSMux

Cout S SCk Mem C CCk

Addr V VCk

0 AndZ Data 0 0 AndZ Z ZCk 0 Zout

N NCk

MemWrite MemRead

9781284079630_CH12_689_782.indd 693 29/01/16 8:27 am Computer Systems F I F T H E D I T I O N Figure 12.11

// ADDA this,i // RTL: A <- A + Oprnd; N <- A<0, Z <- A=0, V <- {overflow}, C <- {carry} // Immediate addressing: Oprnd = OprndSpec

UnitPre: IR=0x700FF0, A=0x0F11, N=1, Z=1, V=1, C=1, S=0 UnitPost: A=0x1F01, N=0, Z=0, V=0, C=0

// UnitPre: IR=0x707FF0, A=0x0F11, N=0, Z=1, V=0, C=1, S=0 // UnitPost: A=0x8F01, N=1, Z=0, V=1, C=0

// UnitPre: IR=0x70FF00, A=0xFFAB, N=0, Z=1, V=1, C=0, S=1 // UnitPost: A=0xFEAB, N=1, Z=0, V=0, C=1

// UnitPre: IR=0x70FF00, A=0x0100, N=1, Z=0, V=1, C=0, S=1 // UnitPost: A=0x0000, N=0, Z=1, V=0, C=1

// A <- A + Oprnd, Save shadow carry. 1. A=1, B=10, AMux=1, ALU=1, AndZ=0, CMux=1, C=1; ZCk, SCk, LoadCk // A <- A plus Oprnd plus saved carry. 2. A=0, B=9, AMux=1, CSMux=1, ALU=2, AndZ=1, CMux=1, C=0; NCk, ZCk, VCk, CCk, LoadCk

Copyright © 2017 by Jones & Bartlett Learning, LLC an Ascend Learning Company Computer Systems F I F T H E D I T I O N Load index register • Assume indirect addressing • The operand specifier contains the address of the address of the operand

r Oprnd ; N r < 0 , Z r = 0

Copyright © 2017 by Jones & Bartlett Learning, LLC an Ascend Learning Company 12.1 Constructing a Level-ISA3 Machine 711

FIGURE 12.13 ComputerThe result Systems of executing the controlF I F T H sequence E D I T I Oof N Figure 12.12. Figure 12.13 Mem CPU 8 IR CA

9 10 0012 26 OprndSpec 00 12 0013 D1 12 13 T2 00 13

26D1 53 Cycles 2–3 26D2 AC 14 15 T3 26 D1

Cycles 6–9 Cycles 1–5 16 17 T4 26 D2

Cycles 11–12 2 3 X 53 AC

Cycles 15–18 Cycles 10–14

Copyright © 2017 by Jones & Bartlett Learning, LLC an Ascend Learning Company

specif er, and cycle 3 takes care of a possible carry out from the low-order addition to the high-order addition. Cycle 5 puts the computed value of T2 into the MAR over the ABus and BBus at the same time that it puts the MDR into T3 over the CBus. Cycles 6–9 use the address computed in T2 to fetch the low-order byte Transferring Mem[T2] to of the address of the operand. Cycle 6 initiates the memory read, cycle 7 T3 continues the memory read, and cycle 8 completes the read and latches the byte from memory into the MDR. Cycle 9 routes the byte from the MDR through AMux and into the low-order byte of T3. At this point, we can assert that temporary register T3 contains the Transferring Mem[T3] to address of the operand, 26D1 (hex) in this example. Finally, we can get the X and adding 1 to f rst byte of the operand and load it into the f rst byte of the index register. T3 storing in T4 Cycles 10–14 perform that load. T e RTL specif cation for LDWr shows that the instruction af ects the N and Z bits. T e N bit is determined by the sign bit of the most signif cant byte. Consequently, cycle 14 includes the clock

9781284079630_CH12_689_782.indd 711 29/01/16 8:27 am Computer Systems F I F T H E D I T I O N Figure 12.12

// Figure 12.12 // LDWX this,n // RTL: X <- Oprnd; N <- X<0, Z <- X=0 // Indirect addressing: Oprnd = Mem[Mem[OprndSpec]]

UnitPre: IR=0xCA0012, Mem[0x0012]=0x26D1, Mem[0x26D1]=0x53AC UnitPre: N=1, Z=1, V=0, C=1, S=1 UnitPost: X=0x53AC, N=0, Z=0, V=0, C=1

// UnitPre: IR=0xCA0012, X=0xEEEE, Mem[0x0012]=0x00FF, Mem[0x00FF]=0x0000 // UnitPre: N=1, Z=0, V=1, C=0, S=1 // UnitPost: X=0x0000, N=0, Z=1, V=1, C=0

// T3 <- Mem[OprndSpec], T2 <- OprndSpec + 1. 1. A=9, B=10; MARCk 2. MemRead, A=10, B=23, AMux=1, ALU=1, CMux=1, C=13; SCk, LoadCk 3. MemRead, A=9, B=22, AMux=1, CSMux=1, ALU=2, CMux=1, C=12; LoadCk 4. MemRead, MDRMux=0; MDRCk 5. A=12, B=13, AMux=0, ALU=0, CMux=1, C=14; MARCk, LoadCk

Copyright © 2017 by Jones & Bartlett Learning, LLC an Ascend Learning Company Figure 12.12 Computer Systems F I F T H E D I T I O N (continued)

// T3 <- Mem[T2]. 6. MemRead 7. MemRead 8. MemRead, MDRMux=0; MDRCk 9. AMux=0, ALU=0, CMux=1, C=15; LoadCk

// Assert: T3 contains the address of the operand. // X <- Mem[T3], T4 <- T3 + 1. 10. A=14, B=15; MARCk 11. MemRead, A=15, B=23, AMux=1, ALU=1, CMux=1, C=17; SCk, LoadCk 12. MemRead, A=14, B=22, AMux=1, CSMux=1, ALU=2, CMux=1, C=16; LoadCk 13. MemRead, MDRMux=0; MDRCk 14. A=16, B=17, AMux=0, ALU=0, AndZ=0, CMux=1, C=2; NCk, ZCk, MARCk, LoadCk

// X <- Mem[T4]. 15. MemRead 16. MemRead 17. MemRead, MDRMux=0; MDRCk 18. AMux=0, ALU=0, AndZ=1, CMux=1, C=3; ZCk, LoadCk

Copyright © 2017 by Jones & Bartlett Learning, LLC an Ascend Learning Company Computer Systems F I F T H E D I T I O N Arithmetic shift right accumulator • Unary instruction • No memory read or write

C r 15 , r 1..15 r 0..14 ; h i h i h i N r < 0 , Z r = 0

Copyright © 2017 by Jones & Bartlett Learning, LLC an Ascend Learning Company 10.4 Combinational Devices 625

of the four-input OR gate in Figure 10 . 59 acts as the enable signal, which allows all 10 outputs of the unit to pass through when one of d, e, f, or g is 1. In the same way that a 16-bit addition can be composed of an 8-bit addition of the low-order bytes followed by an 8-bit addition of the high- order bytes, the shif and rotate operations can be composed of two 8-bit operations. FIGURE 10.62 shows the specif cation of the shif and rotate operations for the Pep/9 ALU of Figure 10 . 55 . Although it is not shown in Figure 10 . 62 , the V bit is set by both the ROL (rotate lef ) and ASL (arithmetic shif lef ) operations. Figures 10.62(a) and (b) show that a 16-bit arithmetic shif right is composed f rst of an 8-bit arithmetic shif right of the high-order byte. T at shif duplicates the sign bit and shif s its low-order bit into Cout. Following that operation, a rotate right of the low-order byte takes the Cout from the high-order shif as its Cin and rotates its low-order bit into Cout. Figures 10.62(c) and (d) show that a 16-bit arithmetic shif lef starts with an arithmetic shif lef of the low-order byte in part (d). Cout gets the most signif cant bit, which is used as Cin for the subsequent rotate lef of F I F T H E D I T I O N Figure 10.62 Computerthe high-order byte in partSystems (c). T e second step shows why the ALU sets the V bit on the ROL operation. Figure 5 . 2 shows that the ROLr instruction at Level Asmb5 does not af ect the V bit. However, at this lower level of

FIGURE 10 . 62 Specifi cation of the shift and rotate operations.

Cin

Cout Cout (a) Arithmetic shift right (ASR). (b) Rotate right (ROR).

Cin

0

Cout Cout

(c) Rotate left (ROL). (d) ArithmeticCopyright shift © left 2017 (ASL). by Jones & Bartlett Learning, LLC an Ascend Learning Company

9781284079630_CH10_575_636.indd 625 29/01/16 8:28 am Computer Systems F I F T H E D I T I O N Figure 12.14

// ASRA // RTL: C <- A<15>, A<1..15> <- A<0..14>; N <- A<0, Z <- A=0

UnitPre: IR=0x0C0000, A=0xFF01, N=1, Z=1, V=1, C=0, S=0 UnitPost: A=0xFF80, N=1, Z=0, V=1, C=1

// UnitPre: IR=0x0C0000, A=0x7E00, N=1, Z=1, V=0, C=0, S=1 // UnitPost: A=0x3F00, N=0, Z=0, V=0, C=0

// UnitPre: IR=0x0C0000, A=0x0001, N=1, Z=1, V=0, C=0, S=1 // UnitPost: A=0x0000, N=0, Z=1, V=0, C=1

// Arithmetic shift right of high-order byte. 1. A=0, AMux=1, ALU=13, AndZ=0, CMux=1, C=0; NCk, ZCk, SCk, LoadCk // Rotate right of low-order byte. 2. A=1, AMux=1, CSMux=1, ALU=14, AndZ=1, CMux=1, C=1; ZCk, CCk, LoadCk

Copyright © 2017 by Jones & Bartlett Learning, LLC an Ascend Learning Company Computer Systems F I F T H E D I T I O N

The Pep/9 control section

Copyright © 2017 by Jones & Bartlett Learning, LLC an Ascend Learning Company Computer Systems F I F T H E D I T I O N Level Mc2 for Pep/9 • uMem: Microcode ROM memory • uPC: Microcode program • uIR: Microcode instruction register • Micro-von Neumann cycle ‣ No increment part of the cycle ‣ Each micro-instruction contains the address of the next instruction

Copyright © 2017 by Jones & Bartlett Learning, LLC an Ascend Learning Company 714 CHAPTER 12 Computer Organization

abstraction that lies between ISA3 and LG1. Like all levels of abstraction, this level has its own language consisting of a set of microprogramming statements. T e control section is its own micromachine with its own uMem, uPC, and uIR micromemory, uMem; its own microprogram counter, uPC; and its own microinstruction register, uIR. Unlike the machine at Level ISA3, the machine at Level Mc2 has only one program that is burned into uMem ROM. Once the chip is manufactured, the microprogram can never be changed. T e program contains a single loop whose sole purpose is to implement the ISA3 von Neumann cycle. FIGURE 12.15 shows the control section of Pep/9 implemented in microcode at Level Mc2. T e data section of Figure 12 . 2 has 34 control lines

FIGURE 12.15 F I F T H E D I T I O N Figure 12.15 ComputerA microcode implementation Systems of the control section of the Pep/9 CPU.

N Z V 12 Decoder/ k branch unit C BBus

uPC LoadCk C 5 k 5 B uMem 5 address A 0 MARCk 1 MDRCk 2 AMux MDRMux 2 k × n bit CMux uMem 4 ROM ALU CSMux SCk 2 k 1 CCk VCk n AndZ ZCk uIR NCk n 34 MemWrite MemRead 34 Copyright © 2017 by Jones & Bartlett Learning, LLC an Ascend Learning Company

9781284079630_CH12_689_782.indd 714 29/01/16 8:27 am 12.1 Constructing a Level-ISA3 Machine 715

coming from the right to control the data f ow. T ese correspond to the 34 control lines going to the lef from Figure 12 . 15 . In the data section of Figure 12 . 2 , there are a total of 12 data lines going to the right to the control section—8 from the BBus and 4 from the status bits. T ese correspond to the 12 data lines coming from the right in Figure 12 . 15 . T e microprogram counter contains the address of the next microin- struction to execute. uPC is k bits wide, so that it can point to any of the 2k instructions in uMem. A microinstruction is n bits wide, so that is the width of each cell in uMem and also the width of uIR. At Level Mc2, there is no reason to require that the width of a microinstruction be an even power of 2. n can be any oddball value that you want to make it. T is f exibility is due to the fact that uMem contains only instructions with no data. Because instruc- tions and data are not commingled, there is no need to require the memory cell size to accommodate both. FIGURE 12.16 shows the instruction format of a microinstruction. T e rightmost 34 bits are the control signals to send to the data section. T e remaining f eld consists of two parts—a Branch f eld and an Addr f eld. T e at Level ISA3 is incremented because the normal f ow of control is to have the instructions stored and executed sequentially in main memory. T e only deviation from this state of af airs is when PC changes due to a branch instruction. uPC at Level Mc2, however, is not incremented. Instead, every microinstruction contains within it information to compute the address of the next microinstruction. T e Branch f eld specif es how to compute the address of the next microinstruction, and the Addr f eld contains data to be used in the computation. For example, if the next microinstruction does not depend on any of the 12 signals from the data section, Branch will specify an unconditional branch, and Addr will be the address of the next microinstruction. In ef ect, every instruction is a branch instruction. To execute a set of microinstructions in sequence, you make each microinstruction an unconditional branch to the F I F T H E D I T I O N Figure 12.16 Computernext one. T e Decoder/branch Systems unit in Figure 12 . 15 is designed to pass Addr straight through to uPC when the Branch f eld specif es an unconditional branch,Microinstruction regardless of the values of the 12 lines from theformat data section.

FIGURE 12.16 The instruction format of a microinstruction.

n – 34 34

Branch Addr CBA ALU

n

9781284079630_CH12_689_782.indd 715 29/01/16 8:27 am

Copyright © 2017 by Jones & Bartlett Learning, LLC an Ascend Learning Company Computer Systems F I F T H E D I T I O N

Motorola MC68000, 1980

Copyright © 2017 by Jones & Bartlett Learning, LLC an Ascend Learning Company Computer Systems F I F T H E D I T I O N Increasing performance • Increase data bus width • Insert a between CPU and memory • Increase hardware parallelism with pipelining

Copyright © 2017 by Jones & Bartlett Learning, LLC an Ascend Learning Company Computer Systems F I F T H E D I T I O N

Increase data bus width

Copyright © 2017 by Jones & Bartlett Learning, LLC an Ascend Learning Company 718 CHAPTER 12 Computer Organization

Computer FIGURE Systems 12. 17 F I F T H E D I T I O N Figure 12.17 The data section of the Pep/9 CPU with a two-byte data bus.

CBus ABus BBus System MARMux bus MARB

MARMux MARCk MARA

MDROCk MDROdd

MDROMux MDROMux

MDRECk MDREven MDREMux MDREMux EOMux EOMux

AMux AMux

CMux ALU 4 ALU CMux Cin

CSMux CSMux Cout S SCk

C CCk

V VCk AndZ 0 0 AndZ Z ZCk 0 Zout 0 N NCk

MemWrite MemRead Copyright © 2017 by Jones & Bartlett Learning, LLC an Ascend Learning Company

9781284079630_CH12_689_782.indd 718 29/01/16 8:27 am 718 CHAPTER 12 Computer Organization

FIGURE 12. 17 Figure 12.17 TheComputer data section of the SystemsPep/9 CPU with a two-byteF I F T Hdata E Dbus. I T I O N (expanded) CBus ABus BBus System MARMux bus MARB

MARMux MARCk MARA

MDROCk MDROdd

MDROMux MDROMux

MDRECk MDREven MDREMux MDREMux EOMux EOMux

AMux AMux

Copyright © 2017 by Jones & Bartlett Learning, LLC an AscendCMux Learning Company ALU 4 ALU CMux Cin

CSMux CSMux Cout S SCk

C CCk

V VCk AndZ 0 0 AndZ Z ZCk 0 Zout 0 N NCk

MemWrite MemRead

9781284079630_CH12_689_782.indd 718 29/01/16 8:27 am Computer Systems F I F T H E D I T I O N MARMux control signals • 0 on MARMux routes MDREven to MARA and MDROdd to MARB • 1 on MARMux routes ABus to MARA and BBus to MARB

Copyright © 2017 by Jones & Bartlett Learning, LLC an Ascend Learning Company 12.2 Performance 719

Computer Systems F I F T H E D I T I O N Figure 12.18

FIGURE 12.18 Addresses of dataMemory delivered by the memory read subsystem to the CPU with a two-byte data bus.

Delivered CPU Address Request MDREven MDROdd

0AB6 0AB6 0AB7

0AB7 0AB6 0AB7

0AB8 0AB8 0AB9

0AB9 0AB8 0AB9

for each two-byte word instead of for each byte. Today, however, virtually all computer memories are byte-addressable. FIGURE 12.18 shows the addresses of the dataCopyright delivered © 2017 by Jones & Bartlettby Learning,the LLCmemory an Ascend Learning Company subsystem to the CPU in response to some example CPU requests. If the CPU puts 0AB6 in the MAR, the memory will deliver the content of Mem[0AB6] in MDREven and Mem[0AB7] in MDROdd. T at is, it delivers the byte at the requested address and the byte at the following address. However, if the CPU puts 0AB7 in the MAR, the memory system delivers the same two bytes in the same data registers. T at is, it delivers the byte at the requested address and the byte at the preceding address. It always delivers data from an even address to MDREven and from an odd address to MDROdd, regardless of whether the CPU memory request is even or odd. FIGURE 12.19 shows the memory pinout of the chips for the 8-line data bus of F i g u r e 1 2 . 2 and for the 16-line data bus of F i g u r e 1 2 . 1 7 . One obvious dif erence between the two pinouts is the set of 16 data lines for the two-byte bus as opposed to the 8 data lines for the one-byte bus. Another dif erence is the absence of the lowest-order address line A0 in the two-byte bus in part (b). Why is it missing? Because it is never used. F i g u r e 1 2 . 1 8 shows that a memory request from 0AB6 and one from 0AB7 produce the same access—namely, Mem[0AB6] delivered to MDREven and Mem[0AB7] delivered to MDROdd. Because 0AB6 (hex) is 0000 1010 1011 0110 (bin) and 0AB7 (hex) is 0000 1010 1011 0111 (bin), the last bit A0 is irrelevant to the memory access. T e address line A0 literally does not need to be on the bus, which makes it physically a 15-line address bus even though it is logically a 16-line address bus. F i g u r e 1 2 . 2 for the one-byte data bus shows the MDR output connected to the AMux input so you can inject it into the ABus–CBus data loop. With the two-byte data bus of F i g u r e 1 2 . 1 7 , however, there are two memory data registers. T e EOMux, which stands for even-odd multiplexer, selects one of

9781284079630_CH12_689_782.indd 719 29/01/16 8:27 am 720 CHAPTER 12 Computer Organization

FIGURE 12.19 F I F T H E D I T I O N Figure 12.19 TheComputer pinout diagrams for twoSystems Pep/9 main memory chips.

A0 D0 D0 A1 D1 A1 D1 A2 D2 A2 D2 A3 D3 A3 D3 A4 D4 A4 D4 A5 D5 A5 D5 A6 D6 A6 D6 A7 D7 A7 D7 A8 A8 D8 A9 A9 D9 A10 A10 D10 A11 A11 D11 A12 A12 D12 A13 A13 D13 A14 A14 D14 A15 A15 D15

MemRead MemWrite MemRead MemWrite

(a) The chip of Figure 12.2 with an (b) The chip of Figure 12.17 with a 8-line data bus. 16-line data bus.

Copyright © 2017 by Jones & Bartlett Learning, LLC an Ascend Learning Company the two memory data registers to be sent to the AMux input and from there to the ABus–CBus data loop. It works like the other multiplexers, with an EOMux control value of 0 routing the MDREven output to the input of AMux and a control value of 1 routing the MDROdd output to the input of AMux. Another performance feature of the Pep/9 CPU design of Figure 12 . 17 is an additional two-byte data path from the MDR registers to the MAR registers. T is path alleviates a bottleneck in those cases when a memory address is read from memory and subsequently needs to be sent to the MAR. Instead of routing the address one byte at a time from the MDR registers, through the register bank, and then to the MAR, this two-byte data path allows both bytes of the MDR registers to be sent to the MAR registers in only one cycle. T e box labeled MARMux is a 16-bit multiplexer controlled by the MARMux signal. Here is the rule for the control signal:

MARMux control signals ❯ 0 on the MARMux control line routes MDREven to MARA and MDROdd to MARB. ❯ 1 on the MARMux control line routes ABus to MARA and BBus to MARB.

9781284079630_CH12_689_782.indd 720 29/01/16 8:27 am Computer Systems F I F T H E D I T I O N Fetch • If PC is even, fetch instruction specifier and prefetch the next byte ‣ Store prefetched byte in T1 ‣ Cost – one extra cycle • If PC is odd, get instruction specifier in T1 ‣ No memory read necessary ‣ Savings – two cycles

Copyright © 2017 by Jones & Bartlett Learning, LLC an Ascend Learning Company Computer Systems F I F T H E D I T I O N Figure 12.20

// Two-byte data bus // Fetch the instruction specifier and increment PC by 1 // Assume: PC is even and pre-fetch the next byte

UnitPre: IR=0x000000, PC=0x00FE, Mem[0x00FE]=0xABCD, S=1 UnitPost: IR=0xAB0000, PC=0x00FF, T1=0xCD

// MAR <- PC. 1. A=6, B=7, MARMux=1; MARCk // Initiate fetch, PC <- PC + 1. 2. MemRead, A=7, B=23, AMux=1, ALU=1, CMux=1, C=7; SCk, LoadCk 3. MemRead, A=6, B=22, AMux=1, CSMux=1, ALU=2, CMux=1, C=6; LoadCk 4. MemRead, MDREMux=0, MDROMux=0; MDRECk, MDROCk // IR <- MDREven, T1 <- MDROdd. 5. EOMux=0, AMux=0, ALU=0, CMux=1, C=8; LoadCk 6. EOMux=1, AMux=0, ALU=0, CMux=1, C=11; LoadCk

Copyright © 2017 by Jones & Bartlett Learning, LLC an Ascend Learning Company Computer Systems F I F T H E D I T I O N Figure 12.21

// Two-byte data bus // Fetch the instruction specifier and increment PC by 1 // Assume instruction specifier has been pre-fetched

UnitPre: IR=0x000000, PC=0x01FF, T1=0x12, S=0 UnitPost: IR=0x120000, PC=0x0200

// Fetch instruction specifier. 1. A=11, AMux=1, ALU=0, CMux=1, C=8; LoadCk

// PC <- PC plus 1. 2. A=7, B=23, AMux=1, ALU=1, CMux=1, C=7; SCk, LoadCk 3. A=6, B=22, AMux=1, CSMux=1, ALU=2, CMux=1, C=6; LoadCk

Copyright © 2017 by Jones & Bartlett Learning, LLC an Ascend Learning Company Computer Systems F I F T H E D I T I O N Figure 12.22 12.2 Performance 723 Data alignment FIGURE 12.22 Data alignment in the program of Figure 5.26.

0000 12000A BR main 0000 120009 BR main bonus: .EQUATE 10 bonus: .EQUATE 10 0003 00 .ALIGN 2 0003 0000 exam1: .BLOCK 2 0004 0000 exam1: .BLOCK 2 0005 0000 exam2: .BLOCK 2 0006 0000 exam2: .BLOCK 2 0007 0000 score: .BLOCK 2 0008 0000 score: .BLOCK 2 ; ; 0009 310003 main: DECI exam1,d 000A 310004 main: DECI exam1,d 000C 310005 DECI exam2,d 000D 310006 DECI exam2,d 000F C10003 LDWA exam1,d 0010 C10004 LDWA exam1,d 0012 610005 ADDA exam2,d 0013 610006 ADDA exam2,d

(a) Without data alignment. (b) With data alignment.

f xed location 0003. Consider execution of the LDWA instruction. It loads the Copyright © 2017 by Jones & Bartlett Learning, LLC an Ascend Learning Company accumulator with the value stored at 0003. Because 0003 is odd, a memory Alignment of data request by the CPU from this location puts Mem[0002] in MDREven and Mem[0003] in MDROdd. Even though the memory access loads two bytes from memory, it loads only the f rst byte of exam1 into the memory data register. T e CPU can use only the value in MDROdd and needs to make a second memory access for the value from Mem[0004]. T e benef t of the two-byte data bus is lost, as two memory accesses are still required. If the f rst byte of exam1 is stored at an even address, both bytes can be accessed with a single memory access. Figure 12 . 22 (b) shows the same program but with an additional .ALIGN dot command inserted before the T e .ALIGN assembler declaration of exam1 . T e ef ect of a .ALIGN command is to insert an extra directive zero byte if necessary so that the code generated by the following line will begin at an even address. In Figure 12 . 22 (b), the extra byte generated by the alignment dot command forces exam1 to be stored at 0004 instead of at 0003. With this program, the LDWA instruction causes the CPU to request a memory access from 0004, which puts Mem[0004] in MDREven and Mem[0005] in MDROdd. Only one memory access is required to load the value of the integer. Alignment commands in assembly languages take an integer argument that is a power of 2. In Figure 12 . 22 (b), the alignment command is

.ALIGN 2

9781284079630_CH12_689_782.indd 723 29/01/16 8:27 am Computer Systems F I F T H E D I T I O N Load index register • Two-byte data bus • Assume indirect addressing • The operand specifier contains the address of the address of the operand

r Oprnd ; N r < 0 , Z r = 0

Copyright © 2017 by Jones & Bartlett Learning, LLC an Ascend Learning Company Computer Systems F I F T H E D I T I O N Figure 12.23

// Two-byte data bus // LDWX this,n // RTL: X <- Oprnd; N <- X<0, Z <- X=0 // Indirect addressing: Oprnd = Mem[Mem[OprndSpec]]

UnitPre: IR=0xCA0012, Mem[0x0012]=0x26D2, Mem[0x26D2]=0x53AC UnitPre: N=1, Z=1, V=0, C=1 UnitPost: X=0x53AC, N=0, Z=0, V=0, C=1

// MDR <- Mem[OprndSpec]. 1. A=9, B=10, MARMux=1; MARCk 2. MemRead 3. MemRead 4. MemRead, MDROMux=0, MDREMux=0; MDROCk, MDRECk

// MAR <- MDR. 5. MARMux=0; MARCk

// MDR <- two-byte operand. 6. MemRead 7. MemRead 8. MemRead, MDROMux=0, MDREMux=0; MDROCk, MDRECk

// X <- MDR, high-order first. 9. EOMux=0, AMux=0, ALU=0, AndZ=0, CMux=1, C=2; NCk, ZCk, LoadCk 10. EOMux=1, AMux=0, ALU=0, ANDZ=1, CMux=1, C=3; ZCk, LoadCk

Copyright © 2017 by Jones & Bartlett Learning, LLC an Ascend Learning Company Computer Systems F I F T H E D I T I O N Result • One-byte data bus takes 18 cycles • Two-byte data bus takes 10 cycles • 18 – 10 = 8 cycles saved • Time savings is 8 / 18 = 44%

Copyright © 2017 by Jones & Bartlett Learning, LLC an Ascend Learning Company 726 CHAPTER 12 Computer Organization Computer Systems F I F T H E D I T I O N Figure 12.24

FIGURE 12.24 Program alignmentProgram in the program of Figure 6.8. alignment 0000 120004 BR main limit: .EQUATE 100 num: .EQUATE 0 0000 120003 BR main ; limit: .EQUATE 100 0003 00 .ALIGN 2 num: .EQUATE 0 0004 580002 main: SUBSP 2,i ; 0007 330000 DECI num,s 0003 580002 main: SUBSP 2,i 000A C30000 if: LDWA num,s 0006 330000 DECI num,s 000D A00064 CPWA limit,i 0009 C30000 if: LDWA num,s 0010 16001A BRLT else 000C A00064 CPWA limit,i 0013 490022 STRO msg1,d 000F 160018 BRLT else 0016 12001E BR endIf 0012 49001F STRO msg1,d 0019 26 NOP0 0015 12001B BR endIf 001A 490028 else: STRO msg2,d 0018 490025 else: STRO msg2,d 001D 26 NOP0 001B 500002 endIf: ADDSP 2,i 001E 500002 endIf: ADDSP 2,i 001E 00 STOP 0021 00 STOP

(a) Without program alignment. (b) With program alignment. Copyright © 2017 by Jones & Bartlett Learning, LLC an Ascend Learning Company

to an instruction at 0003, which is odd. T e .ALIGN statement in part (b) f xes that problem, forcing the main program code to begin at 0004, which is even. Inserting a .ALIGN statement at the beginning of every function, including main , is required. T e program contains two other branch targets, if and endIf . To force these statements to lie on even addresses, the assembler must insert the unary no operation instruction NOP0 before each target instruction. In Pep/9, the NOP0 instruction is a trap instruction and so would cause a big performance degradation that would nullify the benef t of the wide data bus. All real processors have native no-operation instructions that execute in one cycle and are used to align program statements. Two other statements that modify the program counter are call and ret. T e call statement is a three-byte instruction that puts the incremented program counter on the run-time stack as the return address. T e ret statement branches to that address. T us, the address following the call statement must be at an even address. It follows that every call statement must be at an odd address and can therefore never be the target of a branch instruction. To f x a program that contains a branch to a call

9781284079630_CH12_689_782.indd 726 29/01/16 8:27 am Computer Systems F I F T H E D I T I O N n-bit computers • An n-bit computer has n bits in the MAR and in the CPU registers that are visible at level ISA3 (usually equal) • The registers in the register bank at level LG1 could have width less than n • The data bus could have width greater than n • The address bus could have width less than n

Copyright © 2017 by Jones & Bartlett Learning, LLC an Ascend Learning Company 728 CHAPTER 12 Computer Organization

Computer Systems F I F T H E D I T I O N Figure 12.25 FIGURE 12. 25 Maximum memory limits as a function of the MAR width.

MAR Width Number of Addressable Bytes

8 256

16 64 Ki

32 4 Gi

64 17,179,869,184 Gi

the circuit board, however, the manufacturer designs the system bus with only 35 address lines connected to MAR lines A0 through A34. Lines A35 through A63 of the MAR remain unconnected.Copyright © 2017 Because by Jones & Bartlett Learning, 2 35LLC anbytes Ascend Learning isCompany 32 GiB, the product will allow the user to install up to 32 GiB of memory. It is even possible for the number of data lines in the main system bus to be greater than n . For example, Pep/9, which is a 16-bit computer, could have a 32-bit wide data bus that would fetch 8 bytes with each memory read. In the same way that the microcode sequence in Figure 12.20 prefetches an additional byte that eliminates the need for a later memory access, all the unused bytes from the 8-byte memory access can be saved in the CPU, possibly eliminating many future memory accesses. T e idea of prefetching more data than necessary to eliminate future memory accesses is the basis of cache memories described in the next section. Assuming that the internal registers, the internal buses, and the data/ address buses on the system bus all have the same width, increasing that common width will increase the performance. Increasing the common width of these components has a large impact on the size of the chip. All FIGURE 12. 26 the circuits, including the ALU, the multiplexers, and the registers, must Historic register/bus be increased to accommodate the larger width. T e history of computers widths. shows a progression toward ever-wider buses and registers. FIGURE 12.26 shows how the CPUs increased in bus width from 4 bits to 64 bits. T e Register 4004 was the f rst on a chip. T e f rst 64-bit from ChipDate Width Intel was a version of the chip. T e market is in the of 4004 1971 4-bit completing the transition from 32-bit machines to 64-bit machines. Most 8008 1972 8-bit desktop and laptop machines today have 64-bit processors, while most mobile devices have 32-bit processors. 8086 1978 16-bit T e progression is possible only because technology advances at such 80386 1985 32-bit a regular pace. Gordon Moore, the founder of Intel, observed in 1965 that the density of transistors in an had doubled every year –64 2004 64-bit and would continue to do so for the foreseeable future. T e rate has slowed

9781284079630_CH12_689_782.indd 728 29/01/16 8:27 am 728 CHAPTER 12 Computer Organization

FIGURE 12 . 25 Maximum memory limits as a function of the MAR width.

MAR Width Number of Addressable Bytes

8 256

16 64 Ki

32 4 Gi

64 17,179,869,184 Gi

the circuit board, however, the manufacturer designs the system bus with only 35 address lines connected to MAR lines A0 through A34. Lines A35 through A63 of the MAR remain unconnected. Because 235 bytes is 32 GiB, the product will allow the user to install up to 32 GiB of memory. It is even possible for the number of data lines in the main system bus to be greater than n . For example, Pep/9, which is a 16-bit computer, could have a 32-bit wide data bus that would fetch 8 bytes with each memory read. In the same way that the microcode sequence in Figure 12.20 prefetches an additional byte that eliminates the need for a later memory access, all the unused bytes from the 8-byte memory access can be saved in the CPU, possibly eliminating many future memory accesses. T e idea of prefetching more data than necessary to eliminate future memory accesses is the basis of cache memories described in the next section. Assuming that the internal registers, the internal buses, and the data/ address buses on the system bus all have the same width, increasing that common width will increase the performance. Increasing the common width of these components has a large impact on the size of the chip. All FIGURE 12 . 26 the circuits, including the ALU, the multiplexers, and the registers, must Computer Systems HistoricF register/busI F T H E D I T I O N Figurebe 12.26 increased to accommodate the larger width. T e history of computers widths. shows a progression toward ever-wider buses and registers. FIGURE 12.26 shows how the Intel CPUs increased in bus width from 4 bits to 64 bits. T e Register 4004 was the f rst microprocessor on a chip. T e f rst 64-bit processor from ChipDate Width Intel was a version of the Pentium 4 chip. T e market is in the process of 4004 1971 4-bit completing the transition from 32-bit machines to 64-bit machines. Most 8008 1972 8-bit desktop and laptop machines today have 64-bit processors, while most mobile devices have 32-bit processors. 8086 1978 16-bit T e progression is possible only because technology advances at such 80386 1985 32-bit a regular pace. Gordon Moore, the founder of Intel, observed in 1965 that the density of transistors in an integrated circuit had doubled every year x86–64 2004 64-bit and would continue to do so for the foreseeable future. T e rate has slowed

Copyright © 2017 by Jones & Bartlett Learning, LLC an Ascend Learning Company

9781284079630_CH12_689_782.indd 728 29/01/16 8:27 am Computer Systems F I F T H E D I T I O N Cache memory • In practice, it takes more than three cycles to fetch data from main memory to the CPU • Problem: CPU would spend many cycles waiting for memory reads • Solution: Construct a high-speed memory near the CPU and pre-fetch the data • The cache has a copy of the data in memory • Requires that you predict the future Copyright © 2017 by Jones & Bartlett Learning, LLC an Ascend Learning Company Computer Systems F I F T H E D I T I O N Predicting the future • Spacial locality – An address close to the previously requested address is likely to be requested in the near future • Temporal locality – The previously requested address itself is likely to be requested in the near future

Copyright © 2017 by Jones & Bartlett Learning, LLC an Ascend Learning Company Computer Systems F I F T H E D I T I O N Three levels of cache • Split L1 instruction and data – smallest, fastest, closest to the CPU • Unified L2 – Between L1 and L3 • Unified L3 – largest, slowest, farthest from the CPU

Copyright © 2017 by Jones & Bartlett Learning, LLC an Ascend Learning Company 12.2 Performance 731

a huge size and speed dif erence between the fastest memory technology and the slowest. It is the classic space/time tradeof . T e more space you are willing to devote per memory cell, the faster you can make the memory operate. Memory technology provides a range of designs between these two extremes. Corresponding to this range, three levels of cache are typical between the CPU and the main memory subsystem:

❯ Split L1 instruction and data cache—smallest, fastest, closest to the T ree levels of cache in a CPU computer system ❯ U n i f ed L2 cache—between the L1 and L3 caches ❯ U n i f ed L3 cache—largest, slowest, farthest from the CPU

FIGURE 12.27 shows the three levels for a quad-core CPU. In the f gure, the dashed line marks the boundary of a single package—that is, a single part

FIGURE 12 . 27 F I F T H E D I T I O N Figure 12.27 Computer Three levels of cache inSystems a typical computer system.

CPU package

CPU 1 CPU 2 CPU 3 CPU 4

L1 L1 L1 L1 L1 L1 L1 L1 instruction data instruction data instruction data instruction data cache cache cache cache cache cache cache cache

L2 unified cache L2 unified cache L2 unified cache L2 unified cache

L3 unified cache

System bus

Main memory subsystem

Copyright © 2017 by Jones & Bartlett Learning, LLC an Ascend Learning Company

9781284079630_CH12_689_782.indd 731 29/01/16 8:27 am Computer Systems F I F T H E D I T I O N Cache operation • When the CPU requests a load from memory it first checks the cache • A cache hit occurs if the requested data is in the cache ‣ Can fetch from cache immediately • A cache miss occurs if the requested data is not in the cache ‣ Must fetch from memory

Copyright © 2017 by Jones & Bartlett Learning, LLC an Ascend Learning Company Computer Systems F I F T H E D I T I O N Cache designs • There are two types of cache designs ‣ Direct-mapped cache ‣ Set-associative cache

Copyright © 2017 by Jones & Bartlett Learning, LLC an Ascend Learning Company Computer Systems F I F T H E D I T I O N Direct-mapped cache • Physical address is divided into three fields ‣ A high-order tag field ‣ A line field ‣ A low-order byte field • Memory is divided into lines or blocks • When one byte is requested on a cache miss, the entire line containing the byte is loaded into the cache

Copyright © 2017 by Jones & Bartlett Learning, LLC an Ascend Learning Company 12.2 Performance 733

FIGURE 12 . 28 A direct-mapped cache.

Mem 9 3 4 0 Figure 12.28 Tag Line Byte 16 Address 32 48 64 80 96 Cache 112 0 128 CPU 1 144 2 160 3 176 4 192 5 208 6 224 7 240 Tag Data 256 272 Valid 288 304 320 336 352 368

16 bytes

from memory. In this example, each cache cell holds a total of 1 + 9 + 128 = 138 bits. As there are eight cells in the cache, the entire cache holds a total of 138 × 8 = 1104 bits. FIGURE 12.29 is a pseudocode descritption of the operation of a direct- mapped cache. When the system starts up, it sets all the Valid bits in the cache to 0. T e very f rst memory request will be a miss. T e address of the cache line from memory to be loaded is simply the Tag f eld of the address, with the last four bits set to 0. T at line is fetched from memory and stored in the cache cell with the Valid bit set to 1 and the Tag f eld extracted from the address and stored as well. If another request asks for a byte in the same line, it will be a hit. T e system extracts the Line f eld from the address, goes to that line in the cache, and determines that the Valid bit is 1 and that the Tag f eld of the request matches the Tag f eld stored in the cache cell. It extracts the byte or word from the Data part of the cache cell and gives it to the CPU without a memory read. If the Valid bit

9781284079630_CH12_689_782.indd 733 29/01/16 8:27 am 734 CHAPTER 12 Computer Organization

Computer Systems F I F T H E D I T I O N Figure 12.29

FIGURE 12 . 29 A pseudocodeOperation description of the operation of ofcache a direct-mapped cache.

Extract the Line f eld from the CPU memory address request Retrieve the Valid/Tag/Data cache entry from the Line row if (Valid == 0) { Cache miss, memory fetch } else if (Tag from cache != Tag from memory request){ Cache miss, memory fetch } else { Cache hit, use Data f eld from cache }

is 1 but the Tag f elds do not match, it is a miss. ACopyright memory © 2017 by Jones &request Bartlett Learning, LLC is an Ascend necessary, Learning Company and the Tag and Data f elds replace the old ones in the same cache cell.

Example 12.1 T e CPU requests the byte at address 3519 (dec). What are the nine bits of the Tag f eld, what are the four bits of the Byte f eld, and which cell of the cache stores the data? Converting to binary and extracting the f elds gives

3519 (dec) = 000011011 011 1111 (bin)

T e nine bits in the Tag f eld are 000011011, the four bits in the Byte f eld are 1111, and the data is stored at address 011 (bin) = 3 (dec) in the cache. ❚ A def ciency of direct- Figure 12 .28 shows that the blocks of memory at addresses 16, 144, mapped caches 272, . . . , all contend for the same cache entry at address 1. Because there are nine bits in the Tag f eld, there are 29 = 512 blocks in memory that contend for each cache cell, which can hold only one at a time. T ere is a pattern of requests resulting in a high cache miss rate that arises from switching back and forth between two f xed areas of memory. An example is a program with pointers on the run-time stack at high memory in Pep/9 and the heap at low memory. A program that accesses pointers and the cells to which they point will have such an access pattern. If it happens that the pointer and the cell to which it points have the same Line f eld in their addresses, the miss rate will increase substantially. Set-associative caches Set-associative caches are designed to alleviate this problem. Instead of having each cache entry hold just one cache line from memory, it can hold several. FIGURE 12.30(a) shows a four-way set-associative cache. It duplicates the cache of Figure 12 . 29 four times and allows a set of up to four blocks of memory with the same Line f elds to be in the cache at any

9781284079630_CH12_689_782.indd 734 29/01/16 8:27 am Computer Systems F I F T H E D I T I O N Direct-mapped cache • It is possible to get a pattern of requests that result in a high cache miss rate • The program back and forth between a low region of memory and a high region that map to the same cache entry • Example: Pep/9 heap at low region of memory and local pointers on the run-time stack at high region of memory

Copyright © 2017 by Jones & Bartlett Learning, LLC an Ascend Learning Company Computer Systems F I F T H E D I T I O N Set-associative cache • Increases the cache hit rate • Each cache entry can hold several lines of data from memory • A cache that can store up to four lines of data in each cache entry is a four-way set- associative cache

Copyright © 2017 by Jones & Bartlett Learning, LLC an Ascend Learning Company 12.2 Performance 735 Computer Systems F I F T H E D I T I O N Figure 12.30

FIGURE 12 . 30 Block diagram of cache storage A four-way set-associative cache.

0 1 2 3 4 5 6 7 Tag Data Tag Data Tag Data Tag Data

Valid Valid Valid Valid

(a) Block diagram of cache storage.

Address Copyright © 2017 by Jones & Bartlett Learning, LLC an Ascend Learning Company Tag Line Byte 3

Cache storage Cache address

Cache data 9 TagV Data TagV Data TagV Data V Tag Data

= = = =

128 128 128 128

128 four-input multiplexers

128

Data Hit

(b) Implementation of read circuit.

9781284079630_CH12_689_782.indd 735 29/01/16 8:27 am 12.2 Performance 735

FIGURE 12 . 30 A four-way set-associative cache.

0 1 2 3 4 5 6 7 Tag Data Tag Data Tag Data Tag Data

Valid Valid Valid Valid

(a) Block diagram of cache storage.

Address Figure 12.30 Tag Line Byte 3 (continued)

Cache storage Cache address

Cache data 9 TagV Data TagV Data TagV Data V Tag Data

= = = =

128 128 128 128

128 four-input multiplexers

128

Data Hit Read circuit

(b) Implementation of read circuit.

9781284079630_CH12_689_782.indd 735 29/01/16 8:27 am Computer Systems F I F T H E D I T I O N Replacement policy • If a cache miss occurs and all parts of the cache are filled, which one is overwritten by the new data? • Least Recently Used (LRU) • Requires only one bit per entry with a two- way set associative cache • Four-way set associative is more difficult, and LRU approximation is common

Copyright © 2017 by Jones & Bartlett Learning, LLC an Ascend Learning Company Computer Systems F I F T H E D I T I O N Cache write policies with cache hits • Write through ‣ Every write request updates the cache and the corresponding block in memory • Write back ‣ A write request only updates the cache copy ‣ A write to memory only happens when the cache line is replaced

Copyright © 2017 by Jones & Bartlett Learning, LLC an Ascend Learning Company 12.2 Performance 737

FIGUREComputer 12 . 31 Systems F I F T H E D I T I O N Figure 12.31 Cache write policies with cache hits.

EachEach write write EachEach write write CPUCPU CacheCache MemMem CPUCPU CacheCache MemMem

ealp rnehW pe l rnehW pe l deca deca rnehW pe l rnehW pe l deca deca

(a) (a)Write Write throu through.gh. (b) (b)Write Write back. back. because memory must be updated before the new data can be loaded into the cache. By design, this event should happen rarely with a high percentageCopyright © 2017 by Jones & Bartlett Learning, LLC an Ascend Learning Company of cache hits. Another issue to resolve with caches is what to do with a write request in Cache write policies with conjunction with a cache miss. T e policy known as write allocation brings cache misses the block from memory into the cache, possibly replacing another cache line, and then updates the cache line according to its normal cache write strategy. Without write allocation, a memory write is initiated, bypassing the cache altogether. T e idea here is that the CPU can continue its processing concurrently with the completion of the memory write. FIGURE 12.32 shows the cache write policies without and with write allocation. Although either cache write policy with cache misses can be combined with either cache write policy with cache hits, write allocation is normally used in caches with write back for cache hits. Write-through caches also tend to not use write allocation in order to keep the design simple. T e design choice of most caches is either parts (a) of Figures 12.31 and 12.32 on the one hand or parts (b) of Figures 12.31 and 12.32 on the other. T e discussion of cache memories should sound familiar if you have read the discussion of in Section 9.2. In the same way that the motivation behind virtual memory is the mismatch between the small size of main memory and the size of an executable program stored on a hard drive, the motivation behind caches is the mismatch between the small size

9781284079630_CH12_689_782.indd 737 29/01/16 8:27 am Computer Systems F I F T H E D I T I O N Write through • Simpler design • When cache line is replaced, memory always has the latest update • Excessive bus traffic can reduce performance of other devices using the bus

Copyright © 2017 by Jones & Bartlett Learning, LLC an Ascend Learning Company Computer Systems F I F T H E D I T I O N Write back • Better performance of other devices using the bus, especially when you get a burst of write requests • There is a delay when the cache line must be replaced, because memory must be updated before the new data can be loaded into the cache

Copyright © 2017 by Jones & Bartlett Learning, LLC an Ascend Learning Company Computer Systems F I F T H E D I T I O N Cache write policies with cache misses • Without write allocation ‣ Bypass the cache altogether ‣ Normally used with write through • With write allocation ‣ Load the block from memory into the cache and update the cache line ‣ Normally used with write back

Copyright © 2017 by Jones & Bartlett Learning, LLC an Ascend Learning Company 738 CHAPTER 12 Computer Organization Computer Systems F I F T H E D I T I O N Figure 12.32

FIGURE 12 .32 Cache write policies with a cache miss.

CPU Cache Mem

CPU Cache Mem

(a) Without write allocation. (b) With write allocation.

of the cache and the size of main memory. T e LRU page replacement policy in a virtual memory system corresponds to the LRU replacement policy of a cache line. Cache is to main memory as main memory is to disk. In Memory hierarchies both cases, there is a that spans two extremes of a small Copyright © 2017 by Jones & Bartlett Learning, LLC an Ascend Learning Company high-speed memory subsystem that must interface with a large low-speed memory subsystem. Design solutions in both hierarchies rely on locality of reference. It is no accident that designs in both f elds have common issues and solutions. Another indication of the universality of these principles is the hash table data structure in sof ware. In Figure 12 . 28 , you should have recognized the mapping from main memory to cache as essentially a hash function. Set-associative caches even look like hash tables where you resolve collisions by chaining, albeit with an upper limit on the length of the chain. The System Performance Equation T e ultimate performance metric of a machine is how fast it executes. FIGURE 12.33 , the system performance equation, shows that the time it takes to execute a program is the product of three factors. T e word instruction in

FIGURE 12 .33 The system performance equation for a von Neumann machine.

time = instructions × cycles × time program program

Minimize with powerful Minimize with simple Minimize with pipelining complex instructions that instructions that take to shorten the cycle period. take many cycles to execute. few cycles to execute.

9781284079630_CH12_689_782.indd 738 29/01/16 8:27 am Computer Systems F I F T H E D I T I O N Memory hierarchies • Spans two extremes of small high-speed memory and large low-speed memory • Hardware: Virtual memory with small fast main memory and large slow disk • Hardware: Cache with small fast register bank and large slow main memory • Software: Hash table with small fast array and large slow file

Copyright © 2017 by Jones & Bartlett Learning, LLC an Ascend Learning Company Computer Systems F I F T H E D I T I O N Three components of execution time • instructions / program • cycles / instruction • time / cycle

Copyright © 2017 by Jones & Bartlett Learning, LLC an Ascend Learning Company 738 CHAPTER 12 Computer Organization

FIGURE 12 . 32 Cache write policies with a cache miss.

CPU Cache Mem

CPU Cache Mem

(a) Without write allocation. (b) With write allocation.

of the cache and the size of main memory. T e LRU page replacement policy in a virtual memory system corresponds to the LRU replacement policy of a cache line. Cache is to main memory as main memory is to disk. In Memory hierarchies both cases, there is a memory hierarchy that spans two extremes of a small high-speed memory subsystem that must interface with a large low-speed memory subsystem. Design solutions in both hierarchies rely on locality of reference. It is no accident that designs in both f elds have common issues and solutions. Another indication of the universality of these principles is the hash table data structure in sof ware. In Figure 12 . 28 , you should have recognized the mapping from main memory to cache as essentially a hash function. Set-associative caches even look like hash tables where you resolve collisions by chaining, albeit with an upper limit on the length of the chain. The System Performance Equation T e ultimate performance metric of a machine is how fast it executes. F I F T H E D I T I O N Figure 12.33 FIGUREComputer 12.33 , the Systemssystem performance equation, shows that the time it takes to execute a program is Thethe product system of three factors. T e word instruction in

FIGURE 12 performance . 33 equation The system performance equation for a von Neumann machine.

time = instructions × cycles × time program program instruction cycle

Minimize with powerful Minimize with simple Minimize with pipelining complex instructions that instructions that take to shorten the cycle period. take many cycles to execute. few cycles to execute.

Copyright © 2017 by Jones & Bartlett Learning, LLC an Ascend Learning Company

9781284079630_CH12_689_782.indd 738 29/01/16 8:27 am Computer Systems F I F T H E D I T I O N RISC vs. CISC • RISC: Reduced Instruction Set Computer ‣ Sacrifices instructions / program to decrease cycles / instruction • CISC: Complex Instruction Set Computer ‣ Sacrifices cycles / instruction to decrease instructions / program

Copyright © 2017 by Jones & Bartlett Learning, LLC an Ascend Learning Company Computer Systems F I F T H E D I T I O N Microcode • Level Mc2 • Used in CISC machines • Absent in RISC machines

Copyright © 2017 by Jones & Bartlett Learning, LLC an Ascend Learning Company Computer Systems F I F T H E D I T I O N RISC design principles 1. Each ISA3 instruction except for load and store executes in one cycle 2. The control section has no microcode 3. Every ISA3 instruction is the same length 4. There are only a few simple addressing mode

Copyright © 2017 by Jones & Bartlett Learning, LLC an Ascend Learning Company Computer Systems F I F T H E D I T I O N RISC design principles 5. All arithmetic and logic operations take place entirely in the CPU 5.1. RISC machines are load/store machines 5.2. CISC machines are accumulator machines 6. Data sections are designed for deep pipelining

Copyright © 2017 by Jones & Bartlett Learning, LLC an Ascend Learning Company 742 CHAPTER 12 Computer Organization

Microcode in x86 Systems

T e x86 processors built by Intel and AMD are the most puts the address of the source string in ESI, puts the widely used CISC processors on the market. T e details address of the destination in EDI, puts the byte count of their internal designs, including the design of their in ECX, and sets or clears the direction f ag, depending Mc2 level microcode control units, are proprietary and on whether the copy should be lef to right or right to considered to be company secrets. T e companies even lef . T en, the single execution of movsb at the ISA3 encrypt the microcode on the chip in an ef ort to keep level performs the entire copy using a loop at the Mc2 it hidden. T rough limited publications and reverse microcode level. engineering, however, we do have a partial picture of T e microcode in Figure 12 . 34 is more symbolic the designs. than Pep/9 microcode. For example, the microcode Modern x86 chips use microcode only for those instruction at cycle 2 sets the Z bit if the content of the instructions in their ISA3 instruction set that are ECX register is all zeros. It takes the OR operation of the complex. Simple instructions that require just a few ECX register with itself and sets the Z bit accordingly. cycles are implemented with f nite-state machines T e equivalent Pep/9 microcode sequence for testing directly in the hardware that output RISC-like operations the high-order byte of the index register and setting the (ROPs) of a f xed length. More complex instructions use Z bit accordingly is a full Mc2 layer of abstraction. 2. A=2, B=2, AMux=1, ALU=7, AndZ=0; ZCk FIGURE 12.34 shows the microcode sequence for an AMD chip that implements the complex movsb In the Pep/9 version, the numeric address of the instruction, whose mnemonic stands for move string register 2 is required, while the AMD microcode version byte. T is one ISA3 instruction copies a string of uses the symbolic name ecx . Pep/9 microcode requires ASCII bytes from one memory location to another. It the numeric value 7 on the ALU control line to select uses two index registers in the x86 register bank—ESI, the OR operation, while the AMD microcode version the index register for the source, and EDI, the index uses the symbolic name OR . T e Pep/9 version requires F I F T H E D I T I O N Figure 12.34 registerComputer for the destination—and Systems one register ECX for explicit control values for the multiplexer and clock pulse, the byte count. T e CPU also has a direction f ag that while these signals are implicit in the AMD microcode. the assembly language programmer can set and clear. T e AMD microassembler must generate these control To use the movsb AMD instruction, the programmer microcode f rst signals and values when for it translates the microcode.

FIGURE 12 . 34 movsb instruction AMD microcode for the movsb instruction.

1. LDDF ;load direction flag to latch in functional unit 2. OR ecx, ecx ;test if ECX is zero 3. JZ end ;terminate string move if ECX is zero loop: 4. MOVFM+ tmp0, [esi] ;move to tmp data from source and inc/dec ESI 5. MOVTM+ [edi], tmp0 ;move the data to destination and inc/dec EDI 6. DECXJNZ loop ;dec ECX and repeat until zero end: 7. EXIT

Copyright © 2017 by Jones & Bartlett Learning, LLC an Ascend Learning Company

9781284079630_CH12_689_782.indd 742 29/01/16 8:27 am Computer Systems F I F T H E D I T I O N MIPS • Microprocessor without Interlocked pipeline stages • Designed at Stanford and UC Berkeley • Used in Nintendo game consoles • MIPS and ARM are the most common RISC processors

Copyright © 2017 by Jones & Bartlett Learning, LLC an Ascend Learning Company 744 CHAPTER 12 Computer Organization

FIGURE 12 . 35 Comparison of the 32-bit MIPS and Pep/9 CPU registers.

32 $zero 0 0x00000000 $s0 16 $at 1 $s1 17 $v0 2 $s2 18 $v1 3 $s3 19 $a0 4 $s4 20 $a1 5 $s5 21 $a2 6 $s6 22 $a3 7 $s7 23 $t0 8 $t8 24 $t1 9 $t9 25 $t2 10 $k1 26 $t3 11 $k0 27 $t4 12 $gp 28 $t5 13 $sp 29 $t6 14 $fp 30 $t7 15 $ra 31

(a) MIPS registers.

16 A PC X SP

(b) Pep/9 registers. Figure 12.35

the f rst byte of each instruction to be stored at an address evenly divisible by four. FIGURE 12.36 shows the von Neumann cycle for the MIPS machine. T ere is no if statement to determine the size of the instruction. Figure 12 . 36 shows the von Neumann cycle for MIPS as an endless loop, which is more realistic than the cycle for Pep/9. Real machines have no STOP instruction because the operating system continues to execute when an application terminates. The Addressing Modes In contrast to Pep/9, which has eight addressing modes, MIPS has the f ve addressing modes of FIGURE 12.37 . E a c h o f t h e f ve addressing modes uses one of the three instruction types—either I-type, R-type, or J-type.

9781284079630_CH12_689_782.indd 744 29/01/16 8:27 am Computer Systems F I F T H E D I T I O N MIPS instruction set • All instructions are exactly 32 bits long • All binary operations are between two source registers, rs and rt, with the result placed in a destination register, rd • The only instructions that access main memory are the load and store instructions • All instructions except load and store take one cycle

Copyright © 2017 by Jones & Bartlett Learning, LLC an Ascend Learning Company 12.3 The MIPS Machine 745

F I F T H E D I T I O N Figure 12.36 Computer FIGURE 12 Systems . 36 A pseudocode description of the MIPS von Neumann execution cycle.MIPS von Neuman cycle

do { Fetch the instruction at the address in PC PC ← PC + 4 Decode the instruction specif er Execute the instruction fetched } while (true)

Copyright © 2017 by Jones & Bartlett Learning, LLC an Ascend Learning Company FIGURE 12 . 37 The MIPS addressing modes.

Operands Addressing Instruction Mode Type Destination Source Source

Immediate I-type Reg[rt] Reg[rs] SE(im)

Register R-type Reg[rd] Reg[rs] Reg[rt]

Base with load I-type Reg[rt] Mem[Reg[rb] + SE(im)]

Base with store I-type Mem[Reg[rb] + SE(im)] Reg[rt]

PC-relative I-type PC PC + 4 SE(im × 4)

Pseudodirect J-type PC (PC + 4)〈0..3〉 : (ta × 4)

FIGURE 12.38 shows the instruction format for each addressing mode. Instructions with the addressing modes immediate, base, and PC-relative are I-type instructions. Instructions with register addressing are R-type, and instructions with pseudodirect addressing are J-type instructions. A MIPS instruction always consists of a six-bit opcode to specify the instruction and one or more operand specif ers. T e rs f eld, when present, is always at bit location 6..10, the rt f eld is always at 11..15, and rd is always at 16..20. T is chapter specif es bit locations starting with the lef most bit as 0 and numbering from lef to right to be consistent with Pep/9 notation. Standard MIPS MIPS bit numbering notation is to start with the rightmost bit as 0 and to number from right to lef . notation

9781284079630_CH12_689_782.indd 745 29/01/16 8:27 am Computer Systems F I F T H E D I T I O N ISA level • There are 6 addressing modes ‣ Immediate, Register, Base with load, Base with store, PC relative, Pseudodirect • There are 3 instruction types ‣ I-type, R-type, J-type • There are 5 instruction formats

Copyright © 2017 by Jones & Bartlett Learning, LLC an Ascend Learning Company 12.3 The MIPS Machine 745

FIGURE 12 . 36 A pseudocode description of the MIPS von Neumann execution cycle.

do { Fetch the instruction at the address in PC PC ← PC + 4 Decode the instruction specif er Execute the instruction fetched Computer} Systems F I F T H E D I T I O N Figure 12.37 while (true) MIPS addressing modes FIGURE 12 . 37 The MIPS addressing modes.

Operands Addressing Instruction Mode Type Destination Source Source

Immediate I-type Reg[rt] Reg[rs] SE(im)

Register R-type Reg[rd] Reg[rs] Reg[rt]

Base with load I-type Reg[rt] Mem[Reg[rb] + SE(im)]

Base with store I-type Mem[Reg[rb] + SE(im)] Reg[rt]

PC-relative I-type PC PC + 4 SE(im × 4)

Pseudodirect J-type PC (PC + 4)〈0..3〉 : (ta × 4)

FIGURE 12.38 shows the instruction format for each addressing mode.

Instructions with the addressing modes immediate, base, and PC-relativeCopyright © 2017 by Jones & Bartlett Learning, LLC an Ascend Learning Company are I-type instructions. Instructions with register addressing are R-type, and instructions with pseudodirect addressing are J-type instructions. A MIPS instruction always consists of a six-bit opcode to specify the instruction and one or more operand specif ers. T e rs f eld, when present, is always at bit location 6..10, the rt f eld is always at 11..15, and rd is always at 16..20. T is chapter specif es bit locations starting with the lef most bit as 0 and numbering from lef to right to be consistent with Pep/9 notation. Standard MIPS MIPS bit numbering notation is to start with the rightmost bit as 0 and to number from right to lef . notation

9781284079630_CH12_689_782.indd 745 29/01/16 8:27 am 746 CHAPTER 12 Computer Organization Computer Systems F I F T H E D I T I O N Figure 12.38

FIGURE 12 . 38 The MIPS instructionMIPS formats instructioncorresponding to the addressing modes. formats 32 6 5 5 16 op rs rt immediate Opcode Source register Dest register Immediate

(a) Immediate addressing with the I-type instruction.

6 5 5 5 5 6 op rs rt rd shamt funct Opcode Source register Source register Dest register Shift amount Function field

(b) Register addressing with the R-type instruction.

6 5 5 16 op rs rt immediate Opcode Base register Target register Address displacement

(c) Base addressing with the I-type instruction.

6 5 5 16

op rs rt Copyright immediate© 2017 by Jones & Bartlett Learning, LLC an Ascend Learning Company Opcode Source register Branch cond Branch displacement

(d) PC-relative addressing with the I-type instruction.

6 26 op target Opcode Target address

(e) Pseudodirect addressing with the J-type instruction.

Because there are 32 registers in the register bank in Figure 12. 35 (a), and 2 5 is 32, it takes f ve bits to access one of them. T e designations rs, rt, and rd are standard MIPS notations for f ve-bit register f elds in an instruction. Figure 12 . 37 shows that rs is always a source register, and rd is always a destination register; but rt, which stands for target register, can be either a source or a destination register. T e notation Reg in Figure 12 . 37 stands for register and is analogous to Mem, which stands for memory. Reg[r] indicates the content of the register r. For example, if an instruction with immediate addressing executes, Figure 12 . 37 shows Reg[rt] as the destination operand. If the f ve-bit rt f eld in Figure 12 . 38 (a) has the value 10011 (bin), which is 19 (dec), then the destination register $s3 will contain the result of the operation because $s3 is register 19 in Figure 12 . 35 .

9781284079630_CH12_689_782.indd 746 29/01/16 8:27 am 746 CHAPTER 12 Computer Organization

FIGURE 12 . 38 The MIPS instruction formats corresponding to the addressing modes.

32 6 5 5 16 op rs rt immediate Opcode Source register Dest register Immediate

(a) Immediate addressing with the I-type instruction.

6 5 5 5 5 6 op rs rt rd shamt funct Opcode Source register Source register Dest register Shift amount FigureFunction 12.38 field Computer Systems F I F T H E D I T I O N (continued) (b) Register addressing with the R-type instruction.

6 5 5 16 MIPSop instructionrs rt formatsimmediate Opcode Base register Target register Address displacement

(c) Base addressing with the I-type instruction.

6 5 5 16 op rs rt immediate Opcode Source register Branch cond Branch displacement

(d) PC-relative addressing with the I-type instruction.

6 26 op target Opcode Target address

(e) Pseudodirect addressing with the J-type instruction.

Because there are 32 registers in the register bank in Figure 12 . 35 (a), and 25 is 32, it takes f ve bits to access one of them. T e designations rs, rt, and rd are standard MIPS notations for f ve-bit register f elds in an instruction. Figure 12 . 37 shows that rs is alwaysCopyright © a 2017 source by Jones & Bartlett register, Learning, LLC an and Ascend rdLearning is Company always a destination register; but rt, which stands for target register, can be either a source or a destination register. T e notation Reg in Figure 12 . 37 stands for register and is analogous to Mem, which stands for memory. Reg[r] indicates the content of the register r. For example, if an instruction with immediate addressing executes, Figure 12 . 37 shows Reg[rt] as the destination operand. If the f ve-bit rt f eld in Figure 12 . 38 (a) has the value 10011 (bin), which is 19 (dec), then the destination register $s3 will contain the result of the operation because $s3 is register 19 in Figure 12 . 35 .

9781284079630_CH12_689_782.indd 746 29/01/16 8:27 am Computer Systems F I F T H E D I T I O N Instruction fields • sssss, ttttt – Source registers • ddddd – Destination register • bbbbb – Base register • hhhhh – Shift amounts • i..i – Immediate value ‣ Addition – sign extended ‣ And/Or – zero extended • a..a – Sign extended offset in address calculation

Copyright © 2017 by Jones & Bartlett Learning, LLC an Ascend Learning Company 750Computer SystemsCHAPTER 12 F Computer I F T H E Organization D I T I O N Figure 12.39

FIGURE 12 . 39 MIPS instruction set A few instructions from the MIPS instruction set.

Mnemonic Meaning Binary Instruction Encoding add Add 0000 00ss ssst tttt dddd d000 0010 0000 addi Add immediate 0010 00ss sssd dddd iiii iiii iiii iiii sub Subtract 0000 00ss ssst tttt dddd d000 0010 0010 and Bitwise AND 0000 00ss ssst tttt dddd d000 0010 0100 andi Bitwise AND immediate 0011 00ss sssd dddd iiii iiii iiii iiii or Bitwise OR 0000 00ss ssst tttt dddd d000 0010 0101 ori Bitwise OR immediate 0011 01ss sssd dddd iiii iiii iiii iiii sll Shift left logical 0000 0000 000t tttt dddd dhhh hh00 0000 sra Shift right arithmetic 0000 0000 000t tttt dddd dhhh hh00 0011 srl Shift right logical 0000 0000 000t tttt dddd dhhh hh00 0010 lb Load byte 1000 00bb bbbd dddd aaaa aaaa aaaa aaaa lw 1000 11bb bbbd dddd aaaa aaaa aaaa aaaa Load word Copyright © 2017 by Jones & Bartlett Learning, LLC an Ascend Learning Company lui Load upper immediate 0011 1100 000d dddd iiii iiii iiii iiii sb Store byte 1010 00bb bbbt tttt aaaa aaaa aaaa aaaa sw Store word 1010 11bb bbbt tttt aaaa aaaa aaaa aaaa beq Branch if equal to 0001 00ss ssst tttt aaaa aaaa aaaa aaaa bgez Branch if greater than or equal to zero 0000 01ss sss0 0001 aaaa aaaa aaaa aaaa bgtz Branch if greater than zero 0001 11ss sss0 0000 aaaa aaaa aaaa aaaa blez Branch if less than or equal to zero 0001 10ss sss0 0000 aaaa aaaa aaaa aaaa bltz Branch if less than zero 0000 01ss sss0 0000 aaaa aaaa aaaa aaaa bne Branch if not equal to 0001 01ss ssst tttt aaaa aaaa aaaa aaaa j Jump 0000 10aa aaaa aaaa aaaa aaaa aaaa aaaa jr Jump register 0000 00bb bbb0 0000 0000 0000 0000 1000

9781284079630_CH12_689_782.indd 750 29/01/16 8:27 am 750 CHAPTER 12 Computer Organization

FIGURE 12 . 39 A few instructions from the MIPS instruction set.

Mnemonic Meaning Binary Instruction Encoding add Add 0000 00ss ssst tttt dddd d000 0010 0000 addi Add immediate 0010 00ss sssd dddd iiii iiii iiii iiii sub Subtract 0000 00ss ssst tttt dddd d000 0010 0010 and Bitwise AND 0000 00ss ssst tttt dddd d000Figure 0010 12.39 0100 andiComputerBitwise ANDSystems F I F T H E D I T I O N (continued) immediate 0011 00ss sssd dddd iiii iiii iiii iiii or Bitwise OR 0000 00ss ssst tttt dddd d000 0010 0101 ori BitwiseMIPS OR instruction set immediate 0011 01ss sssd dddd iiii iiii iiii iiii sll Shift left logical 0000 0000 000t tttt dddd dhhh hh00 0000 sra Shift right arithmetic 0000 0000 000t tttt dddd dhhh hh00 0011 srl Shift right logical 0000 0000 000t tttt dddd dhhh hh00 0010 lb Load byte 1000 00bb bbbd dddd aaaa aaaa aaaa aaaa lw Load word 1000 11bb bbbd dddd aaaa aaaa aaaa aaaa lui Load upper immediate 0011 1100 000d dddd iiii iiii iiii iiii sb Store byte 1010 00bb bbbt tttt aaaa aaaa aaaa aaaa sw Store word 1010 11bb bbbt tttt aaaa aaaa aaaa aaaa beq Branch if equal to 0001 00ss ssst tttt aaaa aaaa aaaa aaaa bgez Branch if greater than or equal to zero 0000 01ss sss0 0001 aaaa aaaa aaaa aaaa bgtz Branch if greater than zero 0001 11ss sss0 0000 aaaa aaaa aaaa aaaa blez Branch if less Copyright © 2017 by Jones & Bartlett Learning, LLC an Ascend Learning Company than or equal to zero 0001 10ss sss0 0000 aaaa aaaa aaaa aaaa bltz Branch if less than zero 0000 01ss sss0 0000 aaaa aaaa aaaa aaaa bne Branch if not equal to 0001 01ss ssst tttt aaaa aaaa aaaa aaaa j Jump 0000 10aa aaaa aaaa aaaa aaaa aaaa aaaa jr Jump register 0000 00bb bbb0 0000 0000 0000 0000 1000

9781284079630_CH12_689_782.indd 750 29/01/16 8:27 am 750 CHAPTER 12 Computer Organization

FIGURE 12. 39 A few instructions from the MIPS instruction set.

Mnemonic Meaning Binary Instruction Encoding add Add 0000 00ss ssst tttt dddd d000 0010 0000 addi Add immediate 0010 00ss sssd dddd iiii iiii iiii iiii sub Subtract 0000 00ss ssst tttt dddd d000 0010 0010 and Bitwise AND 0000 00ss ssst tttt dddd d000 0010 0100 andi Bitwise AND immediate 0011 00ss sssd dddd iiii iiii iiii iiii or Bitwise OR 0000 00ss ssst tttt dddd d000 0010 0101 ori Bitwise OR immediate 0011 01ss sssd dddd iiii iiii iiii iiii sll Shift left logical 0000 0000 000t tttt dddd dhhh hh00 0000 sra Shift right arithmetic 0000 0000 000t tttt dddd dhhh hh00 0011 srl Shift right logical 0000 0000 000t tttt dddd dhhh hh00 0010 lb Load byte 1000 00bb bbbd dddd aaaa aaaaFigure aaaa 12.39 aaaa F I F T H E D I T I O N Computerlw Load word Systems1000 11bb bbbd dddd aaaa aaaa(continued) aaaa aaaa lui Load upper immediate 0011 1100 000d dddd iiii iiii iiii iiii sb StoreMIPS byte instruction1010 00bb bbbt tttt aaaa set aaaa aaaa aaaa sw Store word 1010 11bb bbbt tttt aaaa aaaa aaaa aaaa beq Branch if equal to 0001 00ss ssst tttt aaaa aaaa aaaa aaaa bgez Branch if greater than or equal to zero 0000 01ss sss0 0001 aaaa aaaa aaaa aaaa bgtz Branch if greater than zero 0001 11ss sss0 0000 aaaa aaaa aaaa aaaa blez Branch if less than or equal to zero 0001 10ss sss0 0000 aaaa aaaa aaaa aaaa bltz Branch if less than zero 0000 01ss sss0 0000 aaaa aaaa aaaa aaaa bne Branch if not equal to 0001 01ss ssst tttt aaaa aaaa aaaa aaaa j Jump 0000 10aa aaaa aaaa aaaa aaaa aaaa aaaa jr Jump register 0000 00bb bbb0 0000 0000 0000 0000 1000

Copyright © 2017 by Jones & Bartlett Learning, LLC an Ascend Learning Company

9781284079630_CH12_689_782.indd 750 29/01/16 8:27 am Computer Systems F I F T H E D I T I O N MIPS load instruction • Uses base addressing with I-type instruction • address is a 16-bit signed offset • Assembly language lw rd,address(rb) rd Mem[rb + address]

Copyright © 2017 by Jones & Bartlett Learning, LLC an Ascend Learning Company Computer Systems F I F T H E D I T I O N MIPS add instruction • Uses register addressing with r-type instruction • 32 registers and 25 = 32 implies 5 bits to specify one register • Assembly language add rd,rs,rt rd rs + rt

Copyright © 2017 by Jones & Bartlett Learning, LLC an Ascend Learning Company Computer Systems F I F T H E D I T I O N MIPS add immediate • Uses immediate addressing with I-type instruction • immediate is a 16-bit signed field • Assembly language addi rd,rs,immediate rd rs + immediate

Copyright © 2017 by Jones & Bartlett Learning, LLC an Ascend Learning Company Computer Systems F I F T H E D I T I O N MIPS store instruction • Uses base addressing with I-type instruction • address is a 16-bit signed offset • Assembly language sw rt,address(rb)

Mem[rb + address] rt

Copyright © 2017 by Jones & Bartlett Learning, LLC an Ascend Learning Company 12.3 The MIPS Machine 749

Quantity ta is the 26-bit target address, which is shif ed lef 2 bits, increasing its length to 28 bits. T e colon in the specif cation is the concatenation operator. T e f rst 4 signif cant bits of the incremented program counter are concatenated with the 28 bits of the shif ed target address to produce a 32-bit address for the PC. With pseudodirect addressing, you are still limited to one-sixteenth of the memory map specif ed by the f rst four bits of the program counter. Neither PC-relative nor pseudodirect addressing modes allow unrestricted access to the entire 4 GiB address space. MIPS provides the jump register instruction jr with base addressing to access any address in the entire address space. T e program counter is changed with the following specif cation: PC ← Reg[rb] T e jump register instruction where rb is the base register in the rs f eld of Figure 12 . 38(c). It requires the programmer to compute the address and put it in the base register before executing the unconditional branch. The Instruction Set FIGURE 12.39 is a summary of some MIPS instructions. T e f rst column is the mnemonic of the instruction for MIPS assembly language. Operand specif ers labeled sssss and ttttt are f ve-bit source register f elds, those labeled ddddd are f ve-bit destination register f elds, and those labeled bbbbb are f ve-bit base register f elds. A f eld of i characters is an immediate operand specif er, which is sign-extended for addition and zero-extended for AND and OR. A f eld of a characters is an address operand specif er, which is a sign-extended of set in an address calculation. Operand specif ers labeled hhhhh are f ve-bit shif amounts for the shif instructions. Following are some examples of C code fragments with their translation into MIPS assembly language. T e MIPS processor uses its bank of 32 registers as a cache for variables. T e f rst time a variable is accessed, the compiler associates a register with that value. Later accesses use the variable copy in the register instead of accessing memory again. With Pep/9, global variables are stored at a f xed absolute address in memory. With MIPS, global variables are accessed relative to $gp, the global pointer.

Example 12.2 F i g u r e 5 . 2 7 d e c l a r e s t h r e e g l o b a l v a r i a b l e s , exam1 , exam2 and score, as well as a constant bonus , which equates to 10. T e C compiler might install the three globals with of sets 0, 4, and 8 from $gp and associate them with registers $s1, $s2, and $s3 in the register bank. It translates the CstatementComputer Systems F I F T H E D I T I O N Example 12.2

score = (exam1 + exam2) / 2 + bonus;

• exam1 – global, offset 0 from $gp, in $s1 • exam2 – global, offset 4 from $gp, in $s2 • score – global, offset 8 from $gp, in $s3 bonus – constant, value 10 9781284079630_CH12_689_782.indd• 749 29/01/16 8:27 am

Copyright © 2017 by Jones & Bartlett Learning, LLC an Ascend Learning Company 12.3 The MIPS Machine 749

Quantity ta is the 26-bit target address, which is shif ed lef 2 bits, increasing its length to 28 bits. T e colon in the specif cation is the concatenation operator. T e f rst 4 signif cant bits of the incremented program counter are concatenated with the 28 bits of the shif ed target address to produce a 32-bit address for the PC. With pseudodirect addressing, you are still limited to one-sixteenth of the memory map specif ed by the f rst four bits of the program counter. Neither PC-relative nor pseudodirect addressing modes allow unrestricted access to the entire 4 GiB address space. MIPS provides the jump register instruction jr with base addressing to access any address in the entire address space. T e program counter is changed with the following specif cation: PC ← Reg[rb] T e jump register instruction where rb is the base register in the rs f eld of Figure 12 . 38(c). It requires the programmer to compute the address and put it in the base register before executing the unconditional branch. The Instruction Set FIGURE 12.39 is a summary of some MIPS instructions. T e f rst column is the mnemonic of the instruction for MIPS assembly language. Operand specif ers labeled sssss and ttttt are f ve-bit source register f elds, those labeled ddddd are f ve-bit destination register f elds, and those labeled bbbbb are f ve-bit base register f elds. A f eld of i characters is an immediate operand specif er, which is sign-extended for addition and zero-extended for AND and OR. A f eld of a characters is an address operand specif er, which is a sign-extended of set in an address calculation. Operand specif ers labeled hhhhh are f ve-bit shif amounts for the shif instructions. Following are some examples of C code fragments with their translation into MIPS assembly language. T e MIPS processor uses its bank of 32 registers as a cache for variables. T e f rst time a variable is accessed, the compiler associates a register with that value. Later accesses use the variable copy in the register instead of accessing memory again. With Pep/9, global variables are stored at a f xed absolute address in memory. With MIPS, global variables are accessed relative to $gp, the global pointer.

Example 12.2 F i g u r e 5 . 2 7 d e c l a r e s t h r e e g l o b a l v a r i a b l e s , exam1 , exam2 and score, as well as a constant bonus , which equates to 10. T e C compiler might install the three globals with of sets 0, 4, and 8 from $gp and associate them with registers $s1, $s2, and $s3 in the register bank. It translates the Example 12.2 CstatementComputer Systems F I F T H E D I T I O N (continued)

score = (exam1 + exam2) / 2 + bonus;

• $s1 is register 17 (dec) = 10001 (bin) • $s2 is register 18 (dec) = 10010 (bin) • $s3 is register 19 (dec) = 10011 (bin)

9781284079630_CH12_689_782.indd 749 29/01/16 8:27 am

Copyright © 2017 by Jones & Bartlett Learning, LLC an Ascend Learning Company 12.3 The MIPS Machine 749

Quantity ta is the 26-bit target address, which is shif ed lef 2 bits, increasing its length to 28 bits. T e colon in the specif cation is the concatenation operator. T e f rst 4 signif cant bits of the incremented program counter are concatenated with the 28 bits of the shif ed target address to produce a 32-bit address for the PC. With pseudodirect addressing, you are still limited to one-sixteenth of the memory map specif ed by the f rst four bits of the program counter. Neither PC-relative nor pseudodirect addressing modes allow unrestricted access to the entire 4 GiB address space. MIPS provides the jump register instruction jr with base addressing to access any address in the entire address space. T e program counter is changed with the following specif cation: PC ← Reg[rb] T e jump register instruction where rb is the base register in the rs f eld of Figure 12 . 38(c). It requires the programmer to compute the address and put it in the base register before executing the unconditional branch. The Instruction Set FIGURE 12.39 is a summary of some MIPS instructions. T e f rst column is the mnemonic of the instruction for MIPS assembly language. Operand specif ers labeled sssss and ttttt are f ve-bit source register f elds, those labeled ddddd are f ve-bit destination register f elds, and those labeled bbbbb are f ve-bit base register f elds. A f eld of i characters is an immediate operand specif er, which is sign-extended for addition and zero-extended for AND and OR. A f eld of a characters is an address operand specif er, which is a sign-extended of set in an address calculation. Operand specif ers labeled hhhhh are f ve-bit shif amounts for the shif instructions. Following are some examples of C code fragments with their translation into MIPS assembly language. T e MIPS processor uses its bank of 32 registers as a cache for variables. T e f rst time a variable is accessed, the compiler associates a register with that value. Later accesses use the variable copy in the register instead of accessing memory again. With Pep/9, global variables are stored at a f xed absolute address in memory. With MIPS, global variables are accessed relative to $gp, the global pointer.

Example 12.2 F i g u r e 5 . 2 7 d e c l a r e s t h r e e g l o b a l v a r i a b l e s , exam1 , exam2 and score, as well as a constant bonus , which equates to 10. T e C compiler might install the three globals with of sets 0, 4, and 8 from $gp and associate them with registers $s1, $s2, and $s3 in the register bank. It translates the Example 12.2 CstatementComputer Systems F I F T H E D I T I O N (continued)

score = (exam1 + exam2) / 2 + bonus;

MIPS assembly language lw $s1,0($gp) # Load exam1 into register $s1 lw $s2,4($gp) # Load exam2 into register $s2 add $s3,$s1,$s2 # Register $s3 gets exam1 + exam2 sra $s3,$s3,1 # Shift right register $s3 one bit addi $s3,$s3,10 # Register $s3 gets $s3 + 10 sw $s3,8($gp) # score gets $s3

MIPS machine language 100011 11100 10001 0000000000000000 9781284079630_CH12_689_782.indd100011 11100 10010 0000000000000100 749 29/01/16 8:27 am 000000 10001 10010 10011 00000100000 000000 00000 10011 10011 00001 000011 001000 10011 10011 0000000000001010 101011 11100 10011 0000000000001000

Copyright © 2017 by Jones & Bartlett Learning, LLC an Ascend Learning Company Computer Systems F I F T H E D I T I O N MIPS arrays • Use base addressing with I-type instruction • Compiler associates a $s register with an array variable • It contains the address of the first element of the array

Copyright © 2017 by Jones & Bartlett Learning, LLC an Ascend Learning Company 752 CHAPTER 12 Computer Organization

destination f eld f rst. In Figure 12.35, you can identify the binary f elds in the preceding machine code for the global variables as follows:

❯ $s1 is register 17 (dec) = 10001 (bin). ❯ $s2 is register 18 (dec) = 10010 (bin). ❯ $s3 is register 19 (dec) = 10011 (bin). ❚

It is not feasible to store an entire array in the register bank because most arrays have too many elements to f t the available registers. To process arrays, the C compiler associates a register with the address of the f rst element of the array and uses base addressing to access an element of the array. Pep/9 accesses the element of an array with indexed addressing, for which the operand is specif ed as

Oprnd = Mem[OprndSpec + X]

A s i g n i f cant dif erence between Pep/9 and MIPS is the use of the register to access an array element. In Pep/9, the register X contains the value of the index. In the MIPS machine, the register rb contains the address of the f rst element of the array—in other words, the base of the array. T at is why register rb is called the base register and this addressing mode is called base addressing.

Example 12.3 Suppose the C compiler associates $s1 with array a, $s2 with g b Computervariable Systems , andF I F T $s3H E D I T withI O N arrayExample . 12.3It translates the statement a[2] = g + b[3];

• toa – MIPSglobal array, assembly already loaded language in $s1 as • g – global, value already loaded in $s2 • b – global lw array,$t0,12($s3) already loaded in $s3 # Register $t0 gets b[3] add $t0,$s2,$t0 # Register $t0 gets g + b[3] sw $t0,8($s1) # a[2] gets g + b[3]

T e load instruction has 12 for the address f eld, because it is accessing b[ 3], each word is fourCopyright © 2017 bytes, by Jones & Bartlett Learning, LLCand an Ascend Learning 3 Company × 4 = 12. Similarly, the store instruction has 8 in the address f eld because of the index value in a[ 2]. T e machine language translation of these instructions is

100011 10011 01000 0000000000001100 000000 10010 01000 01000 00000 100000 101011 10001 01000 0000000000001000

Figure 12 . 35 shows that $t0 is register 8 (dec) = 01000 (bin), $s3 is register 19 (dec) = 10011 (bin), $s2 is register 18 (dec) = 10010 (bin), and $s1 is register 17 (dec) = 10001 (bin). ❚

9781284079630_CH12_689_782.indd 752 29/01/16 8:27 am 752 CHAPTER 12 Computer Organization

destination f eld f rst. In Figure 12.35, you can identify the binary f elds in the preceding machine code for the global variables as follows:

❯ $s1 is register 17 (dec) = 10001 (bin). ❯ $s2 is register 18 (dec) = 10010 (bin). ❯ $s3 is register 19 (dec) = 10011 (bin). ❚

It is not feasible to store an entire array in the register bank because most arrays have too many elements to f t the available registers. To process arrays, the C compiler associates a register with the address of the f rst element of the array and uses base addressing to access an element of the array. Pep/9 accesses the element of an array with indexed addressing, for which the operand is specif ed as

Oprnd = Mem[OprndSpec + X]

A s i g n i f cant dif erence between Pep/9 and MIPS is the use of the register to access an array element. In Pep/9, the register X contains the value of the index. In the MIPS machine, the register rb contains the address of the f rst element of the array—in other words, the base of the array. T at is why register rb is called the base register and this addressing mode is called base addressing.

Example 12.3 Suppose the C compiler associates $s1 with array a, $s2 with variable g, and $s3 with arrayExample b . 12.3It translates the statement Computer Systems F I F T H E D I T I O N (continued) a[2] = g + b[3];

• to$t0 MIPS is register assembly 8 (dec) = 01000language (bin) as • $s1 is register 17 (dec) = 10001 (bin) • $s2 is lw register $t0,12($s3) 18 (dec) = 10010 (bin) # Register $t0 gets b[3] • $s3 is add register $t0,$s2,$t0 19 (dec) = 10011 (bin) # Register $t0 gets g + b[3] sw $t0,8($s1) # a[2] gets g + b[3]

T e load instruction has 12 for the address f eld, because it is accessing b[ 3], each word is fourCopyright © 2017 bytes, by Jones & Bartlett Learning, LLCand an Ascend Learning 3 Company × 4 = 12. Similarly, the store instruction has 8 in the address f eld because of the index value in a[ 2]. T e machine language translation of these instructions is

100011 10011 01000 0000000000001100 000000 10010 01000 01000 00000 100000 101011 10001 01000 0000000000001000

Figure 12 . 35 shows that $t0 is register 8 (dec) = 01000 (bin), $s3 is register 19 (dec) = 10011 (bin), $s2 is register 18 (dec) = 10010 (bin), and $s1 is register 17 (dec) = 10001 (bin). ❚

9781284079630_CH12_689_782.indd 752 29/01/16 8:27 am 752 CHAPTER 12 Computer Organization

destination f eld f rst. In Figure 12.35, you can identify the binary f elds in the preceding machine code for the global variables as follows:

❯ $s1 is register 17 (dec) = 10001 (bin). ❯ $s2 is register 18 (dec) = 10010 (bin). ❯ $s3 is register 19 (dec) = 10011 (bin). ❚

It is not feasible to store an entire array in the register bank because most arrays have too many elements to f t the available registers. To process arrays, the C compiler associates a register with the address of the f rst element of the array and uses base addressing to access an element of the array. Pep/9 accesses the element of an array with indexed addressing, for which the operand is specif ed as

Oprnd = Mem[OprndSpec + X]

A s i g n i f cant dif erence between Pep/9 and MIPS is the use of the register to access an array element. In Pep/9, the register X contains the value of the index. In the MIPS machine, the register rb contains the address of the f rst element of the array—in other words, the base of the array. T at is why register rb is called the base register and this addressing mode is called base addressing.

Example 12.3 Suppose the C compiler associates $s1 with array a, $s2 with variable g, and $s3 with arrayExample b . 12.3It translates the statement Computer Systems F I F T H E D I T I O N (continued) a[2] = g + b[3];

MIPS assembly language lw $t0,12($s3) # Register $t0 gets b[3] add $t0,$s2,$t0 # Register $t0 gets g + b[3] sw $t0,8($s1) to MIPS # a[2] gets assembly g + b[3] language as

MIPS machine language 100011 10011 01000 0000000000001100 000000 10010 01000 01000 lw 00000$t0,12($s3) 100000 # Register $t0 gets b[3] 101011 10001 01000 0000000000001000 add $t0,$s2,$t0 # Register $t0 gets g + b[3] sw $t0,8($s1) # a[2] gets g + b[3]

T e load instruction has 12 for the address f eld, because it is accessing b[ 3], each word is fourCopyright © 2017 bytes, by Jones & Bartlett Learning, LLCand an Ascend Learning 3 Company × 4 = 12. Similarly, the store instruction has 8 in the address f eld because of the index value in a[ 2]. T e machine language translation of these instructions is

100011 10011 01000 0000000000001100 000000 10010 01000 01000 00000 100000 101011 10001 01000 0000000000001000

Figure 12 . 35 shows that $t0 is register 8 (dec) = 01000 (bin), $s3 is register 19 (dec) = 10011 (bin), $s2 is register 18 (dec) = 10010 (bin), and $s1 is register 17 (dec) = 10001 (bin). ❚

9781284079630_CH12_689_782.indd 752 29/01/16 8:27 am Computer Systems F I F T H E D I T I O N MIPS shift left logical instruction • Uses register addressing with r-type instruction • shamt is a 5-bit shift amount, which specifies how many bits to shift left • Assembly language sll rd,rt,shamt

Copyright © 2017 by Jones & Bartlett Learning, LLC an Ascend Learning Company Computer Systems F I F T H E D I T I O N Example 12.4

Shift left $s0 seven bits, put result in $t2

• $s0 is register 16 (dec) = 10000 (bin) • $t2 is register 10 (dec) = 01010 (bin) • shamt is 7 (dec) = 00111 (bin)

MIPS assembly language sll $t2,$s0,7

MIPS machine language 000000 00000 10000 01010 00111 000000

Copyright © 2017 by Jones & Bartlett Learning, LLC an Ascend Learning Company Computer Systems F I F T H E D I T I O N MIPS word • 32-bit MIPS word is four bytes • Index must be multiplied by 4 • Shift 2 bits left

Copyright © 2017 by Jones & Bartlett Learning, LLC an Ascend Learning Company 12.3 The MIPS Machine 753

T e situation is a bit more complicated if you want to access the element of an array whose index is a variable. MIPS has no index register that does what the index register in Pep/9 does. Consequently, the compiler must generate code to add the index value to the address of the f rst element of the array to get the address of the element referenced. In Pep/9, this addition is done automatically at Level Mc2 with indexed addressing. But, the design philosophy of load/store machines is to have few addressing modes even at the expense of needing more statements in the program. In Pep/9, words are two bytes, and so the index must be shi f ed lef once to multiply it by 2. In MIPS, words are four bytes, and so the index must be shif ed A word is four bytes in MIPS. lef twice to multiply it by 4. T e MIPS instruction sll , for shif lef logical, uses the shamt f eld in Figure 12 . 38 (b) to specify the amount of the shif .

Example 12.4 T e MIPS assembly language statement to shif the content of $s0 seven bits to the lef and put the result in $t2 is

sll $t2,$s0,7

T e machine language translation is

000000 00000 10000 01010 00111 000000

T e f rst f eld is the opcode. T e second f eld is not used by this instruction and is set to all zeros. T e third is the rt f eld, which indicates $s0, register 16 (dec) = 10000. T e fourth is the rd f eld, which indicates $t2, register 10 (dec) = 01010 (bin). T e f fh is the shamt f eld, which indicates the shif amount. T e last is the funct f eld used with the opcode to indicate the sll instruction. ❚

Example 12.5 Assuming that the C compiler associates $s0 with variable i , $s1 with array a, and $s2 with variable g , it translates the statement Computer Systems F I F T H E D I T I O N Example 12.5 g = a[i];

• i – global, into value MIPS already assembly loaded in $s0 language as follows. • a – global array, already loaded in $s1 • g – global, in sll$s2 $t0,$s0,2 # $t0 gets $s0 times 4

MIPS assembly language sll $t0,$s0,2 # $t0 gets $s0 add times 4 $t0,$s1,$t0 # $t0 gets the address of a[i] add $t0,$s1,$t0 # $t0 gets the address of a[i] lw $s2,0($t0) # $s2 gets a[i] lw $s2,0($t0) # $s2 gets a[i]

Note the 0 in the address f eld of the load instruction. ❚

Copyright © 2017 by Jones & Bartlett Learning, LLC an Ascend Learning Company

Like Pep/9, the MIPS machine has a $sp in its register bank. Unlike Pep/9, there is no special ADDSP instruction because the addi instruction can access any of the 32 registers in the register bank, including $sp. To allocate storage on the run-time stack, execute addi

9781284079630_CH12_689_782.indd 753 29/01/16 8:27 am Computer Systems F I F T H E D I T I O N Example 12.6

Allocate four bytes of storage on the run-time stack

• $sp is register 29 (dec) = 11101 (bin) • -4 (dec) = 1111111111111100 (bin)

MIPS assembly language addi $sp,$sp,-4 # $sp <- $sp - 4

MIPS machine language 001000 11101 11101 1111111111111100

Copyright © 2017 by Jones & Bartlett Learning, LLC an Ascend Learning Company Computer Systems F I F T H E D I T I O N 16-bit immediate • The immediate field is 16 bits • Registers are 32 bits • How do you add a 32-bit constant? ‣ lui Load upper immediate ‣ ori OR immediate

Copyright © 2017 by Jones & Bartlett Learning, LLC an Ascend Learning Company 754 CHAPTER 12 Computer Organization

with a negative immediate value and with $sp as both the source and destination register.

Example 12.6 To allocate four bytes of storage on the run-time stack, you would execute

addi $sp,$sp,-4 # $sp <- $sp - 4

where –4 is not an address but the immediate operand. T e machine language translation is

001000 11101 11101 1111111111111100

where $sp is register 29. ❚

You may have noticed a limitation with the addi instruction. T e constant f eld is only 16 bits wide, but MIPS is a 32-bit machine. You should be able to add constants with 32-bit precision using immediate addressing. Here is another example of the RISC architecture philosophy. T e goal is simple instructions with few addressing modes. T e Pep/9 design permits instructions with dif erent widths—that is, both unary and nonunary instructions. Figure 12 . 4 shows how this decision complicates the fetch part of the von Neumann cycle. T e hardware must fetch the instruction specif er, then decode it to determine whether to fetch an operand specif er. T is complexity is completely counter to the load/store philosophy, which demands simple instructions that can be decoded quickly. T e simplicity goal demands that all instructions be the same length. But if all instructions are 32 bits wide, how could one instruction possibly contain a 32-bit immediate constant? T ere would be no room in the instruction format for the opcode. Here is where decreasing the second factor in Figure 12 . 33 at the expense of the f rst factor comes into play. T e solution to the problem of a 32-bit immediate constant is to require the execution of two instructions. For this job, MIPS provides lui , which stands for load upper immediate. It sets the high-order 16 bits of a register to the immediate operand and the low-order bits to all zeros. A second instruction is required to set the low-order 16 bits, usually the OR immediate instruction ori .

Example 12.7 Assuming that the compiler associates register $s2 with g Computer Systemsvariable F I ,F Tit H Etranslates D I T I O N theExample C statement12.7 g = 491521;

• g – global, to MIPS in $s2 assembly language as • 491521 (dec) = 0007 8001 (hex) lui $s2,0x0007 MIPS assembly language lui $s2,0x0007 ori $s2,$s2,0x8001 ori $s2,$s2,0x8001

Copyright © 2017 by Jones & Bartlett Learning, LLC an Ascend Learning Company 9781284079630_CH12_689_782.indd 754 29/01/16 8:27 am Computer Systems F I F T H E D I T I O N

MIPS computer organization

Copyright © 2017 by Jones & Bartlett Learning, LLC an Ascend Learning Company Computer Systems F I F T H E D I T I O N MIPS data section • IF Instruction fetch • ID Instruction decode / Register file read • Ex Execute / Address calculation • Mem Memory access • WB Write back

Copyright © 2017 by Jones & Bartlett Learning, LLC an Ascend Learning Company 756 CHAPTER 12 Computer Organization

FIGURE 12 . 40 The MIPS data section. Sequential circuits are shaded.

PC JMux Figure 12.40 32

PCMux L1 instruction cache Address 4 Plus4 IF Instruction 28 Instruction fetch address 26 ASL2

address ID Decode instruction Instruction decode/ 5 5 5 16 read C A B Register bank Sign extend

CBus ABus BBus 32

Ex AMux ASL2 Execute/address calculation

Adder ALU

L1 data cache Data in Address Mem Memory access Data out

WB Write back CMux

9781284079630_CH12_689_782.indd 756 29/01/16 8:27 am Computer Systems F I F T H E D I T I O N MIPS control signals • JMux control – If 0, left input. If 1, PCMux. • PCMux control – If 0, Plus4. If 1, . • AMux control – If 0, ABus. If 1, Sign extend. • CMux control – If 0, Data out. If 1, ALU.

Copyright © 2017 by Jones & Bartlett Learning, LLC an Ascend Learning Company Computer Systems F I F T H E D I T I O N MIPS control signals • ALU control – Multiline function select. • PCCk – Clock pulse for Program Counter. • LoadCk – Clock pulse for write to register bank. • DCCk – Clock pulse for write to L1 data cache.

Copyright © 2017 by Jones & Bartlett Learning, LLC an Ascend Learning Company 758 CHAPTER 12 Computer Organization

FIGURE 12 . 41 The control signals from the Decode unit in the MIPS data section.

PC PCCk JMux Figure 12.41

PCMux

L1 instruction cache Address

Instruction

Decode instruction MIPS control signals

C A B Register bank LoadCk CBus ABus BBus

AMux

ALU

L1 data cache Data in Address

Data out DCCk

CMux

9781284079630_CH12_689_782.indd 758 29/01/16 8:27 am Computer Systems F I F T H E D I T I O N Instruction fetch • The program counter (PC) is not a general register in the register bank • Plus 4 is a special-purpose adder to increment PC by 4 • The two least-significant bits in an address field are assumed to be 00 and not stored in the address field • An ASL2 box shifts the address field two bits to the left to compensate

Copyright © 2017 by Jones & Bartlett Learning, LLC an Ascend Learning Company 756 CHAPTER 12 Computer Organization

FIGURE 12 . 40 The MIPS data section. Sequential circuits are shaded. Figure 12.40 PC JMux (expanded)

32

PCMux L1 instruction cache Address 4 Plus4 IF Instruction 28 Instruction fetch address 26 ASL2

address ID Decode instruction Instruction decode/register file 5 5 5 16 read C A B Register bank Sign extend

CBus ABus BBus 32

Ex AMux ASL2 Execute/address calculation

Adder ALU

L1 data cache Data in Address Mem Memory access Data out

WB Write back CMux

9781284079630_CH12_689_782.indd 756 29/01/16 8:27 am 1

CPU address Delivered request MDREven MDROdd 0AB6 0AB6 0AB7 0AB7 0AB6 0AB7 0AB8 0AB8 0AB9 0AB9 0AB8 0AB9

Addressing Instruction Operands mode type Destination Source Source Immediate I-type Reg[rt] Reg[rs] SE(im) Computer SystemsRegisterF I F T H E D I T I OR-type N ExampleReg[rd] 12.8 Reg[rs] Reg[rt] Base with load I-type Reg[rt] Mem[Reg[rb]+SE(im)] Base with store I-type Mem[Reg[rb]+SE(im)] Reg[rt] PC-relative I-type PC PC + 4 SE(im 4) MIPS jump instruction × Pseudodirect J-type PC (PC + 4) 0..3 : (ta 4) • Uses pseudodirect addressing with J-type ⟨ ⟩ × instructionPC (PC + 4)+SE(im 4) ← × • ta is a 26-bit unsigned integer

PC (PC + 4) 0..3 : (ta 4) ← ⟨ ⟩ ×

1. JMux=0; PCCk PC Reg[rb] ←

Reg[rt] Mem[Reg[rb]+CopyrightSE © 2017(im by Jones)] & Bartlett Learning, LLC an Ascend Learning Company ←

Mem[Reg[rb]+SE(im)] Reg[rt] ←

Reg[rd] Reg[rs]+Reg[rt] ←

time instructions cycles time = program program × instruction × cycle Computer Systems F I F T H E D I T I O N Next instruction • Non-branch instructions • Conditional branch instructions when the branch is not taken • PCMux and JMux route PC + 4 to PC as the address of the next instruction

1. PCMux=0, JMux=1; PCCk

Copyright © 2017 by Jones & Bartlett Learning, LLC an Ascend Learning Company Computer Systems F I F T H E D I T I O N MIPS branch instructions • MIPS has no NZVC status bits • Comparison is always between two registers • add triggers a system trap on overflow • addu, for add unsigned, does not

Copyright © 2017 by Jones & Bartlett Learning, LLC an Ascend Learning Company 1

CPU address Delivered request MDREven MDROdd 0AB6 0AB6 0AB7 0AB7 0AB6 0AB7 0AB8 0AB8 0AB9 0AB9 0AB8 0AB9

Addressing Instruction Operands Computer SystemsmodeF I F T H E D I T I O Ntype ExampleDestination 12.9 Source Source Immediate I-type Reg[rt] Reg[rs] SE(im) MIPS branchRegister instructionsR-type Reg[rd] Reg[rs] Reg[rt] Base with load I-type Reg[rt] Mem[Reg[rb]+SE(im)] • When branchBase is taken, with storeuse PC-relativeI-type Mem[Reg[rb]+SE(im)] Reg[rt] PC-relative I-type PC PC + 4 SE(im 4) addressing with I-type instruction × Pseudodirect J-type PC (PC + 4) 0..3 : (ta 4) • im is a 16-bit unsigned integer ⟨ ⟩ × PC (PC + 4)+SE(im 4) ← ×

1. PCMux=1, JMux=1; PCCk PC (PC + 4) 0..3 : (ta 4) ← ⟨ ⟩ ×

PC Reg[rb] Copyright © 2017 by Jones & Bartlett Learning, LLC an Ascend Learning Company ←

Reg[rt] Mem[Reg[rb]+SE(im)] ←

Mem[Reg[rb]+SE(im)] Reg[rt] ←

Reg[rd] Reg[rs]+Reg[rt] ←

time instructions cycles time = program program × instruction × cycle 1

CPU address Delivered request MDREven MDROdd 0AB6 0AB6 0AB7 0AB7 0AB6 0AB7 0AB8 0AB8 0AB9 0AB9 0AB8 0AB9

Addressing Instruction Operands mode type Destination Source Source Immediate I-type Reg[rt] Reg[rs] SE(im) Register R-type Reg[rd] Reg[rs] Reg[rt] Base with load I-type Reg[rt] Mem[Reg[rb]+SE(im)] Base with store I-type Mem[Reg[rb]+SE(im)] Reg[rt] PC-relative I-type PC PC + 4 SE(im 4) × Pseudodirect J-type PC (PC + 4) 0..3 : (ta 4) ⟨ ⟩ ×

PC (PC + 4)+SE(im 4) ← ×

PC (PC + 4) 0..3 : (ta 4) ← ⟨ ⟩ ×

Computer Systems F I F T H E D I T I O N Example 12.10 PC Reg[rb] MIPS← store instruction Uses base addressing with I-type instruction • Reg[rt] Mem[Reg[rb]+SE(im)] • im is a 16-bit← unsigned integer

Mem[Reg[rb]+SE(im)] Reg[rt] ←

1. PCMux=0, JMux=1, A=rt, AMux=1, B=rb, ALU=AReg plus[rd] B; RegPCCk,[rs]+ DCCkReg[rt] ←

time instructions Copyright © 2017cycles by Jones & Bartlett Learning, LLC antime Ascend Learning Company = program program × instruction × cycle 1

CPU address Delivered request MDREven MDROdd 0AB6 0AB6 0AB7 0AB7 0AB6 0AB7 0AB8 0AB8 0AB9 0AB9 0AB8 0AB9

Addressing Instruction Operands mode type Destination Source Source Immediate I-type Reg[rt] Reg[rs] SE(im) Register R-type Reg[rd] Reg[rs] Reg[rt] Base with load I-type Reg[rt] Mem[Reg[rb]+SE(im)] Base with store I-type Mem[Reg[rb]+SE(im)] Reg[rt] PC-relative I-type PC PC + 4 SE(im 4) × Pseudodirect J-type PC (PC + 4) 0..3 : (ta 4) ⟨ ⟩ ×

PC (PC + 4)+SE(im 4) ← ×

PC (PC + 4) 0..3 : (ta 4) ← ⟨ ⟩ ×

PC Reg[rb] ←

Computer Systems F I F T H E D I T I O N Example 12.11 Reg[rt] Mem[Reg[rb]+SE(im)] MIPS add← instruction

• Uses registerMem addressing[Reg[rb]+ withSE(im R-type)] Reg[rt] instruction ← • rd, rs, rt are 5-bit register addresses Reg[rd] Reg[rs]+Reg[rt] ←

1. PCMux=0, JMux=1, A=rs, AMux=0, B=rt, ALU=A plustime B, CMux=1,instructions C=rd; cycles time = PCCk, LoadCCkprogram program × instruction × cycle

Copyright © 2017 by Jones & Bartlett Learning, LLC an Ascend Learning Company Computer Systems F I F T H E D I T I O N

MIPS pipelining

Copyright © 2017 by Jones & Bartlett Learning, LLC an Ascend Learning Company Computer Systems F I F T H E D I T I O N Signal propagation • IF: The instruction cache, Plus4, ASL2, PCMux, JMux • ID: The Decode instruction box, the Register bank, the sign extend box • Ex: AMux, ASL2, the ALU, the Adder • Mem: The data cache • WB: CMux, the of the Register bank

Copyright © 2017 by Jones & Bartlett Learning, LLC an Ascend Learning Company 756 CHAPTER 12 Computer Organization

FIGURE 12 . 40 The MIPS data section. Sequential circuits are shaded.

PC JMux Figure 12.40 32

PCMux L1 instruction cache Address 4 Plus4 IF Instruction 28 Instruction fetch address 26 ASL2

address ID Decode instruction Instruction decode/register file 5 5 5 16 read C A B Register bank Sign extend

CBus ABus BBus 32

Ex AMux ASL2 Execute/address calculation

Adder ALU

L1 data cache Data in Address Mem Memory access Data out

WB Write back CMux

9781284079630_CH12_689_782.indd 756 29/01/16 8:27 am Computer Systems F I F T H E D I T I O N Figure 12.42

Without pipelining

Cy cle 1 Cy cle 2 Cy cle 3 Time

IF ID Ex Mem WB Instru ction 1

Instru ction 2 IF ID Ex Mem WB

Instru ction 3 IF ID Ex Mem WB

Copyright © 2017 by Jones & Bartlett Learning, LLC an Ascend Learning Company Computer Systems F I F T H E D I T I O N Pipelining • Assembly line analogy • Overlap the IF, ID, Ex, Mem, and WB stages • Allows a decrease in the cycle time (an increase in the MHz rating) • Provides parallelism in the CPU circuitry • Requires boundary registers to store the intermediate results between stages

Copyright © 2017 by Jones & Bartlett Learning, LLC an Ascend Learning Company 12.3 The MIPS Machine 763

FIGURE 12 . 43 The MIPS data section with pipelining.

PC JMux Figure 12.43

PCMux L1 instruction cache Address Plus4 IF Instruction fetch Instruction address ASL2 IF/ID registers

address

Decode instruction ID Instruction decode/register file C A B read Register bank Sign extend

CBus ABus BBus ID/Ex registers

AMux ASL2 Ex Execute/address calculation

Adder ALU

Ex/Mem registers

L1 data cache Mem Data in Address Memory access

Data out

Mem/WB registers

WB CMux Write back

9781284079630_CH12_689_782.indd 763 29/01/16 8:27 am Computer Systems F I F T H E D I T I O N Figure 12.44

With pipelining

Cy cle 1 2 3 4 5 6 7 8 Time

Instru ction 1 IF ID Ex Mem WB

Instru ction 2 IF ID Ex Mem WB

Instru ction 3 IF ID Ex Mem WB

Instru ction 4 IF ID Ex Mem WB

Copyright © 2017 by Jones & Bartlett Learning, LLC an Ascend Learning Company Computer Systems F I F T H E D I T I O N Hazards • Control hazards from unconditional and conditional branches • Data hazards from data dependencies between instructions • A hazard causes the instruction that cannot continue to stall, which creates a bubble in the pipeline that must be flushed out before peak performance can be resumed

Copyright © 2017 by Jones & Bartlett Learning, LLC an Ascend Learning Company 764 CHAPTER 12 Computer Organization

FIGURE 12 . 44 Instruction execution with pipelining.

Cycle 1 2 43 65 87 Time

Instruction 1 IF ID Ex Mem WB

Instruction 2 IF ID Ex Mem WB

Instruction 3 IF ID Ex Mem WB

Instruction 4 IF ID Ex Mem WB

Computer Systems F I F T H E D I T I O N Figure 12.45 FIGURE 12 . 45 The effect of a hazard on a pipeline.

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26

(a) No hazards.

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26

(b) A branch hazard.

sixth instruction to execute, theCopyright second © 2017 byon Jones the & Bartlett second Learning, LLC an line Ascend Learningrepresents Company the seventh, and so on. Starting with cycle 5, the pipeline operates at peak ef ciency. Consider what happens when a branch instruction executes. Suppose instruction 7, which starts at cycle 7 on the second line, is a branch instruction. Assuming that the updated program counter is not available for the next instruction until the completion of this instruction, instruction

9781284079630_CH12_689_782.indd 764 29/01/16 8:27 am 12.3 The MIPS Machine 765

8 and every instruction af er it must stall. Figure 12. 45 (b) shows the bubble as not shaded. T e ef ect is as if the pipeline must start over at cycle 12. FIGURE 12.46 shows that branch instructions account for 15% of executing statements in a typical program on a MIPS machine. So roughly every seventh instruction must delay four cycles. T ere are several ways to reduce the penalty of a control hazard. Figure 12 . 45 (b) assumes that the result of the branch instruction is not available until af er the write-back stage. But branch instructions do not modify the register bank. So, to decrease the length of the bubble the Eliminate the write- system could eliminate the write-back stage of branch instructions under back stage of the branch the assumption that the next instruction has been delayed. T e extra instructions. control hardware to do that would decrease the length of the bubble from four cycles to three. Conditional branches present another opportunity to minimize the FIGURE 12 . 46 F I F T H E D I T I O N Figure 12.46 ef ects of the hazard. Suppose the branch penalty is three cycles with theComputer Systems Frequency of execution addition of the extra control hardware, and the computer is executing the of MIPS instructions. following MIPS program: Instruction Frequency beq $s1,$s2,4 add $s3,$s3,$s4 Arithmetic 50% sub $s5,$s5,$s6 Load/Store 35% andi $s7,$s7,15 sll $s0,$s0,2 Branch 15% ori $s3,$s3,1

T e f rst instruction is a branch if equal. T e address f eld is 4, which means Copyright © 2017 by Jones & Bartlett Learning, LLC an Ascend Learning Company a branch to the fourth instruction af er the next one. Consequently, if the branch is taken, it will be to the ori instruction. If the branch is not taken, add will execute next. Figure 12 . 45 (b) shows a lot of wasted parallelism. While the bubble is being f ushed out of the pipeline, many stages are idle. You do not know whether you should be executing add , sub , and andi while waiting for the results of beq . But you can execute them anyway assuming that the branch is Assume that conditional not taken. If the branch is in fact not taken, you have eliminated the bubble branches are not taken. altogether. If it is taken, you are no worse of in terms of bubble delay than you would have been if you had not started the instruction af er beq . In that case, f ush the bubble from the pipeline. T e problem is the circuitry required to clean up your mess if your assumption is wrong and the branch is taken. You must keep track of any instructions in the interim, before you discover whether the branch is taken, and not allow them to irreversibly modify the data cache or register bank. When you discover that the branch is not taken, you can commit to those changes.

9781284079630_CH12_689_782.indd 765 29/01/16 8:27 am Computer Systems F I F T H E D I T I O N Speedup techniques • Eliminate the Write back stage of the branch instructions • Conditional branches ‣ Assume branch will not be taken ‣ Process the following instructions until you know for sure ‣ If you predict right, no wasted cycles

Copyright © 2017 by Jones & Bartlett Learning, LLC an Ascend Learning Company Computer Systems F I F T H E D I T I O N Dynamic branch prediction • The higher the percentage of the time your prediction is correct, the better the performance • Use one bit to store whether the branch was taken previously • Predict that the branch will be taken this time if it was taken the previous time

Copyright © 2017 by Jones & Bartlett Learning, LLC an Ascend Learning Company Computer Systems F I F T H E D I T I O N Figure 12.47 One-bit dynamic branch prediction

Tak en No t tak en Tak en Predict Predict n o t tak en (0 ) tak en (1 )

No t tak en Start

Copyright © 2017 by Jones & Bartlett Learning, LLC an Ascend Learning Company Computer Systems F I F T H E D I T I O N Nested loop pattern One-bit dynamic branch prediction

Taken: YYYYNYYYYNYYYYNYYYYN Prediction: NYYYYNYYYYNYYYYNYYYY Incorrect: xxxxxxxx

Copyright © 2017 by Jones & Bartlett Learning, LLC an Ascend Learning Company Computer Systems F I F T H E D I T I O N Two-bit dynamic branch prediction • If you have a run of branches taken and you encounter one branch not taken, do not change your prediction right away • To change your prediction, you must get two consecutive identical branch types

Copyright © 2017 by Jones & Bartlett Learning, LLC an Ascend Learning Company 12.3 The MIPS Machine 767

encounter one branch not taken, you do not change your prediction right away. T e criterion to change is to get two consecutive branches not taken. FIGURE 12.48 shows that you can be in one of four states. T e two shaded states are the ones where you have a run of two consecutive identical branch types—the two previous branches either both taken or both not taken. T e states not shaded are for the situation where the two previous branches were dif erent. If you have a run of branches taken, you are in state 00. If the next branch is not taken, you go to state 01, but still predict the branch af er that will be taken. A trace of the f nite-state machine of Figure 12. 48 with the branch sequence above shows that the prediction is correct every time af er the outer loop gets started. Another technique used on deep pipelines, where the penalty would be more severe if your branch assumption is not correct, is to have duplicate Build two pipelines. pipelines with duplicate program counters, fetch circuitry, and all the rest. When you decode a branch instruction, you initiate both pipelines, f lling one with instructions assuming the branch is not taken and the other assuming it is. When you f nd out which pipeline is valid, you discard the other and continue on. T is solution is quite expensive, but you have no bubbles regardless of whether the branch is taken or not. A data hazard happens when one instruction needs the result of a previous instruction and must stall until it gets it. It is called a read-af er-write (RAW) hazard. An example is the code sequence

add $s2,$s2,$s3 # write $s2 A RAW data hazard sub $s4,$s4,$s2 # read $s2

Computer Systems F I F T H E D I T I O N Figure 12.48

FIGURE 12Two-bit . 48 dynamic branch prediction The state transition diagram for two-bit dynamic branch prediction.

Not taken Taken Predict Predict taken (00) taken (01) Taken

Taken Not taken

Not taken

Predict Predict Not taken not taken (11) not taken (10) Taken

Start

Copyright © 2017 by Jones & Bartlett Learning, LLC an Ascend Learning Company

9781284079630_CH12_689_782.indd 767 29/01/16 8:27 am Computer Systems F I F T H E D I T I O N Duplicate pipelines • One pipeline assuming the branch is taken • One pipeline assuming the branch is not taken • When you find out which pipeline is valid discard the other pipeline

Copyright © 2017 by Jones & Bartlett Learning, LLC an Ascend Learning Company Computer Systems F I F T H E D I T I O N Data hazard • One instruction needs the result of a previous instruction • Must stall until it gets the result • Called read-after-write (RAW) hazard

Copyright © 2017 by Jones & Bartlett Learning, LLC an Ascend Learning Company 768 CHAPTER 12 Computer Organization

T e add instruction changes the value of $s2, which is used in the sub instruction. Figure 12. 43 shows that the add instruction will update the value of $s2 at the end of the write-back stage, WB. Also, the sub instruction reads from the register bank at the end of the instruction decode/register f le read stage, ID. FIGURE 12.49(a) shows two instructions without a RAW hazard, and 12.49(b) shows how the two instructions must overlap with the data hazard. T e sub instruction’s ID stage must come af er the add instruction’s WB stage. T e result is that the sub instruction must stall, creating a three-cycle bubble. If there were another instruction with no hazard between add and sub , the bubble would be only two cycles long. With two hazardless instructions between them, the bubble gets reduced to one cycle in length, and with three, it disappears altogether. T is observation brings up a possibility: If you could f nd some hazardless instructions nearby that needed to be executed sometime anyway, why not just execute them out of order, sticking them in between the add and sub instructions to f ll up the bubble? You might object that mixing up the order in which instructions are executed will change the results of the algorithm. T at is true in some cases, but not in Instruction reordering all. If there are many arithmetic operations in a block of code, an optimizing compiler can analyze the data dependencies and rearrange the statements to reduce the bubbles in the pipeline without changing the result of the algorithm. Alternatively, a human assembly language programmer can do the same thing. T is is an example of the price to be paid for abstraction. A level of abstraction is supposed to simplify computation at one level by hiding the details at a lower level. It would certainly be simpler if the assembly language

Computer Systems F I F T H E D I T I O N Figure 12.49 FIGURE 12 . 49 The effect of a RAW data hazard on a pipeline.

Instruction 1 IF ID Ex Mem WB

Instruction 2 IF ID Ex Mem WB

(a) Consecutive instructions without a RAW hazard.

add $s2,$s2,$s3 IF ID Ex Mem WB

sub $s4,$s4,$s2 IF ID Ex Mem WB

(b) Consecutive instructions with a RAW hazard.

Copyright © 2017 by Jones & Bartlett Learning, LLC an Ascend Learning Company

9781284079630_CH12_689_782.indd 768 29/01/16 8:27 am Computer Systems F I F T H E D I T I O N Instruction reordering • Find some later instructions that can be executed out of order with no change to the results of the program • Execute them after the conditional branch • Each instruction that can be executed out of order decreases the bubble penalty by one cycle

Copyright © 2017 by Jones & Bartlett Learning, LLC an Ascend Learning Company Computer Systems F I F T H E D I T I O N Abstraction versus performance • The compiler must help construct the assembly language program knowing the lower-level details of the pipeline • Adds complexity to the compiler, but gains performance

Copyright © 2017 by Jones & Bartlett Learning, LLC an Ascend Learning Company Computer Systems F I F T H E D I T I O N Data forwarding • For RAW hazard of Figure 12.49(b) • Construct a data path from Ex/Mem ALU register to a special ID/Ex register • Reduces the bubble penalty in Figure 12.49(b) from three cycles to one

Copyright © 2017 by Jones & Bartlett Learning, LLC an Ascend Learning Company Computer Systems F I F T H E D I T I O N Superscalar machines • Based on the fact that two instructions with no data dependencies can execute in parallel • Two approaches ‣ Multiple pipelines ‣ Multiple execution units • Out-of-order execution inherent in both approaches

Copyright © 2017 by Jones & Bartlett Learning, LLC an Ascend Learning Company Computer Systems F I F T H E D I T I O N Figure 12.50

ID EX Mem WB

IF

ID EX Mem WB

(a) Du al pipelines.

ALU

IF ID ALU Mem WB

FP

(b) Mu ltiple ex ecu tion u nits.

Copyright © 2017 by Jones & Bartlett Learning, LLC an Ascend Learning Company Computer Systems F I F T H E D I T I O N Write-after-read hazard • Write-after-read hazard (WAR) occurs when one instruction writes to a register after the previous instruction reads it • Example add $s3,$s3,$s2 # read $s2 sub $s2,$s4,$s5 # write $s2 • Only a potential problem with out-of-order execution

Copyright © 2017 by Jones & Bartlett Learning, LLC an Ascend Learning Company Computer Systems F I F T H E D I T I O N The megahertz myth • Of two different machines with two different megahertz ratings, the one with the higher rating is the one with better performance • Only true if you compare identical machines with two different megahertz ratings • Cycles per second must be multiplied by work done per cycle • The myth is valuable for advertising

Copyright © 2017 by Jones & Bartlett Learning, LLC an Ascend Learning Company Computer Systems F I F T H E D I T I O N Simplifications • Edge-triggered flip-flops are more common than master-slave • Timing constraints on buses are more severe and bus protocols are more complex • Input/Output subsystems are more complex

Copyright © 2017 by Jones & Bartlett Learning, LLC an Ascend Learning Company Computer Systems F I F T H E D I T I O N Direct memory access • A DMA controller, instead of the CPU, controls the bus • Data can flow between disk and memory in parallel with the CPU performing computation work for applications • Requires an arbitration protocol when two devices want to use the bus at the same time and a system of interrupt priority levels

Copyright © 2017 by Jones & Bartlett Learning, LLC an Ascend Learning Company Computer Systems F I F T H E D I T I O N Finite state machines • Finite state machines are the basis of all computing ‣ The is a finite state machine with an infinite tape ‣ FSMs are the basis of lexical analysis ‣ A computer is one big FSM with its state stored in the flip-flops of its circuitry

Copyright © 2017 by Jones & Bartlett Learning, LLC an Ascend Learning Company Computer Systems F I F T H E D I T I O N Abstraction • Simplicity is the key to harnessing complexity • It is possible to construct a machine as complicated as a computer only because of the simple concepts that govern its behavior at every level of abstraction

Copyright © 2017 by Jones & Bartlett Learning, LLC an Ascend Learning Company