COMP 212 Lecture 2
Fall Semester, 2008
Instruction Cycles & Interconnect

Class Feedback
• How is the 1st Lecture on the Overview?
– Is the amount of info too much, or not enough?
– Is it difficult to understand? Which part is difficult?
– Is the pace of the lecture okay? Too fast, or about right?
– Other questions?
• My expectations: you will be okay if
– Some gory details are for your information only, no need to memorize them.
– Understand lecture summary and review questions.
– Do the homework,

Comp 212 Computer Org & Arch, Z. Li, 2008

Lecture #1 Summary
• Functional overview of computer:
– Data processing:
» e.g. play MP3 music, encoding a picture into JPEG
– Data movement:
» moving data between computer and peripherals, or remote devices,
» e.g. email, web access, youtube
– Data storage:
» store data for future use
» Database, personal picture store

Lecture #1 Summary
• Computer Components & Organization
– CPU: computing and logics, data, e.g. addition, multiplication
– Memory: store data for processing and presentation
– I/O: communicate with computer peripherals like keyboard, camera, or communicate with disks.
– Bus: mechanism for inter-connecting CPU, Mem and I/O devices (invented with DEC PDP-8)
• All modern computers are von Neumann machines
– Which has a stored program, instead of re-wiring circuits
– First implementation, IAS computer at Princeton

Lecture #1 Summary
• Modern Computer Has been thru 3 Generations
– Early years are Vacuum Tubes, e.g. ENIAC
– Second generations are Transistor based
– Third generations are Integrated Circuits Based
• Key Tricks for Architectural improvements
– Pipeline: break up command executions into stages, allow parallelism among commands
– Superscalar:
– Multi-Core: parallel processors, APP level parallelism with OS support

Number System
• Relationship between Decimal, Binary and Hex numbers:

Decimal  Binary  Hex
0        0000    0h
1        0001    1h
2        0010    2h
3        0011    3h
4        0100    4h
5        0101    5h
6        0110    6h
7        0111    7h
8        1000    8h
9        1001    9h
10       1010    Ah
11       1011    Bh
12       1100    Ch
13       1101    Dh
14       1110    Eh
15       1111    Fh
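The Decimal / Binary / Hex relationship from the Number System slide can be regenerated with a few lines of Python. This is only an illustrative sketch; the function name `number_table` is made up here and is not part of the course material.

```python
def number_table(limit=16):
    """Return (decimal, binary, hex) rows for 0..limit-1.

    Binary is zero-padded to 4 bits and hex carries the trailing 'h'
    used on the slide.
    """
    return [(n, format(n, "04b"), format(n, "X") + "h") for n in range(limit)]


if __name__ == "__main__":
    print("Decimal  Binary  Hex")
    for dec, binary, hexa in number_table():
        print(f"{dec:<8} {binary}    {hexa}")
```

Each hex digit corresponds to exactly one group of 4 bits, which is why machine code in later slides is written in 4-bit groups.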

Overview of Lecture #2
• In this lecture we will discuss computer component interconnect,
– Overview of Computer Functions and Components
– Instruction Cycles and Interrupts
– Bus Structure and example: PCI

Re-Cap of von Neumann Machine
• Von Neumann Architecture
– Data & Programs are stored in a single addressable memory
– Content of memory is located/retrieved by address
– Sequential execution of commands

Program Concept
• Why program?
– Hardwired systems are inflexible
» You don’t want to have a separate computer for each application, like email, mp3 playback, calendar, web, etc.
– General purpose hardware can do different tasks, given correct control signals
» Different state of the hardware.
– Instead of re-wiring, supply a new set of control signals to achieve new functionality

What is a program?
• A sequence of steps
• For each step, an arithmetic or logical operation is done
• For each operation, a different set of control signals is generated for the hardware

Function of Control Unit
• For each operation a unique code is provided
– e.g. ADD, MOVE
• A hardware segment accepts the code and issues the control signals
– Will have examples in BUS operations
• We have a computer!

Functional Components
• CPU:
– The Control Unit and the Arithmetic and Logic Unit constitute the CPU
• I/O:
– Data and instructions need to get into the system and results out
• Storage of code and results is needed
– Main memory for temporary storage
– Disk for permanent storage.

An abstract von Neumann computer
• CPU operates on a set of registers, e.g. AC
• Program related:
– PC: program counter
– IR: instruction reg
• Mem access:
– MAR
– MBR
• I/O:
– IAR
– IBR

How is the program executed inside a computer?

Instruction and data in computer
• Instruction: 16 bit
– 4 bit Op code: total 16 possible instructions,
» e.g. 0001: load mem data at address to AC, 0010: store AC to mem at address
– 12 bit address, or operand:
» Can access 4096 different addresses in memory

Instruction Cycle
• Two steps:
– Fetch
– Execute
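The 16-bit instruction layout above (4-bit op code, 12-bit address) can be sketched in Python with shifts and masks. The helper names `decode` and `encode` are illustrative only; the field widths are the ones stated on the slide.

```python
def decode(word):
    """Split a 16-bit instruction word into (opcode, address)."""
    if not 0 <= word <= 0xFFFF:
        raise ValueError("instruction must fit in 16 bits")
    opcode = word >> 12       # top 4 bits: one of 16 possible instructions
    address = word & 0x0FFF   # low 12 bits: one of 4096 addresses
    return opcode, address


def encode(opcode, address):
    """Pack a 4-bit opcode and a 12-bit address into one 16-bit word."""
    if not 0 <= opcode <= 0xF or not 0 <= address <= 0xFFF:
        raise ValueError("opcode is 4 bits, address is 12 bits")
    return (opcode << 12) | address


if __name__ == "__main__":
    op, addr = decode(0x1940)
    print(f"opcode={op:04b} address={addr:03X}h")
```

With 12 address bits there are 2^12 = 4096 distinct addresses, which is where the 4096 figure on the slide comes from.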

Fetch Cycle
• Processor fetches instruction from memory location pointed to by PC
• Increment PC: PC=PC+1
• Instruction loaded into instruction register (IR)
• Processor interprets instruction and performs required actions -> go to the exec cycle

4 types of instructions
• Data move: CPU <-> Memory
– Data movement instructions between CPU and Mem
– Load mem data into CPU registers
– Store CPU register value to Mem
– Expressed as OpCode = LOAD, Operand = Addr
• Data move: CPU <-> I/O
– Data transfer between CPU and I/O module
– OpCode = LOAD, Operand = I/O port

4 types of instructions
• Data processing
– Some arithmetic or logical operation on data, e.g. addition, multiplication
• Control
– Alteration of sequence of operations
– e.g. jump
• Combination of above

Example Instruction Execution
• Consider an Addition task
– Y = [940] + [941]
– Add data in mem location 940 with that of 941 and store the result Y in [941]
• Registers used in CPU
– PC: program counter, start at 300, for example
– AC: CPU register for addition
– IR: instruction reg in CPU
– MBR/MAR: mem buffer and addr registers

Addition Execution-1: Fetch Load Command
• Current PC=300
• Fetch the instruction at [300] to IR
• IR = [1, 940],
– opcode=0001, LOAD AC, Addr = 940

Addition Execution-2: Exec Load Data
• PC=PC+1, PC=301
• Data at [940]=0003 moved to CPU Reg AC
– Load MAR 940, execute load mem
– MBR = 0003
– Load AC from MBR
• AC = 0003.

Addition Execution-3: Fetch ADD Command
• Current PC=301
• Fetch the instruction at [301] to IR
• IR = [5, 941],
– opcode=0101, ADD AC from mem address, Addr = 941

Addition Execution-4: Exec Add
• PC=PC+1, PC=302
• Data at [941]=0002 added to CPU Reg AC
• AC = 0003+0002=0005.

Addition Execution-5: Fetch Store Command
• Current PC=302
• Fetch the instruction at [302] to IR
• IR = [2, 941],
– opcode=0010, store AC content to memory with address 941

Addition Execution-6: Exec Store AC
• PC=PC+1, PC=303
• Data at AC=0005 stored to memory location [941].
– Set MAR = 941
– Set MBR = AC = 0005
– Execute mem store.

Summary of Addition Execution
• It involves 3 main instructions
– Load data from [940] to AC
– Add AC with [941]
– Store AC to [941]
• Total 3 fetches and 3 executions
• To move data between CPU and Mem, MBR and MAR are used.

Instruction Cycle State Diagram (more detailed)
[Figure: instruction cycle state diagram; one group of states is Mem, I/O related, the other group is CPU internal operations]
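The three-instruction addition walk-through (load, add, store) can be traced with a tiny fetch/execute loop. This is a minimal sketch, not the lecture's own code: the function name `run` is hypothetical, addresses such as 300 and 940 are assumed to be hexadecimal, and only the three op codes used in the example (0001, 0101, 0010) are handled.

```python
LOAD, STORE, ADD = 0x1, 0x2, 0x5  # op codes taken from the slides


def run(memory, pc, steps):
    """Execute `steps` fetch/execute cycles; return (pc, ac)."""
    ac = 0
    for _ in range(steps):
        # Fetch: IR <- [PC], then PC <- PC + 1
        ir = memory[pc]
        pc += 1
        opcode, mar = ir >> 12, ir & 0x0FFF  # address goes to MAR
        # Execute: all memory traffic goes through MAR/MBR
        if opcode == LOAD:
            mbr = memory[mar]
            ac = mbr
        elif opcode == ADD:
            mbr = memory[mar]
            ac = (ac + mbr) & 0xFFFF
        elif opcode == STORE:
            mbr = ac
            memory[mar] = mbr
        else:
            raise ValueError(f"unsupported opcode {opcode:04b}")
    return pc, ac


if __name__ == "__main__":
    mem = {0x300: 0x1940, 0x301: 0x5941, 0x302: 0x2941,
           0x940: 0x0003, 0x941: 0x0002}
    pc, ac = run(mem, pc=0x300, steps=3)
    print(f"PC={pc:03X}h AC={ac:04X} [941]={mem[0x941]:04X}")
```

Three iterations of the loop correspond to the 3 fetches and 3 executions counted in the summary.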

Instruction Cycle Details
• IAC (instruction addr calc)
– PC=PC+1
• IF (instruction fetch)
– Load [PC] to IR
• IAD (instruction operation decoding)
– Op code decoding, what to do, e.g.
– 0001: Load Mem to AC
– 0010: Store AC to Mem
– 0101: Add AC from Mem
– 1101: Read I/O reg IOBR

Instruction Cycle Details
• OAC (operand addr calc)
– If the operand involves a mem location or I/O, compute the address of the operand
• OF (operand fetch)
– Get the operand from mem or IO

Instruction Cycle Details
• DO (data operation)
– ALU operations on CPU registers
• OS (operand store)
– Write the results in CPU registers back to mem or I/O devices, which involves another round of OAC.

Program Control with a more powerful CPU
• von Neumann machine with the following accessible registers and their reg address:
– R0, R1, R2, R3: general purpose registers, can load and save to memory directly
– AC: addition register, can only move between general registers, and do addition
– PC: program counter
– MAR: mem addr register
– MBR: mem buf register
– IR: not directly accessible.

Reg address  Register
0000         R0
0001         R1
0010         R2
0011         R3
0100         PC
0101         AC
0110         MAR
0111         MBR

Adding an array of data
• Compute the summation of data in [940], [941], …, [949], store it in [940]
• Using program loop control for the execution, do not repeat addition 10 times.
• How to do it?

Instruction Set and Machine Code
• Data move between registers
– MOVE R1, R2; 0001 0000 0001 0010
– MOVE AC, R1; 0001 0000 0101 0001
– MOVE MAR, 0; 0011 0000 0000 0000
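The MOVE machine codes above can be reproduced from the register address table. The sketch below assumes a field layout inferred from the slide's examples (op code in the top 4 bits, 4 unused bits, then 4-bit destination and 4-bit source for register moves; a 12-bit immediate for the MAR form). The names `REG`, `encode_move`, `encode_move_mar` and `bits` are illustrative, not part of the course material.

```python
# Register addresses from the "Program Control with a more powerful CPU" slide
REG = {"R0": 0b0000, "R1": 0b0001, "R2": 0b0010, "R3": 0b0011,
       "PC": 0b0100, "AC": 0b0101, "MAR": 0b0110, "MBR": 0b0111}


def encode_move(dst, src):
    """MOVE dst, src between registers (op code 0001, layout assumed)."""
    return (0b0001 << 12) | (REG[dst] << 4) | REG[src]


def encode_move_mar(value):
    """MOVE MAR, value with a 12-bit immediate (op code 0011, layout assumed)."""
    if not 0 <= value <= 0xFFF:
        raise ValueError("immediate must fit in 12 bits")
    return (0b0011 << 12) | value


def bits(word):
    """Format a 16-bit word as four groups of 4 bits, as on the slides."""
    s = format(word, "016b")
    return " ".join(s[i:i + 4] for i in range(0, 16, 4))


if __name__ == "__main__":
    print("MOVE R1, R2;", bits(encode_move("R1", "R2")))
    print("MOVE AC, R1;", bits(encode_move("AC", "R1")))
    print("MOVE MAR, 0;", bits(encode_move_mar(0)))
```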

Data move between registers and memory
MOVE MAR, 540h; 0011 0101 0100 0000
LOAD R1; 0101 0000 0000 0001
SAVE R2; 0110 0000 0000 0010

ALU and Program Control
• Stop the program:
• Jump on AC=0, reset PC to value hhh:

Loop programming with JUMP
• Load memory data to R2:
– MOVE MAR, 540h;
– LOAD R2;
• Init Loop Count R3
– MOVE AC, 10;
– MOVE R3, AC;
• Increment MAR, MAR=MAR+1;
– MOVE AC, 1;
– ADD AC, MAR;
– MOVE MAR, AC;

Loop programming with JUMP
• Decrement loop count R3
– MOVE AC, -1;
– ADD AC, R3;
– MOVE R3, AC;
• JUMP to xxx, if loop count = 0
– MOVE AC, R3;
– JMP xxxh;
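The loop building blocks above (load the first value, initialise a counter, step MAR, decrement the counter, jump when it reaches zero) can be mirrored in Python to check the intended result of the array-sum task. This is a hedged sketch of the intent, not a translation of any specific assembly listing: the function name `sum_array` is hypothetical, the variables `mar`, `r2`, `r3` only imitate the registers, and the final store back to the base address follows the task statement ("store it in [940]").

```python
def sum_array(memory, base, count):
    """Sum `count` words starting at `base`, store the total at `base`."""
    mar = base
    r2 = memory[mar]      # LOAD R2: running sum starts with the first value
    r3 = count - 1        # loop count: elements still to be added
    while r3 != 0:        # leave the loop when the count reaches 0
        mar += 1          # MAR = MAR + 1
        r2 += memory[mar] # add [MAR] to the running sum
        r3 -= 1           # R3 = R3 - 1
    memory[base] = r2     # store the result back at the base address
    return r2


if __name__ == "__main__":
    mem = {0x940 + i: i + 1 for i in range(10)}  # sample data 1..10
    print(sum_array(mem, 0x940, 10))
```

Using a counter of `count - 1` matches the idea that the first element is loaded before the loop and only the remaining elements are added inside it.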

Put them all together

Address  Assembly Code      Machine Code
% load R2 with the first array value
370h     MOVE MAR, 940h;
371h     LOAD R2;
% loop count in R3
372h     MOVE AC, 9;
373h     MOVE R3, AC;
% MAR=MAR+1
374h     MOVE AC, 1;
375h     ADD AC, MAR;
376h     MOVE MAR, AC;
% add R2 with [MAR]
377h     LOAD R1;
378h     ADD R1, R2;
379h     MOVE R2, AC;
% R3=R3-1;
380h     MOVE AC, -1;
381h     ADD AC, R3;
382h     MOVE R3, AC;
% Jump
383h     MOVE AC, R3;
384h     JMP 900h;
385h     HALT;

Interrupts
• Purpose:
– Mechanism by which other modules (e.g. I/O) may interrupt the normal sequence of processing, instead of waiting for the slow IO device to finish its task in the instruction cycle
• Interrupt types:
– Program
» e.g. overflow, division by zero
– Timer
» Generated by internal processor timer
» Used in pre-emptive multi-tasking
– I/O
» from I/O controller, e.g. when disk read is finished
– Hardware failure: e.g. memory parity error

Program Flow Control – No Interrupts
• I/O operations
– Prep for I/O, init I/O, seg 4,
– Actual I/O initiation, the slow part
– Finishing I/O, retrieve results from IO Regs, reset flags, in seg 5.
• Program waits for (slow) I/O Operations to finish
– Segments 1, 2, 3 are normally very fast
– Segments 4 and 5 have a very slow IO operation in between
– Execution sequences:
» 1->4->[actual slow I/O]->5->2->4->[actual slow I/O]->5->3

Program Flow Control – short I/O wait
• Program executes normally, finishes issuing the I/O command and continues with the next instruction (2)
• The interrupt mechanism allows I/O operations to interrupt the CPU when slow I/O operations are finished
• CPU executes the Interrupt Handler in (5) to retrieve data
• The actual slow I/O part does not stop the CPU from executing other instructions

Transfer of Control via Interrupts

Instruction Cycle with Interrupts
– Interrupts remain pending and are checked after the first interrupt has been processed; interrupts are handled in sequence as they occur
– Interrupt is identified by an Int #, which points to a specific Int Service Routine (ISR), aka Interrupt Handler.
– Auto saving of the normal instruction execution context, exec of ISR, and then return to normal program with saved context

Instruction Cycle (with Interrupts) - State Diagram
– With the interrupt mechanism, a small cost in CPU cycles, i.e. the Interrupt Check, allows for parallelism between fast instructions and slow I/O operations.

Multiple Interrupts
• Disable interrupts – Sequential Processing
– Processor will ignore further interrupts whilst processing one interrupt
– Interrupts remain pending and are checked after first interrupt has been processed
– Interrupts handled in sequence as they occur
• Define priorities – Allow Nested Interrupts
– Low priority interrupts can be interrupted by higher priority interrupts
– When higher priority interrupt has been processed, processor returns to previous interrupt
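The sequential ("disable interrupts") policy can be illustrated with a toy instruction cycle that performs an interrupt check after every instruction. This is only a sketch under simplifying assumptions: instructions and interrupts are plain labels, the function name `run_with_interrupts` and the `arrivals` mapping are invented for illustration, and context save/restore is reduced to resuming the program at the next instruction.

```python
from collections import deque


def run_with_interrupts(program, arrivals):
    """Trace execution with an interrupt check after each instruction.

    program  -- list of instruction labels, executed in order
    arrivals -- dict: instruction index -> list of interrupt numbers
                raised while that instruction executes
    """
    pending = deque()
    trace = []
    for step, instr in enumerate(program):
        trace.append(f"exec {instr}")            # fetch + execute
        pending.extend(arrivals.get(step, []))   # interrupts stay pending
        # Interrupt check: handle pending interrupts one at a time,
        # in the order they occurred (no nesting).
        while pending:
            trace.append(f"ISR {pending.popleft()}")
        # The saved context is restored here: continue with the next instruction.
    return trace


if __name__ == "__main__":
    for line in run_with_interrupts(["i1", "i2", "i3"], {0: [7], 1: [3, 4]}):
        print(line)
```

A nested-priority variant would instead let a higher-priority interrupt preempt a running handler; that is not modelled in this sketch.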

Multiple Interrupts - Sequential

Multiple Interrupts – Nested

Time Sequence of Multiple Interrupts

Components interconnect and Bus structure (mostly informational)

Connecting
• All the units must be connected
– Memory
» Typically N words of k bits (k=8, 16, 32, 64), each word addressed by its word address starting from 0, 1, 2, …, N-1
» Fast
– Input/Output
» Functionally similar to mem, reserved address space
» Slow
– CPU
» Reads instructions and data
» Issue control signals and receive interrupts

Data Flows
• Data flow directions
– Memory -> CPU
» Most common, load data and instructions
– CPU -> Memory
» Save data to memory
– I/O -> CPU
» Send data and interrupts to CPU
– CPU -> I/O
» Send data, control signals to I/O device
– I/O <-> Memory
» Direct Mem Access (DMA), highly efficient, minimum CPU involvement

Bus Interconnection Scheme
• Bus Structure (invented with PDP-8)
– A common pathway with width k-bit, e.g. k=64, 128
– Each line can transmit 1 or 0 over time,
– Connecting CPU, mem and I/O devices
• 3 types of bus lines
– Data lines: moving data between CPU, I/O ports and Mem
– Addr lines: to specify address of mem, I/O ports
– Control lines: carry control signals and interrupts among components

CPU connection to Bus
• Reads instruction and data
• Writes out data (after processing)
• Sends control signals to other units
• Receives (& acts on) interrupts

Mem connection to Bus
• Receives and sends data
• Receives addresses (of locations)
• Receives control signals
– Read
– Write
– Timing

I/O connection to Bus
• Output
– Receive data from computer
– Send data to peripheral
• Input
– Receive data from peripheral
– Send data to computer

Input/Output Connection (2)
• Receive control signals from computer
• Send control signals to peripherals
– e.g. spin disk
• Receive addresses from computer
– e.g. port number to identify peripheral
• Send interrupt signals (control)

Communication over Bus
• A communication pathway connecting two or more devices
• Usually broadcast
• Often grouped
– A number of channels in one bus
– e.g. 32 bit data bus is 32 separate single bit channels
• Power lines may not be shown

Data Bus
• Carries data
– Both instructions and data go thru this bus
• Width is a key determinant of performance
– Consists of 32 to hundreds of separate lines, called bus “width”
– Width affects the system performance, e.g. if an instruction is 32 bit and the bus is 16 bit, then it takes 2 mem accesses to load 1 instruction

Address bus
• Identify the source or destination of data
– e.g. CPU needs to read an instruction (data) from a given location in memory
• Bus width determines maximum memory capacity of system
– e.g. 8080 has 16 bit address bus giving 64k address space
– MIPS has 32 bit address bus, so it has 4GB address space
• Address bus also used to identify IO devices
– IO device registers appear as memory locations.

Control Bus
• Control and timing information
– Memory read/write signal
– IO read/write
– Transfer ACK: indicate data has been put on or retrieved from the bus
– Bus req: a module requests to gain control of the bus
– Bus grant: indicate req of bus approved
– Interrupt req: request an interrupt of CPU
– Interrupt ACK: the interrupt has been recognized by CPU
– Clock signals
– Reset: init all modules, e.g. Ctrl-Alt-Del on a PC.

Single Bus Problems
• Bus is broadcasting so single bus could become a problem if
– Propagation delays due to longer bus to accommodate more devices
– Total traffic generated by components, especially multimedia devices, is exceeding Bus bandwidth
• Most systems use multiple buses to overcome these problems

Multi-Bus System
• Has a hierarchy of Buses
– Local Bus:
» Connect CPU with other high speed devices
– :
» Connect Mem with CPU (via Cache)
– Expansion Bus:
» Connect network, disk and other devices with varying speed

Traditional ISA Bus with Cache

High Performance Bus

Bus Design Elements

Bus Types

• Dedicated

– Separate data & address lines

• Multiplexed

– Shared lines

– Address valid or data valid control line to indicate data/addr

– Advantage - fewer lines

– Disadvantages

» More complex control circuits

» Performance penalty when certain events can’t be parallelized

Bus Arbitration
• Only one module may control bus at one time
• Arbitration may be centralised or distributed
– Centralised
» Single hardware device controlling bus access
– Bus Controller
– Arbiter
» May be part of CPU or separate
– Distributed
» Each module may claim the bus
» Control logic on all modules to implement an arbitration protocol

Bus Width
• Bus width has direct impact on system performance
– If data bus width is 16, then each mem access can move 2 bytes, if width is 32, can move 4 bytes
– If address bus width is w, then can address 2^w mem space locations
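The bus width relationships can be checked with simple arithmetic. The helper names below are illustrative only; the numbers in the example are the ones quoted on the Data Bus, Address bus and Bus Width slides.

```python
def bytes_per_access(data_bus_bits):
    """Bytes moved by one memory access on a data bus of the given width."""
    return data_bus_bits // 8


def addressable_locations(address_bus_bits):
    """Number of distinct addresses: 2^w for an address bus of width w."""
    return 2 ** address_bus_bits


def accesses_needed(item_bits, data_bus_bits):
    """Memory accesses needed to move one item (ceiling division)."""
    return -(-item_bits // data_bus_bits)


if __name__ == "__main__":
    print(bytes_per_access(16), bytes_per_access(32))  # 2 and 4 bytes
    print(addressable_locations(16))                   # 64k locations
    print(addressable_locations(32))                   # 4G locations
    print(accesses_needed(32, 16))                     # 2 accesses
```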

Bus Timing
• Co-ordination of events on bus
• Asynchronous
– Events determined by the execution sequence,
– Allows mixture of fast and slow devices
• Synchronous
– Events determined by clock signals, easier to implement
– Control Bus includes clock line
– A single 1-0 is a bus cycle
– All devices can read clock line
– Usually a single cycle for an event

Synchronous Timing Diagram
• A Clock “paces” all actions
• CPU puts address on address lines, and sets status line to indicate mem access
• Address enable after stable address line
• Issue read at T2, read happens in T3, drop read line after read complete
• For write, CPU puts data on data line, issues write,
• Drop write after completion.

Asynchronous Timing – Read Diagram

Asynchronous Timing – Write Diagram

Example: PCI Bus

PCI Bus
• Peripheral Component Interconnection by Intel
• 32 or 64 bit data lines, at 66MHz, 528MB/s
[Figure: Expansion Slots on a typical PC mother board]
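The 528MB/s figure is consistent with a 64-bit transfer on every clock at 66MHz. The sketch below makes that arithmetic explicit; it assumes one transfer per clock cycle and 1 MB = 10^6 bytes, and the function name is illustrative.

```python
def peak_bandwidth_mb_per_s(width_bits, clock_mhz):
    """Peak transfer rate in MB/s, assuming one transfer per clock cycle."""
    return (width_bits // 8) * clock_mhz


if __name__ == "__main__":
    print(peak_bandwidth_mb_per_s(64, 66))  # 64-bit data lines
    print(peak_bandwidth_mb_per_s(32, 66))  # 32-bit data lines
```

Under the same assumptions the 32-bit configuration peaks at half that rate.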

PCI Bus Lines (required)
• Systems lines
– Including clock and reset
• Address & Data
– 32 time mux lines for address/data
– Interrupt & validate lines
• Interface Control
• Arbitration
– Not shared
– Direct connection to PCI bus arbiter
• Error lines

PCI Bus Lines (Optional)
• Interrupt lines
– Not shared
• Cache support
• 64-bit data Bus Extension
– Additional 32 lines
– Time multiplexed
– 2 lines to enable devices to agree to use 64-bit transfer
• JTAG/Boundary Scan
– For testing procedures

PCI Commands for data transfer
• Transaction between initiator (master) and target
• Master claims bus
• Determine type of transaction
– e.g. I/O read/write
• Address phase
• One or more data phases

PCI Read Timing Diagram

PCI Bus Arbiter
• Centralized solution
– Each component has a req and grant line attached to the arbiter