COMP 212 Lecture 2
Fall Semester, 2008
Instruction Cycles & Bus Interconnect
Comp 212 Computer Org & Arch, Z. Li, 2008

Class Feedback
• How was the 1st lecture on the Computer Architecture Overview?
– Was the amount of info too much, or not enough?
– Was it difficult to understand? Which part was difficult?
– Was the pace of the lecture okay? Too fast, or about right?
– Other questions?
• My expectations: you will be okay if you
– Understand the lecture summary and review questions.
– Do the homework.
– Some gory details are for your information only; no need to memorize them.
Lecture #1 Summary
• Functional overview of computer:
– Data processing:
» e.g. playing MP3 music, encoding a picture into JPEG
– Data movement:
» moving data between computer and peripherals, or communicating with remote devices
» e.g. email, web access, YouTube
– Data storage:
» store data for future use
» e.g. database, personal picture store

Lecture #1 Summary
• Computer Components & Organization
– CPU: computing and logic, processes data, e.g. addition, multiplication
– Memory: stores data for processing and presentation
– I/O: communicates with computer peripherals like keyboard, camera, disks
– Bus: mechanism for inter-connecting CPU, Mem and I/O devices (invented with DEC PDP-8)
• All modern computers are von Neumann architecture
– Which has a stored program, instead of re-wiring circuits
– First implementation: IAS computer at Princeton

Lecture #1 Summary
• Modern computers have been through 3 generations
– Early years: vacuum tubes, e.g. ENIAC
– Second generation: transistor based
– Third generation: integrated circuit based
• Key tricks for architectural improvements
– Pipeline: break up command execution into stages, allowing parallelism among commands
– Superscalar
– Multi-Core: parallel processors, application-level parallelism with OS support

Number System
• Relationship between Decimal, Binary and Hex numbers:
Decimal  Binary  Hex
0        0000    0h
1        0001    1h
2        0010    2h
3        0011    3h
4        0100    4h
5        0101    5h
6        0110    6h
7        0111    7h
8        1000    8h
9        1001    9h
10       1010    Ah
11       1011    Bh
12       1100    Ch
13       1101    Dh
14       1110    Eh
15       1111    Fh
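The Decimal/Binary/Hex table can be reproduced with a few lines of Python. This is only an illustrative sketch: the function name `to_binary_and_hex` is made up for this example, and the trailing `h` follows the notation used on the slide.

```python
def to_binary_and_hex(n: int) -> tuple[str, str]:
    """Return the 4-bit binary string and the slide-style hex digit for 0..15."""
    if not 0 <= n <= 15:
        raise ValueError("a single hex digit only covers 0..15")
    return format(n, "04b"), format(n, "X") + "h"

# Print a few rows of the table: decimal, binary, hex.
for value in (0, 9, 10, 15):
    print(value, *to_binary_and_hex(value))
```

Each hex digit stands for exactly four binary digits, which is why hex is a convenient shorthand for machine words.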
Overview of Lecture #2
• In this lecture we will discuss computer component interconnect:
– Overview of Computer Functions and Components
– Instruction Cycles and Interrupts
– Bus Structure and example: PCI

Re-Cap of von Neumann Machine
• Von Neumann Architecture
– Data & programs are stored in a single addressable memory
– Content of memory is located/retrieved by address
– Sequential execution of commands
Program Concept
• Why program?
– Hardwired systems are inflexible
» You don't want a separate computer for each application, e.g. email, MP3 playback, calendar, web, etc.
– General purpose hardware can do different tasks, given the correct control signals
» Different states of the hardware
– Instead of re-wiring, supply a new set of control signals to achieve new functionality

What is a program?
• A sequence of steps
• For each step, an arithmetic or logical operation is done
• For each operation, a different set of control signals is generated for the hardware
Function of Control Unit
• For each operation a unique code is provided
– e.g. ADD, MOVE
• A hardware segment accepts the code and issues the control signals
– Will have examples in BUS operations
• We have a computer!

Functional Components
• CPU:
– The Control Unit and the Arithmetic and Logic Unit constitute the Central Processing Unit
• I/O:
– Data and instructions need to get into the system and results out
• Storage of code and results is needed
– Main memory for temporary storage
– Disk for permanent storage
An abstract von Neumann computer
• CPU operates on a set of registers, such as AC
• Program related:
– PC: program counter
– IR: instruction register
• Mem access:
– MAR: memory address register
– MBR: memory buffer register
• I/O:
– IAR: I/O address register
– IBR: I/O buffer register

How is the program executed inside a computer?
Instruction and data in computer
• Instruction: 16 bit
– 4-bit op code: 16 possible instructions in total
» e.g. 0001: load mem data at address to AC, 0010: store AC to mem at address
– 12-bit address, or operand:
» Can access 4096 different addresses in memory

Instruction Cycle
• Two steps:
– Fetch
– Execute
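The 16-bit instruction layout (4-bit op code, 12-bit address) can be sketched with shifts and masks. The helper names `encode` and `decode` are illustrative only, and the sample address 940 is treated as hexadecimal here, which is an assumption about the slide's notation.

```python
OPCODE_BITS = 4
ADDRESS_BITS = 12
ADDRESS_MASK = (1 << ADDRESS_BITS) - 1  # 0xFFF

def encode(opcode: int, address: int) -> int:
    """Pack a 4-bit op code and a 12-bit address into one 16-bit word."""
    if not 0 <= opcode < (1 << OPCODE_BITS):
        raise ValueError("op code must fit in 4 bits")
    if not 0 <= address <= ADDRESS_MASK:
        raise ValueError("address must fit in 12 bits")
    return (opcode << ADDRESS_BITS) | address

def decode(word: int) -> tuple[int, int]:
    """Split a 16-bit word into (op code, address)."""
    return word >> ADDRESS_BITS, word & ADDRESS_MASK

word = encode(0b0001, 0x940)         # "load mem data at address 940h to AC"
print(format(word, "016b"))          # op code bits first, then address bits
print(1 << OPCODE_BITS, 1 << ADDRESS_BITS)  # 16 op codes, 4096 addresses
```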
Fetch Cycle
• Processor fetches the instruction from the memory location pointed to by PC (program counter)
• Increment PC: PC=PC+1
• Instruction is loaded into the Instruction Register (IR)
• Processor interprets the instruction and performs the required actions -> go to the exec cycle

4 types of instructions
• Data move: CPU <-> Memory
– Data movement instructions between CPU and Mem
– Load mem data into CPU registers
– Store CPU register value to Mem
– Expressed as OpCode = LOAD, Operand = Addr
• Data move: CPU <-> I/O
– Data transfer between CPU and I/O module
– OpCode = LOAD, Operand = I/O port
4 types of instructions
• Data processing
– Some arithmetic or logical operation on data, e.g. addition, multiplication
• Control
– Alteration of the sequence of operations
– e.g. jump
• Combination of the above

Example Instruction Execution
• Consider an addition task
– Y = [940] + [941]
– Add the data in mem location 940 to that in 941 and store the result Y in [941]
• Registers used in CPU
– PC: program counter, starts at 300, for example
– AC: CPU register for addition
– IR: instruction reg in CPU
– MBR/MAR: mem buffer and addr registers
Addition Execution -1: Fetch Load Command
• Current PC=300
• Fetch the instruction at [300] to IR
• IR = [1, 940]
– opcode=0001, LOAD AC, Addr = 940

Addition Execution-2: Exec Load Data
• PC=PC+1, PC=301
• Data at [940]=0003 moved to CPU Reg AC
– Load MAR with 940, execute load mem
– MBR = 0003
– Load AC from MBR
• AC = 0003
Addition Execution -3: Fetch ADD Command
• Current PC=301
• Fetch the instruction at [301] to IR
• IR = [5, 941]
– opcode=0101, ADD AC from mem address, Addr = 941

Addition Execution-4: Exec Add
• PC=PC+1, PC=302
• Data at [941]=0002 added to CPU Reg AC
• AC = 0003+0002=0005
Addition Execution -5: Fetch Store Command
• Current PC=302
• Fetch the instruction at [302] to IR
• IR = [2, 941]
– opcode=0010, store AC content to memory at address 941

Addition Execution-6: Exec Store AC
• PC=PC+1, PC=303
• Data at AC=0005 stored to memory location [941]
– Set MAR = 941
– Set MBR = AC = 0005
– Execute mem store
Summary of Addition Execution
• It involves 3 main instructions
– Load data from [940] to AC
– Add AC with [941]
– Store AC to [941]
• Total 3 fetches and 3 executions
• To move data between CPU and Mem, MBR and MAR are used.

Instruction Cycle State Diagram (more detailed)
[Figure: state diagram; upper states are Mem, I/O related; lower states are CPU-internal operations]
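The three fetch/execute rounds of the addition example can be traced with a small simulator. This is a minimal sketch, not the machine from the slides: the function `run` and the dictionary-based memory are illustrative, addresses and instruction words are assumed to be hexadecimal (so `[1, 940]` becomes `0x1940`), and only the three op codes used in the example are handled.

```python
LOAD, STORE, ADD = 0b0001, 0b0010, 0b0101  # op codes from the slides

def run(memory, pc, count):
    """Run `count` fetch/execute cycles and return the final (PC, AC)."""
    ac = 0
    for _ in range(count):
        # Fetch: read the instruction at [PC] into IR, then PC = PC + 1.
        ir = memory[pc]
        pc += 1
        opcode, mar = ir >> 12, ir & 0xFFF  # MAR gets the operand address
        # Execute: every memory access goes through MAR and MBR.
        if opcode == LOAD:
            mbr = memory[mar]
            ac = mbr
        elif opcode == ADD:
            mbr = memory[mar]
            ac = (ac + mbr) & 0xFFFF  # keep AC at 16 bits
        elif opcode == STORE:
            mbr = ac
            memory[mar] = mbr
        else:
            raise ValueError(f"unsupported op code {opcode:04b}")
    return pc, ac

# Program at 300h..302h, data at 940h and 941h, as in the example.
memory = {0x300: 0x1940, 0x301: 0x5941, 0x302: 0x2941,
          0x940: 0x0003, 0x941: 0x0002}
pc, ac = run(memory, 0x300, 3)
print(hex(pc), ac, memory[0x941])  # prints: 0x303 5 5
```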
Instruction Cycle Details
• IAC (instruction addr calc)
– PC=PC+1
• IF (instruction fetch)
– Load [PC] to IR
• IOD (instruction operation decoding)
– Op code decoding, what to do, e.g.
» 0001: Load Mem to AC
» 0010: Store AC to Mem
» 0101: Add AC from Mem
» 1101: Read I/O reg IOBR

Instruction Cycle Details
• OAC (operand addr calc)
– If the operand involves a mem location or I/O, compute the address of the operand
• OF (operand fetch)
– Get the operand from mem or I/O
Instruction Cycle Details
• DO (data operation)
– ALU operations on CPU registers
• OS (operand store)
– Write the results in CPU registers back to mem or I/O devices, which involves another round of OAC.

Program Control with a more powerful CPU
• von Neumann machine with the following accessible registers and their reg addresses:
– R0, R1, R2, R3: general purpose registers, can load and save to memory directly
– AC: addition register, can only move between general registers, and do addition
– PC: program counter
– MAR: mem addr register
– MBR: mem buf register
– IR: not directly accessible
Reg address  Register
0000         R0
0001         R1
0010         R2
0011         R3
0100         PC
0101         AC
0110         MAR
0111         MBR
Adding an array of data
• Compute the summation of data in [940], [941], …, [949], store it in [940]
• Use program loop control for the execution; do not repeat the addition 10 times.
• How to do it?

Instruction Set and Machine Code
• Data move between registers
Assembly        Machine Code
MOVE R1, R2;    0001 0000 0001 0010
MOVE AC, R1;    0001 0000 0101 0001
MOVE MAR, 0;    0011 0000 0000 0000
Data move between registers and memory
Assembly          Machine Code
MOVE MAR, 540h;   0011 0101 0100 0000
LOAD R1;          0101 0000 0000 0001
SAVE R2;          0110 0000 0000 0010

ALU and Program Control
• Stop the program:
• Jump on AC=0, reset PC to value hhh:
Loop programming with JUMP
• Load memory data to R2:
– MOVE MAR, 540h;
– LOAD R2;
• Init Loop Count R3
– MOVE AC, 10;
– MOVE R3, AC;
• Increment MAR, MAR=MAR+1;
– MOVE AC, 1;
– ADD AC, MAR;
– MOVE MAR, AC;

Loop programming with JUMP
• Decrement loop count R3
– MOVE AC, -1;
– ADD AC, R3;
– MOVE R3, AC;
• JUMP to xxx, if loop count = 0
– MOVE AC, R3;
– JMP xxxh;
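The loop building blocks above (load the first value, init a loop count, increment MAR, decrement the count, test for zero) can be modeled at a high level in Python. This is a sketch under assumptions, not a translation of the machine code: the variables `mar`, `r1`, `r2`, `r3` simply mirror the register names, the loop is assumed to repeat until the count reaches zero, and the final store into the base address follows the task statement rather than any instruction shown on the slides.

```python
def sum_array(memory, base, count):
    """Sum memory[base .. base+count-1] and store the total in memory[base]."""
    mar = base
    r2 = memory[mar]         # MOVE MAR, base; LOAD R2  (first array value)
    r3 = count - 1           # loop count: elements still to be added
    while r3 != 0:           # leave the loop when the count reaches zero
        mar += 1             # MAR = MAR + 1
        r1 = memory[mar]     # LOAD R1
        r2 = r1 + r2         # ADD R1, R2; MOVE R2, AC
        r3 -= 1              # R3 = R3 - 1
    memory[base] = r2        # store the summation in [base]
    return r2

# Ten values 1..10 stored at 940h..949h (addresses assumed hexadecimal).
memory = {0x940 + i: i + 1 for i in range(10)}
print(sum_array(memory, 0x940, 10))  # prints: 55
```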
Put them all together
Address  Assembly Code
         % load R2 with the first array value
370h     MOVE MAR, 940h;
371h     LOAD R2;
         % loop count in R3
372h     MOVE AC, 9;
373h     MOVE R3, AC;
         % MAR=MAR+1
374h     MOVE AC, 1;
375h     ADD AC, MAR;
376h     MOVE MAR, AC;
         % add R2 with [MAR]
377h     LOAD R1;
378h     ADD R1, R2;
379h     MOVE R2, AC;
         % R3=R3-1;
380h     MOVE AC, -1;
381h     ADD AC, R3;
382h     MOVE R3, AC;
         % Jump
383h     MOVE AC, R3;
384h     JMP 900h;
385h     HALT;

Interrupts
• Purpose:
– Mechanism by which other modules (e.g. I/O) may interrupt the normal sequence of processing, instead of the CPU waiting in the instruction cycle for a slow I/O device to finish its task
• Interrupt types:
– Program
» e.g. overflow, division by zero
– Timer
» Generated by internal processor timer
» Used in pre-emptive multi-tasking
– I/O
» From I/O controller, e.g. when a disk read is finished
– Hardware failure
» e.g. memory parity error
Program Flow Control – No Interrupts
• I/O operations
– Prep for I/O, init I/O, in seg 4
– Actual I/O, the slow part
– Finishing I/O: retrieve results from I/O regs, reset flags, in seg 5
• Program waits for (slow) I/O operations to finish
– Segments 1, 2, 3 are normally very fast
– Segments 4 and 5 have a very slow I/O operation in between
– Execution sequence:
» 1 -> 4 -> [actual slow I/O] -> 5 -> 2 -> 4 -> [actual slow I/O] -> 5 -> 3

Program Flow Control – short I/O wait
• Program executes normally: it finishes issuing the I/O command and continues with the next instruction (2)
• The interrupt mechanism allows I/O operations to interrupt the CPU when slow I/O operations are finished
• CPU executes the Interrupt Handler in (5) to retrieve data
• The actual slow I/O part does not stop the CPU from executing other instructions
Transfer of Control via Interrupts
– Interrupt is identified by an Int #, which points to a specific Int Service Routine (ISR), aka Interrupt Handler.
– Auto saving of the normal instruction execution context, exec of ISR, and then return to normal program with saved context

Instruction Cycle with Interrupts
[Figure: instruction cycle with interrupt check]

Instruction Cycle (with Interrupts) - State Diagram
– With the interrupt mechanism, a small cost in CPU cycles, i.e. the Interrupt Check, allows for parallelism between fast instructions and slow I/O operations.

Multiple Interrupts
• Disable interrupts – Sequential Processing
– Processor will ignore further interrupts whilst processing one interrupt
– Interrupts remain pending and are checked after the first interrupt has been processed
– Interrupts are handled in sequence as they occur
• Define priorities – Allow Nested Interrupts
– Low priority interrupts can be interrupted by higher priority interrupts
– When the higher priority interrupt has been processed, the processor returns to the previous interrupt
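The "disable interrupts, sequential processing" policy can be sketched as an instruction loop with an interrupt check at the end of each cycle. This is a simplified model under assumptions: `run_with_interrupts`, the `arrivals` mapping (step index to interrupt numbers raised during that step) and the log strings are all hypothetical, and handlers are treated as atomic, so anything raised while one runs would simply wait in the pending queue.

```python
from collections import deque

def run_with_interrupts(program, arrivals):
    """Execute instructions in order; after each one, service pending interrupts."""
    pending = deque()
    log = []
    for step, instruction in enumerate(program):
        log.append(f"exec {instruction}")
        # Interrupts raised while this instruction ran are queued, not handled yet.
        pending.extend(arrivals.get(step, []))
        # Interrupt check: handle pending interrupts one at a time, in arrival order.
        while pending:
            number = pending.popleft()
            log.append(f"ISR {number}")  # save context, run handler, restore context
    return log

log = run_with_interrupts(["i1", "i2", "i3"], {0: [2, 1]})
print(log)
```

A nested scheme would instead compare priorities inside the handler and let a higher priority interrupt preempt the one in progress.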
Multiple Interrupts - Sequential
[Figure: sequential interrupt processing]

Multiple Interrupts – Nested
[Figure: nested interrupt processing]

Time Sequence of Multiple Interrupts
[Figure: time sequence of multiple interrupts]

Components interconnect and Bus structure (mostly informational)
Connecting
• All the units must be connected
– Memory
» Typically N words of k bits each (k=8, 16, 32, 64), addressed by word address 0, 1, 2, …, N-1
» Fast
– Input/Output
» Functionally similar to mem, reserved address space
» Slow
– CPU
» Reads instructions and data
» Issues control signals and receives interrupts

Data Flows
• Data flow directions
– Memory -> CPU
» Most common, load data and instructions
– CPU -> Memory
» Save data to memory
– I/O -> CPU
» Send data and interrupts to CPU
– CPU -> I/O
» Send data, control signals to I/O device
– I/O <-> Memory
» Direct Mem Access (DMA), highly efficient, minimum CPU involvement
Bus Interconnection Scheme
• Bus Structure (invented with PDP-8)
– A common pathway with width k-bit, e.g. k=64, 128
– Each line can transmit 1 or 0 over time
– Connecting CPU, mem and I/O devices
• 3 types of bus lines
– Data lines: moving data between CPU, I/O ports and Mem
– Addr lines: to specify address of mem, I/O ports
– Control lines: carry control signals and interrupts among components
CPU connection to Bus
• Reads instruction and data
• Writes out data (after processing)
• Sends control signals to other units
• Receives (& acts on) interrupts

Mem connection to Bus
• Receives and sends data
• Receives addresses (of locations)
• Receives control signals
– Read
– Write
– Timing
I/O connection to Bus
• Receive control signals from computer
• Send control signals to peripherals
– e.g. spin disk
• Receive addresses from computer
– e.g. port number to identify peripheral
• Send interrupt signals (control)

Input/Output Connection(2)
• Output
– Receive data from computer
– Send data to peripheral
• Input
– Receive data from peripheral
– Send data to computer
Communication over Bus
• A communication pathway connecting two or more devices
• Usually broadcast
• Often grouped
– A number of channels in one bus
– e.g. a 32-bit data bus is 32 separate single-bit channels
• Power lines may not be shown

Data Bus
• Carries data
– Both instructions and data go through this bus
• Width is a key determinant of performance
– Consists of 32 to hundreds of separate lines; the number of lines is the bus “width”
– Width affects system performance, e.g. if an instruction is 32 bits and the bus is 16 bits, it takes 2 mem accesses to load 1 instruction
Address bus
• Identify the source or destination of data
– e.g. CPU needs to read an instruction (data) from a given location in memory
• Bus width determines maximum memory capacity of system
– e.g. 8080 has a 16-bit address bus, giving a 64k address space
– MIPS has a 32-bit address bus, so it has a 4GB address space
• Address bus also used to identify I/O devices
– I/O device registers appear as memory locations.

Control Bus
• Control and timing information
– Memory read/write signal
– I/O read/write
– Transfer ACK: indicates data has been put on or retrieved from the bus
– Bus req: a module requests control of the bus
– Bus grant: indicates the bus request is approved
– Interrupt req: request an interrupt of CPU
– Interrupt ACK: the interrupt has been recognized by CPU
– Clock signals
– Reset: init all modules, e.g. Ctrl-Alt-Del on a PC.
Single Bus Problems
• The bus is broadcast, so a single bus can become a problem because of
– Propagation delays due to a longer bus to accommodate more devices
– Total traffic generated by components, especially multimedia devices, exceeding bus bandwidth
• Most systems use multiple buses to overcome these problems

Multi-Bus System
• Has a hierarchy of buses
– Local Bus:
» Connects CPU with Cache and other high speed devices
– System Bus:
» Connects Mem with CPU (via Cache)
– Expansion Bus:
» Connects network, disk and other devices with varying speeds
Traditional ISA Bus with Cache
[Figure: traditional ISA bus with cache]

High Performance Bus
[Figure: high performance bus]
Bus Design Elements
[Figure: bus design elements]

Bus Types
• Dedicated
– Separate data & address lines
• Multiplexed
– Shared lines
– Address valid or data valid control line to indicate data/addr
– Advantage: fewer lines
– Disadvantages
» More complex control circuits
» Performance penalty when certain events can’t be parallelized
Bus Arbitration
• Only one module may control the bus at one time
• Arbitration may be centralised or distributed
– Centralised
» Single hardware device controlling bus access (Bus Controller, Arbiter)
» May be part of CPU or separate
– Distributed
» Each module may claim the bus
» Control logic on all modules to implement an arbitration protocol

Bus Width
• Bus width has direct impact on system performance
– If data bus width is 16, each mem access can move 2 bytes; if width is 32, it can move 4 bytes
– If address bus width is w, it can address 2^w mem locations
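The bus-width arithmetic can be checked with a short sketch. The function names `address_space` and `accesses_needed` are illustrative only; the numbers reproduce the examples quoted in the slides (16-bit and 32-bit address buses, a 32-bit instruction on a 16-bit data bus).

```python
def address_space(address_bits: int) -> int:
    """Number of addressable locations for an address bus of the given width."""
    return 2 ** address_bits

def accesses_needed(item_bits: int, data_bus_bits: int) -> int:
    """Memory accesses needed to move one item over the data bus (rounded up)."""
    return -(-item_bits // data_bus_bits)  # ceiling division

print(address_space(16))            # 65536 locations, i.e. 64k
print(address_space(32) // 2**30)   # 4, i.e. 4G locations
print(accesses_needed(32, 16))      # 2 accesses for a 32-bit instruction
print(16 // 8, 32 // 8)             # bytes moved per access: 2 and 4
```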
Bus Timing
• Co-ordination of events on bus
• Asynchronous
– Events determined by the execution sequence
– Allows mixture of fast and slow devices
• Synchronous
– Events determined by clock signals, easier to implement
– Control Bus includes clock line
– A single 1-0 is a bus cycle
– All devices can read clock line
– Usually a single cycle for an event

Synchronous Timing Diagram
• A clock “paces” all actions
• CPU puts address on address lines, and sets status line to indicate mem access
• Address enable is asserted after the address lines are stable
• Issue read at T2, read happens in T3, drop read line after read completes
• For write, CPU puts data on data lines and issues write
• Drop write after completion
Asynchronous Timing – Read Diagram
[Figure: asynchronous read timing]

Asynchronous Timing – Write Diagram
[Figure: asynchronous write timing]
Example: PCI Bus
• Peripheral Component Interconnect, by Intel
• 32 or 64 bit data lines, at 66MHz, 528MB/s

PCI Bus
[Figure: expansion slots on a typical PC motherboard]
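The 528MB/s figure quoted for PCI follows from the bus width and clock rate. A minimal sketch, assuming one transfer per clock cycle and 1 MB = 10^6 bytes; the function name `peak_bandwidth_mb_per_s` is made up for this example.

```python
def peak_bandwidth_mb_per_s(data_bits: int, clock_mhz: int) -> int:
    """Peak rate in MB/s, assuming one transfer per clock and 1 MB = 10**6 bytes."""
    bytes_per_transfer = data_bits // 8
    return bytes_per_transfer * clock_mhz

print(peak_bandwidth_mb_per_s(64, 66))  # 528, the 64-bit figure from the slide
print(peak_bandwidth_mb_per_s(32, 66))  # 264 for the 32-bit variant
```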
PCI Bus Lines (required)
• System lines
– Including clock and reset
• Address & Data
– 32 time-multiplexed lines for address/data
– Interrupt & validate lines
• Interface Control
• Arbitration
– Not shared
– Direct connection to PCI bus arbiter
• Error lines

PCI Bus Lines (Optional)
• Interrupt lines
– Not shared
• Cache support
• 64-bit data Bus Extension
– Additional 32 lines
– Time multiplexed
– 2 lines to enable devices to agree to use 64-bit transfer
• JTAG/Boundary Scan
– For testing procedures
PCI Commands for data transfer
• Transaction between initiator (master) and target
• Master claims bus
• Determine type of transaction
– e.g. I/O read/write
• Address phase
• One or more data phases

PCI Read Timing Diagram
[Figure: PCI read timing]
PCI Bus Arbiter
• Centralized solution
– Each component has a req and grant line attached to the arbiter
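The centralized scheme, where every component has its own req and grant line to one arbiter, can be sketched as follows. This is only an illustration: the class `BusArbiter`, its method names and the fixed-priority policy (earlier devices in the list win) are assumptions made for the example; the slides do not say which policy the PCI arbiter uses.

```python
class BusArbiter:
    """Toy central arbiter: one request flag per device, one bus owner at a time."""

    def __init__(self, device_names):
        # List order is the (assumed) fixed priority, highest first.
        self.requests = {name: False for name in device_names}
        self.owner = None

    def request(self, name):
        """Device raises its req line."""
        self.requests[name] = True

    def grant(self):
        """Return the current owner, granting the bus if it is free."""
        if self.owner is None:
            for name, wants_bus in self.requests.items():
                if wants_bus:
                    self.requests[name] = False
                    self.owner = name
                    break
        return self.owner

    def release(self, name):
        """Current owner gives up the bus."""
        if self.owner == name:
            self.owner = None

arbiter = BusArbiter(["cpu", "disk", "net"])
arbiter.request("net")
arbiter.request("disk")
first = arbiter.grant()    # "disk" outranks "net" in this sketch
arbiter.release(first)
second = arbiter.grant()   # "net" gets the bus once "disk" releases it
print(first, second)
```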