CO200 - Computer Organization & Architecture

Basavaraj Talawar [email protected]

Course Syllabus
● Basics – CPU organization, data representation, and instruction sets
● Design – Fixed-point arithmetic (adders, subtracters, multipliers, dividers), ALU, floating-point arithmetic
● Control Design – Hardwired control, microprogrammed control, pipeline control
● Memory Organization – Serial vs. random access memories, caches
● Principles of Pipelining
● Principles of ...

Course Structure

● Textbooks
  – J. P. Hayes, Computer Architecture and Organization, 3rd ed., McGraw Hill.
  – Hwang and Briggs, Computer Architecture and Parallel Processing, McGraw Hill.
  – D. Patterson and J. Hennessy, Computer Organization and Design, 3rd ed., Morgan Kaufmann.
● Other References
  – NPTEL course on “High Performance Computing” by Matthew Jacob, IISc.
● Guest Lectures

● About the Course – Surprise Quizzes: 15%, Assignments: 10%, Mid Sem: 25%, Final Exam: 50%

Course Objectives

● To understand how a computer works ● To know the architecture and working of the components inside a computer – Processor, Control Unit, ALU, Memory, I/O

Course Objectives – Expanded

● How is a machine language program executed by a computer?

● How does the software instruct the hardware to perform a desired action? How does the hardware instruct a desired unit to perform its corresponding operation?

● Why study all of this? – To gain insight into the setting in which our programs execute – To improve the setting in which our programs execute – to improve the performance of the system

What is a Computer?

● An electronic device which is capable of receiving information (data) in a particular form and of performing a sequence of operations in accordance with a predetermined but variable set of procedural instructions (program) to produce a result in the form of information or signals.

Basic Computer Organization

● Machine instructions – Description of a primitive operation that the machine hardware is able to understand – Expressed in binary – Example of a 32-bit machine language instruction:

00110011101100000100001110101011

Basic Computer Organization

● Instruction Set – Complete specification of all the kinds of instructions that the processor hardware was built to execute – E.g.: ADD, SUB, XOR, JUMP, … ● How are programs written in high-level languages such as C translated into a language that the machine understands?

The Computer Program

● Description of algorithms and data structures to achieve a specific objective ● A compiler translates the high-level language program into assembly language. ● An assembler translates the assembly into machine code.

Basic Computer Organization

● Processor – Executes programs ● Main Memory – Holds program and data ● I/O – For communication and data

Processor (CPU)

[Figure: the CPU (ALU, registers, control) connected to memory and I/O devices over a bus.]

Inside the Processor

● Control Hardware: Hardware to manage instruction execution ● ALU: Arithmetic and Logical Unit (hardware to do arithmetic and logic operations) ● Registers: Small units of memory to hold data/instructions temporarily during execution ● Memory: Stores information being processed by the CPU ● Input: Allows the user to supply information to the computer ● Output: Allows the user to receive information from the computer

Physics in the Real World

Computer Architecture

Application
Algorithm
Programming Language
Operating System / Virtual Machines
Instruction Set Architecture
Organization / Register-Transfer Level
Gates
Circuits
Devices
Physics

Computer architecture is the design of the abstraction/implementation layers that allow us to execute information processing applications efficiently using manufacturing technologies.

Architecture vs. Organization

● Architecture / Instruction Set Architecture (ISA)
  – Programmer-visible state (memory & registers)
  – Operations (instructions and how they work)
  – Input/Output
  – Data representation – types/sizes
● Microarchitecture / Organization
  – The way a given ISA is implemented in a particular processor

Same Architecture, Different Organizations

● AMD Athlon II X4
  – ISA: x86 instruction set
  – Quad core, 2.9 GHz, 125 W
  – 3 instructions/cycle/core
  – 64 KB L1 cache, 512 KB L2 cache
● Intel Atom
  – ISA: x86 instruction set
  – Single core, 1.6 GHz, 2 W
  – 2 instructions/cycle/core
  – 32 KB/24 KB L1 I/D cache, 512 KB L2 cache

Different Architectures, Organizations
● AMD Vishera
  – x86 ISA
  – 8 cores, 4.7 GHz, 125 W
  – 64 KB L1 cache, 2 MB L2 cache, 8 MB L3 cache
● IBM POWER8
  – Power ISA
  – 12 cores, 4.5 GHz, 250 W
  – 64 KB L1 cache, 512 KB L2 cache, 8 MB L3 cache

Recap

● What is a Computer? ● Computer Organization and Architecture – Registers, Control Unit, ALU, Memory, I/O ● ISA, Machine language ● Organization vs. Architecture

Coming up …

● Processor Performance ● Machine Models

Concept of Time and Speed

● Frequency: Number of occurrences of a repeating event per unit time.
  – SI unit: Hertz (Hz)
● The period is the duration of one cycle in a repeating event.
  – Period = Cycle Time = 1 / Frequency
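● Example: a 2 GHz clock has a cycle time of 1 / (2 × 10^9 Hz) = 0.5 ns.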

On Processor Performance

● How is frequency related to performance?

Program Execution Time = Execution Time per Instruction × Total Program Instructions

CPU Time = Execution Time per Instruction × Instruction Count (IC)

Execution Time per Instruction = Cycles spent per Instruction × Cycle Time

CPU Time = IC × CPI × Cycle Time

Example

What is the execution time of a program containing a million instructions, each occupying 4 cycles, in a 2 GHz processor?

Iron Law of Processor Performance

CPU Time = IC × Cycles per Instruction (CPI) × Cycle Time

Time per Cycle = 1 / Frequency

CPU Time = (IC × CPI) / Frequency

CPU Time = (Instructions / Program) × (Clock Cycles / Instruction) × (Seconds / Clock Cycle)

On Processor Performance

CPU Time = (Instructions / Program) × (Clock Cycles / Instruction) × (Seconds / Clock Cycle)
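A minimal C sketch of the Iron Law (an illustration, not from the slides), plugging in the earlier example's numbers – 10^6 instructions, 4 cycles per instruction, 2 GHz clock – which works out to 2 ms:

    #include <stdio.h>

    /* Iron Law of processor performance:
     * CPU time = instruction count * cycles per instruction * cycle time */
    int main(void) {
        double ic   = 1e6;     /* instruction count (example from the slides) */
        double cpi  = 4.0;     /* cycles per instruction                      */
        double freq = 2e9;     /* clock frequency in Hz (2 GHz)               */

        double cycle_time = 1.0 / freq;            /* seconds per clock cycle */
        double cpu_time   = ic * cpi * cycle_time;

        printf("CPU time = %g s\n", cpu_time);     /* prints 0.002 s = 2 ms   */
        return 0;
    }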

THE COMPILER, ARCHITECTURE AND ORGANIZATION

The GNU C Compiler

● $gcc hello.c
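As a sketch of the compile–assemble–link chain behind this command (the flags shown are standard gcc options):

    $ gcc -S hello.c        # compile only: produces assembly in hello.s
    $ gcc -c hello.s        # assemble: produces machine code in hello.o
    $ gcc hello.o -o hello  # link: produces the executable hello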

The compiler and its working: guest lecture by Dr. Janakiraman, IBM, August 2.

Operations and Operands

● C = A + B ● Operation: Addition. Operands: A & B. Result: C. ● Instruction: ADD C, A, B

Where do operands come from and where do results go? This is an architectural decision.

Memory – Toy Example

● Individually addressable locations, with addresses from 0x0000 to 0xFFFF
● Linearly increasing addresses
● Memory is 'growing down' in the figure
● Any location can be read from / written into
● How many locations does this example memory have?

Recap

● Processor performance ● Abstract view of Memory

Example

Your desktop has a 4 GB memory. How long (in bits) is its address?
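● Worked answer (assuming byte-addressable memory): 4 GB = 2^32 bytes, so an address needs log2(2^32) = 32 bits.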

Operations and Operands

[Figure: the processor (control unit + ALU) connected to memory, with operands i1, i2 and a result R.]

Machine Model – Stack

● Stack is a form of memory
● Top of the Stack (TOS) is pointed to by the Stack Pointer
● Operations: Push and Pop
[Figure: a stack occupying addresses 0x00–0xFF, with the TOS marker partway up.]
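A minimal C sketch of a stack with a top-of-stack index (names and sizes are illustrative, not from the slides):

    #include <stdio.h>

    #define STACK_SIZE 256          /* addresses 0x00 .. 0xFF, as in the toy figure */

    static int stack[STACK_SIZE];
    static int tos = -1;            /* index of the top of the stack; -1 = empty    */

    /* Push a value onto the top of the stack. */
    static void push(int value) {
        stack[++tos] = value;
    }

    /* Pop the value at the top of the stack. */
    static int pop(void) {
        return stack[tos--];
    }

    int main(void) {
        push(10);
        push(12);
        printf("popped %d\n", pop());   /* prints 12 */
        return 0;
    }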

Stack

[Figure: a sequence of frames showing the stack (addresses 0x00–0xFF, initially holding 10, 94, 71 at the bottom) and a small memory map (255 at 0x07, 77 at 0x10, 44 at 0x12, 172 at 0x13) as the operations PUSH 10, PUSH 12, POP 13, PUSH 7 execute; each frame shows the value moved and the TOS pointer going up or down.]

Machine Model – Stack

[Figure: the processor (control + ALU) operating on a stack with a TOS pointer, alongside main memory.]

Where do operands come from and where do results go?

Machine Model – Stack
● The operands are always TOS and TOS – 1.
● The result always goes into TOS – 1, which becomes the new TOS.
● Implicit operands.
● Instruction: ADD
● Example: d = (a + b) * c

Postfix Expressions

a + b ab+

(a + b)*c

X*c Xc* where X = (a + b) postfix form of (a + b) is ab+

ab+c*

Postfix Expressions

a + (b*c) abc*+

(a + b)* (c - d)

X * (c – d) where X = (a + b)

X * Y XY* where Y = (c – d) replace Y with its postfix form

Xcd-* replace X with its postfix form

(a + b)* (c - d) ab+cd-*

(((a + b)*c)+d)*e

((X*c)+d)*e where X = (a + b)

(Y+d)*e where Y = (X*c)

Z*e Ze* where Z = (Y+d) replace Z with its postfix form

Yd+e* replace Y with its postfix form

Xc*d+e* replace X with its postfix form

ab+c*d+e*

Reverse Polish Notation
● A way of expressing arithmetic expressions that avoids the use of brackets.
● Evaluated left-to-right. Natural on a stack.
● Devised by the Polish philosopher and mathematician Jan Łukasiewicz (1878–1956)

Infix Notation          RPN
a+b                     ab+
(a+b)*c                 ab+c*
a+(b*c)                 abc*+
(a+b)*(c-d)             ab+cd-*
(((a+b)*c)+d)*e         ab+c*d+e*
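A small C sketch of left-to-right RPN evaluation on a stack (illustrative only; it assumes single-digit operands):

    #include <stdio.h>
    #include <ctype.h>

    /* Evaluate a postfix (RPN) string of single-digit operands, e.g. "34+5*". */
    static int eval_rpn(const char *expr) {
        int stack[64];
        int tos = -1;                         /* top-of-stack index */

        for (; *expr; expr++) {
            if (isdigit((unsigned char)*expr)) {
                stack[++tos] = *expr - '0';   /* operand: push its value */
            } else {
                int b = stack[tos--];         /* right operand (old TOS) */
                int a = stack[tos--];         /* left operand (TOS - 1)  */
                int r = 0;
                switch (*expr) {
                    case '+': r = a + b; break;
                    case '-': r = a - b; break;
                    case '*': r = a * b; break;
                    case '/': r = a / b; break;
                }
                stack[++tos] = r;             /* push the result back */
            }
        }
        return stack[tos];
    }

    int main(void) {
        printf("%d\n", eval_rpn("34+5*"));    /* (3 + 4) * 5 = 35 */
        return 0;
    }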

RPN Example

Postfix form: ab+
[Figure: step-by-step stack frames – push a, push b, then apply +; the stack ends up holding a + b, matching the infix form a + b.]

RPN Example

Postfix form: ab+c*
[Figure: step-by-step stack frames – push a, push b, apply + (the stack holds a + b), push c, apply *; the stack ends up holding (a + b) * c.]

RPN Example

Postfix form: ab*cde/-*
[Figure: stack frames for the evaluation; the corresponding infix form is (a*b) * (c - (d/e)).]

Machine Model – Stack

● d = (a + b) * c
● RPN: d = ab+c*
● Sequence of instructions:
    PUSH a
    PUSH b
    ADD
    PUSH c
    MULTIPLY
    POP d
[Figure: a sequence of frames showing the stack and memory after each instruction – a and b are pushed, ADD replaces them with a + b, c is pushed, MULTIPLY leaves (a + b) * c on top of the stack, and POP d stores it into memory location d.]
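A minimal C sketch of a stack machine running this sequence against a toy memory (the values a = 2, b = 3, c = 4 are made up for illustration):

    #include <stdio.h>

    /* Toy memory: a, b, c, d live at fixed addresses 0..3. */
    enum { A, B, C, D };
    static int mem[4] = { 2, 3, 4, 0 };   /* a = 2, b = 3, c = 4, d = ? */

    static int stack[16];
    static int tos = -1;

    static void push(int addr)   { stack[++tos] = mem[addr]; }              /* PUSH x   */
    static void pop_to(int addr) { mem[addr] = stack[tos--]; }              /* POP x    */
    static void add(void)        { int b = stack[tos--]; stack[tos] += b; } /* ADD      */
    static void multiply(void)   { int b = stack[tos--]; stack[tos] *= b; } /* MULTIPLY */

    int main(void) {
        /* d = (a + b) * c, as in the slide's instruction sequence */
        push(A);
        push(B);
        add();
        push(C);
        multiply();
        pop_to(D);
        printf("d = %d\n", mem[D]);   /* (2 + 3) * 4 = 20 */
        return 0;
    }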

Stack based Machines

● Burroughs B5000 (1960) ● Forth machine ● JVM, Intel x87 floating-point unit

Accumulator Based Machine Model

● One operand is implicit – the accumulator.
● The other operand is brought in from memory.
● The result of an operation is always stored in the accumulator.
● Instruction: ADD x
● Example: d = (a + b) * c
[Figure: the accumulator feeding the ALU, with the second operand x coming from memory.]

Accumulator Based Machine Model

● d = (a + b) * c
● Sequence of instructions:
    LOAD a
    ADD b
    MULTIPLY c
    STORE d
● LOAD: transfer data from memory into the processor; the accumulator is the implicit destination.
● STORE: transfer data from the processor into memory; the accumulator is the implicit source.
[Figure: frames showing the accumulator holding a, then a + b, then (a + b) * c, which STORE d finally writes into memory location d.]
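A matching C sketch of an accumulator machine running LOAD/ADD/MULTIPLY/STORE against the same toy memory (again with illustrative values a = 2, b = 3, c = 4):

    #include <stdio.h>

    /* Toy accumulator machine: one implicit operand (the accumulator),
     * the other operand comes from memory. */
    enum { A, B, C, D };
    static int mem[4] = { 2, 3, 4, 0 };   /* a = 2, b = 3, c = 4, d = ? */
    static int acc;                        /* the accumulator */

    static void load(int addr)     { acc = mem[addr]; }    /* LOAD x     */
    static void add(int addr)      { acc += mem[addr]; }   /* ADD x      */
    static void multiply(int addr) { acc *= mem[addr]; }   /* MULTIPLY x */
    static void store(int addr)    { mem[addr] = acc; }    /* STORE x    */

    int main(void) {
        /* d = (a + b) * c, as in the slide's sequence */
        load(A);
        add(B);
        multiply(C);
        store(D);
        printf("d = %d\n", mem[D]);   /* (2 + 3) * 4 = 20 */
        return 0;
    }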

Accumulator Based Machines

● IBM 701 (1952) ● PDP-8, PDP-12 ● Intel 4004, 8008, 8080, 8086 … ● Intel x86 processors still use primary accumulator EAX and secondary accumulator EDX for multiplication and division of large numbers (MUL ECX)

Register–Memory Machine Models

● Small units of memory that hold data/instructions temporarily during execution
● Each register is identified by a number – R0, R1, …, R31
● All the registers together make up a register file
[Figure: a register file with registers R0 through R31.]

Register–Memory Machine Models

● The register file supplies one operand.
● Memory supplies the other operand.
● The result is stored back in the register file.
● No implicit operands.
● d = (a + b) * c
[Figure: the register file and memory both feeding the ALU, with the result written back to the register file.]

Register–Memory Machine Models

● d = (a + b) * c
● Sequence of instructions:
    LOAD R1, a
    ADD R2, R1, b
    MULTIPLY R3, R2, c
    STORE R3, d
● LOAD R1, a – source in memory (a), destination in the register file (R1).
● STORE R3, d – source in the register file (R3), destination in memory (d).
[Figure: frames showing R1 = a, R2 = a + b, R3 = (a + b) * c, and finally (a + b) * c stored into memory location d.]

Register–Register Machine Model

● No implicit operands.
● Both operands are supplied from the register file.
● Memory is accessed only through LOAD and STORE instructions.
● d = (a + b) * c
[Figure: the ALU reading both operands from the register file and writing the result back to it; memory is reached only via loads and stores.]
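A possible instruction sequence for d = (a + b) * c in this load–store style, written in the same notation as the earlier slides (the exact mnemonics are an assumption):

    LOAD R1, a            ; R1 <- Mem[a]
    LOAD R2, b            ; R2 <- Mem[b]
    ADD R3, R1, R2        ; R3 <- R1 + R2
    LOAD R4, c            ; R4 <- Mem[c]
    MULTIPLY R5, R3, R4   ; R5 <- R3 * R4
    STORE R5, d           ; Mem[d] <- R5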

Machine Models – Comparison

● Number of explicitly named operands ● Number of instructions that can access data from memory ● Code size ● Amount of data transferred between memory and processor ● Complexity of hardware ● Ease of compilation (ease of generation of machine code).

Machine Models – Memory Operands

Number of memory addresses    Max. operands allowed    Type of architecture    Examples
0                             3                        Load-store              Alpha, ARM, MIPS, PowerPC, SPARC, SuperH, TM32
1                             2                        Register-memory         IBM 360/370, Intel x86, Motorola 68000, TI TMS320C54x
2                             2                        Memory-memory           VAX
3                             3                        Memory-memory           VAX

Machine Models – Memory Operands

Register-Register (0, 3)
  Advantages: Simple, fixed-length instruction encoding. Simple code-generation model. Instructions take similar numbers of clocks to execute.
  Disadvantages: Higher instruction count than architectures with memory references in instructions. More instructions and lower instruction density lead to larger programs.

Register-Memory (1, 2)
  Advantages: Data can be accessed without a separate load. Instruction format is easy to encode. Good density.
  Disadvantages: A source operand is destroyed. Encoding a register number and a memory address in each instruction may restrict the number of registers. Clocks per instruction vary.

Memory-Memory (2, 2) or (3, 3)
  Advantages: Most compact. Doesn't waste registers for temporaries.
  Disadvantages: Large variations in instruction size, especially for three-operand instructions. Large variation in work per instruction. Memory accesses create a bottleneck.

C = A + B

STACK          ACCUMULATOR    REGISTER-MEMORY    REGISTER-REGISTER

[Figure: the datapath for each model – the stack (with TOS), the accumulator, the register-memory machine, and the register-register machine, each built around an ALU.]

Push A         Load A         Load R1, A         Load R1, A
Push B         Add B          Add R3, R1, B      Load R2, B
Add            Store C        Store R3, C        Add R3, R1, R2
Pop C                                            Store R3, C