<<

Organization EECC 550 • Introduction: Modern Computer Design Levels, Components, Technology Trends, Register Transfer Week 1 Notation (RTN). [Chapters 1, 2]

• Instruction Set Architecture (ISA) Characteristics and Classifications: CISC Vs. RISC. [Chapter 2] Week 2 • MIPS: An Example RISC ISA. Syntax, Instruction Formats, Addressing Modes, Encoding & Examples. [Chapter 2]

• Central Unit (CPU) & Computer System Performance Measures. [Chapter 4] Week 3 • CPU Organization: & Design. [Chapter 5] Week 4 – MIPS Single Cycle Datapath & Control Unit Design. – MIPS Multicycle Datapath and Finite State Machine Control Unit Design. Week 5 • Microprogrammed Control Unit Design. [Chapter 5] – Microprogramming Project Week 6 • Midterm Review and Midterm Exam Week 7 • CPU Pipelining. [Chapter 6] • The : Design & Performance. [Chapter 7] Week 8 • The Memory Hierarchy: Main & . [Chapter 7] Week 9 • Input/Output Organization & System Performance Evaluation. [Chapter 8]

Week 10 • Computer Arithmetic & ALU Design. [Chapter 3] If time permits. Week 11 • Final Exam. EECC550 - Shaaban #1 Lec # 1 Winter 2005 11-29-2005 Computing System History/Trends + Instruction Set Architecture (ISA) Fundamentals

• Computing Element Choices: – Computing Element Programmability – Spatial vs. Temporal Computing – Main Processor Types/Applications • General Purpose Processor Generations • The Von Neumann Computer Model • CPU Organization (Design) • Recent Trends in Computer Design/performance • Hierarchy of • Hardware Description: Register Transfer Notation (RTN) • Computer Architecture Vs. Computer Organization • Instruction Set Architecture (ISA): – Definition and purpose – ISA Specification Requirements – Main General Types of Instructions – ISA Types and characteristics – Typical ISA Addressing Modes – Instruction Set Encoding – Instruction Set Architecture Tradeoffs – Complex Instruction Set Computer (CISC) – Reduced Instruction Set Computer (RISC) – Evolution of Instruction Set Architectures

(Chapters 1, 2) EECC550 - Shaaban #2 Lec # 1 Winter 2005 11-29-2005 Computing Element Choices • General Purpose Processors (GPPs): Intended for general purpose computing (desktops, servers, clusters..) • Application-Specific Processors (ASPs): Processors with ISAs and architectural features tailored towards specific application domains – E.g Digital Signal Processors (DSPs), Network Processors (NPs), Media Processors, Graphics Processing Units (GPUs), Vector Processors??? ... • Co-Processors: A hardware (hardwired) implementation of specific algorithms with limited programming interface (augment GPPs or ASPs) • Configurable Hardware: – Field Programmable Gate Arrays (FPGAs) – Configurable array of simple processing elements • Application Specific Integrated Circuits (ASICs): A custom VLSI hardware solution for a specific computational task • The choice of one or more depends on a number of factors including: - Type and complexity of computational algorithm (general purpose vs. Specialized) - Desired level of flexibility/ - Performance requirements programmability - Development cost/time - System cost - Power requirements - Real-time constrains

The main goal of this course is the study of fundamental design EECC550 - Shaaban techniques for General Purpose Processors EECC550 - Shaaban #3 Lec # 1 Winter 2005 11-29-2005 Computing Element Choices The main goal of this course is the study

General Purpose of fundamental design techniques Processors for General Purpose Processors (GPPs): Processor : Programmable computing element that Flexibility runs programs written using a pre-defined set of Application-Specific instructions Processors (ASPs)

Configurable Hardware

Programmability / Programmability Selection Factors: - Type and complexity of computational algorithms Co-Processors (general purpose vs. Specialized) Application Specific - Desired level of flexibility - Performance Integrated Circuits - Development cost - System cost (ASICs) - Power requirements - Real-time constrains

Specialization , Development cost/time Performance/Chip Area/Watt Performance (Computational Efficiency) EECC550 - Shaaban #4 Lec # 1 Winter 2005 11-29-2005 Computing Element Choices: ComputingComputing ElementElement ProgrammabilityProgrammability

Fixed Function: Programmable:

• Computes one function (e.g. • Computes “any” FP-multiply, divider, DCT) computable function (e.g. • Function defined at Processors) fabrication time • Function defined after • e.g hardware (ASICs) fabrication

Parameterizable Hardware: Performs limited “set” of functions e.g. Co-Processors

Processor = Programmable computing element that runs programs written using pre-defined instructions EECC550 - Shaaban #5 Lec # 1 Winter 2005 11-29-2005 Computing Element Choices: SpatialSpatial vs.vs. TemporalTemporal ComputingComputing Spatial Temporal

(using hardware) (using software/program running on a processor)

Processor Instructions

Processor = Programmable computing element that runs programs written using a pre-defined set of instructions EECC550 - Shaaban #6 Lec # 1 Winter 2005 11-29-2005 MainMain ProcessorProcessor Types/ApplicationsTypes/Applications The main goal of this course is the study of fundamental design techniques for General Purpose Processors • General Purpose Processors (GPPs) - high performance. – RISC or CISC: P4, IBM Power4, SPARC, PowerPC, MIPS ... – Used for general purpose software – Heavy weight OS - Windows, , Desktops (PC’s), Clusters

• Embedded processors and processor cores Increasing Cost/Complexity – e.g: Intel XScale, ARM, 486SX, SH7000, NEC V800... – Often require Digital signal processing (DSP) support or other application-specific support (e.g network, media processing) – Single program volume Increasing – Lightweight, often realtime OS or no OS – Examples: Cellular phones, consumer electronics .. (e.g. CD players) • – Extremely cost/power sensitive – Single program – Small word size - 8 bit common – Highest volume processors by far – Examples: Control systems, Automobiles, toasters, thermostats, ...

EECC550 - Shaaban Examples of Application-Specific Processors #7 Lec # 1 Winter 2005 11-29-2005 TheThe ProcessorProcessor DesignDesign SpaceSpace

Application specific architectures for performance Embedded Real-time constraints processors GPPs Specialized applications Low power/cost constraints Performance is everything & Software rules Performance The main goal of this course is the Microcontrollers study of fundamental design techniques for General Purpose Processors Cost is everything Chip Area, Power Processor Cost complexity

Processor = Programmable computing element EECC550 - Shaaban that runs programs written using a pre-defined set of instructions #8 Lec # 1 Winter 2005 11-29-2005 General Purpose Processor/Computer System Generations

Classified according to implementation technology: • The First Generation, 1946-59: Vacuum Tubes, Relays, Mercury Delay Lines: – ENIAC (Electronic Numerical Integrator and Computer): First electronic computer, 18000 vacuum tubes, 1500 relays, 5000 additions/sec (1944). – First stored program computer: EDSAC (Electronic Delay Storage Automatic ), 1949. • The Second Generation, 1959-64: Discrete . – e.g. IBM Main frames • The Third Generation, 1964-75: Small and Medium-Scale Integrated (MSI) Circuits. – e.g Main frames (IBM 360) , mini (DEC PDP-8, PDP-11). • The Fourth Generation, 1975-Present: The . VLSI-based Microprocessors (single-chip processor) – First : Intel’s 4-bit 4004 (2300 transistors), 1970. – (PCs), laptops, PDAs, servers, clusters … – Reduced Instruction Set Computer (RISC) 1984

Common factor among all generations: All target the The Von Neumann Computer Model or paradigm EECC550 - Shaaban #9 Lec # 1 Winter 2005 11-29-2005 TheThe VonVon--NeumannNeumann ComputerComputer ModelModel • Partitioning of the programmable computing engine into components: – (CPU): Control Unit (instruction decode, sequencing of operations), Datapath (registers, arithmetic and logic unit, buses). – Memory: Instruction (program) and operand () storage. – Input/Output (I/O). – The stored program concept: Instructions from an instruction set are fetched from a common memory and executed one at a time.

Control Input Memory - (instructions, data) Datapath registers Output ALU, buses

Computer System CPU I/O Devices

Major CPU Performance Limitation: The Von Neumann computing model implies sequential execution one instruction at a time EECC550 - Shaaban #10 Lec # 1 Winter 2005 11-29-2005 GenericGeneric CPUCPU MachineMachine InstructionInstruction ProcessingProcessing StepsSteps (Implied by The Von Neumann Computer Model) Instruction Obtain instruction from program storage Fetch (memory)

Instruction Determine required actions and instruction size Decode

Operand Locate and obtain operand data Fetch

Execute Compute result value or status

Result Deposit results in storage for later use Store

Next Determine successor or next instruction Instruction

Major CPU Performance Limitation: The Von Neumann computing model EECC550 - Shaaban implies sequential execution one instruction at a time implies sequential execution one instruction at a time #11 Lec # 1 Winter 2005 11-29-2005 HardwareHardware ComponentsComponents ofof ComputerComputer SystemsSystems Five classic components of all computers: 1. Control Unit; 2. Datapath; 3. Memory; 4. Input; 5. Output

Processor I/O

Computer Keyboard, Mouse, etc. Processor Memory Devices (active) (passive) Control Input Unit (where programs, data I/O Disk Datapath live when Output running) Display, Printer, etc. EECC550 - Shaaban #12 Lec # 1 Winter 2005 11-29-2005 CPUCPU OrganizationOrganization • Datapath Design: – Capabilities & performance characteristics of principal Functional Units (FUs): – (e.g., Registers, ALU, Shifters, Logic Units, ...) – Ways in which these components are interconnected (buses connections, multiplexors, etc.). – How information flows between components. • Control Unit Design: – Logic and means by which such information flow is controlled. – Control and coordination of FUs operation to realize the targeted Instruction Set Architecture to be implemented (can either be implemented using a finite state machine or a microprogram). • Hardware description with a suitable language, possibly using Register Transfer Notation (RTN). EECC550 - Shaaban #13 Lec # 1 Winter 2005 11-29-2005 Control Branch Unit Control Data cache AA TypicalTypical MicroprocessorMicroprocessor Layout:Layout: TheThe IntelIntel PentiumPentium ClassicClassic 1993 - 1997 Integer data- Floating- 60MHz - 233 MHz Instruction path point cache datapath Datapath

First Level of Memory (Cache) EECC550 - Shaaban #14 Lec # 1 Winter 2005 11-29-2005 Control Unit AA TypicalTypical MicroprocessorMicroprocessor Layout:Layout: TheThe IntelIntel PentiumPentium ClassicClassic 1993 - 1997 60MHz - 233 MHz

Datapath

First Level of Memory (Cache) EECC550 - Shaaban #15 Lec # 1 Winter 2005 11-29-2005 ComputerComputer SystemSystem ComponentsComponents CPU Core 1 GHz - 3.8 GHz 4-way Superscaler All Non-blocking caches RISC or RISC-core (): Deep Instruction Pipelines L1 16-128K 1-2 way set associative (on chip), separate or unified Dynamic scheduling L1 L2 256K- 2M 4-32 way set associative (on chip) unified Multiple FP, integer FUs CPU L3 2-16M 8-32 way set associative (off or on chip) unified Dynamic branch prediction L2 Hardware speculation Examples: Alpha, AMD K7: EV6, 200-400 MHz L3 Intel PII, PIII: GTL+ 133 MHz SDRAM Caches Intel P4 800 MHz PC100/PC133 100-133MHZ Front Side Bus (FSB) 64-128 bits wide Off or On-chip 2-way inteleaved adapters ~ 900 MBYTES/SEC )64bit) Memory I/O Buses Current Standard Controller Example: PCI, 33-66MHz 32-64 bits wide Double Date 133-528 MBYTES/SEC Rate (DDR) SDRAM Memory Bus Controllers NICs PCI-X 133MHz 64 bit PC3200 1024 MBYTES/SEC 200 MHZ DDR 64-128 bits wide Memory 4-way interleaved Disks ~3.2 GBYTES/SEC Displays (one 64bit channel) Networks ~6.4 GBYTES/SEC Keyboards (two 64bit channels)

RAMbus DRAM (RDRAM) I/O Devices: North South 400MHZ DDR 16 bits wide (32 banks) Bridge Bridge I/O Subsystem ~ 1.6 GBYTES/SEC Chipset EECC550 - Shaaban #16 Lec # 1 Winter 2005 11-29-2005 PerformancePerformance IncreaseIncrease ofof WorkstationWorkstation--ClassClass MicroprocessorsMicroprocessors 19871987--19971997 1200 DEC /600 1100

1000

900

800

700 Integer SPEC92 Performance 600 Integer SPEC92 Performance

Performance 500 DEC Alpha 5/500 400

300 DEC Alpha 5/300 200 DEC Alpha 4/266 SUN-4/ MIPS MIPS IBM 100 IBM POWER 100 260 M/120 M2000 RS6000 DEC AXP/500 HP 9000/750 0 1987 1988 1989 1990 1991 1992 1993 1994 1995 1996 1997 Year > 100x performance increase in one decade EECC550 - Shaaban #17 Lec # 1 Winter 2005 11-29-2005 MicroprocessorMicroprocessor TransistorTransistor CountCount GrowthGrowth RateRate 100000000 Currently > 1 Billion

10000000 Alpha 21264: 15 million Moore’s Law : 5.5 million PowerPC 620: 6.9 million i80486 1000000 : 9.3 million

i80386 Sparc Ultra: 5.2 million i80286 100000

i8086 Moore’sMoore’s Law:Law: 10000 i8080 2X transistors/Chip 2300 i4004 Every 1.5 years 1000 1970 1975 1980 1985 1990 1995 2000 (circa 1970) Year ~ 500,000x density increase in the last 35 years EECC550 - Shaaban #18 Lec # 1 Winter 2005 11-29-2005 IncreaseIncrease ofof CapacityCapacity ofof VLSIVLSI DynamicDynamic RAMRAM (DRAM)(DRAM) ChipsChips size year 1024 M bit = 1 G bit size(Megabit) 1000000000 1980 0.0625 100000000 1983 0.25 16 M bit 1986 1 10000000 1989 4 1 M bit 1000000 1992 16 256k bit 100000 1996 64 64k bit 1999 256 10000 2000 1024

1000 1970 1975 1980 1985 1990 1995 2000 1.55X/yr, Year or doubling every 1.6 years ~ 17,000x DRAM chip capacity increase in 20 years (Also follows Moore’s Law)

EECC550 - Shaaban #19 Lec # 1 Winter 2005 11-29-2005 ComputerComputer TechnologyTechnology Trends:Trends: EvolutionaryEvolutionary butbut RapidRapid ChangeChange • Processor: – 1.5-1.6 performance improvement every year; Over 100X performance in last decade. • Memory: – DRAM capacity: > 2x every 1.5 years; 1000X size in last decade. – Cost per bit: Improves about 25% or more per year. – Only 15-25% performance improvement per year. • Disk: – Capacity: > 2X in size every 1.5 years. – Cost per bit: Improves about 60% per year. – 200X size in last decade. – Only 10% performance improvement per year, due to mechanical limitations. • Expected State-of-the-art PC by end of year 2005 : – Processor clock speed: > 4000 MegaHertz (4 Giga ) – Memory capacity: > 4000 MegaByte (4 Giga ) – Disk capacity: > 500 GigaBytes (0.5 Tera Bytes)

EECC550 - Shaaban #20 Lec # 1 Winter 2005 11-29-2005 AA SimplifiedSimplified ViewView ofof TheThe Software/HardwareSoftware/Hardware HierarchicalHierarchical LayersLayers

ons s icati oftw pl ar Ap e s ems oftw st a y re S

Hardware

EECC550 - Shaaban #21 Lec # 1 Winter 2005 11-29-2005 HierarchyHierarchy ofof ComputerComputer ArchitectureArchitecture High-Level Language Programs

Assembly Language Software Application Programs Operating Machine Language System Program e.g. BIOS (Basic Input/Output System) Compiler Firmware Software/Hardware Instruction Set Architecture Boundary Instr. Set Proc. I/O system (ISA) The ISA forms an abstraction layer Datapath & Control that sets the requirements for both complier and CPU designers Digital Design Microprogram Hardware Circuit Design Layout

Register Transfer Logic Diagrams VLSI placement & routing Notation (RTN)

Circuit Diagrams EECC550 - Shaaban #22 Lec # 1 Winter 2005 11-29-2005 LevelsLevels ofof ProgramProgram RepresentationRepresentation temp = v[k]; High Level Language Program v[k] = v[k+1]; v[k+1] = temp; Compiler lw$15, 0($2) lw$16, 4($2) MIPS Program Assembly sw$16, 0($2) Code Assembler sw$15, 4($2)

0000 1001 1100 0110 1010 1111 0101 1000 Machine Language 1010 1111 0101 1000 0000 1001 1100 0110

Software Program 1100 0110 1010 1111 0101 1000 0000 1001 0101 1000 0000 1001 1100 0110 1010 1111 Machine Interpretation

Control Signal ALUOP[0:3] <= InstReg[9:11] & MASK Specification Hardware Register Transfer Notation (RTN) ° Microprogram ° EECC550 - Shaaban #23 Lec # 1 Winter 2005 11-29-2005 AA HierarchyHierarchy ofof ComputerComputer DesignDesign Level Name Modules Primitives Descriptive Media

1 Electronics Gates, FF’s Transistors, Resistors, etc. Circuit Diagrams

2 Logic Registers, ALU’s ... Gates, FF’s …. Logic Diagrams

3 Organization Processors, Memories Registers, ALU’s … Register Transfer Notation (RTN) Low Level - Hardware

4 Microprogramming Assembly Language Microinstructions Microprogram Firmware

5 Assembly language OS Routines Assembly language Assembly Language programming Instructions Programs

6 Procedural Applications OS Routines High-level Language Programming Drivers .. High-level Languages Programs

7 Application Systems Procedural Constructs Problem-Oriented Programs High Level - Software EECC550 - Shaaban #24 Lec # 1 Winter 2005 11-29-2005 HardwareHardware DescriptionDescription • Hardware visualization: – Block diagrams (spatial visualization): Two-dimensional representations of functional units and their interconnections. – Timing charts (temporal visualization): Waveforms where events are displayed vs. time. • Register Transfer Notation (RTN): – A way to describe microoperations capable of being performed by the data flow (data registers, data buses, functional units) at the register transfer level of design (RT). – Also describes conditional information in the system which cause operations to come about. – A “shorthand” notation for microoperations. • Hardware Description Languages: – Examples: VHDL: VHSIC (Very High Speed Integrated Circuits) Hardware Description Language, Verilog. EECC550 - Shaaban #25 Lec # 1 Winter 2005 11-29-2005 RegisterRegister TransferTransfer NotationNotation (RTN)(RTN) • Dependent RTN: When RTN is used after the data flow is assumed to be frozen. No data transfer can take place over a path that does not exist. No statement implies a function the data flow hardware is incapable of performing. • Independent RTN: Describe actions on registers without regard to nonexistence of direct paths or intermediate registers. No predefined data flow. i.e No datapath design yet • The general format of an RTN statement: Conditional information: Action1; Action2 • The conditional statement is often an AND of literals (status and control signals) in the system (a p-term). The p-term is said to imply the action. • Possible actions include transfer of data to/from registers/memory data shifting, functional unit operations etc. EECC550 - Shaaban #26 Lec # 1 Winter 2005 11-29-2005 RTNRTN StatementStatement ExamplesExamples A ← B or R[A] ← R[B] where R[X] mean the content of register X – A copy of the data in entity B (typically a register) is placed in Register A – If the destination register has fewer bits than the source, the destination accepts only the lowest-order bits. – If the destination has more bits than the source, the value of the source is sign extended to the left.

CTL • T0: A = B – The contents of B are presented to the input of combinational circuit A – This action to the right of “:” takes place when control signal CTL is active and signal T0 is active. EECC550 - Shaaban #27 Lec # 1 Winter 2005 11-29-2005 RTNRTN StatementStatement ExamplesExamples MD ← M[MA] or MD ← Mem[MA] – Means the memory data (MD) register receives the contents of the main memory (M or Mem) as addressed from the Memory Address (MA) register. AC(0), AC(1), AC(2), AC(3) – Register fields are indicated by parenthesis. – The concatenation operation is indicated by a comma. – Bit AC(0) is bit 0 of the AC – The above expression means AC bits 0, 1, 2, 3 – More commonly represented by AC(0-3) E • T3: CLRWRITE – The control signal CLRWRITE is activated when the condition E • T3 is active.

EECC550 - Shaaban #28 Lec # 1 Winter 2005 11-29-2005 ComputerComputer ArchitectureArchitecture Vs.Vs. ComputerComputer OrganizationOrganization • The term Computer architecture is sometimes erroneously restricted to computer instruction set design, with other aspects of computer design called implementation. The ISA forms an abstraction layer that sets the • More accurate definitions: requirements for both complier and CPU designers – Instruction Set Architecture (ISA): The actual programmer- visible instruction set and serves as the boundary or interface between the software and hardware. – Implementation of a machine has two components:

CPU Micro- • Organization: includes the high-level aspects of a computer’s architecture design such as: The memory system, the bus structure, the (CPU design) internal CPU unit which includes implementations of arithmetic, logic, branching, and data transfer operations. • Hardware: Refers to the specifics of the machine such as detailed logic design and packaging technology. Hardware design and implementation • In general, Computer Architecture refers to the above three aspects: 1- Instruction set architecture 2- Organization. 3- Hardware. EECC550 - Shaaban #29 Lec # 1 Winter 2005 11-29-2005 InstructionInstruction SetSet ArchitectureArchitecture (ISA)(ISA) “... the attributes of a [computing] system as seen by the programmer, i.e. the conceptual structure and functional behavior, as distinct from the organization of the data flows and controls the logic design, and the physical implementation.” The ISA forms an abstraction layer that sets the – Amdahl, Blaaw, and Brooks, 1964. requirements for both complier and CPU designers The instruction set architecture is concerned with: • Organization of programmable storage (memory & registers): Includes the amount of addressable memory and number of available registers. • Data Types & Data Structures: Encodings & representations. • Instruction Set: What operations are specified. • Instruction formats and encoding. • Modes of addressing and accessing data items and instructions • Exceptional conditions. EECC550 - Shaaban #30 Lec # 1 Winter 2005 11-29-2005 ComputerComputer InstructionInstruction SetsSets • Regardless of computer type, CPU structure, or hardware organization, every machine instruction must specify the following: – : Which operation to perform. Example: add, load, and branch. Opcode = Operation Code – Where to find the operand or operands, if any: Operands may be contained in CPU registers, main memory, or I/O ports. – Where to put the result, if there is a result: May be explicitly mentioned or implicit in the opcode. – Where to find the next instruction: Without any explicit branches, the instruction to execute is the next instruction in the sequence or a specified address in case of jump or branch instructions. EECC550 - Shaaban #31 Lec # 1 Winter 2005 11-29-2005 InstructionInstruction SetSet ArchitectureArchitecture (ISA)(ISA) Instruction SpecificationSpecification RequirementsRequirements Fetch • Instruction Format or Encoding: Instruction – How is it decoded? Decode • Location of operands and result (addressing modes): Operand – Where other than memory? Fetch – How many explicit operands? – How are memory operands located? Execute – Which can or cannot be in memory? • Data type and Size. Result • Operations Store – What are supported • Successor instruction: Next – Jumps, conditions, branches. Instruction • Fetch-decode-execute is implicit. EECC550 - Shaaban #32 Lec # 1 Winter 2005 11-29-2005 MainMain GeneralGeneral TypesTypes ofof InstructionsInstructions • Data Movement Instructions, possible variations: – Memory-to-memory. – Memory-to-CPU register. – CPU-to-memory. – Constant-to-CPU register. – CPU-to-output. – etc.

(ALU) Instructions.

• Branch (Control) Instructions: – Unconditional jumps. – Conditional branches. EECC550 - Shaaban #33 Lec # 1 Winter 2005 11-29-2005 ExamplesExamples ofof DataData MovementMovement InstructionsInstructions

Instruction Meaning Machine

MOV A,B Move 16-bit data from memory loc. A to loc. B VAX11

lwz R3,A Move 32-bit data from memory loc. A to register R3 PPC601

li $3,455 Load the 32-bit integer 455 into register $3 MIPS R3000

MOV AX,BX Move 16-bit data from register BX into register AX Intel X86

LEA.L (A0),A2 Load the address pointed to by A0 into A2 MC68000

EECC550 - Shaaban #34 Lec # 1 Winter 2005 11-29-2005 ExamplesExamples ofof ALUALU InstructionsInstructions

Instruction Meaning Machine

MULF A,B, Multiply the 32-bit floating point values at mem. VAX11 locations A and B, and store result in loc. C nabs r3,r1 Store the negative absolute value of register r1 in r2 PPC601 ori $2,$1,255 Store the logical OR of register $1 with 255 into $2 MIPS R3000

SHL AX,4 Shift the 16-bit value in register AX left by 4 bits Intel X86

ADD.L D0,D1 Add the 32-bit values in registers D0, D1 and store MC68000 the result in register D0

EECC550 - Shaaban #35 Lec # 1 Winter 2005 11-29-2005 ExamplesExamples ofof BranchBranch InstructionsInstructions

Instruction Meaning Machine

BLBS A, Tgt Branch to address Tgt if the least significant bit VAX11 at location A is set. bun r2 Branch to location in r2 if the previous comparison PPC601 signaled that one or more values was not a number.

Beq $2,$1,32 Branch to location PC+4+32 if contents of $1 and $2 MIPS R3000 are equal.

JCXZ Addr Jump to Addr if contents of register CX = 0. Intel X86

BVS next Branch to next if in CC is set. MC68000

EECC550 - Shaaban #36 Lec # 1 Winter 2005 11-29-2005 OperationOperation TypesTypes inin TheThe InstructionInstruction SetSet Operator Type Examples

Arithmetic and logical Integer arithmetic and logical operations: add, or Data transfer Loads-stores (move on machines with memory addressing) Control Branch, jump, procedure call, and return, traps. System call/return, virtual memory management instructions ... Floating point Floating point operations: add, multiply .... Decimal Decimal add, decimal multiply, decimal to character conversion String String move, string compare, string search Media The same operation performed on multiple data (e.g Intel MMX, SSE) EECC550 - Shaaban #37 Lec # 1 Winter 2005 11-29-2005 InstructionInstruction UsageUsage Example:Example: TopTop 1010 IntelIntel X86X86 InstructionsInstructions Rankinstruction Integer Average Percent total executed 1 load 22% 2 conditional branch 20% 3 compare 16% 4 store 12% 5 add 8% 6 and 6% 7 sub 5% 8 move register-register 4% 9 call 1% 10 return 1% Total 96% Observation: Simple instructions dominate instruction usage frequency.

CISC to RISC observation EECC550 - Shaaban #38 Lec # 1 Winter 2005 11-29-2005 TypesTypes ofof InstructionInstruction SetSet ArchitecturesArchitectures AccordingAccording ToTo OperandOperand AddressingAddressing FieldsFields Memory-To-Memory Machines: – Operands obtained from memory and results stored back in memory by any instruction that requires operands. – No local CPU registers are used in the CPU datapath. – Include: • The 4 Address Machine. • The 3-address Machine. • The 2-address Machine. The 1-address (Accumulator) Machine: – A single local CPU special-purpose register (accumulator) is used as the source of one operand and as the result destination. The 0-address or : – A push-down stack is used in the CPU. General Purpose Register (GPR) Machines: – The CPU datapath contains several local general-purpose registers which can be used as operand sources and as result destinations. – A large number of possible addressing modes. – Load-Store or Register-To-Register Machines: GPR machines where only data movement instructions (loads, stores) can obtain operands from memory and store results to memory.

CISC to RISC observation (load-store simplifies CPU design) EECC550 - Shaaban #39 Lec # 1 Winter 2005 11-29-2005 TypesTypes ofof InstructionInstruction SetSet ArchitecturesArchitectures MemoryMemory--ToTo--MemoryMemory Machines:Machines: TheThe 44--AddressAddress MachineMachine • No program (PC) or other CPU registers are used. • Instruction encoding has four address fields to specify: – Location of first operand. - Location of second operand. – Place to store the result. - Location of next instruction.

Memory Instruction: CPU add Res, Op1, Op2, Nexti Op1Addr: Op1 Meaning: Op2Addr: Op2 + Res ← Op1 + Op2

ResAddr: Res or more precise RTN: M[ResAddr] ← M[Op1Addr] + M[Op2Addr] : : Instruction Format (encoding) Bits: 8 24 24 24 24 NextiAddr: Nexti add ResAddr Op1Addr Op2Addr NextiAddr Instruction Opcode Where to find Size: Where to Where to find operands 13 bytes Which put result next instruction operation EECC550 - Shaaban Can address 224 bytes = 16 MBytes #40 Lec # 1 Winter 2005 11-29-2005 TypesTypes ofof InstructionInstruction SetSet ArchitecturesArchitectures MemoryMemory--ToTo--MemoryMemory Machines:Machines: TheThe 33--AddressAddress MachineMachine • A (PC) is included within the CPU which points to the next instruction. • No CPU storage (general-purpose registers). Memory CPU Instruction: add Res, Op1, Op2 Op1Addr: Op1 Op2Addr: Op2 Meaning: + Instruction Res ← Op1 + Op2 Size: 10 bytes ResAddr: Res or more precise RTN: : M[ResAddr] ← M[Op1Addr] + M[Op2Addr] : PC ← PC + 10 Increment PC Where to find next instruction Instruction Format (encoding) Bits: 8 24 24 24 NextiAddr: Nexti Program 24 Counter (PC) add ResAddr Op1Addr Op2Addr Opcode Where to Where to find operands Which put result operation Can address 224 bytes = 16 MBytes EECC550 - Shaaban #41 Lec # 1 Winter 2005 11-29-2005 TypesTypes ofof InstructionInstruction SetSet ArchitecturesArchitectures MemoryMemory--ToTo--MemoryMemory Machines:Machines: TheThe 22--AddressAddress MachineMachine • The 2-address Machine: Result is stored in the memory address of one of the operands. Instruction: Memory CPU add Op2, Op1 Meaning: Op1Addr: Op1 Op2 ← Op1 + Op2 + or more precise RTN:

Op2Addr: Op2,Res M[Op2Addr] ← M[Op1Addr] + M[Op2Addr] PC ← PC + 7 Increment PC : : Instruction Format (encoding) Where to find next instruction Bits: 8 24 24 add Op2Addr Op1Addr NextiAddr: Nexti Program 24 Counter (PC) Opcode Which Where to find operands operation Instruction Size: Where to 7 bytes put result EECC550 - Shaaban #42 Lec # 1 Winter 2005 11-29-2005 TypesTypes ofof InstructionInstruction SetSet ArchitecturesArchitectures TheThe 11--addressaddress (Accumulator)(Accumulator) MachineMachine • A single accumulator in the CPU is used as the source of one operand and result destination. Instruction: Memory CPU add Op1 Op1Addr: Op1 Meaning: Acc ← Acc + Op1 + Where to find operand2, and or more precise RTN: where to put result Acc ← Acc + M[Op1Addr] Accumulator : PC ← PC + 4 : Increment PC Where to find Instruction Format (encoding) next instruction Bits: 8 24 NextiAddr: Nexti Program 24 Counter (PC) add Op1Addr Instruction Size: Opcode Where to find 4 bytes Which operand1 operation

EECC550 - Shaaban #43 Lec # 1 Winter 2005 11-29-2005 TypesTypes ofof InstructionInstruction SetSet ArchitecturesArchitectures TheThe 00--addressaddress (Stack)(Stack) MachineMachine

• A push-down stack is used in the CPU.

4 Bytes Instruction Format CPU Memory Bits: 8 24 Stack Instruction: push Op1Addr push push Op1 Opcode Where to find Op1Addr: Op1 pop add Meaning: operand Op2Addr: Op2 TOS Op2, Res TOS ← M[Op1Addr] + Op1 Instruction: 1 Instruction Format ResAddr: SOS Res add Bits: 8 etc. Meaning: add : TOS ← TOS + SOS : Opcode 8 4 Bytes Instruction Format Bits: 8 24 NextiAddr: Nexti Program 24 Counter (PC) Instruction: pop ResAddr pop Res Opcode Memory Meaning: Destination M[ResAddr] ← TOS TOS = Top Entry in Stack SOS = Second Entry in Stack EECC550 - Shaaban #44 Lec # 1 Winter 2005 11-29-2005 TypesTypes ofof InstructionInstruction SetSet ArchitecturesArchitectures GeneralGeneral PurposePurpose RegisterRegister (GPR)(GPR) MachinesMachines • CPU contains several general-purpose registers which can be used as operand sources and result destination. Instruction Format Instruction: Bits: 8 3 24 CPU load R8, Op1 Memory Meaning: load R8 Op1Addr Registers R8 ← M[Op1Addr] Opcode Where to find load PC ← PC + 5 operand1 Op1Addr: Op1 R8 Size = 4.375 bytes rounded up to 5 bytes R7 Instruction: add R6 Instruction Format add R2, R4, R6 + R5 Meaning: Bits: 8 3 3 3 R4 R2 ← R4 + R6 add R2 R4 R6 R3 PC ← PC + 3 Opcode Des Operands : R2 store Size = 2.125 bytes rounded up to 3 bytes : R1

Instruction: Instruction Format NextiAddr: Nexti Program 24 store R2, Op2 Bits: 8 3 24 Counter (PC) Meaning: store R2 ResAddr M[Op2Addr] ← R2 Opcode Destination PC ← PC + 5 Size = 4.375 bytes rounded up to 5 bytes Here add instruction has three register specifier fields While load, store instructions have one register specifier field EECC550 - Shaaban and one memory address specifier field #45 Lec # 1 Winter 2005 11-29-2005 ExpressionExpression EvaluationEvaluation ExampleExample withwith 33--,, 22--,, 11--,, 00--Address,Address, AndAnd GPRGPR MachinesMachines For the expression A = (B + C) * - E where A-E are in memory

1-Address 0-Address GPR 3-Address 2-Address Accumulator -Memory Load-Store add A, B, C load A, B load B push B load R1, B load R1, B mul A, A, D add A, C add C push C add R1, C load R2, C sub A, A, E mul A, D mul D add mul R1, D add R3, R1, R2 sub A, E sub E push D sub R1, E load R1, D store A mul store A, R1 mul R3, R3, R1 push E load R1, E sub sub R3, R3, R1 pop A store A, R3

3 instructions 4 instructions 5 instructions 8 instructions 5 instructions 8 instructions Code size: Code size: Code size: Code size: Code size: Code size: 30 bytes 28 bytes 20 bytes 23 bytes 25 bytes 34 bytes 9 memory 12 memory 5 memory 5 memory 5 memory 5 memory accesses for accesses for accesses for accesses for accesses for accesses for data data data data data data EECC550 - Shaaban #46 Lec # 1 Winter 2005 11-29-2005 TypicalTypical ISAISA AddressingAddressing ModesModes Addressing Sample Mode Instruction Meaning Register Add R4, R3 R4 ← R4 + R3 Immediate Add R4, #3 R4 ← R4 + 3 Displacement Add R4, 10 (R1) R4 ← R4 + Mem[10+ R1] Indirect Add R4, (R1) R4 ← R4 + Mem[R1] Indexed Add R3, (R1 + R2) R3 ← R3 +Mem[R1 + R2] Absolute Add R1, (1001) R1 ← R1 + Mem[1001] Memory indirect Add R1, @ (R3) R1 ← R1 + Mem[Mem[R3]] Autoincrement Add R1, (R2) + R1 ← R1 + Mem[R2] R2 ← R2 + d Autodecrement Add R1, - (R2) R2 ← R2 - d R1 ← R1 + Mem[R2] Scaled Add R1, 100 (R2) [R3] R1 ← R1+ Mem[100+ R2 + R3*d]

For GPR ISAs EECC550 - Shaaban #47 Lec # 1 Winter 2005 11-29-2005 AddressingAddressing ModesModes UsageUsage ExampleExample For 3 programs running on VAX ignoring direct register mode: Displacement 42% avg, 32% to 55% 75% Immediate: 33% avg, 17% to 43% 88%

Register deferred (indirect): 13% avg, 3% to 24%

Scaled: 7% avg, 0% to 16%

Memory indirect: 3% avg, 1% to 6%

Misc: 2% avg, 0% to 3%

75% displacement & immediate 88% displacement, immediate & register indirect. Observation: In addition Register direct, Displacement, Immediate, Register Indirect addressing modes are important. CISC to RISC observation (fewer addressing modes simplify CPU design) EECC550 - Shaaban #48 Lec # 1 Winter 2005 11-29-2005 DisplacementDisplacement AddressAddress SizeSize ExampleExample

Avg. of 5 SPECint92 programs v. avg. 5 SPECfp92 programs

Int. Avg. FP Avg.

30%

20%

10%

0% 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 Displacement Address Bits Needed

1% of addresses > 16-bits 12 - 16 bits of displacement needed

CISC to RISC observation EECC550 - Shaaban #49 Lec # 1 Winter 2005 11-29-2005 InstructionInstruction SetSet EncodingEncoding Considerations affecting instruction set encoding: – To have as many registers and addressing modes as possible. – The Impact of of the size of the register and fields on the average instruction size and on the average program. – To encode instructions into lengths that will be easy to handle in the implementation. On a minimum to be a multiple of bytes. • Fixed length encoding: Faster and easiest to implement in hardware. e.g. Simplifies design of pipelined CPUs • Variable length encoding: Produces smaller instructions. • Hybrid encoding.

CISC to RISC observation EECC550 - Shaaban #50 Lec # 1 Winter 2005 11-29-2005 ThreeThree ExamplesExamples ofof InstructionInstruction SetSet EncodingEncoding

Operations & Address Address no of operands Address Address specifier 1 field 1 specifier n field n

Variable Length Encoding: VAX (1-53 bytes)

Operation Address Address Address field 1 field 2 field3

Fixed Length Encoding: MIPS, PowerPC, SPARC (all instructions are 4 bytes each)

Operation Address Address Specifier field

Address Address Operation Address field Specifier 1 Specifier 2

Address Address Operation Specifier Address field 1 field 2

Hybrid Encoding: IBM 360/370, Intel 80x86 EECC550 - Shaaban #51 Lec # 1 Winter 2005 11-29-2005 InstructionInstruction SetSet ArchitectureArchitecture TradeoffsTradeoffs • 3-address machine: shortest code sequence; a large number of bits per instruction; large number of memory accesses. • 0-address (stack) machine: Longest code sequence; shortest individual instructions; more complex to program. • General purpose (GPR): Machine = CPU or ISA – Addressing modified by specifying among a small set of registers with using a short register address (all new ISAs since 1975). – Advantages of GPR: • Low number of memory accesses. Faster, since register access is currently still much faster than memory access. • Registers are easier for compilers to use. • Shorter, simpler instructions. • Load-Store Machines: GPR machines where memory addresses are only included in data movement instructions (loads/stores) between memory and registers (all new ISAs designed after 1980).

CISC to RISC observation (load-store simplifies CPU design) EECC550 - Shaaban #52 Lec # 1 Winter 2005 11-29-2005 ISAISA ExamplesExamples Machine Number of General Architecture year Purpose Registers EDSAC 1 accumulator 1949 IBM 701 1 accumulator 1953 CDC 6600 8 load-store 1963 IBM 360 16 register-memory 1964 DEC PDP-8 1 accumulator 1965 DEC PDP-11 8 register-memory 1970 Intel 8008 1 accumulator 1972 6800 1 accumulator 1974 DEC VAX 16 register-memory 1977 memory-memory 1 extended accumulator 1978 16 register-memory 1980 Intel 80386 8 register-memory 1985 MIPS 32 load-store 1985 HP PA-RISC 32 load-store 1986 SPARC 32 load-store 1987 PowerPC 32 load-store 1992 DEC Alpha 32 load-store 1992 HP/Intel IA-64 128 load-store 2001 AMD64 (EMT64) 16 register-memory 2003 EECC550 - Shaaban #53 Lec # 1 Winter 2005 11-29-2005 ExamplesExamples ofof GPRGPR MachinesMachines (ISAs) For Arithmetic/Logic (ALU) Instructions

Max. number of Max. number memory addresses of operands allowed

SPARC, MIPS 03PowerPC, ALPHA

Intel 80386 12 Motorola 68000

2 or 3 2 or 3 VAX

EECC550 - Shaaban #54 Lec # 1 Winter 2005 11-29-2005 ComplexComplex InstructionInstruction SetSet ComputerComputer (CISC)(CISC) • Emphasizes doing more with each instruction. ISAs • Motivated by the high cost of memory and hard disk capacity when original CISC architectures were proposed: – When M6800 was introduced: 16K RAM = $500, 40M hard disk = $ 55, 000 – When MC68000 was introduced: 64K RAM = $200, 10M HD = $5,000 Circa 1980 • Original CISC architectures evolved with faster, more complex CPU designs, but backward instruction set compatibility had to be maintained. • Wide variety of addressing modes: • 14 in MC68000, 25 in MC68020 • A number instruction modes for the location and number of operands: • The VAX has 0- through 3-address instructions. • Variable-length or hybrid instruction encoding is used. EECC550 - Shaaban #55 Lec # 1 Winter 2005 11-29-2005 ExampleExample CISCCISC ISAsISAs MotorolaMotorola 680X0680X0 18 addressing modes: • Data register direct. Operand size: • Address register direct. • Immediate. • Range from 1 to 32 bits, 1, 2, 4, 8, • Absolute short. 10, or 16 bytes. • Absolute long. • Address register indirect. • Address register indirect with postincrement. Instruction Encoding: • Address register indirect with predecrement. • Address register indirect with displacement. • Instructions are stored in 16-bit • Address register indirect with index (8-bit). words. • Address register indirect with index (base). • the smallest instruction is 2- bytes • Memory inderect postindexed. (one word). • Memory indirect preindexed. • The longest instruction is 5 words • Program counter indirect with index (8-bit). (10 bytes) in length. • Program counter indirect with index (base). • Program counter indirect with displacement. • Program counter memory indirect postindexed. • Program counter memory indirect preindexed. EECC550 - Shaaban #56 Lec # 1 Winter 2005 11-29-2005 ExampleExample CISCCISC ISA:ISA: IntelIntel 80386

12 addressing modes: Operand sizes: • Register. • Can be 8, 16, 32, 48, 64, or 80 bits long. • Immediate. • Also supports string operations. • Direct. • Base. • Base + Displacement. Instruction Encoding: • Index + Displacement. • The smallest instruction is one byte. • Scaled Index + Displacement. • The longest instruction is 12 bytes long. • Based Index. • Based Scaled Index. • The first bytes generally contain the opcode, mode specifiers, and register fields. • Based Index + Displacement. • Based Scaled Index + Displacement. • The remainder bytes are for address • Relative. displacement and immediate data.

EECC550 - Shaaban #57 Lec # 1 Winter 2005 11-29-2005 ReducedReduced InstructionInstruction SetSet ComputerComputer (RISC)(RISC) ~1984 ISAs • Focuses on reducing the number and complexity of instructions of the machine. Machine = CPU or ISA • Reduced number of cycles needed per instruction. – Goal: At least one instruction completed per clock cycle. • Designed with CPU in mind. • Fixed-length instruction encoding. • Only load and store instructions access memory. • Simplified addressing modes. – Usually limited to immediate, register indirect, register displacement, indexed. • Delayed loads and branches. • Prefetch and . • Examples: MIPS, HP PA-RISC, SPARC, Alpha, PowerPC. EECC550 - Shaaban #58 Lec # 1 Winter 2005 11-29-2005 ExampleExample RISCRISC ISA:ISA: PowerPCPowerPC

8 addressing modes: Operand sizes: • Register direct. • Four operand sizes: 1, 2, 4 or 8 bytes. • Immediate. • Register indirect. • Register indirect with immediate Instruction Encoding: index (loads and stores). • Register indirect with register index • Instruction set has 15 different formats (loads and stores). with many minor variations. • • Absolute (jumps). • All are 32 bits in length. • Link register indirect (calls). • Count register indirect (branches).

EECC550 - Shaaban #59 Lec # 1 Winter 2005 11-29-2005 ExampleExample RISCRISC ISA:ISA: HPHP PrecisionPrecision ArchitectureArchitecture HPHP PAPA--RISCRISC 7 addressing modes: Operand sizes: • Register • Five operand sizes ranging in powers of • Immediate two from 1 to 16 bytes. • Base with displacement • Base with scaled index and displacement Instruction Encoding: • Predecrement • Instruction set has 12 different formats. • Postincrement • • All are 32 bits in length. • PC-relative

EECC550 - Shaaban #60 Lec # 1 Winter 2005 11-29-2005 ExampleExample RISCRISC ISA:ISA: SPARCSPARC

5 addressing modes: Operand sizes: • Register indirect with immediate • Four operand sizes: 1, 2, 4 or 8 bytes. displacement. • Register inderect indexed by another register. Instruction Encoding: • Register direct. • Instruction set has 3 basic instruction • Immediate. formats with 3 minor variations. • PC relative. • All are 32 bits in length.

EECC550 - Shaaban #61 Lec # 1 Winter 2005 11-29-2005 ExampleExample RISCRISC ISA:ISA: DECDEC AlphaAlpha AXPAXP

4 addressing modes: Operand sizes: • Register direct. • Four operand sizes: 1, 2, 4 or 8 bytes. • Immediate. • Register indirect with displacement. • PC-relative. Instruction Encoding: • Instruction set has 7 different formats. • • All are 32 bits in length.

EECC550 - Shaaban #62 Lec # 1 Winter 2005 11-29-2005 RISCRISC ISAISA Example:Example: MIPSMIPS R3000R3000 (32(32--bit)bit) Instruction Categories: 5 Addressing Modes: Registers

• Load/Store. • Register direct (arithmetic). • Computational. • Immedate (arithmetic). R0 - R31 • Jump and Branch. • Base register + immediate offset • Floating Point (loads and stores). (using ). • PC relative (branches). • Memory Management. • Pseudodirect (jumps) • Special. PC Operand Sizes: HI • Memory accesses in any LO multiple between 1 and 4 bytes. Instruction Encoding: 3 Instruction Formats, all 32 bits wide. OP rs rt rd sa funct

OP rs rt immediate

OP jump target

MIPS is the target ISA for CPU design in this course EECC550 - Shaaban #63 Lec # 1 Winter 2005 11-29-2005 EvolutionEvolution ofof InstructionInstruction SetSet ArchitecturesArchitectures Single Accumulator (EDSAC 1949) Accumulator + Index Registers (Manchester Mark I, IBM 700 series 1953)

Separation of Programming Model from Implementation

High-level Language Based Concept of an ISA Family (B5000 1963) (IBM 360 1964)

General Purpose Register (GPR) Machines

Load/Store Architecture Complex Instruction Sets (CISC) (Vax, Motorola 68000, Intel x86 1977-80) (CDC 6600, 1 1963-76)

Reduced Instruction Set Computer (RISC)( (MIPS, SPARC, HP-PA, PowerPC, . . . 1984..) EECC550 - Shaaban #64 Lec # 1 Winter 2005 11-29-2005