
The Von Neumann Computer Model
• Partitioning of the computing engine into components:
– Central Processing Unit (CPU): control (instruction decode, sequencing of operations) and datapath (registers, arithmetic and logic unit, buses).
– Memory: instruction and operand storage.
– Input/Output (I/O) sub-system: I/O controllers, interfaces, devices.
– The stored-program concept: instructions from an instruction set are fetched from a common memory and executed one at a time.

[Figure: Von Neumann machine organization — a CPU (control; datapath: registers, ALU, buses), memory holding instructions and data, and input/output; together the CPU, memory, and I/O devices form the computer system.]

Generic CPU Machine Instruction Execution Steps

1. Instruction Fetch: obtain the instruction from program storage.
2. Instruction Decode: determine the required actions and the instruction size.
3. Operand Fetch: locate and obtain operand data.
4. Execute: compute the result value or status.
5. Result Store: deposit results in storage for later use.
6. Next Instruction: determine the successor (next) instruction.
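To make the cycle concrete, here is a toy stored-program machine (a minimal Python sketch using a made-up three-instruction accumulator ISA; the instruction names and memory layout are illustrative assumptions, not from the slides):

    # Toy stored-program machine: each loop iteration walks the six steps.
    memory = {0: ("LOAD", 100), 1: ("ADD", 101), 2: ("STORE", 102),
              3: ("HALT", None), 100: 7, 101: 35, 102: 0}
    acc, pc = 0, 0
    while True:
        op, addr = memory[pc]                  # 1. Instruction fetch, 2. Decode
        if op == "HALT":
            break
        if op in ("LOAD", "ADD"):
            operand = memory[addr]             # 3. Operand fetch
        if op == "LOAD":                       # 4. Execute ...
            acc = operand                      # 5. ... result store (accumulator)
        elif op == "ADD":
            acc = acc + operand
        elif op == "STORE":
            memory[addr] = acc                 # 5. Result store (to memory)
        pc += 1                                # 6. Next instruction
    print(memory[102])                         # 7 + 35 = 42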

Hardware Components of Any Computer
Five classic components of all computers: 1. Control Unit; 2. Datapath; 3. Memory; 4. Input; 5. Output.

[Figure: The five classic components — the processor (Control Unit + Datapath) is active; memory is passive (where programs and data live when running); input devices include keyboard, mouse, and disk; output devices include display, printer, and disk.]

CPU Organization
• Datapath Design:
– Capabilities & performance characteristics of principal Functional Units (FUs) (e.g., registers, ALU, shifters, logic units, ...).
– Ways in which these components are interconnected (bus connections, multiplexors, etc.).
– How information flows between components.
• Control Unit Design:
– Logic and means by which such information flow is controlled.
– Control and coordination of FU operation to realize the targeted Instruction Set Architecture; the control unit can be implemented either as a finite state machine or as a microprogram.
• Hardware description with a suitable language, possibly using Register Transfer Notation (RTN).
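For the finite-state-machine option, a control unit can be sketched as a table mapping each state to the control signals it asserts and its successor state (all state and signal names below are illustrative assumptions, not a real design):

    # Toy multi-cycle FSM controller: state -> (asserted signals, next state).
    FSM = {
        "FETCH":     ({"mem_read", "ir_write", "pc_increment"}, "DECODE"),
        "DECODE":    ({"reg_read"},                             "EXECUTE"),
        "EXECUTE":   ({"alu_op"},                               "WRITEBACK"),
        "WRITEBACK": ({"reg_write"},                            "FETCH"),
    }
    state = "FETCH"
    for _ in range(8):                 # trace two full instruction cycles
        signals, state_next = FSM[state]
        print(f"{state:9s} -> {sorted(signals)}")
        state = state_next

A microprogrammed control unit would hold much the same information as words in a control memory rather than as fixed logic.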

Recent Trends in Computer Design
• The cost/performance ratio of computing systems has seen a steady decline due to advances in:
– IC technology: decreasing feature size (λ).
• Clock rate improves roughly in proportion to the improvement in λ.
• Number of transistors improves in proportion to λ² (or faster).
– Architectural improvements in CPU design.
• Microprocessor systems directly reflect IC improvement in terms of a yearly 35 to 55% improvement in performance.
• Assembly language has been mostly eliminated and replaced by other alternatives such as C or C++.
• Standard operating systems (UNIX, NT) lowered the cost of introducing new architectures.
• Emergence of RISC architectures and RISC-core architectures.
• Adoption of quantitative approaches to computer design based on empirical performance observations.

1988 Computer Food Chain

[Figure: 1988 computer food chain — PC, workstation, minicomputer, mini-supercomputer, mainframe, supercomputer, and massively parallel processors.]

1997 Computer Food Chain

[Figure: 1997 computer food chain — PDA, PC, workstation, server, mainframe, supercomputer; the minicomputer and mini-supercomputer classes have disappeared.]

Processor Performance Trends
[Figure: Relative performance of supercomputers, mainframes, minicomputers, and microprocessors, 1965-2000, on a log scale (0.1 to 1000). Mass-produced microprocessors became a cost-effective, high-performance replacement for custom-designed mainframe and minicomputer CPUs.]

Microprocessor Performance 1987-97
[Figure: Integer SPEC92 performance, 1987-97, rising from the Sun-4/260, MIPS M/120, and MIPS M/2000 through the IBM RS/6000, HP 9000/750, DEC AXP/500, and IBM POWER 100 to the DEC Alpha 4/266, 5/300, 5/500, and 21264/600.]

Microprocessor Frequency Trend
[Figure: Clock frequency (MHz, log scale) and gate delays per clock, 1987-2005, for Intel (386, 486, Pentium, Pentium Pro, Pentium II), IBM/Motorola PowerPC (601, 603, 604, 604+, MPC750), and DEC Alpha (21064A, 21066, 21164, 21164A, 21264, 21264S). Processor frequency scales by 2x per generation, while the number of gate delays per clock is reduced by 25% per generation.]

Microprocessor Transistor Count Growth Rate
[Figure: Transistor count per chip, 1970-2000 (log scale), from the i4004, i8080, i8086, i80286, i80386, and i80486 up to the Sparc Ultra (5.2 million), Pentium (5.5 million), PowerPC 620 (6.9 million), Alpha 21164 (9.3 million), and Alpha 21264 (15 million). Moore's Law: 2x transistors per chip every 1.5 years.]

Increase of Capacity of VLSI Dynamic RAM Chips

Year   Size (Megabit)
1980   0.0625
1983   0.25
1986   1
1989   4
1992   16
1996   64
1999   256
2000   1024

Capacity growth: 1.55x per year, or doubling every 1.6 years.
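The quoted rate can be checked against the table's 1980 and 1999 entries (a quick Python computation):

    import math
    # 0.0625 Mbit (1980) grows to 256 Mbit (1999): 4096x over 19 years.
    growth = (256 / 0.0625) ** (1 / (1999 - 1980))   # per-year capacity factor
    doubling = math.log(2) / math.log(growth)        # years per doubling
    print(f"{growth:.2f}x/yr, doubling every {doubling:.1f} years")  # 1.55x, 1.6 yr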

DRAM Cost Over Time

Cost as of the second half of 1999: ~$1 per MB.

Recent Technology Trends (Summary)

        Capacity        Speed (latency)
Logic   2x in 3 years   2x in 3 years
DRAM    4x in 3 years   2x in 10 years
Disk    4x in 3 years   2x in 10 years

Computer Technology Trends: Evolutionary but Rapid Change
• Processor:
– 2x in speed every 1.5 years; 1000x performance in the last decade.
• Memory:
– DRAM capacity: > 2x every 1.5 years; 1000x size in the last decade.
– Cost per bit: improves about 25% per year.
• Disk:
– Capacity: > 2x in size every 1.5 years; 200x size in the last decade.
– Cost per bit: improves about 60% per year.
– Only a 10% performance improvement per year, due to mechanical limitations.
• Expected state-of-the-art PC by end of year 2001:
– Processor clock speed: > 2500 MegaHertz (2.5 GigaHertz)
– Memory capacity: > 1000 MegaBytes (1 GigaByte)
– Disk capacity: > 100 GigaBytes (0.1 TeraBytes)

Distribution of Cost in a System: An Example

[Figure: Example distribution of cost in a system — some components account for a decreasing fraction of total cost over time, others for an increasing fraction.]

A Simplified View of The Software/Hardware Hierarchical Layers

A Hierarchy of Computer Design

Level  Name                           Modules                  Primitives                         Descriptive Media
1      Electronics                    Gates, FF's              Transistors, Resistors, etc.       Circuit Diagrams
2      Logic                          Registers, ALU's ...     Gates, FF's ...                    Logic Diagrams
3      Organization                   Processors, Memories     Registers, ALU's ...               Register Transfer Notation (RTN)
4      Microprogramming               Assembly Language        Microinstructions                  Microprograms
5      Assembly language programming  OS Routines              Assembly language Instructions     Assembly Language Programs
6      Procedural Programming         Applications, Drivers    OS Routines, High-level Languages  High-level Language Programs
7      Application                    Systems                  Procedural Constructs              Problem-Oriented Programs

(Levels 1-3: Low Level - Hardware; Level 4: Firmware; Levels 5-7: High Level - Software)

Hierarchy of Computer Architecture

[Figure: Layered hierarchy — high-level language programs, assembly language programs, and machine language programs (software: application programs, compiler, operating system) sit above the software/hardware boundary at the Instruction Set Architecture; below it are the instruction set processor and I/O system, datapath & control, digital design, and circuit design/layout (hardware, with the microprogram as firmware), described by Register Transfer Notation (RTN), logic diagrams, and circuit diagrams.]

Computer Architecture Vs. Computer Organization
• The term computer architecture is sometimes erroneously restricted to computer instruction set design, with other aspects of computer design called implementation.
• More accurate definitions:
– Instruction set architecture (ISA): the actual programmer-visible instruction set; it serves as the boundary between the software and hardware.
– Implementation of a machine has two components:
• Organization: includes the high-level aspects of a computer's design, such as the memory system, the bus structure, and the internal CPU unit which implements arithmetic, logic, branching, and data transfer operations.
• Hardware: refers to the specifics of the machine, such as detailed logic design and packaging technology.
• In general, computer architecture refers to all three aspects: instruction set architecture, organization, and hardware.

Computer Architecture's Changing Definition

• 1950s to 1960s: Computer Architecture Course = Computer Arithmetic.
• 1970s to mid 1980s: Computer Architecture Course = Instruction Set Design, especially ISA appropriate for compilers.
• 1990s: Computer Architecture Course = Design of CPU, memory system, I/O system, Multiprocessors.

The Task of A Computer Designer
• Determine what attributes are important to the design of the new machine.
• Design a machine to maximize performance while staying within cost and other constraints and metrics.
• It involves more than instruction set design:
– Instruction set architecture.
– CPU micro-architecture.
– Implementation.
• Implementation of a machine has two components:
– Organization.
– Hardware.

Recent Architectural Improvements
• Increased optimization and utilization of cache systems.
• Memory-latency hiding techniques.
• Optimization of pipelined instruction execution.
• Dynamic hardware-based pipeline scheduling.
• Improved handling of pipeline hazards.
• Improved hardware branch prediction techniques.
• Exploiting Instruction-Level Parallelism (ILP) in terms of multiple-instruction issue and multiple hardware functional units.
• Inclusion of special instructions to handle multimedia applications.
• High-speed bus designs to improve data transfer rates.

The Concept of Memory Hierarchy

[Figure: Memory hierarchy — CPU registers, cache (< 4 MB), main memory (< 4 GB, ~60 ns), and I/O devices such as disk (> 2 GB, ~5 ms).]

Typical Parameters of Memory Hierarchy Levels

Current Computer Architecture Topics
[Figure: Layered map of current architecture topics — Input/Output and storage: disks, WORM, tape; RAID; emerging technologies; DRAM interleaving; bus protocols. Memory hierarchy: L1 cache, L2 cache; coherence, bandwidth, latency; addressing, protection, exception handling; VLSI. Instruction Set Architecture. CPU: pipelining, hazard resolution, superscalar execution, instruction reordering, branch prediction, speculation; Instruction-Level Parallelism (ILP): VLIW, vector, DSP, ...; Thread-Level Parallelism (TLP), simultaneous multithreading.]

Computer Performance Evaluation: Cycles Per Instruction (CPI)
• Most computers run synchronously, utilizing a CPU clock running at a constant clock rate, where: Clock rate = 1 / clock cycle.
• A computer machine instruction is comprised of a number of elementary or micro operations which vary in number and complexity depending on the instruction and the exact CPU organization and implementation.
– A micro operation is an elementary hardware operation that can be performed during one clock cycle.
– This corresponds to one micro-instruction in microprogrammed CPUs.
– Examples: register operations (shift, load, clear, increment), ALU operations (add, subtract), etc.
• Thus a single machine instruction may take one or more clock cycles to complete, termed the Cycles Per Instruction (CPI).

Computer Performance Measures: Program Execution Time
• For a specific program compiled to run on a specific machine "A", the following parameters are provided:
– The total instruction count of the program.
– The average number of cycles per instruction (average CPI).
– The clock cycle time of machine "A".
• How can one measure the performance of this machine running this program?
– Intuitively, the machine is said to be faster, or to have better performance running this program, if the total execution time is shorter.
– Thus the inverse of the total measured program execution time is a possible performance measure or metric:

PerformanceA = 1 / Execution TimeA
How to compare the performance of different machines? What factors affect performance? How to improve performance?

Measuring Performance

Performancex = 1 / Execution Timex
• To compare the performance of machines X and Y executing specific code:

n = Execution Timey / Execution Timex = Performancex / Performancey

• System performance refers to the performance and elapsed time measured on an unloaded machine.
• CPU performance refers to user CPU time on an unloaded system.
• Example: For a given program:

Execution time on machine A: Execution TimeA = 1 second
Execution time on machine B: Execution TimeB = 10 seconds
PerformanceA / PerformanceB = Execution TimeB / Execution TimeA = 10 / 1 = 10
The performance of machine A is 10 times the performance of machine B when running this program, or: machine A is said to be 10 times faster than machine B when running this program.

CPU Performance Equation

CPU time = CPU clock cycles for a program x Clock cycle time, or:

CPU time = CPU clock cycles for a program / clock rate

CPI (clock cycles per instruction):

CPI = CPU clock cycles for a program / I

where I is the instruction count.

CPU Execution Time: The CPU Equation
• A program is comprised of a number of instructions, I.
– Measured in: instructions/program
• The average instruction takes a number of cycles per instruction (CPI) to complete.
– Measured in: cycles/instruction
• The CPU has a fixed clock cycle time C = 1/clock rate.
– Measured in: seconds/cycle
• CPU execution time is the product of the above three parameters, as follows:
CPU Time = I x CPI x C

CPU time = Seconds/Program = (Instructions/Program) x (Cycles/Instruction) x (Seconds/Cycle)

CPU Execution Time

For a given program and machine:

CPI = Total program execution cycles / Instruction count

→ CPU clock cycles = Instruction count x CPI

CPU execution time =

= CPU clock cycles x Clock cycle

= Instruction count x CPI x Clock cycle = I x CPI x C
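In code, the whole chain of relations collapses to one function (a Python sketch; the sample numbers anticipate the worked example on the next slide):

    def cpu_time(instruction_count, cpi, clock_rate_hz):
        # CPU time = I x CPI x C, with clock cycle C = 1 / clock rate.
        return instruction_count * cpi / clock_rate_hz

    print(cpu_time(10_000_000, 2.5, 200e6))   # 0.125 seconds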

CPU Execution Time: Example
• A program is running on a specific machine with the following parameters:
– Total instruction count: 10,000,000 instructions.
– Average CPI for the program: 2.5 cycles/instruction.
– CPU clock rate: 200 MHz.
• What is the execution time for this program?

CPU time = Seconds/Program = (Instructions/Program) x (Cycles/Instruction) x (Seconds/Cycle)

CPU time = Instruction count x CPI x Clock cycle
         = 10,000,000 x 2.5 x (1 / clock rate)
         = 10,000,000 x 2.5 x 5x10^-9
         = 0.125 seconds

Aspects of CPU Execution Time

CPU Time = Instruction count x CPI x Clock cycle

Instruction Count I — depends on: program used, compiler, ISA.
CPI — depends on: program used, compiler, ISA, CPU organization.
Clock Cycle C — depends on: technology, CPU organization.

Factors Affecting CPU Performance

CPU time = Seconds/Program = (Instructions/Program) x (Cycles/Instruction) x (Seconds/Cycle)

                                    Instruction Count I   CPI   Clock Cycle C
Program                             X                     X
Compiler                            X                     X
Instruction Set Architecture (ISA)  X                     X
Organization                                              X     X
Technology                                                      X

Performance Comparison: Example
• From the previous example: a program is running on a specific machine with the following parameters:
– Total instruction count: 10,000,000 instructions.
– Average CPI for the program: 2.5 cycles/instruction.
– CPU clock rate: 200 MHz.
• Using the same program with these changes:
– A new compiler is used: new instruction count 9,500,000; new CPI 3.0.
– Faster CPU implementation: new clock rate = 300 MHz.
• What is the speedup with the changes?

Speedup = Old execution time / New execution time = (Iold x CPIold x Clock cycleold) / (Inew x CPInew x Clock cyclenew)

Speedup = (10,000,000 x 2.5 x 5x10^-9) / (9,500,000 x 3 x 3.33x10^-9) = 0.125 / 0.095 = 1.32, or 32% faster after the changes.

Instruction Types And CPI

• Given a program with n types or classes of instructions with:

– Ci = Count of instructions of typei

– CPIi = Average cycles per instruction of typei

CPU clock cycles = Σ(i=1 to n) (CPIi x Ci)
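As a cross-check, the summation is a one-liner (a Python sketch; the CPI values and counts below are from the example that follows):

    # CPU clock cycles = sum of CPI_i * C_i over all instruction classes.
    def total_cycles(cpi, counts):
        return sum(cpi[k] * counts[k] for k in counts)

    cpi = {"A": 1, "B": 2, "C": 3}
    print(total_cycles(cpi, {"A": 2, "B": 1, "C": 2}))   # sequence 1: 10 cycles
    print(total_cycles(cpi, {"A": 4, "B": 1, "C": 1}))   # sequence 2: 9 cycles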

Instruction Types And CPI: An Example
• An instruction set has three instruction classes:

Instruction class   CPI
A                   1
B                   2
C                   3

• Two code sequences have the following instruction counts:

                Instruction counts for instruction class
Code sequence   A   B   C
1               2   1   2
2               4   1   1

• CPU cycles for sequence 1 = 2 x 1 + 1 x 2 + 2 x 3 = 10 cycles
CPI for sequence 1 = clock cycles / instruction count = 10 / 5 = 2
• CPU cycles for sequence 2 = 4 x 1 + 1 x 2 + 1 x 3 = 9 cycles
CPI for sequence 2 = 9 / 6 = 1.5

Instruction Frequency & CPI
• Given a program with n types or classes of instructions with the following characteristics:

Ci = Count of instructions of typei

CPIi = Average cycles per instruction of typei

Fi = Frequency of instruction typei

= Ci / total instruction count
Then:
CPI = Σ(i=1 to n) (CPIi x Fi)

Instruction Type Frequency & CPI: A RISC Example

Base Machine (Reg / Reg)

Op       Freq   Cycles   CPIxF   % Time
ALU      50%    1        .5      23%
Load     20%    5        1.0     45%
Store    10%    3        .3      14%
Branch   20%    2        .4      18%

Typical Mix

CPI = Σ(i=1 to n) (CPIi x Fi) = .5 x 1 + .2 x 5 + .1 x 3 + .2 x 2 = 2.2
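The table's CPI and %-time columns can be reproduced in a few lines (a Python sketch of the same computation):

    # CPI from instruction-type frequencies, plus each type's share of run time.
    mix = {"ALU": (.5, 1), "Load": (.2, 5), "Store": (.1, 3), "Branch": (.2, 2)}
    cpi = sum(f * c for f, c in mix.values())             # 2.2
    for op, (f, c) in mix.items():
        print(f"{op:6s} CPIxF = {f*c:.1f}  %time = {f*c/cpi:.0%}")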

Metrics of Computer Performance

[Figure: Each level of the system stack has its own performance metric — application, programming language, compiler: execution time (target workload, SPEC95, etc.); ISA: millions of instructions per second (MIPS), millions of floating-point operations per second (MFLOP/s); datapath, control: megabytes per second; function units, transistors, wires, pins: cycles per second (clock rate).]

Each metric has a purpose, and each can be misused.

Choosing Programs To Evaluate Performance
Levels of programs or benchmarks that could be used to evaluate performance:
– Actual Target Workload: full applications that run on the target machine.
– Real Full Program-based Benchmarks:
• Select a specific mix or suite of programs that are typical of targeted applications or workload (e.g. SPEC95).
– Small “Kernel” Benchmarks:
• Key computationally-intensive pieces extracted from real programs.
– Examples: matrix factorization, FFT, tree search, etc.
• Best used to test specific aspects of the machine.
– Microbenchmarks:
• Small, specially written programs to isolate a specific aspect of performance characteristics: processing (integer, floating point), local memory, input/output, etc.

Types of Benchmarks

Benchmark type               Pros                                      Cons
Actual Target Workload       Representative.                           Very specific; non-portable; complex: difficult to run or measure.
Full Application Benchmarks  Portable; widely used; measurements       Less representative than actual workload.
                             useful in reality.
Small “Kernel” Benchmarks    Easy to run, early in the design cycle.   Easy to “fool” by designing hardware to run them well.
Microbenchmarks              Identify peak performance and             Peak performance results may be a long way from real
                             potential bottlenecks.                    application performance.

SPEC: System Performance Evaluation Cooperative

Benchmark   Description
Integer:
go          Artificial intelligence; plays the game of Go
m88ksim     Motorola 88k chip simulator; runs test program
gcc         The GNU C compiler generating SPARC code
compress    Compresses and decompresses file in memory
li          Lisp interpreter
ijpeg       Graphic compression and decompression
perl        Manipulates strings and prime numbers in the special-purpose programming language Perl
vortex      A database program
Floating Point:
tomcatv     A mesh-generation program
swim        Shallow water model with 513 x 513 grid
su2cor      Quantum physics; Monte Carlo simulation
hydro2d     Astrophysics; hydrodynamic Navier-Stokes equations
mgrid       Multigrid solver in 3-D potential field
applu       Parabolic/elliptic partial differential equations
turb3d      Simulates isotropic, homogeneous turbulence in a cube
apsi        Solves problems regarding temperature, wind velocity, and distribution of pollutants
fpppp       Quantum chemistry
wave5       Plasma physics; electromagnetic particle simulation

SPEC95 For High-End CPUs, Fourth Quarter 1999

SPEC95 For High-End CPUs, First Quarter 2000

Comparing and Summarizing Performance
• Total execution time of the compared machines.
• If n program runs or n programs are used:
– Arithmetic mean: (1/n) x Σ(i=1 to n) Timei
– Weighted execution time: Σ(i=1 to n) (Weighti x Timei)
– Normalized execution time (arithmetic or geometric mean). Formula for the geometric mean:

Geometric mean = (Π(i=1 to n) Execution time ratioi)^(1/n)

Computer Performance Measures: MIPS (Million Instructions Per Second)
• For a specific program running on a specific computer, MIPS is a measure of millions of instructions executed per second:
MIPS = Instruction count / (Execution time x 10^6)
     = Instruction count / (CPU clocks x Cycle time x 10^6)
     = (Instruction count x Clock rate) / (Instruction count x CPI x 10^6)
     = Clock rate / (CPI x 10^6)
• Faster execution time usually means a faster MIPS rating.
• Problems:
– No account for the instruction set used.
– Program-dependent: a single machine does not have a single MIPS rating.
– Cannot be used to compare computers with different instruction sets.
– A higher MIPS rating in some cases may not mean higher performance or better execution time, e.g. due to compiler design variations.

Compiler Variations, MIPS, Performance: An Example
• For the machine with instruction classes:

Instruction class   CPI
A                   1
B                   2
C                   3

• For a given program, two compilers produced the following instruction counts:

Instruction counts (in millions) for each instruction class
Code from:    A    B    C
Compiler 1    5    1    1
Compiler 2   10    1    1
• The machine is assumed to run at a clock rate of 100 MHz.

Compiler Variations, MIPS, Performance: An Example (Continued)
MIPS = Clock rate / (CPI x 10^6) = 100 MHz / (CPI x 10^6)
CPI = CPU execution cycles / Instruction count
CPU clock cycles = Σ(i=1 to n) (CPIi x Ci)
CPU time = Instruction count x CPI / Clock rate
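The hand calculations below are easy to verify with a short script (a Python sketch; instruction counts in millions, as in the table):

    cpi_class = {"A": 1, "B": 2, "C": 3}
    counts = {"Compiler 1": {"A": 5, "B": 1, "C": 1},
              "Compiler 2": {"A": 10, "B": 1, "C": 1}}
    clock_rate = 100e6                                   # 100 MHz
    for name, c in counts.items():
        instr = sum(c.values()) * 1e6
        cycles = sum(cpi_class[k] * n for k, n in c.items()) * 1e6
        cpi = cycles / instr
        print(name, round(cpi, 2),                       # CPI
              round(clock_rate / (cpi * 1e6), 1),        # MIPS rating
              round(instr * cpi / clock_rate, 2))        # CPU time (seconds)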

• For compiler 1:

– CPI1 = (5 x 1 + 1 x 2 + 1 x 3) / (5 + 1 + 1) = 10 / 7 = 1.43
– MIPS1 = (100 x 10^6) / (1.43 x 10^6) = 70.0
– CPU time1 = ((5 + 1 + 1) x 10^6 x 1.43) / (100 x 10^6) = 0.10 seconds

• For compiler 2:

– CPI2 = (10 x 1 + 1 x 2 + 1 x 3) / (10 + 1 + 1) = 15 / 12 = 1.25
– MIPS2 = (100 x 10^6) / (1.25 x 10^6) = 80.0
– CPU time2 = ((10 + 1 + 1) x 10^6 x 1.25) / (100 x 10^6) = 0.15 seconds
Note: compiler 2 produces the higher MIPS rating (80.0 vs. 70.0) yet the longer execution time (0.15 s vs. 0.10 s), showing how MIPS can mislead.

Computer Performance Measures: MFLOPS (Million FLOating-Point Operations Per Second)
• A floating-point operation is an addition, subtraction, multiplication, or division operation applied to numbers represented by a single- or double-precision floating-point representation.
• MFLOPS, for a specific program running on a specific computer, is a measure of millions of floating-point operations (megaflops) per second:

MFLOPS = Number of floating-point operations / (Execution time x 10^6)

• A better comparison measure between different machines than MIPS.
• Program-dependent: different programs have different percentages of floating-point operations present, e.g. compilers have no such operations and yield a MFLOPS rating of zero.
• Dependent on the type of floating-point operations present in the program.

Quantitative Principles of Computer Design
• Amdahl’s Law:

The performance gain from improving some portion of a computer is calculated by:

Speedup = Performance for entire task using the enhancement / Performance for entire task without using the enhancement
or
Speedup = Execution time for entire task without using the enhancement / Execution time for entire task using the enhancement

Performance Enhancement Calculations: Amdahl's Law
• The performance enhancement possible due to a given design improvement is limited by the amount that the improved feature is used.
• Amdahl’s Law: performance improvement or speedup due to enhancement E:

Speedup(E) = Execution time without E / Execution time with E = Performance with E / Performance without E

– Suppose that enhancement E accelerates a fraction F of the execution time by a factor S, and the remainder of the time is unaffected. Then:

Execution time with E = ((1 - F) + F/S) x Execution time without E

Hence the speedup is:

Speedup(E) = Execution time without E / (((1 - F) + F/S) x Execution time without E) = 1 / ((1 - F) + F/S)

Pictorial Depiction of Amdahl’s Law
Enhancement E accelerates fraction F of execution time by a factor of S

Before (execution time without enhancement E): unaffected fraction (1 - F) | affected fraction F
After (execution time with enhancement E): unaffected fraction (1 - F), unchanged | affected part shrunk to F/S

Speedup(E) = Execution time without enhancement E / Execution time with enhancement E = 1 / ((1 - F) + F/S)
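Amdahl's Law is one line of code (a Python sketch):

    def amdahl_speedup(f, s):
        # Overall speedup when fraction f of execution time is sped up by factor s.
        return 1.0 / ((1.0 - f) + f / s)

    print(amdahl_speedup(0.45, 2.5))   # ~1.37 -- the load-enhancement example below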

Performance Enhancement Example
• For the RISC machine with the instruction mix given earlier:

Op       Freq   Cycles   CPIxF   % Time
ALU      50%    1        .5      23%
Load     20%    5        1.0     45%
Store    10%    3        .3      14%
Branch   20%    2        .4      18%
(CPI = 2.2)

• If a CPU design enhancement improves the CPI of load instructions from 5 to 2, what is the resulting performance improvement from this enhancement?
Fraction enhanced = F = 45% or .45
Unaffected fraction = 100% - 45% = 55% or .55
Factor of enhancement = S = 5/2 = 2.5
Using Amdahl’s Law:
Speedup(E) = 1 / ((1 - F) + F/S) = 1 / (.55 + .45/2.5) = 1.37

An Alternative Solution Using the CPU Equation

Op       Freq   Cycles   CPIxF   % Time
ALU      50%    1        .5      23%
Load     20%    5        1.0     45%
Store    10%    3        .3      14%
Branch   20%    2        .4      18%
(CPI = 2.2)

• If a CPU design enhancement improves the CPI of load instructions from 5 to 2, what is the resulting performance improvement from this enhancement?

Old CPI = 2.2
New CPI = .5 x 1 + .2 x 2 + .1 x 3 + .2 x 2 = 1.6

Speedup(E) = Original execution time / New execution time
           = (Instruction count x old CPI x clock cycle) / (Instruction count x new CPI x clock cycle)
           = old CPI / new CPI = 2.2 / 1.6 = 1.37

This is the same speedup obtained from Amdahl’s Law in the first solution.

Performance Enhancement Example
• A program runs in 100 seconds on a machine, with multiply operations responsible for 80 seconds of this time. By how much must the speed of multiplication be improved to make the program four times faster?

Desired speedup = 4 = 100 / Execution time with enhancement

→ Execution time with enhancement = 25 seconds

25 seconds = (100 - 80) seconds + 80 seconds / n
25 seconds = 20 seconds + 80 seconds / n
→ 5 = 80 / n → n = 80 / 5 = 16

Hence multiplication should be 16 times faster to get a speedup of 4.

Performance Enhancement Example

• For the previous example, with a program running in 100 seconds on a machine with multiply operations responsible for 80 seconds of this time: by how much must the speed of multiplication be improved to make the program five times faster?

Desired speedup = 5 = 100 / Execution time with enhancement

→ Execution time with enhancement = 20 seconds

20 seconds = (100 - 80) seconds + 80 seconds / n
20 seconds = 20 seconds + 80 seconds / n
→ 0 = 80 / n

No amount of multiplication speed improvement can achieve this: with only 80% of the time enhanced, the overall speedup approaches the limit 1 / (1 - 0.8) = 5 only as n → ∞ and never reaches it.
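Solving the enhancement equation for the required factor n makes both multiplication examples immediate (a Python sketch; required_factor is a helper named here for illustration):

    def required_factor(f, target_speedup):
        # From time_new = (1 - f)*T + f*T/n:  n = f / (1/target - (1 - f)).
        denom = 1.0 / target_speedup - (1.0 - f)
        return f / denom if denom > 0 else float("inf")  # inf: unreachable target

    print(required_factor(0.8, 4))   # 16.0 -- multiply must become 16x faster
    print(required_factor(0.8, 5))   # inf  -- no finite improvement suffices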

Extending Amdahl's Law To Multiple Enhancements

• Suppose that enhancement Ei accelerates a fraction Fi of the execution time by a factor Si, and the remainder of the time is unaffected. Then:

Speedup = Original execution time / (((1 - Σi Fi) + Σi (Fi / Si)) x Original execution time)
        = 1 / ((1 - Σi Fi) + Σi (Fi / Si))

Note: All fractions refer to original execution time.
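A sketch of the multiple-enhancement form in Python (the fractions are of the original execution time and must not overlap):

    def multi_amdahl(fractions, speedups):
        untouched = 1.0 - sum(fractions)
        return 1.0 / (untouched + sum(f / s for f, s in zip(fractions, speedups)))

    print(multi_amdahl([0.20, 0.15, 0.10], [10, 15, 30]))   # ~1.71, as in the example below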

Amdahl's Law With Multiple Enhancements: Example
• Three CPU performance enhancements are proposed, with the following speedups and percentages of the code execution time affected:

Speedup1 = S1 = 10    Percentage1 = F1 = 20%
Speedup2 = S2 = 15    Percentage2 = F2 = 15%
Speedup3 = S3 = 30    Percentage3 = F3 = 10%

• While all three enhancements are in place in the new design, each enhancement affects a different portion of the code and only one enhancement can be used at a time.
• What is the resulting overall speedup?

Speedup = 1 / ((1 - Σi Fi) + Σi (Fi / Si))
• Speedup = 1 / [(1 - .2 - .15 - .1) + (.2/10 + .15/15 + .1/30)] = 1 / [.55 + .0333] = 1 / .5833 = 1.71

Pictorial Depiction of Example
Before: Execution time with no enhancements: 1
S1 = 10, S2 = 15, S3 = 30

Unaffected fraction: .55 | F1 = .2 (sped up by 10) | F2 = .15 (sped up by 15) | F3 = .1 (sped up by 30)

Unchanged

Unaffected, fraction: .55

After: Execution Time with enhancements: .55 + .02 + .01 + .00333 = .5833

Speedup = 1 / .5833 = 1.71

Note: All fractions refer to original execution time.

Instruction Set Architecture (ISA)
“... the attributes of a [computing] system as seen by the programmer, i.e. the conceptual structure and functional behavior, as distinct from the organization of the data flows and controls, the logic design, and the physical implementation.” – Amdahl, Blaauw, and Brooks, 1964.

The instruction set architecture is concerned with:
• Organization of programmable storage (memory & registers): includes the amount of addressable memory and the number of available registers.
• Data types & data structures: encodings & representations.
• Instruction set: what operations are specified.
• Instruction formats and encoding.
• Modes of addressing and accessing data items and instructions.
• Exceptional conditions.

Evolution of Instruction Sets
Single Accumulator (EDSAC 1950)
Accumulator + Index Registers (Manchester Mark I, IBM 700 series 1953)

Separation of Programming Model from Implementation

High-level Language Based (B5000 1963)    Concept of a Family (IBM 360 1964)

General Purpose Register Machines

Load/Store Architecture (CDC 6600, Cray 1 1963-76)    Complex Instruction Sets (VAX, Intel 432 1977-80)
RISC (MIPS, SPARC, HP-PA, IBM RS6000, ... 1987)

Types of Instruction Set Architectures According To Operand Addressing Fields
Memory-To-Memory Machines:
– Operands obtained from memory and results stored back in memory by any instruction that requires operands.
– No local CPU registers are used in the CPU datapath.
– Include:
• The 4-address machine.
• The 3-address machine.
• The 2-address machine.
The 1-address (Accumulator) Machine:
– A single local CPU special-purpose register (the accumulator) is used as the source of one operand and as the result destination.
The 0-address or Stack Machine:
– A push-down stack is used in the CPU.
General Purpose Register (GPR) Machines:
– The CPU datapath contains several local general-purpose registers which can be used as operand sources and as result destinations.
– A large number of possible addressing modes.
– Load-Store or Register-To-Register Machines: GPR machines where only data movement instructions (loads, stores) can obtain operands from memory and store results to memory.

Code Sequence C = A + B for Four Instruction Sets

Stack      Accumulator   Register (register-memory)   Register (load-store)

Push A     Load A        Load R1, A                   Load R1, A
Push B     Add B         Add R1, B                    Load R2, B
Add        Store C       Store C, R1                  Add R3, R1, R2
                                                      Store C, R3
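The stack column can be animated with a toy evaluator (a Python sketch; a final Pop C is assumed here to complete C = A + B, and the values of A and B are made up):

    memory = {"A": 3, "B": 4, "C": 0}
    stack = []
    for op, arg in [("Push", "A"), ("Push", "B"), ("Add", None), ("Pop", "C")]:
        if op == "Push":
            stack.append(memory[arg])                 # operand fetched from memory
        elif op == "Add":
            stack.append(stack.pop() + stack.pop())   # operands implicit on the stack
        elif op == "Pop":
            memory[arg] = stack.pop()                 # result stored back to memory
    print(memory["C"])                                # 7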

General-Purpose Register (GPR) Machines
• Every machine designed after 1980 uses a general-purpose register architecture.
• Registers, like any other storage form internal to the CPU, are faster than memory.
• Registers are easier for a compiler to use.
• GPR architectures are divided into several types depending on two factors:
– Whether an ALU instruction has two or three operands.
– How many of the operands in ALU instructions may be memory addresses.

General-Purpose Register Machines

ISA Examples

Machine          General Purpose Registers   Architecture       Year
EDSAC            1                            accumulator        1949
IBM 701          1                            accumulator        1953
CDC 6600         8                            load-store         1963
IBM 360          16                           register-memory    1964
DEC PDP-11       8                            register-memory    1970
DEC VAX          16                           register-memory,   1977
                                              memory-memory
Motorola 68000   16                           register-memory    1980
MIPS             32                           load-store         1985
SPARC            32                           load-store         1987

Examples of GPR Machines

Number of memory addresses   Maximum number of operands allowed   Examples
0                            3                                    SPARC, MIPS, PowerPC, ALPHA
1                            2                                    Intel 80x86, Motorola 68000
2                            2                                    VAX
3                            3                                    VAX

Typical Memory Addressing Modes

Addressing Mode    Sample Instruction    Meaning
Register           Add R4, R3            Regs[R4] ← Regs[R4] + Regs[R3]
Immediate          Add R4, #3            Regs[R4] ← Regs[R4] + 3
Displacement       Add R4, 10(R1)        Regs[R4] ← Regs[R4] + Mem[10 + Regs[R1]]
Indirect           Add R4, (R1)          Regs[R4] ← Regs[R4] + Mem[Regs[R1]]
Indexed            Add R3, (R1 + R2)     Regs[R3] ← Regs[R3] + Mem[Regs[R1] + Regs[R2]]
Absolute           Add R1, (1001)        Regs[R1] ← Regs[R1] + Mem[1001]
Memory indirect    Add R1, @(R3)         Regs[R1] ← Regs[R1] + Mem[Mem[Regs[R3]]]
Autoincrement      Add R1, (R2)+         Regs[R1] ← Regs[R1] + Mem[Regs[R2]]; Regs[R2] ← Regs[R2] + d
Autodecrement      Add R1, -(R2)         Regs[R2] ← Regs[R2] - d; Regs[R1] ← Regs[R1] + Mem[Regs[R2]]
Scaled             Add R1, 100(R2)[R3]   Regs[R1] ← Regs[R1] + Mem[100 + Regs[R2] + Regs[R3] x d]

Addressing Modes Usage Example
For 3 programs running on the VAX, ignoring direct register mode:
Displacement: 42% avg (32% to 55%)
Immediate: 33% avg (17% to 43%)

Register deferred (indirect): 13% avg (3% to 24%)
Scaled: 7% avg (0% to 16%)
Memory indirect: 3% avg (1% to 6%)
Misc: 2% avg (0% to 3%)

75%: displacement & immediate; 88%: displacement, immediate & register indirect.

Observation: in addition to register direct, the displacement, immediate, and register indirect addressing modes are important.

Utilization of Memory Addressing Modes

Displacement Address Size Example

[Figure: Histogram of displacement address bits needed (0 to 15), averaged over 5 SPECint92 programs vs. 5 SPECfp92 programs.]
1% of addresses > 16 bits → 12 to 16 bits of displacement needed.

Immediate Addressing Mode

Operation Types in The Instruction Set

Operator type            Examples

Arithmetic and logical   Integer arithmetic and logical operations: add, or
Data transfer            Loads-stores (move on machines with memory addressing)
Control                  Branch, jump, procedure call and return, traps
System                   Operating system call, memory management instructions
Floating point           Floating-point operations: add, multiply
Decimal                  Decimal add, decimal multiply, decimal-to-character conversion
String                   String move, string compare, string search
Graphics                 Pixel operations, compression/decompression operations

Instruction Usage Example: Top 10 Intel X86 Instructions

Rank   Instruction              Integer average (% total executed)
1      load                     22%
2      conditional branch       20%
3      compare                  16%
4      store                    12%
5      add                      8%
6      and                      6%
7      sub                      5%
8      move register-register   4%
9      call                     1%
10     return                   1%
       Total                    96%
Observation: simple instructions dominate instruction usage frequency.

Instructions for Control Flow

Type and Size of Operands
• Common operand types include (assuming a 32-bit CPU):

Character (1 byte), half word (16 bits), word (32 bits)

• IEEE standard 754: single-precision floating point (1 word), double-precision floating point (2 words). • For business applications, some architectures support a decimal format (packed decimal, or binary coded decimal, BCD).
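These sizes are easy to confirm with Python's struct module standing in for a 32-bit machine's data types (a quick sketch):

    import struct
    for name, fmt in [("character", "b"), ("half word", "h"), ("word", "i"),
                      ("single-precision FP", "f"), ("double-precision FP", "d")]:
        print(f"{name:20s} {struct.calcsize(fmt)} byte(s)")   # 1, 2, 4, 4, 8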

Type and Size of Operands

Instruction Set Encoding
Considerations affecting instruction set encoding:

– The desire to have as many registers and addressing modes as possible.

– The impact of the size of the register and addressing mode fields on the average instruction size and on the average program size.

– The desire to encode instructions into lengths that will be easy to handle in the implementation; at a minimum, a multiple of bytes.

Three Examples of Instruction Set Encoding

Operations & no. of operands | Address specifier 1 | Address field 1 | ... | Address specifier n | Address field n

Variable: VAX (1-53 bytes)

Operation | Address field 1 | Address field 2 | Address field 3

Fixed: DLX, MIPS, PowerPC, SPARC

Operation | Address specifier | Address field

Operation | Address specifier 1 | Address specifier 2 | Address field

Operation | Address specifier | Address field 1 | Address field 2

Hybrid: IBM 360/370, Intel 80x86

Complex Instruction Set Computer (CISC)
• Emphasizes doing more with each instruction.
• Motivated by the high cost of memory and hard disk capacity when the original CISC architectures were proposed:
– When the M6800 was introduced: 16K RAM = $500, 40M hard disk = $55,000.
– When the MC68000 was introduced: 64K RAM = $200, 10M HD = $5,000.
• Original CISC architectures evolved with faster, more complex CPU designs, but backward instruction set compatibility had to be maintained.
• Wide variety of addressing modes:
• 14 in the MC68000, 25 in the MC68020.
• A number of instruction modes for the location and number of operands:
• The VAX has 0- through 3-address instructions.
• Variable-length instruction encoding.

Example CISC ISA: Motorola 680X0

18 addressing modes:
• Data register direct.
• Address register direct.
• Immediate.
• Absolute short.
• Absolute long.
• Address register indirect.
• Address register indirect with postincrement.
• Address register indirect with predecrement.
• Address register indirect with displacement.
• Address register indirect with index (8-bit).
• Address register indirect with index (base).
• Memory indirect postindexed.
• Memory indirect preindexed.
• Program counter indirect with index (8-bit).
• Program counter indirect with index (base).
• Program counter indirect with displacement.
• Program counter memory indirect postindexed.
• Program counter memory indirect preindexed.

Operand sizes:
• Range from 1 to 32 bits; 1, 2, 4, 8, 10, or 16 bytes.

Instruction encoding:
• Instructions are stored in 16-bit words.
• The smallest instruction is 2 bytes (one word).
• The longest instruction is 5 words (10 bytes) in length.

Example CISC ISA: Intel X86, 386/486/Pentium

12 addressing modes:
• Register.
• Immediate.
• Direct.
• Base.
• Base + Displacement.
• Index + Displacement.
• Scaled Index + Displacement.
• Based Index.
• Based Scaled Index.
• Based Index + Displacement.
• Based Scaled Index + Displacement.
• Relative.

Operand sizes:
• Can be 8, 16, 32, 48, 64, or 80 bits long.
• Also supports string operations.

Instruction encoding:
• The smallest instruction is one byte.
• The longest instruction is 12 bytes long.
• The first bytes generally contain the opcode, mode specifiers, and register fields.
• The remaining bytes are for address displacement and immediate data.

Reduced Instruction Set Computer (RISC)
• Focuses on reducing the number and complexity of the machine's instructions.
• Reduced CPI. Goal: at least one instruction per clock cycle.
• Designed with pipelining in mind.
• Fixed-length instruction encoding.
• Only load and store instructions access memory.
• Simplified addressing modes, usually limited to immediate, register indirect, register displacement, and indexed.
• Delayed loads and branches.
• Instruction pre-fetch and speculative execution.
• Examples: MIPS, SPARC, PowerPC, Alpha.

Example RISC ISA: PowerPC

8 addressing modes:
• Register direct.
• Immediate.
• Register indirect.
• Register indirect with immediate index (loads and stores).
• Register indirect with register index (loads and stores).
• Absolute (jumps).
• Link register indirect (calls).
• Count register indirect (branches).

Operand sizes:
• Four operand sizes: 1, 2, 4 or 8 bytes.

Instruction encoding:
• The instruction set has 15 different formats with many minor variations.
• All are 32 bits in length.

Example RISC ISA: HP Precision Architecture, HP-PA

7 addressing modes:
• Register.
• Immediate.
• Base with displacement.
• Base with scaled index and displacement.
• Predecrement.
• Postincrement.
• PC-relative.

Operand sizes:
• Five operand sizes ranging in powers of two from 1 to 16 bytes.

Instruction encoding:
• The instruction set has 12 different formats.
• All are 32 bits in length.

Example RISC ISA: SPARC

5 addressing modes:
• Register indirect with immediate displacement.
• Register indirect indexed by another register.
• Register direct.
• Immediate.
• PC-relative.

Operand sizes:
• Four operand sizes: 1, 2, 4 or 8 bytes.

Instruction encoding:
• The instruction set has 3 basic instruction formats with 3 minor variations.
• All are 32 bits in length.

Example RISC ISA: Compaq Alpha AXP

4 addressing modes:
• Register direct.
• Immediate.
• Register indirect with displacement.
• PC-relative.

Operand sizes:
• Four operand sizes: 1, 2, 4 or 8 bytes.

Instruction encoding:
• The instruction set has 7 different formats.
• All are 32 bits in length.

RISC ISA Example: MIPS R3000

Instruction categories:
• Load/Store.
• Computational.
• Jump and Branch.
• Floating Point (using coprocessor).
• Memory Management.
• Special.

4 addressing modes:
• Base register + immediate offset (loads and stores).
• Register direct (arithmetic).
• Immediate (jumps).
• PC-relative (branches).

Operand sizes:
• Memory accesses in any multiple between 1 and 8 bytes.

Registers: R0 - R31, PC, HI, LO.

Instruction encoding: 3 instruction formats, all 32 bits wide:
OP | rs | rt | rd | sa | funct
OP | rs | rt | immediate
OP | jump target

A RISC ISA Example: MIPS

Register-Register:
bits 31-26: Op | 25-21: rs | 20-16: rt | 15-11: rd | 10-6: sa | 5-0: funct

Register-Immediate:
bits 31-26: Op | 25-21: rs | 20-16: rt | 15-0: immediate

Branch:
bits 31-26: Op | 25-21: rs | 20-16: rt | 15-0: displacement

Jump / Call:
bits 31-26: Op | 25-0: target
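With a fixed 32-bit encoding, packing an instruction is just shifts and ORs; here is a Python sketch for the register-register (R-type) format above (the sample values are MIPS's actual opcode/funct for add):

    # R-type fields: op 6 | rs 5 | rt 5 | rd 5 | sa 5 | funct 6 bits.
    def encode_rtype(op, rs, rt, rd, sa, funct):
        return (op << 26) | (rs << 21) | (rt << 16) | (rd << 11) | (sa << 6) | funct

    word = encode_rtype(0, 9, 10, 8, 0, 0x20)    # add $8, $9, $10
    print(f"{word:032b}")   # 00000001001010100100000000100000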

The Role of Compilers
The structure of recent compilers:

Phase                     Dependencies                             Function
Front-end per language    Language dependent; machine independent  Transform language to common intermediate form
High-level optimizations  Somewhat language dependent; largely     For example, procedure inlining and loop
                          machine independent                      transformations
Global optimizer          Small language dependencies; slight      Include global and local optimizations +
                          machine dependencies (e.g. register      register allocation
                          counts/types)
Code generator            Highly machine dependent; language       Detailed instruction selection and
                          independent                              machine-dependent optimizations; may include
                                                                   or be followed by assembler

Major Types of Compiler Optimization

Compiler Optimization and Instruction Count

An Instruction Set Example: The DLX Architecture
• A RISC-type instruction set architecture based on the instruction set design considerations of chapter 2:
– Use general-purpose registers with a load/store architecture to access memory.
– Reduced number of addressing modes: displacement (offset size of 12 to 16 bits), immediate (8 to 16 bits), register deferred.
– Data sizes: 8, 16, 32 bit integers and 64 bit IEEE 754 floating-point numbers.
– Use fixed instruction encoding for performance and variable instruction encoding for code size.
– 32 32-bit general-purpose registers, R0, ..., R31. R0 always has a value of zero.
– Separate floating point registers: can be used as 32 single-precision registers, F0, F1, ..., F31. Each odd-even pair can be used as a single 64-bit double-precision register: F0, F2, ..., F30.

DLX Instruction Format
I-type instruction (field widths in bits: 6 | 5 | 5 | 16):

Opcode | rs1 | rd | Immediate

Encodes: loads and stores of bytes, words, half words; all immediates (rd ← rs1 op immediate); conditional branch instructions (rs1 is the register, rd unused); jump register, jump and link register (rd = 0, rs1 = destination, immediate = 0).

R-type instruction (field widths in bits: 6 | 5 | 5 | 5 | 11):

Opcode | rs1 | rs2 | rd | func

Register-register ALU operations: rd ← rs1 func rs2, where the function field encodes the datapath operation (Add, Sub, ...); also read/write special registers and moves.

J-type instruction (field widths in bits: 6 | 26):

Opcode | Offset added to PC

Jump and jump and link; trap and return from exception.

DLX Instructions: Load and Store

LW R1, 30(R2)    Load word           Regs[R1] ←32 Mem[30+Regs[R2]]
LW R1, 1000(R0)  Load word           Regs[R1] ←32 Mem[1000+0]
LB R1, 40(R3)    Load byte           Regs[R1] ←32 (Mem[40+Regs[R3]]0)^24 ## Mem[40+Regs[R3]]
LBU R1, 40(R3)   Load byte unsigned  Regs[R1] ←32 0^24 ## Mem[40+Regs[R3]]
LH R1, 40(R3)    Load half word      Regs[R1] ←32 (Mem[40+Regs[R3]]0)^16 ## Mem[40+Regs[R3]] ## Mem[41+Regs[R3]]
LF F0, 50(R3)    Load float          Regs[F0] ←32 Mem[50+Regs[R3]]
LD F0, 50(R2)    Load double         Regs[F0] ## Regs[F1] ←64 Mem[50+Regs[R2]]
SW 500(R4), R3   Store word          Mem[500+Regs[R4]] ←32 Regs[R3]
SF 40(R3), F0    Store float         Mem[40+Regs[R3]] ←32 Regs[F0]
SD 40(R3), F0    Store double        Mem[40+Regs[R3]] ←32 Regs[F0]; Mem[44+Regs[R3]] ←32 Regs[F1]
SH 502(R2), R3   Store half          Mem[502+Regs[R2]] ←16 Regs[R3]16..31
SB 41(R3), R2    Store byte          Mem[41+Regs[R3]] ←8 Regs[R2]24..31

(Notation: ←n transfers n bits, ## concatenates bit fields, (x)^k replicates bit x k times for sign extension, and subscripts select bit positions.)

DLX Instructions: Arithmetic/Logical

ADD R1, R2, R3    Add                           Regs[R1] ← Regs[R2] + Regs[R3]
ADDI R1, R2, #3   Add immediate                 Regs[R1] ← Regs[R2] + 3
LHI R1, #42       Load high immediate           Regs[R1] ← 42 ## 0^16
SLLI R1, R2, #5   Shift left logical immediate  Regs[R1] ← Regs[R2] << 5
SLT R1, R2, R3    Set less than                 if (Regs[R2] < Regs[R3]) Regs[R1] ← 1 else Regs[R1] ← 0

DLX Instructions: Control Flow

J name          Jump                    PC ← name; ((PC+4) - 2^25) ≤ name < ((PC+4) + 2^25)
JAL name        Jump and link           Regs[R31] ← PC+4; PC ← name; ((PC+4) - 2^25) ≤ name < ((PC+4) + 2^25)
JALR R2         Jump and link register  Regs[R31] ← PC+4; PC ← Regs[R2]
JR R3           Jump register           PC ← Regs[R3]
BEQZ R4, name   Branch equal zero       if (Regs[R4] == 0) PC ← name; ((PC+4) - 2^15) ≤ name < ((PC+4) + 2^15)
BNEZ R4, name   Branch not equal zero   if (Regs[R4] != 0) PC ← name; ((PC+4) - 2^15) ≤ name < ((PC+4) + 2^15)

Sample DLX Instruction Distribution

Using SPECint92

DLX Instruction Distribution Using SPECfp92
