Modern Computer Architectures Lecture-1: Introduction

Sandeep Kumar Panda, Assoc. Prof., IT Department, CEB, BBSR

1 Introduction

– Computer performance has been increasing phenomenally over the last five decades. – Captured by Moore’s Law:

● Transistors per square inch roughly double every eighteen months. – Moore’s law is not exactly a law:

● But, has held good for nearly 50 years.

2 Introduction Cont…

● If commercial aircraft had seen a similar performance increase over the last 50 years, we would have:

– Commercial planes flying at 1000 times supersonic speed. – Aircraft the size of a chair. – Costing only a couple of thousand rupees.

3 Moore’s Law

Gordon Moore (co-founder of Intel) predicted in 1965: “Transistor density of minimum-cost semiconductor chips would double roughly every 18 months.” Moore’s Law: it’s worked for a long time.

Transistor density is correlated to processing speed. 4 Trends Related to Moore’s Law Cont… • Processor performance: • Twice as fast after every 2 years (roughly). • Memory capacity: • Twice as much after every 18 months (roughly).
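As a rough back-of-the-envelope sketch (the 50-year horizon below is only an assumed example, not a figure from the slides), the compound growth implied by doubling every 18 months can be computed in C:

#include <math.h>
#include <stdio.h>

/* Growth factor if transistor density doubles every 18 months (1.5 years). */
int main(void) {
    double years = 50.0;                      /* assumed horizon */
    double doublings = years / 1.5;           /* one doubling per 18 months */
    printf("Growth over %.0f years: about 2^%.1f = %.2e times\n",
           years, doublings, pow(2.0, doublings));
    return 0;
}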

5 Interpreting Moore’s Law

● Moore's law is not about just the density of transistors on a chip that can be achieved:

– But about the density of transistors at which the cost per transistor is the lowest.

● As more transistors are made on a chip:

– The cost to make each transistor reduces. – But the chance that the chip will not work due to a defect rises.

● Moore observed in 1965 there is a transistor density or complexity:

– At which "a minimum cost" is achieved. 6 Integrated Circuits Costs

IC cost = (Die cost + Testing cost + Packaging cost) / Final test yield

Final test yield: Fraction of packaged dies which pass the final testing stage.

7 8” MIPS64 Wafer (564 Dies) Drawing single-crystal Si ingot from furnace…. Then, slice into wafers and pattern it…

8 Integrated Circuits Costs

IC cost = (Die cost + Testing cost + Packaging cost) / Final test yield

Die cost = Wafer cost / (Dies per wafer × Die yield)

Final test yield: Fraction of packaged dies which pass the final testing stage. Die yield: Fraction of good dies on a wafer.
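A small C sketch of the two cost formulas above; all the numeric inputs (wafer cost, yields, test and packaging costs) are made-up illustrative values, not figures from the slides:

#include <stdio.h>

int main(void) {
    double wafer_cost = 5000.0;      /* assumed cost of one wafer        */
    double dies_per_wafer = 564.0;   /* e.g., the 8" MIPS64 wafer above  */
    double die_yield = 0.60;         /* assumed fraction of good dies    */
    double test_cost = 2.0, pkg_cost = 3.0;   /* assumed per-die costs   */
    double final_test_yield = 0.95;

    double die_cost = wafer_cost / (dies_per_wafer * die_yield);
    double ic_cost  = (die_cost + test_cost + pkg_cost) / final_test_yield;
    printf("Die cost = %.2f, IC cost = %.2f\n", die_cost, ic_cost);
    return 0;
}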

9 Integrated Circuits Capacity

10 Processor Performance

11 Where Has This Performance Improvement Come From?

● Technology – More transistors per chip – Faster logic

● Machine Organization/Implementation – Deeper pipelines – More instructions executed in parallel,…

● Instruction Set Architecture – Reduced Instruction Set Computers (RISC) – Multimedia extensions

● Compiler technology – Finding more parallelism in code – Greater levels of optimization 12 How Did Processor Performance Improve?

● Till 1980s, most of the performance improvements came from using innovations in manufacturing technologies: – VLSI – Reduction in feature size

● Improvements due to innovations in manufacturing technologies have slowed down since 1980s: – Smaller feature size gives rise to increased resistance, capacitance, propagation delays. – Larger power dissipation. (Aside: What is the power consumption of Intel Pentium Processor? Roughly 100 watts idle)

13 Feature Size

14 Feature size shrinks to about 70% of its previous value every 18 to 24 months Average Transistor Cost Per Year

15 Power Density Trend

16 Watch This

Click the chip

17 Power Consumption in a Processor

● Power=Dynamic power + Leakage power

● Dynamic power = Number of transistors × capacitance × voltage² × frequency

● Leakage power is rising and will soon match dynamic power.

Processor:    Pentium   P-Pro     P-II      P-III     P-4
Year:         1993      1995      1997      1999      2000
Transistors:  3.1M      5.5M      7.5M      9.5M      42M
Clock speed:  60 MHz    200 MHz   300 MHz   500 MHz   1.5 GHz

18 Dynamic Power

Pdyn = Σ (over i units) ki · Ci · V² · Ai · f

● Dynamic power in CMOS: current flows when a circuit is active – Transistor on – evaluates new inputs – Flip-flop, latch captures new value (clock edge)

● Terms – C: capacitance of circuit

● wire length, number and size of transistors – V: supply voltage – A: activity factor – f: frequency
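A small C sketch of the dynamic-power sum over units; the per-unit capacitances, activity factors, constant k, voltage and frequency below are all assumed example values:

#include <stdio.h>

/* Pdyn = sum over units of k * Ci * V^2 * Ai * f  (illustrative values only). */
int main(void) {
    double C[] = {1.0e-9, 0.5e-9, 2.0e-9};  /* assumed capacitance of each unit (F) */
    double A[] = {0.3, 0.1, 0.5};           /* assumed activity factor of each unit */
    double k = 0.5;                         /* assumed proportionality constant     */
    double V = 1.1;                         /* assumed supply voltage (V)           */
    double f = 2.0e9;                       /* assumed clock frequency (Hz)         */
    double p_dyn = 0.0;
    for (int i = 0; i < 3; i++)
        p_dyn += k * C[i] * V * V * A[i] * f;   /* contribution of unit i */
    printf("Dynamic power = %.2f W\n", p_dyn);
    return 0;
}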

● Future: Power dissipation a major factor 19 Current Chip Manufacturing

● Till recently most desktop processors were fabricated using a 65 nm process.

● Intel in January 2007 demonstrated a working 45nm chip:

– Intel began mass-producing in late 2007 (Atom and Core 2 Duo processors). – Compare: The diameter of an atom is of the order of 0.1 nm.

● A decade ago, chips were built using a 500 nm (0.5 micron) process.

● In 1971, a 10 micron process was used.

20 How Did Performance Improve? Cont… ● Since 1980s, most of the performance improvements have come from: – Architectural and organizational innovations

● What is the difference between: – Computer architecture and computer organization?

21 Architecture vs. Organization

● Architecture: – Also known as Instruction Set Architecture (ISA) – Programmer visible part of a processor: instruction set, registers, addressing modes, etc.

● Organization: – High-level design: How many caches? How many arithmetic and logic units? What type of pipelining, control design, etc.

– Sometimes known as micro-architecture. 22

Processor Views: The Art and Science of Instruction-Set Processor Design [Gerrit Blaauw & Fred Brooks, 1981]
ARCHITECTURE (ISA): programmer view – Functional appearance to user/system programmer – Opcodes, addressing modes, registers, etc.
IMPLEMENTATION (μarchitecture): processor designer view – Logical structure or organization that underlies the architecture – Pipelining, functional units, caches, physical registers
REALIZATION (Chip): chip/system designer view – Physical structure that embodies the implementation – Gates, cells, transistors, wires
23 Computer Architecture

● The structure of a computer that a machine language programmer must understand: – To be able to write a correct program for that machine.

● A family of computers of the same architecture should be able to run the same assembly language program. – Architecture leads to the notion of binary compatibility. 24 Instruction Set Architecture: The Critical Interface

software

instruction set

hardware

● Advantages of abstraction: – Lasts through many generations (portability) – Used in many different ways (generality) – Provides convenient functionality to higher levels – Permits an efficient implementation at lower levels 25 Course Objectives

● Modern processors such as Intel Pentium, AMD Athlon, etc. use: – Many architectural and organizational innovations not covered in a first course. – Innovations in memory, I/O, and storage designs as well. – Multiprocessors and clusters

● In this light, objective of this course:

– Study the architectural and organizational innovations used in modern computers. 26 A Few Architectural and Organizational Innovations

● RISC (Reduced Instruction Set Computers): – Exploited instruction-level parallelism:

● Initially through pipelining and later through multiple instruction issue (superscalar) – Use of on-chip caches

● Dynamic instruction scheduling

● Branch prediction 27 Today’s Objectives

● Study some preliminary concepts: – Amdahl’s law, performance benchmarking, etc.

● RISC versus CISC architectures.

● Types of parallelism in programs versus types of parallel computers.

. 28 Amdahl’s Law

● Quantifies overall performance gain due to improvement to a part of a computation.

● Amdahl’s Law: – Performance improvement gained from using some faster mode of execution is limited by the amount of time the enhancement is actually used.

Speedup = Execution time for a task without the enhancement / Execution time for the task using the enhancement

29 A Typical Computer System

30 Computer System Components

CPU

Caches Processor-Memory Bus

Adapter RAM Peripheral Buses

Controllers Controllers

I/O devices Displays Networks Keyboards 31 Amdahl’s Law and Speedup

● Speedup tells us: – How much faster a machine will run due to an enhancement. ● For using Amdahl’s law two things should be considered: – 1st… Fraction of the computation time in the original machine that can use the enhancement

● If a program executes in 30 seconds and 15 seconds of exec. uses enhancement, fraction = ½ – 2nd… Improvement gained by enhancement

● If enhanced task takes 3.5 seconds and original task took 7secs, we say the speedup is 2. 32 Amdahl’s Law Equations

Execution time_new = Execution time_old × [(1 – Fraction_enhanced) + Fraction_enhanced / Speedup_enhanced]

Speedup_overall = Execution Time_old / Execution Time_new = 1 / [(1 – Fraction_enhanced) + Fraction_enhanced / Speedup_enhanced]
(Use the previous equation and solve for speedup.)

Don’t just try to memorize these equations and plug numbers into them. It’s always important to think about the problem too!

33 Amdahl’s Law

ExTime_new = ExTime_old × [(1 – Fraction_enhanced) + Fraction_enhanced / Speedup_enhanced]

Speedup_overall = ExTime_old / ExTime_new = 1 / [(1 – Fraction_enhanced) + Fraction_enhanced / Speedup_enhanced]

34 Amdahl’s Law: Example

● Floating point instructions improved to run 2 times faster.

● But, only 10% of actual instructions are FP

ExTimenew = ?

Speedupoverall = ?

35 Amdahl’s Law: Example

● Floating point instructions improved to run 2X; – But only 10% of actual instructions are FP.

ExTime_new = ExTime_old × (0.9 + 0.1/2) = 0.95 × ExTime_old

Speedup_overall = 1 / 0.95 = 1.053
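A small C sketch of Amdahl’s Law that reproduces this FP example (the helper function name is just for illustration):

#include <stdio.h>

/* Amdahl's Law: speedup_overall = 1 / ((1 - f) + f / s_enh) */
double amdahl(double fraction_enhanced, double speedup_enhanced) {
    return 1.0 / ((1.0 - fraction_enhanced) + fraction_enhanced / speedup_enhanced);
}

int main(void) {
    /* FP instructions made 2x faster, but only 10% of execution uses FP. */
    printf("Overall speedup = %.3f\n", amdahl(0.10, 2.0));  /* prints 1.053 */
    return 0;
}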

36 Performance Measurements

● Performance measurement is important: – Helps us to determine if one processor (or computer) works faster than another. – A computer exhibits higher performance if it executes programs faster.

37 How to Select Computer Systems?

● Suppose you are asked by your principal to select a computer system for a specific application: – Say to run a web service for your college.

● How will you proceed?

38 Comparing Performance

● X is n times faster than Y: Execution time_Y / Execution time_X = n

● Throughput of X is n times that of Y: Tasks per unit time_X / Tasks per unit time_Y = n

39 If Only It Were That Simple

● X is n times faster than Y for application A: Execution time of app A on machine Y / Execution time of app A on machine X = n

● But what about different applications (or even parts of the same application)? – The GUI is 10 times faster on X than on Y… – File operations are 1.5 times faster on Y than on X… – Arithmetic ops are 2 times faster on Y than on X… – Memory ops are 3 times faster on X than on Y… – …and on and on and on… 40 Clock-Rate Based Performance Measurement

● Comparing performance based on clock rates alone is meaningless: – Execution time = Instruction count × CPI × Clock cycle time – Please remember:

● A lower CPI need not mean better performance.

● Also, a processor with a higher clock rate may execute programs much slower!

41 Example: Calculating Overall CPI

Typical instruction mix:
Operation   Freq   CPI(i)
ALU         40%    1
Load        27%    2
Store       13%    2
Branch      20%    5

Overall CPI = 1×0.4 + 2×0.27 + 2×0.13 + 5×0.2 = 2.2
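A small C sketch of the weighted-CPI calculation above (the arrays simply encode the instruction mix from the table):

#include <stdio.h>

/* Overall CPI = sum over instruction classes of (frequency * CPI of class). */
int main(void) {
    double freq[] = {0.40, 0.27, 0.13, 0.20};  /* ALU, Load, Store, Branch */
    double cpi[]  = {1.0,  2.0,  2.0,  5.0};
    double overall = 0.0;
    for (int i = 0; i < 4; i++)
        overall += freq[i] * cpi[i];
    printf("Overall CPI = %.2f\n", overall);   /* prints 2.20 */
    return 0;
}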

42 MIPS and MFLOPS

● Used extensively 30 years back.

● MIPS: millions of instructions processed per second.

● MFLOPS: Millions of FLoating point OPerations completed per Second

MIPS = Instruction Count / (Exec. Time × 10⁶) = Clock Rate / (CPI × 10⁶)

43 Problems with MIPS

● Three significant problems with using MIPS:

● So severe that they led someone to term MIPS: – “Meaningless Information about Processing Speed”

● Problem 1: – MIPS is instruction set dependent.

44 Problems with MIPS cont… ● Problem 2: – MIPS varies between programs on the same computer.

● Problem 3: – MIPS can vary inversely to performance!

● Let’s look at an example as to why MIPS doesn’t work…

45 A MIPS Example

● Consider the following computer. Instruction counts (in millions) for each instruction class:

Code type     A (1 cycle)   B (2 cycles)   C (3 cycles)
Compiler 1    5             1              1
Compiler 2    10            1              1

The machine runs at 100 MHz. Instruction class A requires 1 clock cycle, class B requires 2 clock cycles, and class C requires 3 clock cycles.

CPI = CPU Clock Cycles / Instruction Count = [ Σ (i = 1..n) CPI_i × N_i ] / Instruction Count

46 A MIPS Example cont…

CPI1 = [(5×1) + (1×2) + (1×3)] × 10⁶ / [(5 + 1 + 1) × 10⁶] = 10/7 = 1.43

MIPS1 = 100 MHz / 1.43 = 69.9

CPI2 = [(10×1) + (1×2) + (1×3)] × 10⁶ / [(10 + 1 + 1) × 10⁶] = 15/12 = 1.25

MIPS2 = 100 MHz / 1.25 = 80.0

So, compiler 2 has a higher MIPS rating and should be faster?

47 A MIPS Example cont…

● Now let’s compare CPU time (note this important formula!):

CPU Time = Instruction Count × CPI / Clock Rate

CPU Time1 = 7 × 10⁶ × 1.43 / (100 × 10⁶) = 0.10 seconds

CPU Time2 = 12 × 10⁶ × 1.25 / (100 × 10⁶) = 0.15 seconds

Therefore program 1 is faster despite a lower MIPS!
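A small C sketch that reproduces the whole two-compiler example (CPI, MIPS rating and CPU time), using only the counts and clock rate given above:

#include <stdio.h>

int main(void) {
    double clock_hz = 100e6;                  /* 100 MHz machine                   */
    double counts[2][3] = {{5e6, 1e6, 1e6},   /* compiler 1: class A, B, C counts  */
                           {10e6, 1e6, 1e6}}; /* compiler 2                        */
    double cycles_per_class[3] = {1, 2, 3};

    for (int c = 0; c < 2; c++) {
        double cycles = 0, insts = 0;
        for (int i = 0; i < 3; i++) {
            cycles += counts[c][i] * cycles_per_class[i];
            insts  += counts[c][i];
        }
        double cpi      = cycles / insts;
        double mips     = clock_hz / (cpi * 1e6);
        double cpu_time = insts * cpi / clock_hz;
        printf("Compiler %d: CPI = %.2f, MIPS = %.1f, CPU time = %.2f s\n",
               c + 1, cpi, mips, cpu_time);
    }
    return 0;
}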

48 Toy Benchmarks

● The performance of different computers can be compared by running some standard programs: – Quick sort, Merge sort, etc.

● But, the basic problem remains: – Even if you select based on a toy benchmark, the system may not perform well in a specific application. – What can be a solution then? 49 Synthetic Benchmarks

● Basic Principle: Analyze the distribution of instructions over a large number of practical programs.

● Synthesize a program that has the same instruction distribution as a typical program: – Need not compute something meaningful.

● Dhrystone, Khornerstone, Linpack are some of the older synthetic benchmarks: – More recent is SPEC.. 50 SPEC Benchmarks

● SPEC: Standard Performance Evaluation Corporation: – A non-profit organization (www.spec.org)

● CPU-intensive benchmarks for evaluating processor performance of workstations: – Generations: SPEC89, SPEC92, SPEC95, SPEC2000, … – SPEC2000 puts more emphasis on memory system performance.

51 Problems with Benchmarks

● The SPEC89 benchmark included a small kernel called matrix300: – Consists of 8 different 300×300 matrix operations. – Optimization of this inner loop resulted in a performance improvement by a factor of 9.

● Optimizing compilers can discard 25% of the Dhrystone code

● Solution: Benchmark suite

52 Other SPEC Benchmarks

● SPECviewperf: 3D graphics performance

– For applications such as CAD/CAM, visualization, content creations, etc.

● SPEC JVM98: performance of client-side Java virtual machine.

● SPEC JBB2000: Server-side Java application

● SPEC WEB2005: evaluating WWW servers

– Contains multiple workloads utilizing both http and https, dynamic content implemented in PHP and JSP.

53 Summary of SPEC Benchmarks

● CPU: CPU2006

● Graphics: SPECviewperf9

● Java Client/Server: jAppServer2004

● Mail Servers: MAIL2001

● Network File System: SFS97_R1

● Power (under development)

● Web Servers: WEB2005

54 BAPCo

● Non-profit consortium www.bapco.com

● SYSmark 2004 SE – Office productivity benchmark

55 Instruction Set Architecture (ISA)

● Programmer visible part of a processor:

– Instruction Set (what operations can be performed?)

– Instruction Format (how are instructions specified?)

– Registers (where are data located?)

– Addressing Modes (how is data accessed?)

– Exceptional Conditions (what happens if something goes wrong?)

56

ISA cont…

● ISA is important: – Not only from the programmer’s perspective. – From processor designer and implementer perspectives as well.

57 Evolution of Instruction Sets

Single Accumulator (Manchester Mark I, IBM 700 series 1953)

Stack (Burroughs, HP-3000 1960-70)

General Purpose Register Machines

Complex Instruction Sets (Vax, Intel 386, 1977–85)          RISC (MIPS, IBM RS6000, … 1987)

58 Different Types of ISAs

● Determined by the means used for storing data in CPU:

● The major choices are: – A stack, an accumulator, or a set of registers.

● Stack architecture: – Operands are implicitly on top of the stack.

59 Different Types of ISAs cont… ● Accumulator architecture: – One operand is in the accumulator (register) and the others are elsewhere. – Essentially this is a 1-address machine. – Found in older machines…

● General purpose registers: – Operands are in registers or specific memory locations. 60 Comparison of Architectures

Consider the operation: C =A + B

Stack       Accumulator   Register-Memory   Register-Register
Push A      Load A        Load  R1, A       Load  R1, A
Push B      Add  B        Add   R1, B       Load  R2, B
Add         Store C       Store C, R1       Add   R3, R1, R2
Pop C                                       Store C, R3

61 Types of GPR Computers

● Register-Register (0,3)

● Register-Memory (1,2)

● Memory-Memory (2,2) or (3,3)

62 Modern Computer Architectures Lecture-3: Some More Basic Concepts

63 Some More Architectural Issues ● Instruction length needs to be in multiples of bytes.

● Instruction encoding can be: – Variable or Fixed

● Variable encoding tries to use as few bits to represent a program as possible: – But at the cost of complexity of decoding.

● Fixed Encoding: Alpha, ARM, MIPS instructions: Operation and Modes | Addr1 | Addr2 | Addr3 64 How to Run an Application Faster?

● Improve performance of: – Processor – Memory – Hard disk – Network and Peripheral devices – Operating system – Compiler – Algorithm

65 How to Improve Processor Performance?

● Operate on multiple data at the same time: – Data Parallelism

● Operate on multiple operations at the same time: – Operation Parallelism

66 Basics of Parallel Computing

● If you are ploughing a field, which of the following would you rather use [Seymour Cray]:

– One strong ox? – A pair of cows? – Two pairs of goats? – 128 chickens?

● The answer would be different:

– If you are considering the problem of solving problems on a computer. 67 Basics of Parallel Computing cont… ● Consider another scenario: – You have to get a color image printed on a stack of papers. ● Would you rather: 1. For each sheet: print red, then green, then blue, and only then take up the next sheet? 2. As soon as you complete printing red on a sheet, advance it to the next color stage, and meanwhile take a new sheet for printing?

68 Flynn’s Classical Classification of Computers

● SISD (Single Instruction Single Data): – Uniprocessors.

● MISD (Multiple Instruction Single Data): – No practical examples exist

● SIMD (Single Instruction Multiple Data): – Specialized processors

● MIMD (Multiple Instruction Multiple Data):

– General purpose, commercially important 69

Flynn’s Classical Classification

Processors

SISD SIMD MISD MIMD

70 SISD

[Diagram: a control unit (C) issues one instruction stream (IS) to one processor (P), which exchanges one data stream (DS) with memory (M).]

71 SIMD

[Diagram: one control unit (C) issues a single instruction stream (IS) to multiple processors (P), each exchanging its own data stream (DS) with memory (M).]

72 MISD

[Diagram: multiple control units (C), each issuing its own instruction stream (IS) to a processor (P), all operating on a single data stream (DS) from memory (M).]

73 MIMD

[Diagram: multiple control units (C) and processors (P), each with its own instruction stream (IS) and data stream (DS), connected to memory (M).]

74 Modern Classification

Parallel architectures

Data-parallel architectures          Function-parallel architectures

75 Data Parallel Architectures

Data-parallel architectures

Systolic architectures     Associative and neural architectures     Vector architectures     SIMDs

76 Function Parallel Architectures Function-parallel architectures

Instruction-level Parallel Arch (ILPs)     Thread-level Parallel Arch     Process-level Parallel Arch (MIMDs)

ILPs: Pipelined processors, VLIWs, Superscalar processors     MIMDs: Distributed Memory MIMD, Shared Memory MIMD

77 Data Parallel Architectures

● SIMD Processors – Multiple processing elements driven by a single instruction stream

● Vector Processors – Uni-processors with vector instructions

● Associative Processors – SIMD like processors with associative memory

● Systolic Arrays – Application specific VLSI structures

78 Systolic Arrays [H.T. Kung 1978] • Simplicity, Regularity, Concurrency, Communication. Example : Band matrix multiplication

A = | A11 A12  0   0   0   0  |        B = | B11 B12  0   0   0   0  |
    | A21 A22 A23  0   0   0  |            | B21 B22 B23  0   0   0  |
    | A31 A32 A33 A34  0   0  |            | B31 B32 B33 B34  0   0  |
    |  0  A42 A43 A44 A45  0  |            |  0  B42 B43 B44 B45  0  |
    |  0   0  A53 A54 A55 A56 |            |  0   0  B53 B54 B55 B56 |
    |  0   0   0  A64 A65 A66 |            |  0   0   0  B64 B65 B66 |

C = A × B

79 [Diagram: snapshot of the systolic array at T = 0, with the A elements (A11, A12, A21, A22, A23, A31, …) and B elements (B11, B12, B21, B31, …) staged to enter the array.]

80 Classification for MIMD Computers

● Shared memory computers: – Processors communicate through a shared memory. – Typically processors connected to each other and to the shared memory through a bus.

● Distributed memory computers: – Processors do not share any physical memory. – Processors connected to each other through a network. 81 Shared Memory

● Shared memory located at a centralized location: – May consist of several interleaved modules --- same distance (access time) from any processor. – Also called the Uniform Memory Access (UMA) model.

82 Distributed Memory

● Memory is distributed to each processor: – Improves scalability.

● Non-Uniform Memory Access (NUMA)

– (a) Message passing architectures – No processor can directly access another processor’s memory. – (b) Distributed Shared Memory (DSM)– Memory is distributed, but the address space is shared.

83 UMA vs. NUMA Computers

[Diagram: (a) UMA model – processors P1…Pn, each with a cache, share a single main memory over a bus. (b) NUMA model – processors P1…Pn, each with a cache and a local main memory, connected by a network.]

(a) UMA Model (b) NUMA Model 84 Advantages of UMA Model

● Ease of programming: –When communication patterns are complex

● Lower communication overhead

● Hardware-controlled caching to reduce remote data fetching: –Remote data is cached

85 Advantages of Message- Passing Communication

● The hardware is much simpler and more standard

● Explicit communication => simpler to understand, and helps focus effort on reducing communication cost

● Synchronization is naturally associated with sending/receiving messages

86 Modern Computer Architectures Lecture-2: A Few Basic Concepts

87 RISC/CISC Controversy

● RISC: Reduced Instruction Set Computer

● CISC: Complex Instruction Set Computer

● Genesis of CISC architecture:

– Implementing commonly used instructions in hardware can lead to significant performance benefits. – For example, use of a FP processor can lead to performance improvements.

● Genesis of RISC architecture:

– The rarely used instructions can be eliminated

to save chip space --- an on-chip cache and a large number of registers can be provided. 88 Features of a CISC Processor

● Rich instruction set: – Some simple, some very complex

● Complex addressing modes: – Orthogonal addressing (every addressing mode is available for every instruction).

● Many instructions take multiple cycles: – Large variation in CPI

● Instructions are of variable sizes

● Small number of registers

● Microprogrammed control

● No (or inefficient) pipelining 89 Examples of CISC Philosophy

● One instruction could do the work of several instructions.

– For example, a single instruction could load two numbers to be added, add them, and then store the result back to memory directly.

● Many versions of the same instructions were supported:

– Different versions did almost the same thing with minor changes. – For example, one version would read two numbers from memory, and store the result in a register. Another version would read one number from memory and the other from a register and store the result to memory. 90 Features of a RISC Processor

● Small number of instructions

● Small number of addressing modes

● Large number of registers (>32)

● Instructions execute in one or two clock cycles

● Uniform-length instructions and fixed instruction format

● Register-Register Architecture: – Separate memory access instructions (load/store)

● Separate instruction/data caches

● Hardwired control

● Pipelining (Why are CISC processors not pipelined?) 91

CISC vs. RISC Organizations

[Diagram: (a) CISC organization – microprogrammed control unit with microprogrammed control memory, a unified cache, and main memory. (b) RISC organization – hardwired control unit, separate instruction and data caches, and main memory.]

(a) CISC Organization (b) RISC Organization

92 Why Does RISC Lead to Improved Performance?

● Increased GPRs (Also, Register Windows) – Lead to decreased data traffic to memory. – Remember memory is the bottleneck.

● Register-Register architecture leads to more uniform instructions: – Efficient pipelining becomes possible.

● However, larger instruction memory traffic: – Because a larger number of instructions results.

93 Why Does RISC Lead to Improved Performance? Cont…

● RISC-Like instructions can be scheduled to achieve efficiency: – Either by compiler or – By hardware (Dynamic instruction scheduling).

● Suppose we need to add 2 values and store the results back to memory.

● In CISC: – It would be done using a single instruction.

● In RISC: – 4 instructions would be necessary.

94 Early RISC Processors

● 1987 Sun SPARC

● 1990 IBM RS 6000

● 1996 IBM/Motorola PowerPC

95 Parallel Execution of Programs

● Parallel execution of a program can be done in one or more of the following ways: – Instruction-level (fine grained): individual instructions of any one thread are executed in parallel.

● Parallel execution across a sequence of instructions (block) -- could be a loop, a conditional, or some other sequence of statements. – Thread-level (medium grained): different threads of a process are executed in parallel. – Process-level (coarse grained): different processes (tasks) can be executed in parallel. 96 Exploitation of Instruction-Level Parallelism

● ILP can be exploited by deploying several available techniques: – Temporal parallelism (Overlapped execution):

● Pipelining – Spatial Parallelism:

● Superscalar execution (Multiple instructions that use multiple data MIMD)

● Vector processing (single instruction multiple data SIMD)

97 Modern Computer Architectures Lecture-4: ILP Exploitation Through Pipelining

98 Original ILP Apprehensions

● Flynn’s Bottleneck (1970):

– Speedup due to ILP can at best be 2. – Flynn’s study focused on ILP found in the basic blocks of some common programs. – Crossing basic block boundaries would involve crossing control dependencies --- would require flushing the pipeline.

● Flynn’s Bottleneck appears too pessimistic in retrospect:

– Present ILP exploitation level is much beyond Flynn’s bottleneck. 99 Pipelining

● Pipelining incorporates the concept of overlapped execution: – Used in many everyday applications without our notice.

● Has proved to be a very popular and successful way to exploit ILP: – Instruction pipes are being used in almost all modern processors.

100 A Pipeline Example

● Consider two alternate ways in which an engineering college can work: – Approach 1. Admit a batch of students, and admit the next batch only after the current batch completes (i.e. admit once every 4 years). – Approach 2. Admit students every year. – In the second approach:

● Average number of students graduating per year increases four times. 101

Pipelining

First Year Second Year Third Year Fourth Year First Year Second Year Third Year Fourth Year First Year Second Year Third Year Fourth Year

102 Pipelined Execution

Time

IFetch Dcd Exec Mem WB

IFetch Dcd Exec Mem WB

IFetch Dcd Exec Mem WB

IFetch Dcd Exec Mem WB

IFetch Dcd Exec Mem WB Program Flow IFetch Dcd Exec Mem WB

103 Advantages of Pipelining

● An n-stage pipeline: – Can improve performance up to n times.

● Not much investment in hardware: – No replication of hardware resources necessary. – The principle deployed is to keep the units as busy as possible.

● Transparent to the programmers: – Easy to use 104 Basic Pipelining Terminologies

● Pipeline cycle (or Processor cycle): – The time required to move an instruction one step further in the pipeline. – Not to be confused with clock cycle.

● Synchronous pipeline: – Pipeline cycle is constant (clock-driven).

● Asynchronous pipeline: – Time for moving from stage to stage varies – Handshaking communication between stages

105 Pipeline Cycle

● Pipeline cycle:

– Determined by the time required by the slowest stage.

● Pipeline designers try to balance the length (i.e. the processing time) of each pipeline stage.

– For a perfectly balanced pipeline, the execution time per instruction is t/n,

● where t is the execution time per instruction on nonpipelined machine and n is the number of pipe stages.

● However, it is very difficult to make the different pipeline stages perfectly balanced.

● Besides, pipelining itself involves some overhead.

– The pipeline overhead arises due to the latches used between two successive pipeline stages. 106 Synchronous Pipeline

- Transfers between stages are simultaneous. - One task or operation enters the pipeline per cycle.

[Diagram: stages S1, S2, …, Sk separated by latches (L), all driven by a common clock; τm is the stage delay and d the latch delay.]

107 Asynchronous Pipeline

- Transfers performed when individual stages are ready. - Handshaking protocol between processors.

[Diagram: stages S1, S2, …, Sk connected by Ready/Ack handshake signals between neighbouring stages.]

- Different amounts of delay may be experienced at different stages. - Can display variable throughput rate.

108 A Few Pipeline Concepts

[Diagram: adjacent stages Si and Si+1, with stage delay τm and latch delay d.]

Pipeline cycle: τ = max{τm} + d

Latch delay: d

Pipeline frequency: f = 1 / τ

109 Ideal Pipeline Speedup

● k-stage pipeline processes n tasks in k + (n-1) clock cycles:

– k cycles for the first task and n-1 cycles for the remaining n-1 tasks.

● Total time to process n tasks

● Tk = [k + (n−1)] τ

● For the non-pipelined processor

T1 = n k τ

110 Pipeline Speedup Expression

● Speedup:

Sk = T1 / Tk = n k τ / ([k + (n−1)] τ) = n k / [k + (n−1)]

● Observe that the memory bandwidth must increase by a factor of Sk: – Otherwise, the processor would stall waiting for data to arrive from memory.
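A small C sketch of the ideal pipeline speedup formula, showing how the speedup approaches k as the number of tasks n grows (the specific k and n values are just examples):

#include <stdio.h>

/* Ideal k-stage pipeline speedup for n tasks: Sk = n*k / (k + n - 1). */
double pipeline_speedup(double k, double n) {
    return (n * k) / (k + n - 1.0);
}

int main(void) {
    printf("5-stage pipeline, 100 tasks:   speedup = %.2f\n", pipeline_speedup(5, 100));
    printf("5-stage pipeline, 10000 tasks: speedup = %.2f\n", pipeline_speedup(5, 10000));
    return 0;   /* the second value is close to the stage count, 5 */
}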

111 Pipelines: A Few Basic Concepts

● Historically, there are two different types of pipelines:

– Instruction pipelines – Arithmetic pipelines

● Arithmetic pipelines (e.g. FP multiplication) are not popular in general purpose computers:

– Need a continuous stream of arithmetic operations. – E.g. Vector processors operating on an array.

● On the other hand instruction pipelines being used in almost every modern processor.

112 Pipelines: A Few Basic Concepts

● Pipeline increases instruction throughput: – But, does not decrease the execution time of the individual instructions. – In fact, slightly increases execution time of each instruction due to pipeline overheads.

● Pipeline overhead arises due to a combination of: – Pipeline register delay – Clock skew

113 Pipelines: A Few Basic Concepts ● Pipeline register delay: – Caused by latch set-up time

● Clock skew: – The maximum delay between clock arrival at any two registers.

● Once clock cycle is as small as the pipeline overhead: – No further pipelining would be useful. – Very deep pipelines may not be useful. 114 Pipeline Registers

● Pipeline registers are essential part of pipelines:

– There are 4 groups of pipeline registers in the 5-stage pipeline.

● Each group saves output from one stage and passes it as input to the next stage:

– IF/ID

– ID/EX

– EX/MEM

– MEM/WB

● This way, each time “something is computed”...

– Effective address, Immediate value, Register content, etc. – It is saved safely in the context of the instruction that needs it.

115 Pipeline Registers

IF/ID ID/EX EX/MEM MEM/WB

inst. 1 IF inst. 2 inst. 3


117 Pipeline Registers

IF/ID ID/EX EX/MEM MEM/WB

inst. 1 IF ID inst. 2 IF inst. 3


119 Pipeline Registers

IF/ID ID/EX EX/MEM MEM/WB

inst. 1 IF ID EX inst. 2 IF ID inst. 3 IF


121 Pipeline Registers

IF/ID ID/EX EX/MEM MEM/WB

inst. 1 IF ID EX MEM inst. 2 IF ID EX inst. 3 IF ID


123 Pipeline Registers

IF/ID ID/EX EX/MEM MEM/WB

inst. 1 IF ID EX MEM WB inst. 2 IF ID EX MEM . . . inst. 3 IF ID EX . . .

● Typically, we will not think too much about pipeline registers and one just assumes that values are passed “magically” down stages of the pipeline.

124 Pipeline Register Depiction

[Diagram: successive instructions flowing through IM (instruction memory), Reg (register read), ALU, DM (data memory) and Reg (write-back), with the IF/ID, ID/EX, EX/MEM and MEM/WB registers between stages. Every stage reads its input from, and writes its output to, the pipeline registers.]

125 Why Pipelining RISC Processors is Easy

● Recall the main principles of RISC:

– All operands are in registers – The only operations that affect memory are loads and stores. – Few instruction formats, fixed encoding (i.e., all instructions are the same size)

● Although pipelining could conceivably be implemented for any architecture,

– It would be inefficient. – Pentium has characteristics of RISC and CISC --- CISC Instructions are internally converted to RISC-like instructions. 126 Drags on Pipeline Performance

● Things are actually not so rosy, due to the following factors: – Difficult to balance the stages – Pipeline overheads: latch delays – Clock skew – Hazards

127 Exercise

● Consider an unpipelined processor: – Takes 4 cycles for ALU and other operations – 5 cycles for memory operations. – Assume the relative frequencies:

● ALU and other=60%,

● memory operations=40% – Cycle time =1ns

● Compute speedup due to pipelining: – Ignore effects of branching. – Assume pipeline overhead = 0.2ns

128 Solution

● Average instruction execution time for large number of instructions:

– Unpipelined = 1 ns × (60% × 4 + 40% × 5) = 4.4 ns – Pipelined = 1 ns + 0.2 ns = 1.2 ns

● Speedup=4.4/1.2=3.7 times
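A small C sketch repeating the exercise arithmetic (all inputs come from the exercise statement above):

#include <stdio.h>

int main(void) {
    double t_alu = 4.0, t_mem = 5.0;       /* cycles for ALU/other and memory ops */
    double f_alu = 0.60, f_mem = 0.40;     /* relative frequencies                */
    double cycle_ns = 1.0, overhead_ns = 0.2;

    double unpipelined_ns = cycle_ns * (f_alu * t_alu + f_mem * t_mem);  /* 4.4 ns */
    double pipelined_ns   = cycle_ns + overhead_ns;                     /* 1.2 ns */
    printf("Speedup = %.1f\n", unpipelined_ns / pipelined_ns);          /* ~3.7   */
    return 0;
}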

129 Pipeline Hazards

● Hazards can result in incorrect operations: – Structural hazards: Two instructions requiring the same hardware unit at same time. – Data hazards: Instruction depends on result of a prior instruction that is still in pipeline

● Data dependency – Control hazards: Caused by delay in decisions about changes in control flow (branches and jumps).

● Control dependency 130 Pipeline Interlock

● Pipeline interlock: – Resolving of pipeline hazards through hardware mechanisms.

● Interlock hardware detects all hazards: – Stalls appropriate stages of the pipeline to resolve hazards.

131 MIPS

● MIPS architecture: – First publicly known implementations of RISC architectures – Grew out of research at Stanford University

● MIPS Computer Systems was founded in 1984: – R2000 introduced in 1986. – Licensed the designs rather than selling processors.

132 Commercial Success of MIPS

● Popularly used as IP-cores (building-blocks) for embedded processor designs. – Both 32-bit and 64-bit basic cores are offered --- the design is licensed as MIPS32 and MIPS64. – MIPS cores have been commercially successful --- used in many consumer and industrial applications.

● MIPS cores can be found in: – Modern Cisco and Linksys routers, cable modems and ADSL modems, smartcards, laser printer engines, set- top boxes, robots, handheld computers, Sony PlayStation 2 and Sony PlayStation Portable.

● In cell phone/PDA applications, the MIPS core has been unable to displace the incumbent, competing ARM core 133 MIPS cont… ● MIPS: Microprocessor without Interlocked Pipeline Stages

● Operates on 64-bit data

● 32 64-bit registers

● Two 128 KB high-speed caches

● Constant 32-bit instruction length

● Initial MIPS processors achieved 1 instruction/cycle (CPI = 1)

134 MIPS cont… ● Registers R0 to R31:

– Value of R0 is always 0

● Small number of addressing modes:

– Only immediate and displacement modes supported, besides register mode. – Register indirect achieved by placing 0 in displacement field.

● Add R4, 0(R1) //[R4] <- [R4] + [[R1]] – Absolute addressing achieved by using R0 as the base register.

● Add R1, 100(R0) // [R1] <- [R1] + [100]

135 MIPS

● R0 is used to synthesize popular instructions: – E.g., there is no Mov instruction – Add R3,R0,#3 // move 3 to R3

● R0 helps in reducing the number of instructions.

136 MIPS

● MIPS memory: – Byte addressable with 64-bit address.

● MIPS provides 4 broad classes of instructions: – Load/Store – ALU operations – Branches and jumps – Floating point operations 137 MIPS Pipeline

● Uses a 5-stage pipeline

● IF: Instruction fetch

● ID: Decode operands and fetch register operands

● ALU: ALU operation or data operand address generation

● MEM: Data memory reference

● WB: Write back into the register file

138 MIPS Pipelining Stages

● 5 stages of MIPS Pipeline:

– IF Stage:

● Needs access to the Memory to load the instruction.

● Needs an adder to update the PC. – ID Stage:

● Needs access to the Register File for reading operands.

● Needs an adder (to compute the potential branch target). – EX Stage:

● Needs an ALU. – MEM Stage:

● Needs access to the Memory. – WB Stage:

● Needs access to the Register File in writing. 139 More Complete Picture of a Pipeline

[Diagram: pipeline stages Fetch, Decode, Execute, Memory, Writeback (Funit, Dunit, EXunit, Memunit, WBunit), together with the instruction cache, the register file and the data cache.]

140 Details of Pipeline Stages: MIPS R2000

Stage    Phase   Function performed
1. IF    1       Translate virtual instr. addr. using TLB
         2       Access I-cache using physical address
2. RD    1       Decode instruction
         2       Read reg. file; if a branch, generate target addr.
3. ALU   1       Start ALU op.; if a branch, check branch condition
         2       Finish ALU op.; if load/store, add base reg and offset to form effective addr., translate virtual addr.
4. MEM   1       Access D-cache
         2       Return data from D-cache, check tags & parity
5. WB    1       Write register file for both mem and other instrs.
         2       ---

141 Extending MIPS to Handle FP Operations

● It is impractical to expect:

– all MIPS FP operations complete in 1 or 2 cycles.

● FP should have the same pipeline stages as integer instructions:

– EX cycles may be repeated many times to complete an operation. – There may be multiple functional units.

[Diagram: IF and ID stages feeding parallel execution units (integer EX, FP multiply, FP add), which then feed MEM and WB.]

142 Further MIPS Enhancements cont…

● MIPS could achieve CPI of 1, to improve performance further, two possibilities: – Superscalar – Superpipelined

● Superscalar: – Replicate each pipeline stage so that two or more instructions can proceed simultaneously.

● Superpipeline: – Split pipeline stages into further stages.

143 Superscalar Processing Stages

[Diagram: space-time chart of instructions I-1 to I-4 advancing cycle by cycle through pipeline stages S1 to S6, where stage S4 is replicated into two units (u and v) so that two instructions can occupy it at once.]

144 Summary

● RISC architecture style saves chip area that is used to support more registers and cache: – Also is facilitated due to small and uniform sized instructions.

● Three main types of parallelism in a program: – Instruction-level – Thread-level – Process-level

145 Summary Cont…

● Two main types of parallel computers:

– SIMD – MIMD

● Instruction pipelines are found in almost all modern processors:

– Exploits instruction-level parallelism – Transparent to the programmers

● Hazards can slowdown a pipeline:

– In the next lecture, we shall examine hazards in more detail and available ways to resolve hazards.

146 References

[1] J.L. Hennessy & D.A. Patterson, “Computer Architecture: A Quantitative Approach,” Morgan Kaufmann Publishers, 3rd Edition, 2003.
[2] John Paul Shen and Mikko Lipasti, “Modern Processor Design,” Tata McGraw-Hill, 2005.

147 Feature Size

● The minimum feature size: – The width of the smallest line or gap that appears in your design.

148 Modern Computer Architectures

Lecture 6: Pipeline Hazards and Their Resolution Mechanisms

1 Module Objectives

● Hazards, their causes, and resolution

● Branch prediction

● Exploiting loop-level parallelism

● Dynamic instruction scheduling:

– Scoreboarding and Tomasulo’s algorithm

● Compiler techniques for exposing ILP

● Superscalar and VLIW processing

● Survey of some modern processors

2 Introduction

● What is ILP (Instruction-Level Parallelism)? – Parallel execution of different instructions belonging to the same thread.

● A thread usually consists of several basic blocks: – As well as several branches and loops.

● Basic block: – A sequence of instructions not having a branch instruction.

3 Introduction cont…

● Instruction pipelines can effectively exploit parallelism in a basic block: – An n-stage pipeline can improve performance up to n times. – Does not require much investment in hardware – Transparent to the programmers.

● Pipelining can be viewed to: – Decrease average CPI, and/or – Decrease clock cycle time for instructions.

4 Drags on Pipeline Performance

● Factors that can degrade pipeline performance: – Unbalanced stages – Pipeline overheads – Clock skew – Hazards

● Hazards cause the worst drag on the performance of a pipeline.

5 Pipeline Hazards

● What is a pipeline hazard? – A situation that prevents an instruction from executing during its designated clock cycles.

● There are 3 classes of hazards: – Structural Hazards – Data Hazards – Control Hazards

6 Structural Hazards

● Arise from resource conflicts among instructions executing concurrently:

– Same resource is required by two (or more) concurrently executing instructions at the same time.

● Easy way to avoid structural hazards:

– Duplicate resources (sometimes not practical)

● Examples of Resolution of Structural Hazard:

– An ALU to perform an arithmetic operation and an adder to increment PC. – Separate data cache and instruction cache accessed simultaneously in the same cycle. 7 Structural Hazard: Example

IF ID EXE MEM WB IF ID EXE MEM WB IF ID EXE MEM WB IF ID EXE MEM WB

8 Data Hazards

● Occur when an instruction under execution depends on:

– Data from an instruction ahead in pipeline.

● Example: A=B+C; D=A+E;

A=B+C; IF ID EXE MEM WB D=A+E; IF ID EXE MEM WB

– Dependent instruction uses old data:

● Results in wrong computations

9 Control Hazards

● Result from branch and other instructions that change the flow of a program (i.e. change PC).

● Example: 1: If(cond){

2: s1} 3: s2

● Statement in line 2 is control dependent on statement at line 1.

● Until condition evaluation completes:

– It is not known whether s1 or s2 will execute next.

10 Can You Identify Control Dependencies?

1: if(cond1){ 2: s1; 3: if(cond2){ 4: s2;} 5: }

11 Program Dependences Can Cause Hazards!

● Hazards can be caused by dependences within a program.

● There are three main types of dependences in a program: – Data dependence – Name dependence – Control dependence

12 Data Dependences

● An instruction j is data dependent on instruction i, iff either of: – Direct: Instruction i produces a result that is used by instruction j.
    r3 ← r1 op r2
    r5 ← r3 op r4
– Transitive:

● Instruction j is data dependent on instruction k and

● Instruction k is data dependent on instruction i.
    r3 ← r1 op r2
    r4 ← r3 op r2
    r5 ← r6 op r4

13 Detecting Data Dependences

● A data value may flow between instructions:

– (i) through registers – (ii) through memory locations.

● When data flow is through a register:

– Detection is rather straight forward.

● When data flow is through a memory location:

– Detection is difficult. – Two addresses may refer to the same memory location but look different. 100(R4) and 20(R6)

14 Types of Data Dependences

● Two types of data dependences: – True data dependence. – Name dependence:

● Two instructions use the same register or memory location (called a name).

● There is no true flow of data between the two instructions.

● Example: A=B+C; A=P+Q;

● Two types of name dependences: – Anti-dependence – Output dependence

15 Anti-Dependence

● Anti-dependence occurs between two instructions i and j, iff:

– j writes to a register or memory location that i reads. – Original ordering must be preserved to ensure that i reads the correct value.

● Example:

– ADD F0,F6,F8 – SUB F8,F4,F5

16 Output Dependence

● Output dependence occurs between two instructions i and j, iff: – The two instructions write to the same memory location.

● Ordering of the instructions must be preserved to ensure: – Finally written value corresponds to j.

17 Exercise

● Identify all the dependences in the following C code: 1. a=b+c; 2. b=c+d; 3. a=a+c; 4. c=b+a;

18 Hazard Resolution

● Name dependences: – Once identified, can be easily eliminated through simple compiler renaming techniques. – Memory-related dependences are difficult to identify:

● Hardware techniques (scoreboarding and dynamic instruction scheduling) are being used.

● True data dependences: – More difficult to handle. – Can not be eliminated; can only be overcome! – Many techniques have evolved over the years.

19 Data Hazards

● Data hazards are caused by data dependences: – But, mere existence of data dependence in a program may not result in a data hazard.

20 Types of Data Hazards

● Data hazards are of three types: – Read After Write (RAW) – Write After Write (WAW) – Write After Read (WAR)

● With an in-order execution machine: – WAW, WAR hazards can not occur.

● Assume instruction i is issued before j.

21 Read after Write (RAW) Hazards ● RAW hazard:

– Instruction j tries to read its operand before instruction i writes it. – j would incorrectly receive an old or incorrect value.
● Example:  i: ADD R1, R2, R3
            j: SUB R4, R1, R6

Program order: … i … j …

Instruction i is a write instruction issued before j; instruction j is a read instruction issued after i.

22 RAW Dependency: More Examples

● Example program (a):

– i1: load r1, addr; – i2: add r2, r1,r1;

● Program (b):

– i1: mul r1, r4, r5; – i2: add r2, r1, r1;

● Both cases, i2 does not get operand until i1 has completed writing the result

– In (a) this is due to load-use dependency – In (b) this is due to define-use dependency 23 Write After Write (WAW) Hazards

● WAW hazard:

– Instruction j tries to write an operand before instruction i writes it. (How can this happen???) – Writes are performed in the wrong order.

● Example:  i: DIV F1, F2, F3
            j: SUB F1, F4, F6

Instruction i is a write instruction issued before j; instruction j is a write instruction issued after i.

24 Out-of-order Pipelining

[Diagram: an out-of-order pipeline with IF, ID, RD and WB stages and parallel EX units (INT, Fadd1–Fadd2, Fmult1–Fmult3, LD/ST). In program order, Ia: F1 ← F2 × F3 precedes Ib: F1 ← F4 + F5, but Ib reaches WB before Ia (out-of-order write-back), so the final value of F1 is wrong --- a WAW hazard.]

25 Write After Read (WAR) Hazards

● WAR hazard:

– Instruction j tries to write an operand before instruction i reads it. – Instruction i, instead of getting the old value, receives the newer, undesired value:

● Example:  i: DIV F7, F1, F3
            j: SUB F1, F4, F6

Instruction i is a read instruction issued before j; instruction j is a write instruction issued after i. 26 WAR Hazards

● Called an “anti-dependence” by compiler writers. – This results from reuse of the name (register) “F1”.

● Can’t happen in MIPS 5 stage pipeline because: – All instructions take 5 stages, and – Reads are always in stage 2, and – Writes are always in stage 5

27 WAR and WAW Dependency: More Examples ● Example program (a):

– i1: mul r1, r2, r3; – i2: add r2, r4, r5;

● Example program (b):

– i1: mul r1, r2, r3; – i2: add r1, r4, r5;

● Both cases have dependence between i1 and i2

– in (a) r2 must be read before it is written into – in (b) r1 must be written by i2 after it has been written into by i1 28 Is There Really a Dependence/Hazard ?

● Exercise: –i1: mul r1, r2, r3; –i2: add r2, r4, r5;

29 Inter-Instruction Dependences

● Data dependence (Read-after-Write, RAW):
    r3 ← r1 op r2
    r5 ← r3 op r4

● Anti-dependence (Write-after-Read, WAR) – a false dependency:
    r3 ← r1 op r2
    r1 ← r4 op r5

● Output dependence (Write-after-Write, WAW) – a false dependency:
    r3 ← r1 op r2
    r5 ← r3 op r4
    r3 ← r6 op r7

Control dependence

30 Data Dependencies : Summary

Data dependencies in straight-line code

RAW: Read After Write dependency (flow dependency) – includes load-use and define-use dependency – a true dependency that cannot be overcome.

WAR: Write After Read dependency (anti dependency) – a false dependency.

WAW: Write After Write dependency (output dependency) – a false dependency.

False dependencies can be eliminated by register renaming. 31 Dependences and Hazards

Dependences → Hazards:
– Data (true) dependence → RAW hazard
– Name (output) dependence → WAW hazard
– Name (anti) dependence → WAR hazard
– Control dependence → Control hazard
– (no corresponding dependence) → Structural hazard

32 A Solution to WAR and WAW Hazards

● Rename Registers

– i1: mul r1, r2, r3; – i2: add r6, r4, r5;

● Register renaming can get rid of most false dependencies:

– Compiler can do register renaming in the register allocation process (i.e., the process that assigns registers to variables).

33 Use of Compiler Techniques to Tackle Data hazards

● A compiler can help eliminate some stalls caused by data hazards:

– Example: an instruction that uses result of a LOAD’s destination register should not immediately follow the LOAD instruction.

● The technique is called:

– “compiler-based pipeline instruction scheduling”

34 Hardware Techniques to Deal with Hazards

● Simple solution – Stall pipeline

● Pipeline interlock: – Lets some instruction(s) in the pipeline proceed, while others are made to wait for data, resources, etc.

35 An Example of a Structural Hazard

Mem Reg DM Reg Load ALU

Mem Reg DM Reg Instruction 1 ALU

Mem Reg DM Reg Instruction 2 ALU

Mem Reg DM Reg Instruction 3 ALU

Mem Reg DM Reg Instruction 4 ALU

Time Would there be a hazard here? 36 How is it Resolved?

Mem Reg DM Reg Load ALU

Mem Reg DM Reg Instruction 1 ALU

Mem Reg DM Reg Instruction 2 ALU

Stall Bubble Bubble Bubble Bubble Bubble

Mem Reg DM Reg Instruction 3 ALU

Time A Pipeline can be stalled by

inserting a “bubble” or NOP 37 Modern Computer Architectures

Lecture 7: Resolution Mechanisms for Pipeline Hazards

38 Pipeline Stall: Alternate Representation

Inst. #      Clock number: 1    2    3    4      5    6    7    8    9    10
LOAD                       IF   ID   EX   MEM    WB
Inst. i+1                       IF   ID   EX     MEM  WB
Inst. i+2                            IF   ID     EX   MEM  WB
Inst. i+3                                 stall  IF   ID   EX   MEM  WB
Inst. i+4                                        IF   ID   EX   MEM  WB
Inst. i+5                                             IF   ID   EX   MEM
Inst. i+6                                                  IF   ID   EX

● LOAD instruction “steals” an instruction fetch cycle which causes the pipeline to stall.

● Thus, no instruction completes on clock cycle 8

39 Performance with Stalls

● Stalls degrade performance of a pipeline: – Result in deviation from 1 instruction executing/clock cycle. – Let’s examine by how much stalls can impact CPI…

40 Stalls and Performance

• CPI_pipelined = Ideal CPI + Pipeline stall cycles per instruction = 1 + Pipeline stall cycles per instruction

• Ignoring overhead and assuming stages are balanced:

Speedup ≈ CPI_unpipelined / (1 + Pipeline stall cycles per instruction)

41 Speedup Due to Pipelining

Speedup ≈ Pipeline depth / (1 + Pipeline stall cycles per instruction)

42 Alternate Speedup Expression

Clock cycle pipelined = Clock cycle unpipelined / Pipeline depth

Pipeline depth = Clock cycle unpipelined / Clock cycle pipelined

Speedup from pipelining = [1 / (1 + Pipeline stall cycles per instruction)] × (Clock cycle unpipelined / Clock cycle pipelined)

43 An Example of Performance Impact of Structural Hazard

● Assume:

– Pipelined processor. – Data references constitute 40% of an instruction mix. – Ideal CPI of the pipelined machine is 1. – Consider two cases:

● Unified data and instruction cache vs. separate data and instruction cache.

● What is the impact on performance?

44 An Example cont… ● Avg. Inst. Time = CPI × Clock Cycle Time
(i) For separate caches: Avg. Instr. Time = 1 × 1 = 1
(ii) For the unified cache:

= (1 + 0.4 × 1) × Clock cycle time_ideal

= 1.4 × Clock cycle time_ideal = 1.4

● Speedup= 1/1.4 = 0.7

● 30% degradation in performance
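A small C sketch of this unified-vs-separate-cache comparison, using only the figures stated in the example:

#include <stdio.h>

/* Unified vs. separate caches: every data reference (40% of instructions)
   causes a 1-cycle structural stall when the cache is unified. */
int main(void) {
    double ideal_cpi = 1.0, data_ref_fraction = 0.40, stall_cycles = 1.0;
    double cpi_separate = ideal_cpi;
    double cpi_unified  = ideal_cpi + data_ref_fraction * stall_cycles;
    printf("CPI unified = %.2f, relative performance = %.2f\n",
           cpi_unified, cpi_separate / cpi_unified);   /* 1.40 and ~0.7 */
    return 0;
}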

45 Recollect Data Hazards What causes them? – Pipelining changes the order of read/write accesses to operands. – Order differs from that of an unpipelined machine.

● Example:

– ADD R1, R2, R3 – SUB R4, R1, R5 (For MIPS, ADD writes the register in WB but SUB needs it in ID.)

This is a data hazard

46 Illustration of a Data Hazard

Mem Reg DM Reg ADD R1, R2, R3 ALU

Mem Reg DM Reg SUB R4, R1, R5 ALU

Mem Reg DM AND R6, R1, R7 ALU

Mem Reg OR R8, R1, R9 ALU

XOR R10, R1, R11 Mem Reg

Time ADD instruction causes a hazard in next 3 instructions because register not written until after those 3 read it. 47 Forwarding

● Simplest solution to data hazard: – forwarding

● Result of the ADD instruction not really needed: – until after ADD actually produces it.

● Can we move the result from EX/MEM register to the beginning of ALU (where SUB needs it)? – Yes!

48 Forwarding cont…

● Generally speaking:

– Forwarding occurs when a result is passed directly to the functional unit that requires it. – Result goes from output of one pipeline stage to input of another.

49 When Can We Forward?

[Diagram: pipeline timing of ADD R1,R2,R3 followed by SUB R4,R1,R5; AND R6,R1,R7; OR R8,R1,R9; XOR R10,R1,R11. SUB gets its R1 value from the EX/MEM pipe register, AND gets it from the MEM/WB pipe register, and OR gets it by forwarding from the register file.]

If the forwarding line goes “forward” in time you can do forwarding; if it would have to be drawn backward, it’s physically impossible. 50 How to Implement Hazard Control Logic?

● In a pipeline, – All data hazards can be checked during ID phase of pipeline. – If a data hazard is detected, next instruction should be stalled. – Whether forwarding is needed can also be determined at this stage, control signals set.

● If hazard is detected, – Control unit of pipeline must stall pipeline and prevent instructions in IF, ID from advancing.

51 Branch Hazards

● Three simplest methods of dealing with branches: – Flush Pipeline:

● Redo the instructions following a branch, once an instruction is detected to be branch during the ID stage. – Branch Not Taken:

● A slightly higher performance scheme is to assume every branch to be not taken. – Another scheme is delayed branch.

52 An Example of Impact of Branch Penalty

● Assume for a MIPS pipeline: – 16% of all instructions are branches:

● 4% unconditional branches: 3 cycle penalty

● 12% conditional: 50% taken: 3 cycle penalty

53 Impact of Branch Penalty

● For a sequence of N instructions:

– N cycles to initiate each – 3 * 0.04 * N delays due to unconditional branches – 0.5 * 3 * 0.12 * N delays due to conditional taken

● Overall: – Total cycles = 1.3 × N – i.e., CPI = 1.3 cycles/instruction – 30% performance hit!
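A small C sketch of this branch-penalty CPI calculation, using the branch frequencies and penalties assumed in the example above:

#include <stdio.h>

/* 4% unconditional branches (3-cycle penalty),
   12% conditional branches of which 50% are taken (3-cycle penalty). */
int main(void) {
    double cpi = 1.0
               + 0.04 * 3.0          /* unconditional branches     */
               + 0.12 * 0.5 * 3.0;   /* taken conditional branches */
    printf("CPI = %.2f\n", cpi);     /* prints 1.30 */
    return 0;
}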

54 Reducing Branch Penalty

● Two approaches: 1) Move condition comparator to ID stage: ● Decide branch outcome and target address in the ID stage itself: ● Reduces branch delay to 2 cycles. 2)Branch prediction

55 Four Simple Branch Hazard Solutions #1: Stall

– until branch direction is clear – flushing pipe #2: Predict Branch Not Taken

– Execute successor instructions in sequence as if there is no branch – undo instructions in pipeline if branch actually taken – 47% branches not taken on average

56 Four Simple Branch Hazard Solutions cont… #3: Predict Branch Taken

– 53% branches taken on average (why?) – But branch target address not available after IF in MIPS

● MIPS still incurs 1 cycle branch penalty even with predict taken

● Other machines: branch target known before branch outcome computed, significant benefits can accrue

57 Four Simple Branch Hazard Solutions cont…

#4: Delayed Branch – Insert an unrelated successor in the branch delay slot:
    branch instruction
    sequential successor1
    sequential successor2
    ……                          (branch delay of length n)
    sequential successorn
    branch target if taken
– 1 slot delay required in the 5-stage pipeline

58 Delayed Branch

● Simple idea: Put an instruction that would be executed anyway right after a branch.

Branch IF ID EX MEM WB

Delay slot instruction        IF ID EX MEM WB

Branch target OR successor       IF ID EX MEM WB

● Question: What instruction do we put in the delay slot?

● Answer: one that can safely be executed no matter what the branch does.

– The compiler decides this. 59 Delayed Branch

● One possibility: An instruction from before

● Example (before → after scheduling):

Before:                      After:
  DADD R1, R2, R3              if R2 == 0 then
  if R2 == 0 then                DADD R1, R2, R3   (delay slot)
    <empty delay slot>         . . .
  . . .

● The DADD instruction is executed no matter what happens in the branch:

– Because it is executed before the branch! – Therefore, it can be moved 60 Delayed Branch

● We get to execute the DADD instruction “for free”

branch IF ID EX MEM WB

add instruction IF ID EX MEM WB

branch target/successor IF ID EX MEM WB

By this time, we know whether to take the branch or whether not to take it

61 Delayed Branch

● Another possibility: An instruction much before

● Example: DSUB R4, R5, R6 ...

DADD R1, R2, R3

if R1 == 0 then

delay slot

● The DSUB instruction can be replicated into the delay slot, and the branch target can be changed

62 Delayed Branch

● Another possibility: An instruction from before

● Example: DSUB R4, R5, R6 ...

DADD R1, R2, R3

if R1 == 0 then

DSUB R4, R5, R6

● The DSUB instruction can be replicated into the delay slot, and the branch target can be changed

63 Delayed Branch

● Yet another possibility: An instruction from inside the taken path

DADD R1, R2, R3 ● Example: if R1 == 0 then

delay slot

OR R7, R8, R9

DSUB R4, R5, R6

● The OR instruction can be moved into the delay slot ONLY IF its execution doesn’t disrupt the program execution (e.g., R7 is overwritten later) 64 Delayed Branch

● Third possibility: An instruction from inside the taken path DADD R1, R2, R3 ● Example: if R1 == 0 then

OR R7, R8, R9

OR R7, R8, R9

DSUB R4, R5, R6

● The OR instruction can be moved into the delay slot ONLY IF its execution doesn’t disrupt the program execution (e.g., R7 is overwritten later) 65 Delayed Branch Example

B1:  LD    R1,0(R2)
     DSUBU R1,R1,R3
     BEQZ  R1,L        ; R1 != 0 falls through to B2; R1 == 0 branches to B3
B2:  OR    R4,R5,R6
     DADDU R10,R4,R3
B3:  L: DADDU R7,R8,R9

1.) BEQZ is dependent on DSUBU and DSUBU on LD, 2.) If we knew that the branch was taken with a high probability, then DADDU could be moved into block B1, since it doesn’t have any dependencies with block B2, 3.) Conversely, knowing the branch was not taken, then OR could be moved into block B1, since it doesn’t affect anything in B3, 66 Delayed Branch

● Where to get instructions to fill branch delay slots?

– Before branch instruction – From the target address: Useful only if branch taken. – From fall through: Useful only if branch not taken.

67 Modern Computer Architectures

Lecture 8: Branch Prediction

68 Delayed Branch cont…

● Compiler effectiveness for single branch delay slot: – Fills about 60% of branch delay slots. – About 80% of instructions executed in branch delay slots useful in computation. – About (60% x 80%) i.e. 50% of slots usefully filled.

● Delayed Branch downside: what if multiple instructions issued per clock cycle (superscalar)?

69 Branch Prediction

• KEY IDEA: Hope that branch assumption is correct. • If yes, then we’ve gained a performance improvement. • Otherwise, discard instructions • program is still correct, all we’ve done is “waste” a clock cycle. Two approaches

– Direction Based Prediction – Profile Based Prediction 70 Direction Based Prediction

• Simple to implement • However, often branch behaviour is variable (dynamic). • Can’t capture such behaviour at compile time with simple direction based prediction! • Need history (aka profile)-based prediction.

71 History-based Branch Prediction

● An important example is State-based branch prediction:

● Needs 2 parts:

– “Predictor” to guess where/if instruction will branch (and to where) – “Recovery Mechanism”: i.e. a way to fix mistakes

72 History-based Branch Prediction cont…

● One bit predictor: – Use result from last time this instruction executed.

● Problem: – Even if a branch is almost always taken, we will be mispredicted at least twice (e.g., once when a loop is exited and once when it is next entered). – If the branch alternates between taken and not taken,

● we get 0% accuracy

73 1-bit Predictor

● Set bit to 1 or 0: –Depending on whether branch Taken (T) or Not-taken (NT) – Pipeline checks bit value and predicts – If incorrect then need to discard speculatively executed instruction

● Actual outcome used to set the bit value.

74 Example

● Let the initial value = T; the actual outcomes of the branch are: NT, NT, NT, T, T, T

– The predictions are: T, NT, NT, NT, T, T

● 2 wrong (the 1st and 4th), 4 correct = 66% accuracy

● 2-bit predictors can do even better

● In general, can have k-bit predictors.

75 2-bit Dynamic Branch Prediction Scheme

● Change the prediction only if mispredicted twice in a row:
– Four states: two “Predict Taken” states and two “Predict Not Taken” states.
– A taken outcome (T) moves the predictor toward the “Predict Taken” states; a not-taken outcome (NT) moves it toward the “Predict Not Taken” states.
– A single misprediction only moves the predictor from a strong state to the corresponding weak state; a second consecutive misprediction flips the prediction.

● Adds hysteresis to decision making process
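To make the scheme concrete, here is a minimal C sketch of one 2-bit predictor entry implemented as a saturating counter (one common encoding; the exact arcs of the state diagram above may differ slightly). The type and function names are illustrative only.

#include <stdbool.h>

/* One 2-bit saturating counter: states 0,1 predict not taken; 2,3 predict taken. */
typedef struct { unsigned state; } TwoBitCounter;

static bool predict_taken(const TwoBitCounter *p) {
    return p->state >= 2;                /* states 2 and 3 predict taken */
}

static void train(TwoBitCounter *p, bool taken) {
    if (taken) {
        if (p->state < 3) p->state++;    /* move toward "strongly taken" */
    } else {
        if (p->state > 0) p->state--;    /* move toward "strongly not taken" */
    }
}

A misprediction in a strong state only weakens the counter; the prediction itself flips only after two consecutive mispredictions, which is the hysteresis mentioned above.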

76 An Example of Computing Performance

● Program assumptions: – 23% loads and in ½ of cases, next instruction uses load value – 13% stores – 19% conditional branches – 2% unconditional branches – 43% other

77 Example cont… ● Machine Assumptions: – 5 stage pipe

● Penalty of 1 cycle on use of load value immediately after a load.

● Jumps are resolved in ID stage for a 1 cycle branch penalty.

● 75% branch prediction accuracy.

● 1 cycle delay on misprediction.

78 Example cont… ● CPI penalty calculation:

– Loads:

● 50% of the 23% of loads have 1 cycle penalty: .5*.23=0.115 – Jumps:

● All of the 2% of jumps have 1 cycle penalty: 0.02*1 = 0.02 – Conditional Branches:

● 25% of the 19% are mispredicted, have a 1 cycle penalty: 0.25*0.19*1 = 0.0475

● Total Penalty: 0.115 + 0.02 + 0.0475 = 0.1825

● Average CPI: 1 + 0.1825 = 1.1825
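The penalty arithmetic above can be written out directly; the small C program below simply reproduces the slide’s numbers (the variable names are illustrative).

#include <stdio.h>

int main(void) {
    double load_penalty   = 0.23 * 0.5;    /* half of the 23% loads stall 1 cycle      */
    double jump_penalty   = 0.02 * 1.0;    /* all of the 2% jumps pay 1 cycle          */
    double branch_penalty = 0.19 * 0.25;   /* 25% of the 19% conditional branches miss */
    double cpi = 1.0 + load_penalty + jump_penalty + branch_penalty;
    printf("CPI = %.4f\n", cpi);           /* prints CPI = 1.1825 */
    return 0;
}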

79 Exploiting Loop-level Parallelism: Motivation

● An instruction pipeline essentially exploits ILP within a basic block:

– On the average the size of a basic block is 7. – After every 7 instructions, a branch instruction is encountered.

● To obtain substantial performance benefits:

– ILP across multiple basic blocks needs to be exploited.

80 Software-based Scheduling vs. Hardware-based Scheduling • Disadvantage with software-based (compile-time) scheduling: • In many cases, much information cannot be extracted from the code at compile time • Examples: • whether two pointers refer to the same memory location. • the value of the induction variable of a loop

● It is still possible to assist hardware by exposing more ILP:

– Rearrange instructions for increased performance

81 Loop-level Parallelism

● It may be possible to execute different iterations of a loop in parallel.

● Example: – for(i=0;i<1000;i++){ – a[i]=a[i]+b[i]; – b[i]=b[i]*2; – }

82 Problems in Exploiting Loop- level Parallelism • Loop Carried Dependences: • A dependence across different iterations of a loop.

● Loop Independent Dependences: – A dependence within the body of the loop itself (i.e. within one iteration).

83 Loop-level Dependence

● Example:

– for(i=0;i<1000;i++){
–   a[i+1]=b[i]+c[i];
–   b[i+1]=a[i+1]+d[i];
– }

● Loop-carried dependence: each iteration uses the value b[i] produced by the preceding iteration.

● Also, loop-independent dependence on account of a[i+1]

84 Eliminating Loop-level Dependences Through Code Transformations

● We shall examine 3 techniques: – Static loop unrolling – Basic block transformations –

85 Static Loop Unrolling

- A high proportion of loop instructions are loop management instructions. - Eliminating this overhead can significantly increase the performance of the loop. - for(i=1000;i>0;i--){ - a[i]=a[i]+c; - }
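For reference, the same loop unrolled by a factor of 4 in C (anticipating the MIPS version on the next slides); the remainder loop is an added detail so the sketch works for any count, and is an assumption rather than part of the slide.

/* Original: for (i = 1000; i > 0; i--) a[i] = a[i] + c; */
void add_scalar_unrolled(double a[], double c) {
    int i;
    for (i = 1000; i >= 4; i -= 4) {      /* four original iterations per pass */
        a[i]     = a[i]     + c;
        a[i - 1] = a[i - 1] + c;
        a[i - 2] = a[i - 2] + c;
        a[i - 3] = a[i - 3] + c;
    }
    for (; i > 0; i--)                    /* left-over iterations, if any */
        a[i] = a[i] + c;
}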

86 Static Loop Unrolling

Loop : L.D    F0,0(R1)     ; F0 = array elem.
       ADD.D  F4,F0,F2     ; add scalar in F2
       S.D    F4,0(R1)     ; store result
       DADDUI R1,R1,#-8    ; decrement ptr
       BNE    R1,R2,Loop   ; branch if R1 != R2

87 Static Loop Unrolling cont…

Loop : L.D    F0,0(R1)
       ADD.D  F4,F0,F2
       S.D    F4,0(R1)
       L.D    F6,-8(R1)
       ADD.D  F8,F6,F2
       S.D    F8,-8(R1)
       L.D    F10,-16(R1)
       ADD.D  F12,F10,F2
       S.D    F12,-16(R1)
       L.D    F14,-24(R1)
       ADD.D  F16,F14,F2
       S.D    F16,-24(R1)
       DADDUI R1,R1,#-32
       BNE    R1,R2,Loop

88

Static Loop Unrolling cont…

Loop : L.D    F0,0(R1)
       ADD.D  F4,F0,F2
       S.D    F4,0(R1)
       L.D    F6,-8(R1)
       ADD.D  F8,F6,F2
       S.D    F8,-8(R1)
       L.D    F10,-16(R1)
       ADD.D  F12,F10,F2
       S.D    F12,-16(R1)
       L.D    F14,-24(R1)
       ADD.D  F16,F14,F2
       S.D    F16,-24(R1)
       DADDUI R1,R1,#-32
       BNE    R1,R2,Loop

– Note the renamed registers: this eliminates dependences between each of the n = 4 loop bodies taken from different iterations.
– Note the adjusted load/store offsets (0, -8, -16, -24).
– The loop overhead instructions (DADDUI, BNE) are adjusted for the unroll factor.

89 Transformation of A Basic Block

● It is possible to rewrite a loop to eliminate loop-carried dependences:

– Only if, there are no cyclic dependences.

With dependence:
for(i=1;i<1000;i++){
  a[i]=a[i]+b[i];
  b[i+1]=c[i]+d[i];
}

Without dependence:
a[1]=a[1]+b[1];
for(i=1;i<999;i++){
  b[i+1]=c[i]+d[i];
  a[i+1]=a[i+1]+b[i+1];
}
b[1000]=c[999]+d[999];

90 Modern Computer Architectures

Lecture 9: Advanced Branch Prediction Techniques

91 Handling Control Hazards: Branch Predictions

● Unless satisfactory resolution mechanisms are in place:

– Branches can significantly degrade the performance of a pipeline:

● We had so far looked at some very simple branch prediction techniques:

– Yet, yielded reasonably good performance benefits: of the order of 50% to 100%. – Can we do better by deploying more advanced branch prediction techniques?

92 Some Discussions on State- Based Predictor • The branch prediction buffer (BPB) is implemented as a special cache: – Accessed during the IF stage.

Lower Address Bits    Prediction Bits
01FF                  11
05CD                  10
------

93 Some Discussions on State-Based Predictor

● If an instruction is decoded as a branch:

– If the branch is predicted taken, fetching begins as soon as the target address is known.

● Branch taken prediction technique is of little use in MIPS 5 stage pipeline.

– Why? – Useful in deeper pipelines.

● What are the pros and cons of a using large BPB?

94 Predictors in Simple Pipelines

● Initial pipelined processors, e.g. MIPS, SOLARIS, etc.: – Did only trivial branch predictions.

● Possible reasons could be: – The penalty of mispredictions not as severe as in deeper pipelined processors. – Sophisticated branch predictors did not exist.

● Advanced branch prediction techniques have now become very important with: – Use of deeper pipelines. – Introduction of multiple-issue (superscalar) processors.

95 2-bit Predictor

● What is the prediction accuracy using a 4096 entry 2-bit for a typical application? – 99% to 80% depending upon the application.

● Can an n-bit (n>2) predictor do better? – 2-bit predictors do almost as well as any n-bit predictors.

● Can the accuracy of branch prediction be improved? – Correlating branch predictor.

96 Correlating Branch Predictor

● It may be possible to improve the accuracy of branch prediction:

– By observing the recent behavior of other branches.

● Example: if (a==2) { b=2; } if (b==2) { b=0; }

97 Correlating Branch Predictor

● An (m,n) predictor: – Makes use of the outcomes observed for the last m branches (global history). – Keeps 2^m separate n-bit predictors for each branch. – The behavior of a branch is predicted by choosing from among these 2^m branch predictors, based on the m-bit history.
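A minimal C sketch of such a predictor for a single branch, with (m, n) = (2, 2), is shown below; the structure layout and names are assumptions made for illustration, and a real predictor would also index by the branch address.

#include <stdbool.h>

/* (2,2) predictor for one branch: 2 bits of global history select one of
 * 2^2 = 4 two-bit saturating counters. */
typedef struct {
    unsigned history;       /* outcomes of the last 2 branches, 1 bit each */
    unsigned counter[4];    /* one 2-bit counter per history pattern       */
} CorrelatingPredictor;

static bool cp_predict(const CorrelatingPredictor *p) {
    return p->counter[p->history & 3] >= 2;          /* 2,3 => predict taken */
}

static void cp_train(CorrelatingPredictor *p, bool taken) {
    unsigned idx = p->history & 3;
    if (taken  && p->counter[idx] < 3) p->counter[idx]++;
    if (!taken && p->counter[idx] > 0) p->counter[idx]--;
    p->history = ((p->history << 1) | (taken ? 1u : 0u)) & 3;  /* shift in outcome */
}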

98 Correlating Branch Predictor

● Why does the outcome of one branch depend on the outcome of another branch?

● Depending on whether some preceding branch is taken or not: – Some variable may be set to some value or not.

99 Correlating Branch Predictor Example

● d=2;
● while(TRUE){
●   B1: if(d==0)
●         d=1;
●   B2: if(d==1)
●         d=0;
●       else d=2;
● }

• What values does d take? 2,0,2,0,2,0….

100 1-bit Predictor for the Example

d=?   B1 prediction   B1 actual   New B1 prediction   B2 prediction   B2 actual   New B2 prediction
2     NT              T           T                   NT              T           T
0     T               NT          NT                  T               NT          NT
2     NT              T           T                   NT              T           T
0     T               NT          NT                  T               NT          NT

Prediction Accuracy= 0%

101 (1,1) Correlating Predictor for the Example

d=?   B1 prediction   B1 actual   New B1 prediction   B2 prediction   B2 actual   New B2 prediction
2     NT/NT           T           T/NT                NT/NT           T           NT/T
0     T/NT            NT          T/NT                NT/T            NT          NT/T
2     T/NT            T           T/NT                NT/T            T           NT/T
0     T/NT            NT          T/NT                NT/T            NT          NT/T

Prediction accuracy nearly 100%

102 Branch Target Buffers

● Branch Target Buffer (BTB) contains:

– Prediction for the target address. – Can be used along with a separate branch prediction buffer (BPB).

● At the end of IF stage we know whether a branch would be taken:

– And if yes, and if the target address is known. – Then, we can have a 0-cycle penalty for branches.
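A minimal C sketch of how such a branch-target-buffer lookup might work in the IF stage is given below; the table size, fields, and direct-mapped indexing are illustrative assumptions (entries are kept only for branches predicted taken, as discussed on the next slides).

#include <stdint.h>
#include <stdbool.h>

#define BTB_ENTRIES 1024

typedef struct {
    bool     valid;
    uint32_t branch_pc;   /* address of a branch predicted taken */
    uint32_t target_pc;   /* its predicted target address        */
} BTBEntry;

static BTBEntry btb[BTB_ENTRIES];

/* Looked up during IF: on a hit, fetch from the predicted target next cycle
 * (0-cycle penalty if the prediction turns out to be correct). */
static uint32_t next_fetch_pc(uint32_t pc) {
    BTBEntry *e = &btb[(pc >> 2) % BTB_ENTRIES];   /* direct-mapped index */
    if (e->valid && e->branch_pc == pc)
        return e->target_pc;
    return pc + 4;                                 /* fall through otherwise */
}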

103 Modern Computer Architectures

Lecture 10: Branch Target Prediction

104 Branch Target Buffers

● Target address to be stored in the BTB:

– only for predicted taken branches. – Why?

● Can the BPB and the BTB be combined?

– Complications would arise for 2-bit predictor. – Target address for branch-untaken would also have to be stored.

● In many commercial processors, e.g. PowerPC:

– Separate BTB and BPB are used.

105 Branch Target Buffers • What are the pros and cons of using a large BTB?

Instruction in buffer?   Prediction   Actual branch   Penalty (cycles)
Yes                      Taken        T               0

Yes Taken NT 2

No NT T 2

No NT NT 0

106 Branch Folding?

● What if the target instruction itself is stored in the BTB instead of the target address? – Unconditional branches can run in 0 cycles. – Pipeline would simply substitute the instruction in BTB for the branch instruction.

107 Exercise

● Consider the 5-stage MIPS pipeline. ● Assume: – 1-bit branch prediction is used and along with branch target prediction using a BTB. ● A benchmark program contains 20% branch instructions: – Of these, 20% are unconditional branches. – Of the conditional branches 50% are taken. – 95% chance that a taken branch instruction is found in prediction buffer. – 95% chance that a taken prediction is correct. ● Compute performance improvements compared to a simple prediction scheme such

as the branch untaken approach. 108 Solution

● With BPB and BTB: – Penalty due to control hazards= – 20%*(60%*5% + 5%)*2 Cycles – = 0.2 *(0.03+0.05)*2 Cycles – = 0.2 * (0.08) * 2 = 0.032 – Average Cycle Time = 1.032

● With Simple Prediction: – Penalty due to control hazards= – 20%*60%*1=0.2*0.6*1=0.12 – Average Cycle Time=1.12

● Performance Improvement = – 1.12/1.032 = 1.085, i.e. about 8.5% 109

Modern Computer Architectures

Lecture 11:Software Pipelining and Predicated Instructions

110 Software Pipelining

● Eliminates loop-independent dependence through code restructuring. – Reduces stalls – Helps achieve better performance in pipelined execution.

● As compared to simple loop unrolling: – Consumes less code space

111 Software Pipelining cont…

● Central idea: reorganize loops – Each iteration is made from instructions chosen from different iterations of the original loop.

(Figure: a software-pipelined iteration is assembled from instructions drawn from several consecutive iterations i0 … i5 of the original loop.)

112 Software Pipelining cont…

● Just as in a hardware pipeline: – In each iteration of a software-pipelined loop, some instruction of some iteration of the original loop is executed.

113 Software Pipelining cont… - How is this done?
  1 → unroll the loop body with an unroll factor of n (we have taken n = 3 for our example)
  2 → select the order of instructions from different iterations to pipeline
  3 → “paste” instructions from different iterations into the new pipelined loop body

114 Static Loop Unrolling Example

Loop : L.D    F0,0(R1)     ; F0 = array elem.
       ADD.D  F4,F0,F2     ; add scalar in F2
       S.D    F4,0(R1)     ; store result
       DADDUI R1,R1,#-8    ; decrement ptr
       BNE    R1,R2,Loop   ; branch if R1 != R2

115 Software Pipelining: Step 1

Iteration i:     L.D   F0,0(R1)
                 ADD.D F4,F0,F2
                 S.D   F4,0(R1)
Iteration i + 1: L.D   F0,0(R1)
                 ADD.D F4,F0,F2
                 S.D   F4,0(R1)
Iteration i + 2: L.D   F0,0(R1)
                 ADD.D F4,F0,F2
                 S.D   F4,0(R1)

Note:
1.) We are unrolling the loop, hence no loop overhead instructions are needed!
2.) A single loop body of the restructured loop will contain instructions from different iterations of the original loop body.

116 Software Pipelining: Step 2

Iteration i:     L.D   F0,0(R1)
                 ADD.D F4,F0,F2
                 S.D   F4,0(R1)       <- 1.)
Iteration i + 1: L.D   F0,0(R1)
                 ADD.D F4,F0,F2       <- 2.)
                 S.D   F4,0(R1)
Iteration i + 2: L.D   F0,0(R1)       <- 3.)
                 ADD.D F4,F0,F2
                 S.D   F4,0(R1)

Notes:
1.) We’ll select the following order in our pipelined loop: 1.) S.D (from iteration i), 2.) ADD.D (from iteration i + 1), 3.) L.D (from iteration i + 2).
2.) Each instruction (L.D, ADD.D, S.D) must be selected at least once, to make sure that we don’t leave out any instructions of the original loop in the pipelined loop.

117 Software Pipelining: Step 3

Iteration i:     L.D   F0,0(R1)
                 ADD.D F4,F0,F2
                 S.D   F4,0(R1)
Iteration i + 1: L.D   F0,0(R1)
                 ADD.D F4,F0,F2
                 S.D   F4,0(R1)
Iteration i + 2: L.D   F0,0(R1)
                 ADD.D F4,F0,F2
                 S.D   F4,0(R1)

The Pipelined Loop:
Loop : S.D    F4,16(R1)
       ADD.D  F4,F0,F2
       L.D    F0,0(R1)
       DADDUI R1,R1,#-8
       BNE    R1,R2,Loop

118 Software Pipelining: Step 4

Preheader Instructions to fill “software pipeline”

Pipelined Loop Body:
Loop : S.D    F4,16(R1)   ; M[ i ]
       ADD.D  F4,F0,F2    ; M[ i – 1 ]
       L.D    F0,0(R1)    ; M[ i – 2 ]
       DADDUI R1,R1,#-8
       BNE    R1,R2,Loop

Postheader Instructions to drain “software pipeline”

119 Software Pipelined Code

Loop : S.D    F4,16(R1)   ; M[ i ]
       ADD.D  F4,F0,F2    ; M[ i – 1 ]
       L.D    F0,0(R1)    ; M[ i – 2 ]
       DADDUI R1,R1,#-8
       BNE    R1,R2,Loop
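For comparison, a rough C rendering of the same software-pipelined schedule for a[i] = a[i] + c (iterating downward over a 1-based array, as in the earlier loop) is sketched below; the prologue/epilogue handling and the assumption N >= 2 are added details, not from the slides.

/* Original: for (i = N; i > 0; i--) a[i] = a[i] + c;   (1-based array, N >= 2) */
void sw_pipelined(double a[], int N, double c) {
    double loaded, summed;
    /* Prologue: fill the software pipeline. */
    loaded = a[N];
    summed = loaded + c;
    loaded = a[N - 1];
    /* Steady state: store for iteration i+2, add for iteration i+1,
     * load for iteration i (cf. S.D / ADD.D / L.D above). */
    for (int i = N - 2; i > 0; i--) {
        a[i + 2] = summed;
        summed   = loaded + c;
        loaded   = a[i];
    }
    /* Epilogue: drain the software pipeline. */
    a[2] = summed;
    a[1] = loaded + c;
}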

120 Software Pipelining Issues

● Register management can be tricky. – In more complex examples, we may need to increase the iterations between when data is read and when the results are used.

● Optimal software pipelining has been shown to be an NP-complete problem: – Present solutions are based on heuristics.

121 Software Pipelining versus Loop Unrolling

● Software pipelining takes less code space.

● Software pipelining and loop unrolling reduce different types of inefficiencies: – Loop unrolling reduces loop management overheads. – Software pipelining allows a pipeline to run at full efficiency by eliminating loop- independent dependencies.

122 Hardware Support for ILP: Predicated Instructions

• Consider: if (A == 0) { S = T; }
• The following MIPS code would be generated (assuming R1 = A, R2 = S, R3 = T):
      BNEZ  R1,L
      ADDU  R2,R3,R0
   L:
• With predicated instructions:
      CMOVZ R2,R3,R1    ; if (R1 == 0) move R3 to R2
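At the source level, the effect of the conditional move can be pictured with the branch-free C below; this is only an illustrative sketch using the register-to-variable mapping above, not code from the slides.

/* Branching version of:  if (A == 0) { S = T; } */
void with_branch(int A, int *S, int T) {
    if (A == 0)
        *S = T;
}

/* Branch-free version: both values are available and the condition only
 * selects which one is kept, mirroring CMOVZ R2,R3,R1. */
void with_cmov(int A, int *S, int T) {
    *S = (A == 0) ? T : *S;
}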

123 Predication

Predication can remove branch overheads even in general conditional-statement situations.

        instr1
        instr2
if :    p1,p2 <- cmp(a==b)
        jump L1
then :  (p1) instr 3
        (p1) instr 4
        jump L2
else :  (p2) instr 5
        (p2) instr 6

        instr 7
        instr 8

124 A Reflection on Predication

Traditional Architectures : 4 basic blocks          With Predication : 1 basic block

Without predication (4 basic blocks):
        instr1
        instr2
if :    p1,p2 <- cmp(a==b)
        jump p2
then :  (p1) instr 3
        (p1) instr 4
        jump
else :  (p2) instr 5
        (p2) instr 6
        instr 7
        instr 8

With predication (1 new basic block):
        instr1
        instr2
if :    p1,p2 <- cmp(a==b)
        (p1) instr 3
        (p1) instr 4
        (p2) instr 5
        (p2) instr 6
        instr 7
        instr 8

Predication results in more effective use of pipeline 125 Predicated Instructions

● Control-dependence is removed

● Number of instructions is reduced

● Should result in a lower CPI (better performance) without much hardware investment.

● Used in almost every modern processor:

– Special predicate registers supported for this purpose, e.g. Intel Pentium.

126 Modern Computer Architectures

Lecture 12: Dynamic Instruction Scheduling

127 Dynamic Instruction Scheduling

● Scheduling: Ordering the execution of instructions in a program so as to improve performance.

● Dynamic Scheduling:

– The hardware determines the order in which instructions execute.

– Contrast it with a statically scheduled processor where the compiler determines the order of execution.

128 Dynamic Instruction Scheduling: The Need

● We have seen that primitive pipelined processors tried to overcome data dependence through: – Forwarding:

● But, many data dependences can not be overcome this way. – Interlocking: brings down pipeline efficiency.

● Software based instruction restructuring: – Handicapped due to inability to detect many dependences.

129 Dynamic Instruction Scheduling: Key Idea

● Based on the idea of data flow computation --- “Execute an instruction as soon as its operands are available.”

130 Dataflow Execution

● Allow an instruction behind a stall to proceed if it is itself not stalled due to a dependence:

DIVD F0,F2,F4
ADDD F10,F0,F8
SUBD F12,F8,F14

– ADDD must wait for DIVD (it needs F0), but SUBD depends on neither, so it can proceed past the stalled ADDD.

● Enables out-of-order execution: – May not lead to out-of-order completion.

131 Out of Order Execution cont…

● An instruction is in execution: – Between the time it begins execution and it completes execution.

● In a dynamically scheduled pipeline, – All instructions pass through issue stage in order (in-order issue).

132 Advantages of Dynamic Scheduling

● Can handle dependences unknown at compile time: – E.g. dependences involving memory references.

● Simplifies the compiler.

● Allows code compiled for one pipeline to run efficiently on a different pipeline.

● Hardware speculation can be used: – Can lead to further performance advantages,

builds on dynamic scheduling. 133 Overview of Dynamic Instruction Scheduling

● We shall discuss two schemes for implementing dynamic scheduling: – Scoreboarding: First used in the 1964 CDC 6600 computer. – Tomasulo’s Algorithm: Implemented for the FP unit of the IBM 360/91 in 1966.

● Since scoreboarding is a little closer to in-order execution, we’ll look at it first.

134 A Point to Note About Dynamic Scheduling

● WAR and WAW hazards that did not exist in an in-order pipeline: – Can arise in a dynamically scheduled processor.

135 Scoreboarding cont…

● Scoreboarding allows instructions to execute out of order: –When there are sufficient resources.

● Named after the scoreboard: –Originally developed for CDC 6600.

136 Scoreboarding The 5 Stage MIPS Pipeline

● Split the ID pipe stage of simple 5-stage pipeline into 2 stages:

–Issue: Decode instructions, check for structural hazards. –Read operands: Wait until no data hazards, then read operands.

137 Scoreboarding cont…

● Instructions pass through the issue stage in order.

● Instructions can bypass each other in the read operands stage: – Then enter execution out of order.

138 Scoreboarding Concepts

● We had observed that WAR and WAW hazards can occur in out-of- order execution: –Instructions involved in a dependence are stalled, –But, instructions having no dependence are allowed to continue. –Different units are kept as busy as possible.

139 Scoreboarding Concepts

● Essence of scoreboarding: – Execute instructions as early as possible. – When an instruction is stalled,

● Later instructions are issued and executed if they do not depend on any active or stalled instruction.

140 A Few More Basic Scoreboarding Concepts

● Every instruction goes through the scoreboard: – Scoreboard constructs the data dependences of the instruction. – Scoreboard decides when an instruction can execute. – Scoreboard also controls when an instruction can write its results into the destination register.

141 Scoreboarding

● Out-of-order execution requires multiple instructions to be in the EX stage simultaneously:

– Achieved with multiple functional units, along with pipelined functional units.

● All instructions go through the scoreboard:

– Centralized control of issue, operand reading, execution and writeback. – All hazard resolution is centralized in the scoreboard as well. 142 A Scoreboard for MIPS

(Figure: a scoreboard for MIPS – the registers are connected over data buses, a source of structural hazard, to the functional units: two FP multipliers, an FP divider, an FP adder, and an integer unit; the scoreboard exchanges control/status signals with all of them.)

143 4 Steps of Execution with Scoreboarding 1. Issue: when a f.u. for an instruction is free and no other active instruction has the same destination register: • Avoids structural and WAW hazards. 2. Read operands: when all source operands are available: • Note: forwarding not used. • A source operand is available if no earlier issued active instruction is going to write it. • Thus resolves RAW hazards dynamically.

144 Steps in Execution with Scoreboarding 3. Execution: begins when the f.u. receives its operands; scoreboard notified when execution completes. 4. Write Result: after WAR hazards have been resolved.

• Example:

• DIV.D F0, F2, F4

• ADD.D F10, F0, F8

• SUB.D F8, F8, F14 • ADD.D cannot proceed to read operands until DIV.D completes; SUB.D can execute but not write back until ADD.D has read F8. 145 An Assessment of Scoreboarding

● Pro: Factor of 1.7 improvement for FORTRAN and 2.5 for hand-coded assembly on CDC 6600! – Before semiconductor main memory or caches…

● Scoreboard on the CDC 6600: – Required about as much logic as a functional unit – quite low.

● Cons: – Large number of buses needed:

● However, if we wish to issue multiple instructions per clock, more wires are needed in any case. – Centralized hardware for hazard resolution.

146 An Assessment of Scoreboarding cont…

● Pro: A scoreboard effectively handles true data dependencies:

– Minimizes the number of stalls due to true data dependencies.

● Con: Anti dependences and output dependences (WAR and WAW hazards) are also handled using stalls:

– Could have been better handled.

147 A More Sophisticated Approach: Tomasulo’s Algorithm

● Developed for IBM 360/91: – Goal: To keep the floating point pipeline as busy as possible. – This led Tomasulo to try to figure out how to achieve renaming in hardware!

● The descendants of this have flourished! – , HP 8000, MIPS 10000, Pentium III, PowerPC 604,

148 Key Innovations in Dynamic Instruction Scheduling

● Reservation stations: – Single entry buffer at the head of each functional unit has been replaced by a multiple entry buffer.

● Common Data Bus (CDB): – Connects the output of the functional units to the reservation stations as well as registers.

● Register Tags: – Tag corresponds to the entry number for the instruction producing the result.

149 Reservation Stations

● The basic idea: – An instruction waits in the reservation station, until its operands become available. ● Helps overcome RAW hazards. – A reservation station fetches and buffers an operand as soon as it is available: ● Eliminates the need to get operands from registers.
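The per-entry bookkeeping that appears in the example tables later (Busy, Op, Vj, Vk, Qj, Qk, and the register status field) can be pictured as the C structures below; the exact field types and names are assumptions made for illustration.

typedef struct {
    int    busy;      /* is this reservation station in use?                  */
    int    op;        /* operation to perform on the operands                 */
    double vj, vk;    /* source operand values, once available                */
    int    qj, qk;    /* reservation stations that will produce the missing   */
                      /* operands (0 means the value is already in vj/vk)     */
} ReservationStation;

typedef struct {
    int    qi;        /* reservation station whose result will be written to  */
                      /* this register (0 means no pending write)             */
    double value;     /* current register contents                            */
} RegisterStatus;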

150 Modern Computer Architectures

Lecture 13: Tomasulo’s Algorithm

151 Tomasulo’s Algorithm

● Control & buffers distributed with Function Units (FU) – In the form of “reservation stations” associated with every function unit. – Store operands for issued but pending instructions.

● Register operands in instructions are replaced either by values (if already available) or by pointers to the reservation stations (RS) that will produce them: – Achieves register renaming. – Avoids WAR, WAW hazards without stalling. – Many more reservation stations than registers (why?), so can do optimizations that compilers can’t.

152 Tomasulo’s Algorithm cont… ● Results passed to FUs from RSs, – Not through registers, therefore similar to forwarding. – Over Common Data Bus (CDB) that broadcasts results to all FUs.

● Load and Stores: – Treated as FUs with RSs as well.

● Integer instructions can go past branches: – Allows FP ops beyond basic block in FP queue.

153 Tomasulo’s Scheme

(Figure: Tomasulo’s datapath – instructions come from the instruction queue; the registers and the address unit feed the load buffers, store buffers, and reservation stations, which in turn feed the adder and the multiplier; results from the functional units and the memory unit are broadcast on the CDB to all waiting stations and to the registers.)

154 Three Stages of Tomasulo’s Algorithm
1. Issue: Get an instruction from the instruction queue.
   – Issue the instruction only if a matching reservation station is free (no structural hazard).
   – Send the operand values from the registers, or the name of the functional unit that will produce the result (achieves renaming).
2. Execute: Operate on the operands (EX).
   – When both operands are ready, execute; if not ready, watch the Common Data Bus for the result.
3. Write result: Finish execution (WB).
   – Write on the CDB to all awaiting units; mark the reservation station available.

155 Instruction stream Tomasulo Example Instruction status: Exec Write Instruction j k Issue Comp Result Busy Address LD F6 34+ R2 Load1 No LD F2 45+ R3 Load2 No MULTD F0 F2 F4 Load3 No SUBD F8 F6 F2 DIVD F10 F0 F6 ADDD F6 F8 F2 3 Load/Buffers Reservation Stations: S1 S2 RS RS Time Name Busy Op Vj Vk Qj Qk Add1 No Add2 No FU count 3 FP Adder R.S. down Add3 No Mult1 No 2 FP Mult R.S. Mult2 No Register result status: Clock F0 F2 F4 F6 F8 F10 F12 ... F30 0 FU

Clock cycle 156 Tomasulo Example Cycle 1 Instruction status: Exec Write Instruction j k Issue Comp Result Busy Address LD F6 34+ R2 1 Load1 Yes 34+R2 LD F2 45+ R3 Load2 No MULTD F0 F2 F4 Load3 No SUBD F8 F6 F2 DIVD F10 F0 F6 ADDD F6 F8 F2 Reservation Stations: S1 S2 RS RS Time Name Busy Op Vj Vk Qj Qk Add1 No Add2 No Add3 No Mult1 No Mult2 No Register result status: Clock F0 F2 F4 F6 F8 F10 F12 ... F30 1 FU Load1

157 Tomasulo Example Cycle 2 Instruction status: Exec Write Instruction j k Issue Comp Result Busy Address LD F6 34+ R2 1 Load1 Yes 34+R2 LD F2 45+ R3 2 Load2 Yes 45+R3 MULTD F0 F2 F4 Load3 No SUBD F8 F6 F2 DIVD F10 F0 F6 ADDD F6 F8 F2 Reservation Stations: S1 S2 RS RS Time Name Busy Op Vj Vk Qj Qk Add1 No Add2 No Add3 No Mult1 No Mult2 No Register result status: Clock F0 F2 F4 F6 F8 F10 F12 ... F30 2 FU Load2 Load1

Note: Can have multiple loads outstanding 158 Tomasulo Example Cycle 3 Instruction status: Exec Write Instruction j k Issue Comp Result Busy Address LD F6 34+ R2 1 3 Load1 Yes 34+R2 LD F2 45+ R3 2 Load2 Yes 45+R3 MULTD F0 F2 F4 3 Load3 No SUBD F8 F6 F2 DIVD F10 F0 F6 ADDD F6 F8 F2 Reservation Stations: S1 S2 RS RS Time Name Busy Op Vj Vk Qj Qk Add1 No Add2 No Add3 No Mult1 Yes MULTD R(F4) Load2 Mult2 No Register result status: Clock F0 F2 F4 F6 F8 F10 F12 ... F30 3 FU Mult1 Load2 Load1

• Note: registers names are removed (“renamed”) in Reservation Stations; MULT issued 159 • Load1 completing; what is waiting for Load1? Tomasulo Example Cycle 4 Instruction status: Exec Write Instruction j k Issue Comp Result Busy Address LD F6 34+ R2 1 3 4 Load1 No LD F2 45+ R3 2 4 Load2 Yes 45+R3 MULTD F0 F2 F4 3 Load3 No SUBD F8 F6 F2 4 DIVD F10 F0 F6 ADDD F6 F8 F2 Reservation Stations: S1 S2 RS RS Time Name Busy Op Vj Vk Qj Qk Add1 Yes SUBD M(A1) Load2 Add2 No Add3 No Mult1 Yes MULTD R(F4) Load2 Mult2 No Register result status: Clock F0 F2 F4 F6 F8 F10 F12 ... F30 4 FU Mult1 Load2 M(A1) Add1

• Load2 completing; what is waiting for Load2? 160 Tomasulo Example Cycle 5 Instruction status: Exec Write Instruction j k Issue Comp Result Busy Address LD F6 34+ R2 1 3 4 Load1 No LD F2 45+ R3 2 4 5 Load2 No MULTD F0 F2 F4 3 Load3 No SUBD F8 F6 F2 4 DIVD F10 F0 F6 5 ADDD F6 F8 F2 Reservation Stations: S1 S2 RS RS Time Name Busy Op Vj Vk Qj Qk 2 Add1 Yes SUBD M(A1) M(A2) Add2 No Add3 No 10 Mult1 Yes MULTD M(A2) R(F4) Mult2 Yes DIVD M(A1) Mult1 Register result status: Clock F0 F2 F4 F6 F8 F10 F12 ... F30 5 FU Mult1 M(A2) M(A1) Add1 Mult2

• Timer starts down for Add1, Mult1 161 Tomasulo Example Cycle 6 Instruction status: Exec Write Instruction j k Issue Comp Result Busy Address LD F6 34+ R2 1 3 4 Load1 No LD F2 45+ R3 2 4 5 Load2 No MULTD F0 F2 F4 3 Load3 No SUBD F8 F6 F2 4 DIVD F10 F0 F6 5 ADDD F6 F8 F2 6 Reservation Stations: S1 S2 RS RS Time Name Busy Op Vj Vk Qj Qk 1 Add1 Yes SUBD M(A1) M(A2) Add2 Yes ADDD M(A2) Add1 Add3 No 9 Mult1 Yes MULTD M(A2) R(F4) Mult2 Yes DIVD M(A1) Mult1 Register result status: Clock F0 F2 F4 F6 F8 F10 F12 ... F30 6 FU Mult1 M(A2) Add2 Add1 Mult2

• Issue ADDD here despite name dependency on F6? 162 Tomasulo Example Cycle 7 Instruction status: Exec Write Instruction j k Issue Comp Result Busy Address LD F6 34+ R2 1 3 4 Load1 No LD F2 45+ R3 2 4 5 Load2 No MULTD F0 F2 F4 3 Load3 No SUBD F8 F6 F2 4 7 DIVD F10 F0 F6 5 ADDD F6 F8 F2 6 Reservation Stations: S1 S2 RS RS Time Name Busy Op Vj Vk Qj Qk 0 Add1 Yes SUBD M(A1) M(A2) Add2 Yes ADDD M(A2) Add1 Add3 No 8 Mult1 Yes MULTD M(A2) R(F4) Mult2 Yes DIVD M(A1) Mult1 Register result status: Clock F0 F2 F4 F6 F8 F10 F12 ... F30 7 FU Mult1 M(A2) Add2 Add1 Mult2

• Add1 (SUBD) completing; what is waiting for it? 163 Tomasulo Example Cycle 8 Instruction status: Exec Write Instruction j k Issue Comp Result Busy Address LD F6 34+ R2 1 3 4 Load1 No LD F2 45+ R3 2 4 5 Load2 No MULTD F0 F2 F4 3 Load3 No SUBD F8 F6 F2 4 7 8 DIVD F10 F0 F6 5 ADDD F6 F8 F2 6 Reservation Stations: S1 S2 RS RS Time Name Busy Op Vj Vk Qj Qk Add1 No 2 Add2 Yes ADDD (M-M) M(A2) Add3 No 7 Mult1 Yes MULTD M(A2) R(F4) Mult2 Yes DIVD M(A1) Mult1 Register result status: Clock F0 F2 F4 F6 F8 F10 F12 ... F30 8 FU Mult1 M(A2) Add2 (M-M) Mult2

164 Tomasulo Example Cycle 9 Instruction status: Exec Write Instruction j k Issue Comp Result Busy Address LD F6 34+ R2 1 3 4 Load1 No LD F2 45+ R3 2 4 5 Load2 No MULTD F0 F2 F4 3 Load3 No SUBD F8 F6 F2 4 7 8 DIVD F10 F0 F6 5 ADDD F6 F8 F2 6 Reservation Stations: S1 S2 RS RS Time Name Busy Op Vj Vk Qj Qk Add1 No 1 Add2 Yes ADDD (M-M) M(A2) Add3 No 6 Mult1 Yes MULTD M(A2) R(F4) Mult2 Yes DIVD M(A1) Mult1 Register result status: Clock F0 F2 F4 F6 F8 F10 F12 ... F30 9 FU Mult1 M(A2) Add2 (M-M) Mult2

165 Tomasulo Example Cycle 10 Instruction status: Exec Write Instruction j k Issue Comp Result Busy Address LD F6 34+ R2 1 3 4 Load1 No LD F2 45+ R3 2 4 5 Load2 No MULTD F0 F2 F4 3 Load3 No SUBD F8 F6 F2 4 7 8 DIVD F10 F0 F6 5 ADDD F6 F8 F2 6 10 Reservation Stations: S1 S2 RS RS Time Name Busy Op Vj Vk Qj Qk Add1 No 0 Add2 Yes ADDD (M-M) M(A2) Add3 No 5 Mult1 Yes MULTD M(A2) R(F4) Mult2 Yes DIVD M(A1) Mult1 Register result status: Clock F0 F2 F4 F6 F8 F10 F12 ... F30 10 FU Mult1 M(A2) Add2 (M-M) Mult2

• Add2 (ADDD) completing; what is waiting for it? 166 Tomasulo Example Cycle 11 Instruction status: Exec Write Instruction j k Issue Comp Result Busy Address LD F6 34+ R2 1 3 4 Load1 No LD F2 45+ R3 2 4 5 Load2 No MULTD F0 F2 F4 3 Load3 No SUBD F8 F6 F2 4 7 8 DIVD F10 F0 F6 5 ADDD F6 F8 F2 6 10 11 Reservation Stations: S1 S2 RS RS Time Name Busy Op Vj Vk Qj Qk Add1 No Add2 No Add3 No 4 Mult1 Yes MULTD M(A2) R(F4) Mult2 Yes DIVD M(A1) Mult1 Register result status: Clock F0 F2 F4 F6 F8 F10 F12 ... F30 11 FU Mult1 M(A2) (M-M+M(M-M) Mult2

• Write result of ADDD here? 167 Tomasulo Example Cycle 12 Instruction status: Exec Write Instruction j k Issue Comp Result Busy Address LD F6 34+ R2 1 3 4 Load1 No LD F2 45+ R3 2 4 5 Load2 No MULTD F0 F2 F4 3 Load3 No SUBD F8 F6 F2 4 7 8 DIVD F10 F0 F6 5 ADDD F6 F8 F2 6 10 11 Reservation Stations: S1 S2 RS RS Time Name Busy Op Vj Vk Qj Qk Add1 No Add2 No Add3 No 3 Mult1 Yes MULTD M(A2) R(F4) Mult2 Yes DIVD M(A1) Mult1 Register result status: Clock F0 F2 F4 F6 F8 F10 F12 ... F30 12 FU Mult1 M(A2) (M-M+M(M-M) Mult2

168 Tomasulo Example Cycle 13 Instruction status: Exec Write Instruction j k Issue Comp Result Busy Address LD F6 34+ R2 1 3 4 Load1 No LD F2 45+ R3 2 4 5 Load2 No MULTD F0 F2 F4 3 Load3 No SUBD F8 F6 F2 4 7 8 DIVD F10 F0 F6 5 ADDD F6 F8 F2 6 10 11 Reservation Stations: S1 S2 RS RS Time Name Busy Op Vj Vk Qj Qk Add1 No Add2 No Add3 No 2 Mult1 Yes MULTD M(A2) R(F4) Mult2 Yes DIVD M(A1) Mult1 Register result status: Clock F0 F2 F4 F6 F8 F10 F12 ... F30 13 FU Mult1 M(A2) (M-M+M(M-M) Mult2

169 Tomasulo Example Cycle 14 Instruction status: Exec Write Instruction j k Issue Comp Result Busy Address LD F6 34+ R2 1 3 4 Load1 No LD F2 45+ R3 2 4 5 Load2 No MULTD F0 F2 F4 3 Load3 No SUBD F8 F6 F2 4 7 8 DIVD F10 F0 F6 5 ADDD F6 F8 F2 6 10 11 Reservation Stations: S1 S2 RS RS Time Name Busy Op Vj Vk Qj Qk Add1 No Add2 No Add3 No 1 Mult1 Yes MULTD M(A2) R(F4) Mult2 Yes DIVD M(A1) Mult1 Register result status: Clock F0 F2 F4 F6 F8 F10 F12 ... F30 14 FU Mult1 M(A2) (M-M+M(M-M) Mult2

170 Tomasulo Example Cycle 15 Instruction status: Exec Write Instruction j k Issue Comp Result Busy Address LD F6 34+ R2 1 3 4 Load1 No LD F2 45+ R3 2 4 5 Load2 No MULTD F0 F2 F4 3 15 Load3 No SUBD F8 F6 F2 4 7 8 DIVD F10 F0 F6 5 ADDD F6 F8 F2 6 10 11 Reservation Stations: S1 S2 RS RS Time Name Busy Op Vj Vk Qj Qk Add1 No Add2 No Add3 No 0 Mult1 Yes MULTD M(A2) R(F4) Mult2 Yes DIVD M(A1) Mult1 Register result status: Clock F0 F2 F4 F6 F8 F10 F12 ... F30 15 FU Mult1 M(A2) (M-M+M(M-M) Mult2

• Mult1 (MULTD) completing; what is waiting for it? 171 Tomasulo Example Cycle 16 Instruction status: Exec Write Instruction j k Issue Comp Result Busy Address LD F6 34+ R2 1 3 4 Load1 No LD F2 45+ R3 2 4 5 Load2 No MULTD F0 F2 F4 3 15 16 Load3 No SUBD F8 F6 F2 4 7 8 DIVD F10 F0 F6 5 ADDD F6 F8 F2 6 10 11 Reservation Stations: S1 S2 RS RS Time Name Busy Op Vj Vk Qj Qk Add1 No Add2 No Add3 No Mult1 No 40 Mult2 Yes DIVD M*F4 M(A1) Register result status: Clock F0 F2 F4 F6 F8 F10 F12 ... F30 16 FU M*F4 M(A2) (M-M+M(M-M) Mult2

• Just waiting for Mult2 (DIVD) to complete 172

(skip a couple of cycles)

173 Tomasulo Example Cycle 55 Instruction status: Exec Write Instruction j k Issue Comp Result Busy Address LD F6 34+ R2 1 3 4 Load1 No LD F2 45+ R3 2 4 5 Load2 No MULTD F0 F2 F4 3 15 16 Load3 No SUBD F8 F6 F2 4 7 8 DIVD F10 F0 F6 5 ADDD F6 F8 F2 6 10 11 Reservation Stations: S1 S2 RS RS Time Name Busy Op Vj Vk Qj Qk Add1 No Add2 No Add3 No Mult1 No 1 Mult2 Yes DIVD M*F4 M(A1) Register result status: Clock F0 F2 F4 F6 F8 F10 F12 ... F30 55 FU M*F4 M(A2) (M-M+M(M-M) Mult2

174 Tomasulo Example Cycle 56 Instruction status: Exec Write Instruction j k Issue Comp Result Busy Address LD F6 34+ R2 1 3 4 Load1 No LD F2 45+ R3 2 4 5 Load2 No MULTD F0 F2 F4 3 15 16 Load3 No SUBD F8 F6 F2 4 7 8 DIVD F10 F0 F6 5 56 ADDD F6 F8 F2 6 10 11 Reservation Stations: S1 S2 RS RS Time Name Busy Op Vj Vk Qj Qk Add1 No Add2 No Add3 No Mult1 No 0 Mult2 Yes DIVD M*F4 M(A1) Register result status: Clock F0 F2 F4 F6 F8 F10 F12 ... F30 56 FU M*F4 M(A2) (M-M+M(M-M) Mult2

• Mult2 (DIVD) is completing; what is waiting for it? 175 Tomasulo Example Cycle 57 Instruction status: Exec Write Instruction j k Issue Comp Result Busy Address LD F6 34+ R2 1 3 4 Load1 No LD F2 45+ R3 2 4 5 Load2 No MULTD F0 F2 F4 3 15 16 Load3 No SUBD F8 F6 F2 4 7 8 DIVD F10 F0 F6 5 56 57 ADDD F6 F8 F2 6 10 11 Reservation Stations: S1 S2 RS RS Time Name Busy Op Vj Vk Qj Qk Add1 No Add2 No Add3 No Mult1 No Mult2 Yes DIVD M*F4 M(A1) Register result status: Clock F0 F2 F4 F6 F8 F10 F12 ... F30 56 FU M*F4 M(A2) (M-M+M(M-M) Result

• Once again: In-order issue, out-of-order execution and out-of-order completion. 176 Tomasulo’s Scheme: Drawbacks ● Performance is limited by the CDB: – The CDB connects to multiple functional units, leading to high capacitance and high wiring density. – The number of functional units that can complete per cycle is limited to one!

● Multiple CDBs → more FU logic for parallel stores.

● Imprecise exceptions! – Effective handling is a major performance bottleneck.

177 Interrupts/Exceptions

● Interrupts: external, I/O devices, OS.

● Exceptions: internal, errors – Illegal op code, divide by 0, overflow/underflow, page faults.

● OS needs to intervene to handle exceptions.

178 Imprecise Exceptions

● An exception is called imprecise when: – “The processor state when an exception is raised, does not look exactly the same compared to when the instructions are executed in- order.”

179 Imprecise Exceptions

● In an out-of-order execution model, an imprecise exception is said to occur if, when the exception is raised by an instruction:

● some instructions before it (in program order) have not yet completed, and

● some instructions after it have already completed

● For example: – A floating point instruction exception could be detected after an integer instruction that is much later in the program order is complete. 180 Handling Imprecise Exceptions in Dynamic Scheduling

● Instructions are issued in-order: – But, may execute out-of-order. – However, unless control-dependence is resolved an instruction is not executed. – No instruction is allowed to initiate execution until all branches that precede the instructions are complete.

● This is a performance bottleneck: – Average basic block size is about 6 instructions. 181 Modern Computer Architectures

Lecture 14: Dynamic Instruction Scheduling: Loop Example

182 Tomasulo’s Scheme- Loop Example Loop: LD F0 0 R1 MULTD F4 F0 F2 SD F4 0 R1 SUBI R1 R1 #8 BNEZ R1 Loop

183 Tomasulo’s Scheme- Loop Example

● Assume Multiply takes 4 clocks. ● Assume: – 1st load takes 8 clocks (L1 cache miss) – 2nd load takes 1 clock (hit) ● To be clear, we will show clocks for SUBI, BNEZ: – Reality: integer instructions ahead of FP Instructions. ● Show 2 iterations 184 Loop Example Instruction status: ExecWrite ITER Instruction j k IssueCompResult Busy Addr Fu 1 LD F0 0 R1 Load1 No 1 MULTD F4 F0 F2 Load2 No Iter- 1 SD F4 0 R1 Load3 No ation Count 2 LD F0 0 R1 Store1 No 2 MULTD F4 F0 F2 Store2 No 2 SD F4 0 R1 Store3 No Reservation Stations: S1 S2 RS Added Store Buffers Time Name Busy Op Vj Vk Qj Qk Code: Add1 No LD F0 0 R1 Add2 No MULTD F4 F0 F2 Add3 No SD F4 0 R1 Mult1 No SUBI R1 R1 #8 Mult2 No BNEZ R1 Loop Register result status Instruction Loop

Clock R1 F0 F2 F4 F6 F8 F10 F12 ... F30 185 0 80 Fu Loop Example Cycle 1 Instruction status: Exec Write ITER Instruction j k Issue CompResult Busy Addr Fu 1 LD F0 0 R1 1 Load1 Yes 80 Load2 No Load3 No Store1 No Store2 No Store3 No Reservation Stations: S1 S2 RS Time Name Busy Op Vj Vk Qj Qk Code: Add1 No LD F0 0 R1 Add2 No MULTD F4 F0 F2 Add3 No SD F4 0 R1 Mult1 No SUBI R1 R1 #8 Mult2 No BNEZ R1 Loop Register result status Clock R1 F0 F2 F4 F6 F8 F10 F12 ... F30 1 80 Fu Load1

186 Loop Example Cycle 2 Instruction status: Exec Write ITER Instruction j k Issue CompResult Busy Addr Fu 1 LD F0 0 R1 1 Load1 Yes 80 1 MULTD F4 F0 F2 2 Load2 No Load3 No Store1 No Store2 No Store3 No Reservation Stations: S1 S2 RS Time Name Busy Op Vj Vk Qj Qk Code: Add1 No LD F0 0 R1 Add2 No MULTD F4 F0 F2 Add3 No SD F4 0 R1 Mult1 Yes Multd R(F2) Load1 SUBI R1 R1 #8 Mult2 No BNEZ R1 Loop Register result status Clock R1 F0 F2 F4 F6 F8 F10 F12 ... F30 2 80 Fu Load1 Mult1

187 Loop Example Cycle 3 Instruction status: Exec Write ITER Instruction j k Issue CompResult Busy Addr Fu 1 LD F0 0 R1 1 Load1 Yes 80 1 MULTD F4 F0 F2 2 Load2 No 1 SD F4 0 R1 3 Load3 No Store1 Yes 80 Mult1 Store2 No Store3 No Reservation Stations: S1 S2 RS Time Name Busy Op Vj Vk Qj Qk Code: Add1 No LD F0 0 R1 Add2 No MULTD F4 F0 F2 Add3 No SD F4 0 R1 Mult1 Yes Multd R(F2) Load1 SUBI R1 R1 #8 Mult2 No BNEZ R1 Loop Register result status Clock R1 F0 F2 F4 F6 F8 F10 F12 ... F30 3 80 Fu Load1 Mult1

188 ● Implicit renaming sets up data flow graph Loop Example Cycle 4 Instruction status: Exec Write ITER Instruction j k Issue CompResult Busy Addr Fu 1 LD F0 0 R1 1 Load1 Yes 80 1 MULTD F4 F0 F2 2 Load2 No 1 SD F4 0 R1 3 Load3 No Store1 Yes 80 Mult1 Store2 No Store3 No Reservation Stations: S1 S2 RS Time Name Busy Op Vj Vk Qj Qk Code: Add1 No LD F0 0 R1 Add2 No MULTD F4 F0 F2 Add3 No SD F4 0 R1 Mult1 Yes Multd R(F2) Load1 SUBI R1 R1 #8 Mult2 No BNEZ R1 Loop Register result status Clock R1 F0 F2 F4 F6 F8 F10 F12 ... F30 4 80 Fu Load1 Mult1

189 ● Dispatching SUBI Instruction (not in FP queue) Loop Example Cycle 5 Instruction status: Exec Write ITER Instruction j k Issue CompResult Busy Addr Fu 1 LD F0 0 R1 1 Load1 Yes 80 1 MULTD F4 F0 F2 2 Load2 No 1 SD F4 0 R1 3 Load3 No Store1 Yes 80 Mult1 Store2 No Store3 No Reservation Stations: S1 S2 RS Time Name Busy Op Vj Vk Qj Qk Code: Add1 No LD F0 0 R1 Add2 No MULTD F4 F0 F2 Add3 No SD F4 0 R1 Mult1 Yes Multd R(F2) Load1 SUBI R1 R1 #8 Mult2 No BNEZ R1 Loop Register result status Clock R1 F0 F2 F4 F6 F8 F10 F12 ... F30 5 72 Fu Load1 Mult1

● BNEZ instruction (not in FP queue) 190 Loop Example Cycle 6 Instruction status: Exec Write ITER Instruction j k Issue CompResult Busy Addr Fu 1 LD F0 0 R1 1 Load1 Yes 80 1 MULTD F4 F0 F2 2 Load2 Yes 72 1 SD F4 0 R1 3 Load3 No 2 LD F0 0 R1 6 Store1 Yes 80 Mult1 Store2 No Store3 No Reservation Stations: S1 S2 RS Time Name Busy Op Vj Vk Qj Qk Code: Add1 No LD F0 0 R1 Add2 No MULTD F4 F0 F2 Add3 No SD F4 0 R1 Mult1 Yes Multd R(F2) Load1 SUBI R1 R1 #8 Mult2 No BNEZ R1 Loop Register result status Clock R1 F0 F2 F4 F6 F8 F10 F12 ... F30 6 72 Fu Load2 Mult1

● Notice that F0 never sees Load from location 80 191 Loop Example Cycle 7 Instruction status: Exec Write ITER Instruction j k Issue CompResult Busy Addr Fu 1 LD F0 0 R1 1 Load1 Yes 80 1 MULTD F4 F0 F2 2 Load2 Yes 72 1 SD F4 0 R1 3 Load3 No 2 LD F0 0 R1 6 Store1 Yes 80 Mult1 2 MULTD F4 F0 F2 7 Store2 No Store3 No Reservation Stations: S1 S2 RS Time Name Busy Op Vj Vk Qj Qk Code: Add1 No LD F0 0 R1 Add2 No MULTD F4 F0 F2 Add3 No SD F4 0 R1 Mult1 Yes Multd R(F2) Load1 SUBI R1 R1 #8 Mult2 Yes Multd R(F2) Load2 BNEZ R1 Loop Register result status Clock R1 F0 F2 F4 F6 F8 F10 F12 ... F30 7 72 Fu Load2 Mult2

● Register file completely detached from computation 192 ● First and Second iteration completely overlapped Loop Example Cycle 8 Instruction status: Exec Write ITER Instruction j k Issue CompResult Busy Addr Fu 1 LD F0 0 R1 1 Load1 Yes 80 1 MULTD F4 F0 F2 2 Load2 Yes 72 1 SD F4 0 R1 3 Load3 No 2 LD F0 0 R1 6 Store1 Yes 80 Mult1 2 MULTD F4 F0 F2 7 Store2 Yes 72 Mult2 2 SD F4 0 R1 8 Store3 No Reservation Stations: S1 S2 RS Time Name Busy Op Vj Vk Qj Qk Code: Add1 No LD F0 0 R1 Add2 No MULTD F4 F0 F2 Add3 No SD F4 0 R1 Mult1 Yes Multd R(F2) Load1 SUBI R1 R1 #8 Mult2 Yes Multd R(F2) Load2 BNEZ R1 Loop Register result status Clock R1 F0 F2 F4 F6 F8 F10 F12 ... F30 8 72 Fu Load2 Mult2

193 Loop Example Cycle 9 Instruction status: Exec Write ITER Instruction j k Issue CompResult Busy Addr Fu 1 LD F0 0 R1 1 9 Load1 Yes 80 1 MULTD F4 F0 F2 2 Load2 Yes 72 1 SD F4 0 R1 3 Load3 No 2 LD F0 0 R1 6 Store1 Yes 80 Mult1 2 MULTD F4 F0 F2 7 Store2 Yes 72 Mult2 2 SD F4 0 R1 8 Store3 No Reservation Stations: S1 S2 RS Time Name Busy Op Vj Vk Qj Qk Code: Add1 No LD F0 0 R1 Add2 No MULTD F4 F0 F2 Add3 No SD F4 0 R1 Mult1 Yes Multd R(F2) Load1 SUBI R1 R1 #8 Mult2 Yes Multd R(F2) Load2 BNEZ R1 Loop Register result status Clock R1 F0 F2 F4 F6 F8 F10 F12 ... F30 9 72 Fu Load2 Mult2

● Load1 completing: who is waiting? 194

● Note: Dispatching SUBI Loop Example Cycle 10 Instruction status: Exec Write ITER Instruction j k Issue CompResult Busy Addr Fu 1 LD F0 0 R1 1 9 10 Load1 No 1 MULTD F4 F0 F2 2 Load2 Yes 72 1 SD F4 0 R1 3 Load3 No 2 LD F0 0 R1 6 10 Store1 Yes 80 Mult1 2 MULTD F4 F0 F2 7 Store2 Yes 72 Mult2 2 SD F4 0 R1 8 Store3 No Reservation Stations: S1 S2 RS Time Name Busy Op Vj Vk Qj Qk Code: Add1 No LD F0 0 R1 Add2 No MULTD F4 F0 F2 Add3 No SD F4 0 R1 4 Mult1 Yes Multd M[80] R(F2) SUBI R1 R1 #8 Mult2 Yes Multd R(F2) Load2 BNEZ R1 Loop Register result status Clock R1 F0 F2 F4 F6 F8 F10 F12 ... F30 10 64 Fu Load2 Mult2

● Load2 completing: who is waiting? 195 ● Note: Dispatching BNEZ Loop Example Cycle 11 Instruction status: Exec Write ITER Instruction j k Issue CompResult Busy Addr Fu 1 LD F0 0 R1 1 9 10 Load1 No 1 MULTD F4 F0 F2 2 Load2 No 1 SD F4 0 R1 3 Load3 Yes 64 2 LD F0 0 R1 6 10 11 Store1 Yes 80 Mult1 2 MULTD F4 F0 F2 7 Store2 Yes 72 Mult2 2 SD F4 0 R1 8 Store3 No Reservation Stations: S1 S2 RS Time Name Busy Op Vj Vk Qj Qk Code: Add1 No LD F0 0 R1 Add2 No MULTD F4 F0 F2 Add3 No SD F4 0 R1 3 Mult1 Yes Multd M[80] R(F2) SUBI R1 R1 #8 4 Mult2 Yes Multd M[72] R(F2) BNEZ R1 Loop Register result status Clock R1 F0 F2 F4 F6 F8 F10 F12 ... F30 11 64 Fu Load3 Mult2

● Next load in sequence 196 Loop Example Cycle 12 Instruction status: Exec Write ITER Instruction j k Issue CompResult Busy Addr Fu 1 LD F0 0 R1 1 9 10 Load1 No 1 MULTD F4 F0 F2 2 Load2 No 1 SD F4 0 R1 3 Load3 Yes 64 2 LD F0 0 R1 6 10 11 Store1 Yes 80 Mult1 2 MULTD F4 F0 F2 7 Store2 Yes 72 Mult2 2 SD F4 0 R1 8 Store3 No Reservation Stations: S1 S2 RS Time Name Busy Op Vj Vk Qj Qk Code: Add1 No LD F0 0 R1 Add2 No MULTD F4 F0 F2 Add3 No SD F4 0 R1 2 Mult1 Yes Multd M[80] R(F2) SUBI R1 R1 #8 3 Mult2 Yes Multd M[72] R(F2) BNEZ R1 Loop Register result status Clock R1 F0 F2 F4 F6 F8 F10 F12 ... F30 12 64 Fu Load3 Mult2

● Why not issue third multiply? 197 Loop Example Cycle 13 Instruction status: Exec Write ITER Instruction j k Issue CompResult Busy Addr Fu 1 LD F0 0 R1 1 9 10 Load1 No 1 MULTD F4 F0 F2 2 Load2 No 1 SD F4 0 R1 3 Load3 Yes 64 2 LD F0 0 R1 6 10 11 Store1 Yes 80 Mult1 2 MULTD F4 F0 F2 7 Store2 Yes 72 Mult2 2 SD F4 0 R1 8 Store3 No Reservation Stations: S1 S2 RS Time Name Busy Op Vj Vk Qj Qk Code: Add1 No LD F0 0 R1 Add2 No MULTD F4 F0 F2 Add3 No SD F4 0 R1 1 Mult1 Yes Multd M[80] R(F2) SUBI R1 R1 #8 2 Mult2 Yes Multd M[72] R(F2) BNEZ R1 Loop Register result status Clock R1 F0 F2 F4 F6 F8 F10 F12 ... F30 13 64 Fu Load3 Mult2

● Why not issue third store? 198 Loop Example Cycle 14 Instruction status: Exec Write ITER Instruction j k Issue CompResult Busy Addr Fu 1 LD F0 0 R1 1 9 10 Load1 No 1 MULTD F4 F0 F2 2 14 Load2 No 1 SD F4 0 R1 3 Load3 Yes 64 2 LD F0 0 R1 6 10 11 Store1 Yes 80 Mult1 2 MULTD F4 F0 F2 7 Store2 Yes 72 Mult2 2 SD F4 0 R1 8 Store3 No Reservation Stations: S1 S2 RS Time Name Busy Op Vj Vk Qj Qk Code: Add1 No LD F0 0 R1 Add2 No MULTD F4 F0 F2 Add3 No SD F4 0 R1 0 Mult1 Yes Multd M[80] R(F2) SUBI R1 R1 #8 1 Mult2 Yes Multd M[72] R(F2) BNEZ R1 Loop Register result status Clock R1 F0 F2 F4 F6 F8 F10 F12 ... F30 14 64 Fu Load3 Mult2

● Mult1 completing. Who is waiting? 199 Loop Example Cycle 15 Instruction status: Exec Write ITER Instruction j k Issue CompResult Busy Addr Fu 1 LD F0 0 R1 1 9 10 Load1 No 1 MULTD F4 F0 F2 2 14 15 Load2 No 1 SD F4 0 R1 3 Load3 Yes 64 2 LD F0 0 R1 6 10 11 Store1 Yes 80 [80]*R2 2 MULTD F4 F0 F2 7 15 Store2 Yes 72 Mult2 2 SD F4 0 R1 8 Store3 No Reservation Stations: S1 S2 RS Time Name Busy Op Vj Vk Qj Qk Code: Add1 No LD F0 0 R1 Add2 No MULTD F4 F0 F2 Add3 No SD F4 0 R1 Mult1 No SUBI R1 R1 #8 0 Mult2 Yes Multd M[72] R(F2) BNEZ R1 Loop Register result status Clock R1 F0 F2 F4 F6 F8 F10 F12 ... F30 15 64 Fu Load3 Mult2

● Mult2 completing. Who is waiting? 200 Loop Example Cycle 16 Instruction status: Exec Write ITER Instruction j k Issue CompResult Busy Addr Fu 1 LD F0 0 R1 1 9 10 Load1 No 1 MULTD F4 F0 F2 2 14 15 Load2 No 1 SD F4 0 R1 3 Load3 Yes 64 2 LD F0 0 R1 6 10 11 Store1 Yes 80 [80]*R2 2 MULTD F4 F0 F2 7 15 16 Store2 Yes 72 [72]*R2 2 SD F4 0 R1 8 Store3 No Reservation Stations: S1 S2 RS Time Name Busy Op Vj Vk Qj Qk Code: Add1 No LD F0 0 R1 Add2 No MULTD F4 F0 F2 Add3 No SD F4 0 R1 4 Mult1 Yes Multd R(F2) Load3 SUBI R1 R1 #8 Mult2 No BNEZ R1 Loop Register result status Clock R1 F0 F2 F4 F6 F8 F10 F12 ... F30 16 64 Fu Load3 Mult1

201 Loop Example Cycle 17 Instruction status: Exec Write ITER Instruction j k Issue CompResult Busy Addr Fu 1 LD F0 0 R1 1 9 10 Load1 No 1 MULTD F4 F0 F2 2 14 15 Load2 No 1 SD F4 0 R1 3 Load3 Yes 64 2 LD F0 0 R1 6 10 11 Store1 Yes 80 [80]*R2 2 MULTD F4 F0 F2 7 15 16 Store2 Yes 72 [72]*R2 2 SD F4 0 R1 8 Store3 Yes 64 Mult1 Reservation Stations: S1 S2 RS Time Name Busy Op Vj Vk Qj Qk Code: Add1 No LD F0 0 R1 Add2 No MULTD F4 F0 F2 Add3 No SD F4 0 R1 Mult1 Yes Multd R(F2) Load3 SUBI R1 R1 #8 Mult2 No BNEZ R1 Loop Register result status Clock R1 F0 F2 F4 F6 F8 F10 F12 ... F30 17 64 Fu Load3 Mult1

202 Loop Example Cycle 18 Instruction status: Exec Write ITER Instruction j k Issue CompResult Busy Addr Fu 1 LD F0 0 R1 1 9 10 Load1 No 1 MULTD F4 F0 F2 2 14 15 Load2 No 1 SD F4 0 R1 3 18 Load3 Yes 64 2 LD F0 0 R1 6 10 11 Store1 Yes 80 [80]*R2 2 MULTD F4 F0 F2 7 15 16 Store2 Yes 72 [72]*R2 2 SD F4 0 R1 8 Store3 Yes 64 Mult1 Reservation Stations: S1 S2 RS Time Name Busy Op Vj Vk Qj Qk Code: Add1 No LD F0 0 R1 Add2 No MULTD F4 F0 F2 Add3 No SD F4 0 R1 Mult1 Yes Multd R(F2) Load3 SUBI R1 R1 #8 Mult2 No BNEZ R1 Loop Register result status Clock R1 F0 F2 F4 F6 F8 F10 F12 ... F30 18 64 Fu Load3 Mult1

203 Loop Example Cycle 19 Instruction status: Exec Write ITER Instruction j k Issue CompResult Busy Addr Fu 1 LD F0 0 R1 1 9 10 Load1 No 1 MULTD F4 F0 F2 2 14 15 Load2 No 1 SD F4 0 R1 3 18 19 Load3 Yes 64 2 LD F0 0 R1 6 10 11 Store1 No 2 MULTD F4 F0 F2 7 15 16 Store2 Yes 72 [72]*R2 2 SD F4 0 R1 8 19 Store3 Yes 64 Mult1 Reservation Stations: S1 S2 RS Time Name Busy Op Vj Vk Qj Qk Code: Add1 No LD F0 0 R1 Add2 No MULTD F4 F0 F2 Add3 No SD F4 0 R1 Mult1 Yes Multd R(F2) Load3 SUBI R1 R1 #8 Mult2 No BNEZ R1 Loop Register result status Clock R1 F0 F2 F4 F6 F8 F10 F12 ... F30 19 56 Fu Load3 Mult1

204 Loop Example Cycle 20 Instruction status: Exec Write ITER Instruction j k Issue CompResult Busy Addr Fu 1 LD F0 0 R1 1 9 10 Load1 Yes 56 1 MULTD F4 F0 F2 2 14 15 Load2 No 1 SD F4 0 R1 3 18 19 Load3 Yes 64 2 LD F0 0 R1 6 10 11 Store1 No 2 MULTD F4 F0 F2 7 15 16 Store2 No 2 SD F4 0 R1 8 19 20 Store3 Yes 64 Mult1 Reservation Stations: S1 S2 RS Time Name Busy Op Vj Vk Qj Qk Code: Add1 No LD F0 0 R1 Add2 No MULTD F4 F0 F2 Add3 No SD F4 0 R1 Mult1 Yes Multd R(F2) Load3 SUBI R1 R1 #8 Mult2 No BNEZ R1 Loop Register result status Clock R1 F0 F2 F4 F6 F8 F10 F12 ... F30 20 56 Fu Load1 Mult1 • Once again: In-order issue, out-of-order execution and out-of-order completion. 205 Why Can Tomasulo’s Scheme Overlap Iterations of Loops?

● Register renaming using reservation stations: –Avoids the WAR stall that occurred in the scoreboard. –Also, multiple iterations use different physical destinations facilitating dynamic loop unrolling.

206 Tomasulo’s Scheme Offers Three Major Advantages

• 1. Distribution of hazard detection logic: – Distributed reservation stations. – If multiple instructions wait on a single result,

● the result can be passed to all of them simultaneously by broadcast on the CDB. – If a centralized register file were used,

● the waiting units would have to read their results from the registers one at a time. ● 2. Elimination of stalls for WAW and WAR hazards. ● 3. Possible to have superscalar execution: – Because results are directly available to the FUs, rather than read from registers.

207 Modern Computer Architectures

Lecture 15: (Cont…)

208 Speculative Execution

● So far, we have seen that dynamic instruction scheduling can improve pipeline performance: – Effectively takes care of data hazards.

● For higher performance: – We need to overcome control hazards.

● Main idea: After predicting a branch: – Continue executing assuming that the prediction is correct.

209 What is Speculative Execution?

● Allow an instruction to execute even before its control dependencies are resolved: – Speculating that all predictions are accurate.

210 Speculative Execution

● Extends dynamic scheduling scheme.

● In dynamic scheduling, instructions are fetched and issued: – In speculative execution, instructions are fetched, issued, and executed.

● Implemented in many modern processors: – PowerPC, Pentium, AMD Athlon, etc.

211 Speculative Execution

● Recollect that in Tomasulo’s approach: – Until the controlling branches have executed, – Instructions only allowed to be fetched and issued (but not actually executed).

● Speculation takes this approach a step further: – Actually executes an instruction based on branch prediction.

212 Hardware-Based Speculative Execution

● In speculative execution: – Fetch, issue, and execute instructions as if the branch predictions were correct. – Dynamic scheduling only fetches and issues such instructions.

213 Hardware-Based Speculation ● Combines three main ideas: – Profile-based branch prediction. – Speculation: allow instructions to be executed even before control dependences are resolved. – Dynamic instruction scheduling.

● Used in almost every modern processor: – PowerPC, Pentium, Alpha, AMD Athlon, etc.

214 What If Speculative Execution Needs to be Undone

● An instruction should execute in shadow: – Until it commits.

● Also, we must separate passing of results among instructions: – From actual completion of an instruction. – This would allow an instruction to be undone if a branch prediction turns out to be incorrect. 215 Support for Shadow Execution ● Additional set of hardware registers: – Called reorder buffers (ROBs) – Store results of instructions (in shadow) that have completed but not yet committed.

● Reason for calling it a “reorder buffer”: – Even though instructions may complete in any order: – They are reordered in the ROB so that they commit in-order.

216 Reorder Buffers (ROB)

● Put instructions back to order: – Instructions enter ROB out of order – Instructions leave ROB in order

● Results of an instruction become visible externally when it leaves ROB: – Registers updated; memory updated
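A reorder-buffer entry and the in-order commit step can be sketched in C as below; the field names, the register-only commit (stores omitted), and the circular-buffer layout are simplifying assumptions consistent with the description above.

#include <stdbool.h>

typedef struct {
    bool   busy;       /* allocated to an issued instruction            */
    bool   ready;      /* result computed (seen on the CDB)             */
    int    dest_reg;   /* architectural register to update              */
    double value;      /* result held "in shadow" until commit          */
} ROBEntry;

/* Commit: retire entries strictly in order from the head of the ROB. */
void commit(ROBEntry rob[], int *head, int size, double regs[]) {
    while (rob[*head].busy && rob[*head].ready) {
        regs[rob[*head].dest_reg] = rob[*head].value;  /* state becomes visible */
        rob[*head].busy = false;
        *head = (*head + 1) % size;                    /* advance circular head */
    }
}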

217 Speculative Execution: A Central Idea ● Allow instruction to execute out-of-order: – Force them to complete in-order – Also, prevent any irrevocable action (updating state, or generating exceptions) occurring out of order.

218 Major Changes Over Tomasulo’s Scheme

● Two major changes compared to the Tomasulo’s scheme are: – Addition of ROB – Elimination of the Store buffer whose functions are integrated into ROB.

219 Speculative Execution

(Figure: speculative Tomasulo datapath – instructions come from the instruction unit into the instruction queue; a reorder buffer holding Reg#/Data sits in front of the registers; the address unit, load buffers, and reservation stations feed the adder and multiplier; results are broadcast on the CDB to the memory unit and all waiting units.)

220 Four Steps of Speculative Execution

● Issue: – Issue an instruction if there is an empty reservation station and an empty slot in ROB.

● Execute: – If one or more operands not yet available:

● monitor CDB and wait. – Executing an instruction may take multiple cycles.

● Write result: Write to CDB

● Received at waiting reservation stations and ROB.

221 Four Steps of Speculative Execution

● Commit: – Normal commit: update registers, remove from ROB. – Store: Similar to normal commit, except that memory is updated rather than registers. – Branch with incorrect prediction: ROB is flushed, execution restarted at correct successor of the branch. 222 Speculation During Exceptional Events

● Most pipelines allow: – Only low-cost exceptional events to be handled in speculative mode.

● Processor is stalled for expensive exceptional events, such as: – Second-level cache miss, or TLB miss.

223 What About Imprecise Exceptions?

● Speculative execution scheme: –In-order issue, out-of-order execution, and out-of-order completion.

● Need to “fix” the imprecise exceptions in out-of-order completion.

224 Fixing Imprecise Exceptions

● An instruction executes in shadow: – Until it completes (commits). – Exceptions are masked until instruction commits.

● Passing of the results among instruction: – Separated from actual completion of instruction. – Allows an instruction to be undone.

● Instructions execute out of order, but commit in order.

225 Software-based Scheduling vs. Hardware-based Scheduling • Advantage of the Software Approach: Unlike with hardware-based approaches: • Overheads due to the analysis of an instruction sequence are not an issue. • We can afford to perform a more detailed analysis of the instruction sequence. • It is possible to consider many more factors in optimizing an instruction sequence.

226 Advanced Compiler Support for Exploiting ILP

● Remember loop-carried dependence: – Data values produced in earlier iterations are used in later iterations. – This prevents concurrent execution of the instructions from different loop iterations.

227 Loop-Carried Dependence

● Example:

– for(i=0;i<1000;i++){
–   a[i+1]=b[i]+c[i];
–   b[i+1]=a[i+1]+d[i];
– }

● Loop-carried dependence: each iteration uses the value b[i] produced by the preceding iteration.

● Also, loop-independent dependence on account of a[i+1]

228 Eliminating Loop-Carried Dependences

● When there are circular dependences across loop iterations: – Loop-carried dependences can not be eliminated through code transformations.

● Circular dependency: – One iteration (m) uses the result of another iteration (say n). – At the same time, the other iteration (n) uses results computed by this iteration (m).

229 Transformation of A Basic Block ● It is possible to rewrite a loop to eliminate loop-carried dependences:

– Only if, there are no cyclic dependences.

With dependence:
for(i=1;i<1000;i++){
  a[i]=a[i]+b[i];
  b[i+1]=c[i]+d[i];
}

Without dependence:
a[1]=a[1]+b[1];
for(i=1;i<999;i++){
  b[i+1]=c[i]+d[i];
  a[i+1]=a[i+1]+b[i+1];
}
b[1000]=c[999]+d[999];

230 Example of Cyclic Dependence

● Example:

for(i=1;i<1000;i++){
  a[i]=a[i+1]+b[i];
  b[i+1]=c[i]+d[i];
}

● Iteration i produces a result for iteration i+1.
● Iteration i uses a result produced by iteration i+1.
● Hence the dependence is cyclic.

231 Recurrent Dependence

● Sometimes a loop-carried dependence is in the form of a recurrence:

for(i=1;i<1000;i++){
  a[i]=a[i-5]+b[i];
}

● Dependence distance = 5. (Recurrent dependence)

232 Recurrent Dependence

● The larger the dependence distance: – The more parallelism can be obtained by unrolling the loop.

● For a loop with a dependence distance of 5: – Any sequence of 5 consecutive iterations has no dependences among its statements.

● Some architectures (e.g. vector processors): – Special support for handling recurrences.

233 Dependence Analysis

● Nearly all dependence analysis algorithms work on the assumption: – Array indices are affine.

● An array index is affine, iff: – It can be written in the form a*i + b, where i is the loop index. – Successive iterations then access elements a fixed stride apart.

● A multidimensional array is affine, iff: – Index along each direction is affine.

● Give example of an array index that is not affine.

234 Dependence Analysis

● Sparse arrays usually have indices of the form: – x[y[i]] : non-affine

● A loop-carried dependence exists iff: – There are two iteration indices j and k, both within the limits of the loop, such that – the jth iteration stores to element a*j+b and the kth iteration reads from element c*k+d, and – a*j+b = c*k+d

235 GCD Test

● When the array indices are affine:

– If a loop-carried dependence exists, GCD(c,a) must divide d-b.

● Example: Examine if there exists any loop-carried dependence in:

for(i=1;i<1000;i++){
    x[2*i+3]=x[2*i]+5.0;
}

Here a=2, b=3, c=2, d=0; GCD(a,c)=2 and d-b=-3. Since 2 does not divide -3, no dependence exists.
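The test above can be sketched as a small C helper (an illustration, not part of any particular compiler; it assumes the write index is a*i+b and the read index is c*i+d, as in the slide):

#include <stdlib.h>   /* abs */

/* Euclid's algorithm for the greatest common divisor. */
static int gcd(int x, int y) {
    while (y != 0) { int t = x % y; x = y; y = t; }
    return x;
}

/* GCD test for affine indices: store to x[a*i+b], load from x[c*i+d].
 * Returns 0 when a loop-carried dependence is impossible;
 * 1 means a dependence *may* exist (the test is conservative). */
int gcd_test_may_depend(int a, int b, int c, int d) {
    int g = gcd(abs(a), abs(c));
    if (g == 0) return 1;              /* degenerate indices: be conservative */
    return ((d - b) % g) == 0;
}

/* Slide's example: a=2, b=3, c=2, d=0 -> GCD=2, d-b=-3, so no dependence. */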

236 GCD Test: A Reflection

● Is it possible that the GCD test succeeds,

– But, there is actually no dependence? – Yes. – This may be because the GCD test does not take the loop bound constraint into account.

● It may not be possible to apply GCD test

– When the values of a,b,c,d are not known at compile time – a,b,c, or d may be defined at run time.

237 Modern Computer Architectures

Lecture 16:Superscalar and VLIW Processors

238 A Practice Problem on Dependence Analysis

● Identify all dependences in the following code.

● Transform the code to eliminate the dependences.

for(i=1;i<1000;i++){ y[i]=x[i]/c; x[i]=x[i]+c; z[i]=y[i]+c; y[i]=c-y[i]; }

239 Transformed Code Without Dependence

for(i=1;i<1000;i++){ t[i]=x[i]/c; x[i]=x[i]+c; z[i]=t[i]+c; y[i]=c-t[i]; }

240 Global Code Scheduling

● Simple code transformations work well, only if: – The loop body is a straight line code.

● Issues become more complex in the presence of: – Nested loops, nested branches, etc.

● Instructions might have to be moved across branches: – This is called global code scheduling.

241 Two Paths to Higher ILP

● Superscalar processors: – Multiple issue, dynamically scheduled, speculative execution, branch prediction – More hardware functionalities and complexities.

● VLIW: – Let the compiler take on the complexity. – Simple hardware, smart compiler.

242 Superscalar Execution

● Scheduling of instructions is determined by a number of factors: – True Data Dependency: The result of one operation is an input to the next. – Resource constraints: Two operations require the same resource. – Branch Dependency: Scheduling instructions across conditional branch statements cannot be done deterministically a-priori.

● An appropriate number of instructions is issued per cycle: – If up to m instructions can be issued in a cycle, the processor is called a superscalar processor of degree m.

243 Very Long Instruction Word (VLIW) Processors

● Hardware cost and complexity of superscalar schedulers is a major consideration in processor design. – VLIW processors rely on compile time analysis to identify and bundle together instructions that can be executed concurrently.

● These instructions are packed and dispatched together, – Thus the name very long instruction word. – This concept is employed in the Intel IA64 processors.

244 VLIW Processors

● The compiler has complete responsibility for selecting a set of instructions: – That can be executed concurrently.

● VLIW processors have static instruction issue capability: – As compared, superscalar processors have dynamic issue capability.

245 The Basic VLIW Approach

● VLIW processors deploy multiple independent functional units.

● Early VLIW processors operated in lock step: – There was no hazard detection in hardware at all. – A stall in any functional unit caused the entire pipeline to stall. 246 VLIW Processors

● Assume a 4-issue static superscalar processor: – During fetch stage, 1 to 4 instructions would be fetched. – The group of instructions that could be issued in a single cycle are called:

● An issue packet or a Bundle. – If an instruction could cause a structural or data hazard:

● It is not issued.

247 VLIW (Very Long Instruction Word) ● One single VLIW instruction:

– separately targets different functional units.

● MultiFlow TRACE, TI C6X, IA-64

[Schematic of a VLIW instruction: one bundle — add r1,r2,r3 | load r4,r5+4 | mov r6,r2 | mul r7,r8,r9 — with each slot dispatched to its own functional unit (FU).]

248 VLIW Processors: Some Considerations

● Issue hardware is simpler.

● Compiler has a bigger context from which to select co-scheduled instructions.

● Compilers, however, do not have runtime information such as cache misses. – Scheduling is, therefore, inherently conservative. – Branch and memory prediction is more difficult.

● Typical VLIW processors are limited to 4-way to 8-way parallelism.

249 VLIW Summary

● Each “instruction” is very large – Bundles multiple operations that are independent.

● Compiler detects hazards and determines the schedule.

● There is no (or only partial) hardware hazard detection: – No dependence check logic for instructions issued at the same cycle.

● Tradeoff instruction space for simple decoding – The long instruction word has room for many operations. – But slots have to be filled with NOPs if enough independent operations cannot be found (see the sketch below). 250
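A rough illustration of the bundle idea (not the encoding of any real VLIW machine): a bundle is a fixed set of operation slots, one per functional unit, and any slot the compiler cannot fill becomes a NOP.

#define N_SLOTS 4              /* e.g. one slot each for ALU, load/store, move, multiply units */

typedef enum { OP_NOP, OP_ADD, OP_LOAD, OP_MOV, OP_MUL } opcode_t;

typedef struct { opcode_t op; int dst, src1, src2; } operation_t;

typedef struct { operation_t slot[N_SLOTS]; } bundle_t;   /* one "very long instruction" */

/* The compiler packs independent operations; any slot it cannot fill is a NOP. */
static void init_bundle(bundle_t *b) {
    for (int i = 0; i < N_SLOTS; i++) {
        b->slot[i].op = OP_NOP;
        b->slot[i].dst = b->slot[i].src1 = b->slot[i].src2 = 0;
    }
}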

VLIW vs Superscalar

● VLIW - Compiler finds parallelism: – Superscalar – hardware finds parallelism

● VLIW – Simpler hardware: – Superscalar – More complex hardware

● VLIW – less parallelism can be exploited for a typical program: – Superscalar – Better performance

251 Superscalar Processors

● Commercial desktop processors now do four or more issues per clock: – Even in the embedded processor market, dual issue superscalar pipelines are becoming common.

252 Superscalar Execution With Dynamic Scheduling

● Multiple instruction issue: – Very well accommodated with dynamic instruction scheduling approach.

● The issue stage can be: – Replicated, pipelined, or both.

253 Limitations of Scalar Pipelines: A Reflection

● Maximum throughput bounded by one instruction per cycle.

● Inefficient unification of instructions into one pipeline: – ALU, MEM stages very diverse eg: FP

● Rigid nature of in-order pipeline: – If a leading instruction is stalled every subsequent instruction is stalled

254 A Rigid Pipeline

[Figure: a rigid in-order pipeline — stalls propagate backward, and bypassing of a stalled instruction is not allowed.]

255 Solving Problems of Scalar Pipelines: Modern Processors

● Maximum throughput bounded by one instruction per cycle: – parallel pipelines (superscalar)

● Inefficient unification into a single pipeline: – diversified pipelines.

● Rigid nature of in order pipeline – Allow out of ordering or dynamic instruction scheduling.

256 Machine Parallelism

(a) No Parallelism (Nonpipelined) (b) Temporal Parallelism (Pipelined) (c) Spatial Parallelism (Multiple units) (d) Combined Temporal and Spatial Parallelism

257 A Parallel Pipeline

Width = 3

258 Scalar and Parallel Pipeline

(a) The five-stage scalar pipeline (b) The five-stage Pentium Parallel Pipeline of width=2

259 Diversified Parallel Pipeline

260 A Dynamically Scheduled Speculative Pipeline

261 Distributed Reservation Stations

262 A Superscalar Pipeline

A degree six superscalar pipeline

263 Superscalar Pipeline Design

[Figure: superscalar pipeline stages and the buffers between them — Fetch → Instruction Buffer → Decode → Dispatch Buffer → Dispatch → Issuing Buffer → Execute → Completion Buffer → Complete → Store Buffer → Retire; the front end is governed by instruction flow, the back end by data flow.]

264 A Superscalar MIPS Processor

● Assume two instructions can be issued per clock cycle: – One of the instructions can be a load, store, or integer ALU operation. – The other can be a floating-point operation.

265 MIPS Pipeline with Pipelined Multi- Cycle Operations

[Figure: MIPS pipeline with pipelined multi-cycle FP units — IF and ID feed either the integer EX stage, the seven-stage multiplier (M1–M7), the four-stage FP adder (A1–A4), or an unpipelined DIV unit, before MEM and WB.]

Pipelined implementations ex: 7 outstanding MUL, 4 outstanding Add, unpipelined DIV. In-order execution, out-of-order completion

Tomasulo w/o ROB: out-of-order execution, out-of-order completion, in-order commit. 266 Modern Computer Architectures

Lecture 17:Superscalar and Vector Processors

267 Vector Processing

● What is a vector processor? – A vector processor supports high-level operations (add, subtract, multiply, etc.) on vectors. – SIMD processing

● A typical instruction might add two 64 element FP vectors.

● Commercialized long before ILP machines.

268 Vector Processing

[Figure: SCALAR (1 operation) vs. VECTOR (N operations) — add r3, r1, r2 adds the scalar registers r1 and r2 into r3; add.vv v3, v1, v2 adds the vector registers v1 and v2 element-wise into v3, over the current vector length.]
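Written as scalar C, a single vector add such as add.vv v3, v1, v2 corresponds to the loop below (64-element vectors assumed); every element operation is independent, so the hardware can pipeline or replicate the adds freely:

/* What one vector add does, expressed element-by-element in C. */
void vadd(double v3[64], const double v1[64], const double v2[64]) {
    for (int i = 0; i < 64; i++)
        v3[i] = v1[i] + v2[i];
}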

269 Why Vector Processors?

● One vector instruction is equivalent to executing an entire loop:

– Reduces instruction fetch and decode overheads and bandwidth.

● Each instruction is guaranteed to be independent of other instructions in the same vector:

– No data hazard check needed in an instruction. – Executed using an array of functional units, or a deep pipeline.

270 Why Vector Processors?

● Hardware needs to only check for data hazards between two instructions:

– Once per two vector instructions. – More instructions handled per data check.

● Memory access for entire vector, not a single word.

– Reduced Latency

● Multiple vector instructions in progress.

– Further parallelism

271 Basic Vector Architectures ● Two Types: – Vector-register:

● All operations except load and store based on registers. – Memory-memory:

● All operations are memory to memory.

● A vector register:

– Fixed length, holds a single vector

272 Issue 1: Memory Bandwidth

● Problem:

– Memory system needs to be able to produce and accept large amounts of data. – But how do we achieve this when there is poor access time?

● Solution:

– Creating multiple memory banks.

● Also, useful for fragmented accesses.

● Supports multiple loads per clock cycle.

273 Issue 2: Vector Length

● Problem:

– How do we support operations where the length is unknown or not the same as vector length?

● Solution:

– Provide a vector-length register; this alone solves the problem only if the actual length is at most the maximum vector length (MVL). – Otherwise, use a technique called strip mining.

274 Vector Length Register (VLR)

● A vector register can hold some maximum number of elements for each data width – maximum vector length or MVL.

● What to do when the application vector length is not exactly MVL? – Vector-length register(VLR) controls the length of any vector operation, including a vector load or store – E.g. vadd with VL=10 is for (I=0; I<10; I++) V1[I]=V2[I]+V3[I]

● VL can be anything from 0 to MVL.

● How do you code an application where the vector length is not known until run-time?

275 Strip Mining

● Helps handle vector operations for sizes greater than MVL.

● Creates 2 loops:

– One that handles any number of iterations multiple of MVL. – Another that handles the remaining iterations.

● Code becomes vectorizable.

● Careful handling of VLR needed.

276 Example: Strip Mining

low = 1;                              /* assume vectors start at element 1 */
vL = n % mvL;                         /* find the odd-size piece first     */
for (j = 0; j <= n/mvL; j++) {        /* outer loop                        */
    for (i = low; i <= low+vL-1; i++) {   /* inner loop runs for length vL */
        y[i] = a*x[i] + y[i];
    }
    low = low + vL;                   /* find the start of the next vector */
    vL = mvL;                         /* reset length to the maximum       */
}

277 Present Applications of Vector Processors: Media Processing

● Desktop: – 3D graphics (games) – Speech recognition (voice input) – Video/audio decoding (mpeg-mp3 playback) ● Servers: – Video/audio encoding (video servers, IP telephony) – Digital libraries and media mining (video servers) – Computer animation, 3D modeling & rendering (movies) ● Embedded: – 3D graphics (game consoles) – Video/audio decoding & encoding (set top boxes) – Image processing (digital cameras) – Signal processing (cellular phones)

278 Basic Idea

● Media Processors are short vector processors.

● Exploit sub-word parallelism: – Treat a 64-bit register as a vector of 2 32-bit or 4 16-bit or 8 8-bit values (short vectors) – Partition 64-bit data paths to handle multiple narrow operations in parallel
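A software sketch of the same idea (sometimes called SWAR): four 16-bit lanes packed into one 64-bit word are added with a masking trick that keeps carries from crossing lane boundaries. This only illustrates the principle; real media extensions do it in hardware and typically add saturation as well.

#include <stdint.h>

/* Add four packed 16-bit lanes held in 64-bit words, without letting a
 * carry from one lane spill into the next (per-lane wrap-around add). */
uint64_t paddw_4x16(uint64_t x, uint64_t y) {
    const uint64_t H = 0x8000800080008000ULL;   /* top bit of every lane        */
    uint64_t low_sum = (x & ~H) + (y & ~H);     /* add the low 15 bits per lane */
    return low_sum ^ ((x ^ y) & H);             /* restore each lane's top bit  */
}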

279 Characteristics of Multimedia Applications ● Narrow data-types: – Typical width of data in memory: 8 to 16 bits – Typical width of data during computation: 16 to 32 bits – 64-bit data types are rarely needed. – Fixed-point arithmetic often replaces floating- point.

● Fine-grain (data) parallelism: – Identical operation applied on streams of data – Branches have high predictability – High instruction locality in small loops or kernels

280 Characteristics of Media Applications

● Most audio/video samples are processed independently of each other: – Very few branches. – Ten or more operations can be scheduled in parallel. – In contrast, other applications suffer from large data hazards and control hazards.

281 Overview of SIMD Extensions

Vendor     Extension      Year    # Instr         Registers
HP         MAX-1 and 2    94, 95  9, 8 (int)      Int 32x64b
Sun        VIS            95      121 (int)       FP 32x64b
Intel      MMX            97      57 (int)        FP 8x64b
AMD        3DNow!         98      21 (fp)         FP 8x64b
Motorola   Altivec        98      162 (int, fp)   32x128b (new)
Intel      SSE            98      70 (fp)         8x128b (new)
MIPS       MIPS-3D        ?       23 (fp)         FP 32x64b
AMD        E 3DNow!       99      24 (fp)         8x128 (new)
Intel      SSE-2          01      144 (int, fp)   8x128 (new)

282 Compilers for SIMD Extensions

● No commercially available compiler so far.

● Problems: – Language support for expressing fixed- point arithmetic and SIMD parallelism. – Complicated model for loading/storing vectors. – Frequent updates.

● Assembly coding is prevalent.

283 Superscalar Versus Vector Processing

● A vector processor can efficiently exploit parallelism from regular code:

– Matrix operations. – Multimedia operations, scientific computations, etc.

● A superscalar processor can exploit reasonable amount of parallelism in less structured code:

– Typical programs.

284 Intel MMX: A Vector Processor

● 3 data types: 8 8-bit, 4 16-bit, 2 32-bit in 64bits – reuse 8 FP registers (FP and MMX cannot mix)

● short vector: load, add, store 8 8-bit operands

+

● Claim: overall speedup 1.5 to 2X for 2D/3D graphics, audio, video, speech, comm., ... – use in drivers or added to library routines; no compiler.

285 MMX Instructions

● Move 32b, 64b

● Add, Subtract in parallel: 8 8b, 4 16b, 2 32b: – Signed/unsigned saturate (set to max) if overflow

● Shifts (sll,srl, sra), And, And Not, Or, Xor in parallel: 8 8b, 4 16b, 2 32b

● Multiply, Multiply-Add in parallel: 4 16b

● Compare = , > in parallel: 8 8b, 4 16b, 2 32b – sets field to 0s (false) or 1s (true);

286 Modern Computer Architectures

Lecture 18: A Survey of Some Commercial Processors

287 A Survey of Some Modern Processors

● In the subsequent slides: – We shall examine some modern commercial processors. – The objective is to examine how architectural innovations we discussed have been used in commercial processors.

288 Early Intel

● Intel 8080 – 64K addressable RAM – 8-bit registers – CP/M operating system – S-100 BUS architecture – 8-inch floppy disks!

● Intel 8086/8088 – Used in IBM-PCs – 1 MB addressable RAM – 16-bit registers – 16-bit data bus (8-bit for 8088) – separate floating-point unit (8087)

289 Microprocessors

● Intel 8086, 80286

● IA-32 processor family

processor family

● Netburst family

290 x86 Processor History ● IA-16 Processors

● 8086 - Intel's first 16bit PC microprocessor.

● 8088 - A minor refinement of the 8086.

● 80186 - An extension to the 8086.

● 80286 - A reasonably successful extension to the 8086.

● IA-32 Processors

● 80386 - Intel's first (32-bit, protected mode) processor.

● 80486 - A much improved 80386, use of instruction pipe.

● 80586 - A much improved 80486, named the Pentium.

● 80686 - An improved 80586, named the Pentium Pro.

● 80586+MMX - A refined 80586, faster, and with MMX extensions, named the Pentium MMX.

● 80686+MMX - A refined 80686, faster, and with MMX extensions, named the Pentium II.

● 80686+MMX+SSI - A refined 80686+MMX, with SSE extensions, named the Pentium III.

● IA-64 Processors

● Newer "Pentium" processors, Intel's attempt at a 64-bit architecture.

291 Intel IA-32 Family

● Intel 386: 1985 – 4 GB addressable RAM, 32-bit registers, paging ().

● Intel 486: 1989 – Instruction pipelining

● Pentium: 1993 – Superscalar, 32-bit address bus, 64- bit internal data path.

292 IA-32

● 8086 began an evolution:

– Eventually resulted in IA-32 family of object code compatible microprocessors.

● IA-32 is a CISC architecture:

– Variable length instructions and complex addressing modes. – Turned out to be the most dominant architecture of the time in terms of sales volume. – 1985: Intel 386 – 1989: First pipelined version of IA-32 family Intel 486 was introduced.

293 386 and onto 486

● 80386 was first IA-32 implementation: – Included several architectural improvements in addition to the wider data path.

● Perhaps the most important feature was extension of the virtual memory architecture: – Includes both the segmentation used in the 80286 and paging --- the preferred technique in the Unix world.

● 486 entirely improved 386: – Pipelined – With an on-chip floating point unit.

294 Intel 486 5-Stage “CISC” Pipeline

Stage name                 Function performed
1. Instruction Fetch       Fetch instruction from the 32-byte prefetch queue
2. Instruction Decode-1    Translate instruction into control signals or microcode address; initiate address generation and memory access
3. Instruction Decode-2    Access microcode memory; output the microinstruction to the execution unit
4. Execute                 Execute ALU and memory-accessing operations
5. Register Write-back     Write back results to the register file

295 i486 Pipeline

● Fetch – Load 16 bytes of instruction into the prefetch buffer
● Decode1 – Determine instruction length and instruction type
● Decode2 – Compute memory address – Generate immediate operands
● Execute – Register read – ALU operation – Memory read/write
● Write-Back – Update register file

296 A Reflection on 486 Pipeline

● Two Decoding Stages: –Harder to decode CISC instructions. –Inevitable due to microcoded control. –Effective address calculation in D2.

● Multicycle Decoding Stages: –For more difficult decodings. –Stalls incoming instructions.

297 A Note on the 486 CISC Pipeline ● The EXE stage performs : – Both ALU operations as well as cache access.

● Two penalty cycles are incurred: – If an instruction produces a register result and the next instruction uses this result for address generation.

● Pipelined 486 could achieve performance improvement by a factor of about 25 over 386.

298 486 vs. 386

● Cycles Per Instruction

Instruction Type    386 Cycles    486 Cycles
Load                4             1
Store               2             1
ALU                 2             1
Jump taken          9             3
Jump not taken      3             1
Call                9             3

● Reasons for Improvement: – On-chip cache ● Faster loads & stores – Deeper pipeline

299 The Intel P5 and P6 Family

Year   Type                 Transistors (x1000)   Feature size (µm)   Clock (MHz)   Issue   Word format   L1 cache          L2 cache

P5:
1993   Pentium              3100                  0.8                 66            2       32-bit        2 x 8 kB
1994   Pentium              3200                  0.6                 75-100        2       32-bit        2 x 8 kB
1995   Pentium              3200                  0.6/0.35            120-133       2       32-bit        2 x 8 kB
1996   Pentium              3300                  0.35                150-166       2       32-bit        2 x 8 kB
1997   Pentium MMX          4500                  0.35                200-233       2       32-bit        2 x 16 kB
1998   Mobile Pentium MMX   4500                  0.25                200-233       2       32-bit        2 x 16 kB

P6:
1995   PentiumPro           5500                  0.35                150-200       3       32-bit        2 x 8 kB          256/512 kB
1997   PentiumPro           5500                  0.35                200           3       32-bit        2 x 8 kB          1 MB
1998   Intel Celeron        7500                  0.25                266-300       3       32-bit        2 x 16 kB         --
1998   Intel Celeron        19000                 0.25                300-333       3       32-bit        2 x 16 kB         128 kB
1997   Pentium II           7000                  0.25                233-450       3       32-bit        2 x 16 kB         256 kB/512 kB
1998   Mobile Pentium II    7000                  0.25                300           3       32-bit        2 x 16 kB         256 kB/512 kB
1998   Pentium II           7000                  0.25                400-450       3       32-bit        2 x 16 kB         512 kB/1 MB
1999   Pentium II Xeon      7000                  0.25                450           3       32-bit        2 x 16 kB         512 kB/2 MB
1999   Pentium III          8200                  0.25                450-1000      3       32-bit        2 x 16 kB         512 kB
1999   Pentium III Xeon     8200                  0.25                500-1000      3       32-bit        2 x 16 kB         512 kB

NetBurst:
2000   Pentium 4            42000                 0.18                1500          3       32-bit        8 kB / 12k µOps   256 kB

(The Pentium 4 transistor count includes the L2 cache.)

300 Pentium Block Diagram

[Pentium block diagram omitted; source: Microprocessor Report, 10/28/92.]

301 Pentium Overview

● Architecturally Pentium is vastly different from 486.

● Pentium is essentially one full 486 execution unit (EU), called the U pipe: – Plus a second, stripped-down unit called the V pipe.

● The two pipes are capable of executing instructions simultaneously: – Separate write buffers and even simultaneous access to the data cache. – This is how the Pentium is a superscalar of degree two. 302

Pentium Overview cont… ● How can Pentium supply data and instructions at a much faster rate? – At least twice as fast as 486?

● 486 has a single 8K L1 data/instruction cache: – Pentium has two separate 8K L1 caches, one for code and the other for data.

● Also, the Pentium expands the 486's 32-byte prefetch queue to 128 bytes.

303 486 vs. 586 Pipeline

(a) The five-stage i486 scalar pipeline (b) The five-stage Pentium Parallel Pipeline of width=2

304 Pentium Pipeline

[Figure: the Pentium pipeline — stage 1: fetch and align instruction; stage 2: decode instruction, generate control word; then, in each of the U-pipe and V-pipe: decode control word and generate memory address; access the data cache or calculate the ALU result; write the register result.]

305 Superscalar Execution

● Can Execute Instructions I1 & I2 in Parallel if: – I1 or I2 is not a jump – Destination of I1 not source of I2 – Destination of I1 not destination of I2

● If Conditions Don’t Hold – Issue I1 to U Pipe – I2 issued on next cycle

● Possibly paired with following instruction
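A hedged sketch of how these pairing conditions could be checked for a simplified instruction record (the field names are made up for illustration and are not Intel's actual decode logic; the jump condition is interpreted conservatively):

typedef struct {
    int is_jump;     /* 1 if the instruction is a jump/branch        */
    int dst;         /* destination register number, -1 if none      */
    int src1, src2;  /* source register numbers, -1 if unused        */
} instr_t;

/* Returns 1 if I1 (U pipe) and I2 (V pipe) may issue together. */
int can_pair(const instr_t *i1, const instr_t *i2) {
    if (i1->is_jump || i2->is_jump) return 0;                    /* simplified jump rule       */
    if (i1->dst != -1 &&
        (i1->dst == i2->src1 || i1->dst == i2->src2)) return 0;  /* I1 destination is I2 source */
    if (i1->dst != -1 && i1->dst == i2->dst) return 0;           /* same destination            */
    return 1;
}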

306 Intel P6 Family

● Pentium Pro (1995)

● Pentium II – MMX (multimedia) instruction set

● Pentium III – SIMD (streaming extensions) instructions

● Pentium 4 and Xeon – Intel NetBurst micro-architecture, tuned for multimedia.

307 The P6

● Forms the basis of Pentium Pro, Pentium II and Pentium III: – Besides some specialized instruction set extensions (MMX and SSE), these processors differ in clock rate and cache architecture.

● Dynamically scheduled processor: – Translates each IA-32 instruction to a series of micro-operations (uops). – uops are similar to typical RISC instructions.

● Hardwired control unit

308 Superscalar Processing in P6

● Intel P6: – Five functional units: 2 IUs, separate load and store units, FPU. – 14-stage pipeline, – Since the P6 must execute the CISC-like 80X86 instruction set, instructions are decoded into simpler RISC-like micro-ops. – Out-of-order execution.

● DEC Alpha 21164: – 7-stage pipeline, 2 IUs, and 2 FPUs: one for add/subtract and one for multiply/divide, branch prediction. 309 Superscalar Processing in P6 Microarchitecture

● Up to 3 IA-32 instructions are fetched, decoded, and translated to uops every clock cycle.

● uops are executed by an out-of-order speculative pipeline: – Using register renaming and ROB.

● Processors in the P6 family may be thought of as three independent engines coupled with a single instruction pool.

310 P6 Microarchitecture

[Figure: P6 microarchitecture — instruction fetch (16 bytes/cycle) feeds decode (3 instructions/cycle, up to 6 uops), renaming and issue (3 uops/cycle) into 20 reservation stations serving 5 execution units, and a graduation unit retiring 3 uops/cycle from a 40-entry reorder buffer.]

311 Fetch/Decode Unit of Intel P6 Pipeline

Notes: uops employ a load/store model; decoder 0 is a generalized decoder; decoding needs multiple stages, which leads to the concept of predecoding. 312 Modern Computer Architectures

Lecture 19: A Survey of Some Commercial Processors (cont…)

313 Streaming SIMD Extensions 2 (SSE2) Technology

● SSE2 Extends MMX and SSE technology with the addition of 144 new instructions: – 128-bit SIMD integer arithmetic operations. – 128-bit SIMD double precision floating point operations. – Cache and memory management operations.

● Enhances encryption, video, speech, image and photo processing.

314 PentiumPro (1995)

● Supports predicated instructions.

● Instructions decoded into micro- operations (mops): –mops are register-renamed, –Placed into an out-of-order speculative pool of pending operations. –Executed in dataflow order (when operands are ready).

315 Pentium II/III

● The Pentium II/III processors use P6 microarchitecture: –Three-way superscalar, –Pipelined micro-architecture features a 12-stage superpipeline. –Trades less work per pipe stage for more stages –-- achieving higher clock rate.

316 Pentium® and Pentium II/III Microarchitecture

317 [Figure: Pentium II/III block diagram — the external bus and L2 cache attach through the bus interface unit and memory reorder buffer; the instruction fetch unit (with I-cache) and branch target buffer feed the instruction decode unit and microcode sequencer; the register alias table, reservation station, functional units, reorder buffer and retirement register file form the execution core.]

318 Pentium II/III: The In-Order Section

● Branch prediction: – Two-level scheme. – BTB contains 512 entries:

● Maintains branch history information and the predicted branch target address. – Branch misprediction penalty:

● At least 11 cycles, on an average 15 cycles.

319 Pentium II/III: The In-Order Section Cont…

● A decoder breaks the IA-32 instruction down to mops: –Each comprised of an opcode, two source and one destination operand. –mops are of fixed length.

320 Pentium II/III: The In-Order Section Cont…

● Most instructions are decoded into one-to-four mops.

● More complex instructions are handled as a sequence of mops.

321 Pentium II/III: The In-Order Section Cont…

● Register renaming: – Logical IA-32 based register references are converted into references to physical registers.

● Reservation station unit (RSU, 20 entries).

● Reorder buffer(40 entries ROB)

322 Out-of-Order Execution

● The RSU forms a centralized unit with 20 reservation stations (RS): –Each capable of hosting one mop.

● mops are issued to the FUs according to dataflow constraints and resource availability: –Without regard to the original ordering of the program. 323 The Out-of-Order Execute Section

● Execution is out of order.

● After completion the result goes to two different places, – RSU and ROB.

[Figure: the out-of-order execute section — the reservation station unit issues mops over five ports: port 0 (integer, floating-point and MMX functional units), port 1 (integer, jump and MMX functional units), port 2 (load unit), ports 3 and 4 (store units); results return to the reorder buffer.]

325 The In-Order Retire Section.

● A mop can be retired – if its execution is completed, – if it is its turn in program order, – and if no interrupt, trap, or misprediction occurred.

● Retirement means taking data that was speculatively created and writing it into register file.

● Three mops can be retired per clock cycle. 326 Pentium III

327 NetBurst MicroArchitecture

● Some times referred to as: – P7, Intel 80786, i786.

● Microarchitecture of Pentium 4: – Pentium 3 is based on P6 microarchitecture.

● Both P6 and NetBurst fetch up to 3 IA-32 instructions per cycle: – These are sent to an out-of-order execution engine that can graduate up to 3 uops per cycle.

328 NetBurst Differences over P6

● Uses a deeper pipeline: – 20 stages, compared to 10 in P6.

● 7 integer execution units: – Compared to 5 of P6.

● Branch target buffer is 8 times larger: – Also, uses improved prediction algorithm.

● Execution trace cache. 329 NetBurst MicroArchitecture

● Execution trace cache: – Incorporated in the L2 cache. – Stores decoded micro-operations. – When executing a new instruction, micro-operations can be fetched directly: ● Instead of fetching and decoding the instruction again. ● With the NetBurst architecture: – Intel was expecting to reach speeds of 10 GHz. – Faced increasing problems in keeping power dissipation within limits. – In practice it topped out at about 3.8 GHz.

330 NetBurst-Based Chips

● Celeron D

● Pentium 4 and Pentium 4 Extreme Edition

● Pentium D

● Intel has since replaced NetBurst: – With the Core micro-architecture.

331 NetBurst Micro-Architecture

332 Pentium 4

● Was announced in mid-2000

● Native IA-32 instructions

● NetBurst micro-architecture.

● 20 pipeline stages (integer pipeline).

● Original clock at 1.5 GHz.

● 42 million transistors.

333 Advanced Dynamic Execution

● Very deep, out-of-order, speculative execution engine –Up to 126 instructions in flight (3 times larger than the Pentium III processor). –Up to 48 loads and 24 stores in pipeline (2 times larger than the Pentium III processor).

334 Branch Prediction

● 4K entry branch target array: – 8 times larger than the Pentium III processor.

● New prediction algorithm (not specified): – Reduces mispredictions compared to P6 by about one third.

335 Second Level Cache

● Included on the die

● size: 256 kB

● Unified 8-way associative

● 256-bit data bus to the level 2 cache

● Delivers ~45 GB/s data throughput (at 1.4 GHz processor frequency) – Bandwidth and performance increases with processor frequency

336 Pentium 4

337 DEC Alpha 21264

● Superscalar of degree 4.

● Out of order execution with renaming.

● Up to 80 instructions in process simultaneously.

338 21264 Block Diagram

● 4 Integer ALUs

– Each can perform simple instructions – 2 handle address calculations

Microprocessor Report 10/28/96 339 21264 Pipeline

● Very Deep Pipeline

– Can’t do much in 2ns clock cycle! – 7 cycles for even simple instructions. – 9 cycles for load or store. – 7 cycle penalty for mispredicted branch.

Microprocessor Report 10/28/96

340 21264 Branch Prediction Logic

– Purpose: Predict whether or not a branch is taken – 35Kb of prediction information – 2% of total die size – Claim 0.7--1.0% misprediction 341 EPIC, IA-64, and Itanium

● EPIC: Explicit Parallel Instruction Computing, an architecture framework proposed by HP.

● IA-64: An architecture that HP and Intel developed under the EPIC framework.

● Itanium: The first commercial processor that implements IA-64 architecture; – Now Itanium 2.

342 EPIC Main Ideas

● Compiler does the scheduling.

● Hardware supports speculation: – Addressing the branch hazard problem: Branch prediction. – Addressing the memory problem: Prefetching.

343 IA-64 Micro- Architecture ● 128 64-bit integer registers + 128 82-bit floating point registers.

● Hardware checks dependencies .

● Nearly all instructions can be predicated.

● 128 bit Bundle: 5-bit template + 3 41-bit instructions.

344 IA-64 Instructions

● The 5-bit template field specifies: – The types of execution units each instruction in the bundle requires.

● Nearly every instruction of IA-64 can be predicated: – The lower order 6 bits of every instruction specifies the predicate register that guards the instruction.

345 Itanium: IA-64 Implementation by Intel

● Itanium™ (2001): –First implementation of the IA-64 Architecture. –6 issues per clock, 10-stage pipeline at 800 MHz on a 0.18 µm process. - Two bundles can be issued together in Itanium.

346 Itanium Functional Units

● 9 functional units: – 2 Integer, 2 Memory, 3 Branch, and 2 floating point,

● All functional units are pipelined.

● 10 stage pipeline:

● Divided into 4 main parts.

347 Itanium Pipeline

● Front-end: – Prefetches up to 32 bytes per clock. – Can handle up to 8 bundles, 24 instructions

● Instruction Delivery: – Distributes upto 6 instructions to 9 functional units. – Implements register renaming.

348 Itanium Pipeline

● Operand Delivery: – Accesses register file, updates scoreboard. – The scoreboard is used to detect when individual instructions can proceed. – This is the way to avoid lock step operations of the instructions in a bundle.

● Execution

349 Itanium™ Processor Silicon (Copyright: Intel)

[Die photo: the Itanium processor — IA-32 control, FPU, IA-64 control, TLB, integer units, instruction fetch & decode, caches and bus interface on the core processor die (25M transistors), plus 4 x 1 MB of L3 cache (4 x 75M transistors).]

350 Circuit View

351 Comments on Itanium

● Performance of the 800 MHz Itanium vs. a 1 GHz 21264 and a 2 GHz P4: – SPECint: 85% of the 21264, 60% of the P4. – SPECfp: 108% of the P4, 120% of the 21264. – Power consumption: 178% of the P4 (watts per FP operation).

● Surprising that an approach whose goal is to rely on compiler technology and simpler hardware: – Ends up at least as complex as dynamically scheduled processors! 352 A Commercial Superscalar Processor

● PowerPC: – Eleven pipelined functional units:

● 4 IUs, an FPU with a separate floating point register file,

● It is capable of executing sixteen instructions simultaneously.

353 AMD Athlon

● Advanced Micro Devices have carved themselves a niche in the Intel Architecture market: – with their line of instruction set compatible processors.

● The latest AMD offering is the Athlon family of processors

354 Athlon

● The micro-architecture of the Athlon family is of considerable interest, – In many respects more powerful than Pentium core.

● Athlon uses three integer units, three floating point units and three address calculation units: – For a total of nine execution units.

355 Athlon

● Ability to issue 9 operations concurrently: – Three integer, three address, and three floating point.

● A 10 stage integer and 15 stage floating pipeline are used.

● The floating point execution units can perform Intel SIMD MMX instructions as well as AMD 3DNow! instructions.

356 Athlon

● Like Intel's Pentium family, the AMD Athlons use a "RISC-like" core, – Intel CISC instructions are decoded by a three way Instruction Decoder into fixed length "MacroOPs", – Fed into the Instruction Control Unit, which has a 72 entry Reorder Buffer.

● Branch prediction is performed using a two- way 2048 entry branch prediction table: – A branch target address table and return address stack.

357 ARM

● ARM Ltd. (Formerly Advanced RISC Machines)

● Licenses its design to vendors:

– IBM, Intel, Philips, Samsung, TI, etc.

● 32-bit processor architecture:

– Widely used in embedded systems:

● Mobile phones, PDAs, Calculators, Routers, media players, etc. – Low power consumption is one of the critical goals.

358 ARM Architecture

● Load/Store

● 16 32-bit registers

● Predicated execution of most of the instructions.

359 Summary

● Pipeline Hazards – Data – Control – Structural

● Resolution of structural hazards – Stalling – Provide additional resources

360 Summary Cont…

● Resolution of Data hazards: – Stalling – Forwarding – Dynamic scheduling

● Resolution of Control hazards: – Stalling – Prediction schemes

● Dynamic scheduling can boost performance in presence of hazards.

361 Summary Cont…

● The key idea of Tomasulo’s scheme is the use of reservation stations

– implicit register renaming to larger set of registers + buffering source operands

– Avoids WAR, WAW hazards of Scoreboard

– Allows loop unrolling in HW

– Instructions not limited to basic blocks

362 Summary Cont…

● In addition to hardware instruction scheduling, software approaches can help

– Static loop unrolling

– Basic block transformations

– Software pipelining

● Modern processors are superscalar and dynamically scheduled.

363 References

J.L. Hennessy and D.A. Patterson, “Computer Architecture: A Quantitative Approach”, Morgan Kaufmann Publishers, 3rd Edition, 2003.
S. Muchnik, “Optimizing Compilers for Modern Architectures”, Morgan Kaufmann Publishers, 2nd Edition.

364 Modern Computer Architectures

Lecture 20: Memory System Basics

Prof. Sandeep Panda, Koustuv Group of Institutions

1 Modern Computer Architectures

Module-3: Memory Hierarchy Design and Optimizations

Prof. Sandeep Panda, Koustuv Group of Institutions

2 Introduction

● Even a sophisticated processor may perform well below an ordinary processor:

– Unless supported by matching performance by the memory system.

– Unfortunately the gap is widening.

● The focus of this module:

– Study how memory system performance has been enhanced through various

innovations and optimizations. 3 Widening Performance Gap

[Chart: processor vs. DRAM performance, 1980-2000 — processor performance grows about 60% per year (“Moore’s Law”, 2x every 1.5 years) while DRAM performance grows about 9% per year (2x every 10 years), so the processor-DRAM performance gap grows roughly 50% per year.]

4 An Unbalanced System

Source: Bob Colwell keynote ISCA’29 2002 5 Levels of the Memory

[Figure: levels of the memory hierarchy, from the fast/small upper level to the slow/large lower level —
Registers: 100s of bytes, <10 ns; managed by the compiler; operands of 1-8 bytes.
Cache: kilobytes, 10-100 ns, 1-0.1 cents/bit; managed by the cache controller; cache lines of 8-128 bytes (this lecture).
Main memory: megabytes, 200-500 ns, 0.0001-0.00001 cents/bit; managed by the operating system; pages of 512 bytes-4 KB.
Disk: gigabytes, ~10 ms (10,000,000 ns), 10^-5-10^-6 cents/bit; user files of megabytes.
Tape: effectively infinite capacity, seconds to minutes, ~10^-8 cents/bit.]

6 Model of Memory Hierarchy

[Figure: registers → split L1 (instruction and data) caches and L2 cache (SRAM) → main memory (DRAM) → file data on disk.]

7 What is the Role of a Cache?

● A small, fast storage used to improve average access time to a slow memory.

● Improves memory system performance: – Exploits spatial and temporal locality.

8 Processor-Memory Performance Gap Processor % Area %Transistors (~cost) (~power)

● Alpha 21164 37% 77%

● StrongArm SA110 61% 94%

● Pentium Pro 64% 88%

– 2 dies per package: Proc/I Cache/D Cache + L2 Cache

● Caches have no “inherent value”, only try to close the performance gap. 9 Case Study:Intel Core2 Duo

[Figure: Intel Core 2 Duo — each core (Core0, Core1) has a private L1 cache: 32 KB, 8-way, 64-byte lines, LRU, write-back, 3-cycle latency; both cores share an L2 cache: 4.0 MB, 16-way, 64-byte lines, LRU, write-back, 14-cycle latency.]

Source: http://www.sandpile.org 10 Case study: Intel Itanium 2

[Die photos: Intel Itanium 2 — the 3 MB version on a 180 nm process measures 421 mm²; the 6 MB version on a 130 nm process measures 374 mm².]

11 Memory Issues

● Latency – Time to move through the longest circuit path (from the start of request to the response)

● Bandwidth – Number of bits transported at one time

● Capacity – Size of memory

● Energy

– Cost of accessing memory (to read and write)12 Four Basic Questions

● Q1: Where can a block be placed in the cache? (Block placement) – Fully Associative, Set Associative, Direct Mapped

● Q2: How is a block found if it is in the cache? (Block identification) – Tag/Block

● Q3: Which block should be replaced on a miss? (Block replacement) – Random, LRU

● Q4: What happens on a write? (Write strategy) 13 – Write Back or Write Through (with Write Buffer) Block Placement

● If a block has only one possible place in the cache: direct mapped

● If a block can be placed anywhere: fully associative

● If a block can be placed in a restricted subset of the possible places: set associative – If there are n blocks in each subset: n- way set associative

● Note that direct-mapped = 1-way set

associative 14 Trade-offs ● n-way set associative becomes increasingly difficult (costly) to implement for large n – Most caches today are either 1-way (direct mapped), 2-way, or 4-way set associative

● The larger n the lower the likelihood of thrashing – e.g., two blocks competing for the same block frame and being accessed in sequence over and over

15 Types of Caches

Type of cache            Mapping of data from memory to cache                              Complexity of searching the cache
Direct mapped (DM)       A memory value can be placed at a single corresponding location   Fast indexing mechanism
Set-associative (SA)     A memory value can be placed in any of a set of locations         Slightly more involved search mechanism
Fully-associative (FA)   A memory value can be placed in any location in the cache         Extensive hardware resources required to search (CAM)

Note: DM and FA can be thought of as special cases of SA — DM is 1-way SA, FA is all-way SA.
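A minimal sketch of how the candidate set is computed under each mapping (assuming a cache of num_frames block frames organized as n-way set-associative; assoc = 1 gives direct mapped, assoc = num_frames gives fully associative):

/* A block with block address block_addr may be placed in any of the
 * 'assoc' frames of the set returned here. */
unsigned set_of_block(unsigned block_addr, unsigned num_frames, unsigned assoc) {
    unsigned num_sets = num_frames / assoc;   /* 1 set for fully associative */
    return block_addr % num_sets;
}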

16 Direct Mapping

Tag Index Data

00000 0 0x55 0

00000 1 0x0F 1 00001 0

Direct mapping: A memory value can only be placed at a single corresponding location 11111 0 0xAA in the cache 11111 1 0xF0

17 Set Associative Mapping (2-Way)

Tag Index Data Way 0 Way 1

0000 0 0 0x55 0

0000 1 1 0x0F 1 0

Set-associative mapping: A memory value can be placed in any location of a set in the cache 1111 0 0 0xAA 1111 1 1 0xF0

18 Fully Associative Mapping

Tag Data

0000000000 0x55

0000010000 0x0F 000110

Fully-associative mapping: A memory value can be placed 1111101111 0xAA anywhere in the cache 1111111111 0xF0

19 Direct Mapped Cache

[Figure: a direct-mapped cache with 4 cache lines (indices 0-3) and a 16-entry memory (addresses 0-F); each memory address maps to cache index (address mod 4).]

● Cache location 0 is occupied by data from: – Memory locations 0, 4, 8, and C.
● Which one should we place in the cache?
● How can we tell which one is in the cache?

20 Block Identification cont…

● Given an address, how do we find where it goes in the cache?

● This is done by first breaking down an address into three parts

Index of the set tag used for offset of the address in identifying a match the cache block

tag set index block offset

(The tag and set index together form the block address.)

21 Block Identification cont… ● Consider the following system: – Addresses are 64 bits ● Memory is byte-addressable – Block frame size is 2^6 = 64 bytes – Cache is 64 MByte (2^26 bytes) ● Consists of 2^20 block frames – Direct-mapped ● For each cache block brought in from memory, there is a single possible frame among the 2^20 available ● A 64-bit address can be decomposed as follows: a 58-bit block address (38-bit tag + 20-bit set index) followed by a 6-bit block offset
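A small C sketch of this decomposition and the direct-mapped lookup check, with the parameters hard-coded to the 6-bit offset / 20-bit index / 38-bit tag split used above (illustrative only; the data payload is omitted):

#include <stdint.h>

#define OFFSET_BITS 6                       /* 64-byte blocks    */
#define INDEX_BITS  20                      /* 2^20 block frames */
#define NUM_SETS    (1u << INDEX_BITS)

typedef struct { int valid; uint64_t tag; /* plus 64 bytes of data */ } frame_t;

static frame_t cache[NUM_SETS];

/* Returns 1 on a hit, 0 on a miss, for a direct-mapped lookup. */
int dm_lookup(uint64_t addr) {
    uint64_t offset = addr & ((1u << OFFSET_BITS) - 1);
    uint64_t index  = (addr >> OFFSET_BITS) & ((1u << INDEX_BITS) - 1);
    uint64_t tag    = addr >> (OFFSET_BITS + INDEX_BITS);
    (void)offset;                           /* offset selects the byte on a hit */
    return cache[index].valid && cache[index].tag == tag;
}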

22 Block Identification cont… set index tag cache block 20 bits 38 bits 26 bits 0...00 0...01 0...10 All addresses with 0...11 similar 20 set index bits “compete” for a single . . . block frame block frames

20 1...01 2 1...10 1...11

23 Block Identification

cont…

Address from CPU @ set index tag cache block 20 bits 58 bits 26 bits 0...00 ? 0...01 Find the set: 0...10 0...11 . . . block frames

20 1...01 2 1...10 1...11

24 Block Identification

Address from CPU @ set index tag cache block 20 bits 58 bits 26 bits 0...00 0...01 Find the set: 0...10 ? 0...11 Compare the tag: . . . block frames

20 1...01 2 If no match: miss 1...10 1...11 If match: hit and access the byte at the desired offset

25 Set Associative Cache (2- way)

● Cache index selects a “set” from the cache

● The two tags in the set are compared in parallel.
● Data is selected based on the tag result.
– Note: the additional circuitry, compared to DM caches, makes SA caches slower to access than a DM cache of comparable size.

[Figure: the cache index selects one cache line from each way; each entry holds a valid bit, a cache tag and the cache data.]

Adr Tag Compare Sel1 1 Mux 0 Sel0 Compare

OR Cache Line 26 Hit Set-Associative Cache (2- way) ● 32 bit address Tag Index offset

● lw from 0x77FF1C78

Tag array0 Data aray0 Data array1 Tag array1

27 Fully Associative Cache

tag offset

Tag Data = = =

Associative Search

=

Multiplexor

Rotate and Mask

28 Cache Write Policies

● Write-through: Information is written to both the block in the cache and that in memory

● Write-back: Information is written back to memory only when a block frame is replaced: – Uses a “dirty” bit to indicate whether a block was actually written to, – Saves unnecessary writes to memory when a block is “clean”
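A minimal sketch contrasting the two policies on a write hit, assuming a cache line structure with valid and dirty bits (the helper names are illustrative):

typedef struct { int valid, dirty; unsigned tag; unsigned char data[64]; } line_t;

/* Stub standing in for the bus transaction that copies a block to main memory. */
static void memory_write_block(const line_t *ln) { (void)ln; }

void write_hit_write_through(line_t *ln, int offset, unsigned char byte) {
    ln->data[offset] = byte;     /* update the cache ...            */
    memory_write_block(ln);      /* ... and memory, on every write  */
}

void write_hit_write_back(line_t *ln, int offset, unsigned char byte) {
    ln->data[offset] = byte;     /* update only the cache           */
    ln->dirty = 1;               /* memory is updated later, when   */
}                                /* this block is eventually replaced */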

29 Trade-offs

● Write back – Faster because writes occur at the speed of the cache, not the memory. – Faster because multiple writes to the same block are written back to memory only once, which uses less memory bandwidth.

● Write through – Easier to implement 30 Write Allocate, No-write Allocate

● What happens on a write miss? – On a read miss, a block has to be brought in from a lower level memory

● Two options: – Write allocate: a block is allocated in the cache. – No-write allocate: no block is allocated; the data is just written to main memory.

31 Write Allocate, No-write Allocate cont…

● In no-write allocate, – Only blocks that are read from can be in cache. – Write-only blocks are never in cache.

● But typically: – write-allocate used with write-back – no-write allocate used with write-through

● Why does this make sense?

32 Write-Through Policy

0x5678 0x12340x1234 0x1234

Processor Cache

Memory

33 Write Buffer

Cache Processor DRAM

Write Buffer

– Processor: writes data into the cache and the write buffer. – Memory controller: writes the contents of the buffer to memory.

● Write buffer is a FIFO structure:

– Typically 4 to 8 entries – Desirable: Occurrence of Writes << DRAM write cycles

● Memory system designer’s nightmare:

– Write buffer saturation (i.e., Writes  DRAM write cycles)

34 Writeback Policy

0x12340x1234 0x9ABC0x5678 0x56780x1234

Processor Cache

Memory

35 Memory System Performance ● Memory system performance is largely captured by three parameters,

– Latency, Bandwidth, Average memory access time (AMAT).

● Latency:

– The time it takes from the issue of a memory request to the time the data is available at the processor.

● Bandwidth:

– The rate at which data can be pumped to the processor by the memory system. 36 Average Memory Access Time (AMAT) • AMAT: The average time it takes for the processor to get a data item it requests. • The time it takes to get requested data to the processor can vary: – due to the memory hierarchy. • AMAT can be expressed as: AMAT  Cache hit time  Miss rate  Miss penalty

37 Modern Computer Architectures

Lecture 21: Memory System Basics

Prof. Sandeep Panda, Koustuv Group of Institutions

38 Cache Performance Parameters

● Performance of a cache is largely determined by: – Cache miss rate: number of cache misses divided by number of accesses. – Cache hit time: the time between sending address and data returning from cache. – Cache miss penalty: the extra processor stall cycles caused by access to the next-level cache.

39 Impact of Memory System on Processor Performance

CPU Performance with Memory Stall= CPI without stall + Memory Stall CPI Memory Stall CPI = Miss per inst × miss penalty = % Memory Access/Instr × Miss rate × Miss Penalty Example: Assume 20% memory acc/instruction, 2% miss rate, 400-cycle miss penalty. How much is memory stall CPI? Memory Stall CPI= 0.2*0.02*400=1.6 cycles

40 CPU Performance with Memory Stall

CPU Performance with Memory Stall= CPI without stall + Memory Stall CPI

CPU time IC  CPIexecution  CPImem_stall   Cycle Time

CPImem_stall = Miss per inst × miss penalty CPImem_stall  Memory Inst Frequency  Miss Rate  Miss Penalty
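The same formulas as a small C helper, checked against the numbers in the example above (20% memory accesses per instruction, 2% miss rate, 400-cycle miss penalty):

#include <stdio.h>

/* CPI contribution of memory stalls. */
double cpi_mem_stall(double mem_acc_per_instr, double miss_rate, double miss_penalty) {
    return mem_acc_per_instr * miss_rate * miss_penalty;
}

/* CPU time = IC x (CPI_execution + CPI_mem_stall) x cycle time. */
double cpu_time(double ic, double cpi_exec, double cpi_stall, double cycle_time) {
    return ic * (cpi_exec + cpi_stall) * cycle_time;
}

int main(void) {
    /* 0.2 accesses/instruction x 0.02 miss rate x 400 cycles = 1.6 stall cycles. */
    printf("memory-stall CPI = %.2f\n", cpi_mem_stall(0.2, 0.02, 400.0));
    return 0;
}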

41 Performance Example 1

● Suppose:

– Clock Rate = 200 MHz (5 ns per cycle), Ideal (no misses) CPI = 1.1

– 50% arith/logic, 30% load/store, 20% control

– 10% of data memory operations get 50 cycles miss penalty

– 1% of instruction memory operations also get 50 cycles miss penalty

● Compute AMAT.

42 Performance Example 1 cont… ● CPI = ideal CPI + average stalls per instruction = 1.1(cycles/ins) + [ 0.30 (DataMops/ins) x 0.10 (miss/DataMop) x 50 (cycle/miss)] +[ 1 (InstMop/ins) x 0.01 (miss/InstMop) x 50 (cycle/miss)] = (1.1 + 1.5 + .5) cycle/ins = 3.1

● AMAT=(1/1.3)x[1+0.01x50]+(0.3/1.3)x[1+0.1x5 0]=2.54

43 Example 2

● Assume 20% Load/Store instructions

● Assume CPI without memory stalls is 1

● Cache hit time = 1 cycle

● Cache miss penalty = 100 cycles

● Miss rate = 1%

● What is: – stall cycles per instruction? – average memory access time? – CPI with and without cache

44 Example 2: Answer

● Average memory accesses per instruction = 1.2

● AMAT = 1 + 0.01*100 = 2 cycles

● Stall cycles = 1.2 cycles

● CPI with cache = 1+1.2=2.2

● CPI without cache=1+1.2*100=121

45 Example 3

● Which has a lower miss rate? – A split cache (16KB instruction cache +16KB Data cache) or a 32 KB unified cache?

● Compute the respective AMAT also.

● 40% Load/Store instructions

● Hit time = 1 cycle

● Miss penalty = 100 cycles

● Simulator showed: – 40 misses per thousand instructions for data cache – 4 misses per thousand instr for instruction cache – 44 misses per thousand instr for unified cache 46

Example 3: Answer

● Miss rate = (misses/instructions)/(mem accesses/instruction)

● Instruction cache miss rate= 4/1000=0.004

● Data cache miss rate = (40/1000)/0.4 =0.1

● Unified cache miss rate = (44/1000)/1.4 =0.04

● Overall miss rate for split cache = 0.3*0.1+0.7*0.004=0.0303

47 Example 3: Answer cont…

● AMAT (split cache)= 0.7*(1+0.004*100)+0.3(1+0.1*100)=4.3

● AMAT (Unified)= 0.7(1+0.04*100)+0.3(1+1+0.04*100)=4.5

48 Cache Performance for Out of Order Processors

● Very difficult to define miss penalty to fit in out of order processing model: – Processor sees much reduced AMAT – Overlapping between computation and memory accesses

● We may assume a certain percentage of overlapping – In practice, the degree of overlapping varies significantly.

49 Unified vs Split Caches

● A Load or Store instruction requires two memory accesses: – One for the instruction itself – One for the data

● Therefore, unified cache causes a structural hazard!

● Modern processors use separate data and instruction L1 caches: – As opposed to “unified” or “mixed” caches ● The CPU sends simultaneously: – Instruction and data address to the two ports .

● Both caches can be configured differently – Size, associativity, etc. 50 Modern Computer Architectures

Lecture 22: Memory Hierarchy Optimizations

Prof. Sandeep Panda, Koustuv Group of Institutions 51

Unified vs Split Caches

● Separate Instruction and Data caches: – Avoids structural hazard – Also each cache can be tailored specific to need.

Processor Processor

I-Cache-1 D-Cache-1 Unified Cache-1 Unified Cache-2 Unified Cache-2

Unified Cache Split Cache

52 Example 4 – Assume 16KB Instruction and Data Cache: – Inst miss rate=0.64%, Data miss rate=6.47% – 32KB unified: Aggregate miss rate=1.99% – Assume 33% data ops ⇒ 75% of accesses are from instructions (1.0/1.33) – hit time=1, miss penalty=50 – Data hit has 1 additional stall for unified cache (why?) – Which is better (ignore L2 cache)?

AMATSplit =75%x(1+0.64%x50)+25%x(1+6.47%x50) = 2.05

AMATUnified=75%x(1+1.99%x50)+25%x(1+1+1.99%x50)= 2.24

53 Alpha 21264 Data Cache

● Let us understand the operation of the Alpha 21264 data cache.

● The Alpha processor presents a 44-bit address to the cache: – 29 tag bits – 9 index bits – 6 offset bits (29 + 9 + 6 = 44)

● Cache is 2:1 set associative

54 Case Study: The Alpha 21264 Cache

What happens on a read?

55 The Alpha 21264 Cache

Step 1: CPU generates 44-bit address The address is split into 29-bit tag 9-bit set index (29 = 512 sets) 6-bit block offset (26 = 64 bytes blocks)

56 The Alpha 21264 Cache

Step 2: The “right” set is selected using the index bits.

57 The Alpha 21264 Cache

Step 3: The tag is compared to both tags in the set; If a match AND valid=1: then a hit (If not: then a miss)

58 The Alpha 21264 Cache

Step 4: If a match, select the matching block and return the byte at the right offset The election of the matching block is done via a 2:1

59 Example 5

● What is the impact of 2 different cache organizations on the performance of CPU?

● Clock cycle time 1nsec

● 50% load/store instructions

● Size of both caches 64KB: – Both caches have a block size of 64 bytes – one is direct mapped, the other is 2-way set associative.

● Cache miss penalty=75 ns for both caches

● Miss rate DM= 1.4% Miss rate SA=1%

● CPU cycle time must be stretched 25% to accommodate the multiplexor for the SA 60 Example 5: Solution

● AMAT DM= 1+(0.014*75)=2.05nsec

● AMAT SA=1*1.25+(0.01*75)=2ns

● CPU Time= IC*(CPI+(Misses/Instr)*Miss Penalty)* Clock cycle time

● CPU Time DM= IC*(2*1.0+(1.5*0.014*75)=3.58*IC

● CPU Time SA= IC*(2*1.25+(1.5*0.01*75)=3.63*IC

61 How to Improve Cache Performance?

AMAT = Hit time + Miss rate × Miss penalty

1. Reduce miss rate.
2. Reduce miss penalty.

3. Reduce miss penalty or miss rates via parallelism.

4. Reduce hit time.

62 Reducing Miss Rates

● Techniques: – Larger block size – Larger cache size – Higher associativity – Way prediction – Pseudo-associativity – Compiler optimization

63 Reducing Miss Penalty

● Techniques: – Multilevel caches – Victim caches – Read miss first – Critical word first

64 Reducing Miss Penalty or Miss Rates via Parallelism

● Techniques: –Non-blocking caches –Hardware prefetching –Compiler prefetching

65 Reducing Cache Hit Time ● Techniques: –Small and simple caches –Avoiding address translation –Pipelined cache access –Trace caches

66 Modern Computer Architectures Lecture 23: Cache Optimizations

Prof. Sandeep Panda, Koustuv Group of Institutions

67 Reducing Miss Penalty

● Techniques: – Multilevel caches – Victim caches – Read miss first – Critical word first – Non-blocking caches

68 Reducing Miss Penalty(1): Multi-Level Cache

● Add a second-level cache.

● L2 Equations:

AMAT = Hit time_L1 + Miss rate_L1 × Miss penalty_L1

Miss penalty_L1 = Hit time_L2 + Miss rate_L2 × Miss penalty_L2

AMAT = Hit time_L1 + Miss rate_L1 × (Hit time_L2 + Miss rate_L2 × Miss penalty_L2)
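The two-level AMAT formula as a small C helper (a sketch; plug in whatever example values you like):

/* AMAT with an L2 cache: the L1 miss penalty is itself an AMAT into L2. */
double amat_two_level(double hit_l1, double miss_rate_l1,
                      double hit_l2, double miss_rate_l2,
                      double miss_penalty_l2) {
    double miss_penalty_l1 = hit_l2 + miss_rate_l2 * miss_penalty_l2;
    return hit_l1 + miss_rate_l1 * miss_penalty_l1;
}

/* Example 6 below: amat_two_level(1, 0.04, 10, 0.5, 100) = 1 + 0.04*(10 + 50) = 3.4 */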

69 Multi-Level Cache: Some Definitions

● Local miss rate— misses in this cache divided by the total number of memory accesses to this cache (Miss rateL2)

● Global miss rate—misses in this cache divided by the total number of memory accesses generated by the CPU

– For L2, this equals Local miss rate_L1 × Local miss rate_L2

● L1 Global miss rate = L1 Local miss rate

70 Global vs. Local Miss Rates ● At lower level caches (L2 or L3), global miss rates provide more useful information: – Indicate how effective is cache in reducing AMAT. – Who cares if the miss rate of L3 is 50% as long as only 1% of processor memory accesses ever benefit from it?

71 Performance Improvement Due to L2 Cache: Example 6 Assume: • For 1000 instructions: – 40 misses in L1, – 20 misses in L2 • L1 hit time: 1 cycle, • L2 hit time: 10 cycles, • L2 miss penalty=100 • 1.5 memory references per instruction • Assume ideal CPI=1.0 Find: Local miss rate, AMAT, stall cycles per instruction, and those without L2 cache. 72 Example 6: Solution

● With L2 cache: – Local miss rate = 50% – AMAT=1+4%X(10+50%X100)=3.4 – Average Memory Stalls per Instruction = (3.4-1.0)x1.5=3.6

● Without L2 cache: – AMAT=1+4%X100=5 – Average Memory Stalls per Inst=(5-1.0)x1.5=6

● Perf. Improv. with L2 =(6+1)/(3.6+1)=52%

73 Multilevel Cache

● The speed (hit time) of L1 cache affects the clock rate of CPU: – Speed of L2 cache only affects miss penalty of L1.

● Inclusion Policy: – Many designers keep L1 and L2 block sizes the same. – Otherwise on a L2 miss, several L1 blocks may have to be invalidated.

● Multilevel Exclusion: – L1 data never found in L2. – AMD Athlon follows exclusion policy . 74 Reducing Miss Penalty (2): Victim Cache

● How to combine fast hit time of direct mapped cache: – yet still avoid conflict misses?

● Add a fully associative buffer (victim cache) to keep data discarded from cache.

● Jouppi [1990]: – A 4-entry victim cache removed 20% to 95% of conflicts for a 4 KB direct mapped data cache.

● Used in Alpha, HP machines.

● AMD uses 8-entry victim buffer. 75 Reducing Miss Penalty (3): Read Priority over Write on Miss ● In a write-back scheme: – Normally a dirty block is stored in a write buffer temporarily. – Usual:

● Write all blocks from the write buffer to memory, and then do the read. – Instead:

● Check write buffer first, if not found, then initiate read.

● CPU stall cycles would be less. 76 Reducing Miss Penalty (3): Read Priority over Write on Miss

● A write buffer with a write through: – Allows cache writes to occur at the speed of the cache.

● Write buffer however complicates memory access: – They may hold the updated value of a location needed on a read miss.

77 Reducing Miss Penalty (3): Read Priority over Write on Miss

● Write-through with write buffers: –Read priority over write: Check write buffer contents before read; if no conflicts, let the memory access continue. –Write priority over read: Waiting for write buffer to first empty, can increase read miss penalty. 78 Reducing Miss Penalty (4): Early Restart and Critical Word First

● Simple idea: Don’t wait for full block to be loaded before restarting CPU --- CPU needs only 1 word: – Early restart —As soon as the requested word of the block arrives, send it to the CPU. – Critical Word First —Request the missed word first from memory and send it to the CPU as soon as it arrives; Requested word ● Generally useful for large blocks.

block 79 Example 7

● AMD Athlon has 64-byte cache blocks.

● L2 cache takes 11 cycles to get the critical 8 bytes.

● To fetch the rest of the block: – 2 clock cycles per 8 bytes.

● AMD Athlon can issue 2 loads per cycle.

● Compute access time for 8 successive data accesses.

80 Solution

● 11+(8-1)*2=25 clock cycles for the CPU to read a full cache block.

● Without critical word first it would take 25 cycles to get the full block.

● After the first 8 bytes are delivered, it would take 7/2=4 clock cycles. – Total = 25+4 cycles=29 cycles.

81 Modern Computer Architectures Lecture 24: More Cache Optimizations

Prof. Sandeep Panda, Koustuv Group of Institutions

82 Reduce Miss Penalty (5): Non- blocking Caches

● Non-blocking cache: –Allow data cache to continue to serve other requests during a miss. –Meaningful only with out-of-order execution processor. –Requires multi-bank memories. –Pentium Pro allows 4 outstanding memory misses. 83 Non-blocking Caches to Reduce Stalls on Misses

● Hit under miss reduces the effective miss penalty by working during miss.

● “Hit under multiple miss” or “miss under miss” may further lower the effective miss penalty: – By overlapping multiple misses. – Significantly increases the complexity of the cache controller as there can be multiple outstanding memory accesses. – Requires multiple memory banks. 84 Non-Blocking Caches

● Multiple memory controllers: – Allow memory banks to operate almost independently. – Each bank needs separate address lines and data lines.

● Input device may be served by one controller: – At the same time, other controllers may serve cache read/write requests.

85 Fast Hit Time(1): Simple L1 Cache

● Small and simple (direct mapped) caches have lower hit time (why?). ● This is possibly the reason why all modern processors have a direct mapped and small L1 cache: – In the Pentium 4, the L1 cache size was reduced from 16KB to 8KB in later versions. ● The L2 cache can be larger and set-associative. 86 Hit Time Reduction (2): Simultaneous Tag Comparison and Data Reading • After indexing: – The tag can be compared and, at the same time, the block can be fetched.

– If it’s a miss --- Tag and Comparator One Cache line of Data

then no harm Tag and Comparator One Cache line of Data done, miss must To Next Lower Level In be dealt with. Hierarchy

87 Example: 1KB DM Cache, 32-byte Lines
● The lowest M bits are the Offset (Line Size = 2^M).
● Index = log2(# of sets).
● [Figure: the 32-bit address is split into Tag (bits 31-10), Index (bits 9-5, e.g. 0x01) and Offset (bits 4-0, e.g. 0x00); the cache has 32 sets, each holding a valid bit, a cache tag and 32 bytes of data (Byte 0-Byte 31 in set 0, Byte 32-Byte 63 in set 1, ..., Byte 992-Byte 1023 in set 31).]
88 Example: 1KB DM Cache, 32-byte Lines
● lw from 0x77FF1C68:
– 0x77FF1C68 = 0111 0111 1111 1111 0001 1100 0110 1000 (Tag | Index | Offset)
● [Figure: the Index field selects one entry of the tag array and the corresponding line of the data array of the DM cache.]
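A small sketch (mine, not part of the slides) of how the tag, index and offset of this 1KB direct-mapped, 32-byte-line cache can be extracted from the 32-bit address used above; the constants simply mirror the field widths in the figure.

#include <stdio.h>
#include <stdint.h>

#define OFFSET_BITS 5u   /* 32-byte line        => 5 offset bits */
#define INDEX_BITS  5u   /* 1KB / 32B = 32 sets => 5 index bits  */

int main(void) {
    uint32_t addr   = 0x77FF1C68u;   /* the lw address from slide 88 */
    uint32_t offset = addr & ((1u << OFFSET_BITS) - 1u);
    uint32_t index  = (addr >> OFFSET_BITS) & ((1u << INDEX_BITS) - 1u);
    uint32_t tag    = addr >> (OFFSET_BITS + INDEX_BITS);
    printf("tag = 0x%X, index = %u, offset = %u\n", tag, index, offset);
    return 0;
}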

89 DM Cache Speed Advantage

● Tag and data access happen in parallel

– Faster cache access!
● [Figure: Tag, Index and Offset drive the tag array and the data array simultaneously.]

90 Reducing Hit Time (3): Write Strategy ● There are many more reads than writes – All instructions must be read! – Consider: 37% load instructions, 10% store instructions – The fraction of memory accesses that are writes is: .10 / (1.0 + .37 + .10) ~ 7% – The fraction of data memory accesses that are writes is: .10 / (.37 + .10) ~ 21%

● Remember the fundamental principle: make the common cases fast. 91 Write Strategy

● Cache designers have spent most of their efforts on making reads fast: – not so much writes.

● But, if writes are extremely slow, then Amdahl’s law tells us that overall performance will be poor. – Writes also need to be made faster.

92 Making Writes Fast

● Several strategies exist for making reads fast: – Simultaneous tag comparison and block reading, etc.

● Unfortunately making writes fast cannot be done the same way. – Tag comparison cannot be simultaneous with block writes: – One must be sure one doesn't overwrite

a block frame that isn’t a hit! 93 Write Policy 1: Write-Through vs Write-Back

● Write-through: all writes update cache and underlying memory/cache.

– Can always discard cached data - the most up-to-date data is in memory. – Cache control bit: only a valid bit.

● Write-back: all writes only update cache

– Memory write during block replacement. – Cache control bits: both valid and dirty bits.

94 Write-Through vs Write-Back

● Relative Advantages: – Write-through:

● Memory always has latest data. ● Simpler management of cache. – Write-back:

● Requires much lower bandwidth, since data is often overwritten multiple times. ● Better tolerance to long-latency memory. 95 Write Policy 2: Write Allocate vs Non-Allocate

● Write allocate: allocate new block on a miss: –Usually means that you have to do a “read miss” to fill in rest of the cache-line!

● Write non-allocate (or "write-around"): – Simply send write data through to the underlying memory/cache - don't allocate a new cache line!
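The toy sketch below (mine, not the lecture's) combines the two write policies and the allocate decision for a single cache line; a real controller would manage a full tag/index structure, but the policy choices appear in the same places.

#include <stdint.h>
#include <stdbool.h>
#include <stdio.h>

struct line { bool valid, dirty; uint32_t tag, data; };

static void memory_write(uint32_t addr, uint32_t data) {
    printf("  memory[0x%X] = %u\n", addr, data);
}

/* One cache line; write_through and write_allocate select the policy. */
void cache_write(struct line *l, uint32_t addr, uint32_t data,
                 bool write_through, bool write_allocate) {
    bool hit = l->valid && l->tag == addr;          /* toy: tag = whole address */
    if (!hit && !write_allocate) {                  /* write-around: bypass cache */
        memory_write(addr, data);
        return;
    }
    if (!hit) {                                     /* write-allocate: install block */
        if (l->valid && l->dirty) memory_write(l->tag, l->data);  /* write back victim */
        l->valid = true; l->tag = addr; l->dirty = false;
    }
    l->data = data;                                 /* update the cached copy */
    if (write_through) memory_write(addr, data);    /* memory is always current */
    else               l->dirty = true;             /* write-back: defer to eviction */
}

int main(void) {
    struct line l = {0};
    cache_write(&l, 0x10u, 1u, true,  false);       /* write-through, no-allocate */
    cache_write(&l, 0x20u, 2u, false, true);        /* write-back, write-allocate */
    return 0;
}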

96 Reducing Hit Time (4): Way Prediction and Pseudo-Associative Cache

● Extra bits are associated with each set. – Predicts the next block to be accessed. – Multiplexor could be set early to select the predicted block.

● Alpha 21264 uses way prediction on its 2-way set associative instruction cache. – If prediction is correct, access is 1 cycle. – Experiments with SPEC95 suggest higher than 85% prediction success.

97 Pseudo-Associativity

● How to combine fast hit time of Direct Mapped and have the lower conflict misses of 2-way SA cache?

● Divide cache: on a miss, check the other half of the cache to see if the data is there; if so, we have a pseudo-hit (slow hit).
● [Figure: access time line — hit time < pseudo-hit time < miss penalty.]

● Drawback: the CPU pipeline is harder to design if a hit can take either 1 or 2 cycles.

– Suitable for caches not tied directly to the processor (L2). – Used in the MIPS R10000 L2 cache; similar in UltraSPARC.
98 Reducing Hit Time (5): Virtual Cache
● Physical cache – physically indexed and physically tagged cache.
● Virtual cache – virtually indexed and virtually tagged cache.
– Must flush the cache at a process switch, or add a PID.
– Must handle virtual address aliases to an identical physical address.
● [Figure: a virtual cache is indexed and tagged with the V-addr (PID, V-tag, data), consulting the TLB only to go to L2; a physical cache first translates the V-addr through the TLB and then uses the P-index and P-tag.]
99 Virtually Indexed, Physically Tagged Cache
● Motivation: fast cache hit by accessing the cache with the V-index in parallel with the TLB access.
● Avoids a process id having to be associated with cache entries.
● [Figure: the V-index selects the set while the TLB produces the P-addr; the P-tag is compared with the tag stored in the cache.]

100 Virtual Cache

● Cache both indexed and tag checked using virtual address: – Virtual to Physical translation not necessary for cache hit.

● Issues: – How to get page protection information?

● Page-level protection information is checked during virtual to physical address translation. – How are process context switches handled? – How are synonyms handled?

● Cache indexed using virtual address: – Tag compared using physical address. – Indexing is carried out while virtual-to physical translation is occurring.

● Advantages: – No PID needs to be associated with cache blocks. – No protection information needed for cache blocks. – Synonym handling not a problem.

102 Pipelined Cache Access

● L1 cache access can take multiple cycles: – Giving fast cycle time and slower hits but lower average cache access time.

● Pentium takes 1 cycle to access cache: – Pentium III takes 2 cycles. – Pentium 4 takes 4 cycles.

103 Modern Computer Architectures Lecture 25: Some More Cache Optimizations

Prof. Sandeep Panda, Koustuv Group of Institutions 104 Reducing Miss Rate: An Anatomy of Cache Misses

● To be able to reduce miss rate, we should be able to classify misses by causes (3Cs): – Compulsory—To bring blocks into cache for the first time.

● Also called cold start misses or first reference misses. Misses in even an Infinite Cache. – Capacity—Cache is not large enough, some blocks are discarded and later retrieved.

● Misses even in Fully Associative cache. – Conflict—Blocks can be discarded and later retrieved if too many blocks map to a set.

● Also called collision misses or interference misses. (Misses in N-way Associative, Size X Cache) 105 Classifying Cache Misses

● Later we shall discuss a 4th "C": – Coherence - Misses caused by cache coherence actions (e.g., invalidations from other processors). To be discussed in the multiprocessors part.

106 Three (or Four) Cs (Cache Miss Terms)
● Compulsory Misses:
– Cold start misses (caches do not have valid data at the start of the program).
● Capacity Misses:
– Increase cache size.
● Conflict Misses:
– Increase cache size and/or associativity.
– Associative caches reduce conflict misses.
● Coherence Misses:
– In multiprocessor systems (later lectures…).
● [Figure: processor/cache diagrams with blocks 0x1234, 0x5678, 0x91B1, 0x1111 illustrating the miss types.]
107 Reducing Miss Rate (1): Hardware Prefetching

● Prefetch both data and instructions: –Instruction prefetching done in almost every processor. –Processors usually fetch two blocks on a miss: requested and the next block.

● UltraSPARC III computes strides in data accesses: – Prefetches data based on this. 108 Reducing Miss Rate (2): Software Prefetching

● Prefetch data: – Load data into register (HP PA-RISC loads). – Cache prefetch: load into a special prefetch cache (MIPS IV, PowerPC, SPARC v. 9). – Special prefetching instructions cannot cause faults; a form of speculative fetch.
109 Compiler Controlled Prefetching
● Compiler inserts instructions to prefetch data before it is needed.

● Two flavors: – Binding prefetch: Requests to load directly into register.

● Must be correct address and register! – Non-Binding prefetch: Load into cache.

● Can be incorrect. What if Faults occur?
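As an illustration of a non-binding prefetch, here is a minimal sketch of mine (assuming a GCC/Clang toolchain, whose __builtin_prefetch emits a prefetch hint that cannot fault); the prefetch distance of 16 elements is an arbitrary assumption.

#include <stdio.h>
#include <stddef.h>

/* Sum an array while hinting the cache about data needed a few iterations ahead. */
long sum_with_prefetch(const long *a, size_t n) {
    long s = 0;
    for (size_t i = 0; i < n; i++) {
        if (i + 16 < n)
            __builtin_prefetch(&a[i + 16], 0 /* read */, 1 /* low temporal locality */);
        s += a[i];
    }
    return s;
}

int main(void) {
    long a[1000];
    for (size_t i = 0; i < 1000; i++) a[i] = (long)i;
    printf("%ld\n", sum_with_prefetch(a, 1000));
    return 0;
}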

● Prefetch instructions incur instruction execution overhead: – Is cost of prefetch issues less than savings in reduced misses? 110 Example 8

● By how much would AMAT increase when the prefetch buffer is removed? – Assume a 64KB data cache. – 30 data misses per 1000 instructions. – L1 miss penalty 15 cycles. – 25% are load/store instructions. – Prefetching reduces miss rate by 20%. – 1 extra clock cycle is incurred if the access misses in the cache but is found in the prefetch buffer.

111 Solution

● Miss rate (data)=(30/1000)*(100/25) =12/100=0.12

● AMAT (prefetch) = 1+(0.12*20%*1)+(0.12*(1-20%)*15) = 1+0.12*0.2+0.12*0.8*15 = 1+0.024+1.44 ≈ 2.46 cycles

● AMAT (No-prefetch) = 1+0.12*15 = 1+1.8 = 2.8 cycles

● About a 14% increase in AMAT (2.8/2.46 ≈ 1.14).
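A small check of the arithmetic above (mine, not the slides'); the 20% prefetch-hit fraction, 15-cycle penalty, 1-cycle hit time and 25% load/store mix are the example's stated assumptions.

#include <stdio.h>

int main(void) {
    double miss_rate   = (30.0 / 1000.0) / 0.25;      /* data misses per data access = 0.12 */
    double amat_pref   = 1 + miss_rate * 0.20 * 1
                           + miss_rate * 0.80 * 15;
    double amat_nopref = 1 + miss_rate * 15;
    printf("AMAT with prefetch    = %.3f cycles\n", amat_pref);
    printf("AMAT without prefetch = %.3f cycles\n", amat_nopref);
    printf("Increase              = %.1f%%\n",
           100.0 * (amat_nopref - amat_pref) / amat_pref);
    return 0;
}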

112 Impact of Cache Parameters ● Parameters: Cache size, block size, and set associativity.

● Other parameters: cache set number, cache blocks per set, and cache block size.

● How do they affect miss rate? – Recall 3Cs: Compulsory, Capacity, Conflict cache misses?

● How about miss penalty?

● How about cache hit time? 113 3Cs Absolute Miss Rate (SPEC92)

● [Figure: miss rate per type vs. cache size (1KB-128KB) for 1-, 2-, 4- and 8-way associativity; conflict misses shrink with associativity, capacity misses dominate, and compulsory misses are vanishingly small.]
114 Cache Size
● [Figure: the same absolute miss-rate breakdown vs. cache size; the total miss rate falls as the cache grows.]

● Rule of thumb: 2x size => 25% cut in miss rate

● Which miss does it reduce?
115 3Cs Relative Miss Rate
● [Figure: relative (percentage) breakdown of conflict, capacity and compulsory misses vs. cache size (1KB-128KB) for 1- to 8-way associativity.]

116 Cache Insights

● Assume total cache size not changed:

● What happens if: 1)Change Block Size: 2)Change Associativity:

Which 3Cs are affected?

117 Larger Caches

● Larger caches are obvious ways to reduce capacity misses.

● However: – Larger caches have higher hit times. – Larger caches have higher cost.

● L2 caches have become much larger: – Not true for L1 caches.

118 Higher Associativity

● Higher associativity reduces conflict misses – Higher associativity increases hit time ● How do we decide associativity? ● First observation: associativity higher than 8-way is likely not useful. ● Second observation: “2:1 cache rule of thumb”: – The miss rate of a direct-mapped cache of size N is about the same as a 2-way set associative cache of size N/2. – Seems to hold for caches 128K and under. 119 Larger Block Size?

● [Figure: miss rate vs. block size (16-256 bytes) for cache sizes 1K, 4K, 16K, 64K and 256K; the miss rate first falls and then rises as blocks get larger, especially for small caches.]
120 Larger Block Sizes

● Larger blocks reduce compulsory misses: – Due to better spatial locality.

● Larger blocks increase miss penalty: – Although “critical word first” can help.

● Larger blocks may increase conflict misses: – Given a cache size, larger blocks mean fewer block frames.

● Therefore, there is a trade-off: – The best block size must be chosen carefully.

121 Compiler Optimization

● Ways in which code can be modified to have fewer misses: – By a compiler – By a user

● McFarling [1989] reported 75% reduction in cache misses on an 8KB direct mapped cache: – Reorder instructions so as to reduce conflict misses.

122 Reducing Misses by Compiler Optimizations ● Popular techniques: – Merging Arrays: improve spatial locality by single array of compound elements vs. 2 separate arrays. – Loop Interchange: change nesting of loops to access data. – Loop Fusion: Combine 2 independent loops that have same looping and some variables overlap. – Blocking: Improve temporal locality by accessing “blocks” of data repeatedly vs. processing whole columns or rows. 123 Merging Arrays Example /* Before: 2 sequential arrays */ int val[SIZE]; int key[SIZE]; /* After Merging: 1 array of stuctures */ struct merge { int val; int key; }; struct merge merged_array[SIZE];

● Reduces potential conflicts between val & key.
● Improves spatial locality.
124 Loop Interchange: Example 1
2-D array initialization, two alternatives:

/* i outer, j inner */
int a[200][200];
for (i=0; i<200; i++) {
  for (j=0; j<200; j++) {
    a[i][j] = 2;
  }
}

/* j outer, i inner */
int a[200][200];
for (j=0; j<200; j++) {
  for (i=0; i<200; i++) {
    a[i][j] = 2;
  }
}

● Which alternative is best? – i,j? – j,i?

● To answer this, one must understand the memory layout of a 2-D array.

125 2-D Arrays in Memory

● The elements of a 2-D array are stored in contiguous memory cells.

● The problem is that array is 2-D – Computer memory is 1-D.

● Therefore, there must be a mapping from 2-D to 1-D: – From a 2-D abstraction to a 1-D implementation.

126 Row-Major, Column-Major
● Row-Major:

– Rows are stored contiguously 1st row 2nd row 3rd row 4th row

● Column-Major: – Columns are stored contiguously 1st col 2nd col 3rd col 4th col
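To make the row-major layout concrete, here is a small sketch (mine, not from the slides) of the address calculation a C compiler effectively performs for int a[200][200]:

#include <stdio.h>

#define ROWS 200
#define COLS 200

int main(void) {
    static int a[ROWS][COLS];
    int i = 3, j = 7;
    /* Row-major: element (i,j) lives i*COLS + j elements past the base. */
    int *computed = (int *)a + (i * COLS + j);
    printf("&a[%d][%d] = %p, base + (i*COLS + j) = %p\n",
           i, j, (void *)&a[i][j], (void *)computed);
    return 0;
}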

127 Row-Major

● C uses Row-Major

● [Figure: rows laid out one after another in memory; each memory/cache line holds consecutive elements of a row.]
● Matrix elements are stored in contiguous memory lines.
128 Row-Major
Better choice:
int a[200][200];
for (i=0; i<200; i++)
  for (j=0; j<200; j++)
    a[i][j] = 2;

● Worse:
int a[200][200];
for (i=0; i<200; i++)
  for (j=0; j<200; j++)
    a[j][i] = 2;
129

Loop Interchange Example (row-major ordering)
/* Before */
for (j=0; j<100; j++)
  for (i=0; i<5000; i++)
    x[i][j] = 2*x[i][j];
What is the worst that could happen? Hint: DM cache.

/* After */
for (i=0; i<5000; i++)
  for (j=0; j<100; j++)
    x[i][j] = 2*x[i][j];

Improved cache efficiency

Is this always a safe transformation? Does it always lead to higher efficiency?

130 Loop Fusion Example

/* Before */
for (i = 0; i < N; i = i+1)
  for (j = 0; j < N; j = j+1)
    a[i][j] = 1/b[i][j] * c[i][j];
for (i = 0; i < N; i = i+1)
  for (j = 0; j < N; j = j+1)
    d[i][j] = a[i][j] + c[i][j];

/* After */
for (i = 0; i < N; i = i+1)
  for (j = 0; j < N; j = j+1) {
    a[i][j] = 1/b[i][j] * c[i][j];
    d[i][j] = a[i][j] + c[i][j];
  }

2 misses per access to a & c vs. one miss per access; improve temporal locality

131 Blocking Example: Dense Matrix Multiplication
/* Before */
for (i = 0; i < N; i = i+1)
  for (j = 0; j < N; j = j+1) {
    r = 0;
    for (k = 0; k < N; k = k+1)
      r = r + y[i][k]*z[k][j];
    x[i][j] = r;
  }

● Two Inner Loops:

– Read all NxN elements of z[]

– Read N elements of 1 row of y[] repeatedly

– Write N elements of 1 row of x[]

● Idea: compute on a BxB submatrix that fits in the cache.
132 Blocking Example
/* After */
for (jj = 0; jj < N; jj = jj+B)
  for (kk = 0; kk < N; kk = kk+B)
    for (i = 0; i < N; i = i+1)
      for (j = jj; j < min(jj+B-1,N); j = j+1) {
        r = 0;
        for (k = kk; k < min(kk+B-1,N); k = k+1)
          r = r + y[i][k]*z[k][j];
        x[i][j] = x[i][j] + r;
      }

● B is called the Blocking Factor.
● Capacity misses reduced from 2N³ + N² to N³/B + 2N².

● But may suffer from conflict misses.
133 Cache Optimization Summary

Technique                          MR  MP  HT  Complexity
Larger Block Size                  +   –       0
Higher Associativity               +       –   1
Victim Caches                      +           2
Pseudo-Associative Caches          +           2
HW Prefetching of Instr/Data       +           2
Compiler Controlled Prefetching    +           3
Compiler Reduce Misses             +           0
Priority to Read Misses                +       1
Early Restart & Critical Word 1st      +       2
Non-Blocking Caches                    +       3
Second Level Caches                    +       2
(MR = miss rate, MP = miss penalty, HT = hit time; the first group of techniques targets miss rate, the second miss penalty)

134 Summary of Fast Cache Design Parameters

135 Higher Bandwidth

● One obvious way to improve main memory performance is to have a higher memory bandwidth: – Can bring in more bytes per time unit from the memory up the hierarchy.

● Three popular techniques: – Wider memory – Interleaved memory – Independent memory banks 136 Wider Memory

● One way in which one can reduce miss penalty is to get data to the CPU faster.

● Achieved by "widening" the memory bus.

● Since the CPU needs one word at a time, there needs to be a multiplexer between the CPU and the cache.

137 Interleaved Memory

● Memory consists of multiple memory chips.

● Therefore, each chip could be made to serve part of a request at any time: – after all, it doesn’t cost us anything (beyond power) to use all the hardware we have.

● [Figure: four-way interleaving — consecutive 4-byte words 0, 1, 2, 3 are spread across memory banks 0-3.]
138 Independent Memory Banks

● Generalization of the interleaving.

● Multiple memory controllers, multiple banks, multiple buses: – Like having multiple memory systems. – Each such memory can itself be composed of interleaved memory banks. – Each such memory has a distinct use

● e.g. input devices

139 Modern Computer Architectures Lecture 26: Memory Techniques

Prof. Sandeep Panda, Koustuv Group of Institutions

140 A Typical Main Memory Organization

141 Attributes of memory hierarchy components

Component           Technology        Bandwidth       Latency   Cost per bit ($)  Cost per gigabyte ($)
Disk drive          Magnetic          10+ MB/s        10 ms     < 1x10E-9         < 1
Main memory         DRAM              2+ GB/s         50+ ns    < 2x10E-7         < 200
On-chip L2 cache    SRAM              10+ GB/s        2+ ns     < 1x10E-4         < 100K
On-chip L1 cache    SRAM              50+ GB/s        300+ ps   > 1x10E-4         > 100K
Register file       Multiported SRAM  200+ GB/s (?)   300+ ps   > 1x10E-2         > 10M (?)

142 Dynamic Random Access Memory (DRAM)

● Main Memory is DRAM-based: – Dynamic since needs to be refreshed periodically (8 ms, 1% time).

● Refreshing causes variability in AMAT. – Addresses divided into 2 halves (Memory as a 2D matrix):

● RAS or Row Access Strobe

● CAS or Column Access Strobe

143 DRAMS

● Access time: – Time between when a read request is made and when desired word arrives.

● Cycle time: – Minimum time between two requests to memory.

● DRAMs require the data to be written back after a read: – Causes access time and cycle time to differ.

● Just as virtually all desktops and servers use DRAM for main memory: – Virtually all processors use SRAM for caches.

145 Static RAM (SRAM)

● Six transistors in cross connected fashion: – Provides regular AND inverted outputs. – Implemented in CMOS process.

[Figure: single-port 6-T SRAM cell.]
146 Dynamic RAM

● SRAM cells exhibit high speed/poor density.

● DRAM: simple transistor/capacitor pairs in high-density form.
● [Figure: 1T-1C DRAM cell — the word line gates a transistor that connects the capacitor C to the bit line, which is read by a sense amp.]
147 Synchronous DRAM (SDRAM)

● Traditional DRAMs are asynchronous: –Overhead incurred to synchronize.

● SDRAMs use a clock input to overcome this overhead: –Faster AMAT.

148 DIMM

● Dual Inline Memory Modules

– DIMMs usually contain 4 to 16 DRAMs – Normally organized 8 bytes wide for desktops

149 DIMM ● SDRAM DIMMs - These first synchronous DRAM DIMMs had the same bus frequency for data, address and control lines.

– PC66 = 66 MHz, PC100 = 100 MHz, PC133 = 133 MHz

● DDR SDRAM (DDR1) SDRAM DIMMs - DIMMs based on Double Data Rate (DDR). This is achieved by clocking on both the rising and falling edge of the data strobes.

– PC1600 = 200 MHz data & strobe / 100 MHz clock for address and control – PC2100 = 266 MHz data & strobe / 133 MHz clock for address and control – PC2700 = 333 MHz data & strobe / 166 MHz clock for address and control – PC3200 = 400 MHz data & strobe / 200 MHz clock for address and control

● DDR2 SDRAM DIMMs - DIMMs based on Double Data Rate 2 (DDR2) DRAM also have data and data strobe frequencies at double the rate of the clock.

– PC2-3200 = 400 MHz data & strobe / 200 MHz clock for address and control – PC2-4200 = 533 MHz data & strobe / 266 MHz clock for address and control – PC2-5300 = 667 MHz data & strobe / 333 MHz clock for address and control – PC2-6400 = 800 MHz data & strobe / 400 MHz clock for address and control

150 RAMBUS DRAM(RDRAM)

● Takes standard DRAM core:

– Dropped RAS/CAS: provided a bus interface called a packet switched bus. – A single chip acts like a memory system.

[Figure: multiple RAMBUS banks composing an RDRAM memory system.]
151 Packet-Switched Bus

● Between sending the address of a request and return of data: – Allows other accesses over the bus.

152 RDRAM with Integrated Heat Sink

153 Issues with RDRAM

● Compared to other contemporary standards, Rambus shows significantly increased: – Latency, heat output, manufacturing complexity, and cost. ● RDRAM requires larger die size: – Required to house the added interface – Results in a 10-20 percent price premium.

154 Issues with RDRAM

● Few DRAM manufacturers ever obtained the license to produce RDRAM: – Those who did license the technology failed to make enough RIMMs to satisfy PC market demand. – Caused RIMM (Rambus Inline Memory Module) to be priced higher than SDRAM DIMMs.

● During RDRAM's decline, – DDR continued to advance in speed while, at the same time, it was still cheaper than RDRAM.

● While RDRAM is still produced today, – Few motherboards support RDRAM. – Between 2002 and 2007, the market share of RDRAM never went beyond 5%. 155 RDRAM in Use

● Though RDRAM has been less successful in desktops: – It has been used in video game consoles such as the Nintendo 64 and PlayStation 2. – RDRAM has also been used by Cirrus Logic in their video cards.

156 Flash Memory

● A form of EEPROM: – However, allows a block to be erased or written in a single operation (in a flash).

● Floating Gate Avalanche-injection Metal Oxide Semiconductor (FAMOS)

● Electrons are trapped in a floating gate.

● Writing a byte requires creating a new block: – Old block is copied along with the byte to be written.

157 Flash vs. EEPROM

● EEPROM can write to one location or byte at a time: – Flash writes multiKB blocks. – As a result flash memory is faster. – Also flash can be written in-system in contrast to EEPROM. – The control circuitry required for erasing is much less leading to higher capacity. 158 FLASH Memory cont…

● Performance: – Reads at the speed of DRAM (~ns). – Writes like disk (~ms); a write is a complex operation.

159 Flash Memory cont… ● Memory capacity is increased by reducing the area dedicated to control erasing.

● Number of writes is restricted due to wear in insulating oxide layer.

● Used to take 12V to write: – Present generation flash operates at 2.7V.

● Multilevel flash technology: – With precise multilevel voltage it becomes possible to store more than one bit per cell.

160 Virtual Memory

● Virtual memory – separation of logical memory from physical memory.

– Only a part of the program needs to be in memory for execution. Hence, the logical address space can be much larger than the physical address space. (Main memory is like a cache to the hard disk!) – Allows address spaces to be shared by several processes (or threads). – Allows for more efficient process creation.

● Virtual memory can be implemented via:

– Demand paging
– Demand segmentation
161 Virtual Address

● The concept of a virtual (or logical) address space that is bound to a separate physical address space is central to memory management – Virtual address – generated by the CPU – Physical address – seen by the memory

● Virtual and physical addresses are the same in compile-time and load-time address-binding schemes;

– Virtual and physical addresses differ in execution- time address-binding schemes

162 Advantages of Virtual Memory
● Translation:

– Program can be given consistent view of memory, even though physical memory is scrambled

– Only the most important part of program (“Working Set”) must be in physical memory.

– Contiguous structures (like stacks) use only as much physical memory as necessary yet grow later.

● Protection:

– Different threads (or processes) protected from each other.

– Different pages can be given special behavior

● (Read Only, Invisible to user programs, etc).

– Kernel data protected from User programs

– Very important for protection from malicious programs => Far more “viruses” under Microsoft Windows

● Sharing:
– Can map the same physical page to multiple users ("shared memory").
163 Use of Virtual Memory

[Figure: the address spaces of Process A and Process B — each contains stack, shared libs, heap, static data and code; a shared page is mapped into both.]
164 Virtual vs. Physical Address Space

[Figure: virtual pages A, B, C and D of a 4GB virtual address space are mapped to frames scattered through physical main memory or kept on disk.]
165 Paging

● Divide physical memory into fixed-size blocks (e.g., 4KB) called frames

● Divide logical memory into blocks of same size (4KB) called pages

● To run a program of size n pages, need to find n free frames and load program

● Set up a page table to map page addresses to frame addresses (operating system sets up the page table)

166 Page Table and Address Translation

[Figure: the virtual page number (VPN) indexes the page table held in main memory to obtain the physical page number (PPN); the PPN concatenated with the page offset forms the physical address.]
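A minimal single-level translation sketch of mine (the 4KB page size and the flat, fully populated table are assumptions) showing how the VPN selects a page-table entry and the PPN is rejoined with the offset:

#include <stdio.h>
#include <stdint.h>

#define PAGE_BITS 12u                       /* 4KB pages (assumption)        */
#define NPAGES    16u                       /* toy virtual address space     */

static uint32_t page_table[NPAGES];         /* VPN -> PPN, set up by the OS  */

uint32_t translate(uint32_t vaddr) {
    uint32_t vpn    = vaddr >> PAGE_BITS;
    uint32_t offset = vaddr & ((1u << PAGE_BITS) - 1u);
    uint32_t ppn    = page_table[vpn];
    return (ppn << PAGE_BITS) | offset;     /* physical address */
}

int main(void) {
    page_table[2] = 7;                      /* map virtual page 2 -> frame 7 */
    printf("0x%X -> 0x%X\n", 0x2ABCu, translate(0x2ABCu));
    return 0;
}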

167 Page Table Structure Examples

● One-to-one mapping — how much space does the page table need?
– Large pages → internal fragmentation (similar to having large line sizes in caches).
– Small pages → page table size issues.

● Multi-level Paging

● Inverted Page Table
Example: 64-bit address space, 4 KB pages (12 bits), 512 MB (29 bits) RAM
Number of pages = 2^64 / 2^12 = 2^52 (the page table has as many entries)
Each entry is ~4 bytes, so the size of the page table is 2^54 bytes = 16 petabytes!

Can’t fit the page table in the 512 MB RAM! 168 Multi-level (Hierarchical) Page Table ● Divide virtual address into multiple levels
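A sketch of mine of the two-level walk (the 10/10/12-bit split and the table names are assumptions); space is saved because a level-2 table is only allocated for regions of the address space that are actually used:

#include <stdio.h>
#include <stdint.h>

#define P1_BITS     10u
#define P2_BITS     10u
#define OFFSET_BITS 12u

static uint32_t level2_table[1u << P2_BITS];   /* one allocated level-2 table   */
static uint32_t *directory[1u << P1_BITS];     /* level-1 directory of pointers */

uint32_t walk(uint32_t vaddr) {
    uint32_t p1     = vaddr >> (P2_BITS + OFFSET_BITS);
    uint32_t p2     = (vaddr >> OFFSET_BITS) & ((1u << P2_BITS) - 1u);
    uint32_t offset = vaddr & ((1u << OFFSET_BITS) - 1u);
    uint32_t *l2    = directory[p1];           /* level-1 lookup               */
    uint32_t ppn    = l2[p2];                  /* level-2 lookup (stores PPN)  */
    return (ppn << OFFSET_BITS) | offset;
}

int main(void) {
    directory[0]    = level2_table;            /* only this region is mapped   */
    level2_table[3] = 0x42;                    /* virtual page 3 -> frame 0x42 */
    printf("0x%X -> 0x%X\n", 0x3123u, walk(0x3123u));
    return 0;
}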

● [Figure: the virtual address is split into P1, P2 and page offset; P1 indexes the level-1 page directory (a pointer array kept in main memory), P2 indexes the selected level-2 page table, which stores the PPN; the PPN plus the page offset give the physical address.]
169 Inverted Page Table

● One entry for each real page of memory

● Shared by all active processes

● Entry consists of the virtual address of the page stored in that real memory location, with Process ID information

● Decreases memory needed to store each page table, but increases time needed to search the table when a page reference occurs

170 Linear Inverted Page Table

● Contains one entry per physical page frame, kept in a linear array

● Need to traverse the array sequentially to find a match
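A toy scan of a linear inverted page table (my sketch; the entry layout mirrors the figure below): the array has one entry per physical frame, and the index of the matching (PID, VPN) entry is the PPN.

#include <stdio.h>
#include <stdint.h>

#define NFRAMES 8u                              /* toy physical memory */

struct ipt_entry { uint32_t pid, vpn; };
static struct ipt_entry ipt[NFRAMES];

/* Returns the PPN (the index of the matching entry) or -1 on a miss. */
int ipt_lookup(uint32_t pid, uint32_t vpn) {
    for (uint32_t ppn = 0; ppn < NFRAMES; ppn++)
        if (ipt[ppn].pid == pid && ipt[ppn].vpn == vpn)
            return (int)ppn;
    return -1;                                  /* not mapped: page fault */
}

int main(void) {
    ipt[5] = (struct ipt_entry){ 8u, 0x2AA70u };   /* frame 5 holds PID 8, VPN 0x2AA70 */
    printf("PPN = %d\n", ipt_lookup(8u, 0x2AA70u));
    return 0;
}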

● Can be time consuming
● [Figure: linear inverted page table lookup — for PID = 8, VPN = 0x2AA70, the table is scanned until the matching entry is found at index 0x120D, so PPN = 0x120D; the PPN concatenated with the offset gives the physical address.]
171 Hashed Inverted Page Table

● Use a hash table to limit the search to a smaller number of page-table entries
● [Figure: hashed inverted page table — (PID 8, VPN 0x2AA70) is hashed through a hash anchor table into the page table, and the chain linked by the Next field is followed until the matching entry (at 0x120D) is found.]
172 Fast Address Translation

● How often address translation occurs?

● Where the page table is kept?

● Keep translation in the hardware

● Use Translation Lookaside Buffer (TLB)

– Instruction-TLB & Data-TLB – Essentially a cache (tag array = VPN, data array=PPN) – Small (32 to 256 entries are typical) – Typically fully associative (implemented as a content addressable memory, CAM) or highly associative to minimize conflicts
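A small fully associative TLB sketch (mine, not the lecture's): every valid entry's VPN is compared against the lookup VPN; in hardware this comparison happens in parallel in a CAM rather than in this sequential loop.

#include <stdio.h>
#include <stdint.h>
#include <stdbool.h>

#define TLB_ENTRIES 32u        /* small and fully associative (32-256 is typical) */

struct tlb_entry { bool valid; uint32_t vpn, ppn; };
static struct tlb_entry tlb[TLB_ENTRIES];

/* Returns true on a TLB hit and writes the PPN; a miss would trigger a page walk. */
bool tlb_lookup(uint32_t vpn, uint32_t *ppn) {
    for (uint32_t i = 0; i < TLB_ENTRIES; i++)
        if (tlb[i].valid && tlb[i].vpn == vpn) { *ppn = tlb[i].ppn; return true; }
    return false;
}

int main(void) {
    uint32_t ppn = 0;
    tlb[0] = (struct tlb_entry){ true, 0x12345u, 0x00677u };
    printf("hit = %d, ppn = 0x%X\n", tlb_lookup(0x12345u, &ppn), ppn);
    return 0;
}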

173 Example: Alpha 21264 data TLB
● [Figure: each TLB entry holds an Address Space Number (ASN) <8>, protection bits <4>, a valid bit <1>, a tag <35> and a PPN <31>; the VPN <35> of the virtual address is compared against the tags, a 128:1 mux selects the matching PPN, and the PPN plus the 13-bit page offset form a 44-bit physical address.]
174 TLB and Caches

● Several Design Alternatives

– VIVT: Virtually-indexed Virtually-tagged Cache – VIPT: Virtually-indexed Physically-tagged Cache – PIVT: Physically-indexed Virtually-tagged Cache

● Not generally useful; the MIPS R6000 is the only processor that used this.
– PIPT: Physically-indexed Physically-tagged Cache

175 Virtually-Indexed Virtually-Tagged (VIVT) Cache
● [Figure: the processor core sends the VA to the VIVT cache; on a hit the cache line is returned directly, and only on a miss does the address go through the TLB to main memory.]
• Fast cache access
• Only requires address translation when going to memory (miss)
• Issues?
176 VIVT Cache Issues - Aliasing

● Homonym

– Same VA maps to different PAs

– Occurs when there is a context switch

– Solutions

● Include process id (PID) in cache or

● Flush cache upon context switches

● Synonym (also a problem in VIPT)

– Different VAs map to the same PA

– Occurs when data is shared by multiple processes

– Duplicated cache line in VIPT cache and VIVT$ w/ PID

– Data is inconsistent due to duplicated locations

– Solutions:
● Can write-through solve the problem?
● Flush cache upon context switch
177 Physically-Indexed Physically-Tagged (PIPT)
● [Figure: the processor core's VA is first translated by the TLB to a PA, which then accesses the PIPT cache; on a hit the cache line is returned, on a miss main memory is accessed.]

• Slower, always translate address before accessing memory • Simpler for data coherence

178 Virtually-Indexed Physically-Tagged (VIPT)

● [Figure: the processor core sends the VA in parallel to the TLB and the VIPT cache; the TLB supplies the PA for tag comparison; on a hit the cache line is returned, on a miss main memory is accessed.]

● Gain benefit of a VIVT and PIPT

● Parallel Access to TLB and VIPT cache

● No Homonym

● How about Synonym? 179

Deal w/ Synonym in VIPT

● [Figure: Process A (VPN A) and Process B (VPN B) point to the same location within a page, but VPN A != VPN B, so the two VPNs can index different sets of the cache and the line is duplicated.]
● How to eliminate duplication? Make Index A == Index B?
180 Synonym in VIPT Cache

● [Figure: the virtual address splits into VPN and page offset; the cache splits it into cache tag, set index and line offset. The set-index bits that extend above the page offset are labelled 'a'.]
• If two VPNs do not differ in 'a' then there is no synonym problem, since they will be indexed to the same set of a VIPT cache.
• This implies the number of sets cannot be too big.
• Max number of sets = page size / cache line size
– Ex: 4KB page, 32B line, max sets = 128
181 A Complicated Solution: MIPS R10000's Solution to Synonym

● 32KB 2-way virtually-indexed L1
● [Figure: below the VPN lies the 12-bit page offset; the L1 uses a 10-bit index and a 4-bit line offset, so a = VPN[1:0]; these bits are stored as part of the L2 cache tag.]
● Direct-Mapped Physical L2

– L2 is Inclusive of L1

– VPN[1:0] is appended to the “tag” of L2

● Given two virtual addresses VA1 and VA2 that differs in VPN[1:0] and both map to the same physical address PA

– Suppose VA1 is accessed first so blocks are allocated in L1&L2

– What happens when VA2 is referenced?
1. VA2 indexes to a different block in L1 and misses.
2. VA2 translates to PA and goes to the same block as VA1 in L2.
3. Tag comparison fails (since VA1[1:0] != VA2[1:0]).

4. Treated just like an L2 conflict miss → VA1's entry in L1 is ejected (or dirty-written back if needed) due to the inclusion policy.
182

VA1 R10000 VA2 Page offset Page offset index index a1 a2

1

miss 0 TLB

L1 VIPT cache

L2 PIPT Cache Physical index || a2

a2 !=a1 a1 Phy. Tag data 183 Deal w/ Synonym in MIPS

VA1 R10000 VA2 Page offset Page offset index index a1 a2

0 Only one copy is present in L1 1 TLB

L1 VIPT cache

L2 PIPT Cache Data return

a2 Phy. Tag data 184 Issues in Virtual Memory

● Number of pages are so many that the page tables are kept in MM.

● Assume 8-byte virtual address.

● Page size is 10 bits

● Each page stores 64 page table entries

● How many pages needed to store the page table?

● Does access to a word, require two main memory accesses?

185 TLB
● [Figure: a TLB entry contains Tag, Physical Address, Prot, Valid, Dirty and ASN fields.]

● TLB is a fully associative cache

● TLB entry is like a cache entry: – Tag contains portion of the virtual address.

● Each TLB entry has an Address Space Number (ASN): – Plays the same role as PID.

186 Alpha 21264 Virtual Memory

● 48-bit virtual memory is divided into 3 segments: – Seg0 (bit 63 to bit 47 are 00000…00) – Seg1 (bit 63 to bit 46 are 00000…10) – Seg2 (kseg) (bit 63 to bit 46 are 11111…11)

● Kseg is reserved for kernel: – Has uniform protection for the whole space. – Does not use memory management.

187 Alpha 21264 Virtual Memory

● User processes live in Seg 0.

● Seg 1 is used to keep portions of the page table.

● A 3-level hierarchical page table is used.

188 Main Memory Optimizations: Summary

● Wider Memory

● Interleaved Memory: for sequential or independent accesses

● DRAM specific optimizations.

189 Sun Fire 6800: Case Study

● L1 cache: – On-chip – Data cache: 32KB – Instruction cache: 64KB – Block size 32B – 4-way set associative – Write through, no write allocate ● L2 cache: – 8MB Unified cache – Tags for L2 cache on chip --- Total size 90KB – Write back, write allocate 190 Sun Fire 6800: Case Study

● 2KB write buffer

● Outstanding memory accesses supported by the memory system: – Up to 15.

● Virtually indexed, physically tagged cache – In parallel with cache access, the 64-bit virtual address is translated to a 48-bit physical address.

● 2 TLBs: – A 16-entry fully associative cache – A 128-entry 4-way set associative cache

191 Sun Fire 6800: Case Study

● Both data and instruction prefetch supported.

● Data prefetch:

– Up to 8 prefetches by both hardware and software. – If a load hits in prefetch buffer:

● The next load address is prefetched.

● Small 32B instruction prefetch buffer:

– On an instruction cache miss, two blocks are requested:

● One for the instruction cache the other for the prefetch buffer. 192 Consistency of Cached Data

● Same data can be found in the cache as well as the memory. – When CPU is the sole accessing unit, there is no chance of inconsistency.

● What if I/O can occur independent of CPU? – In fact, in many systems I/O occurs directly to main memory. – In a write through cache, there will be no inconsistency with I/O writes.

● But, what about write-back caches?

193 Solutions to the Cache Consistency Problem 1. Guarantee no blocks marked for I/O are present in cache. 2. I/O always occurs to a buffer marked non-cachable. 3. Hardware solution: Check I/O address to see if it is in cache.

194 Summary Cont…
• Any high performance processor would be rendered ineffective without matching support from the memory hierarchy.
• Ways to improve cache performance: reduce AMAT = Hit time + Miss rate × Miss penalty, i.e. reduce the cache hit time, the miss rate and the miss penalty.

195 Summary Cont…

● Memory technologies: – DRAM – SRAM – SDRAM – RDRAM – DRDRAM – Flash Memory

196 References
[1] J.L. Hennessy & D.A. Patterson, "Computer Architecture: A Quantitative Approach," Morgan Kaufmann Publishers, 3rd Edition, 2003.
[2] John Paul Shen and Mikko Lipasti, "Modern Processor Design," Tata McGraw-Hill, 2005.
[3] S. McFarling, "Program Optimization for Instruction Caches," Proceedings of the Third International Conference on Architectural Support for Programming Languages and Operating Systems, pp. 183-191, April 1989.
[4] R. Mall, Slides from IIT Kharagpur.

197