Lecture 1: Introduction

CprE 585 Advanced Computer Architecture, Fall 2003
Zhao Zhang

Traditional “Computer Architecture”

The term architecture is used here to describe the attributes of a system as seen by the programmer, i.e., the conceptual structure and functional behavior as distinct from the organization of the data flow and controls, the logic design, and the physical implementation.
Gene Amdahl, IBM Journal R&D, April 1964

Contemporary “Computer Architecture”

- Instruction set architecture: program-visible instruction set
  - Instruction format, memory addressing modes, architectural registers
  - Examples: RISC, CISC, VLIW, EPIC
- Organization: high-level aspects of a computer’s design
  - Pipeline stages, instruction scheduling, cache, memory, disks, buses, etc.
- Implementations: the specifics of a machine
  - Logic design, packaging technology

Comprehensive Course Contents

- Fundamentals
- Processor architecture
- Memory hierarchy
- I/O systems
- Multiprocessors
- Multicomputers

Contents of This Course

1. Fundamentals: ISA design principles, evaluation methodology, market factors in computer design
2. Processor architecture: we will focus on ILP techniques of modern superscalar processors
   - Multiple-issue
   - Dynamic scheduling
   - Speculative execution
   - Non-blocking load/stores
3. Memory hierarchy
   - Cache basics
   - Multi-level caches and memory system designs
   - Advanced cache techniques
4. Brief coverage of VLIW and EPIC processors, storage systems, and multiprocessors
5. Selected research topics: multi-threaded processors, embedded processors, low-power architecture, etc.

Your Background

- Some digital design knowledge
- RISC instruction set architecture (MIPS)
- Arithmetic design
- Control and data path design
- Single-cycle processor implementation
- Multi-cycle implementation
- Pipelined implementation
The CPU Performance Equation

$$\text{CPU time} = \#\text{Inst} \times \text{CPI} \times \text{Clock cycle time}$$

$$\text{CPI} = \text{CPI}_{\text{ideal}} + \text{CPI}_{\text{control hazard}} + \text{CPI}_{\text{data hazard}}$$

Instruction-level Parallelism (ILP)

    LD    F2,45(R3)
    MULTI F0,F2,F4
    LD    F6,34(R2)
    SUBD  F8,F6,F2
    DIVD  F10,F0,F6
    ADD   F6,F8,F2

[Figure: dataflow graph of the code, with nodes LD1, LD2, MULTI, SUBD, DIVD, ADD]

Given infinite resources, how fast can the processor run the code?

Multi-issue

- Single-issue
- Two-way issue

[Figure: five-stage pipelines (IF, ID, EX, MEM, WB) for single-issue and two-way issue, annotated with stall check and data forwarding]

Static and Dynamic Scheduling and VLIW

    LD    F2,45(R3)
    MULTI F0,F2,F4
    LD    F6,34(R2)
    SUBD  F8,F6,F2
    DIVD  F10,F0,F6
    ADD   F6,F8,F2

- Static scheduling: instructions execute in program order
- Dynamic scheduling: instructions may execute out of order
- VLIW: dumb hardware, compiler determines scheduling

How many cycles in each case?

Branch Prediction and Speculative Execution

    BEQ   R8, R0, skip
    LD    F2,45(R3)
    MULTI F0,F2,F4
    skip:
    LD    F6,34(R2)
    SUBD  F8,F6,F2
    DIVD  F10,F0,F6
    ADD   F6,F8,F2

- Branch outcomes determine data dependences
- Consider typical integer programs: one branch per seven instructions
- How much performance loss?

What Is Memory Hierarchy

A typical memory hierarchy today (bigger toward the bottom, faster toward the top):

- Proc/Regs
- L1-Cache
- L2-Cache
- L3-Cache (optional)
- Memory
- Disk, Tape, etc.

Here we focus on L1/L2/L3 caches and main memory.

Why Memory Hierarchy?

[Figure: processor vs. DRAM performance, 1980-2000, log scale from 1 to 1000. µProc (CPU): 60%/yr. (“Moore’s Law”); DRAM: 7%/yr.; the processor-memory performance gap grows 50%/year]

1980: no cache in µproc; 1995: 2-level cache on chip (1989: first Intel µproc with a cache on chip)

What Else in This Course

- VLIW and EPIC processors
- Multiprocessors
- Storage systems
- Selected advanced topics (tentative list)
  - Simultaneous multithreading processors
  - Embedded processors
  - Modeling
  - Dependability and security
  - …

Course Schedule by Weeks (Subject to Changes)

Week 1. Introduction; Performance evaluation
Week 2. ISA (Lab day)
Week 3. Review of MIPS pipeline; Tomasulo algorithm
Week 4. Tomasulo algorithm; Alpha 21264 instruction scheduling
Week 5. Branch prediction and speculative execution
Week 6. Memory load/store unit designs
Week 7. Real examples of superscalar processors
Week 8. Cache fundamentals
Week 9. Cache optimization techniques
Week 10. Virtual memory; Exam
Week 11-15. Advanced topics, student presentations

Course Projects

You will work in groups of two:

- Preliminary project: get warmed up
- Verilog Project 1: Dynamic instruction scheduling
  - Tomasulo algorithm
  - Alpha 21264 instruction scheduling
- Verilog Project 2: Branch prediction and speculative execution
  - Branch prediction table, branch target buffer
  - Recovery through reorder buffer
- Verilog Project 3: Cache and TLB
  - Direct mapped cache
  - Direct mapped TLB
- Final Project: On selected research topics
  - Re-evaluate an existing study, or survey a topic
  - Including proposal, presentation, and final report

Verilog Code Sketch

    module cpu (reset, cycle, clock); // tomasulo with MIPS32
      …
      /* stage 1: inst fetch */
      inst_fetch M1(/* request */fetch_req, /* ok */fetch_ok,
                    /* pc */pc, /* inst */inst,
                    /* reset */reset, /* branch */0, /* branch target */0,
                    /* clock */clock);
      /* stage 2: rename, register read, issue */
      rename M2(fetch_req, …);
      /* stage 3: execute */
      adder(request, …); // fu adder with RS
      …
      /* stage 4: write back */
      …
    endmodule

Still under construction

Syllabus, Class web site, WebCT

Syllabus
- Course Schedule
- Textbook and references
- Projects
- Homework
- Exam
- Grading

On class web site (find it from my home page)
- Check announcements
- Get papers etc.

On WebCT
- Check your grades
- Join discussions: Verilog programming, project understanding, course contents, homework problems

Major Faces in Today’s Market

Some non-technical background for processor design:

- Desktop computers
  - Providing desktop computing for individuals
  - Optimized for price-performance
- Servers
  - Providing larger-scale and more reliable file and computing service
  - Designed for performance, availability, and scalability
- Embedded computers
  - Lodged in other devices (networking switches, printers, Palm devices, cell phones, etc.)
  - Emphasizing real-time performance requirements
  - Emphasizing low cost and low power design

Technology Trends

Implementation technologies change dramatically:

- Integrated circuit logic technology
- Semiconductor DRAM
- Magnetic disk technology
- Network technology

ISA must be stable: software is more expensive than hardware.

Cost, Price, and Their Trends

- Cost and price may determine whether a computer product succeeds in the market
- In many cases cost is the single most important factor in design considerations
  - Add a new feature or not?
  - Trade performance against cost and price
  - Especially true for the desktop and embedded markets

Processor Performance Trends

Processor Price Trend

Cost of IC

$$\text{Cost of IC} = \frac{\text{Cost of die} + \text{Testing} + \text{Packaging \& Final Test}}{\text{Final test yield}}$$

$$\text{Cost of die} = \frac{\text{Cost of wafer}}{\text{Dies per wafer} \times \text{Die yield}}$$

Die Yield

$$\text{Die yield} = \text{Wafer yield} \times \left(1 + \frac{D \times \text{Die area}}{\alpha}\right)^{-\alpha}$$

$$\text{Dies per wafer} = \frac{\pi \times (d/2)^2}{\text{Die area}} - \frac{\pi \times d}{\sqrt{2 \times \text{Die area}}}$$

- D: defects per unit area
- α: masking level
- d: wafer diameter

Die Yield (example)

Wafer diameter: 30 cm; defect density: 0.6 per cm²; mask level (α): 4

    Die size           Die yield
    0.7 cm × 0.7 cm    0.75
    1 cm × 1 cm        0.57
    1.5 cm × 1.5 cm    0.44
    2 cm × 2 cm        0.35

Processor cost grows faster than linearly with performance! Price increases even more!
(See textbook)

DRAM Price Trend

End of Lecture

Course Strategies

- Learn the fundamentals of computer architecture design
- Learn the most important aspects (at this time) of computer architecture: superscalar processors and memory hierarchy
- Be exposed to the other topics: storage systems and multiprocessors
- Appreciate the merits of computer architecture research
- Remember: hot topics tomorrow may be different

Notes

- Add slides for multiple-issue, branch prediction, load/store units, and memory hierarchy
- Add slides for Tomasulo and Alpha-21264-like scheduling
- Add slides for course scheduling
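Worked example: the CPU performance equation

The branch prediction slide asks how much performance is lost when a typical integer program has one branch per seven instructions. The sketch below plugs that ratio into the CPU performance equation from earlier in the lecture. Only the branch frequency comes from the slides; the misprediction rate, penalty, ideal CPI, data-hazard CPI, instruction count, and clock period are all assumed values chosen purely for illustration, and the function names are hypothetical.

```python
def total_cpi(cpi_ideal, cpi_control_hazard, cpi_data_hazard):
    """CPI = CPI_ideal + CPI_control hazard + CPI_data hazard."""
    return cpi_ideal + cpi_control_hazard + cpi_data_hazard


def cpu_time(inst_count, cpi, clock_cycle_time):
    """CPU time = #Inst x CPI x Clock cycle time, in seconds."""
    return inst_count * cpi * clock_cycle_time


# From the lecture: roughly one branch per seven instructions.
BRANCH_FRACTION = 1 / 7

# Everything below is an assumed, illustrative value (not from the lecture).
MISPREDICT_RATE = 0.10      # assumed fraction of branches mispredicted
MISPREDICT_PENALTY = 10     # assumed stall cycles per misprediction
CPI_IDEAL = 1.0             # assumed ideal CPI of a single-issue pipeline
CPI_DATA_HAZARD = 0.2       # assumed data-hazard stall cycles per instruction
INST_COUNT = 1_000_000_000  # assumed dynamic instruction count
CLOCK_CYCLE_TIME = 1e-9     # assumed 1 ns cycle (1 GHz clock)

# Control-hazard stall cycles per instruction.
cpi_control = BRANCH_FRACTION * MISPREDICT_RATE * MISPREDICT_PENALTY
cpi = total_cpi(CPI_IDEAL, cpi_control, CPI_DATA_HAZARD)
seconds = cpu_time(INST_COUNT, cpi, CLOCK_CYCLE_TIME)
slowdown = cpi / CPI_IDEAL

print(f"control-hazard CPI = {cpi_control:.3f}")
print(f"total CPI = {cpi:.3f}, CPU time = {seconds:.3f} s")
print(f"slowdown vs. ideal CPI = {slowdown:.2f}x")
```

Under these assumptions, control hazards alone add about 0.14 cycles per instruction. Changing the assumed misprediction rate or penalty shows how sensitive the total is to branch handling.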

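Worked example: die yield and die cost

The die-cost formulas from the Cost of IC and Die Yield slides can be checked numerically. This is a minimal sketch with hypothetical function names. It assumes a wafer yield of 1.0 and an invented wafer cost of 1000 (neither is given in the lecture), and it takes the floor of the dies-per-wafer estimate when computing cost. It evaluates the 0.7 cm and 1 cm dies of the example using the stated parameters (30 cm wafer, 0.6 defects per cm², α = 4).

```python
import math


def dies_per_wafer(wafer_diameter_cm, die_area_cm2):
    """pi*(d/2)^2 / die_area  -  pi*d / sqrt(2*die_area)."""
    wafer_area_term = math.pi * (wafer_diameter_cm / 2) ** 2 / die_area_cm2
    edge_loss_term = math.pi * wafer_diameter_cm / math.sqrt(2 * die_area_cm2)
    return wafer_area_term - edge_loss_term


def die_yield(defect_density, die_area_cm2, alpha, wafer_yield=1.0):
    """Wafer yield * (1 + D * die_area / alpha) ** (-alpha)."""
    return wafer_yield * (1 + defect_density * die_area_cm2 / alpha) ** (-alpha)


def cost_of_die(cost_of_wafer, wafer_diameter_cm, die_area_cm2,
                defect_density, alpha, wafer_yield=1.0):
    """Cost of wafer / (dies per wafer * die yield), using whole dies only."""
    whole_dies = math.floor(dies_per_wafer(wafer_diameter_cm, die_area_cm2))
    good_fraction = die_yield(defect_density, die_area_cm2, alpha, wafer_yield)
    return cost_of_wafer / (whole_dies * good_fraction)


WAFER_DIAMETER_CM = 30   # from the lecture example
DEFECT_DENSITY = 0.6     # defects per cm^2, from the lecture example
ALPHA = 4                # masking level, from the lecture example
WAFER_COST = 1000.0      # assumed, for illustration only

for side_cm in (0.7, 1.0):
    area = side_cm * side_cm
    y = die_yield(DEFECT_DENSITY, area, ALPHA)
    n = math.floor(dies_per_wafer(WAFER_DIAMETER_CM, area))
    c = cost_of_die(WAFER_COST, WAFER_DIAMETER_CM, area, DEFECT_DENSITY, ALPHA)
    print(f"{side_cm} cm die: yield {y:.2f}, {n} dies/wafer, cost/die {c:.2f}")
```

Because both the number of dies per wafer and the yield fall as the die grows, the cost per good die rises faster than die area, which is the point of the slide's closing remark.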