Computer Architecture
Overview
Prof. Tien-Fu Chen Dept. of Computer Science National Chung Cheng Univ Spring 2002
Overview-1 TF Chen@CCU
Computer Architecture Course Focus
Understanding the design techniques, machine structures, technology factors, and evaluation methods that will determine the form of programmable processors in the 21st century
[Figure: Computer Architecture (interface design: instruction set design (ISA), organization, hardware) at the center, surrounded by the factors that shape it: technology, programming languages, applications, operating systems, history, and measurement & evaluation.]
Topic Coverage
Textbook: Hennessy and Patterson, Computer Architecture: A Quantitative Approach, 2nd Ed., 1996.
• Fundamentals of Computer Architecture (Ch. 1)
• Instruction Set Architecture (Ch. 2)
• Pipelining (Ch. 3)
• Pipelining and Instruction-Level Parallelism (Ch. 4)
• Memory Hierarchy (Ch. 5)
• Input/Output and Storage, RAID (Ch. 6)
• Networks and Interconnection Technology (Ch. 7), Multiprocessors (Ch. 8)
• Vector, SIMD, MMX, VLIW, EPIC and DSPs
• Configurable processors and computing
• Design space exploration and embedded processors
Computer Architecture (Graduate): Administrative Information
Instructors: Prof. Tien-Fu Chen
Office: 412 Computer Science Dept, Tel: ext-33111, [email protected]
T.A.:
Class: Wed 10:10-11:00, Fri 10:10-12:00, Room 101
Text: Computer Architecture: A Quantitative Approach, 2nd Edition (1996)
Web page: http://www.cs.ccu.edu.tw/~chen/arch2002
Newsgroup: arch2002
ABC of Computer Architecture:
Performance, Cost, Power
The Performance Metric
"X is n times faster than Y" means
$$n = \frac{\text{ExTime}(Y)}{\text{ExTime}(X)} = \frac{\text{Performance}(X)}{\text{Performance}(Y)}$$
• Speed of Concorde vs. Boeing 747
• Throughput of Boeing 747 vs. Concorde
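A minimal sketch of the "n times faster" definition, assuming execution times measured in seconds. The function name `times_faster` and the timings are hypothetical and chosen only for illustration; they are not measurements from the slides.

```python
def times_faster(ex_time_x: float, ex_time_y: float) -> float:
    """Return n such that X is n times faster than Y: n = ExTime(Y) / ExTime(X)."""
    if ex_time_x <= 0 or ex_time_y <= 0:
        raise ValueError("execution times must be positive")
    return ex_time_y / ex_time_x


# Hypothetical timings (assumption): X needs 10 s, Y needs 15 s for the same task.
n = times_faster(ex_time_x=10.0, ex_time_y=15.0)
print(f"X is {n:.2f} times faster than Y")
```

Because performance is the reciprocal of execution time, the same ratio is obtained from Performance(X) / Performance(Y). For a throughput metric, the work completed per unit time takes the place of performance.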
Metrics of Performance
Each level of the system has its own metric:
• Application: answers per month, operations per second
• Programming Language, Compiler, ISA: (millions) of instructions per second: MIPS; (millions) of (FP) operations per second: MFLOP/s
• Datapath, Control: megabytes per second
• Function Units, Transistors, Wires, Pins: cycles per second (clock rate)
Performance
• Performance Measures
– Elapsed time
– CPU time (% time: 90.7u 12.9s 2:39 65%)
– Cycle time
– Law of Performance
– CPI
– MIPS
– MFLOPS
– SPECmarks
• Benchmarks
• Averaging
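The `time` output quoted in the list above can be checked by hand. The sketch below assumes the usual csh-style reading of the fields: 90.7 s of user CPU time, 12.9 s of system CPU time, 2:39 of elapsed time, and the last field as CPU time divided by elapsed time. The helper name `cpu_utilization` is made up for this example.

```python
def cpu_utilization(user_s: float, system_s: float, elapsed_s: float) -> float:
    """Fraction of elapsed (wall-clock) time spent executing on the CPU."""
    return (user_s + system_s) / elapsed_s


user_s, system_s = 90.7, 12.9   # "90.7u 12.9s" from the slide
elapsed_s = 2 * 60 + 39         # "2:39" converted to seconds
util = cpu_utilization(user_s, system_s, elapsed_s)
print(f"CPU time = {user_s + system_s:.1f} s, "
      f"elapsed = {elapsed_s} s, utilization = {util:.0%}")
```

Under that reading, the remaining share of the elapsed time is spent waiting (I/O, other processes), which is why elapsed time and CPU time are listed as separate measures.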
Aspects of CPU Performance

$$\text{CPU time} = \frac{\text{Seconds}}{\text{Program}} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Cycle}}$$

               Inst Count   CPI   Clock Rate
Program            X
Compiler           X        (X)
Inst. Set          X         X
Organization                 X        X
Technology                            X
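The CPU time equation above can be sketched as a small function. The instruction counts, CPI, and clock rate below are hypothetical values picked to show how a change in one factor (here, a compiler that lowers the instruction count) moves execution time.

```python
def cpu_time(inst_count: float, cpi: float, clock_rate_hz: float) -> float:
    """CPU time in seconds = instructions x cycles/instruction x seconds/cycle."""
    if clock_rate_hz <= 0:
        raise ValueError("clock rate must be positive")
    return inst_count * cpi / clock_rate_hz


# Hypothetical baseline (assumption): 1e9 instructions, CPI 2.0, 500 MHz clock.
baseline = cpu_time(1_000_000_000, 2.0, 500e6)
# Hypothetical compiler improvement: 20% fewer instructions, same CPI and clock.
improved = cpu_time(800_000_000, 2.0, 500e6)
print(f"baseline = {baseline:.2f} s, improved = {improved:.2f} s")
```

Keeping the three factors as separate parameters mirrors the table: each design level changes only some of them.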
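Cycles Per Instruction
"Average Cycles per Instruction"

$$\text{CPI} = \frac{\text{Cycles}}{\text{Instruction Count}} = \frac{\text{CPU Time} \times \text{Clock Rate}}{\text{Instruction Count}}$$

$$\text{CPU time} = \text{CycleTime} \times \sum_{i=1}^{n} \text{CPI}_i \times I_i$$

"Instruction Frequency"

$$\text{CPI} = \sum_{i=1}^{n} \text{CPI}_i \times F_i \quad \text{where} \quad F_i = \frac{I_i}{\text{Instruction Count}}$$

Invest Resources where time is Spent!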
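As an illustration of the weighted-CPI formula, the sketch below computes the average CPI from an instruction mix and reports where the cycles go. The instruction classes, counts, and per-class CPI values are hypothetical and not taken from the slides.

```python
def average_cpi(mix: dict) -> float:
    """mix maps class name -> (instruction count I_i, cycles per instruction CPI_i).

    Returns sum(CPI_i * F_i) with F_i = I_i / total instruction count.
    """
    total = sum(count for count, _ in mix.values())
    return sum(cpi_i * count / total for count, cpi_i in mix.values())


# Hypothetical instruction mix (assumption): counts per 100 instructions.
mix = {"alu": (50, 1), "load": (20, 2), "store": (10, 2), "branch": (20, 2)}
cpi = average_cpi(mix)
total_cycles = sum(count * cpi_i for count, cpi_i in mix.values())
for name, (count, cpi_i) in mix.items():
    share = count * cpi_i / total_cycles
    print(f"{name:>6}: {share:.0%} of cycles")
print(f"average CPI = {cpi:.2f}")
```

The per-class share of cycles is the quantity behind "invest resources where time is spent": the class with the largest share is the one worth optimizing first.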
MIPS - Million instructions per second

$$\text{MIPS} = \frac{\text{instructions}}{\text{program}} \times \frac{\text{program}}{\text{time}} \times 10^{-6} = \frac{\text{inst count}}{\text{exec time} \times 10^{6}} = \frac{\text{clock rate}}{\text{CPI} \times 10^{6}}$$

• Problems:
– Depends on the instruction set
– Depends on the program being run
– Can vary inversely with performance
• Marketing metric
• = Meaningless Indicator of Performance??
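A minimal sketch of why MIPS can vary inversely with performance. The two program versions below are hypothetical (for example, the same source compiled two ways on one machine): version B executes fewer but slower instructions, finishes sooner, and still gets the lower MIPS rating.

```python
def exec_time(inst_count: float, cpi: float, clock_rate_hz: float) -> float:
    """Execution time in seconds."""
    return inst_count * cpi / clock_rate_hz


def mips(inst_count: float, exec_time_s: float) -> float:
    """MIPS = instruction count / (execution time x 10^6)."""
    return inst_count / (exec_time_s * 1e6)


CLOCK_HZ = 100e6  # hypothetical 100 MHz clock (assumption)

# Version A: 10 M instructions at CPI 1. Version B: 4 M instructions at CPI 2.
t_a = exec_time(10_000_000, 1.0, CLOCK_HZ)
t_b = exec_time(4_000_000, 2.0, CLOCK_HZ)
mips_a = mips(10_000_000, t_a)
mips_b = mips(4_000_000, t_b)

print(f"A: {t_a * 1e3:.0f} ms, {mips_a:.0f} MIPS")
print(f"B: {t_b * 1e3:.0f} ms, {mips_b:.0f} MIPS")
```

Execution time ranks B ahead of A, while MIPS ranks them the other way around.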
MFLOPS - Million floating-point operations per second
$$\text{MFLOPS} = \frac{\text{FP ops}}{\text{program}} \times \frac{\text{program}}{\text{time}} \times 10^{-6}$$

• Problems:
– The program must be floating-point intensive
– Some programs perform no FP
– The rating changes not only with the mixture of integer and FP operations, but also with the mixture of fast and slow FP operations
• Just another Meaningless Indicator of Performance??
• Peak MFLOPS: the performance the manufacturer guarantees you won't exceed

SPEC: System Performance Evaluation Cooperative
• First Round 1989
– 10 programs yielding a single number (“SPECmarks”)
• Second Round 1992
– SPECint92 (6 integer programs) and SPECfp92 (14 floating point programs)
» Compiler flags unlimited. March 93 of DEC 4000 Model 610:
spice: unix.c:/def=(sysv,has_bcopy,”bcopy(a,b,c)= memcpy(b,a,c)”
wave5: /ali=(all,dcom=nat)/ag=a/ur=4/ur=200
nasa7: /norecu/ag=a/ur=4/ur2=200/lc=blas
• Third Round 1995
– New set of programs: SPECint95 (8 integer programs) and SPECfp95 (10 floating point)
– “Benchmarks useful for 3 years”
– Single flag setting for all programs: SPECint_base95, SPECfp_base95
Summarizing Results: Average
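• Arithmetic mean – means for times, CPI
$$\overline{\text{time}} = \frac{1}{n} \sum_{i=1}^{n} \text{time}_i$$
• Harmonic mean – means for rates, MIPS, MFLOPS
$$\overline{\text{rate}} = \frac{n}{\sum_{i=1}^{n} \frac{1}{\text{rate}_i}}$$
• Geometric mean – for normalized numbers
$$\overline{\text{ratio}} = \left( \prod_{i=1}^{n} \text{ratio}_i \right)^{1/n}$$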
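The three averages can be sketched in a few lines, using the standard textbook definitions. The sample times, rates, and ratios are hypothetical numbers chosen so the results are easy to check by hand; `math.prod` assumes Python 3.8 or later.

```python
import math


def arithmetic_mean(times):
    """Use for quantities proportional to time (execution times, CPI)."""
    return sum(times) / len(times)


def harmonic_mean(rates):
    """Use for rates (MIPS, MFLOPS): n divided by the sum of reciprocals."""
    return len(rates) / sum(1.0 / r for r in rates)


def geometric_mean(ratios):
    """Use for normalized numbers (ratios to a reference machine)."""
    return math.prod(ratios) ** (1.0 / len(ratios))


# Hypothetical data (assumption): two programs.
print(f"arithmetic mean of times 2 s, 4 s : {arithmetic_mean([2.0, 4.0]):.2f}")
print(f"harmonic mean of rates 2, 4 MIPS  : {harmonic_mean([2.0, 4.0]):.2f}")
print(f"geometric mean of ratios 2, 8     : {geometric_mean([2.0, 8.0]):.2f}")
```

The harmonic mean of the rates is lower than their arithmetic mean because the slower program dominates total running time.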
Speedup
• Speedup
$$\text{Speedup} = \frac{\text{old time}}{\text{new time}} = \frac{\text{new rate}}{\text{old rate}}$$
• Amdahl's Law: the improvement to be gained from using a faster mode is limited by the fraction of the time the faster mode can be used.
Consider an enhancement x that speeds up a fraction $f_x$ of a task by a factor $S_x$:
$$\text{Speedup}_{overall} = \frac{1}{(1 - f_x) + f_x / S_x}$$
• Example: $f_x = 90\%$, $S_x = 5$
$$\text{Speedup}_{overall} = \frac{1}{(1 - 0.9) + (0.9 / 5)} \approx 3.6$$
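The Amdahl's Law formula on this slide can be sketched as a function and checked against the slide's own example (90% of the task sped up by a factor of 5). The function name `amdahl_speedup` is an assumption made for this example.

```python
def amdahl_speedup(f_x: float, s_x: float) -> float:
    """Overall speedup when a fraction f_x of the task is sped up by a factor s_x."""
    if not 0.0 <= f_x <= 1.0:
        raise ValueError("f_x must be a fraction between 0 and 1")
    if s_x <= 0:
        raise ValueError("s_x must be positive")
    return 1.0 / ((1.0 - f_x) + f_x / s_x)


# The slide's example: f_x = 90%, S_x = 5.
overall = amdahl_speedup(0.90, 5)
print(f"overall speedup = {overall:.1f}")
```

Even with 90% of the task enhanced, the untouched 10% keeps the overall speedup well below the factor of 5 applied to the enhanced part.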
Amdahl's Law
Speedup due to enhancement E:

$$\text{Speedup}(E) = \frac{\text{ExTime w/o } E}{\text{ExTime w/ } E} = \frac{\text{Performance w/ } E}{\text{Performance w/o } E}$$
Suppose that enhancement E accelerates a fraction F of the task by a factor S, and the remainder of the task is unaffected
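Amdahl’s Law

$$\text{ExTime}_{new} = \text{ExTime}_{old} \times \left[ (1 - \text{Fraction}_{enhanced}) + \frac{\text{Fraction}_{enhanced}}{\text{Speedup}_{enhanced}} \right]$$

$$\text{Speedup}_{overall} = \frac{\text{ExTime}_{old}}{\text{ExTime}_{new}} = \frac{1}{(1 - \text{Fraction}_{enhanced}) + \dfrac{\text{Fraction}_{enhanced}}{\text{Speedup}_{enhanced}}}$$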
Amdahl’s Law
• Floating point instructions improved to run 2X, but only 10% of actual instructions are FP

$$\text{ExTime}_{new} = \text{ExTime}_{old} \times (0.9 + 0.1/2) = 0.95 \times \text{ExTime}_{old}$$

$$\text{Speedup}_{overall} = \frac{1}{0.95} = 1.053$$
Law of diminishing return: Focus on the common case!
Amdahl's Law Corollary
• Increasing speedup, with $f_x = 50\%$:
$$S_x = 1.10 \rightarrow \text{Speedup}_{overall} = \frac{1}{(1 - 0.5) + (0.5 / 1.1)} \approx 1.05$$
$$S_x = 10 \rightarrow \text{Speedup}_{overall} = \frac{1}{(1 - 0.5) + (0.5 / 10)} = 1.818$$
$$S_x = \infty \rightarrow \text{Speedup}_{overall} = \frac{1}{(1 - 0.5) + (0.5 / \infty)} = 2.0$$
• Bound:
$$\text{Speedup}_{overall} < \frac{1}{1 - f_x}$$
• Examples:

f_x           1%     5%     10%    20%    50%
1/(1 - f_x)   1.01   1.05   1.11   1.25   2.00
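The corollary can be explored numerically: as $S_x$ grows, the overall speedup approaches but never exceeds $1/(1 - f_x)$. The sketch below uses the fractions from the slide's table and restates the Amdahl formula so it runs on its own. The helper names are assumptions made for this example.

```python
import math


def amdahl_speedup(f_x: float, s_x: float) -> float:
    """Overall speedup when a fraction f_x of the task is sped up by a factor s_x."""
    return 1.0 / ((1.0 - f_x) + f_x / s_x)


def speedup_bound(f_x: float) -> float:
    """Upper bound on overall speedup: the limit as s_x goes to infinity."""
    return 1.0 / (1.0 - f_x)


# Fixed fraction f_x = 50%, increasingly large S_x (math.inf gives the limit).
for s_x in (1.1, 10, 100, math.inf):
    print(f"S_x = {s_x:>5}: overall speedup = {amdahl_speedup(0.5, s_x):.3f}")

# Bound for each fraction in the slide's table.
for f_x in (0.01, 0.05, 0.10, 0.20, 0.50):
    print(f"f_x = {f_x:.0%}: bound = {speedup_bound(f_x):.2f}")
```

Passing `math.inf` works because `f_x / math.inf` evaluates to 0.0 in Python, which leaves only the unenhanced fraction in the denominator.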
Cost/Performance
What is the Relationship of Cost to Price?

• Recurring Costs
– Component Costs
– Direct Costs (add 25% to 40%): recurring costs such as labor, purchasing, scrap, warranty
• Non-Recurring Costs or Gross Margin (add 82% to 186%): R&D, equipment maintenance, rental, marketing, sales, financing cost, pretax profits, taxes
• Average Discount to get List Price (add 33% to 66%): volume discounts and/or retailer markup

Share of List Price:
Average Discount   25% to 40%
Gross Margin       34% to 39%
Direct Cost        6% to 8%
Component Cost     15% to 33%
(Avg. Selling Price = Component Cost + Direct Cost + Gross Margin)

Energy/Power
• Power dissipation: rate at which energy is taken from the supply (power source) and transformed into heat, $P = E/t$
• Energy dissipation for a given instruction depends upon the type of instruction (and the state of the processor)

$$P = \frac{1}{\text{CPUTime}} \times \sum_{i=1}^{n} E_i \times I_i$$
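A minimal sketch of the power formula above: total energy is the per-class energy $E_i$ times the instruction count $I_i$, summed over classes and divided by CPU time. The instruction classes, energy values, counts, and CPU time are hypothetical and serve only to show the arithmetic.

```python
def average_power(energy_per_inst_j: dict, inst_counts: dict, cpu_time_s: float) -> float:
    """Average power in watts: (1 / CPU time) * sum(E_i * I_i)."""
    if cpu_time_s <= 0:
        raise ValueError("CPU time must be positive")
    total_energy_j = sum(energy_per_inst_j[name] * count
                         for name, count in inst_counts.items())
    return total_energy_j / cpu_time_s


# Hypothetical values (assumption): energy per instruction in joules.
energy_per_inst_j = {"alu": 1e-9, "load": 2e-9}
inst_counts = {"alu": 600_000_000, "load": 400_000_000}
power_w = average_power(energy_per_inst_j, inst_counts, cpu_time_s=2.0)
print(f"average power = {power_w:.2f} W")
```

In this model a change that shortens CPU time without lowering total energy raises average power, so energy and power need to be tracked separately.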
Summary, #2
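• Amdahl’s Law:

$$\text{Speedup}_{overall} = \frac{\text{ExTime}_{old}}{\text{ExTime}_{new}} = \frac{1}{(1 - \text{Fraction}_{enhanced}) + \dfrac{\text{Fraction}_{enhanced}}{\text{Speedup}_{enhanced}}}$$

• CPI Law:

$$\text{CPU time} = \frac{\text{Seconds}}{\text{Program}} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Cycle}}$$

• Execution time is the REAL measure of computer performance!
• Good products are created when we have:
– Good benchmarks, good ways to summarize performance
• A different set of metrics applies to embedded systems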