CPI Clock Rate

Chapter 4: Assessing and Understanding Performance
Fall 2005, Department of Computer Science, Kent State University
Computer Architecture CS 35101-002

Performance
• Measure, report, and summarize
• Make intelligent choices
• See through the marketing hype
• Key to understanding underlying organizational motivation
  – Why is some hardware better than others for different programs?
  – What factors of system performance are hardware related? (e.g., do we need a new machine, or a new operating system?)
  – How does the machine's instruction set affect performance?

Defining Performance: Case Study: Airplane

Airplane            Passengers   Range (mi)   Speed (mph)   Throughput (passengers × mph)
Boeing 777                 375         4630           610   228,750
Boeing 747                 470         4150           610   286,700
BAC/Sud Concorde           132         4000          1350   178,200
Douglas DC-8-50            146         8720           544    79,424

• The 747 carries the most passengers
• The DC-8 has the longest range
• The Concorde has the highest speed
• The 747 has the highest throughput (470 × 610 = 286,700, versus 375 × 610 = 228,750 for the 777)
Which airplane performs the best? The answer depends on how performance is measured.

Defining Performance: Computer Systems
• Case 1: Individual computer users: response time (latency)
  – How long does it take to execute my job (execution time)?
  – How long must I wait for the database query?
• Case 2: Data center, switching systems: throughput (total amount of work done in a given time)
  – How many concurrent jobs can the machine run in a given time period?
  – How many subscriber calls can the switch handle without dropping the ...
• Need different performance metrics

Computer Performance: TIME, TIME, TIME
• Response time (latency)
  – How long does it take for my job to run?
  – How long does it take to execute a job?
  – How long must I wait for the database query?
• Throughput
  – How many jobs can the machine run at once?
  – What is the average execution rate?
  – How much work is getting done?
• If we upgrade a machine with a new processor, what do we increase?
• If we add a new machine to the lab, what do we increase?

Execution Time
• Elapsed time
  – counts everything (disk and memory accesses, I/O, etc.)
  – a useful number, but often not good for comparison purposes
• CPU time
  – doesn't count I/O or time spent running other programs
  – can be broken up into system time and user time
• Our focus: user CPU time
  – time spent executing the lines of code that are "in" our program

Computer Performance Definition: Response Time
Current focus: response time
• To maximize performance, minimize response time (execution time):

$$\text{Performance}_X = \frac{1}{\text{Execution time}_X} \qquad \text{Performance}_Y = \frac{1}{\text{Execution time}_Y}$$

What if Performance_X > Performance_Y?

Relative Performance
For some program executing on computer X: if "X is n times faster than Y", then

$$\frac{\text{Performance}_X}{\text{Performance}_Y} = n$$

Assume computer X runs a program in 10 seconds while computer Y takes 15 seconds to run the same program. Then computer X has better performance than computer Y. How much better?
$$\frac{\text{Performance}_X}{\text{Performance}_Y} = \frac{\text{Execution time}_Y}{\text{Execution time}_X} = \frac{15\ \text{seconds}}{10\ \text{seconds}} = 1.5$$

Performance Metrics: End-User Perspective
• Elapsed time (response time or wall-clock time)
  – Total time to complete a task (disk and memory access, I/O, OS overhead, CPU execution time, etc.)
• CPU execution time
  – Actual time the CPU spends computing for a specific task
    • System CPU time (time spent in the OS on behalf of your program)
    • User CPU time (time spent executing lines of code inside your program)
  – Does not count I/O or time spent running other programs
System performance ~ refers to elapsed time on an unloaded system
CPU performance ~ refers to user CPU time (this is our primary focus)

Performance Metrics: Hardware Perspective
• Designers measure hardware performance via clock cycles
• Clock "ticks" indicate when to schedule events
• Clock runs at a constant rate
• cycle time = time between ticks = seconds per cycle
• clock rate (frequency) = cycles per second (1 Hz = 1 cycle/sec)
• Therefore: cycle time = 1 / clock rate
A 4 GHz clock has a cycle time of

$$\frac{1}{4 \times 10^{9}}\ \text{s} \times 10^{12}\ \frac{\text{ps}}{\text{s}} = 250\ \text{ps}$$

How to Improve Performance

$$\frac{\text{seconds}}{\text{program}} = \frac{\text{cycles}}{\text{program}} \times \frac{\text{seconds}}{\text{cycle}}$$

So, to improve performance (everything else being equal) you can either (increase or decrease?) ________ the # of required cycles for a program, or ________ the clock cycle time or, said another way, ________ the clock rate.

Improving Performance Example: Favorite program
Our favorite program runs in 10 seconds on computer A, which has a 4 GHz clock. We are trying to help a computer designer build a new machine B that will run this program in 6 seconds.
The designer can use new (or perhaps more expensive) technology to substantially increase the clock rate, but has informed us that this increase will affect the rest of the CPU design, causing machine B to require 1.2 times as many clock cycles as machine A for the same program. What clock rate should we tell the designer to target?

Improving Performance Example: Favorite program

$$\text{CPU time}_A = \frac{\text{CPU clock cycles}_A}{\text{Clock rate}_A}$$

$$10\ \text{seconds} = \frac{\text{CPU clock cycles}_A}{4 \times 10^{9}\ \text{cycles/sec}}$$

$$\text{CPU clock cycles}_A = 10\ \text{seconds} \times 4 \times 10^{9}\ \frac{\text{cycles}}{\text{sec}} = 40 \times 10^{9}\ \text{cycles}$$

$$\text{CPU time}_B = \frac{1.2 \times \text{CPU clock cycles}_A}{\text{Clock rate}_B}$$

$$6\ \text{seconds} = \frac{1.2 \times 40 \times 10^{9}\ \text{cycles}}{\text{Clock rate}_B}$$

$$\text{Clock rate}_B = \frac{1.2 \times 40 \times 10^{9}\ \text{cycles}}{6\ \text{seconds}} = 8 \times 10^{9}\ \frac{\text{cycles}}{\text{sec}} = 8\ \text{GHz}$$

Computer B must have twice the clock rate of A to run the program in 6 seconds.

How many cycles are required for a program?
• Could assume that the number of cycles equals the number of instructions

[Figure: the 1st, 2nd, 3rd, 4th, 5th, 6th, ... instructions laid out one after another along a time axis]

This assumption is incorrect: different instructions take different amounts of time on different machines. Why?
hint: remember that these are machine instructions, not lines of C code

Different numbers of cycles for different instructions

[Figure: timeline of instructions; axis: time]

• Multiplication takes more time than addition
• Floating point operations take longer than integer ones
• Accessing memory takes more time than accessing registers
• Important point: changing the cycle time often changes the number of cycles required for various instructions (more later)

Now that we understand cycles
• A given program will require
  – some number of instructions (machine instructions)
  – some number of cycles
  – some number of seconds
• We have a vocabulary that relates these quantities:
  – cycle time (seconds per cycle)
  – clock rate (cycles per second)
  – CPI (cycles per instruction): a floating-point-intensive application might have a higher CPI
  – MIPS (millions of instructions per second): this would be higher for a program using simple instructions

Performance
• Performance is determined by execution time
• Do any of the other variables equal performance?
  – # of cycles to execute program?
  – # of instructions in program?
  – # of cycles per second?
  – average # of cycles per instruction?
  – average # of instructions per second?
• Common pitfall: thinking one of the variables is indicative of performance when it really isn't.

More on CPI: clock cycles in a program
$$\text{CPU clock cycles in a program} = \text{Instructions per program} \times \text{Average clock cycles per instruction (CPI)} = \text{Instruction count} \times \text{CPI}$$

CPI is the average number of clock cycles per instruction, taken over all instructions executed in a program.

CPU Performance Equation
Recall:

$$\text{End-user CPU time in program} = \text{Clock cycles in program} \times \text{Clock cycle time}$$

$$\text{End-user CPU time in program} = \text{Instruction count} \times \text{CPI} \times \text{Clock cycle time}$$

or

$$\text{End-user CPU time in program} = \frac{\text{Instruction count} \times \text{CPI}}{\text{Clock rate}}$$

CPU Performance and its Factors
… so far: new units of measure

$$\text{Time} = \frac{\text{seconds}}{\text{program}} = \frac{\text{cycles}}{\text{program}} \times \frac{\text{seconds}}{\text{cycle}} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Clock cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Clock cycle}}$$

From left to right, the terms are: CPU execution time for a program, instruction count, average CPI, and clock cycle time.
How do we measure these performance factors? (We will talk about this shortly)

CPU Performance Factors: Measure

Factor                             How it is measured
CPU execution time for a program   By running the program
Clock cycle time                   Published as part of the computer documentation (clock rate)
Instruction count                  Using simulators of the architecture, hardware counters
Avg. CPI                           Using detailed simulation of the implementation, hardware counters

CPI Example
• Suppose we have two implementations of the same instruction set architecture (ISA). For some program,
  Machine A has a clock cycle time of 250 ps and a CPI of 2.0
  Machine B has a clock cycle time of 500 ps and a CPI of 1.2
  Which machine is faster for this program, and by how much?
• If two machines have the same ISA, which of our quantities (e.g., clock rate, CPI, execution time, # of instructions, MIPS) will always be identical?
CPI Example: Same ISA
• Each computer executes the same number of instructions (I) for the program
• CPU time = Clock cycle time × Instruction count × CPI
• CPU time_A = 250 ps × I × 2.0 = 500 × I ps
• CPU time_B = 500 ps × I × 1.2 = 600 × I ps
• Hence computer A is faster by:

$$\frac{\text{CPU performance}_A}{\text{CPU performance}_B} = \frac{\text{Execution time}_B}{\text{Execution time}_A} = \frac{600 \times I\ \text{ps}}{500 \times I\ \text{ps}} = 1.2$$

# of Instructions Example
• A compiler designer is trying to decide between two code sequences for a particular machine.
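
As a numeric check of the CPI example, and of the earlier favorite-program example, the CPU performance equation can be evaluated directly. The following is a minimal Python sketch, not part of the original slides: the function names `cpu_time` and `required_clock_rate` are hypothetical, and the instruction count of one million is an arbitrary assumption that cancels out of the ratio.

```python
def cpu_time(instruction_count, cpi, cycle_time):
    """CPU time = instruction count x CPI x clock cycle time."""
    return instruction_count * cpi * cycle_time


def required_clock_rate(cycles, target_seconds):
    """Clock rate (cycles/sec) needed to finish `cycles` cycles in `target_seconds`."""
    return cycles / target_seconds


# CPI example: same ISA, so both machines execute the same instruction count.
# The count itself is an assumed value; it cancels in the ratio below.
I = 1_000_000
time_a = cpu_time(I, cpi=2.0, cycle_time=250)  # picoseconds
time_b = cpu_time(I, cpi=1.2, cycle_time=500)  # picoseconds
ratio = time_b / time_a  # how many times faster A is than B

# Favorite-program example: A runs 10 s at 4 GHz,
# and B needs 1.2x as many cycles but must finish in 6 s.
cycles_a = 10 * 4e9
rate_b = required_clock_rate(1.2 * cycles_a, target_seconds=6)

print(f"A is {ratio:.2f}x faster than B")
print(f"Target clock rate for B: {rate_b / 1e9:.2f} GHz")
```

Keeping the three factors (instruction count, CPI, cycle time) as separate arguments mirrors the slides' point that none of them alone determines performance; only their product does.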
