Performance Evaluation Performance Evaluation Computer Performance

Performance evaluation It’s an everyday process… CS/COE0447: Computer Organization When you buy food and Assembly Language • Same qqy,yuantity, then you look at cost • Same cost, then you look at quantity • (Assuming same quality!) Chapter 4 When yyyou buy a notebook • You give weights to different features LCD size? CPU speed? Sangyeun Cho MMiain memory sii?ze? Battery life? Dept. of Computer Science • How do you quantify these and come up with a single score that will University of Pittsburgh ggyuide your decision? CS/CoE0447: Computer Organization and Assembly Language University of Pittsburgh 2 Performance evaluation Computer performance Often we face an oppptimization problem… Have you cared about performance seriously when buying a computer? • What were the most important factors when you bought a new PC? When cost is fixed • MiiMaximize performance (d(and ftfeatures )! Defining computer performance may not be that simple… • What do you mean when you say a computer has better performance When performance target is achieved than another? • Minimize cost! We need a common metric • Remember, however, than a single metric will not say everything about a computer Give me the highest-performance computer! Don’t care • We may need multiple metrics! about cost… • Supercomputers are of this class Response time vs. throughput CS/CoE0447: Computer Organization and Assembly Language University of Pittsburgh CS/CoE0447: Computer Organization and Assembly Language University of Pittsburgh 3 4 Response time vs. throughput Conventions Throughput Plane DC to Paris Top Speed Passengers (pmph) Througgphput Boeing 747 6.5 hours 610 mph 470 286,700 • E.g., # of transactions processed per second BAD/Sud • Larger is better 3 hours 1350 mph 132 178,200 Concorde Which has higher performance? If we are primarily concerned with response time, • Time to deliver 1 passenger performance can be defined as 1/T where T is the response • Time to deliver 400 passengers time Time for 1 job called • Smaller T is better • This is called “response time” or “execution time” or “latency” • That is, larger (1/T) is better • Time/job Machine A is N times faster than B Jobs per day called • N = performance(A) / performance(B) = time(B) / time(A) • This is called “throughput” or “bandwidth” • When we present this statement, we want to have N>1 to eliminate • Jobs/time confusion CS/CoE0447: Computer Organization and Assembly Language University of Pittsburgh CS/CoE0447: Computer Organization and Assembly Language University of Pittsburgh 5 6 Response time vs. throughput Time in a computer Time of Concorde vs. Boeing 747 When yypgyou time a program, what do you measure? • Concord is faster • Total time to complete a task • (6.5 hours / 3 hours) or 2.2 times faster • CPU time Throughput of Being 747 vs. Concorde • Overheads include disk accesses,,y memory accesses, I/O activities, OS • 286,700 p·mph / 176,200 p·mph overheads, overheads from other programs, … • 1.6 times higher (faster) • “Real time,” “response time,” “elapsed time,” … “Boeing 747 is 1.6 times (or 60%) faster than Concorde in terms of throughput” We are interested in time spent by CPU on your program “Concorde is 2.2 times (or 120%) faster than Boeing 747 in only – there are multiple programs running at the same time terms of flying time (response time)” • “CPU execution time” or “CPU time” • Often divided into system CPU time (in OS) and user CPU time (in user We will focus primarily on execution time of a single job for program) the remaining discussions CS/CoE0447: Computer Organization and Assembly Language University of Pittsburgh CS/CoE0447: Computer Organization and Assembly Language University of Pittsburgh 7 8 Time in a computer Measuring time In terms of seconds? CPU time: computers are constructed using digital circuitry synchronized at common “clock” ticks • E.g., Rowers and a captain • Clock ticks determine when events take place Clock is an endless,,p repetitive up & down pattern Clock cycle time = period of a clock cycle (up and down) • CCT = 1 / clock rate (or clock frequency) • 1ns if 1GHz clock is used • 0.5ns if 2GHz clock is used • 0.25ns if 4GHz clock is used • ___ if 5GHz clock is used CS/CoE0447: Computer Organization and Assembly Language University of Pittsburgh CS/CoE0447: Computer Organization and Assembly Language University of Pittsburgh 9 10 Measuring time in terms of clocks Measuring time in terms of clocks CPU execution time can be represented with # of clocks or # of clock ticks clock ticks • Has to do with how many instructions are executed in the program • E.g., it takes 1,000,000 clock ticks to finish program A • Has to do with how many clock ticks are needed to finish a single instruction on average If we know the clock cycle time (equivalently clock rate), we # clock ticks = # instructions (also called “instruction can easily calculate the actual time count”)×(clocks per instruction or CPI) • Time = (# of clock titikcks )×(l(cloc k cycle time ) • Time = (# of clock ticks) / (clock rate) Time = (# of clock ticks)×(clock cycle time) Time = IC×CPI×CCT IC may not change if two machines use the same instruction set architecture CS/CoE0447: Computer Organization and Assembly Language University of Pittsburgh CS/CoE0447: Computer Organization and Assembly Language University of Pittsburgh 11 12 What impacts time? Workload Time = IC×CPI×CCT A set of ppgrograms run on a comp uter would form a workload IC – # of instructions • Actual collection of applications • Instruction set architecture (()ISA) • Syypgnthetic programs • Compiler used • Kernel loops CPI (average clock per instruction) • … • Microarchitecture ((g,ppe.g., pipelining, cache size, …) • (Compiler) To compare two computer systems, a user would typically • (ISA) compare the execution time of the workload on the two CCT ((qequivalently clock rate ) machines • Technology & circuit optimization • Microarchitecture, esp. pipelining • (ISA) CS/CoE0447: Computer Organization and Assembly Language University of Pittsburgh CS/CoE0447: Computer Organization and Assembly Language University of Pittsburgh 13 14 Benchmark suites Synthetic benchmarks A benchmark suite is a set of applications relevant for performance A syypgnthesized program intended to mimic real p pgrograms evaluation • A = B×C+D/E+F; SPEC (standard performance evaluation corporation) • www.spec.org One may write a synthetic benchmark to exercise a specific • CPU benchmarks • Server benchmarks sequence of actions • Graphics benchmarks • A program that maximally exercises the multiplication unit in CPU • … EEMBC (embedded microprocessor benchmark consortium) • Automotive • Consumer • Network • Telecom • Office • … CS/CoE0447: Computer Organization and Assembly Language University of Pittsburgh CS/CoE0447: Computer Organization and Assembly Language University of Pittsburgh 15 16 Kernel programs (or loops) Summarizing performance An imppportant portion of a real p pgrogram Computer A Computer B • Quicksort Program 1 (sec) 1 10 • Matrix multiplication Program 2 (sec) 1000 100 Total time (sec) 1001 110 Real programs usually consist of many loops and functions and it may be quite difficult to understand and interpret any Compppguter A is 10 times faster than B for program 1 performance evaluation results if we are only given summary information Computer B is 10 times faster than A for program 2 In the case kernel programs, we immediately understand Alt houg h t he a bove statements are correct in div idua lly, t hey what’s going on in the programs present a confusing picture… CS/CoE0447: Computer Organization and Assembly Language University of Pittsburgh CS/CoE0447: Computer Organization and Assembly Language University of Pittsburgh 17 18 Arithmetic mean Normalized time Assume that we have N ppgrograms Computer A Computer B Ratio Program 1 1 (0.1) 10 (1) 0.1 Arithmetic mean (AM) = (∑ Timei)/N Program 2 1 (10) 0.1 (1) 10 Sum 2 (10.1) 10.1 (2) Weighted arithmetic mean = ∑ Timei×Wi, ∑ Wi = 1 Normalized time maintains the ratio of the original times However, depending on the reference, the result can be AM is a special case of weighted AM where W = 1/N i different! Summing (e.g., arithmetic mean) the normalized times may not produce the same comparison result CS/CoE0447: Computer Organization and Assembly Language University of Pittsburgh CS/CoE0447: Computer Organization and Assembly Language University of Pittsburgh 19 20 Dealing w/ ratios Summarizing performance Computer A Computer B Ratio Program 1 1 10 0.1 Program 2 1 (1000) 0.1 (100) 10 GM 1 When performance ratios are given, geometric mean (GM) can summarize the result more consistently 1/N GM = (∏ Ratioi) • GM(Xi)/GM(Yi) = GM(Xi/Yi) • Taking either the ratio of GM or GM of the ratios yields the same (From Intel Xeon 5100 brochure) result CS/CoE0447: Computer Organization and Assembly Language University of Pittsburgh CS/CoE0447: Computer Organization and Assembly Language University of Pittsburgh 21 22 Case study: SPEC benchmark Amdahl’s law SPEC CPU2000 benchmark Your oppqyppytimization technique is usually applicable to only a • 12 integer programs limited portion of program execution • 14 floating-point (computation-intensive) programs • E.g., larger cache, improved CPU frequency, improved FSB frequency, … To obtain a “SPECmark” • Run each program on the target machine Timeimproved = Timeunaffected+Timeaffected/(improvement factor) • Ge t each program’s performance ratio by diididividing the pre-provide d execution time (of an old

Load more