Performance Evaluation Performance Evaluation Computer Performance

Performance evaluation It’s an everyday process… CS/COE0447: Computer Organization When you buy food and Assembly Language • Same qqy,yuantity, then you look at cost • Same cost, then you look at quantity • (Assuming same quality!) Chapter 4 When yyyou buy a notebook • You give weights to different features LCD size? CPU speed? Sangyeun Cho MMiain memory sii?ze? Battery life? Dept. of Computer Science • How do you quantify these and come up with a single score that will University of Pittsburgh ggyuide your decision? CS/CoE0447: Computer Organization and Assembly Language University of Pittsburgh 2 Performance evaluation Computer performance Often we face an oppptimization problem… Have you cared about performance seriously when buying a computer? • What were the most important factors when you bought a new PC? When cost is fixed • MiiMaximize performance (d(and ftfeatures )! Defining computer performance may not be that simple… • What do you mean when you say a computer has better performance When performance target is achieved than another? • Minimize cost! We need a common metric • Remember, however, than a single metric will not say everything about a computer Give me the highest-performance computer! Don’t care • We may need multiple metrics! about cost… • Supercomputers are of this class Response time vs. throughput CS/CoE0447: Computer Organization and Assembly Language University of Pittsburgh CS/CoE0447: Computer Organization and Assembly Language University of Pittsburgh 3 4 Response time vs. throughput Conventions Throughput Plane DC to Paris Top Speed Passengers (pmph) Througgphput Boeing 747 6.5 hours 610 mph 470 286,700 • E.g., # of transactions processed per second BAD/Sud • Larger is better 3 hours 1350 mph 132 178,200 Concorde Which has higher performance? If we are primarily concerned with response time, • Time to deliver 1 passenger performance can be defined as 1/T where T is the response • Time to deliver 400 passengers time Time for 1 job called • Smaller T is better • This is called “response time” or “execution time” or “latency” • That is, larger (1/T) is better • Time/job Machine A is N times faster than B Jobs per day called • N = performance(A) / performance(B) = time(B) / time(A) • This is called “throughput” or “bandwidth” • When we present this statement, we want to have N>1 to eliminate • Jobs/time confusion CS/CoE0447: Computer Organization and Assembly Language University of Pittsburgh CS/CoE0447: Computer Organization and Assembly Language University of Pittsburgh 5 6 Response time vs. throughput Time in a computer Time of Concorde vs. Boeing 747 When yypgyou time a program, what do you measure? • Concord is faster • Total time to complete a task • (6.5 hours / 3 hours) or 2.2 times faster • CPU time Throughput of Being 747 vs. Concorde • Overheads include disk accesses,,y memory accesses, I/O activities, OS • 286,700 p·mph / 176,200 p·mph overheads, overheads from other programs, … • 1.6 times higher (faster) • “Real time,” “response time,” “elapsed time,” … “Boeing 747 is 1.6 times (or 60%) faster than Concorde in terms of throughput” We are interested in time spent by CPU on your program “Concorde is 2.2 times (or 120%) faster than Boeing 747 in only – there are multiple programs running at the same time terms of flying time (response time)” • “CPU execution time” or “CPU time” • Often divided into system CPU time (in OS) and user CPU time (in user We will focus primarily on execution time of a single job for program) the remaining discussions CS/CoE0447: Computer Organization and Assembly Language University of Pittsburgh CS/CoE0447: Computer Organization and Assembly Language University of Pittsburgh 7 8 Time in a computer Measuring time In terms of seconds? CPU time: computers are constructed using digital circuitry synchronized at common “clock” ticks • E.g., Rowers and a captain • Clock ticks determine when events take place Clock is an endless,,p repetitive up & down pattern Clock cycle time = period of a clock cycle (up and down) • CCT = 1 / clock rate (or clock frequency) • 1ns if 1GHz clock is used • 0.5ns if 2GHz clock is used • 0.25ns if 4GHz clock is used • ___ if 5GHz clock is used CS/CoE0447: Computer Organization and Assembly Language University of Pittsburgh CS/CoE0447: Computer Organization and Assembly Language University of Pittsburgh 9 10 Measuring time in terms of clocks Measuring time in terms of clocks CPU execution time can be represented with # of clocks or # of clock ticks clock ticks • Has to do with how many instructions are executed in the program • E.g., it takes 1,000,000 clock ticks to finish program A • Has to do with how many clock ticks are needed to finish a single instruction on average If we know the clock cycle time (equivalently clock rate), we # clock ticks = # instructions (also called “instruction can easily calculate the actual time count”)×(clocks per instruction or CPI) • Time = (# of clock titikcks )×(l(cloc k cycle time ) • Time = (# of clock ticks) / (clock rate) Time = (# of clock ticks)×(clock cycle time) Time = IC×CPI×CCT IC may not change if two machines use the same instruction set architecture CS/CoE0447: Computer Organization and Assembly Language University of Pittsburgh CS/CoE0447: Computer Organization and Assembly Language University of Pittsburgh 11 12 What impacts time? Workload Time = IC×CPI×CCT A set of ppgrograms run on a comp uter would form a workload IC – # of instructions • Actual collection of applications • Instruction set architecture (()ISA) • Syypgnthetic programs • Compiler used • Kernel loops CPI (average clock per instruction) • … • Microarchitecture ((g,ppe.g., pipelining, cache size, …) • (Compiler) To compare two computer systems, a user would typically • (ISA) compare the execution time of the workload on the two CCT ((qequivalently clock rate ) machines • Technology & circuit optimization • Microarchitecture, esp. pipelining • (ISA) CS/CoE0447: Computer Organization and Assembly Language University of Pittsburgh CS/CoE0447: Computer Organization and Assembly Language University of Pittsburgh 13 14 Benchmark suites Synthetic benchmarks A benchmark suite is a set of applications relevant for performance A syypgnthesized program intended to mimic real p pgrograms evaluation • A = B×C+D/E+F; SPEC (standard performance evaluation corporation) • www.spec.org One may write a synthetic benchmark to exercise a specific • CPU benchmarks • Server benchmarks sequence of actions • Graphics benchmarks • A program that maximally exercises the multiplication unit in CPU • … EEMBC (embedded microprocessor benchmark consortium) • Automotive • Consumer • Network • Telecom • Office • … CS/CoE0447: Computer Organization and Assembly Language University of Pittsburgh CS/CoE0447: Computer Organization and Assembly Language University of Pittsburgh 15 16 Kernel programs (or loops) Summarizing performance An imppportant portion of a real p pgrogram Computer A Computer B • Quicksort Program 1 (sec) 1 10 • Matrix multiplication Program 2 (sec) 1000 100 Total time (sec) 1001 110 Real programs usually consist of many loops and functions and it may be quite difficult to understand and interpret any Compppguter A is 10 times faster than B for program 1 performance evaluation results if we are only given summary information Computer B is 10 times faster than A for program 2 In the case kernel programs, we immediately understand Alt houg h t he a bove statements are correct in div idua lly, t hey what’s going on in the programs present a confusing picture… CS/CoE0447: Computer Organization and Assembly Language University of Pittsburgh CS/CoE0447: Computer Organization and Assembly Language University of Pittsburgh 17 18 Arithmetic mean Normalized time Assume that we have N ppgrograms Computer A Computer B Ratio Program 1 1 (0.1) 10 (1) 0.1 Arithmetic mean (AM) = (∑ Timei)/N Program 2 1 (10) 0.1 (1) 10 Sum 2 (10.1) 10.1 (2) Weighted arithmetic mean = ∑ Timei×Wi, ∑ Wi = 1 Normalized time maintains the ratio of the original times However, depending on the reference, the result can be AM is a special case of weighted AM where W = 1/N i different! Summing (e.g., arithmetic mean) the normalized times may not produce the same comparison result CS/CoE0447: Computer Organization and Assembly Language University of Pittsburgh CS/CoE0447: Computer Organization and Assembly Language University of Pittsburgh 19 20 Dealing w/ ratios Summarizing performance Computer A Computer B Ratio Program 1 1 10 0.1 Program 2 1 (1000) 0.1 (100) 10 GM 1 When performance ratios are given, geometric mean (GM) can summarize the result more consistently 1/N GM = (∏ Ratioi) • GM(Xi)/GM(Yi) = GM(Xi/Yi) • Taking either the ratio of GM or GM of the ratios yields the same (From Intel Xeon 5100 brochure) result CS/CoE0447: Computer Organization and Assembly Language University of Pittsburgh CS/CoE0447: Computer Organization and Assembly Language University of Pittsburgh 21 22 Case study: SPEC benchmark Amdahl’s law SPEC CPU2000 benchmark Your oppqyppytimization technique is usually applicable to only a • 12 integer programs limited portion of program execution • 14 floating-point (computation-intensive) programs • E.g., larger cache, improved CPU frequency, improved FSB frequency, … To obtain a “SPECmark” • Run each program on the target machine Timeimproved = Timeunaffected+Timeaffected/(improvement factor) • Ge t each program’s performance ratio by diididividing the pre-provide d execution time (of an old

Performance Evaluation Performance Evaluation Computer Performance

Details

Download

Copyright

We respect the copyrights and intellectual property rights of all users. All uploaded documents are either original works of the uploader or authorized works of the rightful owners.

Support