•The Better a Computer Performs, the Better It Is! •But What Is Better?
Total Page:16
File Type:pdf, Size:1020Kb
inst.eecs.berkeley.edu/~cs61c CS61C : Machine Structures Performance Lecture #38 Performance •The better a computer 2006-12-01 performs, the better it is! Titleless TA Aaron Staley •But what is better? inst.eecs./~cs61c-tc Ancient Greeks used Computers ⇒ Mechanical device had 37 gear wheels which could be used to add, subtract, multiply, and divide http://www.nytimes.com/2006/11/30/science/30compute.html CS61C L3 Performance (1) Staley, Fall 2005 © UCB CS61C L3 Performance (2) Staley, Fall 2005 © UCB Why Performance? Two Notions of “Performance” • Purchasing Perspective: given a DC to Top Passen- Throughput Plane collection of machines (or upgrade Paris Speed gers (pmph) options), which has the Boeing 6.5 610 - best performance ? 470 286,700 - least cost ? 747 hours mph - best performance / cost ? BAD/Sud 3 1350 132 178,200 • Computer Designer Perspective: faced Concorde hours mph with design options, which has the •Which has higher performance? - best performance improvement ? •Interested in time to deliver 100 passengers? - least cost ? •Interested in delivering as many passengers per day as possible? - best performance / cost ? •In a computer, time for one task called • All require basis for comparison and Response Time or Execution Time metric for evaluation! •In a computer, tasks per unit time called Throughput or Bandwidth •Solid metrics lead to solid progress! CS61C L3 Performance (3) Staley, Fall 2005 © UCB CS61C L3 Performance (4) Staley, Fall 2005 © UCB Definitions Example of Response Time v. Throughput • Performance is in units of things per sec • Time of Concorde vs. Boeing 747? • bigger is better • Concord is 6.5 hours / 3 hours = 2.2 times faster • If we are primarily concerned with • Throughput of Boeing vs. Concorde? response time • Boeing 747: 286,700 pmph / 178,200 pmph • performance(x) = 1 = 1.6 times faster execution_time(x) • Boeing is 1.6 times (“60%”) faster in " F(ast) is n times faster than S(low) " means… terms of throughput performance(F) execution_time(S) • Concord is 2.2 times (“120%”) faster in terms of flying time (response time) n = = We will focus primarily on response performance(S) execution_time(F) time. CS61C L3 Performance (5) Staley, Fall 2005 © UCB CS61C L3 Performance (6) Staley, Fall 2005 © UCB Words, Words, Words… What is Time? • Straightforward definition of time: • Will (try to) stick to “n times faster”; • Total time to complete a task, including disk its less confusing than “m % faster” accesses, memory accesses, I/O activities, operating system overhead, ... • As faster means both decreased • “real time”, “response time” or execution time and increased “elapsed time” performance, to reduce confusion we • Alternative: just time processor (CPU) will (and you should) use is working only on your program (since “improve execution time” or multiple processes running at same time) “improve performance” • “CPU execution time” or “CPU time” • Often divided into system CPU time (in OS) and user CPU time (in user program) CS61C L3 Performance (7) Staley, Fall 2005 © UCB CS61C L3 Performance (8) Staley, Fall 2005 © UCB How to Measure Time? Measuring Time using Clock Cycles (1/2) • Real Time ⇒ Actual time elapsed • CPU execution time for a program • CPU Time: Computers constructed using a clock that runs at a constant = Clock Cycles for a program rate and determines when events take x Clock Period place in the hardware • or • These discrete time intervals called clock cycles (or informally clocks or = Clock Cycles for a program cycles) Clock Rate • Length of clock period: clock cycle time (e.g., 2 nanoseconds or 2 ns) and clock rate (e.g., 500 megahertz, or 500 MHz), which is the inverse of the clock period; use these! CS61C L3 Performance (9) Staley, Fall 2005 © UCB CS61C L3 Performance (10) Staley, Fall 2005 © UCB Measuring Time using Clock Cycles (2/2) Performance Calculation (1/2) • One way to define clock cycles: • CPU execution time for program = Clock Cycles for program Clock Cycles for program x Clock Cycle Time = Instructions for a program • Substituting for clock cycles: (called “Instruction Count”) CPU execution time for program x Average Clock cycles Per Instruction = (Instruction Count x CPI) (abbreviated “CPI”) x Clock Cycle Time • CPI one way to compare two machines = Instruction Count x CPI x Clock Cycle Time with same instruction set, since Instruction Count would be the same CS61C L3 Performance (11) Staley, Fall 2005 © UCB CS61C L3 Performance (12) Staley, Fall 2005 © UCB Performance Calculation (2/2) How Calculate the 3 Components? CPU time = Instructions x Cycles x Seconds • Clock Cycle Time: in specification of computer (Clock Rate in advertisements) Program Instruction Cycle • Instruction Count: CPU time = Instructions x Cycles x Seconds • Count instructions in loop of small program Program Instruction Cycle • Use simulator to count instructions CPU time = Instructions x Cycles x Seconds • Hardware counter in spec. register Program Instruction Cycle - (Pentium II,III,4) CPU time = Seconds • CPI: Program • Calculate: Execution Time / Clock cycle time • Product of all 3 terms: if missing a term, can’t Instruction Count predict time, the real measure of performance • Hardware counter in special register (PII,III,4) CS61C L3 Performance (13) Staley, Fall 2005 © UCB CS61C L3 Performance (14) Staley, Fall 2005 © UCB Calculating CPI Another Way Example (RISC processor) Op Freq CPI Prod (% Time) • First calculate CPI for each individual i i instruction (add, sub, and, etc.) ALU 50% 1 .5 (23%) Load 20% 5 1.0 (45%) • Next calculate frequency of each individual instruction Store 10% 3 .3 (14%) Branch 20% 2 .4 (18%) • Finally multiply these two for each 2.2 instruction and add them up to get Instruction Mix (Where time spent) final CPI (the weighted sum) • What if Branch instructions twice as fast? CS61C L3 Performance (15) Staley, Fall 2005 © UCB CS61C L3 Performance (16) Staley, Fall 2005 © UCB What Programs Measure for Comparison? Benchmarks • Ideally run typical programs with typical input before purchase, • Obviously, apparent speed of or before even build machine processor depends on code used to test it • Called a “workload”; For example: • Engineer uses compiler, spreadsheet • Need industry standards so that • Author uses word processor, drawing different processors can be fairly program, compression software compared • In some situations its hard to do • Companies exist that create these • Don’t have access to machine to benchmarks: “typical” code used to “benchmark” before purchase evaluate systems • Don’t know workload in future • Need to be changed every 2 or 3 years since designers could (and do!) target • Next: benchmarks & PC-Mac showdown! for these standard benchmarks CS61C L3 Performance (17) Staley, Fall 2005 © UCB CS61C L3 Performance (18) Staley, Fall 2005 © UCB Example Standardized Benchmarks (1/2) Example Standardized Benchmarks (2/2) • SPEC • Standard Performance Evaluation Corporation (SPEC) SPEC CPU2000 • Benchmarks distributed in source code • Members of consortium select workload • CINT2000 12 integer (gzip, gcc, crafty, perl, ...) - 30+ companies, 40+ universities • CFP2000 14 floating-point (swim, mesa, art, ...) • Compiler, machine designers target • All relative to base machine benchmarks, so try to change every 3 years Sun 300MHz 256Mb-RAM Ultra5_10, which gets score of 100 • An Example of SPEC 2000: • They measure CFP2000 CINT2000 gzip C Compression wupwise Fortran77 Physics / Quantum Chromodynamics swim Fortran77 Shallow Water Modeling vpr C FPGA Circuit Placement and - System speed (SPECint2000) Routing mgrid Fortran77 Multi-grid Solver: 3D Potential Field gcc C C Programming Language Compiler applu Fortran77 Parabolic / Elliptic Partial Differential Equations mcf C Combinatorial Optimization - System throughput (SPECint_rate2000) crafty C Game Playing: Chess mesa C 3-D Graphics Library parser C Word Processing galgel Fortran90 Computational Fluid Dynamics eon C++ Computer Visualization art C Image Recognition / Neural Networks perlbmk C PERL Programming Language equake C Seismic Wave Propagation Simulation facerec Fortran90 Image Processing: Face Recognition •Newer one: www.spec.org/osg/cpu2006/ gap C Group Theory, Interpreter vortex C Object-oriented Database ammp C Computational Chemistry bzip2 C Compression lucas Fortran90 Number Theory / Primality Testing fma3d Fortran90 Finite-element Crash Simulation twolf C Place and Route Simulator sixtrack Fortran77 High Energy Nuclear Physics Accelerator Design CS61C L3 Performance (19) Staley, Fall 2005 © UCB CS61C L3 Performance (20) apsi Fortran77 Meteorology: Pollutant DistrSitabluetyi,o Fnall 2005 © UCB Another Benchmark Performance Evaluation: An Aside Demo • PCs: Ziff-Davis Benchmark Suite If we’re talking about performance, let’s discuss the ways shady salespeople • “Business Winstone is a system-level, have fooled consumers application-based benchmark that measures (so that you don’t get taken!) a PC's overall performance when running today's top-selling Windows-based 32-bit 5. Never let the user touch it applications… it doesn't mimic what these 4. Only run the demo through a script packages do; it runs real applications through a series of scripted activities and 3. Run it on a stock machine in which uses the time a PC takes to complete those “no expense was spared” activities to produce its performance scores. 2. Preprocess all available data • Also tests for CDs, Content-creation, Audio, 3D graphics, battery life 1. Play a movie http://www.etestinglabs.com/benchmarks/ CS61C L3 Performance