The Better a Computer Performs, the Better It Is! But What Is Better?

CS61C : Machine Structures
Lecture #38 Performance
2006-12-01
Titleless TA Aaron Staley
inst.eecs.berkeley.edu/~cs61c
inst.eecs./~cs61c-tc
Ancient Greeks used Computers ⇒ Mechanical device had 37 gear wheels which could be used to add, subtract, multiply, and divide
http://www.nytimes.com/2006/11/30/science/30compute.html
CS61C L3 Performance (1) Staley, Fall 2005 © UCB

Performance
• The better a computer performs, the better it is!
• But what is better?

Why Performance?
• Purchasing Perspective: given a collection of machines (or upgrade options), which has the
  - best performance?
  - least cost?
  - best performance / cost?
• Computer Designer Perspective: faced with design options, which has the
  - best performance improvement?
  - least cost?
  - best performance / cost?
• All require basis for comparison and metric for evaluation!
• Solid metrics lead to solid progress!

Two Notions of “Performance”
Plane               DC to Paris   Top Speed   Passengers   Throughput (pmph)
Boeing 747          6.5 hours     610 mph     470          286,700
BAC/Sud Concorde    3 hours       1350 mph    132          178,200
• Which has higher performance?
• Interested in time to deliver 100 passengers?
• Interested in delivering as many passengers per day as possible?
• In a computer, time for one task called Response Time or Execution Time
• In a computer, tasks per unit time called Throughput or Bandwidth

Definitions
• Performance is in units of things per sec
  • bigger is better
• If we are primarily concerned with response time
  • performance(x) = 1 / execution_time(x)
“F(ast) is n times faster than S(low)” means…
  n = performance(F) / performance(S) = execution_time(S) / execution_time(F)

Example of Response Time v. Throughput
• Time of Concorde vs. Boeing 747?
  • Concorde is 6.5 hours / 3 hours = 2.2 times faster
• Throughput of Boeing vs. Concorde?
  • Boeing 747: 286,700 pmph / 178,200 pmph = 1.6 times faster
• Boeing is 1.6 times (“60%”) faster in terms of throughput
• Concorde is 2.2 times (“120%”) faster in terms of flying time (response time)
We will focus primarily on response time.

Words, Words, Words…
• Will (try to) stick to “n times faster”; it’s less confusing than “m % faster”
• As faster means both decreased execution time and increased performance, to reduce confusion we will (and you should) use “improve execution time” or “improve performance”

What is Time?
• Straightforward definition of time:
  • Total time to complete a task, including disk accesses, memory accesses, I/O activities, operating system overhead, ...
  • “real time”, “response time” or “elapsed time”
• Alternative: just time processor (CPU) is working only on your program (since multiple processes running at same time)
  • “CPU execution time” or “CPU time”
  • Often divided into system CPU time (in OS) and user CPU time (in user program)

How to Measure Time?
• Real Time ⇒ Actual time elapsed
• CPU Time: Computers constructed using a clock that runs at a constant rate and determines when events take place in the hardware
  • These discrete time intervals called clock cycles (or informally clocks or cycles)
  • Length of clock period: clock cycle time (e.g., 2 nanoseconds or 2 ns) and clock rate (e.g., 500 megahertz, or 500 MHz), which is the inverse of the clock period; use these!

Measuring Time using Clock Cycles (1/2)
• CPU execution time for a program
  = Clock Cycles for a program x Clock Period
• or
  = Clock Cycles for a program / Clock Rate
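
The definitions above reduce to a few lines of arithmetic. The following is a minimal sketch in Python, not part of the original lecture; the function names (`performance`, `times_faster`, `cpu_time_seconds`) and the 1,000,000-cycle program are illustrative assumptions. It recomputes the Concorde and Boeing 747 ratios from the table and converts a cycle count into CPU time using the 500 MHz (2 ns) clock mentioned on the slide.

```python
def performance(execution_time):
    """Performance under the response-time view: 1 / execution time."""
    return 1.0 / execution_time


def times_faster(time_slow, time_fast):
    """n such that F is 'n times faster' than S: exec_time(S) / exec_time(F)."""
    return time_slow / time_fast


def cpu_time_seconds(clock_cycles, clock_rate_hz):
    """CPU time = clock cycles / clock rate (same as cycles x clock period)."""
    return clock_cycles / clock_rate_hz


# Response time: Concorde (3 hours) vs. Boeing 747 (6.5 hours), from the table.
n_response = times_faster(6.5, 3.0)

# Throughput in passenger-miles per hour: passengers x top speed.
boeing_pmph = 470 * 610
concorde_pmph = 132 * 1350
n_throughput = boeing_pmph / concorde_pmph

# Hypothetical program of 1,000,000 cycles on a 500 MHz (2 ns period) clock.
example_cpu_time = cpu_time_seconds(1_000_000, 500e6)

print(f"Concorde is {n_response:.1f} times faster (response time)")
print(f"Boeing 747 is {n_throughput:.1f} times faster (throughput)")
print(f"CPU time: {example_cpu_time * 1e3:.1f} ms")
```

The same pair of machines ranks differently depending on the metric, which is why the lecture fixes on response time before going further.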
Measuring Time using Clock Cycles (2/2)
• One way to define clock cycles:
  Clock Cycles for program
  = Instructions for a program (called “Instruction Count”)
  x Average Clock cycles Per Instruction (abbreviated “CPI”)
• CPI one way to compare two machines with same instruction set, since Instruction Count would be the same

Performance Calculation (1/2)
• CPU execution time for program
  = Clock Cycles for program x Clock Cycle Time
• Substituting for clock cycles:
  CPU execution time for program
  = (Instruction Count x CPI) x Clock Cycle Time
  = Instruction Count x CPI x Clock Cycle Time

Performance Calculation (2/2)
CPU time = (Instructions / Program) x (Cycles / Instruction) x (Seconds / Cycle)
CPU time = Seconds / Program
• Product of all 3 terms: if missing a term, can’t predict time, the real measure of performance

How Calculate the 3 Components?
• Clock Cycle Time: in specification of computer (Clock Rate in advertisements)
• Instruction Count:
  • Count instructions in loop of small program
  • Use simulator to count instructions
  • Hardware counter in spec. register
    - (Pentium II,III,4)
• CPI:
  • Calculate: (Execution Time / Clock cycle time) / Instruction Count
  • Hardware counter in special register (PII,III,4)

Calculating CPI Another Way
• First calculate CPI for each individual instruction (add, sub, and, etc.)
• Next calculate frequency of each individual instruction
• Finally multiply these two for each instruction and add them up to get final CPI (the weighted sum)

Example (RISC processor)
Op       Freq_i   CPI_i   Prod   (% Time)
ALU      50%      1       .5     (23%)
Load     20%      5       1.0    (45%)
Store    10%      3       .3     (14%)
Branch   20%      2       .4     (18%)
Total                     2.2
Freq_i column: Instruction Mix. % Time column: Where time spent.
• What if Branch instructions twice as fast?

What Programs Measure for Comparison?
• Ideally run typical programs with typical input before purchase, or before even build machine
  • Called a “workload”; For example:
  • Engineer uses compiler, spreadsheet
  • Author uses word processor, drawing program, compression software
• In some situations it’s hard to do
  • Don’t have access to machine to “benchmark” before purchase
  • Don’t know workload in future
• Next: benchmarks & PC-Mac showdown!

Benchmarks
• Obviously, apparent speed of processor depends on code used to test it
• Need industry standards so that different processors can be fairly compared
• Companies exist that create these benchmarks: “typical” code used to evaluate systems
• Need to be changed every 2 or 3 years since designers could (and do!) target for these standard benchmarks

Example Standardized Benchmarks (1/2)
• Standard Performance Evaluation Corporation (SPEC) SPEC CPU2000
  • CINT2000 12 integer (gzip, gcc, crafty, perl, ...)
  • CFP2000 14 floating-point (swim, mesa, art, ...)
  • All relative to base machine Sun 300MHz 256Mb-RAM Ultra5_10, which gets score of 100
  • They measure
    - System speed (SPECint2000)
    - System throughput (SPECint_rate2000)
• Newer one: www.spec.org/osg/cpu2006/

Example Standardized Benchmarks (2/2)
• SPEC
  • Benchmarks distributed in source code
  • Members of consortium select workload
    - 30+ companies, 40+ universities
  • Compiler, machine designers target benchmarks, so try to change every 3 years
• An Example of SPEC 2000:

CINT2000
gzip       C           Compression
vpr        C           FPGA Circuit Placement and Routing
gcc        C           C Programming Language Compiler
mcf        C           Combinatorial Optimization
crafty     C           Game Playing: Chess
parser     C           Word Processing
eon        C++         Computer Visualization
perlbmk    C           PERL Programming Language
gap        C           Group Theory, Interpreter
vortex     C           Object-oriented Database
bzip2      C           Compression
twolf      C           Place and Route Simulator

CFP2000
wupwise    Fortran77   Physics / Quantum Chromodynamics
swim       Fortran77   Shallow Water Modeling
mgrid      Fortran77   Multi-grid Solver: 3D Potential Field
applu      Fortran77   Parabolic / Elliptic Partial Differential Equations
mesa       C           3-D Graphics Library
galgel     Fortran90   Computational Fluid Dynamics
art        C           Image Recognition / Neural Networks
equake     C           Seismic Wave Propagation Simulation
facerec    Fortran90   Image Processing: Face Recognition
ammp       C           Computational Chemistry
lucas      Fortran90   Number Theory / Primality Testing
fma3d      Fortran90   Finite-element Crash Simulation
sixtrack   Fortran77   High Energy Nuclear Physics Accelerator Design
apsi       Fortran77   Meteorology: Pollutant Distribution

Another Benchmark
• PCs: Ziff-Davis Benchmark Suite
  • “Business Winstone is a system-level, application-based benchmark that measures a PC's overall performance when running today's top-selling Windows-based 32-bit applications… it doesn't mimic what these packages do; it runs real applications through a series of scripted activities and uses the time a PC takes to complete those activities to produce its performance scores.”
  • Also tests for CDs, Content-creation, Audio, 3D graphics, battery life
  http://www.etestinglabs.com/benchmarks/

Performance Evaluation: An Aside Demo
If we’re talking about performance, let’s discuss the ways shady salespeople have fooled consumers (so that you don’t get taken!)
5. Never let the user touch it
4. Only run the demo through a script
3. Run it on a stock machine in which “no expense was spared”
2. Preprocess all available data
1. Play a movie
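
Looking back at the CPI slides, the weighted-sum calculation and the question "What if Branch instructions twice as fast?" can be worked through numerically. This is a sketch under stated assumptions rather than the lecture's own solution: the names `weighted_cpi` and `cpu_time` are made up here, "twice as fast" is read as halving the branch CPI from 2 to 1, and the instruction count and 2 ns cycle time are arbitrary illustrative values.

```python
# Instruction mix from the RISC example: op -> (frequency, cycles per instruction).
MIX = {
    "ALU": (0.50, 1),
    "Load": (0.20, 5),
    "Store": (0.10, 3),
    "Branch": (0.20, 2),
}


def weighted_cpi(mix):
    """Overall CPI = sum over instruction classes of frequency x CPI."""
    return sum(freq * cpi for freq, cpi in mix.values())


def cpu_time(instruction_count, cpi, clock_cycle_time):
    """CPU time = Instruction Count x CPI x Clock Cycle Time (seconds)."""
    return instruction_count * cpi * clock_cycle_time


base_cpi = weighted_cpi(MIX)

# Share of execution time spent in each instruction class.
time_share = {op: freq * cpi / base_cpi for op, (freq, cpi) in MIX.items()}

# Assumption: "twice as fast" means the branch CPI drops from 2 to 1.
faster_mix = dict(MIX, Branch=(0.20, 1))
new_cpi = weighted_cpi(faster_mix)

# Same instruction count and clock, so the speedup is the ratio of CPIs.
instructions = 1_000_000  # hypothetical instruction count
cycle_time = 2e-9         # hypothetical 2 ns clock cycle
speedup = cpu_time(instructions, base_cpi, cycle_time) / cpu_time(
    instructions, new_cpi, cycle_time
)

print(f"base CPI = {base_cpi:.1f}, new CPI = {new_cpi:.1f}")
print(f"speedup = {speedup:.2f}")
for op, share in time_share.items():
    print(f"{op:6s} {share:.0%}")
```

Under that reading the CPI falls from 2.2 to 2.0, a speedup of 1.1: halving the cost of branches only helps the roughly 18% of time that branches account for.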