Chapter 1 Exercises

1. Look up the definition of “parallel” in your favorite dictionary. How does it compare to the definition of “parallelism” given in this chapter?

2. Give reasons why 10 bricklayers would not in real life build a wall 10 times faster than one bricklayer.

3. Ten volunteers are formed into a bucket brigade between a small pond and a cabin on fire. Why is this better than each volunteer individually carrying water to the fire? Analyze the bucket brigade as a pipeline. How are the buckets returned?

4. Using an assembly line, one wants the conveyor belt to move as fast as possible in order to produce the most widgets per unit time. What determines the maximum speed of the conveyor belt?

5. Assume a conveyor belt assembly line of five tasks. Each task takes T units of time. What is the speedup for manufacturing (a) 10 units? (b) 100 units? (c) 1000 units?

6. Plot the graph of the results of problem 5. What is the shape of the curve?

7. Given the assembly line of problem 5, what are the speedups if one of the tasks takes 2T units of time to complete?

8. Assume a conveyor belt assembly line of five tasks. One task takes 2T units of time and the other four each take T units of time. How could the differences in task times be accommodated?

9. Simple Simon has learned that the asymptotic speedup of an n-station pipeline is n. Given a 5-station pipeline, Simple Simon figures he can make it twice as fast by turning it into a 10-station pipeline, adding 5 “do nothing” stages. Is Simple Simon right in his thinking? Explain why or why not.

10. Select a parallel computer or parallel programming language and write a paper on its history.
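Exercises 5 through 8 are easy to explore numerically once a pipeline timing model is written down. The sketch below is a minimal model of our own, not something given in the chapter: it assumes a synchronous conveyor belt that advances at the pace of the slowest station, so k items clear an n-station line in (n + k - 1) belt steps, while a single worker repeats the full task sequence for every item.

```python
# Minimal conveyor-belt speedup model (an illustrative assumption,
# not the chapter's own formula).

def pipeline_speedup(stages, k):
    serial = k * sum(stages)                    # one worker does every task
    belt = (len(stages) + k - 1) * max(stages)  # belt steps at the slowest task's pace
    return serial / belt

# Exercise 5: five stages of T each (take T = 1).
for k in (10, 100, 1000):
    print(k, round(pipeline_speedup([1, 1, 1, 1, 1], k), 2))

# Exercise 7: one stage takes 2T, so the whole belt slows to that pace.
for k in (10, 100, 1000):
    print(k, round(pipeline_speedup([1, 1, 2, 1, 1], k), 2))
```

With equal stages the speedup climbs toward 5 as the number of units grows; with one 2T stage it climbs only toward 6T/2T = 3, which is the imbalance exercise 8 asks you to remedy.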
Chapter 2 Measuring Performance

This chapter focuses on measuring the performance of parallel computers. Many customers buy a parallel computer primarily for increased performance, so measuring performance accurately and in a meaningful manner is important. In this chapter, we explore several measures of performance, assuming a scientific computing environment. After selecting a “good” measure, we discuss the use of benchmarks to gather performance data. Since the performance of parallel architectures is harder to characterize than that of scalar machines, a performance model by Hockney is introduced and used to identify performance issues with vector processors and SIMD machines. Next, we discuss several special problems associated with the performance of MIMD machines, and how to measure the performance of the new massively parallel machines. Lastly, we explore the physical and algorithmic limitations to increasing performance on parallel computers.

2.1 Measures of Performance

First, we will consider measures of performance. Many measures could be suggested: for example, instructions per second, disk reads and writes per second, memory accesses per second, or bus accesses per second. Before we can decide on a measure, we must ask ourselves what we are measuring. With parallel computers in a scientific computing environment, we are mostly concerned with CPU computing speed in performing numerical calculations. Therefore, a potential measure might be CPU instructions per second. However, in the next section we will find that this is a poor measure.

2.1.1 MIPS as a Performance Measure

We have all seen advertisements claiming that such and such company has an X MIPS machine, where X is 50, 100 or whatever. The measure MIPS (Millions of Instructions Per Second) sounds impressive. However, it is a poor measure of performance, since processors have widely varying instruction sets. Consider, for example, the following data for a CISC (Complex Instruction Set Computer) Motorola MC68000 microprocessor and a RISC (Reduced Instruction Set Computer) Inmos T424 microprocessor.

Processor    Total Number of Instructions    Time in Seconds
MC68000      109,366                         0.11
T424         539,215                         0.03

Fig. 2.1 Performance Data for the Sieve of Eratosthenes Benchmark [10]

Both microprocessors are solving the same problem, i.e., a widely used benchmark for evaluating microprocessor performance called the Sieve of Eratosthenes, which finds all the prime numbers up to 10,000. Notice that the T424, with its simpler instruction set, must perform almost five times as many instructions as the MC68000.

    rate = total number of instructions / time to solve problem

    rate_T424 = 539,215 / 0.03 = 18.0 MIPS

    rate_MC68000 = 109,366 / 0.11 = 1.0 MIPS

The T424 running the Sieve program executes instructions at the rate of 18.0 MIPS. In contrast, the MC68000 running the same Sieve program executes instructions at the rate of 1.0 MIPS. Although the T424’s MIPS rating is 18 times the MC68000’s, the T424 solves the problem only 3.6 times faster than the MC68000 (0.11 seconds versus 0.03 seconds). We conclude that the MIPS rating is not a good indicator of speed, and we must be suspicious when we see performance comparisons stated in MIPS. If MIPS is a poor measure, what is a good measure?

[10] Inmos Technical Note 3: “IMS T424 - MC68000 Performance Comparison”

2.1.2 MFLOPS as a Performance Measure

A reasonable measure for scientific computations is Millions of FLoating-point Operations Per Second (MFLOPS, or MegaFLOPS). Since a typical scientific or engineering program contains a high percentage of floating-point operations, and most of the time spent executing scientific programs goes to calculating floating-point values inside nested loops, MFLOPS is a good candidate for a performance measure. Clearly, not all work in a scientific environment is floating-point intensive, e.g., compiling a FORTRAN program. However, the computing industry has found MFLOPS to be a useful measure. Of course, some applications, such as expert systems, do very few floating-point calculations, and for them an MFLOPS rating is rather meaningless. A possible measure for expert systems might be the number of logical inferences per second.

2.2 MFLOPS Performance of Supercomputers Over Time

To demonstrate the increase in MFLOPS over the last two decades, the chart below shows some representative parallel machines and their theoretical peak MFLOPS ratings. Each was the fastest machine in its day. The chart also includes the number of processors contained in the machine and the year the first machine was shipped to a customer.

Machine         Year    Peak MFLOPS    Number of Processors
CDC 6600        1966    1              1
ILLIAC IV       1975    100            64
Cray-1          1976    160            1
CDC 205         1981    400            1
Cray X-MP/2     1983    420            2
Cray Y-MP/832   1987    1333           4
Cray Y-MP C90   1992    16000          16
NEC SX-3/44     1992    22000          4

Fig. 2.2 Peak MFLOPS for the Fastest Computer in That Year

From the chart, we see that the MFLOPS rating has risen at a phenomenal rate over the last two decades. To see if there are any trends, we plot the peak MFLOPS on a logarithmic scale versus the year. The result is almost a straight line! This means the performance increases tenfold about every five years. Can the computer industry continue at this rate? The indications are that it can for at least another decade.

Fig. 2.3 Log Plot of Peak MFLOPS in the Last Two Decades (peak MFLOPS, 1 to 100,000, on a logarithmic axis versus year, 1965 to 1995; plot not reproduced here)
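The tenfold-every-five-years reading of Fig. 2.3 can be checked against the endpoints of Fig. 2.2. A quick sketch (the calculation is ours; only the two data points come from the chart):

```python
import math

# First and last rows of Fig. 2.2.
year0, mflops0 = 1966, 1        # CDC 6600
year1, mflops1 = 1992, 22000    # NEC SX-3/44

# Slope of the log plot: decades of growth per calendar year.
per_year = math.log10(mflops1 / mflops0) / (year1 - year0)

print(f"tenfold increase every {1 / per_year:.1f} years")
print(f"annual growth of roughly {(10 ** per_year - 1) * 100:.0f} percent")
```

The endpoints give a tenfold increase about every six years, close to a 50 percent improvement per year, consistent with the nearly straight line of the log plot.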
One caveat: the chart and graph use a machine’s theoretical peak performance in MFLOPS. This is not the performance measured in a typical user’s program. In a later section, we will explore the differences between “peak” and “useful” performance. Building a GFLOPS (GigaFLOPS, or 1000 MFLOPS) machine is a major accomplishment. Do we need faster machines? Yes! In the next section, we will discuss why we need significantly higher performance.

2.3 The Need for Higher Performance Computers

We saw in the last section that supercomputers have grown in performance at a phenomenal rate. Fast computers are in high demand in many scientific, engineering, energy, medical, military and basic research areas. In this section, we will focus on several applications which need enormous amounts of computing power. The first of these is numerical weather forecasting; Hwang and Briggs [Hwang, 1984] is the primary source of the information for this example. Considering the great benefits of accurate weather forecasting to navigation at sea and in the air, to food production and to the quality of life, it is not surprising that considerable effort has been expended in perfecting the art of forecasting. The weatherman’s latest tool is the supercomputer, which predicts the weather by simulating an atmospheric model.

For the prediction, the weather analyst needs to solve a general circulation model. The atmospheric state is represented by the surface pressure, the wind field, the temperature and the water vapor mixing ratio. These state variables are governed by the Navier-Stokes fluid dynamics equations in a spherical coordinate system. To solve the continuous Navier-Stokes equations, we discretize both the variables and the equations. That is, we divide the atmosphere into three-dimensional subregions, associate a grid point with each subregion, and replace the partial differential equations (defined at infinitely many points in space) with difference equations relating the discretized variables (defined only at the finitely many grid points). We initialize the state variables of each grid point based on the current weather at weather stations around the country. The computation is carried out on this three-dimensional grid, which partitions the atmosphere vertically into K levels and horizontally into M intervals of longitude and N intervals of latitude.
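To get a feel for how much computation such a grid implies, it helps to count grid points and floating-point operations per simulated time step. The sketch below is purely illustrative: the grid resolution, the operation count per point and the number of time steps are our assumptions, not figures from Hwang and Briggs.

```python
# Back-of-the-envelope cost of one forecast run on an M x N x K grid.
# All numbers below are illustrative assumptions, not from the text.
M, N, K = 360, 180, 12    # 1-degree horizontal grid, 12 vertical levels
ops_per_point = 100       # floating-point operations per grid point per step
steps = 1000              # time steps covering the forecast period

grid_points = M * N * K
total_flop = grid_points * ops_per_point * steps

print(f"{grid_points:,} grid points")
print(f"{total_flop / 1e9:.0f} GFLOP per forecast run")
print(f"{total_flop / 100e6:.0f} seconds at a sustained 100 MFLOPS")
```

Note that halving the grid spacing in all three dimensions, and with it the time step, multiplies the work by roughly a factor of 16, which is why forecasters can absorb essentially any increase in machine speed.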