<<


Chapter 1 Exercises

1. Look up the definition of “parallel” in your favorite dictionary. How does it compare to the definition of “parallelism” given in this chapter?

2. Give reasons why 10 bricklayers would not in real life build a wall 10 times faster than one bricklayer.

3. Ten volunteers are formed into a bucket brigade between a small pond and a cabin on fire. Why is this better than each volunteer individually carrying water to the fire? Analyze the bucket brigade as a pipeline. How are the buckets returned?

4. Using an assembly line, one wants the conveyor belt to move as fast as possible in order to produce the most widgets per unit time. What determines the maximum speed of the conveyor belt?

5. Assume a conveyor belt assembly line of five tasks. Each task takes T units of time. What is the speedup for manufacturing a) 10 units? b) 100 units? c) 1000 units?

6. Plot the graph of the results of problem 5. What is the shape of the curve?

7. Given the assembly line of problem 5, what are the speedups if one of the tasks takes 2T units of time to complete?

8. Assume a conveyor belt assembly line of five tasks. One task takes 2T units of time and the other four each take T units of time. How could the differences in task times be accommodated?

9. Simple Simon has learned that the asymptotic speedup of an n-station pipeline is n. Given a 5-station pipeline, Simple Simon figures he can make it twice as fast if he makes it into a 10-station pipeline by adding 5 “do nothing” stages. Is Simple Simon right in his thinking? Explain why or why not.

10. Select a parallel computer or parallel programming language and write a paper on its history.


Chapter 2 Measuring Performance

This chapter focuses on measuring the performance of parallel computers. Many customers buy a parallel computer primarily for increased performance. Therefore, measuring performance accurately and in a meaningful manner is important. In this chapter, we will explore several measures of performance, assuming a scientific computing environment. After selecting a “good” measure, we discuss the use of benchmarks to gather performance data. Since the performance of parallel architectures is harder to characterize than that of scalar ones, a performance model by Hockney is introduced. The performance model is used to identify performance issues with vector processors and SIMD machines. Next, we discuss several special problems associated with the performance of MIMD machines. Also, we discuss how to measure the performance of the new massively parallel machines. Lastly, we explore the physical and algorithmic limitations to increasing performance on parallel computers.

2.1 Measures of Performance

First, we will consider measures of performance. Many measures of performance could be suggested, for example, instructions per second, disk reads and writes per second, or memory accesses per second. Before we can decide on a measure, we must ask ourselves what we are measuring. With parallel computers in a scientific computing environment, we are mostly concerned with CPU computing speed in performing numerical calculations. Therefore, a potential measure might be CPU instructions per second. However, in the next section we will find that this is a poor measure.

2.1.1 MIPS as a Performance Measure

We all have seen advertisements claiming that such and such company has an X MIPS machine, where X is 50, 100 or whatever. The measure MIPS (Millions of Instructions Per Second) sounds impressive. However, it is a poor measure of performance, since processors have widely varying instruction sets. Consider, for example, the following data for a CISC (Complex Instruction Set Computer) Motorola MC68000 and a RISC (Reduced Instruction Set Computer) Inmos T424 microprocessor.

            Total Number of Instructions    Time in Seconds
MC68000     109,366                         0.11
T424        539,215                         0.03

Fig. 2.1 Performance Data for the Sieve of Eratosthenes Benchmark10

Both are solving the same problem, i. e., a widely used benchmark for evaluating microprocessor performance called the Sieve of Eratosthenes, which finds all the prime numbers up to 10,000. Notice that the T424, with its simpler instruction set, must perform almost five times as many instructions as the MC68000.

    rate = total number of instructions / time to solve problem

    rate_T424 = 539,215 instructions / 0.03 seconds = 18.0 MIPS

10 Inmos Technical Note 3: "Ims T424 - MC68000 Performance Comparison"

    rate_MC68000 = 109,366 instructions / 0.11 seconds = 1.0 MIPS

The T424 running the Sieve program executes instructions at the rate of 18.0 MIPS. In contrast, the MC68000 running the same Sieve program executes instructions at the rate of 1.0 MIPS. Although the T424's MIPS rating is 18 times the MC68000's MIPS rating, the T424 is only 3.6 times faster than the MC68000. We conclude that the MIPS rating is not a good indicator of speed. We must be suspicious when we see performance comparisons stated in MIPS. If MIPS is a poor measure, what is a good measure?

2.1.2 MFLOPS as a Performance Measure

A reasonable measure for scientific computations is Millions of FLoating-point Operations Per Second (MFLOPS or MegaFLOPS). Since a typical scientific or engineering program contains a high percentage of floating-point operations, MFLOPS is a good candidate for a performance measure. Most of the time spent executing scientific programs is spent calculating floating-point values inside of nested loops. Clearly, not all work in a scientific environment is floating-point intensive, e. g., compiling a program. However, the computing industry has found MFLOPS to be a useful measure. Of course, some applications such as expert systems do very few floating-point calculations, and an MFLOPS rating is rather meaningless for them. A possible measure for expert systems might be the number of logical inferences per second.
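Before moving on, the short FORTRAN sketch below recomputes the MIPS ratings of Section 2.1.1 and the actual speed ratio from the instruction counts and times of Figure 2.1. It is only an illustration of the arithmetic added here, not part of the original benchmark.

* Sketch: MIPS ratings versus actual speed for the Sieve data of Fig. 2.1
* (illustrative only; the constants are the published instruction counts
* and run times)
      PROGRAM MIPSEX
      REAL CNT68K, CNTT42, TIM68K, TIMT42, R68K, RT424
      CNT68K = 109366.0
      TIM68K = 0.11
      CNTT42 = 539215.0
      TIMT42 = 0.03
* MIPS = (instructions / time) / 1,000,000
      R68K  = CNT68K / TIM68K / 1.0E6
      RT424 = CNTT42 / TIMT42 / 1.0E6
      PRINT *, 'MC68000 MIPS = ', R68K
      PRINT *, 'T424    MIPS = ', RT424
      PRINT *, 'RATIO OF MIPS RATINGS = ', RT424 / R68K
* the actual speed ratio is the ratio of the run times, not of the MIPS
      PRINT *, 'ACTUAL SPEED RATIO    = ', TIM68K / TIMT42
      STOP
      END

The program prints a MIPS ratio of about 18 but an actual speed ratio of only about 3.7, which is exactly the point of the example.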

2.2 MFLOPS Performance of Supercomputers Over Time

To demonstrate the increase in MFLOPS over the last two decades, the chart below shows some representative parallel machines and their theoretical peak MFLOPS ratings. Each was the fastest machine in its day. The chart also includes the number of processors contained in the machine and the year the first machine was shipped to a customer.

Computer         Year    Peak MFLOPS    Number of Processors
CDC 6600         1966    1              1
ILLIAC IV        1975    100            64
Cray-1           1976    160            1
CDC 205          1981    400            1
Cray X-MP/2      1983    420            2
Cray Y-MP/832    1987    1333           4
Cray Y-MP C90    1992    16000          16
NEC SX-3/44      1992    22000          4

Fig. 2.2 Peak MFLOPS for the Fastest Computer in That Year

From the chart, we see that the MFLOPS rating has risen at a phenomenal rate in the last two decades. To see if there are any trends, we plot the peak MFLOPS on a logarithmic scale versus the year. The result is almost a straight line! This means the performance increases tenfold about every five years. Can the computer industry continue at this rate? The indications are that it can for at least another decade.

[Plot: peak MFLOPS on a logarithmic scale (1 to 100,000) versus year (1965 to 1995)]

Fig. 2.3 Log Plot of Peak MFLOPS in the Last Two Decades

One caveat: the chart and graph use a machine's theoretical peak performance in MFLOPS. This is not the performance measured in a typical user's program. In a later section, we will explore the differences between "peak" and "useful" performance. Building a GFLOPS (GigaFLOPS or 1000 MFLOPS) machine is a major accomplishment. Do we need faster machines? Yes! In the next section, we will discuss why we need significantly higher performance.

2.3 The Need for Higher Performance Computers

We saw in the last section that supercomputers have grown in performance at a phenomenal rate. Fast computers are in high demand in many scientific, engineering, energy, medical, military and research areas. In this section, we will focus on several applications which need enormous amounts of computing power. The first of these is numerical weather forecasting. Hwang and Briggs [Hwang, 1984] is the primary source of the information for this example.

Considering the great benefits of accurate weather forecasting to navigation at sea and in the air, to food production and to the quality of life, it is not surprising that considerable effort has been expended in perfecting the art of forecasting. The weatherman’s latest tool is the computer, which is used to predict the weather based on a simulation of an atmospheric model. For the prediction, the weather analyst needs to solve a general circulation model. The atmospheric state is represented by the surface pressure, the wind field, the temperature and the water vapor mixing ratio. These state variables are governed by the Navier-Stokes fluid dynamics equations in a spherical coordinate system. To solve the continuous Navier-Stokes equations, we discretize both the variables and the equations. That is, we divide the atmosphere into three-dimensional subregions, associate a grid point with each subregion and replace the partial differential equations (defined at infinitely many points in space) with difference equations relating the discretized variables (defined only at the finitely many grid points). We initialize the state variables of each grid point based on the current weather at weather stations around the country.

The computation is carried out on this three-dimensional grid that partitions the atmosphere vertically into K levels and horizontally into M intervals of longitude and N intervals of latitude. It is necessary to add a fourth dimension: the number of time steps used in the simulation. Using a grid size of 270 miles on a side, an appropriate number of vertical levels and time step, a 24-hour forecast for the United States would need to perform about 100 billion data operations. This forecast can be done on a Cray-1 supercomputer in about 100 minutes. However, a grid of 270 miles on a side is very coarse. If one grid point were Washington, DC, then 270 miles north is Rochester, New York on Lake Ontario and 270 miles south is Raleigh, North Carolina. The weather can vary drastically between these three cities! Therefore, we desire a finer grid for a better forecast. If we halve the distance on each side to 135 miles, we also need to halve the vertical level interval and the time step. Halving each of the four dimensions requires at least 16 times more data operations.
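The scaling argument can be checked with a few lines of code. The sketch below, an illustration added here rather than anything from Hwang and Briggs, starts from the 100-billion-operation, 100-minute baseline for the 270-mile grid and multiplies the work by 16 for each halving of the grid spacing, since all four dimensions are refined together.

* Sketch: operation count and Cray-1 run time versus grid refinement
* (illustrative; baseline is the 270-mile grid of the text: 100 billion
* operations and 100 minutes on a Cray-1)
      PROGRAM GRID
      REAL OPS, MINS, SIDE
      INTEGER LEVEL
      SIDE = 270.0
      OPS  = 100.0E9
      MINS = 100.0
      DO 10 LEVEL = 1, 3
         PRINT *, SIDE, ' MILE GRID: ', OPS, ' OPS ', MINS, ' MIN'
* halving the grid spacing also halves the vertical and time steps,
* so the work grows by a factor of 2**4 = 16
         SIDE = SIDE / 2.0
         OPS  = OPS  * 16.0
         MINS = MINS * 16.0
   10 CONTINUE
      STOP
      END

The 135-mile grid comes out at 1600 minutes and the 67-mile grid at 256 times the baseline, the figures used in the next paragraph.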

    time_135-mile grid = 100 minutes x 16 = 1600 minutes = 26.7 hours

Therefore, a Cray-1 would take over 26 hours to compute a 24-hour forecast. We would receive the prediction after the fact; clearly not acceptable! If we want the forecast in 100 minutes, we will need a machine 16 times as fast. If we desire a grid size of 67 miles on a side, we will need a computer 256 times faster. Since weather experts would like to model individual cloud systems, which are much smaller, for example, six miles across, weather and climate researchers will never run out of need for faster computers.

From Figure 2.2, we observe that the Cray-1 is a 1976 machine with a peak performance of 160 MFLOPS. Today’s machines are a factor of 100 faster and do provide a better forecast. However, reliable long-range forecasts require an even finer grid for many more time steps, which is why climate modeling is a Grand Challenge Problem. In 1991, the United States Office of Science and Technology proposed a series of Grand Challenge Problems, i. e., computational areas which require a million MFLOPS (TeraFLOPS). The U. S. Government feels that effective solutions in these areas are critically important to its national economy and well-being. The Grand Challenge Problems are listed in Figure 2.4.

Climate Modeling - weather forecasting; global models.
Fluid Turbulence - air flow over an airplane; reentry dynamics for spacecraft.
Pollution Dispersion - acid rain; air and water pollution.
Human Genome - mapping the human genetic material in DNA.
Ocean Circulation - long term effects; global warming.
Quantum Chromodynamics - particle interaction in high-energy physics.
Semiconductor Modeling - routing of wires on a chip.
Superconductor Modeling - special properties of materials.
Combustion Systems - rocket engines.
Vision and Cognition - remote driverless vehicle.

Fig. 2.4 The Grand Challenge Problems that Require a TeraFLOPS Computer

The U. S. computer industry hopes to provide an effective TeraFLOPS computer by the mid-1990s. Other areas that require extensive computing are structural biology and pharmaceutical design of new drugs, for example, a cure for AIDS. Returning to the weather forecasting example, how close to peak performance did the Cray-1 come on this problem? To compute the actual floating-point operations per second (FLOPS), we divide the number of operations by the time spent.

    rate_Cray-1 = 100 billion operations / 100 minutes = 16.7 MFLOPS

Surprisingly, the Cray-1, a 160 MFLOPS machine, only performed at 16.7 MFLOPS on the weather problem! Why the large discrepancy? First, the FORTRAN compiler can’t utilize the machine fully. Second, the pipelined arithmetic functional units are not kept busy. We will explore this issue fully in Chapter Three when we discuss vector processors such as the Cray-1. Also, in Chapter Three we will derive the Cray-1’s 160 MFLOPS rating and discuss why it rarely achieved anywhere near peak performance. At the moment, we need only understand that sustained MFLOPS on real programs, and not peak MFLOPS, is what is important. Many purchasers of supercomputers have been disappointed when their application programs have run at only a small fraction of the salesman’s quoted peak performance. One way to measure the practical MFLOPS available in a computer is to use benchmark programs.

2.4 Benchmarks as a Measurement Tool

A benchmark is a computer program run on several computers to compare the computers’ characteristics. A benchmark might be an often-run application program which typifies the work load at a company. Using the benchmark, the company can obtain a measure of how well a new computer will perform in its environment. The computing industry uses standard benchmarks, for example, the Sieve of Eratosthenes program used in Section 2.1, to evaluate its products. Performance of a computer is based on many aspects, including the CPU speed, the memory speed, the I/O speed and the compiler’s effectiveness. To incorporate these other effects, we measure the CPU’s overall performance by a benchmark program rather than directly, say with a hardware probe. Devising a benchmark for parallel computers is a little harder because of the wide variety of architectures. However, the computer industry has settled on several standard benchmarks, including the Livermore Loops and LINPACK, for measuring the performance of parallel computers. Here, we will discuss the LINPACK benchmark.

Jack J. Dongarra of Oak Ridge National Laboratory compiles the performance of hundreds of computers using the standard LINPACK benchmark [Dongarra, 1992]. The LINPACK software solves dense systems of linear equations. The LINPACK programs can be characterized as having a high percentage of floating-point arithmetic operations and, therefore, are appropriate as benchmarks for measuring performance in a scientific computing environment. The table in Figure 2.5 reports three numbers for each machine listed (in some cases, numbers are missing because of lack of data). All performance numbers reflect arithmetic performed in full precision (64 bits). The third column lists the LINPACK benchmark for a matrix of order 100 in a FORTRAN environment. No changes are allowed to this code. The fourth column lists the results of solving a system of equations of order 1000, with no restrictions on the method or its implementation. The last column is the theoretical peak performance of the machine, which is based not on an actual program run, but on a paper computation. This is the number manufacturers often cite; the theoretical peak MFLOPS rate represents an upper bound on performance. As Dongarra states, “... the manufacturer guarantees that programs will not exceed this rate -- sort of a ‘speed of light’ for a given computer.”11 The theoretical peak performance is determined by counting the number of floating-point additions and multiplications (64-bit precision) that can be completed during a period of time, usually the cycle time of the machine.
For example, the Cray Y-MP/8 has a cycle time of 6 nanoseconds in which the results of both an addition and a multiplication can be completed on a single processor.

11 Dongarra, Jack J., “LINPACK Benchmark: Performance of Various Computers Using Standard Linear Equations Software," Supercomputing Review, Vol. 5, No. 3, March, 1992, pp. 55.

    (2 operations / 1 cycle) * (1 cycle / 6 ns) = 333 MFLOPS

Since the Cray Y-MP/832 of 1987 could have up to four processors, the peak performance is 4 x 333 = 1333 MFLOPS. The column labeled “Computer” gives the name of the computer hardware and indicates the number of processors and the cycle time of a processor in nanoseconds.

Computer                              Year    Standard LINPACK    Best Effort LINPACK    Theoretical Peak Performance
CDC 6600 (100 ns)                     1966    0.48                --                     1
Cray-1S (12.5 ns)12                   1979    12                  110                    160
CDC 205 (4-pipe, 20 ns)               1981    17                  195                    400
Cray X-MP/416 (2 procs., 8.5 ns)13    1983    143                 426                    470
Cray Y-MP/832 (4 procs., 6 ns)        1987    226                 1,159                  1,333
Cray Y-MP C90 (16 procs., 4.2 ns)     1992    479                 9,715                  16,000
NEC SX3/44 (4 procs., 2.9 ns)         1992    --                  13,420                 22,000

Fig. 2.5 LINPACK Benchmarks in MFLOPS for Some Supercomputers
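The paper computation generalizes easily. The sketch below, a minimal illustration rather than anything from the LINPACK report, computes a theoretical peak from a cycle time, the number of floating-point results completed per cycle, and the processor count; the values shown are the Cray Y-MP figures quoted above.

* Sketch: theoretical peak MFLOPS from cycle time, results per cycle
* and processor count (illustrative; 2 results per 6 ns cycle, 4 CPUs)
      PROGRAM PEAK
      REAL CYCNS, OPCYC, PROCS, PKMFLP
      CYCNS = 6.0
      OPCYC = 2.0
      PROCS = 4.0
* operations per second = results per cycle / cycle time in seconds
      PKMFLP = OPCYC / (CYCNS * 1.0E-9) * PROCS / 1.0E6
      PRINT *, 'THEORETICAL PEAK = ', PKMFLP, ' MFLOPS'
      STOP
      END

Changing CYCNS, OPCYC and PROCS gives the corresponding figure for any entry in Figure 2.5, provided one knows how many results each processor can complete per cycle.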

Glancing at the table, we observe that the Standard LINPACK measurements are a small percentage of the theoretical peak MFLOPS. The Standard LINPACK rating is a good approximation of the performance we would expect from an application written in FORTRAN and directly ported (no changes in the code) to the new machine. Notice that the 12 MFLOPS Standard LINPACK rating of the Cray-1 is comparable to the 16.7 MFLOPS we calculated for the weather code. For FORTRAN code, 10 to 20 MFLOPS was typical on the Cray-1.

The improvement of the “Best Effort” column over the Standard LINPACK column reflects two effects. First, the problem size is larger (a matrix of order 1000), which gives the hardware, especially the arithmetic pipelines, more opportunity to reach near-asymptotic rates. Second, modification or replacement of the algorithm and software is permitted to achieve as high an execution rate as possible. For example, a critical part of the code might be carefully hand coded in assembly language to match the architecture of the machine.

As one might expect, manufacturers have worked hard to improve their LINPACK benchmark ratings. To improve the Standard LINPACK rating for a fixed machine, one enhances the FORTRAN compiler by providing optimizations which better utilize the machine. To illustrate the possible improvement in compiler technology, Cray Research raised the Standard LINPACK rating on the Cray-1S from 12 MFLOPS in 1983 with Cray’s CFT FORTRAN compiler (version 1.12) to 27 MFLOPS on a current run with their cf77 compiler (version 2.1). The LINPACK and other benchmarks provide a valuable way to compare computers in their performance on floating-point operations per second.
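To quantify “a small percentage,” the sketch below (an illustration added here, not part of the LINPACK report) divides the Standard and Best Effort ratings of one Figure 2.5 entry by its theoretical peak.

* Sketch: fraction of theoretical peak achieved by the LINPACK runs
* (illustrative; the constants are the Cray Y-MP/832 row of Fig. 2.5)
      PROGRAM FRAC
      REAL STD, BEST, PEAK
      STD  = 226.0
      BEST = 1159.0
      PEAK = 1333.0
      PRINT *, 'STANDARD    = ', 100.0 * STD / PEAK, ' % OF PEAK'
      PRINT *, 'BEST EFFORT = ', 100.0 * BEST / PEAK, ' % OF PEAK'
      STOP
      END

For the Cray Y-MP/832 this works out to about 17 percent of peak for the untouched FORTRAN code and about 87 percent for the tuned order-1000 solution, which is the gap the rest of this chapter tries to explain.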

2.5 Hockney’s Parameters r∞ and n1/2

Roger Hockney has developed a performance model [Hockney, 1988] which attempts to characterize the effective parallelism of a computer. His original model focused on the performance of vector computers, e. g., the Cray-1, but he has expanded his model to include

12 No data is available for the Cray-1 cited in Figure 2.2. The Cray-1S was an upgrade of the I/O and memory systems that appeared three years later.
13 The Cray X-MP/416 is an upgrade of the Cray X-MP/2 of 1983. The system clock was speeded up from 9.5 ns to 8.5 ns, which accounts for the increase in peak performance from 420 to 470 MFLOPS.

SIMD and MIMD machines as well. First we will explore his original model; then we will explore his extensions.

In the last section, we observed a large disparity between the Standard LINPACK rating and the theoretical peak performance on supercomputers, for example, 12 MFLOPS versus 160 MFLOPS on the Cray-1. For a vector computer, e. g., the Cray-1, the disparity is attributed partly to the FORTRAN compiler’s inability to fully utilize the machine, especially the vector hardware. However, the major source of slowdown is the lack of work to keep the pipelined arithmetic functional units busy. Recall the pipelined floating-point unit of Section 1.10.1. In a vector processor (see Section 1.12.1), special machine instructions route vectors of floating-point values through the pipelined functional units. Only after a pipeline is full do we obtain an asymptotic speedup equal to the number of stages in the pipeline. Therefore, supercomputer floating-point performance depends heavily on the percentage of code with vector operations and on the lengths of those vectors. Scalar code (no vectors available) runs at about 12 MFLOPS on the Cray-1, while highly vectorizable code achieves performance close to the theoretical peak of 160 MFLOPS. To characterize the effects of this vectorization, Hockney's performance model will be derived in the next section.

2.5.1 Deriving Hockney’s Performance Model

In his performance model, Hockney wants to distinguish between the effects of technology, e. g., the clock speed, and the effects of parallelism. All the effects of technology are lumped together into one parameter we will call Q. The effects of parallelism are lumped into a parameter we call P. Let t be the time of a single arithmetic operation, e. g., a vector multiply, on a vector of length n. We assume t is some function of n, Q and P. We call this function F:

    t = F(n, Q, P)

First, we consider Q, the effects of technology. For example, if we double the clock speed of a processor, we would expect to halve the time t. We observe that Q appears as a multiplicative factor, which we write as 1/r, of another function that depends only on n and P. We will call this new function G:

    t = (1/r) * G(n, P)

The performance or rate is related to the reciprocal of the time. In the above equation, r is the rate, or the results per unit of time, e. g., results per second. Now we consider P, the effects of parallelism. If the machine is serial, i. e., with no architectural parallelism (P = 0), the time to compute a vector of n elements should be the time of one element multiplied by n. That is, when the machine is serial, P should have little or no effect:

    t_serial = (1/r) * n        when P = 0

If the machine is very parallel, P should dominate n in the function G. One of the many possible equations that fits this behavior is the following simple equation:

    t = (1/r) * (n + P)

Rewriting the last equation in terms of performance by taking the reciprocal of each side:

    performance = 1/t = r / (n + P)

The maximum rate in a parallel computer occurs asymptotically for vectors of infinite length, hence Hockney gives r the subscript ∞.

    performance = 1/t = r∞ / (n + P)

If we assign P equal to n, then performance is one half of the maximum performance.

    half performance = 1/t = r∞ / (2n)        when n = P

Hockney names our P as n1/2 to recall the one-half factor of performance. Hockney’s performance model is derived by substituting r∞ for r and n1/2 for P in the equation:

    t = (1/r∞) * (n + n1/2)        Hockney’s Performance Equation

He claims that the two parameters r∞ and n1/2 completely describe the hardware performance of his idealized generic computer and give a first-order description of any real computer. These characteristic parameters are called:

r∞ - the maximum or asymptotic performance - the maximum rate of computation in floating-point operations performed per second. This occurs asymptotically for vectors of infinite length, hence the subscript.

n1/2 - the half-performance length - the vector length required to achieve half the maximum performance.

For a particular machine, r∞ and n1/2 are constants. We can compare different machines by measuring and comparing r∞ and n1/2. The next section discusses how to measure the two parameters.
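Before measuring them, it helps to see what the model predicts. The sketch below (an illustration added here, not Hockney's own code) evaluates t = (1/r∞)(n + n1/2) for several vector lengths and prints the average rate n/t, using the Cray-1 values r∞ = 22 MFLOPS and n1/2 = 18 that appear later in Figure 2.8.

* Sketch: predicted time and average rate from Hockney's equation
* t = (1/RINF)*(N + NHALF); rate = N/t = RINF*N/(N + NHALF)
* (illustrative; RINF and NHALF are the Cray-1 values of Fig. 2.8)
      PROGRAM HOCK
      REAL RINF, NHALF, T, RATE
      INTEGER N, K
      RINF  = 22.0E6
      NHALF = 18.0
      N = 1
      DO 10 K = 1, 8
         T    = (REAL(N) + NHALF) / RINF
         RATE = REAL(N) / T / 1.0E6
         PRINT *, 'N =', N, ' T =', T, ' RATE =', RATE, ' MFLOPS'
         N = N * 4
   10 CONTINUE
      STOP
      END

The printed rate passes through half of r∞ near n = n1/2 = 18 and approaches the asymptotic 22 MFLOPS only for vectors of a few hundred elements or more, which is exactly the behavior the two parameters are meant to capture.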

2.5.2 Measuring r∞ and n1/2

The maximum performance r∞ and the half-performance length n1/2 of a computer are best regarded as experimentally determined quantities, measured by timing the performance of the computer on a test program which Hockney calls the (r∞, n1/2) benchmark. Before looking at the benchmark program, we will explore the behavior of Hockney’s performance equation. To do so, we plot t, the time for the vector operation, versus n, the vector length, on a graph.

[Plot of Hockney's equation t = (1/r∞)(n + n1/2): time in seconds (t) versus vector length (n); the line has slope 1/r∞ and crosses the n-axis at -n1/2]

Fig. 2.6 Plot of Hockney’s Performance Equation

Notice that the negative of the intercept of the line with the n-axis gives the value of n1/2 and the reciprocal of the slope of the line gives the value of r∞.

To determine r∞ and n1/2, we collect data points by running a FORTRAN program, plot the points on a graph and draw the best-fit line through them. We can either eyeball the best-fit line through the points or use a linear least-squares approximation. The parameter r∞ is the reciprocal of the slope of the line, and n1/2 is the negative of the n-axis intercept. The following FORTRAN program (see Figure 2.7) varies n over one hundred values and prints out the CPU times. The program is designed for FORTRAN 77 on a UNIX-based system. On other systems, you may have to replace the ETIME routine with a routine that returns a REAL value in seconds of the elapsed CPU time (not wall time!). Depending on the speed of the computer, you should adjust the constant NMAX (the maximum range of N). If NMAX is set too low on a very fast machine, e. g., a Cray Y-MP, the machine will finish most or all of the calculation before the CPU clock advances. The timings will be close to the resolution of the system clock, and the values will be noisy and meaningless. If NMAX is set too high on a slow computer, e. g., an IBM PC XT, the program will run for many hours. We adjust NMAX to give a reasonably straight line without having to wait too long for the results. The program measures the CPU time to perform the FORTRAN code for a vector multiplication as follows:

      DO 10 I = 1, N
         A(I) = B(I) * C(I)
   10 CONTINUE

In a pipelined vector processor, e. g., the Cray-1, the DO 10 loop in the above code would be replaced with a vector instruction by the vectorizing compiler. This vector instruction will utilize the pipelined multiplication unit to give a significant increase in MFLOPS performance over a serial processor.

* Performance Measurement Program for UNIX-based systems
* Computes 32 bit floating point r[infinity] and n[1/2],
* Hockney's performance parameters
* By Dan Hyde, March 18, 1992

* Adjust the NMAX constant for a particular machine.
* NMAX should be large enough to obtain meaningful times.

      PARAMETER (NMAX = 100000)
      INTEGER I, N
      REAL T0, T1, T2, T
      REAL A(NMAX), B(NMAX), C(NMAX)
      REAL TARRAY(2)

* initialize B and C to some realistic values (non zero!)
      DO 5 I = 1, NMAX
         B(I) = 12.3
         C(I) = 11.7
    5 CONTINUE

* find overhead to call ETIME routine
* ETIME returns elapsed execution time since start of program
      T1 = ETIME(TARRAY)
      T2 = ETIME(TARRAY)
      T0 = T2 - T1

* vary N for 100 times
      DO 20 N = (NMAX / 100), NMAX, (NMAX / 100)
         T1 = ETIME(TARRAY)

* start of computation to time
         DO 10 I = 1, N
            A(I) = B(I) * C(I)
   10    CONTINUE
* end of computation

         T2 = ETIME(TARRAY)
         T = T2 - T1 - T0
         PRINT *, 'N = ', N, ' TIME = ', T, ' seconds'
   20 CONTINUE
      STOP
      END

Fig. 2.7 FORTRAN Code to Collect Data for Hockney’s r∞ and n1/2 Parameters
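As a possible follow-on step (not part of the original program), the sketch below fits the best line t = A*n + B to the measured (n, t) pairs by linear least squares and reports r∞ = 1/A and n1/2 = -B/A, i. e., the negative of the n-axis intercept. It assumes the pairs have been stored in arrays XN and TT.

* Sketch: least-squares fit of t = A*n + B to NPTS measured (n, t) pairs,
* then RINF = 1/A and NHALF = -B/A (the negative n-axis intercept).
* Assumes XN(I) holds the vector lengths and TT(I) the measured times.
      SUBROUTINE FITRN(XN, TT, NPTS, RINF, NHALF)
      INTEGER NPTS, I
      REAL XN(NPTS), TT(NPTS), RINF, NHALF
      REAL SX, SY, SXX, SXY, A, B, D
      SX  = 0.0
      SY  = 0.0
      SXX = 0.0
      SXY = 0.0
      DO 10 I = 1, NPTS
         SX  = SX  + XN(I)
         SY  = SY  + TT(I)
         SXX = SXX + XN(I) * XN(I)
         SXY = SXY + XN(I) * TT(I)
   10 CONTINUE
* standard least-squares formulas for slope A and intercept B
      D = REAL(NPTS) * SXX - SX * SX
      A = (REAL(NPTS) * SXY - SX * SY) / D
      B = (SXX * SY - SX * SXY) / D
      RINF  = 1.0 / A
      NHALF = -B / A
      RETURN
      END

Called with the data printed by the program in Figure 2.7, it returns r∞ in floating-point operations per second (divide by one million for MFLOPS) and n1/2 in elements; noisy points near the clock resolution should be discarded first.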

2.5.3 Using r∞ and n1/2

In the table in Figure 2.8, the r∞ and n1/2 parameters were measured by a program similar to the one in Figure 2.7. The Crays and the CDC supercomputers are vector processors. The ICL DAP is an SIMD processor array of 4096 simple processors. First, observe that the maximum performance parameter r∞ is not the same as the theoretical peak MFLOPS. The parameter r∞ is based on a program run while the theoretical peak performance is a paper calculation (see Section 2.4). Also, this r∞ is for a 64-bit floating-point vector multiply and not 32-bit. Other precisions and vector operators have a different r∞. For example, the r∞ is 200 MFLOPS for a 32-bit vector multiply on a CDC 205, or double the 64-bit value. The theoretical peak performance counts the number of 64-bit additions as well as multiplies. For example, the Cray-1 can do an add and a multiply every clock period. Therefore, with a 12.5 nanosecond clock, the peak is 160 MFLOPS.

Computer                       r∞ MFLOPS    n1/2    Theoretical Peak MFLOPS
Cray-1                         22           18      160
ICL DAP (4096 processors)14    16           2048    --
CDC 205 (2-pipes)              100          100     200
Cray X-MP/22 (1 processor)     70           53      210

14 The ICL DAP values are for 32-bit floating-point precision. The rest are for 64-bit.

Fig. 2.8 r∞ and n1/2 Measurements for Some Supercomputers

In the table of Figure 2.8, notice the large range in the n1/2 parameter. Recall that n1/2 is a measure of parallelism or the length of the vector for half maximum performance. Do we desire a large n1/2? Or a low n1/2? A more parallel machine should be faster because of the speedup, which implies we want a large n1/2. However, a computer solves efficiently only those problems with a vector length greater than its n1/2. Therefore, the higher the value of n1/2, the more limited is the set of problems that the computer may solve efficiently. A large n1/2 implies a more special purpose machine. Solving a problem with small vectors on a large n1/2 machine is wasteful of resources, much like carrying one passenger on a city bus. A low n1/2 computer can solve more problems effectively or is more general purpose. In conclusion, we want a high r∞ and a low n1/2 together.

Of the machines in the table of Figure 2.8, which one is best? This depends on the application area, the cost of the machine, the r∞ of the machine and other considerations. If your application area has mostly short vectors or is scalar, the Cray-1 with a small n1/2 would be the best choice. If your application has mostly long vectors, the CDC 205 would be the best choice. With the knowledge of the n1/2 of a particular machine, we can select or design an algorithm which better matches the machine.
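The trade-off can be made concrete with the model itself. The sketch below (an illustration using the Figure 2.8 values, not a published comparison) prints the rate n/t predicted by Hockney's equation for the Cray-1 (r∞ = 22, n1/2 = 18) and the CDC 205 (r∞ = 100, n1/2 = 100) at several vector lengths.

* Sketch: predicted rate RINF*N/(N + NHALF) for two machines of Fig. 2.8
* (illustrative; RINF values are in MFLOPS, so the rates are in MFLOPS)
      PROGRAM COMPAR
      REAL R1, P1, R2, P2, RATE1, RATE2
      INTEGER N, K
      R1 = 22.0
      P1 = 18.0
      R2 = 100.0
      P2 = 100.0
      N = 1
      DO 10 K = 1, 6
         RATE1 = R1 * REAL(N) / (REAL(N) + P1)
         RATE2 = R2 * REAL(N) / (REAL(N) + P2)
         PRINT *, 'N =', N, ' CRAY-1:', RATE1, ' CDC 205:', RATE2
         N = N * 4
   10 CONTINUE
      STOP
      END

With these parameters the two machines break even at a vector length of only a few elements; below that the Cray-1's low n1/2 wins, and above it the CDC 205's higher r∞ takes over, which is the trade-off described above.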

Hockney claims the two parameters r∞ and n1/2 provide us with a quantitative means of comparing the parallelism and maximum performance of all computers. In the next sections, we will explore how effective Hockney’s performance model is on computers other than vector processors.

2.5.4 Extending Hockney’s Performance Model

Recall that SIMD machines have one instruction unit which issues the same instruction to P processors all executing in lockstep. The ICL DAP in the table of Figure 2.8 is of this class. How well does Hockney’s performance model fit SIMD machines? Assume P is the number of processors and n is the length of a vector. If n ≤ P, then we have more processors than we need, and the total time is the time to do one calculation, which is independent of the vector length n.

For the more common case when n > P, let t_pe be the time for one processor to compute the vector operation, e. g., a floating-point multiply. Hockney derives the following for SIMD machines:

    r∞ = P / t_pe

    n1/2 = P / 2

For half performance, we need a vector length of P/2, which makes sense as half of the processors would be used. Notice that, as we increase P, the number of processors in the processor array, both r∞ and n1/2 increase linearly. Therefore, an SIMD machine with a large number of processors has a large n1/2 and tends to be a special purpose machine. For example, the DAP with its 4096 processors has an n1/2 of 2048. Therefore, Hockney’s parameters are useful for SIMD machines as well as vector processors. What about MIMD machines?

2.6 Performance of MIMD Computers

MIMD computers have multiple instruction units issuing instructions to multiple processing units. Recall that the two main subclasses are shared memory and message passing MIMD. Assuming homogeneous processors, replicating a processor P times multiplies the n1/2 and r∞ of an individual processor by P. A real serial processor will have a small but non-zero n1/2 due to loop inefficiencies and other factors. Consequently, a large number of processors implies a large n1/2. However, parameters such as n1/2 and r∞ are only part of the story for MIMD computing. Other problems may dominate for MIMD and produce poor performance. According to Hockney, the three main areas of performance problems (overheads) for MIMD computing are the following [Hockney, 1988]:

1) scheduling work among the available processors (or instruction streams) in such a way as to reduce the idle time of the processors waiting for others to finish.

2) synchronizing the processors so that arithmetic operations take place in the correct order. In most MIMD algorithms the results of one portion of the algorithm are required before starting another portion. If many processors are working on the first portion, the processors must synchronize before they can proceed on to the second portion.

3) accessing arguments from memory or other processors. There are many ways to access arguments or data. For example, overhead costs are associated with memory conflicts in a shared memory machine; with communication between message passing processors; and with misses in a cache-primary memory hierarchy.

In the above three areas, the accessing of arguments is a potential bottleneck for all computers. For example, the slowness of moving vectors of data from memory to the fast pipelined arithmetic functional units is the main cause of the difference between the peak performance rates stated for vector processors and the average performance rates found in realistic user programs. However, the other two areas, scheduling and synchronizing, are new problems introduced by MIMD computing. A vector computer of one processor has no need to synchronize and schedule itself. Synchronization on an SIMD computer is automatic since every instruction is synchronized by the instruction unit. In SIMD computing, all the processors are trivially scheduled to perform the same instruction. Reducing the idle time by scheduling work on MIMD processors is non-trivial. Researchers have studied the problem extensively. We will discuss several techniques later in the book. The effort to balance the work or load across the processors is called load balancing.

Synchronization of MIMD processors may be accomplished by special hardware, e. g., a semaphore on a shared memory machine, or by the arrival of a message on a message passing MIMD machine. Processors waiting for synchronization events may be a major source of overhead. Hockney derives performance models and efficiency parameters for scheduling, communication, and synchronization in MIMD computing, much like n1/2 and r∞. The interested reader should consult his work [Hockney, 1988]. Notice that communication between processors is involved in both the accessing of data and synchronization. Many of the first MIMD machines had relatively slow communication mechanisms and exhibited performance poor enough to incite controversy.

2.6.1 The MIMD Performance Controversy

In the mid 1980s, the research community was debating the practicality of large-scale MIMD systems with more than a dozen processors. At conferences, some researchers were presenting papers on thousands of processors. Others were arguing that such systems would have dismal performance and weren’t worth building.

Minsky’s Conjecture

Many researchers in the debates were influenced by a 1971 paper authored by Marvin Minsky of MIT [Minsky, 1971]. Basing his analysis on models of performance, Minsky conjectured that realizable speedups on MIMD machines were on the order of log2 P, where P is the number of processors. Using Minsky’s conjecture, one could expect a speedup of only about 4 from a system of 16 processors. Experience with existing multiprocessors during this time tended to support Minsky’s point. Several researchers felt that Minsky’s conjecture was overly pessimistic and presented their own estimates. Basing his analysis on statistical modeling, Hwang estimates the realizable speedup as P/ln P, where ln is the natural logarithm [Hwang, 1984].

Number of Processors    Minsky’s Predicted Speedup    Hwang’s Estimate
P                       log2 P                        P/ln P
4                       2                             2.9
8                       3                             3.8
16                      4                             5.8
1024                    10                            147.8
4096                    12                            493.5

Fig. 2.9 Minsky and Hwang’s Predicted Speedup for MIMD Computers

The above analysis explains why in the mid-1980s many computer vendors built multiprocessors consisting of only two or four processors, e. g., the Cray X-MP.

Amdahl’s Law

In 1967, Gene Amdahl [Amdahl, 1967] argued convincingly that one wants fast scalar machines, not MIMD machines. He argued that a small number of sequential operations can effectively limit the speedup of a parallel algorithm. Let f be the fraction of operations in a computation that must be performed sequentially. The maximum speedup achievable by a parallel computer with P processors is the following:

    maximum speedup = 1 / (f + (1 - f)/P)        Amdahl’s Law

To demonstrate Amdahl’s law, consider a program where 10 percent of the operations, i. e., f = 0.1, must be performed sequentially. His law gives the following:

    maximum speedup = 1 / (0.1 + 0.9/P)

As the number of processors P increases, the term 0.9/P goes to zero. Therefore, Amdahl’s law states that the maximum speedup is 10, no matter how many processors are available.

Karp’s Wager

In 1986, Alan Karp [Karp, 1986] proposed his famous wager:

“I have just returned from the Second SIAM Conference on Parallel Processing for Scientific Computing in Norfolk, Virginia. There I heard about 1,000 processor systems, 4,000 processor systems, and even a proposed 1,000,000 processor system. Since I wonder if such systems are the best way to do general-purpose scientific computing, I am making the following offer.

“I will pay $100 to the first person to demonstrate a speedup of at least 200 on a general-purpose, MIMD computer used for scientific computing. This offer will be withdrawn at 11:59 p.m. on December 31, 1995.”15

With Minsky’s conjecture, Amdahl’s law, and his own experience, Karp felt his money was safe. If Minsky is right, one would need 2^200 processors (that’s a lot of processors!) for a speedup of 200. If Hwang is correct, one would need over 2000 processors. Amdahl’s law requires less than 0.5% of the code to be sequential for a speedup of 200. To sweeten the pot, C. Gordon Bell, chief architect of the DEC VAX machines, proposed an additional $1000 prize. This has become known as the Gordon Bell Award, which recognizes the best contributions to parallel processing, either speedup or throughput, for practical, full-scale problems. These prizes are still awarded annually. To the surprise of many, in March 1988, three researchers at Sandia National Laboratory, John L. Gustafson, Gary R. Montry and Robert E. Benner, demonstrated speedups of 1009 to 1020 on an nCUBE hypercube machine with 1024 processors [Gustafson, 1988]. However, the Sandia group altered the definition of speedup slightly. Even with the accepted definition, they still had speedups of 502 to 637, well over the 200 required for the prizes. The Sandia group argued that the accepted definition was unfair to massively parallel machines.
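The 0.5 percent figure follows directly from Amdahl's law. The sketch below (an illustration added here, not taken from any of the cited papers) evaluates the bound 1/(f + (1 - f)/P) for a few sequential fractions f and processor counts P, and prints the limiting speedup 1/f.

* Sketch: Amdahl's law, maximum speedup = 1/(F + (1-F)/P)
* (illustrative; F is the sequential fraction, P the processor count)
      PROGRAM AMDAHL
      REAL F(3), S, P
      INTEGER I, J
      DATA F /0.1, 0.01, 0.005/
      DO 20 I = 1, 3
         P = 1.0
         DO 10 J = 1, 6
            P = P * 10.0
            S = 1.0 / (F(I) + (1.0 - F(I)) / P)
            PRINT *, 'F =', F(I), ' P =', P, ' SPEEDUP =', S
   10    CONTINUE
* the limit as P grows without bound is 1/F
         PRINT *, 'F =', F(I), ' LIMITING SPEEDUP =', 1.0 / F(I)
   20 CONTINUE
      STOP
      END

For f = 0.1 the speedup saturates at 10, as in the example above, and even f = 0.005 caps the speedup at 200, which is why Karp considered his money safe.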

15 Karp, Alan, “What Price Multiplicity?,” Communications of the ACM, Vol. 29, No. 2, Feb. 1986, pp. 87.

    accepted definition of speedup = time for fastest serial algorithm / time for parallel algorithm

For massively parallel machines (typically over a thousand processors), they argued one should scale the problem size as one increases the number of processors. Sandia’s scaled speedup is the same as the accepted definition except that the problem size per processor is fixed. In the accepted definition, the same fixed problem size is run on both the serial processor and the P processors. Many have accepted the Sandia group’s argument that for massively parallel machines the speedup should be defined with the problem size per processor fixed. The Sandia group’s paper is well worth reading.

2.6.2 Performance of Massively Parallel Machines

With the arrival of massively parallel machines in the last couple of years, there is a need to evaluate such machines with benchmarks on problems that make sense. The problem size and rules for the Standard LINPACK benchmark we discussed before do not permit massively parallel computers to demonstrate their potential performance. The basic flaw of the Standard LINPACK benchmark is that solving 100 equations is too small a problem. To provide a forum for comparing such machines, [Dongarra, 1992] proposed a new benchmark. This benchmark involves solving a system of linear equations, as did LINPACK, where the problem size is allowed to increase (as argued by the Sandia group) and the performance numbers reflect the largest problem run on the machine. Dongarra’s parameters are based on Hockney’s r∞ and n1/2 parameters, but they are actually defined and measured differently. Instead of the vector length, Dongarra’s n1/2 is based on the problem size which gives half performance. The definitions of the column headings in Figure 2.10 are the following:

rmax - the performance in GFLOPS for the largest problem run on the machine.

nmax - the size of the largest problem run on the machine.

n1/2 - the problem size where half of the nmax execution rate is achieved.

rpeak - the theoretical peak performance in GFLOPS.

Computer              Cycle Time    No. of Processors    rmax (GFLOPS)    nmax     n1/2     rpeak (GFLOPS)
NEC SX-3/44           2.9 ns        4                    20.0             6144     832      22.0
Intel Delta           40 MHz        512                  13.9             25000    7500     20.0
Cray Y-MP C90         4.2 ns        16                   13.7             10000    650      16.0
TMC CM-200 (16)       10 MHz        2048 (17)            9.0              28672    11264    20.0
TMC CM-2 (18)         7 MHz         2048 (19)            5.2              26624    11000    14.0
Alliant Campus/800    40 MHz        192                  4.8              17024    5768     7.7
Intel iPSC/860        40 MHz        128                  2.6              12000    4500     5.0
nCUBE 2               20 MHz        1024                 1.9              21376    3193     2.4
MasPar MP-1 (20)      80 ns         16,384               0.44             5504     1180     0.58

16 Thinking Machines, Co. (TMC) CM-200 is an SIMD machine.
17 The CM-200 really has 65,536 one-bit processors. However, 32 processors can access a Weitek floating-point processor. Therefore, one way to view the machine is as 2048 floating-point processors.
18 Thinking Machines, Co. (TMC) CM-2 is an SIMD machine.
19 The CM-2 really has 65,536 one-bit processors. However, 32 processors can access a Weitek floating-point processor. Therefore, one way to view the machine is as 2048 floating-point processors.

Fig. 2.10 Results from Dongarra’s Benchmark for Massively Parallel Computers21

With these data, we can compare massively parallel computers. We desire a high rmax and a large nmax. Since the value of nmax is limited by the available memory size, it is an indicator of the size of the memory and the effectiveness of any memory hierarchy. We can use Dongarra’s n1/2, i. e., the size of the problem where half of the nmax execution rate is achieved, much like Hockney’s n1/2. The Cray Y-MP C90’s n1/2 of 650 means that many more problems can be solved efficiently on it than, for example, on the CM-2 with an n1/2 of 11000.

2.7 Limits to Performance

Many of us are amazed at the performance of today’s supercomputers. Today’s fastest computers can compute 20 GFLOPS, or 20 billion floating-point operations per second. The supercomputing research community is planning the designs of TeraFLOPS computers, ones that can compute at a trillion FLOPS. We wonder if there are any limits to achieving higher and higher performance. In this section, we will first discuss several physical limitations, then later, algorithmic limitations.

2.7.1 Physical Limits

One limitation to the design of computers is the speed of light. Light travels 30 centimeters or 11.8 inches in one nanosecond. Since the clock period of current supercomputers is only a couple of nanoseconds, e. g., the NEC SX3/44’s clock is 2.9 nanoseconds, all the wires must be short and components must be physically close together, i. e., a matter of inches, which implies a dense packing of the circuitry. As logic gates are forced to switch faster, they require more energy to switch. Dense packing of circuits and more switching energy imply a lot of energy dissipated in the form of heat in a concentrated volume. Therefore, how to cool supercomputers is a major engineering design problem. Some machines use water for cooling while others, such as the Cray-1, used Freon, the refrigerant commonly found in refrigerators. Observe that the cooling problem exists for both fast scalar processors and parallel machines. Another limitation is the clock speed of integrated circuit chips. This especially limits how fast one can build a single processor. For higher performance, designers are forced to use parallel processors. In 1990, Harold Stone stated the following:

20 MasPar MP-1 is an SIMD machine.
21 Dongarra, Jack J., “LINPACK Benchmark: Performance of Various Computers Using Standard Linear Equations Software,” Supercomputing Review, Vol. 5, No. 3, March, 1992.