<<

Iowa State University Capstones, Theses and Retrospective Theses and Dissertations Dissertations

1995 study in Mark Alan Fienup Iowa State University

Follow this and additional works at: https://lib.dr.iastate.edu/rtd Part of the Computer Sciences Commons

Recommended Citation Fienup, Mark Alan, "Scalability study in parallel computing " (1995). Retrospective Theses and Dissertations. 10900. https://lib.dr.iastate.edu/rtd/10900

This Dissertation is brought to you for free and open access by the Iowa State University Capstones, Theses and Dissertations at Iowa State University Digital Repository. It has been accepted for inclusion in Retrospective Theses and Dissertations by an authorized administrator of Iowa State University Digital Repository. For more information, please contact [email protected]. INFORMATION TO USERS

Hiis manuscript has been reproduced fFom the microfilni master. UMI films the text directly from the original or copy submitted. Thus, some thesis and disseitation cc^ies are in typewriter £ace, while others may be from any type of conq)uter printer.

The quality of this reproduction is dependoit npon the qnali^ of the copy submitted. Broken or indistinct print, colored or poor quality illustrations and photographs, print bleedthrough, substandard margiTic and improper aUgmnent can adverse^ affect reproduction.

In the unlikely event that the author did not send UMI a complete manuscript and there are missing pages, these will be noted. Also, if unauthorized copyright material had to be removed, a note win indicate the deletion.

Oversize materials (e.g., rn^s, drawings, diarts) are reproduced by sectioning the original, beginning at the upper left-hand comer and continuing from left to ri^ in equal sections with small overly Each original is also photographed in one exposure and is included in reduced form at the bade of the book.

Photographs included in the original manuscript have been reproduced xerographically in this copy. Higher quality 6" x 9" black and white photQgr^hic prints are available for any photographs or illustrations appearing in this c(^ for an additional chaige. Contact UMI direct^ to order.

A Bell & Howell Information Company 300 North Zeeb Road. Ann Arbor. Ml 48106-1346 USA 313/761-4700 800/521-0600

Scalability study in parallel computing

by

Mark Alan Fienup

A Thesis Submitted to the

Graduate Faculty in Partial Fulfillment of the

Requirements for the Degree of

DOCTOR OF PHILOSOPHY

Department: Major: Computer Science

Signature was redacted for privacy. In" ge of Major Work Signature was redacted for privacy

Signature was redacted for privacy. For the Graduate College

Iowa State University Ames, Iowa

1995 UHI Niimber: 9531736

HMZ Hicro£orm 9531736 Copyright 1995, by ONI Company. All rights reserved.

This microform edition is protected against unauthorized copying under Title 17r United States Code.

UMI 300 North Zeeb Road Ann Arbor, MI 48103 To my wife, Virginia, and daughter, Ann. lU

TABLE OF CONTENTS

CHAPTER L INTRODUCTION 1

CHAPTER 2. LITERATURE REVIEW 7

CHAPTER 3. PARALLEL 13

CHAPTER 4. ASYMPTOTIC ANALYSIS OF SCALABILITY 26

CHAPTERS. MASPAR IMPLEMENTATIONS 51

CHAPTER 6. SCALABILITY EXPERIMENTS 62

CHAPTER 7. ACCURACY OF ASYMPTOTIC SCALABILITY METRICS IN PRACTICE 69

CHAPTER 8. CONCLUSIONS 88

REFERENCES 91

ACKNOWLEDGEMENTS 94

APPENDIX A. ADDITIONAL TABLES 95

APPENDIX B. MASPAR MPL CODES 111 1

CHAPTER 1. INTRODUCTION

1.1. Motivation

One of the major goals of parallel computing is to decrease the execution-time of a computing task. For sequential programs, there are often several algorithms for solving a task, but usually a simple time-complexity analysis using "big-oh" notation suffices in determining the better . When trying to solve a task on a parallel computer, several complicating factors arise: (1) "How should the problem be decomposed on the processors?", (2) "How does the communication network's topology impact performance?",

(3) "If I run my problem using more processors, how much improvement can I expect?", (4)

"Given a choice of parallel computers to run my task, which should I choose?", and (5) "If I want to solve a bigger problem, using more processors what kind of performance can I expect?".

To help answer many of these questions several scalability metrics have been proposed. These metrics differ in the scaling assumptions that they make. For example,

Amdahl's law [2] assumes that the problem size is fixed as the number of processors is increased, the isoefficiency function [8] keeps the parallel computers efficiency fixed by allowing the problem size to grow with the number of processors, and in [23] scaling to control the simulation enor in scientific applications is discussed. In this thesis, a new scalablity metric, called CMP-scalability *(Constant-Memory-per-Processor) [5] [6], is proposed that assumes that the memory used per processor is constant as the problem size grows. 2

As you might expect, the CMP scalability metric is tailored to answering question (5) above: "If I want to solve a bigger problem, using more processors what kind of performance can I expect?". Specifically, the CMP-scalability function describes the asymptotic rate of grow of the function under the constant-memory-per-processor scaling assumption. This seems like a natural question to ask as parallel-computer users often want to run larger problems. Since the memory available on each processor is fixed, more processors are required to store the larger problem and execute the problem in a shorter amount of time. Thus, CMP-scalability seems to be based on a reasonable scaling assumption.

Parallel implementations of Cannon's matrix multiplication (MM) [4], Gauss-Jordan

Elimination with partial pivoting (GJE), and Faster Fourier Transform (FFT) [6] on the

MasPar architecture (a SIMD machine with a two-dimensional array of processors) were performed to evaluate the CMP-scalability metric. These algorithms were chosen because of their widely varying computation to communication ratios. The CMP-scalability metric predicted that the speedup of matrix multiplication would grow linearly, 0(P), with

the number of processors, the speedup of Gauss-Jordan Elimination would grow as OiyfP)

with the number of processors, and the speedup of Fast Fourier Transform would grow as

Oi/P logjP) with the number of processors. It was somewhat surprising that the

communication intensive Fast Fourier Transform was predicted to outperform the

computationally more intensive Gauss-Jordan Elimination.

Experimental studies on a 16K-processor MasPar MP-1 and 4K-processor MP-2 did 3 not seem to behave as predicted by the CMP-scalability metric. The speedup plots, under constant-memory-per-processor scaling, for all three algorithms appeared to be roughly linear with the slopes of 0.89,0.76, and 0.62 for matrix multiplication, Gauss-Jordan Elimination, and Fast Fourier Transform, respectively. At this point, the formulas describing the execution time of each algorithm were scrutinized, refined, and experimentally verified since these formulas were used in the CMP-scalability analysis. While some small errors in the formulas were discovered, the CMP-scalability analysis was not affected.

To help explain the experimental results, other scalability metrics, especially the isoefficiency function, were used to analyze the chosen algorithms. The isoefficiency analysis of the three algorithms on a mesh architecture predicted that matrix multiplication would scale better than Gauss-Jordan Elimination which would scale much better than Fast

Fourier Transform. Surprisingly, the predicted relative order of the scalability for the three algorithms was different for the isoefficiency metric than the CMP-scalability metric. While the relative scalability predicted by the isoefficiency function matched the experimental study, the magnitude of difference between the Gauss-Jordan Elimination and Fast Fourier

Transform was not experimentally observed.

1.2. Questions to be Answered in the Thesis

The following questions will be answered by the thesis;

1) How can the CMP and isoefficiency-scalability metrics predict different relative scalability for the Gauss-Jordan Elimination and Fast Fourier Transform algorithms?

2) Why don't the experimental results on the MasPar computers agree with the CMP and 4 isoefficiency-scalability analyses?

3) Would the predicted scalability analysis be observed on a different parallel computer with different machine parameters?

1.3. Organization of Thesis and Summary of Results

The thesis starts in Chapter 2 with a review of the relevant literature to provide necessary background information. Scaling alternative and other scalability metrics are reviewed. In light of the questions to be answered by the thesis, particular emphasis will be placed on constant-memory-per-processor scaling and the isoefficiency-scalability metric.

Chapter 3 discusses the parallel algorithms for Cannon's matrix multiplication,

Gauss-Jordan Elimination, and Fast Fourier Transform on a two-dimensional mesh and on a hypercube parallel computer. For each algorithm, execution-time formulas are developed.

These formulas are used in Chapter 4 to demonstrate how the asymptotic CMP and isoefficiency-scalability metrics can be calculated with a minimum amount of work. The apparenfly conflicting scalability results between the CMP and isoefficiency-scalability metrics for Gauss-Jordan Elimination and Fast Fourier Transform on a mesh are identified.

At this point question number (1) is answered by observing that for these parallel algorithms performance is really a surface over the plane of two independent variables: the number of processors and the problem size. The different scaling assumptions of the CMP and isoefficiency-scalability metrics describe different planar cross-sections in this three-dimensional space with the CMP and isoefficiency functions being the intersections of these planes with the performance surface. Thus, the CMP and isoefficiency functions 5

actually provide complimentary information about the performance surfaces and not conflicting information. Using this complimentary information from both the CMP and

isoefficiency functions, two theorems are proven that predict die relative change in

performance if only the number of processors is varied, or if only the size of the problem is varied. Chapter 4 concludes with an examination of why most algorithms are not fixed-time scalable.

The general algorithms described in Chapter 3 for a mesh architecture must be refined to provide good performance on the MasPar architecture. Chapter 5 describes the MasPar specific "tricks" used when implementing these algorithms. Modified computation, communication, and memory-access execution-time formulas for the MasPar implementations are derived. Further refinements that include miscellaneous overhead and memory overlapping are made to these formulas in Chapter 6. These refinements are based on experimental timings on the MasPar computers. Verification of the accuracy of the resulting execution-time formulas is also provided in this chapter.

These detailed execution-time formulas for the algorithms are used in Chapter 7 to answer the remaining two questions: (2) "Why don't the experimental results on the MasPar computers agree with the CMP and isoefficiency-scalability analyses?" and (3) "Would the predicted scalability analysis be observed on a different parallel computer witii different machine parameters?".

The answer to question (2) is that the CMP and isoefficiency-scalability metrics are asymptotic scalability metrics, i.e., they are only guaranteed to be true for a sufficientiy large 6

number of processors. Even though the MasPar MP-1 has 16K processors, the machine

specific parameters are such that the asymptotic behavior is not observed. Chapter 7 contains

a detailed analysis of each algorithm's constants to predict the inaccuracies of the CMP and

isoefficiency's predictions for a varying numbers of processors.

Question (3) is answered by modifying the execution-time formulas to speedup the

computation or communication over the MP-1 machine parameters. Computation and

communication speedups of ten, fifty, and one-hundred are examined for their effects on the

accuracy of the CMP and isoefficiency-scalability metrics. It was found that small

improvements in the computation speed dramatically improved the accuracy of the

CMP-scalability predictions. The accuracy of the isoefficiency-scalability metric was found

to be more algorithm dependent than the CMP-scalability metric when varying the machine

parameters.

Chapter 8 concludes the thesis by highlighting the important results and discussing further areas of research. 7

CHAPTER 2. LITERATURE REVIEW

2.1. Terminology

Let P represent the number of processors, and N represent the problem size in terms of memory usage, such as number of elements, bytes, etc. The speedup of a is traditionally defined as

7i(A0 Speedup(P, N) = (2.1) where is defined to be the execution-time for the best sequential algorithm on a problem of size N, and TpiN) is defined to be the execution-time of the parallel algorithm using P processors. Speedup represents the reduction of execution time over the sequential algorithm. Often r,(AO is approximated in the literature by the parallel algorithm run on one processor, which tends to artificially the reported speedup [3][25]. Closely related to speedup is the notion of ^ciency which is

SpeedupiPJ^ Ti{N) Efficiency(P, N) = j, (2.2) which represents the utilization of the processors in the parallel computer.

2.2. Notions of Scaling

Amdahl [2] showed that parallel speedup is bounded if the problem size in fixed as the number of processors is increased. Define fixed-size scaling to be when the number of processors is scaled on a fixed-size problem. Specifically, Amdahl's law [2] says

Speedup(/', N) < s + (\-s)/p^ (2.3) 8 where s is the fraction of the sequential execution time that cannot be parallelized.

Unfortunately, even for small values of s the speedup is severally restricted.

Amdahl's law implies that some convention must be adopted for scaling the problem size with the number of processors [24]. Gustafson, Montry, and Benner [13] demonstrated that Amdahl's law does not directly apply to scaled speedup where the problem is allowed to grow as the number of processors increased. Scaled speedup is also called memory-bounded, memory-constrained, or constant-memory-per-processor (CMP) scaling.

Gustafson [15] has extended the traditional definition of speedup to include memory-bounded speedup which is defined as

Tx{N*) Memory-bounded Speedup(P, N*) = where the sequential and parallel execution times are measured on the scaled problem size,

N*. The CMP-scalability metric presented later in this chapter is based on this notion of memory-bounded speedup.

Another scaling notion, cd\\&Afixed-time(I time-constrained) scaling, [15] uses a fixed amount of time to solve as large of a problem as possible. Letting N' denote the largest

problem that can be run in the fixed amount of time, fixed-time speedup is defined as

TiiN') Fixed-time Speedup(P, N')=

Gustafson's Slalom benchmark program [15] uses fixed-time scaling to measure the

performance of a wide range of computers. In Chapter 4 of this thesis, it is shown that for a

large class of algorithms N' is bounded even if the number of processors is unlimited. 9

In constant efficiency scaling the problem size is allowed to grow sufficiently fast as the number of processors increase so as to maintain a fixed efficiency. While this is not practical in general, because the memory per processor is limited, Kumar and Rao [13] proposed the isoefficiency scalability metric based on constant efficiency scaling. The isoefficiency-scalability function describes the rate at which the problem size should grow with the number of processors to maintain a fixed efficiency. Later in this chapter the details of the isoefficiency-scalability metric are examined.

Depending on the application area of the parallel program other notations of scaling are important. For example, scientific applications often simulating some physical phenomenon. Typically, scaling the problem size causes the amount of parallel work to be increased faster than a simple time-complexity analysis would predict so as to control the amount of simulation error [24].

2.3. Relevant Work

Some work has been done to try to relate several notions of scalability. The following are especially relevant since they involve relating memory-bounded speedup to other speedup notions. An extensive review of the scalability literature is reported in [18].

Worley [29] examined fixed-size, fixed-time, and memory-bounded speedup curves for simple algorithms used to approximate model linear partial differential equations (PDEs).

He found that the fixed-time and the memory-bounded speedup curves could be drasticaUy different depending on the algorithm and machine's interconnection topology.

Sun and Ni [26] studied fixed-size speedup, fixed-time speedup, and 10 memory-bounded speedup models. They derived two sets of formulations for these models.

The first set are more detailed than the standard definitions since these incorporate uneven load balancing (and to a much lesser extent communication overhead). Their second set of formulations are simplified in that they assume negligible communication overhead, and the workload consists of a sequential part that is independent of system size and perfectly parallelizable parallel part (i.e., no load imbalancing). Using these simplified formulations, they show that the fixed-size and fixed-time speedup models are really special cases of the memory-bounded speedup model. However, in a practical sense this is meaningless because of the simplifying assumptions.

The isoefficiency-scalabUity metric of Kumar and Rao [19] has been widely accepted

[10] [11]. Intuitively, the isoefficiency-scalability function describes the rate at which the problem size should grow with die number of processors to maintain a fixed efficiency.

To derive the isoefficiency function let die parallel execution time for an individual processor be split into useful work, and time performing overhead, Then, P*TpiN) = P*it^+t^) =

Tj (N)+T^, where is the sum of overhead for all processors. Therefore, the efficiency can be written as

Speedup(N, P) Ti Ti _i Efficiency(/', yV) — p ~ t vd t. ~ Tq (2-6) TpXP ^ TixTo ^ Ti To maintain a constant efficiency, T, must be proportional to or

7, = /i:T^, whereK= E/il-E) (2.7)

In Chapter 4, several examples of applying the isoefficiency function are performed for the 11 three algorithms being considered in this study.

2.4. CMP Scalability Metric

The CMP (Constant-Memory-per-Processor) scalability metric proposed in this thesis has previous been described in [6] and [7]. Intuitively, the CMP-scalability metric describes the rate of growth of the memory-bounded speedup as a function of the number of processors. To determine the CMP-scalability metric for a particular parallel algorithm, let comp(N) be the number of computations for the algorithm, mem(N) be the total memory required by the algorithm, and let P = mem(N)/P be the local problem size in terms of memory used. Then the CMP speedup is defined as

CMP Speedup(P, p) = (2.8) by substituting mem 'CP*/*) for in the traditional speedup formula (2.1). The CMP- scalability function is the function that describes the asymptotic growth of CMP speedup(p,/*) as P goes to infinity. Because the 7^,(3) term depends on the specific parallel computer used, the CMP-scalability function captures both the algorithmic and architectural aspects in one metric.

Alternatively, the CMP-scalability metric can be thought of as the average number of sequential operations that can be performed per one parallel time step assuming memory-bounded scaling. It which case the CMP scalability is just

Sequential Time Conplexity CMP scalability = Parallel ' where the "CMP Parallel Time complexity" must take into account the memory-bounded 12 scaling. For instance, a local computation on the order of mem(N)/P would be considered a constant since it does not change as the number of processors and the problem size increases. In Chapter 4 the CMP-scalability metric is demonstrated on the three parallel algorithms in this study. 13

CHAPTER 3. PARALLEL ALGORITHMS

Parallel algorithms can vastly differ in their communication and computational requirements. Also, parallel machines can have very different computation and communication capabilities. As a consequence, the scalabDity of an algorithm needs to be determined on the basis of both the parallel computer and the algorithm. Three algorithms with vaiying degrees of communication and computational requirements are presented for a two-dimensional mesh topology and a hypercube topology. A load-and-store processor architecture is assumed for the processing elements. The algorithms are matrix multiplication

(MM), Gauss-Jordan elimination (GJE) with partial pivoting, and Fast Fourier Transform

(FFT). In addition to their varying degrees of communication and computational requirements, these algorithms were selected because they are commonly used in many applications.

For each of the algorithms and topologies considered, execution-time models are developed. The models are developed as parametric models so that the impact of speeding up computation speed and communication speed can be studied in Chapter 7. The models split the total parallel execution time (Tp(N)) into computation time, communication time, memory-access time, and other miscellaneous overhead. The models for the mesh topology are experimentally verified on a 16K processor MasPar MP-1 and 4K processor MasPar

MP-2. Chapter 6 contains the details of the verification . The MasPar MPL codes for these algorithms are included in Appendix B. 14

3.1. Communication Primitives

For each of the two topologies being consider, both the store-and-forward and cut-through routing schemes will be considered. The terminology described below is used when describing the communication time for each of the algorithms.

Let m be the number of words in the message, x be the distance between communicating processor (# of connections away), t^ be the startup time for the communication of the message, t^ be the per-hop time (i.e., the time delay for a word of the message to hop from one processor to its neighboring processor), and t„ be the per-word transmission time. Then the store-and-forward communication time for a message containing m words between two processors that are x hops apart is:

t, + (t^ + gx (3.1) and the cut-through/worm-fiole communication time for a message containing m words between two processors that are x hops apart is:

t, +1^ + tfcX (3.2)

3.2. The Parallel Algorithms

3.2.1 Cannon's Matrix Multiplication

A parallel matrix multiplication algorithm attributed to Cannon [4] is considered.

Two N X N matrices A and B are two-dimensionally block decomposed on a processors array, or a mesh embedding on a hypercube. First, the algorithm involves shifting the submatrices of A and B (Figure 3.1 (a)), so that the diagonal submatrices of A are in the first column of the processor mesh and the diagonal submatrices of B are in the first row of 15

Aw All AH AO3 Algorithm: C = A x B G*^xN matrices) Boo Bn B22 B33 2-Dim. Block decompose A and B

An Aj2 A,3 Ajo

B,o Ba, B3a B03 Shift Submatrice as shown on left

AB Ajo For Sqrt(P) times do ®20 B3. B02 B,3 Multiple local submatrice A33 A30 A3. A32 Shift A submatrices West B30 Bo, B.2 B23 Shift B submatrices North end for

(a) (b)

Figure 3.1. (a) Submatrices after initial shifting, (b) Cannon's Matrix multiplication algorithm on a two-dimensional mesh topology.

the processor mesh. The parallel algorithm (Figure 3.1 (b)) is performed in Tp steps with each step consisting of multiplying the local A and B submatrices at each processor, shifting the A submatrices west/left one processor, and shifting the B submatiices north/up one processor.

The parallel computation time (TCOMP ), communication time (Tcom )> and memory-access time ) of Cannon's algorithm on a P processor two-dimensional mesh are

jfa„=vp[(-^)V4+j'«)] (3.3)

= + (3.4)

7MB< = >/?[(-^)'(3rsr+27'u)) + (;^)\25ru,)l (3.5)

where is the time to perform an addition, T^, is the lime to perform a multiplication, is 16 the time to perform a store operation, and is the time to perform a load operation. Since the communication is nearest neighbor on both topologies, the communication time is the same for store-and-forward and cut-through routing schemes.

3.2.2 Gauss-Jordan Elimination

To apply Gauss-Jordan Elimination (GJE) with partial pivoting to solve a linear system of equation, Ax = b, the N x N coefficient matrix A is two-dimensionally scatter decomposed on the mesh of processors. On the hypercube topology, the mesh is embedded such that each row of the matrix is a subcube. The b vector is one-dimensionally scatter decomposed among the diagonal processors of the mesh. Partial pivoting is used for numerical accuracy.

Figure 3.2 outlines the parallel GJE algorithm. For each of the N pivot positions, the best pivot element is found, the pivot row is broadcast, the row multipliers are calculated, row multipliers are broadcast, and the rows are updated. The best pivot element is the

Data Layout; 2-D scatter decomposition of the N x N coefficient matrix, A

Parallel Algorithm:

For each N pivot elements do Find the best pivot element Broadcast pivot row Determine row multipliers Broadcast row multipliers Update rows End for

Figure 3.2. Parallel Gauss-Jordan Elimination algorithm. 17 maximum element in the column of A containing the pivot element, which is called the pivot column. To find the best pivot element, processors storing pivot-column elements first search for their local best, and then these processors perform a parallel-prefix "sum" communication to find the global maximum. Broadcasting the pivot row and row multipliers are accomplished using cut-through routing across the full vertical dimension of the mesh.

The row multipliers are calculated by the processors storing the pivot column. Updating the rows involve multiplying the row multiplier to the pivot row and adding this result to the existing row.

For the two-dimensional mesh topology, the parallel computation time (Temp A and memory-access time (T^EM) formulas for GJE are

TEMP = N[-^{2T COMPARE+TNEG +2TM+TA) + logj ^TCOMPARE

•i~(TM + TA)+TD] + -^TM (3.6)

TMEM=N[^ISTLD+7TST)+TU>+ TST+^(2TU> +TST)] + -^(2TW + TST) (3.7) where is the time to perform a floating-point comparison, is the time to perform a negation, and is the time to perform a division. On the two-dimensional mesh topology the farthest processors containing a row or column of the matrix are hops away.

Thus, the partial pivot-row and the column of row-multipliers must be broadcast a distance of

P^'^. Finding the best pivot element can be performed as a parallel-prefix "sum" computation, but the communication performed is a distance of The store-and-forward 18 communication time formula on the mesh topology is

TfpsS +(t^+th)Jp +2ts + \t„N+2th{P^ (3.8) and the cut-through communication formula is

"^^w + thjp +2ts + \tw''^ + 2th'[p~^ (3.9)

fiJP HYPFR For the hypercube topology, only the communication time (Tcom (P, N)) formula differs from the two-dimensional mesh model. On the hypercube, the farthest processors that contain a row or column of the matrix are logjf hops away. This improves the broadcasting of the partial pivot-row and the column of row multipliers. Additionally, finding the best pivot element can perform a true parallel-prefix "sum" communication. The store-and-forward communication time formula on the hypercube is

+ (f)v + ffc)log2^+2fj + |rM;-^log2V^+2fAlog2A/P j (3.10) and the cut-through communication time formula on the hypercube is

=^(^f + ^vi')log2i/?^ +'Alog2i/?^+ |?w'^ + 2?AlOg2'/P j (3.11)

3.2.3. Fast Fourier Transform. FFT

The Discrete Fourier Transform (DFT) of an N-point sequence , 0 < k < N, is another N-point sequence , 0 < m < N, defined as

N-l Am=Xakto}f', 0^m

FFT algorithm considered here is the radix-two decimation-in-firequency (DIF) algorithm [4] where N is a power of two.

The radix-two DIP algorithm is a divide-and-conquer algorithm which divides the

N-point sequence into two sequences and . The sequence , 0 < h < N/2, is equal to the first half of <\>, i.e., = , 0 < h < N/2, and , 0 < h < N/2, is equal to the second half of <3,, >, i.e., <0^ > = <3^+n/2 ^ ^ ^ < N/2, The DFT of the

N-point sequence is then computed in terms of the two N/2-point DFTs of the sequences , 0 < h < N/2, and <(b^ - c,,)a)fj''>, 0 < h < N/2. This process is repeated logjN times.

Figure 3.3a illustrates the radix-two DIF algorithm for a 32-point DFT computation.

The basic computation is the butterfly operation (denoted by the open dots), as shown in

Figure 3.3b. Each stage has 16 butterfly operations. The numbers along the left-hand side of Figure 3.3a represent the initial <\> input sequence. For example, butterfly operation BpjQ in stage 0 takes the O"" and IS"" components of the sequence as input The numbers along the right-hand side represent the output sequence, . Each butterfly operation uses a twiddle factor which is a power of the primitive N* root of unity (co). The necessary twiddle factor is shown below each butterfly operation in Figure 3.3a.

A binary-exchange FFT algorithm based on the above radix-two DDF algorithm is described next for the two-dimensional mesh and hypercube topologies. The algorithm is called binary-exchange because at different stages of the FFT algorithm two processors 20

Stage 0 Stage 1 Stage 2 B 16 0.4

Outputs

(b-c)»

Figure 3.3. (a) Computation of 32-point DFT. (b) Radix-two butterfly operation. 21

Data layout: One-dimensionally scattered Parallel Algorithm: For logjN stages do Compute local butterflies Update Twiddles If necessary, exchange elements end for

Figure 3.4. Outline of the parallel FFT algorithm.

exchange information only if each differs by a single binary bit in their processor number.

The general outline of the parallel FFT algorithm is given in Figure 3.4.

When implementing the FFT algorithm, the following questions must be addressed;

(1) What layout of the data and corresponding butterfly operations should be used?, (2) How can the necessary twiddle factors be supplied to a processor when needed?, (3) What is the

best order of evaluation for butterfly operations, (4) How should elements to be communicated be blocking?, and (5) How can the memory access penalty load-and-store

processors be avoided? These issues are not orthogonal and involve various tradeoffs that

are machine specific. For example, the MasPar FFT implementation discussed in Chapter 5

describes how to modify the order of the butterfly operations so as to reduce the number of

memory accesses. This particular optimization makes use of the fine-grained communication

network on the MasPar architecture, and would not be effective on medium or

course-grained parallel computers. In the remainder of this chapter the FFT implementation

on a "generic" mesh machine is described to avoid machine specific details. 22

PE 0: Bqo PEl: PEO: PE 1: PE2: PE3: ®0.0 ®1.0 ®2,0 ®3,0 ®4,0 ®5,0

®8,0 ®9,0 PE4: PES: PE6: PE7: ®4,0 ®5,0 ®6.0 ^,0 ®12.0 ®13,0

PE 2: Bjo PE 3: B3 0 PE8: PE9: PEIO: PEll: ®8,0 ®9,0 ®10,0 ®11.0 ®6.0 ^.0

®10.0 ®11,0 PE12: PE13: PE14: PE15: ®14,0 ®15,0 ®12,0 ®13,0 ®14,0 ®15,0

(a) (b)

Figure 3.5. Cut-and-stack mapping of stage 0 butterfly operations for a 32-point FFT on (a) a 2x2 PE array and (b) a 4x4 PE array.

On a two-dimensional mesh topology, the butterfly operations need to be mapped to the PE array to minimize the performance loss due to idle processors and the communication overhead. A "cut-and-stack" approach that assigns every P"* butterfly operation from each stage to the same processor, where P is the number of processors, works best for a mesh topology. For example, a 32-point FFT on a 2x2 PE array will have four butterfly operations per PE as shown in Figure 3.5a This layout minimizes the communication overhead as follows. To begin with, the &st logj(N/P) stages, called the in-memory stages, require no inter-PE communication since pairs of communicating butterflies reside on the same PE. The remaining log2(P) stages require inter-PE conununication. However, with this data layout communication is only necessary between 23

processors within the same row or column. To see this, consider Figure 3.5b which shows

the cut-and-stack butterfly layout for a 32-point FFT where only the first stage is an

in-memory stage.

After the in-memory stage, each PE in the upper half of the PE array conununicates

with the PE two processors below it in the same column. For example, the bottom output of

the butterfly operation Bqo at PE 0 is exchanged with the top output of the butterfly operation

Bgo at PE 8. In general, letting dim, and dimy represent the x and y dimensions of the PE array, the communication pattern for successive communication stages involves PEs exchanging data elements a distance of dim^/Z, dimy/4,..., 2,1 along the y-dimension followed by exchanging data elements a distance of dim,/2, dim,/4,..., 2,1 along the x-dimension.

A stage-at-a-time evaluation is assumed for the general algorithm, where each PE

performs all of the butterfly operations for a stage before starting on the butterfly operations for the next stage. Between the communication stages, a single block of data can be exchanged between communicating processors.

The following scheme for supplying the twiddle factors for the butterfly operations is assumed. Before stage 0 of the computation, the initial twiddle factors are calculated using the sine and cosine functions via the equation

cOn = cos(27ik/N) + / sin(2Jck/N). (3.13)

Recalculation of twiddle factors utilizing equation (3.13) before each stage would be time consuming. Here, one alternative to recalculating twiddle factors from (3.13), called the 24 square-of-twiddles, is assumed. The square-of-twiddles option makes use of the modular nature of the twiddle factors to generate a stage's (except the initial stage) twiddle factors from the previous stage's twiddle factors [5]. The square-of-twiddles method squares a butterfly's twiddle factor and selectively negates it if the resulting power is greater than N.

This method is substantially faster than calculating the twiddle factors via equation (3.13) since multiplication and negation are much faster than calling the sine and cosine functions.

One drawback of the square-of-twiddles method is the accumulation of round-off errors incurred by the repeated squaring of the twiddle factors, but using double precision numbers for the twiddle factors can overcome this problem.

For an A^-point FFT on a /P x -/P processor mesh, the parallel computation time iTcoMp)^ and memory-access (T^^) are given by

T'c7MP = ^[Tw + S(TM+TA)log^m (3.14)

TURN =U^TLD + 6TST+6(TLD +TST)LOG2(J)] (3.15) where Tyy is the time to calculate an initial twiddle factor by (3.13). The communication scheme is limited to store-and-forward routing since contention between communicating processors prevents improvement when using cut-through routing. To see this consider the example shown in Figure 3.5b, In the first communication stage, processors in the first row must exchange blocks of data with processors in the third row, while processors in the second row must exchange blocks of data with processors in the fourth row. Thus, the communication time is given by 25

Tc7£'" = 2tslog2P+4jp[t4+th) (3.16)

For the hypercube topology, only the communication time formula differs from the two-dimensional mesh model. Upon embedding the mesh into a hypercube, the communication time is decreased, since each pair of communicating processors at all communication stages are directly connected. For directly connected processors, store-and forward and cut-through routing give the same performance, so the hypercube communication time is given by

T'com''' =2log^p[ts+t4-htHy (3.17) 26

CHAPTER 4. ASYMPTOTIC ANALYSIS OF SCALABILITY

In this chapter the MM, GJE, and FFT execution-time formulas, described in Chapter

3, for the two-dimensional mesh and hypercube topologies are used to demonstrate how the

asymptotic CMP-scalability and isoefficiency-scalability metrics can be calculated.

Apparently conflicting scalability results between the CMP and isoefficiency-scalability

metrics for Gauss-Jordan Elimination and Fast Fourier Transform on a mesh are identified.

This apparent conflict is resolved by observing that these parallel algorithms each

describe a performance surface over the plane of two independent variables; the number of

processors and the problem size. The different scaling assumptions of the CMP and

isoefficiency-scalability metrics describe different planar cross-sections in this

three-dimensional space. The CMP-scalability and isoefficiency functions coirespond to the

intersections of these planes with the performance surface. Thus, the CMP and isoefficiency

functions actually provide complimentary information about the performance surfaces and not

conflicting information.

Using the complimentary information fi-om both the CMP and isoefficiency functions,

two theorems are proven that predict the relative change in performance if only the number of processors is varied, or if only the size of the problem is varied.

This chapter concludes with an examination of why most algorithms are not fixed-time scalable. Two classes of algorithms are identified that are shown not to be fixed-time scalable. These two classes of algorithms include the vast majority of algorithms. 27

4.1. Framework for Asymptotic Analysis 4.I.I. Tem^inolopv

Let N be the problem size, P be the number of processors, be the sequential

execution time using one processor, and Tp(N) be the parallel execution time using P

processors. Then

Speedup(P, N) (4.1)

Speedup(P//) Ti(N) Efficiency(/*,AO = (4.2) P ^ P*Tp(,N)

The execution time of the parallel algorithm can be split into the time spent

performing computation (Tf.Q^p(P, N)), communication (TcquJP' N))> other miscellaneous overhead (T^f,^c(P' memory accesses (Tf^f/P, N)), i.e..

r/P, N) = T,^P, N) + N) + WP, N) + T^P, N) (4,3)

Ideally, ^COMP^^> - Ti(N)IP. However, parallelization often introduces additional inefficiencies, such as imperfect load-balancing, that causes {T^oMpiP, N)

N) + T„E»(^,N)) >T,(N)/P.

A reasonable parallelization is defined to be a parallel implementation of an algorithm such that (TCOMP(P' + TM,SC(P- N) + W. ^)) = Ca*T,(N)IP, where is a constant. In other words, a constant slowdown in the parallel implementation due to

non-communication overhead is reasonable. (Some factors could cause to be less than one, e.g., memory accesses closer to the processors than in the sequential algorithm.) The

speedup and efficiency formulas assuming reasonable parallelization become: 28

PTm Speedup(P, N) = (4.4) CoTdM)+PTccnm(PJ^' and Tm EfficiencyC/', N) = (4.5) CoTm+PTconrniPJ^' where CQ is a constant representing the slowdown due to non-communication overhead.

Since we are concerned with only asymptotic scalability metrics, machine specific constants, such as processor speed and communication speed, can be ignored.

4.1.2. Isoefficiencv Scalability Metrics

The scaling assumption in isoefficiency is that a fixed machine efficiency is maintained as the number of processsors, P, increases. To accomplish this, the problem size, N, generally increases faster that the rate of processors. The function expressing the rate at which the problem size must grow with respect to the number of processors is called the isoefficiency function. Thus, for a specific isoefficiency function, s&yf(P), the Efficiency(P,

AO is a constant AT as P ->«». Dividing numerator and denominator of (4.5) by T^(N) gives

Efficiency(P, N) = = K (4.6) C o+PTconm{P M)IT 1 (AO

It is easy to see that 0(PT^^^(P, N)) <= 0(T^(N)) as P

The procedure for determining the isoefficiency function is:

1) examine each term, say t, in PT^^^(P, N) finding N = f(P), such that substituting for in r results in 0(T^(N)),

2) from all the terms in PT^^(P, N) select the N=f(P) function that grows fastest in terms of

P, and 29

3) determine the isoefficiency in terms of sequential work by substituting the f(P) function selected in step 2) into T0).

The above process describes the isoefficiency function as intended by Kumar and

Rao [19], but a similar procedure can be done to determine the isoefficiency in terms of memory usage. Call this modified isoefficiency metric, memory-isoefflciency. The steps for determining the memory-isoefficiency function are:

1) rewrite r,(/V) and T^g^(P, N), so the N represents the total problem size. Let r,'(AO and

N) denote these revised formulas,

2) examine each term, say t, in PT\^^(P, N) finding N = fiP), such that substituting for in ? results in 0(T^{N)), and

3) from all the terms in PT\^^(P, N) select the N=f(P) function that grows fastest in terms of P. This is the memory-isoefficiency function.

4.1.3. Fixed-time Scalability Metrics

In fixed-time scaling the problem size is allowed to grow as the number of processors increases so as to maintain some constant overall execution time. In otherwords, determine

^-f(f) such that TfJlfiP)) is constant as P -> <»<». Thus, each term in Tp(N) converges to a constant as P -> oo. As will be demonstrated, most parallel systems are not asymptotically scalable under fixed-time scaling.

The procedure for determine the fixed-time scalability function is:

1) examine each term, say t,in T/iV) finding N -f(P), such that t converges to a constant as P -» «>. If some term does not converge to a constant, then the parallel system is not 30 fixed-time scalable, and

2) determine the fixed-time-scalability function by selecting the N=f(P) function that grows slowest in terms of P since this forces all the other terms to zero as

Later in this chapter, two classes of algorithms are defined that are not fixed-time scalable as described in step (1).

4.1.4. CMP Scalability Metrics

The scaling assumption in CMP scalability is that the global problem size grows linearly with the number of processors, i.e., the local problem size per processor is fixed, so N

=f(P) is fixed. As described in Chapter 2, the CMP-scalability function is OCSpeedupCP, N)) under memory-bounded scaling as P «>o.

The procedure for determining the CMP scalability function is:

1) start with the traditional speedup formula (4.4) and apply memory-bounded scaling;

PT\(KP)) Speedup(/>, N) = Specdup(/-,/r/'fl =

2) determine the CMP-scalability function by finding the big-oh complexity of (4.7).

4.2. Asymptotic Scalability Analysis of MM, GJE, FFT

4.2,1. MM. Matrix Mwltiplipatipn

The matrix multiplication algorithm uses only nearest neighbor communication for both the two-dimensional mesh and hypercube topologies so the type of topology does not effect the scalability. Additionally, cut-through routing and store-and-forward routing provide the same performance since nearest neighbor communication is utilized. 31

Following the procedure outlined in subsection 4.1.2, the isoefficiency of matrix multiplication is determined as followed:

Step 1) The sequential execution time is 0(T^(N)) = N\ From equation 2.4, the asymptotically important terms of (^' are-/^ +Ny/P , so each term in PTcouiJP,

N) = + -JPN^ must be examined. For the .JPN^ term, «: -jPN^ so N «= . For the P^^ term, N'^ «= so

Step 2) Both terms in step (1) give N «: , so clearly grows the fastest in terms of P.

Step 3) Substituting for N in Ti(N}=0(N^) results in an the isoefficiency function of

0(P^"-).

Following the procedure outlined in subsection 4.1.3, the fixed-time scalability of matrix multiplication is determined as followed:

Step 1) Upon examining the Jp communication term of the Tp(N), equation 2.4, it clearly does not converge as f so matrix multiplication is not fixed-time scalable.

Following the procedure outiined m subsection 4.1.3, the fixed-time scalability of matrix multiplication is determined as followed:

Step 1) For matrix multiplication, N = f(P) = -fp under memory-bounded scaling since elements are scattered over P processors. The speedup formula under memory-bounded scaling for matrix multiplication is

PTiifjP)) pp^l2 Speedup(P,/rF); - CoTim)+PTcomm ~ P^/^+P(P^'Ht^)

Step 2) The complexity of the (4.8) determines the CMP scalability function to be 0(P). 32

4.2.2 GJE. Gauss-Jordan Elimination

The run-time of GJE differs depending on the processor topology and the routing algorithm. The scalability of GJE on a mesh topology assuming store-and-forward routing is demonstrated. Remaining combinations of topologies and routing algorithms are summarized in Table 4.1, Table 4.2, and Table 4.3. Following the procedure outlined in subsection 4.1.2, the isoefficiency of Gauss-Jordan elimination is determined as follows:

Step 1) The sequential execution time is 0(Ti(N)) = N^. From equation 2.8, asymptotically important terms of T^QI^P, N) are NLOG^^FP +N + N^. Therefore, the terms in

PTco^fJP, N) on a mesh using store-and-forward are PNlogj /P +NP^^ +NP +N^P. For the PA/logj V^term, « PNlog^-JP so N«« ^PlogjV^ . For the NP'^'^ term,

ex: SO NOE . For the NP term, <« NP so N P^^. For the N^P term,

ocN^PSONOCP .

Step 2) The AT <« P term grows the fastest in terms of P, so it is selected.

Step 3) Therefore, the isoefficiency function of GJE on a mesh with store-and-forward routing is 0{P^).

To determine the fixed-time scalability of GJE, the procedure outlined in subsection

4.1.3 is followed. Upon examining the terms in T^N) during step (1), several of the communication terms in equation 2.8 will grow without bounds as P Therefore, it is concluded that Gauss-Jordan elimination is not fbced-time scalable.

The CMP scalability of GJE is determined by following the procedure outlined in 33

Table 4.1: Summary of isoefficiency results. Communication Routing Scheme Matrix Gauss-Jordan Fast Fourier Topology Multiplication Elimination Transform Mesh store-and-forward O(P') 0(P"'2'') cut-through 0(P"') 0(p3y2) Hypercube store-and-forward 0(P^'^(log2P)^) OCPlog^P) cut-through ocp'^aogjP)^'^)

Table 4.2: Summary of fixed-time scalability results. Communication Routing Scheme Matrix Gauss-Jordan Fast Fourier Topology Multiplication Elimination Transform Mesh store-and-forward 0(P'®) not Not scalable due scalable to the sqrt(P) cut-through Not scalable due 0(F"^) not communication to the sqrt(P) scalable term communication Hypercube store-and-forward Not scalable due term OCl/GogjP)) not scalable to the logjP cut-through Od/Oog^P)) not communication scalable term

Table 4.3: Summary of CMP scalability results. Communication Routing Scheme Matrix Gauss-Jordan Fast Fourier Topology Multiplication Elimination Transform Mesh store-and-forward OCP"") OCP^^logjP) cut-through 0(P"^) 0(P) Hypercube store-and-forward OCP/logjP) 0(P) cut-through OCP/log^P) 34 subsection 4,1.4. The steps are:

Step 1) For GJE, N =f(P)- /p under memory-bounded scaling since N- elements are distributed over P processors. The speedup formula for GJE under memory-bounded scaling is PTi(f{P)) pp3/2 Speedup(P./f/';> - CoTmP))+PTconm(Pm) ~ p^f^+p{jp log^P+JP+P)

Step 2) The complexity of equation 4.9 shows the CMP-scalability function of GJE to be

0(JP).

4.2.3. FFT. Fast Fourier Transform

The scalability of FFT differs depending on the processor topology. The routing scheme does not effect the scalability. On the mesh topology, cut-through routing does not improve the scalability because of contention for the communication links. On the hypercube topology, cut-through and store-and-forward routing offer the same scalability since all communications are between directly connected processors. The scalability of FFT on a mesh topology is demonstrated. The scalability of the hypercube topology is summarized in

Table 4.1, Table 4.2, and Table 4.3.

The isoefficiency scalability of FFT is determined by following the procedure outlined in subsection 4.1.2, which is:

Step 1) The sequential execution time is O(TiiN)) = N logjiV. From equation 2.15, asymptotically important terms of Tf^ouiJP' Nljp . Therefore, the terms in

PTcomiJP' 0" a "^®sh topology are P\og^P+N/P . For the P \ogjP term, N log^N oc 35

Plogj/*, soN'x^ P. For the N/^fP term, N logj N «= N-fp , so N «= 2^.

Step 2) The 2"!^ term grows the fastest in terms of P, so it is selected.

Step 3) Therefore, the isoefficiency function of FFT on a mesh is Of /P 2^).

By following the procedure ouflined in subsection 4.1.3, the fixed-time scalability of

Fast Fourier Transform is determined as follows:

Step 1) Upon examining the terms in T^N), several of the communication terms in equation

2.15 with grow without bounds as so Fast Fourier Transform is not fixed-time scalable.

Finally, the CMP scalability of FFT is determined. By following the procedure

outlined in subsection 4.1.4., the steps in calculating the CMP scalability are:

Step 1) For FFT, N = f(P) = N under memory-bounded scaling since N elements are distributed over P processors. The speedup formula for FFT under memory-bounded scaling is PTi(f(P)) PPlog^P Speedap(.P.fiP))- CgTMF)>Hn-c^lPAP))' P\og^P+P(\og^F+JP)

Step 2) The complexity of equation 4,10 shows the CMP scalability function of FFT to be

OrV^logjP;.

4.3. Conflicting Predictions of Isoefficiency and CMP Scalability for GJE and FFT

The asymptotic isoefficiency and CMP-scalability results for GJE and FFT on a mesh

topology appear to contradict. As illustrated in Figures 4.1 and 4.2, the

memory-isoefficiency metric predicts that the GJE algorithm on a mesh will scale better than 36

Isoefficiency CMP GJE 0(P^) Oi/P) ^ Better

^ Better FFT 0(2^) OCV^logjP)

Figure 4.1. Comparison of memory-isoefficiency and CMP scalability for GJE and FFT on a mesh using store-and-forward routing.

Isoefficiency CMP GJE 0(p6/4) 0{JP) ^ Better

^ Better FFT 0(2'^) OiJpiog^P)

Figure 4.2. Comparison of memory-isoefficiency and CMP scalability for GJE and FFT on a mesh using cut-through routing.

the FFT algorithm, whereas the CMP metric predicts that the FFT algorithm will scale better than the GJE algorithm. Recall that a slower growing isoefficiency function indicates better scalability than a faster growing function, while the reverse is true for CMP scalability functions.

To explain this apparent contradiction, it must be remembered that the isoefficiency and CMP-scalability metrics have different scaling assumptions. In isoefficiency, the local 37 problem size per processor is allowed to increase so as to maintain a constant machine efficiency as the number of processors increase. However, in CMP scalability the local problem-size per processor is held constant and the machine efficiency is allowed to degrade as the number of processors increase. The complexity of the efficiency under CMP scaling is

CMP scalability function \ CMP efficiency = 01 I (4.11)

Table 4.4 contains the complexity of CMP efficiency for MM, GJE, and FFT on mesh and hypercube topologies. This shows that asymptotically the efficiency for the FFT will eventually be better than the GJE for the topologies and routing methods considered.

Another way of viewing the isoefficiency and CMP-efficiency scalability metrics is as different planar cross-sections of the EfficiencyCP, N) surface for a particular parallel algorithm-machine pair. For example, the Efficiency(P, N) surface for Gauss-Jordan

Elimination is shown in Figure 4.3. The isoefficiency scaling assumption maintains a constant efficiency, say efficiency = e, that describes a horizontal plane in the three-dimensional space.

As shown in Figure 4.4, the isoefficiency function for efficiency = e describes the intersection between the efficiency surface with the horizontal plane where efficiency = e. Actually, there

Table 4.4. Complexity of CMP efficiency. Communication Routing Scheme Matrix Gauss-Jordan Fast Fourier Topology Multiplication Elimination Transform 5

Mesh store-and-forward O 0(P-"^log,P) cut-through 0(pl/2) 0(1) Hypercube store-and-forward ©(l/lOgjP) 0(1) cut-through ©(l/log^P) 38

20000 15000 100005000

5000 10000 15000 20000

Figure 4.3. Gauss-Jordan Elimination efficiency surface on the MasPar MP-1. 39

Efficiency

Jsoefficiencv furiction

Efficiency = e Plane

CMP efficiency functilDn

P = N plane

Figure 4.4. The isoefficiency function and CMP-efficiency function as planar cross sections of the efficiency surface. 40 is a whole family of isoefficiency functions corresponding to intersection of the efficiency surface for the algorithm with horizontal planes for each efficiency value.

The CMP-scaling assumption fixes NIP = at some constant p. A specific constant P describes a plane perpendicular to the efficiency planes passing through the origin. The

CMP-efficiency function describes the intersection between tiie efficiency surface and the plane where NIP = p. Figure 4.4 shows the CMP-efficiency function for the intersection between the efficiency surface and the P = 1 plane where P = N. 4.4. Using the Complimentary Information Provided by the Isoefficiency and CMP-Scability Functions When two different algorithms being compared have conflicting scaling information from the isoefficiency and CMP-scalability functions, their relative change in performance under fixed-machine (fixed-processor) size scaling and under fixed-problem size scaling can be inferred. In fixed-machine scaling, the number of processors is fixed and the problem size is increased, while fixed-problem size scaling fixes the problem size and increases the number of processors. Theorem 4.1 describes how the efficiency of the two algorithms will change under fixed-problem size scaling, and Theorem 4.2 describes how the efficiency of the two algorithms wiU change under fixed-machine size scaling.

Theorem 4.1. Let E(A, P, N) be the efficiency when algorithm A is executed on P processors for a problem size ofN. Let p, and p^ be the number of processors that lead to the same ^ciency e^ on a problem size o//i, for algorithms A, and A2 respectively, i.e.,

E(Ai, p,, n^) = E(Ay P2, /I,J = ey Let the isoefficiency functions Iso^ (P) and Iso^ (P) for algorithms A ^ and A2 be such that 0 (Iso^(P))>Q (Iso^iP)). Let the CMP-scalability 41 functions CMP^ (P) and CMP^ (P)for algorithms and Aj be such that 0 fCMP, (P)) >

0 (CMP2 (P)). Then, an increase in the number of processors for both algorithms to Py where p^ > p^ and p^ > p,, wMle keeping the problem size fixed at /i,, will cause a greater drop in efficiency for algorithm than for A^. That is, for p^ > p^ and > p^, E(A^, Pj, «J

> EIA^, PY RT,;.

Proof: Consider the fixed-efficiency plane such that efficiency = 63, where e-^ is a constant. The Iso^ (P) and ISO2 (P) functions describe how the efficiency surfaces of algorithni A, and algorithm A^ are cut by this plane. Give that 0 (IsOi (P))>Q {Iso^ (P)), this impliesp^ < p^. Let ttj be the point (e^, p,, n^) and otj be the point (e^, p^, fij). Figure

4.5 shows this situation.

Now, consider the CMP cross-section plane PP = N, where P = nj p^. The CMP^

(P) and CMP^ (P) functions describe how the efficiency surfaces of algorithm A, and algorithm Aj are cut by this plane. Let e, and e^ be the values for which = E(Ai. pj, nj and

6^ = E(A2,py n^). Given that 0 (CMP^(P))>Q fCM/'jCPj), this implies that e, >62- Let

P, be the point (e,, P3,«,) and pj be the point (Cj»P3» Figure 4.6 illustrates this situation.

To complete the proof, consider a third cross-sectional plane such that N is always fixed at n,. Figure 4.7 shows the relationship between all three cross-sectional planes. The N

= n, plane will cut the efficiency surfaces of algorithm A, and algorithm A^. While the exact curves for the intersection between the N = «, plane and the efficiency surfaces for algorithm 42

N

N = P

n 0

p 1

Figure 4.5. The Iso^(P) and Iso^iP) curves for a fixed efficiency = e^.

A, and algorithm might not be known, the relative positioning of points a,, a^, P,, and Pj on each curve is known. Previously, we have shown that < Pj and > p^. This implies that Pi

^3 because efficiency decreases if you solve the same size problem on a larger number of processor using the same algorithm. Previously, we have shown that e, > e^. Therefore, <

< Cy From Pi < P2 < P3 and Cj < < ^3 we conclude that

|£i-£3| I g2-g3 I \P1-P3\ ^ \P2-P3\'

Figure 4.8 shows the relative positioning of these points in the N = «, plane. Thus, an increase in the number of processors for both algorithms to p,, while keeping the problem size 43

Efficiency

e

CMPJP) e

Figure 4.6. The CM?, (P) and CMP^iP) curves in the ? = N plane.

fixed at /ip will cause a greater drop in efficiency for algorithm A2 than for A,. This completes the proof of theorem 4.1.

Theorem 4.2 is the fixed-machine size scaling analog of Theorem 4.1. It applies to the situation when two different algorithms being compared have conflicting scaling information from the isoefficiency and CMP scalability functions, and you are interested in how the two algorithms will compare under fixed-machine size scaling.

Theorem 4.2 is as follows;

Theorem 4.2. Let E(A, P, N) be the efficiency when algorithm A is executed on P processors for a problem size ofN. Let e, and e^ be the efficiencies that result from 44

Efficiency

Isoefficiency functions

Efficiency = e Plane

CMt' efficiency functijons

N = n P = N Plane Plane

Figure 4,7. The relationship between the constant efficiency = e-^ plane, Ha&P = Nplane. and the fixed-problem size = «, plane. 45

Efficiency

Figure 4.8. a, and P, are known points of algorithm >4, while Oj and P2 are known points of algorithm A^.

executing a problem size ofn^ on processors for algorithms A, and respectively, i.e.,

E(A^, Pi, n^) = e, and EiA^, Pi,n^) = gj. Let the isoefficiency functions Isoy (P) and Iso^ (P) for algorithms Aj and A^ be such that 0 (Iso^ (P)) > 0 (Iso^ (P)). Let the CMP-scalability functions CMP^iP) and CMP2 (P)for algorithms A, and Aj be such that 0 (CMP^ (P)) >

0 (CMP2 (P)). Then, an increase in the problem size for algorithms A, to n, and algorithm

Aj to «2 such that E(Ai, p^, nj = EiA^, p^, n^ = e^,while keeping the number of processors fixed, will cause a greater increase in efficiency for algorithm A^ will be greater than for algorithm A,.

Proof: The proof for Theorem 4.2 is similar to the proof for Theorem 4.1, since it 46 involves three cross-sectional planes of the efficiency surfaces for the algorithms. Here, the cross-sectional planes are the fixed-efficiency plane where efficiency = , the -N plane where p = /ij /Pj, and the constant-processor plane where P =Pi. These three planes are shown in Figure 4.9.

First, consider the fixed-efficiency plane where efficiency = . The /so, (P) and Iso^

(P) functions describe how the efficiency surfaces of algorithm /4, and algorithm are cut by this plane. Give that 0 (IsOi (P)) > 0 (ISO2 (P)), this implies /i, > n^. Let ttj be the point {e^

, p,, rti) and Oj be the point (e,, p,, Mj) as illustrated in Figure 4.9.

Now, consider the CMP cross-section plane, where PP = N, where P = «3 /p,. The

CMP^ (P) and CMfj (P) functions describe how the efficiency surfaces of algorithm A, and algorithm are cut by this plane. Let e, and be the values for which e, = E(Ai, p^, th) and

= E(A^, Pj, tij). Given that 0 (CMPi (P)) > 0 (CMP^ (P)), this implies that e, > ^2- Let p, be the point (e, ,py,n^ and P2 be the point {e^, Pp /I3), as shown in Figure 4.9.

To complete the proof, consider a third cross-sectional plane such that P is always fixed atp,. Previously, we have shown thatand > /I3. This implies that n^>n^> n^.

Since e, = £CA,, p,, n^) and = fifAj, Pj, n,j, and n, > Mj, this implies that e, < because efficiency increases if you solve a larger size problem on the same number of processor using the same algorithm. Previously, we have shown that e, > 62- Therefore, ej < e, < Cy From

> /I2 > «3 and ^2 < ^1 < ^3 we conclude that 47

r Efficiency

Efficiency = e Plane

P = p, Plane

P = N Plane

Figure 4.9. The relationship between the constant efficiency = plane, the P = N plane, and the fixed-problem size = p^ plane. 48

e^-ei I I e^-e2 tij-ni I I «3-rt2 • (4-13)

Figure 4,9 shows the relative positioning of points p,, and P2 on each curves. Thus, an increase in the problem size for both algorithms i4, and Aj to achieve the same increased efficiency e^, while keeping the number of processors fixed, will cause a greater increase in efficiency for algorithm Aj will be greater than for algorithm Aj. This completes the proof of theorem 4,2.

Theorems 4,1 and 4.2 are valid only in the asymptotic range, i.e., the problem sizes and number of processors is large enough so that /so, (P) > Iso^ (P) and CMP, (P) >

CMP^(P) for all the Ps.

4.5. Classification of Non-Fixed-Time Scalable Algorithms

In section 4,2 it was demonstrated that the matrix multiplication, Gauss-Jordan elimination, and Fast Fourier Transform algorithms on mesh and hypercube topologies are not fixed-time scalable. By examining why these algorithms are not fixed-time scalable, general principles can be extracted to determine when an algorithm is not fixed-time scalable.

To be fair, it should be pointed out that all algorithms are not scalable under fbced-problem size scaling. Additionally, there are practical limitations to constant-efficiency scaling since the amount of memory contained per processor cannot grow infinitely.

For matrix multiplication, nearest-neighbor communication is performed for each of the Jp steps of the algorithm. Since the number of steps grows as P increases and each step represents a submatrix multiplication, the overall parallel execution time will increase unless the local problem size is decreased. However, there is a limit to the amount that the local 49 problem size can shrink, so matrix multiplication is asymptotically not fixed-time scalable.

Similarly, the FFT algorithm on the hypercube topology is not asymptotically fixed-time scalable since the number of communication stages is \0g2JP .

Extracting the commonality from both of these cases the generalized Theorem 4.3 can be stated as:

Theorem 4.3: Any parallel algorithm containing a section of code where the number of times it execute is a monotonically increasing function ofP is not fixed-time scalable.

The proof of Theorem 4.3 is as follows. Let A be an algorithm containing a section of code s where the number of times it executes is a monotonically increasing function/i'P^ as P increases; let t^ be the time to execute s once; and let t be the fixed-time scaling constraint

Since the total time to execute this section of code is t*f(P) Bxidf(P) is a monotonically increasing function, t*f(P) will exceed t for large enough values of P. Therefore, algorithm

A is not fixed-time scalable.

GJE and FFT are not fixed-timed time scalable for a different reason. Both algorithms communicate a message between processors whose distance apart is a function of the number of processors. For example, GJE communicates the pivot row and row multipliers along a row or column of the mesh (or its embedding in a hypercube). As the number of processors grows, the time to perform such a communication increases.

Therefore, for a large enough number of processors, the time to perform such a communication will exceed any fixed time constraint

Extracting the commonality from both of these cases the generalized Theorem 4.4 can 50 be stated as:

Theorem 4.4: Any parallel algorithm where information flows between two processors whose distance is a monotonically increasing function ofP is not fixed-time scalable. The flow of information can occur in one step or in multiple steps of the algorithm.

The proof of Theorem 4.4 is trival. Let A be an algorithm where information flows between two processors p^ and p^ whose distance is a monotonically increasing function f(P) as P is increased, and let t be the fixed-time scaling constraint. Regardless of the routing method or the number of communication operations required for the information to flow between p^ and Pj, the total per-hop time required is f(P)\, where t^ is an individual hop time of a communication. Since t and t^ are constants and f(P) is a monotonically increasing function, f(P)*th will exceed t for large enough values of P. Therefore, algorithm A is not fixed-time scalable.

Unfortunately, Theorem 4.3 and Theorem 4.3 apply to most parallel algorithms on conventional topologies, so asymptotic fixed-time scalability is not generally achievable. It is interesting to consider the types of algorithms and topologies that are fixed-time scalable. A completely interconnected topology is guaranteed to be fixed-time scalable as far as Theorem

4.4 is concerned for any algorithm. Clearly such an interconnection topology is not scalable

with todays technology. 51

CHAPTER 5. MASPAR IMPLEMENTATIONS

In this chapter the implementation details of the parallel algorithms on the MasPas architecture are examined. The discussion starts with an overview of the MasPar architecture, then proceeds to the implementation details including execution-time formulas for the MM, GJE, and FFT algorithms on the MasPar architecture. Experimental verification of the execution time formulas is provided.

5.1. MasPar Architecture

The MasPar architecture is a SIMD architecture consisting of the Array Control Unit

(ACU) and a two-dimensional array of processing elements (PEs). Each PE has a load-store architecture with forty 32-bit registers and a local memory.

Two communication subsystems, the XNET and the router, allow information to flow within the PE array. The XNET subsystem connects each PE to all eight of its neighbors (N,

S, E, W, NE, NW, SE, and SW). The XNET connections toroidally-wrap at the edges of the

PE array. XNET communication allows all of the active PEs to simultaneously communicate with PEs that are a fixed distance away in one of the eight directions. The router subsystem allows each PE to communicate in a nonuniform pattern with any other PE, but the router can be significantly slower than the XNET depending on the exact communication pattern.

Both of these forms of communication are "blocking" in nature, i.e., other useful computation cannot be performed while communication is taking place.

One limitation of the MasPar architecture is that the size of the data block that can be conununicated is fine grained. The maximum size of a data block is 64-bits. Additionally, 52 the data to be communicated must be loaded into a register before it can be communicated.

Thus, for a block of data to be transferred from the local memory of one PE to the local memory of another PE, it must be loaded an element at a time into the sender's PE-register, transmitted an element at a time to the receiver's PE-register, and stored into the receiver's memory from its register.

Both pipelined and nonpipelined XNET communication instructions are available.

The pipelined (xnetp[d] and xnetc[d]) instructions are used to conununicate between two processors if the intermediate processors are not communicating. At each cycle, one bit of information is pipelined through the intermediate processors, so the number of cycles is proportional to (size of the communication)+(distance conomunicated). The xnetc[d] instruction copies the communicated value to all intermediate processors, while the faster xnetp[d] instruction does not. If the intermediate processors are also communicating, the nonpipelined xnet[d] instruction must be used. This involves a store-and-forwarding of the communicated message between adjacent processors so the number of cycles is proportional to (size of the communication)*(distance communicated). Table 5.1 summarizes the MasPar

Table 5.1. XNET communication costs of 32-bit messages on the MasPar MP-1 and MP-2. Routine Scheme XNET instruction MP-1 Cycles MP-2 Cycles (distance of d hops) Pipelined xnetc[d] 84 + d 48+ d xnetp[d] 58+d 48+ d Non-pipelined xnet[d], d = 1 43 40 xnet[d], d > 1 19 + 35d 13 + 33d 53 communication insdnctions for 32-bit messages traveling a distance of d hops.

The MasPar PE architecture does not include cache memory but it allows overlapping of the memory accesses with computation and communication. Four load and/or store operations may be queued while other processing is being performed. This feature of the

MasPar architecture can be used to substantially reduce the memory access penalty.

Unfortunately, the MasPar MPL (extended C) compiler does not automatically take the best advantage of the memory overlap so the programmer must arrange the load/store instructions in their code to further optimize the memory overlap. This technique for improving the memory overlap in called software pipelining [21].

Figure 5.1 demonstrates the basic idea of software pipelining for die matrix multiplication C = A x B, where A, B, and C are N x N matrices. In the unoptimized code

(Figure 5.1a) the updating of the c register is stalled until the previous load operations can be performed. The software pipelined code (Figure 5.1b) starts prefetching the operands needed for the updating of the c register during tiie (k+l)"* iteration of the inner loop during the k"" loop iteration. Thus, the loading of the A and B elements needed for (k+l)"* iteration are overlapped with the addition and multiplication operations needed to update the c register during the k* iteration.

Two MasPar machines, a 16K processor MP-1 and a 4K processor MP-2, were used to run the implemented algorithms. Both MP-1 and MP-2 run at a clock speed of 12.5 MHz.

The instruction set is the same for both machines, but the MP-2 processors are faster. Table

5.2 shows the number of clock cycles for critical instructions on the PEs of both machines. 54

// Before Software pipelining // After Software pipelining register a, b, c; register aO, al, bO, bl, c; fori = OtoN-l do for i = 0 to N-l do for j =0 to N-1 do for j = 0 to N-l do c = 0; c =0; for k = 0 to N-l do aO = A[i, 0]; a = A[i, k]; b0 = B[0,j]; b = B[k.j]; for k = 0 to N-l do c += a * b; al = A[i, k+1]; end for bl=B[k+l,j]; end for c += aO * bO; end for aO = al; bO = bl; end for c += aO * bO; C[i,j] =c; end for end for

(a) (b)

Figure 5.1. Example of software pipelining: (a) Non-software pipelined matrix multiplication, and (b) Software pipelined matrix multiplication.

These are actually measured cycle times using the data parallel unit (DPU) timer. The PEs of the MP-2 are roughly four to five times faster at performing arithmetic operations. The local memory is also faster on MP-2 processors. The communication hardware is identical on the

MP-1 and MP-2.

The actual MP-1 machine used had 16K processors arranged in a 128x128 array, while the MP-2 machine used a 64x64 PE array for a total of 4K processors. Due to the larger memory size per PE on the MasPar 2, both machines have the same total amount of PE memory (256 M). Software options enable only a portion of the processors to be used when 55

Table 5.2 Cycles per operation on the MasPar MP-1. Operation MP-1 Cycles MP-2 Cycles Tu> Load (not overlapped) 85 40

^ST Store (not overlapped) 74 35 Roating Point Multiplication 225 41 T. Floating Point Addition 127 26

T^ D Floating Point Division 325 75 T^ NEC Roating Point Negation 36 10 TCOMPARE Roating Point Comparison 84 33 Tw Initial Twiddle Factor Calculation 9540 2845

executing a program. Specifically, 32 x 32 and 64 x 64 processors arrays were useful in studying the CMP scaling of the algorithms.

5.2. Algorithm Implementations on the MasPar

5.2.1. Matrix Multiplication

The matrix multiplication implementation on the MasPar did not vary from the algorithm outlined in Figure 2.1. Optimizations performed on the MasPar implementation were software pipelining and loop unrolling. Software pipelining was performed when communicating the submatrices and performing the local submatrice multiplications. Loop unrolling to a depth of 4 was performed to reduce the loop-associated overhead. Further loop unrolling was not possible due to the limited supply of registers. g,2,2, Oawsg-Jgrdap IgMminatiQP

The basic implementation of GJE on the MasPar did not change from the algorithm outlined in Figure 2.2. However, to understand the resulting execution time formulas 56

(presented later in this chapter) it is necessary to describe the MasPar specific details of the

implementation.

In order to find the best pivot element, the PEs storing the pivot column determined a

local maximum for their portion of the pivot column. After this, these PEs performed a

parallel-prefix "max" computation to determine the globally maximal pivot element In

addition to communicating the pivot row values, the row number corresponding to the pivot

element was communicated. When the global maximal pivot value is determined, it is written

to a DPU register for convenient access of all PEs.

In broadcasting the new pivot row only the elements to the right of the pivot element are communicated. To perform this, each PE had a temporary row of storage to receive its part of the broadcast pivot row. The xnetc (XNET copy) instruction was used to broadcast elements of the pivot row along the columns of processors and deposit a copy in each PE's register. The new pivot row element was then transfered to the temporary row used to store the pivot row. After the new pivot row is stored in the temporary row, the row of PEs containing the old pivot row broadcast the old pivot row using xnetp to the row of processors that originally contained the new pivot row. This was done to complete the exchange of old and new pivot rows.

The row multipliers were determined by the column of PEs containing the pivot element. Once this is complete, a whole column of row multipliers was broadcast to the other PEs using the xnetc instruction.

Software pipelining was used whenever possible throughout the implementation. 57

Additionally, all of the above steps as well as updating the matrix where the loop unrolled to a depth of 2. g,2,3, Fast Fpyrigr Transfprm

To achieve high performance of FFT on the MasPar machines several options were considered for 1) the layout of butterfly operations on the PE array, (2) the scheme for supplying necessary twiddle factors, (3) the order of evaluation for butterfly operations, (4) the communication scheme, and (5) the strategy to reduce the memory access penalty. These issues are not orthogonal and involve various tradeoffs. In this section, the important performance optimization techniques will be described that lead to an efficient implementation of FFT on the MasPar architecture.

As discussed before, memory accesses are needed if the data to be communicated are not in registers. The first optimization minimizes such memory accesses. This optimization technique comes into the picture when the length of the input sequence is at least four times the number of processors. In a straight stage-at-a-time evaluation each PE has multiple butterfly operations to be done for each stage. As the problem size grows, it is not possible to keep the operands for these multiple butterfly operations in registers. The stage-at-a-time evaluation requires accessing an operand of a butterfly operation twice, once for computation and the second time for communication.

These memory accesses are eliminated (except for the initial load at the starting stage and the store at the final stage for each data item) by an optimization technique which uses a different order of evaluation than the stage-at-a-time evaluation. This optimization exploits 58 the divide-and-conquer nature of FFT. Beyond the in-memory stages, multiple butterfly operations at a processor in fact belong to separate sub-FFTs. The optimized order of evaluation carries each sub-FFT to completion instead of the stage-at-a-time evaluation.

Data elements and twiddle factors for a sub-FFT are loaded into registers in the first communication-stage of the FFT. Butterfly operations for subsequent stages use registers for intermediate results with the final stage of the FFT storing the results to memory. For the

32-point FFT example with a 2x2 PE array (Figure 3.5a), data elements and twiddle factors for butterflies Bq^ to B33 (each on a different PE) are loaded into registers at stage 3 (the first stage after the in-memory stages). Butterflies B03 to B33 are computed at four different processors for stage 3 saving the results in registers for XNET communication of data elements to butterflies B^,^ to B34 of the next stage. Butterflies Bq^ to B34 are again computed simultaneously for this last stage at four different processors and the results are saved in the memory.

Software pipelining is utilized to reduce the memory-access penalty of the load and store operations which are not elinninated by the above optimization. Here software pipelining is also used when communicating multiple data items. The access of the next data element to be communicated is overlapped with the communication of the current data element

The final optimization involves the options for supplying the twiddle factors for the

butterfly operations. Before stage 0 of the computation, the initial twiddle factors must be calculated using the sine and cosine functions via the equation 3.12. Recalculation of twiddle 59 factors utilizing equation 3.12 before each stage would be time consuming. One alternative, called "square-of-twiddles", makes use of the modular nature of the twiddle factors to generate a stage's (except the initial stage) twiddle factors from the previous stage's twiddle factors [28]. The square-of-twiddles method squares a butterfly's twiddle factor and selectively negates it if the resulting power is greater than N. This method is substantially faster than calculating the twiddle factors via equation 3.12 since multiplication and negation are much faster than calling the sine and cosine functions. One drawback of the square-of-twiddles method is the accumulation of round-off errors incurred by the repeated squaring of the twiddle factors.

A third alternative for supplying the twiddle factors, which was found to be the best solution on the MasPar architecture, is to use the initial twiddle factors calculated by equation

3.12 and redistribute them for the subsequent stages. This avoids the accuracy problems of the square-of-twiddles method, and it is actually faster. Here the tradeoff is between the time for communicating a twiddle factor versus the time for squaring (and possibly negating) a twiddle factor.

5.3. MasPar Execution-time Formulas for MM, GJE, and FFT

In this section the execution-time formulas for the MM, GJE, and FFT on the MasPar architecture are presented. The execution-time formulas for each algorithm are split into computation time, memory-access time, and communication time. In this chapter the memory-access time formulas are overestimates since they do not account for the overlapping of memory instructions with other computation or communication. The next 60 chapter corrects this simplification as well as accounts for all other miscellaneous overhead instructions. Additionally, the next chapter experimentally verifies the computation and communication formulas.

5.3.1. Matrix Multiplication

The execution-time formulas for performing an N x N matrix multiplication on a

•Jp X •Jp MasPar architecture are

rntiM •' COUP - (5.1)

(5.2)

7-MM ^ com • where is the time to perform an addition, is the time to perform a multiplication, is the time to perform a store-memory access, is the time to perform a load-memory access, and is the time to perform a nearest-neighbor XNET communication.

5.3.2. Gauss-Jordan Elimination

The MasPar execution-time formulas for solving a linear system of equations Ax = b, where A is an N x N coefficient matrix are

-M'^(27'COMPARE+TNEG+2TM+TA) + jrCOMPARE

•*^(Tm + TA) + TD] + -^TM (5.4)

Tmem =N[^i8Tu, +lTsT)+TLD+TsT+§(2TLD+TsD] + -^i2Tw+TsT) (5.5)

^cowM = l^2Txpstartiog2JP +2TxpJP +-^pTxpsutrt + ^Txp+-^Txcsiart + fj (5.6) 61

where is the time to perform a floating-point comparison, is the time to

perform a negation, is the time to perform a division, is the startup time for an

xnetp instruction, is the per-hop transmission time for an xnetp instruction, T^csun is the

startup time for an xnetc instruction, and is the per-hop time for an xnetc instruction.

5.3.3. Fast Fourier Transform. FFT

For an N-point FFT on a -JPx Jp MasPar architecture, the execution-time formulas are Temp = +8(Tm -H T4)log2(A0] (5.7)

= ^[6rzj> + 6rsr+6(r£i, +rsr)log2(^)] (5.8)

TcmM = f'^xsu,rt\ogT^P+^TxIPN (5.9) where is the time to calculate an initial twiddle factor by (2.12), is the time to start a nonpipelined xnet instruction, and is the per-hop time to for the xnet instruction. 62

CHAPTER 6. SCALABILITY EXPERIMENTS

A 16K-processor MasPar MP-1 machine and a 4K-processor MasPar MP-2 machine were utilized to perform scalability experiments using the matrix multiplication, Gauss-Jordan elimination, and Fast Fourier Transform programs. To study the effects of scaling the number of processors and size of the problems were varied. The problem sizes used ranged from several elements per processor to the largest problems sizes solvable on the machines.

Section 6.1 reports the execution-time measurements performed for each algorithm. From these timings, the effect of fixed-problem size scaling, and memory-bounded scaling could be studied directiy, but are limited by the number of processors available. To study scalability for a larger number of processors and to study fixed-time scaling, comprehensive execution-time models for the algorithms were developed. Section 6.2 describes how the execution-time models of the three algorithms were developed and verified.

6.1. Execution-Time Measurements

A 16K-processor MasPar MP-1 machine and a 4K-processor MasPar MP-2 machine were utilized to perform scalability experiments using the matrix multiplication, Gauss-Jordan elimination, and Fast Fourier Transform programs shown in Appendix B. Each of the programs were compiled using full compiler optimization (-Omax). Unfortunately, the mpl

(MasPar's extended C dialect) compiler did not make the best use of PE registers, so the resulting assembly language code was hand optimized. To study the effects of scaling, the number of processors was varied by using a MasPar compiler option to run these programs on processor arrays of 32 x 32,64 x 64, and 128 x 128 on the MP-1; and processor arrays of 63

32 X 32 and 64 x 64 on the MP-2.

Tables A,1 - A. 15 in Appendix A show the execution-time measurements, speedup, and overall machine efficiency of matrix multiplication, Gauss-Jordan Elimination, and Fast

Fourier Transform, respectively on the MasPar MP-1 and MP-2. The speedup and efficiency were calculated using equations 6.1,6.2, and 6.3 to estimate the execution-time of the parallel algorithm on a single processor.

TF^IN) =N\TM+TA +2TLD) +NHTST) 6.1

(N) = I2[TM +TA + 2TID +TST] +N^[TCMP+JTLD + JTST+TD'\ 6.2

rf"" = 4(Tm+TA)Nlog2N+NTJ2 6.3 where Tu> and are adjusted to account for the memory overlap achieved by the corresponding parallel program as discribed in section 6.2.

The observed CMP-scaling for MM, GJE, and PPT for a constant-memory-per- processor, P, equal to 1024 elements is shown in Figure 6.1. The expected growth of the

CMP speedup from the asymptotic CMP-scalability analyses are 0(P) for MM, 0(JP ) for

GJE, and 0(/P logj/*) for the FFT. The near linear CMP speedup observed does not match the observed behavior. The next chapter explains inaccuracy of the CMP-scalability metric for the MasPar MP-1. 64

i 10

0 5 10 IS 20 Thousands Nuriber of Processors, P MatTKMultplicatkm Gauss-JordanEliniDation Fast Fourier Transfonn

Figure 6.1, CMP speedup on the MasPar MP-1 with P = 1024 for all algorithms.

6.2. Development of Execution-Time Models

The execution-time models for each program were split into four parts: computation time, communication tune, memory-access time, and miscellaneous-overhead time. The computation and communication time formulas developed in chapter 3 were verified, as described below, to be accurate. Refinements to the memory-accesses formulas developed in chapter 3 are provided, since the MasPar processors allow overlapping of memory-accesses with computation or communication. Additionally, miscellaneous overheads including loop-control overhead, register-to-register moves, and array index-pointer manipulations are incorporated into the model.

First, consider the refinement to account for the nniscellaneous overheads. The 65

Table 6.1. Miscellaneous-overhead constants for the MasPar MP-1. Program ttj a. a, "5 Matrix 1.21e-4 9.02e-8 3.18e-6 7.18e-6 enit-i Multiplication Gauss-Jordan 3.39e-3 1.75e-5 4.35e-5 3.43e-5 3.19e-7 Elimination l.lOe-4 2.51e-5 1.57e-5 3.07e-5 Fast Fourier — Transform

Table 6.2. Miscellaneous-overhead constants for the MasPar MP-2. Program a, "2 a. a, "5 Matrix 1.03e-4 2.97e-8 4.40e-6 4.50e-6 6.75e-7 Multiplication Gauss-Jordan 1.50e-3 3.87e-5 2.12e-5 1.88e-5 3.25e-7 Elimination Fast Fourier 6.46e-5 9.97e-6 5.51e-6 8.63e-6 — Transform

miscellaneous overheads included loop-control overhead, register-to-register moves, and array index-pointer manipulations. The miscellaneous overhead was experimentally measured by timing the programs after deleting instructions for computation, communication, and memory access from the compiler-generated assembly language code. For a small local problem size per processor, miscellaneous overheads of 8% for MM, 43% for GJE were observed. However, as the local problem size was increased the miscellaneous-overhead

times decreased and stabilized to 3 % and 5 % of the total execution time for MM and GJE respectively. These percentages do not vary noticeably with the number of processors. The reason for this is that the majority of execution time occurs within loops that are independent 66 of the number of processors. The miscellaneous overheads for FFT ranged from 16% to

13% as the processor-array size changed from 32 x 32 to 128 xl28. The miscellaneous overheads for FFT are split between loops performing the in-memory stages and communication stages. Since the number of stages of each type varies with the number of processors, the percentage of miscellaneous overheads depends on the processor-aitay size.

In modeling the miscellaneous-overhead times, formulas were developed for each algorithm based on their looping structures. Temples for the formulas were chosen to be similar to the computation-time analytical formulas of the Chapter 5. The temples for MM,

GJE, and FFT are

T'misc = ai+a2N+a3VP+a4-^+as^ (6.4)

Tmc = ^\ +a2N+a3Nlog2VP+a4-^+a5y (6.5)

Tmc = o.\ +a2f+a37log2(^f j + a4f logj/* (6.6)

where the a; 's are constants that are experimentally determined using the measured

miscellaneous-overhead times. For the MasPar MP-1 and MP-2, these constants are given in

Table 6.1 and Table 6.2.

Secondly, a refinement was done to account for the overlapping of memory-access

times with other computation. After accounting for the computation, communication, and

miscellaneous-overhead portions of the execution time, the remaining time was attributed to

memory-accesses. The amount of reduction in the memory-access times depends on the

processing being done at individual processors and it stabilizes as the local problem size 67

Execution Time is the sum of the computation, communication, miscellaneous, and memory times as determined by:

Computation Time: Use the formulas 5.1,5.4, or 5.7 from chapter 5.

Communication Time: Use the formulas 5.3,5.6, or 5.9 from chapter 5.

Miscellaneous Time: Use the formulas 6.1,6.2 or 6.3 and the machine constants from Tables 6.1, or 6.2.

Memory Time: If the size is less than the number of processor available, extrapolate between measured values to determine the overlap; otherwise use the stablized values for the memory overlap with the formulas 5.2,5.5, or 5.8.

Figure 6.2. Procedure for predicting the execution time.

became sufficiendy large. It was observed that at small local problem sizes per processor little overlap occurred. However, as the local problem size is increased, the memory-access times are reduced by factors of 73%, 79%, and 79% for MM, GJE, and FFT, respectively on the MasPar MP-1.

The actual memory-access times were modeled by using the analytical formulas of chapter 5 and by allowing for the reduction in the memory-access time due to operand prefetching. For small problem sizes per processor, the memory-overlap was interpolated between experimentally measured values to determine the reduction in the memory-access time. For local problem sizes larger than were able to be measured, the stabilized memory-access reductions were utilized.

Figure 6.2 summarized the procedure for predicting the execution time of the 68

Table 6.3. Model results vs. experimental results. Rank of Matrix N=128 N=256 N=512 N=1024 N=2048 N=3072 N=4096 Algorithm VP Percentage Differences 32 3.5% 3.5% 2.1 % 2.2% * * * MM 64 3.5% 3.5% 3.5% 2.2% 2.2% * * 128 * 0.6% 3.5% 3.7% 2.2% 2.3% 2.2%

Rank of Matrix N=256 1 N=512 N=1024 N=2048 N=3072 N=4096 1 N=5120 Algor­ VP Percentage Differences ithm 32 0.9% 0.7% 0.3% * * * * GJE 64 0.6% 0.5% 0.6% 0.3% 0.1 % * Xc 128 3.5% 1.4% 0.0% 0.4% 0.3% 0.2% 0.1 %

Number of 216 218 220 2^' 2^ 223 2^ Elements Algor­ TP Percentage Differences ithm 32 0.4% 0.4% 0.5% * • * * FFT 64 0.1 % 0.0% 0.0% 0.1 % 0.1 % * * 128 0.5% 0.5% 0.5% 0.5% 0.4% 0.4% 0.3% * Indicates insufficient memory

algorithms. Tables 6.3. summarizes the maximum precentage errors for the combined execution-time formulas versus timed code. The results are very accurate with a maximum error of 3.5% for the matrix multiplication algorithm. 69

CHAPTER 7. ACCURACY OF ASYMPTOTIC SCALABILITY METRICS IN PRACTICE Isoefficiency and CMP-scalability are asymptotic scalabDity metrics that result in a

fiinction in terms of P. The isoefficiency function describes how the problem size should

grow as P is increased so that a constant efficiency can be maintained, and the CMP-

scalability function describes how the speedup should increase as P is increased if the memory

usage per processor is kept fixed. Both of these scalability metrics focus on the

asymptotically important terms and ignore the remaining terms and constants.

The goals of this chapter are (1) to examine the inaccuracies introduced by these

simplifications on "non-asymptotic" problem sizes and machine sizes, and (2) to study the

effects that varying the machine specific parameters have on these inaccuracies. To study

these inaccuracies, the execution-time formulas developed in the previous chapter for Matrix

Multiplication, Gauss-Jordan Elimination, and Fast Fourier Transform on the MasPar MP-1

are used to predict the asymptotic behavior for the CMP and isoefficiency-scalability metrics.

7.1. Measuring Asymptotic Inaccuracies

For the CMP scalability metric, the speedup predicted by the asymptotically significant terms along with their corresponding constants are compared with the speedup

predicted by the whole execution-time formulas. Specifically, the percentage error is calculated by the formula

(CMP Predicted Speedup(P, P) - Actual Speedup(P. AQ) Actual Speedup(P, AO

where the "CMP Predicted SpeedupC/*, N)" is calculated by using the asymptotically 70 significant terms along with their corresponding constants, and the "Actual Speedup(P, N)" is calculated using the execution-time formulas developed in the previous chapter. Since P and

N are not independent variables in CMP scalability, i.e., they are related by the formula (3 =

NIP where P is a constant, the percentage error in the efficiency is a function of |3 and P.

One way of viewing the isoefficiency metric is that it predicts the problem size necessary on a machine with P processors inorder to achieve some constant efficiency. So, to measure the inaccuracy of the isoefficiency function, the following formula is used

llsoefiBciency Predicted Problem Size to Achieve Efficiency e - Actual Problem Sizel *inn Actual Problem Size (7.2) where the "Isoefficiency Predicted Problem Size to Achieve Efficiency e" uses the asymptotically important terms and their conesponding constants. This percentage error for the isoefficiency function has two independent variables: the number of processors {P) (or the problem size (AO), and the efficiency constant chosen, e.

7.2. Accuracy of the CMP Scalibility Function

The asymptotic CMP-scalability function proved to be a poor predictor of speedup on the MasPar MP-1 for Gauss-Jordan Elimination and Fast Fourier Transform, Tables 7.1 and

7.2 show the percentage error as defined by equation 7.1 for the Gauss-Jordan Elimination and Fast Fourier Transform algorithms. For these ranges of processors and problem sizes, the CMP-scalability function was not very accurate (< 10 % error) for the FFT until the processor array of 2048 x 2048 was used, and it was never that accurate for the GJE. 71

Table 7.1. Percentage error of CMP scalability function for Gauss-Jordan Elimination on a MasPar MP-1 computer. Memory Used Per Processor, P IK 2K 4K 8K 16 K 32 K 32 14494% 19608% 26856% 37118% 51640% 72183% 64 7264% 9818% 13440% 18570% 25830% 36100% 128 3639% 4914% 6724% 9289% 12918% 18053% 256 1823% 2460% 3364% 4646% 6460% 9028% 512 915% 1232% 1684% 2324% 3231% 4514% 1,024 460% 618% 843% 1163% 1616% 2258% 2,048 232% 310% 423% 582% 809% 1129% 4,096 118% 157% 212% 292% 405% 565% 8,192 61% 80% 107% 147% 203% 283% 16,384 32% 41% 55% 74% 102% 142%

Table 7.2. Percentage error of CMP scalability function for Fast Fourier Transform on a MasPar MP-1 computer. JP Memory Used Per Processor, P 1 K 2K 4K 8K 16 K 32 K 32 613% 648% 682% 716% 751% 785% 64 332% 350% 367% 385% 402% 419% 128 176% 185% 194% 203% 212% 221% 256 91% 96% 100% 105% 109% 114% 512 45% 47% 50% 52% 54% 57% 1,024 20% 21% 22% 24% 25% 26% 2,048 7% 7% 8% 9% 10% 10% 4,096 -0% 0% 1% 1% 2% 2% 8,192 -4% -3% -3% -3% -2% -2% 16,384 -5% -5% -5% -5% -4% -4% 72

Observe the general trends in these tables. First, the CMP-scalability predictions are generally more accurate as the number of processors increased. This is understandable for an asymptotic scalability metric since the significants of the asymptotic terms increase as the number of processors is increased. Secondly, the predictions were less accurate as the problem size increased. Finally, the CMP-scalability predictions are approximately two orders of magnitude better for the Fast Fourier Transformation algorithm than for the

Gauss-Jordan Elimination algorithm.

To explain these trends, it is useful to examine the contribution of individual terms in the sequential and parallel execution formulas. For GJE, the terms of the speedup formula is

C\N^+C2N^ 5 1 (7.3) Ci^c^N'^-¥cs^^C(,N^-¥c-jN\o%2P-^c%N-¥c<)— and for FFT the terms of the speedup formula is

c\N-\-C2N\og2N

c^jlog^N+c^j logj^^^+cs^celogjP+c?-^ where the c,. 's are machine specific constants. Table 7.3 for the 2K-memory-per-processor

GJE problem and Table 7.4 for the 2K-memory-per-processor FFT problem show the contribution of each c- term to their respective sequential (the numerator) or parallel (the denominator) execution time. The asymptotically important terms are the c, and terms for

GJE, and the and terms for FFT (shaded in the tables).

The accuracy of the CMP scalability predictions improves as the number of 73

Table 7.3. Contribution of each term in the Gauss-Jordan Elimination to their respective sequential or parallel execution time under CMP scaling. Type of Sequential Terms Parallel Terms C; Term Comp. Comp. Comp. Comm. Comm. Comm. Comm. Comp. Comp. & & Comp. Comp. iP C* . C2 C3 Cs Cfi ^7 Cg 32 99.8% 0.2% 88.8% 0.5% 10.3% 0.0% 0.2% 0.1% 0.0% 64 99.9% 0.1% 88.3% 1.0% 10.3% 0.0% 0.3% 0.1% 0.0% 128 100.0% 0.0% 87.4% 2.0% 10.2% 0.1% 0.3% 0.1% 0.0% 256 100.0% 0.0% 85.6% 3.9% 10.0% 0.1% 0.3% 0.1% 0.0% 512 100.0% 0.0% 82.3% 7.5% 9.6% 0.2% 0.4% 0.1% 0.0% 1,024 100.0% 0.0% 76.4% 13.9% 8.9% 0.4% 0.4% 0.1% 0.0% 2,048 100.0% 0.0% 66.8% 24.4% 7.8% 0.7% 0.4% 0.1% 0.0% 4,096 100.0% 0.0% 53.4% 39.0% 6.2% 1.1% 0.3% 0.0% 0.0% 8,192 100.0% 0.0% 38.1% 55.6% 4.4% 1.5% 0.2% 0.0% 0.0% 16,384 100.0% 0.0% 24.3% 70.8% 2.8% 1.9% 0.2% 0.0% 0.0%

Table 7.4. Contribution of each term in the Fast Fourier Transform to their respective sequential or parallel execution time under CMP scaling. Type of C; Sequential Terms Parallel Terms Term Comp. Comp. Comp. Memory Comp. Comm. Comm. C3 Cs

32 13.5% 86.5% 72.7% 3.1% 12.6% 0.0% 11.6% 64 12.4% 87.6% 67.3% 2.6% 10.6% 0.0% 19.5% 128 11.5% 88.5% 58.5% 2.1% 8.5% 0.0% 31.0% 256 10.7% 89.3% 46.6% 1.5% 6.2% 0.0% 45.6% 512 10.0% 90.0% 33.6% 1.0% 4.2% 0.0% 61.2% 1,024 9.4% 90.6% 22.0% 0.6% 2.6% 0.0% 74.8% 2,048 8.9% 91.1% 13.3% 0.4% 1.4% 0.0% 84.9% 4,096 8.4% 91.6% 7.6% 0.2% 0.8% 0.0% 91.4% 8,192 8.0% 92.0% 4.2% 0.1% 0.4% 0.0% 95.3% 16,384 7.6% 92.4% 2.3% 0.1% 0.2% 0.0% 97.5% 74 processors increased because the communication terms (C4 term for GJE and c, terms for

FFT) double each time that Jp doubles. Eventually, the communication terms dominates the denominators, but the major computation-terms in the denominator (C3 term for GJE and C3 for FFT) are 404 times and 6 times larger initially for GJE and FFT, respectively. This also explains why the CMP-scalability predictions are approximately two orders of magnitude better for the Fast Fourier Transformation algorithm than for the Gauss-Jordan Elimination algorithm. The rise in the CMP scalability error for FFT after it had reached zero is due to the relatively large c, term. If Table 7.2 is extended, the error would rise to approximately

6% for all problem sizes and then slowly decline.

The CMP-scalability predictions are less accurate as the problem size increases because the denominators' communication terms grow slower than the computation terms as the problem size is increased. For GJE, the computation term has a multiplier which grows faster than the communication term's M. Similarly for FFT, the denominator's computation term has a multiplier of Mog^V, while the communication term has an N multiplier.

7.3. Effects of Varying Machine Parameters on the Accuracy of CMP Scability

From the above analysis, increasing the speed of the conununication relative to computation would be expected to degrade the accuracy of the CMP-scalability predictions, since the communication constants term for GJE and for FFT) would become closer in size to the computation constants (C3 term for GJE and for FFT). In fact, a linear relationship between communication speedup and CMP-scalability accuracy would be 75 expected, since the size of the asymptotic communication terms is linearly effected. To show this, ten-fold, fifty-fold, and one-hundred-fold inaeases in the communication speed are considered. Figures 7,1 and 7.2 summarize these results by showing the percentage error of the CMP-scalability function for memory usage of IK, 8K, 16K, and 32K per processor on a

128 X 128 processor array. Tables A.22 through A.26 of Appendix A contain all of the data for these experiments. These figures clearly show that increasing the communications speed linearly degrades the accuracy of the CMP-scalability predictions.

By the same token, increases in the computation speed (and memory speed), while leaving the commication speed the same, would be expected to improve the accuracy of the

CMP-scalability funcion. Figure 7.3 for the GJE and Figure 7.4 for the FFT show that this is indeed the case for memory usage of IK, 8K, 16K, and 32K per processor. Tables A.18 through A.21 of Appendix A contain all of the data for these experiments. The improvement in the accuracy of the CMP-scalabitiy predictions are clearly better than linear as the computation speed is increased. All terms of the sequential execution time (the numerator) decrease linearly with the increased speedup of the computation. Additionally, the computation terms of the parallel execution time decrease relative to the asymptotic communication terms (c^ term for GJE and c^ for FFT). The combined effect causes a better than linear improvement in the CMP-scalability predictions as the computational speedup is increased.

The MasPar MP-1 processors are relatively slow in comparison to its communication speed. Other commonly available distributed-memory parallel computers have a higher 76

2000

1500

S 1000

SOO

0 0 10 20 30 40 SO 60 70 80 90 100 no Speedq} in Commuaicaticm over MasPar MP-1 1K Memoiy per Processor 8 K Memory per Processor 16 K Memory per Processor 32 K Memory per Processor

Figure 7.1. Degradation in the CMP scalability accuracy for Gauss-Jordan Elimination as the communication speed is increased for a 128 x 128 processor array.

100 no Speedip in Commuoicatian over MasPar MP-1 1K Memory per Processor 8 K Memoiy per Processor 16 K Memory per Processor 32 K Memory per Processor

Figure 7.2. Degradation in the CMP scalability accuracy for Fast Fourier Transform as the communication speed is increased for a 128 x 128 processor array. 77

20

15

5

0 0 20 30 40 50 60 70 80 90 100 no Speedup in Conpitatbn over MasPar M*-1 1K MeniDiy per Processor 8 K Memory per Processor 16 KMemoiy per Processor ...... 32 K Memory per Processor

Figure 7.3. Improvement in the CMP scalability accuracy for Gauss-Jordan Elimination computation speed is increased for a 128 x 128 processor array.

J ^

200 1it i l'50

.0 100 11 1 50 - 1 1 0 10 20 30 40 50 60 70 80 90 100 110 Speedip ia Confutation over MasPar MP-1 IK Memory per Processor 8 K Memory per Processor 16 K Memory per Processor 32 K Memory per Processor

Figure 7.4. Improvement in the CMP scalability accuracy for Fast Fourier Transform as the computation speed is increased for a 128 x 128 processor array. 78

computation to communication speed ratio. This means that CMP-scalability predictions will

be more accurate on these machines.

7.3. Accuracy of the Isoefficiency Function

The accuracy of the isoefficiency scalability function is poor at predicting the problem

size necessary to achieve a specified machine efficiency on the MasPar MP-1. The

percentage error in the isoefficiency predicted problem sizes for GJE and FFT with a fixed efficiency = 0.8 are shown in Table 7.5 and 7.6, respectively. For GJE, two trends in the

percentage errors are apparent: (1) the accuracy improves as processor array increases in

size for a constant efficiency, and (2) the accuracy improves as the required efficiency

increases for a fixed size processor array. For FFT, these trends are reversed, i.e., (1) the accuracy degrades as the processor array increases in size for a constant efficiency, and (2)

the accuracy degrades as the required efficiency increases for a fixed size processor array.

Accuracy errors in the problem size predicted by the isoefficiency function are due to

using only the asymptotically important terms when determining the predicted problem size.

The predicted problem size for an efficiency of E is

Tm = -^ToiP,N), (7.5)

where r,(AO is the sequential processor execution time and TQ{P, N) is the total overhead

time for all processors. Whe predicting the problem size for an efficiency E, the r,(AO and

To(P, N) terms are approximated by the asymptotically important sequential and parallel

terms.

For GJE, the approximating equation is 79

Table 7.5, Percentage error of isoefficiency-scalability function for Gauss-Jordan Elimination on a MasPar MP-1 computer. Jp Fixed Efficiency 0.2 0.4 0.6 0.8 32 97.0% 96,3% 95.8% 95.5% 64 94.4% 93,0% 92.1% 91.5% 128 89.8% 87.2% 85.7% 84.5% 256 82.1% 77.8% 75.1% 73.3% 512 70.4% 64.0% 60.3% 57.9% 1,024 55,0% 47.1% 43.2% 40.7%

Table 7.6. Percentage error of isoefficiency-scalability function for Fast Fourier Transform on a MasPar MP-1 computer. VP Fixed Efficiency 0.2 0.4 0.6 0.8 32 a a a a 64 a a a 4010% 128 a a a 5400% 256 a a 1835% 7219% 512 a 1205% 2055% 9466% 1,024 938% 1262% 2303% 14090% a - Indicates that one element per processor achieved an efficiency that was higher that the corresponding fixed efficiency.

ciN'^=-^caPN'^, (7.6) where the c/s are the same as in equation 7.3. Solving 7.6 forA^

N = (7.7) gives the formula used to predict the problem size. The accuracy of the GJE problem-size 80

Table 7,7. Contribution of each term in the Gauss-Jordan Elimination to their respective sequential or parallel execution time for a constant efficiency of 0.8. Jp Sequential Terms Parallel Terms Cz C3 •^4^ ' Cs Cfi ^7 ^8 c, 32 99.7% 0.3% 79.7% 0.9% 18.3% 0.0% 0.8% 0.3% 0.0% 64 99.8% 0.2% 79.9% 1.7% 17.3% 0.1% 0.8% 0.2% 0.0% 128 99.9% 0.1% 80.0% 3.1% 15.8% 0.1% 0.8% 0.2% 0.0% 256 100.0% 0.0% 80.0% 5.3% 13.6% 0.2% 0.7% 0.1% 0.0% 512 100.0% 0.0% 80.0% 8.4% 10.7% 0.3% 0.5% 0.1% 0.0% 1,024 100.0% 0.0% 80.0% 11.9% 7.6% 0.3% 0.3% 0.0% 0.0%

prediction depends on how well the c, term approximates r,(AO and how well the term approximates TQ(P, N). Table 7.7 shows the contribution of all the c. terms for the efficiency of 0.8 as the number of processors increases. The c, term is a very good approximation of

TyiN) contributing 99.7% of the sequential execution time on a 32 x 32 processor array and it only improves as the processor array increases in size. Since the term should approximate

Ta(P, N) and the efficiency is 0.8, the term should contribute 20% of the total parallel execution time. The constribution of the term starts out at 0.9% on a 32 x 32 processor array and increases to 11.9% on a 1024 x 1024 processor array. As shown in Table 7.5, the observed percentage eixor in the problem-size predictions for an efficiency of 0.8 reflect this improvement in the term's approximation of To(P, A/).

For FFT, the approximating equation is

C2mog^N = -^cvNjP, (7.8) where the c/s are the same as in equation 7.4. Solving 7.8 for N 81

Table 7.8. Contribution of each term in the Fast Fourier Transform to their respective sequential or parallel execution time for a constant efficiency of 0.8. Jp Sequential Terms Parallel Terms ,4 . <^3 <^4 ^5 <^6 ts, 32 a a a a a a a 64 14.7% 84.3% 62.9% 1.8% 12.1% 1.1% 22.1% 128 7.1% 92.9% 68.6% 3.8% 5.8% 0.6% 21.2% 256 3.4% 96.6% 71.2% 4.9% 2.8% 0.4% 20.7% 512 1.7% 98.3% 72.5% 5.5% 1.4% 0.2% 20.4% 1,024 0.8% 99.2% 73.1% 5.8% 0.7% 0.1% 20.2% a - Indicates that one element per processor gives a higher efficiency than 0.8

= (7.9)

• gives the formula used to predict the problem size. The accuracy of the GJE problem-size prediction depends on how well the term approximates r,(AO and how well the term approximates TQ(P, N). Table 7.8 shows the contribution of all the c,. terms for the efficiency of 0.8 as the number of processors increases. The term is a fair approximation of T^iN) contributing 84.3% of the sequential execution time on a 32 x 32 processor array and it improves as the processor array increases such that it contributes 99.2% on a 1024 x 1024 processor array. Again, the c, term should should contribute 20% of the total parallel execution time for an efficiency of 0.8. The constribution of the c-, term starts out at 22.1 % on a 32 X 32 processor array and improves to 20.2% on a 1024 x 1024 processor array.

From the above discussion for GJE, the FFT-isoefficiency accuracy for the predicted problem size might be expected to improve as the processor-array size increases, but as Table

7.6 shows, the accuracy degrades as the processor-array size is increased. A more detailed 82 analysis is needed in the FFT case to explain this behavior. The exponential nature of equation 7.9 causes any error resulting from the asymptotic term's approximation to be amplified. While the asymptotic Cj and c, terms improve in accuracy as P is increased, they are multipled by a -fP term that is also increasing. The improvement in the and c, asymptotic terms must be decreasing at a slower rate than thtjp multiplier, so the overall error increases. 7.4. Effects of Varying Machine Parameters on the Accuracy of the Isoefficiency'Scaiability Function The effects of varying the communcation and computation speeds of the computer architecture are very algorithm dependent. Figure 7.5 and Figure 7.6 show how the percentage error in the isoefficiency-scalability predictions for GJE vary with communication speed and computation speed, respectively. For FFT, the behavior in the isoefficiency- scalability predictions with varying communication and computation speed are extremely different than GJE's behavior. Figure 7.7 and Figure 7.8 show FFT's behavior.

As the communication speed is increased, the accuracy of the isoefficiency-scalability degrades quickly and saturates at nearly 100% error as the communication speed increases.

The accuracy of the GJE problem-size prediction depends on how well the c, term approximates 7',(A0 and how well the term approximates T^iP, N). Table 7.9 shows that the contribution of the c- terms as the communication speed is increased. The term drops from 3.1% with no speedup to 0.0% for the 100-times communcation speedup.

As the computation (and memory) speed is increased, all terms of the sequential execution time (XiiN)) and all computation terms of the parallel overhead (TQ) decrease 83

lOS

too

to .6 I &

1^I

0 10 20 30 40 SO 60 70 80 90 100 UO Speedip ia Conimuiiicatbn over MasPar MP-1 EflSciency ofO.2 EfGciency ofO.4 -A- ££Bciency ofO.6 __ Efficiency ofO.8

Figure 7.5. Degradation in the isoefficiency scalability accuracy for GJE as the communication speed is increased for a 128 x 128 processor array.

100

90

80

70

60

50

40 0 20 30 40 50 60 70 80 90 100 110 Speedi;) in Conputation over MasPar MP-1 EfBciency ofO.2 ...... Efl&;iency of 0.4 -A- EfiSciency ofO.6 Efl^ciency ofO.8

Figure 7.6. Improvement in the isoefficiency scalability accuracy for GJE as the compution speed is increased for a 128 x 128 processor array. 84

Table 7.9. Contribution of each term in the Gauss-Jordan Elimination to their respective sequential or parallel execution time for a constant efficiency of 0.8 on a 128 x 128 processor array as the communicaton speed is increased. Comm. Sequential Terms Parallel Terms Speedup C3 C5 Cfi ^7 Cg 1 99.9% 0.1% 80.0% 3.1% 15.8% 0.1% 0.8% 0.2% 0.0% 10 99.9% 0.1% 79.9% 0.4% 18.6% 0.0% 0.7% 0.4% 0.0% 50 99.9% 0.1% 79.9% 0.1% 18.9% 0.0% 0.7% 0.4% 0.0% 100 99.9% 0.1% 79.9% 0.0% 19.0% 0.0% 0.7% 0.4% 0.0%

Table 7.10. Contribution of each term in the Gauss-Jordan Elimination to their respective sequential or parallel execution time for a constant efficiency of 0.8 on a 128 x 128 processor array as the computation speed is increased. Comp. Sequential Terms Parallel Tenns Speedup : ' ^2 C3 C5 ^6 ^7 Cg C9 1 99.9% 0.1% 80.0% 3.1% 15.8% 0.1% 0.8% 0.2% 0.0% 10 100.0% 0.0% 80.0% 8.9% 10.5% 0.1% 0.4% 0.0% 0.0% 50 100.0% 0.0% 80.0% 10.8% 9.0% 0.0% 0.1% 0.0% 0.0% 100 100.0% 0.0% 80.0% 11.1% 8.8% 0.0% 0.1% 0.0% 0.0%

linearly with the increased speedup of the computation. So, the computation terms of the parallel execution time decrease relative to the asymptotic communication term term).

The combined effect causes a better than linear improvement in the isoefficiency-scalability predictions as the computational speedup is increased.

For FFT, only the isoefficiency function with efficiency = 0,8 was studied on a 128 x

128 processor array. At lower efficiencies, less than one element per processor was needed 85

Figure 7.6. Degradation in the isoefficiency scalability accuracy (efficiency = 0.8) for FFT as the communication speed is increased for a 128x128 processor array.

Figure 7.7, Degradation of isoefficiency-scalability accuracy (efficiency = 0.8) for FFT as the compution speed is increased for a 128 x 128 processor array. 86

Table 7.11. Contribution of each term in the Fast Fourier Transform to their respective sequential or parallel execution time for a constant efficiency of 0.8 as the communication speed is increased. Communication Sequential Terms Parallel Terms Speedup . ' C3 Cs ^6 c. l.O 7.1% 92.9% 68.6% 3.8% 5.8% 0.6% 21.2% 1.5 11.1% 88.9% 65.6% 2.4% 9.1% 0.7% 22.2% 2.0 15.5% 84.5% 62.3% 0.9% 12.7% 0.7% 23.3%

to achieve an efficiency greater than 0.8. Similarly, only a one-and-one-half fold and two fold speedup of the communication speed was able to be studied. The results are summarized in

Figure 7,6. When increasing the computation speed, the problem size quickly became larger than could be stored in a double precision number, so a maximum computation speedup of up to twenty fold was performed with results shown in Figure 7.7.

Table 7.11 shows the contribution of individual c,. terms as the communication speed is increased for a constant efficiency of 0.8. Both of the asymptotic terms, and c,, diverge from the expected values of 100% and 20%, respectively, as the communication speed increases. Therefore, the isoefficiency-predicted problem size becomes less accurate as the communication speed increases.

As Table 7.12 shows, the asymptotic sequential term approaches its expected value of 100% as the computation speed increases. However, the asymptotic communication term drops below 20% and remains at about 19.5% as the computation speed increases. This is primarily due to the other communication term, which represents the startup time of the communications and contributes 0.6% of the parallel execution time.

Table 7.12. Contribution of each term in the Fast Fourier Transform to its respective sequential or parallel execution time for a constant efficiency of 0.8 as the computation speed is increased.

Computation   Sequential Terms        Parallel Terms
Speedup         c1      c2       c3      c4      c5      c6      c7
  1            7.1%   92.9%    68.6%    3.8%    5.8%    0.6%   21.2%
  5            1.3%   98.7%    72.8%    5.8%    1.1%    0.6%   19.7%
 10            0.7%   99.3%    73.3%    6.0%    0.5%    0.6%   19.6%
 15            0.4%   99.6%    73.5%    6.1%    0.4%    0.6%   19.5%
 20            0.3%   99.7%    73.5%    6.1%    0.3%    0.6%   19.5%

The exponential nature of equation 7.9 amplifies the error introduced by this startup term as the computation speed is increased, because larger problems are being solved.

CHAPTER 8. CONCLUSIONS

The asymptotic scalability metric, called Constant-Memory-per-Processor (CMP) scalability, was presented. This metric is useful in analyzing the performance of a parallel algorithm on an architecture as the number of processors grows but the memory size per processor remains fixed. To illustrate the CMP scalability metric, parallel Matrix Multiplication (MM), Gauss-Jordan Elimination (GJE), and Fast Fourier Transform (FFT) algorithms are considered on the hypercube and two-dimensional mesh topologies.
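As a rough sketch of the constraint underlying the metric (the symbols M(N), for the algorithm's total memory requirement, and m, for the fixed memory per processor, are introduced here only for illustration and are not the thesis's notation), CMP scaling follows the largest problem that stays resident in memory,

\[
M\bigl(N(P)\bigr) = m\,P,
\]

and the metric then examines how the efficiency E(N(P), P) behaves along this curve as P grows.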

A comparison between the asymptotic CMP scalability and isoefficiency scalability metrics is performed to gain a better understanding of scalability. An analysis of the scalability of GJE and FFT on a mesh predicts that GJE is asymptotically more scalable than FFT using the isoefficiency metric, but the CMP scalability metric predicts that FFT is asymptotically more scalable than GJE. Closer investigation reveals that both are correct, and that each metric corresponds to a different planar cross-section of a multidimensional performance surface.

By combining information from both the isoefficiency and CMP scalability metrics for two algorithms with conflicting isoefficiency and CMP scalability results, we are able to show how to predict the relative change in performance of the two algorithms along the fixed-processor planar cross-section and the fixed-problem-size planar cross-section.

Specifically, we showed that asymptotically (1) the algorithm that is more CMP scalable will experience a slower drop in efficiency as the number of processors is increased while keeping the problem size fixed, and (2) the algorithm that is more isoefficiency scalable will experience a greater increase in efficiency as the problem size increases for a fixed number of processors.

Two classes of algorithms are shown to be not fixed-time scalable, i.e., there is a maximum problem size for any fixed time such that larger problems cannot be completed within that time. These classes include (1) algorithms containing a section of code that executes a number of times that is a monotonically increasing function of P, and (2) algorithms in which information flows between two processors whose distance is a monotonically increasing function of P. Together, these two classes cover the majority of conceivable algorithms.
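A minimal version of the argument for class (1), using symbols introduced only for this sketch (tau for a positive lower bound on the time of one execution of the code section and f(P) for the number of times it executes), is

\[
T_p(N, P) \;\ge\; f(P)\,\tau \;\longrightarrow\; \infty \quad \text{as } P \to \infty,
\]

so, assuming (as in the memory-constrained setting above) that larger problems require more processors, the parallel execution time of sufficiently large problems exceeds any fixed time bound T. Class (2) follows the same way, with f(P) tau replaced by the growing interprocessor distance multiplied by the per-hop communication time.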

Scalability metrics such as the CMP scalability metric and the isoefficiency metric indicate the asymptotic behavior as the number of processors becomes large. However, we question how useful these metrics are on a specific machine with a fixed number of processors and a fixed memory per processor. Our investigation of the utility of the CMP and isoefficiency scalability metrics for the three algorithms on a 16K-processor MasPar MP-1 machine showed that: (1) the CMP scalability metric was a poor predictor of performance, especially for the computationally intensive algorithms, (2) a 10-fold increase in the computational speed greatly improved the CMP scalability accuracy, (3) the isoefficiency metric was a poor predictor of the problem size necessary to achieve a specified efficiency, and (4) improvements in computation speed improved the accuracy of isoefficiency for GJE, but degraded its accuracy for FFT.

In general, caution is recommended when trying to apply either of these metrics. The machine-specific constants must be examined for an algorithm if these scalability metrics are to be used. The critical factors in determining the applicability of these metrics are the ratio of the sequential execution time to the asymptotically identified computation term of the sequential execution time, and the ratio of the parallel execution time to the asymptotically identified communication term of the parallel execution time. The closer these ratios are to one, the better the scalability metric will apply.
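Written out (with r_seq and r_par introduced only for this restatement), the two ratios are

\[
r_{\mathrm{seq}} = \frac{T_s}{\text{asymptotic computation term of } T_s},
\qquad
r_{\mathrm{par}} = \frac{T_p}{\text{asymptotic communication term of } T_p},
\]

and the asymptotic metrics can be trusted on a particular machine only when both ratios are close to one, that is, when the identified asymptotic terms really do dominate at the machine's actual size.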

8.2. Further Work

Theorems 4.1 and 4.2 predict the relative change in performance of two algorithms along the fixed-processor planar cross-section and the fixed-problem-size planar cross-section by combining information from both the isoefficiency and CMP scalability metrics when the two algorithms have conflicting isoefficiency and CMP scalability results. Quantification of these results might be possible with further work.

Not all algorithms have a one-dimensional size component, N, with which to characterize the problem size; it would be interesting to study such an algorithm to see how the ideas of scalability metrics apply.

REFERENCES

[1] Alpern, B. and Carter, L., "Is Scalability Relevant: A Look at Sparse Matrix-Vector Product," Proceedings of the Seventh SIAM Conference on Parallel Processing for Scientific Computing, (1995), pp. 850-851.

[2] Amdahl, G. M., "Validity of the single processor approach to achieving large scale computing capabilities," AFIPS Conference Proceedings, (April, 1967), pp. 483-485.

[3] Bailey, D. H., "Misleading Performance in the Supercomputing Field," Proceedings of Supercomputing '92, (1992), pp. 155-158.

[4] Cannon, L. E., "A Cellular Computer to Implement the Kalman Filter," Ph.D. Thesis, Montana State University, Bozeman, MT, (1969).

[5] Fienup, M. and Kothari, S.C., "Implementations of Fast Fourier Transform on the MasPar MP-1 and MP-2," Technical Report TR 93-01, Computer Science Department, Iowa State University, Ames, IA 50011, (1993).

[6] Fienup, M. and Kothari, S.C., "CMP: A Memory-Constrained Scalability Metric," Proceedings of the Seventh SIAM Conference on Parallel Processing for Scientific Computing, (1995), pp. 848-849.

[7] Fienup, M. and Kothari, S.C., "A Memory Constrained Scalability Metric," Proceedings of the 1994 International Conference on Parallel Processing, vol. III, (1994), pp. 1-7.

[8] Flatt, H. P. and Kennedy, K., "Performance of Parallel Processors," Parallel Computing, vol. 12, (1989), pp. 1-20.

[9] Grama, A., Gupta, A., and Kumar, V., "Isoefficiency function: A scalability metric for parallel algorithms and architectures," Technical Report TR 93-24, Computer Science Department, University of Minnesota, Minneapolis, MN, (1993).

[10] Gupta, A., and Kumar, V., "The Scalability of FFT on Parallel Computers," IEEE Transactions on Parallel and Distributed Systems, vol. 4, (August, 1993), pp. 922-932.

[11] Gupta, A., and Kumar, V., "Scalability of Parallel Algorithms for Matrix Multiplication," Proceedings of the 1993 International Conference on Parallel Processing, vol. III, (1993), pp. 115-123.

[12] Gupta, A., and Kumar, V., "Performance Properties of Large Scale Parallel Systems," Journal of Parallel and Distributed Computing, vol. 19, (November, 1993).

[13] Gustafson, J. L., Montry, G. R., and Benner, R. E., "Development of parallel methods for a 1024-processor hypercube," SIAM Journal on Scientific and Statistical Computing, vol. 9, (April, 1988), pp. 609-638.

[14] Gustafson, J. L., "Reevaluating Amdahl's Law," Commun. ACM, vol. 31, (August, 1988), pp. 532-533.

[15] Gustafson, J. L., "The consequences of fixed time performance measurement," Proceedings of the 25th Hawaii International Conference on System Sciences: Volume III, (1992), pp. 113-124.

[16] Kumar, V., and Gupta, A., "Analyzing Scalability of Parallel Algorithms and Architectures," Tech. Rep. TR-91-18, Computer Science Department, University of Minnesota, Minneapolis, MN, (June, 1991).

[17] Kumar, V., Grama, A. Y., Gupta, A., and Karypis, G., Introduction to Parallel Computing: Design and Analysis of Algorithms, Benjamin/Cummings, Redwood City, CA, (1994).

[18] Kumar, V., and Gupta, A., "Analyzing Scalability of Parallel Algorithms and Architectures," Tech. Rep. TR-91-18, Comput. Sci. Dept., Univ. of Minnesota, Minneapolis, MN, (June, 1991).

[19] Kumar, V., and Rao, V. N., "Parallel depth-first search, part II: Analysis," International Journal of Parallel Programming, vol. 16, (June, 1987), pp. 501-519.

[20] Kung, H. T., "Memory requirements for balanced computer architectures," Proceedings of the 1986 IEEE Symposium on Computer Architecture, (1986), pp. 49-54.

[21] Lam, M. S., "Software Pipelining: An Effective Scheduling Technique for VLIW Machines," Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation, (1988), pp. 318-328.

[22] Marinescu, D. C. and Rice, J. R., "On high level characterization of parallelism," Technical Report CSD-TR-1011, CAPO Report CER-90-32, Computer Science Department, Purdue University, West Lafayette, IN, (June, 1991).

[23] MasPar Assembly Language Reference Manual, MasPar Computer Corporation, Sunnyvale, CA, (1990).

[24] Singh, V., Hennessy, J. L., and Gupta, A., "Scaling parallel programs for multiprocessors: Methodology and examples," IEEE Computer, vol. 26, (July, 1993), pp. 42-50.

[25] Sun, X. H., and Gustafson, J. L., "Toward a better parallel performance metric," Parallel Computing, vol. 17, (1991), pp. 1093-1109.

[26] Sun, X. H., and Ni L. M., "Scalable Problems and Memory-bounded Speedup," Journal of Parallel and Distributed Computing, vol. 19, (September, 1993), pp. 27-37.

[26] Sun, X. H., and Rover, D. T., "Scalability of Parallel Algorithm-Machine Combinations," IEEE Transactions on Parallel and Distributed Systems, vol. 5, (June, 1994), pp. 599-613.

[27] Singh, J. P., Hennessy, J. L., and Gupta, A., "Scaling Parallel Programs for Multiprocessors: Methodology and Examples," Computer, vol. 26, (July, 1993), pp. 42-50.

[28] Tong, C. and Swarztrauber, P. N., "Ordered Fast Fourier Transforms on a Hypercube Multiprocessor," Journal of Parallel and Distributed Computing, vol. 12, (January, 1991), pp. 50-59.

[29] Worley, P. H., "The effect of time constraints on scaled speedup," SIAM Journal on Scientific and Statistical Computing, vol. 11, (May, 1990), pp. 838-858.

ACKNOWLEDGEMENTS

I would like to thank Ames Laboratory for allowing me access to their MasPar MP-1 and MP-2 computers. Additionally, I would like to thank the professors on my committee.

APPENDIX A. ADDITIONAL TABLES

Table A.1. Matrix multiplication timings on a 32 x 32 MasPar MP-1.

Processor   Array Size    Array Elements   Parallel Execution   Speedup   Efficiency
Array       (N x N)       per Processor    Time (seconds)
32 x 32     64 x 64              4               0.0152           544.7      0.5319
32 x 32     128 x 128           16               0.0931           711.2      0.6945
32 x 32     256 x 256           64               0.6286           842.5      0.8227
32 x 32     384 x 384          144               1.9980           894.5      0.8736
32 x 32     512 x 512          256               4.5850           923.9      0.9023
32 x 32     640 x 640          400               8.7773           942.6      0.9206
32 x 32     768 x 768          576              14.9440           956.7      0.9343
32 x 32     896 x 896          784              23.5190           965.3      0.9427
32 x 32     1024 x 1024       1024              34.8513           972.4      0.9496
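As a consistency check on the columns (the relation E = S/P is the standard one and matches every row reported in this appendix), the first row gives

\[
E = \frac{S}{P} = \frac{544.7}{32 \times 32} = \frac{544.7}{1024} \approx 0.532,
\]

in agreement with the listed efficiency of 0.5319.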

Table A.2. Matrix multiplication timings on a 64 x 64 MasPar MP-1.

Processor   Array Size    Array Elements   Parallel Execution   Speedup   Efficiency
Array       (N x N)       per Processor    Time (seconds)
64 x 64     128 x 128            4               0.0304          2178.0      0.5317
64 x 64     256 x 256           16               0.1863          2842.7      0.6940
64 x 64     512 x 512           64               1.2571          3369.9      0.8227
64 x 64     768 x 768          144               3.9960          3577.8      0.8735
64 x 64     1024 x 1024        256               9.1701          3695.6      0.9022
64 x 64     1280 x 1280        400              17.5547          3770.4      0.9205
64 x 64     1536 x 1536        576              29.8880          3826.7      0.9343
64 x 64     1792 x 1792        784              47.0380          3861.1      0.9427
64 x 64     2048 x 2048       1024              69.7025          3889.5      0.9496

Table A.3. Percentage error of CMP scalability function for Gauss-Jordan Elimination on a MasPar MP-1 computer with 100 times faster processors. Jp Memory Used Per Processor, P 1 K 2K 4K 8K 16 K 32 K 32 477.3% 517.6% 582.3% 679.5% 820.8% 1023.5% 64 244.4% 262.8% 294.0% 341.8% 411.9% 512.8% 128 125.9% 134.1% 148.9% 172.2% 206.9% 257.1% 256 65.8% 69.0% 75.9% 87.1% 104.2% 129.0% 512 35.3% 36.2% 39.1% 44.4% 52.7% 64.9% 1,024 19.8% 19.6% 20.6% 23.0% 26.9% 32.9% 2,048 11.9% 11.2% 11.3% 12.2% 13.9% 16.8% 4,096 7.9% 7.0% 6.7% 6.8% 7.5% 8.7% 8,192 5.9% 4.9% 4.3% 4.1% 4.2% 4.7% 16,384 4.9% 3.8% 3.1% 2.7% 2.6% 2.7%

Table A.4. Percentage error of CMP scalability function for Fast Fourier Transform on a MasPar MP-1 computer with 10 times faster processors. Jp Memory Used Per Processor, p 1 K 2K 4K 8K 16 K 32 K 32 48.6% 52.6% 56.5% 60.5% 64.3% 68.2% 64 21.6% 23.8% 26.0% 28.1% 30.2% 32.3% 128 6.9% 8.2% 9.4% 10.7% 11.9% 13.0% 256 -0.9% -0.1% 0.7% 1.4% 2.2% 2.9% 512 -4.9% -4.3% -3.8% -3.3% -2.8% -2.3% 1,024 -6.8% -6.4% -6.0% -5.6% -5.3% -4.9% 2,048 -7.6% -7.3% -7.0% -6.7% -6.4% -6.1% 4,096 -7.8% -7.6% -7.3% -7.1% -6.8% -6.6% 8,192 -7.8% -7.5% -7.3% -7.1% -6.9% -6.7% 16,384 -7.6% -7.4% -7.2% -7.0% -6.8% -6.7% 97

Table A.5. Gauss-Jordan Elimination timings on a 64 x 64 MasPar MP-1. Processor Array Size Array Elements Parallel Execution Speedup Efficiency Array (NxN) per Processor Time (seconds)

64x 64 128 X 128 4 0.1045 331.1 0.0808

64x64 192 X192 9 0.1980 585.9 0.1430

64x64 256 X 256 16 0.3273 837.5 0.2045 64x 64 512x512 64 1.3303 1640.4 0.4005 64x64 768 X 768 144 3.4134 2154.4 0.5260 64x64 1024X1024 256 6.9866 2492.9 0.6086 64x64 1280 X 1280 400 12.4245 2736.6 0.6681 64x64 1534 X 1534 576 20.1598 2913.5 0.7113 64x 64 1792X 1792 784 30.5799 3049.3 0.7445

64x 64 2048 X 2048 1024 44.0875 3156.6 0.7707 64x 64 2304x2304 1296 61.0900 3243.2 0.7918

64x64 2560 X 2560 1600 81.9815 3314.7 0.8093

64 X 64 2816x2816 1936 107.1683 3374.7 0.8239

64 X 64 3072 X 3072 2304 137.0559 3425.6 0.8363

64 X 64 3328 X 3328 2704 172.0354 3469.6 0.8471 64x64 3584x3584 3136 212.5180 3507.8 0.8564 98

Table A.6. Gauss-Jordan Elimination timings on a 128 x 128 MasPar MP-1. Processor Array Size Array Elements Parallel Execution Speedup Efficiency Array (NxN) per Processor Time (seconds)

128 X128 256 X 256 4 0.2226 1231.1 0.0751

128 X128 384 X 384 9 0.4204 2193.7 0.1339

128 X128 512x512 16 0.6919 3154.1 0.1925

128 X128 1024 x1024 64 2.7727 6281.5 0.3834

128 X 128 1536 X 1536 144 7.0504 8330.7 0.5085

128 X128 2048 x 2048 256 14.3368 9707.0 0.5925

128 X128 2560 X 2560 400 25.4089 10695.0 0.6528 128 X128 3072 X 3072 576 41.0963 11424.5 0.6973 128 X128 3584x3584 784 62.1994 11985.2 0.7315 128X 128 4096 X 4096 1024 89.5139 12430.2 0.7587 128X128 4608 X 4608 1296 123.8493 12791.0 0.7807 128X128 5120x5120 1600 166.0106 13089.1 0.7989

128X128 5632 X 5632 1936 216.7853 13340.6 0.8142

128 X 128 6144X 6144 2304 277.0066 13553.9 0.8273

128 X128 6656 X 6656 2704 347.4602 13738.0 0.8385 128 X128 7168x7168 3136 428.9462 13898.5 0.8483

Table A.7. Fast Fourier Transform timings on a 32 x 32 MasPar MP-1. Processor LogjCNumber of Elements per Parallel Execution Speedup Efficiency Array Elements) Processor Time (seconds)

32x32 12 4 0.0103 734.3 0.7171 32x32 13 8 0.0216 748.5 0.7309 32x32 14 16 0.0451 759.3 0.7415 32x32 15 32 0.0944 768.2 0.7502 32x32 16 64 0.1975 775.3 0.7571 32x32 17 128 0.4123 781.4 0.7631 32x32 18 256 0.8597 786.8 0.7684 32x32 19 512 1.7899 791.6 0.7731 32x32 20 1024 3.7210 796.0 0.7774 99

Table A.8. Fast Fourier Transform timings on a 64 x 64 MasPar MP-1. Processor LogjCNumber of Elements per Parallel Execution Speedup Efficiency Array Elements) Processor Time (seconds)

64x64 14 4 0.0128 2662.3 0.6500 64x 64 15 8 0.0267 2719.5 0.6639 64x 64 16 16 0.0554 2764.5 0.6749 64x 64 17 32 0.1150 2802.9 0.6843 64x 64 18 64 0.2385 2836.3 0.6925 64x64 19 128 0.4944 2866.1 0.6997 64x64 20 256 1.0238 2893.1 0.7063 64x 64 21 512 2.1180 2917.9 0.7124 64x 64 22 1024 4.3773 2940.8 0.7180

Table A.9. Fast Fourier Transform timings on a 128 x 128 MasPar MP-1. Processor Logj(Number of Elements per Parallel Execution Speedup Efficiency Array Elements) Processor Time (seconds)

128 X 128 16 4 0.0168 9102.2 0.5556 128 X 128 17 8 0.0346 9320.0 0.5688 128 X128 18 16 0.0712 9505.6 0.5802 128 X 128 19 32 0.1465 9671.8 0.5903 128 X 128 20 64 0.3016 9821.8 0.5995 128 X 128 21 128 0.6205 9959.3 0.6079 128 X 128 22 256 1.2761 10087.3 0.6157 128 X128 23 512 2.6227 10207.1 0.6230 128 X128 24 1024 5.3866 10319.9 0.6299 100

Table A. 10. Matrix Multiplication timings on a 32 x 32 MasPar MP-2. Processor Anay Size Array Elements Parallel Execution Speedup Efficiency Array (NxN) per Processor Time (seconds) 32x32 64x64 4 0.0051 354.5 0.3462 32x32 128 X128 16 0.0275 525.5 0.5132 32x32 256 X 256 64 0.1698 680.6 0.6647 32x32 384 X 384 144 0.5201 749.8 0.7323 32x32 512x512 256 1.1661 792.7 0.7741 32x32 640x640 400 2.2050 818.7 0.7996 32x32 768x768 576 3.7159 839.5 0.8198 32x32 896 X 896 784 5.8017 853.8 0.8338 32x32 1024 X 1024 1024 8.5543 864.4 0.8441 32x32 1152x1152 1296 12.0623 872.8 0.8523 32x32 1280x 1280 1600 16.4306 878.9 0.8583 32x32 1408 X 1408 1936 21.7065 885.5 0.8648 32x32 1536X 1536 2304 28.0051 891.1 0.8702 32x32 1664X1664 2704 35.4172 895.8 0.8748 32x32 1792X1792 3136 44.0914 898.7 0.8777 32x32 1920 X 1920 3600 54.0267 902.1 0.8810 32x32 2048 x 2048 4096 65.4024 904.4 0.8832 101

Table A. 11. Matrix Multiplication timings on a 64 x 64 MasPar MP-2. Processor Array Size Array Elements Parallel Execution Speedup Efficiency Anay (NxN) per Processor Time (seconds)

64x 64 128 X128 4 0.0102 1416.8 0.3459 64x 64 256 X 256 16 0.0551 2097.4 0.5121 64x64 512x512 64 0.3395 2722.7 0.6647 64x64 768 X 768 144 1.0403 2998.7 0.7321

64x 64 1024X 1024 256 2.3321 3170.6 0.7741

64x64 1280 X 1280 400 4.4099 3274.8 0.7995

64x64 1536X 1536 576 7.4318 3357.8 0.8198

64x64 1792X 1792 784 11.6034 3415.1 0.8338 64x64 2048 X 2048 1024 17.1086 3457.4 0.8441 64x 64 2304x2304 1296 24.1246 3491.1 0.8523 64x64 2560 X 2560 1600 32.8612 3515.6 0.8583 64x 64 2816x2816 1936 43.4130 3542.0 0.8647 64x64 3072 X 3072 2304 56.0103 3564.2 0.8702

64x64 3328 X 3328 2704 70.8343 3583.2 0.8748 64x64 3584x3584 3136 88.1828 3594.9 0.8777 64x64 3840x3840 3600 108.0534 3608.4 0.8810 64x64 4096 x 4096 4096 130.8049 3617.6 0.8832 102

Table A.12. Gauss-Jordan Elimination timings on a 32 x 32 MasPar MP-2. Processor Array Size Array Elements Parallel Execution Speedup Efficiency Array (NxN) per Processor Time (seconds) 32x32 64x64 4 0.0229 48.3 0.0472 32x32 96 X 96 9 0.0410 89.6 0.0875 32x32 128x 128 16 0.0648 133.2 0.1301

32x32 256 X 256 64 0.2328 293.0 0.2861 32x32 384x384 144 0.5574 411.3 0.4017 32x32 512x512 256 1.1016 492.3 0.4808 32x32 640 X 640 400 1.8891 560.0 0.5468 32x32 768 X 768 576 3.0024 608.3 0.5941

32x32 896 X 896 784 4.4842 646.4 0.6312

32x32 1024 X 1024 1024 6.3887 676.9 0.6611 32x32 1152X1152 1296 8.7684 702.0 0.6856 32x32 1280X 1280 1600 11.6805 722.7 0.7058 32x32 1408 X 1408 1936 15.1791 740.0 0.7227 32x32 1536 X 1536 2304 19.3157 754.9 0.7372 32x32 1664 X 1664 2704 24.1557 767.3 0.7493

32x32 1792 X 1792 3136 29.7353 778.4 0.7602 103

Table A.13. Gauss-Jordan Elimination timings on a 64 x 64 MasPar MP-2. Processor Array Size Airay Elements Parallel Execution Speedup Efficiency Array (NxN) per Processor Time (seconds) 64x 64 128 X 128 4 0.0271 174.2 0.0425 64x 64 192 X192 9 0.0447 326.0 0.0796 64 x 64 256 X 256 16 0.0645 488.3 0.1192 64 x 64 512x512 64 0.1743 1097.1 0.2678 64x 64 768x768 144 0.3383 1558.6 0.3805 64x64 1024X1024 256 0.5644 1888.0 0.4609 64 x 64 1280 X 1280 400 0.8624 2152.1 0.5254 64x64 1534X 1534 576 1.2379 2346.3 0.5728 64 x 64 1792 X 1792 784 1.7028 2500.7 0.6105 64x 64 2048 X 2048 1024 2.2627 2625.4 0.6410 64 X 64 2304x2304 1296 2.9262 2728.7 0.6662 64x64 2560 X 2560 1600 3.7036 2815.9 0.6875 64x 64 2816x2816 1936 4.6004 2889.5 0.7054 64x 64 3072 X 3072 2304 5.6279 2952.0 0.7207 64x64 3328 X 3328 2704 6.7931 3006.8 0.7341 64x 64 3584x3584 3136 8.1043 3053.9 0.7456 64x64 3840x3840 3600 9.5684 3094.7 0.7555 64x64 4096 x 4096 4096 10.7004 3131.5 0.7645 64x64 4352x4352 4624 12.4360 3164.4 0.7726 64x64 4608 x 4608 5184 14.3476 3193.7 0.7797 64x 64 4864 x 4864 5776 16.4430 3219.9 0.7861 64x64 5120x5120 6400 18.7307 3243.7 0.7919 64x64 5376x5376 7056 21.2197 3265.6 0.7973 64x64 5632x5632 7744 23.9158 3285.3 0.8021 64x64 5888x5888 8464 26.8306 3303.5 0.8065 64x 64 6144x6144 9216 29.9716 3320.2 0.8106 64x64 6400 x 6400 10000 33.6470 3335.7 0.8144 64x64 6656x6656 10816 36.9637 3350.0 0.8179 64x64 6912x6912 11664 40.8327 3363.2 0.8211 64x64 7168x7168 12544 44.95872 3375.4 0.8241 104

Table A.14. Fast Fourier Transform timings on a 32 x 32 MasPar MP-2. Processor LogjG^umber of Elements per Parallel Execution Speedup Efficiency Array Elements) Processor Time (seconds)

32x32 12 4 0.0035 494.5 0.4829 32x32 13 8 0.0072 511.2 0.4992 32x32 14 16 0.0147 525.7 0.5133 32x32 15 32 0.0304 537.5 0.5249 32x32 16 64 0.0627 548.2 0.5353 32x32 17 128 0.1292 557.8 0.5447 32x32 18 256 0.2663 566.6 0.5533 32x32 19 512 0.5484 574.7 0.5612 32x32 20 1024 1.1287 582.3 0.5686 32x32 21 2048 2.3213 589.4 0.5756 32x32 22 4096 4.7707 596.1 0.5822

Table A.15. Fast Fourier Transform timings on a 64 x 64 MasPar MP-2. Processor LogjCNumber of Elements per Parallel Execution Speedup Efficiency Array Elements) Processor Time (seconds)

64x 64 14 4 0.0051 1518.0 0.3706 64x64 15 8 0.0104 1573.0 0.3840 64x64 16 16 0.0212 1620.8 0.3957 64x 64 17 32 0.0433 1664.9 0.4065 64x64 18 64 0.0885 1705.3 0.4163 64x64 19 128 0.1808 1743.3 0.4256 64x 64 20 256 0.3694 1779.0 0.4343 64x64 21 512 0.7547 1812.9 0.4426 64x 64 22 1024 1.5413 1845.2 0.4505 64x64 23 2048 3.1466 1876.1 0.4580 64x64 24 4096 6.4212 1905.7 0.4652 105

Table A. 16. Percentage error of CMP scalability function for Gauss-Jordan Elimination on a MasPar MP-1 computer with 10 times faster processors. Jp Memory Used Per Processor, p IK 2K 4K 8K 16 K 32 K 32 1751.6% 2253.0% 2970.8% 3992.1% 5440.8% 7492.5% 64 882.5% 1131.5% 1489.1% 1998.9% 2722.6% 3748.1% 128 445.3% 568.6% 746.7% 1001.0% 1362.5% 1874.9% 256 225.6% 286.4% 374.8% 501.6% 682.0% 938.0% 512 115.2% 144.9% 188.6% 251.6% 341.6% 469.4% 1024 59.8% 74.0% 95.4% 126.6% 171.3% 235.1% 2048 31.9% 38.4% 48.7% 64.0% 86.2% 117.9% 4096 17.9% 20.6% 25.3% 32.7% 43.6% 59.3% 8192 10.9% 11.7% 13.7% 17.0% 22.3% 30.0% 16,384 7.4% 7.2% 7.8% 9.2% 11.6% 15.3%

Table A. 17. Percentage error of CMP scalability function for Gauss-Jordan Elimination on a MasPar MP-1 computer with 50 times faster processors. /P Memory Used Per Processor, P IK 2K 4K 8K 16 K 32 K 32 618.9% 710.4% 847.7% 1047.6% 1334.2% 1742.3% 64 315.3% 359.3% 426.8% 525.9% 668.6% 872.3% 128 161.4% 182.4% 215.3% 264.3% 335.3% 436.8% 256 83.5% 93.2% 109.1% 133.2% 168.4% 218.9% 512 44.1% 48.3% 55.7% 67.4% 84.8% 109.9% 1024 24.2% 25.7% 28.9% 34.5% 42.9% 55.3% 2048 14.1% 14.3% 15.5% 18.0% 22.0% 28.0% 4096 9.0% 8.5% 8.7% 9.7% 11.5% 14.4% 8192 6.5% 5.6% 5.3% 5.5% 6.2% 7.5% 16,384 5.2% 4.2% 3.6% 3.4% 3.6% 4.1% 106

Table A. 18. Percentage error of CMP scalability function for Gauss-Jordan Elimination on a MasPar MP-1 computer with ICQ times faster processors. Memory Used Per Processor, P 1 K 2K 4K 8K 16 K 32 K 32 477.3% 517.6% 582.3% 679.5% 820.8% 1023.5% 64 244.4% 262.8% 294.0% 341.8% 411.9% 512.8% 128 125.9% 134.1% 148.9% 172.2% 206.9% 257.1% 256 65.8% 69.0% 75.9% 87.1% 104.2% 129.0% 512 35.3% 36.2% 39.1% 44.4% 52.7% 64.9% 1024 19.8% 19.6% 20.6% 23.0% 26.9% 32.9% 2048 11.9% 11.2% 11.3% 12.2% 13.9% 16.8% 4096 7.9% 7.0% 6.7% 6.8% 7.5% 8.7% 8192 5.9% 4.9% 4.3% 4.1% 4.2% 4.7% 16,384 4.9% 3.8% 3.1% 2.7% 2.6% 2.7%

Table A.19. Percentage error of CMP scalability function for Fast Fourier Transform on a MasPar MP-1 computer with 10 times faster processors. V? Memory Used Per Processor, P 1 K 2K 4K 8K 16 K 32 K 32 48.6% 52.6% 56.5% 60.5% 64.3% 68.2% 64 21.6% 23.8% 26.0% 28.1% 30.2% 32.3% 128 6.9% 8.2% 9.4% 10.7% 11.9% 13.0% 256 -0.9% -0.1% 0.7% 1.4% 2.2% 2.9% 512 -4.9% -4.3% -3.8% -3.3% -2.8% -2.3% 1024 -6.8% -6.4% -6.0% -5.6% -5.3% -4.9% 2048 -7.6% -7.3% -7.0% -6.7% -6.4% -6.1% 4096 -7.8% -7.6% -7.3% -7.1% -6.8% -6.6% 8192 -7.8% -7.5% -7.3% -7.1% -6.9% -6.7% 16,384 -7.6% -7.4% -7.2% -7.0% -6.8% -6.7% 107

Table A,20. Percentage error of CMP scalability function for Fast Fourier Transform on a MasPar MP-1 computer with 50 times faster processors. Jp Memory Used Per Processor, P 1 K 2K 4K 8K 16 K 32 K 32 -1.6% -0.3% 1.0% 2.1% 3.3% 4.4% 64 -6.0% -5.2% -4.4% -3.6% -2.8% -2.1% 128 -8.2% -7.6% -7.0% -6.4% -5.9% -5.4% 256 -9.1% -8.6% -8.2% -7.7% -7.3% -7.0% 512 -9.3% -8.9% -8.5% -8.2% -7.9% -7.6% 1024 -9.1% -8.8% -8.5% -8.2% -8.0% -7.7% 2048 -8.8% -8.6% -8.3% -8.1% -7.8% -7.6% 4096 -8.5% -8.2% -8.0% -7.8% -7.6% -7.4% 8192 -8.1% -7.9% -7.7% -7.5% -7.3% -7.1% 16,384 -7.7% -7.5% -7.4% -7.2% -7.0% -6.9%

Table A.21. Percentage error of CMP scalability function for Fast Fourier Transform on a MasPar MP-1 computer with 100 times faster processors. JP Memory Used Per Processor, p IK 2K 4K 8K 16 K 32 K 32 -7.8% -6.9% -6.0% -5.1% -4.3% -3.5% 64 -9.5% -8.8% -8.2% -7.6% -7.0% -6.4% o 128 1 -9.5% -9.1% -8.6% -8.1% -7.7% 256 -10.1% -9.7% -9.3% -8.9% -8.5% -8.2% 512 -9.8% -9.5% -9.1% -8.8% -8.5% -8.2% 1024 -9.4% -9.1% -8.8% -8.6% -8.3% -8.1% 2048 -9.0% -8.7% -8.5% -8.2% -8.0% -7.8% 4096 -8.6% -8.3% -8.1% •7.9% -7.7% -7.5% 8192 -8.1% -7.9% -7.7% -7.5% -7.4% -7.2% 16,384 -7.8% -7.6% -7.4% -7.2% -7.1% -6.9% 108

Table A.22. Percentage error of CMP scalability function for Gauss-Jordan Elimination on a MasPar MP-1 computer with 10 times faster communication. VP Memory Used Per Processor, P IK 2K 4K 8K 16 K 32 K 32 141920.9% 193155.2% 265708.2% 368382.2% 513633.6% 719084.5% 64 71074.7% 96680.1% 132948.3% 184279.5% 256901.0% 359623.6% 128 35574.6% 48371.8% 66502.0% 92164.9% 128473.7% 179833.6% 256 17801.6% 24197.4% 33260.5% 46090.5% 64243,9% 89923.2% 512 8907.6% 12103.9% 16634.2% 23048.4% 32124.5% 44963.7% 1024 4457.9% 6054.9% 8319.3% 11525.8% 16063.5% 22482.8% 2048 2231.8% 3029.5% 4161.1% 5764.0% 8032.5% 11242.0% 4096 1118.3% 1516.5% 2081.8% 2882.9% 4016.9% 5621.4% 8192 561.3% 759.8% 1042.0% 1442.2% 2009.0% 2811.1% 16,384 282.7% 381.3% 522.0% 721.8% 1005.0% 1405.9%

Table A.23. Percentage error of CMP scalability fianction for Gauss-Jordan Elimination on a MasPar MP-1 computer with 50 times faster communication. JP Memory Used Per Processor, P 1 K 2K 4K 8K 16K 32 K 32 708261% 964477% 1327273% 1840665% 2566938% 3594203% 64 354679% 482735% 664096% 920767% 1283885% 1797504% 128 177511% 241516% 332180% 460504% 642055% 898859% 256 88816% 120807% 166132% 230288% 321060% 449459% 512 44433% 60423% 83081% 115157% 160540% 224739% 1024 22228% 30220% 41547% 57583% 80274% 112373% 2048 11120%) 15115% 20777% 28794% 40139% 56188% 4096 5564% 7560% 10391% 14399% 20071% 28095% 8192 2785% 3782% 5197% 7200% 10036% 14048% 16,384 1395% 1893% 2600% 3601% 5019% 7024% 109

Table A.24, Percentage error of CMP scalability function for Gauss-Jordan Elinnination on a MasPar MP-1 computer with 100 times faster communication. Jp Memory Used Per Processor, P 1 K 2K 4K 8K 16 K 32 K 32 1416187% 1928629% 2654230% 3681019% 5133568% 7188102% 64 709185% 965304% 1328032% 1841375% 2567614% 3594855% 128 354932% 482946% 664278% 920927% 1284031% 1797640% 256 177584% 241570% 332221% 460535% 642080% 898879% 512 88839% 120822% 166140% 230292% 321060% 449457% 1024 44441% 60427% 83082% 115155% 160538% 224735% 2048 22231% 30221% 41547% 57582% 80272% 112370% 4096 11122% 15115% 20777% 28793% 40138% 56186% 8192 5565% 7560% 10390% 14398% 20070% 28094% 16,384 2785% 3782% 5197% 7200% 10036% 14047%

Table A.25. Percentage error of CMP scalability function for Fast Fourier Transform on a MasPar MP-1 computer with 10 times faster communication. V? Memory Used Per Processor, ^ IK 2K 4K 8K 16 K 32 K 32 6256.6% 6596.6% 6936.7% 7276.8% 7617.0% 7957.2% 64 3438.7% 3608.9% 3779.1% 3949.3% 4119.6% 4289.8% 128 1871.9% 1957.2% 2042.4% 2127.7% 2212.9% 2298.2% 256 1009.8% 1052.6% 1095.3% 1138.1% 1180.9% 1223.6% 512 539.5% 561.1% 582.6% 604.1% 625.6% 647.1% 1024 284.9% 295.8% 306.7% 317.6% 328.4% 339.3% 2048 148.0% 153.6% 159.1% 164.7% 170.2% 175.7% 4096 74.9% 77.8% 80.6% 83.5% 86.3% 89.2% 8192 36.0% 37.6% 39.1% 40.6% 42.1% 43.6% 16,384 15.6% 16.4% 17.3% 18.1% 18.9% 19.7% no

Table A.26. Percentage error of CMP scalability function for Fast Fourier Transform on a MasParMP-l computer with 50 times faster communication. Jp Memory Used Per Processor, P 1 K 2K 4K 8K 16 K 32 K 32 31339.5% 33037.0% 34735.1% 36433.6% 38132.6% 39832.0% 64 17245.5% 18094.3% 18943.4% 19792.7% 20642.3% 21492.0% 128 9407.3% 9831.9% 10256.6% 10681.4% 11106.2% 11531.2% 256 5093.2% 5305.7% 5518.2% 5730.7% 5943.3% 6155.9% 512 2739.0% 2845.4% 2951.8% 3058.2% 3164.6% 3271.0% 1024 1463.5% 1516.9% 1570.2% 1623.5% 1676.8% 1730.1% 2048 776.7% 803.5% 830.3% 857.0% 883.8% 910.5% 4096 408.9% 422.4% 435.9% 449.4% 462.8% 476.3% 8192 212.9% 219.7% 226.6% 233.4% 240.2% 247.0% 16,384 108.9% 112.4% 115.9% 119.4% 122.9% 126.4%

Table A.26. Percentage error of CMP scalability function for Fast Fourier Transform on a MasPar MP-1 computer with 100 times faster communication. JP Memory Used Per Processor, P 1 K 2K 4K 8K 16 K 32 K 32 62693.1% 66087.5% 69483.1% 72879.7% 76277.2% 79675.5% 64 34503.9% 36201.1% 37898.8% 39597.0% 41295.6% 42994.7% 128 18826.5% 19675.3% 20524.2% 21373.4% 22222.8% 23072.4% 256 10197.6% 10622.1% 11046.7% 11471.4% 11896.3% 12321.1% 512 5488.4% 5700.8% 5913.3% 6125.8% 6338.3% 6550.9% 1024 2936.8% 3043.1% 3149.5% 3255.9% 3362.2% 3468.6% 2048 1562.6% 1615.9% 1669.2% 1722.5% 1775.8% 1829.1% 4096 826.5% 853.2% 880.0% 906.7% 933.5% 960.2% 8192 434.0% 447.5% 460.9% 474.4% 487.8% 501.3% 16,384 225.6% 232.4% 239.2% 246.0% 252.8% 259.6% Ill

APPENDIX B. MASPAR MPL CODES

B.1. Cannon's Matrix Multiplication Code

The files for the Cannon's Matrix Multiplication code are:

makefile - for the matrix multiplication
mmult.m - contains the main function that inputs the matrices, calls mx_mult() to perform the actual matrix multiplication, and outputs the resulting matrix
matrix.m - contains the function that actually computes the matrix multiplication
mtxio.m - contains the matrix I/O functions
dotprod.h - contains the macros to perform the dot product
matrix.h - contains macros for broadcasting to the diagonal, summing to the diagonal, and shifting the submatrices in all directions
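Before the listings, a small sequential C program may help illustrate the systolic schedule that Cannon's algorithm follows. Written here for one matrix element per PE (the MPL code below works on submatrix blocks instead), PE (i, j) holds A[i][(i+j+k) mod N] and B[(i+j+k) mod N][j] at step k after the initial row and column skew, multiplies them, and accumulates the product into C[i][j]. The program is only an illustration of that index schedule, not the MasPar code; the 4 x 4 size and the test data are made up for the example.

#include <stdio.h>

#define N 4   /* pretend N x N PE array, one matrix element per PE */

int main(void)
{
    double A[N][N], B[N][N], C[N][N];
    int i, j, k;

    /* arbitrary test data: A[i][j] = i+1, B[i][j] = j+1 */
    for (i = 0; i < N; i++)
        for (j = 0; j < N; j++) {
            A[i][j] = i + 1;
            B[i][j] = j + 1;
            C[i][j] = 0.0;
        }

    /* Cannon's schedule: after the initial skew and k circular shifts,
       PE (i,j) holds A[i][(i+j+k)%N] and B[(i+j+k)%N][j], so step k
       contributes exactly one term of the dot product for C[i][j].   */
    for (k = 0; k < N; k++)
        for (i = 0; i < N; i++)
            for (j = 0; j < N; j++) {
                int m = (i + j + k) % N;
                C[i][j] += A[i][m] * B[m][j];
            }

    /* with this test data C[i][j] should equal N*(i+1)*(j+1) */
    printf("C[0][0] = %g (expected %d)\n", C[0][0], N);
    printf("C[3][2] = %g (expected %d)\n", C[3][2], N * 4 * 3);
    return 0;
}

Running all N steps sweeps m over every value 0 .. N-1, so each C[i][j] receives the full dot product of row i of A and column j of B.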

##########################################################################
#  makefile for matrix multiplication
##########################################################################
.SUFFIXES: .o .m .c

MPFLAGS = -Zq -Zn -nohprofile -Omax

mm: matrix.h dotprod.h matrix.S mmult.m
	mpl_cc $(MPFLAGS) matrix.S -o mm mmult.m; mplimit mm pmem 16k

io.o: io.m
	mpl_cc $(MPFLAGS) -c io.m

mtxio.o: mtxio.m
	mpl_cc $(MPFLAGS) -c mtxio.m

File: mmulLm Programmer: Jeff Clary (Modified by Mark Fienup)

/* This file contains the main function that controls the input of */ /* matrices to multiply, the output of the result, and the cdling of */ /* mx_mult() to do the actual multiplication, */ y;ic4()it:ic:ie4eic9|t>it:)c;ic:ie^4s**!i<:ic*)l<:ic9ie*:ic**9icNc:ic!ic4ciic4c>|c:ic:|c:ic^ #include "matrix.h" main (argc, argv) int argc; char *argv[]; { MATRIX A, B, C;

double elapsed; FILE *afile, *bfile, *cfile; register int mlenl, mlen2; /* Matrix length */ /* for mp-1 mlenl =2048; */

/* for mp-2 */ mlenl =4096;

if (open_files(&afile, &bfile, &cfile, argc, argv) < 0) exit(-l);

/* Read matrix A */ dpuTimerStartO; if ((mlenl = mx_bread(afile, A)) < 0) exit(-l); elapsed = dpuTimerElapsedO;

printf("\n%d x %d MATRIX MULTIPLY ON %d x %d PE ARRAYNnXn", mlenl, mlenl, nxproc, nyproc);

printfC Reading A matrix: %10.41f sec.Nn", elapsed);

/* Read matrix B */ dpuTimerStartO; 114

if ((mlen2 = mx_bread(bfile, B)) < 0) exit(-l); elapsed = dpuTimerEIapsedQ; printf(" Reading B matrix: %10.41f sec.Nn", elapsed);

if (mlenl != nilen2) { printfC'mmult: Matrix sizes (%d, %d) do not matchVn", mlenl, mlen2); exit(-l); }

/* Perform the matrix multiplication */ if (mx_mult(A, B, C, mlenl) < 0) exit(-l);

/* Write the resulting matrix C */ dpuTimerStartO; mx_bwrite(cfile, C, mlenl); elapsed = dpuTimerElapsed(); printf(" Writing C matrix; %10.41f sec.Nn", elapsed);

fclose(afile); fclose(bfile); fclose(cfile); */ ] 115

File; matrix.m Programmer: Jeff Clary (Modified by Mark Fienup)

/* This file contains the function that does the actual matrix */ /* multiplication. */

#include "matrix.h" #include "dotprod.h"

/* */ /* This function computes the matrix product C = A * B. */ /* (Systolic Version ~ Algorithm II) */ /* Note that the shifting of A and B necessary for the systolic alg. */ /* is performed by this function, so all three matrices are expected */ /* to be in normal position, i.e. block decomposed. After the */ /* routine returns, A and B will NOT be in that normal position. */ /* */ mx_mult(A, B, C, mien) plural ELEM *A, *B, *C; int mien; { register plural ELEM ctmp; register plural ELEM *arow, *bcol; register plural ELEM *cptr;

double elapsed; register unsigned iter,i; register unsigned j; register unsigned blen = mien / nxproc; unsigned bsize = blen * blen;

/• ZERO OUT C MATRIX */ for (j=bsize, cptr=C; j; j-, cptr-H-) *cptr = 0.0;

/* SHIFT A SO THAT DIAGONAL ELEMENTS ARE AT RIGHT EDGE */ for(j=nyproc;j>0;j-) { if(iyproc

}

/* SHIFT B SO THAT DIAGONAL ELEMENTS ARE AT BOTTOM EDGE */ for (j=nxproc; j>0; j~) { if (ixproc

/* FILE: mtxio.m */ I* *! I* This file contains routines for the input/output of matrices on •/ /* the MasPar, where the matrices are scatter or block decomposed. */ ^C)|C )|c )|c 9|c )|( )|o|C sjc^c ^C^CSjC ^C9|C)jC )|C^C 9|c9^ 3|( 3^ J #include #include #include

/* *! /* Function: mtx_alloc() (Matrix allocation) */ j* *1 /* */ plural char *mtx_alloc(rows, cols, elemsize) unsigned rows, cols, elemsize; { unsigned brows = 0; unsigned bcols = 0;

brows = rows/nyproc + (rows%nyproc ? 1 :0); bcols = cols/nxproc + (cols%nxproc ? 1:0);

return p_malloc(brows*bcols*elemsize); } chk_decomp(row, col) char row, col; { if ((row != 'b' && row != 's') II (col != 'b' && col != 's')) { fprintf(stderr, "mtx_ardf: 'b' or's' decomposition type requiredSn"); return -1; } return 0; } set_indices(ip, jp, offset, row_d, col_d, i, j, brows, bcols) unsigned *ip, *jp, *offset; unsigned i, j, brows, bcols; char row_d, coLd; { *ip = (row_d='b') ? i/brows: i%nyproc; 118

*jp = (col_d=='b') ? j/bcols: j%nxproc; •offset = ((row_d=='b') ? (i%brows)*bcols : (i/nyproc)*bcols) + ((col_d=='b') ? j%bcols: j/nxproc); }

/* */ /* Function: mtx_brdf() (Matrix binary read float) */ /* */ /* This function reads a binary array onto the MasPar anay using */ /* the decomposition scheme specified by row_decomp and col_decomp, */ /* where's' means scatter and 'b' means block decomposition. */ /• */ mtx_brdf (fp, m, rows, cols, row_decomp, col_decomp, alloc_fIag) FILE *fp; plural float **m; unsigned *rows; unsigned *cols; char row_decomp; char coLdecomp; int alloc_flag; { unsigned brows, bcols; unsigned i, ip; unsigned j,jp; unsigned offset; float elem;

if (chk_decomp(row_decomp, col_decomp) < 0) return -1;

if (fread(rows, sizeof(*rows), 1, fp)!=l) { ^rintf(stderr, "mtx_arbf; Error reading matrix rowsVn"); return -1; } if (fread(cols, sizeof(*cols), 1, ^)!=1) { ^rintf(stderr, "mtx_arbf: Error reading matrix rows\n"); return -1; }

brows = *rows/nyproc; bcols = *cols/nxproc; 119

if (alloc_£lag) if ((*m=(plural float *)mtx_alloc(*rows,*cols,sizeof(float)))=NULL) return -1;

for (i=0; i<*rows; i++) { for (j=0; j<*cols; j-H-) { set_indices(&ip, «Syp, &offset, row_deconip, col_decomp, i, j, brows, bcols); if (fread(&elem, sizeof(elem), 1,^)!=1) { ^rintf(stderr, "mtx_ardf: Error reading elem %d, %dVn", i,j); return -1; } proc[ip][jp].((*ni)[offset]) = elem; } } return 0; }

I* *! /* Function: mtx_bwtf() (Matrix binary write float) */ /* *j /* This function writes a binary array from the MasPar array using */ /* the decomposition scheme specified by row_decomp and col_decomp, */ /* where's' means scatter and 'b' means block decomposition. */ I* *! mtx_bwtf (fjp, m, rows, cols, row_decomp, col_decomp) HLE *fp; plural float *m; unsigned rows; unsigned cols; char row_decomp; char coLdecomp; { unsigned brows, bcols; unsigned i, ip; unsigned j,jp; unsigned offset; float elem; 120 brows = rows/nyproc; bcols = cols/nxproc; if (chk_deconip(row_decomp, coLdecomp) < 0) return -1; if (fwrite(&rows, sizeof(rows), 1, fp)!=l) { ^rintf(stderr, "mtx_bwtf: error writing rowsNn"); return -1; } if (fwrite(&cols, sizeof(rows), 1, iip)!=l) { fprintf(stderr, "mtx_bwtf: error writing colsNn"); return -1; } for (i=0; i

File: matrix.h i|c:|c**9|c:ic9|c:|c:|e*9|c:ic*:ie!iciicii<*>)i<****>lc4::ic!ic*:ic*i|c*!k!|c:ic9ic>|c^ #include #include #include #include void dpuTimerStartO; unsigned dpuTimerTicks(); double dpuTimerConstO; double dpuTimerElapsedQ;

#define ELEMFMT"%f" #define ELEM float

/* For running maximum size problems on the MP-1, BUFSIZE should be */ /* 1024. For running maximum size problems on the MP-2, BUFSIZE */ /* should be 4096. (Except Alg3 must be compiled with BUFSIZE less) */

/* Defined to get MP-2 declarations */ #defineJC_MP2 f* #ifdefJC_MP2 #define BUFSIZE 4096 #defineBUFSIZE2 3136 #else #define BUFSIZE 1024 #defineBUFSIZE2 576 #endif *1

#define BUFSIZE 4096 #defineBUFSIZE2 3136 typedef plural ELEM MATRIX[BUFSIZE]; /* #ifdef_MPL typedef plural ELEM MATRIX[BUFSIZE]; #else typedef ELEM MATRIX[BUFSIZE]; 122

#enciif *1 double stopwatchO; int open_files(); int mx_bread(); int mx_bwrite(); int mx_mult();

#define broaddiag(Xto, Xfrom, size, mask) \ { \ register plural ELEM x 1, x2, x3; \ register plural ELEM *from, *to; \ register int i; \ from = Xfrom; \ to = Xto; \ xl=*from++; \ for (i=(size-1 )»1; i; i~) \ { \ x2 = *from++; \ if (mask) xnetcE[nxproc].x 1 = x 1; \ x3 = *from-H-; \ *to++ = xl; \ if (mask) xnetcE[nxproc].x2 = x2; \ *to++ = x2; \ xl=x3; \ } \ if (!(size «& 0x01)) \ { \ x2 = *from; \ if (mask) xnetcE[nxproc].x 1 = x 1; \ *to++ = xl; \ if (mask) xnetcE[nxproc].x2 = x2; \ *to = x2; \ } \ else \ { \ if (mask) xnetcE[nxproc].xl = xl; \ •to = xl; \ ) \ } 123

j* *! /* Macro: sum_to_diag(term,diag,i,diagidx) */ I* *! I* The macro sums into the column with ofFidx==0 */ /* */ /* plural unsigned diagidx = (nxproc+ixproc-iyproc)%nxproc; */ /* must be supplied by the caller but is not modified. */ I* *! /* This macro requires all PE's active on entry. */ /* *1 #define sum_to_diag(term,offidx) \ { \ register int i; \ for (i=l; i

/* /* Macro: nc_sum_to_diag(term,diag,i,diagidx) */ /* *f /* This macro simulates the non-communication functions of */ /* sum_to_diag. */ j* #define nc_sum_to_diag(term,offidx) \ { \ register int i; \ for (i=l; i

j* *! #defme shiftN(X,size) \ { \ register plural ELEM x 1, x2, x3; \ register plural ELEM *from, *to; \ register int i; \ from = to = X; \ xl = *froni*H-; \ for (i=(size-1)»1; i; i~) \ { \ x2 = *from++; \ xnetN[l],xl =xl; \ x3 = *firom++; \ *to++ = xl; \ xnetN[l].x2 = x2; \ *to++ = x2; \ xl = x3; \ } \ if (! (size & 0x01)) \ { \ x2 = •from; \ xnetN[l],xl =xl; \ *to++ = xl; \ xnetN[l].x2 = x2; \ *to = x2; \ } \ else \ { \ xnetN[l].xI = xl; \ *to = xl; \ } \ }

#define shiftS(X,size) \ { \ register plural ELEM xl, x2, x3; register plural ELEM *from, *to; register int i; \ from = to = X; \ xl = *from++; \ for (i=(size-1)»1; i; i~) \ { \ x2 = *from++; \ xnetS[l].xl = xl; \ x3 = *from++; \ *to++ = xl; \ xnetS[l].x2 = x2; \ *to++ = x2; \ xl= x3; \ } \ if ('.(size & 0x01)) \ { \ x2 = *from; \ xnetS[l].xl = xl; \ *to++ = xl; \ xnetS[l].x2 = x2; \ *to = x2; \ } \ else \ { \ xnetS[l].xl = xl; \ *to = xl; \ } \ }

#define shiftE(X,size) \ { \ register plural ELEM xl, x2, x3; register plural ELEM *from, *to; register int i; \ from = to = X; \ xl = *from-H-; \ for (i=(size-1)»1; i; i~) \ { \ x2 = *from++; \ xnetE[l].xl = xl; \ x3 = *firom++; \ *to++ = xl; \ xnetE[ll.x2 = x2; \ *to++ = x2; \ xl= x3; \ } \ if (!(size & 0x01)) \ { \ x2 = *from; \ xnetE[l].xl = xl; \ *to++ = xl; \ xnetE[l].x2 = x2; *to = x2; \ } \ else \ { \ xnetE[l].xl = xl; *to = xl; \ } \ }

#define shiftW(X,size) \ { \ register plural ELEM xl, x2, x3; register plural ELEM *from, *to; register int i; \ from = to = X; \ xl = *from++; \ for (i=(size-l)»l; i; i-) \ { \ x2 = *from++; \ xnetW[l].xl = xl; \ x3 = *from++; \ *to++ = xl; \ xnetW[l].x2 = x2; \ *to++ = x2; \ xl =x3; \ } \ if (! (size & 0x01)) \ { \ x2 = *from; \ xnetW[l],xl = xl; \ *to++ = xl; \ xnetW[l].x2 = x2; \ *to = x2; \ } \ else \ { \ xnetW[l].xl =xl; \ *to = xl; \ } \ } 127

File: dotprod.h

/* This file contains macros for calculating the dot product of */
/* vectors A and B (of length blen) into c.                     */

/* Vanilla version ~ no special optimization */ #define dotprod(c,A,B,blen) \ { \ register plural ELEM *aptr, *bptr; \ register int i; \ aptr = A; \ bptr = B; \ c = 0.0; \ for (i=blen; i; i~) \ { \ c += (*aptr) * (*bptr); \ aptr-H-; \ bptr += blen; \ } \ }

/* First crack at memory overlap optimization */ #define dotprod1 (c, A,B,blen) \ { \ register plural ELEM *aptr, *bptr; \ register plural ELEM al, a2, bl, b2; \ register int i; \ aptr = A; \ al = *aptr; \ bptr = B; \ bl = *bptr; \ c = 0.0; \ for (i=(blen-l); i; i-) \ { \ aptr++; \ a2 = *apti:; \ bptr += blen; \ b2 = *bptr; \ c+=al*bl; \ al = a2; \ 128

bl=b2; \ } \ c += al * bl; \ }

/* Second crack at memory overlap optimization ~ depth 2 unroll */ /* NOTE that this routine assumes blen is even. */ #define dotprod2(c,A,B,blen) \ { \ register plural ELEM *aptr, *bptr; \ register plural ELEM al, a2, a3, bl, t register int i; \ aptr = A; \ bptr = B; \ al = *aptr; \ bl =*bptr; \ aptr++; \ bptr += blen; \ c = 0.0; \ for (i=(blen-l)»l; i; i-) { \ a2 = *aptr; \ b2 = *bptr; \ aptr++; \ bptr += blen; \ c+=al * bl; \ a3 = *aptr; \ b3 = *bptr; \ aptr-H-; \ bptr += blen; \ c += a2 * b2; \ al = a3; \ bl =b3; \ } \ a2 = *aptr; \ b2 = *bptr; \ c += al * bl; \ c += a2 * b2; \ }

/* Third crack at memory overlap optimization - depth 4 unroll */ /* NOTE that this routine assumes blen is divisible by 4. */ #define dotprod3(c,A,B.blen) \ 129

{ \ register plural ELEM *aptr, *bptr; \ register plural ELEM al, a2, a3, a4; \ register plural ELEM b 1, b2, bS, b4; \ register int i; \ aptr = A; \ bptr = B; \ al = *aptr; \ bl = *bptr; \ aptr++; \ bptr += blen; \ a2 = *aptr; \ b2 = *bptr; \ aptr++; bptr += blen; \ c =0.0; \ for (i=(blen-l)»2; i; i~) { aS = *aptr; \ bS = *bptr; \ aptr++; \ bptr += blen, \ a4 = *aptr; \ b4 = *bptr; \ aptr++; \ bptr += blen \ c += al * bl \ c += a2 * b2 \ al = *aptr; \ bl = *bptr; \ aptr++; \ bptr += blen; \ a2 = *aptr; \ b2 = *bptr; \ aptr++; \ bptr += blen \ c += a3 * b3 \ c += a4 * b4 \ } \ a3 = *aptr; \ b3 = *bptr; \ aptr++; \ bptr += blen; \ 130

a4 = *aptr; \ b4 = *bptr; \ c += al * bl; \ c += a2 * b2; \ c += a3 * b3; \ c += a4 * b4; \

/* This version of dotprod moves each A element to the west */ /* after it is used */ #define dotprod3A(c,A,B,blen) \ { \ register plural ELEM *aptr, *bptr; \ register plural ELEM al, a2, a3, a4; \ register plural ELEM bl, b2, b3, b4; \ register int i; \ aptr = A; \ bptr = B; \ al = *aptr; \ bl = *bptr; \ aptr++; \ bptr += blen; \ a2 = *aptr; \ b2 = *bptr; \ aptr++; \ bptr += blen; \ c = 0.0; \ for (i=(blen-l)»2; i; i-) { a3 = *aptr; \ b3 = *bptr; \ aptr++; \ bptr += blen; \ a4 = *aptr; \ b4 = *bptr; \ aptr++; bptr += blen; \ c+=al * bl; \ c += a2 * b2; \ xnetW[l].al =al; \ *(aptr-4) = al; \ xnetW[l].a2 = a2; \ *(aptr-3) = a2; \ 131

al = *aptr; \ bl = *bptr; \ aptr-H-; \ bptr += blen; \ a2 = *aptr; \ b2 = *bptr; \ aptr++; \ bptr += blen; \ c += a3 * b3; \ c += a4 * b4; \ xnetW[l].a3 = a3; \ *(aptr-4) = a3; \ xnetW[l].a4 = a4; \ *(aptr-3) = a4; \ } \ a3 = *aptr; \ b3 = *bptr; \ aptr++; \ bptr += blen; \ a4 = *aptr; \ b4 = *bptr; \ c += al * bl; \ c += a2 * b2; \ c += a3 * b3; \ c += a4 * b4; \ xnetW[l].al =al; \ *(aptr-3) =al; \ xnetW[l].a2 = a2; \ *(aptr-2) = a2; \ xnetW[l],a3 = a3; \ *(aptr-l) =a3; \ xnetW[l].a4 = a4; \ * II \

/* This version of dotprod moves each B element to the north */ /* after it is used */ #define dotprod3B(c,A,B,blen) \ { \ register plural ELEM *aptr, *bptr; \ register plural ELEM al, a2, a3, a4; \ register plural ELEM b 1,b2,b3,b4; \ 132 register int blen4=blen«2,blen3=(blen«l)+blen;\ register int i; \ aptr = A; \ bptr = B; \ al = *aptr; \ bl = *bptr; \ aptr++; \ bptr += blen; \ a2 = *aptr; \ b2 = *bptr; \ aptr++; \ bptr += blen; \ c =0.0; \ for (i=(blen-l)»2; i; i-) \ { \ a3 = *aptr; \ b3 = *bptr; \ aptr++; \ bptr += blen; \ a4 = *aptr; \ b4 = *bptr; \ aptr-H-; \ bptr += blen; \ c += al * bl; \ c += a2 * b2; \ xnetN[l].bl = bl; \ *(bptr-blen4) = bl; \ xnetN[l].b2 = b2; \ *(bptr-blen3) = b2; \ al = *aptr; \ bl = *bptr; \ aptr-H-; \ bptr += blen; \ a2 = *aptr; \ b2 = *bptr; \ aptr++; \ bptr += blen; \ c += a3 * b3; \ c += a4 * b4; \ xnetN[l].b3 = b3; \ *(bptr-blen4) = b3; \ xnetN[l].b4 = b4; \ *(bptr-blen3) = b4; \ 133

} \ a3 = *aptr; \ b3 = *bptr; \ aptr++; \ bptr += blen; \ a4 = *aptr; \ b4 = *bptr; \ c += al * bl; \ c += a2 * b2; \ c += a3 * b3; \ c += a4 * b4; \ xnetN[l].bl=bl; \ *(bptr-blen3) = bl; \ xnetN[l].b2 = b2; \ *(bptr-(blen«l)) = b2; xnetN[l].b3 = b3; \ *(bptr-(blen)) = b3; \ xnetN[l].b4 = b4; \ *(bptr) = b4; \

/* This version of dotprod moves each B element to the north */ /* and each A element to the west after it is used. */ #define dotprod3AB(c,A,B»blen) \ { \ register plural ELEM *aptr, *bptr; \ register plural ELEM al, a2, a3, a4; \ register plural ELEM bl,b2,b3,b4; \ register int blen4=blen«2,blen3=(blen«l)+blen;\ register int i; \ aptr = A; \ bptr = B; \ al = *aptr; \ bl = *bptr; \ aptr++; \ bptr += blen; \ a2 = *aptr; \ b2 = *bptr; \ aptr++; \ bptr += blen; \ c = 0.0; \ 134 for (i=(blen-l)»2; i; i-) \ { \ a3 = *aptr; \ b3 = *bptr; \ aptr-H-; \ bptr += blen; \ a4 = *aptr; \ b4 = *bptr; \ aptr-H-; \ bptr += blen; \ c += al * bl; \ c += a2 * b2; \ xnetN[l].bl = bl; \ *(bptr-blen4) = bl; \ xnetN[l].b2 = b2; \ *(bptr-blen3) = b2; \ xnetW[l].al = al; \ *(aptr-4) = al; \ xnetW[l].a2 = a2; \ *(aptr-3) = a2; \ al = *aptr; \ bl = *bptr; \ aptr++; \ bptr += blen; \ a2 = •aptr; \ b2 = *bptr; \ aptr-H-; \ bptr += blen; \ c += a3 * b3; \ c -(-= a4 * b4; \ xnetN[l].b3 = b3; \ *(bptr-blen4) = b3; \ xnetN[l].b4 = b4; \ *(bptr-blen3) = b4; \ xnetW[l].a3 = a3; \ *(aptr-4) = a3; \ xnetW[l].a4 = a4; \ *(aptr-3) = a4; \ } \ a3 = *aptr; \ b3 = *bptr; \ aptr-f-f; \ bptr += blen; \ a4 = *aptr; \ b4 = *bptr; \ c += al * bl; \ c += a2 * b2; \ c += a3 * b3; \ c += a4 * b4; \ xnetN[l].bl = bl; \ xnetN[l].b2 = b2; \ xnetN[l].b3 = b3; \ xnetN[l].b4 = b4; \ *(bptr-blen3) = bl; *(bptr-(blen«l)) = b2; *(bptr-blen) = b3; *(bptr) = b4; \ xnetW[l].al = al; \ xnetW[l].a2 = a2; \ xnetW[l].a3 = a3; \ xnetW[l].a4 = a4; \ *(aptr-3) = al; \ *(aptr-2) = a2; \ *(aptr-l) = a3; \ •(aptr) = a4; \ 136

B.2. Gauss-Jordan Elimination Code

The files for the Gauss-Jordan Elimination (with partial pivoting) program are:

makefile - contains a make file to compile the GJE code
linSysSolver.h - contains the necessary includes and function prototypes
main.m - contains the main function that controls the input of matrix A and vector b, the output of the result, and the calling of linSysSolver() to do the actual solving of Ax = b
linsolv.m - contains the function that actually solves the linear equations using Gauss-Jordan Elimination with partial pivoting; in this code the pivot rows are actually swapped
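For reference, the sequential form of the method that linsolv.m distributes over the PE array is sketched below in plain C: Gauss-Jordan elimination with partial pivoting, in which the pivot row is physically swapped into place, normalized, and then used to eliminate the pivot column from every other row. This sketch is only an illustration, not the MPL code; the 3 x 3 test system is made up and has solution x = (2, 3, -1).

#include <stdio.h>
#include <math.h>

#define N 3

int main(void)
{
    /* made-up test system A x = b */
    double a[N][N] = { {  2.0,  1.0, -1.0 },
                       { -3.0, -1.0,  2.0 },
                       { -2.0,  1.0,  2.0 } };
    double b[N]    = { 8.0, -11.0, -3.0 };
    int i, r, c, piv;

    for (i = 0; i < N; i++) {
        /* partial pivoting: pick the row with the largest |a[r][i]|, r >= i */
        piv = i;
        for (r = i + 1; r < N; r++)
            if (fabs(a[r][i]) > fabs(a[piv][i]))
                piv = r;

        /* swap rows i and piv of A, and the matching entries of b */
        for (c = 0; c < N; c++) {
            double t = a[i][c]; a[i][c] = a[piv][c]; a[piv][c] = t;
        }
        { double t = b[i]; b[i] = b[piv]; b[piv] = t; }

        /* normalize the pivot row so the diagonal element becomes 1 */
        { double s = 1.0 / a[i][i];
          for (c = 0; c < N; c++) a[i][c] *= s;
          b[i] *= s; }

        /* eliminate column i from every other row (above and below) */
        for (r = 0; r < N; r++)
            if (r != i) {
                double m = a[r][i];
                for (c = 0; c < N; c++) a[r][c] -= m * a[i][c];
                b[r] -= m * b[i];
            }
    }

    /* A is now the identity, so b holds the solution x */
    for (i = 0; i < N; i++)
        printf("x[%d] = %g\n", i, b[i]);
    return 0;
}

In the parallel code that follows, the same three phases appear: the pivot search becomes a reduction over the processor column (the max_to_row_0 macro), the row swap becomes xnet communication, and the elimination updates are applied to each PE's local block of the matrix.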

lu-oixjui 3- (SOVlddW)$ 03~ldiu ui'opqui :0'0ixitu

i"*o| 3- (SOV'T£ldW)$ 30~icliu woi q-j3AiossXsm -O'ot

3191 luaiud Ajosun jmixidiutiiuduq- wuretu ApsuH o- o'Apgun (SOVTddW)$ I^tu UI'UTCUI o*Aps"n "Mosun

Bojq qojB ^piu tuopuBJ lupi Aps"!! -11®

XBUIQ- apiojtdqou- uz- bz- = SOVTHdlAI 3- 0- :saxHdns' //////////////////////////////////////i/////////////////////////////////////////////1/1////////»/////!/////I/if 3ig33(BUJ# m44llllllllllllllllllllllllllllllllllllt!!lllllllllllllllllllllllllllllllllllllllllllllllllllllllllll/lll/l/llllllllllllllllllll

LSI 138

File: linSysSolver.h Programmer: MarkFienup

#include #include #include #include #include

#define ETYPE float plural char *p_malloc(); char *malloc(); void perrorO; void freeO; void p_free(); double linSysSolverO; int open_files(); int mx_bread(); int mx_bwrite(); int exitO; int atoiQ; void dpuTimerStartO; unsigned long dpuTimerTicks(); double dpuTimerConstO; double dpuTimerElapsedO; 139

File: main.m Programmer: Mark Fienup ^ ^)ic9|Ci|c:ii)ic;|ci|c!ic9|t>iii|t>it9|(9i|iiic:i<)itiit>ii9itaki|<9ic!ic>it9it9i:i|:^!ic%if:^3ki|«N<>ic!kNtif»|c!i:y /* This file contains the main function that controls the input of */ /* matrix A and vector b, the output of the result, and the calling of */ /* linSysSolverO to do the actual solving of Ax = b. */ ^ #include "linSysSolver.h" main (argc, argv) int argc; char *argv[]; { plural ETYPE *a, *x, *b, *tmpptr; int bcols, brows;

double elapsed; register int n, c, i; /* Matrix length */

/* for full range of beta start i at 4 to 64 by 4 */ for (i=l; i < 4; i += 1) { n = i*nxproc; bcols = ((n - l)»lnxproc) + 1; brows = ((n - l)»lnyproc) + 1;

/* Allocate large enough local arrays */ if ((a = (plural ETYPE *) p_malloc(brows*bcols * sizeof(ETYPE))) = NULL) { perrorC'memory allocation error: a"); return -1; }

if ((b = (plural ETYPE *) p_malloc(brows * sizeof(ETYPE))) = NULL) { perrorC'memory allocation error: b"); return -1; }

/* Fills the matrices with random data */ tmpptr = a; for (i = 0; i < brows*bcols; i++) { fip_matran(nyproc, nxproc, tmpptr, nxproc, 0,0); 140

tmppir++; }

tmpptr = b; for (i = 0; i < brows; i++) { fp_matran(nyproc, nxproc, tmpptr, nxproc, 0,0); tmpptr++; }

dpuTimerStartO;

/* Solve the linear system of equations */ if (linSysSolver(n, a, x, b) < 0) exit(-l); elapsed = dpuTimerElapsedO;

/* Print the timing information */ printf("Beta= %8d n = %6d nxproc = %6d Time = %10.51f sec.Nn", i*i, n, nxproc, elapsed);

p_free(a); p_free(b); } 141

File: linsolv.m Programmer: Mark Fienup

Procedure: linSysSolver

This procedure solves the linear system of equations Ax = b. This is accomplished by performing pivoting, which actually swaps the rows. All off diagonal elements are zeroed and the diagonal elements are made to be one.

Input: n: the length of A (i.e., A is an n x n matrix) A: an n X n matrix which is scatter decomposed onto the PEs b: a vector of length n which is Id scatter decomposed onto column 0

Output: x: the solution vector

#include "linSysSolver.h"

#define max_to_row_0(temp,temp2,temp3,temp4) \ { \ register int i; \ for (i=1; i temp) { \ temp = temp2; \ temp3 = temp4; \ } \ }

double linSysSolver(int n, plural ETYPE aQ, plural ETYPE xQ, plural ETYPE b[]) { register plural ETYPE *arow; register plural ETYPE mult; register plural ETYPE *tmpptr, *tmpptr2, *tmpptr3, *tmpptr4, *rowptr; 142

register plural ETYPE •curptr, *cur_pos, *cur_posl; register plural ETYPE scale; register int x_layer, y_layer, x_index, y_index, nxproc_l=nxproc -1;

/* register variables for depth 2 loop unrolling */ register int coljoops, col_loops2;

register int maxdist, maxpost, loc; register ETYPE temp;

register plural int maxloc, Loci, Loc2; register plural ETYPE maxval, *maxptr;

register plural ETYPE RO, Rl, R2, R3, R4, *PRO;

register int bcols, brows, half_nyproc = nyproc»l; register int i, j, k, rr, cc; register int rem; int debug;

debug = 0;

bcols = ((n - l)»lnxproc) + 1; brows = ((n - l)»lnyproc) + 1;

/* Allocate a local row that's used when exchanging the pivot rows */ if ((arow = (plural ETYPE *) p_malloc(bcols * sizeof(ETYPE))) == NULL) { perrorC'memory allocation error"); return -1; }

/* pad remainder of block with zero if n is not mutiple of nxproc/nyproc */ if (rem = (n - (n»lnxproc)*nxproc)) { if (ixproc >= rem) { tmpptr = a + bcols -1; for (i = 0; i < bcols; i-H-) { *tmpptr = 0.0; tmpptr += bcols; } } } if (rem = (n - (n»lnyproc)*nyproc)) { if (iyproc >= rem) { 143

tmpptr = a + bcols*(brows-l); for (i = 0; i < brows; i++) { •tmpptr = 0.0; tmpptr++; } } } curptr = a + bcols -1; /* ptr to 1st elt in last column of a PE */ cur_pos = a - bcols -1; /* ptr to pivot elt?*/

/* for each column do - main loop*/ for (i = 0; i < n; i-H-) {

xjayer = i»lnxproc; /* col. layer of i */ yjayer = i»lnyproc; /* row. layer of i */ x_index = i - (x_layer«lnxproc); /* ixproc of pivot PE */ y_index = i - (y_layer«lnyproc); /* iyproc of pivot PE */

col_loops = bcols-x_layer-l; /* no. of col. layer to rt. of pivot layer */

col_loops2 = col_loops»l; /* half as much used for loop unrolling*/

if (!x_index) /* if pivot PE has bcproc = 0 */ cur_pos += bcols + 1;

if (col_loops%2) { /* if the # of remaining column layers is odd then */

/* find maximal pivot - divide and conquer */

/* find the maximal pivot on each PE in the column first */ maxval = -1.0; if (ixproc == x_index) { /* tmpptr set to ptr. to the last elt. in column */ tmpptr = cur_pos + (brows-y_layer-l)*bcols;

/* software pipelined */ RO = *tmpptr;

for (j = brows-1; j > yjayer; j-) { tmpptr -= bcols; R1 = *tmpptr; 144

if(R0<0)R0 = -R0; if (RO > maxval) { maxval = RO; maxloc =j; } R0 = R1; } if(R0<0)R0 = -R0; if (iyproc >= y_index) { if (RO > maxval) { maxval = RO; maxloc =j; 1 }

Loci = iyproc; /* find the maximal pivot on all PEs in the pivot column */ max_to_row_0(maxval,R0,Loc1 ,Loc2); /* loc is the iyproc of max. elt */ loc =proc[0][x_index].Locl;

/* maxpost is yjayer in which the max. pivot elt found */ maxpost = proc[loc][x_index].maxloc;

/* maxptr is set to the last elt in row where pivot elt found */ maxptr = cuiptr + maxpost*bcols; /* tmpptr2 ptrs to last elt in arow so a copy of the pivot row can be stored starting at the front of arow */ tmpptr2 = arow + bcols -1 - xjayer;

/* try to swap the b values of the swapped rows */ /* tmpptrS ptrs to the elt in the b vector corresponding to the row where the max. pivot elt was found */ tmpptrS = b + maxpost; /* tmpptr4 ptrs to the elt in the b vector corresponding to row i */ tmpptr4 = b + yjayer; temp = proc[loc][loc].*tmpptr3; proc[loc][loc].*tmpptr3 = proc[y_index][y_index].*tmpptr4; proc[y_index][y_index].*tmpptr4 = temp; 145 tmpptr = a;

I* * broadcast the pivot row below to the rows * all values of arow that represent ELEMS to the left * of the current pivot remain 0.0 * * interchange rows */ if (iyproc == loc) { /* if PE in the row containing the max. pivot elt */ if (maxdist = (loc - yjndex)) {/* if rows are on different PEs */ /* maxdist is the distance in rows for which the swap must be performed */ /* tmpptr set to pt to the last elt in ith row */ tmpptr = curptr + y_layer*bcols; /* PRO pts to the last elt in the row of the max. pivot elt */ PRO = maxptr;

/* Determine the shortest direction for swapping the rows */ if (maxdist > half_nyproc) maxdist -= nyproc; else if (maxdist < -half_nyproc) maxdist += nyproc;

          if (maxdist < 0) {  /* if ith row PE "above" where max. pivot found */
            maxdist = -maxdist;

            /* swap rows: software pipelined and loop unrolled to a depth of 2 */
            R0 = *maxptr;
            for (j = 0; j < col_loops2; j++) {
              R1 = *--PR0;
              xnetcS[nyproc].R2 = R0;
              all *tmpptr2-- = R2;
              R4 = xnetpS[maxdist].*tmpptr;
              *maxptr = R4;
              all if (iyproc == y_index) *tmpptr-- = R2;
              maxptr = PR0;

              R3 = *--PR0;
              xnetcS[nyproc].R2 = R1;
              all *tmpptr2-- = R2;
              R4 = xnetpS[maxdist].*tmpptr;
              *maxptr = R4;
              all if (iyproc == y_index) *tmpptr-- = R2;

              maxptr = PR0;
              R0 = R3;
            }

            R1 = *--PR0;
            xnetcS[nyproc].R2 = R0;
            all *tmpptr2-- = R2;
            R4 = xnetpS[maxdist].*tmpptr;
            *maxptr = R4;
            all if (iyproc == y_index) *tmpptr-- = R2;

            maxptr = PR0;

            R4 = xnetpS[maxdist].*tmpptr;
            *maxptr = R4;
            xnetcS[nyproc].R2 = R1;
            maxdist = -maxdist;
          } else {  /* if ith row "below" where the max. pivot row found */
            R0 = *maxptr;
            for (j = 0; j < col_loops2; j++) {
              R1 = *--PR0;
              xnetcS[nyproc].R2 = R0;
              all *tmpptr2-- = R2;
              R4 = xnetpN[maxdist].*tmpptr;
              *maxptr = R4;
              all if (iyproc == y_index) *tmpptr-- = R2;
              maxptr = PR0;

              R3 = *--PR0;
              xnetcS[nyproc].R2 = R1;
              all *tmpptr2-- = R2;
              R4 = xnetpN[maxdist].*tmpptr;
              *maxptr = R4;
              all if (iyproc == y_index) *tmpptr-- = R2;
              maxptr = PR0;
              R0 = R3;
            }
            R1 = *--PR0;
            xnetcS[nyproc].R2 = R0;
            all *tmpptr2-- = R2;
            R4 = xnetpN[maxdist].*tmpptr;
            *maxptr = R4;
            all if (iyproc == y_index) *tmpptr-- = R2;

            maxptr = PR0;
            R4 = xnetpN[maxdist].*tmpptr;
            *maxptr = R4;
            xnetcS[nyproc].R2 = R1;
          }
        } else {  /* else the rows to be swapped are on the same PEs */
          tmpptr = curptr + y_layer*bcols;
          PR0 = maxptr;

          if (tmpptr == maxptr) {  /* if no swap of rows is necessary,
                                      only broadcast the pivot row */
            R0 = *maxptr;
            for (j = 0; j < col_loops2; j++) {
              R1 = *--PR0;
              xnetcS[nyproc].R2 = R0;
              all *tmpptr2-- = R2;
              *tmpptr-- = R2;
              maxptr = PR0;

              R3 = *--PR0;
              xnetcS[nyproc].R2 = R1;
              all *tmpptr2-- = R2;
              *tmpptr-- = R2;
              maxptr = PR0;
              R0 = R3;
            }
            R1 = *--PR0;
            xnetcS[nyproc].R2 = R0;
            all *tmpptr2-- = R2;
            *tmpptr-- = R2;
            maxptr = PR0;
            xnetcS[nyproc].R2 = R1;
          } else {  /* swap rows on the same PEs and broadcast pivot row */

            R0 = *maxptr;
            for (j = 0; j < col_loops2; j++) {
              R1 = *--PR0;
              xnetcS[nyproc].R2 = R0;
              all *tmpptr2-- = R2;
              *maxptr = *tmpptr;
              *tmpptr-- = R2;
              maxptr = PR0;

              R3 = *--PR0;
              xnetcS[nyproc].R2 = R1;
              all *tmpptr2-- = R2;
              *maxptr = *tmpptr;
              *tmpptr-- = R2;
              maxptr = PR0;
              R0 = R3;
            }
            R1 = *--PR0;
            xnetcS[nyproc].R2 = R0;
            all *tmpptr2-- = R2;
            *maxptr = *tmpptr;
            *tmpptr-- = R2;
            maxptr = PR0;
            *maxptr = *tmpptr;
            xnetcS[nyproc].R2 = R1;
          }
        }
      }
      if (ixproc > x_index) {
        *tmpptr2 = R2;
      } else {
        *tmpptr2 = 0.0;
      }
      if (iyproc == y_index) {
        *tmpptr-- = R2;
      }
      if (ixproc == x_index)
        scale = 1.0 / proc[y_index][x_index].R2;

      /*
       * for each subrow
       */
      tmpptr3 = b;

      /* Updates rows in the layers above the y_layer (current layer) */
      cur_pos1 = cur_pos - y_layer*bcols - bcols;
      for (j = 0; j < y_layer; j++) {
        cur_pos1 += bcols;
        tmpptr = cur_pos1;
        R2 = *tmpptr;
        rowptr = arow;
        R0 = *rowptr;
        /* broadcast multiplier across the columns */
        if (ixproc == x_index) {
          R2 *= scale;

          *cur_pos1 = R2;
          xnetcE[nxproc].mult = R2;
        }

        /* Update b values */

        *tmpptr3 = *tmpptr3 - mult*temp;
        tmpptr3++;

        /* subtract row from (multiplier * pivot_row) */
        for (k = 0; k < col_loops2; k++) {
          R0 *= mult;
          R1 = *++rowptr;
          R2 -= R0;
          *tmpptr++ = R2;

          R3 = *tmpptr;
          R0 = *++rowptr;
          R1 *= mult;
          R3 -= R1;
          *tmpptr++ = R3;
          R2 = *tmpptr;
        }

        R0 *= mult;
        R1 = *++rowptr;
        R2 -= R0;
        *tmpptr++ = R2;
        R3 = *tmpptr;
        R1 *= mult;
        R3 -= R1;
        *tmpptr = R3;
      }

      /* Updates the y_layer */
      if (iyproc != y_index) {  /* if off the row with the pivot PE in the y_layer */

        /* broadcast multipliers across the columns */
        tmpptr = cur_pos;
        R2 = *tmpptr;
        rowptr = arow;
        R0 = *rowptr;
        if (ixproc == x_index) {
          R2 *= scale;

          *cur_pos = R2;
          xnetcE[nxproc].mult = R2;
        }

        /* Update b values */

        *tmpptr3 = *tmpptr3 - mult*temp;

        /* subtract row from (multiplier * pivot_row) */
        for (k = 0; k < col_loops2; k++) {
          R0 *= mult;
          R1 = *++rowptr;
          R2 -= R0;
          *tmpptr++ = R2;

          R3 = *tmpptr;
          R0 = *++rowptr;
          R1 *= mult;
          R3 -= R1;
          *tmpptr++ = R3;
          R2 = *tmpptr;
        }
        R0 *= mult;
        R1 = *++rowptr;
        R2 -= R0;
        *tmpptr++ = R2;
        R3 = *tmpptr;
        R1 *= mult;
        R3 -= R1;
        *tmpptr = R3;

      }

      *tmpptr3++;

      /* Updates rows in the layers below the y_layer (current layer) */
      cur_pos1 = cur_pos;
      for (j = y_layer+1; j < brows; j++) {
        cur_pos1 += bcols;
        tmpptr = cur_pos1;
        R2 = *tmpptr;
        rowptr = arow;
        R0 = *rowptr;

        if (ixproc == x_index) {
          R2 *= scale;

          *cur_pos1 = R2;
          xnetcE[nxproc].mult = R2;
        }

        /* Update b values */
        *tmpptr3 = *tmpptr3 - mult*temp;
        tmpptr3++;
        /* subtract row from (multiplier * pivot_row) */
        for (k = 0; k < col_loops2; k++) {
          R0 *= mult;
          R1 = *++rowptr;
          R2 -= R0;
          *tmpptr++ = R2;

          R3 = *tmpptr;
          R0 = *++rowptr;
          R1 *= mult;
          R3 -= R1;
          *tmpptr++ = R3;
          R2 = *tmpptr;
        }

        R0 *= mult;
        R1 = *++rowptr;
        R2 -= R0;
        *tmpptr++ = R2;
        R3 = *tmpptr;

        R1 *= mult;
        R3 -= R1;
        *tmpptr = R3;
      }
    }

    else {  /* else if the # of remaining column layers is even */

      /* find maximal pivot - divide and conquer */

      /* find the maximal pivot on each PE in the column first */
      maxval = -1.0;
      if (ixproc == x_index) {
        /* tmpptr set to ptr. to the last elt. in the column */
        tmpptr = cur_pos + (brows-y_layer-1)*bcols;

        /* software pipelined */
        R0 = *tmpptr;
        for (j = brows-1; j > y_layer; j--) {
          tmpptr -= bcols;
          R1 = *tmpptr;

          if (R0 < 0) R0 = -R0;
          if (R0 > maxval) {
            maxval = R0;
            maxloc = j;
          }
          R0 = R1;
        }
        if (R0 < 0) R0 = -R0;
        if (iyproc >= y_index) {
          if (R0 > maxval) {
            maxval = R0;
            maxloc = j;
          }
        }

        Loc1 = iyproc;
        /* find the maximal pivot on all PEs in the pivot column */
        max_to_row_0(maxval, R0, Loc1, Loc2);
        /* loc is the iyproc of the max. elt */
        loc = proc[0][x_index].Loc1;
        /* added to get average exchange distance with zeros in memory */
        loc = (y_index + nyproc/2) % nyproc;

      }

      /* maxpost is the y_layer in which the max. pivot elt was found */
      maxpost = proc[loc][x_index].maxloc;

      /* maxptr is set to the last elt in the row where the pivot elt was found */
      maxptr = curptr + maxpost*bcols;
      /* tmpptr2 points to the last elt in arow so a copy of the pivot row
         can be stored starting at the front of arow */
      tmpptr2 = arow + bcols - 1 - x_layer;

      /* swap the b values of the swapped rows */
      /* tmpptr3 points to the elt in the b vector corresponding to the row
         where the max. pivot elt was found */
      tmpptr3 = b + maxpost;
      /* tmpptr4 points to the elt in the b vector corresponding to row i */
      tmpptr4 = b + y_layer;

      temp = proc[loc][loc].*tmpptr3;
      proc[loc][loc].*tmpptr3 = proc[y_index][y_index].*tmpptr4;
      proc[y_index][y_index].*tmpptr4 = temp;

      /* printf("temp = %f, loc = %d, maxdist = %d, maxpost = %d\n",
                temp, loc, maxdist, maxpost); */
      tmpptr = a;

      /*
       * broadcast the pivot row below to the rows;
       * all values of arow that represent elements to the left
       * of the current pivot remain 0.0
       *
       * interchange rows
       */
      if (iyproc == loc) {  /* if PE in the row containing the max. pivot elt */
        if (maxdist = (loc - y_index)) {  /* if rows are on different PEs */
          /* maxdist is the distance in rows over which the swap must be performed */
          /* tmpptr set to point to the last elt in the ith row */
          tmpptr = curptr + y_layer*bcols;
          /* PR0 points to the last elt in the row of the max. pivot elt */
          PR0 = maxptr;

          /* Determine the shortest direction for swapping the rows */
          if (maxdist > half_nyproc)
            maxdist -= nyproc;
          else if (maxdist < -half_nyproc)
            maxdist += nyproc;

          if (maxdist < 0) {  /* if ith row PE "above" where max. pivot found */
            maxdist = -maxdist;

            /* swap rows: software pipelined and loop unrolled to a depth of 2 */
            R0 = *maxptr;
            for (j = 0; j < col_loops2; j++) {
              R1 = *--PR0;
              xnetcS[nyproc].R2 = R0;
              all *tmpptr2-- = R2;
              R4 = xnetpS[maxdist].*tmpptr;
              *maxptr = R4;
              all if (iyproc == y_index) *tmpptr-- = R2;
              maxptr = PR0;
              R3 = *--PR0;
              xnetcS[nyproc].R2 = R1;
              all *tmpptr2-- = R2;
              R4 = xnetpS[maxdist].*tmpptr;
              *maxptr = R4;
              all if (iyproc == y_index) *tmpptr-- = R2;
              maxptr = PR0;
              R0 = R3;
            }

            R4 = xnetpS[maxdist].*tmpptr;
            *maxptr = R4;
            xnetcS[nyproc].R2 = R0;
            maxdist = -maxdist;
          } else {  /* if ith row "below" where the max. pivot row found */
            R0 = *maxptr;

            for (j = 0; j < col_loops2; j++) {
              R1 = *--PR0;
              xnetcS[nyproc].R2 = R0;
              all *tmpptr2-- = R2;
              R4 = xnetpN[maxdist].*tmpptr;
              *maxptr = R4;
              all if (iyproc == y_index) *tmpptr-- = R2;
              maxptr = PR0;

              R3 = *--PR0;
              xnetcS[nyproc].R2 = R1;
              all *tmpptr2-- = R2;
              R4 = xnetpN[maxdist].*tmpptr;
              *maxptr = R4;
              all if (iyproc == y_index) *tmpptr-- = R2;
              maxptr = PR0;
              R0 = R3;
            }
            R4 = xnetpN[maxdist].*tmpptr;
            *maxptr = R4;
            xnetcS[nyproc].R2 = R0;
          }
        } else {  /* else the rows to be swapped are on the same PEs */
          tmpptr = curptr + y_layer*bcols;
          PR0 = maxptr;

          if (tmpptr == maxptr) {  /* if no swap of rows is necessary,
                                      only broadcast the pivot row */
            R0 = *maxptr;
            for (j = 0; j < col_loops2; j++) {
              R1 = *--PR0;
              xnetcS[nyproc].R2 = R0;
              all *tmpptr2-- = R2;
              *tmpptr-- = R2;
              maxptr = PR0;

              R3 = *--PR0;
              xnetcS[nyproc].R2 = R1;
              all *tmpptr2-- = R2;
              *tmpptr-- = R2;
              maxptr = PR0;
              R0 = R3;

            }
            xnetcS[nyproc].R2 = R0;
          } else {  /* swap rows on the same PEs and broadcast pivot row */

            R0 = *maxptr;
            for (j = 0; j < col_loops2; j++) {
              R1 = *--PR0;
              xnetcS[nyproc].R2 = R0;
              all *tmpptr2-- = R2;
              *maxptr = *tmpptr;
              *tmpptr-- = R2;
              maxptr = PR0;

              R3 = *--PR0;
              xnetcS[nyproc].R2 = R1;
              all *tmpptr2-- = R2;
              *maxptr = *tmpptr;
              *tmpptr-- = R2;
              maxptr = PR0;
              R0 = R3;
            }
            *maxptr = *tmpptr;
            xnetcS[nyproc].R2 = R0;
          }
        }
      }
      if (ixproc > x_index) {
        *tmpptr2 = R2;
      } else {
        *tmpptr2 = 0.0;
      }
      if (iyproc == y_index) {
        *tmpptr-- = R2;
      }
      if (ixproc == x_index)
        scale = 1.0 / proc[y_index][x_index].R2;

      /*
       * for each subrow
       */
      tmpptr3 = b;

      /* Updates rows in the layers above the y_layer (current layer) */
      cur_pos1 = cur_pos - y_layer*bcols - bcols;
      for (j = 0; j < y_layer; j++) {
        cur_pos1 += bcols;
        tmpptr = cur_pos1;
        R2 = *tmpptr;
        rowptr = arow;
        R0 = *rowptr;
        /* broadcast multiplier across the columns */
        if (ixproc == x_index) {
          R2 *= scale;

          *cur_pos1 = R2;
          xnetcE[nxproc].mult = R2;
        }

        /* Update b values */

        *tmpptr3 = *tmpptr3 - mult*temp;
        tmpptr3++;

        /* subtract row from (multiplier * pivot_row) */
        for (k = 0; k < col_loops2; k++) {
          R0 *= mult;
          R1 = *++rowptr;
          R2 -= R0;
          *tmpptr++ = R2;

          R3 = *tmpptr;
          R0 = *++rowptr;
          R1 *= mult;
          R3 -= R1;
          *tmpptr++ = R3;
          R2 = *tmpptr;
        }

        R0 *= mult;
        R2 -= R0;
        *tmpptr = R2;
      }

      /* Updates the y_layer */
      if (iyproc != y_index) {  /* if off the row with the pivot PE in the y_layer */

        /* broadcast multipliers across the columns */
        tmpptr = cur_pos;
        R2 = *tmpptr;
        rowptr = arow;
        R0 = *rowptr;
        if (ixproc == x_index) {
          R2 *= scale;

          *cur_pos = R2;
          xnetcE[nxproc].mult = R2;
        }

        /* Update b values */

        *tmpptr3 = *tmpptr3 - mult*temp;

        /* subtract row from (multiplier * pivot_row) */
        for (k = 0; k < col_loops2; k++) {
          R0 *= mult;
          R1 = *++rowptr;
          R2 -= R0;
          *tmpptr++ = R2;

          R3 = *tmpptr;
          R0 = *++rowptr;
          R1 *= mult;
          R3 -= R1;
          *tmpptr++ = R3;
          R2 = *tmpptr;
        }
        R0 *= mult;
        R2 -= R0;
        *tmpptr = R2;
      }
      tmpptr3++;

      /* Updates rows in the layers below the y_layer (current layer) */
      cur_pos1 = cur_pos;
      for (j = y_layer+1; j < brows; j++) {
        cur_pos1 += bcols;
        tmpptr = cur_pos1;
        R2 = *tmpptr;
        rowptr = arow;
        R0 = *rowptr;
        /* broadcast multiplier across the columns */
        if (ixproc == x_index) {
          R2 *= scale;

          *cur_pos1 = R2;
          xnetcE[nxproc].mult = R2;
        }

        /* Update b values */

        *tmpptr3 = *tmpptr3 - mult*temp;
        tmpptr3++;

        /* subtract row from (multiplier * pivot_row) */
        for (k = 0; k < col_loops2; k++) {
          R0 *= mult;
          R1 = *++rowptr;
          R2 -= R0;
          *tmpptr++ = R2;

          R3 = *tmpptr;
          R0 = *++rowptr;
          R1 *= mult;
          R3 -= R1;
          *tmpptr++ = R3;
          R2 = *tmpptr;
        }
        R0 *= mult;
        R2 -= R0;
        *tmpptr = R2;
      }
    }

    /*
     * save reciprocals of diagonal elements
     */
    if (ixproc == x_index && iyproc == y_index) {
      *cur_pos = scale;
    }
  }  /* end for i = ... */

  x_layer = i >> lnxproc;
  y_layer = i >> lnyproc;
  x_index = i - (x_layer << lnxproc);
  y_index = i - (y_layer << lnyproc);

  cur_pos = a + y_layer*bcols + x_layer;

  /* Divide b's by diagonal elements */
  tmpptr = a;
  R0 = *tmpptr;
  tmpptr += bcols + 1;
  tmpptr2 = b;
  R1 = *tmpptr2;
  for (i = 0; i < brows-1; i++) {
    R3 = *tmpptr;
    tmpptr += bcols + 1;
    R4 = *(tmpptr2+1);

    *tmpptr2++ = R1*R0;

    R0 = R3;
    R1 = R4;
  }
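  /* (Because the reciprocals of the pivots were stored on the diagonal above,
     each "division" of b by a diagonal element is carried out here as a
     multiplication by the stored reciprocal.) */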

  if (ixproc == iyproc) {
    xnetcE[nxproc].R2 = R0;
  }

  i = n - bcols*nyproc;
  if (i <= 0)
    i += nyproc;
  /* The last layer of diagonal elements contains the diagonal elts
     and not their reciprocals (scale). */
  if (iyproc < i) {

    *tmpptr2 = R1*R0;

  }

  p_free(arow);
}

B.3. Fast Fourier Transform Code

The FFT mpl code is divided into several files. The files and a brief description of what each file contains are as follows:

makefile - contains a make file to compile the FFT code (a sketch of a comparable makefile is given below)
fft.h - contains the constant declarations for the sizes of the arrays used in the program (this must be changed if larger memory per processor is available)
fftm - contains the main function that generates the data for the FFT, calls the fft function, and prints the results
fftio.m - contains the functions to generate test data and to read and write data from the PE array
fftutil.m - contains the function that performs the FFT as well as macros to perform complex arithmetic
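A minimal makefile along these lines is sketched here; the compiler driver name (mpl_cc), the MPLFLAGS variable, and the object-file layout are illustrative assumptions rather than the exact settings used for the thesis experiments:

    # Hypothetical sketch of a makefile for the FFT program.
    # mpl_cc and MPLFLAGS are assumed names; recipe lines must begin with a tab.
    MPLFLAGS =

    fft: fftm.o fftio.o fftutil.o
            mpl_cc $(MPLFLAGS) -o fft fftm.o fftio.o fftutil.o

    fftm.o: fftm.m fft.h
            mpl_cc $(MPLFLAGS) -c fftm.m

    fftio.o: fftio.m fft.h
            mpl_cc $(MPLFLAGS) -c fftio.m

    fftutil.o: fftutil.m fft.h
            mpl_cc $(MPLFLAGS) -c fftutil.m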

/* File:        fft.h
   Programmer:  Mark Fienup
   Description: Declaration of the maximum matrices' sizes */

#include
#include
#include

#define MAXLEN     1024
#define MAXBUF     1024
#define HALFMAXBUF  512

/* File:        fftm
   Programmer:  Mark Fienup
   Description: Main program for the binary exchange FFT (MasPar mpl code)
   Usage:       fft number, where number is the log2(N) */

#include
#include
#include "fft.h"

/* Storage for the real and imaginary parts of the data elements */
plural float a_real[MAXBUF];
plural float a_imag[MAXBUF];

/* Storage for twiddles */
plural float wm_real[HALFMAXBUF];
plural float wm_imag[HALFMAXBUF];

/*****************************************************************/
/*****************************************************************/
main(argc, argv)
int argc;
char *argv[];
{
  unsigned rows, cols;
  unsigned logrows;
  unsigned n, s, m, j, k;
  int powerOfTwo;

  if (argc != 2) {
    fprintf(stderr, "Usage: %s number\n", argv[0]);
    exit(0);
  }

  /* log2 of the number of data points */
  powerOfTwo = atoi(argv[1]);
  n = power2(powerOfTwo);

  /* Generate the data points */
  fft_gendata(a_real, a_imag, n);

  /* Perform the FFT */
  fft(a_real, a_imag, wm_real, wm_imag, n);

  /* Save the results to disk */
  fft_bwtc(stdout, a_real, a_imag, n);

}
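For example, following the Usage line above, a 2^16 = 65,536-point transform would be requested with the (hypothetical) command line

    fft 16

after which the complex results are written in binary to standard output by fft_bwtc.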

/* File:        fftio.m
   Programmer:  Mark Fienup
   Description: I/O functions for FFT (MasPar mpl code)
   Usage:       fft number, where number is the log2(N) */

#include
#include
#include "fft.h"

/*****************************************************************/
/* Function: fft_gendata()   (fft generate complex test data)     */
/*****************************************************************/
fft_gendata(m_real, m_imag, nelems)
plural float m_real[], m_imag[];
unsigned nelems;
{
  float tmp_real, tmp_imag;
  unsigned pebits, pemask, pe;
  unsigned halfelems;
  unsigned i;

  if (nelems/nproc > MAXBUF) {
    fprintf(stderr, "fft_gendata: Matrix too large\n");
    return -1;
  }

  pebits = log2(nproc);
  pemask = nproc - 1;
  halfelems = nelems >> 1;
  tmp_imag = 0.0;

  for (i = 0; i

  }
  return 0;
}

/*****************************************************************/
/* Function: fft_bwtc()   (FFT binary write complex)              */
/*****************************************************************/
fft_bwtc(fp, m_real, m_imag, nelems)
FILE *fp;
plural float m_real[], m_imag[];
unsigned nelems;
{
  float tmp_real, tmp_imag;
  unsigned bufsize, halfelems, logelems;
  unsigned pebits, pemask;
  unsigned revbits, pe, offset, offsetbits, offsetmask;
  unsigned i, cols;

  cols = 2;

  if (fwrite(&nelems, sizeof(nelems), 1, fp) != 1) {
    fprintf(stderr, "fft_bwtc: error writing rows\n");
    return -1;
  }
  if (fwrite(&cols, sizeof(cols), 1, fp) != 1) {
    fprintf(stderr, "fft_bwtc: error writing cols\n");
    return -1;
  }

  if (nelems/nproc > MAXBUF) {
    fprintf(stderr, "fft_bwtc: Matrix dimensions too large\n");
    return -1;
  }

  logelems = log2(nelems);
  for (i = 0; i

    offset = (((revbits >> 1) / nproc) * 2) + (revbits & 01);

    tmp_real = proc[pe].(m_real[offset]);
    tmp_imag = proc[pe].(m_imag[offset]);
    if (fwrite(&tmp_real, sizeof(tmp_real), 1, fp) != 1) {
      fprintf(stderr, "fft_brdc: Error writing matrix element %u\n", i);
      return -1;
    }
    if (fwrite(&tmp_imag, sizeof(tmp_imag), 1, fp) != 1) {
      fprintf(stderr, "fft_brdc: Error writing matrix element %u\n", i);
      return -1;
    }
  }
  return 0;
}

/* File:        fftutil.m
   Programmer:  Mark Fienup
   Description: FFT function and macros to perform complex arithmetic */

#include
#include
#include
#include "/usr/maspar/include/ampl/maspar/values.h"

void dpuTimerStart();
double dpuTimerElapsed();

/*****************************************************************/
/* Reused as many register variables as possible                  */
/*****************************************************************/
#define twiddle_real  a_real_base_offset2
#define twiddle_imag  a_imag_base_offset2
#define down_real     a_real_base2
#define down_imag     a_imag_base2
#define xdist         dist
#define ydist         dist
#define base          active

/*****************************************************************/
/* Complex number macros                                          */
/*****************************************************************/
#define cneg(c_real, c1_real, c_imag, c1_imag) c_real = - c1_real;\
        c_imag = - c1_imag

/*****************************************************************/
#define cadd(c_real, c1_real, c2_real, c_imag, c1_imag, c2_imag)\
        c_real = c1_real + c2_real;\
        c_imag = c1_imag + c2_imag

/*****************************************************************/
#define caddp(c, c1, c2) c->real = c1_real + c2_real;\
        c->imag = c1_imag + c2_imag

/*****************************************************************/
#define csub(c_real, c1_real, c2_real, c_imag, c1_imag, c2_imag)\
        c_real = c1_real - c2_real;\
        c_imag = c1_imag - c2_imag

/*****************************************************************/
#define cmult(c_real, c1_real, c2_real, c_imag, c1_imag, c2_imag)\
        multtemp = (c1_real * c2_real) - (c1_imag * c2_imag);\
        c_imag = (c1_real * c2_imag) + (c1_imag * c2_real);\
        c_real = multtemp

/*****************************************************************/
#define cmultp(c, c1, c2) \
        multtemp_real = (c1_real * c2_real) - (c1_imag * c2_imag);\
        c->imag = (c1_real * c2_imag) + (c1_imag * c2_real);\
        c->real = multtemp_real

/*****************************************************************/
/* Utility functions                                               */
/*****************************************************************/
unsigned bitrev(bits, loglen)
unsigned bits, loglen;
{
  unsigned rbits;

  rbits = 0;
  while (loglen--) {
    rbits <<= 1;
    rbits |= (bits & 01);
    bits >>= 1;
  }
  return rbits;
}

/*****************************************************************/
unsigned power2(x)
unsigned x;
{
  unsigned result;

  for (result = 1; x; result <<= 1, x--)
    ;
  return result;
}

/*****************************************************************/
unsigned log2(num)
unsigned num;
{
  unsigned i, log;

  switch (num) {
    case 1:  return 0;
    case 2:  return 1;
    default: for (log = 1, i = (num >> 2); i; i >>= 1)
               log++;
  }
  return log;
}
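The bitrev() helper reverses the low loglen bits of its argument; fft_bwtc uses this (through revbits) to locate each element of the transform, which the binary-exchange algorithm leaves in bit-reversed index order. As a quick stand-alone illustration in ordinary C (not part of the thesis code), reversing the low four bits of 3 = 0011 yields 12 = 1100:

    /* Stand-alone illustration of the bit reversal computed by bitrev() above. */
    #include <stdio.h>

    static unsigned bitrev(unsigned bits, unsigned loglen)
    {
        unsigned rbits = 0;
        while (loglen--) {                       /* shift the low bit of bits into rbits */
            rbits = (rbits << 1) | (bits & 01);
            bits >>= 1;
        }
        return rbits;
    }

    int main(void)
    {
        /* 3 = 0011 reversed in 4 bits is 1100 = 12 */
        printf("bitrev(3, 4) = %u\n", bitrev(3, 4));
        return 0;
    }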

/*****************************************************************/
/* The FFT functions (square-of-twiddles)                          */
/*****************************************************************/
fft(a_real, a_imag, wm_real, wm_imag, n)
plural float a_real[], a_imag[];
plural float wm_real[], wm_imag[];
unsigned n;
{
  /* Define register variables */
  register plural float up_real, temp_real /*, down_real, twiddle_real */;
  register plural float up_imag, temp_imag /*, down_imag, twiddle_imag */;
  register plural float a_real_base, a_imag_base;
  register plural float a_real_base2, a_imag_base2;
  register plural float a_real_base_offset, a_imag_base_offset;
  register plural float a_real_base_offset2, a_imag_base_offset2;

  register plural float multtemp;
  register plural float *pa_real;
  register plural float *pa_imag;
  register plural float *pwm_real;

  register plural float *pwm_imag, pi;

  register unsigned bufsize, dist, offset, base, current_base;
  register unsigned butterfly, butterflies, /* active, */ lowerhalf;
  register plural float dtemp, dnproc, dn, dpower;
  double calcTime, totalTime, twiddleTime;

  dpuTimerStart();
  twiddleTime = dpuTimerElapsed();

  dpuTimerStart();

  /* Calculate the local twiddles needed initially */
  pi = (plural float) M_PI;
  pi = pi*2;

  bufsize = n / nproc;
  butterflies = bufsize >> 1;
  dpower = (plural float) iproc;
  dn = (plural float) n;
  dnproc = (plural float) nproc;

  /* Calculate initial twiddle factors */

  pwm_real = &wm_real[0];
  pwm_imag = &wm_imag[0];
  for (butterfly = 0; butterfly < butterflies; butterfly++) {
    dtemp = (pi * dpower)/dn;

    *pwm_real++ = fp_cos(dtemp);
    *pwm_imag++ = fp_sin(dtemp);

    dpower = dpower + dnproc;
  }

  twiddleTime = dpuTimerElapsed();
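  /* Square-of-twiddles: each twiddle factor is a unit-magnitude complex
     number cos(t) + i*sin(t).  Multiplying a twiddle by itself (the
     cmult(twiddle, twiddle, twiddle, ...) calls below) doubles its angle,
     since (cos t + i sin t)^2 = cos 2t + i sin 2t, which is the twiddle
     spacing needed by the following stage; cneg selects the correct sign,
     because squaring alone cannot distinguish a twiddle from its negative. */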

  /* In-memory stages of the fft */
  /* Software pipelined */

  for (offset = bufsize >> 1; offset > 1; offset >>= 1) {
    base = 0;

    lowerhalf = offset >> 1;
    pwm_real = &wm_real[0];
    pwm_imag = &wm_imag[0];
    a_real_base = a_real[base];
    a_imag_base = a_imag[base];
    a_real_base_offset = a_real[base+offset];
    a_imag_base_offset = a_imag[base+offset];
    twiddle_real = *pwm_real;
    twiddle_imag = *pwm_imag;
    for (butterfly = 0; butterfly < butterflies; butterfly++) {

      /* Perform butterfly operation */
      cadd(up_real, a_real_base, a_real_base_offset,
           up_imag, a_imag_base, a_imag_base_offset);
      csub(temp_real, a_real_base, a_real_base_offset,
           temp_imag, a_imag_base, a_imag_base_offset);

      a_real_base = a_real[base];
      a_imag_base = a_imag[base];

      a_real_base_offset = a_real[base+offset];
      a_imag_base_offset = a_imag[base+offset];

      cmult(a_real[current_base+offset], temp_real, twiddle_real,
            a_imag[current_base+offset], temp_imag, twiddle_imag);
      a_real[current_base] = up_real;
      a_imag[current_base] = up_imag;

      /* Update the twiddle factors by squaring it */
      cmult(twiddle_real, twiddle_real, twiddle_real,
            twiddle_imag, twiddle_imag, twiddle_imag);

      if ((butterfly&lowerhalf) != 0) {
        cneg(twiddle_real, twiddle_real,
             twiddle_imag, twiddle_imag);
      }
      *pwm_real = twiddle_real;
      *pwm_imag = twiddle_imag;
      *pwm_real++;
      *pwm_imag++;

      twiddle_real = *pwm_real;
      twiddle_imag = *pwm_imag;

    }
  }
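  /* The final in-memory stage (offset == 1) is peeled out of the loop above
     because its sign-correction test depends on iproc rather than on the
     butterfly index. */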

  base = 0;
  offset = 1;
  pwm_real = &wm_real[0];
  pwm_imag = &wm_imag[0];

  a_real_base = a_real[base];
  a_imag_base = a_imag[base];

  a_real_base_offset = a_real[base+offset];
  a_imag_base_offset = a_imag[base+offset];

  twiddle_real = *pwm_real;
  twiddle_imag = *pwm_imag;

  for (butterfly = 0; butterfly < butterflies; butterfly++) {

    cadd(up_real, a_real_base, a_real_base_offset,
         up_imag, a_imag_base, a_imag_base_offset);
    csub(temp_real, a_real_base, a_real_base_offset,
         temp_imag, a_imag_base, a_imag_base_offset);

    a_real_base = a_real[base];
    a_imag_base = a_imag[base];

    a_real_base_offset = a_real[base+offset];
    a_imag_base_offset = a_imag[base+offset];

    cmult(a_real[current_base+offset], temp_real, twiddle_real,
          a_imag[current_base+offset], temp_imag, twiddle_imag);
    a_real[current_base] = up_real;
    a_imag[current_base] = up_imag;

    cmult(twiddle_real, twiddle_real, twiddle_real,
          twiddle_imag, twiddle_imag, twiddle_imag);

    if (iproc >= (nproc >> 1)) {
      cneg(twiddle_real, twiddle_real,
           twiddle_imag, twiddle_imag);
    }
    *pwm_real = twiddle_real;
    *pwm_imag = twiddle_imag;
    *pwm_real++;
    *pwm_imag++;

    twiddle_real = *pwm_real;
    twiddle_imag = *pwm_imag;

  }

  /* Prepare to perform communication stages of FFT */
  /* Load registers before first stage and save at the end */
  pa_real = &a_real[0];
  pa_imag = &a_imag[0];
  pwm_real = &wm_real[0];
  pwm_imag = &wm_imag[0];
  for (butterfly = 0; butterfly < butterflies; butterfly++) {

    twiddle_real = *pwm_real++;
    twiddle_imag = *pwm_imag++;

    /* Stages requiring communication along the y dimension of the PE array */

    active = nproc >> 1;
    for (ydist = nyproc >> 1; ydist > 0; ydist >>= 1) {
      temp_real = down_real;
      temp_imag = down_imag;

      if ((active & iproc) == 0) {
        down_real = xnetS[ydist].up_real;
        down_imag = xnetS[ydist].up_imag;
      } else {
        up_real = xnetN[ydist].temp_real;
        up_imag = xnetN[ydist].temp_imag;
      }

      temp_real = up_real;
      temp_imag = up_imag;

      cadd(up_real, up_real, down_real,
           up_imag, up_imag, down_imag);
      csub(down_real, temp_real, down_real,
           down_imag, temp_imag, down_imag);
      cmult(down_real, down_real, twiddle_real,
            down_imag, down_imag, twiddle_imag);

      cmult(twiddle_real, twiddle_real, twiddle_real,
            twiddle_imag, twiddle_imag, twiddle_imag);

      if (((active >> 1) & iproc) != 0) {
        cneg(twiddle_real, twiddle_real,
             twiddle_imag, twiddle_imag);
      }

      active >>= 1;
    }

    /* Stages requiring communication along the X dimension of the PE array */
    for (xdist = nxproc >> 1; xdist > 0; xdist >>= 1) {

      temp_real = down_real;
      temp_imag = down_imag;

      if ((active & iproc) == 0) {
        down_real = xnetE[xdist].up_real;
        down_imag = xnetE[xdist].up_imag;
      } else {
        up_real = xnetW[xdist].temp_real;
        up_imag = xnetW[xdist].temp_imag;
      }

      temp_real = up_real;
      temp_imag = up_imag;

      cadd(up_real, up_real, down_real,
           up_imag, up_imag, down_imag);
      csub(down_real, temp_real, down_real,
           down_imag, temp_imag, down_imag);
      cmult(down_real, down_real, twiddle_real,
            down_imag, down_imag, twiddle_imag);

      cmult(twiddle_real, twiddle_real, twiddle_real,
            twiddle_imag, twiddle_imag, twiddle_imag);

      if (((active >> 1) & iproc) != 0) {
        cneg(twiddle_real, twiddle_real,
             twiddle_imag, twiddle_imag);
      }

      active >>= 1;
    }

    /* Store the results */
    *(pa_real-1) = up_real;
    *(pa_imag-1) = up_imag;
    *pa_real = down_real;
    *pa_imag = down_imag;

    pa_real++;
    pa_imag++;

  }

  /* Print timing information */
  totalTime = dpuTimerElapsed();
  calcTime = totalTime - twiddleTime;
  fprintf(stderr, "total time = %lf ", totalTime);
  fprintf(stderr, "Twiddle Time = %lf (%lf percent) ", twiddleTime,
          ((100.0 * twiddleTime)/totalTime));
  fprintf(stderr, "FFT Calc. Time = %lf (%lf percent)\n", calcTime,
          ((100.0 * calcTime)/totalTime));