Evaluating Parallel Algorithms

Theoretical and Practical

Aspects

Lasse Natvig

Division of Computer Systems and Telematics

The Norwegian Institute of Technology

The University of Trondheim

NORWAY

November

Abstract

The motivation for the work reported in this thesis has been to lessen the gap between theory and practice within the field of parallel processing.

When looking for new and faster parallel algorithms for use in massively parallel systems, it is tempting to investigate promising alternatives from the large body of research done on parallel algorithms within the field of theoretical computer science. These algorithms are mainly described for the PRAM (Parallel Random Access Machine) model of computation.

This thesis proposes a method for evaluating the practical value of PRAM algorithms. The approach is based on implementing PRAM algorithms for execution on a CREW (Concurrent Read, Exclusive Write) PRAM simulator. Measurement and analysis of implemented algorithms on finite problems provide new and more practically oriented results than those traditionally obtained by asymptotical analysis (O-notation).

The evaluation method is demonstrated by investigating the practical value of a new and important parallel sorting algorithm from theoretical computer science, known as Cole's Parallel Merge Sort algorithm. Cole's algorithm is compared with the well known bitonic sorting algorithm due to Batcher. Cole's algorithm is asymptotically probably the fastest among all known sorting algorithms, and it is also cost optimal. Its O(log n) time consumption implies that it is faster than bitonic sorting, which is O(log^2 n) time, provided that the number of items to be sorted, n, is large enough. However, it is found that bitonic sorting is faster as long as n is less than about Giga Tera (i.e., 10^21) items. Consequently, Cole's logarithmic-time algorithm is not fast in practice.

The thesis also gives an overview of complexity theory for sequential and parallel computations, and describes a promising alternative for parallel programming called synchronous MIMD programming.

Preface

This is a thesis submitted to the Norwegian Institute of Technology (NTH) for the doctoral degree doktor ingeniør (dr.ing.). The reported work has been carried out at the Division of Computer Systems and Telematics, The Norwegian Institute of Technology, The University of Trondheim, Norway. The work has been supervised by Professor Arne Halaas.

Thesis Overview and Scope

About the contents

Chapter 1 starts by explaining how the likely proliferation of computers with a large number of processors makes it increasingly important to know the research done on parallel algorithms within theoretical computer science. It outlines how the work described in the thesis was motivated by a large gap between theory and practice within parallel computing, and it presents some aspects of this gap. The chapter also summarises the main contributions of my work.

Chapter 2 describes various concepts which are central when studying, teaching or evaluating parallel algorithms. It provides an introduction to parallel complexity theory, including themes such as a class of problems that may be solved very fast in parallel, and another class consisting of inherently serial problems. The chapter ends with a discussion of Amdahl's law, and of how the size of a problem is essential for the speedup that may be achieved by using parallel processing.

Chapter 3 is devoted to the so-called CREW PRAM (Concurrent Read, Exclusive Write, Parallel Random Access Machine) model. This is a model for parallel computations which is well known within theoretical computer science, and it is the main underlying concept for this work. After a general description of the model, it is explained how the model may be programmed in a high-level notation. The programming style which has been used, called synchronous MIMD programming, contains some properties which make programming a surprisingly easy task. These nice features are outlined. At the end of the chapter, a more technical description is given of how algorithms may be implemented on the CREW PRAM simulator prototype, which has been developed as part of the work.

Chapter 4 describes the technically most difficult part of the work: a detailed investigation of a well-known sorting algorithm from theoretical computer science, Cole's parallel merge sort. A top-down, detailed understanding of the algorithm is presented, together with a description of how it has been implemented. The implementation is probably the first complete implementation of this algorithm. A comparison of the implementation with several simpler sorting algorithms, based on measurements and analytical models, is presented.

Chapter 5 summarises what has been learned from doing this work, and gives a brief sketch of interesting directions for future research.

Appendix A gives a brief description of the CREW PRAM simulator prototype, which makes it possible to implement, test and measure CREW PRAM algorithms. The main part of the chapter is a user's guide for how to develop and test parallel programs on the simulator system as it is currently used at the Norwegian Institute of Technology.

Appendix B gives a complete listing of the implementation of Cole's parallel merge sort algorithm. In addition to being the fundamental documentation of the implementation, it is provided to give the reader the possibility of studying the details of medium-scale synchronous MIMD programming on the simulator system.

Issues given less attention

The thesis covers complexity theory, detailed studies of a complex algorithm, and the implementation of a simulator system. This variety has made it important to restrict the work. I have tried to separate the goals and the tools used to achieve these goals. It has been crucial to restrict the effort put into the design of the CREW PRAM simulator system. Prototyping and simple ad hoc solutions have been used when possible, without reducing the quality of the simulation results.

The pseudo code notation used to express parallel algorithms is not meant to be a new and better notation for this purpose. It is intended only to be a common-sense modernisation of the notation used in [Wyl] for high-level programming of the original CREW PRAM model.

Similarly, the language used to implement the algorithms as running programs on the simulator is not a proposal of a new, complete and better language for parallel programming, just a minor extension of the language used to implement the simulator.

Simplicity has been given higher priority than efficient execution of parallel programs in the development of the CREW PRAM simulator.

Even though nearly all the algorithms discussed are sorting algorithms, the work is not primarily about parallel sorting. It is not an attempt to find new and better parallel sorting algorithms, nor is it an attempt to survey the large number of parallel sorting algorithms. Sorting has been selected because it is an important problem with many well known parallel algorithms.

Acknowledgements

I wish to thank my supervisor, Professor Arne Halaas, for introducing me to the field of complexity theory, for constructive criticism, for continuing encouragement, and for letting me work on my own ideas.

The Division of Computer Systems and Telematics consists of a lot of nice and helpful people. Håvard Eidnes, Anund Lie, Jan Grønsberg and Tore E. Aarseth have answered technical questions, provided excellent computer facilities, and allowed me to consume an awful lot of CPU seconds. Gunnar Borthne, Marianne Hagaseth, Torstein Heggebø, Roger Midtstraum, Petter Moe, Eirik Knutsen, Øystein Nytrø, Tore Sæter, Øystein Torbjørnsen and Håvard Tveite have all read various drafts of the thesis and provided valuable comments. Thanks!

The work has been financially supported by a scholarship from The Royal Norwegian Council for Scientific and Industrial Research (NTNF).

Last but not least, I am grateful to my family: my parents, for teaching me to believe in my own work; my wife Marit, for doing most of the research in a much more important project, taking care of our children Ola and Simen; and Ola and Simen, who have been a continuing inspiration for finishing this thesis.

Trondheim, December

Lasse Natvig

Contents

Introduction
Massively Parallel Computation
Summary of Contributions
Motivation
Background
Parallel Complexity Theory: A Rich Source of Parallel Algorithms
The Gap Between Theory and Practice
Traditional Use of the CREW PRAM Model
Implementing On an Abstract Model: Maybe Worthwhile
Parallel Algorithms: Central Concepts
Parallel Complexity Theory
Introduction
Basic Concepts, Definitions and Terminology
Models of Parallel Computation
Important Complexity Classes
Evaluating Parallel Algorithms
Basic Performance Metrics
Analysis of Speedup
Amdahl's Law and Problem Scaling
The CREW PRAM Model
The Computational Model
The Main Properties of The Original PRAM
From Model to Machine: Additional Properties
CREW PRAM Programming
Common Misconceptions about PRAM Programming
Notation: Parallel Pseudo Pascal (PPP)
The Importance of Synchronous Operation
Compile Time Padding
General Explicit Resynchronisation
Parallelism Should Make Programming Easier
Introduction
Synchronous MIMD Programming Is Easy
CREW PRAM Programming on the Simulator
Step 1: Sketching the Algorithm in Pseudo-Code
Step 2: Processor Allocation and Main Structure
Step 3: Complete Implementation With Simplified Time Modelling
Step 4: Complete Implementation With Exact Time Modelling
Investigation of the Practical Value of Cole's Parallel Merge Sort Algorithm
Parallel Sorting
Parallel Sorting: The Continuing Search for Faster Algorithms
Some Simple Sorting Algorithms
Bitonic Sorting on a CREW PRAM
Cole's Parallel Merge Sort Algorithm: Description
Main Principles
More Details
Cole's Parallel Merge Sort Algorithm: Implementation
Dynamic Processor Allocation
Pseudo Code and Time Consumption
Cole's Parallel Merge Sort Algorithm Compared with Simpler Sorting Algorithms
The First Comparison
Revised Comparison Including Bitonic Sorting
Concluding Remarks and Further Work
Experience and Contributions
The Gap Between Theory and Practice
Evaluating Parallel Algorithms
The CREW PRAM Simulator
Further Work
A The CREW PRAM Simulator Prototype
A Main Features and System Overview
A Program Development
A Measuring
A A Brief System Overview
A User's Guide
A Introduction
A Getting Started
A Carrying On
A The Development Cycle
A Debugging PIL Programs Using simdeb
A Modelling Systolic Arrays and other Synchronous Computing Structures
A Systolic Arrays: Channels, Phases and Stages
A Example: Unit Time Delay
A Example: Delayed Output and Real Numbers
B Cole's Parallel Merge Sort in PIL
B The Main Program
B Variables and Memory Usage
B Colevar
B Colemem
B Basic Procedures
B Coleproc
B Coleproc
B Merging in Constant Time
B OrderMergeproc
B DoMergeproc
B MaintainRanksproc
C Batcher's Bitonic Sorting in PIL
C The Main Program
References
Index

Chapter 1

Introduction

There are two main communities of designers: the theoreticians and the practitioners. Theoreticians have been developing algorithms with narrow theory and little practical importance; in contrast, practitioners have been developing algorithms with little theory and narrow practical importance.

Clyde P. Kruskal, in Efficient Parallel Algorithms: Theory and Practice [Kru]

Massively Parallel Computation

There is no longer any doubt that parallel processing will be used extensively in the future to build the, at any time, most powerful general purpose computers. Parallelism has been used for many years, in many different ways, to increase the speed of computations. Bit-parallel arithmetic, multiple functional units, pipelined functional units and multiprocessing are all important techniques in this context [Qui].

Today, these and many other ways of parallelism are being exploited by a vast number of different computer architectures, in commercial products and in research prototypes. One can not predict in detail which computer architectures will be the dominating ones in the most powerful parallel general purpose computers of the future. In the eighties, the supercomputer market has been dominated by the pipelined computers, also called vector computers, such as those manufactured by Cray Research Inc. (USA) and similar computers from the Japanese companies Fujitsu, Hitachi and NEC [HJ]. However, the next ten years may change the picture.

At the SUPERCOMPUTING conference in New York in November, it seemed to be a widespread opinion among supercomputer users, computer scientists and computational scientists that massively parallel computers will outperform the pipelined computers and become the dominating architectural main principle in only a few years. The main argument for this belief is that the computing power (speed) of massively parallel computers has grown faster, and is expected to continue to grow faster, than the speed of pipelined computers. The situation is illustrated in Figure 1.1.

[Figure 1.1 sketches speed as a function of time for the two architecture classes: the curve for massively parallel computers crosses above the curve for vector computers at some point in the near future.]

Figure 1.1: Massively parallel computers are expected to be faster than vector computers in the near future.

The inertia caused by the existing supercomputer installations and software applications based on pipelined computers will result in a transitional phase, from the time when massively parallel computers are generally regarded to be more powerful, to the time when they are dominating the supercomputer market. Today, massively parallel computers are dominating in various special purpose applications requiring high performance, such as wave mechanics [GMB] and oil reservoir simulation [SIA].

(Systems with a sufficiently large number of processors were considered massively parallel for the purpose of Frontiers, the 3rd Symposium on the Frontiers of Massively Parallel Computation, held in October. This is consistent with earlier use of the term massive parallelism, see for instance [GMB].)

The point of intersection in Figure 1.1 is not possible to define in a precise and agreed upon manner. Danny Hillis, co-founder of Thinking Machines Corporation, the leading company in massively parallel computing, believes that the transition in computing power has already happened [Hil]. Other researchers would argue that the vector machines will be faster for general purpose processing for some years yet. However, the important thing is that there seems to be almost a consensus that the transition will take place in the near future.

The speed of the fastest supercomputers today is of the order of GFLOPS. This is expected to increase by a factor of about a thousand in the coming years, giving Tera (i.e., 10^12) FLOPS computers [Lun].

Consequences for research in parallel algorithms

The continuing increase in computing power and the adoption of massively parallel computing have at least two important implications for research and education in the field of parallel algorithms:

1. We will be able to solve larger problems. Larger problem instances increase the relevance of asymptotical analysis and complexity theory.

2. We will be able to use algorithms requiring a substantially larger number of processors. One of the characteristics of parallel algorithms developed in theoretical computer science is that they typically require a large number of processors. These algorithms will become more relevant in the future.

To conclude: the possible proliferation of massively parallel systems will make the research done on parallel algorithms within theoretical computer science more important to the practitioners in the future. The work reported in this thesis is an attempt to learn about this research from a practical, engineer-like point of view.

(GFLOPS: 10^9 floating point operations per second. The precise speed depends strongly on the specific application or benchmark used in measuring the speed.)

Summary of Contributions

The research reported in this thesis has contributed to the field of parallel processing and parallel algorithms in several ways. The main contributions are summarised here, to give a brief overview of the work and to motivate the reader for a detailed study.

- A new, or little known, direction of research in parallel algorithms is identified by raising the question: to what extent are parallel algorithms from theoretical computer science useful in practice?

- A possibly new method of investigating this kind of parallel algorithms is motivated and described. The method is based on evaluating the performance of implemented algorithms used on finite problems.

- The use of the method on a large example, Cole's parallel merge sort algorithm [Col], is demonstrated. This algorithm is theoretically very fast, and famous within the theory community. The method and the first results achieved were presented in a preliminary state at NIK (Norwegian Informatics Conference) in November [Nat]. One year after, it was presented at the SUPERCOMPUTING conference [Natb].

- A thorough and detailed top-down description of Cole's parallel merge sort algorithm is given. It should be helpful for those who want a complete understanding of the algorithm. The description goes down to the implementation level, i.e., it includes more details than other currently available descriptions.

- Parts of Cole's algorithm which had to be made clear to be able to implement it are described. These detailed results were presented at The Fifth International Symposium on Computer and Information Sciences in November [Nata].

- Probably the first, and possibly also the only, complete parallel implementation of Cole's parallel merge sort algorithm has been developed. It shows that the algorithm may be implemented within the claimed complexity bounds, and it gives the exact complexity constants for one possible implementation.

- A straightforward implementation of Batcher's O(log^2 n) time bitonic sorting [Bat] on the CREW PRAM simulator is found to be faster than Cole's O(log n) time sorting algorithm as long as the number of items to be sorted, n, is less than about 10^21. This shows that Cole's algorithm is without practical value.

- Prototype versions of the necessary tools for doing this kind of algorithm evaluation have been developed and documented. The tools have been used extensively for the work reported in this thesis, and they are currently being used for continued research in the same direction [Hag].

- The algorithms which are studied are implemented as high-level synchronous MIMD programs. This programming style is consistent with early and central theoretical work on parallel algorithms; however, it is seldom found in the current literature. Synchronous MIMD programming seems to give remarkably easy parallel programming. These aspects of the work were presented at the Workshop on the Understanding of Parallel Computation in Edinburgh, July [Natc].

- The prototype tools for high-level synchronous MIMD programming have been used in connection with the teaching of parallel algorithms. They have been used for several student projects in the courses Highly Concurrent Systems and Parallel Algorithms given by the Division of Computer Systems and Telematics at the Norwegian Institute of Technology.

- As background material, the thesis provides an introduction to parallel complexity theory, a field which has been one of the main inspirational sources of this work. It also contains a discussion of fundamental topics such as Amdahl's law and problem scaling, which have been widely discussed during the last few years.
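Of the two sorting algorithms compared above, Batcher's bitonic sort is by far the simpler. As a point of reference, a sequential Python rendering of the classic bitonic sorting network is sketched below; this is an illustration of the compare-exchange pattern only, not the synchronous MIMD (PIL) implementation used in the thesis.

```python
def bitonic_sort(a):
    """Sort a list in place using Batcher's bitonic network.

    The length of `a` must be a power of two. The two nested while
    loops enumerate the O(log^2 n) compare-exchange stages; on a
    parallel machine each inner `for` loop is one synchronous step.
    """
    n = len(a)
    k = 2
    while k <= n:                      # size of the bitonic sequences
        j = k // 2
        while j > 0:                   # distance between compared items
            for i in range(n):
                partner = i ^ j
                if partner > i:
                    ascending = (i & k) == 0
                    if (a[i] > a[partner]) == ascending:
                        a[i], a[partner] = a[partner], a[i]
            j //= 2
        k *= 2
    return a
```

The n/2 compare-exchange operations inside each stage are mutually independent, which is what makes the network attractive for a PRAM-style machine: one processor per pair gives one stage per synchronous step.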

Motivation

This section describes how the main goal of the reported work arose from a study of sequential complexity theory and of contemporary literature on parallel algorithms.

Two worlds of parallel algorithms

There are at least two main directions of research in parallel algorithms. In theoretical computer science, parallel algorithms are typically described in a high-level mathematical notation for abstract machines, and their performance is typically analysed by assuming infinitely large problems. On the other side, we have more practically oriented research, where implemented algorithms are tested and measured on existing computers and finite problems.

The main differences between these two approaches are also reflected in the literature. One of the first general textbooks on parallel algorithms is Designing Efficient Algorithms for Parallel Computers by Michael Quinn [Qui]. It gives a good overview of the field, and covers both practical and theoretical results. A more recent textbook, Efficient Parallel Algorithms by Gibbons and Rytter [GR], reports a very large research activity on parallel algorithms within theoretical computer science. Comparing these two books, it seems clear that parts of the theoretical work are not clearly understood, and sometimes neglected, by the practitioners.

Can the theory be used in practice?

During the spring semester I taught a course on parallel algorithms based on Quinn's book and other practically oriented literature. Later that year, I did a detailed study of the fundamental book on complexity theory by Garey and Johnson [GJ]. I found it difficult, but also very fascinating. It led me to some papers about parallel complexity theory, and suddenly a large new area of parallel algorithms opened up. This area contained topics such as Nick's Class of very fast algorithms, and a theory about inherently serial problems. However, I was a bit surprised by the fact that these very fundamental topics were seldom mentioned by the practitioners.

My curiosity was further stimulated when I got a copy of the book by Gibbons and Rytter. This book showed the existence of a large and well established research activity on theoretically very fast parallel algorithms. However, little was said about the practical value of the algorithms.

It then seemed natural to ask the following questions: Are the theoretically fast parallel algorithms simply not known by the practitioners? Is it the case that they are not understood? Or are they in general judged as without practical value?

These questions led to a curiosity about the possible practical use of parallel algorithms and other results from theoretical computer science. I wanted to learn about the border between theoretical aspects with and without practical use, as illustrated in Figure 1.2. Contributions in this direction might lessen the gap between theory and practice within parallel processing. In January, the importance of this goal was confirmed in the proceedings of the NSF-ARC Workshop on Opportunities and Constraints of Parallel Computing [Sana]. There, several of the more prominent researchers in parallel algorithms argue that bridging the gap between theory and practice in parallel processing should be one of the main goals for future research.

[Figure 1.2 shows the theory side (complexity classes, asymptotical analysis, computational models, etc.) and the practice side (parallel computers, algorithm implementation, measuring, etc.), with this work placed on the border between them.]

Figure 1.2: The main goal: to learn about the possible practical value of parallel algorithms from theoretical computer science.

Background

This section describes some of the problems encountered when trying to evaluate the practical value of parallel algorithms from theoretical computer science. The discussion leads to the proposed method for doing such evaluations.

Parallel Complexity Theory: A Rich Source of Parallel Algorithms

In the past years there has been an increasing interest in parallel algorithms. In the field of parallel complexity theory, the so-called Nick's Class (NC) has been given much attention. A problem belonging to this complexity class may be solved by an algorithm with polylogarithmic running time (i.e., O((log n)^k) time, where k is a constant and n is the problem size) and a polynomial number of processors. Many parallel algorithms have recently been presented to prove membership in this class for various problems. This kind of algorithms is frequently denoted NC-algorithms. Although the main motivation for these new algorithms often is to show that a given problem is in NC, the algorithms may also be taken as proposals for parallel algorithms that are fast in practice.

In the search for fast parallel algorithms on existing or future parallel computers, the above-mentioned research provides a rich source of ideas and hints.
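A canonical example of an NC-algorithm is computing all prefix sums of n numbers by recursive doubling: with n processors it needs only O(log n) synchronous steps. The sketch below is an illustration invented for this text, not an algorithm from the thesis; it simulates the parallel steps sequentially.

```python
def prefix_sums(x):
    """Simulate the O(log n)-step parallel prefix (recursive doubling) scheme.

    Each iteration of the while loop corresponds to one synchronous
    parallel step in which all positions are updated simultaneously;
    ceil(log2 n) such steps suffice.
    """
    s = list(x)
    n = len(s)
    d = 1
    while d < n:
        # All reads logically precede all writes, as in one PRAM step.
        s = [s[i] + (s[i - d] if i >= d else 0) for i in range(n)]
        d *= 2
    return s
```

For example, `prefix_sums([1, 2, 3, 4])` produces the running sums in two doubling steps instead of the n - 1 steps of the obvious sequential loop.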

The Gap Between Theory and Practice

Unfortunately, it may be very hard to assess the practical value of NC-algorithms. Some of the problems are described below.

Asymptotical analysis

Order notation (see Chapter 2) is a very useful tool when the aim is to prove membership of complexity classes, or if asymptotic behaviour for some other reason is detailed enough. It makes it possible to describe and analyse algorithms at a very high level. However, it also makes it possible to hide, willingly or unwillingly, a lot of details which cannot be omitted when algorithms are compared for realistic problems of finite size.
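To make the point concrete, consider two hypothetical cost models of the kind that O-notation collapses: an O(log n) algorithm with a large hidden constant and an O(log^2 n) algorithm with a small one. The constants below are invented for illustration (they are not measurements from this thesis), but they show how easily the asymptotically slower algorithm wins for all realistic n.

```python
import math

C_LOG, C_LOG2 = 100.0, 1.0   # hypothetical hidden constants

def t_log(n):
    """Running time model for an O(log n) algorithm with a large constant."""
    return C_LOG * math.log2(n)

def t_log2(n):
    """Running time model for an O(log^2 n) algorithm with a small constant."""
    return C_LOG2 * math.log2(n) ** 2

def crossover():
    """Smallest power of two for which the O(log n) algorithm is faster."""
    n = 2
    while t_log(n) >= t_log2(n):
        n *= 2
    return n

# t_log(n) < t_log2(n) exactly when log2(n) > C_LOG / C_LOG2 = 100,
# so the asymptotic winner needs more than 2**100 items before it pays off.
```

With these constants the crossover lies at n = 2^101, far beyond any problem that will ever be stored in a machine, even though the O(log n) algorithm is the clear asymptotic winner.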

Unrealistic machine model

Few theoretical algorithms are described as implementations on real machines. Their descriptions are based on various computational models. A computational model is synonymous with an abstract machine. One such model is the CREW PRAM (Concurrent Read, Exclusive Write, Parallel Random Access Machine) model, see Chapter 3.

Details are ignored

Most NC-algorithms originating from parallel complexity theory are presented at a very high level and in a compact manner. One reason for this is probably that parallel complexity theory is a field that to a large extent overlaps with mathematics, where the elegance and advantages of compact descriptions are highly appreciated. A revised and more complete description of many of these algorithms may often be found, a year or two after the conference where the algorithm was presented, in journals such as Journal of the ACM or SIAM Journal on Computing. These descriptions are generally a bit more detailed, but they are still far from the granularity level that is required for implementation. There are certainly several advantages implied by using such high-level descriptions. It is probably the best way to present an algorithm to a reader who is starting on a study of a specific algorithm. However, there is evidently a large gap between such initial studies and implementation of an algorithm.

(Many of these algorithms may be found in proceedings from conferences such as FOCS, the IEEE Symposium on Foundations of Computer Science, and STOC, the ACM Symposium on Theory of Computing, and in journals such as SIAM Journal on Computing and Journal of the ACM.)

Traditional Use of the CREW PRAM Model

The certainly most used [Agg, Col, Kar, RS] theoretical model for expressing parallel algorithms is the PRAM (Parallel RAM), also called the CREW PRAM, proposed by Fortune and Wyllie [FW] and discussed in Chapter 3. Its simplicity and generality make it possible to concentrate on the algorithm, without being distracted by the obstacles caused by describing it for a more specific and realistic machine model. Its synchronous operation makes it easy to describe and analyse programs for the model.

This is exactly what is needed in parallel complexity theory, where the main focus is on the parallelism inherent in a problem.
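The CREW rule itself is small enough to state as executable code. The sketch below is a hypothetical illustration (not part of the simulator described later in the thesis) of one synchronous step: any number of processors may read the same cell, all reads complete before any write takes effect, and two writes to the same cell are rejected.

```python
def crew_step(memory, reads, writes):
    """Execute one synchronous CREW PRAM step.

    memory: a list of cells, modified in place.
    reads:  {processor: address}; concurrent reads of one cell are legal.
    writes: {processor: (address, value)}; writes must be exclusive.
    Returns the value each reading processor observed.
    """
    # All reads happen first, so every processor sees the old memory state.
    observed = {p: memory[a] for p, a in reads.items()}
    # Exclusive write: no two processors may target the same address.
    targets = [a for a, _ in writes.values()]
    if len(targets) != len(set(targets)):
        raise RuntimeError("exclusive-write violation")
    for a, v in writes.values():
        memory[a] = v
    return observed
```

For example, two processors may concurrently read cell 0 while one of them writes cell 1 in the same step, but two writes to one cell raise an error; a synchronous MIMD program is a sequence of such steps.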

Implementing On an Abstract Model: Maybe Worthwhile

Implementing algorithms on an abstract machine may be regarded as a paradox. One of the reasons for using abstract machines has traditionally been a wish to avoid implementation details.

(Another reason may be found in the call for papers for the FOCS Symposium: a strict page limit was enforced on the submitted papers. Considering the complexity of most of these algorithms, a compact description is therefore a necessity.)

(As an example, compare the descriptions of Cole's parallel merge sort algorithm in [Col] with the detailed description given later in this thesis.)

However, implementing parallel algorithms on a CREW PRAM model will provide a deeper understanding, and lessen the gap between theory and practice. The following may be achieved by implementing a theoretical parallel algorithm on a CREW PRAM model:

Adeeper understanding Implementation enforces a detailed study of

all asp ects of an algorithm

Condence in your understanding Correctness verication of large

parallel programs is generally very dicult In practice the b est way

of getting condent of ones own understanding of a complicated algo

rithm is to make an implementation that works correctly Of course

testing is no correctness pro of but elab orate and systematic testing

maygive a larger degree of condence than program verication which

also may contain errors

A good help in mastering the complexity involved in implementing a complicated parallel algorithm on a real machine. A CREW PRAM implementation may be a good starting point for implementing the same algorithm on a more realistic machine model. This is particularly important for complicated algorithms. Going directly from an abstract mathematical description to an implementation on a real machine will often be too large a step. The existence of a CREW PRAM implementation may reduce this to two smaller steps.

Insight into the practical value of an algorithm. How can an unrealistic machine model be used to answer questions about the practical value of algorithms? It is important here to differentiate between negative and positive results. If a parallel algorithm A turns out to be inferior compared with sequential and/or parallel algorithms for more realistic models, the use of an unrealistic parallel machine model for executing A will only strengthen the conclusion that A is inferior. On the other hand, a parallel algorithm can not be classified as a better alternative for practical use if it is based on a less realistic model.

These advantages are independent of whether a parallel computer that resembles the CREW PRAM model ever will be built.

A New Direction for Research on Parallel Algorithms

To summarise: the practitioners measure and evaluate implemented algorithms for solving finite problems on existing computers, while the theorists are analysing the performance of high-level algorithm descriptions executed on abstract machines for solving infinitely large problems.

The approach presented in this thesis is to evaluate implemented algorithms executed on an abstract machine for solving finite problems. In Figure

                             Abstract machines       Existing machines

    Infinitely large
    problems                 Theory                  Not possible

    Finite problems          This work               Practice

Figure : How the approach of algorithm evaluation presented in this thesis can be viewed as a compromise between the two main approaches used by the theory and practice communities.

this is identified as a third approach for evaluating parallel algorithms.

It is unlikely that the work reported here is the first of this kind, so I will not claim that the approach for evaluating the practical value of theoretical parallel algorithms is new. However, I have not yet found similar work. To be better prepared for evaluating the computer algorithms of the future, this direction of research should, in my view, be given more attention.

Chapter

Parallel Algorithms: Central Concepts

The first half of this chapter presents some of the more important concepts on the theory side, mainly from parallel complexity theory. The second half is devoted to issues that are central when evaluating parallel algorithms in practice.

Most of the material in this chapter is introductory in nature and provides the necessary background for reading the subsequent chapters. Parts of the text, such as Section  on Amdahl's law and Section  on superlinear speedup, do also attempt to clarify topics about which there has been some confusion in recent years.

Parallel Complexity Theory

"The idea met with much resistance. People argued that faster computers would remove the need for asymptotic efficiency. Just the opposite is true, however, since as computers become faster, the size of the attempted problems becomes larger, thereby making asymptotic efficiency even more important."

John E. Hopcroft in his Turing Award Lecture "Computer Science: The Emergence of a Discipline" [Hop]

This section presents those aspects of parallel complexity theory that are most relevant for researchers interested in parallel algorithms. It does not claim to be a complete coverage of this large, difficult and rapidly expanding field of theoretical computer science. Most of the concepts are presented in a relatively informal manner to make them easier to understand. The reader should consult the references for more formal definitions.

Not all of the material is used in the following chapters; it is mainly provided to give the necessary background, but also to introduce the reader to very central results and theory that to a large extent are unknown among the practitioners. Parallel complexity theory is a relatively new and very fascinating field; it has certainly been the main inspiration for my work.

Introduction

The study of the amount of computational resources that are needed to solve various problems is the study of computational complexity. Another kind of complexity, which has been given less attention within computer science, is descriptive or descriptional complexity. An algorithm which is very complex to describe has a high descriptional complexity, but may still have a low computational complexity [Fre, WW]. In this thesis, when I use the term complexity, I mean computational complexity.

The resources most often considered in computational complexity are running time or storage space, but many other units may be relevant in various contexts.

The history of computational complexity goes back to the work of Alan Turing in the 1930s. He found that there exist so-called undecidable problems that are so difficult that they can not be solved by any algorithm. The first example was the Halting Problem [GJ]. However, as described by two of the leading researchers in the field, Stephen A. Cook and Richard Karp [Coo, Kar], the field of computational complexity did not really start before the 1960s, and the terms complexity and complexity class were first defined by Juris Hartmanis and Richard Stearns in the paper "On the computational complexity of algorithms" [HS].

Complexity theory deals with computational complexity. The most central concept here is the notion of complexity classes. These are used to classify problems according to how difficult or hard they are to solve. The most difficult problems are the undecidable problems first found by Turing. The most well-known complexity class is the class of NP-complete problems. Other central concepts are upper and lower bounds on the amount of computing resources needed to solve a problem.



Footnote: As an example, the total number of messages or bits exchanged between the processors is often measured for evaluating algorithms in distributed systems [Tiw].

Parallel complexity theory deals with the same issues as traditional complexity theory for sequential computations, and there is a high degree of similarity between the two [And]. The main difference is that complexity theory for parallel computations allows algorithms to use several processors in parallel to solve a problem. Consequently, other computing resources, such as the number of processors and the degree of communication among the processors, are also considered. The field emerged in the late 1970s, and it has not yet reached the same level of maturity as complexity theory for sequential computations.

A highly referenced book about complexity theory is "Computers and Intractability: A Guide to the Theory of NP-Completeness" by Garey and Johnson [GJ]. It gives a very readable introduction to the topic and a detailed coverage of the most important issues with many fascinating examples, and it provides a list of more than 300 NP-complete problems. A more compact survey is the Turing Award Lecture "An Overview of Computational Complexity" given by Stephen A. Cook [Coo]. The Turing Award Lectures given by Hopcroft and Karp [Hop, Kar] are also very inspiring. Ian Parberry's book "Parallel Complexity Theory" [Par] gives a detailed description of most parts of the field, but Chapter  of Richard Anderson's PhD Thesis "The Complexity of Parallel Algorithms" [And] is perhaps more suitable as introductory reading for computer scientists.

Basic Concepts, Definitions and Terminology

To be able to discuss complexity theory at a minimum level of precision, we must define and explain some of the most central concepts. This section may be skipped by readers familiar with algorithm analysis and complexity.

Problems, algorithms and complexity

Garey and Johnson [GJ] define a problem to consist of a parameter description and a specification of the properties required of the solution. The description of the parameters is general: it explains their nature (structure), not their values. If we supply one set of values for all the parameters, we have a problem instance. As an example, consider sorting. A formal description of the sorting problem might be:

SORTING

parameter description: A sequence of items S = <a_1, a_2, ..., a_n>, and a total ordering <= which states whether a_j <= a_k is true for any pair of items a_j, a_k from the set S.

solution specification: A sequence of items S' = <a_k1, a_k2, ..., a_kn> which contains the same items as S and satisfies the given total ordering, i.e. a_k1 <= a_k2 <= ... <= a_kn.

If we restrict ourselves to the set of integers, which are totally ordered by our standard interpretation of <=, any finite specific set of finite integers makes one instance of the sorting problem.
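The two requirements of the solution specification, that S' contains exactly the items of S and that it satisfies the ordering, can be checked mechanically. The following Python sketch is an illustration added here, not part of the thesis; the function name is invented for the example.

```python
from collections import Counter

def is_valid_sorting_solution(s, s_prime):
    """Check the two requirements of the SORTING specification:
    (1) s_prime contains exactly the same items as s (a permutation),
    (2) s_prime satisfies the ordering a_k1 <= a_k2 <= ... <= a_kn."""
    same_items = Counter(s) == Counter(s_prime)
    ordered = all(s_prime[i] <= s_prime[i + 1]
                  for i in range(len(s_prime) - 1))
    return same_items and ordered

# One instance of the sorting problem: a specific finite set of integers.
instance = [5, 1, 4, 1, 3]
print(is_valid_sorting_solution(instance, sorted(instance)))  # True
print(is_valid_sorting_solution(instance, [1, 1, 3, 4]))      # False: item lost
```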

For most problems there exist some instances that are easy to solve and others that are more difficult to solve. The following definition is therefore appropriate:

Definition (Algorithm) [GJ, p. ]
An algorithm is said to solve a problem Π if that algorithm can be applied to any instance I of Π and is guaranteed always to produce a solution for that instance I.

We will see that some algorithms solve all problem instances equally well, while other algorithms have a performance that is highly dependent on the characteristics of the actual instance.

In general, we are interested in finding the best or most efficient algorithm for solving a specific problem. By most efficient, one means the minimum consumption of computing resources. Execution time is clearly the resource that has been given most attention. This is reasonable for sequential computations, especially today when most computers have large memories. For parallel computations, other resources will often be crucial. Most important in general are the number of processors used and the amount of communication.

The time used to run an algorithm for solving a problem usually depends on the size of the problem instance. A tool for comparing the efficiency of various algorithms for the same problem should ideally be a yardstick which is independent of the problem size. However, the relative performance of two algorithms will most often be significantly different for various sizes of the problem instance. This fact makes it natural to use a judging tool which has the size of the problem instance as a parameter. A complexity function is such a tool, and it is perhaps the most central concept in complexity theory.



Footnote: As an example, consider the concept of presortedness in sorting algorithms [Man].

Footnote: As we shall see in Section , complexity classes provide this kind of problem size independent comparison.

Definition (Time complexity function)
A time complexity function for an algorithm is a function of the size of the problem instances that, for each possible size, expresses the largest number of basic computational steps needed by the algorithm to solve any possible problem instance of that size.

It is common to represent the size of a problem instance as a measure of the input length needed to encode the instance of the problem. In this thesis I will use the more informal term problem size. (See the discussion of input length and encoding scheme in [GJ].) For most sorting algorithms it is satisfactory to represent the size of the problem instance as the number of items to be sorted, denoted n. For simplicity, we will use n to represent the problem size of any problem for the rest of this section.

Worst case, average case and best case

The definition above leads to another central issue: the difference between worst case, average case and best case performance. Even if we restrict ourselves to problem instances of a given fixed size, there may exist easy and difficult instances. Again, consider sorting. Many algorithms are much better at sorting a sequence of n numbers which to a large degree are in sorted order than a sequence in total disorder [McG]. The definition describes that a time complexity function represents an upper bound for all possible cases, and a more explicit term would therefore be worst case time complexity function. The worst case function provides a guarantee and is certainly the most frequently used efficiency measure. If not otherwise stated, this is what we mean by a time complexity function in this thesis.

Best case time complexity functions might be defined in a similar way. Average case complexity functions are of obvious practical importance. Unfortunately, they are often very difficult to derive.
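For a small fixed problem size, however, all three measures can be computed by brute force. The following sketch is an illustration added here (not from the thesis): it counts the comparisons insertion sort makes on every permutation of n = 6 items, so the minimum, maximum and mean of these counts are exactly the best case, worst case and average case comparison counts for that size.

```python
from itertools import permutations

def insertion_sort_comparisons(seq):
    """Return the number of comparisons insertion sort makes on seq."""
    a, comparisons = list(seq), 0
    for i in range(1, len(a)):
        j = i
        while j > 0:
            comparisons += 1          # compare a[j-1] with a[j]
            if a[j - 1] <= a[j]:
                break
            a[j - 1], a[j] = a[j], a[j - 1]
            j -= 1
    return comparisons

n = 6
counts = [insertion_sort_comparisons(p) for p in permutations(range(n))]
# Best case (sorted input): n - 1 = 5.  Worst case (reversed): n(n-1)/2 = 15.
print(min(counts), max(counts), sum(counts) / len(counts))
```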

Definition  states that time is measured in basic computational steps. It is therefore necessary to make assumptions about the underlying computational model that executes the algorithm. This important topic is treated in Section . For the following text it is appropriate to think of a basic computational step as one machine instruction.

Order notation and asymptotic complexity

There is an infinite number of different complexity functions. The concept is therefore not an ideal tool to make broad distinctions between algorithms. We need an abstraction of the exact complexity functions which makes them more convenient to describe and discuss at a higher level, and easier to compare, without losing their most important information. Order notation has shown to be a successful abstraction of this kind. When we say that f(x) = O(g(x)), we mean, informally, that f grows at the same rate or more slowly than g when x is very large.

Definition (Big-O) [Wil, p. ]
f(x) = O(g(x)) if there exist positive constants c and x0 such that |f(x)| <= c g(x) for all x >= x0.

The order notation provides a compact way of representing the growth rate of complexity functions. This is illustrated in Table , where we have shown the exact complexity functions and their corresponding order expressions for four artificial algorithms. Note that it is common practice to include only the fastest growing term. Algorithm D is a so-called constant time algorithm, whose execution time does not grow with the problem size.

Table : Examples on the use of big-O order notation

    Algorithm    complexity function    order expression
    A             n^2 +  n              O(n^2)
    B             n^2                   O(n^2)
    C             n +  n log n          O(n log n)
    D            constant               O(1)

It is important to realise that order expressions are most precise when they represent asymptotic complexity, i.e. when the problem size n becomes infinitely large. The error introduced by omitting the low order terms approaches zero as n goes to infinity. Note also that we omit the multiplicative constants: algorithms A and B are regarded as equally efficient. This is essential for making order notation robust to changes of technology and implementation details.

Footnote: At least in the theory community.

Footnote: f(x) = O(g(x)) is read as "f(x) is of order at most g(x)" or "f(x) is big-oh of g(x)".

Footnote: Consider algorithm C. It would be correct to say that its complexity function is O(n^2), since n^2 grows faster than n log n. However, that would be a weaker statement, and there is in general no reason for saying less than we know in this context.

Footnote: As done when describing A as O(n^2) instead of O(n^2 + n).
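Omitting the multiplicative constants is also why an asymptotically slower algorithm can win for finite problem sizes, which is the central theme of this thesis. The sketch below is an added illustration with two invented complexity functions, an O(n log n) one with a large constant factor and an O(n^2) one with constant 1, and it searches for the crossover point.

```python
import math

def t_nlogn(n):
    return 10 * n * math.log2(n)   # O(n log n), but with a large constant

def t_square(n):
    return n * n                   # O(n^2), with constant 1

# Find the first problem size where the asymptotically
# faster algorithm actually becomes the faster one.
n = 2
while t_square(n) < t_nlogn(n):
    n += 1
print(n)  # 59: below this size the O(n^2) algorithm wins
```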

Upper and lower bounds

A substantial part of computer science research consists of designing and analysing new algorithms that in some sense are more efficient than those previously known. Such a result is said to establish an upper bound. Most upper bounds focus on the execution time; they describe asymptotic behaviour, and they are commonly expressed by using the big-O notation.

Definition (Upper bound) [Akl, p. ]
An upper bound on time, U(n), is established by the algorithm that, among all known algorithms for the problem, can solve it using the least number of steps in the worst case.

Several sequential algorithms exist that sort n items in O(n log n) time. One example is the Heapsort algorithm invented by J. Williams [Wil]. Consequently, we know that sorting can be done at least that fast on sequential computers. Each single one of these algorithms implies the existence of an upper bound of O(n log n), but note that the order notation is too rough to assess which single algorithm corresponds to the upper bound.
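Any single O(n log n) sorting algorithm is enough to witness this upper bound. The following sketch uses Python's standard heapq module rather than Williams' original in-place formulation; it is an added illustration, not an implementation from the thesis.

```python
import heapq

def heapsort(items):
    """Sort via a binary heap: building the heap is O(n), and each of
    the n removals is O(log n), giving the O(n log n) upper bound."""
    heap = list(items)
    heapq.heapify(heap)
    return [heapq.heappop(heap) for _ in range(len(heap))]

print(heapsort([9, 4, 7, 1, 8]))  # [1, 4, 7, 8, 9]
```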

Lower bounds are much more difficult to derive, as expressed by Cook in [Coo]: "The real challenge in complexity theory, and the problem that sets the theory apart from the analysis of algorithms, is proving lower bounds on the complexity of specific problems."

Definition (Lower bound) [Akl, p. ]
A lower bound on time, L(n), tells us that no algorithm can solve the problem in fewer than L(n) steps in the worst case.

While an upper bound is a result from analysing a single algorithm, a lower bound gives information about all algorithms that have been or may be designed to solve a specific problem. If a lower bound L(n) has been proved for a problem Π, we say that the inherent complexity of Π is L(n).

A lower bound is of great practical importance when designing algorithms, since it clearly states that there is no reason for trying to design algorithms that would be more efficient.

Footnote: The possible variations in factors such as computational model, assumptions about the input size and characteristics, and charging of the different computational resources imply the existence of several algorithms that all, in some way, are most efficient for solving a general problem.

The influence of assumptions is illustrated by the following lower bound given by Akl in [Akl]. This bound is trivial, but highly relevant in many practical situations. See also "A Zero-Time VLSI Sorter" by Miranker et al. [MLK].

If input and/or output are done sequentially, then every parallel sorting algorithm requires Ω(n) time units.

It is common to use big-omega to express such lower bounds. Ω(n) in this context is read as "omega of n" or "of order at least n". It can informally be perceived as the opposite of O. As done by several authors (Harel [Har], Garey and Johnson [GJ]), we will stick to big-O notation for expressing orders of magnitude.

An algorithm with a running time that matches the lower bound is said to be time optimal.

Models of Parallel Computation

"Thus even in the realm of uniprocessors, one lives happily with high-level models that often lie. This should cool down our ambition for true to life (realistic) multiprocessor models."

Marc Snir in "Parallel Computation Models: Some Useful Questions" [Sni]

While the RAM (random access machine) computational model, introduced by Aho, Hopcroft and Ullmann in [AHU], has been dominating for sequential computations, we are still missing such a unifying model for parallel processing [Val, Sanb]. A very large number of models of parallel computation have been proposed. The state is further cluttered up by the fact that various authors in the field often use different names on the same model. They also often have different opinions on how a specific model should operate [Sanb].

This section does not aim at giving a complete survey of all models. It starts by shortly mentioning some classification schemes and surveys, and then introduces the PRAM model and its numerous variants.



Footnote: In fact, Ω is the negation of o, which is a more precise variant of O. Θ, often read as "of order exactly", is still more precise. Exact definitions of the five symbols O, o, Ω, ω and Θ that are used to compare growth rates may be found in [Wil].

Footnote: This is to avoid introducing excessive notation; Ω(f(n)) may be expressed as "of order at least" O(f(n)).

The PRAM model is the most central concept in this thesis and is therefore thoroughly described in Chapter . The section ends with a discussion of important general properties of models and some comments on a very interesting model recently proposed by Leslie Valiant [Val].

Computing Models: Taxonomies and Surveys

A very large number of models for parallel computing has been proposed, ranging from realistic message passing asynchronous multiprocessors to idealised synchronous models. These models, and all the various kinds of parallel computers that have been built, form a diverse and complicated dynamic picture. This mess has motivated the development of numerous taxonomies and classification schemes for models of parallel computing and real parallel computers.

Flynn's taxonomy

Without doubt, the most frequently used classification of parallel computers is Flynn's taxonomy, presented by Michael Flynn in [Fly]. The taxonomy classifies computers into four broad categories:

SISD (Single Instruction stream, Single Data stream):
The von Neumann model, the RAM model and most serial (i.e. uniprocessor) computers fall into this category. A single processor executes a single instruction stream on a single stream of data. However, SISD computers may use parallelism in the form of pipelining to speed up the execution in the single processor. The CRAY, one of the most successful supercomputers, is classified as a SISD machine by Hwang and Briggs [HB].

SIMD (Single Instruction stream, Multiple Data stream):
Models and computers following this principle have several processing units, each operating on its own stream of data. However, all the processing units are executing the same instruction from one instruction stream, and they may therefore be realised with a single control unit. Array processors are in this category. Classical examples are ILLIAC IV, ICL DAP and MPP [HB]. The Connection Machine [Hil, HS] is also SIMD. Most SIMD computers operate synchronously, using a single global clock.

Many authors classify the PRAM model as SIMD, but this is not consistent with the original papers defining that model (see Section ).

MISD (Multiple Instruction stream, Single Data stream):
This class is commonly described to be without examples. It can be perceived as several computers operating on a single data stream as a "macropipeline" [HB]. Pipelined vector processors, which commonly are classified as SISD, might also be put into the MISD class [AG].

MIMD (Multiple Instruction stream, Multiple Data stream):
These machines have several processors, each being controlled by its own stream of instructions and each operating on its own stream of data. Most multiprocessors fall into this category. The processors in MIMD machines may communicate through a shared global memory (tightly coupled) or by message passing (loosely coupled). Examples are Alliant, C.mmp, CRAY, CRAY X-MP, iPSC and NCube [AG, HJ].

Some authors use the term MIMD almost as synonymous with asynchronous operation [Akl, Ste, Dun]. It is true that most MIMD computers which have been built operate asynchronously, but there is nothing wrong with a synchronous MIMD computer. See also Section .

In practice, Flynn's taxonomy gives only two categories of parallel computers: SIMD and MIMD. Consequently, it has long been described as too coarse [Sny, AG].

Duncan's taxonomy and other classification schemes

Ralph Duncan [Dun] has very recently given a new taxonomy of parallel computer architectures. It is illustrated in Figure . Duncan's taxonomy is a more detailed classification scheme than Flynn's taxonomy. It is a high-level, practically (or machine) oriented scheme, well suited for classifying most contemporary parallel computers into a small set of classes. The reader is referred to Duncan's paper for more details.

Many other taxonomies have been proposed. Basu [Bas] has described a taxonomy which is tree structured in the same way as Duncan's taxonomy, but more detailed and systematic. Other classification schemes have been given by Handler (see for instance [HB], page ), Feng (see [HB], page ), Kuck (see [AG], page ), Snyder [Sny] and Treleaven (see [AG], page ).

    Synchronous
        Vector
        SIMD
            Processor array
            Associative memory
        Systolic

    MIMD
        Distributed memory
        Shared memory

    MIMD paradigm
        MIMD/SIMD
        Dataflow
        Reduction
        Wavefront

Figure : Duncan's taxonomy of parallel computer architectures [Dun].

Surveys

A good place to start reading about existing computers and models for parallel processing is Chapter  of "Designing Efficient Algorithms for Parallel Computers" written by Quinn [Qui]. Surveys are also given by Kuck [Kuc], Miklosko and Kotov [MK], Akl [Akl], DeCegama [DeC], and Almasi and Gottlieb [AG]. H. T. Kung, who is well known for his work on systolic arrays, has written a paper, "Computational models for parallel computers" [Kun], which advocates the importance of computational models. The scope of the paper is restricted to 1D (i.e. linear) processor arrays, but Kung describes  different computational models for this rather restricted kind of parallel processing. This illustrates the richness of parallel processing.

Theoretical models

The taxonomies and surveys mentioned so far are mainly representing parallel computers that have been built. Some of the more realistic models for parallel computation may be classified with these taxonomies. However, many of the more theoretic models do not fit into the schemes. Again, we notice the gap between theory and practice.

Footnote: For more detailed descriptions of the most important parallel computers that have been made, see Hwang and Briggs [HB] and Hockney and Jesshope [HJ].

The paper "Towards a complexity theory of synchronous parallel computation", written by Stephen Cook [Coo], describes many of the most central models for parallel computation that are used in the theory community. In Cook's terminology, these are uniform circuit families, alternating Turing machines, conglomerates, vector machines, parallel random access machines, aggregates and hardware modification schemes. The paper is highly mathematical, and the reader is referred to the paper for more details. "A Taxonomy of Problems with Fast Parallel Algorithms", also written by Cook [Coo], is mainly focusing on algorithms but does also give an updated view of the most important models. The paper gives an extensive list of references to related work.

The first part of "Routing, Merging and Sorting on Parallel Models of Computation" by Borodin and Hopcroft [BH], and "Parallel Machines and their Communication Theoretical Limits" by Reischuk [Rei], give overviews with a bias which is closer to practical computers. More specifically, they give more emphasis to the shared memory models, which are easier to program but less mathematically convenient.

Perhaps the most friendly introduction to theoretical models for parallel computation for computer scientists is Chapter  of James Wyllie's PhD Thesis "The Complexity of Parallel Computations" [Wyl]. Wyllie motivates and describes the PRAM model, which probably is the most used theoretical model for expressing parallel algorithms. It is described below.

The PRAM Model and its Variants

"Most parallel algorithms in the literature are designed to run on a PRAM."

Alt, Hagerup, Mehlhorn and Preparata in "Deterministic Simulation of Idealised Parallel Computers on More Realistic Ones" [AHMP]

"The standard model of synchronous parallel computation is the PRAM."

Richard Anderson in "The Complexity of Parallel Algorithms" [And]

Footnote: I have not studied all the details of this and some of the other mathematical papers which are referenced. The purpose of the short description and the references is to introduce the material and give exact pointers for further reading.

"The parallel random-access machine (PRAM) is by far the most popular model of parallel computation."

Bruce Maggs in "Beyond Parallel Random-Access Machines" [Mag]

The most popular model for expressing parallel algorithms within theoretical computer science is the PRAM model [Agg, Col, Kar, RS].

The Original PRAM of Fortune and Wyllie

The parallel random access machine (PRAM) was first presented by Steven Fortune and James Wyllie in [FW]. It was further elaborated in Wyllie's well-written PhD thesis "The Complexity of Parallel Computations" [Wyl]. The PRAM is based on random access machines (RAMs) operating in parallel and sharing a common memory. Thus it is, in a sense, the model in the world of parallel computations that corresponds to the RAM (Random Access Machine) model, which certainly is the prevailing model for sequential computations. The processors operate synchronously. The connection to the RAM model should not be a surprise, since John Hopcroft was the thesis advisor of James Wyllie.

Today, a large number of variants of the PRAM model are being used. The original PRAM model of Fortune and Wyllie, which is the main underlying concept of this work, is described in Chapter . Below, we describe its main variants and mention some of the other names that have been used for these.

The EREW, CREW and CRCW Models

Today, the most used name for the original PRAM model is probably CREW PRAM [SV]. CREW is an abbreviation for the very central concurrent read, exclusive write property. The main reason for this name is the possibility to explicitly distinguish the model from the two other variants: EREW PRAM and CRCW PRAM.

The EREW PRAM does not allow concurrent read from the global memory. It may be regarded as more realistic than the CREW PRAM, and some authors present algorithms in two variants: one for the CREW PRAM and another for the EREW PRAM model. The EREW PRAM algorithms are in general more complex; see for instance [Col].

Footnote: This means that several processors may, at the same time step, read the same variable (location) in global memory, but they may not write to the same global variable simultaneously.

The CRCW PRAM allows concurrent writing in the global memory and is less realistic than the CREW PRAM. It exists in several variants, mainly differing in how they treat simultaneous writes to the same memory location (see below). Reischuk [Rei] discusses the power of concurrent read and concurrent write. The ERCW variant, which is the fourth possibility, is in general not considered, because it seems more difficult to realise simultaneous writes than simultaneous reads [Rei].

The CREW PRAM model has probably become the most popular of these variants because it is the most powerful model which also is well defined; the CRCW variant exists in many subvariants.

The various CRCW models

One of the problems with the CRCW PRAM model is the use of several definitions for the semantics of concurrent writing to the global memory. The main subvariants for the handling of two or more simultaneous write operations to the same global memory cell are:

Common value only: An arbitrary number of processors may write to the same global memory location, provided that they all write the same value. Writing different values is illegal. [Fei, Akl, BH, Rei]

Arbitrary winner: One arbitrary processor of those trying to write to the same location succeeds. The values attempted written by the other processors are lost. This gives non-deterministic operation. [Fei, Rei, BH]

Ordered: The processors are ordered by giving them different priorities, and the processor with the highest priority wins. [Fei, Rei, Akl, BH]

Sum: The sum of the quantities written is stored. [Akl]

Maximum: The maximum of the values written is stored. [Rei]

Garbage: The resulting value is undefined. [Fei]
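The differences between these policies can be made concrete with a small simulation. The sketch below is only an added illustration of the write-conflict rules listed above, not a formal PRAM definition; the function and policy names are invented for the example.

```python
import random

def resolve_crcw_write(writes, policy):
    """Resolve one time step in which several processors attempt to
    write to the same global memory cell.  writes is a list of
    (priority, value) pairs; a lower priority number means a higher
    priority.  Returns the value left in the cell."""
    values = [value for _, value in writes]
    if policy == "common":
        if len(set(values)) != 1:
            raise ValueError("illegal: different values written")
        return values[0]
    if policy == "arbitrary":
        return random.choice(values)        # non-deterministic winner
    if policy == "ordered":
        return min(writes)[1]               # highest-priority processor wins
    if policy == "sum":
        return sum(values)
    if policy == "maximum":
        return max(values)
    if policy == "garbage":
        return None                         # resulting value is undefined
    raise ValueError("unknown policy")

writes = [(2, 5), (0, 7), (1, 3)]
print(resolve_crcw_write(writes, "ordered"))   # 7
print(resolve_crcw_write(writes, "sum"))       # 15
print(resolve_crcw_write(writes, "maximum"))   # 7
```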

PRAC, PRAM, WRAM, etc.: Terminology

Footnote: This is often called a write conflict.

The various variants of the PRAM model have been used under several different names. The following list is an attempt to reduce possible confusion and to mention some other variants.

PRAC corresponds to the EREW PRAM [BH].

PRAM is the original PRAM model of Fortune and Wyllie, and has often been given rather different names. In [QD] the term MIMD-TC-R is used, and in [Qui] SIMD-SM-R is used.

WRAM corresponds to the CRCW PRAM [BH, GR].

CRAM, ARAM, ORAM, and MRAM have been used by Reischuk [Rei] to denote, respectively, the Common value only, Arbitrary winner, Ordered, and Maximum variants of the CRCW PRAM model (see above).

DRAM is short for distributed random access machine, and has been proposed by Leiserson and Maggs as a more realistic alternative to the PRAM model [Mag].

General Properties of Parallel Computational Models

Motivation



A computing model is an idealised, often mathematical, description of an existing or hypothetical computer. There are many advantages of using computing models:

Abstractions simplify: When developing software for any piece of machinery, a computing model should make it possible to concentrate on the most important aspects of the hardware, hiding low-level details.

Common platform: A computing model should be an agreed-upon standard among programmers in a project team. This makes it easier to describe and discuss measured or experienced performance of various parts of a software system. A good model will increase software portability and also provide a common language for research and teaching [Sni]. See also the paragraph below about Valiant's bridging BSP model.

Analysis and performance models: In theoretical computer science, computational models have played a crucial role by providing a common base for algorithm analysis and comparison. (Note that the literature uses a wide variety of terms for this concept; examples are computing model, computer model, computation model, computational model, and model of computation.) The RAM model has been used as a common base for asymptotic analysis and comparison of sequential algorithms. For parallel computations, the PRAM model of Fortune and Wyllie [FW, Wyl] has made it possible for a typical analysis of parallel algorithms to concentrate on a small set of well defined quantities: the number of parallel steps (time), the number of processors (parallelism), and sometimes also the global memory consumption (space). More realistic models typically use a larger number of parameters to describe the machine, and on the other side of the spectrum we have special purpose performance models, which are highly machine dependent and often also specific to a particular algorithm. See for instance [AC, MB].

What is the right model?

The selection of the appropriate model for parallel computation is certainly one of the most widely discussed issues among researchers in parallel processing. This is clearly reflected in the proceedings of the NSF-ARC Workshop on Opportunities and Constraints of Parallel Computing [Sana]. There are many difficult tradeoffs in this context. Some examples are easy high-level programming vs. efficient execution (e.g. shared memory vs. message passing [Bil]), general purpose model vs. special purpose model (e.g. PRAM vs. machine or algorithm specific models), easy to use vs. mathematical convenience (e.g. PRAM vs. boolean circuit families [Coo]), and easy to analyse vs. easy to build (e.g. synchronous vs. asynchronous).

Instead of arguing for some specific type of model as the best, we will describe important properties of a good model. Some of these properties are overlapping, and unfortunately, several of the desired properties are conflicting.

Easy to understand: Models that are difficult to understand, or complex in some sense (for instance by containing a lot of parameters, or by allowing several variants), will be less suitable as a common platform. Different ways of understanding the model will lead to different use, and to reduced possibilities of sharing knowledge and experience. The importance of this issue is exemplified by the PRAM model: in spite of being one of the simplest models for parallel computations, it has been understood as a SIMD model by many researchers, and as a MIMD model by others (see Section ).

Easy to use: Designing parallel algorithms is in general a difficult task. A good model should help the programmer to forget unnecessary low-level details and peculiarities of the underlying machine. The model should help the programmer to concentrate on the problem and the possible solution strategies; it should not add to the difficulties of the program design process. Simple, high level models are in general the easiest to use. They are well suited for teaching and for documentation of parallel algorithms. However, when designing software for contemporary parallel computers, one is often forced to use more complicated and detailed models to avoid losing contact with the realities. Synchronous models are in general easier to program than asynchronous models. This is reflected by the fact that some authors use the term chaotic models to denote asynchronous models [Gib]. Shared memory models seem to be generally more convenient for constructing algorithms [BH, Par].

Well defined: A good model should be described in a complete and unambiguous way. This is essential for acting as a common platform. The so-called CRCW PRAM model exists in many variants (see above), and is therefore less popular than the CREW PRAM model.

General: A model is general if it reflects many existing machines and more detailed models. The use of general models yields more portable results. The RAM model has been a very successful general model for uniprocessor machines.

Performance-representative: Marc Snir has written an interesting paper [Sni] discussing, at a general level, to what extent high level models lead to the creation of programs that are efficient on the real machine. Informally, good models should give a performance rating of theoretical algorithms (i.e. run on the model) that is closely related to the rating obtained by running the algorithms on a real computer [Sni]. See also [Sny]. In other words, the practically good algorithms should be obtainable by refining the theoretically good ones.

Realistic: Many researchers stress the importance of a computing model being feasible [Agg, Col, Sny]. Models that can be realised with current technology, without violating too many of the model assumptions, have many advantages. Above all, they give a model performance which is representative of real performance, as discussed above. With respect to feasibility, there is a great difference between models that are based on a fixed connection network of processors, and models based on the existence of a global (or shared) memory. The fixed connection models are much easier to realise, and are in fact the best way of implementing shared memory models with current technology [BH, RBJ]. Unfortunately, the fixed connection models are in general regarded as more difficult to program. Similarly, asynchronous operation of the processors is realistic, but generally accepted as leading to more difficult programming [Ste]. On the other hand, there are researchers arguing for more idealised and powerful models than the PRAM, which is the standard shared memory model [RBJ]. Note also that there are reasonable arguments for a model being far from current technology [Vis, Val]. This is explained in the following property and in the next paragraph.

Durable: Uzi Vishkin [Vis] argues that computing models should be robust in time, in contrast to technological feasibility, which rapidly keeps advancing. Again, the RAM model is an example. Changing models too often will greatly reduce the possibilities of sharing information and building on other work. However, machines will and should change; new technological opportunities continue to appear. In practice, this is an argument against realistic models, such as asynchronous fixed connection models (sometimes also called message passing models). A similar view has been expressed by Leslie Valiant [Val].

Mathematical convenience: Stephen Cook describes shared memory models as unappealing for an enduring mathematical theory, due to the arbitrariness in their detailed definition. He advocates uniform Boolean circuit families as more attractive for such a theory [Coo]. In my view, models based on circuit families are not suited for expressing large practical parallel systems. Most algorithm designers use the PRAM shared memory model, and it should be noted that it contains two drastic assumptions, which are introduced for mathematical convenience [Wyl]. Assuming synchronous operation of all the processors makes the notion of running time well defined, and is crucial for analysing the time complexity of algorithms. Assuming unbounded parallelism (i.e. that an unbounded number of processors is available) makes it possible to handle asymptotical complexity.
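To make the synchronous-operation assumption concrete, the following sketch (my own illustration, not part of the thesis) counts well-defined parallel steps for a CREW-style summation: one virtual processor per element, and in every step all reads logically precede all writes.

```python
def crew_parallel_sum(data):
    """Sum n values in ceil(log2 n) synchronous parallel steps, with one
    virtual processor per element. Each step works on a snapshot of the
    old memory, mimicking the PRAM's lock-step read/compute/write cycle."""
    mem = list(data)
    n = len(mem)
    steps = 0
    stride = 1
    while stride < n:
        old = list(mem)                    # all reads precede all writes
        for i in range(0, n, 2 * stride):  # the processors active this step
            if i + stride < n:
                mem[i] = old[i] + old[i + stride]
        stride *= 2
        steps += 1
    return mem[0], steps

print(crew_parallel_sum(list(range(8))))   # (28, 3): sum 28 in 3 steps
```

Because the processors are assumed to run in lock-step, "3 parallel steps" is a well-defined running time, exactly the quantity the PRAM analyses count.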

Valiant's Bridging BSP Model

Leslie Valiant has recently written a very interesting paper [Val] where he advocates the need for a bridging model for parallel computation, and proposes the bulk-synchronous parallel (BSP) model as a candidate for this role.

He attributes the success of the von Neumann model of sequential computation to the fact that it has been a bridge between software and hardware. On the one side, the software designers have been producing a diverse world of increasingly useful and resource demanding software assuming this model. On the other side, the hardware designers have been able to exploit new technology in realising more and more powerful computers providing this model.

Valiant claims that a similar standardised bridging model for parallel computation is required before general purpose parallel computation can succeed. It must imply convenient programming, so that the software people can accept the model over a long time. Simultaneously, it must be sufficiently powerful for the hardware people to continuously provide better implementations of the model. A realistic model will not act as a bridging model, because technological improvements are likely to make it old-fashioned too early. For the same reason, a bridging model should also be simple, and not defined at a too detailed level. There should be open design choices allowing better implementations without violating the model assumptions [Val].

Valiant's BSP model of parallel computation is defined as the combination of three attributes: a number of processing components, a message router, and a synchronisation facility. The processing components perform processing and/or memory functions. The message router delivers messages between processing components. The synchronisation facility is able to synchronise all, or a subset of, the processing components at regular intervals. The length of this interval is called the periodicity, and it may be controlled by the program. In the interval between two synchronisations, called a superstep, the processing components perform computations asynchronously. In this sense, the BSP model may be seen as a compromise between asynchronous and synchronous operation. Phillip B. Gibbons has expressed similar thoughts, and uses the term semi-synchronous for this kind of operation [Gib].

(An example from another field of computer science is the success of the relational model in the database world. This may be explained by the fact that the relational model has acted as a bridge between users of database systems and implementors. When proposed, the model was simple and general, high level and advanced. Database system designers have worked hard for a long time to be able to provide efficient implementations of the relational model. This example was pointed out to me by P. Thanisch [Tha].)

A computer based on the BSP model may be programmed in many styles, but Valiant claims that a PRAM language would be ideal. The programs should have so-called parallel slackness, which means that they are written for v virtual processors run on p physical processors, v > p. This is necessary for the compiler to be able to massage and assemble bulks of instructions executed as supersteps, giving an overall efficient execution. Valiant's BSP model is a totally new computing concept, requiring drastically new designs of compilers and runtime systems.

The BSP model can be realised in many ways, and Valiant outlines implementations using packet switching networks or optical crossbars as two alternatives [Val].
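The superstep structure described above can be sketched in a few lines. This is my own toy illustration of the BSP skeleton (the names bsp_run, local_step, etc. are invented), not code from Valiant's paper.

```python
def bsp_run(n_components, local_step, n_supersteps):
    """Run a toy BSP computation: in each superstep every processing
    component computes asynchronously on its local state and posts
    messages, which the router only delivers at the barrier that ends
    the superstep."""
    state = [{"data": i, "inbox": []} for i in range(n_components)]
    for _ in range(n_supersteps):
        outbox = []
        for pid, comp in enumerate(state):        # asynchronous phase
            outbox.extend(local_step(pid, comp))  # (destination, message)
        for dest, msg in outbox:                  # router + barrier
            state[dest]["inbox"].append(msg)
    return state

# Example step: absorb incoming values, then send the running total
# to the right-hand neighbour (cyclically, with 4 components).
def step(pid, comp):
    comp["data"] += sum(comp["inbox"])
    comp["inbox"] = []
    return [((pid + 1) % 4, comp["data"])]

final = bsp_run(4, step, n_supersteps=2)
print([c["data"] for c in final])   # [3, 1, 3, 5]
```

The point of the skeleton is that components never see each other's state within a superstep; all communication takes effect only at the barrier, which is what makes the model "semi-synchronous".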

Important Complexity Classes

There are three levels of problems. The simplest is what we meet in everyday life: solving a specific instance of a problem. The next level up is met by algorithm designers: making a general recipe for solving a subset of all possible instances of a problem. Above that, we have a metatheoretic level, where the whole structure of a class of problems is studied [Frea]. At this highly mathematical level, one is interested in various kinds of relationships between a large number of complexity classes; see for instance Garey and Johnson [GJ].

A complexity class can be seen as a practical interface between these two topmost levels. The complexity theorist classifies various problems, and also extends the general knowledge of the different complexity classes, while the algorithm designer may use this knowledge once parts of his practical problem have been classified.

(Reading this at the end of the work reported in this thesis was highly encouraging; most of the practical work has been the design and use of a prototype high level PRAM language, described in a later chapter.)

(It seems to me that Cole's parallel merge sort algorithm, described in a later chapter, may be a good example of a program with parallel slackness.)

P, NP, and NP-completeness

The most important complexity classes for sequential computations are P, NP, and the class of NP-complete problems, often abbreviated NPC [GJ, Har]. It is common to define complexity classes in terms of Turing machines and language recognition [GJ, RS], but more informal definitions will suffice here. In the discussion of reasonable parallelism below, we will see that these complexity classes are also highly relevant for parallel computations.

Definition (The class P):
The class P is the class of problems that can be solved on a sequential computer by a deterministic polynomial time algorithm.

Definition (Polynomial time algorithm) [GJ]:
A polynomial time algorithm is an algorithm whose time complexity function is O(f(n)) for some polynomial function f, where n is the problem size.

Definition (Exponential time algorithm) [GJ]:
An exponential time algorithm is an algorithm whose time complexity function can not be bounded by a polynomial function.

The word deterministic might have been omitted from the definition of the class P, since it corresponds to our standard interpretation of what an algorithm is. The motivation for including it becomes clear when we have defined the larger class NP, which also includes problems that can not be solved by polynomial time algorithms.

Definition (The class NP):
Standard formulation: The class NP is the class of problems that can be solved by a nondeterministic polynomial time algorithm.
Practical formulation: The class NP is the class of problems whose proposed solutions can be verified (to be a solution or not) by a deterministic polynomial time algorithm.
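The practical formulation can be illustrated with the well-known SUBSET-SUM problem (my own example, not from the thesis): no polynomial time algorithm is known for finding a suitable subset, yet a proposed solution, a so-called certificate, is checked in linear time.

```python
def verify_subset_sum(numbers, target, certificate):
    """Deterministic polynomial-time verifier for SUBSET-SUM:
    certificate is a list of indices claimed to select a subset
    of `numbers` summing to `target`."""
    if len(set(certificate)) != len(certificate):
        return False                       # indices must be distinct
    if any(i < 0 or i >= len(numbers) for i in certificate):
        return False                       # indices must be in range
    return sum(numbers[i] for i in certificate) == target

nums = [3, 34, 4, 12, 5, 2]
print(verify_subset_sum(nums, 9, [2, 4]))   # 4 + 5 == 9 -> True
print(verify_subset_sum(nums, 9, [0, 1]))   # 3 + 34 != 9 -> False
```

The verifier's existence is what places SUBSET-SUM in NP; the "magical" guessing stage of a nondeterministic algorithm corresponds to conjuring up a certificate that the verifier then accepts.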

Note that we avoided the term reasonable computer in the standard formulation above. This is because the concept nondeterministic algorithm intentionally contains a so-called guessing stage [GJ], which informally is assumed to guess the correct solution. This feature is magical [Har], and makes the properties of the underlying machine model irrelevant.

The most celebrated complexity class is the class of NP-complete problems. Informally, the class can be said to contain the hardest problems in NP. Though it is not proved, it is generally believed that none of these problems can be solved by polynomial time algorithms. Today, a large number of problems are known to be NP-complete [Har]. If you find an algorithm that



Figure: The complexity classes NP, P, and NPC.

solves one of these problems in polynomial time, you have really made a giant breakthrough in computer science. This should become clear from the definition of the class NPC.

Definition (NP-complete problems, NPC):
NPC is the class of NP-complete problems. A problem Π is NP-complete if (i) Π is in NP, and (ii) for any other problem Π′ in NP there exists a polynomial transformation from Π′ to Π.

A polynomial transformation from A to B is a deterministic polynomial time algorithm for transforming (or reducing) any instance I_A of A to a corresponding instance I_B of B, so that the solution of I_B gives the required answer for I_A. As a consequence (see the corresponding lemma in [GJ]), it can be proved that a polynomial time algorithm for solving problem B, combined with this polynomial transformation, yields a polynomial time algorithm for solving A. Thus, the definition states that a polynomial time algorithm for a single NP-complete problem implies that all problems in NP can be solved in polynomial time. The relationship between the classes NP, P, and the class NPC containing the NP-complete problems is shown in the figure above.

Reading part (ii) of the definition, one might think that much work is required to prove that a problem is NP-complete. Fortunately, it is not necessary to derive polynomial transformations from all other problems in NP. This and other aspects of NP-completeness will become clear in the discussion below, where we describe P-completeness and its strong similarities with NP-completeness.

Fast Parallel Algorithms and Nick's Class

With respect to parallelisation, the problems inside P can be divided into two main groups: the class NC, and the class of P-complete problems. The class NC contains problems which can be solved by fast parallel algorithms that use a reasonable number of processors.

Definition (Nick's class, NC):
Nick's Class (NC) is the class of problems that can be solved in polylogarithmic time, i.e. with time complexity O(log^k n) where k is a constant, using a polynomial number of processors, i.e. a number bounded by O(f(n)) for some polynomial function f, where n is the problem size.

NC is an abbreviation for Nick's Class [Coo], and is now the commonly used name for this complexity class, which was first identified and characterised by Nicholas Pippenger [Pip].

It is relatively easy to prove that a problem is in NC: once one algorithm with O(log^k n) time consumption, using no more than a polynomial number of processors, that solves the problem has been found, membership in NC is proved. Such an algorithm is often called an NC-algorithm. Thus, Batcher's O(log^2 n) time sorting network using n processors (see Section ) proves that sorting is in NC. Many other important problems are known to be in NC. Some examples are matrix multiplication, matrix inversion, finding the minimum spanning forest of a graph [Coo], and the shortest path problem [CM]. The book Efficient Parallel Algorithms, written by Gibbons and Rytter [GR], provides detailed descriptions of a large number of NC-algorithms.
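As a concrete taste of an NC-algorithm, here is a sketch (my own illustration in the spirit of such algorithms, not code from [GR]) of the classic pointer jumping technique: every node of a linked list learns its distance to the tail in O(log n) synchronous rounds, using one virtual processor per node.

```python
def pointer_jump_ranks(successor):
    """successor[i] is the next node after i; the tail points to itself.
    Returns (ranks, rounds), where ranks[i] is the distance from node i
    to the tail, computed by O(log n) pointer-jumping rounds."""
    n = len(successor)
    nxt = list(successor)
    rank = [0 if nxt[i] == i else 1 for i in range(n)]
    rounds = 0
    while any(nxt[i] != nxt[nxt[i]] for i in range(n)):
        old_nxt, old_rank = list(nxt), list(rank)
        for i in range(n):                  # one processor per node
            if old_nxt[i] != i:
                rank[i] = old_rank[i] + old_rank[old_nxt[i]]
                nxt[i] = old_nxt[old_nxt[i]]   # jump over the successor
        rounds += 1
    return rank, rounds

# List 0 -> 1 -> 2 -> 3 (node 3 is the tail):
print(pointer_jump_ranks([1, 2, 3, 3]))   # ([3, 2, 1, 0], 2)
```

In each round every pointer distance doubles, so ceil(log2 n) rounds suffice: polylogarithmic time with a polynomial (here linear) number of processors, exactly the NC requirements.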

NC is robust

The robustness of Nick's Class is probably the main reason for its popularity. It is robust because it is, to a large extent, insensitive to differences between many models of parallel computation. NC will contain the same problems whether we assume the EREW, CREW or CRCW PRAM model, or some other models such as uniform circuits [Coo, And, Har, GR]. The robustness of NC allows us to ignore the polylogarithmic factors that separate the various models.

 

(This robustness is shown by various theoretical results describing how theoretical models may be simulated by more realistic models; see for instance [AHMP, MV, SV, Upf].)

More refined complexity classes, such as NC^1 and NC^2, may be defined within NC. NC^1 is used to denote the class of problems in NC that can be solved in O(log n) time, whereas NC^2 is the class of problems solvable in O(log^2 n) time. These classes are not robust to changes in the underlying model, and should therefore only be used together with a statement of the assumed model. Many other subclasses within NC are described in Stephen Cook's article A Taxonomy of Problems with Fast Parallel Algorithms [Coo].

Reasonable parallelism

Classifying a polynomial processor requirement as reasonable may, in general, need some explanation. Practitioners would for most problems say that a processor requirement that grows faster than the problem size might not be used in practice. However, in the context of complexity theory it is common practice to say that problems in P can be solved in reasonable time, as opposed to problems in NPC, which currently only can be solved in unreasonable (i.e. exponential) time. Similarly, we distinguish between reasonable (polynomial) and unreasonable (exponential) processor requirements.



NP-complete problems can, at least in theory, be solved in polynomial time if we allow an exponential number of processors to remove the magical nondeterminism (see [Har]). However, if we restrict ourselves to a reasonable number of processors, the NP-complete problems remain outside P. In this sense, the traditional complexity class P is very robust: P contains the same problems whether we assume one of the standard sequential computational models, or parallel processing on a reasonable number of processors. A polynomial number of processors may be simulated with a polynomial slowdown on a single processor. This implies that all problems in NC must be included in the class P.
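The simulation argument in the last two sentences can be sketched directly (my own illustration; the names are invented): executing each of the T parallel steps by stepping through all p processors in turn costs O(p T) sequential operations, a polynomial slowdown whenever p and T are polynomial.

```python
def simulate_sequentially(n_procs, n_steps, action, mem):
    """Run a synchronous parallel algorithm on one processor: within each
    parallel step, execute every processor's action in turn against a
    snapshot of memory, so the lock-step semantics are preserved."""
    seq_ops = 0
    for step in range(n_steps):
        snapshot = list(mem)               # reads of this step come first
        for pid in range(n_procs):         # round-robin over processors
            action(pid, step, snapshot, mem)
            seq_ops += 1
    return mem, seq_ops

# Toy action: every processor increments its own cell once per step.
def incr(pid, step, snapshot, mem):
    mem[pid] = snapshot[pid] + 1

result, ops = simulate_sequentially(4, 3, incr, [0, 0, 0, 0])
print(result, ops)   # [3, 3, 3, 3] 12  (4 processors * 3 steps)
```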

NC may be misleading

In theoretical computer science, a large number of papers have been published in the past years proving various subproblems to be in NC. These results are of course valuable contributions to parallel complexity theory. However, if you are searching for good and practical parallel algorithms, the titles of these papers may often be misleading if you do not know the terminology, or do not understand the limitations of complexity theory. For



(It is far from clear that this can be done in practice, due to inherent limitations of three-dimensional space [Coo, Har, Fis].)

(To see this, assume the opposite: a parallel algorithm using p1(n) processors that solves an NP-complete problem in p2(n) time, where p1(n) and p2(n) are both polynomials. Simulating this algorithm on a single processor would then give a polynomial time algorithm, of O(p1(n) p2(n)) time; the problem must be in P, and we must have NPC ⊆ P.)

instance, "Fast Parallel Algorithm for Solving ..." may denote an algorithm with O(log^k n) time consumption but with very large complexity constants, which make it slower in practice than well known and simpler algorithms for the same problem. Similarly, "Efficient Parallel Algorithm for Solving ..." may entitle a polylogarithmic time algorithm with a large polynomial processor requirement, resulting in a very high cost and a very low efficiency, if we use the definition of efficiency which is common in other parts of the parallel processing community (see page ). It is therefore possibly more appropriate to describe problems in NC as highly parallelisable [GR]. Further, the term NC-algorithm is more precise, and may often be less misleading than saying that an algorithm is fast or efficient.

It has been argued that too much focus has been put on developing NC-algorithms [Kar]. The search for algorithms with polylogarithmic time complexity has produced a lot of algorithms with a large, but polynomially bounded, number of processors. These algorithms typically have a very high cost compared with sequential algorithms for the same problems, and consequently low efficiency. Richard Karp has therefore proposed to define an algorithm as efficient if its cost is within a polylogarithmic factor of the cost of the best sequential algorithm for the same problem [Kar]. See also the recent work by Snir and Kruskal [Kru].

Inherently Serial Problems and P-Completeness

P-completeness

Consider the problems in P. The subset of these which are in NC may be regarded as easy to parallelise, and those outside NC as difficult (or hard) to parallelise. Proving that a problem is outside NC, i.e. proving a lower bound, is however very difficult.

An easier approach is to show problems to be at least as difficult to parallelise as other problems. The notion of P-completeness helps us in doing this. Just as the NP-complete problems are those inside NP which are hardest to solve in polynomial time, the class of P-complete problems are those inside P which are hardest to solve in polylogarithmic time using only a polynomial number of processors.

Definition (P-complete problems, PC) [GR]:
PC is the class of P-complete problems. A problem Π is P-complete if (i) Π

(Note that some early papers on NP-completeness used the term P-complete for those problems that it is common today to denote as NP-complete; see for instance [SG].)



Figure: The complexity classes NP, P, NPC, NC, and PC.



is in P, and (ii) for any other problem Π′ in P there exists an NC-reduction from Π′ to Π.

Definition (NC-reduction) [GR]:
An NC-reduction from A to B, written A ∝_NC B, is a deterministic, polylogarithmic time bounded algorithm, with a polynomially bounded processor requirement, for transforming (or reducing) any instance I_A of A to a corresponding instance I_B of B, so that the solution of I_B gives the required answer for I_A.

Consequently, if one single P-complete problem can be solved in polylogarithmic time on a polynomial number of processors, then all problems in P can be solved within the same complexity bounds, using the NC-reduction required by the definition.

Just as there is no known proof that P ≠ NP, nobody has been able to prove that NC ≠ P. However, most researchers believe that the problems in P that are hardest to parallelise, the P-complete problems, are outside NC. A P-completeness result therefore has the same practical effect as a lower bound: it discourages efforts for finding NC-algorithms for solving the problem. The relationship between the classes P, NC, and the class PC containing the P-complete problems is shown in the figure above, which also shows the three main complexity classes for sequential computations.

NC-reductions

An NC-reduction is technically different from the transformation used in classical definitions of P-completeness [GR]. As done by Gibbons and Rytter [GR], we will use the notion of NC-reductions to avoid introducing additional details about space complexity. Classical definitions use so-called logarithmic space reductions, and P-complete problems are often termed "logspace complete for P". However, by using the so-called parallel computation thesis, it can be shown that a logspace reduction corresponds to an NC-reduction, and they will be used as synonyms in the rest of the text.

An NC-reduction may informally be perceived as a fast problem reduction executable on a reasonable parallel model. It is interesting to note that Gibbons and Rytter [GR] have observed that NC-reductions typically work very locally. For example, an instance G of a graph problem may often be transformed (reduced) to a corresponding instance G′ of another graph problem by transforming the local neighbourhood of every node in G. This is intuitively not surprising: there exist many examples where the ability to do computations based on local information, instead of global (central) resources, significantly improves the possibilities of achieving highly parallel implementations.

A very important property of the NC-reducibility relation, denoted ∝_NC, is that it is transitive [Par, And]. As described below, this property is central in reducing the effort needed to prove a problem P-complete.

P-completeness proofs

Part (ii) of the definition of P-complete problems may make us believe that the work needed to prove a new problem P-complete is substantial, since we must provide an NC-reduction from every problem known to be in P. Fortunately, this is not true, due to the following observation.

Observation (a corollary in [Par])
If A is P-complete, B ∈ P, and A ∝_NC B, then B is P-complete.

Proof: We want to show that B is P-complete. Since B ∈ P, what remains to show is that for every problem Π in P there exists an NC-reduction from Π to B, i.e. Π ∝_NC B. Since A is known to be P-complete, the definition implies Π ∝_NC A. The assumption A ∝_NC B and the transitivity of ∝_NC then imply that Π ∝_NC B. The argument follows the proof of the corresponding lemma in Garey and Johnson [GJ], adapted here to P-completeness.

[Footnote: A logspace reduction is a transformation in the same sense as the polynomial transformation described earlier, but with a space consumption which is bounded by O(log n) [GJ, And].]

[Footnote: This result informally states that sequential memory space is equivalent to parallel time, up to polynomial differences [Har, And, Par].]

The observation above tells us that we may use the following, much more practical, approach to prove that a problem Π is P-complete:

1. Show that Π ∈ P.
2. Show that there exists an NC-reduction from some known P-complete problem to Π.

However, we still have a "chicken before the egg" problem: we simply cannot use this approach to prove some first problem to be P-complete.



The first problem shown to be P-complete was the PATH problem, presented by Stephen Cook in [Coo]. Shortly after, Jones and Laaser [JL] showed six other problems to be P-complete. Cook proved the P-completeness of the PATH problem by describing a generic logspace reduction of an arbitrary problem in P to the PATH problem. Similarly, Jones and Laaser provided a generic reduction for the UNIT RESOLUTION problem [JL].

Informally, such a generic reduction to e.g. the PATH problem is a description of how an arbitrary deterministic polynomial-time computation on a Turing machine may be simulated by solving a corresponding instance of the PATH problem. The generic reduction to the PATH problem invented by Cook is an extremely important result in the history of parallel complexity theory. This and other similar generic reductions are complicated mathematical descriptions, which we do not need to know in detail to use the P-completeness theory in practice.

In addition to the original papers [Coo, JL], interested readers are referred to [GR], which describes a generic reduction to the GENERABILITY problem. See also the very good description of Stephen Cook's seminal theorem (Cook's Theorem) found in [GJ]. This theorem provided the first NP-complete problem, SATISFIABILITY, by a generic reduction from an arbitrary problem in NP.

Part 1 of the proof procedure, to show that the problem is in P, is often a straightforward task: all we need is to show the existence of some deterministic polynomial-time uniprocessor algorithm solving the problem.

[Footnote: Called PATH SYSTEM ACCESSIBILITY by Garey and Johnson [GJ]. The problem is also described in [JL].]

[Footnote: One exception is linear programming, which was not shown to be in P until Khachian's work [Kha].]

To find an NC-reduction from some known P-complete problem is in general not so easy. A good way to start is to study documented P-completeness proofs of related problems. The book by Garey and Johnson [GJ] contains a lot of fascinating problem transformations and is highly recommended, in spite of the fact that the problem transformations reported there are polynomial reductions and not necessarily NC-reductions. However, Anderson reports that most of the transformations used in NP-completeness proofs can also be performed as NC-reductions [And].

What is difficult is to select a well-suited known P-complete problem, and to find a systematic way of mapping instances of that problem into instances of the problem we want to prove P-complete. Once the reduction has been found, it is often relatively easy to see that it can be performed by an NC-algorithm. That part is therefore omitted from most P-completeness proofs.

The Circuit Value Problem

Since the pioneering work by Cook, we have seen an increasing number of known P-complete problems. The most frequently used problem in P-completeness proofs is the circuit value problem (CVP) and its variants [And]. The circuit value problem was shown to be P-complete by Ladner [Lad].

Informally, an instance of the circuit value problem is a combinational circuit (i.e. a circuit without feedback loops) built from Boolean two-input gates, together with an assignment to its inputs. To solve the problem means to compute its output [Par]. The kind of gates allowed in CVP plays an important role: CVP is P-complete provided that the gates form a so-called complete basis [Par, And]. The most common set of gates is {AND, OR, NOT}. Two important variants of CVP are the monotone circuit value problem (MCVP), where we are restricted to using only AND and OR gates, and the planar circuit value problem (PCVP), where the circuit can be constructed in the plane without wire crossings. MCVP and PCVP were both proved P-complete by Goldschlager [Gol]. Other variants of CVP are described in [GSS, And, Par]. Note that small changes in a problem definition can have dramatic effects on the possibility of a fast parallel solution. As an example, both MCVP and PCVP are P-complete, but the CVP restricted to circuits that are both monotone and planar can be solved by an NC-algorithm.
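To make the problem concrete, the following is an illustrative Python sketch (the gate encoding is hypothetical, not taken from the references) of a straightforward sequential evaluator for a CVP instance:

```python
# Sequential evaluation of a circuit value problem (CVP) instance.
# A circuit is a list of gates in topological order. Each gate is
# ("INPUT", value) or (op, i, j), where i and j index earlier gates;
# NOT gates use only their first operand.

def evaluate_circuit(circuit):
    value = []                      # value[i] = output of gate i
    for gate in circuit:
        if gate[0] == "INPUT":
            value.append(bool(gate[1]))
        elif gate[0] == "AND":
            value.append(value[gate[1]] and value[gate[2]])
        elif gate[0] == "OR":
            value.append(value[gate[1]] or value[gate[2]])
        elif gate[0] == "NOT":
            value.append(not value[gate[1]])
        else:
            raise ValueError("unknown gate: %r" % (gate[0],))
    return value[-1]                # the circuit output is the last gate

# (x AND y) OR (NOT x), with x = True and y = False
example = [("INPUT", True), ("INPUT", False),
           ("AND", 0, 1), ("NOT", 0), ("OR", 2, 3)]
```

Evaluating gate by gate in topological order is exactly the kind of chained dependence that the P-completeness of CVP concerns: each gate value may depend on values computed earlier.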

P-complete problems: Examples

The set of problems proved and documented to be P-complete contains a growing set of interesting problems. Below is a short list containing some of the more important of these. The reader should consult the references for more detailed descriptions of the problems and their proofs; see also [GR].

- Maximum network flow, proved P-complete by Goldschlager et al. in [GSS].
- Linear programming, proved to be a member of P by Khachian [Kha], and proved P-complete or harder by Dobkin et al. in [DLR].
- General deadlock detection, proved P-complete by Spirakis in [Spi].
- Depth-first search, proved P-complete by John Reif in [Rei].
- Unification, proved P-complete by Dwork, Kanellakis and Mitchell [DKM]. Kanellakis describes in [Kan] how this result depends on the representation of the input.
- Unit resolution for propositional formulas, proved P-complete, among several other problems, by Jones and Laaser in [JL].
- Two-player game, proved P-complete by Jones and Laaser in [JL].

A P-complete problem is often termed inherently serial: it often contains a subproblem or a constraint which requires that any solution method must follow some sort of serial strategy. For some problems, such as two-player game, and perhaps also general deadlock detection, it is relatively easy to see the inherent sequentiality intuitively. For others, such as maximum network flow, it is much harder.

In general, it is difficult to claim that a problem is inherently serial, because in our context it means that all possible algorithms for solving it are inherently serial. In contrast, it is often easier to identify inherently serial algorithms. This is captured in the notion of P-complete algorithms, discussed in the following paragraph.



This is often termed Phard Informally it means that the problem may b e outside P

but if it can b e proved as member of P then Phardness implies Pcompleteness



It is more correct to term depth rst searchasaPcomplete algorithm see page

P-complete algorithms

A vast amount of research has been done on sequential algorithms. When designing a parallel algorithm for a specific problem, it is often fruitful to look at the best known sequential algorithms. Many sequential algorithms have fairly direct parallel counterparts.

Unfortunately, many of the most successful sequential algorithms are very difficult to translate (transform) into very fast parallel algorithms. Such algorithms may be termed inherently serial. They often use a strategy which bases decisions on accumulated information. A good example is J. B. Kruskal's minimum spanning tree algorithm [Kru], which is a typical greedy algorithm (see for instance [AHU]). Greedy algorithms are in general sequential in nature: they typically build up a solution set item by item, and the choice of which item to add depends on the previous choices.
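The serial flavour of the greedy strategy can be seen in a compact sketch of Kruskal's algorithm (an illustrative Python rendering, not reproduced from [Kru] or [AHU]):

```python
# Kruskal's greedy minimum spanning tree algorithm. Each iteration decides
# whether to accept an edge, and that decision depends on the union-find
# structure built by all previous decisions -- the serial dependency that
# makes the greedy strategy hard to parallelise directly.

def kruskal(n, edges):
    """n vertices numbered 0..n-1; edges is a list of (weight, u, v)."""
    parent = list(range(n))

    def find(x):                    # find set representative (path halving)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    tree = []
    for w, u, v in sorted(edges):   # consider edges by increasing weight
        ru, rv = find(u), find(v)
        if ru != rv:                # accept the edge iff it joins two components
            parent[ru] = rv
            tree.append((w, u, v))
    return tree

edges = [(1, 0, 1), (2, 1, 2), (3, 0, 2), (4, 2, 3)]
# kruskal(4, edges) accepts (1,0,1), (2,1,2), (4,2,3) and rejects (3,0,2)
```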

In his thesis, The Complexity of Parallel Algorithms [And], Richard Anderson shows that greedy algorithms for several problems are inherently serial: he proves them to be P-complete. Anderson gives the following definition of a P-complete algorithm [And]:

    An algorithm A for a search problem is P-complete if the problem of computing the solution found by A is P-complete.

Another example of an inherently serial algorithm is depth-first search (DFS) [Tar], which is used as a subalgorithm in many efficient sequential algorithms for graph problems. John Reif has shown that the DFS algorithm is P-complete by proving that the DFS-ORDER problem is P-complete [Rei].

Fortunately, the fundamental difference between problems and algorithms makes finding an algorithm to be P-complete a less discouraging result. A P-complete algorithm does not say that the corresponding problem is inherently serial; it merely states that a different approach than parallelising the sequential algorithm must be attempted to possibly obtain a fast way to solve the problem with parallelism. For example, the natural greedy algorithm for solving the maximal independent set problem is P-complete, but a different approach can be used to solve the problem with an NC-algorithm [And, GS].



Informally describ ed this problem is to nd the depthrst search visiting order of

the no des in a directed graph
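Anderson's notion can be illustrated with the natural greedy algorithm for maximal independent sets (an illustrative Python sketch): computing exactly the solution found by this algorithm, the lexicographically first maximal independent set, is P-complete, even though some maximal independent set can be found by an NC-algorithm.

```python
# Greedy maximal independent set: scan the vertices in a fixed order and
# add a vertex whenever none of its neighbours is already in the set.
# Whether a vertex is added depends on all earlier decisions, so the
# solution found by this algorithm is inherently serial to compute.

def greedy_mis(order, adj):
    """order: vertices in scan order; adj: dict vertex -> set of neighbours."""
    chosen = set()
    for v in order:
        if not (adj[v] & chosen):   # depends on all earlier choices
            chosen.add(v)
    return chosen

# A path 0-1-2-3: the resulting set depends on the scan order.
adj = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
# greedy_mis([0, 1, 2, 3], adj) gives {0, 2}
# greedy_mis([1, 3, 0, 2], adj) gives {1, 3}
```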

Table: Summary of complexity theory for sequential and parallel computations.

  The main class of problems studied:
      sequential: NP;  parallel: P
  The easiest problems in the main class, i.e. the class of problems that may be solved efficiently:
      sequential: the class P;  parallel: the class NC
  The fundamental question:
      sequential: Is P = NP?;  parallel: Is NC = P?
  Definition of a problem that may be solved efficiently:
      sequential: the existence of a sequential algorithm solving the problem in polynomial time;
      parallel: the existence of a parallel algorithm solving the problem in polylogarithmic time using a polynomial number of processors
  The hardest problems in the main class, i.e. problems that probably cannot be solved efficiently:
      sequential: the NP-complete problems;  parallel: the P-complete problems
  A first problem in the hardest class:
      sequential: SATISFIABILITY (SAT);  parallel: PATH
  Technique for proving membership in the hardest class:
      sequential: P-reducibility;  parallel: NC-reducibility

NP-completeness vs. P-completeness

The table above shows the similarities between P-completeness and NP-completeness. It also gives a quick summary of the theory.

Evaluating Parallel Algorithms

Basic Performance Metrics

In the rest of this thesis, it is assumed that algorithms are executed on the CREW PRAM model. This assumption makes it possible to define the basic performance metrics for evaluating parallel algorithms.

Time, Processors and Space

Definition (Time)
The time (running time or execution time) for an algorithm executed on a CREW PRAM is the number of CREW PRAM clock periods elapsed from the first to the last CREW PRAM instruction used to solve the given instance of a problem. The time is expressed in CREW PRAM time units, or simply time units.

This definition corresponds to what many authors call parallel running time [Akl]. A large number of processors may perform the same or different CREW PRAM instructions in one CREW PRAM clock period. The definition makes it possible to use the same concept of time for parallel algorithms and for serial algorithms executed as uniprocessor CREW PRAM programs. Note that the definition implies that the time used by an algorithm strongly depends on the size, and possibly also the characteristics, of the actual problem instance.

On models which reflect parallel architectures based on message passing, it is common to differentiate between computational steps and routing steps (see for instance Akl [Akl]). As described in an earlier chapter, access to the global memory from the processors in a CREW PRAM is done by one single instruction, and it is therefore less important to differentiate between local computations and interprocessor communication.

[Footnote: More precisely, a CREW PRAM implementation of an algorithm.]

[Footnote: Each processor is assumed to have a rather simple and small instruction set. Nearly all instructions use one time unit; some few, such as divide, use more. The instruction-set time requirements are defined as parameters in the simulator, and are therefore easy to change. See Appendix A.]

Definition (Processor requirement)
The processor requirement of an algorithm is the maximum number of CREW PRAM processors that are active in the same clock period during execution of the algorithm for a given problem instance.

The processor requirement is central in the evaluation of parallel algorithms, since the number of processors used is a good representative of the hardware cost needed to do the computations.

Definition (Space)
The space used by a CREW PRAM algorithm is the maximum number of memory locations in the global memory allocated at the same time during execution of the algorithm for solving a given problem instance.

Note that we do not measure the space used in the local memories of the CREW PRAM processors. This is because it is easy to attach cheap memories to the local processors, while it is difficult and expensive to realise the global memory, which is accessible by all the processors. Little emphasis is put on analysing the space requirements of the algorithms evaluated in this thesis.

Cost Optimal Algorithms and Efficiency

It is often possible to obtain faster parallel algorithms by employing more processors. Many of the fastest algorithms require a huge number of processors, often far beyond what can be realised with present technology. It is therefore frequently the case that the "fastest" algorithm is not the "best" algorithm.

One way to bring the question of feasibility into the analysis is to measure or estimate the cost of the parallel algorithm.

Definition (Cost) [Akl, Qui]
The cost of a parallel algorithm is the product of the time (as defined above) and the processor requirement (as defined above). The cost is expressed in the number of CREW PRAM unit-time instructions.

Note that this definition assumes that all the processors are active during the whole computation. This is a simplification which makes the cost an upper bound on the total number of CREW PRAM instructions performed. In contrast to Akl [Akl], who defines cost to represent worst-case behaviour, we define cost to be dependent on the actual problem instance (cf. the definitions of time and processor requirement above).

There are many ways to define a parallel algorithm to be optimal for a given problem, depending on what resources are considered most critical. It is most common to say that a parallel algorithm is optimal, or cost optimal, if its cost for a given problem size is the same as the cost of the best known sequential algorithm for the same problem size. For sorting, this cost is known to be O(n log n) [Akl, Knu].

Definition (Optimal parallel sorting algorithm)
A parallel sorting algorithm is said to be optimal with respect to cost if its cost is O(n log n).

The fastest parallel algorithms often use redundancy, and are therefore not optimal with respect to cost. It is often desirable to have a figure which tells how good the utilisation of the processors is. This is achieved by the measure efficiency.

Definition (Efficiency)
The efficiency of a parallel algorithm is the running time of the fastest known sequential algorithm for solving a specific instance of a problem, divided by the cost of the parallel algorithm solving the same problem instance.

Efficiency may also be defined as the speedup divided by the processor requirement. For a discussion of the possibility of efficiency greater than one, see the treatment of superlinear speedup below.
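Taken together, these definitions reduce to simple arithmetic. The following illustrative Python sketch (hypothetical helper names; the instance-dependent quantities are passed as arguments) makes the relations explicit:

```python
# Basic performance metrics for a parallel algorithm on a given problem
# instance, following the definitions above:
#   cost       = parallel time * processor requirement
#   speedup    = sequential time / parallel time
#   efficiency = sequential time / parallel cost  ( = speedup / processors )

def cost(parallel_time, processors):
    return parallel_time * processors

def speedup(sequential_time, parallel_time):
    return sequential_time / parallel_time

def efficiency(sequential_time, parallel_time, processors):
    return sequential_time / cost(parallel_time, processors)

# Example: the fastest known sequential algorithm takes 1000 time units;
# the parallel algorithm takes 125 time units on 16 processors.
# Then speedup = 8, cost = 2000, efficiency = 0.5 (= 8/16).
```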

The term efficient is used in connection with parallel algorithms in many ways. Gibbons and Rytter [GR] define an efficient parallel algorithm to be an algorithm with polylogarithmic time consumption using a polynomial number of processors. According to this, an algorithm with an extensive use of processors may be termed efficient even though it may have a very low efficiency.

Richard Karp and others [Kar] have advocated an alternative meaning of efficient parallel algorithm, which does not allow parallel algorithms that are grossly wasteful of processors to be termed efficient. They define, informally, a parallel algorithm to be efficient if its cost is within a polylogarithmic factor of the cost (i.e. execution time) of the best sequential algorithm for the problem.

Analysis of Speedup

What is Speedup?

The most frequently used measure of the effect that can be obtained by parallelisation is speedup. The speedup expresses how much faster a problem can be solved in parallel than on a single processor. Many different definitions of speedup can be found in the literature on parallel processing. A precise definition of the most common use of speedup is given below.

Definition (Speedup)
Let T_1 be the execution time of the fastest known sequential algorithm for a given problem, run on a single processor of a parallel machine M. Let T_N be the execution time of a parallel algorithm for the problem, run on the same parallel machine M using N processors. The speedup achieved by the parallel algorithm for the problem run on N processors is defined as

    Speedup(N) = T_1 / T_N

Some variants of this definition should be mentioned. The definition given by Akl in [Akl] specifies that both the sequential and parallel execution times are worst-case execution times.

Miklosko in [MK] requires that the sequential algorithm must be the fastest possible algorithm for a single processor, which often is unknown. Comparing with the fastest known algorithm is more practical.

Quinn in [Qui] emphasises the use of a single processor of the parallel machine for execution of the sequential algorithm. The alternative is to use the fastest known serial computer, which may be regarded as giving a more correct figure for the speedup. However, few researchers in parallel processing have access to the most powerful serial computers.

One popular alternative exists that often gives better speedup results than the definition above. In this alternative definition, the sequential running time is measured or estimated as the time used by the parallel algorithm run on a single processor. The main error introduced is that the redundancy often existing in good parallel algorithms must in this case also be performed in the uniprocessor solution, giving a poorer sequential running time. The main motivation for this strategy is that it reduces the research to a study of only the parallel algorithm. One example of this alternative speedup definition can be found in [Moh].

[Footnote: It is implicitly assumed that all processors on this machine are of equal power.]

Superlinear Speedup: Is It Possible?

The term linear speedup often appears in the literature, and there is a discussion whether so-called superlinear speedup is possible or not. Quinn [Qui] defines linear speedup to be a speedup function S(N) of order exactly N, where N is the number of processors. Thus a speedup proportional to N, but not necessarily equal to N, also expresses linear speedup according to this definition. Some other authors only regard S(N) = N as linear speedup.

Superlinear speedup is used to denote the case, if it exists, where a problem is solved more than N times faster using N processors, i.e. a speedup greater than linear. Note that the chosen definition of linear speedup is crucial for such a discussion. Parkinson in [Par] gives a very simple example with speedup S(N) greater than N. According to the chosen definition of linear speedup, this example is, or is not, exhibiting superlinear speedup. See also [HM].

Amdahl's Law and Problem Scaling

This section starts with a description of the reasoning behind Amdahl's law, and it is shown that a recently announced alternative law is strongly related to Amdahl's law. This discussion leads to the concept of problem scaling. The theory is then illustrated by a simple example. The last part of the section shows why it is difficult to use Amdahl's law on practical algorithms. We see that the law is only useful as a coarse model for giving an upper bound on speedup. The discussion reveals various issues which should be covered in a more detailed analysis method.

Background

Gene Amdahl, in [Amd], gave a strong argument against the use of parallel processing to increase computer performance. Today this short article seems rather pessimistic with respect to parallelism, and his claims have become partly obsolete.

However, some of his predictions have been shown to be right over nearly two decades, and the reasoning is considered the origin of a general speedup formula called Amdahl's law.

In his article, Amdahl first describes that practical computations have a significant sequential part. The effect of increasing the number of processors to speed up the computation will soon be limited by this more and more dominating sequential part. By using Grosch's law, he then argues that more performance for the same cost is achieved by upgrading a sequential processor than by using multiprocessing. Amdahl further describes that it has always been difficult to exploit additional hardware in improving a sequential computer, but that the problems have been overcome. He predicts that this will continue to happen in the future.

In short, Amdahl considers a given computation having a fixed amount of operations that must be performed sequentially, and asks how the computation may be sped up by using several processors. This is called Amdahl reasoning in the following. If the sequential fraction of the computation is called s, then by Amdahl's law the maximum speedup that can be achieved by parallel processing is bounded by 1/s. This observation has often been used as a strong argument against the viability of massive parallelism.

The implicit assumption underlying Amdahl's law has been criticised as inappropriate in the past. An alternative, inverse Amdahl's law has been suggested by a research team from Sandia National Laboratories [Gus, GMB, San]. They have achieved very good speedup results for three practical scientific applications. They use an inverted form of Amdahl's argument to develop a new speedup formula for explaining their results. This new formula is also claimed to show that it in general is much easier to achieve efficient parallel performance than implied by Amdahl's paradigm.

In my opinion, this new formula, just as Amdahl's law, can be derived from a very standard speedup calculation. As will be shown, the two formulas are strongly related. Nevertheless, the new law looks much more optimistic with respect to the viability of parallelism. The reason for this is an implicit assumption which is hidden by the derivation of the law. Furthermore, standard speedup analysis gives the same, more optimistic, picture as the new formula, provided that the same assumptions are made. In this light, I think it is appropriate to discuss Amdahl's law and the alternative speedup formula suggested by the team from Sandia in more detail.

[Footnote: Amdahl found that a significant fraction of the operations are forced to be sequential; this is due to data management ("housekeeping") and to problem irregularities.]

[Footnote: Grosch's law states that the speed of computers is proportional to the square of their cost. It is discussed in [Qui].]

[Footnote: Amdahl assumed that the sequential part, and therefore also the parallel part, of the total computation are both unaffected by the number of processors used in the parallel execution. In retrospect, I think this implicit assumption is the crucial part of the article, and the reason why the law has been named after Amdahl. Several authors [Qui, Gus] refer to [Amd] as the source of Amdahl's law. However, the arguments of Amdahl are rather informal and are formulated as a belief (prophecy); no explicit law is defined.]

[Footnote: [GMB] uses the term massive parallelism for general-purpose MIMD systems with a sufficiently large number of autonomous floating-point processors.]

Amdahl's Law

Definition (Serial work, parallel work)
Consider a given algorithm. Let T_1 be the total time used for executing it on a single processor. The part of the time T_1 used on computations which must be performed sequentially is denoted s_w. The remaining part of the time T_1 is used on computations that may be done in parallel, and is denoted p_w. s_w is called the serial part of the uniprocessor execution time; it is sometimes called the serial work. Correspondingly, p_w is called the parallel part, or parallel work. By definition, s_w + p_w = T_1.

Definition (Serial fraction, parallel fraction)
Let s denote the serial fraction of the uniprocessor execution time, and p the parallel fraction of the uniprocessor execution time. s and p are given by

    s = s_w / (s_w + p_w),        p = p_w / (s_w + p_w)



Now consider executing the same algorithm on a parallel machine with N processors, each with the same capabilities as the single processor used above. The time used is T_N. Assume that the parallel work p_w can be split evenly among the N processors without any extra overhead. The execution time of this direct parallelisation is then T_N = s_w + p_w/N. The speedup achieved is given by

    S_Amdahl = T_1 / T_N = (s_w + p_w) / (s_w + p_w/N) = 1 / (s + p/N) = 1 / (s + (1 - s)/N)

This formula has often been used to plot the possible speedup as a function of N for fixed s, or as a function of s for fixed N (see the figure).

[Footnote: "Sequentially" here means that this part of the computation must be performed step by step, i.e. sequentially and alone. No other parts of the total computation may be executed while the serial part is performed.]
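The formula is easy to tabulate; a minimal illustrative sketch in Python (assuming the idealised even split of the parallel work):

```python
# Amdahl speedup: S(N) = 1 / (s + (1 - s)/N), for serial fraction s.
# As N grows, S(N) approaches the upper bound 1/s.

def amdahl_speedup(s, n):
    return 1.0 / (s + (1.0 - s) / n)

# For s = 0.05 the bound is 1/s = 20, however many processors are used:
# amdahl_speedup(0.05, 10) is about 6.9, amdahl_speedup(0.05, 100) is
# about 16.8, and even with a million processors the speedup stays below 20.
```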

[Figure: Two interpretations of S_Amdahl: (a) speedup as a function of N for fixed s; (b) speedup as a function of s for fixed N.]

Inverted Amdahl's Law

Amdahl's law can be viewed as a result of considering the execution time of a given serial program run on a parallel machine [GMB]. The team at Sandia has derived a new speedup formula, here called S_Sandia, by inverting this reasoning: they consider how much time is needed to execute a given parallel program on a serial processor.

Definition
Consider a given parallel algorithm. Let T_N be the total time used for executing it on a parallel machine using N processors. The part of the time T_N used on computations which must be performed sequentially is denoted s_w′. The remaining part of the time T_N is used on computations that may be done in parallel, and is denoted p_w′. s_w′ is called the serial part of the N-processor execution time. Correspondingly, p_w′ is called the parallel part of the N-processor execution time. By definition, s_w′ + p_w′ = T_N.

Definition
Let s′ denote the serial fraction of the N-processor execution time, and p′ the parallel fraction of the N-processor execution time. s′ and p′ are given by

[Figure: Two interpretations of S_Sandia: (a) speedup as a function of N for fixed s′; (b) speedup as a function of s′ for fixed N.]

    s′ = s_w′ / (s_w′ + p_w′),        p′ = p_w′ / (s_w′ + p_w′)



 

On a single processor, the time used to execute the algorithm is T_1 = s_w′ + p_w′·N. This gives the following alternative speedup formula:

    S_Sandia = T_1 / T_N = (s_w′ + p_w′·N) / (s_w′ + p_w′) = s′ + p′·N = s′ + (1 − s′)N = N + (1 − N)s′

S_Sandia is shown as a function of N for fixed s′, and as a function of s′ for fixed N, in the figure.
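The inverted formula can be tabulated in the same way as S_Amdahl (an illustrative Python sketch; s_prime denotes the serial fraction s′ of the N-processor execution time):

```python
# Sandia (scaled) speedup: S(N) = s' + (1 - s')*N = N + (1 - N)*s',
# where s' is the serial fraction of the N-processor execution time.
# For fixed s' the speedup grows linearly with N -- no upper bound.

def sandia_speedup(s_prime, n):
    return s_prime + (1.0 - s_prime) * n

# sandia_speedup(0.05, 10) gives 9.55; sandia_speedup(0.05, 100) gives 95.05
```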

Discussion

At first sight it seems like this inverted Amdahl reasoning gives a new and much more optimistic law. S_Sandia expresses that speedup increases linearly with N for a fixed s′. The upper bound for possible speedup expressed by Amdahl's law has been removed. We may be led to believe that Amdahl's law is without practical value when we consider uniprocessor execution of a parallel computation, or, even worse, that the law is wrong.

It is important here to realise the effects of keeping s′ fixed, instead of keeping s fixed, when N is varied. This is in fact the only source of the difference between the two formulas. In Amdahl's reasoning, where we have a fixed uniprocessor execution time, keeping s fixed while N increases corresponds to computing a fixed amount of parallel work p_w faster and faster by using more processors. However, in the inverted reasoning, which assumes a fixed N-processor execution time, keeping s′ fixed while N increases corresponds to performing a larger and larger total parallel work in fixed time on the parallel system, while the serial work remains unchanged.

These two situations are rather different. Keeping s′ fixed when N increases corresponds to decreasing s, and thus lifting the upper bound in part (a) of the figure showing S_Amdahl. Similarly, keeping s fixed with varying N corresponds to increasing s′, and thus reducing the slope of the curve in part (a) of the figure showing S_Sandia.

Problem Scaling and Scaled Speedup

Discussion

Which sp eedup formula is correct Both are The team at Sandia has an

nounced their formula b ecause they think it b etter describ es parallelisation

eorts as it is done in practice Gustafson in Gus argues that Amdahls

law is most appropriate for academic research

One do es not take a xedsized problem and run it on various

numb ers of pro cessors except when doing academic research in

practice the problem size scales with the number of processors

Agreeing with this statement or not there certainly exist situations

where it is natural to scale the problem with the numb er of pro cessors avail

able Often the xed quantity is not the problem size but rather the amount



of time a user is willing to wait for an answer In handling three real prob

lems wave mechanics uid dynamics and b eam strain analysis the team



Footnote: Both for the direct parallelisation (leading to S_Amdahl) and the direct serialisation (giving S_Sandia) we have in general that s_w = s'_w and p_w = p'_w N. If we in the equation for S_Amdahl substitute s_w and p_w with s'_w and p'_w N, respectively, we get the new speedup formula, i.e. that for S_Sandia. Similarly, S_Amdahl can be directly derived from S_Sandia.



Footnote: This work is the time needed to execute the parallel computation on a single processor, i.e. p_w = p'_w N. Thus the parallel work increases linearly in N.

Footnote: This is seen directly from the definition of s' when s_w is fixed and p_w increases.



Footnote: A good example is weather forecasting, which illustrates both the fixed-sized model (Amdahl reasoning) and the scaled-sized model (inverted Amdahl reasoning). Assume that the weather prediction is guided by a simulation model of the atmosphere. Two important

at Sandia found it natural to increase the size of the problem when using more processors. This is what they call problem scaling.

In their work they found that it was the parallel part of the program that scales with the problem size. Therefore, as an approximation, the amount of work p_w that can be done in parallel varies linearly with the number of processors N. In other words, since p_w = p'_w N and p_w varies linearly with N, such problem scaling corresponds to keeping s' fixed. Therefore, under these conditions, the figure gives a correct picture of the possible speedup. The team at Sandia called this speedup measure scaled speedup, to reflect that it assumes problem scaling.

This section was written in December. Similar thoughts have later been expressed by several authors; see for instance the discussion in [HWG] and [ZG], and the detailed treatment given in [SN].

A Simple Example: Floyd's Algorithm

The intention of the following example is to illustrate parts of the theory described above. It also gives a background for a further discussion of Amdahl's law later in the thesis.



The famous Floyd-Warshall algorithm for the all pairs shortest path problem is shown in the figure below. The reader is referred to [CM] for further details about the algorithm. The variable names in the figure are the same as those used in [CM]. Other descriptions of the algorithm can be found in [AHU] and [Law].



The execution of this algorithm on a single processor requires O(n³) time. There are exactly n³ instances of the assignment in line 4. [CM] shows that the n² assignments given by line 4, for a fixed value of k in the outermost for-loop, are independent. Therefore they can be executed in any order, or simultaneously, i.e. in parallel. Thus the algorithm may

Footnote (continued): factors for the quality of the forecasting are how new the inputs (weather samples) to the simulation model are, and how detailed the simulation is. If the computer used for the simulation is upgraded by increasing the number of processors, there are principally two main alternatives for quality improvement of the forecasting. Amdahl reasoning: solve the same problem faster, i.e. use newer weather data in the simulation. Inverted Amdahl reasoning: solve a bigger problem within the same execution time, i.e. use a more detailed simulation model.



Footnote: The names Floyd's algorithm and the Warshall-Floyd algorithm do also appear in the literature. The algorithm was first presented by Robert W. Floyd in [Flo], and is based on a theorem on Boolean matrices published by Stephen Warshall in [War].

SEQUENTIAL procedure Floyd;
{ Initially, all d[i,j] = W[i,j] }
begin
1:  for k := 1 to n do
2:    for i := 1 to n do
3:      for j := 1 to n do
4:        d[i,j] := min(d[i,j], d[i,k] + d[k,j])
end;

Figure: Floyd's algorithm



be executed in O(n) time on a synchronous machine with O(n²) processors. See the corresponding program in [CM].
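For readers who want to experiment, the procedure of the figure transcribes directly into sequential Python (this transcription is ours, not taken from [CM]; 0-based indexing replaces the 1-based loops):

```python
def floyd(d):
    # d is an n x n matrix of path weights; initially d[i][j] = W[i][j].
    n = len(d)
    for k in range(n):             # line 1: the serial outer loop
        for i in range(n):         # line 2
            for j in range(n):     # line 3
                # line 4: for a fixed k, these n^2 assignments are
                # independent and could be executed in parallel
                d[i][j] = min(d[i][j], d[i][k] + d[k][j])
    return d

INF = float("inf")
W = [[0, 3, INF],
     [INF, 0, 1],
     [2, INF, 0]]
print(floyd(W))   # [[0, 3, 4], [3, 0, 1], [2, 5, 0]]
```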

We will now use the theory presented in this section on the parallelisation described above for the algorithm Floyd. Assume that the loop control for one iteration of a for loop takes one time unit, and that one execution of line 4 requires four time units. The serial work s_w of the computation consists of line 1, taking n time units. The parallel work p_w consists of lines 2, 3 and 4, taking n², n³ and 4n³ time units, respectively. T_1(Floyd) = s_w + p_w = n + n² + 5n³. Assuming that the parallel part of the computation can be distributed evenly among N processors without extra overhead, we have T_N(Floyd) = s_w + p_w/N = n + (n² + 5n³)/N.

N w w

Now consider a fixed problem size, for instance n = 100. The serial fraction of the uniprocessor execution time is s = n/(n + n² + 5n³) = 1/(1 + n + 5n²) ≈ 0.00002. Hence we observe that this algorithm has a very small serial part, even for relatively small problem sizes, and that s → 0 as n → ∞. The speedup as a function of the number of processors used, N, is then given by Amdahl's law:

    S_Amdahl = N / (1 + s(N - 1))

This corresponds to the general speedup curve shown in part (a) of the figure. For N ≪ 1/s, the equation does express a nearly linear speedup, and for N = n² = 10⁴ we get S_Amdahl ≈ 0.83N ≈ 8300. Further, we see that the maximum speedup is limited by 1/s ≈ 50101, using an infinitely large number of processors for this fixed-sized problem.
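The figures above are easy to reproduce; the following sketch (our illustration, using the cost assumptions stated in the text) evaluates the serial fraction and the resulting Amdahl speedup for n = 100:

```python
def floyd_speedup(n, N):
    # Work counts under the stated cost assumptions:
    s_w = n                      # line 1, the serial work
    p_w = n**2 + 5 * n**3        # lines 2-4, the parallel work
    T1 = s_w + p_w
    TN = s_w + p_w / N
    return T1 / TN

n = 100
s = n / (n + n**2 + 5 * n**3)    # serial fraction of T_1
print(s)                         # roughly 2e-5
print(1 / s)                     # the limit 1 + n + 5n^2 = 50101
print(floyd_speedup(n, n**2))    # about 0.83 * n^2
```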

Let us now consider the speedup formula derived by the team at Sandia. We have s_w = n, and the time used to execute the parallel part of the computation using N processors is p_w/N = (n² + 5n³)/N. Further, the serial fraction of the parallel execution time is

    s' = n / (n + (n² + 5n³)/N)

When we used Amdahl's law above, we assumed a fixed problem size n, corresponding to a fixed value for the serial fraction s of the uniprocessor execution time. In this case we assume that s' is kept fixed, which requires that the problem size is scaled with the number of processors. The necessary relation between n and N can be found by solving the expression for s' with respect to n or N; for instance N = s'(n + 5n²)/(1 - s'). Now assume that this relation holds between N and n for some fixed value of s'. Using the Sandia formula, we get the speedup as a function of N:

    S_Sandia = s' + (1 - s')N

For large n, keeping s' fixed corresponds to scaling the problem size by setting n proportional to √N. In this case the formula gives S_Sandia ≈ (1 - s')N, i.e. a linear speedup, which corresponds to part (a) of the figure referred to above.



This example shows that a rather moderate problem scaling gives an unlimited speedup function. The upper bound for the speedup given by Amdahl's law occurs because only a limited number of processors can be used effectively on a fixed amount of parallel work. This upper bound is removed if we increase the amount of parallel work in correspondence with the number of processors used. It should therefore be no surprise that it is easier to obtain optimistic speedup results if we allow problem scaling.
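The effect of problem scaling on this example can be sketched numerically (our illustration; the scaling rule below uses the large-n approximation n proportional to √N derived from the expression for s', and the target value 0.5 for s' is an arbitrary choice):

```python
import math

def n_for(N, s1):
    # Large-n approximation of N = s1 * (n + 5 n^2) / (1 - s1),
    # solved for n.
    return math.sqrt(N * (1 - s1) / (5 * s1))

def serial_fraction_parallel(n, N):
    # s' as defined in the text.
    return n / (n + (n**2 + 5 * n**3) / N)

s1 = 0.5                        # an arbitrary target value for s'
for N in (100, 10000, 1000000):
    n = n_for(N, s1)
    # s' stays close to 0.5 while the Sandia speedup grows linearly.
    print(N, round(n, 1), round(serial_fraction_parallel(n, N), 3),
          s1 + (1 - s1) * N)
```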

Using Amdahl's Law in Practice

There are many problems associated with the use of Amdahl's law in practice. This section aims at illuminating some of these problems, not solving them.

Choice of parallelisation

Consider an arbitrary given sequential program and the question: What speedup can be achieved by parallel processing of this program? The first issue encountered is the choice of parallelisation. In this context, an optimal parallelisation may be defined as one which gives the best speedup curve. But what is the best speedup curve? Is it most important to obtain a high speedup for reasonably low values of N (tens or hundreds of processors), or should one strive for a parallelisation giving the highest asymptotic (N → ∞) speedup value?

Even if one could decide upon a definition of an optimal speedup curve, we would still have the generally difficult problem of finding the parallelisation giving this speedup, and of proving that it is optimal. However, there exists a large number of non-optimal parallelisations for most problems, and most of them are interesting in some manner. In the literature, the most usual situation is speedup analysis of a given parallelisation, which is the approach adopted here.

Calculation of serial and parallel work

Assume a given parallelisation of a sequential program. In the general case, identifying the amount of serial and parallel work will be a more complicated task than exemplified by the analysis of Floyd's algorithm above. The calculation of s_w and p_w for the described parallelisation of Floyd was straightforward, partly due to the simplicity of the algorithm, and partly due to some implicit and simplifying assumptions.

In the analysis above we assumed that all the work in lines 2 to 4 could be done in parallel. The amount of parallel work for line 4 was calculated by accumulating the effect of the three enclosing for-loops. Now suppose that each single execution of line 4 has to be performed serially. What is now the amount of serial work described by line 4? If we use the accumulated amount of work for line 4, which is 4n³, we get a serial work s_w = n + 4n³. If this is used together with the corresponding p_w = n² + n³, Amdahl's law gives a constant speedup of approximately 1.25 for large values of n and N. However, using n² processors makes Floyd into a simple loop with n iterations, each of constant time. Thus N = n² processors gives a speedup of order O(n²). Therefore, the ad hoc method for calculating s_w and p_w used above can not be used in the general case.
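The contradiction can be checked numerically (our sketch; the constant per-iteration cost of the resulting n-iteration loop is an assumed value):

```python
def naive_amdahl_bound(n):
    # Ad hoc split: line 1 and the 4n^3 units of line 4 counted serial.
    s_w = n + 4 * n**3
    p_w = n**2 + n**3            # lines 2 and 3
    return (s_w + p_w) / s_w     # limit of the speedup as N -> infinity

def speedup_with_n2_processors(n, c=6):
    # With n^2 processors, Floyd becomes a loop of n iterations, each
    # of some constant time c (c = 6 is an assumed value).
    T1 = n + n**2 + 5 * n**3
    return T1 / (c * n)

print(naive_amdahl_bound(1000))           # close to 1.25
print(speedup_with_n2_processors(1000))   # grows as O(n^2)
```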

These problems are caused by a mix of serial and parallel parts in the nested program structure. Branching and procedures will certainly introduce further difficulties in the calculation of s_w and p_w.

Another question which arises is to what extent it is correct to calculate one global measure of the parallel work p_w as the sum of the parallel work in the various parts of the program.

How many processors can effectively be exploited?

Now assume that the parallel work p_w has been calculated, and that overlapping of all parallel work is possible. According to the theory, in this case N = p_w processors may be used to execute the parallel work in one time unit. This is in most practical cases very unrealistic: distributing all the parallel work evenly among the N processors may be impossible. A more detailed analysis, which takes into account the varying number of processors which may be used in the execution of the various parts of the parallel work, is needed.

Chapter

The CREW PRAM Model

This chapter is devoted to the CREW PRAM model and how it may be programmed. The first section starts by describing the model as it was proposed for use in theoretical computer science by Fortune and Wyllie [FW, Wyl]. This description is then extended by some few details and assumptions that are necessary to make the model into a vehicle for implementation and execution of algorithms. The following two sections describe how the CREW PRAM model may be programmed, and a high level notation for expressing CREW PRAM programs is outlined. The last section ends the chapter by describing a top down approach that has shown to be useful for developing parallel programs on the CREW PRAM simulator system. The approach is described by an example.

The Computational Model

    Large parallel computers are also difficult to program. The situation becomes intolerable if the programmer must explicitly manage communication between processors.
    A. G. Ranade, S. N. Bhatt and S. L. Johnsson, in The Fluent Abstract Machine [RBJ]

    The reason is that most all parallel algorithms are described in terms of the PRAM abstraction, which is a practice that is not likely to change in the near future.
    Tom Leighton, in What is the Right Model for Designing Parallel Algorithms? [Lei]

[Figure: The CREW PRAM model. One finite program controls an unbounded number of processors P_1, P_2, ..., P_i, ...; each processor P_i has its own local memory L_i, and all processors are connected to a shared global memory.]

Figure: The CREW PRAM model

    Based on quite a few years of experience in designing parallel algorithms, the author believes that difficulties in designing parallel algorithms will make any parallel computer that does not support the PRAM model of parallel computation less than competitive, as a general purpose computer, with the strongest non-parallel computer that will become available at the same time.
    Uzi Vishkin, in PRAM Algorithms: Teach and Preach [Vis]

In this section we first describe the original CREW PRAM model as it was proposed by Fortune and Wyllie [FW, Wyl]. This is followed by a documentation of some additional properties that it has been necessary to define in order to make the CREW PRAM model into a vehicle for executing and measuring parallel algorithms. The main characteristics of the CREW PRAM have been collected in small definitions with the heading "CREW PRAM property". This has been done for easy reference.

The Main Properties of The Original PRAM

A CREW PRAM is a very simple and general model of a parallel computer, as outlined in the figure above. It consists of an unbounded number of processors

Instruction        Function
LOAD  operand      Transfer the operand to/from the
STORE operand      accumulator from/to memory
ADD   operand      Add/subtract the value of the
SUB   operand      operand to/from the accumulator
JUMP  label        Unconditional branch to label
JZERO label        Branch to label if accumulator is zero
READ  operand      See text
FORK  label        See text
HALT               See text

Figure: CREW PRAM instruction set. From [Wyl].

which are connected to a global memory of unbounded size. The processors are controlled by a single finite program.

Processors

A CREW PRAM has an unbounded number of equal processors. Each has an unbounded local memory, an accumulator, a program counter, and a flag indicating whether the processor is running or not.

Instruction set

In the original description of the PRAM model, the instruction set for the processors was informally outlined as shown in the figure above. This very small and simple instruction set was sufficiently detailed for Fortune and Wyllie's use of the PRAM as a theoretical model. Each operand may be a literal, an address, or an indirect address. Each processor may access either global memory or its local memory, but not the local memory of any other processor. Indirect addressing may be through one memory to access another. There are three instructions that need further explanation:

READ  Reads the contents of the input register specified by the operand and places that value in the accumulator of the processor executing the instruction.

FORK  If this instruction is executed by processor P_i, P_i selects an inactive processor P_j, clears P_j's local memory, copies P_i's accumulator into P_j's accumulator, and starts P_j running at the label which is given as part of the instruction.

HALT  This instruction causes a processor to stop running.

Global memory

A CREW PRAM has an unbounded global memory shared by all processors. Each memory location may hold an arbitrarily large non-negative integer.

CREW PRAM property: Concurrent read, exclusive write.
Simultaneous reads of a location in global memory are allowed, but if two or more processors try to write into the same global memory location simultaneously, the CREW PRAM immediately halts. [Wyl]

CREW PRAM property: Reads before writes.
Several processors may read a location while one processor writes into it; all reads are completed before the value of the location is changed. [Wyl]

CREW PRAM property: Simultaneous reads and writes.
An unbounded number of processors may write into global memory as long as they write to different locations. An unbounded number of processors may read any location at any time. (Implicit in [Wyl].)
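As an illustration of how a simulator might enforce these three properties within one synchronous step, consider the following sketch (our code, not the actual simulator):

```python
def crew_step(memory, reads, writes):
    # reads: list of addresses; writes: list of (address, value) pairs
    # issued by distinct processors in the same time step.
    addresses = [a for a, _ in writes]
    if len(addresses) != len(set(addresses)):
        # Exclusive write: two writes to one location halt the machine.
        raise RuntimeError("write conflict: the CREW PRAM halts")
    # Reads before writes: all reads see the old values.
    results = [memory[a] for a in reads]
    for a, v in writes:
        memory[a] = v
    return results

mem = {0: 7, 1: 0}
# Two processors read location 0 while a third writes it.
print(crew_step(mem, [0, 0], [(0, 42)]))   # [7, 7]; mem[0] becomes 42
```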

Local memory

Each processor has an unbounded local memory. Each memory location may hold an arbitrarily large non-negative integer.

Input and output

The input and output features of the original PRAM reflect its intended use in parallel complexity theory. The input to a CREW PRAM is placed in the input registers, one bit per register. When used as an acceptor, the CREW PRAM delivers its output (0 or 1) in the accumulator of processor P_1. When used as a transducer, the output is delivered in a designated area in the global memory.

Footnote: In some contexts, Wyllie permitted the input registers to hold integers as large as the size of the input object.

One finite program

All processors execute the same program. The program is defined as a finite sequence of labelled instructions from the instruction set shown above. Fortune and Wyllie allowed several occurrences of the same label, resulting in non-deterministic programs.

Operation

The CREW PRAM starts with all memory cleared, the input placed in the input registers, and with the first processor, P_1, storing the length of the input in its accumulator. The execution is started by letting P_1 execute the first instruction of the program.

CREW PRAM property: Synchronous operation.
At each step in the computation, each running processor executes the instruction given by its program counter in one unit of time, then advances its program counter by one, unless the instruction causes a jump. [Wyl]

CREW PRAM property: Processor requirement.
The number of processors required to accept or transform an input, i.e. to do a computation, is the maximum number of processors that were active (started by a FORK and not yet stopped by a HALT instruction) at any instant of time during the PRAM's computation. [Wyl]

From Model to Machine: Additional Properties

Difficulties Encountered When Implementing Algorithms on a Computational Model

The original work by Fortune and Wyllie does not give descriptions of the PRAM model which are detailed enough for our purpose, which is to simulate real parallel algorithms as if they were executed on a materialised PRAM. The simplicity of computational models is achieved by adopting technically unrealistic assumptions and by hiding a lot of uninteresting details.

Hidden details are typically how various aspects of the machine model should be realised. If it is commonly accepted that a certain function f may be realised within certain limits that do not affect the kind of analysis that is performed with the model, the details of how f is realised may be omitted. For instance, if the model is used for deriving order expressions for the running time of programs, the detailed realisation of the function f is uninteresting as long as it is known that it may be done in constant time. It is therefore no surprise that it is difficult to find detailed descriptions of how the CREW PRAM operates.

Nevertheless, some implicit help may be found in Wyllie's thesis [Wyl]. In addition to its contributions to complexity theory, it describes well how the PRAM may be programmed to solve various example problems. Although these descriptions are at a relatively high level, they indicate quite well how the PRAM was intended to be used for executing parallel algorithms.

Implementing parallel algorithms as complete and measurable programs made it necessary to make a lot of detailed implementation decisions. These decisions required modifications and extensions of the original CREW PRAM properties defined above. They are intended to be a common sense extension of Wyllie's CREW PRAM philosophy, and are presented in the following.

Properties of The Simulated CREW PRAM

In the current version of the simulator, the time consumption must, with a few exceptions, be specified by the programmer for the various parts of the program. In a more complete system, this should be done automatically by a compiler.

Extended instruction set

CREW PRAM property: Extended instruction set.
The original instruction set shown above is extended to a more complete, but still simple, instruction set. Nearly all instructions use one time unit; some few, such as divide, use more. The time consumption of each instruction is defined by parameters in the simulator, and is therefore easy to change.

Input and output

Our use of the CREW PRAM model makes it natural to read input from the global memory and to deliver output to global memory.

Operation

Non-deterministic programs are not allowed in the simulator.

CREW PRAM property: Cost of processor allocation.
Allocation of n processors on a CREW PRAM, initiated by one other processor, can be done by these n processors in at most C1 + C2 ⌊log n⌋ time units.

In the current version of the simulator, C1 and C2 have fixed values, obtained by implementing an asynchronous CREW PRAM algorithm for processor allocation. The algorithm uses the FORK instruction in a binary tree structured chain-reaction.
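A minimal sketch of why such a chain-reaction needs logarithmic time (our illustration; the real allocation algorithm of the simulator also distributes processor numbers and processor set addresses):

```python
def fork_rounds(n):
    # In every round, each already active processor executes one FORK
    # and activates one inactive processor, doubling the active count.
    active, rounds = 1, 0
    while active < n:
        active *= 2
        rounds += 1
    return rounds   # ceil(log2 n), matching the C1 + C2*log(n) cost form

for n in (1, 2, 100, 1024):
    print(n, fork_rounds(n))
```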

Processor set and number

When programming parallel algorithms, it is often very convenient to have direct access to the processor number of each processor. This feature was not included in the original description of the PRAM. In a PRAM used for developing and evaluating parallel algorithms, it seems natural to include such a feature by letting each processor have a local variable which holds the processor number.

There are also cases where the processors need to know the address in global memory of their processor set data structure. The processor set concept is introduced below; see also Appendix A. It is natural to assign this information to each processor when it is allocated.

CREW PRAM property: Processor set and processor number.
After allocation, each processor knows the processor set it currently is a member of, and its logical number in this set (numbered consecutively). The processor number is stored in a local variable called p.

The use of the simulator and related tools is documented in Appendix A.

CREW PRAM Programming

This section describes how the CREW PRAM model may be programmed in a high level, synchronous MIMD programming style. The style follows the original thoughts on PRAM programming described by James Wyllie in his thesis [Wyl]. We present a notation called Parallel Pseudo Pascal (PPP) for expressing CREW PRAM programs. The section starts with a description of some other opinions on the PRAM model and its programming; they are included to indicate that the state of the art is far from consistent and well-defined.

Footnote: One such case is the resynchronisation algorithm outlined later in the chapter.

Common Misconceptions about PRAM Programming

The PRAM model is SIMD

A common misconception about the CREW PRAM model is that it is a SIMD model according to Flynn's taxonomy [Fly]. In a recent book on parallel algorithms [Akl], the author writes about "shared-memory SIMD computers: This class is also known in the literature as the Parallel Random Access Machine (PRAM) model." It may be that a large part of the algorithms presented for the CREW PRAM model is SIMD style. Michael Quinn [Qui] and Gibbons and Rytter [GR] do also classify the PRAM model as SIMD.

Using the PRAM as a SIMD model is a choice made by the programmers, not a limitation enforced by the model. The CREW PRAM model implies a single program, which does not necessarily imply a single instruction stream. The crucial point here is the local program counters, as described in Wyllie's PhD thesis:

    [Wyl]: each processor of a PRAM has its own program counter and may thus execute a portion of the program different from that being executed by other processors. In the notation of [Fly], the PRAM is therefore a MIMD.

Stephen Cook expresses the same view:

    [Coo]: The PRAM of Fortune and Wyllie: different parallel processors can be executing different parts of their program at once, so it is multiple instruction stream.

The PRAM model is asynchronous

In [Sab], the author compares his own model for parallel programming with the PRAM model. He starts by stating that the processors in the PRAM model operate asynchronously, which certainly is not the case for the original PRAM model. Further, he writes about the PRAM model that it is confusing and hard to program. Probably, it is asynchronous shared memory multiprocessors the author considers to be the PRAM model.

The author states that the asynchronous MIMD power is overwhelming, and that PRAM programmers deal with this power by writing SIMD style programs for it; he claims that all parallel algorithms in the proceedings of the ACM Symposia on Theory of Computing (STOC) are of data parallel form, i.e. SIMD. This is not true. Algorithms in the proceedings of STOC and other similar symposia are typically described in a high-level mathematical notation, where the choice between SIMD and MIMD programming style to a large extent is left open to the programmer. Parts of an algorithm are most naturally coded in SIMD style, while other parts most naturally are expressed in MIMD style.

Notation: Parallel Pseudo Pascal (PPP)

Wyllie outlined a high level pseudo code notation called parallel pidgin algol for expressing algorithms on a CREW PRAM. The notation introduced here, called Parallel Pseudo Pascal (PPP), is intended to be a modernisation of Wyllie's notation. PPP is an attempt to combine the pseudo language notation called super pascal, used for sequential algorithms by Aho, Hopcroft and Ullman in [AHU], with the ability to express parallelism and processor allocation as it is done in parallel pidgin algol.

Informal specification of PPP

The PPP notation is outlined in the table below. Since PPP is pseudo code, it should only be used as a framework for expressing algorithms. The PPP programmer should feel free to extend the notation and use his or her own constructions (see the last statement in the list below).

A Statement may be any of the alternatives shown in the table, or a list of statements separated by semicolons and enclosed within begin ... end. Variables will normally not be declared; their definition will be implicit from the context, or explicitly described.

It is crucial to separate local variables from global variables. This is done in PPP by underlining global variables. Local variables and procedure names are written in italics.

Statements 1 to 6 should be self-explaining, and statement 9 should be familiar to all who have experienced pseudo code. Here, only statements 7 and 8 need some further explanation.

Table: Informal outline of the Parallel Pseudo Pascal (PPP) notation

1a. LocalVariableName := Expression
1b. GlobalVariableName := Expression
2.  if Condition then Statement { else Statement }
3.  while Condition do Statement
4.  for Variable := InitialValue to FinalValue do Statement
5.  procedure ProcedureName (FormalParameterList); Statement
6.  ProcedureName (ActualParameterList)
7.  assign ProcessorSpecification { to ProcSetName }
8.  for each processor ProcessorSpecification { where Condition } do Statement
9.  any other well defined statement

Processor allocation

The assign statement is used to allocate processors. An example might be:

    assign a processor to each element in the array A to AProcs

The to ProcSetName clause may be omitted in simple examples where only one set of processors is used. The assign statement is realised by the processor allocation procedure, which assigns local processor numbers to the processors. Thus processor no. i may refer to the i-th element in the global array A by A[p], since p = i. Remember that p is the local variable which always stores the processor number. The actual number of processors allocated by the statement may be run time dependent.

Processor activation

Statement type 8 is the way to specify that code should be executed in parallel. The processor specification may simply be the keyword in followed by the name of a processor set; in that case, all the processors in that processor set will execute the statement in parallel.

The where clause is used to specify a condition which must be true for a processor in the processor set if the statement is to be executed by that processor. A rather low-level PPP example, which might have been taken from a PPP program of an odd-even transposition sort algorithm, is shown in the figure below.

for each processor in AProcs where p is odd and p < n do begin
    temp := A[p+1];
    if temp < A[p] then begin
        A[p+1] := A[p]; A[p] := temp
    end
end

Figure: Example program demonstrating the where clause. A is a global array, temp and n are local variables, and p is the local variable holding the processor number.
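In sequential Python, the comparison-exchange phase of the figure can be mimicked as follows (our sketch; on the CREW PRAM the loop body is executed by all the odd-numbered processors simultaneously, not one after the other):

```python
def odd_phase(A, n):
    # A[1..n] holds the items (A[0] is unused padding, keeping the
    # 1-based indexing of the PPP example).
    for p in range(1, n, 2):        # the processors with p odd, p < n
        temp = A[p + 1]
        if temp < A[p]:
            A[p + 1] = A[p]
            A[p] = temp

A = [None, 4, 3, 2, 1]
odd_phase(A, 4)
print(A[1:])    # [3, 4, 1, 2]
```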



Footnote: In this case, a ProcessorSpecification may be a text which implicitly describes a number of processors to be allocated, or simply "IntegerExpression processors".

Footnote: Here, a ProcessorSpecification may be prose (describing text) or, more formally, "in ProcSetName".

The Importance of Synchronous Operation

Synchronous operation simplifies

Parallel programs are in general difficult to understand. However, requiring that the processors operate synchronously may often make it much easier to understand parallel algorithms. Consider the simple program in the figure below. This example is taken from [Wyl].

for each processor in ProcSet do
    if cond(p) then
        x := f(y)
    else
        y := g(x)

Figure: Example program with unpredictable behaviour

The processor activation statement describes that the if statement shall be executed in parallel by all processors in the set called ProcSet. Assume that ProcSet contains two processors. Further, assume that one processor evaluates the condition cond(p) to true, while the other evaluates it to false. This may happen because the condition may refer to variables local to each processor, or to the processor number. x and y are global variables.

Now consider the else clause. Depending on the relative times required to compute the functions f and g, either the old or the new value of x will be used in the else clause. Thus it is difficult to argue about the precise behaviour of this program. As Wyllie points out [Wyl], the code generated from this program is a legitimate program for the PRAM model, but this kind of programming is not recommended.

A program with more self-evident behaviour can easily be obtained, as shown in the figure below. It is here assumed that cond(p) is unaffected by the first if statement.

The wait statement describes an explicit synchronisation, which is necessary to guarantee that the old values of x and y are used in the computation of f and g. Such synchronisations are in general valuable tools for restricting the space of possible behaviours, and they make it easier to understand parallel programs. This observation probably led Wyllie to define what is the most important property of high-level programming of the PRAM.

for each processor in ProcSet do begin
    if cond(p) then
        tx := f(y)
    else
        ty := g(x);
    wait for both processors to reach here;
    if cond(p) then
        x := tx
    else
        y := ty
end

Figure: The example program transformed to predictable behaviour by using explicit resynchronisation (the wait statement). The wait statement may be omitted from a PPP program; see the text.

CREW PRAM property: Synchronous statements.
If each processor in a set P begins to execute a PPP statement S at the same time, all processors in P will complete their executions of S simultaneously. (Adapted from [Wyl].)

This surely is a severe restriction, which puts a lot of burden on the programmer. However, the property is crucial in making PPP programs understandable and easy to analyse. From now on, we will assume the existence of a PPP compiler that, whenever possible, automatically ensures that the PPP statements are executed as synchronous statements. Therefore we may omit the wait statement in the figure above, since the PPP compiler, on reading the first if statement, will produce code such that all processors in ProcSet simultaneously start executing the second if statement.

Given the property of synchronous statements, the time needed to execute a general (possibly compound) PPP statement S is the maximum time over all processors executing S, plus the time needed to resynchronise the processors before they leave the statement.

The need for resynchronisation after each high-level statement may seem to imply a lot of overhead associated with high-level programming. As will be shown below, this is not the case: a very simple resynchronisation code may be added automatically by the compiler in most cases.

Compile Time Padding

Ensuring the synchronous statements property may in most cases be done by so-called compile time padding. When the PPP compiler knows the execution time of the then and else clauses, inserting an appropriate amount of waiting (no-ops) in the shorter of these will make the whole if statement a synchronous statement.

Automatic compile time padding is in fact a necessity for making it possible to express algorithms in a synchronous high-level language. The exact execution times of the various constructs should be hidden in a high-level language; it is therefore neither desirable nor possible for the programmer to preserve the property of synchronous statements by hand.

Compile time padding should be used by the compiler wherever possible. It is simple, and it does not increase the execution time of the produced code: only the non-critical execution paths are padded until they have the same length as the critical (longest) execution path. The alternative, which is to insert code for explicit resynchronisation, may be simpler for the compiler, but should be avoided since it introduces a significant increase in the running time (see the section on explicit resynchronisation below).
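The padding rule described above can be sketched in a few lines. The following Python fragment is an illustration only (not part of the thesis's PPP compiler); it assumes, as the text does, that the compiler knows the cost of both clauses in time units:

```python
# Hypothetical sketch of compile-time padding for a two-way branch.
# Branch costs are assumed known to the compiler; the shorter clause
# is padded with no-ops so every processor leaves the IF at the same time.

def pad_branches(then_cost: int, else_cost: int) -> tuple[int, int]:
    """Return (then_pad, else_pad): number of no-ops appended to each clause."""
    longest = max(then_cost, else_cost)
    return longest - then_cost, longest - else_cost

then_pad, else_pad = pad_branches(then_cost=7, else_cost=3)
assert (then_pad, else_pad) == (0, 4)
# Both paths now cost max(7, 3) = 7 time units, so the if statement
# becomes synchronous without any run-time synchronisation code.
```

Note that the padding is computed entirely at compile time; the critical (longest) path is never lengthened.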

Helping the Compiler

There are cases where compile time padding cannot be used, but where the use of general processor resynchronisation can be avoided by clever programming. Consider the following example, taken from [Wyl]. The program in the first figure below is a simple-minded program to set an array A of nonnegative integers to all zeros.

In this example it is not possible for the compiler to know the execution time of the while loop. It must therefore insert code for general resynchronisation just after the while loop. This may lead to a substantial increase in the execution time.

However, if the programmer knows the value of the largest element in the array A, M = max_i A[i], this fact may be used to help the compiler, as shown in the program in the second figure below. Here the compiler knows that we will get M iterations of the for loop for all the processors. Further, it can easily make the if statement synchronous by inserting an else wait(t), where t is

(A less artificial example is given later in the thesis.)

    begin
        assign a processor to each element in A;
        for each processor do
            while A[p] > 0 do
                A[p] := A[p] - 1
    end

Figure: Example program which requires general resynchronisation to be inserted by a PPP compiler.

    begin
        assign a processor to each element in A;
        for each processor do
            for i := 1 to M do
                if A[p] > 0 then
                    A[p] := A[p] - 1
    end

Figure: Example program where the need for general resynchronisation has been removed.

the time used to execute the then clause. All processors will then simultaneously exit from the for loop.
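The effect of replacing the data-dependent while loop by a fixed bound M can be simulated directly. The Python sketch below is an illustration, not the thesis's PPP code; it models each synchronous step of all processors and counts the time units each one consumes:

```python
# Sketch: each "processor" decrements its array element once per iteration.
# With a data-dependent while loop the processors would finish at different
# times; iterating a fixed M = max(A) times lets every processor execute
# the same number of (possibly padded) steps and exit simultaneously.

def zero_array_lockstep(A):
    M = max(A)                      # assumed known to the programmer
    steps = [0] * len(A)            # time units consumed per processor
    for _ in range(M):              # same trip count for all processors
        for p in range(len(A)):     # simulate one synchronous step
            if A[p] > 0:
                A[p] -= 1
            # else: compiler-inserted wait(t), same cost either way
            steps[p] += 1
    return A, steps

A, steps = zero_array_lockstep([3, 0, 5, 2])
assert A == [0, 0, 0, 0]
assert len(set(steps)) == 1         # all processors used identical time
```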

Synchronous MIMD Programming: Example

Assume that we are given the task of making a synchronous MIMD program which computes the total area of a complex surface consisting of various objects. We will consider three cases that illustrate various aspects of synchronous MIMD programming.

Case 1

Assume that the surface consists of only three kinds of objects: Rectangle, Triangle or Trapezium. The program is sketched in PPP in the figure below.

The multiway branch statement demonstrates how the multiple instruction stream property may give faster execution and better



See for instance [HS] or [KR].

    { Each processor holds one object }
    if type of object is Rectangle then
        Compute area of Rectangle object
    else if type of object is Triangle then
        Compute area of Triangle object
    else
        Compute area of Trapezium object;
    { All processors should be here at the same time unit }
    Compute total sum    { O(log n) standard parallel prefix }

Figure: Synchronous MIMD program in PPP, case 1.

    Rectangle:  compute area, then wait
    Triangle:   compute area (no wait)
    Trapezium:  compute area, then wait

Figure: Compile time padding.

processor utilisation than on a SIMD machine. If the time used to compute the area of each kind of object is roughly equal, MIMD execution will be roughly three times faster than SIMD execution for this case with three different objects.

As expressed by the comment just before the final statement, the processors should leave the multiway branch statement simultaneously. In this case this is easily achieved by compile time padding, since the time used to compute the area of each kind of object may be determined by the compiler. If we assume that the most lengthy operation is to calculate the area of a Triangle, the effect of compile time padding can be illustrated as in the figure above.

Case 2

Now assume that each object is a polygon with a bounded, known number of edges. As shown in the figure below, the area of each object (polygon) is computed by a

    { Each processor holds one polygon }
    Area := ComputeArea(this polygon);
    { All processors should be here at the same time unit }
    Compute total sum    { O(log n) standard parallel prefix }

Figure: Synchronous MIMD program in PPP, case 2.

    { Each processor holds one polygon }
    Area := ComputeArea(this object);
    SYNC;    { O(log n) time }
    { All processors should be here at the same time unit }
    Compute total sum    { O(log n) standard parallel prefix }

Figure: Synchronous MIMD program in PPP, case 3.

function called ComputeArea. Knowing the maximum number of edges in a polygon, the compiler may calculate the maximum time used to execute the procedure, and produce code which will make all calls to the procedure use this maximum time.

In many cases the compiler may calculate the maximum time needed by a procedure and insert code to make the time used by the procedure a fixed, known quantity. This is important to provide high-level synchronous programming (see the CREW PRAM property Synchronous statements above).

Case 3

We now consider the case that each object is an arbitrarily complex polygon, i.e. with an unknown number of edges, and that the size of the most complex polygon is not known by the processors. In this situation it is impossible for the compiler to know the maximum execution time of ComputeArea. The best it can do is to insert an explicit synchronisation statement (SYNC) just after the return from the procedure. SYNC is a synchronisation mechanism provided by the CREW PRAM simulator, and it is described in the next subsection. It is able to synchronise n processors in O(log n) time. In this case the use of SYNC does not influence the time complexity of the computation, since the succeeding statement also requires O(log n) time.

(Technically, this enforcing of the maximum time consumption may be done by inserting dummy operations and/or iterations, or by measuring the time used by reading the global clock.)

General Explicit Resynchronisation

There are cases where general resynchronisation cannot be avoided. In general, the number of processors leaving a statement must match the number of processors which entered it. The number of entering processors is known at run time, so the task reduces to counting the number of processors which have finished the statement and are ready to leave it.

One of the main purposes of our CREW PRAM model simulator is to be a vehicle for polylogarithmic algorithms. Many such algorithms use at least O(n) processors, where n is the problem size. The use of a simple synchronisation scheme with a linear (O(n)) time requirement would make it impossible to implement parallel algorithms using explicit processor resynchronisation with sublinear time requirements.

However, as Wyllie claims, synchronisation of n processors may be done in O(log n) time. He does not describe the implementation details, but they are rather straightforward. One implementation is outlined in the next subsection.

Since processor synchronisation may be necessary in CREW PRAM programs, the time requirement modelling should use a realistic measure for the time used by resynchronisations. One will soon realise that the time needed by most resynchronisation algorithms depends on when the various processors start to execute the algorithm. Thus the time requirement cannot be computed as an exact function of solely the number of synchronising processors. One way to exactly model the time usage is to implement the synchronisation.

Knowing the size of the most complex polygon would make it possible to use the techniques described for case 2.

Synchronisation Algorithm

We will now outline the implementation of processor resynchronisation which is used in the CREW PRAM simulator. It should be noted that this is just one of many possible solutions. The subsection may be omitted by readers who are not interested in what happens below the high-level language PPP.

Assume that n processors finish a statement at unpredictable, different times. When all have finished, they should as fast as possible proceed simultaneously (synchronously) to the next statement. Before this happens, they must wait and synchronise.

As is often the case for O(log n) algorithms, the necessary computation may be done by organising the processors in a binary tree structure. Assume that the processors are numbered 1 to n. The waiting time may be used to let each processor count the number of processors in the subtree below (and including) itself which have finished the statement. This is done in parallel by all finished processors. The number of finished processors will bubble up to the root of the whole tree, which holds the total number of finished processors.

When all have started to execute the resynchronisation algorithm, the time used before this is known by the root processor is given by the time used to propagate the information in parallel from the leaves to the root of the tree. This is surely O(log n). A similar O(log n) time algorithm is the synchronisation barrier reported in [Lub].
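The logarithmic bound of this tree-based scheme is easy to check with a small model. The Python fragment below is illustrative only (it is not the simulator's implementation): it counts the rounds needed to combine n leaf counts up a binary tree, one level per time unit, after the last processor has arrived:

```python
import math

# Model of the tree-based resynchronisation: finished processors
# propagate counts towards the root, one tree level per time unit,
# so n processors synchronise in O(log n) time after the last arrival.

def sync_time_after_last_arrival(n: int) -> int:
    """Rounds needed to combine n leaf counts up a binary tree."""
    rounds = 0
    active = n
    while active > 1:               # each round halves the active nodes
        active = (active + 1) // 2
        rounds += 1
    return rounds

assert sync_time_after_last_arrival(1024) == 10   # == log2(1024)
assert sync_time_after_last_arrival(1000) == math.ceil(math.log2(1000))
```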

The figure below illustrates the performance of the resynchronisation algorithm for processor sets of various sizes (numbers of processors). For each size, ten cases have been measured. In each case, every processor uses a random number of time units (within a fixed range) before it calls the synchronisation procedure. The synchronisation time is measured as the time from when the last processor calls the synchronisation procedure until all the processors have finished the synchronisation.

Parallelism Should Make Programming Easier

This section describes some of the thoughts presented at the Workshop On The Understanding of Parallel Computation under the title Parallelism Should Make Programming Easier [Nat].

The motivation for the section is twofold. First of all, it gives a brief explanation of my opinion that parallelism may be used to make programming easier, and it briefly mentions some related work. Secondly, two examples of easy parallel programs are provided. These examples illustrate the use of the PPP notation introduced in the previous section.

Introduction

Parallelism should not add to the software crisis.

The so-called software crisis is still a major problem for computer industry and science. Contemporary parallel computers are in general regarded to be even more difficult to program than sequential computers. Nevertheless, the need for high performance has made it worthwhile to develop complex program systems for special applications on parallel computers.

The use of parallelism should not add to the software crisis. For general purpose parallel computing systems to become widespread, I think it is necessary to develop programming methods that strive for making parallel programming easier.

How can parallelism make programming easier?

There are at least two main reasons that the use of parallelism in future computers may make programming easier.

The main motivation for using parallel processing is to provide more computing power to the user. This power may, in general, make it possible for the programmer to use simpler and "dumber" algorithms, to some extent. It is well known that coding to achieve maximum efficiency is difficult and consuming of programmer time. Consequently, parallelism may increase the number of cases where we can afford to use simple brute force techniques. Also, increased computing power, or a large number of processors, may

The use of the PPP notation is the reason for placing this relatively high-level section between two technical sections.

Booch in [Boo]: "The essence of the software crisis is simply that it is much more difficult to build software systems than our intuition tells us it should be."

Figure: Performance of the resynchronisation algorithm provided by the CREW PRAM simulator (synchronisation time plotted against number of processors). One symbol is used to mark the average time of ten random cases; another symbol marks each individual random case. See the text.

make it more common to use redundancy in problem solving strategies, which again may imply conceptually simpler solutions. See the odd-even transposition sort example below.

Many of our programs try to reflect various parts of, or processes in, the real world. There is no doubt that the real world is highly parallel. Therefore, the possibility of using parallelism will in general make it easier to reflect real world phenomena and natural solution strategies. See the parallel insertion sort example below.

Related work

Most computer scientists would of course prefer easy programming. Consequently, several programming languages have been proposed that aim at providing this. In this paragraph I will mention some of these, to provide a few pointers to further reading. I will not judge the relative merits of the various approaches.

The August 1986 issue of IEEE Computer was titled Domesticating Parallelism, and contains articles describing several of these new languages: Linda [ACG], Concurrent Prolog [Sha], ParAlfl (a parallel functional language) [Hud], and others.

Linda consists of a small set of primitives (or operations) that is used by the programmer to construct explicitly parallel programs. It supports a style of programming that to a large extent makes it unnecessary to think about the coupling between the processors. See also [CG].

Concurrent Prolog is just one of several logic programming languages designed for parallel programming and execution. Numerous parallel variants of LISP have also been developed; one of the first was Multilisp [Gel, Hal].

Strand is a general purpose language for concurrent programming [FT], and is currently available on various parallel and uniprocessor computers.

UNITY, by Chandy and Misra [CM], is a new programming paradigm that clearly separates the program design, which should be machine independent, from the mapping to various architectures (serial or parallel). See also [BCK].

SISAL is a so-called single assignment language derived from the dataflow language VAL. Unlike dataflow languages, SISAL may be executed on sequential machines, shared memory multiprocessors and vector processors, as well as dataflow machines [OC]. See also [CF] and [Lan].

For traditional sequential programming, there is far from any consensus about which of the various programming paradigms is most successful in providing easy programming. We must therefore expect similar discussions for many years about the corresponding parallel programming paradigms.

A very important approach to avoid that the use of parallelism adds to the software crisis is simply to let the programmers continue to write sequential programs. The parallelising may be left to the compiler and the run-time system. A lot of work has been done in this area; see for instance [Pol], [Gup], [KM]. The method has the great advantage that old programs can be used on new parallel machines without rewriting.

Synchronous MIMD Programming Is Easy

The use of synchronous MIMD programming to implement the algorithms studied and described in this thesis was motivated by a wish to program in a relatively high-level notation which was consistent with the programming philosophy used by James Wyllie [Wyl]. It came as a surprise that high-level synchronous MIMD programming on a CREW PRAM model seems to be remarkably easy.

Easy programming

The CREW PRAM property Synchronous statements, described above, implies that the benefits of synchronous operation are obtained at the source code level, and not only at the instruction set level. It gives the advantages of SIMD programming, i.e. programs that are easier to reason about [Ste]. However, we have kept the flexibility of MIMD programming. This was illustrated by the example above, where a MIMD program was shown that may compute the area of three different kinds of objects simultaneously. Such natural and efficient handling of conditionals is not possible within the SIMD paradigm.

At first, making the statements synchronous may be perceived as a heavy burden on the programmer. However, my experience is that it is a relatively easy task after some training, and that it may be helped a lot with simple tools.

Note that the synchronous operation property may be violated by the programmer if wanted. Asynchronous behaviour can be modelled on a CREW PRAM, with the only restriction that the time difference between any two events must be a whole number of CREW PRAM time units. This discretisation should not be a problem in practice.

(On a SIMD system, nested conditionals can in principle reduce processor utilisation exponentially in the number of nesting levels [Ste].)

Easy analysis

Synchronous programs exhibit, in general, a much more deterministic behaviour than asynchronous programs. This implies easier analysis.

The process of making high-level statements synchronous does also simplify analysis. This is demonstrated by case 1 of the example above, where compile time padding was used to ensure that a computational task would use an equal amount of time in three different cases. Here, redundant operations (i.e. wait operations) simplify the analysis.

Making high-level statements synchronous may also make it necessary to code a procedure such that its time consumption is independent of its input parameters. A natural approach is then to code the procedure for solving the most general case, and to use this code on all cases. This removes the possibility of saving some processor time units on simple cases but, more importantly, it simplifies the programming of the procedure.

A consequence of synchronous statements is that the time consumption of an algorithm will typically be less dependent on the actual nature of the problem instance (such as presortedness [Man] for sorting). This makes it easier to use the algorithm as a substep of a larger synchronous program.

Easy debugging

    The main problems associated with debugging concurrent programs are increased complexity, the probe effect, nonrepeatability, and the lack of a synchronised clock.

    McDowell and Helmbold in Debugging Concurrent Programs [MH]

The debugging of the synchronous MIMD programs developed as part of the work reported in this thesis has been quite similar to debugging of uniprocessor programs. The synchronous operation, and the execution of the programs on a simulator, have eliminated most of the problems with concurrent debugging. The CREW PRAM model implicitly provides a synchronised clock known by all the processors. Synchronous operation will in general imply an increased degree of repeatability, and full repeatability is possible on the simulator system. The simulator does also provide monitoring of the programs without any disturbance of the program being evaluated, i.e. no probe effect.

Not all of these benefits would be possible to obtain on a CREW PRAM machine. However, a proper programming environment for such a machine should contain a simulator for easy debugging in the early stages of software testing.

Having experienced the benefits of synchronous MIMD programming after only a few months of training, and with the use of very modest prototype tools, I am convinced that this paradigm for parallel programming is worth further study in future research projects.

Example: Odd-Even Transposition Sort

The purpose of this example is to illustrate how parallelism, through increased computing power and redundancy, may give a faster and simpler algorithm. We know that comparison-based sorting on a sequential computer cannot be done faster than O(n log n) [Akl, Knu]. This is achieved by several algorithms, for instance Heapsort, invented by Williams [Wil]. The famous Quicksort algorithm, invented by C. A. R. Hoare [Hoa], is O(n log n) time on average, but O(n^2) in the worst case [AHU].

However, if we have n processors available for sorting n items, it is very easy to sort faster than O(n log n) time if we allow a higher cost than O(n log n): O(n) time sorting is easily obtained by several O(n^2) cost parallel sorting algorithms.

Perhaps the simplest parallel sorting algorithm is the odd-even transposition sort algorithm. The algorithm is now being attributed [Qui, Akl] to Howard Demuth and his PhD thesis from 1956; see [Dem]. The algorithm is so well known that I will only outline it briefly. Readers unfamiliar with the algorithm are referred to one of [Qui], [Akl] or [BDHM] for more detailed descriptions.

The odd-even transposition sort algorithm is based on allocating one processor to each element of the array which is to be sorted. See the example in the figure below, which shows the sorting of a small set of items. If n, the number of items to be sorted, is an even number, we need n/2 iterations. Each iteration consists of an odd phase followed by an even phase. The processors are numbered starting at 1. In the odd phase, each processor with an odd number compares its element in the array with the element allocated to the processor on the

Figure: Sorting numbers by odd-even transposition sort; two iterations are shown, each consisting of an odd phase followed by an even phase. The operation of a processor comparing two numbers is marked in the figure.

right side. In the example, each odd-numbered processor (1, 3, and so on) compares its own element with that of its right-hand neighbour. If a processor finds that the two elements are out of order, they are swapped. In even phases, the same is done by the even-numbered processors.

A PPP program for the algorithm is given in the figure below. Note that n processors are needed.

I would guess that most readers familiar with this algorithm, or similar parallel algorithms, will agree that it is simpler than heapsort and quicksort. In my opinion it is also simpler than straight insertion sort, which is commonly regarded to be one of the simplest uniprocessor sorting algorithms (see the next example).

Example: Parallel Insertion Sort

This example is provided to show how the use of parallelism may make it easier to reflect a natural solution strategy.

One of the simplest uniprocessor algorithms is straight insertion sort. This is the method used by most card players [Wir]. A good description is found in Bentley's book Programming Pearls [Ben].

The method is illustrated in the figure below. One starts with a sorted sequence of length one. The length of this sequence is increased by one

The reader may have observed that only n/2 processors are active at any time during the execution of the algorithm. A slightly different implementation that takes this into account is described in the section on simulator programming below.

    assign n processors to ProcArray;
    for each processor in ProcArray begin
        for Iteration := 1 to n/2 do begin
            if p is odd then
                if A[p] > A[p+1] then Swap(A[p], A[p+1]);
            if p is even then
                if A[p] > A[p+1] then Swap(A[p], A[p+1])
        end
    end

Figure: Odd-even transposition sort expressed in the PPP notation.
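The PPP program above can also be rendered as a small sequential simulation. The Python sketch below is an illustration only (not the thesis's PPP/PIL code); the swaps within each phase touch disjoint pairs, which is exactly what would allow them to run in parallel:

```python
# Odd-even transposition sort: in each of ceil(n/2) iterations, the "odd"
# comparisons (1-based pairs 1-2, 3-4, ...) form one phase and the "even"
# comparisons (2-3, 4-5, ...) the next. 1-based processor numbers map to
# 0-based Python indices.

def odd_even_transposition_sort(A):
    n = len(A)
    for _ in range((n + 1) // 2):
        for start in (0, 1):                  # odd phase, then even phase
            for p in range(start, n - 1, 2):  # disjoint pairs: these swaps
                if A[p] > A[p + 1]:           # could all run in parallel
                    A[p], A[p + 1] = A[p + 1], A[p]
    return A

assert odd_even_transposition_sort([5, 2, 9, 1, 7, 3]) == [1, 2, 3, 5, 7, 9]
```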

Figure: Straight insertion sort of numbers. The sorted part grows to the left of the unsorted part; the figure shows the start configuration and the state after each of the first iterations.

    T        for Next := 2 to n do begin
                 { Invariant: A[1..Next-1] is sorted }
    T            j := Next;
    F,M,I        while j > 1 and A[j-1] > A[j] do begin
                     Swap(A[j-1], A[j]);
                     j := j - 1
                 end
             end

Figure: Straight insertion sort with one processor [Ben].

number in each iteration. Each iteration can be perceived as consisting of four tasks:

T: Take next. This corresponds to picking up the card to the right of the sorted sequence in the figure above.

F: Find the position where this card should be inserted in the sorted sequence (the numbers to the left of it).

M: Make space for the new card in the sorted sequence.

I: Insert the new card at the freed position.

The algorithm is shown in the figure above. The code corresponds to the simplest version of the algorithm presented in Bentley's book [Ben]. To the left of each statement, the letters T, F, M and I have been used to show how the various parts of this code model the four main tasks just described. The three tasks of finding the right position, making space, and inserting the next card are all taken care of by the simple while loop.
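For comparison with the parallel version below, the Bentley-style sequential algorithm can be written directly in Python (an illustration; the thesis presents it in PPP):

```python
# Straight insertion sort with one processor, with the four tasks marked
# as in the discussion: T(ake), then F(ind)/M(ake space)/I(nsert), the
# last three all handled by the single while loop.

def insertion_sort(A):
    for next_pos in range(1, len(A)):      # T: take the next "card"
        j = next_pos
        while j > 0 and A[j - 1] > A[j]:   # F/M/I: one while loop
            A[j - 1], A[j] = A[j], A[j - 1]
            j -= 1
    return A

assert insertion_sort([4, 1, 3, 2]) == [1, 2, 3, 4]
```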

In my opinion, the straight insertion sort algorithm as performed by card players is easier to model and represent by using parallelism. See the figure below. Processor p is allocated to A[p]. The position of the next card to be inserted in the sorted subsequence is stored in the variable Next, which is local to each processor. The sorted subsequence is represented by the processors to the left of Next. Statement (a) ensures that the following code is performed only by the processors allocated to numbers in the sorted

            assign n processors to ProcArray;
            for each processor in ProcArray
    T           for Next := 2 to n do
    (a)             if p < Next then begin
                        NextVal := A[Next];
    F (b)               if NextVal < A[p] then begin
    M (c)                   A[p+1] := A[p];
    F (d)                   if NextVal >= A[p-1] then
    I (e)                       A[p] := NextVal
                        end
                    end

Figure: Parallel CREW PRAM implementation of straight insertion sort, expressed in PPP. See the text.

sequence. Each of these processors starts by reading the value of the next card into the local variable NextVal.

The two statements marked F model how the right position for inserting the next card is found. Statement (b) selects all cards in the sorted sequence with higher value than the next card. To make space for the next card, these are moved one position to the right, as described by statement (c). Statement (d) specifies that if the value of the next card is larger than or equal to the card on the left side (A[p-1]), then we have found the right position for inserting the card (statement (e)).

I will not claim that this program is easier to understand, but it seems to me that its behaviour does more clearly reflect the sorting strategy performed by card players. The pattern matching used by card players to find the right position for insertion, and especially the making of space for the next card, are typical parallel operations. I hope to find more striking examples in the future.



See also the VLSI parallel shift sort algorithm by Arisland et al. [AAN].
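The round-synchronous behaviour of the parallel insertion sort sketched above can be simulated in Python. This is an illustration of my reading of the PPP sketch, not the thesis's code; in particular, the minus-infinity sentinel for the leftmost processor's boundary test is an assumption (the PPP version leaves that detail to the text). Taking a snapshot before the writes models the CREW PRAM rule that all synchronous reads precede the writes of the same statement:

```python
# Round-synchronous simulation of the parallel insertion sort
# (0-based indices; one synchronous round inserts one new "card").

def parallel_insertion_sort(A):
    n = len(A)
    for nxt in range(1, n):
        next_val = A[nxt]                 # every processor reads A[Next]
        old = A[:]                        # snapshot: reads precede writes
        for p in range(nxt):              # processors in the sorted part
            if next_val < old[p]:
                A[p + 1] = old[p]         # M: shift one position right
                left = old[p - 1] if p > 0 else float("-inf")  # sentinel
                if next_val >= left:      # F: found the insertion point
                    A[p] = next_val       # I: insert the new card
    return A

assert parallel_insertion_sort([3, 1, 4, 1, 5, 9, 2, 6]) == [1, 1, 2, 3, 4, 5, 6, 9]
```

Note that each round writes each cell at most once (the shifts target distinct cells, and the insertion point is unique), so the exclusive-write restriction of the CREW PRAM is respected.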

CREW PRAM Programming on the Simulator

Introduction

This section outlines an approach for top-down development of synchronous MIMD programs in the CREW PRAM simulator environment. It is placed in this chapter since it describes CREW PRAM programming. However, it does refer to technical details of the CREW PRAM simulator, which is described in Appendix A. For a detailed study, Appendix A should be read first. Alternatively, use the index provided at the end of the thesis.

It should be noted that this is not an ideal approach for synchronous CREW PRAM programming, because it reflects limitations of the CREW PRAM simulator prototype used.

The recommended development approach

The approach for top-down development is a natural extension of the top-down stepwise refinement strategy often used when developing sequential algorithms. It is explained how early, incomplete versions of the program may be made and kept synchronous in a convenient manner. Further, it is described why the exact modelling of the time consumption should be postponed to the last stage in the development process.

The approach is illustrated by showing four possible steps (i.e. representations) in the development of an odd-even transposition sort CREW PRAM algorithm. More complicated algorithms will probably require a further splitting of one or several of these steps.

Step 1: Sketching the Algorithm in Pseudo-Code

The algorithm used as an example for outlining the development approach is a slightly different version of the odd-even transposition sort algorithm presented earlier in this chapter. In that version, only one half of the processors participate in each stage of the algorithm; the other half is waiting. That implementation, using n processors, was chosen because it is very similar to the odd-even transposition sort algorithms presented for a linear array of n processors, such as in [Akl] or [Qui].

However, when doing odd-even transposition sort on a CREW PRAM, there is no need to use more than n/2 processors. The n/2 processors may act as odd and even processors in an alternating style. Processor number

Figure: Processor allocation using n/2 processors for odd-even transposition sort of n elements (each processor number i is shown with the array element it is assigned to).

i may be assigned to the element in position 2i-1 of the array which is to be sorted (see the figure above). In even-numbered stages, processor i acts as an even-numbered processor assigned to element 2i, and compares this element with the element in position 2i+1. In odd-numbered stages, processor i acts as an odd-numbered processor assigned to element 2i-1, and compares that element with the element in position 2i. High-level pseudo-code for the algorithm, expressed in the PPP notation, is shown in the figure below.
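The alternating address scheme just described can be checked with a small Python simulation. This sketch reflects my reading of the allocation scheme (stage parity deciding between the 1-based pairs (2i-1, 2i) and (2i, 2i+1)); it is an illustration, not the thesis's PIL code:

```python
# n/2-processor odd-even transposition sort: in stage s, processor i
# (i = 1..n/2, 1-based) compares positions (2i-1, 2i) when s is odd and
# (2i, 2i+1) when s is even, so the same processors serve both phases.
# Positions are 1-based as in the text; Python lists are 0-based.

def odd_even_sort_half_processors(A):
    n = len(A)
    for stage in range(1, n + 1):             # n stages in total
        for i in range(1, n // 2 + 1):        # the n/2 "processors"
            left = 2 * i - 1 if stage % 2 == 1 else 2 * i   # 1-based
            if left + 1 <= n:                 # right neighbour exists?
                if A[left - 1] > A[left]:     # compare, 0-based indexing
                    A[left - 1], A[left] = A[left], A[left - 1]
    return A

assert odd_even_sort_half_processors([4, 3, 2, 1]) == [1, 2, 3, 4]
assert odd_even_sort_half_processors([9, 7, 5, 3, 1]) == [1, 3, 5, 7, 9]
```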

Step 2: Processor Allocation and Main Structure

A crucial part of every parallel algorithm is the processor allocation and activation. The exact number of processors that will be used should be expressed as a function of the problem size. Further, the main structure of the algorithm, including the activation of the processors, should be clear at this stage.

A possible first version of a program for our odd-even transposition sort algorithm is shown in the figure below. This is a complete program that may be compiled and executed on the simulator. It is written in a language called PIL (see Appendix A), which is based on SIMULA [BDMN, Poo].

The program demonstrates various essential features of PIL. The include statement includes a file called sorting, which contains the procedure GenerateSortingInstance. The assignment of processors and processor activation are done by statements which are quite similar to the PPP notation. The READ n FROM statement shows the syntax for reading from the global memory; the corresponding WRITE statement is shown later. T_ThisIs outputs the identification of the calling processor and is used for debugging. T_TO is a procedure for program tracing. The procedure call Use(1) specifies that each processor should consume one time

    CREW PRAM procedure OddEvenSort
    begin
        assign n/2 processors to ProcArray;
        for each processor in ProcArray begin
            Initialise local variables;
            for Iteration := 1 to n do begin
                Compute address of array element associated with this
                    processor during this iteration;
                Read that array element, and also its right neighbour
                    element if it exists;
                Compare the two elements just read, and swap
                    them if needed
            end
        end
    end

Figure: Pseudo-code (PPP) for odd-even transposition sort using n/2 processors.

    % oesNew1.pil
    % Odd Even Transposition sort with n/2 processors.
    #include
    PROCESSOR_SET ProcArray;
    INTEGER n; ! problem size ;
    ADDRESS addr;
    BEGIN_PIL_ALG
    n := 10;
    addr := GenerateSortingInstance(n, RANDOM);
    ASSIGN n//2 PROCESSORS TO ProcArray;
    FOR_EACH PROCESSOR IN ProcArray
    BEGIN ! parallel block ;
        INTEGER Iteration;
        ! Initialize local variables ;
        ! n is written on the UserStack by GenerateSortingInstance ;
        READ n FROM UserStackPtr-1;
        FOR Iteration := 1 TO n DO
        BEGIN
            T_ThisIs;
            IF IsOdd(Iteration) THEN T_TO(" Odd iteration")
            ELSE T_TO(" Even iteration");
            Use(1);
        END_FOR;
    END of parallel block;
    END_FOR_EACH PROCESSOR;
    END_PIL_ALG

Figure: Main structure of PIL program for our odd-even transposition sort algorithm. It contains processor allocation and activation.

unit before it finishes the last iteration of the FOR Iteration loop. For the purpose of this program, this is sufficient time modelling. For further details, consult Appendix A.

Step Complete Implementation With Simplied

Time Mo delling

The next main step in the development process should be to make a complete implementation of the algorithm. Normally, this should be done in several substeps. The resulting PIL program might be as shown in the two following figures. This version of the program implements the alternation of the processors between odd and even pairs by adjusting the value of ThisAddr in each iteration. The two values to be compared by every processor in each iteration are read by the procedure ReadTwoValues, and later processed by CompareAndSwapIfNeeded. ArrayPrint is a built-in procedure which prints a specified area of the global memory, in this case the array of sorted numbers.

Note that it is generally recommended to do only a simplified time modelling at this stage. The exact modelling of the time used by the parallel algorithm should be postponed to the last stage of the development. There is no need to go into full detail with respect to the time consumed before a complete and correct version of the algorithm has been made. However, whenever it is possible, all early versions of the program should be made synchronous. The absence of proper compile time padding makes it necessary to use manual methods for achieving synchronous programs. This is discussed below.

Making programs synchronous using SYNC

The currently used PIL compiler (pilc) is unfortunately not able to perform compile time padding to make the statements synchronous. Therefore, this dull work must be done by the PIL programmer. However, the burden may to a large extent be alleviated by using a stepwise refinement strategy on the time modelling.

Consider the IF statement in the procedure CompareAndSwapIfNeeded in the PIL program in Figure. The ELSE clause could have been omitted by the programmer if the compiler was able to do compile time padding. The compiler would then automatically insert the ELSE clause, to guarantee that all processors executing the IF statement would leave that statement

% oesNew2.pil
% Odd Even Transposition sort with n/2 processors,
% complete algorithm with simplified time modelling.
#include
PROCESSOR_SET ProcArray;
INTEGER n;
ADDRESS addr;
BEGIN_PIL_ALG
n := 10;
addr := GenerateSortingInstance(n, RANDOM);
ASSIGN n//2 PROCESSORS TO ProcArray;
FOR_EACH PROCESSOR IN ProcArray BEGIN ! parallel block;
    INTEGER n, Iteration;
    INTEGER ThisVal, RightVal;
    ADDRESS ThisAddr;
    #include
    ! n is written on the UserStack by GenerateSortingInstance ;
    READ n FROM UserStackPtr-1;
    ThisAddr := (UserStackPtr - 1 - n) + ((p*2)-1);
    FOR Iteration := 1 TO n DO BEGIN
        IF IsOdd(Iteration) THEN ThisAddr := ThisAddr - 1
        ELSE ThisAddr := ThisAddr + 1;
        ReadTwoValues;
        CompareAndSwapIfNeeded;
    END_FOR;
END of parallel block;
END_FOR_EACH PROCESSOR;
ArrayPrint(addr, n);

END_PIL_ALG

Figure: PIL main program for our odd-even transposition sort algorithm. The program contains a simplified time modelling which is sufficient to keep it synchronous. The included file oesNew2.proc is shown in Figure.

% oesNew2.proc
PROCEDURE ReadTwoValues;
BEGIN
    READ ThisVal FROM ThisAddr;
    IF ThisAddr + 1 >= UserStackPtr - 1 THEN BEGIN
        Use(t_READ);
        RightVal := INFTY;
    END
    ELSE READ RightVal FROM ThisAddr+1;
END_PROCEDURE ReadTwoValues;

PROCEDURE CompareAndSwapIfNeeded;
BEGIN
    IF RightVal < ThisVal THEN BEGIN
        WRITE RightVal TO ThisAddr;
        WRITE ThisVal TO ThisAddr+1;
    END
    ELSE ! Numbers compared are in right order, need not write;
        Wait(t_WRITE + t_WRITE); ! to preserve synchrony ;
END_PROCEDURE CompareAndSwapIfNeeded;

Figure: Procedures used in Figure.



simultaneously. The parameter value in the Wait call must be exactly equal to the time used in the THEN clause of the IF statement. It is necessary to recalculate this parameter each time a modification that changes the time consumption in some part of the (possibly nested) THEN clause is made. Such modifications occur frequently during development of an algorithm, and the recalculation of this parameter is typical compiler work.

The only motivation for the Wait call is to assure that all processors leave the IF statement simultaneously. This property may be achieved in an alternative way: insert a call to the general resynchronisation procedure SYNC just after the IF statement. This makes it possible to omit the ELSE part, and the code executed in the THEN clause may be changed without destroying the synchronous property of the IF statement, considered as one statement including the appended call to SYNC.

The use of SYNC will result in an increase in the time used by the algorithm. This is no problem, since the exact time modelling has been postponed to a later stage.
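On a real multiprocessor, the role of SYNC is played by a conventional barrier: each thread blocks until all have arrived, after which unequal branch times no longer matter. A minimal Python sketch of this idea (threading.Barrier standing in for SYNC; all names and the branch workloads are mine):

```python
import threading

NUM_WORKERS = 4
barrier = threading.Barrier(NUM_WORKERS)   # plays the role of SYNC
results = [0] * NUM_WORKERS

def worker(pid):
    # Branches of unequal length, as in an IF without a padded ELSE ...
    if pid % 2 == 0:
        results[pid] = sum(range(10_000))  # "long" THEN clause
    else:
        results[pid] = pid                 # "short" ELSE clause
    barrier.wait()  # ... are resynchronised here, whatever their cost.
    # From this point on, all workers proceed together again.

threads = [threading.Thread(target=worker, args=(i,)) for i in range(NUM_WORKERS)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

As with SYNC, the barrier costs extra time but frees the programmer from recomputing Wait parameters after every modification of the THEN clause.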



This parameter is written as t_WRITE + t_WRITE to reflect that it represents the time used by two succeeding WRITE statements. For the use of symbolic constants such as t_WRITE, see also Section.

In this simple example, the time used in the THEN clause is easy to calculate; however, most real examples will imply more elaborate calculations.

Strive for a correct ordering of the global events

Not all time consumption may be omitted from early versions. The crucial issue is to distinguish between local and global events. Local events are those occurring inside a processor; they cannot affect other processors, and they cannot be affected by other processors. Global events, such as accesses to the global memory, may affect (writes) or be affected by (reads) other processors.

Early versions of the program should use a simplified time modelling, but should try to obtain the same ordering of the global events that is expected in a complete implementation with exact time modelling.

For the correctness of an algorithm, it is irrelevant whether a set of completely local computations takes one time unit or x time units, as long as the order of all global events implied by the algorithm is kept unaltered.

The READ and the WRITE statements both take one time unit (see Section A). This time consumption is automatically modelled by the simulator. In addition, the simplified time modelling in Figure consists of one call to Wait (as discussed above) and a call Use(t_READ) in the procedure ReadTwoValues. The latter call ensures that all processors will leave that procedure synchronously.

Step: Complete Implementation With Exact Time Modelling

When a complete implementation of the algorithm has been tested and found correct, it is time to extend the program with more exact modelling of the time consumption. This is a relatively trivial task, which may be split into two phases.

Specifying time consumption with Use

A complete program with exact time modelling for our example algorithm is shown in Figure. The main difference between this program and the previous version (Figure) is that a lot of calls to the Use procedure have been introduced. Note that this explicit specification of the time used implies the advantage that the programmer is free to assume any particular instruction set, or other way of representing the time used by the program.

This would in general be very difficult for an asynchronous algorithm. For synchronous programs it will generally be much easier.

% oesNew3.pil
% Odd Even Transposition sort with exact time modelling.
#include
PROCESSOR_SET ProcArray;
INTEGER n;
ADDRESS addr;
BEGIN_PIL_ALG
n := 10;
addr := GenerateSortingInstance(n, RANDOM);
ClockReset; !****** Start of exact time modelling ;
ASSIGN n//2 PROCESSORS TO ProcArray;
FOR_EACH PROCESSOR IN ProcArray BEGIN ! of parallel block;
    INTEGER n, Iteration;
    INTEGER ThisVal, RightVal;
    ADDRESS ThisAddr;
    #include
    ! SetUp: ;
    Use(t_LOAD + t_SUB);
    READ n FROM UserStackPtr-1;
    Use((t_LOAD + t_SUB + t_SUB) + t_ADD + (t_LOAD + t_SHIFT + t_SUB) + t_STORE);
    ThisAddr := (UserStackPtr - 1 - n) + ((p*2)-1);
    Use(t_LOAD + t_SUB + t_JZERO);
    FOR Iteration := 1 TO n DO BEGIN
        Use(t_SHIFT + t_JZERO);
        IF IsOdd(Iteration) THEN BEGIN
            Use(t_SUB);
            ThisAddr := ThisAddr - 1;
        END
        ELSE BEGIN ! even Iteration;
            Use(t_ADD);
            ThisAddr := ThisAddr + 1;
        END;
        ReadTwoValues;
        CompareAndSwapIfNeeded;
        Use(t_LOAD + t_ADD + t_STORE + t_SUB + t_JZERO);
    END_FOR;
END of parallel block;
END_FOR_EACH PROCESSOR;
ArrayPrint(addr, n);

END_PIL_ALG

Figure: Complete PIL program for odd-even transposition sort with exact time modelling. The included file oesNew3.proc is shown in Figure.

% oesNew3.proc
PROCEDURE ReadTwoValues;
BEGIN
    READ ThisVal FROM ThisAddr;
    Use(t_ADD + t_IF);
    IF ThisAddr + 1 >= UserStackPtr - 1 THEN BEGIN
        Use(t_READ);
        RightVal := INFTY;
    END
    ELSE READ RightVal FROM ThisAddr+1;
END_PROCEDURE ReadTwoValues;

PROCEDURE CompareAndSwapIfNeeded;
BEGIN
    Use(t_LOAD + t_SUB + t_JNEG);
    IF RightVal < ThisVal THEN BEGIN
        WRITE RightVal TO ThisAddr;
        Use(t_ADD);
        WRITE ThisVal TO ThisAddr+1;
    END
    ELSE ! Numbers compared are in right order, need not write;
        Wait(t_WRITE + t_ADD + t_WRITE); ! to preserve synchrony ;
END_PROCEDURE CompareAndSwapIfNeeded;

Figure: Procedures used in Figure.

The following working rules may be useful when introducing more exact time modelling in a PIL program.

Use symbolic time constants.
When specifying time consumption as a parameter to Use or Wait, the programmer should try to make also this part of the code as readable as possible. Using predefined constants such as t_ADD, t_SHIFT etc. is a simple way to document how the programmer imagines that the various parts of the high level PIL program would have been represented in CREW PRAM machine code. Also, using such sums of symbolic constants instead of so-called magic numbers makes it easier to do the necessary modifications to the time modelling when changes to the PIL program are made.

In larger programs, it may be practical to define constants that reflect the time used by various logical subparts.
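The same rule carries over directly to any host language. A hypothetical Python sketch of a Use-style clock with named instruction times (the constant values below are invented for illustration; they are not the values defined by the simulator):

```python
# Invented instruction times, in CREW PRAM time units (illustrative only).
t_LOAD, t_STORE, t_ADD, t_SUB, t_SHIFT, t_JZERO = 1, 1, 1, 1, 1, 2

class Clock:
    """Accumulates modelled time, like the simulator's Use procedure."""
    def __init__(self):
        self.time = 0

    def use(self, units):
        self.time += units

clock = Clock()
# Loop-control overhead expressed as a sum of symbolic constants,
# not as a "magic number":
clock.use(t_LOAD + t_ADD + t_STORE + t_SUB + t_JZERO)
```

If one instruction time later changes, only the constant definition is touched, exactly the maintainability argument made above.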

Cluster local time consumption

Bearing in mind the dierence b etween lo cal and global events dis

cussed at page the time used on several subsequent lo cal compu

tations may in general b e collected into one call to Use Consider for

instance the Use call at the end of the FOR lo op in Figure which

mo dels the time used to control that lo op

Work systematical ly from time zero

It is natural to intro duce the exact time mo delling by following the

co de structure from execution time zero in a top down depth rst

approach

Do small changes and test systematically.
Since the time used in various parts of a parallel system may affect the order of the global events, it may indirectly affect the correctness of the algorithm. Experience has shown that minimal changes of the timing may cause disastrous changes to the program. Therefore, the introduction of the exact time modelling should be split into several substeps, with systematic testing in between.



These constants are defined in the simulator source file times.sim, stored in the src directory (see Appendix A). They are used in central parts of the simulator and should therefore not be changed.

Replacing SYNC by CHECK

If exact time modelling has been introduced properly, together with the necessary Wait calls for achieving synchronous operation, it should be possible to remove all explicit resynchronisation which has been inserted as discussed above. Simply removing all the calls to SYNC may be too optimistic: it is likely that you will do future modifications to your program that may change the time used. Therefore, farsighted programmers will replace SYNC by CHECK. CHECK checks whether the processors are synchronous at the given point, and may detect cases where synchrony has been lost due to erroneous program modifications. Replacing calls to SYNC with calls to CHECK should be done one by one, with testing after each replacement.

Closing remarks

The reader should have noticed that exact time modelling requires the specification of a lot of boring details that are strictly related to the details of the program implementing the algorithm. Considering the amount of changes normally done during the development of a program, it should be clear that a lot of unnecessary work may be avoided by postponing the exact time modelling to the very end of the program development. Please note that most of these details would have been taken care of by a proper PIL compiler.

Chapter

Investigation of the Practical Value of Cole's Parallel Merge Sort Algorithm

A fundamental choice is whether to base the message routing strategy of such an emulation on sorting, or to use a method that does not require sorting; and, if sorting is to be used, whether there exists an O(log n)-time sorting method for which the constant hidden by the big-O is not excessive.

Richard M. Karp, in A Position Paper on Parallel Computation [Kar]

This chapter describes the investigation of Cole's parallel merge sort algorithm. It starts by describing why this parallel algorithm is an important contribution to theoretical computer science. The practical value of Cole's algorithm is evaluated by comparing its performance with various simple sorting algorithms. The performance of these simple algorithms is summarised in Section.

The main part of the chapter is a top down and complete explanation of Cole's algorithm, followed by a description of how it has been implemented on the CREW PRAM simulator. This includes details that are not available in earlier documentation of the algorithm. The chapter ends by presenting the results from the comparison of Cole's algorithm with the simpler algorithms.

Parallel Sorting

As an aside, it is interesting to speculate on what are the all time most important algorithms. Surely the arithmetic operations on decimal numbers are basic. After that, I suggest fast sorting and searching.

Stephen A. Cook, in his Turing Award Lecture An Overview of Computational Complexity [Coo]

Parallel Sorting: The Continuing Search for Faster Algorithms

The literature on parallel sorting is very rich; a published bibliography contains nearly four hundred references [Ric]. Nevertheless, new and interesting parallel sorting algorithms are still appearing.

Within theoretical computer science, there are mainly two computational models that have been considered for parallel sorting algorithms: the circuit model and the PRAM model. An early and important result for the circuit model was the odd-even merge and bitonic merge sorting networks presented by Batcher [Bat]. A bitonic merge sort network sorts n numbers in O(log² n) time; see for instance [Akl].

The AKS sorting network

The first parallel sorting algorithm using only O(log n) time was presented by Ajtai, Komlos and Szemeredi [AKS]. This algorithm is often called the three Hungarians' algorithm, or the AKS-network. The original variant of the AKS-network used O(n log n) processors and was therefore not cost-optimal (recall the definition of cost optimality). However, Leighton showed that the AKS-network can be combined with a variation of the odd-even network to give an optimal sorting network with O(log n) time on O(n) processors [Lei]. Leighton points out that the constant of proportionality of this algorithm is immense, and that other parallel sorting algorithms will be faster as long as n is smaller than a problem size that probably never will occur in practice.



The total number of elementary particles in the observable part of the universe has been estimated to about 10^80 [Sag].

In spite of being commonly thought of as a purely theoretical achievement, the AKS-network was a major breakthrough: it proved the possibility of sorting in O(log n) time, and implied the first cost optimal parallel sorting algorithm, described by Leighton. The optimal asymptotical behaviour initiated a search for improvements closing the gap between its theoretical importance and its practical use. One such improvement is the simplification of the AKS-network done by Paterson [Pat]. However, the algorithm is still very complicated, and the complexity constants remain impractically large [GR].

Cole's CREW PRAM sorting algorithm

The PRAM model is more powerful than the circuit model. Even the weakest of the PRAM models, the EREW PRAM, may implement a sorting circuit such as Batcher's network without loss of efficiency.

Also for the PRAM model there has been a search for an O(log n) time parallel sorting algorithm with optimal cost. Richard Cole presented a new parallel sorting algorithm, called parallel merge sort, with this complexity [Col]. This was an important contribution, since Cole's algorithm is the second O(log n) time, O(n) processor sorting algorithm (the first was the one implied by the AKS-network). Further, it is claimed to have complexity constants which are much smaller than those of the AKS-network. To make it possible to evaluate its practical value, it was necessary to develop an implementation of the algorithm. This work is described in the following sections.

Are algorithms using more than n processors practical?

Many researchers would argue that sorting algorithms requiring more than n processors to sort n items are of little practical interest. A reasonable general assumption is that the number of processors p should be smaller than n. However, there are situations where p > n should be considered. Imagine that you are given the task of making an extremely fast sorting algorithm for up to n numbers on a Connection Machine with more than n processors. Reading the description of Cole's O(log n) time algorithm in the book on efficient parallel algorithms by Gibbons and Rytter [GR], it is far from evident that Cole's algorithm should not be considered.

It is interesting in this context to read the recent work of Valiant on the BSP model [Val] (see Section), where he advocates the need for algorithms with parallel slackness. Only the future can tell us whether



Figure: Insertion sort algorithm [Ben], demonstrating worst, average and best case behaviour (x-axis: number of items sorted; y-axis: execution time). The best case for this algorithm is a sorted sequence, and the worst case is an input sequence sorted in the opposite order. Each small dot shows the time used to sort one of ten random input sequences for each problem size, and the average of these ten samples is also marked. The random cases were generated by drawing numbers from a uniform distribution.

algorithms with very high virtual processor requirements will be widely used.

Some Simple Sorting Algorithms

This section summarises the performance of the three simple sorting algorithms described earlier. For all sorting algorithms evaluated in this thesis, the execution time is measured from when the algorithm starts, with the input sequence placed in the global memory, until it finishes, with the sorted sequence placed in the global memory. The time is measured in number of CREW PRAM time units.

Insertion Sort

Perhaps the simplest sequential sorting algorithm is the straight insertion sort algorithm outlined in Section (see Figure). A

Table: Performance data for a CREW PRAM implementation of the parallel insertion sort algorithm presented in Figure. Time and cost are given in kilo CREW PRAM time units and kilo CREW PRAM unit-time instructions, respectively. The numbers of read/write operations to/from the global memory are given in kilo locations.

Problem size n    time    processors    cost    reads    writes

tuned version (the second program from the top of the page in [Ben]) has been implemented on the CREW PRAM simulator. The execution time of the algorithm is strongly dependent on the degree of presortedness in the problem instance. This is illustrated in Figure.
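For reference, straight insertion sort in a common Python formulation (a sketch of the standard algorithm, not a transcription of the tuned program from [Ben]):

```python
def insertion_sort(a):
    """Straight insertion sort: O(n) on already sorted input (best case),
    O(n^2) on reverse-sorted input (worst case)."""
    for i in range(1, len(a)):
        item = a[i]
        j = i - 1
        while j >= 0 and a[j] > item:  # shift larger items one place right
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = item
    return a
```

The inner while loop terminates immediately on sorted input and runs to the front of the array on reverse-sorted input, which is exactly the best/worst case spread visible in the measurements.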

Parallel Insertion Sort

The parallel version of the insertion sort algorithm described in Section (see Figure) has also been implemented on the simulator. Its performance is summarised in Table.

Odd-Even Transposition Sort

Table summarises the performance of the implementation of the odd-even transposition sort algorithm using n/2 processors, which was developed and described in Section.

Bitonic Sorting on a CREW PRAM

This section gives a brief description of a CREW PRAM implementation of Batcher's bitonic sorting network, and its performance.

Table: Performance data measured from execution of OddEvenSort on the CREW PRAM simulator. Time and cost are given in kilo CREW PRAM time units and kilo CREW PRAM unit-time instructions, respectively. The numbers of read/write operations to/from the global memory are given in kilo locations.

Problem size n    time    processors    cost    reads    writes

Algorithm Description and Performance

Batcher's bitonic sorting network [Bat] for sorting n = 2^m items consists of m(m + 1)/2 columns, each containing n/2 comparators (comparison elements). A detailed description of the algorithm may be found in Akl's book on parallel sorting [Akl]. In this section we assume that n is a power of 2. A natural emulation on a CREW PRAM is to use n/2 processors, which are dynamically allocated to the one active column of comparators as it moves from the input side to the output side through the network. The global memory is used to store the sequence when the computation proceeds from one step (i.e., comparator column) to the next. The main program and its time consumption are shown in Figure and Table.

When discussing time consumption of PPP programs, we will in general use the following notation.

Definition: t_i(n) denotes the time used on one single execution of statement i of the discussed program, when the problem size is n. t_{j-k}(n) is a shorthand notation for the sum Σ_{i=j}^{k} t_i(n).



The possibility of sorting several sequences simultaneously in the network by use of pipelining is sacrificed by this method. This is not relevant in this comparison, since Cole's algorithm does not have a similar possibility.

CREW PRAM procedure BitonicSort;
begin
    assign n/2 processors;
    for each processor do begin
        Initiate processors;
        for Stage := 1 to log n do
            for Step := 1 to Stage do begin
                EmulateNetwork;
                ActAsComparator;
            end
    end
end

Figure: Main program of a CREW PRAM implementation of Batcher's bitonic sorting algorithm.

Table: Time consumption of BitonicSort.

Table: Performance data measured from execution of BitonicSort on the CREW PRAM simulator. Time and cost are given in kilo CREW PRAM time units and kilo CREW PRAM unit-time instructions, respectively. The numbers of read/write operations to/from the global memory are given in kilo locations.

Problem size n    time    processors    cost    reads    writes

EmulateNetwork is a procedure which computes the addresses in the global memory corresponding to the two input lines for that comparator in the current Stage and Step. ActAsComparator calculates which of the two possible comparator functions should be performed by the processor (comparator) in the current Stage and Step, performs the function, and writes the two outputs to the global memory. Both procedures are easily performed in constant time.

The main program shown in Figure implies that the time consumption of the implementation may be expressed as

    T_BitonicSort(n) = t_{1-2}(n) + t_3(n) · log n + t_{4-6}(n) · (log n (log n + 1))/2

Note that this equation gives an exact expression for the time consumption for any value of n, where n is a power of 2.
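The Stage/Step structure above corresponds to the usual iterative formulation of the bitonic network. A compact sequential Python emulation (my own sketch, for n a power of 2; each pass of the inner loop plays the role of one comparator column):

```python
def bitonic_sort(a):
    """Sort a list whose length is a power of two by emulating
    Batcher's bitonic network, one comparator column at a time."""
    n = len(a)
    stage_size = 2
    while stage_size <= n:          # Stage: size of the bitonic sequences
        step = stage_size // 2
        while step >= 1:            # Step: comparator distance in this column
            for i in range(n):
                j = i ^ step        # partner input line of the comparator
                if j > i:
                    ascending = (i & stage_size) == 0
                    if (a[i] > a[j]) == ascending:
                        a[i], a[j] = a[j], a[i]
            step //= 2
        stage_size *= 2
    return a
```

The two while loops run log n · (log n + 1)/2 times in total, matching the column count in the time expression above.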

Cole's Parallel Merge Sort Algorithm

Description

Cole [Col] has invented an ingenious version of parallel merge sorting, in which this merging is done in O(1) parallel time.

Alan Gibbons and Wojciech Rytter, in Efficient Parallel Algorithms [GR]

Richard Cole presented two versions of the parallel merge sort algorithm: one for the CREW PRAM model, and a more complex version for the EREW PRAM model [Col]. Throughout this thesis we are considering the CREW variant,

without the modification presented at the bottom of the page in [Col]. The EREW variant is substantially more complex.

A revised version of the original paper was published in the SIAM Journal on Computing [Col]. This paper has been used as the main reference for my study and implementation of the CREW PRAM variant of the algorithm. Cole also wrote a technical report about the algorithm [Col], but has informally expressed that it is less good than the SIAM paper [Col].

This section explains Cole's CREW PRAM parallel merge sort algorithm. Section describes the most important decisions that were made during the implementation. This is followed by pseudo code for the main parts of the algorithm, along with an exact analysis of the time consumption.

Motivation for the Description

Many researchers in the theory community refer to Cole's algorithm as simple. However, the effort needed to get the level of understanding that is required for implementing the algorithm is probably counted in days, and not hours, for most of us. It is therefore natural to use a top down approach in describing the algorithm. The reader may consult the SIAM paper [Col] for some further details, or an alternative presentation. The description found in the book Efficient Parallel Algorithms by Gibbons and Rytter [GR] is not completely consistent with Cole's paper, but may add some understanding.

Main Principles

Cole's parallel merge sort assumes n distinct items. These items are distributed one per leaf in a complete binary tree; thus it is assumed that n is a power of 2. At each node in the tree, the items in the left and right subtrees of that node are merged. The merging starts at the leaves and proceeds towards the top of the tree. Before the merging in a node is completed, samples of the items received so far are sent further up the tree. This makes it possible to merge at several levels of the tree simultaneously, in a pipelined fashion.

As a motivation, let us consider two other parallel sorting algorithms based on an n-leaf complete binary tree.

Two Simpler Ways of Merging In a Binary Tree

One processor in each node

Perhaps the simplest way to sort n items initially distributed on the n leaf nodes of a binary tree is to assign one processor to each node, and to merge subsequences level by level, starting at the bottom of the tree. At the first stage, the nodes one level above the leaf nodes will merge two sequences of length 1 into one sequence of length 2. In the next stage, the nodes one level higher up will merge two such sequences into one sequence of length 4, and so on. The degree of parallelism is restricted to the number of nodes in the active level of the tree. Worse, though, as the algorithm proceeds towards the top of the tree, the work to be done in each node increases, while the parallelism becomes smaller. At the top node, two sequences of length n/2 are merged by one single processor. This last stage alone takes O(n) time, and this form of tree-based merge sort is definitely not fast.
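A sequential Python sketch of this level-by-level scheme makes the bottleneck explicit: the last iteration is a single merge of two lists of length n/2 (helper names are mine; heapq.merge plays the role of one node's merging):

```python
from heapq import merge

def tree_merge_sort(items):
    """Merge level by level up a complete binary tree: n leaves,
    then n/2 sorted pairs, n/4 sorted quadruples, ..., one sorted list.
    Assumes len(items) is a power of two."""
    level = [[x] for x in items]          # the leaf nodes
    while len(level) > 1:
        # All nodes on one level could run "in parallel", but the final
        # iteration is one merge of two length-n/2 lists: O(n) time.
        level = [list(merge(level[i], level[i + 1]))
                 for i in range(0, len(level), 2)]
    return level[0]
```

Each while-loop iteration corresponds to one level of the tree becoming active.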

Parallel processing inside each node

Cole gives a motivating example that is similar to the algorithm just sketched [Col]. The computation proceeds up the tree level by level, from the leaves to the root. Each node merges the sorted sets computed at its children. But, as Cole points out, merging in a node may be done by the O(log log n) time, n-processor merging algorithm of Valiant [Val]. In this case we also have parallelism inside the nodes at the active level. However, the merging is still done level by level, resulting in O(log n log log n) time consumption.

Cole's Observation: The Merges at the Different Levels of the Tree Can be Pipelined

Borodin and Hopcroft have proved that there is an Ω(log log n) lower bound for merging two sorted arrays of n items using n processors [BH]. Therefore, as Cole points out, it is not likely that the level by level approach may lead to an O(log n) time sorting algorithm.

Cole's algorithm is based on an O(log n) time merging procedure, which is slower than the merging algorithm of Valiant. The main principle is described by the following simpler, but similar, merging procedure:

Cole's log n merging procedure:

The problem is to merge two sorted arrays of n items. We proceed in log n stages. In the ith stage, for each array, we take a sorted sample of 2^i items, comprising every (n/2^i)th item in the array. We compute the merge of these two samples. [Col]
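Read literally, the procedure can be sketched as follows in Python; here every stage simply re-merges its samples from scratch, whereas Cole's point is that, given the previous stage's result, each stage can be done in O(1) parallel time:

```python
def staged_merge(a, b):
    """Merge two sorted length-n arrays (n a power of two) in log n stages.
    Stage i takes every (n // 2**i)-th item of each array as a sample and
    merges the two samples; the last stage merges the full arrays."""
    n = len(a)
    step = max(n // 2, 1)    # step = n / 2**i for i = 1, 2, ..., log n
    result = []
    while step >= 1:
        sample_a = a[step - 1::step]   # every step-th item (1-indexed)
        sample_b = b[step - 1::step]
        result = sorted(sample_a + sample_b)  # the stage merge, done naively here
        step //= 2
    return result
```

The sketch performs the same stages as the quoted procedure, but without the constant-time trick that makes Cole's version fast.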

Cole made two key observations.

Merging in constant time:
Given the results of the merge from the (i − 1)st stage, the merge in the ith stage can be done in O(1) time.

Since we have log n levels in the tree, using this merging procedure in the level by level approach gives an O(log² n) time sorting algorithm. The second observation explains how the use of a slower merging procedure may result in a faster sorting algorithm.

The merges at the different levels of the tree can be pipelined:
This is possible since merged samples made at level l of the tree may be used to provide samples of the appropriate size for merging at the next level above l, without losing the O(1) time merging property.

This may need some further explanation. Consider node u at level l (see Figure). In the level by level approach, as soon as nodes v and w store a sorted sequence containing all the items in the subtrees rooted at v and w, the merging may start at level l. The merging in node u takes log k stages, where k is the size of the sorted sequences in nodes v and w. Each of these stages may be done in constant time, due to the systematical way

Figure: An arbitrary node u at level l, its two child nodes v and w, and its parent node x at the level above.



of sampling the sorted sequences in v and w. When node u has completed the merging, the merging may start at node x.

In the pipelined approach, the nodes are allowed to start much earlier on the merging procedure. Once node u contains enough items, it starts to send samples to its parent node x. These samples are taken from the sequence stored in node u, which in this case may be only a sample of the items initially stored in the leaves of the subtree rooted at u. Node x follows the same merging procedure: as soon as it contains enough items, it starts sending samples to its parent node, and so on. It is shown in [Col] that this way of letting the nodes start to merge sorted subsequences from their left and right child nodes, before these two nodes have finished their merging, may be implemented without destroying the possibility of doing each merge stage in constant time.

More Details

This section gives a more detailed description of Cole's parallel merge sort algorithm. It attempts to provide a thorough understanding by also explaining some details, vital for an implementation, which are not given in Cole's description [Col].

In stage i, the samples from v and w may be merged in constant time because the array currently stored in node u, which was obtained in stage i − 1, is a so-called good sampler for the new samples from v and w. The term good sampler is used by Gibbons and Rytter [GR], and reflects that the old array may be used to split the new array into roughly equal sized parts, in the following manner. Informally: if the items in the new and bigger array are inserted at their right positions (with respect to sorted order) in the old array, a maximum of three items from the new array will be inserted between any two neighbour items in the old array. Full details in [Col].

The Task of EachNodeu

Eachnodeu should pro duce a list Lu which is the sorted list containing

the items distributed initially at the leaves of the subtree with u as its top

no de ro ot During the progress of the algorithm eachnode u stores an

array Upu which is a sorted subset of Lu Initially all leaf no des y have

Upy Ly At the termination of the algorithm the top no de of the

whole tree t has UptLt which is the sorted sequence In the progress

of the algorithm Upu will generally contain a sample of the items in Lu

Upu SampleUpv SampleUpw and NewUpu

Before we proceed, we need some simple definitions. A node u is said to be external if |Up(u)| = |L(u)|; otherwise it is an inside node. Initially, only the leaf nodes are external. In each stage of the algorithm, a new array called NewUp(u) is made in each inside node u. NewUp(u) is formed in the following manner (see Figure):

Phase 1: Samples of the items in Up(v) and Up(w) are made and stored in the arrays SampleUp(v) and SampleUp(w), respectively.

Phase 2: NewUp(u) is made by merging SampleUp(v) and SampleUp(w) with the help of the array Up(u).

For all nodes, the NewUp(u) array made in stage i is the Up(u) array at the start of stage i + 1. At external nodes, Phase 2 is not performed, since these nodes have already produced their list L(u).

Phase 1: Making Samples

The samples SampleUp(u) are made in parallel by all nodes u, according to the following rules:

If u is an inside node, SampleUp(u) is the sorted array comprising every fourth item in Up(u). If Up(u) contains fewer than four items, SampleUp(u) becomes empty.

At the first stage when u is external, SampleUp(u) is made in the same way as for inside nodes. At the second stage as external node, SampleUp(u) is every second item in Up(u), and in the third stage SampleUp(u) is every item in Up(u). SampleUp(u) is always in sorted order.
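The sampling rules above may be sketched as follows. This is a Python sketch, not the actual simulator code of the implementation; in particular, exactly which positions are picked (here positions 4, 8, 12, ... counted from 1) is an assumption.

```python
def sample_up(up, stage_as_external=0):
    """Make SampleUp(u) from the sorted array Up(u).

    stage_as_external: 0 means u is an inside node; 1 means first stage
    as external (same rate as inside nodes); 2 -> every second item;
    3 -> every item.  The positions picked are an assumption.
    """
    rate = {0: 4, 1: 4, 2: 2, 3: 1}[stage_as_external]
    # Slicing keeps sorted order; fewer than `rate` items gives an
    # empty sample, exactly as stated in the rules above.
    return up[rate - 1::rate]
```

Note that an inside node with |Up(u)| < 4 produces an empty sample, as required.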

Figure: Computation of NewUp(u) in two main phases. Phase 1 (MakeSamples): SampleUp(v) and SampleUp(w) are made from Up(v) and Up(w). Phase 2 (MergeWithHelp): they are merged into NewUp(u) with the help of Up(u).

The algorithm has 3 log n stages

Initially, only the leaves are external nodes. In the third stage as external, a node v has used all the items in L(v) to produce SampleUp(v) in Phase 1, and its parent node u has merged SampleUp(v) and SampleUp(w) into NewUp(u) in Phase 2. Node u has now received all the items in L(u), and the child nodes v and w have finished their tasks and may terminate. In the next stage, the parent node u will have Up(u) = L(u), and this is the first stage as external for node u.

We see that the level with external nodes moves up one level every third stage. After 3 log n stages, the top node t becomes external; it has Up(t) = L(t), and the algorithm may stop (see the corresponding lemma in [Col]).

The size of Up(u) is doubled in each stage

The size of the samples made by the external nodes doubles in each stage, because the sample rate is halved in each stage. As a consequence, the size of Up(u) in the inside nodes one level higher also doubles in each stage. At this level, the sample rate is fixed (every fourth item) as long as the nodes are inside nodes. But since |Up(u)| doubles in each stage, the samples made at this level will also double in each stage. By induction, we see (intuitively) that the size of Up(u) doubles in each stage for each inside node with nonempty Up(u).

Phase 2: Merging in O(1) Time

The most complicated part of Cole's algorithm is the merging of SampleUp(v) and SampleUp(w) into NewUp(u) in O(1) time. It constitutes the major part of the algorithm description in [Col], and a large fraction of the code in the implementation. The merging is described in the following; proofs and some more details are found in [Col].

Using ranks to compute NewUp(u)

The merging is based on maintaining a set of ranks. Assume that A and B are sorted arrays. Informally, we define A to be ranked in B, denoted A → B, if we for each item e in A know how many items in B are smaller than or equal to e. Knowing A → B means that we know where to insert an item from A in B to keep B sorted.

When using this concept in an implementation, it is necessary to make the following distinction. The rank of an item x in a sorted array S is denoted R(x, S). When using R(x, S) to represent the correct position (with respect to sorted order) of x in S, it is important to know whether x is in S or not, as made clear in the following observation.

Observation

Assume that the positions of the items in the sorted array S are numbered 1, 2, ..., |S|. If x ∈ S, R(x, S) represents the correct position of x in S. However, if x ∉ S, R(x, S) gives the number of items in S which are smaller than x; thus, if x is to be inserted in S, its correct position will be R(x, S) + 1.
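With 1-indexed positions and distinct items (as assumed throughout the algorithm), the rank function and the observation above can be expressed directly. A minimal Python sketch:

```python
import bisect

def rank(x, s):
    """R(x, S): the number of items in the sorted array S that are <= x."""
    return bisect.bisect_right(s, x)

# Observation: if x is in S, R(x, S) is the (1-indexed) position of x in S;
# if x is not in S, its correct insertion position would be R(x, S) + 1.
```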

At the start of each stage, we know Up(u), SampleUp(v) and SampleUp(w). In addition, we make the following assumption.

Assumption

At the start of each stage, for each inside node u with child nodes v and w, we know the ranks Up(u) → SampleUp(v) and Up(u) → SampleUp(w).

The making of NewUp(u) may be split into two main steps: the merging of the samples from its two child nodes, and the maintaining of the Assumption. The calculation of the ranks constitutes a large fraction of the total time consumption of Cole's algorithm. However, as we will see, they are crucial for making it possible to merge in constant time.

Phase 2, Step 1: Merging

We want to merge SampleUp(v) and SampleUp(w) into NewUp(u) (see Figure). This is easy if we, for each item in SampleUp(v) and SampleUp(w), know its position in NewUp(u). The merging is then done by simply writing each item to its right position.

Consider an arbitrary item e in SampleUp(v) (see Figure). We want to compute the position of e in NewUp(u), i.e., the rank of e in NewUp(u), denoted R(e, NewUp(u)). Since NewUp(u) is made by merging SampleUp(v) and SampleUp(w), we have

R(e, NewUp(u)) = R(e, SampleUp(v)) + R(e, SampleUp(w))

The problem of merging has been reduced to computation of R(e, SampleUp(v)) and R(e, SampleUp(w)). Since e is from SampleUp(v), R(e, SampleUp(v)) is just the position of e in SampleUp(v). Computing the rank of e in SampleUp(w) is much more complicated. Similarly, for items e in SampleUp(w) it is easy to compute R(e, SampleUp(w)), but difficult to compute R(e, SampleUp(v)).
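Given the cross ranks, Step 1 is just one write per item; each loop iteration below is independent and could be executed by its own processor on a CREW PRAM. This is a Python sketch with hypothetical argument names, not the implementation itself:

```python
def merge_by_ranks(sample_v, sample_w, rank_in_other_v, rank_in_other_w):
    """Merge two sorted arrays of distinct items when the cross ranks
    are known.  rank_in_other_v[k] is R(sample_v[k], SampleUp(w)), and
    rank_in_other_w[k] is R(sample_w[k], SampleUp(v))."""
    new_up = [None] * (len(sample_v) + len(sample_w))
    # R(e, NewUp(u)) = R(e, SampleUp(v)) + R(e, SampleUp(w)):
    # own (1-indexed) position plus rank in the other array.
    for pos, (e, r_other) in enumerate(zip(sample_v, rank_in_other_v), start=1):
        new_up[pos + r_other - 1] = e
    for pos, (e, r_other) in enumerate(zip(sample_w, rank_in_other_w), start=1):
        new_up[pos + r_other - 1] = e
    return new_up
```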

Figure: The use of ranks in Cole's O(1) time merging procedure. Knowing the position of the item e in SampleUp(v), R(e, SampleUp(v)) = r1, and its position in SampleUp(w), R(e, SampleUp(w)) = r2, the position of e in NewUp(u), R(e, NewUp(u)), is simply r1 + r2 in this example.

 

The computation of the ranks SampleUp(v) → SampleUp(w) and SampleUp(w) → SampleUp(v) is termed computation of cross ranks. In the mass of details and levels which follows, the Figure showing the main structure of one stage may be used as an aid for keeping track of where we are in the algorithm.

Computing Cross Ranks

Up(u) and the Assumption may be of great help in computing the cross ranks. Consider the Figure and the item e from SampleUp(v). We want to find the position of e in SampleUp(w), as if it were to be inserted, according to sorted order, in that array.

Note that it is only the situation where e is from SampleUp(v) that is described in this and the following sections. The other case, e from SampleUp(w), is handled in a completely symmetric manner.

First, for each item e in SampleUp(v), we compute its rank in Up(u). This computation is done for all the items in SampleUp(v) in parallel, and is described as Substep 1 below.



R(e, Up(u)) gives us the items d and f in Up(u) which would straddle e

(Let x, y and z be three items with x ≤ z. We say that x and z straddle y if x ≤ y and y ≤ z, i.e., x ≤ y ≤ z.)

One stage in Cole's parallel merge sort algorithm:
    Phase 1: Making Samples
    Phase 2: Merging in O(1) time
        Step 1: Merging
            Computing cross ranks
                Substep 1
                Substep 2
            Make NewUp(u)
        Step 2: Maintaining Ranks
            from sender node
            from other node

Figure: Main structure of one stage in Cole's algorithm.

if e had to be inserted in Up(u). Further, the Assumption gives R(d, SampleUp(w)) and R(f, SampleUp(w)), the right positions for inserting d and f in SampleUp(w). These ranks are called r and t in the Figure. Informally, since e ∈ [d, f⟩, the right position for e in SampleUp(w) is bounded by the positions (ranks) r and t. It can be shown that r and t always specify a range of at most three positions, and the right position, R(e, SampleUp(w)), may thus be found in constant time. This is explained in more detail as Substep 2 below.

Before the two substeps are explained in more detail, we mention an analogy which may help one's understanding. Imagine that you are supposed to navigate to a certain position (R(e, SampleUp(w))) far away in an unknown area (SampleUp(w)). You know where you are, and have a detailed map covering the entire area. The normal, smart way to solve such a problem is to split it into two succeeding steps: coarse and fine navigation. Coarse navigation is done by using an imagined high-level map where many details have been omitted. This map (Up(u)) is simpler than the full detail map (SampleUp(w)), but it is good enough (a good sampler) for navigating to a small area (given by r and t) containing the exact position. Fine navigation is then done by detailed searching in this small area, comparing e with the elements given by r and t.


Figure: Substep 1: Computing SampleUp(v) → Up(u) by using Up(u) → SampleUp(v).

Substep 1: Computing SampleUp(v) → Up(u)

SampleUp(v) → Up(u) is computed by using Up(u) → SampleUp(v). Consider an arbitrary item i1 in Up(u) (see Figure). The range [i1, i2⟩, where i2 is the item following i1 in Up(u), is defined to be the interval induced by item i1. We want to find the set of items in SampleUp(v) which are contained in [i1, i2⟩, denoted I(i1).

The items in I(i1) have rank in Up(u) equal to b, where b is the position of i1 in Up(u), and the array positions are numbered starting with 1, as shown in the figure. Once I(i1) has been found, a processor associated with item i1 may assign the rank b to the items in the set. Simultaneously, a processor associated with i2 should assign the rank c to I(i2).

 

Computing I(i1)

The precise calculations necessary for finding I(i1) are not explained in Cole's SIAM paper [Col]. However, we must understand how it can be done in order to implement the algorithm. We want to find all items x in SampleUp(v) with R(x, Up(u)) = b. These items must satisfy

i1 ≤ x < i2

Let item(j) denote the item stored in position j of SampleUp(v). The Assumption gives R(i1, SampleUp(v)) = r and R(i2, SampleUp(v)) = s, which implies

item(r) ≤ i1 and item(s) ≤ i2

We have the following observation.

Observation

(See Figure.) If there exist items in SampleUp(v) between, and not including, positions r and s, they must have R(·, Up(u)) = b.

Proof: Let item(r+1) denote an item which is to the right of item(r) and to the left of item(s) (see Figure). The definition of the rank r implies that i1 < item(r+1). Hence R(item(r+1), Up(u)) ≥ b. Let item(s−1) denote an item which is to the left of item(s) and to the right of item(r). The requirement of distinct items implies item(s−1) < item(s), and the definition of the rank s implies item(s) ≤ i2. Distinct items then give item(s−1) < i2, so we have R(item(s−1), Up(u)) < c. Since b = c − 1, it follows that all items in SampleUp(v) in positions r+1, ..., s−1 have rank b in Up(u). □

The items in positions r and s must be given special treatment; we have:

item(r) = i1 ⟹ R(item(r), Up(u)) = b
item(r) < i1 ⟹ R(item(r), Up(u)) < b
item(s) = i2 ⟹ R(item(s), Up(u)) = c
item(s) < i2 ⟹ R(item(s), Up(u)) < c, which here means R(item(s), Up(u)) = b

To conclude: I(i1) is found by comparing item(r) with i1 and item(s) with i2. A processor associated with item i1 may do these simple calculations, and assign the rank b to all items in I(i1), in constant time. This is possible because it can be proved that I(y), for any item y in Up(u), will contain at most three items.
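The calculation of I(i1) outlined above can be written out as follows. This is a sequential Python sketch of the idea; in the real algorithm each interval is O(1) work for the processor associated with it, and the sentinel items ±∞ play the role of imagined boundary items:

```python
def ranks_sample_in_up(up_u, sample_v, up_ranks_in_sample):
    """Substep 1: compute SampleUp(v) -> Up(u) from Up(u) -> SampleUp(v).

    up_u, sample_v: sorted arrays; up_ranks_in_sample[b] = R(up_u[b], sample_v)
    (0-indexed array positions, ranks as defined in the text).
    Returns a list with R(x, Up(u)) for every x in sample_v.
    """
    out = [None] * len(sample_v)
    # Imagined items -inf/+inf delimit the first and last induced interval.
    bounds = [float("-inf")] + list(up_u) + [float("inf")]
    ranks = [0] + list(up_ranks_in_sample) + [len(sample_v)]
    for b in range(len(bounds) - 1):
        lo, hi = bounds[b], bounds[b + 1]   # interval [lo, hi) induced by lo
        r, s = ranks[b], ranks[b + 1]       # ranks of lo and hi in sample_v
        # Only a few candidate positions need testing (cover property).
        for j in range(max(r - 1, 0), s):
            if lo <= sample_v[j] < hi:
                out[j] = b                  # b items of Up(u) are <= sample_v[j]
    return out
```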

Substep 2: Computing SampleUp(v) → SampleUp(w)

See Figure. As described above, R(e, Up(u)), which was computed in Substep 1, gives the straddling items d and f in Up(u). Further, the Assumption gives the ranks r and t. We want the exact position of e if it had to be inserted in SampleUp(w). Which items in SampleUp(w) must be compared with e? The question is answered by the following two observations.



Cole proves that Up(u) is a so called cover for SampleUp(v) [Col]. (Cover is the same concept as good sampler, mentioned in an earlier footnote.) Up(u) is a cover for SampleUp(v) if each interval [i1, i2⟩ induced by an item i1 from Up(u) contains at most 3 items from SampleUp(v) (see Figure). Note that the definition allows an imagined item i1 = −∞ to the left of the first element of Up(u), and similarly i2 = +∞ to the right of the last element.



Figure: Substep 2: Computing SampleUp(v) → SampleUp(w) by using SampleUp(v) → Up(u).

Observation

All items in SampleUp(w) to the left of, and including, position r are smaller than item e.

Proof: Let item(x) denote the item in SampleUp(w) at position x. Since d and f straddle e, we have d ≤ e. The rank r implies that item(r) ≤ d, so we have item(r) ≤ e. The assumption of distinct items implies that SampleUp(v) and SampleUp(w) never will contain the same element, hence e ≠ item(r). This implies that item(r) < e. □

Observation

All items in SampleUp(w) to the right of position t are larger than item e.

Proof: The rank t implies that f < item(t+1), and we know that e < f. Therefore e < item(t+1). □

How many items must be compared with e? The two observations tell us that e must be compared with the items from SampleUp(w) with positions in the range [r+1, t]. Since Up(u) is a cover for SampleUp(w), we know that [d, f⟩ contains at most three items from SampleUp(w). But the set of items in SampleUp(w) contained in [d, f⟩ is not necessarily the same as the items with positions in the range [r+1, t]. However, we can prove the following observation by using the cover property of [d, f⟩.

Observation

(See Figure.) Item e must be compared with at most three items from SampleUp(w), starting with item(r+1), going to the right, but not beyond item(t). In other words, the items item(r+i) for 1 ≤ i ≤ min(3, t − r).

Proof: First, let us consider which items in SampleUp(w) may be contained in [d, f⟩. We have four possible cases:

1. d = item(r) ⟹ item(r) ∈ [d, f⟩
2. d ≠ item(r) ⟹ item(r) < d ⟹ item(r) ∉ [d, f⟩
3. f = item(t) ⟹ item(t) ∉ [d, f⟩
4. f ≠ item(t) ⟹ item(t) < f ⟹ item(t) ∈ [d, f⟩

which imply four cases for the placement of [d, f⟩ in SampleUp(w):

(i) item(r) ∈ [d, f⟩ and item(t) ∈ [d, f⟩
(ii) item(r) ∈ [d, f⟩ and item(t) ∉ [d, f⟩
(iii) item(r) ∉ [d, f⟩ and item(t) ∉ [d, f⟩
(iv) item(r) ∉ [d, f⟩ and item(t) ∈ [d, f⟩

Case (i): The cover property implies that |{item(r), ..., item(t)}| ≤ 3. By deleting item(r), we get |{item(r+1), ..., item(t)}| ≤ 2.

Case (ii): The cover property gives |{item(r), ..., item(t−1)}| ≤ 3, which implies |{item(r+1), ..., item(t)}| ≤ 3.

Case (iii): |{item(r+1), ..., item(t−1)}| ≤ 3, so |{item(r+1), ..., item(t)}| ≤ 4. At first sight, one may think that this case makes it necessary to compare e with 4 items in SampleUp(w). However, this case does only occur when item(t) = f (see case 3 above), and since e < f, we know that e < item(t), so there is no need to consider item(t). We need only consider the items in {item(r+1), ..., item(t−1)}, which contains at most 3 items.

Case (iv): The cover property implies |{item(r+1), ..., item(t)}| ≤ 3.

Thus we have shown that e must be compared with at most three items in all possible cases. □

Since these (at most three) items in SampleUp(w) are sorted, two comparisons are sufficient to locate the correct position of e. Therefore, R(e, SampleUp(w)) can be computed in O(1) time.



In this case, the Figure is not correct: at most one item, item(r+1), may exist between item(r) and item(t).
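Substep 2 thus amounts to a constant number of comparisons per item. A Python sketch (positions are 1-indexed in the prose, 0-indexed in the code; the argument names are hypothetical):

```python
def rank_in_other_sample(e, sample_w, r, t):
    """R(e, SampleUp(w)) for an item e of SampleUp(v).

    r = R(d, SampleUp(w)) and t = R(f, SampleUp(w)), where d and f are
    the items of Up(u) straddling e (known from Substep 1 and the
    Assumption).  Only the items item(r+1) .. item(r + min(3, t - r))
    need examining, so the work is O(1) regardless of the array sizes.
    """
    rnk = r                               # everything up to position r is < e
    for j in range(r, min(t, r + 3)):     # 0-indexed candidate positions
        if sample_w[j] <= e:
            rnk = j + 1                   # item(j+1) <= e, so the rank grows
    return rnk
```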

When Substep 2 has been done in parallel for each item e in SampleUp(v) and SampleUp(w), every item knows its position (rank) in NewUp(u), by the equation R(e, NewUp(u)) = R(e, SampleUp(v)) + R(e, SampleUp(w)), and may write itself to the correct position of that array.

Phase 2, Step 2: Maintaining Ranks

This is only half the story. We assumed (in the Assumption above) that the ranks Up(u) → SampleUp(v) and Up(u) → SampleUp(w) are available at the start of each stage. We must therefore compute NewUp(u) → NewSampleUp(v) and NewUp(u) → NewSampleUp(w) at the end of each stage, so that the assumption is valid at the start of the next stage. (NewSampleUp(x) is the SampleUp(x) that will be produced in the next stage.)

Doing this in O(1) time is about as complicated as the merging described above. We explain the computation of NewUp(u) → NewSampleUp(v); NewUp(u) → NewSampleUp(w) is computed in an analogous way.

Consider an item e in NewUp(u). The computation of R(e, NewSampleUp(v)) is split into two cases. The first case occurs when e came from SampleUp(v) (or, during the computing of R(e, NewSampleUp(w)), when e came from SampleUp(w)). We term this case computation of rank in NewSampleUp from sender node, and it is described below. The other case is called computation of rank in NewSampleUp from other node, and is described thereafter.

First, we solve the from sender node case for all items in NewUp(u). The results of this computation are then used to solve the from other node case.

Computing the Rank in NewSampleUp From Sender Node

We know Up(u) → SampleUp(v) and Up(u) → SampleUp(w). NewUp(u) contains exactly the items in SampleUp(v) and SampleUp(w). Therefore, the rank R(·, NewUp(u)) for all items in Up(u) is easily computed as the sum of R(·, SampleUp(v)) and R(·, SampleUp(w)). This is done for all nodes in parallel.

SampleUp(v) is made from Up(v) by taking every fourth item in that array. (Every fourth item is the normal case; an implementation must also handle the two other sampling rates, which may occur when node v is external.) Thus, knowing the position of an item in SampleUp(v), it is easy to find the position of the corresponding (same) item in Up(v). Since we have just computed Up(v) → NewUp(v), we may compute SampleUp(v) → NewUp(v) by going through Up(v).

For each item in SampleUp(v), we have now found R(·, NewUp(v)). Similarly as for SampleUp(v), NewSampleUp(v) is made by taking every fourth item from NewUp(v). Informally, R(·, NewSampleUp(v)) is therefore simply R(·, NewUp(v)) divided by four.

At this point, we know SampleUp(v) → NewSampleUp(v), and that the item e in NewUp(u) came from SampleUp(v). To find R(e, NewSampleUp(v)), it remains to locate the corresponding (same) item e in SampleUp(v). This may easily be done if we, for each item in NewUp(u), record the sender address (the position of e in SampleUp(v)).
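Informally, the "divided by four" looks as follows. A Python sketch; that the sample occupies positions 4, 8, 12, ... of NewUp(v) is the same assumption as before, and sample_rate would be 2 or 1 near external nodes:

```python
def rank_in_new_sampleup(pos_in_new_up, sample_rate=4):
    """From sender node: R(e, NewSampleUp(v)) for the item e at 1-indexed
    position pos_in_new_up of NewUp(v).

    With NewSampleUp(v) taken as the items at positions sample_rate,
    2*sample_rate, ... of NewUp(v), the number of sample items <= e
    is just an integer division (distinct items assumed).
    """
    return pos_in_new_up // sample_rate
```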

Computing the Rank in NewSampleUp From Other Node

See Figure, which illustrates the case when e is from SampleUp(w). We want to find the correct position of item e if it had to be inserted in NewSampleUp(v). The following assumption will help us.

Assumption

(See Figure.) In each stage, at the start of Step 2 (Maintaining Ranks), for each item e in NewUp(u) that came from SampleUp(w), we know the straddling items d and f from SampleUp(v). (And vice versa for items from SampleUp(v).)

The item d is the first item in NewUp(u) from the other node (SampleUp(v)) to the left of e, and similarly f is the first item to the right. Further, we know the ranks r and t of d and f in NewSampleUp(v); they were computed in the from sender node case. Compare this situation with Substep 2 described above: the two figures illustrate the same method for computing the correct position of e in NewSampleUp(v) and SampleUp(w), respectively. R(e, NewSampleUp(v)) is found by comparing item e with at most three items in NewSampleUp(v) (recall the Observation above). Since these three items are sorted, two comparisons will suffice.

It remains to explain how to make the Assumption valid. Consider the earlier Figure, which illustrates the calculation of R(e, SampleUp(w)) for an item e in SampleUp(v). (Do not consider the items d and f in that figure; they have a slightly different meaning than d and f in the Assumption above.)

(The same precautions as mentioned in the previous footnote must be taken here. In addition, we must remember to use the sampling rate that will be used in the next stage.)

Figure: Maintaining ranks, the so called from other node case. The straddling items d and f, stored at the end of Step 1, are used to compute the rank of e in NewSampleUp from the other node. The position of item e in NewSampleUp(v) is bounded by the ranks r and t, which are computed in the from sender node case.

Since we know R(e, SampleUp(w)) (still the earlier Figure), we know exactly the two items from the other set that straddle e. For an item e from SampleUp(w), a completely symmetric calculation gives R(e, SampleUp(v)), and therefore implicitly the two items d and f from SampleUp(v) that straddle e. These are the items d and f needed in the figure above. Thus, the Assumption is kept valid if we, at the end of Step 1 (the making of NewUp(u)), record the positions (ranks) of d and f in NewUp(u).

Cole's Parallel Merge Sort Algorithm: Implementation

This section describes those aspects of the implementation of Cole's parallel merge sort algorithm that are most relevant to parallel processing. Processor activation, how the processors are used in parallel, the communication of local variables between the processors, and CREW PRAM programming are emphasised. Programming details that are well known from sequential programming are given less attention. The text assumes a reasonable knowledge of the algorithm description given in the preceding section.

Dynamic Processor Allocation

A key to the implementation of Cole's algorithm is to understand how the processors should be allocated to the work which has to be done in the various nodes of the tree during the computation.

Cole's description [Col] omits many details concerning how the work is distributed on the processors. Program designers may regard this lack of detail as a deficiency, but they should not fail to see its advantages. Cole's description is detailed enough to explain the main principles of the processor usage. Still, it is at a sufficiently high level to leave the programmer free to decide on the details of how the processors are used.

The main principles of the processor usage are described in this section. The full details of the processor usage in the actual implementation are explained in a later section.

A growing number of active levels

When designing the processor allocation, it is natural to start by finding out where and when there is work to do. As described above, at the end of every third stage, the lowest active level moves one level up towards the top. Further, the first samples received by an inside node u will have size equal to one. Thus |Up(u)| = 2 in the first stage when node u is active. Node u will not produce samples during this stage, because Up(u) is too small. However, in the next stage, the size of Up(u) will have been doubled, and node u will produce samples for its parent node. We see that the highest active level will move one level upwards every second stage.

Because of the difference in "speed" of the lowest and highest active levels in the tree, the total number of active levels will increase during the computation, until the top level has been reached. This is illustrated in the Figure.

The Number of Items in the Arrays

Cole gives a detailed analysis of the number of items in the Up, NewUp and SampleUp arrays. The size of the arrays will be different from stage to stage. Consider a node u, with nonempty Up(u), and child nodes v and w which are not external. Since |Up(u)| doubles in each stage, we have

Figure: The active levels of the binary tree used by Cole's algorithm for a small example, from the leaf nodes up to the root node, stage by stage. M represents that inside nodes at that level perform merging in the stage; (i) represents that nodes at that level are in their ith stage as external nodes.



jUpv j jUpw j jNewUp uj jSampleUp v j jSampleUp w j



 

jUpv j which gives jUpuj jUpv j Also the number of nodes at

 

us level is the numb er of no des at v s level divided by This implies that



the total size of the Up arrays at us level is of the total size of the Up

arrays at v s level if v is not external This is also true for the rst stage

in which no des at v s level are external But for the second stage the no des

at v s level pro duce samples bypicking every second item which implies



jUpuj jUpv j Similarly for the third stage wehave jUpuj jUpv j



To summarise

Observation

The total size of the Up arrays at u's level is 1/4 of the total size of the Up arrays at v's level if v is not external, or if v is in its first stage as external. The fraction is 1/2 if v is in its second stage as external, and it is 1 if v is in its third stage as external.



This gives the following upper bound for the total size of the Up arrays:

Σ_u |Up(u)| ≤ n + n + (1/4)n + (1/16)n + ⋯ (= 2⅓ n)

(Σ_u denotes a summation over all active nodes u. The right part of the equation has been put into parentheses to remind the reader that the = sign is only valid when n is infinitely large.)

The total size of the SampleUp arrays for a given level equals 1/4 of the total size of the Up arrays at the same level in the normal case. As before, we have two exceptions, given by the different sampling rates used when nodes

are in their second or third stage as external. The third stage as external gives the largest total size, so we have the following upper bound:

Σ_u |SampleUp(u)| ≤ n + (1/4)n + (1/16)n + ⋯ (= 1⅓ n)

The NewUp arrays contain the items in the SampleUp arrays. They are located one level higher up in the tree, but the total size becomes the same:

Σ_u |NewUp(u)| = Σ_u |SampleUp(u)|

A Sliding Pyramid of Processors

Cole describes that one should have one processor standing by each item in the Up, NewUp and SampleUp arrays. This strategy implies a processor requirement which is bounded above by 2⅓n + 1⅓n + 1⅓n, i.e. slightly less than 5n.

Consider the Up arrays and the expression given for their size above. There are n processors (array items) at the lowest active level, a maximum of n processors at the next level above, a maximum of (1/4)n processors at the level above that, and so on. This may be viewed as a pyramid of processors. Each time the lowest active level moves one level up, the pyramid of processors follows, so that we still have n processors at the lowest active level. Similarly, the NewUp processors and SampleUp processors may be viewed as pyramids of processors that slide up towards the top of the tree during the progress of the algorithm.

Processor Allocation Snapshot

Since the processor allocation is so crucial for the whole implementation, we illustrate it by a simple example with a small problem size n. In this case, the equations above imply a maximum processor requirement which, as discussed above, may only occur in every third stage. Nevertheless, since the main goal for the implementation was to obtain a simple and fast program, not to minimise the processor requirement, we allocate this maximum number of processors once, at the start of the program.

We get three pyramids of processors, as shown in the Figure. The distribution into levels is static; thus, the same processor is always the lowest

Figure: Cole's algorithm: the Up processors, SampleUp processors and NewSampleUp processors distributed into levels.

numbered processor used for the Up arrays one level above the external nodes.

The snapshot Figure shows how these processor pyramids are allocated to the binary tree nodes during one stage. The nodes in the binary tree are numbered as depicted in the Figure. The snapshot illustrates several situations which must be handled.

Not all levels of the processor pyramids are active in every stage (recall the Figure of active levels); the set of active levels changes during the computation. All nodes that are situated at active levels are called active nodes. In some stages, the size of the arrays is smaller than the maximum size at that level. Since we have allocated enough processors to cover the maximum size, there may exist levels with some active and some passive processors, e.g. the bottom level of the SampleUp pyramid in the snapshot. Since the processor pyramids slide towards the top of the binary tree during the computation, the number of nodes covered by a certain level of a processor pyramid will vary. Consequently, the processors must be distributed to the nodes dynamically. As an example, the lowest level of the Up processors may be allocated to four nodes in one stage, and distributed on only two nodes three stages later. Further, since the size of the arrays at a certain level may change between two stages, the allocation of processors to array items within a node must be done at the start of each stage; an Up processor may thus be allocated to different items, in different nodes, in two succeeding stages.

Figure: Snapshot of processor allocation taken during one stage of the example above. For each processor (see the previous figure), it is shown which node the processor is allocated to. Legend: i indicates that the processor is allocated to node i; P indicates that the processor is passive.


Figure: Standard numbering of nodes in the binary tree used throughout this thesis.

CREW PRAM procedure ColeMergeSort;
begin
  Compute the processor requirement (NoOfProcs);
  Allocate working areas;
  Push addresses of working areas and other facts on the stack;
  assign NoOfProcs processors;
  for each processor do begin
    Read facts from the stack;
    InitiateProcessors;
    for Stage := 1 to log n do begin
      ComputeWhoIsWho;
      if Stage > 1 then CopyNewUpToUp;
      MakeSamples;
      MergeWithHelp;
    end;
  end;
end;

Figure: Main program of Cole's parallel merge sort expressed in Parallel Pseudo Pascal (PPP).

Pseudo Code and Time Consumption

Cole's succinct description of the algorithm is at a sufficiently high level to give the programmer freedom to choose between the SIMD or MIMD [Fly] programming paradigms. The algorithm has been programmed in a synchronous MIMD programming style, as proposed for the PRAM model by Wyllie [Wyl]. It is written in PIL, and the program is executed on the CREW PRAM simulator.

We will use the PPP notation to express the most important parts of the implementation in a compact and readable manner. A complete PIL source code listing is provided in Appendix B.

The Main Program

The main program of Cole's parallel merge sort algorithm is shown in the figure above. When the program starts, one single CREW PRAM processor is running, and the problem instance and its size n are stored in the global

Table: Time consumption for the statements in the main program of ColeMergeSort. The entries for some statements are valid only for the later stages; the time used is shorter for some of the early stages (see the Performance section). The t(n) entries are constants, in some cases plus terms proportional to floor(log n) or floor(log NoOfProcs).

memory. The first statements are executed by this single processor alone.

Computing the processor requirement

The maximum processor requirement, here denoted NoOfProcs, is given by the sum of the maximum size of the Up(u), NewUp(u), and SampleUp(u) arrays for all nodes u during the computation. Thus we have

NoOfProcs = Sum over u of |Up(u)| + Sum over u of |NewUp(u)| + Sum over u of |SampleUp(u)|

For a given finite n, the exact calculation of NoOfProcs is done by replacing the inequality sign with equality in the size equations and summing terms as long as n is not smaller than the divisor. A simple way to implement these calculations is to use divide-by loops.

The resulting time consumption for this statement as a function of the problem size is given in the table above as t(n).
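The divide-by loops may be sketched as follows. This is a hypothetical Python sketch, not the PIL code: the exact size equations are not reproduced here, so we simply assume that each pyramid contributes a geometrically decreasing sum of maximum array sizes (halving per level).

```python
def geometric_sum(n, divisor_factor=2):
    # Divide-by loop: sum the terms n div d for d = 1, 2, 4, ...
    # as long as n is not smaller than the divisor.
    total, d = 0, 1
    while n // d >= 1:
        total += n // d
        d *= divisor_factor
    return total

def no_of_procs(n):
    # Illustrative only: assume the Up, NewUp, and SampleUp pyramids
    # each contribute one such sum of maximum array sizes.
    return 3 * geometric_sum(n)
```

For n = 8 the sketch gives 3 * (8 + 4 + 2 + 1) = 45; the real equations use other divisors and constants, but the looping technique is the same.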

Working areas and address broadcasting

When the total number of processors has been computed, it is possible to allocate working areas of the appropriate size in the PRAM global memory. These are dedicated areas which are used to exchange various kinds of information between the processors (see the memory map figure below).

Name of working area                        No. of integer locations
problem instance                            n
problem size (n)
Up, SampleUp, and NewUp arrays              NoOfProcs
ExchangeArea                                NoOfProcs
NodeAddressTable                            n
NodeSizeTable                               n
parameters to SetUp

Figure: Memory map for the implementation of Cole's algorithm. NoOfProcs locations are used to store the contents of the Up, SampleUp, and NewUp arrays: one integer location for each array item (processor). The ExchangeArea is used to communicate array items between the processors. The NodeAddressTable and NodeSizeTable are used to help the processors find the address and size of the arrays of other nodes in the tree.

The next statement describes how central facts, such as the addresses of the working areas, are broadcast from the single master processor. It is done by pushing values on the user stack in the PRAM global memory. When all NoOfProcs processors have been allocated and activated, they may read these data in parallel from the stack. Due to the concurrent-read property of the CREW PRAM, this kind of broadcasting is easily performed in O(1) time.
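The broadcasting pattern may be sketched as follows; a hypothetical Python sketch in which threads stand in for CREW PRAM processors and a shared list stands in for the user stack (the names and values are invented for illustration).

```python
from concurrent.futures import ThreadPoolExecutor

shared_stack = []  # stands in for the user stack in PRAM global memory

def master_setup():
    # The single master processor pushes the central facts once.
    shared_stack.extend([("working_area_base", 1000), ("n", 8)])

def worker(pid):
    # Every processor reads the same locations; on a CREW PRAM all
    # these concurrent reads happen in the same O(1) step.
    facts = dict(shared_stack)
    return pid, facts["n"]

master_setup()
with ThreadPoolExecutor(max_workers=4) as ex:
    results = list(ex.map(worker, range(4)))
```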

Processor allocation and activation

The processors are allocated with the standard procedure described in the CREW PRAM properties. Since NoOfProcs is O(n), the expression given for t(n) in the table is O(log n).

The desire for developing a simple and fast implementation also influenced the structure of processor allocation and activation.

Implementing sliding processors

Recall the description given earlier of how the processors should be allocated to the binary computation tree. When implementing this, it is crucial to split the necessary computation into a static and a dynamic part. The dynamic part, called ComputeWhoIsWho (described below), is executed at the beginning of each stage. Consequently, it must be done in O(1) time to avoid spoiling the overall O(log n) time performance of the algorithm. This is possible since the most difficult part need only be computed once, initially. The static part is represented by the InitiateProcessors statement.

InitiateProcessors, which is executed by all processors in parallel, starts by computing the address in the global memory of the Up, SampleUp, or NewUp array item associated with the processor. It reads the problem size n, computes the number of stages, and decides its own processor type (Up, SampleUp, or NewUp). Then the Up processors read one input number each and place it in their own Up array item.

Then it is time to perform the static part of the processor sliding. It is done level by level for each processor type. It is calculated which level in the pyramid the processor is assigned to. This is counted in number of levels (zero or more) above the lowest active level, which always contains the external nodes. Thus, whether a processor is assigned to an inside node or an external node is also known at this point. Finally, the local processor number at that level (numbered from the left) is computed.

Again, the level-by-level approach is implemented by divide-by loops, giving the logarithmic terms for t(n) shown in the table.
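Assuming that a pyramid's levels halve in size (n processors at the lowest level, n/2 one level up, and so on), the static level computation may be sketched like this (hypothetical Python; the function name and conventions are ours, not PIL's):

```python
def level_and_local_index(p, n):
    """Sketch of the static part of processor sliding.

    Processors 0..n-1 sit at the lowest level, the next n//2 one level
    above, and so on. Returns (number of levels above the lowest level,
    local processor number at that level, counted from the left),
    computed by a divide-by loop.
    """
    level, size, base = 0, n, 0
    while p >= base + size:
        base += size      # skip all processors at the levels below
        size //= 2        # each level holds half as many processors
        level += 1
    return level, p - base
```

For n = 8, processors 0..7 get level 0, processors 8..11 level 1, 12..13 level 2, and 14 level 3.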

The Main Loop: Implementation Details

The log n stages each consist of four main computation steps.

ComputeWhoIsWho

ComputeWhoIsWho starts by calculating the lowest and highest active level in the binary computation tree. Together with the level offset computed initially (InitiateProcessors), this gives the exact level for each processor in the current stage. Knowing the stage and the level, it is possible for each Up, SampleUp, and NewUp processor to compute the size of the Up, SampleUp, or NewUp array in each node at that level. This again makes it possible for each processor to calculate the node number and item number within the array for that node, and whether the processor is active or passive during the stage. All the calculations may be done in O(1) time.

CopyNewUpToUp

CopyNewUpToUp transfers the array contents and ranks computed for the NewUp arrays in the previous stage to the corresponding Up arrays in the current stage. The procedure is quite simple, but it will be described in some detail to illustrate how data typically are communicated between the processors.

The procedure is outlined in the figure below. It assumes that the NodeAddressTable (see the memory map) for each node in the tree stores the address of the first item in the NewUp array for that node during the previous stage. The local variable CorrNewUpItem is used by each Up processor to hold the address of the NewUp processor in the previous stage which corresponds to the Up item associated with the Up processor in the current stage.

The processor activation in the first for-statement is realised by selecting all active Up processors which are assigned to an inside node. In every third stage,



This assumption is made valid by letting each pro cessor which is assigned to the rst

item of a NewUp array store the address of that item in the No deAddressTable at the end

of each stage



Ie the address in the dedicated part of the global memory used to store the Up

SampleUp and NewUp arrays see Figure

CREW PRAM procedure CopyNewUpToUp;
begin
  address CorrNewUpItem;
  for each processor assigned to an Up array element, where the
      NewUp array for the same node was updated in the
      previous stage, do begin
    CorrNewUpItem :=
      NodeAddressTable[this node] + this item no;
    Up[this processor] := NewUp[CorrNewUpItem];
  end;
  for each processor of type NewUp do
    ExchangeArea[this processor] := RankInLeftSUP;
  for each processor assigned to an Up array element, where the
      NewUp array for the same node was updated in the
      previous stage, do
    RankInLeftSUP := ExchangeArea[CorrNewUpItem];
  { Transfer RankInRightSUP in the same manner }
end;

Figure: CopyNewUpToUp outlined in PPP.

when the external nodes are in their first stage as external, the Up processors assigned to the external nodes should also be selected, since these nodes were inside nodes in the previous stage.

Next, the Up processors compute CorrNewUpItem in parallel. The concurrent-read property of the CREW PRAM is exploited, since several Up processors are assigned to the same node. The copying of all NewUp arrays to the Up arrays is then done simultaneously by the Up processors. The possibility of doing many writes simultaneously to different locations of the global memory is used. No write conflict will occur, because every Up processor is assigned to one unique Up array element.

Knowing the mapping from an Up processor in the current stage to the corresponding NewUp processor in the previous stage, it is straightforward to communicate the ranks computed at the end of each stage to keep the assumption above valid. In the description of maintaining ranks, it was shown how to compute NewUp(u) -> NewSampleUp(v) and NewUp(u) -> NewSampleUp(w). For each NewUp and Up processor, these two ranks are stored in the local variables RankInLeftSUP and RankInRightSUP, respectively. Thus Up(u) -> SampleUp(v) and Up(u) -> SampleUp(w) will be known in the current stage if we copy these two variables from each NewUp processor to the corresponding Up processor.

The ExchangeArea (see the memory map) is allocated for this kind of communication. First, each NewUp processor writes the value of RankInLeftSUP to the unique location associated to that processor. Then, since we already know the mapping, the Up processors simply read the new value of RankInLeftSUP from the appropriate position in the ExchangeArea. RankInRightSUP is transferred in the same way.

CopyNewUpToUp is performed in constant time on a CREW PRAM.
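The constant-time handoff through the ExchangeArea may be sketched as follows (hypothetical Python; the sequential loops stand in for the parallel write and read steps, and all names are ours):

```python
def copy_new_up_to_up(new_up_ranks, corr_new_up_item):
    """new_up_ranks[p] is the RankInLeftSUP held by NewUp processor p.
    corr_new_up_item[q] is the ExchangeArea slot of the NewUp item
    corresponding to Up processor q. Returns RankInLeftSUP per Up
    processor."""
    exchange_area = {}
    for p, rank in enumerate(new_up_ranks):
        exchange_area[p] = rank               # one parallel write step
    return [exchange_area[corr_new_up_item[q]]  # one parallel read step
            for q in range(len(corr_new_up_item))]
```

Every NewUp processor writes to its own slot and every Up processor reads one slot, so both steps are conflict-free and take constant time on a CREW PRAM.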

MakeSamples

MakeSamples is a procedure performed by the Up and SampleUp processors in cooperation to produce the samples SampleUp(x) for all active nodes x. This is a straightforward task (see the algorithm description) and can easily be done in constant time.

First, the Up processors write the address of the first item of the Up array in each active node to the NodeAddressTable. Then every active

(SUP is short for SampleUp, and NUP for NewUp.)

CREW PRAM procedure DoMerge;
begin
  for each processor of type Up or SampleUp do
    ComputeCrossRanks;
  for each processor of type SampleUp do begin
    RankInParentNUP := RankInThisSUP + RankInOtherSUP;
    Compute SLrankInParentNUP and SRrankInParentNUP;
    Find address of the NewUp array in the parent node
      and store it in ParentNUPaddr;
    WriteAddress := ParentNUPaddr + RankInParentNUP;
    NewUp[WriteAddress] := own value;
    Record the sender address for each new NewUp item
      in the ExchangeArea;
  end;
end;

Figure: DoMerge outlined in PPP.

SampleUp processor calculates the SampleRate to be used to produce the samples at that level. Every SampleUp processor knows the node x and item number (index) i within SampleUp(x) it is assigned to. It reads the address of the first item in Up(x) from the NodeAddressTable, and uses the SampleRate to find the address of the item in Up(x) which corresponds to item i in SampleUp(x). Finally, and in parallel, each active SampleUp processor copies this Up array item to its own SampleUp array item.
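Assuming a fixed sampling convention (every SampleRate-th item of Up(x); the exact offset convention is not fixed by the text), MakeSamples may be sketched as follows in hypothetical Python:

```python
def make_samples(up, sample_rate):
    # Item i of SampleUp corresponds to one item of Up per sample_rate
    # items; here we take the last item of each group of sample_rate
    # (an assumed convention, chosen only for illustration).
    return [up[sample_rate * (i + 1) - 1]
            for i in range(len(up) // sample_rate)]
```

Each SampleUp processor computes one such address and performs one copy, so the whole step is done in constant time in parallel.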

Merging in Constant Time: Implementation Details

Having read the entire algorithm description, it should be no surprise that the implementation of the constant-time merging phase constitutes the major part of the implementation. MergeWithHelp is split into two parts: DoMerge, which performs the first step of the algorithm, and MaintainRanks, which performs the second step.

DoMerge

DoMerge is outlined in the figure above. Its main task is to compute the position of each SampleUp array item in the NewUp array of its parent node. This rank is stored in the variable RankInParentNUP. It is given as the sum of the rank of the item in the SampleUp array which it itself is a part of, and its rank in the SampleUp array of its sibling node. These two ranks are stored in the variables RankInThisSUP and RankInOtherSUP, respectively, and they are computed by the procedure ComputeCrossRanks.

Postpone the second statement for a moment. After having computed RankInParentNUP, the SampleUp processors must get the address of the NewUp array of the parent node. It is done by using the NodeAddressTable in the same way as in the procedure CopyNewUpToUp described above. Then each SampleUp processor knows the right address in the NewUp array for writing its own item value. Notice that all the items in all the NewUp arrays are updated simultaneously. We do not get any write conflicts, but surely the simultaneous writing of n values performed by n different processors represents an intensive utilisation of the CREW PRAM global memory.
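The conflict-free simultaneous writes may be sketched as follows (hypothetical Python with 0-indexed positions; ranks_left[i] plays the role of RankInOtherSUP for the i-th item of the left child):

```python
def do_merge_write(left_sup, right_sup, ranks_left, ranks_right):
    # Every SampleUp item is written to position
    # RankInThisSUP + RankInOtherSUP of the parent's NewUp array.
    # All target positions are distinct, so the writes are
    # conflict-free and could all happen in one exclusive-write step.
    new_up = [None] * (len(left_sup) + len(right_sup))
    for i, v in enumerate(left_sup):
        new_up[i + ranks_left[i]] = v
    for j, v in enumerate(right_sup):
        new_up[j + ranks_right[j]] = v
    return new_up
```

With left_sup = [2, 6], right_sup = [4, 8] and the cross ranks [0, 1] and [1, 2], the result is the merged sequence [2, 4, 6, 8].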

Before DoMerge terminates, the SampleUp processors must also store the sender address for each NewUp item just written. This information is needed in MaintainRanks. The sender address of an item is represented by the item no. (index) in the SampleUp array it was copied from, and it is stored in the ExchangeArea. If the SampleUp item is situated in a left child node, the address is stored as a negative number.

Now let us return to the postponed statement. Repeat the assumption stated earlier: for each SampleUp array item we should know the straddling items from the other SampleUp array. In the terms used in the merge figure: for each item e, we must know d and f. These positions are possible to compute now, because RankInParentNUP has been computed by all the SampleUp processors.

The calculation is based on the two variables SLindex and SRindex, computed by ComputeCrossRanks. For an item e in SampleUp(v), SLindex and SRindex are the positions in SampleUp(w) of the two items in SampleUp(w) that would straddle e if e was inserted according to sorted order in SampleUp(w). Thus, in general, we have SLindex = RankInOtherSUP and SRindex = RankInOtherSUP + 1.



(SL is short for straddling on the left side, and SR for straddling on the right side.)

Again we use the ExchangeArea. All SampleUp processors write their own value of RankInParentNUP to it. Then SLindex is used to find the address in the ExchangeArea of the SampleUp item from the sibling node that would straddle this item on the left side. The position of this item in NewUp of the parent node, called SLrankInParentNUP, is then read from the ExchangeArea. Similarly, SRindex gives the address for reading SRrankInParentNUP, and we have reached our goal, which was to know the positions of d and f for each item e in NewUp(u). Note that SLrankInParentNUP and SRrankInParentNUP are stored locally in each processor and need not be communicated to other processors before they are used in MaintainRanks.
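With 0-indexed arrays, the straddling positions follow directly from the rank of an item in the sibling array; a hypothetical Python sketch:

```python
import bisect

def straddling_positions(value, other_sample_up):
    # For an item e, find the positions in the sibling's (sorted)
    # SampleUp array of the two items that would straddle e if it were
    # inserted in sorted order: the last item below it and the first
    # item not below it.
    r = bisect.bisect_left(other_sample_up, value)
    return r - 1, r
```

In the implementation these positions are not searched for; they are obtained in O(1) time from RankInOtherSUP, which has already been computed.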

ComputeCrossRanks

ComputeCrossRanks is outlined in the figure below. It is performed by the Up and SampleUp processors in cooperation. As explained in the algorithm description, RankInThisSUP is easily found. However, the calculation of RankInOtherSUP is rather complicated, and it is split into two substeps.

The pseudo code for the first substep describes that all left child nodes v are handled first, and then all right child nodes w. In other cases, such as in DoMerge, both SampleUp arrays are handled in parallel. The reason for the reduced parallelism in this case is that the work represented by these statements is done by the Up processors in their common parent node.

A description of the code that implements the first substep follows. Each Up processor starts by writing its value of Up(u) to the ExchangeArea. Then, in addition to knowing r, the Up processors may read the value s from the ExchangeArea. Further, in addition to knowing its own value i, it reads the value of the next item stored in the Up array. The address of the SampleUp(v) array is found by using the NodeAddressTable. Now the situation is exactly as in the algorithm description, and each



(The variable RankInThisSUP stores R(e, SampleUp(x)), where e is in SampleUp(x) and x = v or x = w. The rank of an item e in the SampleUp array of its sibling node is stored in RankInOtherSUP.)

(An exception, which is much simpler than the general case, occurs in nodes with SampleUp arrays containing only one item. This implies that there is no Up array in the parent node to help us in computing the rank in the SampleUp array of the sibling node. However, since that array also contains only one item, RankInOtherSUP may be computed by doing one single comparison.)

(Can you suggest a possible strategy for handling both child nodes in parallel?)

CREW PRAM procedure ComputeCrossRanks;
begin
  for each processor of type SampleUp do
    RankInThisSUP := position (index) of item in own array;
  for each processor of type Up do begin
    { Substep 1 }
    For each item e in SampleUp(v), compute its rank in Up(u);
    For each item e in SampleUp(w), compute its rank in Up(u);
    { RankInParentUP is now stored in the ExchangeArea }
  end;
  for each processor of type Up or SampleUp do begin
    { Substep 2 }
    if processor type is SampleUp then
      Read RankInParentUP from the ExchangeArea;
    if processor type is Up then begin
      Store address/size into the NodeAddress/SizeTable;
      Store RankInRightSUP into the ExchangeArea;
    end;
    if processor type is SampleUp in a left child node then
      Find r and t;
    if processor type is Up then
      Store RankInLeftSUP into the ExchangeArea;
    if processor type is SampleUp in a right child node then
      Find r and t;
    if processor type is SampleUp then begin
      Read the (at most three) values in positions r to t
        from the other SampleUp array;
      LocalOffset := position of own value among these;
      RankInOtherSUP := r + LocalOffset;
    end;
  end;
end;

Figure: ComputeCrossRanks outlined in PPP.

Up processor considers the candidates in positions r to s according to the rules described in the algorithm description. When the Up processor which represents item i has found that an item in SampleUp(v) is contained in the interval induced by i, denoted I(i), it assigns the index of i to the variable RankInParentUP of that SampleUp item. This assignment is done indirectly, by writing the value to the ExchangeArea. At the start of the second substep, the SampleUp processors read this value.

The SampleUp item pointed to by s for one item i is the same as the item pointed to by r for the next item. Consequently, these border items are considered by two processors. Therefore, one might think that our strategy for updating RankInParentUP may lead to write conflicts. However, this will not occur, since the Up processor associated with an item x only assigns to the value RankInParentUP for those items in SampleUp(v) which are contained in I(x), and every SampleUp item is a member of exactly one such set I(x).

The second substep is performed by the SampleUp and Up processors in cooperation. Again, we handle all left child nodes before all right child nodes. The left child nodes must access the local Up variable RankInRightSUP to find the values of r and t, and the right child nodes must access RankInLeftSUP. The SampleUp processors use the NodeAddressTable and the variable RankInParentUP to locate the items d and f. The values of r and t are then read from the ExchangeArea at the positions corresponding to d and f.

When all SampleUp processors have calculated r and t, they may do the rest of the substep in parallel. They read the values of the items in positions r to t in the SampleUp array of their sibling node, and compare their own value with these (at most three) values. The position with respect to sorted order is found and stored in LocalOffset. RankInOtherSUP is then given as r + LocalOffset.
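This final comparison step may be sketched as follows (hypothetical Python, 0-indexed: we assume the slice other_sup[r:t] holds the few candidate items, at most three in the algorithm):

```python
def rank_in_other_sup(own_value, other_sup, r, t):
    # Compare the own value with the candidate items between the
    # straddling positions, and add the local offset to r.
    window = other_sup[r:t]
    local_offset = sum(1 for v in window if v < own_value)
    return r + local_offset
```

Since the window holds a bounded number of items, each SampleUp processor finds its cross rank with a constant number of comparisons.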

MaintainRanks

As described in the algorithm description, the computation of NewUp(u) -> NewSampleUp(v) and NewUp(u) -> NewSampleUp(w) is split into two cases, here represented by the procedures RankInNewSUPfromSenderNode and RankInNewSUPfromOtherNode. The figure below shows the most important parts of the "from sender node" case. Both procedures are executed by the Up, SampleUp, and NewUp processors in cooperation.

The first part of the procedure is performed by the

CREW PRAM procedure RankInNewSUPfromSenderNode;
begin
  { Performed in parallel by Up, SampleUp, and NewUp processors }
  if processor type is Up then begin
    RankInThisNUP := RankInLeftSUP + RankInRightSUP;
    ExchangeArea[this processor] := RankInThisNUP;
  end;
  if processor type is SampleUp then begin
    By help of the sample rate used in this stage at this level,
      find the position in the Up array corresponding to this
      SampleUp item;
    Use this position to read the correct value of RankInThisNUP
      from the ExchangeArea;
    { RankInThisNUP stores SampleUp(x) -> NewUp(x) }
    Compute RankInThisNewSUP from RankInThisNUP by
      considering the sample rate that will be used in the next
      stage to make NewSampleUp from NewUp;
    { RankInThisNewSUP stores SampleUp(x) -> NewSampleUp(x) }
    ExchangeArea[this processor] := RankInThisNewSUP;
  end;
  if processor type is NewUp then
    if this item came from SampleUp(v) then
      Read the value of RankInLeftSUP valid for the next
        stage from the position in the ExchangeArea
        corresponding to its sender item in SampleUp(v)
    else { item came from SampleUp(w) }
      Read the value of RankInRightSUP valid for the next
        stage from the position in the ExchangeArea
        corresponding to its sender item in SampleUp(w);
end;

Figure: RankInNewSUPfromSenderNode outlined in PPP. Here we have assumed that the processor activation has been done at the caller place, and not inside the procedure.

Up processors. Since NewUp(u) contains exactly the items in SampleUp(v) and SampleUp(w), Up(u) -> NewUp(u) is easily computed, as expressed in the first statement. The ExchangeArea is used to transfer these values to the SampleUp processors.

In the second part of the procedure, the SampleUp processors compute SampleUp(x) -> NewSampleUp(x), x = v or w. Note that both the left and right child nodes are handled in parallel. Each SampleUp item (processor) calculates its corresponding Up item in the same node, so that SampleUp(x) -> NewUp(x) is known after that statement has been executed. Since NewSampleUp(x) is made from NewUp(x), we get SampleUp(x) -> NewSampleUp(x), which is stored in the local variable RankInThisNewSUP and written to the ExchangeArea.

The last part of RankInNewSUPfromSenderNode is performed by the NewUp processors. Remember that we used the ExchangeArea to store the sender address of each NewUp item at the end of procedure DoMerge. This information is read by the NewUp processors at the very start of MaintainRanks, before RankInNewSUPfromSenderNode is called. We describe the SampleUp(v) case; the other case contains different but quite symmetric code. Each NewUp item (processor) uses its sender address to find its corresponding item in SampleUp(v). Since the ExchangeArea stores SampleUp(x) -> NewSampleUp(x) for all nodes x, an arbitrary item e in NewUp(u) that came from SampleUp(v) may indirectly read R(e, NewSampleUp(v)) from the ExchangeArea, so that NewUp(u) -> NewSampleUp(v) becomes known for all items in u that came from v. This rank is stored in the variable RankInLeftSUP, and is copied from the NewUp to the Up processors at the beginning of the next stage by CopyNewUpToUp.

Procedure RankInNewSUPfromOtherNode is outlined in the figure below. Almost all the work is done by the NewUp processors.

The procedure starts by transferring the variables SLrankInParentNUP and SRrankInParentNUP, which were computed in DoMerge, from the SampleUp processors to the NewUp processors. The values r and t are communicated from the processors holding items d and f to the processor holding item e. Again, the ExchangeArea is used, and we get a reduced

(These statements give a typical example of how the MIMD property of the CREW PRAM model may be used to reduce the execution time and increase the processor utilisation.)

CREW PRAM procedure RankInNewSUPfromOtherNode;
begin
  { Performed in parallel by Up, SampleUp, and NewUp processors }
  Communicate SLrankInParentNUP from the SampleUp processors
    to the NewUp processors through the ExchangeArea;
  Communicate SRrankInParentNUP from the SampleUp processors
    to the NewUp processors through the ExchangeArea;
  { Each NewUp processor knows the positions of d and f }
  Use the NodeAddress/SizeTable to let each NewUp processor know
    the address/size of the NewUp array in the other child node;
  For each NewUp item e that came from SampleUp(w), get the
    values of RankInLeftSUP from the NewUp items d and f that
    came from SampleUp(v);
  For each NewUp item e that came from SampleUp(v), get the
    values of RankInRightSUP from the NewUp items d and f that
    came from SampleUp(w);
  { Each NewUp processor knows the values of r and t }
  Compute the local offset of each NewUp item e within the
    items in positions r to t in the NewSampleUp
    array of its other child node; { See the text }
  The rank of NewUp item e in the NewSampleUp array of the other
    node is now given by r and the local offset just computed,
    and it is stored in RankInLeftSUP or RankInRightSUP;
end;

Figure: RankInNewSUPfromOtherNode outlined in PPP.

degree of parallelism, as implicitly expressed by the two communication statements. Further, note that the ranks r and t correspond to the "from sender node" case, which has just been solved.

For each item e in NewUp(u), we must locate the items in the positions r to t in NewSampleUp(v) or NewSampleUp(w). This is not trivial, because the NewSampleUp arrays do not exist. However, we know that NewSampleUp(x) will be made by the procedure MakeSamples from NewUp(x) in the next stage. Thus, a given item in NewSampleUp(v) or NewSampleUp(w) may be located in NewUp(v) or NewUp(w) if we take into account the sampling rate that will be used at that level (one level below) in the next stage.

This should explain how it is possible to implement this statement, and why we bothered to store the addresses of the NewUp arrays earlier. Finally, each NewUp item (processor) that came from SampleUp(w) (SampleUp(v)) computes its rank in NewSampleUp(v) (NewSampleUp(w)), and stores this value in RankInLeftSUP (RankInRightSUP).

Performance

The time used to perform a stage is somewhat shorter for the six first stages than the numbers listed in the time-consumption table. This is because some parts of the algorithm do not need to be performed when the sequences are very short. However, for all stages after the sixth, the time consumed is as given by the constants in the table; the first stages take a fixed total number of time units. The total time used by ColeMergeSort on n distinct items (n = 2^m, m an integer) may be expressed as

T(ColeMergeSort, n) = t(n) + ... + t(n) log n

The O(1)-time merging performed by MergeWithHelp constitutes the major time consumption of the algorithm. Of the time used by MergeWithHelp, a large part is needed to compute the cross ranks [Col], and nearly as much is used to maintain the ranks [Col]. Note that the execution time of Cole's parallel merge sort algorithm is independent of the actual problem instance for a given problem size. The equation above makes it possible to calculate the time consumption of the algorithm as an exact figure for any problem size which is a power of two.

Table: Performance data measured from execution of ColeMergeSort on the CREW PRAM simulator. Time and cost are given in kilo CREW PRAM time units and kilo CREW PRAM unit-time instructions, respectively. The numbers of read/write operations to/from the global memory are given in kilo locations.

Problem size n | time | processors | cost | reads | writes

The table shows the performance of Cole's algorithm on various problem sizes.

Figure: Outline of results expected when comparing Cole's parallel merge sort with straight insertion sort. The sketch plots log T(n) against log n, with one curve for Cole's algorithm and one for insertion sort; the crossing point is at n = N.

Cole's Parallel Merge Sort Algorithm Compared with Simpler Sorting Algorithms

The First Comparison

When starting on this work, I expected that Cole's algorithm was so complicated that it, in spite of its known O(log n) time complexity, would be slower than simpler parallel or sequential algorithms unless we assumed very large problem sizes. I planned to show that even straight insertion sort executed on one single processor would perform better than Cole's algorithm as long as n < N, and I expected N to be so large that it would imply a processor requirement for Cole's algorithm in the range of hundreds of thousands or millions of processors. Then I could have claimed that Cole's algorithm is of little practical value.

The kind of results that I expected is illustrated in Figure …. The point where Cole's algorithm becomes faster than 1-processor insertion sort for worst case problems (i.e., problems making insertion sort an O(n²) algorithm) is marked with a circle in the figure. The corresponding problem size is denoted N. Since insertion sort is very simple, I expected that the relatively high descriptional complexity of Cole's algorithm would imply N to be as large as … or even larger.

However, the results for the time consumption of Cole's algorithm were much better than expected. Figure … shows the time used to sort n integers by Cole's algorithm compared with 1-processor insertion sort and n-processor odd-even transposition sort. Note that we have logarithmic scale on both axes.

Cole's algorithm is the CREW PRAM algorithm described in the previous section, with performance data as summarised in Table …. Insertion sort is the algorithm which was described in Section …; its performance was described in Section …. The simplest version of the odd-even transposition sort algorithm was described in Section …. Here we are studying the version developed in Section …, which requires only n processors to sort n items. The performance of that CREW PRAM implementation was reported in Section …. The times used by Cole's algorithm and odd-even transposition sort are independent of the actual problem instance for a fixed problem size. However, insertion sort requires O(n²) time in the worst case and O(n) time in the best case, both shown in the figure.

We see that Cole's algorithm becomes faster than insertion sort (worst case) at N = …. Also, Cole's algorithm becomes faster than odd-even transposition sort, which is very simple and uses O(n) processors, for problem sizes at about … items.

These somewhat surprising results (Figure …) were presented at the Norwegian Informatics Conference [Nat]. It was a preliminary comparison which did not support my expectations. The practical value of Cole's algorithm was still an open question, and it was clear that further investigations were needed.

In retrospect, being surprised by the obtained results can be considered an underestimation of the difference between polynomials, such as n² and n, and log n, for large and even modest values of n.
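That difference can be made concrete with a small sketch: even when the logarithmic-time bound carries a constant factor a thousand times larger, it overtakes a quadratic bound at a very modest problem size. The constants below are illustrative only, not the measured CREW PRAM constants.

```python
import math

def crossover(c_log, c_quad):
    """Smallest n >= 2 where c_log * log2(n) <= c_quad * n**2."""
    n = 2
    while c_log * math.log2(n) > c_quad * n * n:
        n += 1
    return n

# A factor-1000 handicap still lets the logarithmic bound win early.
print(crossover(1000.0, 1.0))  # prints 80
```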

Revised Comparison Including Bitonic Sorting

A natural candidate for further investigations was Batcher's bitonic sorting, described in Section …. Its relative simplicity, together with an O(log² n) time complexity, may explain claims such as "nobody beats bitonic sorting" [San].

In addition to implementing bitonic sorting, all the compared algorithm implementations were revised and some improvements were done. The modelling of the time consumption in the various implementations was also revised in order to give a fair comparison. The final results, shown in Figure …,



Figure: Comparison of Cole's algorithm with O(n) and O(n²) time algorithms [Nat] (execution time plotted against problem size). Legend: Cole's algorithm, O(log n); odd-even transposition sort, O(n); insertion sort worst case, O(n²); and insertion sort best case, O(n).

Table: Performance data for the CREW PRAM implementations of the studied sorting algorithms. Problem size n is …. Time and cost are given in kilo CREW PRAM time units and kilo CREW PRAM unit-time instructions, respectively. The numbers of read/write operations to/from the global memory are given in kilo locations.

Algorithm | time | processors | cost | reads | writes
Cole | | | | |
Bitonic | | | | |
Odd-Even | | | | |
Insert-worst | | | | |
Insert-average | | | | |
Insert-best | | | | |

Table: Same table as above, but with problem size n = ….

Algorithm | time | processors | cost | reads | writes
Cole | | | | |
Bitonic | | | | |
Odd-Even | | | | |
Insert-worst | | | | |
Insert-average | | | | |
Insert-best | | | | |

were presented at the SUPERCOMPUTING conference [Natb]. We see that bitonic sorting is the fastest in this comparison over a large range of n, starting at about ….

Tables … and … show some central performance data for small test runs (n = … and n = …). The two rightmost columns show the total number of read operations from the global memory and the total number of write operations to the global memory.

The implementation of ColeMergeSort counts about … PIL lines and was developed and tested in about … days of work (see Appendix B). The corresponding numbers for BitonicSort are about … PIL lines and roughly … days of work. The listing is given in Appendix C.



Figure: Time consumption in number of CREW PRAM time units, measured by running the parallel sorting algorithms on the CREW PRAM simulator for various problem sizes n (horizontal axis). Note the logarithmic scale on both axes. Legend: Cole's algorithm, O(log n); bitonic sorting, O(log² n); odd-even transposition sort, O(n); insertion sort worst case, O(n²); and insertion sort best case, O(n).





















Figure: Comparison of time consumption in the CREW PRAM implementations of Cole's parallel merge sort algorithm and Batcher's bitonic sorting algorithm (execution time plotted against log n; the two curves are marked with distinct symbols).

Bitonic Sorting Is Faster In Practice

As shown in Sections … and …, exact analytical models have been developed for the time consumption in the implementations of Cole's algorithm and bitonic sorting. The models have been checked against the test runs, and have been used to find the point where the CREW PRAM implementation of Cole's O(log n) time algorithm becomes faster than the CREW PRAM implementation of the O(log² n) time bitonic algorithm. See Figure ….

The results are summarised in Table …. The table shows time and processor requirements for the two algorithms for n = …k and n = …k, for the last value of n making bitonic sorting faster than Cole's algorithm, and for the first value of n making Cole's algorithm the faster one. The exact numbers of time units consumed by the algorithms for the various problem sizes, shown in the table, are not especially important. What is important is that the method of algorithm investigation described

Table: Calculated performance data for the two CREW PRAM implementations.

Algorithm | n | time | processors
ColeMergeSort | …k | |
BitonicSort | …k | |
ColeMergeSort | …k | |
BitonicSort | …k | |
ColeMergeSort | … | |
BitonicSort | … | |
ColeMergeSort | … | |
BitonicSort | … | |

in this thesis makes it possible to do such exact calculations.

We see that our straightforward implementation of Batcher's bitonic sorting is faster than the implementation of Cole's parallel merge sort as long as the number of items to be sorted, n, is less than …. In other words, for Cole's algorithm to become faster than bitonic sorting, the sorting problem instance must contain close to … Giga Tera items; this is about as large as the number of milliseconds since the Big Bang [Har]. This comparison strongly indicates that Batcher's well known and simple O(log² n) time bitonic sorting is faster than Cole's O(log n) time algorithm for all practical values of n. The huge value of n also gives room for a lot of improvements to the implementation of Cole's algorithm before it beats bitonic sorting for practical problem sizes. There are also good possibilities to improve the implementation of bitonic sorting.
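The shape of this crossover follows directly from hedged running-time models of the two implementations: with T_Cole(n) = c1 · log2 n and T_Bitonic(n) = c2 · (log2 n)², bitonic sorting is faster exactly while log2 n < c1/c2, i.e. while n < 2^(c1/c2). A sketch with illustrative constants (not the derived ones):

```python
def crossover_exponent(c_cole, c_bitonic):
    """log2 of the problem size at which a c_cole*log2(n) time model
    catches up with a c_bitonic*(log2(n))**2 model.  Both constants
    are illustrative placeholders."""
    return c_cole / c_bitonic

# A constant ratio of 70 already pushes the crossover to n = 2**70,
# on the order of 10**21 items.
print(2 ** crossover_exponent(70.0, 1.0))
```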

In fact, Cole's algorithm is even less practical than depicted by the comparison of execution time: it requires about … times as many processors as bitonic sorting, and it makes far more extensive use of the global memory. This is outlined in the following subsection.

(The problem size making Cole's algorithm faster than bitonic sorting requires about … processors. If we assume that each processor occupies … cubic meters, this corresponds roughly to filling the whole volume of the Earth with processors.)

Comparison of Cost and Memory Usage

For both algorithms we have derived exact expressions for the execution time and processor requirement as functions of n. Consequently, the cost of running each of the two implementations on a problem of size n is known:

Cost(ColeMergeSort, n) = T(ColeMergeSort, n) × NoOfProcs(n)

where T(ColeMergeSort, n) is given in Equation … and NoOfProcs(n) is given by Equations … to ….

For bitonic sorting we have

Cost(BitonicSort, n) = T(BitonicSort, n) × n

where T(BitonicSort, n) is given in Equation ….

Using Equation … and Equation …, it is found that bitonic sorting has a lower cost as long as n is less than …, another extremely large number.

This comparison further strengthens the conclusion made above: Cole's algorithm is not a good choice for practical problem sizes.
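The cost comparison has the same structure as the time comparison: cost is execution time multiplied by the number of processors. A sketch, again with illustrative stand-in models rather than the derived expressions:

```python
import math

def cost(time_of, procs_of, n):
    """Cost = execution time * number of processors
    (in unit-time instructions)."""
    return time_of(n) * procs_of(n)

# Illustrative stand-ins only: logarithmic time with ~3n processors
# versus log-squared time with n processors.
def cole_cost(n):
    return cost(lambda m: 70.0 * math.log2(m), lambda m: 3 * m, n)

def bitonic_cost(n):
    return cost(lambda m: math.log2(m) ** 2, lambda m: m, n)

n = 1 << 20
print(cole_cost(n) > bitonic_cost(n))  # larger constants dominate here
```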

Global memory access pattern

The CREW PRAM simulator offers the possibility of logging all references to the global memory. Since the global memory is the most unrealistic part of the CREW PRAM model, it may be interesting to monitor how much an algorithm utilises the global memory. An algorithm with an intensive use of the memory may be more difficult to implement on more realistic models without a global memory. Figures … and … illustrate the memory usage of the CREW PRAM implementations of bitonic sorting and Cole's algorithm, respectively. Again, it is clearly demonstrated that bitonic sorting is the best alternative for practical use.

In the figures, plots along a horizontal line depict a memory location accessed by several processors. Plots along a vertical line depict the various memory locations accessed by a single processor.
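The logging facility can be mimicked by a toy global memory that records every access as a (processor, address, operation) triple; this is a sketch of the idea, not the simulator's actual interface:

```python
from collections import defaultdict

class GlobalMemory:
    """Toy CREW-style global memory that logs each access."""

    def __init__(self):
        self.cells = defaultdict(int)
        self.log = []            # (processor, address, "R" or "W")

    def read(self, proc, addr):
        self.log.append((proc, addr, "R"))
        return self.cells[addr]

    def write(self, proc, addr, value):
        self.log.append((proc, addr, "W"))
        self.cells[addr] = value

mem = GlobalMemory()
mem.write(0, 5, 42)              # processor 0 writes location 5
print(mem.read(1, 5))            # processor 1 reads it back: 42
reads = sum(1 for _, _, op in mem.log if op == "R")
print(reads)                     # one read operation logged
```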



(Imagine filling the whole universe with neutrons such that there is no remaining empty space; the required number of neutrons is estimated to be about … [Sag].)



Figure: Access to the global memory (global memory address plotted against processor number) during execution of BitonicSort for sorting … numbers. Reads and writes are marked with distinct symbols.



Figure: Access to the global memory (global memory address plotted against processor number) during execution of ColeMergeSort for sorting … numbers. Reads and writes are marked with distinct symbols.

Chapter

Concluding Remarks and Further Work

"The lack of symmetry is that the power of a computational model is robust in time, while the notion of technological feasibility is not. This latter notion is fluid, since technology keeps advancing and offering new opportunities. Therefore, it makes sense to fix the model of computation (after carefully selecting one) and study it, and at the same time look for efficient implementations."
Uzi Vishkin in PRAM Algorithms: Teach and Preach [Vis]

It has been a very interesting job to do the research reported in this thesis. I have found theoretical computer science to be a rich source of fundamental and fascinating results about parallel computing. During the work, new and interesting concepts and issues have continuously popped up.

Experience and Contributions

The Gap Between Theory and Practice

The main goal for the work has been to learn about how to evaluate the practical value of parallel algorithms developed within theoretical computer science.

Asymptotically large problem sizes do not occur in practice. By evaluating the performance of implemented algorithms for finite problem sizes, the effect of the size of the complexity constants becomes visible. My investigation of Cole's algorithm has shown that Batcher's bitonic sorting algorithm is faster than Cole's parallel merge sort algorithm for all practical values of n. This is in contrast to the asymptotical behaviour, which says that Cole's algorithm is a factor of O(log n) faster.

In my view, there is probably a relation between high descriptional complexity and large complexity constants. However, the computational complexity is not a good indicator for the size of the complexity constants. Recall that Cole's algorithm is considered to be a simple algorithm within the theory community. The majority of the parallel algorithms recently proposed within theoretical computer science have a larger descriptional complexity than Cole's algorithm. Consequently, my investigation of Cole's algorithm has confirmed the belief that promising theoretical algorithms can be of limited practical importance.

This work should not be perceived as criticism of theoretical computer science. In the Introduction, I argued that parallel algorithms from the theory world, and topics such as complexity theory, will become increasingly important in the future. However, my work has indicated that practitioners should take results from theoretical computer science with a grain of salt. Careful studies should be performed to find out what is hidden by high-level descriptions and asymptotical analysis.

Evaluating Parallel Algorithms

A new method for evaluating PRAM algorithms

The proposed method for evaluating PRAM algorithms is based on implementing the algorithms for execution on a CREW PRAM simulator. I have not found the use of this or similar methods described in the literature. The main advantage of the method is that algorithms may be compared for finite problems. As shown by the investigation of Cole's algorithm, such implementations make visible information about performance that is hidden behind the Big-Oh curtain.

Implementation takes time

I originally planned to investigate several parallel algorithms from theoretical computer science. It was far from clear how much work would be needed to do the implementations. Only Cole's algorithm has been implemented so far. Marianne Hagaseth is currently working on an implementation of the well known theoretical sorting algorithm presented by Shiloach and Vishkin in [SV] [Hag].

For several reasons, implementing theoretical algorithms takes a lot of time. First of all, the algorithms are in general rather complex. They are described at a relatively high level, and some "uninteresting" details may have been omitted. However, to be able to implement, you have to know absolutely all details. This makes it necessary to do a thorough and complete study of the algorithm until all parts are clearly understood. When the algorithm has been clearly described and understood, several implementation decisions remain.

The first implementation of Cole's algorithm

During the detailed study of Cole's algorithm, some missing details were discovered. This made it necessary to do a few completions of the original description (Section …) [Nata]. In the development of the CREW PRAM implementation of Cole's algorithm, the most difficult task was to program the dynamic processor allocation in the form of a "sliding pyramid" of processors (Section …).

To my knowledge, my implementation of Cole's algorithm is the only implemented sorting algorithm using O(log n) time with asymptotically optimal cost [Col, Rud, Zag]. This was reported in my paper Logarithmic Time Cost Optimal Parallel Sorting is Not Yet Fast in Practice, presented at the SUPERCOMPUTING conference in November [Natb].

A lot may be learned from medium-sized test runs

Massively parallel algorithms with polylogarithmic running time are often complex. One might think that evaluation of such algorithms would require processing of very large problem instances. So far, this has not been the case. In studying the relatively complex Cole's algorithm, some hundreds of processors and small-sized memories have been sufficient to enlighten the main aspects of the algorithm. In many cases, the need for brute force, i.e. huge test runs, may be reduced by the following working rules:

1. The size of the problem instance is used as a parameter to the algorithm, which is made to solve the problem for all possible problem sizes.

2. Elaborate testing is performed on all problem sizes that are within the limitations of the simulator.

3. A detailed analysis of the algorithm is performed. The possibility of making such an analysis with a reasonable effort depends strongly on the fact that the algorithm is deterministic and synchronous.

4. The analysis is confirmed with measurements from the test cases.

(Marco Zagha has recently implemented a simplified version of the Reif-Valiant flashsort algorithm [RV] on the Connection Machine; however, he has expressed that the implementation is far from O(log n) time [Zag].)

Together, this will often make it possible to use the analytical performance model, by extrapolation, for problem sizes beyond the limitations of the simulator.
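The working rules can be sketched end to end: confirm a derived analytical model against the sizes the simulator can handle, then extrapolate. The measurements and constants below are synthetic, for illustration only.

```python
import math

def model(n, a=100.0, b=40.0):
    """Analytical performance model T(n) = a + b * log2(n);
    a and b are synthetic placeholders for derived constants."""
    return a + b * math.log2(n)

# "Measurements" from small runs (synthetic: generated from the model).
measured = {2 ** k: model(2 ** k) for k in range(4, 10)}

# Rule: confirm the analysis on every size within the simulator's limits...
assert all(abs(model(n) - t) < 1e-9 for n, t in measured.items())

# ...then extrapolate far beyond them.
print(model(2 ** 40))  # prints 1700.0
```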

The CREW PRAM Simulator

About the prototype

Several critical design choices were made to reduce the work needed to provide high-level parallel programming and the CREW PRAM simulator. The PIL language was defined to be a simple extension of SIMULA [BDMN, Poo], and it was decided to implement the simulator by using the DEMOS simulation package [Bir]. A program in PIL is translated to pure SIMULA, which is included in the description of a processor object in the DEMOS model of the simulator. Before execution, the simulator program, including the translated PIL program, is compiled by the SIMULA compiler (see Section A…).

This is not an efficient solution, but it has several advantages. The translation from PIL to SIMULA is easy, and all the nice features of SIMULA are automatically available in PIL. Perhaps most important, executing the PIL programs as SIMULA programs together with the simulator code makes it possible to use the standard SIMULA source code level debugger [HT] on the parallel programs and their interaction with the simulator.

Simplicity was given higher priority than efficient execution of the PIL programs. In retrospect, this was the right choice. The bottleneck in investigating Cole's algorithm was the program development time, not the time used to run the program on the simulator (see the previous section about medium-sized test runs). In addition, the use of the simulator prototype has given us a lot of ideas about how a more efficient and elegant simulator system should be designed.

Synchronous MIMD programming

The simplicity of the CREW PRAM model, combined with the very nice properties of synchronous computations, makes the development, analysis and debugging of parallel algorithms a relatively easy task.

Synchronous MIMD programming implies redundant operations, such as compile time padding. However, I believe it is more important to make parallel programming easy than to utilise every clock period of every processor. Indirectly, easy programming may also imply more efficient programs, because the programmer may get a better overview of all aspects of a complex software system, including performance.

Synchronous MIMD programming, which has been claimed to be easy, should be considered a byproduct of this work. However, it may be one of the most interesting parts for future research. In this context, it was very encouraging to hear the keynote address at the SUPERCOMPUTING conference held by Danny Hillis [Hil]. Hillis argued that the computer industry producing massively parallel computers is searching for a programming paradigm that combines the advantages of SIMD and MIMD programming. Similar thoughts have been expressed by Guy Steele [Ste].

According to Hillis, it is at present unknown what such a MIMD/SIMD combination would look like. The synchronous MIMD programming style used in the work reported in this thesis is such a combination. To conclude: synchronous MIMD programming is a very promising paradigm for parallel programming that should be further investigated.

Use in education

The CREW PRAM simulator system has been used in the courses Highly Concurrent Algorithms and Parallel Algorithms, which are given by the Division of Computer Systems and Telematics (IDT) at the Norwegian Institute of Technology (NTH). It has been used for experimenting with various parallel algorithms, as well as for modelling of systolic arrays and neural networks.

(The main advantage of the SIMD paradigm is that the synchronous operation of the processors implies easier programming. MIMD has the advantage of more flexibility and better handling of conditionals.)

Further Work

"How far can hard- and software developments proceed independently? When should they be combined? Parallelism seems to bring these matters to the surface with particular urgency."
Karen Frenkel in [Freb]

"What is the right division of labour between programmer, compiler and online resource manager? What is the right interface between the system components? One should not assume that the division that proved useful on multiprogrammed uniprocessors is the right one for parallel processing."
Marc Snir in Parallel Computation Models: Some Useful Questions [Sni]

It is my hope to continue working with many of the topics discussed in this thesis. Below, I present a short list of natural continuations of the work:

- Continuing the study of parallel algorithms from theoretical computer science, to learn more about their practical value. A very large number of candidate problems and algorithms exist.

- Extending the study of parallel sorting algorithms. Two important alternatives are the Reif-Valiant flashsort algorithm [RV] and the adaptive bitonic sorting algorithm described by Bilardi and Nicolau [Bil]. The flashsort algorithm is a so-called probabilistic algorithm which sorts in O(log n) time with high probability. It has a rather high descriptional complexity, but is expected to be very fast [Col].

- Improving the simulator prototype and the PIL language. Studying what kind of tools and language features would make synchronous parallel programming easier.

- Extending the simulator to the EREW and CRCW variants of the PRAM model. Implementing the possibility of charging a higher cost for accesses to the global memory.

- Defining a more complete language for synchronous MIMD programming, and implementing a proper compiler with compile time padding.

- Leslie Valiant's recent proposal of the BSP model as a bridging model for parallel computation was presented in Section …. The use of this model raises many questions. Is a PRAM language ideal? How should PRAM programs be mapped to the BSP model? How much of the mapping should be done at compile time, and what should be left to runtime? How is the BSP model best implemented? Much interesting research remains before we know the (conditional) answers to these questions.

Appendix A

The CREW PRAM Simulator Prototype

"While a PRAM language would be ideal, other styles may be appropriate also."
Leslie Valiant in A Bridging Model for Parallel Computation [Val]

The main motivation for developing the CREW PRAM simulator was to provide a vehicle for evaluation of synchronous CREW PRAM algorithms. It was made for making it possible to implement algorithms in a high level language, with convenient methods for expressing parallelism and for interacting with the simulator to obtain measurement data and statistics.

The most important features for algorithm implementation and evaluation, and the realisation of the simulator prototype, are briefly outlined in Section A…. A much more detailed description may be found in the User's Guide in Section A…. The appendix ends with a description of how systolic arrays and similar synchronous computing structures may be modelled by using the simulator.

A… Main Features and System Overview

A… Program Development

Wyllie [Wyl] proposed a high level pseudo code notation called "parallel pidgin algol" for expressing algorithms for the PRAM model. Inspired by this work, a notation called PIL has been defined. The PIL notation (Parallel Intermediate Language) is based on SIMULA [BDMN, Poo] with a small set of extensions, which is necessary for making it into a language well suited for expressing synchronous MIMD parallel programs. Nearly all features found in SIMULA may be used in a PIL program.

(SIMULA is a high-level object-oriented programming language developed in Norway. It was made available in … but is still modern.)

The main features for program development are:

- Processor allocation. A PIL program may contain statements for allocating a given number of processors and binding these to a processor set. A processor set is a variable used to hold these processors, and may be used in processor activations.

- Processor activation. A processor activation specifies that all processors in a given processor set are to execute a specific part of the PIL program.

- Using global memory. The simulator contains procedures for allocating and accessing global memory. The procedures ensure true CREW PRAM parallelism and detection of write conflicts.

- Program tracing. A set of procedures for producing program tracing in a compact manner is provided.

- SYNC and CHECK. These are two built-in procedures which have shown to be very helpful during the top-down implementation of complicated algorithms. SYNC causes the calling processors to synchronise, so that they all leave the statement simultaneously. SYNC has been implemented as a binary tree computation using only O(log n) time (see Section …). CHECK checks whether the processors are synchronous at the place of the call in the program, i.e. that the processors are doing the same call to CHECK simultaneously.

- File inclusion, conditional compilation and macros. This is implemented by preprocessing the PIL program with the standard C preprocessor, cpp.
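The O(log n) behaviour of SYNC can be illustrated by counting the rounds of a binary combining tree, in which pairs of still-active processors merge each round. This is a sketch of the idea only; the real SYNC runs as a CREW PRAM computation inside the simulator.

```python
def tree_barrier_rounds(p):
    """Rounds a binary-tree barrier needs to synchronise p processors:
    each round halves (rounding up) the number of active subtrees."""
    rounds, active = 0, p
    while active > 1:
        active = (active + 1) // 2
        rounds += 1
    return rounds

print(tree_barrier_rounds(8), tree_barrier_rounds(1000))  # prints 3 10
```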

Perhaps the greatest advantage of defining the PIL notation as an extension of SIMULA is that it implies the availability of the SIMULA debugger simdeb [HT]. This is a very powerful and flexible debugger that operates on the source code level. It is easy to use for debugging of parallel PIL programs. As an example, consider the following command:

at …: if p = … AND Time > … THEN stop;

This command specifies that control should be given to the debugger when source line number … is executed by processor number …, if the simulated time has exceeded ….

A Measuring

The simulator provides the following features for pro ducing data necessary for doing

algorithm evaluation

- Global clock. The synchronous operation of the CREW PRAM implies that one global time concept is valid for all the processors. The global clock may be read and reset. The default output from the simulator gives information about the exact time of all events which produce any form of output.

- Specifying time consumption. In the current version of the simulator, the time used on local computations inside each processor must be explicitly given in the PIL program by the user. This makes it possible for the user to choose an appropriate level of granularity for the time modelling. It also implies the advantage that the programmer is free to assume any particular instruction set or other way of representing time consumption. However, future versions of the CREW PRAM simulator system should ideally contain the possibility of automatic modelling of the time consumption, performed by the compiler.

- Processor and global memory requirement. The number of processors used in the algorithm is explicitly given in the PIL program and may therefore easily be reported together with other performance data. The amount of global memory used may be obtained by reading the user stack pointer.

- Global memory access statistics. The simulator counts and reports the number of reads and the number of writes to the global memory. Later versions may contain more advanced statistics.

- DEMOS data collection facilities. The general and flexible data collection facilities provided in DEMOS are available: COUNT may be used to record events, TALLY may be used to record time independent variables (with various statistics, such as the mean and the estimated standard deviation, maintained over the samples), and HISTOGRAM provides a convenient way of displaying measurement data.
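A minimal sketch of these measuring features, written in Python for illustration: the method names mirror the PIL built-ins (ClockRead, ClockReset, Use, and the read/write counters), and the one-time-unit cost per READ and WRITE matches what is stated later in this appendix, but the internals are assumptions, not the simulator's code.

```python
class MeasuredPram:
    """Toy model of the simulator's measuring features: a global clock
    that can be read and reset, plus counters for global-memory reads
    and writes (names mirror the PIL built-ins; internals are a sketch)."""
    def __init__(self):
        self.time = 0
        self.n_reads = 0
        self.n_writes = 0
        self.memory = {}

    def use(self, time_units):   # explicit local-computation cost
        self.time += time_units

    def read(self, addr):        # one time unit per READ
        self.n_reads += 1
        self.time += 1
        return self.memory.get(addr, 0)

    def write(self, value, addr):  # one time unit per WRITE
        self.n_writes += 1
        self.time += 1
        self.memory[addr] = value

    def clock_reset(self):
        self.time = 0

pram = MeasuredPram()
pram.write(42, addr=0)
pram.use(5)
assert pram.read(0) == 42
print(pram.time, pram.n_reads, pram.n_writes)  # -> 7 1 1
```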

Figure A shows the general format of a main (PIL) program for testing an algorithm on a series of problem instances. When generating problem instances, the random drawing facilities in DEMOS, which sample from different statistical distributions, may be very useful.

Functions such as processor allocation and processor resynchronisation are realised in the simulator by simulating real CREW PRAM algorithms for these tasks, to obtain realistic modelling of the time consumption.

A A Brief System Overview

When developing the CREW PRAM simulator prototype it was a major goal to make a small but useful and easily extensible system. This had to be done within a few man-months, so it would be impossible to do everything from scratch.



(Footnote: DEMOS is a nice and powerful package for discrete event simulation, written in SIMULA [Bir].)

    PROCESSOR_SET ProcessorSet; INTEGER n; ADDRESS addr;
    ...
    FOR n := 2 TO 512 DO
    BEGIN
        addr := GenerateProblemInstance(n);
        ... the problem instance is now placed in global memory
        ClockReset;
        ... Start of evaluated algorithm
        ... initial sequential part of algorithm
        ASSIGN n PROCESSORS TO ProcessorSet;
        FOR_EACH PROCESSOR IN ProcessorSet;
            ... PIL statements executed in parallel by all the
            ... processors in ProcessorSet
        END_FOR_EACH PROCESSOR;
        ... End of evaluated algorithm
        TimeUsed := ClockRead;
        ... sequential code for checking and/or reporting the results
        ... obtained by running the algorithm
        ... sequential code for gathering and reporting performance data
        ... from the last test run
    END_FOR (loop doing several test runs);
    ... sequential code for reporting statistics about all the test runs

Figure A: General format of a PIL program for running a parallel algorithm on several problem instances. In this case the algorithm is assumed to use one set of n processors, where n is the size of the problem instance.
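The test-run loop of the figure can be sketched in ordinary Python as follows. The instance generator, the size range and the stand-in "algorithm" are hypothetical placeholders; the structure (generate instance, reset clock, run, read clock, collect results) is what mirrors the PIL skeleton above.

```python
import random

def evaluate(algorithm, sizes, seed=0):
    """Run `algorithm` on a series of random problem instances and collect
    (n, time) pairs, mirroring the measurement loop of the figure above.
    `algorithm` must return the simulated (or measured) time used."""
    random.seed(seed)
    results = []
    for n in sizes:
        instance = [random.randint(0, 2**16) for _ in range(n)]  # generate
        time_used = algorithm(instance)                          # run + clock
        results.append((n, time_used))
    return results

# A stand-in "algorithm" whose reported time is just the input size:
print(evaluate(lambda inst: len(inst), sizes=[2, 4, 8]))
# -> [(2, 2), (4, 4), (8, 8)]
```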













Figure A: CREW PRAM Simulator implementation overview. (The diagram shows the layers of the implementation: the PIL program is translated by the PIL compiler into procsim, the part of the CREW PRAM simulator describing each processor; the simulator is built on the DEMOS package, which is written in SIMULA, running under UNIX on SUN hardware.)

The main implementation principle used in the simulator is outlined in Figure A. The CREW PRAM simulator is written in SIMULA [BDMN, Poo, DH] using the excellent DEMOS discrete event simulation package [Bir]. The PIL program is compiled to SIMULA/DEMOS and included in the file procsim, which is part of the simulator source code. procsim describes the behaviour of each CREW PRAM processor. Thus, during algorithm simulation, each processor object (in the SIMULA sense) executes its own copy of the PIL program.

A great advantage implied by this implementation strategy is that all the powerful features of the SIMULA programming language and the DEMOS package are available in the PIL program. Further, the source level SIMULA debugger simdeb [HT] may be used for studying and interacting with the parallel PIL program under execution.

This is certainly a resource demanding implementation. However, experience has shown, as described in Section , that the prototype is able to simulate problem instances that are large enough to unveil the most interesting aspects of the evaluated algorithm.

During the development of the CREW PRAM implementation of Cole's algorithm, all except one of the errors were found by testing the program on problem instances consisting of only a few numbers. The last error was found by testing the implementation on a larger instance; however, the same error was also visible when sorting only a few numbers.

Executing Cole's algorithm on the larger instances requires a correspondingly large number of CREW PRAM processors, and the simulations take from CPU seconds up to several CPU hours on a SUN workstation. Bitonic sorting is simpler and requires fewer processors, which makes the simulations faster: the largest instances, corresponding to a large number of CREW PRAM processors, are sorted in a few CPU hours.

Implementing the simulator in SIMULA/DEMOS implies an open-ended, easily extensible system. The CREW PRAM simulator may easily be modified into an EREW PRAM simulator.

A Users Guide

This section describes how to use the CREW PRAM simulator as it is currently installed at the Division of Computer Systems and Telematics (IDT), The Norwegian Institute of Technology (NTH).

A Introduction

The CREW PRAM simulator includes a simple programming environment that makes it convenient to develop and experiment with synchronous MIMD parallel algorithms. Parallel algorithms are in general more difficult to develop, understand and test than sequential algorithms. However, the property of synchronous operation, combined with features in the simulator, removes a lot of these difficulties. The parallel algorithms may be written in a high-level SIMULA-like language, and a powerful symbolic debugger may be used on the parallel programs.

(Footnote: This is a simple transformation, done by a pipeline of filters written in gawk (GNU awk).)

A Parallel Algorithms, CREW PRAM, PPP and PIL

When studying parallel algorithms it is desirable to use a machine model that is simple and general, so that one may concentrate on the algorithms and not on the details of a specific machine architecture. The CREW PRAM satisfies this requirement.

Further, when expressing algorithms it has become common practice to use so called pseudo code. In Section  a notation called PPP (Parallel Pseudo Pascal) was presented.

Just as other kinds of pseudo code, PPP may not (yet) be compiled into executable code. A notation at a lower level is therefore needed. The motivation behind the work that has resulted in the CREW PRAM simulator is the study of a special kind of parallel algorithms, not the writing of a compiler. Therefore, the programming notation which has been adopted is a compromise between the following two requirements: (1) It should be as similar to PPP as possible. (2) It must be easy to transform into a format that may be executed with the simulator. The notation is called PIL, which is short for Parallel Intermediate Language. It is described in Section A.

A System Requirements and Installation

The CREW PRAM simulator may be used on most of IDT's SUN computers running the UNIX operating system. Note that the run command may very well be executed locally on a SUN workstation; this reduces your chance of becoming unpopular among the other users.

We assume that you are using the UNIX command interpreter called csh. First you must add the following line at the end of the .cshrc file in your login directory:

    set path = ( cmd $path )

Then make sure that your current directory is your login directory, and type

    source .cshrc

to make the change active.

Use cd to move to a directory where you want to install this version of the CREW PRAM simulator. Type the following two commands, which will install the software in a new directory called ver:

(Footnote: UNIX is a trademark of AT&T.)

Figure A: Part of a simple demo program (left) and part of the produced output (right). (The program allocates a processor set ProcSetA with ASSIGN ... PROCESSORS TO ProcSetA and, inside FOR_EACH PROCESSOR IN ProcSetA, each processor calls Use and prints "Hello World" with T_TO; the output shows the "Hello World" lines together with the simulated time.)

mkdir ver

sigynhomelassedringreleasesversioncmdinstall ver

A Getting Started

Without going into details, this section shows how a very simple example program may be compiled and executed with the simulator.

The current installation of the CREW PRAM simulator has a directory called pil, which contains a collection of PIL programs that illustrate various aspects of the CREW PRAM simulator and parallel algorithms. Consider the very simple parallel algorithm in the file demo.pil. The main part of this program and its output is shown in Figure A.

Be sure that your current directory is src. To compile the PIL program stored in the file called demo.pil under the pil directory, give the command

    pilc demo

After compiling you must use the link command: just type link. Now you are ready to execute the program: type run.

A Carrying On

In this section we describe most of the features in the CREW PRAM simulator, the related tools and the PIL language.

As an example we will use a simple PIL program. Odd-Even Transposition Sort is one of the simplest parallel algorithms for sorting. Descriptions of the algorithm may be found in [Akl] or [Qui]. A PIL program that implements this algorithm may be found in the file oes.pil under the pil directory.

Make a copy of oes.pil to another filename so that you do not destroy the original version. In the following we assume that your copy was named oddeven.pil.
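For reference, a sequential Python simulation of Odd-Even Transposition Sort is sketched below. On a CREW PRAM each compare-exchange within a round would be performed by a separate processor in parallel, so the algorithm runs in O(n) rounds; here the rounds are simply executed one after the other.

```python
def odd_even_transposition_sort(a):
    """Odd-even transposition sort: n rounds, alternately comparing the
    pairs (0,1),(2,3),... and (1,2),(3,4),...  On a PRAM all the
    compare-exchanges of one round happen in parallel; this sequential
    sketch performs them one by one within each round."""
    a = list(a)
    n = len(a)
    for rnd in range(n):
        start = rnd % 2              # even rounds: pairs (0,1),(2,3),...
        for i in range(start, n - 1, 2):
            if a[i] > a[i + 1]:
                a[i], a[i + 1] = a[i + 1], a[i]
    return a

print(odd_even_transposition_sort([5, 1, 4, 2, 3]))  # -> [1, 2, 3, 4, 5]
```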

A The PIL Language: Background

The PIL (Parallel Intermediate Language) notation is based on SIMULA, with a small set of extensions which are necessary for making it into a language well suited for expressing synchronous MIMD parallel programs.

Consequently, nearly all features found in SIMULA may be used in a PIL program. The main exceptions are mentioned in this appendix. The CREW PRAM simulator is implemented with good help of the simulation package DEMOS. DEMOS offers a lot of powerful features, which all are made available through the PIL language. In the context of analysing and measuring parallel algorithms, a large set of procedures for gathering and reporting statistics may be very helpful.

A recent book on SIMULA is An Introduction to Programming in SIMULA by R. J. Pooley [Poo]; an older one is SIMULA BEGIN, written by Graham Birtwistle and the three Norwegians Ole-Johan Dahl, Bjørn Myhrhaug and Kristen Nygaard, who invented SIMULA [BDMN].

DEMOS is written in SIMULA and is described in the book DEMOS - A System for Discrete Event Modelling on Simula, written by Graham Birtwistle [Bir]. This book also gives a small introduction to SIMULA, sufficient for making the reader able to utilise all the powerful features in DEMOS.

The current implementation of the CREW PRAM simulator is based on the SIMULA system offered by Lund Software House AB, Sweden. Specific details of this version of the SIMULA language may be found in [DH, HT].

A Including Files, Macros and Conditional Compiling

Three useful features are made available by letting the PIL compiler run the standard UNIX C preprocessor on the input file. They are explained in the following. Further documentation of these features may be found in [KR], or just type man cpp on any UNIX terminal. Note that it is not possible to pass parameters to the cpp command in this version of pilc.

#include

The reader who is familiar with the C programming language will recognise the syntax used for including files. oddeven.pil includes three files: sorting, oes.var and oes.proc. It is good practice to split your algorithm into several logical units. When the PIL compiler pilc encounters the line

    #include "sorting"

it will look for a file named sorting in the pil directory. If it is found, the contents are copied into the program at the place of the include statement.

#define

oddeven.pil contains one macro, named SHOWodd, which is defined by the #define command. This macro is simply used to make a shorter name for the more precise procedure call to LetShowArray.

Macros may be used to make programs more readable; however, complicated macros, and especially macros with parameters, should be used with care. Macros are in general dangerous.

A few names (ADDRESS, CONSTANT and INFTY) have been defined as macros in a file called pil.h, which is included by the PIL compiler. These names are nothing more than syntactic sugar.

#ifdef, #else, #endif etc.

It is often desirable to operate with several versions of a program with only slight changes between two or more of the versions. Two versions may be stored as two very similar files. However, it is normally much better to let the difference be visible inside one copy of the program. Conditional compiling makes this possible.

Study the lines beginning with # around SHOWodd and SHOWeven in our sample file. Insert the comment character and one space in front of the line #define VERBOSE, recompile, link and run, and see what happens. (SIMULA lines beginning with the comment character followed by a space are interpreted as comments by the SIMULA compiler.)

Excessive use of this feature may make the programs unreadable.

A Program Structure, Procedures and Variable Declarations

When you start the CREW PRAM simulator, one CREW PRAM processor will automatically start execution at the top of the CREW PRAM program. This processor is sometimes called the master processor, even though it is equal to all the other CREW PRAM processors. If you do not allocate and activate processors (discussed in the following two subsections), the program will be executed as a sequential program by this single processor.

Procedures

As for sequential programs, it is important to structure your program by use of procedures. Procedures may be nested. Top-down development and a hierarchy of procedures is warmly recommended, since PIL programs are not easier to develop than traditional sequential programs.

Procedures in PIL have the same format as in SIMULA, with the following exception: In the current version, all procedures must begin with a BEGIN statement and end with a statement of the form

    END_PROCEDURE ProcedureName;

(Footnote: This procedure is defined in the file oes.proc. More on procedures in Section A.)

(Footnote: In standard SIMULA, procedures may consist of a single statement not enclosed by the SIMULA keywords BEGIN and END. This is not allowed in PIL.)

ProcedureName may be any text, but it is recommended to use the name of the procedure.

Variables: make them as local as possible

Declaration of processor local variables follows the SIMULA syntax. All variables which are declared at the outermost level in the PIL program will be local to each processor. To illustrate this, consider a program that consists of two logically different parts, A and B. Part A is executed by processor set A, and part B is executed by the B processors. Part A only uses the variable varA, and B uses only varB. In the current version, all the processors in processor sets A and B will have their own copy of both varA and varB, even though the processors in B do not use varA, and vice versa.

This is not an ideal situation. If the programmer, in the code executed by the A processors, erroneously uses varB, no error message will occur, as one should expect from a proper compiler. The A processors will use their own local copy, which will contain the value zero, at least the first time it is erroneously used.

In large programs with several kinds of processors and many variables, this oddity makes the program less readable with respect to variable usage. In the current implementation it also makes the data part of each processor object unnecessarily large, which in turn slows down the simulation.

However, this problem may to a large extent be alleviated by making variables local where it is possible. Remember that variables in SIMULA, and therefore also in PIL, may be declared as local for an arbitrary block, in addition to local inside a procedure.

As an example, compare oes.pil with the file oesNew.pil shown in Figure  at page . In the latter program, both procedures and variables have been moved inside the parallel part (FOR_EACH PROCESSOR) of the program. This programming style is generally recommended.

Variables in the global memory

Global variables, i.e. variables placed in the CREW PRAM global memory, are rather restricted in the current version of PIL. Section A explains how integer locations in the global memory are allocated and accessed by using the PIL READ and WRITE statements. These statements require the specification of a global memory address. Variables used to hold such addresses should be declared as of type ADDRESS.

(Footnote: The reason for this odd feature is that the implementation of the simulator is strongly based on letting each processor, as a SIMULA object, execute its own copy of the whole PIL program.)

(Footnote: In a way it gives the same problems as traditionally encountered in large sequential programs with a lot of global variables; it is difficult to read from the code which variables are used, or may be used, in various parts of the code.)

A Processor Allocation

All parallel PIL programs must contain at least one statement for processor allocation. The format is:

    ASSIGN Number PROCESSORS TO ProcSet;

This statement allocates Number new processors in a processor group, or set, which is given the name ProcSet. The term processor set is used to denote such a collection of processors. The master processor mentioned above is a processor set containing only one processor, and it is automatically allocated when starting the simulator.

Number may be any expression resulting in an integer. Note, however, that in the current version this expression must be written as one text string without any whitespace characters. ProcSet is the name of a variable which must be declared as type PROCESSOR_SET. See the declaration of the variable ProcSet in the file oes.var, included from oddeven.pil.

A request for processor allocation can only be issued by the single (master) processor which executes the outermost level of the main program. However, the allocation is done by the allocated processors in cooperation, in a binary tree fashion. As a result, allocating n processors takes O(log n) time units. See Section .

Any number of processors may be allocated to a processor set, and any number of processor sets may be allocated and used in a program.
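The doubling behaviour behind the O(log n) allocation time can be sketched in Python. The model (in each step, every already-active processor activates one more, following the binary tree description above) is an illustration of the growth pattern only; the exact step costs in the simulator are not reproduced here.

```python
def allocate_steps(number):
    """Tree-fashion allocation modelled as a doubling process: in each
    step every already-active processor activates one more, so the active
    set doubles until `number` processors are reached.  A sketch of the
    growth pattern, not the simulator's actual allocation code."""
    active, steps = 1, 0
    while active < number:
        active = min(2 * active, number)
        steps += 1
    return steps

print(allocate_steps(8), allocate_steps(1000))  # -> 3 10
```

Allocating a thousand processors thus costs on the order of ten steps, not a thousand.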

A Processor Activation

When a set of processors has been allocated, they may be specified to execute code in parallel by use of so called processor activation. Processor activations must be placed at the outermost level of the main program, with the exception that they may be done inside (possibly nested) FOR and/or WHILE loops. Currently, processor activation may not be placed inside a general BEGIN END block. For further details, study the file LoopTest.pil in the pil directory. The format is:

    FOR_EACH PROCESSOR IN ProcSet;
        AnyCode
    END_FOR_EACH PROCESSOR;

AnyCode will be executed by all the processors in ProcSet in parallel, and it is important to remember that each of these processors will operate on its own set of variables. The processors in ProcSet are numbered, and each processor knows its own number through the local variable p. This variable should not be changed by the PIL program.

The single (master) processor, which is dedicated to executing the outermost level of the main program, will wait outside while AnyCode is executed.

(Footnote: Whitespace is common UNIX terminology for space (blank), tab and newline.)

AnyCode may contain variable and procedure declarations if it is written as a SIMULA block (BEGIN ... END). If it does not contain declarations, it may be written as a list of statements without the enclosing BEGIN ... END. The END_FOR_EACH PROCESSOR statement checks that the processors are synchronous when they leave AnyCode. If not, a warning (SYNCHRONY LOST) is given, see page .

Nesting of processor activations is not allowed. Currently, only one processor set may be active at the same time.
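The semantics of FOR_EACH PROCESSOR (the same code run once per processor, each on its own private set of variables, with p read-only) can be sketched in Python as follows. Numbering the processors from 1 is an assumption of the sketch.

```python
def for_each_processor(proc_set_size, any_code):
    """Model of FOR_EACH PROCESSOR: every processor p in the set runs the
    same code on its own private variables (a fresh dict per processor).
    The master only collects the locals afterwards; p itself is not to
    be changed by the code."""
    all_locals = []
    for p in range(1, proc_set_size + 1):   # numbering from 1 is an assumption
        local_vars = {"p": p}               # each processor's own variables
        any_code(local_vars)
        all_locals.append(local_vars)
    return all_locals

# Each processor stores twice its own number in a local variable:
result = for_each_processor(3, lambda v: v.update(twice=2 * v["p"]))
print([v["twice"] for v in result])  # -> [2, 4, 6]
```

Note that the sketch runs the processors one after the other; the real simulator interleaves them under one global clock.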

A Using the Global Memory

All processors in a CREW PRAM are connected to a global memory. This memory must be used for all kinds of communication between the processors. The global memory is simply a large array of integers. On the simulator, the global memory has fixed size and may be regarded as consisting of two parts: the user stack and the system stack. The system stack is used to store the processor set data structures. The user stack is used to store global variables allocated dynamically in the PIL program. It grows towards higher addresses.

The number of locations read/written from/to the global memory is measured by the simulator and reported at the end of a PIL program execution.

Allocating memory

Global memory must be allocated explicitly before it can be used, see the procedure MemUserMalloc in Section A. In our sample program, memory is allocated in the procedure GenerateSortingInstance in the file sorting.

Local memory should not be allocated explicitly in a PIL program. Just declare your variables, and the SIMULA system does the remaining work.

Accessing global memory

Any processor may read from any location in the global memory at any time. The format is:

    READ VarName FROM AddressExpression;

The contents of the location in the global memory given by AddressExpression is read and placed in the integer variable called VarName. AddressExpression may be an arbitrary integer expression. In the current version, AddressExpression and VarName must not contain any whitespace characters. The PIL keyword READ must be the first text symbol on the line, and the whole statement must be written on a single line (but the line may be longer than usual). In the current version, a READ statement takes one time unit, regardless of the complexity of AddressExpression.

Any processor may write any value to any location in the global memory at any time. But if two or more processors write to the same global memory location in the same time unit, a write conflict will occur. This is an illegal operation on a CREW PRAM, and the CREW PRAM simulator will give an appropriate error message and stop. The format for writing to the global memory is:

    WRITE Value TO AddressExpression;

Value may be any integer expression. Note that for every time unit, all writes to global memory occur after all reads from the memory (see the CREW PRAM property at page ). In the current version, a WRITE statement takes one time unit, regardless of the complexity of AddressExpression. The syntax of the WRITE statement is similar to that of the READ statement, as described above.

Just as standard SIMULA statements, these statements must not end with a semicolon if they are placed just before the SIMULA keyword ELSE.
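The two rules just stated, reads before writes within a time unit and at most one write per location per time unit, can be captured in a small Python sketch. This is an illustration of the CREW semantics, not the simulator's implementation.

```python
class CrewMemory:
    """Global memory with CREW semantics for one simulated time unit:
    all READs are served first, then all WRITEs are applied, and two
    writes to the same address in the same step are a write conflict."""
    def __init__(self, size):
        self.cells = [0] * size

    def step(self, reads, writes):
        """reads: list of addresses; writes: list of (value, addr) pairs.
        Returns the values read.  Raises on a write conflict."""
        results = [self.cells[a] for a in reads]   # reads see the old values
        targets = [a for _, a in writes]
        if len(targets) != len(set(targets)):
            raise RuntimeError("WRITE conflict: two processors wrote "
                               "the same location in the same time unit")
        for value, addr in writes:
            self.cells[addr] = value
        return results

mem = CrewMemory(4)
# Concurrent reads of cell 0 are fine; the write lands after the reads:
print(mem.step(reads=[0, 0], writes=[(7, 0)]))  # -> [0, 0], the old value
print(mem.cells[0])                             # -> 7
```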

A FOR and WHILE Loops

A FOR loop should be written according to the following format:

    FOR LoopVar := FirstVal TO LastVal DO
    BEGIN
        AnyCode
    END_FOR;

The keyword TO is used instead of the corresponding STEP 1 UNTIL of SIMULA, to make PIL closer to the PPP syntax. The whole FOR ... DO must be contained in a single line. In the current version, LoopVar, FirstVal and LastVal must be written as one text string without any whitespace characters, and the assignment operator must be enclosed by spaces.

WHILE loops have a very similar form:

    WHILE LoopCondition DO
    BEGIN
        AnyCode
    END_WHILE;

The LoopCondition may contain whitespace characters; however, the whole WHILE ... DO statement must be contained in a single line, and DO must be the last symbol on that line.

A Built-In Procedures and Variables

The CREW PRAM simulator offers various built-in procedures and variables which will be used in most PIL programs.

Interaction with the simulator

- Use(TimeUnits);
  This procedure is used to represent time used by the calling processor. TimeUnits is an arbitrary integer expression.

- Wait(TimeUnits);
  Does exactly the same as Use, but should be used to make it possible to distinguish idle time from active work.

- IntegerVar := ClockRead;
  ClockRead returns the current time as an integer value. Remember that the synchronous property of a CREW PRAM implies that this clock may be perceived as one global clock which is always correct for all the processors.

- ClockReset;
  This call sets the clock to zero. It is typically called after having done uninteresting initialisation work etc., just before the computation which should be measured starts.

- SYNC;
  This procedure makes all the processors in the active processor set synchronise, so that they all return from the call to SYNC simultaneously. This explicit resynchronisation is implemented as a binary tree computation with O(log n) time consumption, as outlined in Section . Its use together with the CHECK procedure in a recommended top-down approach for CREW PRAM programming is discussed in Section . In the current version of the simulator it is assumed that SYNC is called by all the processors in the active processor set.

- CHECK;
  Checks whether all the processors in the active processor set are synchronous at the place in the program where CHECK is called. See also the description of the procedure SYNC above.

- Warning(Text);
  Prints WARNING followed by Text. The program execution continues.

- Error(Text);
  Prints ERROR followed by Text, and stops the program execution.

Using the global memory

See also Section A.

- AddressVar := MemUserMalloc(IntegerExpr);
  Allocates IntegerExpr consecutive memory locations on top of the user stack in the global memory. The address of the first location is returned.

- MemUserFree(IntegerExpr);
  Frees IntegerExpr consecutive memory locations from the top of the user stack in the global memory.

- ArrayPrint(StartAddressExpression, Size);
  Prints Size consecutive locations in global memory, starting at (and including) StartAddressExpression.

- PUSH(IntegerExpr);
  Allocates a new location on top of the user stack and writes the value of IntegerExpr in that location.

- IntegerVar := POP;
  POP returns the value stored in the location on top of the user stack, and frees the location.

- UserStackPtr
  This is a globally accessible variable of type INTEGER which points to the next free location on the user stack.

- MemN_Reads and MemN_Writes
  These are DEMOS objects of type COUNT which measure the number of READ operations and WRITE operations to the global memory. The READ count may be reset by inserting MemN_Reads.RESET in your PIL program, and its current value may be reported by MemN_Reads.REPORT. Similarly for WRITE operations.

The procedures PUSH and POP may be used together with the UserStackPtr to pass values from a single processor to a large number of processors in a very efficient way. See the file test.pil for a small example of how this may be done to implement parallel parameter passing, or study the main program and the procedure SetUp in the file systolicdemo.pil, explained in Section A, for a more realistic example.
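The parameter-passing idea can be sketched in Python: the master pushes one value, every processor computes the same address just below the stack pointer and reads it concurrently, which is legal on a CREW PRAM since concurrent reads are allowed. The class is an illustration whose method names are only assumed to match the PIL built-ins.

```python
class UserStack:
    """Sketch of parallel parameter passing via the user stack: the
    master PUSHes a value; all processors then derive the same address
    from the stack pointer and READ it concurrently (legal on a CREW
    PRAM).  Illustrative only."""
    def __init__(self):
        self.cells = []

    @property
    def ptr(self):            # next free location, like UserStackPtr
        return len(self.cells)

    def push(self, value):
        self.cells.append(value)

    def pop(self):
        return self.cells.pop()

stack = UserStack()
stack.push(99)                       # master publishes one parameter
addr = stack.ptr - 1                 # every processor computes this address
received = [stack.cells[addr] for _ in range(8)]  # 8 concurrent reads
stack.pop()                          # master frees the location again
print(received)  # -> [99, 99, 99, 99, 99, 99, 99, 99]
```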

Generating random data and collecting statistics

A set of global variables has been included in the CREW PRAM simulator to make it easy to use the DEMOS random number generators and the DEMOS facilities for collecting statistical data. These global variables make it possible to create the specific DEMOS objects when they are needed (probably done by the master processor) and to use them from any processor, without interfering with the simulated time or other quantities that are measured by the simulator.

These variables are of SIMULA type REF(DemosClassName) and are named Global(DemosClassName)(IndexVal). DemosClassName may be one of COUNT, TALLY or HISTOGRAM, with IndexVal in the given range, or one of CONSTANT, NORMAL, NEGEXP, UNIFORM, ERLANG, EMPIRICAL, RANDINT, POISSON or DRAW, with IndexVal in the given range.

An extensive example is found in the file sync.pil in the pil directory. Consult also the DEMOS book [Bir].

Tracing and output

Some programmers have the opinion that making code for producing output in SIMULA requires a lot of typing. Also, when debugging programs, it is convenient to be able to trace out the values of variables during the program execution. It should be possible to turn such program tracing off without removing the code which produces the tracing. It may be useful to keep it even after you believe you have made your last change to the program: if a later modification introduces a new bug, you will probably have good use for the old tracing code.

The CREW PRAM simulator contains a set of simple procedures which may be a good help in making this kind of tracing. The routines may also very well be used for normal output from the program. All these procedures start with T_.

- T_Off;
  Turns tracing off. Tracing procedures called after this procedure will do nothing.

- T_On;
  Turns tracing on. T_On and T_Off operate on a global flag (BOOLEAN T_Flag) which controls tracing for all processors.

- T_ThisIs;
  Prints out the variable name of the processor set and the processor number for the processor performing this procedure. This is very useful if you have lost control over your processors.

- T_Time;
  Reports the current simulation time.

- T_TITITO(Text, IntegerExpr, Text, IntegerExpr, Text);
  This is the most complicated of a set of similar procedures used to trace out values from the program in a compact manner.

- T_TITIO, T_TITI, T_TITO, T_TIO, T_TBO, T_TI, T_TO, T_T
  These procedures are all similar to T_TITITO. The part of the name following the underscore character is intended to be a short way to specify the parameters which must be given to the procedure: T is short for Text, I for Integer, and B for Boolean. A procedure with a name ending in O calls OUTIMAGE before it returns.

All lines in the simulator output produced by the T_ procedures start with a fixed character in the first column, making them easy to filter away from the other output.
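The filtering convention can be sketched in Python; the '*' marker character below is a placeholder, since the actual character used by the simulator is not reproduced here.

```python
def t_to(text, tracing_on=True, prefix="*"):
    """Sketch of the T_ trace procedures: every trace line starts with a
    fixed marker character in the first column, so trace output can be
    filtered from normal output.  The '*' marker is an assumption."""
    if not tracing_on:
        return None           # tracing turned off: produce nothing
    return f"{prefix} {text}"

lines = [t_to("Hello World"), "normal output", t_to("p=3 done")]
trace_only = [ln for ln in lines if ln and ln.startswith("*")]
print(trace_only)  # -> ['* Hello World', '* p=3 done']
```

A one-line grep on the marker character then separates trace lines from the program's real output.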

Miscellaneous

- IsOdd(IntegerExpr)
  Returns TRUE if IntegerExpr evaluates to an odd number.

- IsEven(IntegerExpr)
  Returns TRUE if IntegerExpr evaluates to an even number.

- log(IntegerExpr)
  Returns the value of log2(IntegerExpr).

- power(IntegerExpr)
  Returns the value of 2^IntegerExpr.

(Footnote: I/O in SIMULA is buffered; OUTIMAGE empties the output buffer.)
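Python equivalents of these helpers are immediate. Note that log is taken here to be the integer (floor) base-2 logarithm, an assumption consistent with its use in binary-tree computations.

```python
def is_odd(x):  return x % 2 == 1
def is_even(x): return x % 2 == 0

def log2_int(x):
    """Floor of the base-2 logarithm of a positive integer
    (assumed semantics of the PIL built-in log)."""
    return x.bit_length() - 1

def power2(x):
    """2 raised to the power x (the PIL built-in power)."""
    return 1 << x

print(is_odd(7), is_even(10), log2_int(1024), power2(10))
# -> True True 10 1024
```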

A The Development Cycle

It is now time to give some more details of the edit-compile-link-run cycle.

A pilc

The PIL compiler, pilc, consists of two main phases. The first phase produces a SIMULA file, and the second phase compiles this file together with the source code for the simulator. The first phase is implemented as a UNIX pipeline, whose most important parts are the C preprocessor cpp and a filter called PILfix. These interior details are mentioned here to explain why error messages from the pilc command may come from at least three main sources. This will now be demonstrated by introducing some small errors in our sample file oddeven.pil.

cpp errors

Insert a space before the s in sorting in the include statement at the beginning of the file, and try to compile the file with pilc. The output will then contain a line looking like

    tmpppl: Can't find include file sorting

(Footnote: tmpppl is a temporary file used by pilc; the number given is the line number in that file.)

Early error messages like this seldom come alone. In our example we also get an error message from the SIMULA compiler. In general, it is seldom worthwhile to invest too much effort in understanding the subsequent error messages.

When encountering error messages from cpp, remember that man cpp is available on your terminal.

Before you proceed, remove the error introduced above.

(Footnote: PILfix does the main work of translating PIL specific features into SIMULA code. PILfix is itself a pipeline of three simple filters, all written in the pattern scanning and processing language gawk (GNU awk).)

(Footnote: This and all line numbers in later examples are very dependent on the actual version of the simulator and the example file you are using. Please do not expect them to match what you get on your screen.)

PILfix errors

During the progress of the compilation, PILfix produces an indented listing of the procedures found in the program. This is valuable information when looking for errors in the block structure of the program.

In the current version, not all syntax errors that may occur in PIL programs are found and reported by PILfix. However, most errors are detected in the next phase, the SIMULA compiler.

Insert a space after the n in the ASSIGN statement of oddeven.pil and compile. We get the following error message:

PILfix ERROR at internal line no.
PIL line was: ASSIGN n PROCESSORS TO ProcSet
Error message: ASSIGN statement, improper syntax

Other errors found by PILfix are demonstrated by test files in the pil directory.

simula errors

simula is the name of the SIMULA compiler DH It is run on the simulator

source les in the src directory which includes the le PILalgsim whichis

pro duced by the subphases discussed ab ove It is seldom necessary for the PIL

programmer to lo ok at PILalgsim

Most error messages from simula are selfexplanatory and will not b e discussed

here However there is one kind of error which often o ccurs in SIMULA and PIL

programs that may pro duce a frightening amount of error messages This typ e of

error is often called error in block structure First let us demonstrate howsuchan

error unwillingly maybeintro duced in your program

The traditional format for comments in SIMULA is the keyword COMMENT followed by any number of lines, until the comment is ended with a ";". If you forget to terminate the comment by ";", the comment will delete all code up till the next ";". This may cause an error message, but may also result in a faulty program where important parts are silently omitted.

In our sample file, remove the ";" at the end of the comment just before the FOR statement. This will produce a lot of error messages referring to names in other files than your program (PILalg.sim):

Error in file util.sim (at the beginning of the listing)
Error: Too few ends, extra end inserted (at the end)

The file name util.sim refers to one of the CREW PRAM simulator source files.



(Footnote: The SIMULA version used by the CREW PRAM simulator allows "!" instead of COMMENT.)

Whenever this kind of error message occurs, the reason is probably an error in the block structure of the SIMULA (PIL) program. In this case, the last part of the error listing tells the reason; this does not generally happen.

The following habit is warmly recommended for all block-oriented languages, to avoid this kind of error: when typing a comment of the form COMMENT ... ; or ! ... ;, or when typing a BEGIN ... END structure, always type the termination part of the syntax before you type the insides.

How to map line numbers given by the SIMULA system to line numbers in your PIL algorithm is discussed under the run command below.

A link

Since the current version of the CREW PRAM simulator compiles the PIL program together with the simulator as one source file, it is very seldom that you get errors from the link command that have not been reported by the compiler.

A run

SIMULA has a powerful run time system, and the PIL programmer takes advantage of this: together with a run time error message, the source line number where the failure occurred will be printed.

Line numbers

The line numbers in error messages from the SIMULA compiler or run time system will not match the line numbers in the PIL program. The reason is that the PIL program is translated to a larger SIMULA program, which is included in the simulator code.

However, it is still easy to find the PIL line corresponding to a SIMULA line. The file pram.lis in the src directory contains the listing produced by pilc and corresponding to the code executed by run. Find the line number in this file where the error occurred. In most cases you will, by inspection of pram.lis, understand where the corresponding line in the PIL program is. For large programs it may be practical to use an editor with two buffers, one containing your PIL program and one containing pram.lis. Then the editor will probably do the pattern matching faster and more reliably.

Error messages from the simulator

We will now briefly explain the run time error messages from the simulator.

(Footnote: A brute force method for locating such errors is a systematic and repeated exclusion of parts of the program, followed by recompilation. The include statement may be helpful in doing this.)

(Footnote: There are in fact two columns with line numbers in this listing; the one to the right is the right one for this purpose.)

- WRITE CONFLICT
This occurs if two or more processors perform a write operation on the same location (address) in the global memory. The CREW PRAM simulator will stop the program at the end of the time unit when the conflict occurred. See the example file test.pil.

- SYNCHRONY LOST
This is given as a warning from the simulator when the CHECK procedure detects that not all processors in the active processor set are synchronous. The SIMULA source line number where CHECK was called is included in the message. An example is given in test.pil. Note that the CHECK procedure is called as part of the END_FOR_EACH PROCESSOR statement.

- insufficient memory
The CREW PRAM simulator implementation is based on allocating memory dynamically as storage is needed during the execution. When running PIL programs using a large number of processors, the following message may appear:

Runtime Error at line (Block at line)
No storage freed by garbage collector

The lower limit for the number of processors giving this run time error message may be made larger by using the -m option for the SIMULA compiler. Consult [DH] and make your own version of the pilc command, but do not forget that you are using a multi-user system.

A Debugging PIL Programs Using simdeb

Perhaps the greatest advantage of executing the PIL program as a SIMULA program is that it implies the availability of the SIMULA debugger simdeb [HT]. Notice that the debugger operates at the SIMULA source code level.

To help the PIL programmer getting started with the debugger, we will now give some examples of commands that have proven useful. The various commands may be combined, as shown in some of the examples.

The SIMULA debugger is started on your last compiled and linked PIL program by issuing the command debug.

help
This command produces a listing of the available debugger commands.

help output
This is the format for obtaining more information about a specific command, in this case the output command.

proceed
Starts execution of the program.

at LineNo stop
The at command is used to define that some debugger actions should take place when the program executes the specified source line number LineNo. In this case we want the program to stop at LineNo, i.e. the command defines a breakpoint. LineNo should be a line number in pram.lis, as discussed in Section A.

output VarName
Prints the value of the specified variable.

output ObjectRef.Attribute
Prints the value of the specified attribute.

output ObjectRef all
Prints all attributes in the object pointed to by ObjectRef.

at LineNo if p = ... then stop
Defines a breakpoint exclusively for one processor in the current processor set.

at LineNo if IntTime > ... then chain proceed
Displays the call chain at LineNo when the simulator time has passed the given value, and proceeds the execution.

at LineNo if p = ... then output all proceed
Prints the value of all variables in the current block for the given processor and proceeds.

trace on FirstLine LastLine
Makes the debugger print each SIMULA source line executed in the range given by the two line numbers.

Note that most of the commands may be abbreviated (ou = output, pro = proceed, etc.). Though this debugger is very powerful, nothing can replace the value of fresh air, systematic work, a lot of time, and a cup of coffee.

A Modelling Systolic Arrays and other Synchronous Computing Structures

"I know of only one compute-bound problem that arises naturally in practice for which no systolic solution is known, and I cannot prove that a systolic solution is impossible."
— H. T. Kung in Why Systolic Arrays [Kun]

This section gives some hints on how systolic arrays and similar systems may be simulated with the help of the CREW PRAM simulator. The text explains how various kinds of systolic arrays may be modelled. It is hoped that the reader will understand how these ideas may be adopted for simulation of other synchronous, globally clocked computing structures.

A Systolic Arrays, Channels, Phases and Stages

The CREW PRAM simulator was originally made for the sole purpose of simulating synchronous CREW PRAM algorithms. However, a few small extensions to the simulator, a PIL package and two examples have been made for demonstrating how the simulator can be used for modelling systolic arrays. The adopted solutions are rather ad hoc; more elegant solutions are possible by doing more elaborate redesigns of the simulator.

The file systool.pil in the pil directory contains PIL code that provides some help in simulating systolic arrays. Its use will be documented by two very simple examples. First, it is necessary to explain some central concepts.

Systolic arrays

An important paper on systolic arrays is H. T. Kung's Why Systolic Architectures [Kun]. It is a comprehensive, compact and readable introduction to the topic.

A systolic array (or system) is a collection of synchronous processors, often called processing elements or cells, connected in a regular pattern. Information in a systolic system flows between cells in a pipelined fashion. Each cell is designed to perform some specialised operation. When implementing systolic systems in VLSI or similar technologies, it is important to have simple and repeated cells which are interconnected in a regular pattern with mostly short (local) communication paths.

Such modularity and regularity is not a necessity when simulating on the CREW PRAM simulator. Nevertheless, it may be of help in mastering the complexity involved in designing parallel systems.

Channels

Systolic arrays pass data between processing elements on channels. These are normally unidirectional, with one source cell and one destination cell. When simulating a systolic array on the CREW PRAM model, it is natural to let each channel correspond to a specific location in the global memory. This memory area must be allocated before use. Further, each processing element must know the address of every channel it uses during the computation. The channels offered by the simulator may be used in both directions, and they may have multiple readers and multiple writers. If two or more cells write (send) data to the same channel in the same stage, the result will be unpredictable.

(Footnote: This is only true for integer channels. When using channels for passing real numbers, the current version of the simulator provides a set of preallocated channels.)

Three phases in a stage

A systolic system performs a number of computational steps, here called stages. One stage consists of every processing element doing the following three phases:

1. ReadPhase: Each cell reads data from its input lines (channels).

2. ComputePhase: Each cell performs some computation. This is often a simple operation, or just a delayed copying (transmitting) of data.

3. WritePhase: Each cell writes new data values, computed in the previous phase of this stage, into its output lines (channels).

All cells will finish stage i before they proceed to stage i+1. In other words, we have synchronous operation.

These three phases should be considered as logical phases. In an implementation it is irrelevant whether phase 1 of stage i+1 is overlapped with phase 3 of stage i, as long as the data availability rule given below is not violated. Further, phases 2 and 3 may be implemented as overlapping, or as one phase.

Data availability rule: The data written in phase 3 of stage i cannot be read in phase 1 earlier than in the following stage (i.e. in stage i+1 and later stages).
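The stage discipline and the data availability rule can be sketched with a double-buffered channel update. This is a hypothetical illustration in Python, not the simulator's own code (which is SIMULA/PIL): writes made during stage i are buffered and only become visible to reads from stage i+1 onwards.

```python
# Sketch of a synchronous stage loop obeying the data availability rule:
# a value written in the write phase of stage i is first readable in the
# read phase of stage i+1.

def run_stages(cells, channels, n_stages):
    """cells: list of functions f(readable_channels) -> dict of writes."""
    current = dict(channels)          # values readable in this stage
    for stage in range(1, n_stages + 1):
        pending = {}                  # values written during this stage
        for cell in cells:
            inputs = dict(current)    # ReadPhase: sees only old values
            pending.update(cell(inputs))  # Compute + WritePhase, buffered
        current.update(pending)       # writes become visible in stage i+1
    return current

# A two-cell pipeline copying a value one channel to the right per stage.
cells = [
    lambda ch: {"c1": ch["c0"]},
    lambda ch: {"c2": ch["c1"]},
]
result = run_stages(cells, {"c0": 7, "c1": 0, "c2": 0}, n_stages=2)
# After two stages the value on c0 has propagated to c2.
```

Because each cell reads only the snapshot taken at the start of the stage, it makes no difference in which order the cells execute within a stage, which is exactly the freedom the rule is meant to guarantee.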

A Example: Unit Time Delay

In most (if not all) systolic arrays, the new values computed from the inputs at the start of stage i are written on the output lines at the end of the same stage, and are available in the next stage. We say that all the cells operate with unit time delay, and one time unit is the same as the duration of one stage.

The example file systolicdemo.pil models a simple linear array working in this manner. The structure is outlined in Figure A. The array does nothing more than copying the received data from left to right in a pipelined fashion. The regularity of this example makes it relatively easy to let n, the number of cells in the linear array, be a parameter to the model. In the simulation program, each kind of cell is described as a procedure, see Figure A.

This artificial example has one special processing element for producing input to the array (InputProducer), and one for receiving the output from the last cell in the array (OutputConsumer). They are not a part of the systolic array, but a convenient way of modelling its environments.

The channels which are used for communication between the cells should be used by the following procedures:

- SEND(ChannelAddressExpression, IntegerExpression)

- IntegerVariable := RECEIVE(ChannelAddressExpression)

[Figure: a linear array of cells (Cell 1, Cell 2, ..., Cell n), each a LinearCell, with an InputProducer feeding the first cell and an OutputConsumer reading the output of the last.]

Figure A: Linear array simulated in the file systolicdemo.pil

- IntegerVariable := RECEIVE_ZEROFY(ChannelAddressExpression)
The value stored at the channel is set to zero after it has been passed to the caller of this procedure.

It is also possible to use READ and WRITE on the channels (see Section A), but it is not recommended.
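The channel operations above can be mimicked in a few lines. This is a hypothetical Python sketch, not the simulator's implementation; the methods mirror SEND, RECEIVE and RECEIVE_ZEROFY, and a second send to the same channel within one stage is flagged (for ordinary memory locations the simulator reports this as a WRITE CONFLICT; for channels the text above only promises an unpredictable result).

```python
# Sketch of channels as locations in a shared memory, with per-stage
# detection of multiple writers to the same channel.

class Channels:
    def __init__(self, size):
        self.mem = [0] * size
        self.written = set()          # channels written in the current stage

    def new_stage(self):
        self.written.clear()

    def send(self, addr, value):
        if addr in self.written:
            raise RuntimeError(f"write conflict on channel {addr}")
        self.written.add(addr)
        self.mem[addr] = value

    def receive(self, addr):
        return self.mem[addr]

    def receive_zerofy(self, addr):
        value = self.mem[addr]
        self.mem[addr] = 0            # channel is cleared after the read
        return value

ch = Channels(4)
ch.send(2, 99)
assert ch.receive(2) == 99
assert ch.receive_zerofy(2) == 99 and ch.receive(2) == 0
```

Multiple readers are unproblematic (this is a CREW model); only concurrent writers within one stage need the bookkeeping.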

Measuring the length of the phases

As described above, each stage consists of three phases. To achieve proper synchronisation of the array, the user must specify how the operation performed in each stage is distributed on the three phases. This must be done by inserting the procedure call StartComputePhase at the end of the read phase, and the call StartWritePhase at the end of the computation phase; see the procedure LinearCell in Figure A.

This makes it possible for the package systool.pil to measure the length of the three phases. At the end of the computation, a call to the procedure ReportSystolic will produce a listing of the longest and shortest read, compute and write phase performed, and also the processor numbers of the processing elements which performed those phases. This information may be used to localise bottlenecks in the system. Remember that the processing elements should operate in synchrony. The length of a stage must be the same for all cells, simple or complicated. The structure may be made faster by moving work from the most time consuming cells to underutilised cells.
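The kind of bookkeeping that such a report needs can be sketched as follows. This is a hypothetical Python illustration of ReportSystolic-style output, not the package's own code: per cell we record the length of each phase, then report the extremes per phase, which points directly at the bottleneck cells.

```python
# Sketch: find the longest and shortest read/compute/write phase and the
# cells that performed them, as a bottleneck-hunting aid.

def report(phase_lengths):
    """phase_lengths: {cell_no: {"read": t, "compute": t, "write": t}}"""
    rows = []
    for phase in ("read", "compute", "write"):
        times = {cell: t[phase] for cell, t in phase_lengths.items()}
        longest = max(times, key=times.get)
        shortest = min(times, key=times.get)
        rows.append((phase, times[longest], longest, times[shortest], shortest))
    return rows

stats = {1: {"read": 2, "compute": 10, "write": 2},
         2: {"read": 2, "compute": 3, "write": 2}}
for phase, tmax, cmax, tmin, cmin in report(stats):
    print(f"{phase:7s} longest {tmax} (cell {cmax}), shortest {tmin} (cell {cmin})")
```

Here the compute phase of cell 1 dominates; since every stage lasts as long as the slowest cell, moving work from cell 1 to cell 2 would shorten every stage.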

(Footnote: It is probably the length of the compute phase that is most interesting. Receiving and sending of data are often done in parallel.)

In this context, time is measured in number of CREW PRAM simulator time

[Figure: a LinearCell (Cell i) with input channel inp on the left and output channel out on the right.]

PROCEDURE LinearCell;
BEGIN
  INTEGER val;
  val := RECEIVE(inp);
  StartComputePhase;
  ! This element has nothing to compute;
  StartWritePhase;
  SEND(out, val);
END_PROCEDURE LinearCell;

Figure A: Simple cell in a linear systolic array (left) and its description (right)

units, and is specified by the user by calls to the Use procedure. The time used by the procedures for accessing the channels described above is defined by the constants t_SEND etc. at the start of the file systool.pil. At the same place you may find constants that specify the maximum lengths that are tolerated for each of the three phases. If your system exceeds one of these limits, an appropriate error message will be given. The limits may be changed by the user.

The modelling of time consumed in the phases may be done at various degrees of detail, and it may be omitted. However, the calls to StartComputePhase and StartWritePhase may not be omitted.

Describing the processing elements (cells)

The procedure LinearCell describes the operation, performed in each stage, of each cell in the linear array. Therefore, variables declared in the procedure are new (initialised to zero) at the start of each stage. Data items that must live during the whole computation must be declared outside the procedures, and before the keyword BEGIN_PIL_ALG. These variables are local to each processing element. See for example the variable CellNo in the file systolicdemo.pil, which stores the number of each cell in the linear array.

The variables inp and out are also of this kind; they store the address of the input channel and output channel of every cell in the linear array. The initialisation of these variables for each processing element corresponds to building the hardware. This is done by the procedure called SetUp, which is called by all processing elements.

SetUp may require knowledge of global variables such as the size of the linear array, n. This may be done by passing the value on the UserStack; see the calls to PUSH in the main program and the first part of the SetUp procedure.

General program structure

Figure A outlines the main parts of a general systolic array simulation program using systool.pil. It is hoped to be self-explanatory. The reader should note that the IF tests in the main loop (FOR Stage := ...) must be disjunctive. These tests are used to select what kind of cell a processor should simulate during the stage; one processor may only act as one cell type during a stage.

Figure A outlines only one of several possible ways of structuring a PIL program for simulating systolic arrays. You should not do changes to the structure shown in Figure A unless you really need to do it, and only after having studied the details in systool.pil.

Please run the example systolicdemo.pil and study the output. Important parts of the output are the lines showing the start of each stage and the contents on the communication channels. The latter is produced by the procedure TraceChannels. This procedure should be tailored to the actual structure being simulated. The CREW PRAM simulator time information (TIME) is of less value, since we have defined a new time unit (stage) at one level higher.

A Example: Delayed Output and Real Numbers

In the previous example, every cell used exactly one stage to produce new outputs from the inputs. There are cases where this restricts the degree of abstraction that may be used in a model.

Delayed output

Consider the linear array of the previous example. This structure delivers an element as output n stages after it was received as input. If this is the sole function of what we want to model, it should be possible to represent it by a single processing element with delay n stages, instead of a linear array consisting of n unit delay cells.

This may be achieved by offering an additional parameter, delay, in the SEND procedure. The procedure call SEND(addr, val, delay) will do the same as the SEND described above in Section A, but the data element will be available on the channel at the start of stage i + delay, and as long as it is not overwritten by another SEND call or set to zero by a RECEIVE_ZEROFY call.

Once we have this possibility, we may use a single processor to represent a composite structure which makes several computations of varying complexity and delivers output with differing delays. This possibility may be helpful in top down development of systolic arrays and similar structures.
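A delayed channel of this kind is easy to model with a queue of pending deliveries. The following is a hypothetical Python sketch (not the systool.pil implementation): a value sent in stage i with delay d becomes visible on the channel at the start of stage i + d, so one cell can stand in for a d-stage pipeline.

```python
# Sketch of a channel supporting SEND with a delay parameter: the value
# sent in stage i with delay d first appears at the start of stage i + d.

class DelayedChannel:
    def __init__(self):
        self.value = 0
        self.pending = {}             # arrival stage -> value

    def send(self, stage, value, delay=1):
        self.pending[stage + delay] = value

    def begin_stage(self, stage):
        if stage in self.pending:     # delivery due at this stage
            self.value = self.pending.pop(stage)

    def receive(self):
        return self.value

ch = DelayedChannel()
ch.send(stage=1, value=42, delay=3)   # models a 3-stage delay element
for stage in range(2, 5):
    ch.begin_stage(stage)
# The value first appears at the start of stage 4 (= 1 + 3).
assert ch.receive() == 42
```

With delay=1 this reduces to the unit-time-delay behaviour of the previous example.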

If you insert

#define DELAYS_USED

(Footnote: Especially, be very careful when changing the value of the variable Stage, and the placement of StartSystolic, EndSystolic or StartReadPhase. Do not forget StartComputePhase or StartWritePhase, and do not change their order.)



(Footnote: This information may be removed by using the filter removeTIME on the output produced by the run command; just type run | removeTIME.)

#include "systool.pil"
PROCESSOR_SET ProcSetName;
... declaration of variables for the cells

PROCEDURE SetUp;
BEGIN
  ... read eventual parameters from the UserStack
  ... initialize variables and channel-addresses for the various cell types
END_PROCEDURE SetUp;

PROCEDURE CellDescription-1;   ... one for each cell type
BEGIN
  ... declaration of variables used during one stage
  ... actions in read phase (i.e. calls to RECEIVE)
  StartComputePhase;
  ... actions in compute phase
  StartWritePhase;
  ... actions in write phase (i.e. calls to SEND)
END_PROCEDURE CellDescription;
... more cell descriptions

BEGIN_PIL_ALG
  ... assign processors to ProcSetName;
  ... allocate (integer) channels
  ... PUSH eventual parameters on UserStack;
  FOR_EACH PROCESSOR IN ProcSetName
    SetUp;
    CHECK;
    StartSystolic;
    FOR Stage := 1 STEP 1 UNTIL ... DO
    BEGIN
      StartReadPhase;
      IF p = 1 THEN T_TIO("****** Start of stage no", Stage);
      IF p = ... THEN CellDescription-1;
      IF p = ... THEN CellDescription-2;
      ...
    END_FOR;
    EndSystolic;
    ReportSystolic;
  END_FOR_EACH PROCESSOR;
END_PIL_ALG

Figure A: General format of program for simulation of systolic arrays using systool.pil

before the place where systool.pil is included, you will get access to a version of the SEND procedure that requires a delay given as parameter:

- SEND(ChannelAddressExpression, IntegerExpression, DelayExpression)
DelayExpression must evaluate to an integer >= 1. The value 1 corresponds to unit time delay, as discussed in the previous example.

- The procedures for receiving data are as before.

It is also possible to use READ and WRITE on the channels, but it is not recommended.

Using real numbers

The CREW PRAM simulator was originally made for handling integers only. Upon request, a simple and ad hoc extension has been made to make it possible to operate on real numbers.

A fixed number of preallocated channels for passing real numbers is provided by the simulator. In the current version, channels are provided with a fixed range of addresses. The only difference from the ordinary integer channels is that the real channels cannot be allocated, or operated upon by READ and WRITE.

In the current version, real numbers may only be used together with delays. If you insert

#define REALS_USED

before the place where systool.pil is included, you will get access to the following procedures for passing real numbers between the processing elements:

- SEND_REAL(ChannelAddressExpression, RealExpression, DelayExpression)
The delay must be >= 1.

- RealVariable := RECEIVE_REAL(ChannelAddressExpression)

- RealVariable := RECEIVE_REAL_ZEROFY(ChannelAddressExpression)

In addition, we have:

- T_TR(Text, RealExpression)
Prints the text followed by the value of RealExpression.

- RealArrayPrint(ChannelStartAddressExpression, Size)
Prints the contents of Size consecutive real channels, starting at (and including) ChannelStartAddressExpression.

(Footnote: WRITE corresponds to SEND with delay 1.)

Both delayed output, channels passing real numbers, and zerofying are demonstrated in the example file systolicdemo.pil. This artificial example is an extension of the previous example. We have introduced one real channel going down from each of the n cells in the linear array. Further, we have introduced a new cell type, RealCell, used in two instances. See the ASCII figure at the start of the file systolicdemo.pil.

RealCell is used to illustrate receiving with and without zerofying. LinearCell has been extended to compute a real number which is sent downwards. The instances of LinearCell demonstrate various delays.

(Footnote: Note how time (Stage), state (val) and sender (CellNo) may be encoded in a number.)
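The footnote's idea of packing the stage, the value and the sender's cell number into a single number can be sketched as follows. This is a hypothetical illustration (in Python, with an arbitrary choice of two decimal digits per small field), not the encoding used in systolicdemo.pil:

```python
# Sketch: encode (stage, val, cell_no) into one number and recover the
# fields, assuming val and cell_no each fit in two decimal digits.

def encode(stage, val, cell_no):
    return stage * 10_000 + val * 100 + cell_no

def decode(number):
    return number // 10_000, (number // 100) % 100, number % 100

packed = encode(stage=7, val=42, cell_no=3)
assert decode(packed) == (7, 42, 3)   # fields recovered intact
```

Decimal fields keep the packed number human-readable in trace output; bit-shifting would serve equally well when readability does not matter.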

Appendix B

Cole's Parallel Merge Sort in PIL

This is a complete listing of Cole's parallel merge sort algorithm implemented in PIL for execution on the CREW PRAM simulator.

B.1 The Main Program

! Cole.pil
! Cole's Parallel Merge Sort algorithm.
! Complete implementation in PIL on the CREW PRAM simulator, with detailed
! time modelling.
! Author: Lasse Natvig (Email: lasse@idt.unit.no)
! Started:        Finished:        No known errors.
! Last changed:   (improvements)
! Copyright (C) Lasse Natvig, IDT/NTH, Trondheim, Norway
!
! Note that, in general, UP is short for Up array, SUP is short for
! SampleUp array, and NUP is short for NewUp array;

#include "Sorting"
#include "Cole.var"
#include "Cole.mem"    ! specifies memory usage;
#include "Cole.proc"
#include "Cole.proc"

BEGIN_PIL_ALG
! Loop for testing the algorithm on several problem instances;
FOR mm := ... TO ... DO
BEGIN
  nn := power(mm);
  addr := GenerateSortingInstance(nn, RANDOM);
  ClockReset;
  READ nn FROM UserStackPtr;
  UPprocs := UPsize(nn);    ! See Cole.proc;
  SUPprocs := SUPsize(nn);  ! See Cole.proc;
  Use(tLOAD + tADD + tADD + tSTORE);
  NoOfProcessors := UPprocs + SUPprocs + SUPprocs;
  Use(tMalloc + tSTORE + tLOAD + tSHIFT + tSUB + tMalloc + tSTORE);
  UPstart := MemUserMalloc(NoOfProcessors);  ! storage for Arrays;
  addr := MemUserMalloc(NoOfProcessors);     ! storage for ExchangeArea;
  addr := MemUserMalloc(nn);                 ! storage for NodeAddressTable;
  addr := MemUserMalloc(nn);                 ! storage for NodeSizeTable;
  ! Push variables on the user stack as parameters to the SetUp procedure;
  Use(tPUSH);
  PUSH(addr); PUSH(addr); PUSH(addr);
  PUSH(UPprocs); PUSH(SUPprocs); PUSH(UPstart);
  ASSIGN NoOfProcessors PROCESSORS TO ProcSet;
  FOR_EACH PROCESSOR IN ProcSet
    SetUp;  ! Read pushed variables and initialize local variables;
    Use(tFOR);
    FOR Stage := ... TO LastStage DO
    BEGIN
      ! Since nothing is done in Stage ... and ...
      ! (if not convinced, study an example);
      Use(tIF);
      IF Stage ... THEN
        ComputeWhoIsWho;
      Use(tIF + tSUB + tJNEG);
      ! There is no new NUP array to copy in stage ... or ...
      ! (if not convinced, study an example);
      IF Stage ... OR Stage ... THEN
        CopyNUPtoUP;  ! phase ... of current stage;
      ! There is no need to make samples in Stage ... or ...
      ! (if not convinced, study an example);
      Use(tIF + tSUB + tJNEG);
      IF Stage ... OR Stage ... THEN
        MakeSamples;  ! phase ... of current stage;
      MergeWithHelp;
      ! Store the addresses of the NUP arrays made in this stage.
      ! This is used by CopyNUPtoUP at the beginning of the next stage;
      StoreArrayAddresses(NUP);
      Use(tFOR);
    END_FOR;
    IF ProcessorType = NUP AND active AND node ... AND index ... THEN
      ArrayPrint(ThisAddr, size);
  END_FOR_EACH PROCESSOR;
  T_TITITOR("Cole.pil sorted", nn, "integers in", ClockRead, "ticks");
  T_TIO("NoOfProcessors", NoOfProcessors);
END_FOR;
END_PIL_ALG

B.2 Variables and Memory Usage

B.2.1 Cole.var

! Cole.var;
! Variables used only by master;
PROCESSOR_SET ProcSet;
INTEGER mm;
INTEGER nn;
ADDRESS addr, addr, addr;

! Variables used by all processors;
INTEGER m;             ! number of levels in complete binary tree;
INTEGER n;             ! number of integers which are to be sorted (problem size);
INTEGER LastStage;     ! no of last stage in algorithm;
INTEGER ProcessorType; ! UP, SUP or NUP;
INTEGER CONSTANT UP = ..., SUP = ..., NUP = ...;
INTEGER UPprocs, SUPprocs;  ! number of UP and SUP (= NUP) processors;
INTEGER NoOfProcessors;
BOOLEAN InsideNode;    ! TRUE if this processor is allocated to an inside node;
ADDRESS UPstart;       ! address of the first element of the UP arrays;
ADDRESS ThisAddr;      ! address of element in UP/SUP/NUP permanently allocated
                       ! to this processor;
ADDRESS ExchangeAreaStart;     ! start of table in global memory used to
                               ! exchange locally stored values between processors;
ADDRESS NodeAddressTableStart; ! start of table used to store the address of the
                               ! UP, SUP or NUP array for each node during the
                               ! current stage;
ADDRESS NodeSizeTableStart;    ! start of table used to store the size of the
                               ! UP, SUP or NUP array for each node during the
                               ! current stage (only used for SUP arrays);

! Variables which are updated in each stage;
INTEGER Stage;         ! current stage no in algorithm, up to LastStage;
INTEGER HAL;           ! Highest Active Level in current Stage;
INTEGER LAL;           ! Lowest Active Level in current Stage;
INTEGER ExternalState; ! the number of stages this level has been external,
                       ! including the current stage;
INTEGER node;          ! no of node in complete binary tree which this processor
                       ! currently is allocated to;
INTEGER level;         ! level containing that node;
! NB: node and level may contain wrong values during stages when the processor
! is not active, i.e. the variable active (see below) is false;
INTEGER above;         ! the placement of the node corresponding to this processor,
                       ! measured in no of levels above the external level;
INTEGER ProcNoAtLevel; ! no of this processor among the processors of this kind
                       ! (UP, SUP or NUP) at this level;
INTEGER size;          ! size of UP, SUP or NUP array for nodes of this kind at
                       ! this level;
INTEGER index;         ! the no of the element served by this processor in the
                       ! UP, SUP or NUP array associated with this node
                       ! (element no in array for this node served by the processor);
BOOLEAN active;        ! TRUE if this processor is active in the current stage;
INTEGER RankInLeftSUP;
INTEGER RankInRightSUP;

B.2.2 Cole.mem

! Cole.mem;
COMMENT
GLOBAL MEMORY USAGE in Cole.pil:

location 1:    1st number
location 2:    2nd number
location 3:    3rd number      (the n numbers to be sorted)
...
location n:    nth number
location n+1:  problem size n
location n+2:  UP, SUP and NUP arrays (contains NoOfProcessors elements,
               one for each processor)
               ExchangeArea (contains one element for each processor;
               MUST be placed just after the UP/SUP/NUP arrays)
               NodeAddressTable
               NodeSizeTable
               addr, addr, addr (parameters to SetUp)
               UPprocs
               SUPprocs
               UPstart
               UserStackPtr (next free location)
end of COMMENT;

B.3 Basic Procedures

B.3.1 Cole.proc

! Cole.proc;

INTEGER PROCEDURE LowestNodeNo(level); INTEGER level;
BEGIN
  LowestNodeNo := power(level);
END_PROCEDURE LowestNodeNo;
INTEGER CONSTANT tLowestNodeNo = tLOAD + tpower;
! The time used is modelled at the caller place;

INTEGER PROCEDURE HighestNodeNo(level); INTEGER level;
BEGIN
  HighestNodeNo := power(level + 1) - 1;
END_PROCEDURE HighestNodeNo;
INTEGER CONSTANT tHighestNodeNo = tLOAD + tADD + tpower + tSUB;
! The time used is modelled at the caller place;

INTEGER PROCEDURE NoOfNodes(level); INTEGER level;
BEGIN
  NoOfNodes := power(level);
END_PROCEDURE NoOfNodes;
INTEGER CONSTANT tNoOfNodes = tLOAD + tpower;
! The time used is modelled at the caller place;

PROCEDURE StoreArrayAddresses(type); INTEGER type;
BEGIN
  Use(tStoreArray - tWRITE);
  IF ProcessorType = type AND active AND index ... THEN
    WRITE ThisAddr TO NodeAddressTableStart + node
  ELSE
    Wait(tWRITE);
END_PROCEDURE StoreArrayAddresses;

PROCEDURE StoreArraySizes(type); INTEGER type;
BEGIN
  Use(tStoreArray - tWRITE);
  IF ProcessorType = type AND active AND index ... THEN
    WRITE size TO NodeSizeTableStart + node
  ELSE
    Wait(tWRITE);
END_PROCEDURE StoreArraySizes;

INTEGER CONSTANT tStoreArray = tIF + tLOAD + tADD +
                               tLOAD + tSUB + tADD + tWRITE;

PROCEDURE StoreProcessorValue(val); INTEGER val;
BEGIN
  Use(tLOAD + tADD);
  WRITE val TO ExchangeAreaStart + p;
END_PROCEDURE StoreProcessorValue;
INTEGER CONSTANT tStoreVal = tLOAD + tADD + tWRITE;

INTEGER PROCEDURE UPsize(n); INTEGER n;
! Calculates the maximum size (in no of elements) of all the UP arrays:
! n + n/2 + ... (see Cole, p. ...);
BEGIN
  INTEGER size;
  Use(tLOAD + tSTORE + tSHIFT + tSTORE + tWHILE);
  size := n;
  n := n/2;  ! may be done by shift operation;
  WHILE n ... DO
  BEGIN
    Use(tLOAD + tADD + tSTORE + tSHIFT + tSTORE + tWHILE);
    size := size + n;
    n := n/2;  ! may be done by shift operation;
  END_WHILE;
  Use(tSTORE);
  UPsize := size;
END_PROCEDURE UPsize;

INTEGER PROCEDURE SUPsize(n); INTEGER n;
! Calculates the maximum size (in no of elements) of all the SUP arrays
! (see Cole, p. ...);
BEGIN
  INTEGER size;
  Use(tSTORE + tWHILE);
  size := ...;
  WHILE n ... DO
  BEGIN
    Use(tLOAD + tADD + tSTORE + tSHIFT + tSTORE + tWHILE);
    size := size + n;
    n := n/2;  ! may be done by shift operation;
  END_WHILE;
  Use(tSTORE);
  SUPsize := size;
END_PROCEDURE SUPsize;

PROCEDURE SetUp;
! Initialization of local processor variables which are unchanged during
! the computation;
BEGIN
  PROCEDURE InitProcs;
  ! Initialize processors level by level;
  BEGIN
    INTEGER ARRAY ProcNoofKind(...);
    PROCEDURE InitProcKind(type); INTEGER type;
    BEGIN
      INTEGER Initiated;
      INTEGER HowMany;
      INTEGER LevelsAbove;
      Use(tIF + tSTORE);
      IF type = NUP THEN     ! Phase done by the NUP processors is not;
        LevelsAbove := 1     ! performed at external nodes, thus they are;
      ELSE                   ! used one level higher up, see Cole p. ...;
        LevelsAbove := 0;
      Use(tLOAD + tSUB + tSTORE + tLOAD + tSTORE);
      Initiated := ProcNoofKind(type);
      HowMany := n;
      Use(tWHILE);
      WHILE HowMany ... DO
      BEGIN
        Use(tIF + tAND + tLOAD + tADD + tIF);
        IF p > Initiated AND p <= Initiated + HowMany THEN
        BEGIN  ! Processor p is of this category;
          Use(tLOAD + tSTORE + tIF + tSTORE + tLOAD + tSUB + tSTORE);
          above := LevelsAbove;
          InsideNode := above ...;
          ProcNoAtLevel := p - Initiated;
        END
        ELSE
          Wait(tLOAD + tSTORE + tIF + tSTORE + tLOAD + tSUB + tSTORE);
        Use(tLOAD + tADD + tSTORE +
            tIF + tIF + tSHIFT + tSTORE);
        Initiated := Initiated + HowMany;
        LevelsAbove := LevelsAbove + 1;
        IF type = UP THEN
          HowMany := IF HowMany = n THEN ... ELSE HowMany/2
        ELSE  ! SUP or NUP;
          HowMany := HowMany/2;
        Use(tWHILE);
      END_WHILE;
    END_PROCEDURE InitProcKind;
    Use(tSTORE + tLOAD + tADD + tSTORE);
    ProcNoofKind(UP) := 0;
    ProcNoofKind(SUP) := UPprocs;
    ProcNoofKind(NUP) := UPprocs + SUPprocs;
    InitProcKind(UP);
    InitProcKind(SUP);
    InitProcKind(NUP);
  END_PROCEDURE InitProcs;

BODY OF SetUp

READ UPstart FROM UserStackPtr

READ SUPprocs FROM UserStackPtr

READ UPprocs FROM UserStackPtr

READ NodeSizeTableStart FROM UserStackPtr

READ NodeAddressTableStart FROM UserStackPtr

READ ExchangeAreaStart FROM UserStackPtr

UsetLOAD tADD tADD tSTORE

NoOfProcessors UPprocs SUPprocs SUPprocs

set up the address of the element in UPSUPNUP array corresponding

to this processor

UsetLOAD tADD tSTORE

ThisAddr UPstart p

READ n FROM UPstart

Use tlog tSTORE tMULT tSTORE

m logn

LastStage m

Use tIF tLOADtADDtIFtSTORE tLOADtADDtIFtSTORE

Time used in longest path modelled

IF p UPprocs THEN

ProcessorTypeUP

ELSE IF p UPprocs SUPprocs THEN

ProcessorType SUP

IF p UPprocs SUPprocs THEN

ProcessorType NUP

read input and place in own array element

UsetIF

IF ProcessorType UP THEN

BEGIN

INTEGER myNumber

UsetLOAD tSUB

READ myNumber FROM ThisAddrn

WRITE myNumber TO ThisAddr

END

ELSE

WaittLOAD tSUB tREAD tWRITE

InitProcs

ENDPROCEDURE SetUp

PROCEDURE ComputeWhoIsWho

PROCEDURE ComputeWhoIsWho

BEGIN

PROCEDURE ComputeSizeOfArrays

BEGIN

INTEGER FirstStageAtThisLevel

UsetIF tIF

IF ProcessorType UP THEN

BEGIN

UsetIF

tLOAD tSUB tMULT tADD tSTORE

tIF tLOAD tSUB tpower tMULT tMIN tSTORE

IF level m THEN size

ELSE

BEGIN

FirstStageAtThisLevel m level

maximum UParray size at this level is powermlevel

IF Stage FirstStageAtThisLevel THEN

size MIN powerStage FirstStageAtThisLevel

powerm level

ELSE

size

END

WaittLOAD tAND tJNEG

END

ELSE

IF ProcessorType SUP THEN

BEGIN

UsetIF tLOAD tAND tJNEG

tLOAD tSUB tMULT tADD tSTORE

tIF tLOAD tSUB tpower tMULT tMIN tSTORE

IF level m AND Stage THEN size

ELSE

BEGIN

FirstStageAtThisLevel m level

maximum SUParray size at this level is powermlevel ie the

same as the maximum UP array size SUP UP at last active stage

IF Stage FirstStageAtThisLevel THEN

size MIN powerStage FirstStageAtThisLevel

powermlevel

ELSE

size

END

Wait

END

ELSE

BEGIN ProcessorType NUP

Knows that above therefore level must be m

UsetLOAD tSUB tMULT tADD tSTORE

tIF tLOAD tSUB tpower tMULT tMIN tSTORE

FirstStageAtThisLevel m level

maximum NUParray size at this level is powermlevel ie the

same as the maximum UP array size NUP UP in the next stage

IF Stage FirstStageAtThisLevel THEN

size MIN powerStage FirstStageAtThisLevel

powermlevel

ELSE

size

WaittIF tLOAD tAND tJNEG

END

ENDPROCEDURE ComputeSizeOfArrays

INTEGER NoOfActive No of active processors of this kind at this level

start of body in ComputeWhoIsWho

UsetLOAD tSUB tDIV tSUB tSTORE

tLOAD tSUB tSHIFT tSUB tSTORE

tLOAD tDIV tSTORE tIF tSTORE

LAL m Stage

HAL m Stage

ExternalState REMStage

IF ExternalState THEN ExternalState

UsetLOAD tSUB tSTORE

level LAL above

ComputeSizeOfArrays

UsetIF tLOAD tNoOfNodes tMULT tSTORE

IF level THEN

NoOfActive NoOfNodeslevel size

ELSE

NoOfActive

UsetIF

tLOAD tLowestNodeNo tLOAD tSUB tDIV tADD tSTORE

tLOAD tSUB tDIV tMULT tSUB tSTORE

tIF tSTORE

IF size THEN

BEGIN

node LowestNodeNolevel ProcNoAtLevel size

index ProcNoAtLevel ProcNoAtLevel size size

active ProcNoAtLevel NoOfActive

END

ELSE

BEGIN

node index active FALSE

END

ENDPROCEDURE ComputeWhoIsWho

PROCEDURE MakeSamples

PROCEDURE MakeSamples

BEGIN

PROCEDURE Reducerate INTEGER rate

BEGIN

INTEGER UPval size

ADDRESS addr UPeltAddr

UsetLOAD tSUB

READ addr FROM NodeAddressTableStartnode

UsetLOAD tADD tSUB tMULT tSUB tADD tIF

UPeltAddr addr size ratesize index

IF UPeltAddr addr THEN

BEGIN

READ UPval FROM UPeltAddr

WRITE UPval TO ThisAddr

ie to corresponding SUPaddress

END

ELSE

ErrorSUP couldn't fetch UPelement

ENDPROCEDURE Reduce

INTEGER CONSTANT tReduce tLOAD tSUB tR

tLOAD tADD tSUB tMULT tSUB tADD tIF tR tW

INTEGER SampleRate

StoreArrayAddressesUP

UsetIF tLOAD tAND

IF ProcessorType SUP AND active THEN

BEGIN

UsetCONST tSTORE tIF tIF tSTORE

SampleRate

IF NOT InsideNodeTHEN

BEGIN

IF ExternalState THEN SampleRate

IF ExternalState THEN SampleRate

END

ReduceSampleRate

END

ELSE

WaittCONST tSTORE tIF tIF tSTORE tReduce

ENDPROCEDURE MakeSamples
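Reduce lets each active SUP processor fetch one UP element at a stride of SampleRate. Done sequentially, taking every rate-th element of an array can be sketched as follows (a sketch assuming samples sit at positions rate, 2*rate, ... of the UP array, as in Cole's sampling; the function name is mine):

```python
def make_sample(up, rate):
    # keep every rate-th element: positions rate, 2*rate, ...
    # (1-based), i.e. indices rate-1, 2*rate-1, ... (0-based)
    return up[rate - 1 :: rate]
```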

includeOrderMergeproc

PROCEDURE CopyNUPtoUP

PROCEDURE CopyNUPtoUP

Copies the NUP array made in the previous stage to the UP array

for this stage Performed by the UPprocessors

BEGIN

ADDRESS addr For Up array elements this variable represents the

address of the corresponding NUP array element in the

previous stage

INTEGER val

Observation Copying of NUP to UP should only be done at inside nodes

except at stage when it should also be done at external nodes

because these nodes were inside nodes in the previous stage

UsetIF tLOAD tAND

tLOAD tSUB tDIV tJZERO tLOAD tOR

IF ProcessorType UP AND active AND REMStage OR InsideNodeTHEN

BEGIN

UsetLOAD tADD

READ addr FROM NodeAddressTableStartnode

UsetLOAD tADD tSUB tSTORE

addr addrindex

READ val FROM addr

WRITE val TO ThisAddr

END

ELSE

WaittLOAD tADD tSUB tSTORE tREAD tREAD tWRITE

At the end of the main loop the array addresses are stored for the

NUParrays in the previous stage When CopyNUPtoUP is called at the

start of this stage the NodeAddressTable therefore gives

the mapping from nodeno to NUPprocessor as the NUP processors were

allocated to the NUParrays in the previous stage It is not necessary

to use the NodeSizeTable since the UParray size for a node in stage i

is the same as the size of NUParray in the same node in stage i

Thus the information found in the NodeAddressTable is enough for an UP

processor to find its corresponding NUPprocessor from the previous stage

Transfer RankInLeftSUP from NUP to UP processors

UsetIF

IF ProcessorType NUP THEN

StoreProcessorValueRankInLeftSUP See Coleproc

ELSE

WaittStoreVal

UsetIF tLOAD tSUB tADD assumes that the result of the

subcondition evaluated above was stored locally

IF ProcessorType UP AND active AND REMStage OR InsideNode THEN

READ RankInLeftSUP FROM ExchangeAreaStartaddrUPstart

ELSE

WaittREAD

Transfer RankInRightSUP from NUP to UP processors

UsetIF

IF ProcessorType NUP THEN

StoreProcessorValueRankInRightSUP

ELSE

WaittStoreVal

UsetIF tLOAD tLOAD assumes that results were stored locally above

IF ProcessorType UP AND active AND REMStage OR InsideNode THEN

READ RankInRightSUP FROM ExchangeAreaStartaddrUPstart

ELSE

WaittREAD

ENDPROCEDURE CopyNUPtoUP

B Coleproc

Coleproc

procedures used in O merging only

INTEGER PROCEDURE RateInNextStage

BEGIN

INTEGER rate

INTEGER ExtState

InsideNode in this Stage implies InsideNode in next Stage OR

first stage as external node in next stage In both these cases

the sample rate is

UsetIF tLOAD tADD tDIV tSTORE tIF tSTORE

IF InsideNode THEN

rate

ELSE

BEGIN

NOT InsideNode in this Stage implies NOT InsideNode in next stage

ExtState REMStage

IF ExtState THEN rate corresponds to ExtState

IF ExtState THEN rate

END

RateInNextStage rate

ENDPROCEDURE RateInNextStage
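RateInNextStage encodes Cole's sampling rule: an inside node always samples every 4th item, while a node that has become external samples every 4th, then every 2nd, then every item during its remaining active stages. A hedged sketch of that rule (the names and the external-state encoding below are assumptions, not taken from the listing):

```python
def sample_rate(inside_node, external_state):
    # inside nodes: every 4th item; external nodes step
    # through the rates 4, 2, 1 in their three final stages
    if inside_node:
        return 4
    return {0: 4, 1: 2}.get(external_state, 1)
```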

INTEGER CONSTANT tRateInNextStage tIF

tLOAD tADD tDIV tSTORE tIF tSTORE

INTEGER PROCEDURE RateInNextStageBelow

Computes the sample rate that will be used on the level below this in the next stage

BEGIN

INTEGER rate

INTEGER ExtState

UsetIF tLOAD tSUB tLOAD tADD tDIV tSTORE

tIF tSTORE

IF LAL level THEN also the level below must be internal

rate

ELSE

BEGIN

ExtState REMStage

IF ExtState THEN rate corresponds to ExtState

IF ExtState THEN rate

END

RateInNextStageBelow rate

ENDPROCEDURE RateInNextStageBelow

INTEGER CONSTANT tRateInNextStageBelow tIF tLOAD tSUB

tLOAD tADD tDIV tSTORE tIF tSTORE

INTEGER PROCEDURE CompareWithval val val val

INTEGER val val val val

BEGIN

NOTE It is ASSUMED that val val val

INTEGER pos

UsetIF tIF tSTORE

IF val val THEN

BEGIN

IF val val THEN

pos

ELSE

pos

END

ELSE

BEGIN

IF val val THEN

pos

ELSE

pos

END

CompareWith pos

ENDPROCEDURE CompareWith
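CompareWith locates a value relative to three sorted values and returns its offset, used as LocalOffset by the callers. Assuming the returned position counts how many of the three values are smaller than val, a Python sketch is:

```python
def compare_with(val, v1, v2, v3):
    # assumes v1 <= v2 <= v3; returns how many of
    # v1..v3 are smaller than val (a value in 0..3)
    if val <= v2:
        return 0 if val <= v1 else 1
    return 2 if val <= v3 else 3
```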

INTEGER CONSTANT tCompareWith tIF tIF tSTORE

B Merging in Constant Time

B OrderMergeproc

OrderMergeproc

PROCEDURE MergeWithHelp

BEGIN

INTEGER SLrankInParentNUP SL is short for StraddleLeft

INTEGER SRrankInParentNUP SR is short for StraddleRight

These must be declared here since they are computed

in DoMerge and used in MaintainRanks

INTEGER CONSTANT tTryCandidates

tLOAD tSUB tADD tSTORE tIF

tREAD tIF tSUB tAND tLOAD tADD tWRITE

tLOAD tSUB tADD tSTORE

tREAD tIF tLOAD tADD tWRITE

tLOAD tADD tSTORE tLOAD tSUB tADD tSTORE tIF

tLOAD tADD tWRITE

tWRITE tLOAD tADD tSTORE tIF tLOAD tADD

INTEGER CONSTANT tSubStep tIF

tLOAD tSHIFT tSTORE tLOAD tSTORE tADD

tStoreVal tIF tLOAD tSUB tADD tR tR tIF

tR tLOAD tADD tR tTryCandidates

INTEGER CONSTANT tfindrandt tLOAD tSHIFT tADD tR

tLOAD tADD tSUB tSTORE tLOAD tADD

tR tLOAD tSHIFT tADD tR

tLOAD tADD tIF tLOAD tADD tR

INTEGER CONSTANT tsscase tIF tStoreVal

tIF tLOAD tSHIFT tJZERO tfindrandt

INTEGER CONSTANT tsscase tIF tStoreVal

tIF tLOAD tSHIFT tJNEG tfindrandt

INTEGER CONSTANT tReadValues tIF tABS tLOAD tSUB tADD

tR tIF tADD tLOAD tR tIF tADD tLOAD tR

INTEGER CONSTANT tSubStep tIF tLOAD tADD

tR tStoreArray tsscase tsscase

tIF tR tStoreVal

tIF tLOAD tSHIFT tSUB tReadValues

tSTORE tADD tSTORE tCompareWith

INTEGER CONSTANT tComputeCrossRanks tIF tLOAD tSTORE

tIF tLOAD tSUB tJZERO tSubStep tSubStep

includeDoMergeproc

INTEGER CONSTANT tDoMerge tDoMergePart tDoMergePart

includeMaintainRanksproc

BEGIN BODY of MergeWithHelp

Main Assumption Ranks UPu SUPv and UPu SUPw are known

these are stored in RankInLeftSUP and RankInRightSUP

BEGIN

StoreArrayAddressesSUP

StoreArraySizesSUP

There is no need to perform DoMerge in Stage or If not convinced

study an example

Assumes that this test is combined with similar test in Colepil mainprog

IF Stage OR Stage THEN

BEGIN

UsetIF tLOAD tJZERO tAND tAND

IF ProcessorType UP AND size AND active AND InsideNodeOR

ProcessorType SUP AND size AND active AND node OR

ProcessorType NUP AND size AND active AND InsideNode THEN

DoMerge

ELSE

WaittDoMerge

END

END

There is no need to perform MaintainRanks in Stage and

RankInLeftRightSUP are needed for the first time when the size of UPu

is and the size of SUPuchild This occurs for the first time in

Stage Consider an example

UsetIF tSUB tJNEG

IF Stage OR Stage THEN

BEGIN

MaintainRanks

END

ENDPROCEDURE MergeWithHelp

B DoMergeproc

DoMergeproc

PROCEDURE DoMerge

BEGIN

NOTE that StoreArraySizesSUP and StoreArrayAddressesSUP are done

just before the call to DoMerge in MergeWithHelp

This is assumed in SubStep

INTEGER RankInParentNUP RankInThisSUP RankInOtherSUP

INTEGER SLindex SL d and SR f in fig p in Cole

INTEGER SRindex These are the indices of the straddling items in the

other SUP array sibling They are computed in

procedure ComputeCrossRanks

PROCEDURE ComputeCrossRanks

BEGIN

INTEGER RankInParentUP

PROCEDURE SubStep

PROCEDURE SubStep performed by active inside UPprocessors

BEGIN

INTEGER r s i i

PROCEDURE TryCandidatesr addr

INTEGER r ADDRESS addr

BEGIN

INTEGER CandidateVal

ADDRESS SUPelementAddr points to the SUParray element

in global memory whose RankInParentUP should be set to the

index position of own element of the executing UPprocessor

The assigned value is passed indirectly by using the

ExchangeArea The address of an element in the ExchangeArea

for a processor p is given by the address of its element in the

UPSUPNUP array the value of NoOfProcessors

See figure refWorkAreas

itemr

UsetLOAD tSUB tADD tSTORE tIF

SUPelementAddr addr r

IF r THEN

BEGIN

READ CandidateVal FROM SUPelementAddr

UsetIF tLOAD tADD

IF CandidateVal i THEN

WRITE index TO SUPelementAddrNoOfProcessors

ELSE

WaittWRITE

END

ELSE

WaittREAD tIF tLOAD tADD tWRITE

Note that r and s may be the same position In that case we must

have that s i since r s r i and i i Therefore the

element in position r s has rank b in UPu it is i

This case is handled as itemr above

items

UsetLOAD tSUB tADD tSTORE

SUPelementAddr addr s

READ CandidateVal FROM SUPelementAddr

UsetIF tSUB tAND tLOAD tADD

IF CandidateVal i AND CandidateVal i THEN

WRITE index TO SUPelementAddrNoOfProcessors

ELSE

WaittWRITE

itemr

UsetLOAD tADD tSTORE

tLOAD tSUB tADD tSTORE tIF tLOAD tADD

rr

SUPelementAddr addr r

IF r s THEN

WRITE index TO SUPelementAddrNoOfProcessors

ELSE

WaittWRITE

itemr

UsetLOAD tADD tSTORE tIF tLOAD tADD

rr

SUPelementAddr SUPelementAddr

IF r s THEN

WRITE index TO SUPelementAddrNoOfProcessors

ELSE

WaittWRITE

ENDPROCEDURE TryCandidates

tTryCandidates is defined in OrderMergeproc

Substep starts here

for each element e in SUPv compute its rank in UPu

Find the interval ii induced by this UPelement Then find the

items in SUPv contained in ii Assign ranks to these elements

The interval is given by the values i and i where i is the value

stored at ThisAddr for this UParray processor and i is the value

stored at ThisAddr if not index size for the same UParray

processor If index size ThisAddr is the last element for this

UParray element i INFTY and s the position of the last

element in SUPv

See refSubstepDescr in cotex

UsetIF

IF ProcessorType UP THEN

BEGIN

See Figure refSubstep in cotex

ADDRESS addr

INTEGER LeftChild

UsetLOAD tSHIFT tSTORE tLOAD tSTORE

LeftChild node

r RankInLeftSUP

StoreProcessorValueRankInLeftSUP

The NodeSizeTable should now contain the size of the SUParray

for each node Similarly the addresses of the SUParrays in the

NodeAddressTable See the start of the body of MergeWithHelp

UsetIF tLOAD tSUB tADD

IF index size THEN

READ s FROM ExchangeAreaStartp

ELSE Right neighbour processor does not exist lets point to the

last element in SUPv

READ s FROM NodeSizeTableStartLeftChild

READ i FROM ThisAddr

UsetIF

IF index size THEN

READ i FROM ThisAddr

ELSE

BEGIN

UsetREAD i INFTY

END

UsetLOAD tADD

READ addr FROM NodeAddressTableStartLeftChild

TryCandidatesr addr

END of ProcessorType UP

ELSE

WaittLOAD tSHIFT tSTORE tLOAD tSTORE tStoreVal

tIF tLOAD tSUB tADD tR tR tIF

tR tLOAD tADD tR tTryCandidates

Do the same for each element e in SUPw Note that this

computation must be done in two phases one for SUPv and

one for SUPw which cannot be done in parallel since the

work in both phases is done by the UPu processors which

are common for SUPv and SUPw

for each element e in SUPw compute its rank in UPu

Assumes that this test is combined with the one above

IF ProcessorType UP THEN

BEGIN

ADDRESS addr

INTEGER RightChild

UsetLOAD tSHIFT tADD tSTORE tLOAD tSTORE

RightChild node

r RankInRightSUP

StoreProcessorValueRankInRightSUP

The NodeSizeTable should still contain the size of the

SUParray for each node Similarly the addresses of the

SUParrays in the NodeAddressTable

UsetIF tLOAD tSUB tADD

IF index size THEN

READ s FROM ExchangeAreaStartp

ELSE

READ s FROM NodeSizeTableStartRightChild

READ i FROM ThisAddr

UsetIF

IF index size THEN

READ i FROM ThisAddr

ELSE

BEGIN

UsetREAD i INFTY

END

UsetLOAD tADD

READ addr FROM NodeAddressTableStartRightChild

TryCandidatesr addr

END of ProcessorType UP

ELSE

WaittLOADtSHIFTtADDtSTORE tLOADtSTORE tStoreVal

tIF tLOAD tSUB tADD tR tR tIF

tR tLOAD tADD tR tTryCandidates

ENDPROCEDURE SubStep

tSubStep is defined in DoMergeproc

PROCEDURE SubStep

PROCEDURE SubStep

BEGIN

for each element e in SUPvw compute its rank in SUPwv

see fig Cole p and refSubstep in cotex

INTEGER r t

PROCEDURE findrandt

BEGIN

INTEGER dProcNo ParentUPaddr ParentSize

find the processor no which holds d

UsetLOAD tSHIFT tADD

READ ParentUPaddr FROM NodeAddressTableStartnode

UsetLOAD tADD tSUB tSTORE tLOAD tADD

dProcNo ParentUPaddr RankInParentUP UPstart

RankInLeftRightSUP is now stored in ExchangeArea

READ r FROM ExchangeAreaStartdProcNo

UsetLOAD tSHIFT tADD

READ ParentSize FROM NodeSizeTableStartnode

UsetLOAD tADD tIF tLOAD tADD

IF RankInParentUP ParentSize THEN

BEGIN

tsize

UsetREAD

END

ELSE

READ t FROM ExchangeAreaStartdProcNo

ENDPROCEDURE findrandt

tfindrandt is defined in OrderMergeproc

UsetIF tLOAD tADD

IF ProcessorType SUP THEN

READ RankInParentUP FROM ExchangeAreaStartp

ELSE

WaittREAD

StoreArrayAddressesUP

StoreArraySizesUP

Case e in SUPv find rank in SUPw

tsscase starts here

UsetIF

IF ProcessorType UP THEN

StoreProcessorValueRankInRightSUP

ELSE

WaittStoreVal

UsetIF tLOAD tSHIFT tJZERO

IF ProcessorType SUP AND REMnode THEN

BEGIN

left node SUPv

findrandt

END

ELSE

Waittfindrandt

tssCase ends here

Case e in SUPw find rank in SUPv

tsscase starts here

UsetIF

IF ProcessorType UP THEN

StoreProcessorValueRankInLeftSUP

ELSE

WaittStoreVal

UsetIF tLOAD tSHIFT tJNEG

IF ProcessorType SUP AND REMnode THEN

BEGIN

right node SUPw

findrandt

END

ELSE

Waittfindrandt

tssCase ends here

The rest of Case and Case is handled in parallel

UsetIF

IF ProcessorType SUP THEN

BEGIN

Remember that node has no sibling node

INTEGER val val val val LocalOffset

READ val FROM ThisAddr

StoreProcessorValueval

BEGIN

PROCEDURE ReadValuessize

INTEGER size

BEGIN

UsetIF tABS tLOAD tSUB tADD tADD

IF r ABSsize THEN

for the ABS see the second call to ReadValues below

BEGIN

r corresponds to the value INFTY

UsetREAD val INFTY

END

ELSE

READ val FROM ThisAddrindexsizer

UsetIF tADD tLOAD Assumes that the long address

expression was stored locally in a register above

IF r t THEN

READ val FROM ThisAddrindexsizer

ELSE

BEGIN

val INFTY UsetREAD

END

UsetIF tADD tLOAD

IF r t THEN

READ val FROM ThisAddrindexsizer

ELSE

BEGIN

val INFTY UsetREAD

END

ENDPROCEDURE ReadValues

UsetIF tLOAD tSHIFT tSUB

IF REMnode THEN

ReadValuessize this is left child read from right child

ELSE

ReadValuessize this is right child read from left child

END

Use tSTORE tADD tSTORE

LocalOffset CompareWithval val val val

RankInOtherSUP r LocalOffset

Store the positions in the other SUP of the two straddling items

these values are needed in MaintainRanks

Use tSTORE tADD tSTORE

SLindex RankInOtherSUP

SRindex RankInOtherSUP

END

ELSE

WaittREAD tStoreVal tIF tLOAD tSHIFT tSUB

tReadValues tSTORE tADD tSTORE

tCompareWith

ENDPROCEDURE SubStep

Body of ComputeCrossRanks

BEGIN BODY of ComputeCrossRanks

UsetIF tLOAD tSTORE

IF ProcessorType SUP THEN

RankInThisSUP index

Compute rank in other SUP

see also the processor selection before the call to DoMerge and

ComputeCrossRanks

UsetIF tLOAD tSUB tJZERO

IF ProcessorType SUP AND size THEN

BEGIN

Boundary case

Since the size of the SUParray in this node there is no

UParray in the parent node to help us in computing the rank

in the other SUP as done by the routines SubStep and SubStep

However since the SUParrays are so small it is easy to compute

this rank in constant time by using only one comparison

INTEGER ThisVal OtherVal

ADDRESS OtherAddr

UsetIF tSHIFT tLOAD tADD

IF REMnode THEN

READ OtherAddr FROM NodeAddressTableStartnode

ELSE

READ OtherAddr FROM NodeAddressTableStartnode

READ OtherVal FROM OtherAddr

READ ThisVal FROM ThisAddr

UsetIF tSTORE

IF ThisVal OtherVal THEN

RankInOtherSUP

ELSE

RankInOtherSUP

UsetLOAD tSTORE tADD tSTORE

SLindex RankInOtherSUP

SRindex RankInOtherSUP

Note the sign below

WaittSubStep tSubStep

tIF tSHIFT tLOAD tADD

tR tR tR tIF tSTORE

tLOAD tSTORE tADD tSTORE

END of ProcessorType SUP and size

ELSE

BEGIN

SubStep

SubStep

END

ENDPROCEDURE ComputeCrossRanks

Body of DoMerge

BEGIN body of DoMerge

We want the position of each SUPelement in the NUParray of its

parent node This position is computed and stored in the variable

RankInParentNUP It is given by the sum of the rank of the element in

SUPv and SUPw The procedure ComputeCrossRanks computes these

values and they are stored in RankInThisSUP and RankInOtherSUP upon

return from the procedure

ADDRESS WriteAddress

see also the processorselection before the call to DoMerge

tDoMergePart starts here

UsetIF

IF ProcessorType UP OR ProcessorType SUP THEN

ComputeCrossRanks

ELSE

WaittComputeCrossRanks

UsetIF tLOAD tADD tSTORE

IF ProcessorType SUP THEN

RankInParentNUP RankInThisSUP RankInOtherSUP

The values SLrankInParentNUP and SRrankInParentNUP are only used

in MaintainRanks which is not called before Stage and not in

Stage

UsetIF tLOAD tSUB tJZERO tLOAD tSUB tJNEG

IF ProcessorType SUP AND Stage OR Stage THEN

BEGIN

Compute SLrankInParentNUP and SRrankInParentNUP

Knows SLindex SRindex and RankInParentNUP for all SUPprocessors

StoreProcessorValueRankInParentNUP

Note that the padding used here is not robust be careful if modifying

UsetIF

IF SLindex THEN

BEGIN

UsetSTORE

SLrankInParentNUP

WaittIF tLOADtSHIFT tLOADtSUBtADDtSTORE tR

tSTORE

END

ELSE

BEGIN

INTEGER SLaddr

Compute processor no address corresponding to SLindex

NoOfProcessors is added because we want the address

in the ExchangeArea

UsetIF tLOADtSHIFT tLOADtSUBtADDtSTORE

IF REMnode THEN This is a left child mark

SLaddr ThisAddr index size SLindex NoOfProcessors

ELSE This is a right child

SLaddr ThisAddr index size SLindex NoOfProcessors

READ SLrankInParentNUP FROM SLaddr

END

UsetIF

IF SRindex size THEN

BEGIN

UsetLOAD tSHIFT tADD tSTORE

SRrankInParentNUP size

WaittIF tLOADtSHIFT tLOADtSUBtADDtSTORE tR

tLOAD tSHIFT tADD tSTORE

END

ELSE

BEGIN

INTEGER SRaddr

UsetIF tLOADtSHIFT tLOADtSUBtADDtSTORE

IF REMnode THEN

SRaddr ThisAddr index size SRindex NoOfProcessors

ELSE

SRaddr ThisAddr index size SRindex NoOfProcessors

READ SRrankInParentNUP FROM SRaddr

END

END of ProcessorType SUP AND Stage

ELSE

WaittStoreVal

tIF tIF tLOADtSHIFT

tLOADtSUBtADDtSTORE tR

DO THE MERGE

tDoMergePart starts here

StoreArrayAddressesNUP

UsetIF

IF ProcessorType SUP THEN

BEGIN

ADDRESS ParentNUPaddr

INTEGER val

copy the SUP element to its right position in the NUP array

of its parent node

READ val FROM ThisAddr

UsetLOAD tSHIFT tADD

READ ParentNUPaddr FROM NodeAddressTableStartnode

UsetLOAD tADD tSUB tSTORE

WriteAddress ParentNUPaddrRankInParentNUP

RankInThisSUP for the first element in a set If RankInOtherSUP

we will have RankInParentNUP which is the correct position of the

element because it is in the set We must subtract to get the correct

address above as usual because ParentNUPaddr is the address of element

no and not the address of element no which does not exist

WRITE val TO WriteAddress

END

ELSE

WaittREAD tLOAD tSHIFT tADD tREAD

tLOAD tADD tSUB tSTORE tWRITE

Record the SenderAddress of each NUPelement in the ExchangeArea

The information is used in MaintainRanks A negative SenderAddress

indicates that the item is from the left child a positive address

indicates that it is from the right child

assumed combined with the if test above

IF ProcessorType SUP THEN

BEGIN

INTEGER SenderAddress

UsetLOAD tSTORE

SenderAddress index

Use the address in the ExchangeArea corresponding to the processor

allocated to the NUPelement just written

Use tIF tSHIFT tLOAD tADD tSUB

IF REMnode THEN node is left child

WRITE SenderAddress TO WriteAddressNoOfProcessors

ELSE

WRITE SenderAddress TO WriteAddressNoOfProcessors

END of ProcessorType SUP

ELSE

WaittLOAD tSTORE tIF tSHIFT

tLOAD tADD tSUB tWRITE

ENDPROCEDURE DoMerge
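DoMerge places each SUP element at the position given by the sum of its rank in its own SUP array and its rank in the sibling SUP array. The underlying idea, merging two sorted arrays by cross ranks, can be sketched sequentially in Python (the function name is mine, and bisect stands in for the parallel rank computation done with the parent's UP array as help):

```python
import bisect

def merge_by_cross_ranks(a, b):
    # final position of each item = its own index
    # + its rank in the other sorted array
    out = [None] * (len(a) + len(b))
    for i, x in enumerate(a):
        out[i + bisect.bisect_left(b, x)] = x
    for j, y in enumerate(b):
        out[j + bisect.bisect_right(a, y)] = y
    return out
```

The left/right asymmetry of bisect_left and bisect_right resolves ties so that every output slot is filled exactly once.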

INTEGER CONSTANT tDoMergePart tIF tIF tLOAD tADD tSTORE

tIF tLOAD tSUB tJZERO tLOAD tSUB tJNEG

tComputeCrossRanks tStoreVal tIF tIF tLOADtSHIFT

tLOADtSUBtADDtSTOREtR

INTEGER CONSTANT tDoMergePart tStoreArray tIF

tREAD tLOAD tSHIFT tADD tREAD

tLOAD tADD tSUB tSTORE tWRITE

tLOADtSTORE tIFtSHIFT tLOADtADDtSUB tWRITE

B MaintainRanksproc

MaintainRanksproc

PROCEDURE MaintainRanks

BEGIN

Computes the ranks UPu SUPv and UPu SUPw for the next

stage This is NUPu NewSUPv and NUPu NewSUPw for this

stage These ranks are stored in the variables RankInLeftSUP and

RankInRightSUP in the NUP processors on return from the procedure

See Cole p

INTEGER SourceProcNo Every NUParray element corresponds to a

SUParray element this variable gives the SUParray processor no

The variable is declared here because it is computed in the first

of the following two procedures and used in both of them

PROCEDURE RankInNewSUPfromSenderNode

PROCEDURE RankInNewSUPfromSenderNodeSender

INTEGER Sender

BEGIN

INTEGER RankInThisNUP

The two cases that the element e came from SUPv or SUPw

are handled in parallel

compute UPv NUPv

UsetIF

IF ProcessorType UP THEN

NOTE that active NUP and UP processors at the same node occur at

stage when MaintainRanks is not performed and at stage and all

later stages BUT if NUPnode does not exist as active processors

any longer the reason is that UPnode has reached its maximum size

and since NUPnode UPnode there is no need to store more than

UPnode and it is possible to compute UPvNUPv even if NUPv

does not exist

BEGIN

UsetIF tLOAD tSUB tpower tLOAD tADD tSTORE

IF size powerm level THEN

BEGIN

UPv NUPv

RankInThisNUP index

END

ELSE

BEGIN

RankInThisNUP RankInLeftSUP RankInRightSUP

END

StoreProcessorValueRankInThisNUP mark

Store the address of the UParrays needed below

StoreArrayAddressesUP

END of ProcessorType UP

ELSE

WaittIF tLOAD tSUB tpower tLOAD tADD tSTORE

tStoreVal tStoreArray

tPart ends tPart starts

for each element in SUPv or SUPw find the corresponding

element in UPv

What element in UP is this SUP element a copy of

UsetIF

IF ProcessorType SUP THEN

BEGIN

ADDRESS UPaddr

INTEGER UPeltIndex

INTEGER rate

INTEGER UParraySize

INTEGER RankInThisNewSUP

compute samplerate which was used in making SUP from UP

UsetSTORE tIF tIF tSTORE

rate

IF NOT InsideNode THEN

BEGIN

IF ExternalState THEN rate

IF ExternalState THEN rate

END

find UParray of same node

UsetLOAD tADD

READ UPaddr FROM NodeAddressTableStartnode

find the position in UParray corresponding to this SUPelement

UsetLOAD tSUB tSHIFT tADD tSTORE

UPeltIndex index rate

The processor no of the UParray element corresponding to this

SUP element is UPaddr UPeltIndex UPstart

The ExchangeArea now contains UPv NUPv see mark

UsetLOAD tADD tSUB

READ RankInThisNUP FROM ExchangeAreaStartUPaddrUPeltIndexUPstart

We now have for each SUPx element

RankInThisNUP SUPxNUPx for x v or x w

NewSUP is made from NUP under the names SUP from UP in the next

stage we must compute the sample rate that will be used in that stage

rate RateInNextStage

NewSUPx is every rateth element in NUPx

UsetIF tLOAD tSUB tSHIFT tADD tSTORE

IF rate THEN

RankInThisNewSUP RankInThisNUP rate

this is SUPx NewSUPx save this value and the address

of the SUParrays to make it possible for the NUP processors

to get the correct value below

StoreProcessorValueRankInThisNewSUP

StoreArrayAddressesSUP

END of ProcessorType SUP

ELSE

Wait tSTORE tIF tIF tSTORE tLOAD tADD tR

tLOAD tSUB tSHIFT tADD tSTORE

tLOAD tADD tSUB tR tRateInNextStage

tIF tLOAD tSUB tSHIFT tADD tSTORE

tStoreVal tStoreArray

tPart ends here

Let NUPu processors get their value of the rank

NUPu NewSUPvw if their NUPelt came from SUPvw

UsetIF

IF ProcessorType NUP THEN

BEGIN

ADDRESS SUPxAddr

UsetIF

IF Sender THEN

BEGIN from left child

UsetLOAD tSHIFT tADD

READ SUPxAddr FROM NodeAddressTableStartnode

UsetLOAD tSUB tSTORE tADD

SourceProcNo SUPxAddr Sender UPstart

READ RankInLeftSUP FROM ExchangeAreaStartSourceProcNo

END

ELSE

BEGIN

IF Sender THEN Error sender

from right child

UsetLOAD tSHIFT tADD

READ SUPxAddr FROM NodeAddressTableStartnode

UsetLOAD tSUB tSTORE tADD

SourceProcNo SUPxAddr Sender UPstart

READ RankInRightSUP FROM ExchangeAreaStartSourceProcNo

END

END

ELSE

WaittIF tLOAD tSHIFT tADD tREAD

tLOAD tSUB tSTORE tADD tREAD

Remember that it was the values for the next stage of

RankInLeftRightSUP which were computed above

ENDPROCEDURE RankInNewSUPfromSenderNode

INTEGER CONSTANT tPart tIF

tIF tLOAD tSUB tpower tLOAD tADD tSTORE

tStoreVal tStoreArray

INTEGER CONSTANT tPart

tIF tSTORE tIF tIF tSTORE tLOAD tADD tR

tLOAD tSUB tSHIFT tADD tSTORE

tLOAD tADD tSUB tR tRateInNextStage

tIF tLOAD tSUB tSHIFT tADD tSTORE

tStoreVal tStoreArray

INTEGER CONSTANT tRankInNewSUPfromSenderNode tPart tPart

tIF tIF tLOAD tSHIFT tADD tREAD

tLOAD tSUB tSTORE tADD tREAD

PROCEDURE RankInNewSUPfromOtherNode

PROCEDURE RankInNewSUPfromOtherNodeSender

INTEGER Sender

BEGIN

INTEGER r t

INTEGER LocalOffset

PROCEDURE ComputeLocalOffsetOtherNUPaddr rate

ADDRESS OtherNUPaddr

INTEGER rate

BEGIN

INTEGER OwnVal val val val

Get the maximum elements in positions r t and t

See fig Cole p

This procedure is similar to ReadValues in DoMerge but the address

calculation is different because we must use rate to obtain the

correct mapping between NewSUPx and NUPx see mark below

Item x is read from position OtherNUPaddr xrate

UsetLOAD tSHIFT tADD

READ val FROM OtherNUPaddrrrate

UsetADD tIF tLOAD tSHIFT tADD

IF r t THEN

BEGIN

READ val FROM OtherNUPaddrrrate

END

ELSE

BEGIN

UsetREAD val INFTY

END

UsetADD tIF tLOAD tSHIFT tADD

IF r t THEN

BEGIN

READ val FROM OtherNUPaddrrrate

END

ELSE

BEGIN

UsetREAD val INFTY

END

READ OwnVal FROM ThisAddr

LocalOffset CompareWithOwnVal val val val

ENDPROCEDURE ComputeLocalOffset

INTEGER dRank fRank

INTEGER NUPvAddr NUPwAddr

INTEGER NUPvANDwSize

The elements d and f are the two elements from the other SUParray

which straddle e e this element See fig in Cole p

dRank is the rank of the element d in NUPu similarly for fRank

These values are computed in the procedure DoMerge In that

procedure they are stored in the variables SLrankInParentNUP and

SRrankInParentNUP

Pass first dRank then fRank from the SUP to the NUPprocessors

SourceProcNo was calculated above in RankInNewSUPfromSenderNode

tpart starts here

UsetIF

IF ProcessorType SUP THEN

StoreProcessorValueSLrankInParentNUP

ELSE

WaittStoreVal

UsetIF tLOAD tADD

IF ProcessorType NUP THEN

READ dRank FROM ExchangeAreaStartSourceProcNo

ELSE

WaittREAD

UsetIF

IF ProcessorType SUP THEN

StoreProcessorValueSRrankInParentNUP

ELSE

WaittStoreVal

UsetIF tLOAD tADD

IF ProcessorType NUP THEN

READ fRank FROM ExchangeAreaStartSourceProcNo

ELSE

WaittREAD

This code is needed because NewSUPx does not exist Instead we

find a given NewSUPx item in NUPx by considering the sample

rate that will be used in the next stage one level below to make

NewSUPx from NUPx

In the code below see mark and mark every NUPprocessor

must know the address of the NUParrays if they exist of its two

children If they do not exist the address of the UParrays of the

children must be known

If there are active NUP processors in node x there are active NUP

processors in the left and right child of x if the size of the NUP

array at node x is smaller than MaxNUPsizexlevel

Assumes that size and address of NUP procs are stored

in the Size and Address Table see mark below

tpart starts here

UsetIF tLOAD tSUB tpower tSHIFT tSUB tJNEG

IF ProcessorType = NUP AND size < power(m - level) THEN
BEGIN
  NUP-array should exist in right and left child
  Use(tIF + tLOAD + tSHIFT + tADD)
  IF Sender THEN
    READ NUPvAddr FROM NodeAddressTableStart + node
  ELSE
    READ NUPwAddr FROM NodeAddressTableStart + node
  Use(tLOAD)  Assumes use of offset computed above
  READ NUPvANDwSize FROM NodeSizeTableStart + node
END
ELSE
  Wait(tIF + tLOAD + tSHIFT + tADD + tR + tLOAD + tR)
StoreArrayAddresses(UP)
StoreArraySizes(UP)
Use(tIF)  Assumed combined with the IF statement above
IF ProcessorType = NUP AND size >= power(m - level) THEN
BEGIN
  UP-array should exist in right and left child;
  UP-array is used as NUP array
  Use(tIF + tLOAD + tSHIFT + tADD)
  IF Sender THEN
    READ NUPvAddr FROM NodeAddressTableStart + node
  ELSE
    READ NUPwAddr FROM NodeAddressTableStart + node
  Use(tLOAD)
  READ NUPvANDwSize FROM NodeSizeTableStart + node
END
ELSE
  Wait(tIF + tLOAD + tSHIFT + tADD + tR + tLOAD + tR)

tpart3 starts here
Use(tIF)
IF ProcessorType = NUP THEN
BEGIN
  INTEGER rate
  PROCEDURE findrandt
  BEGIN
    Knows that d is to the left of e (the position of e is index),
    and that f is to the right of e; d, e and f are all inside the
    same NUP(x) array.
    The address of the first element in this array
    for this node is ExchangeAreaStart + p - index. Therefore
    the address of element no. pos is pos
    added to this value.
    Use(tIF + tLOAD + tADD + tSUB + tADD)
    IF dRank > 0 THEN
      READ r FROM ExchangeAreaStart + p - index + dRank
    ELSE
    BEGIN
      Use(tREAD)
      r := 0
    END
    Use(tIF + tLOAD + tSUB + tADD)  Assumed combined with above
    IF fRank < size (size of own NUP-array) THEN
      READ t FROM ExchangeAreaStart + p - index + fRank
    ELSE
    BEGIN
      t := NUPvANDwSize // rate
      Wait(tREAD)  Assumed not simpler than the THEN-clause
    END
  ENDPROCEDURE findrandt

  r and t are the ranks of d and f in the NewSUP for the same
  node as d and f came from (SUP). These values have just been
  computed in procedure RankInNewSUPfromSenderNode.
  The rate used in the next stage has been computed before
  by the SUP-processors in procedure RankInNewSUPfromSenderNode.
  However, in this case we need the rate that will be used in the
  next stage in the CHILD nodes of this node.
  Use(tSTORE)
  rate := RateInNextStageBelow
  StoreProcessorValue(RankInLeftSUP)

  Use(tIF)
  IF Sender THEN
  BEGIN
    This is from right child (SUP(w)); d and f from the left
    findrandt
  END
  ELSE
    Wait(tfind)
  StoreProcessorValue(RankInRightSUP)
  Use(tIF)
  IF Sender THEN
  BEGIN
    This is from left child (SUP(v)); d and f from the right
    findrandt
  END
  ELSE
    Wait(tfind)

  Use(tIF)
  IF Sender THEN
  BEGIN  (mark)
    r and t point to elements in NewSUP(v). This is the case
    shown in Cole's figure. Note that NewSUP does not exist yet.
    However, we know that NewSUP(v) will be made by the procedure
    MakeSamples from NUP(v) in the next stage. Thus the positions r and t
    in NewSUP(v) implicitly give the positions in NUP(v) for the
    elements which must be compared with e.
    If the sample rate used to make NewSUP from NUP is rate, the position
    of the element corresponding to r in NUP(v) is given by r*rate.
    NUPvAddr was read above (mark).
    ComputeLocalOffset(NUPvAddr, rate)
  END  of Sender
  ELSE
  BEGIN
    IF Sender THEN
    BEGIN
      r and t point to elements in NewSUP(w);
      see comments for the case Sender above.
      NUPwAddr was read above (mark).
      ComputeLocalOffset(NUPwAddr, rate)
    END
    ELSE
      Error("Should not occur")
  END

END  of ProcessorType = NUP
ELSE
  Wait(tSTORE + tRateInNextStageBelow
       + tStoreVal + tIF + tfind + tIF + tComputeLocalOffset)
Can be appended to IF-test above
IF ProcessorType = NUP THEN
BEGIN
  Use(tIF + tLOAD + tADD + tSTORE)
  IF Sender THEN
  BEGIN
    This item is from left; other child is right
    RankInRightSUP := r + LocalOffset
  END
  ELSE
  BEGIN
    This item is from right; other child is left
    RankInLeftSUP := r + LocalOffset
  END
END
ELSE
  Wait(tIF + tLOAD + tADD + tSTORE)

ENDPROCEDURE RankInNewSUPfromOtherNode
INTEGER CONSTANT tpart1 = tIF + tStoreVal + tIF + tLOAD + tADD
                          + tREAD
INTEGER CONSTANT tpart2 =
    tIF + tLOAD + tSUB + tpower + tSHIFT + tSUB + tJNEG
    + tIF + tLOAD + tSHIFT + tADD + tR + tLOAD + tR
    + tStoreArray + tIF
    + tIF + tLOAD + tSHIFT + tADD + tR + tLOAD + tR
INTEGER CONSTANT tComputeLocalOffset = tLOAD + tSHIFT + tADD
    + tR + tADD + tIF + tLOAD + tSHIFT + tADD
    + tR + tADD + tIF + tLOAD + tSHIFT + tADD
    + tR + tR + tCompareWith
INTEGER CONSTANT tfind = tIF + tLOAD + tADD + tSUB + tADD
    + tREAD + tIF + tLOAD + tSUB + tADD + tREAD
INTEGER CONSTANT tpart3 = tIF
    + tSTORE + tRateInNextStageBelow + tStoreVal + tIF + tfind
    + tIF + tComputeLocalOffset
    + tIF + tLOAD + tADD + tSTORE
INTEGER CONSTANT tRankInNewSUPfromOtherNode = tpart1 + tpart2
    + tpart3
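The sampling relationship exploited above can be restated compactly: NewSUP(x) does not exist yet, but it will be made from NUP(x) by regular sampling, so a rank r in NewSUP maps to position r*rate in NUP. The following Python fragment is purely illustrative (the function names are not part of the PIL listing):

```python
def make_samples(nup, rate):
    """Form NewSUP from NUP by regular sampling: every rate-th element."""
    return nup[rate - 1::rate]

def position_in_nup(r, rate):
    """A (1-indexed) rank r in NewSUP corresponds to position r * rate in NUP."""
    return r * rate

nup = [2, 3, 5, 7, 11, 13, 17, 19]
new_sup = make_samples(nup, 4)            # [7, 19]
# the element at rank 1 of NewSUP is the element at position 1*4 of NUP
assert nup[position_in_nup(1, 4) - 1] == new_sup[0]
```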

Body of MaintainRanks
BODY OF MaintainRanks
It is only necessary to perform MaintainRanks for nodes with a NUP-array
which is smaller than the maximum size of NUP-arrays at the same level,
i.e., MergeWithHelp must be performed also in the next stage. This maximum
size is given by power(m - level).
INTEGER SenderAddr
Use(tIF + tAND + tLOAD + tSUB + tpower + tSUB + tJNEG)
IF ProcessorType = NUP AND active AND size < power(m - level) THEN
BEGIN
  read sender address of this element
  Use(tLOAD + tADD)
  READ SenderAddr FROM ExchangeAreaStart + p
END
ELSE
BEGIN
  SenderAddr := 0  this is done for robustness
  Wait(tLOAD + tADD + tREAD)
END
Use(tIF + tAND + tOR + tLOAD + tAND + tOR)
Assumed combined with the test above and simplified
IF (ProcessorType = UP AND active) OR
   (ProcessorType = SUP AND active AND size < power(m - level)) OR
   (ProcessorType = NUP AND active AND size < power(m - level)) THEN
BEGIN
  RankInNewSUPfromSenderNode(SenderAddr)
END
ELSE
  Wait(tRankInNewSUPfromSenderNode)
StoreArrayAddresses(NUP)  (mark)
StoreArraySizes(NUP)
Use(tIF)  Assumes that the result of the same test was stored above
IF (ProcessorType = UP AND active) OR
   (ProcessorType = SUP AND active AND size < power(m - level)) OR
   (ProcessorType = NUP AND active AND size < power(m - level)) THEN
BEGIN
  RankInNewSUPfromOtherNode(SenderAddr)
END  of processor-selection
ELSE
  Wait(tRankInNewSUPfromOtherNode)
The ranks NUP(u) in NewSUP(v) are stored in RankInLeftSUP for the
NUP processors, and similarly in RankInRightSUP for NUP(u) in NewSUP(w).
The addresses of the NUP-arrays are stored at the end of this stage (see
the main loop), and this info is used by the UP-processors to read (copy)
these ranks in the procedure CopyNUPtoUP at the start of the next stage.
ENDPROCEDURE MaintainRanks
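What MaintainRanks achieves, stripped of the PRAM timing machinery, is that each NUP(u) element learns its rank in the (future) NewSUP arrays of the two children. A sequential Python sketch of the result is given below; it describes the outcome, not the constant-time-per-processor computation above, and bisect is used only for brevity:

```python
import bisect

def rank(x, sorted_arr):
    """Rank of x in a sorted array: the number of elements <= x."""
    return bisect.bisect_right(sorted_arr, x)

def maintain_ranks(nup_u, new_sup_v, new_sup_w):
    """For each element of NUP(u), its rank in the left child's NewSUP(v)
    and in the right child's NewSUP(w) -- cf. RankInLeftSUP / RankInRightSUP."""
    return [(rank(e, new_sup_v), rank(e, new_sup_w)) for e in nup_u]

ranks = maintain_ranks([5, 10], [1, 6, 8], [4, 9, 12])
assert ranks == [(1, 1), (3, 2)]
```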

Appendix C

Batcher's Bitonic Sorting in PIL

This is a complete listing of Batcher's bitonic sorting algorithm [Bat, Akl],
implemented in PIL for execution on the CREW PRAM simulator.

C.1 The Main Program

Bito.pil
Bitonic sorting with detailed time modelling.
A bitonic sorting network is emulated by using n processors to act
as the active column of comparators in each pass during each stage.
Author: Lasse Natvig. Email: lasse@idt.unit.no
Last changed (improvements):
Copyright (C) Lasse Natvig, IDT/NTH, Trondheim, Norway
#include "Sorting"
PROCESSORSET ComparatorColumn
INTEGER n, m
ADDRESS addr
BEGINPILALG

FOR m TO DO

BEGIN

n := power(m)
addr := GenerateSortingInstance(n, RANDOM)
ClockReset
Use(tPUSH)
PUSH(addr)
PUSH(n)
ASSIGN n PROCESSORS TO ComparatorColumn
FOREACH PROCESSOR IN ComparatorColumn
BEGIN
  INTEGER n, m, Stage, Pass
  ADDRESS Addr, UpInAddr, LowInAddr
  INTEGER UpInVal, LowInVal
  INTEGER GlobalNo  Global Comparator No.
  INTEGER sl  sequence length, is always a power of two
  PROCEDURE ComputeInAddresses
  BEGIN
    INTEGER sn  sequence number
    ADDRESS sa  sequence address
    INTEGER LocalNo  Local Comparator No. inside this sequence

    Use(tIF + MAX(tLOAD + tpower + tSTORE,
                  tIF + tLOAD + tSHIFT + tSTORE))
    IF Pass = 1 THEN
      sl := power(Stage)
    ELSE IF Pass > 1 THEN
      sl := sl // 2
    Use(tLOAD + tSHIFT + tSHIFT + tSTORE
        + tLOAD + tSHIFT + tADD + tSTORE
        + tLOAD + tSHIFT + tMOD + tSTORE)
    sn := GlobalNo // sl
    sa := Addr + sn*sl
    LocalNo := MOD(GlobalNo, sl)
    Use(tIF + MAX(tLOAD + tADD + tSTORE
                  + tLOAD + tSHIFT + tADD + tSTORE,
                  tLOAD + tSHIFT + tSUB + tJNEG + tSTORE
                  + tIF + MAX(tLOAD + tSHIFT + tADD + tSTORE
                              + tLOAD + tSHIFT + tADD + tADD + tSTORE,
                              tLOAD + tSHIFT + tADD + tSTORE
                              + tLOAD + tSHIFT + tSUB + tSTORE)))
    This way of modelling the time used gives easy analysis, because every call
    to the procedure ComputeInAddresses takes the same amount of time.
    Since the variable Pass always will have the same value for all processors,
    the case Pass = 1 occurs for all processors simultaneously. Therefore a
    faster execution might be obtained by placing appropriate Use statements
    in the THEN and ELSE part of IF Pass = 1.
    However, for the test IF UpperPart, time modelling with
    Use(tIF + MAX(ThenClause, ElseClause)) is most appropriate, since we
    simultaneously have processors in both branches and must wait for
    the slowest branch to finish.
    IF Pass = 1 THEN
    BEGIN
      UpInAddr := sa + LocalNo
      LowInAddr := UpInAddr + sl // 2
    END
    ELSE
    BEGIN
      BOOLEAN UpperPart
      UpperPart := LocalNo < sl // 2
      IF UpperPart THEN
      BEGIN
        UpInAddr := sa + LocalNo
        LowInAddr := UpInAddr + sl // 2
      END
      ELSE
      BEGIN
        LowInAddr := sa + LocalNo
        UpInAddr := LowInAddr - sl // 2
      END
    END
  ENDPROCEDURE ComputeInAddresses
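The time-modelling rule discussed in the comments above (charge MAX over the branches when processors sit in both branches at once, but only the taken branch when all processors agree) can be summarised as a tiny cost function. The constants below are hypothetical stand-ins; the real instruction times are simulator parameters defined elsewhere in the thesis.

```python
# Hypothetical instruction time (the real value is a simulator parameter)
tIF = 1

def if_cost_synchronous(t_then, t_else):
    """Processors are in both branches simultaneously:
    everyone must wait for the slower branch to finish."""
    return tIF + max(t_then, t_else)

def if_cost_uniform(t_taken):
    """All processors take the same branch: only that branch is charged."""
    return tIF + t_taken

assert if_cost_synchronous(5, 3) == 6
assert if_cost_uniform(3) == 4
```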

PROCEDURE Comparator

BEGIN

  PROCEDURE Swap(a, b)
  NAME a, b; INTEGER a, b
  BEGIN
    INTEGER Temp
    Temp := a; a := b; b := Temp
  ENDPROCEDURE Swap
  INTEGER CONSTANT tSwap = tLOAD + tSTORE
  INTEGER Type  See Fig. in Selim G. Akl's book on parallel sorting
  BEGIN  Compute type
    INTEGER Interval
    Use(tLOAD + tSUB + tpower + tSTORE
        + tIF + tLOAD + tSHIFT + tSTORE)
    Interval := power(Stage)  this is from Akl's book
    IF IsEven(GlobalNo // Interval) THEN
      Type := 0
    ELSE
      Type := 1
  END
  Use(tIF + tIF + tSwap)
  IF Type = 0 THEN
  BEGIN
    IF LowInVal < UpInVal THEN
      Swap(LowInVal, UpInVal)
  END
  ELSE
  BEGIN
    IF LowInVal > UpInVal THEN
      Swap(LowInVal, UpInVal)
  END
  Use(tLOAD + tSHIFT + tADD)

WRITE UpInVal TO AddrGlobalNo

WRITE LowInVal TO AddrGlobalNo

ENDPROCEDURE Comparator

MAIN PROGRAM

SetUp

  Use(tLOAD)
  READ n FROM UserStackPtr
  Use(tSUB)
  READ Addr FROM UserStackPtr
  Use(tLOAD + tlog + tSTORE + tLOAD + tSUB + tSTORE)
  m := log(n)
  GlobalNo := p
  MAIN LOOP
  Use(tSTORE + tIF)
  FOR Stage := 1 TO m DO
  BEGIN
    Use(tSTORE + tIF)
    FOR Pass := 1 TO Stage DO
    BEGIN
      ComputeInAddresses
      READ UpInVal FROM UpInAddr
      READ LowInVal FROM LowInAddr
      Comparator
      Use(tLOAD + tADD + tSTORE + tIF)
    ENDFOR
    Use(tLOAD + tADD + tSTORE + tIF)
  ENDFOR
END
ENDFOREACH PROCESSOR

TTITIOP bitonic n ClockRead

TTIP bitonicCost n OUTTEXT

OUTINTnClockRead OUTIMAGE

TTO

FreeProcs(ComparatorColumn)
BEGIN
  INTEGER dummy  to avoid warning
  dummy := POP
  dummy := POP
END

ENDFOR

ENDPILALG
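The stage/pass structure of the listing (m = log2(n) stages, stage s consisting of s passes, with comparator direction alternating between blocks) can be emulated sequentially. The following Python sketch follows the standard textbook formulation of Batcher's network rather than the PIL variable names, so details such as the address computation differ from the listing above:

```python
def bitonic_sort(a):
    """Sequential emulation of Batcher's bitonic sorting network."""
    n = len(a)
    assert n and n & (n - 1) == 0, "the network assumes n is a power of two"
    k = 2
    while k <= n:                 # one 'stage' per doubling of sorted-run length
        j = k // 2
        while j >= 1:             # one 'pass' per halving of comparator distance
            for i in range(n):
                partner = i ^ j
                if partner > i:
                    ascending = (i & k) == 0   # direction alternates per k-block
                    if (a[i] > a[partner]) == ascending:
                        a[i], a[partner] = a[partner], a[i]
            j //= 2
        k *= 2
    return a

assert bitonic_sort([3, 7, 4, 8, 6, 2, 1, 5]) == [1, 2, 3, 4, 5, 6, 7, 8]
```

Every comparison in one inner `for` loop is independent of the others, which is exactly why each pass can be executed in one step by a column of comparator processors, giving the O(log^2 n) parallel time analysed in the thesis.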

Bibliography

[AAN] Kjell Øystein Arisland, Anne Cathrine Aasbø, and Are Nundal. VLSI parallel shift sort algorithm and design. INTEGRATION, the VLSI journal.
[AC] Loyce M. Adams and Thomas W. Crockett. Modeling Algorithm Execution Time on Processor Arrays. IEEE Computer, July.
[ACG] Sudhir Ahuja, Nicholas Carriero, and David Gelernter. Linda and Friends. IEEE Computer, August.
[AG] George S. Almasi and Allan Gottlieb. Highly Parallel Computing. The Benjamin/Cummings Publishing Company, Inc.
[Agg] Alok Aggarwal. A Critique of the PRAM Model of Computation. In Proceedings of the NSF/ARC Workshop on Opportunities and Constraints of Parallel Computing [Sana].
[AHMP] H. Alt, T. Hagerup, K. Mehlhorn, and F. P. Preparata. Deterministic Simulation of Idealized Parallel Computers on More Realistic Ones. In Mathematical Foundations of Computer Science, Bratislava, Czechoslovakia. Lecture Notes in Computer Science.
[AHU] A. Aho, J. Hopcroft, and J. Ullman. The Design and Analysis of Computer Algorithms. Addison-Wesley Publishing Company, Reading, Massachusetts.
[AHU] Alfred V. Aho, John E. Hopcroft, and Jeffrey D. Ullman. Data Structures and Algorithms. Addison-Wesley Publishing Company, Reading, Massachusetts.
[Akl] Selim G. Akl. Parallel Sorting Algorithms. Academic Press Inc., Orlando, Florida.
[Akl] Selim G. Akl. The Design and Analysis of Parallel Algorithms. Prentice-Hall International Inc., Englewood Cliffs, New Jersey.
[AKS] M. Ajtai, J. Komlós, and E. Szemerédi. An O(n log n) sorting network. Combinatorica.

[Amd] Gene M. Amdahl. Validity of the single processor approach to achieving large scale computer capabilities. In AFIPS Joint Computer Conference Proceedings.
[And] Richard Anderson. The Complexity of Parallel Algorithms. PhD thesis, Stanford.
[Bas] A. Basu. Parallel processing systems: a nomenclature based on their characteristics. IEE Proceedings, May.
[Bat] K. E. Batcher. Sorting networks and their applications. In Proc. AFIPS Spring Joint Computer Conference.
[BCK] R. Bagrodia, K. M. Chandy, and E. Kwan. UC: A Language for the Connection Machine. In Proceedings of SUPERCOMPUTING, New York, November.
[BDHM] Dina Bitton, David J. DeWitt, David K. Hsiao, and Jaishankar Menon. A Taxonomy of Parallel Sorting. Computing Surveys, September.
[BDMN] Graham M. Birtwistle, Ole-Johan Dahl, Bjørn Myhrhaug, and Kristen Nygaard. SIMULA BEGIN, second edition. Van Nostrand Reinhold Company, New York.
[Ben] Jon L. Bentley. Programming Pearls. Addison-Wesley Publishing Company, Reading, Massachusetts.
[BH] A. Borodin and J. E. Hopcroft. Routing, Merging and Sorting on Parallel Models of Computation. Journal of Computer and System Sciences.
[Bil] Gianfranco Bilardi. Some Observations on Models of Parallel Computation. In Proceedings of the NSF/ARC Workshop on Opportunities and Constraints of Parallel Computing [Sana].
[Bir] Graham M. Birtwistle. DEMOS: A System for Discrete Event Modelling on Simula. The MacMillan Press Ltd., London.
[Boo] Grady Booch. Software Engineering With ADA. The Benjamin/Cummings Publishing Company, Inc.
[CF] David Cann and John Feo. SISAL versus FORTRAN: A Comparison Using the Livermore Loops. In Proceedings of SUPERCOMPUTING, New York, November.
[CG] Nicholas Carriero and David Gelernter. Applications experience with Linda. In Proc. ACM/SIGPLAN Symp. on Parallel Programming.
[CM] K. Mani Chandy and Jayadev Misra. Parallel Program Design: A Foundation. Addison-Wesley Publishing Company, Reading, Massachusetts.

[Col] Richard Cole. Parallel Merge Sort. In Proceedings of the IEEE Symposium on Foundations of Computer Science (FOCS).
[Col] Richard Cole. Parallel Merge Sort. Technical Report, Computer Science Department, New York University, March.
[Col] Richard Cole. Parallel Merge Sort. SIAM Journal on Computing, August.
[Col] Richard Cole. The APRAM: Incorporating Asynchrony into the PRAM Model. In Proceedings of the NSF/ARC Workshop on Opportunities and Constraints of Parallel Computing [Sana].
[Col] Richard Cole, New York University. Private communication, March and April.
[Coo] S. A. Cook. An Observation on Time-Storage Tradeoff. Journal of Computer and System Sciences.
[Coo] S. A. Cook. Towards a complexity theory of synchronous parallel computation. L'Enseignement Mathématique, XXVII.
[Coo] Stephen A. Cook. An Overview of Computational Complexity. Communications of the ACM, June. Turing Award Lecture.
[Coo] Stephen A. Cook. The Classification of Problems which have Fast Parallel Algorithms. In Foundations of Computation Theory, Proceedings of the International FCT-Conference, Borgholm, Sweden, August. Lecture Notes in Computer Science, Berlin: Springer Verlag.
[Coo] Stephen A. Cook. A Taxonomy of Problems with Fast Parallel Algorithms. Inform. and Control. This is a revised version of [Coo].
[DeC] Angel L. DeCegama. The technology of parallel processing: Parallel processing architectures and VLSI hardware, Volume I. Prentice-Hall Inc., Englewood Cliffs, New Jersey.

[Dem] Howard B. Demuth. Electronic Data Sorting. IEEE Transactions on Computers, April.
[DH] Hans Petter Dahle and Per Holm. SIMULA User's Guide for UNIX. Technical report, Lund Software House AB, Lund, Sweden.
[DKM] C. Dwork, P. C. Kanellakis, and J. C. Mitchell. On the Sequential Nature of Unification. Journal of Logic Programming.
[DLR] D. Dobkin, R. J. Lipton, and S. Reiss. Linear programming is log-space hard for P. Information Processing Letters.
[Dun] Ralph Duncan. A Survey of Parallel Computer Architectures. IEEE Computer, February.
[Fei] Dror G. Feitelson. Optical Computing: A Survey for Computer Scientists. The MIT Press, London, England.
[Fis] David C. Fisher. Your Favorite Parallel Algorithms Might Not Be as Fast as You Think. IEEE Transactions on Computers.
[Flo] R. W. Floyd. Algorithm 97: shortest path. Communications of the ACM. The algorithm is based on the theorem in [War].
[Fly] Michael J. Flynn. Very High-Speed Computing Systems. In Proceedings of the IEEE, December.
[Frea] Karen A. Frenkel. Complexity and Parallel Processing: An Interview With Richard Karp. Communications of the ACM, February.
[Freb] Karen A. Frenkel. Special Issue on Parallelism. Communications of the ACM, December.
[FT] Ian Foster and Stephen Taylor. Strand: New Concepts in Parallel Programming. Prentice-Hall Inc., Englewood Cliffs, New Jersey.
[FW] S. Fortune and J. Wyllie. Parallelism in Random Access Machines. In Proceedings of the ACM Symposium on Theory of Computing (STOC). ACM, New York, May.
[Gel] David Gelernter. Domesticating Parallelism. IEEE Computer, August. Guest Editor's Introduction.
[Gib] Phillip B. Gibbons. Towards Better Shared Memory Programming Models. In Proceedings of the NSF/ARC Workshop on Opportunities and Constraints of Parallel Computing [Sana].
[GJ] Michael R. Garey and David S. Johnson. COMPUTERS AND INTRACTABILITY: A Guide to the Theory of NP-Completeness. W. H. Freeman and Co., New York. If you find this book difficult, you might try [Wil] or [RS].
[GMB] John L. Gustafson, Gary R. Montry, and Robert E. Benner. Development of Parallel Methods for a 1024-Processor Hypercube. SIAM Journal on Scientific and Statistical Computing, July.

[Gol] L. M. Goldschlager. The monotone and planar circuit value problems are log space complete for P. SIGACT News.
[GR] Alan Gibbons and Wojciech Rytter. Efficient Parallel Algorithms. Cambridge University Press, Cambridge.
[GS] Mark Goldberg and Thomas Spencer. A new parallel algorithm for the maximal independent set problem. SIAM Journal on Computing, April.
[GSS] Leslie M. Goldschlager, Ralph A. Shaw, and John Staples. The maximum flow problem is log space complete for P. Theor. Comput. Sci., October.
[Gup] Rajiv Gupta. Loop Displacement: An Approach for Transforming and Scheduling Loops for Parallel Execution. In Proceedings of SUPERCOMPUTING, New York, November.
[Gus] John L. Gustafson. Reevaluating Amdahl's Law. Communications of the ACM, May.
[Hag] Marianne Hagaseth. Very fast PRAM sorting algorithms (preliminary title). Master's thesis, Division of Computer Systems and Telematics, The Norwegian Institute of Technology, February. In preparation.
[Hal] Robert Halstead. Multilisp: A language for concurrent symbolic computation. ACM Transactions on Programming Languages, October.
[Har] David Harel. ALGORITHMICS: The Spirit of Computing. Addison-Wesley Publishing Company, Reading, Massachusetts.
[HB] Kai Hwang and Faye A. Briggs. Computer Architecture and Parallel Processing. McGraw-Hill Book Company, New York.
[Hil] W. Daniel Hillis. The Connection Machine. Scientific American, June.
[Hil] W. Daniel Hillis. The Fastest Computers. Keynote Address at the SUPERCOMPUTING conference, New York, November. Notes from the presentation.
[HJ] R. W. Hockney and C. R. Jesshope. Parallel Computers: architecture, programming and algorithms, 2nd edition. Adam Hilger, Bristol.
[HM] David P. Helmbold and Charles E. McDowell. Modeling Speedup(n) greater than n. In Proceedings of the International Conference on Parallel Processing, vol. III.
[Hoa] C. A. R. Hoare. Quicksort. Computer Journal.
[Hop] John E. Hopcroft. Computer Science: The Emergence of a Discipline. Communications of the ACM, March. Turing Award Lecture.
[HS] J. Hartmanis and R. E. Stearns. On the computational complexity of algorithms. Trans. Amer. Math. Soc., May.
[HS] W. Daniel Hillis and Guy L. Steele Jr. Data parallel algorithms. Communications of the ACM, December.
[HT] Per Holm and Magnus Taube. SIMDEB User's Guide for UNIX. Technical report, Lund Software House AB, Lund, Sweden.
[Hud] Paul Hudak. Para-Functional Programming. IEEE Computer, August.
[HWG] Michael Heath, Patrick Worley, and John Gustafson. Once Again, Amdahl's Law (and Author's Response). Communications of the ACM, February. Technical correspondence from Heath and Worley, with the response from Gustafson.

[JL] N. Jones and W. Laaser. Complete problems for deterministic polynomial time. Theoretical Computer Science.
[Kan] Paris C. Kanellakis. Logic Programming and Parallel Complexity. In Jack Minker, editor, Foundations of Deductive Databases and Logic Programming. Morgan Kaufmann Publishers Inc., Los Altos, California.
[Kar] Richard M. Karp. Combinatorics, Complexity, and Randomness. Communications of the ACM, February. Turing Award Lecture.
[Kar] Richard M. Karp. A Position Paper on Parallel Computation. In Proceedings of the NSF/ARC Workshop on Opportunities and Constraints of Parallel Computing [Sana].
[Kha] L. G. Khachian. A polynomial time algorithm for linear programming. Dokl. Akad. Nauk SSSR. In Russian; translated to English in Soviet Math. Dokl.
[KM] Ken Kennedy and Kathryn S. McKinley. Loop Distribution with Arbitrary Control Flow. In Proceedings of SUPERCOMPUTING, New York, November.
[Knu] D. E. Knuth. The Art of Computer Programming, Vol. 3: Sorting and Searching. Addison-Wesley Publishing Company, Reading, Massachusetts.
[KR] Brian W. Kernighan and Dennis M. Ritchie. The C Programming Language. Prentice-Hall Inc., New Jersey.
[KR] Richard M. Karp and Vijaya Ramachandran. A Survey of Parallel Algorithms for Shared-Memory Machines. Technical Report, Computer Science Division, University of California, Berkeley, March.
[Kru] J. B. Kruskal. On the shortest spanning subtree of a graph and the travelling salesman problem. Proceedings of the American Mathematical Society.
[Kru] Clyde P. Kruskal. Efficient Parallel Algorithms: Theory and Practice. In Proceedings of the NSF/ARC Workshop on Opportunities and Constraints of Parallel Computing [Sana].
[Kuc] David J. Kuck. A Survey of Parallel Machine Organization and Programming. ACM Computing Surveys, March.
[Kun] H. T. Kung. Why Systolic Architectures? IEEE Computer, January.
[Kun] H. T. Kung. Computational models for parallel computers. In Scientific applications of multiprocessors. Prentice-Hall, Englewood Cliffs, New Jersey.
[Lad] R. E. Ladner. The circuit value problem is log space complete for P. SIGACT News.
[Lan] Asgeir Langen. Parallelle programmeringsspråk (Parallel programming languages). Master's thesis, Division of Computer Systems and Telematics, The Norwegian Institute of Technology, The University of Trondheim, Norway. In Norwegian.
[Law] Eugene L. Lawler. Combinatorial Optimization: Networks and Matroids. Holt, Rinehart and Winston, New York.
[Lei] Tom Leighton. Tight Bounds on the Complexity of Parallel Sorting. In Proceedings of the Annual ACM Symposium on Theory of Computing, May. ACM, New York.
[Lei] Tom Leighton. What is the Right Model for Designing Parallel Algorithms? In Proceedings of the NSF/ARC Workshop on Opportunities and Constraints of Parallel Computing [Sana].
[Lub] Boris D. Lubachevsky. Synchronization barrier and tools for shared memory parallel programming. In Proceedings of the International Conference on Parallel Processing, vol. II.
[Lun] Stephen F. Lundstrom. The Future of Supercomputing: The Next Decade and Beyond. Technical Report, November. PARSA. Documentation from tutorial on SUPERCOMPUTING.

[Mag] Bruce Maggs. Beyond parallel random-access machines. In Proceedings of the NSF/ARC Workshop on Opportunities and Constraints of Parallel Computing [Sana].
[Man] Heikki Mannila. Measures of Presortedness and Optimal Sorting Algorithms. IEEE Transactions on Computers, April.
[MB] Peter Moller and Kallol Bagchi. A Design Scheme for Complexity Models of Parallel Algorithms. In Proceedings of the IEEE Annual Parallel Processing Symposium, California.
[McG] Robert J. McGlinn. A Parallel Version of Cook and Kim's Algorithm for Presorted Lists. Software: Practice and Experience, October.
[MH] Charles E. McDowell and David P. Helmbold. Debugging Concurrent Programs. ACM Computing Surveys.
[MK] J. Miklosko and V. E. Kotov, editors. Algorithms, Software and Hardware of Parallel Computers. Springer Verlag, Berlin.
[MLK] G. Miranker, L. Tang, and C. K. Wong. A "Zero-Time" VLSI Sorter. IBM J. Res. Develop., March.
[Moh] Joseph Mohan. Experience with Two Parallel Programs Solving the Traveling Salesman Problem. In Proceedings of the International Conference on Parallel Processing. IEEE, New York.
[MV] Kurt Mehlhorn and Uzi Vishkin. Randomized and Deterministic Simulations of PRAMs by Parallel Machines with Restricted Granularity of Parallel Memories. Acta Informatica.
[Nat] Lasse Natvig. Comparison of Some Highly Parallel Sorting Algorithms With a Simple Sequential Algorithm. In Proceedings of NIK (Norsk Informatikk Konferanse, Norwegian Informatics Conference), Stavanger, November.
[Nata] Lasse Natvig. Investigating the Practical Value of Cole's O(log n) Time CREW PRAM Merge Sort Algorithm. In Proceedings of The Fifth International Symposium on Computer and Information Sciences (ISCIS V), Cappadocia, Nevsehir, Turkey, October/November.
[Natb] Lasse Natvig. Logarithmic Time Cost Optimal Parallel Sorting is Not Yet Fast in Practice! In Proceedings of SUPERCOMPUTING, New York, November.
[Natc] Lasse Natvig. Parallelism Should Make Programming Easier. Presentation at the Workshop On The Understanding of Parallel Computation, Edinburgh, July.
[OC] Rodney R. Oldehoeft and David C. Cann. Applicative Parallelism on a Shared-Memory Multiprocessor. IEEE Software.
[Par] D. Parkinson. Parallel efficiency can be greater than unity. Parallel Computing.
[Par] Ian Parberry. Parallel Complexity Theory. John Wiley and Sons Inc., New York.
[Pat] M. S. Paterson. Improved sorting networks with O(log n) depth. Technical report, Dept. of Computer Science, University of Warwick, England. Research Report RR.
[Pip] N. Pippenger. On simultaneous resource bounds (preliminary version). In Proc. IEEE Foundations of Computer Science.
[Pol] Constantine D. Polychronopoulos. PARALLEL PROGRAMMING AND COMPILERS. Kluwer Academic Publishers, London.
[Poo] R. J. Pooley. An Introduction to Programming in SIMULA. Blackwell Scientific Publications, Oxford, England.
[QD] Michael J. Quinn and Narsingh Deo. Parallel Graph Algorithms. ACM Computing Surveys, September.
[Qui] Michael J. Quinn. Designing Efficient Algorithms for Parallel Computers. McGraw-Hill Book Company, New York.
[RBJ] Abhiram G. Ranade, Sandeep N. Bhatt, and S. Lennart Johnsson. The Fluent Abstract Machine. In Proceedings of the Fifth MIT Conference on Advanced Research in VLSI.
[Rei] J. Reif. Depth first search is inherently sequential. Information Processing Letters.
[Rei] Rüdiger K. Reischuk. Parallel Machines and their Communication Theoretical Limits. In STACS.
[Ric] D. Richards. Parallel sorting: a bibliography. ACM SIGACT News.
[RS] V. J. Rayward-Smith. A First Course in Computability. Blackwell Scientific Publications, Oxford.
[RS] John H. Reif and Sandeep Sen. Randomization in parallel algorithms and its impact on computational geometry. In LNCS.
[Rud] Larry Rudolph, Hebrew University, Israel. Private communication, May and June.
[RV] John H. Reif and Leslie G. Valiant. A Logarithmic Time Sort for Linear Size Networks. Journal of the ACM, January.

[Sab] Gary W. Sabot. The Paralation Model: Architecture-Independent Parallel Programming. The MIT Press, Cambridge, Massachusetts.
[Sag] Carl Sagan. Kosmos. Universitetsforlaget, Norwegian edition.
[San] Sandia National Laboratories. Scientists set speedup record, overcome barrier in parallel computing. News Release from Sandia National Laboratories, Albuquerque, New Mexico, March.
[Sana] Jorge L. C. Sanz, editor. Opportunities and Constraints of Parallel Computing. Springer-Verlag, London. Papers presented at the NSF/ARC Workshop on Opportunities and Constraints of Parallel Computing, San Jose, California, December. ARC: IBM Almaden Research Center; NSF: National Science Foundation.
[Sanb] Jorge L. C. Sanz. The Tower of Babel in Parallel Computing. In Proceedings of the NSF/ARC Workshop on Opportunities and Constraints of Parallel Computing [Sana].
[SG] Sartaj Sahni and Teofilo Gonzales. P-Complete Approximation Problems. Journal of the ACM, July.
[Sha] Ehud Shapiro. Concurrent Prolog: A Progress Report. IEEE Computer, August.
[SIA] SIAM. Both Gordon Bell Prize Winners Tackle Oil Industry Problems, May. SIAM: The Society for Industrial and Applied Mathematics.
[SN] Xian-He Sun and Lionel M. Ni. Another View on Parallel Speedup. In Proceedings of SUPERCOMPUTING, New York, November.
[Sni] Marc Snir. Parallel Computation Models: Some Useful Questions. In Proceedings of the NSF/ARC Workshop on Opportunities and Constraints of Parallel Computing [Sana].
[Sny] Lawrence Snyder. A Taxonomy of Synchronous Parallel Machines. In Proceedings of the International Conference on Parallel Processing.
[Spi] Paul Spirakis. The Parallel Complexity of Deadlock Detection. In Mathematical Foundations of Computer Science, Bratislava, Czechoslovakia. Lecture Notes in Computer Science.
[Ste] Guy L. Steele. Making Asynchronous Parallelism Safe for the World. In Proceedings of POPL.
[SV] Yossi Shiloach and Uzi Vishkin. Finding the Maximum, Merging, and Sorting in a Parallel Computation Model. Journal of Algorithms.
[SV] Larry Stockmeyer and Uzi Vishkin. Simulation of Parallel Random Access Machines by Circuits. SIAM Journal on Computing, May.
[Tar] R. E. Tarjan. Depth-first search and linear graph algorithms. SIAM Journal on Computing.
[Tha] Peter Thanisch, Department of Computer Science, University of Edinburgh. Private communication, July.
[Tiw] Prasoon Tiwari. Lower Bounds on Communication Complexity in Distributed Computer Networks. Journal of the ACM, October.
[Upf] Eli Upfal. A Probabilistic Relation Between Desirable and Feasible Models of Parallel Computation. In Proceedings of the ACM Symposium on Theory of Computation (STOC).
[Val] L. Valiant. Parallelism in comparison problems. SIAM Journal on Computing.
[Val] Leslie G. Valiant. A Bridging Model for Parallel Computation. In Proceedings of the Computer Conference, Charleston, California, April.
[Vis] Uzi Vishkin. PRAM Algorithms: Teach and Preach. In Proceedings of the NSF/ARC Workshop on Opportunities and Constraints of Parallel Computing [Sana].
[War] Stephen Warshall. A Theorem on Boolean Matrices. Journal of the ACM. This and [Flo] are commonly considered as the source for the famous Floyd-Warshall algorithm.
[Wil] J. W. J. Williams. Heapsort (Algorithm 232). Communications of the ACM.
[Wil] Herbert S. Wilf. Algorithms and Complexity. Prentice-Hall International Inc., Englewood Cliffs, New Jersey.
[Wir] Niklaus Wirth. Algorithms + Data Structures = Programs. Prentice-Hall, Englewood Cliffs, New Jersey.
[WW] K. Wagner and G. Wechsung. Computational Complexity. D. Reidel Publishing Company, Dordrecht, Holland.
[Wyl] J. C. Wyllie. The Complexity of Parallel Computations. PhD thesis, Dept. of Computer Science, Cornell University.
[Zag] Marco Zagha, Carnegie Mellon University. Private communication, November.
[ZG] Xiaofeng Zhou and John Gustafson. Bridging the Gap Between Amdahl's Law and Sandia Laboratory's Result (and Author's Response). Communications of the ACM, August. Technical correspondence from Zhou with the response from Gustafson.

Index

Bound continued cover

upp er or lower in Coles algorithm

Bridging mo del

Abstract machine

Brute force techniques

Activation of pro cessors

Builtin pro cedures

Active levels

Bulk synchronous parallel mo del

in Coles algorithm

Active no des

Cell

in Coles algorithm

description of

Active pro cessors

in systolic array

in Coles algorithm

Channel

ADDRESS

for integers

ADDRESS

for reals

AKS sorting network

in systolic array

Allo cation

pro cedures for using it

of memory

of pro cessors

CHECK

Amdahl reasoning

CHECK

Amdahls law

CHECK

using it in practice

checking for synchrony

Analysis

Circuit mo del

asymptotical

Circuit value problem

ARAM

CVP

ArrayPrint

ClockRead

ASSIGN statement

ClockReset

ASSIGN statement

Coles algorithm

placement of

in PPP

Asymptotic complexity

main principles

Asymptotical analysis

time consumption

Average case

Coles merging pro cedure

Coles parallel merge sort algorithm

Barrier synchronisation

Batchers bitonic sorting

Coles parallel merge sort

Best case

implementation

BigO

Comments

BigOmega

in PIL

Bitonic sorting

Communicationbetween pro cessors

time consumption

Blo ck structure

using channels

error in

Compilation of PIL programs

Blo ck

Compile time padding

variables lo cal in

Complexity class Bound

Crossranks Complexity

in Coles algorithm computational

CVP

descriptional

Complexity function

Complexity theory

Data collection facilities

in DEMOS parallel

debug command Computational mo del

ComputeCrossRanks

debug command

in Coles algorithm

Debugging

ComputeWhoIsWho

Debugging PIL programs

Declaration in Coles algorithm

pro cessor set

Computing mo del

define

Computing resources

denition of macro

Conditional compilation

Delayed output

CONSTANT

in systolic array

Co oks theorem

DEMOS

CopyNewUpToUp

DFSorder problem

in Coles algorithm

DoMerge

Cost

in Coles algorithm

comparison

DRAM

Cost optimal algorithm

Duncans taxonomy

cpp

cpp error message

Eciency

CRAM

Ecient

CRCW PRAM

Enco ding scheme

variants

END FOR

CREW PRAM

FOR EACH PROCESSOR END

clo ck p erio d

statement

global memory

END PROCEDURE

global memory usage

EREW PRAM

input and output

Error

instruction set

Error in blo ck structure

lo cal memory

Error message

CREW PRAM mo del

from the SIMULA compiler

CREW PRAM

from the simulator

processor allocation

from cpp

processor number

processor set

from PILfix

CREW PRAM processors

Events

CREW PRAM program

ordering of

CREW PRAM programming

Execution

CREW PRAM simulator

start of

Execution time

clock

Exponential time algorithm

installation

interaction with

Fast parallel algorithms

performance

File

real numbers

inclusion of

system requirements

versions of

time

Floyd's algorithm

CREW PRAM

Flynn's taxonomy

synchronous operation

FOR EACH PROCESSOR IN statement

time unit

link command

FOR loop

Freeing

link command

of memory

link command

Local events

Gap

Local memory

between theory and practice

CREW PRAM

Local variables

GENERABILITY problem

log

GFLOPS

Loop

Global events

FOR

Global memory

WHILE

access count

Lower bound

Global memory access pattern

Global memory access statistics

Macro definition

Global memory

MaintainRanks

access to

in Cole's algorithm

CREW PRAM

MakeSamples

insufficient

in Cole's algorithm

reading from

Massively parallel computer

time used by access

Massive parallelism

write conflict

Master processor

writing to

MCVP

Global reads

MemN Reads

number of

MemN Writes

Global variables

Memory

Global writes

allocation

number of

Memory conflict

Good sampler

Memory

in Cole's algorithm

freeing

global

Halting problem

local

Memory usage

ifdef

comparison

include

MemUserFree

file inclusion

MemUserMalloc

INFTY

MergeWithHelp

Inherently serial problem

in Cole's algorithm

Input length

MIMD

Insertion sort

MISD

parallel

Model

Instance

bridging

of problem

mathematical convenience

Instruction set

of computation

CREW PRAM

Model of parallel computation

Insufficient global memory

Model

Inverted Amdahl reasoning

performance representative

Inverted Amdahl's law

realistic

IsEven

trade-offs

IsOdd

Modelling time consumption

MRAM

Length of phase

in systolic array

NC

Line numbers

Linear speedup may be misleading

PIL

NC-algorithm

DEMOS features

practical value

example programs

NC-reduction

line number

Nick's Class

linking

robustness

output

Nondeterministic algorithm

Parallel Intermediate

NoOfProcs

Language

in Cole's algorithm

preprocessor

NP

procedures

NPC

PIL program

NP-complete problem

debugging

NP-completeness

development approach

Number in processor set, p

executing

parallel

Odd-even transposition sort

sequential

PIL

Optimal algorithm

SIMULA features

Optimal parallel sorting algorithm

variables

pilc command

ORAM

pilc command

Order notation

pilc command

PILfix

P

pilh

p

Pipelined computer

Padding

Pipelined merging

for making programs

Polylogarithmic running time

synchronous

Polynomial time algorithm

Parallel complexity theory

Polynomial transformation

Parallel computation model

POP

Parallel insertion sort

power

Parallel programming

PPP

development approach

PPP (Parallel Pseudo Pascal)

the PIL language

Parallel Pseudo Pascal

PRAC

Parallel slackness

PRAM

CRCW

Parallel work

CREW

Parallelisation

EREW

Parameter passing in parallel

PRAM model

Passive processors

pramlis

in Cole's algorithm

Preprocessor for PIL

PATH problem

Presortedness

P-complete algorithm

Printing memory area

P-complete problem

Probe effect

P-completeness

Problem

proof

inherently serial

PCVP

Problem instance

Phase

Problem parameters

in systolic array

Problem

length of

representation

PIL

Problem scaling

comments

Problem size

PIL compiler

Problems

phases

Ranks (continued)

Problems (continued)

in Cole's algorithm

inherently serial

Read from global memory

NP-complete

READ statement

Procedure

READ statement

local in block

operation count

Procedures

Reading the clock

in PIL

Real numbers

Processing element

in systolic array

description of

RealArrayPrint

in systolic array

RECEIVE

Processor activation

RECEIVE REAL

in PPP

RECEIVE REAL ZEROFY

placement of

RECEIVE ZEROFY

restrictions

Receiving from channel

Processor allocation

removeTIME

in Cole's algorithm

Resetting the clock

in PPP

Resources

Processor number

used in computation

CREW PRAM

Resynchronisation

in processor set

Resynchronisation of processors

printing it

Robustness of Nick's Class

Processor requirement

run command

in Cole's algorithm

run command

discussion

run command

SET PROCESSOR

Running time

Processor set

CREW PRAM

SATISFIABILITY problem

declaration of

Scaled speedup

number in, p

SEND

Processor

SEND

synchronisation

SEND REAL

Processors

Serial work

communication between

SIMD

CREW PRAM

simdeb

Program structure

simdeb

Program

SIMULA

start of execution

SISD

Program structure

Size

using systoolpil

of problem

Program tracing

Software crisis

Program

Space

versions of

Speedup

Programming

linear

CREW PRAM

scaled

on the simulator

superlinear

Pseudo code

Stage

PUSH

in systolic array

PUSH

Straddle

Pyramid of processors

in Cole's algorithm

in Cole's algorithm

Straight insertion sort

Supercomputer

RAM model

Superlinear speedup

Ranks

SYNC

SYNC

Time

SYNC

waiting

SYNC

timessim

timessim

instead of compile-time padding

T Off

T On

SYNC

Top-down development

removing

SYNC

Tracing

replace by CHECK

of PIL programs

Synchronisation

turn off

turn on

Synchronisation barrier

T ThisIs

Synchronous computing structure

T Time

Synchronous MIMD programming

T TITITO

T TR

Synchronous operation

Synchronous programming

Unbounded parallelism

Synchronous statements

Unification problem

SYNCHRONY LOST

UNIT RESOLUTION problem

System stack

Upper bound

Use

Systolic array

time used

data availability

Use

delayed output

Use

example

used in systolic array

phase

User stack

stage

UserStackPtr

unit time delay

UserStackPtr

systoolpil

systoolpil

Variable

program structure

local

making it local

T procedures

Versions

T procedures

of a program

Taxonomy

Duncan's

Wait

Flynn's

Wait

T Flag

Wait

Theoretical computer science

Warning

Theory and practice

Weather forecasting

gap

WHILE loop

Time

Whitespace character

Time complexity function

Working areas

Time constants

in Cole's algorithm

Time consumption

Worst case

Time delay

WRAM

in systolic array

Write conflict

Time modelling

WRITE CONFLICT

exact

WRITE statement

rules for making

WRITE statement

simplified

operation count

Time unit

Write to global memory

Time used

for accessing the global memory