Preprint, Mathematics and Computer Science Division, Argonne National Laboratory, Argonne, Ill., January

Language Constructs for Modular Parallel Programs

Ian Foster

Mathematics and Computer Science Division
Argonne National Laboratory
Argonne, IL, USA

Abstract

We describe programming language constructs that facilitate the application of modular design techniques in parallel programming. These constructs allow us to isolate resource management and processor scheduling decisions from the specification of individual modules, which can themselves encapsulate design decisions concerned with concurrency, communication, process mapping, and data distribution. This approach permits development of libraries of reusable parallel program components and the reuse of these components in different contexts. In particular, alternative mapping strategies can be explored without modifying other aspects of program logic. We describe how these constructs are incorporated in two practical parallel programming languages, PCN and Fortran M. Compilers have been developed for both languages, allowing experimentation in substantial applications.

Keywords: modularity, parallel programming, programming languages, program composition, code reuse, virtual computer

Introduction

In sequential programming, modular and object-oriented design and programming techniques are well understood and widely used to reduce complexity, permit separate development of components, and encourage reuse. In parallel programming, the situation is less advanced. Parallel programs are, for the most part, developed in an ad hoc fashion, as monolithic entities that cannot easily be adapted to changing circumstances. Code reuse is rare outside the specialized context of single-program multiple-data (SPMD) programming, in which the same program is run on every processor and libraries can be called to perform common global operations.

This paper presents programming language constructs that permit the benefits of modularity to be realized in parallel programs. The central ideas are as follows. First, process and data placement decisions within an individual module are specified with respect to a virtual computer; the embedding of this virtual computer within a physical or another virtual computer is specified when the module is invoked. This approach permits resource management and locality decisions to be separated from the specifications of individual modules and developed in a hierarchical fashion by using stepwise refinement techniques. Second, virtual computer constructs are incorporated into compositional programming languages, in which concurrency is specified with explicit parallel constructs and interactions between concurrent processes are restricted so that neither physical location nor execution schedule affects the result of a computation. Languages with this property include Strand and PCN, in which processes interact by reading and writing shared single-assignment variables, and Fortran M, in which interactions occur via single-reader, single-writer virtual channels. Compositional languages permit design decisions concerned with mapping, data distribution, communication, and scheduling to be made separately and to be modified without changing other aspects of a design.

In this paper, we show how virtual computer constructs are integrated into two compositional parallel programming languages, PCN and Fortran M. PCN is a C-like language that integrates ideas from concurrent and imperative programming. Fortran M is a small set of extensions to Fortran. In each case, virtual computers and related constructs have been introduced in a manner that is consistent with the base language, hence simplifying both comprehension and compilation. Both PCN and Fortran M have been used to develop a range of substantial parallel applications; these provide an empirical basis for evaluation of the concepts.

In summary, the contributions of this paper are as follows:

- The definition of programming language constructs that facilitate the modular construction of parallel programs
- The instantiation of these constructs in practical parallel programming languages
- Empirical evaluation in a range of applications

The next three sections of this paper address each of these issues in turn, after which we review related work and present our conclusions.

Modularity and Parallel Programs

We use the term modularity to refer to two related program-structuring techniques:

- Stepwise refinement: the decomposition of a complex problem into simpler subproblems, each solved by a separate module with a well-defined interface
- Modular decomposition: the isolation, in separate modules, of design decisions that are difficult, likely to change, or common to several program components

The first of these techniques is more naturally applied top down in the design process and, if care is taken to ensure generality, produces modules that can be reused in different contexts. The second is more naturally applied bottom up and can reduce the cost of program modifications. Both techniques are central to object-oriented design.

It is instructive to examine how these techniques can be applied in sequential and parallel programming. For illustrative purposes, we consider a simple example: the convolution operation used, for example, in motion estimation algorithms in image processing. Each pair of images is first processed by using a two-dimensional fast Fourier transform (FFT). The resulting matrices are then multiplied, and an inverse FFT is applied to the result. The various operations and the dependencies between them are illustrated in Figure 1.

[Figure 1: Convolution Algorithm Structure. Two FFT modules process the input images; their outputs are multiplied (MUL), and an inverse FFT (FFT^-1) is applied to the product.]

In a sequential implementation, stepwise refinement may identify FFT, matrix multiplication, input, and output modules. The data structures to be input to and output from each module are specified in its interface; internal data structures and algorithmic details are encapsulated and not visible to other components. The interface specifies the size and perhaps the type of the data structures to be operated on, hence allowing the module to be used in different contexts. For example, the same FFT module can be called three times in the convolution example. The interface design may also address efficiency issues, for example by specifying that data is to be passed by reference rather than by value.

The same decomposition can be used in a parallel implementation. Now, however, we need to encapsulate not only data and computation but also concurrency, communication, process mapping, and data distribution. If we do not encapsulate this information, two modules mapped to the same processor might interfere, for example if one received messages intended for the other. When composing two modules, we prefer not to be aware of how each maps computation and data. The mappings used in different modules can be made conformant to improve performance but should not affect correctness. Again, efficiency issues need to be addressed in the interface design, for example by allowing for the exchange of distributed data structures between parallel modules.

In a sequential implementation, modular decomposition may identify the representation of matrices as common to several program components. Hence, we define a module that provides the required matrix operations (read an element, write an element, etc.). The rest of the program uses these functions to access matrices; alternative matrix representations (e.g., linear storage or a linked-list representation of sparse matrices) can be substituted without changes to other components.

A design decision of equal significance in a parallel implementation is how computation and data are mapped to processors. For example, the various phases of the convolution problem illustrated in Figure 1 can be organized so as to execute (a) on disjoint sets of processors, as a pipeline; (b) in sequence, for each input data set; or (c) as some hybrid of these two approaches (Figure 2). We wish to localize the changes required to explore these alternatives, which do not affect the basic structure of the algorithm but may have different performance characteristics. These changes concern the allocation of computational resources (processors) to modules and the scheduling of components mapped to the same processor.

[Figure 2: Alternative Mappings (a), (b), and (c) for the Parallel Convolution Algorithm.]

We have found that the design issues encountered in the convolution example are common to many parallel programming problems. In summary, the most important issues are as follows:

- Parallel modules need to encapsulate not only computation and data, as in sequential programming, but also communication, concurrency, process mapping, and data distribution.
- Resource allocation and scheduling decisions need to be isolated from module specifications.
- Interfaces between parallel modules need to allow for the sharing of distributed data structures.

We now present the programming language concepts that we use to address these issues.

Virtual Computers. We remove resource allocation decisions from individual modules by specifying process and data mapping within a module relative to a virtual computer. The shape, size, and mapping of this virtual computer to physical processors are specified externally to the module. This mapping need not be one to one: several virtual computer nodes can be located on a single physical processor, or a virtual computer can correspond to a subset of a physical computer. For example, a parallel FFT module might be specified with respect to a one-dimensional array of processors, with size specified in its interface. Only the virtual computer specification in the calling program need be changed to explore the different mappings illustrated in Figure 2.
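The essence of this idea can be sketched as follows. The sketch is in Python purely for illustration (it is not PCN or Fortran M code), and the class and function names are hypothetical: a virtual computer is characterized by its size and by an embedding function mapping its nodes to nodes of a parent computer, and the physical location of any node is obtained by composing embeddings up the hierarchy.

```python
# Illustrative sketch only: a virtual computer as (size, parent, embedding).
class VirtualComputer:
    def __init__(self, size, parent=None, embed=None):
        self.size = size          # number of nodes in this virtual computer
        self.parent = parent      # enclosing virtual or physical computer
        # node -> parent-node mapping; identity if none supplied
        self.embed = embed if embed is not None else (lambda i: i)

    def physical_node(self, i):
        """Resolve node i through the chain of embeddings."""
        assert 0 <= i < self.size
        if self.parent is None:   # no parent: this is the physical machine
            return i
        return self.parent.physical_node(self.embed(i))

# A physical machine with 8 processors.
machine = VirtualComputer(8)
# A 4-node virtual computer embedded on processors 4..7 of the machine.
sub = VirtualComputer(4, parent=machine, embed=lambda i: 4 + i)
# A 2-node computer embedded on nodes 1..2 of that virtual computer;
# the module using it need not know where it ultimately executes.
subsub = VirtualComputer(2, parent=sub, embed=lambda i: 1 + i)
```

In this sketch, a module specifies mappings only in terms of its own (innermost) virtual computer; changing an outer embedding relocates the whole module without modifying it.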

Virtual computers control both resource allocation and locality. The size of a program component's virtual computer determines in part the computational resources available to that component. (The number of other components mapped to the same processors is also a factor.) The location of the virtual computer determines whether interactions with other components require inexpensive intraprocessor or expensive interprocessor communication. For example, compared with strategy (a), strategy (b) in Figure 2 permits different phases of the convolution operation to communicate inexpensively, because all operate on the same processors rather than on disjoint sets of processors. On the other hand, it increases intramodule communication costs, because each module is distributed over many processors.

We define three operations on virtual computers: process mapping, data distribution, and embedding.

Process mapping is the mechanism by which a process is created on, or migrated to, a particular processor. Language constructs are provided to specify that a particular process is to initiate or continue execution on a particular node of a virtual computer. For example, an FFT module might create one process on each node of the virtual computer in which it executes.

Data distribution is the mechanism by which data structures are distributed over processors. Language constructs are provided to define the type and size of a distributed data structure and the mapping of its elements to the nodes of a virtual computer. For example, an FFT module might specify a blocked distribution of its input, output, and work arrays.

Embedding is the mechanism by which virtual computers are defined. An embedding function specifies the size and type of a new virtual computer and how its nodes are mapped to the nodes of an existing physical or virtual computer. Language constructs are provided for specifying embedding functions. In the convolution example, an embedding function would be used to define the subcomputers in which the various modules (FFT, etc.) execute.

Embedding functions allow the programmer to develop resource management and locality specifications in a hierarchical fashion. For example, assume that our convolution program is defined to execute in a two-dimensional virtual computer. This convolution code may itself be embedded in a larger program. When calling the convolution code, it suffices to create a two-dimensional virtual computer in order to specify how the convolution code, and hence its constituent modules, is mapped to physical processors.

Scheduling. It is often necessary for correctness, or useful for efficiency, for components mapped to the same processor to execute in an interleaved fashion. For example, in the pipelined mapping of Figure 2, concurrent execution of different pipeline stages can improve efficiency by allowing overlapping of computation and communication: one can execute while others wait for data.

A programmer can control scheduling explicitly by merging code for different components into a single thread of control for each processor. However, this approach typically requires an intimate and complex intermingling of the different components, which not only compromises modularity but also hinders exploration of alternative scheduling strategies. Unfortunately, scheduling is an aspect of a parallel algorithm that is frequently changed. Execution on a different machine, or minor changes to other components of an algorithm, can change the temporal relationships between components and hence change the optimal schedule.

We address this problem by permitting the programmer to specify components as separate processes that execute in a data-driven manner. The execution schedule is then determined by the availability of data and by the scheduling algorithm used to select executable processes. This approach is, of course, well known in actors, dataflow, and parallel and concurrent logic programming.
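Data-driven execution of components mapped to the same processor can be sketched as follows (in Python, for illustration only; stage names are hypothetical). Two pipeline stages run as separate threads, and each executes exactly when its input becomes available; no explicit schedule is written by the programmer.

```python
# Illustrative sketch of data-driven scheduling: two stages on one
# "processor" interleave automatically, driven by data availability.
import queue
import threading

a_to_b = queue.Queue()   # channel connecting the two stages
results = []

def stage_a():
    for x in range(5):
        a_to_b.put(x * x)    # produce data for stage B
    a_to_b.put(None)         # end-of-stream marker

def stage_b():
    while True:
        x = a_to_b.get()     # blocks until stage A provides data
        if x is None:
            break
        results.append(x + 1)

threads = [threading.Thread(target=stage_a), threading.Thread(target=stage_b)]
for t in threads:
    t.start()
for t in threads:
    t.join()
# results is the same regardless of how the threads happen to interleave
```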

Interfaces. Concurrently executing program components must be able to share data. As noted above, a mechanism is required that supports encapsulation, that is independent of resource allocation and scheduling decisions, and that is efficient on parallel computers. We achieve this as follows.

Logical Resources. In order that modules do not interfere in unexpected ways, interactions between program components are restricted to language objects passed via interfaces. Hence, concurrently executing processes cannot operate on shared global variables or interact via physical resources such as processor or memory addresses.

Schedule Independence. In order that scheduling decisions can be changed without modifying other program components, we restrict operations on shared data to those for which execution order does not affect the result of a computation. For example, two processes may share a channel on which one performs nonblocking sends while the other performs blocking receives. Or several processes may share a single-assignment variable that one writes and several read. In both cases, the readers block until data is provided by the writer; hence, the schedule does not affect the result of a computation.
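The schedule-independence property of a single-assignment variable can be sketched as follows (a minimal Python illustration, not PCN's implementation): reads suspend until the single permitted write occurs, and a second write is an error, so every reader observes the same value no matter when it runs.

```python
# Illustrative sketch of a single-assignment variable: write-once,
# with reads that suspend until the value is available.
import threading

class SingleAssignment:
    def __init__(self):
        self._written = threading.Event()
        self._value = None

    def write(self, value):
        if self._written.is_set():
            raise RuntimeError("single-assignment variable written twice")
        self._value = value
        self._written.set()      # wake any suspended readers

    def read(self):
        self._written.wait()     # suspend until the writer supplies a value
        return self._value

x = SingleAssignment()
seen = []
readers = [threading.Thread(target=lambda: seen.append(x.read()))
           for _ in range(3)]
for t in readers:
    t.start()                    # all three readers suspend here
x.write(42)                      # the single write releases them
for t in readers:
    t.join()
```

Whatever order the three readers are scheduled in, each observes 42; this is the sense in which the execution schedule cannot affect the result.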

Global Address Space. In order that mapping decisions can be changed without modifying other program components, we make no distinction at the language level between local and remote references to objects. Hence, a process can operate on a resource regardless of its location. This requires some form of global address space, as provided, for example, by virtual channels and single-assignment variables.

Concurrency. Concurrent module interfaces are achieved by permitting shared data structures, for example arrays of virtual channels or single-assignment variables, to be distributed over processors and accessed concurrently by program components.

The first three of these requirements are satisfied by compositional languages such as PCN and Fortran M. The integration of data distribution constructs satisfies the fourth requirement.

Language Constructs for Modularity

The programming language concepts introduced in the preceding section have been incorporated in the PCN and Fortran M programming languages. In each case, it has proved possible to introduce the concepts in a manner that is consistent with other language concepts.

Program Composition Notation

Program Composition Notation (PCN) is a high-level concurrent programming language with a C-like syntax that combines features of concurrent logic languages and imperative languages. Programs are constructed by using three composition operators (parallel, sequential, and choice) to compose procedure calls, other compositions, or primitive operations. Concurrently executing processes interact by reading and writing single-assignment variables. Such variables are initially undefined and can be written at most once; an attempt to read an undefined variable suspends until the variable is written.

In both PCN and Fortran M, we specify that the virtual computer in which a process executes is by default the same as that process's parent, unless specified otherwise when the process is created. Information about this virtual computer can be obtained by inquiry functions: in PCN, these are topology and nodes, which return a representation of its type (e.g., a term such as mesh(M,N)) and its size, respectively. Mapping, data distribution, and embedding operations within the process are performed relative to this implicit virtual computer. An alternative approach, which also has its merits, is to represent the virtual computer as a data structure that can be manipulated in the language. This approach is adopted in CC++.

Mapping in PCN is specified by using the infix operator @ to apply a mapping function to a process or block. A mapping function returns a node number within the current virtual computer. For example, in the following, the @ operator is used to locate a call to process myproc on node (i,j) of an M x N mesh. The first code fragment is a parallel block that invokes procedure myproc on node mesh_node(i,j) of the current virtual computer. The second code fragment is a definition for the mesh_node function. This uses a PCN choice composition operator (a form of guarded command) to specify actions to be performed if the current topology is a mesh and the mesh location is valid (action: return the appropriate index within the mesh) or otherwise (action: return an error value). The infix operator ?= is a term-matching primitive that tests and decomposes the term returned by topology. All variables in PCN are local to the procedure in which they occur; single-assignment variables are not explicitly declared.

    {|| myproc() @ mesh_node(i,j) }

    function mesh_node(i,j)
    {? topology() ?= mesh(M,N), i < M, j < N ->
         return(M*i + j),
       default ->
         return(-1)
    }

Data distribution is supported in the form of blocked distributions of arrays of single-assignment variables. In a blocked distribution, each processor is allocated a contiguous block of array elements. Elements of a distributed array are accessed in the same manner as ordinary array elements. More complex distributions and data structures, such as those supported in data-parallel programming languages, can be integrated in the same manner but are not supported in the current PCN compiler.
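The index arithmetic behind a blocked distribution can be sketched as follows (an illustrative Python rendering; the actual PCN runtime details differ): with n elements on p nodes, each node owns a contiguous block of ceil(n/p) elements, so the owner and local offset of an element follow from integer division.

```python
# Sketch of blocked-distribution index arithmetic (illustration only).
def block_owner(i, n, p):
    """Node that owns element i of an n-element array on p nodes."""
    block = -(-n // p)      # ceil(n / p): elements per node
    return i // block

def local_index(i, n, p):
    """Offset of element i within its owner's contiguous block."""
    block = -(-n // p)
    return i % block

# 10 elements over 4 nodes: blocks of 3 elements; the last node holds 1.
owners = [block_owner(i, 10, 4) for i in range(10)]
```

Because owner and offset are computed locally in constant time, a process can locate any element of a distributed array without consulting a central directory.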

A keyword port is used to declare these distributed arrays. For example, the following procedure declares a distributed array P of single-assignment variables, with one element on each node of the current virtual computer. The syntax i over 0..nodes()-1 is a parallel enumerator, used here to create nodes() instances of the ring_node process. The location function node(i) is assumed to return its argument; it is used to place each process on a separate node in the current virtual computer. Elements of P are used to connect these processes in a unidirectional ring; as P is distributed, the newly created processes can access elements efficiently, without the need to communicate with a central location.

    ring()
    port P[nodes()];
    {|| i over 0..nodes()-1 :
         ring_node(P[i], P[(i+1) mod nodes()]) @ node(i)
    }
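The behavior of such a ring can be sketched as follows (in Python threads and queues, purely for illustration; ring_node here is a hypothetical stand-in for the process above): node i consumes from its own channel and produces on the channel of node (i+1) mod n, so a token injected at node 0 circulates once around the ring.

```python
# Illustrative sketch of a unidirectional ring of communicating processes.
import queue
import threading

n = 4
chans = [queue.Queue() for _ in range(n)]   # one channel per ring node
visits = []

def ring_node(i):
    token = chans[i].get()                  # wait for the token to arrive
    visits.append((i, token))
    if token < n:                           # pass it on until it has gone round
        chans[(i + 1) % n].put(token + 1)

threads = [threading.Thread(target=ring_node, args=(i,)) for i in range(n)]
for t in threads:
    t.start()
chans[0].put(1)                             # inject the token at node 0
for t in threads:
    t.join()
```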

Embedding is supported by the infix operator in, used to annotate a process call or block with an embedding function that returns a representation of the new virtual computer. For example, assume an embedding function subarray(start,size), which embeds a 1-D computer of specified size at location start in a parent computer of the same type but larger size. The following code fragment uses this function to invoke two of the FFT processes required for the convolution problem, each on a separate set of n processors.

    call fft(...) in subarray(0,n)
    call fft(...) in subarray(n,n)
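The subarray embedding amounts to a simple index translation, which can be sketched as follows (a hypothetical Python rendering, not the PCN definition): subarray(start,size) maps node i of a new size-node 1-D virtual computer to node start+i of the parent.

```python
# Illustrative sketch of subarray(start, size) as index translation.
def subarray(start, size):
    """Embedding: node i of a size-node 1-D computer -> parent node start+i."""
    def embed(i):
        assert 0 <= i < size
        return start + i
    return embed

first_fft = subarray(0, 4)    # nodes 0..3 of the parent computer
second_fft = subarray(4, 4)   # nodes 4..7: a disjoint set of processors
```

With these two embeddings, the two FFT invocations occupy disjoint sets of parent nodes, as required for the pipelined mapping.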

The PCN extensions that support virtual computers, data distribution, and embedding have been incorporated into a PCN compiler. A programmable source-to-source transformation system is used to translate the extended language into PCN, as well as calls to a runtime library that manages virtual computers and handles requests for distributed data. This compiler has supported extensive experimentation with the language extensions, as described in the Experiences section below. The compiler is designed to optimize process mapping and access to distributed data, both of which take constant time, irrespective of computer size. Execution of an embedding function takes time proportional to the square of the size of the new virtual computer; however, this cost is distributed over the physical processors on which the virtual computer is to be created and has not proved excessive on large parallel computers.

Fortran M

Fortran M is a small set of extensions to Fortran designed to support the modular construction of scientific and engineering applications. A primary goal in designing Fortran M was to provide a minimal set of extensions that were consistent with Fortran concepts; the temptation to "fix" Fortran was avoided. We describe Fortran M for two reasons. First, it shows how our concepts can be incorporated into a programming language with characteristics very different from those of PCN. Second, it allows us to provide a more comprehensive illustration of how distributed arrays are integrated with virtual computers.

Fortran M programs use parbegin/parend constructs to create processes that interact by sending and receiving messages on typed ports that reference single-writer, single-reader channels. Channel operations are modeled on Fortran file I/O constructs. Mapping is specified with respect to virtual computers; in keeping with Fortran concepts, virtual computers are N-dimensional arrays of virtual processors. As in High Performance Fortran (HPF), a PROCESSORS declaration is used to define the size and shape of the processor array that is to apply in a particular process, and the inquiry functions NumberOfProcessors and ProcessorShape permit a process to determine the size and shape of the computer in which it executes.

Mapping is specified by using the LOCATION annotation. An annotation of the form LOCATION(I1,...,In) appended to a process call indicates that the call is to execute on node (I1,...,In) of the current virtual computer. An executable RELOCATE statement is also defined, to request process migration; however, this is not currently supported in the Fortran M compiler, because of the portability problems associated with process migration.

The following code fragment implements the ring structure for which PCN code was presented in the preceding section. The code declares arrays of inports and outports, creates channels, and then invokes the ringnode processes, passing the ports as arguments. Notice the PROCESSORS declaration and LOCATION annotation.

    process ring(p)
    PROCESSORS(p)
    inport (integer) pi(p)
    outport (integer) po(p)
    do i = 1, p
      channel(in=pi(i), out=po(mod(i,p)+1))
    enddo
    processdo i = 1, p
      processcall ringnode(pi(i), po(i)) LOCATION(i)
    endprocessdo
    end

Data distribution is specified by using data distribution statements based on those in HPF, with the difference that these statements are interpreted with respect to the current virtual computer rather than an entire machine. For example, the following statements can be added to the previous program to specify that arrays pi and po are to be distributed:

    DISTRIBUTE pi(BLOCK)
    ALIGN po WITH pi

Embedding is specified with the SUBMACHINE annotation. When appended to a process call, this indicates that the call is to execute in a submachine of specified size and shape. Fortran array constructors can be used to define the new virtual machine. For example, the following code fragment invokes parallel FFT processes in the ith row and the ith column of an N x N processor array, respectively. Processes and distributed arrays created within each FFT process are distributed only over the processors on which that process executes.

    processes
      processcall fft(...) SUBMACHINE(i, 1:N)
      processcall fft(...) SUBMACHINE(1:N, i)
    endprocesses

A Fortran M compiler has been constructed that translates Fortran M programs into Fortran plus calls to a communication and process management library. This compiler is operational on both networks of workstations and parallel computers and has been used in a variety of programming experiments, as described in the Experiences section below.

Experiences

The language constructs described in this paper have been applied to a wide range of scientific and engineering problems, in areas as diverse as atmospheric modeling, computational biology, computational chemistry, and image processing. Here we summarize some of these investigations, which provide insights into the strengths and weaknesses of the approach.

Parallel Solvers

We first describe a study of parallel algorithms for numerical methods used to solve partial differential equations in spherical geometry, in which PCN was used to prototype algorithm variants. Three different numerical methods were considered: a spectral transform method on a regular latitude-longitude grid, a finite difference method on overlapping stereoscopic grids, and a control volume method on an icosahedral-hexagonal grid. In each case, the resulting parallel algorithms are complex, and there was a need to explore algorithmic variants. We use the simplest of the three examples to illustrate the approach. The spectral transform algorithm comprises three components. It operates on a latitude-longitude grid, with the bulk of the computation being performed independently at each grid point; communication is encapsulated in an FFT performed in each latitude and a summation performed in each longitude. The PCN implementation is structured as follows. The routine spectral is executed in a two-dimensional mesh virtual computer created by a call to spectral_mesh. An FFT module is invoked in each row of this computer and a summation module in each column, by using embedding functions row and col, respectively. A modified version of the sequential code (compute) is executed on each processor. The missing arguments represent the port variables used for communication between the different modules.

    main(lat,lon)
    {|| spectral(lat,lon) in spectral_mesh(lat,lon) }

    spectral(lat,lon)
    {|| i over 0..lat-1 : fft(...) in row(i),
        i over 0..lon-1 : sum(...) in col(i),
        i over 0..lat*lon-1 : compute(...) @ node(i)
    }

Our approach proved to have two major benefits in this case. First, it proved possible to reuse FFT and summation modules. Both were available as PCN programs and could be called directly, without reengineering, to execute in different contexts. In effect, no new parallel code had to be written beyond that shown above; only the sequential Fortran code required modification to execute in the parallel harness. Second, experimentation with alternative mapping strategies was facilitated. For example, on a two-dimensional mesh computer such as the Intel Delta, it is possible to map each row of the spectral mesh to either a row or a submesh of the computer. Both have potential performance advantages: the former results in simpler communication patterns, while the latter makes more efficient use of available wires. We were able to experiment with both approaches. These benefits were also observed when implementing the finite difference and control volume algorithms.

The study also revealed deficiencies in the PCN approach and tools. The use of two languages (PCN for the parallel harness and Fortran for sequential computation) proved off-putting to the applied mathematicians who assisted in the project. Also, although the PCN scheduler schedules computational tasks and remote reads correctly, a lack of preemption meant that remote read operations were sometimes delayed for extended periods, awaiting completion of computationally intensive operations. As a short-term solution, we decomposed the compute process into many lightweight processes. This is not as expensive as it might sound, because of the low cost of switching between the lightweight threads used in PCN. For example, the control volume code achieves good parallel efficiency relative to a sequential Fortran code on Intel Delta processors (depending on problem size), even when decomposed in this way. In the long term, some form of preemption is required.

Earth System Models

In the second application that we consider, our constructs are applied in a multidisciplinary framework. An earth system model integrates models of the atmosphere, ocean, biosphere, etc. The developer of a parallel earth system model encounters challenging encapsulation issues, as component models may themselves be parallel programs. The ability to explore alternative mapping strategies is also important, since it can sometimes be desirable to execute the component models on different processors, or even on different computers; in other situations, it may be preferable to multiprocess models on the same processors.

In order to explore the application of our approach to this class of problems, we have developed a Fortran M framework code that couples simple atmosphere and ocean models. Arrays of channels are used to transfer data between the two parallel models, and virtual computers are used to control process placement. This framework code has been used to explore performance issues. For example, the simple form of compositional data structure used in Fortran M, the channel, requires that data be copied when it is transferred from one module to another, even if these two modules execute on the same processor. An early concern was that high copying costs would necessitate that the portability obtained by a Fortran M implementation be sacrificed in order to permit direct sharing of data. However, experiments showed that the cost of a modular decomposition based on Fortran M concepts was small as a fraction of total execution time on a Sun workstation. Much of this overhead was due to the relatively high cost of switching between Fortran M processes (currently implemented as Unix processes) and the use of TCP/IP sockets for interprocess communication. Experiments with a prototype Fortran M compiler that uses Sun lightweight processes (threads) and shared-memory operations show that channel overhead can be reduced significantly. For example, Unix process switch cost, defined here as the time required to send a null message between two processes on the same processor, is on the order of milliseconds; with threads, this is reduced substantially, and greater reductions appear possible if specialized thread packages are used.

Education

PCN and Fortran M have been used to teach parallel programming to graduates and undergraduates in both computing and the sciences. Students were provided with libraries of modules implementing global operations, mesh structures, load balancing, and the like, which they then used in programming projects. This approach proved effective at two levels. Students were able to use some modules unchanged, without a deep understanding of their implementation. Other modules served as frameworks that they modified to suit a particular purpose. The sophistication of the programs developed by students was considerably higher when module libraries were used. Not surprisingly, the computing students showed a strong preference for PCN, while the science students preferred Fortran M.

Related Work

Several parallel languages and programming environments have been developed to support the modular construction of parallel programs. Borkar et al. propose that parallel programs be constructed by plugging together cells, in a manner analogous to VLSI design. They use this technique to generate efficient programs for the iWarp systolic processor; occam has been used for similar purposes. The target hardware limits the programs that can be specified in these systems: in the iWarp work, the contiguous submesh is the only virtual computer supported, and the number of channels is limited. Griswold et al. propose process ensembles as a means of organizing data, computation, and communication. However, they do not consider hierarchies of virtual computers. Browne et al. propose a compositional calculus for specifying interconnections between software chips. The integration of this calculus into a programming notation is not discussed, and the notion of virtual computer is absent.

Some recent work on parallel message-passing libraries has explored the use of process groups and communication contexts to support the encapsulation of communication operations in parallel libraries. Chien and Dally's Concurrent Aggregates (CA) language allows the definition of homogeneous collections of objects called aggregates; messages addressed to an aggregate are routed to one of its members. As in this paper, concurrent structures can be defined and composed with other structures to build concurrent programs. However, resource allocation and locality are not addressed.
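The routing behavior described above can be illustrated with a minimal sketch, assuming a round-robin choice of member (CA itself leaves the choice of member to the implementation, and the class and method names here are hypothetical, not CA syntax):

```python
import itertools

class Aggregate:
    """A homogeneous collection of objects; a message sent to the
    aggregate is delivered to exactly one of its members. Illustrative
    sketch of the idea described above, not actual CA."""

    def __init__(self, members):
        self.members = list(members)
        self._next = itertools.cycle(range(len(self.members)))

    def send(self, message):
        # Route the message to one member; round-robin is one possible
        # policy, chosen here only for determinism.
        member = self.members[next(self._next)]
        return member(message)

# Three interchangeable members, each tagged with its index.
counters = Aggregate([lambda m, i=i: (i, m) for i in range(3)])
replies = [counters.send("ping") for _ in range(4)]
```

A sender addresses `counters` as a single entity; which member replies is an internal decision of the aggregate, which is the encapsulation the paragraph above describes.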

Virtual computers have been used to achieve portability by hiding information concerning the size and topology of a physical computer. Martin, Hudak, and Taylor have investigated notations for specifying process mapping on a potentially infinite processing surface. In data-parallel languages, data distribution is specified with respect to a virtual computer, as proposed here; however, hierarchies cannot be defined. While these systems succeed in decoupling mapping or data distribution from other aspects of a parallel algorithm, they do not permit resource allocation decisions to be isolated from module definitions. For example, it is not straightforward to invoke the same module on processor subsets that may be overlapping or of different sizes.
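The capability whose absence is criticized above can be sketched as follows, assuming a hierarchy of virtual computers represented as (possibly overlapping) lists of node identifiers. All names are illustrative, not PCN or Fortran M syntax.

```python
class VirtualComputer:
    """A virtual computer is a list of node identifiers; subsets form a
    hierarchy, and children may overlap or differ in size."""

    def __init__(self, nodes):
        self.nodes = list(nodes)

    def __len__(self):
        return len(self.nodes)

    def subset(self, indices):
        # A child virtual computer defined relative to this parent.
        return VirtualComputer(self.nodes[i] for i in indices)

def ring_module(vc, work):
    """A module written against a virtual computer, independent of the
    computer's size or of how its nodes map to physical processors."""
    return {node: work(i, len(vc)) for i, node in enumerate(vc.nodes)}

# The same module runs unchanged on overlapping subsets of different sizes.
root = VirtualComputer(range(8))
left = root.subset(range(0, 5))    # nodes 0..4
right = root.subset(range(3, 8))   # nodes 3..7, overlapping left
a = ring_module(left, lambda i, n: i / n)
b = ring_module(right, lambda i, n: i / n)
```

Because `ring_module` sees only its virtual computer, the decision to run it on `left`, `right`, or both is made entirely outside the module definition.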

Some of the ideas developed in this paper have also been applied to CC++, extensions to C++ designed to support compositional parallel programming. In CC++, virtual computers are represented explicitly as arrays of pointers to processor objects. Mapping is achieved by invoking a function in a processor object. Data distribution and embedding are not supported directly but can be specified as CC++ functions. CC++ mapping statements have semantic content, as functions can access data contained in a processor object; hence mapping is not independent of other aspects of a design.
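The point about semantic content can be illustrated with a sketch (in Python rather than CC++, with hypothetical names): because a mapping function may consult state held in a processor object, the mapping decision becomes entangled with other program logic.

```python
class ProcessorObject:
    """Illustrative stand-in for a CC++ processor object: it holds
    local state, and mapping is performed by invoking a function on a
    chosen processor object (names are assumptions, not CC++ syntax)."""

    def __init__(self, pid):
        self.pid = pid
        self.local_load = 0  # state that a mapping decision may consult

    def run(self, task):
        self.local_load += 1
        return task(self.pid)

# A "virtual computer" as an array of references to processor objects.
vc = [ProcessorObject(p) for p in range(4)]

def map_task(vc, task):
    # The mapping decision reads processor-object state (local_load),
    # so mapping is not independent of the rest of the design.
    target = min(vc, key=lambda p: p.local_load)
    return target.run(task)

results = [map_task(vc, lambda pid: pid) for _ in range(8)]
```

By contrast, the constructs advocated in this paper keep such decisions outside module definitions, so that a module never observes where it runs.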

Conclusions

Research in sequential programming has demonstrated the value of modular design. In this paper we have described programming language concepts that facilitate the application of modular design principles in parallel programming. We have also shown how these concepts can be realized in practical parallel programming languages, and we have reported experiences using these languages to solve large-scale programming problems. These experiences show that the familiar benefits of modular programming can be achieved in a parallel context. Modules implementing useful parallel algorithms can be reused. Existing applications can be integrated into larger systems without changes to their internal structure or implementation. Major structural changes concerning the mapping of computation and data to processors do not require changes to component modules.

A significant aspect of this work is that it is now possible to build truly modular, and hence reusable, parallel libraries. In a manner analogous to, but more flexible than, VLSI cell libraries, we can define libraries of parallel program components with specified internal logic and external interface, but encapsulating no resource allocation or scheduling decisions. These components can be used unchanged in a wide variety of contexts, or modified by the user to solve related problems. We are currently constructing libraries of this sort in several areas.

We have assumed in this paper that modules are implemented with the same technology, for example PCN or Fortran M. However, modules can also be constructed using different techniques (e.g., message-passing libraries or data-parallel languages), as long as implementations do not encapsulate mapping or scheduling decisions.

Acknowledgments

This work was supported by the Office of Scientific Computing, U.S. Department of Energy, under Contract W-31-109-Eng-38. I am grateful to Robert Olson and Steven Tuecke for their work on the PCN and Fortran M compilers, and to Mani Chandy and Carl Kesselman for numerous discussions.

References

G. Agha, Actors, MIT Press, 1986.

V. Bala and S. Kipnis, "Process groups: A mechanism for the coordination of and communication among processes in the Venus collective communication library," Preprint, IBM T. J. Watson Research Center.

G. Booch, Object-Oriented Design, Benjamin/Cummings.

S. Borkar et al., "iWarp: An integrated solution to high-speed parallel computing," in Proc. Supercomputing Conf.

J. Browne, J. Werth, and T. Lee, "Intersection of parallel structuring and reuse of software components," in Proc. Intl. Conf. on Parallel Processing, Penn State Press.

D. C. Cann, J. T. Feo, and T. M. DeBoni, "Sisal: High performance applicative computing," in Proc. Symp. Parallel and Distributed Processing, IEEE Press.

K. M. Chandy and I. Foster, "A deterministic notation for cooperating processes," Preprint, Mathematics and Computer Science Division, Argonne National Laboratory, Argonne, Ill.

K. M. Chandy and C. Kesselman, "Compositional parallel programming in CC++," Technical Report, Caltech.

K. M. Chandy and S. Taylor, An Introduction to Parallel Programming, Jones and Bartlett, 1992.

I. Chern and I. Foster, "Design and parallel implementation of two methods for solving PDEs on the sphere," in Parallel Computational Fluid Dynamics, Elsevier Science Publishers B.V.

A. Choudhary and Ponnusamy, "Parallel implementation and evaluation of a motion estimation system algorithm using several data decomposition strategies," J. Parallel and Distributed Computing.

A. Chien and W. Dally, "Concurrent Aggregates," in Proc. ACM Symp. on Principles and Practice of Parallel Programming.

J. Dongarra, R. van de Geijn, and D. Walker, "A look at scalable dense linear algebra libraries," in Proc. Scalable High Performance Computing Conf., Williamsburg, Virginia.

I. Foster and K. M. Chandy, "Fortran M: A language for modular parallel programming," Preprint, Argonne National Laboratory, Argonne, Ill.

I. Foster, C. Kesselman, and S. Taylor, "Concurrency: Simple concepts and powerful tools," The Computer Journal.

I. Foster, R. Olson, and S. Tuecke, "Productive parallel programming: The PCN approach," Scientific Programming.

I. Foster and S. Taylor, "A compiler approach to scalable concurrent program design," ACM Trans. Prog. Lang. Syst., to appear.

G. Fox, S. Hiranandani, K. Kennedy, C. Koelbel, U. Kremer, C. Tseng, and M. Wu, "Fortran D language specification," Technical Report, Dept. of Computer Science, Rice Univ., Houston, Texas.

W. Griswold, G. Harrison, D. Notkin, and L. Snyder, "Port ensembles: A communication abstraction for non-shared memory parallel programming," in Proc. Intl. Conf. on Parallel Processing, Penn State Press.

High Performance Fortran Forum, High Performance Fortran Language Specification, Technical Report, Rice University, November.

P. Henderson, Functional Programming, Prentice-Hall, Englewood Cliffs, N.J.

C. Hoare, "Communicating sequential processes," Commun. ACM, vol. 21, no. 8, pp. 666-677, 1978.

P. Hudak, "Para-functional programming," IEEE Computer, 1986.

Inmos Limited, occam Reference Manual, Prentice Hall.

M. Lemke and D. Quinlan, "A parallel C++ array class library for architecture-independent development of structured grid applications," ACM SIGPLAN Notices.

A. Martin, "The torus: An exercise in constructing a processing surface," in Proc. Conf. on VLSI, Caltech.

D. Parnas, "On the criteria to be used in decomposing systems into modules," Commun. ACM, vol. 15, no. 12, pp. 1053-1058, 1972.

A. Skjellum and A. Leung, "Zipcode: A portable multicomputer communication library atop the Reactive Kernel," in Proc. Distributed Memory Concurrent Computing Conf., IEEE Press.

S. Taylor, Parallel Logic Programming Techniques, Prentice Hall, 1989.

Thinking Machines Corporation, CM Fortran Reference Manual, Thinking Machines, Cambridge, Mass.

N. Wirth, "Program development by stepwise refinement," Commun. ACM, vol. 14, pp. 221-227, 1971.