On the Coexistence of Shared-Memory and Message-Passing in the Programming of Parallel Applications

J. Cordsen¹ and W. Schröder-Preikschat²

¹ GMD FIRST, Rudower Chaussee, Berlin, Germany
² University of Potsdam, Am Neuen Palais, Potsdam, Germany

Abstract. Interoperability in non-sequential applications requires communication to exchange information using either the shared-memory or the message-passing paradigm. In the past, the communication paradigm in use was determined by the architecture of the underlying computing platform: shared-memory computing systems were programmed to use shared-memory communication, whereas distributed-memory architectures ran applications communicating via message-passing. Current trends in the architecture of parallel machines are based on both shared memory and distributed memory. For scalable parallel applications, both communication paradigms have to coexist in order to maintain transparency and efficiency. Users should not be obliged to know when to use which of the two paradigms; on the other hand, the user should be able to exploit either paradigm directly in order to achieve the best possible solution.

The paper presents the VOTE communication support system. VOTE provides coexistent implementations of shared-memory and message-passing communication. Applications can change the communication paradigm dynamically at runtime and are thus able to employ the underlying computing system in the most convenient and application-oriented way. The presented case study and detailed performance analysis underpin the applicability of the VOTE concepts for high-performance parallel computing.

1 Introduction

Future parallel machines will exhibit a hybrid architecture based on shared memory and distributed memory. This line is already followed by all state-of-the-art special-purpose parallel computers. The same holds for parallel machines based on workstation clusters: actual high-performance workstations are shared-memory multiprocessor systems. Besides the
difficulty of developing scalable and efficient parallel programs at all, the problem of an application programmer is to exploit the various computer architectures in the most efficient way. Due to false sharing, relying on the shared-memory paradigm as introduced by a virtual shared-memory (VSM) system results in a decrease of end-user performance when running in a distributed environment. Vice versa, relying on a distributed-memory paradigm, i.e. message-passing, results in a decrease of end-user performance when running in a shared-memory environment. Thus, in the attempt to develop scalable and portable parallel application software, programmers are confronted with a trade-off.

The purpose of the VOTE system is to support application programmers confronted with this trade-off between shared-memory and message-passing communication. Shared-memory communication is physically limited to operate locally only. Therefore, to implement a global shared-memory communication facility, VOTE supports a highly efficient core system. The core system implements a global address space by means of VSM. By default, a multiple-reader/single-writer, invalidation-based sequential consistency maintenance is provided. A flexible set of functions is available, allowing many powerful optimizations of programs running under control of sequential consistency. As a step beyond, VOTE supports message-passing communication (MPI standard) through a functional enrichment of its VSM system. Thus VOTE provides the application programmer with architectural transparency by means of sequential consistency and a message-passing communication system.

The rest of this paper is organized as follows. Section 2 deals with the principal problem of supporting both communication paradigms. Afterwards, Section 3 covers an application case study by demonstrating an implementation of successive overrelaxation that dynamically changes between the two communication paradigms. Section 4 discusses the achieved runtime results. Related work is presented in Section 5,
and Section 6 concludes the paper.

2 Communication Paradigms

Today's trend in the design of scalable high-performance computing systems is based on a cooperation between software and hardware to support both an efficient shared-memory and an efficient message-passing abstraction. The main problem of this cooperation is the incompatibility of the two views of consistency semantics given by the two communication paradigms.

Message-passing is an explicit communication style lacking any measures of implicit consistency maintenance. Global effects, in the sense of making changes to some logically shared but physically distributed state, are achieved only when the respective parallel application program explicitly calls communication functions. In contrast, changing portions of the logically shared but physically local state works implicitly, by simply reading/writing the memory cells, i.e. the program variables.

The shared-memory communication paradigm is a single-reader/single-writer (SRSW) scheme, occasionally extended to a multiple-reader/single-writer scheme by cache-based shared-memory machines. Every modifying memory access works implicitly. However, since data replication usually takes place for performance reasons, this paradigm requires either invalidating or updating the data copies held in the caches or the memories. These measures of consistency maintenance implicitly influence the operation of all processes and generally impact the runtime behavior of a shared-memory parallel program.

A paradigm shift from shared-memory to message-passing communication is almost straightforward. It makes public (i.e. globally shared) data private (i.e. local and thus non-shared). The process shifting to the message-passing paradigm is deleted from the directory information maintained by the VSM system to keep track of potential data copy holders. Due to the paradigm shift, programming complexity increases significantly: running under control of the shared-memory paradigm, consistency was ensured by the VSM system. In the
message-passing case, the programmer becomes responsible for maintaining a globally consistent view of the common data sets.

A paradigm shift from message-passing to shared-memory is not that straightforward. The main problem is the unification of the local data copies into a single data image. In other words, private data copies are made public or invalidated, and the directory information is updated. In most cases this task cannot be performed automatically because of the lack of information about the order of modifications done by the individual processes. In the current implementation of VOTE it is therefore up to the application program to announce the locality of consistent data.

3 Case Study: Successive Overrelaxation

This section presents two implementations of a successive overrelaxation algorithm based on the red-black technique. This kind of application is a typical representative of parallel numerical algorithms operating on dense data structures. The amount of data being shared is little and increases only with the number of computing nodes. Data sharing is performed only by a pair of processes, that is, each process communicates with two other processes, its predecessor and its successor. Merely the first and the last process communicate with only a single process.

The following subsections present two implementations of an SPMD-based (single program, multiple data) successive overrelaxation program. The first code communicates via shared memory and uses a global barrier synchronization. The second code takes full advantage of the communication support system VOTE, that is, it changes the communication paradigm from shared-memory to message-passing and vice versa.

3.1 Shared-Memory Communication

The code example in Fig. 1 uses shared-memory communication and is a very compact implementation of a successive overrelaxation. The red-black technique is used to avoid overwriting the old value of a matrix element before it is used for the computation of the new values of the respective
neighbor. Therefore, computation is done only on every other row per iteration. This approach requires two global barrier synchronizations instead of one.

    void sor(float r[M][N], float b[M][N], int f, int t)
    {
        int j, k;

        while (!converged(r, b, f, t)) {
            for (j = f; j < t; j += 2) {        /* red phase: new values in b */
                for (k = 1; k < N - 1; k += 2)
                    b[j][k] = (r[j-1][k] + r[j+1][k] + r[j][k-1] + r[j][k+1]) / 4;
                if (j + 1 == t)
                    break;
                for (k = 2; k < N - 1; k += 2)
                    b[j+1][k] = (r[j][k] + r[j+2][k] + r[j+1][k-1] + r[j+1][k+1]) / 4;
            }
            vote_barrier();
            for (j = f; j < t; j += 2) {        /* black phase: new values in r */
                for (k = 2; k < N - 1; k += 2)
                    r[j][k] = (b[j-1][k] + b[j+1][k] + b[j][k-1] + b[j][k+1]) / 4;
                if (j + 1 == t)
                    break;
                for (k = 1; k < N - 1; k += 2)
                    r[j+1][k] = (b[j][k] + b[j+2][k] + b[j+1][k-1] + b[j+1][k+1]) / 4;
            }
            vote_barrier();
        }
    }

Fig. 1. Shared-memory version of the successive overrelaxation.

The algorithm continues executing the outer loop until the check of convergence evaluates to true. Variables f and t implement a block distribution of the matrices r and b. These variables are calculated on a per-process basis and are a function of the size of a matrix, the total number of processes, and the logical identification of a process. The computation steps produce data used as input by the processes with adjacent values of the variables f and t. When the barrier synchronization falls, that data is consumed by the communication partner.

3.2 Message-Passing Communication

The second implementation in Fig. 2 comprises three parts. The first part implements the switch from shared-memory to message-passing communication. This requires a barrier synchronization making sure that all processes are ready to leave the consistency maintenance of VOTE. It then computes the working set of a process using the logical identification (MPI_Comm_rank) and the total number of computing processes (MPI_Comm_size). Calling access is to ensure
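The per-process block distribution described above (the variables f and t derived from the matrix size, the number of processes, and the process identification) can be sketched as follows. This is a hypothetical sketch, not the paper's actual VOTE code: the function name `block_bounds` and its parameters are assumptions, and in the message-passing version the `rank` and `size` arguments would be obtained from MPI_Comm_rank and MPI_Comm_size.

```c
#include <assert.h>

/* Hypothetical sketch: compute the half-open row range [*f, *t) owned by
 * one process under a block distribution of `rows` matrix rows over
 * `size` processes. The first rows % size processes receive one extra
 * row so that the distribution stays balanced. */
static void block_bounds(int rank, int size, int rows, int *f, int *t)
{
    int base = rows / size;      /* rows every process gets            */
    int rem  = rows % size;      /* first `rem` processes get one more */

    *f = rank * base + (rank < rem ? rank : rem);
    *t = *f + base + (rank < rem ? 1 : 0);
}
```

With this scheme, only processes with adjacent ranks share a row boundary, matching the predecessor/successor communication pattern of the case study.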