A High-Performance, Portable Implementation of the MPI Message Passing Interface Standard

William Gropp, Ewing Lusk
Mathematics and Computer Science Division, Argonne National Laboratory*

Nathan Doss, Anthony Skjellum
Department of Computer Science and NSF Engineering Research Center for CFS, Mississippi State University†

* This work was supported by the Mathematical, Information, and Computational Sciences Division subprogram of the Office of Computational and Technology Research, U.S. Department of Energy, under Contract W-31-109-Eng-38.
† This work was supported in part by the NSF Engineering Research Center for Computational Field Simulation, Mississippi State University. Additional support came from the NSF Career Program under grant ASC and from ARPA under Order D.

Abstract

MPI (Message Passing Interface) is a specification for a standard library for message passing that was defined by the MPI Forum, a broadly based group of parallel computer vendors, library writers, and applications specialists. Multiple implementations of MPI have been developed. In this paper we describe MPICH, unique among existing implementations in its design goal of combining portability with high performance. We document its portability and performance and describe the architecture by which these features are simultaneously achieved. We also discuss the set of tools that accompany the free distribution of MPICH, which constitute the beginnings of a portable parallel programming environment. A project of this scope inevitably imparts lessons about parallel computing, the specification being followed, the current hardware and software environment for parallel computing, and project management; we describe those we have learned. Finally, we discuss future developments for MPICH, including those necessary to accommodate extensions to the MPI Standard now being contemplated by the MPI Forum.

Introduction

The message-passing model of parallel computation has emerged as an expressive, efficient, and well-understood paradigm for parallel programming. Until recently, the syntax and precise semantics of each message-passing library implementation were different from the others, although many of the general semantics were similar. The proliferation of message-passing library designs from both vendors and users was appropriate for a while, but eventually it was seen that enough consensus on requirements and general semantics for message passing had been reached that an attempt at standardization might usefully be undertaken.

The process of creating a standard to enable portability of message-passing application codes began at a workshop on Message Passing Standardization in April 1992, and the Message Passing Interface (MPI) Forum organized itself at the Supercomputing '92 conference. During the next eighteen months the MPI Forum met regularly, and Version 1.0 of the MPI Standard was completed in May 1994. Some clarifications and refinements were made in the spring of 1995, and Version 1.1 of the MPI Standard is now available. For a detailed presentation of the Standard itself, and for a tutorial approach to MPI, see the references. In this paper we assume that the reader is relatively familiar with the MPI specification, but we provide a brief overview in the Background section.

The project to provide a portable implementation of MPI began at the same time as the MPI definition process itself. The idea was to provide early feedback on decisions being made by the MPI Forum and to provide an early implementation to allow users to experiment with the definitions even as they were being developed. Targets for the implementation were to include all systems capable of supporting the message-passing model. MPICH is a freely available, complete implementation of the MPI specification, designed to be both portable and efficient. The "CH" in MPICH stands for "Chameleon," symbol of adaptability to one's environment and thus of portability. Chameleons are fast, and from the beginning a secondary goal was to give up as little efficiency as possible for the portability.

MPICH is thus both a research project and a software development project. As a research project, its goal is to explore methods for narrowing the gap between the programmer of a parallel computer and the performance deliverable by its hardware. In MPICH we adopt the constraint that the programming interface will be MPI, reject constraints on the architecture of the target machine, and retain high performance (measured in terms of bandwidth and latency for message-passing operations) as a goal. As a software project, MPICH's goal is to promote the adoption of the MPI Standard by providing users with a free, high-performance implementation on a diversity of platforms, while aiding vendors in providing their own customized implementations. The extent to which these goals have been achieved is the main thrust of this paper.

The rest of this paper is organized as follows. The Background section gives a short overview of MPI and briefly describes the precursor systems that influenced MPICH and enabled it to come into existence so quickly. We then document the extent of MPICH's portability and present the results of a number of performance measurements. Next we describe in some detail the software architecture of MPICH, which comprises the results of our research in combining portability and performance, and we present several specific aspects of the implementation that merit more detailed analysis. A further section describes a family of supporting programs that surround the core MPI implementation and turn MPICH into a portable environment for developing parallel applications. We also describe how we, as a small, distributed group, have combined a number of freely available tools in the Unix environment to enable us to develop, distribute, and maintain MPICH with a minimum of resources. In the course of developing MPICH, we have learned a number of lessons from the challenges posed, both accidentally and deliberately, for MPI implementors by the MPI specification; these lessons are discussed as well. Finally,
the last section describes the current status of MPICH as of this writing and outlines our plans for future development.

Background

In this section we give an overview of MPI itself, describe briefly the systems on which the first versions of MPICH were built, and review the history of the development of the project.

Precursor Systems

MPICH came into being quickly because it could build on stable code from existing systems. These systems prefigured, in various ways, the portability, performance, and some of the other features of MPICH. Although most of that original code has been extensively reworked, MPICH still owes some of its design to those earlier systems, which we briefly describe here.

P4 is a third-generation parallel programming library, including both message-passing and shared-memory components, portable to a great many parallel computing environments, including heterogeneous networks. Although p4 contributed much of the code for TCP/IP networks and shared-memory multiprocessors for the early versions of MPICH, most of that has been rewritten. P4 remains one of the devices on which MPICH can be built, but in most cases more customized alternatives are available.

Chameleon is a high-performance portability package for message passing on parallel supercomputers. It is implemented as a thin layer (mostly C macros) over vendor message-passing systems (Intel's NX, TMC's CMMD, IBM's MPL) for performance, and over publicly available systems (p4 and PVM) for portability. A substantial amount of Chameleon technology is incorporated into MPICH, as detailed below.

Zipcode is a portable system for writing scalable libraries. It contributed several concepts to the design of the MPI Standard, in particular contexts, groups, and "mailers" (the equivalent of MPI communicators). Zipcode also contains extensive collective operations with group scope, as well as virtual topologies, and this code was heavily borrowed from in the first version of MPICH.

Brief Overview of MPI

MPI is a message-passing
application programmer interface, together with protocol and semantic specifications for how its features must behave in any implementation (such as message buffering and a message-delivery progress requirement). MPI includes point-to-point message passing and collective (global) operations, all scoped to a user-specified group of processes. Furthermore, MPI provides abstractions for processes at two levels. First, processes are named according to their rank in the group in which the communication is being performed. Second, virtual topologies allow for graph or Cartesian naming of processes that help relate the application semantics to the message-passing semantics in a convenient, efficient way. Communicators, which house groups and communication context (scoping) information, provide an important measure of safety that is necessary and useful for building up library-oriented parallel code.

MPI also provides three additional classes of services: environmental inquiry, basic timing information for application performance measurement, and a profiling interface for external performance monitoring. MPI makes heterogeneous data conversion a transparent part of its services by requiring datatype specification for all communication operations. Both built-in and user-defined datatypes are provided. MPI accomplishes its functionality with opaque objects, with well-defined constructors and destructors, giving MPI an object-based look and feel. Opaque objects include groups (the fundamental container for processes) and communicators (which contain groups and are used as arguments to communication calls).