A Manual for the CHAOS Runtime Library

Joel Saltz    Ravi Ponnusamy    Shamik Sharma    Bongki Moon
Yuan-Shin Hwang    Mustafa Uysal    Raja Das

UMIACS and Dept. of Computer Science
University of Maryland
College Park, MD

dybbuk@cs.umd.edu

Abstract

Procedures are presented that are designed to help users efficiently program irregular
problems (e.g. unstructured mesh sweeps, sparse matrix codes, adaptive mesh partial
differential equation solvers) on distributed memory machines. These procedures are also
designed for use by compilers for distributed memory multiprocessors. The portable CHAOS
procedures are designed to support dynamic data distributions and to automatically
generate send and receive messages by capturing communication patterns at runtime.

The CHAOS project was sponsored in part by ARPA (NAG), NSF (ASC) and ONR (SC).

Fact Sheet

Principal Investigator
    Joel Saltz

The CHAOS Team
    Raja Das          Yuan-Shin Hwang
    Bongki Moon       Ravi Ponnusamy
    Shamik Sharma     Mustafa Uysal

    Computer Science Department
    University of Maryland
    College Park, MD

    {saltz,raja,shin,bkmoon,pravi,shamik,uysal}@cs.umd.edu

Research Collaborators
    Rice University
        Reinhard von Hanxleden    Seema Hiranandani
        Charles Koelbel           Ken Kennedy
    Syracuse University
        Alok Choudhary            Geoffrey Fox
        Sanjay Ranka

To Obtain CHAOS Software
    Anonymous ftp: hyena.cs.umd.edu
    email: {bkmoon,shamik,shin,uysal}@cs.umd.edu

Bug Reports
    email: {bkmoon,shamik,shin,uysal}@cs.umd.edu

Contents

Introduction
Getting the CHAOS Library
Sneak Preview: Problems, Data Structures and Procedures
Translation Table, Dereference and Schedule
Computing Schedules
Overview of CHAOS
Inspector
Communication Schedules
Single-phase Inspector
    subroutine localize
    subroutine reglocalize
Two-phase Inspector
    subroutine PARTI_schedule
    subroutine PARTI_hash
    function PARTI_create_hash_table
    subroutine PARTI_free_hash_table
    function PARTI_make_mask
Inspectors for Partially Modified Data Access Patterns
    subroutine PARTI_clear_mask
    subroutine PARTI_clear_stamp
Incremental Schedules
    subroutine PARTI_incremental_schedule
Data Exchangers
    subroutine PREFIXgather
    subroutine PREFIXscatter
    subroutine PREFIXscatter_FUNC
    subroutine PREFIXmultigather
    subroutine PREFIXmultiscatter
    subroutine PREFIXmultiscatter_FUNC
    subroutine PREFIXscatternc
    subroutine PREFIXmultiscatternc
    subroutine PREFIXscatter_FUNCnc
    subroutine PREFIXmultiscatter_FUNCnc
Data Exchangers For General Data Structures
    subroutine PARTI_gather
    subroutine PARTI_scatter
    subroutine PARTI_mulgather
    subroutine PARTI_mulscatter
CHAOS Procedures for Adaptive Problems
Introduction
    function schedule_proc
    subroutine PREFIXscatter_append
    subroutine PREFIXscatter_back
    subroutine PREFIXmultiscatter_append
    subroutine PREFIXmultiscatter_back
    subroutine PREFIXmultiarr_scatter_append
    subroutine PREFIXmultiarr_scatter_back
Example
Translation Table
    function build_translation_table
    subroutine dereference
    function build_reg_translation_table
    function build_dst_translation_table
    function init_ttable_with_proc
    subroutine free_table
    subroutine derefproc
    subroutine derefoffset
    subroutine remap_table
    function tableGetReplication
    function tableGetPageSize
    subroutine tableGetIndices
Miscellaneous Procedures
    subroutine PARTI_setup
    subroutine renumber
Runtime Data Distribution
Runtime Data Graph
Data Mapping Procedures
    subroutine eliminate_dup_edges
    subroutine generate_rdg
    function init_rdg_hash_table
Loop Iteration Partitioning Procedures
    subroutine dref_rig
    subroutine iteration_partitioner
Data Remapping
    subroutine remap
Parallel Partitioners
Geometry Based Partitioners
Recursive Bisection Coloring
Weighted Recursive Bisection Coloring
Recursive Bisection with Remapping
Weighted Recursive Bisection with Remapping
Recursive Spectral Partitioner
A   Example Program NEIGHBORCOUNT
    Sequential Version of Program
    Parallel Version of Program (Regular block Distribution)
    Parallel Version of Program (Irregular Distribution)
B   Example Program Simplified ISING
    Sequential Version of Program
    Parallel Version of Program

Introduction

A suite of procedures has been designed and developed to efficiently solve a class of problems
commonly known as irregular problems. Irregular concurrent problems are a class of irregular
problems which consist of a sequence of concurrent computational phases. The data access
patterns and computational cost of each phase of such problems cannot be predicted until
runtime, which prevents compile-time optimization. In this class of problems, however, once runtime
information is available the data access patterns are known in advance, making it possible to use a
variety of profitable preprocessing strategies. Runtime support and compilation methods are being
developed that are applicable to a variety of unstructured problems, including explicit multigrid
unstructured computational fluid dynamics solvers, molecular dynamics codes (CHARMM,
AMBER, GROMOS, etc.), diagonal or polynomial preconditioned iterative linear solvers, direct
simulation Monte Carlo (DSMC) codes, and particle-in-cell (PIC) codes. These problems
share two characteristics: arrays are accessed through one or more levels of indirection, and
the problem is formulated as a sequence of loop nests which prove to be parallelizable.

The CHAOS procedures described in this manual can be viewed as forming a portion of a
portable, machine-independent runtime support library. The CHAOS library has been written
in C; however, interfaces have been provided to call CHAOS library routines from Fortran
application programs as well. This manual presents the CHAOS procedures for Fortran application
codes.

Getting the CHAOS Library

The CHAOS procedures presented in this manual and related technical papers can be obtained
from the anonymous ftp site hyena.cs.umd.edu.

Example codes shown in this manual are distributed along with the CHAOS software.

Sneak Preview: Problems, Data Structures and Procedures

An example of an irregular problem is presented in this section, and the CHAOS data structures and
procedures used to parallelize this example are introduced. Figure 1 illustrates a simple sequential
Fortran irregular loop (loop L2) which is similar in form to loops found in unstructured
computational fluid dynamics (CFD) codes and molecular dynamics codes. In Figure 1, arrays x
and y are accessed by the indirection arrays edge1 and edge2. Note that the data access pattern
associated with the inner loop (loop L2) is determined by the integer arrays edge1 and edge2.
Because the arrays edge1 and edge2 are not modified within the loop nest, the data access pattern
of loop L2 can be anticipated prior to executing it. Consequently, edge1 and edge2 are used to carry
out the preprocessing needed to minimize communication volume and startup costs. Since large
data arrays are associated with a typical fluid dynamics problem, the first step in parallelizing
involves partitioning the data arrays x and y. The next step involves assigning equal amounts of
work to processors to maintain load balance.

C     Outer loop L1
L1    do istep = 1, n_step
C        Sweep over edges: inner loop L2
L2       do i = 1, nedge
            n1 = edge1(i)
            n2 = edge2(i)
            y(n1) = y(n1) + f(x(n1), x(n2))
            y(n2) = y(n2) + g(x(n1), x(n2))
         end do
      end do

              Figure 1: An example code with an irregular loop

Translation Table, Dereference and Schedule

On distributed memory machines, large data arrays may not fit in a single processor's memory;
hence they are divided among processors. Computational work is also divided among individual
processors to achieve parallelism. Once distributed arrays have been partitioned, each
processor ends up with a set of globally indexed distributed array elements. Each element in

a size N distributed array A is assigned to a particular (home) processor. In order for another
processor to be able to access a given element A(i) of the distributed array, the home processor
and local address of A(i) must be determined. Generally, unstructured problems solved with
irregular data distributions perform more efficiently than with a regular data distribution such
as BLOCK. In the case of an irregular data distribution, a lookup table, called the translation
table, is built that lists, for each array element, the home processor and the local address.

Memory considerations make it clear that it is not always feasible to place a copy of the
translation table on each processor, so the translation table must be distributed between
processors. This is accomplished by distributing the translation table by blocks, i.e. putting the
first N/P elements on the first processor, the second N/P elements on the second processor,
and so on, where P is the number of processors.

When an element A(m) of distributed array A is accessed, the home processor and local
offset are found in the portion of the distributed translation table stored in processor
⌈m/(N/P)⌉. A translation table lookup aimed at discovering the home processor and the
offset associated with a global distributed array index is referred to as a dereference request.
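The block lookup just described amounts to simple integer arithmetic. The sketch below is
illustrative only: these are not CHAOS procedures (the library's dereference routines described
later should be used in real codes), and it assumes 1-based indices and that N is a multiple of P.

c     Illustrative sketch (not CHAOS routines): locate the block of a
c     block-distributed translation table that holds entry m.
      integer function ttable_home(m, n, p)
      integer m, n, p, blk
      blk = n / p
c     processors are numbered 1..P; entry m lives on ceil(m/blk)
      ttable_home = (m - 1) / blk + 1
      end

      integer function ttable_offset(m, n, p)
      integer m, n, p, blk
      blk = n / p
c     1-based offset of entry m within its block
      ttable_offset = mod(m - 1, blk) + 1
      end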

Consider the irregular loop L2 in Figure 1 that sweeps over the edges of a mesh. In this case,
distributing the data arrays x and y corresponds to partitioning the mesh vertices; partitioning
loop iterations corresponds to partitioning the edges of the mesh. Hence each processor gets a
subset of loop iterations (edges) to work on. An edge i that has both end points edge1(i)
and edge2(i) inside the same partition (processor) requires no outside information. On the
other hand, edges which cross partition boundaries require data from other processors. Before
executing the computation for such an edge, processors must retrieve the required data from
other processors.

There is typically a non-trivial communication latency or message startup cost on distributed
memory machines: communication can be aggregated to reduce the effect of communication
latency, and software caching can be done to reduce communication volume. To carry
out either optimization it is helpful to have a priori knowledge of data access patterns. In
irregular problems it is generally not possible to predict data access patterns at compile time.
For example, the values of the indirection arrays edge1 and edge2 of loop L2 in Figure 1 are
known only at runtime, because they depend on the input mesh. During program execution,
the data references of distributed arrays are therefore preprocessed: on each processor, the data that need to be
exchanged are precomputed. The result of this preprocessing is stored in a data structure
called a communication schedule. The process of analyzing the indirection arrays and generating
schedules is called the inspector phase.

Each processor uses communication schedules to exchange the required data before and after
executing a loop. The same schedules can be used repeatedly as long as the data reference
patterns remain unchanged. The process of carrying out communication and computation is called
the executor phase.

Computing Schedules

This section discusses the process of generating and using schedules to carry
out communication vectorization and software caching. Consider the example shown in
Figure 1. The arrays x, y, edge1 and edge2 are partitioned between the processors of the distributed
memory machine. The local size of the data arrays x and y on each processor is local_nnode, and
the local size of the indirection arrays edge1 and edge2 is local_nedge. It is assumed that arrays x and y are
distributed in the same fashion, and the distribution is stored in a distributed translation table.
The partitioned indirection arrays edge1 and edge2 are called part_edge1 and part_edge2
respectively. To compute schedules, the local data array references (local indirection array
values) are collected in an array and passed to the procedure localize.

In loop L2 of Figure 1, values of array y are updated using the values stored in array x.
Hence a processor may need an off-processor array element of x to update an element of y; it
may also update an off-processor array element of y. The goal of the inspector is to prefetch
off-processor data items before executing the loop and to carry out off-processor updates after
executing the loop. Hence two sets of schedules are computed in the inspector: gather
schedules (communication schedules that can be used to fetch off-processor elements of x) and
scatter schedules (communication schedules that can be used to send updated off-processor
elements of y). However, the arrays x and y are referenced in an identical fashion in each
iteration of loop L2, and they are also identically distributed, so a single schedule that
represents the data references of either x or y can be used both for fetching off-processor elements
of x and for sending off-processor elements of y.

Figure 2 contains the preprocessing code for the simple irregular loop L2 shown in Figure 1.
The distribution of the data arrays x and y is stored in a translation table itable. The globally
indexed reference pattern used to access arrays x and y is collected (loop K1) in an array
ig_ref. The procedure localize dereferences the index array ig_ref to get the addresses and
translates ig_ref so that valid references are generated when the loop is executed.

Off-processor elements are stored in an on-processor buffer area. The buffer area for each
data array immediately follows the on-processor data for that array; for example, the buffer
for data array y begins at y(local_nnode+1). Hence, when localize translates ig_ref to
localized_edge, the off-processor references are modified to point to buffer addresses. The procedure
localize uses a hash table to remove any duplicate references to off-processor elements, so that
only a single copy of each off-processor datum is transmitted.

C Inspector Phase: build translation table, compute schedule and translate indices

S1    itable = build_translation_table(local_indices, local_nnode)
K1    do i = 1, local_nedge
         ig_ref(i) = local_edge1(i)
         ig_ref(local_nedge+i) = local_edge2(i)
      end do
S2    call localize(itable, i_sched, ig_ref, localized_edge, 2*local_nedge,
     &              n_off_proc, local_nnode, 1)
K2    do i = 1, local_nedge
         local_edge1(i) = localized_edge(i)
         local_edge2(i) = localized_edge(local_nedge+i)
      end do

C Executor Phase: carry out communication and computation

S3    call zero_out_buffer(y(local_nnode+1), n_off_proc)
S4    call gather(x(local_nnode+1), x, i_sched)
K3    do i = 1, local_nedge
         n1 = local_edge1(i)
         n2 = local_edge2(i)
         y(n1) = y(n1) + f(x(n1), x(n2))
         y(n2) = y(n2) + g(x(n1), x(n2))
      end do
S5    call scatter_add(y(local_nnode+1), y, i_sched)

              Figure 2: Node code for the simple irregular loop

Figure 3: Index translation by the localize mechanism (localize maps a partitioned global
reference list to a localized reference list; off-processor references point into a buffer area
that immediately follows the local data array, and the buffer receives the off-processor data).

When the off-processor data is collected into the buffer using the schedule returned by localize, the data is stored in such a way
that execution of the loop using local_edge1 and local_edge2 accesses the correct data.
A sketch of how the procedure localize works is shown in Figure 3.

The executor code starting at S3 in Figure 2 carries out the actual loop computation. In
this computation the values stored in the array y are updated using the values stored in x.
During the computation, accumulations to off-processor locations of array y are carried out in the
buffer associated with array y. This makes it necessary to initialize the buffer corresponding
to off-processor references of y; to perform this action the function zero_out_buffer is called.
After the loop's computation, data in the buffer locations of array y are communicated to the
home processors of these data elements (scatter_add). There are two potential communication
points in the executor code, i.e. the gather and the scatter_add calls. The gather on each
processor fetches all the necessary x references that reside off-processor; the scatter_add call
accumulates the off-processor y values.

Overview of CHAOS

Solving such concurrent irregular problems on distributed memory machines using CHAOS
runtime support usually involves six major phases (Figure 4). The first four phases concern
mapping data and computations onto processors. The next two phases concern analyzing data
access patterns in a loop and generating optimized communication calls. A brief description of
these phases follows; they will be discussed in detail in later sections.

      Phase A   Data Partitioning        Assign elements of data arrays to processors
      Phase B   Data Remapping           Redistribute data array elements
      Phase C   Iteration Partitioning   Allocate iterations to processors
      Phase D   Iteration Remapping      Redistribute indirection array elements
      Phase E   Inspector                Translate indices; generate schedules
      Phase F   Executor                 Use schedules for data transportation;
                                         perform computation

                    Figure 4: Solving irregular problems

Data Distribution: Phase A calculates how data arrays are to be partitioned by
making use of partitioners provided by CHAOS or by the user. CHAOS supports a
number of parallel partitioners that partition data arrays using heuristics based on spatial
positions, computational load, connectivity, etc. The partitioners return an irregular
assignment of array elements to processors, which is stored as a CHAOS construct called
the translation table. A translation table is a globally accessible data structure which
lists the home processor and offset address of each data array element. The translation
table may be replicated, distributed regularly, or stored in a paged fashion, depending on
storage requirements.

Data Remapping: Phase B remaps data arrays from the current distribution to the
newly calculated irregular distribution. A CHAOS procedure, remap, is used to generate
an optimized communication schedule for moving data array elements from their original
distribution to the new distribution. Other CHAOS procedures (gather, scatter and
scatter_append) use the communication schedule to perform the data movement.

Loop Iteration Partitioning: Phase C determines how loop iterations should be
partitioned across processors. There are a large number of possible schemes for assigning
loop iterations to processors based on optimizing load balance and communication volume.
CHAOS uses the almost-owner-computes rule to assign loop iterations to processors:
each iteration is assigned to the processor which owns a majority of the data array elements
accessed in that iteration. This heuristic is biased towards reducing communication costs.
CHAOS also allows the owner-computes rule.

Remapping Loop Iterations: Phase D is similar to Phase B. Indirection array elements
are remapped to conform with the loop iteration partitioning. For example, in Figure 1,
once loop L2 is partitioned, the indirection array elements edge1(i) and edge2(i) used in
iteration i are moved to the processor which executes that iteration.

Inspector: Phase E carries out the preprocessing needed for communication optimizations
and index translation.

Executor: Phase F uses information from the earlier phases to carry out the computation
and communication. Communication is carried out by CHAOS data transportation
primitives which use the communication schedules constructed in Phase E.

In static irregular problems, Phase F is executed many times, while Phases A through E
are executed only once. In some adaptive problems, data access patterns change periodically
but reasonable load balance is maintained; in such applications, Phase E must be repeated
whenever the data access pattern changes. In even more highly adaptive problems, the data
arrays may need to be repartitioned in order to maintain load balance. In such applications
all the phases described above are repeated.
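The pieces fit together roughly as sketched below for the loop of Figure 1. This is an
illustration only: phases A-D are shown as comments because the partitioning and remapping
procedures are described in later sections, the inspector and executor calls follow the
interfaces used in Figure 2, and the data type prefix d (double precision) is assumed.

c     Phase A: partition data arrays x and y (CHAOS parallel partitioner
c              or user-supplied); the result is a translation table.
c     Phase B: remap x and y to the new irregular distribution.
c     Phase C: partition the loop iterations (edges) over processors.
c     Phase D: remap the indirection arrays edge1 and edge2 accordingly.
c
c     Phase E: inspector -- translate indices and build a schedule
      call localize(itable, i_sched, ig_ref, loc_ref,
     &              ndata, n_off_proc, local_nnode, 1)
c
c     Phase F: executor -- move data, compute, send updates back;
c              i_sched is reused for every execution of the loop
      call dgather(x(local_nnode+1), x, i_sched)
c     ... sweep over the local edges as in Figure 2 ...
      call dscatteradd(y(local_nnode+1), y, i_sched)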

Inspector

The CHAOS procedures that can be used to generate the data structures (translation tables
and schedules) and to operate on these data structures are described in this section. The
functionalities of the procedures explained in this section are shown in Table 1.

Communication Schedules

To describe the inspector primitives provided by CHAOS we shall often refer to the example
code in Figure 1. In that example, one should notice that the arrays edge1 and edge2 are
used to index other arrays (x and y). Since this is an indirect indexing method, we call edge1
and edge2 indirection arrays. As we see from the parallelized node code version of the simple
loop (Figure 2), we need to do some preprocessing before we can actually execute the loop.
This preprocessing is aimed at achieving the following goals:

Global to local translation of references: Determine which of the references made by
an indirection array (e.g. edge1, edge2) are references to data that is now on-processor.
These references are changed so that they now point to the local address. For references to
data that is now off-processor, we need to assign on-processor buffer locations into which these
off-processor elements will be brought. The off-processor references of the indirection
array are changed so that they now point into this buffer location.

Communication schedule generation: The communication schedule specifies an optimized
way to gather off-processor data and to scatter back local copies after computation.
CHAOS primitives are used to scan the data access pattern and determine which
off-processor elements will be needed during computation. Duplicates among these references
are removed, and the off-processor references are merged so that fewer messages
will be needed for moving data. The communication schedule data structure typically
contains the following:

(a) send list: a list of local array elements that must be sent to other processors.

(b) permutation list: an array that specifies the data placement order of incoming
    off-processor elements in a local buffer area which is designated to receive incoming
    data.

(c) send size: an array that specifies the sizes of outgoing messages from processor p
    to other processors.

(d) fetch size: an array that specifies the sizes of incoming messages to processor p
    from other processors.

Some schedules do not need all of these entries; these variants are described below.

Table 1: CHAOS Inspector/Executor Procedures

Task            Functionality                         Procedure
-----------------------------------------------------------------------------
Inspector

Schedule        compute schedule                      localize
                compute schedule                      reglocalize
                compute schedule                      PARTI_schedule
                compute incremental schedule          PARTI_incremental_schedule
                hash global references                PARTI_hash
                create hash table                     PARTI_create_hash_table
                deallocate hash table                 PARTI_free_hash_table

Translation     build translation table               build_translation_table
Table           build translation table               build_reg_translation_table
                build paged translation table         build_dst_translation_table
                update a paged table                  remap_table
                get replication factor                getTableRepF
                get table page size                   getTablePageSize
                dereference                           dereference
                dereference (only processor)          derefproc
                dereference (only offset)             derefoffset
                get local indices                     getTableIndices
                deallocate translation table          free_table

Executor

Data            fetch off-processor data              PREFIXgather
Exchangers      scatter off-processor data            PREFIXscatter
                scatter off-processor data
                  with function                       PREFIXscatter_FUNC
                fetch off-processor data              PREFIXmultigather
                scatter off-processor data            PREFIXscatternc
                scatter off-processor data            PREFIXmultiscatternc
                scatter off-processor data            PREFIXmultiscatter_FUNCnc
                gather off-processor data             PARTI_gather
                gather off-processor data             PARTI_mulgather
                scatter off-processor data            PARTI_scatter
                scatter off-processor data            PARTI_mulscatter

Miscellaneous   initialize CHAOS environment          PARTI_setup
                renumber data arrays                  renumber

PREFIX = data type: d for double precision, f for real, i for integer.
FUNC = function: add for addition, mult for multiplication.

The following example further clarifies the process of index translation. Let us assume that
an array x has been distributed over two processors. Similarly,
the indirection arrays edge1 and edge2 are also distributed over the two processors. The local
portions of each array (x, edge1 and edge2) look like this:

local area local area

x x x x x

Local x x x x x

nedge and nedge

edge edge

edge edge

where the underlined references are off-processor.

As explained above, before we execute the loop we must bring in a copy of all
off-processor references that might be made inside the loop. In this case there are several off-processor
references, only some of which are distinct. We can assign these copies memory just at the end
of the local portion of the array. This new region is called the ghost area or the off-processor
buffer.

local areaghost cells local areaghost cells

x x x x x x x x x x

x x x x x x x x x x

Now we must change the global references in edge1 and edge2
so that they now point to the appropriate local references.
After index translation we have:

nedge nedge

edge edge

edge edge

The next task is to build a communication schedule. The fetch size and send size are
the amounts of data that must be transferred during a gather operation. The send list
specifies the data that must be sent to every other processor. The permutation list specifies
where incoming data from each processor should be placed in the ghost area. Note that a
communication schedule depends on the global-to-local index translation having been done,
since the permutation list can only be determined after each off-processor reference has been
assigned a ghost-area location during index translation. In the permutation list, the local indices
are stored as offsets from the beginning of the ghost area instead of being stored as absolute
local index values. For the example above, the communication schedule would look as follows:

ScheduleP ScheduleP

fetchsize fetchsize

sendsize sendsize

sendlist p NULL sendlist p

p p NULL

permlist p NULL permlist p

p p NULL

In the following sections, the CHAOS primitives for performing index translation and schedule
generation are described. Since index translation and schedule generation are usually
performed one after the other, CHAOS primitives can combine the two steps into a single
primitive which, given an indirection array, returns a translated indirection array as well as a
communication schedule. In some cases, however, it is possible to reuse the index translation
process to generate different schedules. For such applications CHAOS allows the user to use a
two-step inspector process by breaking the inspector step into index translation and schedule
generation stages.

Single-phase Inspector

Single-step inspector primitives perform both index translation and schedule generation
with respect to a given set of indirection arrays.

subroutine localize

localize is used to translate all the global indices in the given indirection array into local
indices. It also returns the schedule corresponding to that indirection array. The schedule
pointer returned by localize is used to gather data and store it at the end of the local array.
The schedule is created such that multiple copies of the same datum are not brought in during
the gather phase; the elimination of duplicates is achieved by using a hash table. localize
returns the local reference string corresponding to the global references which are passed as a
parameter to it. The number of off-processor data elements is also returned by localize, so
that one can allocate enough space at the end of the local array.

Synopsis

subroutine localize(itabptr, ilsched, iglobal_refs, ilocal_refs, ndata, n_off_proc,
                    my_size, repetition)

Parameter Declarations

integer itabptr: refers to the relevant translation table pointer.
integer ilsched: refers to the relevant schedule pointer (returned by localize).
integer iglobal_refs(): the array which stores all of the global indirection array references.
integer ilocal_refs(): the array which stores the local reference string corresponding to the
    global references (returned by localize).
integer ndata: number of global references.
integer n_off_proc: number of off-processor data (returned by localize).
integer my_size: the size of my local data array.
integer repetition: maximum number of columns or rows from which data will be gathered
    or scattered using the schedule returned by localize.

Return Value

None

Example

Assume that the data arrays x and y are identically but irregularly distributed between two
processors, and the processors take part in a computation that involves a loop which refers
to off-processor data. The indirect data array references are stored in globalref, and the
array myindex holds the local indices of the data arrays. The inspector and the executor
code for the loop is presented here.

integer indataindirection BUFSIZE

parameterBUFSIZE

integer myindexglobalreflocalref

double precision xyBUFSIZE

integer tabptrschedptr buildtranslationtable

c data initialization

ifMPImynode eq then

myindex

myindex

myindex

mysize

globalref

globalref

globalref

ndata

else ifMPImynode eq then

myindex

myindex

myindex

myindex

myindex

mysize

globalref

globalref

globalref

globalref

globalref

ndata

else

mysize

ndata

end if

do imysize

xi myindexi

yi myindexi

continue

C initialize Chaos environment

call PARTIsetup

c the following is the inspector code

tabptr buildtranslationtablemyindexmysize

call localizetabptrschedptrglobalref

localrefndatanoffprocmysize

c

do indata

globalrefi localrefi

continue

c end of the inspector and the executor begins

call dgatherymysizeyschedptr

do indata

indirection globalrefi

xi xi yindirection

continue

c end of the executor code

if MPImynode eq then

WRITE XI Imysize

WRITE globalrefI Indata

WRITE YglobalrefI Indata

end if

call MPIgsync

if MPImynode eq then

WRITE XI IMYSIZE

WRITE globalrefI Indata

WRITE YglobalrefI Indata

end if

end

The distribution of the data arrays is stored in a translation table (tabptr), and the communication
pattern between processors is stored in a schedule (schedptr). The procedure dgather brings
in the off-processor data. After the end of the computation, each processor holds the updated
values of x for its local elements.

subroutine reglocalize

The functionality of this procedure is similar to that of the localize procedure, except that in this
case, instead of passing a pointer to a translation table for an irregular data distribution,
we specify a regular distribution. At present, BLOCK and
CYCLIC distributions are supported. Other regular distributions can be supplied by the user by writing their
own regular dereference.

Synopsis

refs subroutine reglo calizedistributionsi zeil schedi global

ilo cal refsndatan o pro cmy sizerep etition

Parameter Declarations

integer distribution BLOCK CYCLIC or OWN

integer size The array size from which data is to b e gathered or scattered

integer ilsched refers to the relevant schedule p ointer returned by reglocalize

refs the array which stores all of the global indirection array references integer iglobal

refs the array which stores the lo cal reference string corresp onding to the integer ilo cal

global references returned by reglocalize

integer ndata numb er of global references

integer n o pro c numb er of opro cessor data returned by reglocalize

size parameter will return the size of the lo cal array integer my

integer rep etition maximum numb er of columns or rows from which data will b e gathered

or scattered using the schedule returned by lo calize

Return Value

None

Example

The example is similar to the one presented in the localize section, except that the data
arrays are regularly distributed (i.e. by BLOCK).

integer indataindirection BUFSIZE globalsize BLOCK

integer myindex

parameterBUFSIZE

integer iglobalrefilocalref

real buffer aloc

double precision xyBUFSIZE

integer itabptr ischedptr buildtranslationtable

BLOCK

globalsize

if MPInumnodes le globalsize then

mysize globalsizeMPInumnodes

else

if MPImynode lt globalsize then

mysize

else

mysize

endif

endif

ifMPImynode eq then

iglobalref

iglobalref

iglobalref

ndata

else if MPImynode eq then

iglobalref

iglobalref

iglobalref

iglobalref

ndata

else

ndata

end if

C initialize Chaos environment

call PARTIsetup

c the following is the inspector code

call reglocalizeBLOCKglobalsizeischedptriglobalref

ilocalrefndatanoffprocmysize

do indata

iglobalrefi ilocalrefi

continue

do i ndata

xi MPImynode i

enddo

do i mysize

yi MPImynode i

continue

c end of the inspector and the following is the executor code

call dgatherymysizeyischedptr

do indata

indirection iglobalrefi

xi xi yindirection

continue

c end of the executor code

if MPImynode le then

WRITE

MPImynode xXI Indata

WRITE

MPImynode grefiglobalrefI Indata

WRITE

MPImynode yYiglobalrefI Indata

end if

call MPIgsync

C Gather example

do imysize

aloci floatMPImynodei

continue

call fgatherbufferalocischedptr

if MPImynode le then

write Gather results

WRITE MPImynode bufferI Inoffproc

endif

end

After the end of the computation in pro cessor the values of x x x and x are

and resp ectively On pro cessor the values of x x x and x

are and resp ectively

Two-phase Inspector

Instead of using localize or reglocalize, the user may also choose to perform index translation
and schedule generation in separate steps. For this purpose CHAOS provides four primitives:
PARTI_create_hash_table, PARTI_hash, PARTI_schedule and PARTI_free_hash_table.
PARTI_hash is used to enter a data access pattern (i.e. an indirection array) into a hash table,
where duplicates are removed and global-to-local index translation is performed. PARTI_schedule
is used to build a communication schedule by inspecting the entries in the hash table.
PARTI_create_hash_table and PARTI_free_hash_table are used to allocate and deallocate
memory for the hash table.

subroutine PARTI_schedule

Synopsis

PARTI_schedule(hash_table, combo_mask, sched, maxdim)

Parameters

integer hash_table: the hash table.
integer combo_mask: the combination of indirection arrays for which a schedule is needed.
integer sched: the schedule that is returned.
integer maxdim: maximum number of columns or rows from which data will be gathered
    or scattered using the schedule returned by PARTI_schedule.

Return Value

None

subroutine PARTI_hash

Enters the global references into the heap (hash table) and converts them into references to a local array.

Synopsis

PARTI_hash(trans_table, hash_table, supermask, new_stamp, global_refs, ndata, my_size,
           no_off_proc, no_new_off_proc)

Parameters

integer trans_table: translation table used to determine the local address of any global address.
integer hash_table: the heap into which all the global references will be hashed. It must
    have been created earlier using PARTI_create_hash_table.
integer global_refs(): the global references.
integer ndata: size of the global_refs array.
integer my_size: the size of the local buffer; the off-processor global references will be
    modified so that they now point to addresses beginning from my_size onwards.
integer no_off_proc: contains the number of off-processor references in global_refs (returned
    by PARTI_hash).
integer no_new_off_proc: some of the off-processor references may have already been
    hashed into the heap by previous indirection arrays. This parameter will contain the
    number of new (unique) off-processor references (returned by PARTI_hash).
integer new_stamp: a heap (hash table) can be used to hash in many indirection arrays;
    all entries hashed in by a particular indirection array have a specific id. The parameter
    new_stamp returns the id assigned to this particular indirection array's elements
    (returned by PARTI_hash).
integer supermask: each indirection array can specify which of the previous indirection
    arrays' entries it wants to reuse; this is done by passing in a parameter called the
    supermask. A supermask is created by converting the ids of the previous indirection
    arrays into masks using PARTI_make_mask(id) and then using bitwise OR to combine
    these masks. If a user does not care about reusing previous index analyses, a zero should be
    passed.

Return Value

None
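For example, a supermask that reuses the entries hashed in by two earlier indirection arrays
might be formed as follows (a sketch; istampa and istampb are assumed to be stamps returned
by earlier PARTI_hash calls):

c     Sketch: combine two previously issued stamps into a supermask.
      integer PARTImakemask
      integer istampa, istampb, imaska, imaskb, isupermask
      imaska = PARTImakemask(istampa)
      imaskb = PARTImakemask(istampb)
c     bitwise OR of the individual masks (Fortran ior intrinsic)
      isupermask = ior(imaska, imaskb)
c     isupermask can now be passed as the supermask argument of PARTIhash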

function PARTI_create_hash_table

Creates a heap into which each processor will enter its global references. Any data range will
do, but a good estimate improves performance.

Synopsis

PARTI_create_hash_table(data_range)

Parameters

integer data_range: the maximum size of a global reference.

Return Value

an integer identifying the hash table

subroutine PARTI_free_hash_table

Deallocates the hash table created with PARTI_create_hash_table.

Synopsis

PARTI_free_hash_table(hash_table)

Parameters

integer hash_table: an integer identifying the hash table.

Return Value

None

function PARTI_make_mask

Converts an id (stamp) to a bitmask (see PARTI_hash).

Synopsis

PARTI_make_mask(stamp)

Parameters

integer stamp

Return Value

an integer representing the bitmask

integer indataindirection BUFSIZE mysize

parameterBUFSIZE

integer myindexglobalreflocalref

double precision xyBUFSIZE

integer tabptrschedptr buildtranslationtable

integer particreatehashtable

integer partimakemask

integer hashptr

integer istamp imask ndontcare noffproc

c data initialization

ifMPImynodeeq then

myindex

myindex

myindex

mysize

globalref

globalref

globalref

globalref

ndata

else ifMPImynodeeq then

myindex

myindex

myindex

myindex

myindex

mysize

globalref

globalref

globalref

globalref

globalref

ndata

else

mysize

ndata

end if

do imysize

xi myindexi

yi myindexi

continue

C initialize CHAOS environment

call PARTIsetup

c the following is the inspector code

tabptr buildtranslationtablemyindexmysize

hashptr PARTIcreatehashtable

call PARTIhashtabptrhashptristampglobalref

ndata mysize ndontcare noffproc

c

c The stamp returned by PARTIhash is an integer between and

c PARTImakemask maps this integer into a specific bit in an integer mask

c masks can then be bitORed if needed

c

imask PARTImakemaskistamp

call PARTIschedulehashptr imask schedptr

c

c end of the inspector and the executor begins

call dgatherymysizeyschedptr

do indata

indirection globalrefi

xi xi yindirection

continue

if MPImynodeeq then

WRITE Processor

WRITE XI Imysize

end if

call MPIgsync

if MPImynodeeq then

WRITE Processor

WRITE XI IMYSIZE

end if

end

Inspectors for Partially Modified Data Access Patterns

The previous section described how a two-step inspector can be used instead of a single
primitive such as localize. Both schemes provide the same functionality; indeed, localize
and reglocalize have been built on top of the more general primitives PARTI_hash and
PARTI_schedule.

We recommend that users use the one-step inspector routines whenever possible. Two-step
inspectors are usually needed when the indirection arrays are modified slightly during the course
of the computation. In such cases the index analysis for the old indirection array can be reused,
since the information is maintained in the hash table.

To clarify how this is done, we provide the following example.

integer indataindirection BUFSIZE

parameterBUFSIZE

integer myindexglobalreflocalref

double precision xyBUFSIZE

integer tabptrschedptr buildtranslationtable

integer hashptr

c data initialization

ifmynode eq then

myindex

myindex

myindex

mysize

globalref

globalref

globalref

globalref

ndata

else

myindex

myindex

myindex

myindex

myindex

mysize

globalref

globalref

globalref

globalref

globalref

ndata

end if

do imysize

xi myindexi

yi myindexi

continue

C initialize CHAOS environment

call PARTIsetup

c the following is the inspector code

tabptr buildtranslationtablemyindexmysize

call PARTIcreatehashtable

do k notimesteps

c SSS

if k eq

ioldmask

else

ioldmask PARTImakemaskistamp

endif

call PARTIhashtabptrhashptrioldmaskistampglobalref

ndata mysize ndontcare noffproc

imask PARTImakemaskistamp

call PARTIschedulehashptr imask schedptr

c EEE

c end of the inspector and the executor begins

call dgatherymysizeyschedptr

do indata

indirection globalrefi

xi xi yindirection

continue

c On some weird conditions change a few entries

c in the indirection array

do i ndata

if i eq randomk then

globalrefi globalrefi

endif

enddo

enddo

c end of the executor code

In the previous example the indirection array changes every time step; hence the schedule
must be regenerated each time. Since most of the index analysis can be reused, we call
PARTI_hash with a pointer to the old hash table and pass the mask of the previous indirection
array. Using this information, PARTI_hash hashes in each entry of the new indirection array
and reuses the index analysis information of matching pre-existing entries in the hash table.
Index analysis is performed only for those entries that are not found in the hash table. There
is a caveat to the previous example: the number of distinct stamps that PARTI_hash can issue
is limited (each stamp corresponds to a bit in an integer mask). This implies that the previous
example will work only when the number of time steps is less than this limit.

CHAOS provides two ways of working around this problem. PARTI_clear_stamp can clear
all entries with a particular stamp; PARTI_clear_mask is a more general primitive with which
any entry in the hash table with a particular combination of stamps can be removed. The second
method is to instruct PARTI_hash not to issue a new stamp every iteration. For instance, in
the previous example each indirection array is used only for one schedule; indirection arrays
from old time steps are never reused. By passing a special istamp value to PARTI_hash, the
user can direct it to reuse the last issued stamp value for all new entries.

The following code segments show how each of these can be done. These code segments
correspond to the section between SSS and EEE in the previous example.

c Using PARTIclearmask to clear entries in hash table

c SSS

if k eq

ioldmask

else

ioldmask PARTImakemaskistamp

endif

call PARTIhashtabptrhashptrioldmaskistampglobalref

globalref ndata mysize ndontcare noffproc

if k gt call PARTIclearmaskioldmask

imask PARTImakemaskistamp

call PARTIschedulehashptr imask schedptr

c EEE

c Using stamp reuse to limit stamp numbers

c SSS

if k eq

ioldmask

else

ioldmask PARTImakemaskistamp

endif

istamp

call PARTIhashtabptrhashptrioldmaskistampglobalref

ndata mysize ndontcare noffproc

imask PARTImakemaskistamp

call PARTIschedulehashptr imask schedptr

c EEE

subroutine PARTI_clear_mask

Removes all entries in a hash table which belong uniquely to this mask.

Synopsis

PARTI_clear_mask(hash_table, mask)

Parameters

integer hash_table
integer mask

Return Value

None

subroutine PARTI_clear_stamp

A wrapper around PARTI_clear_mask. You can directly give the id of the indirection array
that you want removed from the heap.

Synopsis

PARTI_clear_stamp(hash_table, stamp)

Parameters

integer hash_table
integer stamp

Return Value

None

Incremental Schedules

Incremental scheduling is a method by which users can take advantage of the similarities between
various data access patterns. Many of the off-processor references specified in an indirection
array may be the same as the references in a previously analysed indirection array. In such
cases some of the off-processor references of the second indirection array can be treated as local
during schedule generation. This effectively reduces the communication volume of gathered
elements.

To build incremental schedules, PARTI_hash must be passed masks corresponding to the stamps
of previous indirection arrays. This notifies the underlying layer which existing entries in the
hash table can be reused. After hashing, PARTI_incremental_schedule is used to build the
incremental schedule.

subroutine PARTI_incremental_schedule

Similar to PARTI_schedule, except that it builds a schedule only for entries in the hash table
which uniquely belong to this particular mask.

Synopsis

PARTI_incremental_schedule(hash_table, combo_mask, sched, maxdim)

Parameters

integer hash_table: the hash table.
integer combo_mask: the combination of indirection arrays for which a schedule is needed.
integer sched: refers to the relevant schedule pointer (returned by PARTI_incremental_schedule).
integer maxdim: maximum number of columns or rows from which data will be gathered or
    scattered using the returned schedule.

Return Value

None
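The fragment below sketches the intended usage (hypothetical names and sizes; the argument
order follows the two-phase inspector example given earlier, and a maxdim of 1 is assumed for
a one-dimensional array):

c     Sketch: the access pattern globalref2 overlaps a pattern that was
c     hashed in earlier with stamp istamp1.
      imask1 = PARTImakemask(istamp1)
c     hash the second indirection array, reusing the entries of the first
      call PARTIhash(tabptr, hashptr, imask1, istamp2, globalref2,
     &               ndata2, mysize, noffproc2, nnewoffproc2)
      imask2 = PARTImakemask(istamp2)
c     build a schedule only for the entries unique to the second pattern
      call PARTIincrementalschedule(hashptr, imask2, incschedptr, 1)
c     incschedptr now moves only the off-processor elements that were
c     not already covered by the first schedule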

Data Exchangers

The CHAOS data structure schedule stores the send/receive patterns; the CHAOS data
exchangers actually move data between processors using schedules.

subroutine PREFIXgather

PREFIX can be d (double precision), i (integer), f (real) or c (character). The PREFIXgather
procedure uses a schedule; copies of data values obtained from other processors are placed
in the memory pointed to by buffer. Also passed to PREFIXgather is a pointer to the location
from which data is to be fetched on the calling processor. This pointer is designated here as
aloc.

Synopsis

PREFIXgather(buffer, aloc, schedinfo)

Parameter Declarations

TYPE buffer: pointer to the buffer for copies of gathered data values.
TYPE aloc: location from which data is to be fetched on the calling processor.
integer schedinfo: refers to the relevant schedule.

Return Value

None

Assume that a schedule has already been obtained and that real numbers are to be gathered,
i.e. fgather is used. On each processor, aloc points to the array from which values are to
be obtained; buffer points to the location into which copies of data values
obtained from other processors will be placed.

real buffer aloc

integer ischedptr

do i

aloci floatmynode i

continue

call fgatherbufferalocischedptr

WRITE bufferi inoffproc

On each processor, buffer now holds the copies of the off-processor elements fetched by the gather.
The number of off-processor elements, n_off_proc, is returned by
the procedure reglocalize.

subroutine PREFIXscatter

PREFIX can be d (double precision), i (integer), f (real) or c (character). PREFIXscatter uses
a schedule produced by a call to another subroutine which creates a schedule (e.g. reglocalize).
Copies of data values to be scattered to other processors are placed in the memory pointed to by
buffer. Also passed to PREFIXscatter is a pointer to the location to which copies of data are
to be written on the calling processor. This pointer is designated here as aloc.

Synopsis

PREFIXscatter(buffer, aloc, schedinfo)

Parameter Declarations

TYPE buffer: points to the data values to be scattered from a given processor.
TYPE aloc: points to the first memory location on the calling processor where scattered data is
    to be placed.
integer schedinfo: refers to the relevant schedule.

Return Value

None

Example

We assume that a schedule has already been obtained by calling a subroutine such as reglocalize.
Our example will assume that we wish to scatter real numbers, i.e. that we
will be calling fscatter. On each processor, aloc points to the array to which values are to
be scattered; buffer points to the location from which data will be obtained to scatter.

real buffer aloc

integer ischedptr

do i

aloci

continue

ifmynodeeq then

buffer

endif

ifmynodeeq then

buffer

buffer

buffer

endif

call fscatterbufferalocischedptr

On pro cessor the rst four elements of alo c are and On

pro cessor the rst four elements of alo c are and

subroutine PREFIXscatter_FUNC

PREFIX can be d (double precision), i (integer), f (real) or c (character). FUNC can be add,
sub or mult. PREFIXscatter stores data values to specified locations; PREFIXscatter_FUNC
allows one processor to specify computations that are to be performed on the contents of
a given memory location of another processor. The procedure is in other respects analogous to
PREFIXscatter.

Synopsis

PREFIXscatter_FUNC(buffer, aloc, ischedinfo)

Parameter Declarations

TYPE buffer: points to the data values that will form operands for the specified type of
    remote operation.
TYPE aloc: points to the first memory location on the calling processor to be used as a target of
    remote operations.
integer schedinfo: refers to the relevant schedule.

Return Value

None

Example

We assume that a schedule has already been obtained by calling a subroutine such as reglocalize.
Our example will assume that we wish to scatter and add real numbers, i.e. that we
will be calling fscatteradd. On each processor, aloc points to the array to which values are
to be scattered and added; buffer points to the location from which values will be obtained
to scatter and add.

real buffer aloc

integer ischedptr

do i

aloci

continue

ifmynodeeq then

buffer

endif

ifmynodeeq then

buffer

buffer

buffer

endif

call fscatteraddbufferalocischedptr

On pro cessor the rst four elements of alo c are and On

pro cessor the rst three elements of alo c are and

subroutine PREFIXmultigather

This primitive is an extension of the regular gather primitive: it allows one to specify
multiple schedules to be used to gather data from the target array into the buffer. It also allows
multi-column or multi-row gathers if the columns are distributed in the same way. The different
schedules to be used are passed to the function in an array, and the number of
schedules must be specified as a parameter to the function.

Synopsis

PREFIXmultigather(buffer, aloc, n_scheds, scheds, base_shift, dilation, repetition)

Parameter Declarations

TYPE buffer: buffer for copies of gathered data values.
TYPE aloc: location from which data is to be fetched on the calling processor.
integer n_scheds: number of schedules passed.
integer scheds(): array of pointers, where each pointer points to a schedule.
integer base_shift: size of the distributed dimension.
integer dilation: the factor by which the distance between two data accesses on any
    dimension changes when that dimension is transposed.
integer repetition: maximum number of columns or rows from which data will be accessed.

Return Value

None
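For illustration, the fragment below gathers the off-processor elements described by two
schedules in a single call. This is a sketch with hypothetical names: isched1 and isched2 are
assumed to come from earlier inspector calls, y is a one-dimensional double precision array of
local size mysize, and the base shift, dilation and repetition values are illustrative choices for
this one-dimensional case.

c     Sketch: gather the data described by two schedules into the ghost
c     area of y with one call.
      integer ischeds(2)
      double precision y(MAXSIZE)
      ischeds(1) = isched1
      ischeds(2) = isched2
      call dmultigather(y(mysize+1), y, 2, ischeds, mysize, 1, 1)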

subroutine PREFIXmultiscatter

This primitive is an extension of the regular scatter primitive: it allows one to specify
multiple schedules to be used to scatter data from the buffer into the target array. It also
allows multi-column or multi-row scatters if the columns are distributed in the same way. The
different schedules to be used are passed to the function in an array, and the
number of schedules must be specified as a parameter to the function.

Synopsis

PREFIXmultiscatter(buffer, aloc, n_scheds, scheds, base_shift, dilation, repetition)

Parameter Declarations

TYPE buffer: buffer for copies of the data values to be scattered.
TYPE aloc: location on the calling processor to which data is to be scattered.
integer n_scheds: number of schedules passed.
integer scheds(): array of pointers, where each pointer points to a schedule.
integer base_shift: size of the distributed dimension.
integer dilation: the factor by which the distance between two data accesses on any
    dimension changes when that dimension is transposed.
integer repetition: maximum number of columns or rows to which data will be scattered.

Return Value

None

Example

Data can be scattered to multiple columns or rows using more than one schedule.

subroutine PREFIXmultiscatter_FUNC

FUNC can be add, sub or mult. This primitive is an extension of the regular scatter_FUNC
primitive: it allows one to specify a number of schedules to be used to perform computations
on the contents of given memory locations of another processor. It also allows multi-column
or multi-row operations if the columns or rows are distributed in the same way. The different
schedules to be used are passed to the function in an array, and the number of
schedules must be specified as a parameter to the function.

Synopsis

PREFIXmultiscatter_FUNC(buffer, aloc, n_scheds, scheds, base_shift, dilation, repetition)

Parameter Declarations

TYPE buffer: points to the data values that will form operands for the specified type of
    remote operation.
TYPE aloc: points to the first memory location on the calling processor to be used as a target of
    remote operations.
integer n_scheds: number of schedules passed.
integer scheds(): array of pointers, where each pointer points to a schedule.
integer base_shift: size of the distributed dimension.
integer dilation: the factor by which the distance between two data accesses on any
    dimension changes when that dimension is transposed.
integer repetition: maximum number of columns or rows from which data will be accessed.

Return Value

None

Example

Simple arithmetic operations can be performed on memory locations of another processor,
similar to the way a gather is performed.

subroutine PREFIXscatternc

This works just like a normal scatter function, except that it does an on-processor gather before it
does the scatter. The on-processor gather is done according to a pattern stored in the
schedule.

Synopsis

PREFIXscatternc(buffer, aloc, schedinfo)

Parameter Declarations

TYPE buffer: array from which data is to be scattered.
TYPE aloc: array to which data is to be scattered.
integer schedinfo: refers to the relevant schedule pointer.

Return Value

None

Example

This works similarly to, for example, the scatter_addnc function.

subroutine PREFIXmultiscatternc

This primitive is an extension of the regular multiscatter primitive such that an on-processor
gather is performed before the scatter occurs. The on-processor gather is done according to a
pattern stored in the schedule.

Synopsis

PREFIXmultiscatternc(buffer, aloc, n_scheds, scheds, base_shift, dilation, repetition)

Parameter Declarations

TYPE buffer: buffer for copies of the data values to be scattered.
TYPE aloc: location on the calling processor to which data is to be scattered.
integer n_scheds: number of schedules passed.
integer scheds(): array of pointers, where each pointer points to a schedule.
integer base_shift: size of the distributed dimension.
integer dilation: the factor by which the distance between two data accesses on any
    dimension changes when that dimension is transposed.
integer repetition: maximum number of columns or rows to which data will be scattered.

Return Value

None

Example

Data can be scattered to multiple columns or rows using more than one schedule.

subroutine PREFIXscatter_FUNCnc

This works just like a normal scatter_FUNC function, except that it does an on-processor gather
before it does the scatter_FUNC. The on-processor gather is done according to a pattern stored
in the schedule.

Synopsis

PREFIXscatter_FUNCnc(buffer, aloc, schedinfo)

Parameter Declarations

TYPE buffer: array from which data is to be scattered.
TYPE aloc: array to which data is to be scattered.
integer schedinfo: refers to the relevant schedule pointer.

Return Value

None

subroutine PREFIXmultiscatter_FUNCnc

FUNC can be add, sub or mult. This primitive is an extension of the regular
multiscatter_FUNC primitive such that it allows the user to perform an on-processor gather
before the scatter_FUNC operation.

Synopsis

subroutine PREFIXmultiscatter_FUNCnc(buffer, aloc, n_scheds, scheds, base_shift, dilation,
                                     repetition)

Parameter Declarations

TYPE buffer: points to the data values that will form operands for the specified type of
    remote operation.
TYPE aloc: points to the first memory location on the calling processor to be used as a target of
    remote operations.
integer n_scheds: number of schedules passed.
integer scheds(): array of pointers, where each pointer points to a schedule.
integer base_shift: size of the distributed dimension.
integer dilation: the factor by which the distance between two data accesses on any
    dimension changes when that dimension is transposed.
integer repetition: maximum number of columns or rows from which data will be accessed.

Return Value

None

Example

Simple arithmetic operations can be performed on memory locations of another processor,
similar to the way a gather is performed.

Data Exchangers For General Data Structures

subroutine PARTI_gather

Synopsis

subroutine PARTI_gather(sched, data, target, size)

Parameter Declarations

integer sched: refers to the relevant schedule.
(any type) data: points to the data values that will form operands for the specified type of
    remote operation.
(any type) target: points to the first memory location on the calling processor to be used as a
    target of remote operations.
integer size: size of the data structure.

Return Value

None

Similar to TYPEgather, except that this subroutine can be used to gather any type of data
structure. The size of the data structure must be provided.
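As a sketch of the intended use (the names, sizes and the unit of the size argument are
assumptions, not stated by the manual), PARTI_gather can move whole records, for example
three double precision coordinates per mesh node:

c     Sketch: gather 3-word records with a previously built schedule
c     isched; the record size is assumed to be given in bytes
c     (3 double precision words = 24 bytes).
      double precision coord(3, MAXNODE), coordbuf(3, MAXGHOST)
      call PARTIgather(isched, coord, coordbuf, 24)
c     coordbuf is assumed to receive the gathered off-processor records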

scatter subroutine PARTI

Synopsis

subroutine PARTI scattersched data target size func

Parameter Declarations

integer sched refers to the relevant schedule

data p oints to data values that will form op erands for the sp ecied typ e of remote

op eration

target p oints to rst memory lo cation on calling pro cessor to b e used as targets of remote

op erations

integer size size of the data structure

func function to b e used

Return Value

None

Similar to scatter FUNC subroutine but the function is provided by the user The parameter

func sp ecies the op eration for scatter routine For standard functions the following constants

can b e used

NULL       no-operation scatter function

PARTIadd   integer addition

PARTIsub   integer subtraction

PARTImul   integer multiplication

PARTIcadd  character addition

PARTIcsub  character subtraction

PARTIcmul  character multiplication

PARTIfadd  float addition

PARTIfsub  float subtraction

PARTIfmul  float multiplication

PARTIdadd  double addition

PARTIdsub  double subtraction

PARTIdmul  double multiplication
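As an illustration, the following minimal sketch (not taken from the library distribution) accumulates floating-point contributions into a target array with float addition; the schedule sched is assumed to have been built earlier (for example by PARTI_schedule), elem_size stands for whatever per-element size the application passes for its data structure, and the PARTIfadd constant is assumed to be provided by the CHAOS interface.

c     Sketch only: scatter local floating-point contributions into the
c     target array using float addition.  "sched" and "elem_size" are
c     assumed to have been set up earlier by the application.
      integer sched, elem_size
      real buffer(1000), target(1000)

      call PARTI_scatter(sched, buffer, target, elem_size, PARTIfadd)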

subroutine PARTI_mulgather

Synopsis

PARTI_mulgather(sched, size, narray, data_1, target_1, ..., data_n, target_n)

Parameter Declarations

integer sched    refers to the relevant schedule

integer size    size of the data structure

integer narray    number of arrays

data    points to the data values that will form operands for the specified type of remote operation

target    points to the first memory location on the calling processor to be used as the target of the remote operations

Return Value

None

The gather routine for multiple arrays.

subroutine PARTI_mulscatter

Synopsis

PARTI_mulscatter(sched, func, size, narray, data_1, target_1, ..., data_n, target_n)

Table: CHAOS Procedures for Adaptive Problems

Task                     Functionality       Procedure

Inspector
  Lightweight schedule   compute schedule    schedule_proc

Executor
  Data transportation    exchange data       PREFIXscatter_append
                         restore data        PREFIXscatter_back
                         exchange data       PREFIXmultiscatter_append
                         restore data        PREFIXmultiscatter_back
                         exchange data       PREFIXmultiarr_scatter_append
                         restore data        PREFIXmultiarr_scatter_back

Parameter Declarations

integer sched    refers to the relevant schedule

func    function to be used

integer size    size of the data structure

integer narray    number of arrays

data    points to the data values that will form operands for the specified type of remote operation

target    points to the first memory location on the calling processor to be used as the target of the remote operations

Return Value

None

The scatter routine for multiple arrays.

CHAOS Procedures for Adaptive Problems

Introduction

In a class of highly adaptive problems, the patterns of data access vary frequently. As a result, preprocessing (an inspector) must be carried out whenever the data access pattern changes. The implication is that the cost of the inspector can hardly be amortized, because the communication schedule produced by the inspector procedures for one time step may not be reusable in subsequent time steps. Inspector procedures for application codes in which data access patterns change only partially are discussed in the section on inspectors for partially modified data access patterns.

A set of primitives for highly adaptive problems has been developed. These procedures are particularly suitable for applications where the order of data storage and computation is not strictly maintained. The procedures developed for adaptive problems compute lightweight schedules at very low cost. These procedures compute space requirements for data migration on each processor and determine how collective communication can be performed. The data transportation routines developed for this purpose use these schedules to perform irregular communication efficiently.

function schedule_proc

This function returns a communication schedule, as PARTI_schedule does, but the schedule generated by schedule_proc does not contain information on the addresses in the destination processors for off-processor data items.

Synopsis

integer schedule_proc(proc, ndata, new_ndata, maxdim)

Parameter declarations

integer proc    a list of destination processor indices

integer ndata    the number of local array elements

integer new_ndata    number of new local array elements

integer maxdim    maximum number of columns or rows from which the distributed arrays will be exchanged

Return value

an integer value representing a lightweight communication schedule
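A minimal usage sketch (illustrative only; the destination-processor list proc would normally be produced by a user-supplied load balancer, as in the example at the end of this section, and maxdim is set to 1 here on the assumption of one-dimensional data):

c     Sketch: build a lightweight schedule from a destination-processor
c     list that has one entry per local array element.
      integer sched, ndata, new_ndata
      integer proc(1000)

c     ... fill proc(1:ndata) with destination processor indices ...

      sched = schedule_proc(proc, ndata, new_ndata, 1)
c     on return, new_ndata holds the number of elements this processor
c     will own after the data exchange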

subroutine PREFIXscatter_append

PREFIX can be d (double precision), i (integer), f (float), or c (character). The PREFIXscatter_append procedure uses a schedule produced by a call to schedule_proc. The ownership of distributed array data is exchanged among the participating processors, and a copy of the new local portion of the distributed array is placed in the memory pointed to by target.

Synopsis

subroutine PREFIXscatter_append(sched, data, target)

Parameter declarations

integer sched    a lightweight schedule for data exchange

TYPE data    a distributed array pointer

TYPE target    a pointer to the buffer where the new copy of the local portion of the array will be placed

Return value

None

subroutine PREFIXscatter_back

PREFIX can be d (double precision), i (integer), f (float), or c (character). The PREFIXscatter_back procedure uses the same schedule that was previously used by the PREFIXscatter_append function to reverse the effects of the data exchange done by PREFIXscatter_append. Each element of the distributed array is restored to its original position within the processor that owned it before the data exchange.

Synopsis

subroutine PREFIXscatter_back(sched, data, target)

Parameter declarations

integer sched    a lightweight schedule for data exchange

TYPE data    a distributed array pointer

TYPE target    a pointer to the buffer from which the copy of the local portion of the array is to be restored

Return value

None

subroutine PREFIXmultiscatter_append

This subroutine is an extension of PREFIXscatter_append that allows one to specify a number of schedules to be used to exchange distributed arrays; it also allows data exchange of multidimensional arrays. The different schedules to be used are passed to PREFIXmultiscatter_append in an array of schedules, and the number of schedules must be specified as a parameter to this subroutine.

Synopsis

subroutine PREFIXmultiscatter_append(nsched, scheds, data, target, base_shift, dilation, repetition)

Parameter declarations

integer nsched    the number of schedules passed

integer scheds    an array of schedules

TYPE data    a distributed array

TYPE target    a pointer to the buffer where the new copy of the local portion of the array will be placed

integer base_shift    size of the distributed dimension

integer dilation    Not used

integer repetition    Not used

Return value

None

subroutine PREFIXmultiscatter_back

The functionality of this subroutine is the same as that of PREFIXscatter_back, except that it allows one to use a number of schedules.

Synopsis

subroutine PREFIXmultiscatter_back(nsched, scheds, data, target, base_shift, dilation, repetition)

Parameter declarations

integer nsched    the number of schedules passed

integer scheds    an array of schedules

TYPE data    a distributed array whose ownership will be restored

TYPE target    a pointer to the buffer from which the copy of the array is to be restored

integer base_shift    size of the distributed dimension

integer dilation    Not used

integer repetition    Not used

Return value

None

subroutine PREFIXmultiarr_scatter_append

This subroutine is another extension of PREFIXscatter_append which can be used to exchange multiple arrays that are distributed in a similar manner.

Synopsis

subroutine PREFIXmultiarr_scatter_append(sched, narrays, data_1, target_1, ..., data_n, target_n)

Parameter declarations

integer sched    a lightweight schedule for data exchange

integer narrays    the number of (data_i, target_i) pairs

TYPE data_i    array to be distributed

TYPE target_i    a pointer to the buffer where the new copy of the local portion of the array will be placed

Return value

None
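For example, two identically distributed double-precision arrays can be moved with a single call (a sketch; the array names are placeholders, and the spelling of the entry point assumes PREFIX = d concatenated with the routine name, following the convention used in the example at the end of this section):

c     Sketch: exchange two identically distributed double precision
c     arrays with one lightweight schedule; the new local portions
c     are placed in xnew and ynew.
      integer sched
      double precision x(1000), xnew(1000), y(1000), ynew(1000)

      call dmultiarr_scatter_append(sched, 2, x, xnew, y, ynew)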

subroutine PREFIXmultiarr_scatter_back

This subroutine is the multiple-array counterpart of PREFIXscatter_back; it restores the data of multiple arrays that are distributed identically.

Synopsis

subroutine PREFIXmultiarr_scatter_back(sched, narrays, data_1, target_1, ..., data_n, target_n)

Parameter declarations

integer sched    a lightweight schedule for data exchange

integer narrays    the number of (data_i, target_i) pairs

TYPE data_i    a distributed array whose data will be restored

TYPE target_i    a pointer to the buffer from which the copy of the array is to be restored

Return value

None

Example

      integer ndata
      double precision x(ndata), y(ndata), load(ndata)

c     outer time-step loop
      do k = 1, timestep
c        computation: inner loop over the data points
         do i = 1, ndata
            do j = 1, load(i)
               x(i) = ... y(i) ...
            enddo
            load(i) = ... x(i) ...
         enddo
      end do

Consider the above sequential code. The inner computation loop is executed many times inside the outer time-step loop. The computational load of iteration i depends on the value of load(i), which is updated in every iteration k of the outer loop. Note that there is no communication involved. To run this code efficiently on a parallel machine, the load on the processors must be balanced for every iteration of the outer loop.

Let us see how to parallelize this code. Assume that the data arrays data, x, and y are identically (and possibly irregularly) distributed among several processors, and that they need to be remapped to balance the workload. The array workload stores the load information for each data point, and a user-specified load-balancing primitive, load_balancer, generates a list of destination processor indices from the load information passed to it. A lightweight schedule sched is computed using the procedure schedule_proc. Since the arrays x, y, and load are distributed identically, the same schedule can be used to transport all the arrays to their new locations.

      integer sched, ndata, newnd, BUFSIZE
      parameter (BUFSIZE = ...)
      integer proc(BUFSIZE)
      real workload(BUFSIZE), data(BUFSIZE)
      double precision x(BUFSIZE), y(BUFSIZE), load(BUFSIZE)

c     outer time-step loop
      do k = 1, timestep

c        compute the workload of each data point
         do i = 1, ndata
            workload(i) = ...
         enddo

c        initialize CHAOS environment
         call PARTI_setup

c        produce a list of destination processor indices
         call load_balancer(proc, workload, ndata)

c        build up a schedule for data exchange
         sched = schedule_proc(proc, ndata, newnd)

c        check the buffer size for the new local portion of arrays
         if (newnd .gt. BUFSIZE) then
            write (*,*) 'Exceeded the local buffer size'
            call exit
         endif

c        exchange distributed data arrays among participating processors
         call fscatter_append(sched, data, data)
         call dmultiarr_scatter_append(sched, 3, x, x, y, y, load, load)

c        local computation over the new local portion
         do i = 1, newnd
            ...
         enddo

c        restore distributed data arrays
         call fscatter_back(sched, data, data)
         call dmultiarr_scatter_back(sched, 3, x, x, y, y, load, load)

      end do

Translation Table

function build_translation_table

In order to allow a user to assign globally numbered indices to processors in an irregular pattern, it is useful to be able to define and access a distributed translation table. By using a distributed translation table it is possible to avoid replicating, on all processors, records of where distributed array elements are stored. The distributed table is itself partitioned in a very regular manner. A processor that seeks to access an element I of an irregularly distributed data array computes a simple function that designates a location in the distributed table; the location of the actual array element sought is then obtained from the distributed table.

The procedure build_translation_table constructs a distributed translation table. It assumes that distributed array elements are globally numbered. Each processor passes build_translation_table a set of indices for which it will be responsible. The distributed translation table may be striped or blocked across the processors. With a striped translation table, the translation table entry for global index I is stored on processor I modulo the number of processors, and the local index in the translation table is I divided by the number of processors. In a blocked translation table, translation table entries are partitioned into a number of equal-sized ranges of contiguous integers, and these ranges are placed on consecutively numbered processors. With blocked partitioning, the block corresponding to index I is I/B and the local index is I modulo B, where B is the size of the block. Let M be the maximum global index passed to build_translation_table by any processor and NP represent the number of processors; then B = ceiling(M/NP).
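The owner computation described above can be illustrated with a small sketch (illustrative only, assuming zero-based global indices for simplicity; the library performs the equivalent lookup internally when dereference is called):

c     Illustrative owner/offset arithmetic for a translation table.
c     NP is the number of processors and B the block size.
      integer function striped_owner(I, NP)
      integer I, NP
c     striped table: the entry for index I lives on processor I mod NP
      striped_owner = mod(I, NP)
      end

      integer function blocked_owner(I, B)
      integer I, B
c     blocked table: the entry for index I lives in block I/B
      blocked_owner = I / B
      end

      integer function blocked_offset(I, B)
      integer I, B
c     local offset of index I within its block
      blocked_offset = mod(I, B)
      end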

Synopsis

function build_translation_table(part, index_array, ndata)

Parameter Declarations

integer part    how the translation table will be mapped; may be BLOCKED or STRIPED

integer index_array    each processor P specifies a list of globally numbered indices for which P will be responsible

integer ndata    number of indices for which processor P will be responsible

Return Value

an integer which refers to the translation table corresponding to the input data

Example

For a detailed example, refer to the example following the description of dereference below.

subroutine dereference

The subroutine dereference accesses a translation table and determines the owner processors, and the local addresses on those owner processors, for a list of globally numbered array elements. dereference is passed a pointer to a translation table; this structure defines the irregularly distributed mapping that was created. dereference is also passed an array of global indices that need to be located in distributed memory. dereference returns arrays local and proc that contain the processors and local indices corresponding to the global indices.

Synopsis

subroutine dereference(global, local, proc, ndata, index_table)

Parameter declarations

integer global    list of global indices we wish to locate in distributed memory

integer local    local indices obtained from the distributed translation table that correspond to the global indices passed to dereference

integer proc    array of distributed translation table processor assignments for each global index passed to dereference

integer ndata    number of elements to be dereferenced

integer index_table    refers to the relevant translation table

Return value

None

Example

Table: Values obtained by dereference (the proc and local values returned on each processor; the specific values are omitted here).

A one-dimensional distributed array is partitioned in some irregular manner, so we need a distributed translation table to keep track of where one can find the value of a given element of the distributed array.

In the example below we show how a translation table is initialized. Each of the two processors calls build_translation_table and passes the two global indices for which it will be responsible. The translation table is partitioned between the processors in blocks.

Each processor then uses the translation table to dereference two global indices. On each processor, dereference carries out a translation table lookup. The values of proc and local returned by dereference are shown in the table above. The user gets to specify the processor to which each global index is assigned; note, however, that build_translation_table assigns the local indices.

program dref

c

integer size i indexarray

integer derefarray

integer local proc

logical buildtranslationtable

c initialize CHAOS environment

call PARTIsetup

c Assign indices and to processor

mysize

ifmynodeeq then

indexarray

indexarray

endif

c Assign indices and to processor

ifmynodeeq then

indexarray

indexarray

endif

c set up a translation table

itable buildtranslationtableindexarraymysize

c Processor seeks processor and local indices

c for global array indices and

ifmynodeeq then

derefarray

derefarray

endif

c Processor seeks processor and local indices

c for global array indices and

ifmynodeeq then

derefarray

derefarray

endif

c Dereference a set of global variables

call dereferencetablederefarraylocalprocsize

c local and proc return the processors and local indices where

c global array indices are stored

c In processor proc proc local local

c In processor proc proc local local

stop

end

Now assume that one processor needs to know the values of two distributed array elements while the other processor needs to know the value of a single element. We call dereference to find the processors and the local indices that correspond to each global index. At this point a schedule can be built (for example with PARTI_schedule) and the gathers and scatters carried out.

function build_reg_translation_table

Builds a translation table for a regular distribution and returns an id for the translation table built. The distribution can be blocked or cyclic in the current implementation; no other regular distribution is supported. As before, however, provisions for an analytic user function and for parameterized general block distributions have been made. The translation tables returned by this function do NOT explicitly list all global indices.

Synopsis

build_reg_translation_table(dist_type, data_size)

Parameter declarations

integer dist_type    distribution type: BLOCK or CYCLIC

integer data_size    global data array size

Return value

an integer representing a translation table

function build_dst_translation_table

Builds a translation table for a degenerate irregular distribution. index is the list of global indices for which the calling processor is responsible; nindex denotes the size of the index array. repf is a floating-point number between 0 and 1, inclusive, denoting the replication factor for the translation table storage (more on this below). psize is the page size for the page decomposition of the translation table contents. The function returns an ID for the table generated, to be used by the dereference functions.

Synopsis

build_dst_translation_table(index, nindex, repf, psize)

Parameter declarations

integer index    list of global indices for which the calling processor is responsible

integer nindex    size of the index array

real repf    replication factor

integer psize    page size

Return value

an integer representing a translation table

This type of translation table lists all the global indices and their location assignments in distributed memory. The global list of indices is decomposed into pages of size psize, and the storage is managed at the page level rather than at the level of individual indices. The pages are distributed among the processors using a block distribution. Furthermore, each processor replicates repf x N/P pages in its local memory, where N denotes the total number of pages and P is the number of processors in the system.

function init_ttable_with_proc

A partial translation table that stores only processor information for a distribution can be built using the function init_ttable_with_proc.

Synopsis

function init_ttable_with_proc(part, proc, ndata)

Parameter Declarations

integer part    how the translation table will be mapped; may be BLOCKED

integer proc    list of owner processor numbers for a global data array

integer ndata    size of the proc array

Return Value

an integer which refers to the translation table corresponding to the input data

Example
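A sketch of typical usage (the processor-assignment list is assumed to have been produced by one of the coloring partitioners described later in this manual, and BLOCKED is assumed to be a constant supplied by the CHAOS interface):

c     Sketch: build a partial (processor-only) translation table from a
c     processor-assignment list produced by a partitioner.
      integer tt, mysize
      integer maparr(1000)

c     ... maparr(1:mysize) filled with owner processors, e.g. by
c     CoorBisecMap ...

      tt = init_ttable_with_proc(BLOCKED, maparr, mysize)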

subroutine free_table

Deallocates the storage space associated with a translation table.

Synopsis

free_table(trans_table)

Parameter declarations

integer trans_table    translation table id

Return value

None

subroutine derefproc

Same as dereference, but the function does not return the offset assignments of the global indices. The translation table argument is generic: both irregular and regular distributions use the same function.

Synopsis

derefproc(trans_table, index_array, proc, ndata)

Parameter declarations

integer trans_table    translation table id

integer index_array    index list to be dereferenced

integer ndata    size of the index array

integer proc    list of processor numbers

Return value

None

subroutine derefoffset

Same as dereference, but the function does not return the processor assignments of the global indices. The translation table argument is generic: both irregular and regular distributions use the same function.

Synopsis

derefoffset(trans_table, index_array, local, ndata)

Parameter declarations

integer trans_table    translation table id

integer index_array    index list to be dereferenced

integer ndata    size of the index array

integer local    list of local offsets

Return value

None

subroutine remap_table

Updates the translation table entries to reflect changes in the distribution of data. This function does not perform any action if the old distribution was regular. oldt is the translation table containing the current distribution of data; newIndex is the new set of global indices for which the calling processor is responsible; nindex is the size of newIndex. The size of the global data space should be the same for the new distribution and the current distribution. The effect of this function is that it updates the translation table so that it contains entries for the new data distribution. Note: space is reused, so the user should not free the old translation table space.

Synopsis

remap_table(oldt, newIndex, nindex)

Parameter declarations

integer oldt    translation table id

integer newIndex    new index set

integer nindex    size of the new index list

Return value

None

function tableGetReplication

Returns the replication factor for a distributed translation table. For a regular distribution it returns the symbolic value TT_ERROR.

Synopsis

real tableGetReplication(table)

Parameter declarations

integer table    translation table id

Return value

a real number which gives the replication factor

function tableGetPageSize

Returns the page size for a distributed translation table. For a regular distribution it returns the symbolic value TT_ERROR.

Synopsis

integer tableGetPageSize(table)

Parameter declarations

integer table    translation table id

Return value

an integer which gives the page size of the table

subroutine tableGetIndices

Returns the list of indices owned by the calling processor, in the global address space.

Synopsis

tableGetIndices(table, indices, nindex)

Parameter declarations

integer table    translation table id

integer indices    local set of indices

integer nindex    size of the local index set

Return value

None
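A short sketch of querying a distributed translation table (illustrative; tt is assumed to be an id returned by build_dst_translation_table, and the buffer size is a placeholder):

c     Sketch: inspect a paged translation table built earlier.
      integer tt, psize, nown
      integer myindices(1000)
      real repf

      repf  = tableGetReplication(tt)
      psize = tableGetPageSize(tt)
c     retrieve the global indices owned by this processor; nown gives
c     the size of the returned local index set
      call tableGetIndices(tt, myindices, nown)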

Miscellaneous Procedures

subroutine PARTI_setup

Synopsis

PARTI_setup()

Parameter declarations

None

Return value

None

The procedure PARTI_setup is called once at the beginning of the application program. It sets up buffer space and environment variables.

subroutine renumber

Synopsis

renumber(table, index, array, size, rarray)

Parameter declarations

Table: CHAOS runtime data mapping procedures

Task                  Function                                 Primitive

Data partitioning     initializing the hash table              init_rdg_hash_table
                      generating the local graph               eliminate_dup_edges
                      generating the distributed graph         generate_rdg

Loop iteration        generating the loop iteration graph      dref_rig
partitioning          partitioning the loop iteration graph    iteration_partitioner

Remap                 generating a schedule to remap array     remap
                      indices and other aligned data arrays

integer table    translation table id

integer index    local index set, in global numbering

integer array    array to be renumbered

integer size    size of the array / index list

integer rarray    array with renumbered values

Return value

None

Runtime Data Distribution

In scalable multiprocessor systems, high performance demands that the computational load be balanced evenly among processors and that interprocessor communication be minimized. Over the past few years a great deal of work has been carried out in the area of mapping irregular problems onto distributed memory multicomputers. As a result, several general heuristics have been proposed for efficient data mapping. Currently these partitioners must be coupled to user programs manually. A standard interface can, however, be used to link these partitioners with programs at runtime. We use a distributed data structure to represent the array access patterns that arise in particular loops of a program. This data structure is passed to the graph partitioner, and the partitioner returns a data distribution. In this section we present the CHAOS procedures that can be used to generate the distributed data structure, link in the partitioners, and support data redistribution.

Runtime Data Graph

To implement the data partitioning we generate a distributed data structure called the Runtime Data Graph, or RDG. The runtime graph is generated from the distributed array access patterns in the user-selected loops. Here it is assumed that all distributed arrays considered for RDG generation are to be partitioned in the same way and that they are of the same size.

Figure: An example of runtime data graph generation. The figure shows a selected loop with an assignment of the form S: y(IA(i)) = ... x(IB(i)) ..., the input indirection arrays IA and IB, and the resulting runtime data graph, both as a graph and in compressed row format (adjacency list array and adjacency list pointer array). The specific index values are omitted here.

Node i of an RDG represents element i of all distributed arrays, if more than one distributed array is used for graph generation.

An RDG is constructed by executing a modified version of the loop which forms a list of edges instead of performing numerical calculations. The graph partitioners that we consider divide the graph into equal subgraphs with as few edges as possible between them. The intent is that two nodes in the RDG are linked if one is used to compute the other, so that they will be allocated to the same processor. There is an edge between nodes i and j in the RDG if, on some iteration of the loop, an assignment statement writes element i of an array (i.e., x(i) appears on the left-hand side) and references element j (i.e., y(j) appears on the right-hand side), or vice versa.

The figure shows a simple example of sequential generation of a runtime graph. The statement S in the figure has the indirections IA and IB on the left- and right-hand sides. In this example the arrays x and y, which are of the same size, are considered for graph generation. Each vertex in the graph represents an array element, so the runtime graph has one vertex per array element. The RDG is constructed by adding an undirected edge between the node pairs representing the left-hand side (IA(i)) and right-hand side (IB(i)) array indices for each loop iteration i. For instance, during the first loop iteration, edges between the vertices corresponding to the left-hand side and right-hand side indices of that iteration are added to the graph. The runtime graph is stored in a format closely related to compressed sparse row format.
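A minimal sketch of the sequential construction just described (illustrative only; add_edge is a placeholder for the edge-insertion step, which in CHAOS is carried out by the hash-table procedures described below):

c     Sketch: form the RDG by executing a modified version of the loop
c     that records index pairs instead of performing arithmetic.
      integer i, n
      integer IA(1000), IB(1000)

      do i = 1, n
c        an undirected edge links the written element and the read element
         call add_edge(IA(i), IB(i))
      enddo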

The next figure illustrates the parallel RDG generation steps. Initially, data arrays and loop iterations are divided among processors in uniform blocks. Each processor generates a local RDG using the array access patterns that occur in its local loop iterations. For clarity, the local RDG is shown as an adjacency matrix in the figure. The local graphs are then merged to form a distributed graph. During merging, if we view each local graph as an adjacency matrix stored in compressed row format, then processor P0 collects all entries of the first N/P rows of the matrix from all other processors, where N is the number of nodes (the array size) and P is the number of processors; processor P1 collects the next N/P rows of the matrix, and so on. Processors remove duplicate entries as they collect the adjacency list entries.

Figure: Parallel generation of the runtime data graph. Each processor first generates a local N x N graph representing the loop's array access pattern; the local graphs are then merged to produce a distributed graph in which each processor holds an N/P x N slice.

do i nedges

n ndei

n ndei

n ndei

S yn yn xn xn xn

S yn yn xn xn

enddo

Figure: An example sequential loop.

Data Mapping Procedures

In this section we present the CHAOS procedures that can be used for data mapping on MIMD machines. The kernel shown in the example-loop figure above is representative of the kind of loop that commonly occurs in a variety of sparse or unstructured codes; we use this kernel as a running example to illustrate the procedures. The CHAOS procedures carry out data mapping and loop iteration partitioning in parallel. To perform these operations in parallel, the loop iterations of the selected loops and the arrays to be partitioned are initially distributed among processors in uniform blocks. The next figure shows the modified version of the sequential loop and the initial data and loop distribution. The array local_ind_list in the figure holds the list of local array descriptors assigned to each processor, and n_local is the size of local_ind_list. The array local_iter_list holds the list of local iteration numbers. A subsequent figure shows parallel preprocessing code for the example, with the data mapping procedures incorporated.

do i nlocaledges

n localndei

n localndei

n localndei

S yn yn xn xn xn

S yn yn xn xn

enddo

For a small example, the original figure gives the sizes of the arrays x and y, the value of nedges, and the contents of nde, and then lists the initial data and loop iteration distributions on two processors: for each processor, its local_iter_list, n_local_edges, local_ind_list, n_local, and local_nde. The specific values are omitted here.

Figure: Modified example code for parallel preprocessing.

subroutine eliminate_dup_edges

The procedure eliminate_dup_edges generates a local runtime graph on each processor. This procedure is called once for each statement in the selected loops that accesses the distributed arrays on both the left-hand side and the right-hand side. For all such statements, the left-hand side (lhs_ind) array indices and the corresponding list of right-hand side (rhs_ind) array indices for each local iteration are generated separately. For example, in the modified example loop the statements S1 and S2 access the distributed arrays on both the left- and right-hand sides. Statement S1 has two distinct array indices on the right-hand side for each left-hand side index, whereas statement S2 has only one. The left-hand side and right-hand side index pairs are generated separately for each statement, and the procedure is called twice.

The procedure eliminate_dup_edges takes as input the lists lhs_ind and rhs_ind, the size of lhs_ind (n_count), and the number of unique rhs_ind indices per lhs_ind entry (n_dep). The procedure generates the local graph by adding undirected edges between the left-hand side and right-hand side index lists and stores it in a hash table. An undirected edge between nodes i and j is formed by adding j to the adjacency node list of node i, and vice versa.

The procedure eliminates any duplicate edges and also self edges, since these carry no potential for communication. The current version of the procedure does not distinguish between edges that connect the same pair of nodes but arise from different pairs of arrays; a future version of this procedure will take this case into account and will produce graphs with weighted edges.


btable buildtranslationtabletype localindlist nlocal

C Initialize hash table to store runtime data graph

ndep

ndep

hashindex initrdghashtablendepndepnlocaledges

jcount

icount

kcount

lcount

do i nlocaledges

n localndei

n localndei

n localndei

C S yn yn xn xn xn

lhsindicount n

rhsindjcount n

jcount jcount

rhsindjcount n

jcount jcount

C S yn yn xn xn

lhsindicount n

rhsindlcount n

lcount lcount

icount icount

enddo

CGenerate local run time graph

call eliminatedupedgeshashindex lhsind rhsind ndep icount

call eliminatedupedgeshashindex lhsind rhsind ndep icount

CGenerate distributed run time graph

call generaterdghashindex localindlist

nlocal csrptr csrcols ncols

C call parallel graph partitioner

call parallelrsblocalindlist nlocal csrptr csrcols ntable

C Remap array indices

call remapntable localindlist sched newindlist newsize

call dgatherx x sched

call dgathery y sched

Figure: Parallel preprocessing code for data mapping.

Synopsis

eliminate_dup_edges(hashindex, lhs_ind, rhs_ind, n_count, n_dep)

Parameter declarations

integer hashindex    an integer identifying the hash table that stores the RDG

integer lhs_ind    list of the left-hand side array indices of all local iterations

integer rhs_ind    list of the unique right-hand side indices for each local iteration

integer n_count    number of left-hand side indices

integer n_dep    number of unique right-hand side indices for each left-hand side index in the considered statement

Return value

None

Example

Assume that in the example kernel two processors are employed and that the arrays x and y are to be mapped in a conforming manner and are of the same size. The RDG is generated based on the access patterns of the arrays x and y in the modified loop shown earlier. The statements S1 and S2 in the kernel are not executed while generating the graph. Initially the arrays x and y and the loop iterations are distributed in uniform blocks, as shown in the figure giving the initial distributions. The modified loop is executed in parallel to form a local list of array access patterns on each processor, as shown in the parallel preprocessing figure. The lists lhs_ind1 and rhs_ind1 hold the left-hand side and right-hand side array access patterns of statement S1, and lhs_ind2 and rhs_ind2 hold the patterns of S2. All processors then call the procedure eliminate_dup_edges with these lists of left-hand side and right-hand side array indices. The RDG generated by the procedure is shown in the example-output figure below. The RDG is stored in the hash table identified by the integer hashindex. The RDG obtained using S1 is not altered by statement S2, since S2 contributes no new pairs of array indices.

subroutine generate_rdg

Once the array access patterns in a loop have been recorded in a hash table by the procedure eliminate_dup_edges, the procedure generate_rdg can be called on each processor to flatten its local adjacency lists in the hash table into an adjacency list data structure closely related to Compressed Sparse Row (CSR) format. This procedure performs a global scatter operation, resolving collisions by appending lists, and then combines these local lists into a complete graph, also represented in CSR format. This data structure is distributed so that each processor stores the adjacency lists for a subset of the array elements.

Synopsis

generate_rdg(hashindex, local_ind_list, n_local, csr_ptr, csr_col, ncols)

Parameter declarations

Figure: Example output of eliminate_dup_edges. The original figure lists, for each of the two processors, the contents of lhs_ind1, rhs_ind1, lhs_ind2, and rhs_ind2, together with the RDG held in the hash table after the eliminate_dup_edges call for statement S1 and after the call for statement S2. The specific index values are omitted here.

integer hashindex    an integer identifying the hash table which stores the RDG

integer local_ind_list    list of local array descriptors

integer n_local    size of local_ind_list

integer csr_ptr    list of CSR-format pointers pointing into csr_col

integer csr_col    adjacency lists for the indices occurring in local iterations

integer ncols    size of csr_col, returned by generate_rdg

Return value

None

Example

Figure: Example output of generate_rdg. The original figure lists, for each of the two processors, the csr_col and csr_ptr arrays after the flattening step and after the merging step; the specific values are omitted here.

The flattening process groups the edges for each array element together and stores them in a list csr_col. A list of pointers, csr_ptr, identifies the beginning of the edge list for each array element in csr_col. Since the flattening process uses only the local hash table, there is no communication between processors. In the merging process, processor P0 collects the adjacency lists for one block of array indices and processor P1 collects those for the next block, as shown in the figure.

function init_rdg_hash_table

The hash table used in the procedures eliminate_dup_edges and generate_rdg can be initialized by using the procedure init_rdg_hash_table. This procedure is called with the initial number of expected entries in the hash table, for memory allocation. Extra memory space is allocated automatically when the hash table entries overflow the initial allocation.

Synopsis

function init_rdg_hash_table(size)

Parameter declarations

integer size    number of expected entries in the hash table

Return value

An integer identifying the hash table

Loop Iteration Partitioning Procedures

Once the new array distribution has been identified, loop iteration partitioning procedures are called to distribute the loop iterations. By distributing the loop iterations, these procedures balance the computation among processors and reduce off-processor memory accesses. A later figure shows preprocessing code for mapping the loop iterations of the example kernel. In the following sections the loop iteration partitioning procedures dref_rig and iteration_partitioner are discussed.

subroutine dref_rig

To partition loop iterations we use the Runtime Iteration Graph (RIG), which lists, for each loop iteration, the distinct distributed array elements referenced. For example, in the example kernel each loop iteration accesses three different array indices (n1, n2, n3) of the distributed arrays; in this case the RIG holds the list of n1, n2, and n3 for every local iteration. Using the RIG, the procedure dref_rig generates a Runtime Iteration Processor Assignment graph (RIPA), which holds the list of processors that own the array elements in the RIG. A future version of this procedure will support a RIG with a weight associated with each entry in the graph.

Synopsis

dref_rig(ttable, rig, niter, n_ref, ripa)

C Create translation table with the current loop iteration list

ttable buildtranslationtable localiterlist nlocaledges

icount

nref

do i nlocaledges

n localndei

n localndei

n localndei

C S yn yn xn xn xn

C S yn yn xn xn

rigicount n

rigicount n

rigicount n

icount icount

enddo

C Generate runtime iteration graph

call drefrigntable rig nlocaledges nref ripa

C Partition runtime iteration graph

call iterationpartitionerripa nlocaledges nref ltable

C Remap loop iterations

call remapltable localiterlist sched newiterlist newloopsize

call igatherlocalnde localnde sched

call igatherlocalnde localnde sched

call igatherlocalnde localnde sched

Figure: Example code with loop iteration partitioning procedures.

Parameter declarations

integer ttable    an integer identifying the distribution translation table, which describes the distribution of the arrays returned by the partitioner

integer rig    list of distinct indices accessed by each local iteration

integer niter    number of local iterations

integer n_ref    number of unique indices accessed per iteration

integer ripa    list of processors that own each entry in rig

Return value

None

subroutine iteration_partitioner

The current version of the mapper procedure iteration_partitioner takes the RIPA as input and assigns iterations to processors, assigning each iteration to the processor that owns the most data referenced by that iteration.

Figure: Example output of dref_rig and iteration_partitioner. The original figure lists, for each of the two processors, new_ind_list and n_ref, the rig and ripa arrays after dref_rig, and new_iter_list after iteration_partitioner and remap followed by a gather. The specific values are omitted here.

The new loop iteration distribution is described by a translation table.

Synopsis

iteration_partitioner(ripa, niter, n_ref, itable)

Parameter declarations

integer ripa    list of processors that own each entry in rig

integer niter    number of local iterations

integer n_ref    number of unique indices accessed per iteration

integer itable    an integer identifying the distribution translation table

Return value

None

Example

Assume that, as in the preprocessing figures, one block of array elements is distributed to P0 and another block to P1, and that the loop iterations are partitioned initially as shown in the modified example code; the example-output figure above then shows the new loop distribution.

The procedure dref_rig returns a list of processor numbers (ripa) which own the indices referenced in each local iteration. This list is obtained by dereferencing the translation table ttable; ttable describes the new distribution of the arrays. Note that the loop iteration partitioning can only be done after the arrays have been remapped (see the Data Remapping section below), based on the new distribution returned by the partitioner. The procedure iteration_partitioner assigns each loop iteration to a processor that owns the maximum number of indices accessed in that iteration; ties are broken arbitrarily. The procedure returns the loop iteration mapping for its initial list of iterations, local_iter_list, in the form of a translation table, ltable. Each processor then calls the procedure remap to obtain a schedule sched. This schedule can be used to get the new iteration list new_iter_list using the CHAOS data exchanger gather.

Data Remapping

Once the new array distribution has been identified, the array index list must be remapped based on the new distribution. The data arrays associated with the distributed array must also be remapped. The procedure remap takes as input the translation table new_table, describing the new array mapping, and a list of initial array indices local_ind_list. It returns a schedule sched and a new index list new_ind_list. A schedule stores send/receive patterns and can be used to move data among processors using the CHAOS data exchangers described earlier. The returned schedule can be used to remap the data arrays associated with the distributed array descriptors.
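The typical calling sequence, mirroring the parallel preprocessing figure earlier in this section (the variable names, and the use of each array as both source and target of the gather, follow that figure), is:

c     Sketch: remap the local index list according to the new
c     distribution described by new_table, then move the associated
c     data arrays with the returned schedule.
      integer new_table, sched, nind
      integer local_ind_list(1000), new_ind_list(1000)
      double precision x(1000), y(1000)

      call remap(new_table, local_ind_list, sched, new_ind_list, nind)
c     transport the data arrays aligned with the remapped indices
      call dgather(x, x, sched)
      call dgather(y, y, sched)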

subroutine remap

Synopsis

remap(newtable, local_ind_list, sched, new_ind_list, nind)

Parameter declarations

integer newtable    translation table index describing the new mapping

integer local_ind_list    list of initial local indices, in global numbering

integer sched    schedule index, returned

integer new_ind_list    list of new local indices, in global numbering

integer nind    number of new local indices

Return value

None

Example

In the parallel preprocessing figure, the arrays x and y, and any data arrays associated with them, are initially distributed in a conforming manner. The procedure parallel_rsb on each processor returns a new distribution for the initial local array elements of x and y. This distribution is returned in the form of a translation table new_table. The procedure remap returns a new list of indices new_ind_list for the arrays x and y on each processor, based on the new partition. It also returns a schedule sched which is used to remap the data arrays. These associated arrays are actually transported using the CHAOS data exchange procedure gather.

Parallel Partitioners

In this section we present the parallel partitioners that are distributed along with the CHAOS runtime library. The first two partitioners, namely the recursive coordinate bisection and inertial bisection partitioners, use spatial information. The third parallel partitioner, recursive spectral bisection, uses graph connectivity information.

Table: Parallel Partitioners

Partitioner              Input                  Procedure

Recursive coordinate     geometry               CoorBisecMap
bisection                geometry               CoorBisec
                         geometry and weight    CoorWeighBisecMap
                         geometry and weight    CoorWeighBisec

Recursive inertial       geometry               InerBisecMap
bisection                geometry               InerBisec
                         geometry and weight    InerWeighBisecMap
                         geometry and weight    InerWeighBisec

Spectral bisection       connectivity           parallel_rsb

Geometry Based Partitioners

There are two types of geometry partitioners: ones which use computational load information and ones which do not. They are implemented in two fashions: coloring and remapping. In the coloring implementation only the processor assignment of each element is returned, whereas in the remapping implementation the arrays used to specify the geometry information are automatically redistributed, and the size of the data arrays and the indices of the new local elements are returned. Therefore there are four possible versions of each partitioner, as shown in the table above.

Recursive Bisection Coloring

The procedure PREFIXBisecMap returns a processor assignment list maparray. PREFIX can be Coor for the recursive coordinate partitioner or Iner for the inertial bisection partitioner.

PREFIXBisecMap(maparray, ndata, ndim, x, {arg})

Parameters

integer maparray    result of partitioning

integer ndata    number of elements

integer ndim    number of dimensions

double precision x, {arg}    coordinate arrays
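For example, a two-dimensional coordinate-bisection coloring can be requested as follows (a sketch mirroring the appendix example program; the array sizes are placeholders):

c     Sketch: color local elements with the recursive coordinate
c     bisection partitioner using two coordinate arrays (ndim = 2).
      integer nlocal
      integer maparr(1000)
      double precision x(1000), y(1000)

      call CoorBisecMap(maparr, nlocal, 2, x, y)
c     maparr(i) now holds the processor assigned to local element i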

Weighted Recursive Bisection Coloring

Here the partitioners use additional information to partition the data: the computational load at each point is also considered. The load is specified by an array load.

PREFIXWeighBisecMap(maparray, load, ndata, ndim, x, {arg})

Parameters

integer maparray    result of partitioning

integer load    loads of the elements

integer ndata    number of elements

integer ndim    number of dimensions

double precision x, {arg}    coordinate arrays

Recursive Bisection with Remapping

PREFIXBisec(remaplevel, myindex, ndata, ndim, x, {arg})

Parameters

integer remaplevel    remap arrays every remaplevel levels

integer myindex    indices of local elements

integer ndata    number of elements

integer ndim    number of dimensions

double precision x, {arg}    coordinate arrays

The arrays used to specify the geometry information are automatically remapped to the new distribution. The parameter remaplevel is used to specify how often the coordinate arrays should be remapped, i.e., the coordinate arrays will be remapped and moved every remaplevel levels of the bisection.

The parameter myindex is used to keep track of how elements are remapped and moved. Users supply the indices of the current local elements; the partitioner returns the indices of the new local elements after remapping.

Users place the current number of local elements in ndata; the partitioner returns the number of new local elements in ndata. A minimal call sketch is given below.
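A minimal call sketch (the remap level of 1 and the array sizes are illustrative choices only):

c     Sketch: recursive coordinate bisection with remapping.  On entry
c     myindex and ndata describe the current local elements; on return
c     they describe the new local elements, and x and y have been
c     redistributed to match.
      integer ndata
      integer myindex(1000)
      double precision x(1000), y(1000)

c     remap the coordinate arrays at every bisection level
      call CoorBisec(1, myindex, ndata, 2, x, y)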

Weighted Recursive Bisection with Remapping

PREFIXWeighBisec(remaplevel, myindex, load, ndata, ndim, x, {arg})

Parameters

integer remaplevel    remap arrays every remaplevel levels

integer myindex    indices of local elements

integer load    loads of the elements

integer ndata    number of elements

integer ndim    number of dimensions

double precision x, {arg}    coordinate arrays

Recursive Spectral Partitioner

parallel_rsb(myindex, n_local, csr_ptr, csr_col, ntable)

Parameters

integer myindex    indices of local elements

integer n_local    number of local elements

integer csr_ptr    list of CSR-format pointers pointing into csr_col

integer csr_col    adjacency lists for all local indices

integer ntable    distribution returned in the form of a translation table

The CSR-format representation of the RDG (see the section on the runtime data graph) is passed to the procedure parallel_rsb. The parallel partitioner returns a pointer to a translation table, ntable, which describes the new array distribution. The parallel version is based on the sequential single-level spectral partitioner provided by Horst Simon. For efficiency reasons the user might want to use the multilevel version of the partitioner.

References

R. Das, Y.-S. Hwang, M. Uysal, J. Saltz, and A. Sussman. Applying the CHAOS/PARTI library to irregular problems in computational chemistry and computational aerodynamics. In Proceedings of the Scalable Parallel Libraries Conference, Mississippi State University, Starkville, MS. IEEE Computer Society Press, October.

Raja Das, Joel Saltz, and Reinhard von Hanxleden. Slicing analysis and indirect access to distributed arrays. Technical Report CS-TR and UMIACS-TR, University of Maryland, Department of Computer Science and UMIACS, May. Appears in LCPC.

Raja Das, Mustafa Uysal, Joel Saltz, and Yuan-Shin Hwang. Communication optimizations for irregular scientific computations on distributed memory architectures. Technical Report CS-TR and UMIACS-TR, University of Maryland, Department of Computer Science and UMIACS, October. Submitted to the Journal of Parallel and Distributed Computing.

R. v. Hanxleden, K. Kennedy, C. Koelbel, R. Das, and J. Saltz. Compiler analysis for irregular problems in Fortran D. In Proceedings of the Workshop on Languages and Compilers for Parallel Computing, New Haven, CT, August.

Seema Hiranandani, Joel Saltz, Piyush Mehrotra, and Harry Berryman. Performance of hashed cache data migration schemes on multicomputers. Journal of Parallel and Distributed Computing, August.

R. Mirchandaney, J. H. Saltz, R. M. Smith, D. M. Nicol, and Kay Crowley. Principles of runtime support for parallel processors. In Proceedings of the ACM International Conference on Supercomputing, July.

R. Ponnusamy, R. Das, J. Saltz, and D. Mavriplis. The dybbuk. In IEEE COMPCON, San Francisco, February.

Ravi Ponnusamy, Joel Saltz, and Alok Choudhary. Runtime-compilation techniques for data partitioning and communication schedule reuse. In Proceedings of Supercomputing. IEEE Computer Society Press, November. Also available as University of Maryland Technical Report CS-TR and UMIACS-TR.

Ravi Ponnusamy, Joel Saltz, Alok Choudhary, Yuan-Shin Hwang, and Geoffrey Fox. Runtime support and compilation methods for user-specified data distributions. Technical Report CS-TR and UMIACS-TR, University of Maryland, Department of Computer Science and UMIACS, November. Submitted to IEEE Transactions on Parallel and Distributed Systems.

J. Saltz, H. Berryman, and J. Wu. Runtime compilation for multiprocessors. Concurrency: Practice and Experience.

Joel Saltz, Kathleen Crowley, Ravi Mirchandaney, and Harry Berryman. Runtime scheduling and execution of loops on message passing machines. Journal of Parallel and Distributed Computing, April.

Joel H. Saltz, Ravi Mirchandaney, and Kay Crowley. Runtime parallelization and scheduling of loops. IEEE Transactions on Computers, May.

Shamik D. Sharma, Ravi Ponnusamy, Bongki Moon, Yuan-Shin Hwang, Raja Das, and Joel Saltz. Runtime and compile-time support for adaptive irregular problems. Submitted to Supercomputing, April.

A. Sussman, J. Saltz, R. Das, S. Gupta, D. Mavriplis, R. Ponnusamy, and K. Crowley. PARTI primitives for unstructured and block structured problems. Computing Systems in Engineering. Papers presented at the Symposium on High-Performance Computing.

J. Wu, J. Saltz, S. Hiranandani, and H. Berryman. Runtime compilation methods for multicomputers. In Proceedings of the International Conference on Parallel Processing.

A Example Program NEIGHBORCOUNT

A Sequential Version of Program

PROGRAM NeighborCount

CC The purpose of this program is to take a graph and determine each node's
CC in/out degree.  The graph is represented by the arrays X and Y, which give
CC the vertices' placement on a grid, and by two edge-endpoint arrays that
CC hold the end points of each edge.

parameter idim

parameter NE

INTEGER edgeNE edgeNE

INTEGER IODegreeidim Xidim Yidim

C Initialize edges arrays

edge INTrandidim

edge INTrandidim

do i NE

edgeiINTrandidim

edgeiMODINTrandidimedgeiidim

continue

C Initialize XY coord arrays not used in program

X INTrand

Y INTrand

do i idim

Xi INTrand

Yi INTrand

continue

do i idim

IODegreei

continue

C Calculate neighbors

do i NE

IODegreeedgei IODegreeedgei

IODegreeedgei IODegreeedgei

continue

CC The output is in the form of

CC

CC Node IODegree

do i idim

print iIODegreei

tot tot IODegreei

continue

print Total tot

end

A Parallel Version of Program Regular blo ck Distribution

PROGRAM NeighborCount

CC The purpose of this program is to take a graph and determine each node's
CC in/out degree.  The graph is represented by the arrays X and Y, which give
CC the vertices' placement on a grid, and by two edge-endpoint arrays that
CC hold the end points of each edge.
CC In this version of the program the X, Y, edge-endpoint, and IODegree
CC arrays are initialized assuming a block distribution.

include mpiforth

integer idim ne buff

C idim is the number of vertices in the graph

parameter idim

C NE is the number of edges in the graph

parameter NE

C buff is the amount of buffer space being allocated for
C off-processor data storage

parameter buffNE

INTEGER buildregtranslationtable

INTEGER rand

C edgei and edgei hold the endpoints for edge i

INTEGER edgeNEedgeNE

C newedge and newedge hold the adjusted values of the endpoints

C adjusted based on which vertices my processor has and

C where it has them

INTEGER newedgeNE buffnewedgeNE buff

C edge and edge are combined into tempedgearr so tempedgearr can be

C passed to XXXXXXXXXXXX and the adjusted values are returned

C in newtempedgearr

INTEGER tempedgearrNE buff newtempedgearrNE buff

INTEGER IODegreeidim buff

C The X and Y arrays represent the xy coordinates of the vertices

INTEGER Xidim Yidim

INTEGER sch tt count tot

INTEGER BLOCKarridim

INTEGER size loops

call PARTIsetup

procsMPInumnodes

C Each processor must know how many everyone else has to create BLOCKarrs

do i procs

sizei idimprocs

if i lt idim sizeiprocs sizeisizei

continue

mysizesizeMPImynode

do i procs

loopsiNEprocs

if i lt NE loopsiprocs loopsiloopsi

continue

myloopsloopsMPImynode

C Initialize edges arrays

do i myloops

edgeiMODINTrandidim

edgeiMODMODINTrandidimedgeiidim

continue

C Initialize XY coord arrays not used in this program

do i idim

Xi MODINTrand

Yi MODINTrand

continue

C Initialize IODegree Array

do i myloops

IODegreei

continue

C Initialize BLOCK distribution arrays to represent initial

C distribution of the data and indirection arrays

offset

do i MPImynode

offsetoffset sizei

continue

do i mysize

BLOCKarri offset i

continue

C Partition Data

tt buildregtranslationtableidim

C Inspector

CC The values in the edge end point indirection arrays are adjusted using

CC localize to refer to local indices for onprocessor data segments and to

CC buffer space for offprocessor data segments

do i myloops

tempedgearri edgei

tempedgearrimyloops edgei

continue

call localizett sch tempedgearr newtempedgearr myloops

noff mysize

do i myloops

edgei newtempedgearri

edgei newtempedgearrimyloops

continue

C Executor

C Calculate neighbors

do i myloops

IODegreeedgei IODegreeedgei

IODegreeedgei IODegreeedgei

continue

CC After the local portion of the indirection arrays have been polled each

CC processor sends any information which it has stored in its buffer areas to

CC the processor which owns the associated node

call iscatteraddIODegreemysizeIODegreesch

CC The output is in the form of

CC

CC Home Processor Node IODegree

tot

do i mysize

print MPImynodeBLOCKarriIODegreei

tottotIODegreei

continue

CC The following information is printed out for debugging purposes to

CC make sure that the correct number of edge end points were counted

print MPImynodes TOTAL tot

call MPIgisumtotibuf

print Total Total tot

print Total should be NE

end

A Parallel Version of Program Irregular Distribution

PROGRAM NeighborCount

CC The purpose of this program is to take a graph and determine each node's
CC in/out degree.  The graph is represented by the arrays X and Y, which give
CC the vertices' placement on a grid, and by two edge-endpoint arrays that
CC hold the end points of each edge.
CC In this version of the program the X, Y, edge-endpoint, and IODegree
CC arrays are initialized assuming a block distribution.

include mpiforth

parameter idim

parameter NE

parameter buffNE

INTEGER buildtranslationtable

INTEGER edgeNEedgeNE

INTEGER newedgeNE buffnewedgeNE buff

INTEGER arrNE buff newedgeNE buff

INTEGER IODegreeidim buffIODegreeidim buff

DOUBLE PRECISION Xidim Yidim

INTEGER sch sch sch tt tt count tot

INTEGER BLOCKarridim

INTEGER BLOCKarrNE

INTEGER maparridim newdistarridim buff

INTEGER newdistarr NE buff

INTEGER rigNE ripaNE

INTEGER size loops

call PARTIsetup

procsMPInumnodes

C Each processor must know how many everyone else has to create BLOCKarrs

do i procs

sizei idimprocs

if i lt idim sizeiprocs sizeisizei

continue

mysizesizeMPImynode

do i procs

loopsiNEprocs

if i lt NE loopsiprocs loopsiloopsi

continue

myloopsloopsMPImynode

C Initialize edges arrays

edge INTrandidim

edge INTrandidim

do i myloops

edgeiINTrandidim

edgeiMODINTrandidim

edgeiidim

continue

C Initialize XY coord arrays

X INTrand

Y INTrand

do i idim

Xi INTrand

Yi INTrand

continue

C Initialize IODegree Array

do i myloops

IODegreei

continue

C Initialize BLOCK distribution arrays to represent initial

C distribution of the data and indirection arrays

offset

do i MPImynode

offsetoffset sizei

continue

do i mysize

BLOCKarri offset i

continue

offset

do i MPImynode

offsetoffset loopsi

continue

do i myloops

BLOCKarri offset i

continue

C Partition Data

CC The X and Y coordinates are used to create a partitioning scheme for the

CC IODegree data arrays The data arrays are then redistributed from block to

CC the new distribution

call PARTIsetup

call CoorBisecMapmaparr mysize X Y

tt initttablewithproc maparr mysize

C ReMap data from old distribution to new distribution

call remaptt BLOCKarr sch newdistarr newnumlocalind

tt buildtranslationtable newdistarr newnumlocalind

call igatherIODegree IODegree sch

mysizenewnumlocalind

C Partition Loops

CC Then the loops are partitioned using the new distribution information in

CC order to try to maximize onprocessor work The indirection arrays are then

CC gathered so that each processor has the portions of the indirection arrays

CC which correspond to the loop iterations they own

c collect all local ind array values for IODegree into rig

count

do i myloops

rigcount edgei

rigcount edgei

count count

continue

call drefrigtt rig myloops ripa

call iterationpartitionerripa myloops tt

call remaptt BLOCKarr sch newdistarr newmyloops

call igathernewedge edge sch

call igathernewedge edge sch

C Inspector

CC The values in the edge end point indirection arrays are adjusted using

CC localize to refer to local indices for onprocessor data segments and to

CC buffer space for offprocessor data segments

do i newmyloops

arri newedgei

arrinewmyloops newedgei

continue

call localizett sch arr newedge newmyloops

noff mysize

C Executor

C Calculate neighbors

do i newmyloops

IODegreenewedgei IODegreenewedgei

IODegreenewedgeinewmyloopsIODegreenewedgeinewmyloops

continue

CC After the local portion of the indirection arrays have been polled each

CC processor sends any information which it has stored in its buffer areas to

CC the processor which owns the associated node

call iscatteraddIODegreemysizeIODegreesch

CC The output is in the form of

CC

CC Home Processor Node IODegree

tot

do i mysize

print MPImynodenewdistarriIODegreei

tottotIODegreei

continue

CC The following information is printed out for debugging purposes to

CC make sure that the correct number of edge end points were counted

print MPImynodes TOTAL tot

call MPIgisumtotibuf

print Total Total tot

print Total should be NE

end
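
The listing above follows the usual CHAOS phase structure: partition the data, remap it, partition the loop iterations, run the inspector (localize) once, and then execute the loop and push the accumulated buffer values back with iscatter_add. The fragment below is a minimal sketch of just the inspector/executor kernel for a single indirection array, written as a hypothetical subroutine sweep so that the sizes, the work array local_ia and the translation table are supplied by the caller; the calls and their argument order follow the listing, and the double-precision gather is assumed to be named dgather by analogy with the igather call used above.

      subroutine sweep(tt, ia, local_ia, ndata, x, y, nlocal, nbuff)
C  Sketch of a CHAOS inspector/executor pair for the irregular loop
C  y(i) = y(i) + x(ia(i)), i = 1..ndata.  tt is the translation table
C  describing how x is distributed, ia holds global indices, and nbuff
C  is the buffer space reserved past x(nlocal) for off-processor copies.
C  (dgather is assumed by analogy with igather in the listing above.)
      integer tt, ndata, nlocal, nbuff
      integer ia(ndata), local_ia(ndata), sched, noff
      double precision x(nlocal+nbuff), y(ndata)

C  Inspector: translate global references into local and buffer indices
C  and build a communication schedule (done once per new access pattern)
      call localize(tt, sched, ia, local_ia, ndata, noff, nlocal)

C  Gather the off-processor x values into the buffer area past x(nlocal)
      call dgather(x(nlocal+1), x, sched)

C  Executor: every reference is now on-processor
      do 10 i = 1, ndata
         y(i) = y(i) + x(local_ia(i))
 10   continue
      end

When the same access pattern is reused over many sweeps, the schedule produced by localize can be saved and reused, so only the gather and the loop body are repeated.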

B   Example Program: Simplified ISING

B.1   Sequential Version of Program

      program isingsimulation

C  Grid dimensions, movement probability (in percent), particle count,
C  and number of time steps (the values are illustrative)
      parameter (ixdim = 64)
      parameter (iydim = 64)
      parameter (frac = 80)
      parameter (particles = ixdim*iydim)
      parameter (timeunits = 100)

C  This program will simulate a grid of cells and the energy particles
C  within those cells.  The grid will be initialized in a uniformly
C  distributed pattern.  The simulation will allow a particle to
C  move one cell left, right, up or down, or stay in place, per time
C  unit.  For this program it is assumed there is a one-way permeable
C  wall through which particles can leave but not enter the simulation.

      integer x(particles), y(particles)
      integer count
      INTEGER rand

C  Initialize grid, assuming perfect divisibility
      do 10 i = 1, particles
         x(i) = MOD(rand(0), ixdim)
         y(i) = MOD(rand(0), iydim)
 10   continue

C  Do timeunits particle movement cycles
      do 30 k = 1, timeunits
         do 20 i = 1, particles
            if (MOD(rand(0), 100) .lt. frac) then
               ival = MOD(rand(0), 4)
               if (ival .eq. 0) then
                  y(i) = y(i) + 1
               else
                  if (ival .eq. 1) then
                     y(i) = y(i) - 1
                  else
                     if (ival .eq. 2) then
                        x(i) = x(i) + 1
                     else
                        if (ival .eq. 3) x(i) = x(i) - 1
                     endif
                  endif
               endif
            endif
            if (x(i) .lt. 0) x(i) = ixdim - 1
            if (x(i) .gt. ixdim-1) x(i) = 0
            if (y(i) .lt. 0) y(i) = iydim - 1
            if (y(i) .gt. iydim-1) y(i) = 0
 20      continue
 30   continue

C  Calculate the number of particles left in the grid
C  (can do something else instead)
      print *, 'Particles in grid ', particles
      end

B.2   Parallel Version of Program

      program isingsimulation

      include 'mpifort.h'

C  Grid dimensions, movement probability (in percent), particle count,
C  and number of time steps (the values are illustrative)
      parameter (ixdim = 64)
      parameter (iydim = 64)
      parameter (frac = 80)
      parameter (particles = ixdim*iydim)
      parameter (timeunits = 100)

C  This program will simulate a grid of cells and the energy particles
C  within those cells.  The grid will be initialized in a uniformly
C  distributed pattern.  The simulation will allow a particle to
C  move one cell left, right, up or down, or stay in place, per time
C  unit.  For this program it is assumed that when a particle leaves
C  the top it goes to the bottom, and the same for left and right
C  and vice versa.

      integer x(particles), y(particles)
      integer sched, schedule_proc
      integer destproc(particles)
      integer count, part
      INTEGER rand

      call PARTI_setup()
      procs = MPI_numnodes()
      menode = MPI_mynode()
      part = INT(ixdim/procs)
      localloops = particles/procs

C  Random number hack so that all processors don't reuse the same random
C  numbers
      do 10 i = 1, menode*localloops
         l = rand(0)
 10   continue

C  Initialize grid, assuming perfect divisibility
C  NOTE: the x-coordinates are appropriate for the given processor
      do 20 i = 1, localloops
         x(i) = MOD(rand(0), part) + part*menode
         y(i) = MOD(rand(0), iydim)
 20   continue

C  Do timeunits particle movement cycles
      do 40 k = 1, timeunits
         do 30 i = 1, localloops
            if (iprocmap(x(i), ixdim) .ne. menode)
     &         print *, 'Wrong Processor'
            if (MOD(rand(0), 100) .lt. frac) then
               ival = MOD(rand(0), 4)
               if (ival .eq. 0) then
                  y(i) = y(i) + 1
               else
                  if (ival .eq. 1) then
                     y(i) = y(i) - 1
                  else
                     if (ival .eq. 2) then
                        x(i) = x(i) + 1
                     else
                        if (ival .eq. 3) x(i) = x(i) - 1
                     endif
                  endif
               endif
            endif
            if (x(i) .lt. 0) x(i) = ixdim - 1
            if (x(i) .gt. ixdim-1) x(i) = 0
            if (y(i) .lt. 0) y(i) = iydim - 1
            if (y(i) .gt. iydim-1) y(i) = 0

C  Create destination processor array listing which processor each
C  particle should be on for the next time iteration
            destproc(i) = iprocmap(x(i), ixdim)
 30      continue

C  Create a schedule for moving particles between processors
         sched = schedule_proc(destproc, localloops, newlocalloops)
         localloops = newlocalloops

C  Move particles' x and y coordinates to the processor which owns the
C  cell in which each particle resides
         call iscatter_append(x, x, sched)
         call iscatter_append(y, y, sched)

C  Free the memory used by the schedule, since this schedule is never
C  useful again
         call free_sched(sched)
 40   continue

C  Check the number of particles left in the grid
C  (can do something else instead)
      print *, 'Particles on machine ', menode, localloops
      call MPI_gisum(localloops, 1, itemp)
      if (menode .eq. 0) print *, 'Total ', localloops
      end


      function iprocmap(xcoord, xdim)
      integer xcoord, xdim
      iprocmap = INT(xcoord) / (INT(xdim)/MPI_numnodes())
      end
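
The parallel version uses the CHAOS procedures for adaptive problems: at each time step it computes a destination processor for every locally held particle, builds a schedule with schedule_proc, migrates the coordinate arrays with iscatter_append, and frees the schedule immediately because the communication pattern changes from step to step. The fragment below isolates that migration step as a hypothetical subroutine migrate; the calls and their argument order mirror the listing, and it is assumed, as in the listing, that x and y are dimensioned large enough to hold the arriving elements.

      subroutine migrate(destproc, x, y, nlocal)
C  Sketch of one particle-migration step, following the ISING listing
C  above.  destproc(i) names the processor that should own element i
C  after this step; x and y are moved in place and nlocal is updated
C  to the new local count.  The caller must dimension x and y
C  generously enough to hold the appended arrivals.
      integer destproc(*), x(*), y(*), nlocal
      integer sched, newnlocal, schedule_proc

C  Build a schedule from the per-element destination processors
      sched = schedule_proc(destproc, nlocal, newnlocal)
      nlocal = newnlocal

C  Move the coordinate arrays; arriving elements are appended after
C  the elements this processor keeps
      call iscatter_append(x, x, sched)
      call iscatter_append(y, y, sched)

C  The pattern changes every step, so the schedule is not reused
      call free_sched(sched)
      end

Unlike the NeighborCount sweep, no schedule is worth saving here, since the set of elements that move differs at every time step.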