Parallel Program Graphs and Their Classi Cation
Total Page:16
File Type:pdf, Size:1020Kb
Parallel Program Graphs and their Classication Vivek Sarkar Barbara Simons IBM Santa Teresa Lab oratory Bailey Avenue San Jose CA fvivek sarkarsimonsgvnetib mcom Abstract We categorize and compare dierent representations of pro gram dep endence graphs including the Control Flow Graph CFG which is a sequential representation lacking data dep endences the Program Dep endence Graph PDG which is a parallel representation of a se quential program and is comprised of control and data dep endences and more generally the Parallel Program Graph PPG which is a parallel representation of sequential and inherently parallel programs and is comprised of parallel control ow edges and synchronization edges PPGs are classied according to their graphical structure and prop erties related to deadlo ck detection and serializabil i ty Intro duction The imp ortance of optimizing compilers has increased dramatically with the development of multipro cessor computers Writing correct and ecient parallel co de is quite dicult at b est writing such software and making it p ortable cur rently is almost imp ossible Intermediate forms are required b oth for optimiza tion and p ortability A natural intermediate form is the graph that represents the instructions of the program and the dep endences that exist b etween them This is the role played by the Parallel Program Graph PPG The Control Flow Graph CFG in turn is required once p ortions of the PPG have b een mapp ed onto individual pro cessors PPGs are a generalization of Program Dep endence Graphs PDGs which have b een shown to b e useful for solving a variety of problems including optimization vectorization co de generation for VLIW machines merging versions of programs and automatic detection and management of parallelism PPGs are useful for determining the semantic equivalence of parallel programs and also have applications to automatic co de generation In section we dene PPGs as a more general intermediate representation of parallel programs than PDGs PPGs contain control edges that represent parallel ow of control and synchronization edges that imp ose ordering constraints on execution instances of PPG no des MGOTO no des are used to create new threads of parallel control Section contains a few examples of PPGs to provide some intuition on how PPGs can b e built in a compiler and how they execute at run time Section characterizes PPGs based on the nature of their control edges and synchronization edges and summarizes the classications of PPGs according to the results rep orted in sections and Unlike PDGs which can only represent the parallelism in a sequential pro gram PPGs can represent the parallelism in inherently parallel programs Two imp ortant issues that are sp ecic to PPGs are dead lock and serializability A PPGs execution may deadlo ck if its synchronization edges are incorrectly sp ec ied ie if there is an execution sequence of PPG no des containing a cycle This is also one reason why PPGs are more general than PDGs In section we characterize the complexity of deadlo ck detection for dierent classes of PPGs Serialization is the pro cess of transforming a PPG into a semantically equivalent sequential CFG with neither no de duplication nor creation of Bo olean guards It has already b een shown that there exist PPGs that are not serializable which is another reason why PPGs are more general than PDGs In section we further study the serialization problem by characterizing the serializability of dierent classes of PPGs Finally section discusses related work and section contains the conclu sions of this pap er and an outline of p ossible directions for future work Denition of PPGs Control Flow Graphs CFGs We b egin with the denition of a control ow graph as presented in It diers from the usual denition by the inclusion of a set of lab els for edges and a mapping of no de typ es The lab el set fT F U g is used to distinguish b etween conditional execution lab els T and F and unconditional execution lab el U Denition A control ow graph CFG N E TYPE is a ro oted directed cf multigraph in which every no de is reachable from the ro ot It consists of N a set of no des A CFG no de represents an arbitrary sequential computa tion eg a basic blo ck a statement or an op eration E N N fT F U g a set of label led control ow edges cf TYPE a no de typ e mapping TYPEn identies the typ e of no de n as one of the following values START STOP PREDICATE COMPUTE We assume that CFG contains two distinguished no des of typ e START and STOP resp ectively and that for any no de N in CFG there exist directed paths from START to N and from N to STOP Each no de in a CFG has at most two outgoing edges A no de with exactly two outgoing edges represents a conditional branch predicate in that case the no de has TYPE PREDICATE and we assume that the outgoing edges are distin 1 guished by T true and F false lab els If a no de has TYPE COMPUTE then it has exactly one outgoing edge with lab el U unconditional 1 The restriction to at most two outgoing edges is made to simplify the discussion All the results presented in this pap er are applicabl e to control ow graphs with arbitrary outdegree eg when representing a computed GOTO statement in Fortran or a switch statement in C Note that CFG imp oses no restriction on the indegree of a no de 2 A CFG is a standard sequential representation of a program with no par allel constructs A CFG is also the target for co de generation from parallel to sequential co de Program Dep endence Graphs PDGs The Program Dependence Graph PDG is a standard representation of control and data dep endences derived from a sequential program and its CFG Like CFG no des a PDG no de represents an arbitrary sequential computation eg a basic blo ck a statement or an op eration An edge in a PDG represents a control dependence or a data dependence PDGs reveal the inherent parallelism in a program written in a sequential programming language by removing articial sequencing constraints from the program text Denition A Program Dependence Graph PDG N E E TYPE is a cd dd ro oted directed multigraph in which every no de is reachable from the ro ot It consists of N a set of no des E N N fT F U g a set of lab elled control dependence edges cd Edge a b L E identies a control dep endence from no de a to no de cd b with lab el L This means that a must have TYPE PREDICATE or TYPE REGION If TYPE PREDICATE and if as predicate value is evaluated to b e L then it causes an execution instance of b to b e created If TYPE REGION then an execution instance of b is always created E N N Contexts a set of data dependence edges dd Edge a b C E identies a data dep endence from no de a to no de b dd with context C The context C identies the pairs of execution instances of no des a and b that must b e synchronized Direction vectors and distance vectors are common approaches to representing contexts of data dep endences the gist and the lastwrite tree are newer representa tions for data dep endence contexts that provide more information Context information can identify a data dep endence as b eing loopindependent or loopcarried In the absence of any other information the context of a data dep endence edge in a PDG must at least b e plausible ie the execution instances participating in the data dep endences must b e consistent with the sequential ordering represented by the CFG from which the PDG is derived TYPE a no de typ e mapping TYPEn identies the typ e of no de n as one of the following values START PREDICATE COMPUTE REGION The START no de and PREDICATE no de typ es in a PDG are like the corresp onding no de typ es in a CFG The COMPUTE no de typ e is slightly dierent b ecause it has no outgoing control edges in a PDG By contrast every no de typ e except STOP is required to have at least one outgoing edge in a CFG A no de with TYPE REGION serves as a summary no de for a set of control dep endence successors in a PDG Region no des are also useful for factoring sets of control dep endence successors 2 Parallel Program Graphs PPGs The Paral lel Program Graph PPG is a general intermediate representation of parallel programs that includes PDGs and CFGs Analogous to control de p endence and data dep endence edges in PDGs PPGs contain control edges that represent parallel ow of control and synchronization edges that imp ose ordering constraints on execution instances of PPG no des PPGs also contain MGOTO no des that are a generalization of REGION no des in PDGs What distinguishes PPGs from PDGs is the complete freedom in connecting PPG no des with control and synchronization edges to span the full sp ectrum of sequential and parallel programs Control edges in PPGs can b e used to mo del program constructs that cannot b e mo delled by control dep endence edges in PDGs eg PPGs can represent nonserializable parallel programs Synchronization edges in PPGs can b e used to mo del program constructs that cannot b e mo delled with data dep endence edges in PDGs eg a PPG may enforce a synchronization with distance vector from the next iteration of a lo op to the previous iteration as in say Haskell array constructors or Istructures while such data dep endences are prohibited in PDGs b ecause they are not plausible with reference to the sequential program from which the PDG was derived Denition A Paral lel Program Graph PPG N E E TYPE is a cont sy nc ro oted directed multigraph in which every no de is reachable from the ro ot using only control