Parallel Program Graphs and their

Classication

Vivek Sarkar Barbara Simons

IBM Santa Teresa Lab oratory Bailey Avenue San Jose CA

fvivek sarkarsimonsgvnetib mcom

Abstract We categorize and compare dierent representations of pro

gram dep endence graphs including the Control Flow Graph CFG which

is a sequential representation lacking data dep endences the Program

Dep endence Graph PDG which is a parallel representation of a se

quential program and is comprised of control and data dep endences and

more generally the Parallel Program Graph PPG which is a parallel

representation of sequential and inherently parallel programs and is

comprised of parallel control ow edges and synchronization edges PPGs

are classied according to their graphical structure and prop erties related

to deadlo ck detection and serializabil i ty

Intro duction

The imp ortance of optimizing has increased dramatically with the

development of multipro cessor computers Writing correct and ecient parallel

co de is quite dicult at b est writing such software and making it p ortable cur

rently is almost imp ossible Intermediate forms are required b oth for optimiza

tion and p ortability A natural intermediate form is the graph that represents

the instructions of the program and the dep endences that exist b etween them

This is the role played by the Parallel Program Graph PPG The Control Flow

Graph CFG in turn is required once p ortions of the PPG have b een mapp ed

onto individual pro cessors PPGs are a generalization of Program Dep endence

Graphs PDGs which have b een shown to b e useful for solving a variety

of problems including optimization vectorization co de generation for

VLIW machines merging versions of programs and automatic detection

and management of parallelism PPGs are useful for determining the

semantic equivalence of parallel programs and also have applications

to automatic co de generation

In section we dene PPGs as a more general intermediate representation of

parallel programs than PDGs PPGs contain control edges that represent parallel

ow of control and synchronization edges that imp ose ordering constraints on

execution instances of PPG no des MGOTO no des are used to create new threads

of parallel control Section contains a few examples of PPGs to provide some

intuition on how PPGs can b e built in a and how they execute at run

time Section characterizes PPGs based on the nature of their control edges

and synchronization edges and summarizes the classications of PPGs according to the results rep orted in sections and

Unlike PDGs which can only represent the parallelism in a sequential pro

gram PPGs can represent the parallelism in inherently parallel programs Two

imp ortant issues that are sp ecic to PPGs are dead lock and serializability A

PPGs execution may deadlo ck if its synchronization edges are incorrectly sp ec

ied ie if there is an execution sequence of PPG no des containing a cycle This

is also one reason why PPGs are more general than PDGs In section we

characterize the complexity of deadlo ck detection for dierent classes of PPGs

Serialization is the pro cess of transforming a PPG into a semantically equivalent

sequential CFG with neither no de duplication nor creation of Bo olean guards It

has already b een shown that there exist PPGs that are not serializable

which is another reason why PPGs are more general than PDGs In section

we further study the serialization problem by characterizing the serializability

of dierent classes of PPGs

Finally section discusses related work and section contains the conclu

sions of this pap er and an outline of p ossible directions for future work

Denition of PPGs

Control Flow Graphs CFGs

We b egin with the denition of a control ow graph as presented in

It diers from the usual denition by the inclusion of a set of lab els for

edges and a mapping of no de typ es The lab el set fT F U g is used to distinguish

b etween conditional execution lab els T and F and unconditional execution

lab el U

Denition A control ow graph CFG N E TYPE is a ro oted directed

cf

multigraph in which every no de is reachable from the ro ot It consists of

N a set of no des A CFG no de represents an arbitrary sequential computa

tion eg a basic blo ck a statement or an op eration

E N N fT F U g a set of label led control ow edges

cf

TYPE a no de typ e mapping TYPEn identies the typ e of no de n as one

of the following values START STOP PREDICATE COMPUTE

We assume that CFG contains two distinguished no des of typ e START and

STOP resp ectively and that for any no de N in CFG there exist directed paths

from START to N and from N to STOP

Each no de in a CFG has at most two outgoing edges A no de with exactly two

outgoing edges represents a conditional branch predicate in that case the no de

has TYPE PREDICATE and we assume that the outgoing edges are distin

1

guished by T true and F false lab els If a no de has TYPE COMPUTE

then it has exactly one outgoing edge with lab el U unconditional

1

The restriction to at most two outgoing edges is made to simplify the discussion

All the results presented in this pap er are applicabl e to control ow graphs with

arbitrary outdegree eg when representing a computed GOTO statement in

or a switch statement in C Note that CFG imp oses no restriction on the indegree of a no de

2

A CFG is a standard sequential representation of a program with no par

allel constructs A CFG is also the target for co de generation from parallel to

sequential co de

Program Dep endence Graphs PDGs

The Program Dependence Graph PDG is a standard representation of

control and data dep endences derived from a sequential program and its CFG

Like CFG no des a PDG no de represents an arbitrary sequential computation

eg a basic blo ck a statement or an op eration An edge in a PDG represents a

control dependence or a data dependence PDGs reveal the inherent parallelism in

a program written in a sequential by removing articial

sequencing constraints from the program text

Denition A Program Dependence Graph PDG N E E TYPE is a

cd dd

ro oted directed multigraph in which every no de is reachable from the ro ot It

consists of

N a set of no des

E N N fT F U g a set of lab elled control dependence edges

cd

Edge a b L E identies a control dep endence from no de a to no de

cd

b with lab el L This means that a must have TYPE PREDICATE or

TYPE REGION If TYPE PREDICATE and if as predicate value is

evaluated to b e L then it causes an execution instance of b to b e created If

TYPE REGION then an execution instance of b is always created

E N N Contexts a set of data dependence edges

dd

Edge a b C E identies a data dep endence from no de a to no de b

dd

with context C The context C identies the pairs of execution instances

of no des a and b that must b e synchronized Direction vectors and

distance vectors are common approaches to representing contexts of data

dep endences the gist and the lastwrite tree are newer representa

tions for data dep endence contexts that provide more information Context

information can identify a data dep endence as b eing loopindependent or

loopcarried In the absence of any other information the context of a data

dep endence edge in a PDG must at least b e plausible ie the execution

instances participating in the data dep endences must b e consistent with the

sequential ordering represented by the CFG from which the PDG is derived

TYPE a no de typ e mapping TYPEn identies the typ e of no de n as one

of the following values START PREDICATE COMPUTE REGION

The START no de and PREDICATE no de typ es in a PDG are like the

corresp onding no de typ es in a CFG The COMPUTE no de typ e is slightly

dierent b ecause it has no outgoing control edges in a PDG By contrast

every no de typ e except STOP is required to have at least one outgoing edge

in a CFG A no de with TYPE REGION serves as a summary no de for a

set of control dep endence successors in a PDG Region no des are also useful

for factoring sets of control dep endence successors

2

Parallel Program Graphs PPGs

The Paral lel Program Graph PPG is a general intermediate representation

of parallel programs that includes PDGs and CFGs Analogous to control de

p endence and data dep endence edges in PDGs PPGs contain control edges that

represent parallel ow of control and synchronization edges that imp ose ordering

constraints on execution instances of PPG no des PPGs also contain MGOTO

no des that are a generalization of REGION no des in PDGs What distinguishes

PPGs from PDGs is the complete freedom in connecting PPG no des with control

and synchronization edges to span the full sp ectrum of sequential and parallel

programs Control edges in PPGs can b e used to mo del program constructs

that cannot b e mo delled by control dep endence edges in PDGs eg PPGs can

represent nonserializable parallel programs Synchronization edges

in PPGs can b e used to mo del program constructs that cannot b e mo delled

with data dep endence edges in PDGs eg a PPG may enforce a synchronization

with distance vector from the next iteration of a lo op to the previous

iteration as in say Haskell array constructors or Istructures while such

data dep endences are prohibited in PDGs b ecause they are not plausible with

reference to the sequential program from which the PDG was derived

Denition A Paral lel Program Graph PPG N E E TYPE is a

cont sy nc

ro oted directed multigraph in which every no de is reachable from the ro ot using

only control edges It consists of

N a set of no des

E N N fT F U g a set of lab elled control edges Edge a b L

cont

E identies a control edge from no de a to no de b with lab el L

cont

E N N SynchronizationConditions a set of synchronization edges

sy nc

Edge a b f E denes a synchronization from no de a to no de b with

sy nc

synchronization condition f

TYPE a no de typ e mapping TYPEn identies the typ e of no de n as one

of the following values START PREDICATE COMPUTE MGOTO

The START no de and PREDICATE no de typ es in a PPG are just like the

corresp onding no de typ es in a CFG or a PDG The COMPUTE no de typ e

is more general b ecause a PPG compute no de may either have an outgoing

control edge as in a CFG or may have no outgoing control edges as in a PDG

A no de with TYPE MGOTO is used as a construct for creating parallel

threads of computation a new thread is created for each successor of an

MGOTO no de MGOTO no des in PPGs are a generalization of REGION no des in PDGs

We distinguish b etween a PPG no de and an execution instance of a PPG

no de ie a dynamic instantiation of that no de Given an execution instance

I of PPG no de a its execution history H I is dened as the sequence of

a a

PPG no de and lab el values that caused execution instance I to o ccur A PPGs

a

execution b egins with a single execution instance I of the start no de with

star t

H I an empty sequence

star t

A control edge in E is a triple of the form a b L which denes a transfer

cont

of control from no de a to no de b for branch lab el or branch condition L

The semantics of control edges is as follows If TYPEa PREDICATE then

consider an execution instance I of no de a that evaluates no de as branch lab el

a

to b e L if there exists an edge a b L E execution instance I creates

cont a

a new execution instance I of no de b there can b e at most one such successor

b

no de and then terminates itself The execution history of each I is simply

b

H I H I a L where is the sequence concatenation op erator If

b a

TYPEa MGOTO then let the set of outgoing control edges from no de a

with branch lab el L which must b e U b e fa b L a b Lg After

1 k

of each completion execution instance I creates a new execution instance I

a b

i

is target no de b and then terminates itself The execution history of each I

i b

i

H I a L simply H I

a b

i

A synchronization edge in E is a triple of the form a b f which denes

sy nc

a PPG edge from no de a to no de b with synchronization condition f f H H

1 2

is a computable Bo olean function on execution histories Given two execution

instances I and I of no des a and b f H I H I returns true if and only

a b a b

if execution instance I must complete execution b efore execution instance I

a b

can b e started For theoretical reasons it is useful to think of synchronization

condition f as an arbitrary computable Bo olean function however in practice

we will only b e interested in simple denitions of f that can b e stored in at most

O jN j space Note that the synchronization condition dep ends only on execution

histories and not on any program data values Also due to the presence of

synchronization edges it is p ossible to construct PPGs that may deadlo ck 2

A PPG no de represents an arbitrary sequential computation The control

edges of a PPG sp ecify how execution instances of PPG no des are created

unravelled and the synchronization edges of a PPG sp ecify how execution

instances need to b e synchronized A formal denition of the execution semantics

of mgoto edges and synchronization edges is given in

We would like to have a denition for synchronization edges that corre

sp onds to the notion of loopindependent data dep endence edges in sequential

programs If no des a and x are in a lo op in a sequential program and have a

lo opindep endent data dep endence b etween them then whenever x is executed

in an iteration of the lo op an instance of a also must b e executed prior to x

in that same loop iteration However if in some iteration of the lo op x is not

executed say b ecause of some conditional branch then we dont care whether or

not a is executed This is the notion we are trying to capture with our denition

of controlindependent b elow First we need a couple of other denitions

Consider an execution instance I of PPG no de x with execution history

x

H I u L u u u L u x as dened in

x 1 1 i j k k k

We dene nodeprexH I a the nodeprex of execution history H I with

x x

resp ect to the PPG no de a as follows If there is no o ccurrence of no de a

in H I then nodeprexH I a If a x nodeprexH I a

x x x

u L u L u a u a i j k if a x then nodeprexH I a

1 1 i i i j x

H I u L u u u L

x 1 1 i j k k

A control path in a PPG is a path that consists of only control edges A no de

a is a control ancestor of no de x if there exists an acyclic control path from

START to x such that a is on that path

A synchronization edge x y f is controlindependent if a necessary but not

necessarily sucient condition for f H I H I true is that nodeprexH I a

a b x

nodeprexH I a for all no des a that are control ancestors of b oth x and

y

y

Examples of PPGs

In this section we give a few examples of PPGs to provide some intuition on

how PPGs can b e built in a compiler and how they execute at runtime

Figure is an example of a PPG obtained from a threadbased parallel

programming mo del that uses create and terminate op erations This PPG

contains only control edges it has no synchronization edges The create op

eration in statement S is mo deled by an MGOTO no de with two successors

no de S which is the target of the create op eration and no de S which is the

continuation of the parent thread The terminate op eration in statement S is

mo deled by making no de S a leaf no de ie a no de with no outgoing control

edges

Figure is another example of a PPG that contains only control edges and is

obtained from a threadbased parallel programming mo del This is an example

of unstructured parallelism b ecause of the goto S statements which cause

no de S to b e executed twice when predicate S evaluates to true The PPG

in Figure is also an example of a nonserializable PPG Because no de S can

b e executed twice it is not p ossible to generate an equivalent sequential CFG

for this PPG without duplicating co de or inserting Bo olean guards With co de

duplication this PPG can b e made serializable by splitting no de S into two

copies one for parent no de S and one for parent no de S

Figure is an example of a PPG that represents a parallel sections

construct which is similar to the cobegincoend construct This PPG

contains b oth control edges and synchronization edges The start of a parallel

sections construct is mo delled by an MGOTO no de with three successors no de

S which is the start of the rst parallel section no de S which is the start of

the second parallel section and no de S which is the continuation of the parent

thread The goto S statement at the b ottom of the co de fragment is simply

mo delled by a control edge that branches back to no de S PARALLEL THREADS EXAMPLE

START

U

S1: . . . S1

S2: create S6 U S3: . . . S4: . . . S2 [mgoto]

S5: terminate U U S6: . . .

. . . . S3 S6

UU .

S4

Fig Example of PPG with only control edges

The semantics of parallel sections requires a synchronization at the end

sections statement This is mo delled by inserting a synchronization edge from

no de S to no de S and another synchronization edge from no de S to no de S

The synchronization condition f for the two synchronization edges is dened in

Figure as a simple function on the execution histories of execution instances

IS IS IS of no des S S S resp ectively For example the synchronization

condition for the edge from S to S is dened to b e true if and only if H I S

concatH I S S U this condition identies the synchronization edge as

b eing controlindep endent In practice a controlindep endent synchronization

is implemented by simple semaphore op erations without needing to examine the

execution histories at runtime

As an optimization one may observe that the threeway branching at the

MGOTO no de S in Figure can b e replaced by twoway branching without

any loss of parallelism This is b ecause no de S has to wait for no des S and

S to complete b efore it can start execution A more ecient PPG for this

example can b e obtained by deleting the control edge from no de S to no de

S and then replacing the synchronization edge from no de S to no de S EXAMPLE OF UNSTRUCTURED PARALLELISM

START

U S1: if ( ) mgoto S2, S3 S1 S5: . . . TF . . . .

terminate S5 S2: . . . S2.t [mgoto] goto S4 U U U S3: . . . . goto S4

S4: . . . S2 S3 terminate UU

S4

Fig Example of unstructured PPG with only control edges

by a control edge This transformed PPG would contain the same amount of

ideal parallelism as the original PPG but would create two threads instead

of three and would p erform one synchronization instead of two for a given

2

instantiation of the parallel sections construct This idea is an extension of

the control sequencing transformation used in the PTRAN system to replace a

data synchronization by sequential control

Further if the granularity of work b eing p erformed in the parallel sections

is to o little to justify creation of parallel threads the PPG in Figure can b e

transformed into a sequential CFG which is a PPG with no MGOTO no des and

no synchronization edges this transformation is p ossible b ecause the PPG in

Figure is serializable Therefore we see that PPG framework can can supp ort

a wide range of parallelism from ideal parallelism to useful parallelism to no

2

Note that if b oth synchronization edges coming into S were replaced by control

edges the resulting PPG would p erform no de S twice for a given instantiation of

the parallel sections construct which is incorrect EXAMPLE OF PARALLEL SECTIONS (COBEGIN−COEND)

control edge START

synchronization edge U

S0 [mgoto]

S0: PARALLEL SECTIONS U U U SECTION S1: . . . S2: . . . S1 S3

SECTION U U S3: . . . S4: . . . S2 S4 S5: END SECTIONS

. . . . f(H(I.S2), H(I.S5)) f(H(I.S4), H(I.S5)) goto S0 S5 . . f(H(I.S4), H(I.S5)) := H(I.S4) = concat(H(I.S5), )

f(H(I.S2), H(I.S5)) := H(I.S2) = concat(H(I.S5), )

Fig Example of structured PPG with control and synchronizatio n edges

parallelism As future work it would b e interesting to extend existing algorithms

for selecting useful parallelism from PDGs eg to op erate more generally

on PPGs

Classication of PPGs

Classication criteria

In this subsection we characterize PPGs based on the nature of their control

edges and synchronization edges

Control edges The control edges of PPGs can b e acyclic or they can have lo ops

which in turn can b e reducible or irreducible For reducibility we extend the

denition used for CFGs so that it is applicable to PPGs as well a PPG is

reducible if each strongly connected region of its control edges has a single entry

no de otherwise a PPG is said to b e irreducible This classication of

a PPGs control edges as acyclicreduciblearbitrary is interesting b ecause of

the characterizations that can then b e made Note that each class is prop erly

contained within the next with irreducible PPGs b eing the most general case

Synchronization edges PPGs can also b e classied by the nature of their syn

chronization edges as follows

No synchronization edges

A CFG is a simple example of a PPG with no synchronization edges Fig

ures and in section are some more examples of PPGs that have no

synchronization edges

controlindep endent synchronizations

In this case all synchronization edges are assumed to have synchronization

conditions that cause only execution instances from the same lo op iteration

3

to b e synchronized For example the two synchronizations in Figure are

controlindep endent

General synchronizations

In general the synchronization function can include controlindep endent

synchronizations lo opcarried synchronizations etc

Predicateancestor condition

Denition Predicateancestor condition predanc cond If two control edges

terminate in the same no de then the least common control ancestor obtained

by following only control edges of the source no des of b oth edges is a predicate

no de

Nopostdominator condition

Denition Nop ostdominator condition nop ostdom cond let P b e a pred

icate and S a control descendant of P then S do es not p ostdominate P if there

is some path from P to a leaf no de that excludes S The nop ostdominator

condition assumes that for every descendant S of predicate P S do es not p ost

dominate P

Characterization of PPGs based on the Deadlo ck Detection

Problem

Table classies PPGs according to how easy or hard it is to detect if a PPG

in a given class can deadlo ck A detailed discussion of these results is presented

in section

Characterization of PPGs based on Serializability

One of the primary distinctions b etween a CFG and a PDG or PPG is that

the CFG is already serialized while neither the PDG or the PPG is serialized

Table classies PPGs according to how easy or hard it is to detect if a PPG

in a given class is serializable A detailed discussion of these results is presented

in Section Note that serializability is a stronger condition than deadlo ck

detection b ecause a serializable program is guaranteed never to deadlo ck

3

We assume START and STOP are part of a dummy lo op with one iteration

Control edges Sync edges Characterization

PPGs in this class will never deadlo ck b ecause they

Arbitrary None

have no synchronization edges Lemma

PPGs in this class will never deadlo ck b ecause it is

Acyclic in con

not p ossible for a cyclic wait to o ccur at runtime

junction with con Acyclic

Lemma

trol edges

Control

indep endent no

Reducible PPGs in this class will never deadlo ck Lemma

synchronization

backedges

This is a sp ecial case of PPGs with acyclic control

Acyclic satises

edges There are known p olynomial time algorithms

control

predanc and no

that can determine whether such a PPG is serializable

indep endent

p ostdominator

If it is serializabl e then it is guaranteed to not

conditions

deadlo ck

The problem of determining an assignment of truth

values to the predicates that causes the PPG to dead

control lo ck is shown to b e NPhard It is easy to sp ecify an

Acyclic

indep endent exp onentialtime algorithm for solving the problem by

enumerating all p ossible assignments of truth values

to predicates

The problem of detetecting deadlo ck in arbitrary

PPGs is a very hard problem We conjecture that it is Arbitrary Arbitrary

undecidable in general

Table Characterization of deadlo ck detection problem for PPGs

Deadlo ck detection

The intro duction of synchronization edges makes deadlo ck p ossible Deadlo ck

o ccurs if there is a cyclic wait so that every no de in the cycle is waiting for

one of the other no des in the cycle to pro ceed Formally we say that a PPG

dead locks if there is an execution instance dynamic instantiation v of a no de

i

v that instance is never executed scheduled no other new execution instances

are created and no other no de is executed We refer to v as a dead locking node

and v as a dead locking instance there might b e multiple deadlo cking no des and

i

instances Note that our denition of deadlo ck implies that there is no innite

lo op The motivation for this denition is to distinguish b etween deadlo ck and

nontermination

Given a synchronization edge u v f and an execution instance v of v we

i

say that u v f is irrelevant for v if for all execution instances u of u that are

i j

H i false and for all u such that f H I H i created f H I

v j u v u

i j i j

true no execution instance of u can ever b e created u v f is relevant for v

j i

if it is not irrelevant for v i

Control edges Sync edges Characterization

Section outlines known algorithms for determining

Reducible satisfy

whether or not a PPG in this class is serializabl e

control

predanc and no

without no de duplication or creation of Bo olean

indep endent

p ostdominator

guards

conditions

It is known that PPGs that do not satisfy the predanc

Do

condition are not serializabl e without no de duplica

None

not satisfy pred

tion or creation of Bo olean guards

anc condition

Table Characterization of serializabi l ity detection of PPGs

Lemma A PPG with no synchronization edges wil l never dead lock

Pro of Follows from the denitions of control edges and deadlo ck ut

A PPG G is acyclic if it is acyclic when b oth the control and the synchro

nization edges are considered

Lemma Let G N E be an acyclic PPG Then G never dead locks

Pro of The pro of is by induction on the size of N If jN j then G

consists of only START and will never deadlo ck Supp ose for induction that the

lemma holds for all PPGs with k or fewer no des and consider G N E with

jN j k Supp ose v N is a deadlo cking no de with deadlo cking instance v

i

By lemma there must b e a synchronization edge u v f that is relevant for v

i

and that is causing v to deadlo ck Since G is acyclic u v By the denition

i

of deadlo ck v is created but not executed Supp ose v is the only instance of

i i

v that is created If u or some predecessor of u is a deadlo cking no de then we

get a contradiction to the induction assumption by considering the subgraph

of G that contains all no des except for v So either there are no instances of u

created or all instances u of u that are created are executed Since u v f is

j

relevant either f will evaluate to false or u v f is not a deadlo cking edge for

j i

all created u

j

If multiple instances of v are created then since there are no synchronization

edges b etween any pair of instances of v the instances are all indep endent

Consequently the ab ove argument holds for any deadlo cking instance of v ut

A cycle is a path of directed synchronization andor control edges such that

the rst and last no des on the path are identical Given cycle C no de a C

is an entry point of C if there exists some path of all control edges from the

ro ot to a such that contains no other no des from C A cycle is singleentry if

it contains only one entry p oint otherwise it is multipleentry

Synchronization edge v u f or control edge v u is a backedge if u is a

control ancestor of v Note that the denition of backedge can include selflo ops

ie a synchronization edge v v f

Lemma Let G N E be a PPG which contains only controlindependent

synchronization edges Suppose also that there are no synchronization backedges

0 0 0

and that al l cycles are singleentry Let G N E be the PPG that is obtained

0 0

from G by deleting al l backedges in G ie N N and E E Then for al l

a x N if a is a control ancestor of x in G then a is a control ancestor of x

0

in G

Pro of The pro of is an induction on the numb er of backedges removed from

G Initially supp ose that G has only a single backedge u v a is a control

0

ancestor of x in G but a is no longer a control ancestor of x when G is created

from G by removing u v By the denition of control ancestor there exists

some acyclic control path in G from START to x that contains a Because a

x

0

is not a control ancestor of x in G we must have u v By the denition

x

of a PPG and backedges there must b e an acyclic control path from START

v

to v such that u

v

Case a Since contains a subpath from v to x a implies that

v x v

there is a control path from a to x that do es not contain u v Consequently a

remains a control ancestor of x after the removal of u v a contradiction

Case a Because u v is a backedge v is a control ancestor of u

v

and consequently v is the entry p oint of some cycle containing u v Because

a there must b e a no de on that is another entry p oint to that cycle

v x

Consequently u v b elongs to a multipleentry cycle a contradiction

If G has k backedges we instead consider G which is constructed from G by

removing backedges such that the set of control ancestors of each no de remains

unchanged and the removal of any additional backedge would cause some no de

to lo ose a control ancestor We then apply the ab ove argument to G ut

Lemma Let G N E be a PPG and assume that al l cycles in G are

singleentry al l synchronization edges are controlindependent and none is a

backedge Then G wil l not dead lock

Pro of Let G b e the graph that it obtained from G by removing all back

edges from G By lemma every no de in N has the same set of control ancestors

in G and in G Since all backedges are control edges G and G have the same

synchronization edges By lemma G never deadlo cks

Assume for contradiction G deadlo cks and let v N b e a deadlo cking

no de in G with deadlo cking instance v such that no predecessor of v in G

i

is a deadlo cking no de in G By lemma there exists a synchronization edge

u v f that is relevant for v and which is causing v to deadlo ck If v is

i i

not part of a cycle in G then the arguments of lemma hold If v is part

of a cycle then since u is a predecessor of v in G no instance u of u that is

j

created deadlo cks Consequently since u v f is relevant there must b e some

true but u is not created It H I instance u of u such that f H I

j v j u

i j

follows from the denition of controlindep endent synchronization edges that

a for all no des a that are control a nodeprexH I nodeprexH I

v u

i j

ancestors of b oth u and v Therefore if v is created so is u and consequently

j i i j

v cannot deadlo ck ut

i

Theorem If al l cycles are singleentry the presence of a synchronization

edge that is a potential backedge or is not a controlindependent edge is necessary

for dead lock to occur

Pro of Follows from lemma ut

Let C b e an multipleentry cycle with entry p oints x x x We dene

1 2 k

to b e the singleentry cycle that is obtained by deleting all edges y x j C

j x

i

is a singleentry cycle with entry point x i such that y C We say that C

i x

i

A backedge Edge v x is a potential backedge if v x is a backedge of C

i i x

i

is also a p otential backedge

We conjecture that lemma and theorem hold but we have not yet

formalized the pro of

Lemma Conjecture Suppose G contains multipleentry cycles If al l syn

chronization edges are controlindependent and none is a potential backedge

then G cannot dead lock

Theorem Conjecture The presence of a synchronization edge that is a

potential backedge or is not a controlindependent edge is necessary for dead lock

to occur

We now show that determining if there exists an execution sequence that

deadlo cks in a PPG with acyclic control edges is NPhard The transformation

is from SAT Given a Bo olean expression E a a a a a

i j k i j

a am am am with each clause c ap ap ap containing

k i j k p i j k

literals p m Assume all the literals that o ccur in E are either p ositive

or negative forms of variables a a a Is there a truth assignment to the

1 2 n

variables predicates such that E evaluates to true

Theorem Determining if there exists an execution sequence that contains

a dead locking cycle is NPhard for a PPG with an acyclic control dependence

graph

Proof To prove that the problem is NPhard let E b e an instance of SAT

with n variables a a a and m clauses c c c We assume that

1 2 n 1 2 m

there is some o ccurrence of b oth the true and false literals of every variable in E

since otherwise we could just set the truth assignment of the variable equal to

the value that makes all of its literals true and eliminate all clauses containing

that literal

We construct a PPG G with a START no de n variable predicate no des

n MGOTO no des and m clause COMPUTE no des We rst describ e

the control edges of G START has an MGOTO no de as its only child That

MGOTO no de has the n predicate no des as its children Each predicate no de

corresp onds to one of the variables that o ccur in E If A is the predicate no de i

corresp onding to the variable a then the true branch of A has an MGOTO

i i

no de as its child and the false branch of A has a second MGOTO no de as its

i

child Let c b e a clause in E with corresp onding COMPUTE no de C in G and

r r

supp ose that C a a a Then there are control edges from the MGOTO

r i j k

children of the true edges of A and A and from the false edge of A to C In

i k j r

general if a o ccurs in c in its true false form then there is an edge from the

i j

MGOTO descendant of the true false branch of A to C in G

i j

The synchronization edges are directed from C to C for i m and

i i+1

from C to C The synchronization condition f for the synchronization edges

m 1

is ALL ie all instances of the source no de must complete execution b efore any

instance of the destination no de can start execution For each COMPUTE no de

the wait op erations for all incoming synchronization edges must b e completed

b efore the signal op eration for any outgoing synchronization edges can b e p er

formed So a COMPUTE no de will send its synchronization signal to its neighb or

only after having received a similar signal from its other neighb or

If all COMPUTE no des have at least one instantiation then the execution

sequence for G will deadlo ck If one of the COMPUTE no des C is not executed

i

at all then no instance of C needs to wait for synchronization messages

i+1

from C and so the instances of C will complete execution and send their

i i+1

synchronization messages to instances of C which will then send messages to

i+2

instances of C and so on

i+3

If there is a truth assignment to the variables of E so that E is satised then

for that truth assignment each COMPUTE no de will b e executed at least once

and deadlo ck will o ccur Conversely if G deadlo cks then we can obtain a truth

assignment to the variables of E such that E is satised 2

For each clause no de there are three no de disjoint paths from the MGOTO

child of START to that clause no de Therefore the PPG used in the ab ove

4

reduction do es not satisfy the predanc cond The eect of the predanc cond

is that in a PPG with acyclic control edges a no de is executed only zero or

one times In section Figures and are examples of PPGs that satisfy the

ancprec cond while Figure is an example of a PPG that do es not satisfy the

ancprec cond As we show in section the predanc cond is a necessary but

not sucient condition for a PPG to b e serializable without no de duplication or

the intro duction of guard variables Furthermore if the predicateancestor and

nop ostdominator conditions hold then determining whether or not a PPG can

b e serialized without no de duplication can b e done in p olynomial time

An obvious question is if the predanc cond holds can deadlo ck detection

also b e done in p olynomial time One observation is that if the PPG is found

to b e serializable then we can assert that it will not deadlo ck So the reduced

question is if the predanc condition holds and the PPG is known to b e non

serializable can deadlo ck detection also b e done in p olynomial time Since we

dont have the answer to that question a p ossibly easier but related question

is the following Given a set of PPG no des none of which is a control ancestor

of the other and assuming that the predanc cond holds can we determine in

4 But it satises the no p ostdominance rule

p olynomial time if there exists an execution sequence such that all the no des in

the set are executed We are also unable to answer this question but we can

give a necessary and sucient characterization for the existence of an execution

sequence in which all the no des in the set are executed This characterization

implies a fast algorithm for detecting b ogus edges in an acyclic PPG that

satises the predanc cond

Lemma Let G be an acyclic PPG and assume that the predanc cond holds

Let S be a set of nodes in G such that none is a control ancestor of the other

Then there exists an execution sequence of G in which al l the nodes in S are

executed if and only if there is a rooted spanning tree T not necessarily spanning

al l the nodes of G consisting of only control edges of G such that al l nodes in T

with multiple outedges are MGOTO nodes and al l nodes in S are leaves in T

Proof If such a T exists then since the ro ot of T is reachable from START

and since all outedges of an MGOTO no des are always taken then there exists

an execution sequence for which all the no des of S are executed Conversely

supp ose that there exists an execution sequence such that all the no des of S are

0

executed Let G consist of only the no des and edges in G that are executed in

that execution sequence Because the predanc cond implies that every no de is

executed zero or one time if two no des X Y S have a predicate least common

0

control ancestor P in G then since P is executed at most once at most one

1 1

of X and Y will b e executed whenever P is executed If X and Y have two

1

predicate least common ancestors P and P then there must b e two no de

1 2

disjoint paths to each of X and Y one containing P and the other containing

1

P Therefore by the predanc cond the least common ancestor of P and P

2 1 2

must b e a predicate P and at most one of P and P and consequently X

3 1 2

and Y will b e executed whenever P is executed It follows from induction that

3

0

the least common control ancestor in G of any pair of no des in S must b e an

MGOTO no de A similar argument shows that any pair of these MGOTO either

0

lie in a path in G or have another MGOTO no de as their least common ancestor

Finally the conguration of the MGOTO no des and the no des in S must b e a

tree since otherwise some MGOTO no de would violate the predanc cond 2

Denition A synchronization edge a b f is said to b e bogus if there is

no execution sequence of the PPG in which it is executed ie there are no p ossible

instantiations I and I of no des a and b that satisfy f H I H I true and

a b a b

that can b e obtained from the same execution of the PPG

Corollary If an acyclic PPG satises the predanc cond then al l bogus

synchronization edges can be detected in O jE j jE j time where jE j

sy nc cont sy nc

is the number of synchronization edges and jE j is the number of control edges

cont

in G

Proof Lemma implies that a synchronization edge u v f is b ogus if and

only if there is no MGOTO no de which is the least common ancestor of b oth

u and v in G Consequently a simple backtracking search from each endp oint

of the synchronization edge b eing tested is sucient to determine if the two

endp oints have an MGOTO least common ancestor 2

Serializability of PPGs

After a PPG has b een used to optimize and p ossibly parallelize co de the opti

mized co de frequently must b e mapp ed into sequential target co de for the indi

vidual pro cessors An equivalent problem can arise when trying automatically

to merge dierent versions of a program which have b een up dated in parallel by

several p eople

Serialization customarily involves transforming the PPG to a CFG from

which the sequential co de is then derived Whenever p ossible the CFG is con

structed without duplicating no des or inserting Bo olean guard variables No de

duplication is undesirable b ecause of p otential space requirements which can cre

ate cache and page faults and the intro duction of guard variables is undesirable

b ecause of the additional time and space needed to store and test such variables

The mo del

In this section we analyze a restricted mo del of a PPG as dened in

We allow only the MGOTO no de to have multiple control inedges This is not

a restriction on the mo del since if a PREDICATE or COMPUTE no de a has

multiple control inedges we can always direct the inedges into a new MGOTO

no de f which is then made the unique parent of a We also assume that all

COMPUTE no des are leaves in the control subgraph of the PPG and that all

predicates have precisely two outgoing control edges one lab elled true and the

other lab elled false

Finally we assume that predanc condition holds This is indeed a restriction

since if the predanc cond do es not hold then any CFG constructed from a

PPG that violates the predanc cond provably requires no de duplication as is

illustrated in gure In the example of gure the least common ancestor of

two paths terminating at s is MGOTO no de f Therefore there is an execution

sequence in which s is executed twice and any sequential representation will

have two copies of s

Since a CFG has no parallelism construction of a CFG from a PPG involves

the elimination of MGOTO no des and the merging of the subgraphs of their

children For the remainder of this section we assume as input a PPG that

satises the predanc cond and the no de restrictions discussed ab ove

Even if the predanc cond is satised there are PPGs for which no de dupli

cation or guard variable insertion is unavoidable

The Control Dep endence Graph CDG is the subgraph of the PPG that

represents only the control dep endences or control edges The CDG encapsu

lates the diculty of translating the PPG into a CFG The result presented in

f p

t f

p p s s

t f t f

s s s p

t f

s s

Fig A PPG subgraph and equivalent sequential CFG subgraph with duplicatio n

this section also assumes that the CDG satises the no p ostdominator condition

dened in section

In order to determine when a PPG can b e serialized we need some notion

of equivalence b etween a CDG and a CFG One approached taken by Ball

0

and Horwitz is to say that a CFG G is a corresponding CFG of CDG G if

0

and only if the CDG that is constructed from G is isomorphic to G with the

isomorphism resp ecting no de and edge lab els We have chosen a denition that

do es not dep end on isomorphism

0 0

Given two graphs G and G where G is an acyclic CDG and G is a CFG

0

we say that G is semantical ly equivalent to G if

0

All PREDICATE and COMPUTE no des in G are copies of no des in G and

0

every PREDICATE or COMPUTE no de in G has at least one copy in G

For every truth assignment to the predicates the PREDICATE and COM

0

PUTE no des that are executed in G are copies of exactly those PREDICATE

and COMPUTE no des executed in G If PREDICATE or COMPUTE no de

a is executed ca times in G under some truth assignment then the numb er

0 5

of times a copy of a is executed in G is equal to ca

Because a CDG G has no p ostdominance one can prove that the following

0

that corresp onds to an acyclic CDG G that condition must hold in any CFG G

satises the predanc cond

Supp ose a and b are PREDICATE or COMPUTE no des in G Then if a

0

precedes b in G then no copy of b precedes a copy of a in G

The denition of semantic equivalence is extended to cyclic graphs with

reducible lo ops by rst applying it to the acyclic graphs in which the lo ops

are replaced by sup erno des and then applying the denition recursively to the

acyclic subgraphs within corresp onding sup erno des

5

If the PPG satises the predanc cond then ca

Algorithms for constructing a CFG from a PDG

There are two algorithms for constructing a CFG from a PDG each of which

uses a somewhat dierent mo del for the PDG Both assume that all lo ops in

the PDG are reducible and that the nop ostdominator condition holds b oth

have a running time of O ne where n is the numb er of no des and e is the

numb er of edges in the PDG The complexity involves the analysis of the control

dep endences

Constructing a CFG and reconstituting a CDG The algorithm presented in

0

determines orderings that must exist in any CFG G that corresp onds to the

6 0

CDG G If the algorithm fails to construct a CFG G or if it constructs a

CFG that do es not corresp ond to G ie for which there is not a CDG that

is isomorphic to G then the algorithm fails Although the algorithm do es not

explicitly deal with data dep endences it can b e mo died to do so

The diculty with the ab ove approach is that when the algorithm fails it

do es not provide any information as to why it fails

Algorithm Sequentialize Algorithm Sequentialize is based on analysis

0

that states that a CFG G can b e constructed from a PDG G without no de

duplication if and only if there is no forbidden subgraph The intuition b ehind

the forbidden subgraph is that duplication is not required if and only if there is

no set of forced execution sequences which would create a cycle in the acyclic

p ortion of the graph A forbidden subgraph implies the existence of such a cycle

A cycle can also exist if data dep endences contradict an ordering implied by the

control dep endences When duplication is required the algorithm can b e used

as the basis of a heuristic that duplicates subgraphs or adds guard variables

Related Work

The idea of extending PDGs to PPGs was intro duced by Ferrante and Mace in

extended by Simons Alp ern and Ferrante in and further extended

by Sarkar in The denition of PPGs used in this pap er is similar to the

denition from the only dierence is that the denition in used mgoto

edges instead of mgoto nodes The PPG denition used in imp osed

several restrictions a the PPGs could not contain lo opcarried data dep endences

synchronizations b only reducible lo ops were considered c PPGs had to

satisfy the nop ostdominator rule and d PPGs had to satisfy the predanc

rule The PPGs dened in this pap er have none of these restrictions Restrictions

a and b limit their PPGs from b eing able to represent all PDGs that can

b e derived from sequential programs restriction a is a serious limitation in

practice given the imp ortant role played by lo opcarried data dep endences in

representing lo op parallelism Restriction c limits their PPGs from b eing

6

The mo del presented in diers somewhat from that of However the

mo dels are essentially equivalent

able to represent CFGs Restriction d limits their PPGs from representing

threadbased parallel programs in their full generality eg the PPG from Figure

cannot b e expressed in their PPG mo del b ecause it is p ossible for no de S to b e

executed twice

The Hierarchical Task Graph HTG prop osed by Girkar and Polychronop ou

los is another variant of the PPG applicable only to structured programs

that can b e represented by the hierarchy dened in the HTG

One of the central issues in dening PDGs and PPGs is in dening their paral

lel execution semantics The semantics of PDGs has b een examined in past work

by Selke by Cartwright and Felleisen and by Sarkar In Selke presented an

op erational semantics for PDGs based on graph rewriting In Cartwright

and Felleisen presented a denotational semantics for PDGs and showed how

it could b e used to generate an equivalent functional dataow program Both

those approaches assumed a restricted programming language the language W

and presented a valueoriented semantics for PDGs The valueoriented nature

of the semantics made it inconvenient to mo del storeoriented op erations that

are found in real programming languages like an up date of an array element or

of a p ointerdereferenced lo cation Also the control structures in the language

W were restricted to ifthenelse and whiledo the semantics did not cover

programs with arbitrary control ow In contrast the semantics dened by

Sarkar is applicable to PDGs with arbitrary unstructured control ow

and arbitrary data read and write accesses and more generally to PPGs as

dened in this pap er

The serialization problem has b een addressed in some prior work Generating

a CFG from the CDGs of wellstructured programs is rep orted in such

CDGs are trees and do not require any duplication The IBM PTRAN system

generates CFGs from the CDGs of unstructured ie not necessarily

trees FORTRAN programs The work presented in uses assumptions

ab out the structure of CFGs derived from sequential programs thereby limiting

its applicability The results presented in consider the general problem in an

informal manner by determining the order restrictions that had to hold for region

no de children of a region no de

Conclusions and Future Work

In this pap er we have presented denitions and classications of Parallel Pro

gram Graphs PPGs a general parallel program representation that also en

compasses PDGs and CFGs We b elieve that the PPG has the right level of

semantic information to match the programmers way of thinking ab out parallel

programs and that it also provides a suitable execution mo del for ecient

implementation on real architectures The algorithms presented for deadlo ck

detection and serialization for sp ecial classes of PPGs can b e used to relieve the

programmer from p erforming those tasks manually The characterization results

can b e viewed as also identifying PPGs that are easier for programmers to work

with eg PPGs for which deadlo ck detection andor serializability are hard to

p erform are quite likely to b e PPGs that are hard for programmers to understand

and that lead to programming errors

We see several p ossible directions for future work based on the results pre

sented in this pap er

Extend sequential program analysis techniques to the PPG

All program analysis techniques currently used in compilers eg constant

propagation induction variable analysis computation of Static Single As

signment form etc op erate on a sequential control ow graph representa

tion of the program There has b een some recent work in the area of program

analysis in the presence of structured parallel language constructs

However just as the control ow graph is the representation of choice due

to its simplicity and generality for analyzing sequential programs it would

b e desirable to use the PPG representation as a simple and general repre

sentation for parallel programs

In particular we are interested in extending to PPGs the notions of dom

inators and control dep endence that have b een dened for CFGs These

extensions should provide us with a way of automatically transforming a

given PPG into a form that satises the nop ostdominator condition so that

assuming the nop ostdominator condition will not restrict the applicability

of the results obtained from that assumption

Investigate the complexity of dead lock detection for acyclic PPGs that satisfy

the predanc and nopostdominator conditions

In section we showed that the problem of detecting deadlo ck in PPGs with

acyclic control edges is NPhard Moreover for PPGs with acyclic control

edges that satisfy the predanc and nop ostdominator conditions section

provides a p olynomialtime algorithm for determining whether or not the

PPG is serializable We have observed that serializable PPGs cannot dead

lo ck Therefore the remaining op en issue is what is the complexity of dead

lo ck detection for PPGs that satisfy the predanc and nop ostdominator

conditions but are nonserializable We are interested in resolving this op en

issue by either obtaining a p olynomialtime deadlo ck detection algorithm for

such PPGs or by proving that the problem is NPhard

Develop a transformation framework for PPGs

Current parallelizing compilers optimize programs for parallel architectures

by p erforming transformations on p erfect lo op nest on CFGs and on PPGs

It would b e interesting to extend existing algorithms and design new algo

rithms in a transformation framework for PPGs In this pap er we discussed

serialization and the elimination of b ogus edges which can also b e viewed

as transformations on PPGs In the future we would like to p erform other

transformations to match the p erformance characteristics and co de genera

tion requirements of dierent parallel architectures

Develop a common compilation and execution environment for dierent par

al lel programming languages

The PPG can b e used as the basis for a common compilation and ex

ecution environment for parallel programs written in dierent languages

The scheduling system dened in can b e develop ed into a PPGbased

interpreter debugger or runtime system for parallel programs that provides

feedback to the user ab out the parallel programs execution eg parallelism

proles detection of readwrite or writewrite hazards detection of deadlo ck

etc Currently such programming to ols are b eing develop ed separately for

dierent languages with dierent ad ho c assumptions b eing made ab out

their parallel execution semantics The PPG representation could b e useful

in leading to a common environment for dierent parallel programming

languages

Study the limitations of the PPG representation

Though we showed in this pap er how PPGs are more general than PDGs and

CFGs we are aware that there are certain parallelism constructs that do not

map directly to PPGs Examples of such constructs include general p ostwait

synchronizations in which the synchronization condition dep ends on data

values not just on execution histories and singleprogrammultipledata

SPMD execution in which the numb er of times a region of co de is executed

can dep end on a runtime parameter such as the numb er of pro cessors It

would b e interesting to see how such constructs can b e represented by PPGs

without losing to o much semantic information We conjecture that parallel

programming constructs that are hard to represent by PPGs are also hard

for programmers to understand A study of the limitations of PPGs will shed

more light on this conjecture

References

AV Aho R Sethi and JD Ullman Compilers Principles Techniques and

Tools AddisonWesley

F E Allen Control ow analysis ACM SIGPLAN Notices

Frances Allen Michael Burke Philipp e Charles Ron Cytron and Jeanne Ferrante

An Overview of the PTRAN Analysis System for Multipro cessi ng Proceedings of

the ACM International Conference on Supercomputing Also published

in The Journal of Parallel and Distributed Computing Oct pages

Randy Allen and Ken Kennedy Automatic Translation of FORTRAN Programs

to Vector Form ACM Transactions on Programming Languages and Systems

Octob er

Arvind ML Dertouzos RS Nikhil and GM Papadop oulos Pro ject Dataow

A system based on the Monso on architecture and the Id

programming language Technical rep ort MIT Lab for March

Computation Structures Group Memo

Thomas Ball and Susan Horwitz Constructing Control Flow from Data Dep en

dence Technical rep ort University of WisconsinMad iso n TR No

Willia m Baxter and J R Bauer I I I The Program Dep endence Graph in

Vectorization Sixteenth ACM Principles of Programming Languages Symposium

pages January Austin Texas

Michael Burke and Ron Cytron Interpro cedural Dep endence Analysis and Par

allelizati on Proceedings of the Sigplan Symposium on Compiler Construction

July

Michael Burke Ron Cytron Jeanne Ferrante and Wilson Hsieh Automatic

Generation of Nested ForkJoin Parallelism Journal of Supercomputing

July

Rob ert Cartwright and Mathias Felleisen The Semantics of Program Dep endence

SIGPLAN Conference on Programming Language Design and Implementation

pages June

Ron Cytron Jeanne Ferrante and Vivek Sarkar Exp eriences Using Control

Dep endence in PTRAN Proceedings of the Second Workshop on Languages and

Compilers for Paral lel Computing August In Languages and Compilers for

Parallel Computing edited by D Gelernter A Nicolau and D Padua MIT Press

pages

Ron Cytron Jeanne Ferrante and Vivek Sarkar Compact Representations for

Control Dep endence Proceedings of the ACM SIGPLAN Conference on

Programming Language Design and Implementation White Plains

pages June

Ron Cytron Michael Hind and Wilson Hsieh Automatic Generation of DAG

Paralleli sm Proceedings of the ACM SIGPLAN Conference on Programming

Language Design and Implementation Portland Oregon June

J Ferrante and M E Mace On Linearizing Parallel Co de Conf Rec Twelfth

ACM Symp on Principles of Programming Languages pages January

J Ferrante K Ottenstein and J Warren The Program Dep endence Graph and

its Use in Optimization ACM Transactions on Programming Languages and

Systems July

Jeanne Ferrante Mary Mace and Barbara Simons Generating Sequential Co de

From Parallel Co de Proceedings of the ACM International Conference on

Supercomputing pages July

Michael R Garey and David S Johnson Computers and Intractibility A Guide

to the Theory of NPCompleteness WH Freeman and Company

Miland Girkar and Constantine Polychronop ou lo s The HTG An Intermediate

Representation for Programs Based on Control and Data Dep endences Technical

rep ort Center for Sup ercomputing Res and DevUniversity of Illinois May

CSRD Rpt No

Ra jiv Gupta and Mary Lou Soa Region Scheduling Proc of the Second

International Conferenece on Supercomputing May

P Brinch Hansen The programming language Concurrent Pascal IEEE

Transactions on Software engineering SE June

Matthew S Hecht Flow Analysis of Computer Programs Elsevier NorthHolland

Inc

Susan Horwitz Jan Prins and Thomas Reps Integrating NonInterfering Versions

of Programs Conf Rec Fifteenth ACM Symposium on Principles of Programming

Languages pages January

Susan Horwitz Jan Prins and Thomas Reps On the Adequacy of Program

Dep endence Graphs for Representing Programs Conf Rec Fifteenth ACM

Symposium on Principles of Programming Languages pages January

Paul Hudak and Philip Wadler et al Rep ort on the Functional Programming

Language Haskell Technical rep ort Yale University Research Rep ort

YALEUDCSRR

Peter Ladkin and Barbara Simons CompileTime Analysis of Communicating

Pro cesses Proc of the ACM International Conference on Supercomputing

pages July

L Lamp ort The Parallel Execution of DO Lo ops Communications of the ACM

February

Dror E Maydan Saman P Amarasinghe and Monica S Lam Array Data

Flow Analysis and its Use in Array Privatization Conf Rec Twentieth ACM

Symposium on Principles of Programming Languages January

Samuel Midki David Padua and Ron Cytron Compiling Programs with User

Parallelis m Proceedings of the Second Workshop on Languages and Compilers for

Paral lel Computing August

PCF Parallel Computing Forum Final Rep ort Technical rep ort Kuck and

Asso ciates Incorp orated In preparation

W Pugh and D Wonnacott Eliminati ng false data dep endences using the omega

test Proceedings of the ACM SIGPLAN Conference on Programming Language

Design and Implementation San Francisco California pages June

Vivek Sarkar Determining Average Program Execution Times and their Variance

Proceedings of the SIGPLAN Conference on Programming Language Design

and Implementation July

Vivek Sarkar Automatic Partitioning of a Program Dep endence Graph into

Parallel Tasks IBM Journal of Research and Development

Vivek Sarkar The PTRAN Parallel Programming System Paral lel Functional

Programming Languages and Compilers pages

Vivek Sarkar A Concurrent Execution Semantics for Parallel Program Graphs

and Program Dep endence Graphs Extended Abstract Technical Rep ort

YALEUDCSRR Yale University Department of Computer Science August

Conference Record of the th Workshop on Languages and Compilers for

Parallel Computing Yale University August

Reb ecca Parsons Selke A Rewriting Semantics for Program Dep endence Graphs

Sixteenth ACM Principles of Programming Languages Symposium January

Austin Texas

B Simons Constructing a Constructing a Control Flow Graph for Parallel Co de

Technical rep ort To app ear

B Simons and J Ferrante An Ecient Algorithm for Constructing a Control

Flow Graph for Parallel Co de Technical rep ort IBM February Technical

Rep ort TR

Barbara Simons David Alp ern and Jeanne Ferrante AFoundation for Sequen

tializing Parallel Co de Proceedings of the ACM Symposium on Paral lel

Algorithms and Architecture July

Harini Srinivasan and Michael Wolfe Analyzing Programs with Explicit Par

allelism Proceedings of the Fourth Workshop on Languages and Compilers for

Paral lel Computing August To b e published by SpringerVerlag

Michael J Wolfe Optimizing Supercompilers for Pitman London

and The MIT Press Cambridge Massachusetts In the series Research

Monographs in Parallel and Distributed Computing

a

This article was pro cessed using the L T X macro package with LLNCS style E