Abstract

In this paper, we present a concurrent execution semantics for Parallel Program Graphs (PPGs), a general parallel program representation that includes Program Dependence Graphs (PDGs) and sequential programs. We believe that this semantics is natural to the programmer's way of thinking, and that it also provides a suitable execution model for efficient implementation on real architectures. To demonstrate the robustness of our semantics, we prove a Reordering Theorem, which states that a PPG's semantics does not depend on the order in which parallel nodes are executed, and an Equivalence Theorem, which states that the semantics of a sequential program is identical to the semantics of its PDG.

1 Introduction

The Program Dependence Graph (PDG) is a popular representation of control and data dependences derived from a sequential program. PDGs have been shown to be useful for solving a variety of problems, including optimization, vectorization, code generation for VLIW machines, merging versions of programs, and automatic detection and management of parallelism. A PDG node represents an arbitrary sequential computation, e.g., a basic block, a statement, or an operation. An edge in a PDG represents a control dependence or a data dependence. PDGs reveal the inherent parallelism in a program written in a sequential programming language by removing artificial sequencing constraints from the program text.

In this paper, we introduce the Parallel Program Graph (PPG) as a more general intermediate representation of parallel programs than PDGs. PPGs contain mgoto edges that represent parallel flow of control, and synchronization edges that impose ordering constraints on execution instances of PPG nodes. As they execute, the PPG nodes perform read and write accesses to a shared memory. If read and write accesses to the same location are not properly guarded by mgoto edges or by synchronization edges, then the PPG's execution may incur an access anomaly. If a read access is performed in parallel with a write access that changes the location's value, then the result of the read access is undefined (all memory accesses are assumed to be nonatomic). Similarly, the result of two parallel write accesses with different values is also undefined. Note that nonatomicity implies that memory accesses cannot be used for synchronization; the mgoto edges and synchronization edges are the only mechanisms available in the PPG for coordinating execution instances of PPG nodes.

Given the widespread use of the PDG, there is a strong motivation to understand its parallel execution semantics. The semantics of PDGs has been examined in past work by Selke and by Cartwright and Felleisen. Both of those approaches assumed a restricted programming language (the language W) and presented a value-oriented semantics for PDGs. The value-oriented nature of the semantics makes it inconvenient to model store-oriented operations, such as an update of an array element or of a pointer-dereferenced location. Also, the control structures in the language W are restricted to if-then-else and while-do; the semantics did not cover programs with arbitrary control flow. However, PDGs can be used to represent arbitrary (unstructured) control flow and arbitrary data read and write accesses, as found in imperative programming languages like C and Pascal. This generality leads to many subtleties and potential ambiguities in interpreting a PDG's parallel execution.

In this paper, we resolve these subtleties and ambiguities by defining a parallel, imperative semantics for PPGs (and hence for PDGs) in their full generality. To demonstrate the robustness of our semantics, we prove a Reordering Theorem, which states that a PPG's semantics does not depend on the order in which parallel nodes are executed, and an Equivalence Theorem, which states that the semantics of a sequential program is identical to the semantics of its PDG. Our semantics accounts for the possibility of program exceptions like race conditions and deadlock, and the theorems also take these possibilities into account.

We believe that the semantics presented in this paper is natural to the programmer's way of thinking and lends itself to efficient concurrent execution models for PPGs. The semantics is defined with respect to a global scheduling system. The scheduling system allows a programmer (or a debugging system) to follow the progress of a PPG's execution in a step-by-step manner, if so desired. Parallel nodes may be executed in any order; the Reordering Theorem guarantees that all node orderings yield the same semantics. The global scheduling system is defined as a centralized scheduling algorithm so as to provide a simple and convenient mental abstraction for the programmer; a distributed scheduler would probably be the preferred way to actually implement such a scheduling system on a parallel architecture.

Our semantics is imperative: the execution of a PPG node is defined by read and write accesses into a shared global memory. This is in contrast to the value-oriented semantics proposed for PDGs by Selke and by Cartwright and Felleisen, where a PDG node is viewed as a function that consumes input values from its input data dependence edges and produces output values on its output data dependence edges. It is well known that a value-oriented model can be inefficient to implement in practice, because the extra copying overhead required for store-oriented operations (like the update of an array element) can be prohibitive. It is also awkward to force the programmer to think of store-oriented operations in a value-oriented way.

Our semantics for PPGs is strict: the entire program execution is assumed to return error ($\bot$) if any statement in the program returns $\bot$. This is in contrast to the nonstrict semantics proposed for PDGs by Cartwright and Felleisen, which is similar to the nonstrict semantics of functional languages. We believe that a strict semantics is more natural for PPGs because the source programming languages typically used for PDGs (e.g., Fortran, Pascal, C) also have a strict semantics. Further, a strict semantics is more efficient to implement than a nonstrict semantics because it does not require support for demand-driven evaluation. In summary, we believe that our parallel, imperative, and strict semantics is more appropriate for PPGs (and PDGs) than other semantics that have been proposed, because it is a logical extension of the semantics of the PDG's source languages, because it is applicable to general PDGs as they are used in practice, and because it can be implemented more efficiently.

The rest of the paper is organized as follows. Section 2 defines the PPG representation, as well as the virtual parallel architecture that serves as the target execution model for PPGs. Section 3 defines the global scheduling system for PPGs. Section 4 presents the concurrent execution semantics for PPGs, and also proves the Reordering Theorem. Section 5 relates PDGs to PPGs, and also proves the Equivalence Theorem. Section 6 discusses related work, and Section 7 contains the conclusions of this paper and an outline of possible directions for future work.

2 Definition of the Parallel Program Graph (PPG)

Definition. A Parallel Program Graph (PPG) $G = (N, E_{mgoto}, E_{sync}, start)$ consists of a set of nodes $N$, a set of mgoto (multiple goto) edges $E_{mgoto}$, a set of synchronization edges $E_{sync}$, and a designated start node $start \in N$.

We distinguish between a PPG node and an execution instance of a PPG node (i.e., a dynamic instantiation of that node). Given an execution instance $I_a$ of PPG node $a$, its execution history $H(I_a)$ is defined as the sequence of PPG node and label values that caused execution instance $I_a$ to occur. A PPG's execution begins with a single execution instance $I_{start}$ of the start node, with $H(I_{start}) = \langle\rangle$ (an empty sequence).

An mgoto edge in $E_{mgoto}$ is a triple of the form $(a, b_i, L)$, which defines a PPG edge from node $a$ to node $b_i$ with branch label (or branch condition) $L$. In general, there may be multiple outgoing mgoto edges from node $a$ with branch label $L$: $\{(a, b_1, L), \ldots, (a, b_k, L)\}$. The semantics of mgoto edges is as follows. Consider an execution instance $I_a$ of node $a$ that evaluates node $a$'s branch label to be $L$. After completion, execution instance $I_a$ creates a new execution instance $I_{b_i}$ of each target node $b_i$, and then terminates itself. The execution history of $I_{b_i}$ is simply $H(I_{b_i}) = H(I_a) \cdot \langle a, L \rangle$, where $\cdot$ is the sequence concatenation operator.

A synchronization edge in $E_{sync}$ is a triple of the form $(a, c, f)$, which defines a PPG edge from node $a$ to node $c$ with synchronization condition $f$. $f(H_1, H_2)$ is a Boolean function on execution histories. Given two execution instances $I_a$ and $I_c$ of nodes $a$ and $c$, $f(H(I_a), H(I_c))$ returns true if and only if execution instance $I_a$ must complete execution before execution instance $I_c$ can be started. Note that the synchronization condition depends only on execution histories, and not on any program data values. $\Box$
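As a minimal sketch of this definition (in Python, which the paper itself does not use), a PPG and its execution instances might be represented as follows. The names `PPG`, `Instance`, `successors`, and `unravel`, and the five-edge example graph, are illustrative assumptions rather than part of the paper. Histories are stored as tuples of (node, label) pairs instead of a flat sequence. Synchronization edges are carried in the data structure but ignored by `unravel`, which only shows how mgoto edges create execution instances and extend their histories.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, List, Tuple

History = Tuple[Tuple[str, str], ...]           # sequence of (node, label) pairs
SyncCond = Callable[[History, History], bool]   # f(H1, H2)

@dataclass(frozen=True)
class Instance:
    node: str
    history: History

@dataclass(frozen=True)
class PPG:
    nodes: FrozenSet[str]
    mgoto: FrozenSet[Tuple[str, str, str]]      # (a, b, L)
    sync: Tuple[Tuple[str, str, SyncCond], ...]  # (a, c, f)
    start: str

def successors(g: PPG, inst: Instance, label: str) -> List[Instance]:
    """Instances created when `inst` completes with branch label `label`."""
    return [Instance(b, inst.history + ((a, label),))
            for (a, b, l) in sorted(g.mgoto)
            if a == inst.node and l == label]

def unravel(g: PPG, label_of) -> List[Instance]:
    """Create every instance reachable through mgoto edges (sync edges ignored).
    Terminates only if `label_of` eventually leaves every loop."""
    pending, created = [Instance(g.start, ())], []
    while pending:
        inst = pending.pop(0)
        created.append(inst)
        pending.extend(successors(g, inst, label_of(inst)))
    return created

# Hypothetical graph: node 1 forks to 3 and 5 on label T, and both reach node 4.
g = PPG(
    nodes=frozenset({"1", "2", "3", "4", "5"}),
    mgoto=frozenset({("1", "2", "F"), ("1", "3", "T"), ("1", "5", "T"),
                     ("3", "4", "T"), ("5", "4", "T")}),
    sync=(),
    start="1",
)
fours = [i for i in unravel(g, lambda inst: "T") if i.node == "4"]
```

In this example node 4 receives two execution instances, distinguished only by their execution histories, which is why the definition separates nodes from execution instances.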

A PPG node represents an arbitrary sequential computation. The mgoto edges specify how execution instances of PPG nodes are created (unravelled), and the synchronization edges specify how execution instances need to be synchronized. A formal definition of the execution semantics of mgoto edges and synchronization edges is presented later in Sections 3 and 4.

The parallel computation model assumed in this paper can be viewed as a virtual parallel architecture. This abstract architecture consists of an unbounded number of asynchronous virtual processors that communicate through a shared global memory. In our model, all read and write accesses to the shared memory (i.e., all interprocessor communications) are assumed to be nonatomic transactions. Nonatomicity is less restrictive than atomicity, and makes it easier and more efficient to implement the virtual parallel architecture's memory access model on a real architecture. The nonatomicity assumption has an important effect on the semantics of parallel reads and writes to the same location. If two write accesses with different values are issued in parallel to the same memory location, the resulting value is undefined in our model (see footnote 1). We say that both write accesses incur a write-write hazard. (In other shared-memory models, it is usually assumed that the resulting value must be one of the two values being written.) If a read access to a memory location is issued in parallel with a write access that changes the value in the location, the result of the read operation is undefined in our model. We say that the read access incurs a read-write hazard. The semantics of write-write and read-write hazards are formally specified in Section 4, in the definition of the operational semantics of an execution instance.

A desirable consequence of the nonatomicity assumption is that there is a logical separation between interprocessor synchronization and interprocessor communication in the architecture. Interprocessor synchronization is specified by the PPG's mgoto edges and synchronization edges. Interprocessor communication is specified by the read/write shared memory accesses in the PPG's nodes. Due to nonatomicity, it is not possible to perform any synchronization by using shared-memory accesses.

Footnote 1: Note that the result of two parallel write accesses with the same value is well-defined.

3 Scheduling Model for PPGs

In this section, we present a scheduling system for PPGs that unravels a PPG into a partial order $\preceq$ defined on execution instances of PPG nodes, based on the semantics of mgoto edges and synchronization edges in the PPG. The partial order $\preceq$ is a conceptual tool used to define the scheduling system. We do not suggest that the partial order be actually constructed as a data structure when scheduling execution instances of PPG nodes on real architectures; instead, more efficient scheduling/synchronization mechanisms should be used, as appropriate for the target architecture, to enforce the constraints imposed by the partial order (e.g., fork-join operations, counting semaphores, guards, etc.).

Figure 1 outlines a scheduling algorithm for scheduling PPG nodes on the virtual parallel architecture described in Section 2. The initial contents of memory locations are defined by an input store $\sigma_i$. PPG execution begins with a single execution instance of the start node. Each iteration of the while-loop in Figure 1 schedules an execution instance that has been created and whose predecessors have previously been scheduled. The loop body first schedules the execution instance on a virtual processor and evaluates the branch label obtained on completion. It then creates new execution instances for successor nodes, as defined by the semantics of mgoto edges (see the definition in Section 2).

The creation step also specifies how the partial order $\preceq$ should be updated after creating a new execution instance $I_{b_i}$. Steps b and c respectively update $\preceq$ for all successor and predecessor execution instances of $I_{b_i}$. To identify successor execution instances of $I_{b_i}$, step b enumerates each acyclic sequence of mgoto edges $S = \langle (x_1, x_2, L_1), \ldots, (x_k, x_{k+1}, L_k) \rangle$ that starts with PPG node $x_1 = b_i$, examines each synchronization edge of the form $(x_{k+1}, c, f)$, and checks if there is an execution instance $I_c$ such that $f(H(I_{b_i}) \cdot P, H(I_c)) = true$, where $P = \langle x_1, L_1, \ldots, x_k, L_k \rangle$ is the history suffix defined by $S$. To identify predecessor execution instances of $I_{b_i}$, step c examines each synchronization edge of the form $(d, b_i, f)$, then enumerates each acyclic sequence of mgoto edges $S = \langle (x_1, x_2, L_1), \ldots, (x_k, x_{k+1}, L_k) \rangle$ that ends with PPG node $x_{k+1} = d$, and checks if there is an execution instance $I_c$ such that $f(H(I_c) \cdot P, H(I_{b_i})) = true$, where $c = x_1$ and $P = \langle x_1, L_1, \ldots, x_k, L_k \rangle$ is the history suffix defined by $S$.

Since a partial order is a reflexive, transitive, antisymmetric binary relation by definition, all updates to $\preceq$ in Figure 1 take care to preserve these properties.

4 A Concurrent Execution Semantics for PPGs

In this section, we present a concurrent execution semantics for PPGs. The concurrent execution semantics for PPGs is based on the scheduling system defined in Section 3. The operational semantics presented in this paper can be used to determine if two PPGs have the same semantics, and if a given PPG's execution leads to unsafe situations like read-write and write-write hazards, nontermination, or deadlock.

Create $I_{start}$, an execution instance of the start node, and initialize $\preceq \, := \{(I_{start}, I_{start})\}$ and $H(I_{start}) := \langle\rangle$ (an empty sequence)

while $\exists$ an execution instance $I_a$ that has been created but not scheduled, such that all of $I_a$'s predecessors in $\preceq$ (except $I_a$ itself) have been scheduled do
    Schedule execution instance $I_a$
    let $L$ := branch label for PPG node $a$, evaluated at the end of execution instance $I_a$
    for each PPG node $b_i$ such that $(a, b_i, L) \in E_{mgoto}$ do
        (a) Create $I_{b_i}$, an execution instance of $b_i$, initialized with $H(I_{b_i}) := H(I_a) \cdot \langle a, L \rangle$, and update $\preceq \, := \, \preceq \cup \{(I_{b_i}, I_{b_i})\}$
        (b) /* Add the pair $(I_{b_i}, I_c)$ to $\preceq$ for each instance $I_c$ that must wait for $I_{b_i}$ */
            for each sequence of $k \ge 0$ $E_{mgoto}$ edges $S = \langle (x_1, x_2, L_1), (x_2, x_3, L_2), \ldots, (x_k, x_{k+1}, L_k) \rangle$ such that $x_1 = b_i$ and all $x_j$'s are distinct do
                /* Note that $S = \langle\rangle$ always satisfies this condition, when $k = 0$ */
                i.  $P := \langle x_1, L_1, x_2, L_2, \ldots, x_k, L_k \rangle$  /* History suffix obtained from $S$ */
                ii. for each synchronization edge $(x_{k+1}, c, f)$ in $E_{sync}$ do
                        for each execution instance $I_c$ of node $c$ that has been created do
                            if $f(H(I_{b_i}) \cdot P, H(I_c)) = true$ then $\preceq \, := \, \preceq \cup \{(I_{b_i}, I_c)\}$
                        end for
                    end for
            end for
        (c) /* Add the pair $(I_c, I_{b_i})$ to $\preceq$ for each instance $I_c$ that $I_{b_i}$ must wait for */
            for each synchronization edge $(d, b_i, f)$ in $E_{sync}$ do
                for each sequence of $k \ge 0$ $E_{mgoto}$ edges $S = \langle (x_1, x_2, L_1), \ldots, (x_k, x_{k+1}, L_k) \rangle$ such that $x_{k+1} = d$ and all $x_j$'s are distinct do
                    /* Note that $S = \langle\rangle$ always satisfies this condition */
                    i.  $P := \langle x_1, L_1, \ldots, x_k, L_k \rangle$
                    ii. for each execution instance $I_c$ of node $c = x_1$ that has been created do
                            if $f(H(I_c) \cdot P, H(I_{b_i})) = true$ then $\preceq \, := \, \preceq \cup \{(I_c, I_{b_i})\}$
                        end for
                end for
            end for
    end for
end while

Figure 1: Scheduling System Algorithm
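The scheduling algorithm can be sketched in Python roughly as follows. This is an illustrative reading of the figure under stated assumptions, not the paper's implementation: the names `Instance`, `paths`, `schedule`, `label_of`, and `pick` are hypothetical, histories are tuples of (node, label) pairs, the partial order is kept as a plain set of pairs without computing its transitive closure, and the branch label of each instance is supplied by a caller-provided function instead of being computed from a store.

```python
from dataclasses import dataclass
from typing import Tuple

History = Tuple[Tuple[str, str], ...]   # sequence of (node, label) pairs

@dataclass(frozen=True)
class Instance:
    node: str
    history: History

def paths(mgoto, node, forward):
    """Yield (far_end, suffix) for each mgoto path with distinct nodes that
    starts at `node` (forward=True) or ends at `node` (forward=False); k >= 0."""
    stack = [(node, (), (node,))]
    while stack:
        end, suffix, seen = stack.pop()
        yield end, suffix
        for a, b, lab in mgoto:
            if forward and a == end and b not in seen:
                stack.append((b, suffix + ((a, lab),), seen + (b,)))
            if not forward and b == end and a not in seen:
                stack.append((a, ((a, lab),) + suffix, seen + (a,)))

def schedule(start, mgoto, sync, label_of, pick):
    """Return the scheduled sequence; `pick` chooses one ready instance."""
    first = Instance(start, ())
    created, done, order = [first], [], {(first, first)}
    while True:
        ready = [i for i in created if i not in done and
                 all(p in done for p, q in order if q == i and p != i)]
        if not ready:
            return done          # finished, or deadlocked if created != done
        inst = pick(ready)
        done.append(inst)
        lab = label_of(inst)
        for a, b, l in sorted(mgoto):
            if a != inst.node or l != lab:
                continue
            new = Instance(b, inst.history + ((a, lab),))    # step (a)
            created.append(new)
            order.add((new, new))
            for end, suffix in paths(mgoto, b, True):         # step (b)
                for x, c, f in sync:
                    if x != end:
                        continue
                    for ic in created:
                        if ic.node == c and ic != new and f(new.history + suffix, ic.history):
                            order.add((new, ic))
            for d, tgt, f in sync:                            # step (c)
                if tgt != b:
                    continue
                for head, suffix in paths(mgoto, d, False):
                    for ic in created:
                        if ic.node == head and ic != new and f(ic.history + suffix, new.history):
                            order.add((ic, new))

# Hypothetical PPG: s forks to a, c and p; a continues to a2; c must wait for a2.
mgoto = {("s", "a", "T"), ("s", "c", "T"), ("s", "p", "T"), ("a", "a2", "T")}
sync = [("a2", "c", lambda h1, h2: True)]
always_t = lambda inst: "T"
seq1 = [i.node for i in schedule("s", mgoto, sync, always_t, lambda r: r[0])]
seq2 = [i.node for i in schedule("s", mgoto, sync, always_t, lambda r: r[-1])]
```

The two calls differ only in which ready instance is picked. Both yield sequences in which the instance of `c` is scheduled after the instance of `a2`, because step (c) makes `c` wait for the instance of `a` that could still lead to `a2`.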

The effect of a PPG's execution is to map an input store $\sigma_i$ to a final store $\sigma_f$. For convenience, we assume that the PPG's execution has no other observable effect on the outside world. An I/O or graphics device can be modeled as a special memory location that contains the entire sequence of state changes performed on the device (instead of just the final value written).

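A store $\sigma$ is a mathematical representation of the machine's memory, which maps memory location addresses to values ($\sigma : addr \to val$). $\sigma(l)$ represents the value stored in location $l$. Following prior work, we use the notation $\sigma[l \leftarrow v]$ to represent an updated store that is identical to the original store $\sigma$, except that $\sigma[l \leftarrow v]$ has the value $v$ in location $l$. Both $\sigma(l)$ and $\sigma[l \leftarrow v]$ are assumed to be strict functions, so that $\sigma(l) = \bot$ if $\sigma = \bot$ or $l = \bot$, and $\sigma[l \leftarrow v] = \bot$ if $\sigma = \bot$, $l = \bot$, or $v = \bot$.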

We begin by defining the semantics of a single execution instance of a PPG node. A PPG node's computation consists of reading some values from the store, evaluating a result, and writing the result into a single location in the store (see footnote 2). We assume that the node's evaluation is deterministic, so that it always computes the same result value from the same set of input values. The operational semantics of an execution instance of PPG node $n$ is defined with respect to two parameters that provide the context for the execution instance:

- An input store $\sigma_i$.

- A set of parallel writes $PAR_{writes} = \{(l_1, v_1), (l_2, v_2), \ldots\}$, the set of location-value pairs for which a write access may be performed in parallel with the given execution instance of node $n$.

$\sigma_i$ is required for specifying node $n$'s input values. $PAR_{writes}$ is required for defining the parallel semantics of read-write and write-write hazards, based on the nonatomic memory access model assumed in Section 2.

Footnote 2: For convenience, we assume that a PPG node writes into a single location in the store. The semantics can be easily extended to handle nodes that write into multiple locations in the store. To represent such a node in the current semantics, the node should be expanded into multiple nodes, one for each write location.

Definition. The operational semantics of an execution instance of PPG node $n$ is defined by the store mapping $[\![n]\!](\sigma_i) = \sigma_f$, with respect to a given set of parallel writes $PAR_{writes}$, where

$$
\sigma_f =
\begin{cases}
\bot & \text{if the execution instance of node } n \text{ performs a read access on location } l \text{ and } \exists (l_j, v_j) \in PAR_{writes} \text{ s.t. } l_j = l \wedge v_j \neq \sigma_i(l) \text{ (read-write hazard)} \\
\bot & \text{if the execution instance of node } n \text{'s computation incurs an exception (e.g., nontermination, reading a } \bot \text{ value from } \sigma_i \text{, overflow, etc.)} \\
\sigma_i[l \leftarrow \bot] & \text{if the execution instance of node } n \text{ attempts to write value } v \text{ into location } l \text{ and } \exists (l_j, v_j) \in PAR_{writes} \text{ s.t. } l_j = l \wedge v_j \neq v \text{ (write-write hazard)} \\
\sigma_i[l \leftarrow v] & \text{otherwise, assuming that the execution instance of node } n \text{ writes value } v \text{ into location } l
\end{cases}
$$

$\Box$
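The four cases of this definition can be illustrated with a small Python sketch. It is an assumption-laden illustration rather than the paper's formulation: a store is a `dict`, `BOTTOM` (Python's `None`) stands for $\bot$, a node is modelled by the hypothetical triple (`reads`, `compute`, `target`), and only arithmetic exceptions are treated as node exceptions. The sketch takes the per-location reading of the write-write case, in which only the written location becomes $\bot$ and a later read of that location poisons the whole store.

```python
BOTTOM = None  # stands for the error value / error store

def run_instance(store, reads, compute, target, par_writes):
    """Map an input store to an output store for one execution instance.

    store      -- dict location -> value, or BOTTOM
    reads      -- list of locations read by the node
    compute    -- deterministic function of the values read
    target     -- the single location written by the node
    par_writes -- set of (location, value) pairs that may be written in parallel
    """
    if store is BOTTOM:
        return BOTTOM
    # Case 1: read-write hazard, a parallel write would change a location we read.
    for loc in reads:
        if any(l == loc and v != store.get(loc) for (l, v) in par_writes):
            return BOTTOM
    # Case 2: exceptions, including reading an error value from the store.
    inputs = [store.get(loc, BOTTOM) for loc in reads]
    if any(x is BOTTOM for x in inputs):
        return BOTTOM
    try:
        value = compute(*inputs)
    except ArithmeticError:
        return BOTTOM
    out = dict(store)
    # Case 3: write-write hazard, a parallel write stores a different value.
    if any(l == target and v != value for (l, v) in par_writes):
        out[target] = BOTTOM
    else:
        # Case 4: ordinary update of the target location.
        out[target] = value
    return out

s0 = {"x": 1, "y": 2}
add = lambda a, b: a + b
ok = run_instance(s0, ["x", "y"], add, "z", set())
rw = run_instance(s0, ["x", "y"], add, "z", {("x", 5)})
ww = run_instance(s0, ["x", "y"], add, "z", {("z", 9)})
same = run_instance(s0, ["x", "y"], add, "z", {("z", 3), ("x", 1)})
```

Parallel writes that store the same value as the one already read or about to be written, as in the last call, do not count as hazards, which matches footnote 1.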

To obtain the operational semantics of a PPG, we define a scheduled sequence to be the sequence of node execution instances scheduled by the while loop in Figure 1, assuming that the PPG terminates successfully for the given input store. For a given PPG and input store, there may be several possible scheduled sequences, based on which ready execution instance is chosen each time the scheduling step is performed.

Definition. Let $\langle N_1, \ldots, N_k \rangle$ be any scheduled sequence of PPG node execution instances obtained for a given PPG $G$ and input store $\sigma_i$, and let $\preceq$ be the partial order on $\{N_1, \ldots, N_k\}$ defined by the scheduling system algorithm in Figure 1. Let

$$
PAR_{writes}(N_j) = \{ (l, v) \mid \exists N_i \text{ s.t. } N_i \not\preceq N_j \wedge N_j \not\preceq N_i \wedge \text{execution instance } N_i \text{ writes value } v \text{ into location } l \}
$$

be the set of parallel writes for each execution instance $N_j$, $1 \le j \le k$.

Then the operational semantics of PPG $G$ is defined to be the store mapping $[\![G]\!](\sigma_i) = \sigma_f$, where $\sigma_f = ([\![N_k]\!] \circ \cdots \circ [\![N_1]\!])(\sigma_i)$ and the $PAR_{writes}(N_j)$ set defined above is used in determining the store mapping for each execution instance $N_j$, $1 \le j \le k$, as specified in the preceding definition.

If the scheduling system does not yield a complete scheduled sequence in which all created execution instances are scheduled (due to some exception), then $[\![G]\!](\sigma_i)$ is defined to be $\bot$. $\Box$
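The $PAR_{writes}$ set can be computed directly from the partial order. The following Python sketch assumes, for illustration only, that the write performed by each execution instance is already known and given as a `dict`, and that the order is supplied as a set of (earlier, later) pairs that may not yet be transitively closed. The function names are hypothetical.

```python
def transitive_closure(pairs):
    """Smallest transitive relation containing `pairs`."""
    closure = set(pairs)
    changed = True
    while changed:
        changed = False
        for (a, b) in list(closure):
            for (c, d) in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure

def par_writes(j, writes, order):
    """Writes of every instance that is unordered with respect to instance j.

    writes -- dict: instance -> (location, value) written by that instance
    order  -- set of (earlier, later) pairs from the scheduling system
    """
    le = transitive_closure(order)
    return {writes[i] for i in writes
            if i != j and (i, j) not in le and (j, i) not in le}

# Hypothetical instances: A precedes B, and C is parallel to both.
writes = {"A": ("x", 1), "B": ("x", 2), "C": ("x", 3)}
order = {("A", "B")}
pw_a = par_writes("A", writes, order)
pw_b = par_writes("B", writes, order)
pw_c = par_writes("C", writes, order)
```

Here the writes of A and B are ordered with respect to each other, so neither appears in the other's set, while C's write appears in both and both of their writes appear in C's set.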

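Lemma. Consider any two scheduled sequences $S_1 = \langle N_{11}, N_{12}, \ldots \rangle$ and $S_2 = \langle N_{21}, N_{22}, \ldots \rangle$ that may be obtained for a given PPG $G$ and input store $\sigma_i$, and consider any PPG node $a$ for which there is an execution instance $N_{1j}$ in $S_1$ and an execution instance $N_{2k}$ in $S_2$ such that $H(N_{1j}) = H(N_{2k})$. Let the resulting stores after executing $N_{1j}$ in $S_1$ and $N_{2k}$ in $S_2$ be $\sigma_1 = ([\![N_{1j}]\!] \circ \cdots \circ [\![N_{11}]\!])(\sigma_i)$ and $\sigma_2 = ([\![N_{2k}]\!] \circ \cdots \circ [\![N_{21}]\!])(\sigma_i)$ respectively. Then $\sigma_1 \neq \bot$ and $\sigma_2 \neq \bot$ implies that execution instances $N_{1j}$ and $N_{2k}$ performed read/write operations on the same locations and with the same values. $\Box$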

Proof (by induction on length of scheduled sequences). Sketch: Since $\sigma_1 \neq \bot$ and $\sigma_2 \neq \bot$, no read access in execution instance $N_{1j}$ incurred a read-write hazard in $S_1$ (and the same for execution instance $N_{2k}$ in $S_2$). Therefore, the read accesses performed by $N_{1j}$ ($N_{2k}$) were properly guarded by the PPG's mgoto and synchronization edges in scheduled sequence $S_1$ ($S_2$), and all writes to $N_{1j}$'s ($N_{2k}$'s) input locations must have been performed before execution instance $N_{1j}$ ($N_{2k}$) was scheduled. Since execution instances $N_{1j}$ and $N_{2k}$ have the same execution history, each input location accessed by $N_{1j}$ and $N_{2k}$ must contain the same value in the two cases (by induction on their predecessors). With the same input location values, execution instances $N_{1j}$ and $N_{2k}$ of node $a$ must compute the same result and attempt to store it in the same output location in both $S_1$ and $S_2$. Since the condition for an execution instance incurring a write-write hazard depends on the $PAR_{writes}$ set, and not on the ordering of predecessors in the scheduled sequences, execution instance $N_{1j}$ must store the same output value in $\sigma_1$ as execution instance $N_{2k}$ in $\sigma_2$ (the output value will be $\bot$ in the case of a write-write hazard). $\Box$

Theorem (Reordering Theorem). For a given PPG $G$ and input store $\sigma_i$, the final store $\sigma_f = [\![G]\!](\sigma_i)$ obtained according to the definition of the operational semantics of a PPG is the same for all possible scheduled sequences that can be chosen by the scheduling system. $\Box$

Proof (by contradiction and induction on length of scheduled sequences). Sketch: Assume that there exist two scheduled sequences $S_1 = \langle N_{11}, N_{12}, \ldots \rangle$ and $S_2 = \langle N_{21}, N_{22}, \ldots \rangle$ that yield different final stores, $\sigma_1 \neq \sigma_2$. Two cases arise:

1. If some execution instance in $S_1$, say execution instance $N_{1j}$ of node $a$, causes $\sigma_1$ to be $\bot$, then a corresponding execution instance of node $a$ in $S_2$, say $N_{2k}$ with $H(N_{1j}) = H(N_{2k})$, must force $\sigma_2 = \bot$.

2. If $\sigma_1 \neq \bot$ and $\sigma_2 \neq \bot$, then each pair of execution instances in $S_1$ and $S_2$ with the same execution history (and the same source PPG node) must have the same input and output values (by the Lemma), which means that $S_1$ and $S_2$ must take the same conditional branches, eventually yielding $\sigma_1 = \sigma_2$.

Contradiction in both cases. $\Box$

5 Relating Program Dependence Graphs (PDGs) to Parallel Program Graphs (PPGs)

A Program Dependence Graph (PDG) consists of a set of nodes connected by control dependence and data dependence edges, and is derived from a sequential program. A control dependence edge is a triple of the form $(a, b, L)$, which defines a control dependence from node $a$ to node $b$ with branch label $L$. A data dependence edge is a triple of the form $(a, b, C)$, which defines a data dependence from node $a$ to node $b$ with context information $C$ that identifies the execution instances of $a$ and $b$ that are involved in the data dependence. By default, the context information $C$ states that the dependence must hold for each plausible pair of execution instances $I_a$ and $I_b$, i.e., for each $(I_a, I_b)$ pair such that $I_b$ is executed after $I_a$ in the original sequential program. However, in many important cases, the context information $C$ contains direction vectors or distance vectors, which give us more information about the execution instances involved in loop-carried data dependences. Direction vectors and distance vectors are defined with respect to a nest of single-entry loops enclosing nodes $a$ and $b$ in the sequential program.

It is easy to build a PPG from a given PDG. The same set of nodes is used in both cases. The control and data dependence edges in the PDG correspond to mgoto edges and synchronization edges in the PPG. A control dependence edge from the PDG can directly be used as an mgoto edge in the PPG. A data dependence edge from the PDG can be used as a synchronization edge in the PPG by translating the context information $C$ to an appropriate synchronization condition $f$ on execution histories. For convenience, we will use the notation PPG(G) to refer to the PPG corresponding to PDG G.

However, not all PPGs are derivable from PDGs, which is why PPGs are more general than PDGs. This issue has been studied in prior work, which showed that there is a class of PPGs for which it is not possible to generate an equivalent sequential control flow graph without duplicating code or inserting boolean guards. We call this class of PPGs nonserializable. A PPG for which it is possible to generate a sequential control flow graph without code duplication or insertion of boolean guards is called serializable. Figure 2 shows a nonserializable PPG with five mgoto edges and no synchronization edges. Note that node 4 will be executed twice if predicate nodes 1, 3, and 5 all evaluate to true. This PPG can be made serializable by splitting node 4 into two copies, one for predicate node 3 and one for predicate node 5.

Figure 2: Example of a nonserializable PPG (nodes 1 through 5; mgoto edge labels F, T, T, T, T)

Just as we defined PPG(G) to be the PPG corresponding to PDG G, we define PPG(P) to be the PPG corresponding to a sequential program P. A sequential program is defined by its control flow graph. The control flow graph itself can be viewed as a degenerate PPG in which there are no synchronization edges, and in which all mgoto edges are sequential, i.e., for a given node $a$ and branch label $L$, there is at most one node $b$ such that $(a, b, L) \in E_{mgoto}$. Let $\langle N_1, \ldots, N_k \rangle$ be the execution instance sequence obtained by executing a sequential program P as a PPG on input store $\sigma_i$, assuming that program P terminates successfully. Then the operational semantics of sequential program P will be the composition of the individual execution instance mappings, $[\![P]\!](\sigma_i) = ([\![N_k]\!] \circ \cdots \circ [\![N_1]\!])(\sigma_i)$, assuming that $PAR_{writes}(N_j) = \emptyset$ for all $1 \le j \le k$. If program P does not terminate successfully for input store $\sigma_i$, then $[\![P]\!](\sigma_i) = \bot$.
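The "sequential" condition on mgoto edges is easy to check mechanically. The sketch below is a hypothetical helper in Python (the name `is_sequential` and the example edge sets are assumptions for illustration): it verifies that every (node, label) pair has at most one target.

```python
def is_sequential(mgoto):
    """True if, for each node a and label L, at most one b has (a, b, L) in mgoto."""
    target = {}
    for a, b, label in mgoto:
        # setdefault records the first target seen for (a, label);
        # a different target for the same pair means a parallel fork.
        if target.setdefault((a, label), b) != b:
            return False
    return True

# A control flow graph: each (node, label) pair leads to a single successor.
cfg = {("1", "2", "T"), ("1", "3", "F"), ("2", "4", "U"), ("3", "4", "U")}
# A parallel fork: node 1 has two targets for the same label T.
fork = {("1", "3", "T"), ("1", "5", "T")}
cfg_ok = is_sequential(cfg)
fork_ok = is_sequential(fork)
```

A graph that passes this check and has no synchronization edges creates at most one new execution instance per step, so its scheduled sequence is unique.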

We conclude this section with the Equivalence Theorem, which states that the semantics of a sequential program P is identical to the semantics of its PDG G, by using the execution semantics defined in Section 4 to show that PPG(P) and PPG(G) must have the same semantics.

Theorem (Equivalence Theorem). A sequential program P and its corresponding PDG G have identical semantics, i.e., $[\![P]\!](\sigma_i) = [\![G]\!](\sigma_i)$ for all input stores $\sigma_i$. $\Box$

Proof (Sketch). The execution instance sequence $\langle N_1, \ldots, N_k \rangle$ obtained from executing the sequential program P should be a valid scheduled sequence for PDG G. We assume that PDG G correctly expresses all the control and data dependences in program P, which implies that no scheduled sequence for G (including $\langle N_1, \ldots, N_k \rangle$) incurs a read-write hazard or a write-write hazard. Therefore, the $PAR_{writes}$ sets in the definition of the operational semantics of a PPG have no impact on PDG G's operational semantics. So, using that definition and the Reordering Theorem, we have $[\![P]\!](\sigma_i) = ([\![N_k]\!] \circ \cdots \circ [\![N_1]\!])(\sigma_i) = [\![G]\!](\sigma_i)$. $\Box$

6 Related Work

The idea of extending PDGs to PPGs was introduced by Ferrante and Mace, and further extended by Alpern, Ferrante, and Simons. Several restrictions were imposed on the PPGs defined in that work: (a) the PPGs could only contain control flow edges, but no (loop-carried) data dependence/synchronization edges; (b) only a special kind of node (forall) was allowed to transfer control to more than one target node; (c) only forall nodes were allowed to be merge nodes, i.e., to have multiple incoming edges; (d) only a predicate node could be the source of multiple nonintersecting paths that reach the same merge node; and (e) only reducible loops were considered. The PPGs defined in this paper have none of these restrictions. Restrictions (a) and (e) limited their PPGs from being able to represent all PDGs that can be derived from sequential programs; restriction (a) is a serious limitation in practice, given the important role played by loop-carried data dependences in representing loop parallelism. Restrictions (b) and (c) are minor: in their PPGs, a multiple-goto operation or a merge operation is explicitly represented by a forall operator, whereas in our PPGs these operations are implicit in the definition of the $E_{mgoto}$ edges. Restriction (d) prohibited a forall operator from creating multiple execution instances of the same node; e.g., the PPG shown in Figure 2 cannot be expressed in their PPG model, because it is possible for node 4 to be executed twice. The Hierarchical Task Graph (HTG) proposed by Girkar and Polychronopoulos is also a form of a parallel program graph, applicable only to structured programs that can be represented by the hierarchy defined in the HTG.

Apart from the restrictions in the PPG model, the major difference from that work is that the problem it addressed was that of linearizing parallel code, i.e., of determining when it is possible to transform a given PPG into an equivalent sequential control flow graph without duplicating code or inserting boolean guards. In this paper, we address the problem of defining the concurrent execution semantics of PPGs, whether or not the PPGs can be linearized.

As mentioned earlier, the semantics of PDGs has been examined in past work by Selke and by Cartwright and Felleisen. Selke presented an operational semantics for PDGs based on graph rewriting. Cartwright and Felleisen presented a denotational semantics for PDGs, and showed how it could be used to generate an equivalent functional, dataflow program. Both of those approaches assumed a restricted programming language (the language W) and presented a value-oriented semantics for PDGs. The value-oriented nature of the semantics made it inconvenient to model store-oriented operations that are found in real programming languages, like an update of an array element or of a pointer-dereferenced location. Also, the control structures in the language W were restricted to if-then-else and while-do; the semantics did not cover programs with arbitrary control flow. In contrast, our semantics is applicable to PDGs with arbitrary (unstructured) control flow and arbitrary data read and write accesses, and more generally to PPGs as defined in this paper.

Selke presented a graph rewriting semantics for PDGs that is similar in spirit to the graph reduction model for executing functional programs. The program execution model in that work is based on rewriting rules that incrementally create a new PDG from an old PDG. This model has no explicit store. Instead, values are propagated along data flow edges into the expressions that need them. Selke's work contains a result similar to the Reordering Theorem for the graph rewriting model, which states that the parallel rewriting of a set of redexes yields the same result as any sequential rewriting of those redexes, for a deterministic PDG. However, the Reordering Theorem is applicable to both deterministic and nondeterministic PPGs. This is because nondeterminism manifests itself as a write-write or a read-write hazard in our model, which results in a $\bot$ value for the location or for the entire store.

Cartwright and Felleisen presented a demand-oriented denotational semantics for
the W language and showed how the denotational definition of a program can be used to
generate its semantic PDG. The semantic PDG is like an executable dataflow graph
that performs a demand-driven, nonstrict execution of the original program. However,
the programming languages that motivate the use of the PDG (e.g., Fortran, Pascal, C)
have strict semantics, which differs from the nonstrict PDG execution semantics assumed
in their work. Their paper contains a result stating that the execution semantics of a PDG
(either def-order or output) may be different from the semantics of sequential execution;
specifically, the PDG semantics dominates the sequential semantics. One reason for the
difference is that they assumed a strict store update function for the sequential semantics
and a nonstrict store update function for the PDG semantics. However, the Equivalence
Theorem is valid in our model because we use a strict store update function for both
sequential programs and PPGs, and also because of the semantics given to read-write and
write-write hazards.
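
The difference between the two kinds of store update function can be sketched in Python.
This is an illustrative model only: Undefined, strict_update, nonstrict_update, lookup,
and run are hypothetical names, and an exception stands in for a computation that yields
no value (for example, one that fails or does not terminate).

```python
class Undefined(Exception):
    """Stands in for a computation that yields no value (an assumption)."""


def bad():
    raise Undefined("no value")


def strict_update(store, loc, thunk):
    # Strict: the right-hand side is evaluated before the store is updated,
    # so an undefined value makes the whole update undefined.
    new = dict(store)
    new[loc] = thunk()
    return new


def nonstrict_update(store, loc, thunk):
    # Nonstrict: the unevaluated computation is stored and only run on demand.
    new = dict(store)
    new[loc] = thunk
    return new


def lookup(store, loc):
    value = store[loc]
    return value() if callable(value) else value


def run(update):
    # Program: x := <undefined>; y := 1; output y
    store = update({}, "x", bad)
    store = update(store, "y", lambda: 1)
    return lookup(store, "y")


try:
    strict_result = run(strict_update)
except Undefined:
    strict_result = None  # the strict execution produces no result

nonstrict_result = run(nonstrict_update)
print(strict_result, nonstrict_result)
```

In this sketch the nonstrict run produces a result where the strict run produces none,
which is one way a nonstrict execution can be more defined than a strict sequential one.
Using the same strict update on both sides removes this particular source of disagreement.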

The work of Selke and of Cartwright and Felleisen attempts to bridge the semantic gap
between an imperative language and a functional execution model by providing a
functional-style semantics for PDGs. This research direction has also been undertaken in
the field of where there is ongoing work in translating imperative programs to dataflow
graphs. In contrast, this paper provides an imperative-style semantics for PPGs
and PDGs, so as to represent in their full generality imperative-style parallel programming
models that allow for arbitrary unstructured control flow as well as arbitrary data read
and write accesses.

Conclusions and Future Work

In this paper, we have presented a concurrent execution semantics for Parallel Program
Graphs (PPGs), a general parallel program representation that also encompasses PDGs and
sequential programs. We believe that this semantics is natural to the programmer's way of
thinking and that it also provides a suitable execution model for efficient implementation
on real architectures. The robustness of the semantics is demonstrated by the Reordering
Theorem and the Equivalence Theorem.

We see several possible directions for future work based on the PPG representation and
the concurrent execution semantics presented in this paper:

Extend sequential program analysis techniques to the PPG

All program analysis techniques currently used in compilers (e.g., constant propagation,
induction variable analysis, computation of Static Single Assignment form, etc.)
operate on a sequential control flow graph representation of the program. There has
been some recent work in the area of program analysis in the presence of structured
parallel language constructs. However, just as the control flow graph is the
representation of choice (due to its simplicity and generality) for analyzing sequential
programs, it would be desirable to use the PPG representation (possibly restricted
to just mgoto edges) as a simple and general representation for parallel programs.

Use semantics to prove legality of program transformations

Both the PPG representation and its execution semantics presented in this paper
should lend themselves nicely to specifications and legality proofs for program
transformations. The PPG representation is at a level that is most convenient for
specifying program transformations (compared to the source-language level and the
machine-instruction level). The semantics provides a complete description of the
result of a PPG's execution and avoids the kind of ambiguity problems that arise
when the concurrent execution semantics for a parallel programming language is
partly unspecified and left open to interpretation.

Hardware support features for scheduling system

Even though the scheduling system defined earlier in this paper was presented as a
theoretical abstraction, it lends itself nicely to efficient hardware implementations. The
parallel flow of control defined by mgoto edges is similar to lightweight threads in a
multithreaded architecture; an obvious optimization is to not create a new thread for a
sequential branch, but only when control is transferred to two or more parallel branch
targets. In general, it can be expensive to implement an arbitrary synchronization
function for a PPG edge. However, in many cases the synchronization function can
be implemented by incrementing and decrementing counting-semaphore variables,
which can be performed very efficiently if hardware support is available in the form
of synchronization registers. It would also be interesting to pursue hardware
implementations for the restricted class of PPGs that contain only mgoto edges.

Develop a common compilation and execution environment for different parallel
programming languages

The PPG can be used as the basis for a common compilation and execution environment
for parallel programs written in different languages. The scheduling system
defined earlier in this paper can be developed into a PPG-based interpreter, debugger, or
runtime system for parallel programs that provides feedback to the user about
the parallel program's execution (e.g., parallelism profiles, detection of read-write
or write-write hazards, detection of deadlock, etc.). Currently, such programming
tools are being developed separately for different languages, with different ad hoc
assumptions being made about their parallel execution semantics. The PPG
representation could be useful in leading to a common environment for different parallel
programming languages.
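
The counting-semaphore implementation of a synchronization function, mentioned above
under hardware support, can be sketched in software. The Python example below is a minimal
sketch under stated assumptions: the node names a and b, the helper run_node, and the
single synchronization edge from a to b are hypothetical, and threading.Semaphore stands
in for a hardware synchronization register.

```python
import threading

log = []                     # records the order in which node bodies run
log_lock = threading.Lock()

# One counting semaphore per synchronization edge, initially 0:
# the edge a -> b means "this instance of b must wait for a".
edge_a_to_b = threading.Semaphore(0)


def run_node(name, wait_on=(), signal=()):
    for sem in wait_on:      # decrement: block until each predecessor signals
        sem.acquire()
    with log_lock:
        log.append(name)     # the node's computation would go here
    for sem in signal:       # increment: release each waiting successor
        sem.release()


# Both nodes run as parallel threads; b is started first on purpose.
tb = threading.Thread(target=run_node, args=("b",),
                      kwargs={"wait_on": [edge_a_to_b]})
ta = threading.Thread(target=run_node, args=("a",),
                      kwargs={"signal": [edge_a_to_b]})
tb.start()
ta.start()
ta.join()
tb.join()
print(log)
```

Although b is started before a, the semaphore forces the body of b to run after the body
of a, which is the ordering constraint a synchronization edge is meant to impose.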

Acknowledgments

The author would like to thank Barbara Simons for her detailed comments and suggestions,
and Jeanne Ferrante for early discussions related to this work.

References

A. V. Aho, R. Sethi, and J. D. Ullman. Compilers: Principles, Techniques, and Tools.
Addison-Wesley.

Frances Allen, Michael Burke, Philippe Charles, Ron Cytron, and Jeanne Ferrante. An
Overview of the PTRAN Analysis System for Multiprocessing. Proceedings of the ACM
International Conference on Supercomputing. Also published in The Journal of
Parallel and Distributed Computing, Oct.

Frances Allen, Michael Burke, Ron Cytron, Jeanne Ferrante, Wilson Hsieh, and Vivek Sarkar.
A Framework for Determining Useful Parallelism. Proceedings of the ACM International
Conference on Supercomputing, July.

David Alpern, Jeanne Ferrante, and Barbara Simons. A Foundation for Sequentializing
Parallel Code. Proceedings of the Second ACM Symposium on Parallel Algorithms and
Architectures, July.

William Baxter and J. R. Bauer, III. The Program Dependence Graph in Vectorization.
Sixteenth ACM Principles of Programming Languages Symposium, January,
Austin, Texas.

Micah Beck and Keshav Pingali. From Control Flow to Dataflow. Technical report,
Department of Cornell University, October. TR

Robert Cartwright and Mathias Felleisen. The Semantics of Program Dependence. SIGPLAN
Conference on Programming Language Design and Implementation, June.

Ron Cytron, Jeanne Ferrante, and Vivek Sarkar. Experiences Using Control Dependence
in PTRAN. Proceedings of the Second Workshop on Languages and Compilers for Parallel
Computing, August. In Languages and Compilers for edited by
D. Gelernter, A. Nicolau, and D. Padua, MIT Press.

J. Ferrante and M. E. Mace. On Linearizing Parallel Code. Conf. Rec. Twelfth ACM Symp.
on Principles of Programming Languages, January.

J. Ferrante, K. Ottenstein, and J. Warren. The Program Dependence Graph and its Use in
Optimization. ACM Transactions on Programming Languages and Systems,
July.

Miland Girkar and Constantine Polychronopoulos. The HTG: An Intermediate Representation
for Programs Based on Control and Data Dependences. Technical report, Center for
Supercomputing Res. and Dev., University of Illinois, May. CSRD Rpt. No.

Rajiv Gupta and Mary Lou Soffa. A Reconfigurable LIW Architecture and its
Proc. of the Intl. Conf. on Parallel Processing, Aug.

Rajiv Gupta and Mary Lou Soffa. Region Scheduling. Proc. of the Second International
Conference on Supercomputing, May.

Susan Horwitz, Jan Prins, and Thomas Reps. Integrating Non-Interfering Versions of
Programs. Conf. Rec. Fifteenth ACM Symposium on Principles of Programming Languages,
January.

L. Lamport. The Parallel Execution of DO Loops. Communications of the ACM,
February.

Samuel Midkiff, David Padua, and Ron Cytron. Compiling Programs with User Parallelism.
Proceedings of the Second Workshop on Languages and Compilers for Parallel Computing,
August.

Karl J. Ottenstein, Robert A. Ballance, and Arthur B. Maccabe. Gated Single-Assignment
Form: Dataflow Interpretation for Imperative Languages. Proceedings of the ACM SIGPLAN
Conference on Programming Language Design and Implementation, White Plains, New
York, June.

Rebecca Parsons Selke. A Rewriting Semantics for Program Dependence Graphs. Sixteenth
ACM Principles of Programming Languages Symposium, January, Austin, Texas.

Harini Srinivasan and Michael Wolfe. Analyzing Programs with Explicit Parallelism.
Proceedings of the Fourth Workshop on Languages and Compilers for Parallel Computing,
August. To be published by Springer-Verlag.

Michael J. Wolfe. Optimizing Supercompilers for Pitman, London, and The
MIT Press, Cambridge, Massachusetts. In the series Research Monographs in Parallel
and Distributed Computing. This monograph is a revised version of the author's Ph.D.
dissertation, published as Technical Report UIUCDCSR, U. Illinois at Urbana-Champaign.