The Semantics of Future and Its Use in Program Optimization

Cormac Flanagan

Department of Computer Science

Rice University

Houston Texas

Abstract pression had immediately returned Instead of a prop er

value though it returns a placeholder When a program

The future annotations of MultiLisp provide a simple

op eration requires sp ecic knowledge ab out the value

metho d for taming the implicit parallelism of functional

of some subcomputation but nds a placeholder in its

programs Past research concerning futures has fo cused

place the runtime system p erforms a touch op eration

on implementation issues In this pap er we present a se

which synchronizes the appropriate parallel threads

ries of op erational semantics for an idealized functional

Past researchonfutures has almost exclusively con

language with futures with varying degrees of inten

centrated on the ecient implementation of the under

sionality Wedevelop a setbased analysis algorithm

lying task creation mechanism and

from the most intensional semantics and use that al

on the extension of the concept to rstclass continu

gorithm to p erform touch optimization on programs

ations In contrast the driving force b ehind

Exp eriments with the Gambit indicates that

our eort is the desire to develop a semantic frame

this optimization substantially reduces program execu

work and semantically wellfounded optimizations for

tion times

languages with future The sp ecic example wecho ose

to consider is the development of an algorithm for re

moving provablyredundant touch op erations from pro

Implicit Parallelism via Annotations

grams Our primary results are a series of semantics

for a functional language with futures and a program

Programs in functional languages oer numerous opp or

analysis The rst semantics denes future to b e a

tunities for executing program comp onents in parallel

semanticallytransparent annotation The second one

In a callbyvalue language for example the evalua

validates that a future expression interpreted as pro cess

tion of every function application could spawn a par

The last one is a lowlevel rene creation is correct

allel thread for each subexpression However if sucha

ment which explicates just enough information to p er

strategy were applied indiscriminately the execution of

mit the derivation of a setbased program analysis

a program would generate far to o many parallel threads

The secondary result is a touch optimization algorithm

The overhead of managing these threads would clearly

based on the analysis with its correctness pro of The

outweigh any b enets from parallel execution

algorithm was added to the Gambit Scheme compiler

The future annotations of MultiLisp and its

and pro duced signicant sp eedups on a standard set of

Scheme successors provide a simple metho d for taming

benchmarks

the implicit parallelism of functional programs If a pro

The presentation of our results pro ceeds as follows

grammer b elieves that the parallel evaluation of some

The second section intro duces an idealized functional

expression out weighs the overhead of creating a sepa

language with futures together with its denitional

rate task he may annotate the expression with the key

sequential semantics that interprets futures as noops

word future An annotated functional program has the

The third section presents an equivalent parallel seman

same observable b ehavior as the original program but

tics for futures and the fourth section contains the low

the runtime system maycho ose to evaluate the future

level renement of that semantics The fth section dis

expression in parallel with the rest of the program If it

cusses the cost of touch op erations and presents a prov

do es the evaluation will pro ceed as if the annotated ex

ably correct algorithm for eliminating unnecessary touch

Supp orted in part by NSF grant CCR Texas ATP grant

op erations The latter is based on the setbased analy

and a sabbatical at Carnegie Mellon University

sis algorithm of the sixth section The seventh section

presents exp erimental results demonstrating the eec

tiveness of this optimization Section eight discusses re

To app ear in the nd Annual ACM SIGPLAN

SIGACT Symp osium on Principles of Programming lated work For more details we refer the interested

Languages San Francisco California January

reader to two technical rep orts on this work

If the argument is not a pair then the transition rule

M x Terms

a

pro duces the state error

j let xV M

The rule for future pretends that future is the iden

j let x future M M

j let x car y M

tity op eration It demands that the b o dy of a future

j let x cdr y M

expression is rst reduced to a value and then replaces

j let x if yM M M

the name for the future expression with this value

j let x apply yz M

The denition of the transition function relies on the

V Value c j x j x M j xy Values

notion of evaluation contextsAnevaluation context E

x Vars fxyzg Variables

c Const ftrue false g Constants

is a term with a hole in place of the next subterm to

be evaluated eg in the term let xM M the next

Figure The Anormalized Language

a

subterm to b e evaluated is M andthus the denition

of evaluation contexts includes let x E M

Amachine state is a nal state if it is a value or the

state error No transitions are p ossible from a nal

AFunctional Language with Futures

state and for any state that is not a nal state there

is a unique transition step from that state to its succes

Syntax Given the goal of developing a seman

sor state This implies that the relation eval is a total

tics that is useful for proving the soundness of opti

c

function Either the transition sequence for a program

mizations wedevelop the denitional semantics for fu

P terminates in a nal state in whichcaseeval P is

tures for an intermediate representation of an idealized

c

an answer or error or else the transition sequence is

functional language Sp ecicallywe use the subset of

innite in which case eval P Since the evalua

Anormal forms of an extended calculuslikelan

c

guage that includes conditionals and a future construct tics tor eval obviously agrees with the sequential seman

c

of the underlying functional language future is clearly

see Figure The language also includes primitives for

nothing but an annotation

list manipulation which serve to illustrate the treatment

of primitive op erations and an unsp ecied set of basic

constants numb ers b o oleans

AParallel Op erational Semantics

The key prop erty of terms in Anormal form is that

eachintermediate value is explicitly named and that

The sequential C machine denes future as an anno

the order of execution follows the lexical nesting of let

tation and ignores the intension of future as an advi

expressions The use of Anormal forms facilitates the

sory instruction concerning parallel evaluation Toun

compiletime analysis of programs and it simplies

derstand this intensional asp ect of futurewe need a

the denition of abstract machines

semantics of future that mo dels the concurrentevalua

Wework with the usual conventions and terminology

tion of future expressions

of the lamb da calculus when discussing syntactic issues

In particular the substitution op eration M x V re

The P C machine The state space of the P C

places all free o ccurrences of x within M by V X de

machine is dened in the rst part of Figure The

notes the set of closed terms of typ e X terms values

set of P C values includes the values of the sequen

and M P denotes that the term M o ccurs in the pro

tial C machine constants variables closures and pairs

gram P Also we use the following notations through

whichwe refer to as proper values To mo del futures the

out the pap er P denotes the p owerset constructor

P C machine also includes a new class of values called

f A B denotes that f is a total function from

placeholder variables A placeholder variable p repre

A to B and f A B denotes that f is a partial

p

sents the result of a computation that is in progress

function from A to B

Once the computation terminates all o ccurrences of the

placeholder are replaced by the value returned bythe

Denitional Semantics The semantics of

a

computation

is a function from closed programs to results A re

ts a single thread of Each C machine state represen

sult is either an answer whichisaclosedvalue with

control To mo del parallel threads the P C machine

all expressions replaced by procedureor error in

includes additional states of the form flet pS S

dicating that some program op eration was misapplied

The primary substate S is initially the b o dy of a fu

or if the program do es not terminate We sp ecify the

ture expression and the secondary substate S is ini

denitional semantics of the language using a sequential

tially the evaluation context surrounding the future ex

abstract machine called the C machine see Figure

pression The placeholder p represents the result of S

whose states are either closed terms over the runtime

in S The usual conventions for binding constructs like

language or else the sp ecial state error and whose

c

and let apply to flet The function FP returns the

deterministic transition rules are the typical leftmost

set of free placeholders in a state The evaluation of

outermost reductions of the calculus Each transi

S is mandatory since it is guaranteed to contribute to

tion rule also sp ecies the error semantics of a particular

the completion of the program The evaluation of S is

class of expressions For example the transition rule for

speculative since suchwork may not b e required for the

car denes that if the argumentto car is a pair then

termination of the program In particular if S raises

the transition rule extracts the rst elemen of the pair

an error signal then the evaluator discards the state

Evaluator

eval Answers ferror g

c

a

unload V if P V

c

c

error if P error

eval P

c

c

if i N M State suchthatP M and M M

c c

i i i

Data Sp ecications Unload Function

S State M j error States

Answers unload Value

c

c

c

M V Runtime Language

unload c c

c

c

j let xV M

unload x M procedure

c

j let x future M M

unload cons V V cons A A

c

   

j let x car V M

A unload V

c

i i

j let x cdr V M

j let x if VMM M

j let x apply VV M

j let xM M

V Value c j x j x M j cons VV Runtime Values

c

A Answers c j procedure j cons AA Answers

E EvalCtxt Evaluation Contexts

j let x E M

j let x future E M

Transition Rules

E let xV M E M x V bind

c

E let x future V M E M x V futureid

c

n

E M x V if V cons V V

  

E let x car V M car

c

error if V cons V V

 

E let x cdr V M analogous to car cdr

c

n

E let xM M if V false



E let x if VM M M if

c

 

E let xM M if V false



n

E let xNy V M if V y N

 

E let x apply V V M apply

c

 

error if V y N



Figure The sequential C machine

S and any eort invested in the evaluation of S is The transition rules join and joinerror merge

wasted The distinction b etween mandatory and sp ecu distinct threads of evaluation When the primary sub

lative steps is crucial for ensuring a sound denition of state S of a parallel state flet pS S returns a

an evaluator and is incorp orated into the denition of value then the rule join replaces all o ccurrences of

the transition relation the placeholder p within S bythatvalue If the pri

mary substate S evaluates to error then the rule

joinerror discards the secondary substate S and re

Transition Rules We sp ecify the transition relation

nm

turns error as the result of the parallel state

of the P C machine as a quadruple If S S

pc

lift restructures nested parallel The transition rule

holds then the index n is the numb er of steps involved

states and thus exp oses additional parallelism in certain

in the transition from S to S and the index m n is

cases Consider flet p flet p S V S The

the numb er of these steps that are mandatory

rule lift allows the value V to b e returned to the sub

The transition rules bind futureid car cdr

state S via a subsequentjoin transition without

if and apply are simply the rules of the C machine

having to wait on the termination of S

appropriately mo died to allow for placeholder variables

The rules reexive and transitive close the rela

An application of one of these rules counts as a manda



tion under reexivity and transitivityWe write S

pc

tory step

nm

S if S S for some n m N

The transition rule fork initiates parallel evalua

pc

tion This rule may b e applied whenever the current

term includes a future expression within an evaluation

Indeterminism Unlikethe C machine in whicheach

text ie con

state has a unique successor state the transition rules

of the P C machine denote a true relation In partic

E let x future N M

ular the denition do es not sp ecify when the transi

tion rule fork applies For example given the state

The future annotation allows the expression N to b e

E let x future N M the machine may pro ceed

evaluated in parallel with the enclosing context The

either byevaluating N sequentiallyorby creating a new

machine creates a new placeholder p to representthe

task via a fork transition An implementation may

result of N and initiates parallel evaluation of N and



E let xp M

A second reason for the inclusion of this rule is that it is necessary

for an elegant pro of of the consistency of the machine using a mo died

The transition rule paral lel p ermits concurrenteval

form of the diamond lemma of the lamb da calculus

uation of b oth substates of a parallel state

Evaluator

Answers ferror g eval

pc

a

unload V if P V

c

pc

error if P error

eval P

pc

pc

n m

i i

if i N S State n m N with m P S and S S

pc

i i i i i pc i

Data Sp ecications

S State M j error j flet pS S States

pc

M V j let xV M j As for

pc c

V Value c j x j x M j cons VV j p Runtime Values

pc

p PhVars fp p p g Placeholders Variables

  

Transition Rules



E M x V bind E let xV M

pc



E M x V futureid E let x future V M

pc

n

E M x V if V cons V V



  

car E let x car V M

pc

error if V cons V V V p

 



analogous to car cdr E let x cdr V M

pc

n

E let xM M if V falseV p





if if VM M M E let x

 

pc

E let xM M if V false



n

E let xNy V M if V y N



 

E let x apply V V M apply

 

pc

error if V y N V p

 



flet pN E M x p p FP E FP M fork E let x future N M

pc



flet pV S S p V join

pc



flet p error S error joinerror

pc



flet p flet p S S S flet p S flet p S S p FPS lift

           

pc

cd ab nb

n a c paral lel S S if S S S flet pS S flet pS

   

pc pc pc

   

S S reexive

pc

nm ab cd

S S if S S S S n a c m b d n transitive

pc pc pc

Figure The parallel P C machine

consequently cho ose to ignore future expressions along This evaluation diverges b ecause it exclusively con

the lines of the C machine which yields a sequential ex sists of sp eculative transition steps and do es not include

ecution to execute fork as early as p ossible whichyields any mandatory transition steps that contribute to the

an eager task creation mechanism or to cho ose sequential evaluation of the program The evaluator

some strategy in b etween the extremes which yields lazy for the P C machine excludes these excessively spec

task creation ulative transition sequences and only considers transi

A second source of indeterminism is the transition tion sequences that regularly includes mandatory steps

rule paral lel This rule do es not sp ecify the number of For a terminating transition sequence the number of

steps that parallel substates must p erform b efore they sp eculative steps p erformed is implicitly b ounded For

synchronize An implementationofthemachine can use nonterminating sequences the denition of the evalu

almost anyscheduling strategy for allo cating pro cessors ator explicitly requires the p erformance of mandatory

to tasks as long as it regularly schedules the mandatory transition steps on a regular basis This constraint im

thread plies that an implementation of the machine must keep

track of the mandatory thread and must ensure that this

mandatory thread is regularly executed

Evaluation In general the evaluation of a program

can pro ceed in many dierent directions Some of these

transition sequences may b e innite even if the program Correctness The observable behavior of the P C

terminates according to the sequential semantics Con machine on a given program is deterministic despite its

sider indeterminate internal b ehavior

Theorem eval is a function

pc

P let x future error

Weprove this consistency using a mo died form of

where is some diverging sequential term ie

pc

the traditional diamond lemma The mo died diamond

The sequential evaluator never executes b ecause

lemma states that if we reduce an initial state S bytwo

P s result is errorIncontrast P admits the following

alternative transitions pro ducing resp ectively states S

innite parallel transition sequence

and S then there is some state S that is reachable from

b oth S and S Furthermore the number of mandatory

flet p error via fork P

pc

steps on the transition from S to S via S is b ounded

flet p error since

pc pc

bythe total numb er of steps on the transition from S

to S via S and viceversa This b ound is necessary to

pc

b efore the program op eration can take place The con prove that all transition sequences for a given program

ditions precisely identify the p ositions of car cdr if exhibit the same termination b ehavior

and apply that demand prop er values and also show Since each sequential transition rule of the P C

that op erations like cons or the second p osition of ap machine subsumes the corresp onding transition rule of

ply do not need to knowanything ab out the values they the C machine every transition of the C machine is also

pro cess a transition of the P C machine which implies that the

The transition relation reformulation of the evaluators are equivalent

pcek

relation that takes into accountthechange of state

pc

nm

Theorem eval eval



pc c

representation We write S S if S S for

pcek pcek

some n m N

Put dierentlytheP C machine is a correct im

plementation of the C machine in that b oth dene the

same semantics for the source language Hence the in

Correctness The correctness pro of for the new

terpretation of future as a task creation construct with

machine involves two steps The rst step constructs an

implicit task co ordination is entirely consistent with the

intermediate semantics byintro ducing placeholder ob

denitional semantics of future as an annotation

jects into the P C machine The second step proves

CEKmachine with resp ect to the correctness of the P

the intermediate semantics using standard pro of tech

ALowLevel Op erational Semantics

niques appropriately mo died to account for parallel

evaluation

Since optimizations heavily rely on static information

ab out the values that variables can assume the P C

Theorem eval eval

pcek pc

machine is illsuited for correctness pro ofs of appropri

ate analysis algorithms On one hand the states of the

Touch Optimization

P C machine contain no binding information relating

program variables and values Instead the machine re

The P CEKmachine p erforms touch op erations on ar

lies on substitution for making progress On the other

guments in placeholderstrict p ositions of all program

hand the representation of runtime values and other

op erations These implicit touch op erations guarantee

C machine is to o coarse For example ob jects in the P

the transparency of placeholders whichmakes future

it do es not p ermit a detailed view of the synchronization

based parallelism so convenient to use Unfortunately

op erations that are required for co ordinating futures

these compilerinserted touch op erations imp ose a signif

To address these problems we rene the P C machine

icantoverhead on the execution of annotated programs

to the P CEKmachine see Figure using standard

For example an annotated doublyrecursiveversion of

techniques

b p erforms million touch op erations during the com

putation of b

The P CEKmachine An evaluation context

Due to the dynamic typing of Scheme the cost of

which represents the control stack is now represented

each touch op eration dep ends on the program op eration

as a sequence of activation records which are similar

that invoked it If a program op eration already p erforms

to closures A tagged activation record har y x M E i

atyp e dispatch to ensure that its arguments have the ap

represents a p oint where the continuation can b e split

propriate typ e eg car cdr apply etc thenatouch

into separate tasks cmp fork The substitution op era

op eration is free Put dierently an implementation of

tion is replaced byanenvironment in the usual manner

car x in pseudoco de is

An environment E is a mapping from variables to run

if pair x uncheckedcar x

time values The emptyenvironment is denoted by

error car Not a pair

and the op eration E x V extends the environment E

to map the variable x to the value V

Extending the semantics of car to p erform a touch op

During the course of the renemen t we also replace

eration on placeholders is simple

each placeholder p with an explicit undetermined place

if pair x uncheckedcar x

holder object hph p i The symbol indicates that

let y touch x

the result of the asso ciated computation is unknown

if pair y uncheckedcar y

When the asso ciated computation terminates pro ducing

error car Not a pair

avalue V then the undetermined placeholder ob ject is

The touch ing version of car incurs an additional over

replaced by the determined placeholder object hph pVi

head only in the error case or when x is a placeholder

This change of representation explicates touch op era

For the interesting case when x is a pair no overhead is

tions in the form of sideconditions on the appropriate

incurred Since the vast ma jorityofScheme op erations

transition rules The conditions state that an undeter

already p erform a typ edispatch on their arguments

mined placeholder ob ject hph p imust have b een re

the ov erhead of p erforming implicit touch op erations ap

placed by a determined placeholder ob ject hph pVi

p ears to b e acceptable at rst glance



The machine is also far to o abstract for the derivation of an



implementation This problem is also addressed by the following

Two notable exceptions are if which doesnotperformatyp e

development

dispatch on the value of the test expression and the equality predicate

eq whichistypically implemented as a p ointer comparison

Evaluator

Answers ferror g eval

pcek

a

unload E x if hP i hx E i

pcek

pcek

error if hP i error

eval P

pcek

pcek

n m

i i

if i N S State n m N with m S hP i and S S

i i i i i i

pcek

pcek

Data Sp ecicatio ns

S State hM E Kijerror j flet pS S States

pcek

M Anf Language

a

E Env Vars Value Environments

p

pcek pcek

V Value PValue j PhObj RunTime Values

pcek pcek

pcek

W PValue c j x j Cl j Pair Prop er Values

pcek pcek pcek

Cl hx M Ei Closures

pcek

Pair cons VV Pairs

pcek

hph p i j hph pVi Placeholder Ob jects PhObj

pcek

K Cont jhar x M E iK jhar y x M E iK Continuations

pcek

Auxiliary Functions

touch Value PValue fg

pcek pcek pcek

unload Value Answers

pcek

pcek

touch hph p i

pcek

unload c c

pcek

touch hph pVi touch V

pcek pcek

unload hx M Ei procedure

pcek

touch W W

pcek

unload cons V V cons unload V unload V

   

pcek pcek pcek

unload hph pVi unload V

pcek pcek

Transition Rules



hlet x cons yz M EKi hM Ex cons E y E z Ki bindcons

pcek

transition rules for let xc M let xy M andlet x y N M are similar to bindcons



hx E har y M E iK i hM E y E xKi return

pcek

n

hM Ex V Ki if touch E y cons V V



  

pcek

car hlet x car y M EKi

pcek

Pair fg error if touch E y

pcek pcek



hlet x cdr y M EKi analogous to car cdr

pcek

n

hM Ehar x M E iK i if touch E y ffalse g





pcek

hlet x if yM M M EKi if

 

pcek

hM Ehar x M E iK i if touch E y false



pcek

n

hN E x E z har x M E iK i if touch E y hx NE i



pcek

apply hlet x apply yz M EKi

pcek

error if touch E y Cl fg

pcek pcek



future hlet x future N M EKi hN E har y x M E iK i

pcek



hM E y E xKi futureid hx E har y y M E iK i

pcek



hM E K har y x N E iK i flet p hM E K i hN E x hph p iK i p FPE FPK fork

    

pcek



S S p E x join flet p hx E i

pcek

transition rules joinerror lift paral lel reexive andtransitive are as for the P C machine

Figure The P CEKmachine

The classical solution for avoiding this overhead is to Unfortunately a standard technique for increasing

provide a compiler switch that disables the automatic execution sp eed in Scheme systems is to disable typ e

insertion of touch es and a touch primitivesothatpro checking typically based on informal correctness argu

grammers can insert touch op erations explicitly where ments or based on typ e veriers for the underlying se

needed We b elieve that this solution is awed quential language When typ echecking is disabled

for several reasons First it clearly destroys the trans most program op erations do not p erform a typ edispatch

parentcharacter of future annotations Instead of an on their arguments Under these circumstances the

annotation that only aects executions on some ma source co de car x translates to the pseudoco de

chines future is now a task creation construct and touch

uncheckedcar x

is a synchronization to ol Second to use this solution

Extending the semantics of car to p erform a touch op

safely the programmer must know where placeholders

eration on placeholders is now quite exp ensive since it

can app ear instead of regular values and must add touch

then p erforms an additional checkonevery invo cation

op erations at these places in the program In contrast

to the addition of future annotations the placement

if placeholder x uncheckedcar touch x

of touch op erations is far more dicult while the for

uncheckedcar x

mer requires a prediction concerning computational in

Performing these placeholder checks can add a signif

tensity the latter demands a full understanding of the

icantoverhead to the execution time Kranz and

data ow prop erties of the program Since we b elieve

Feeley estimated this cost at nearly of the se

that an accurate prediction of data owby the program

quential execution time and our exp eriments conrm

mer is only p ossible for small programs we reject this

these results see b elow

The Touch Optimization Algorithm The goal



hlet x car y M EKi

pcek

of touch optimization is to replace the touching op era

hM Ex V Ki if E y cons V V

  

tions car cdr if and apply by the corresp onding non

unsp ecied if E y PhObj

pcek

touching op eration whenever p ossible without changing

error otherwise

the semantics of programs For example supp ose that a



program contains let x car y M andwe can prove

y M EKi hlet x cdr

pcek

that y is never b ound to a placeholder Then wecan

analogous to car

replace the expression by the form let x car y M



hlet x if yM M M EKi

which the machine can execute more eciently without

 

pcek

p erforming a test for placeholdership on y

hM E har x M E iK i if E y false



unsp ecied if E y PhObj

This optimization technique relies on a detailed data

pcek

hM E har x M E iK i otherwise



ow analysis of the program that determines a conserva

tive approximation to the set of runtime values for each



hlet x apply yz M EKi

pcek

variable More sp ecicallywe assume that the analysis

hN E x E z

if E y hx NE i

returns a valid set environment which is a table map

har x M E iK i

ping program variables to a set of runtime values that

unsp ecied if E y PhObj

pcek

subsumes the set of values asso ciated with that variable

error otherwise

during an execution

Figure Nontouch ing transition rules

Denition Set environments validity Let P be

a program and let Vars b e the set of variables o ccurring

P

in P

traditional solution

A mapping E Vars PValue isaset en

P pcek

A b etter approach than explicit touch es is for the

vironment

compiler to use information provided by a dataow anal

ysis of the program to remove unnecessary touch es wher

A set environment E is valid for P if S j E holds



ever p ossible This approach substantially reduces the

S for every S such that hP i

pcek

overhead of touch op erations without sacricing the sim

The relation S j E holds if every environmentin

plicity or transparency of future annotations

S maps every variable x to a value in E x

Nontouching Primitives The current language

do es not provide primitives that do not touch arguments

The basic idea b ehind touch optimization is noweasy

in placeholderstrict p ositions To express and verify

to explain If a valid set environmentshows that the ar

an algorithm that replaces touch ing primitives by non

gumentofa touch ing version of car cdr if or apply can

touch ing primitives we extend the language with

a

never b e a placeholder the optimization algorithm re

nontouching forms of the placeholderstrict primitive

places the op eration with its nontouch ing version The

op erations denoted car cdr if and applyrespec

optimization algorithm T is dened in Figure

tively

The function sba describ ed in the next section al

ways returns a valid set environment for a program

y M M let x car

Assuming the correctness of of setbased analysis the

j let x cdr y M

touch optimization algorithm preserves the meaning of

j let x if yM M M

programs since each transition step of a source program

j let x apply yz M

P corresp onds to a transition step of the optimized pro

As their name indicates a nontouching op eration b e

gram

haves in the same manner as the original version as long

Theorem Correctness of touch optimization

as its argument in the placeholderstrict p osition is not

For P eval P eval T P

a placeholder If the argument is a placeholder the

pcek pcek

a

sba P

b ehavior of the nontouch ing variant is undened The

Any implementation that realizes eval correctly

pcek

extended language is called

a

can therefore make use of our optimization technique

We dene the semantics of the extended language

a

by extending the P CEKmachine with the additional

transition rules describ ed in Figure The evaluator for

SetBased Analysis for Futures

the extended language eval is dened in the usual

pcek

way cmp Figure Unlike eval theevaluator

pcek

Wedevelop the analysis that pro duces valid set environ

eval is no longer a function There are programs in

pcek

ments in two steps First we use the transition rules of

for which the evaluator eval can either return a

the P CEKmachine to derive constraints on the sets

a pcek

of runtime values that variables in a program mayas

value or can b e unsp ecied b ecause of the application

sume Any set environment satisfying these constraints

of a nontouch ing op eration to a placeholder Still the

twoevaluators clearly agree on programs in

a



Or a least a representation of this set that provides the appropriate

information

Lemma For P eval P eval P

a pcek pcek

Supp ose S S via the rule apply In the

pcek

T

a a

E

interesting case y is b ound either directly or via a

T x x

E

placeholder to a closure hx NE i

T let xc M let xc T M

E E

T let xy M let xy T M

E E

T let x y N M let x y T N T M

S hlet x apply yz M EKi

E E E

T let x cons yz M let x cons yz T M

E E

S hN E x E z har x M E iK i

pcek

T let x future N M let x future T N T M

E E E

Then this rule binds x to E z To ensure that E

T let x car y M

E

n

accounts for this binding we demand that E satisfy let x car y T M if E y PValue

E pcek

let x car y T M if E y PValue

E pcek

the constraint

T let x cdr y M analogous to car

let x apply yz M P

E

V Ey V Ez

z

T let x if yM M M

 

E

n

touch V hx NEi

pcek

P

yM M T M if E y PValue let x if

 

E pcek

C

let x if yM M T M if E y PValue

V Ex

 

E pcek

z

T let x apply yz M

E

n

Supp ose S S via the rule return

let x apply yz T M if E y PValue

pcek

E pcek

let x apply yz T M if E y PValue

E pcek

S hx E har y M E iK i

Figure The touch optimization algorithm T

S hM Ey E xKi

pcek

We need to ensure that E includes E xasapos

sible value of y However when analyzing a re

isavalid set environment Second wedevelop an algo

turn instruction xwehave no information re

rithm for nding the minimal set environment satisfying

garding the p ossible activation records that may

these constraints The constraints we pro duce are simi

receivethevalue of x during an execution In

lar to those in Heintzes work on set based analysis for

contrast when analyzing an application expression

ML though our derivation of these constraints dif

let y apply fz M weknow b oth the calling

fers substantially

context and from E f the set of closures that can

p ossibly b e invoked Furthermore if hx NEi

Deriving Set Constraints We derive constraints

is the closure b eing invoked then the result of the

on valid set environments by analyzing the transition

application will b e the current binding of the vari

rules of the machine Each constraintwe pro duce is of

able FinalVar N where FinalVar is the following

the form

function from expressions to variables

A

B

FinalVar Vars

a

where A and B are statements concerning E and A also

FinalVar x x

dep ends on the program b eing analyzed A set environ

FinalVar let xV M FinalVar M

ment E satises this constraint if whenever A holds for

FinalVar let x future N M FinalVar M

E then B also holds for E

Let P b e the program of interest and supp ose that

nm

the evaluation of P involves the transition S S

To ensure that E accounts for the binding of x to

pcek

N forallclosuresthatmay the value of FinalVar

where S j E We derive constraints on E sucientto

be invoked we demand that E satisfy the constraint

ensure that S j E by case analysis on the last transition

nm

rule used for S S

pcek

let x apply yz M P

We present three representative cases

V Ey

touch V hx NEi

pcek

S via the rule bindconst Supp ose S

pcek

V EFinalVar N

N

P

C

xc M EKi S hlet

V Ex

N

S hM Ex cKi

pcek

Examining each of the transition rules of the machine

in a similar manner results in eleven programbased set

This rule binds x to the constant c where the term

P P

see Figure sucienttoen C constraints C

let xc M o ccurs in P To ensure that the set

sure that a set environmentisvalid Put dierentlyif

environment E includes c as one of the p ossible val

asetenvironment E satises the set constraints for a

ues of xwe demand that E satisfy the constraint

program P itiseasytoprove using induction on the

let xc M P

length of the transition sequence that E is valid for P

P

C

c Ex

Theorem Soundness of Constraints If E sat

P P

ises C C thenE is valid for P

let xc M P let xc M P

P

P

C C





c Ex

c E x

P

let xy M P V Ey

V E

y let xy M P

P

P

C

C





V Ex

V E x

let x y N M P

let x y N M P

P

C

x dom E Ex Ex



P

C

y N E x

P



hy N EiEx

E

let x cons y y M P y i

  i

P

let x cons y y M P V Ey i

  i i

C

P



C

cons y y E x



 

P

cons V V Ex

 

let x car y M P V E y

let x car y M P V Ey

cons z z touch E V V E z

   

P

touch V cons V V

 

pcek

P P

C

C





V Ex V E x

 

let x cdr y M P V Ey

let x cdr y M P V E y

touch V cons V V

 

pcek

P

touch E V V E z cons z z

   

P

C

P



C

V Ex





V E x



let x apply yz M P V Ey

V E y let x apply yz M P

touch V hx NEi V Ez

z

pcek

P

C

x N touch E V V E z



z

P

V Ex P

z

C



V E x

z

let x apply yz M P V Ey

touch V hx NEi

pcek

let x apply yz M P V E y

V EFinalVar N

N

P

touch E V x N

P

C

V Ex

N

V E FinalVar N

N

P

C

let x if yM M M P

 

V E x

N

V EFinalVar M EFinalVar M

 

P

C

let x if yM M M P

 

V Ex

V E FinalVar M E FinalVar M

 

P

C

let x future N M P V EFinalVar N

P

V E x

C



V Ex hph pViEx

V E FinalVar N let x future N M P

P

C

let x future N M P



P

V E x hph FinalVar N iE x

C

P



hph p iEx

let x future N M P

P

C



hph i E x

Figure Set Constraints on E with resp ect to P

Auxiliary Function touch

Solving Set Constraints The class of set en

touch AbsEnv AbsValue PAbsValue

P P P

vironments for a given program P denoted SetEnv

touch E c fc g

P

P P

forms a complete lattice under the natural p ointwise

touch E x M fx M g

P P

partial ordering Smaller set environments corresp ond

touch E cons xy fcons xy g

P P

to more accurate approximations b ecause they include

touch E hph x i

P

include fewer extraneous bindings We dene setbased

fW j U E y andW touch E U g

analysis as the function that returns the least set envi

ronment satisfying the set constraints

Figure Abstract Constraints on E with resp ect to P

Denition sba

P P

g C sba P ufE j E satises C

where the constant c the expression x M the

P P

pair cons xy and the variable x are all the resp ec

P P

Since sba P mapsvariables to innite sets of p ossi

tive subterms of P The size of this set is O jP j where

ble values we need to nd a suitable nite representa

jP j is the length of P

tion for these innite sets A systematic insp ection of

Abstract values provide nite representations for the

the set constraints suggests that the set of closures for

innite set environments encountered during setbased

a expression can b e represented by the expression

analysis Sp ecicallyanabstract set environment E is a

itself that the set of pairs for a consexpression can b e

mapping from variables in P to sets of abstract values

represented by the consexpression etc The actual sets

The class of all abstract set environments for a program

of runtime values can easily b e reconstructed from the

P is denoted AbsEnv Each abstract set environment

P

representative terms and the set environment In short

represents a particular set environment according to the

we can take the set of abstract values for a program P

following function

to b e

V AbsValue

P

c j x M j cons xy jhph x ijhph i

P P P P

Program Description

th

F AbsEnv SetEnv

fib Computes the b onacci numb er using

P P

a doublyrecursive algorithm

F E xfV j VE xg

queens Computes the numb er of solutions to the

nqueens problem for n

c E x c E x

rantree Traverses a binary with no des

mm Multiplies twoby matrices of integers

E x x M E xand hx M Ei

P

scan Computes the parallel prex sum of a vector

x dom E Ex E x

of integers

cons V V E x cons y y E xV E y

sum Uses a divideandconquer algorithm to sum

P i i

avector of integers

hph pVi E x hph y iE xandVE y

P

tridiag Solves a tridiagonal system of equations

hph p i E x hph i E x

allpairs Computes the shortest path b etween all pairs

in a no de graph using Floyds algorithm

Reformulating the set constraints from Figure for

abisort Sorts integers using adaptive bitonic sort

mst Computes the minimum spanning tree of a

abstract set environments pro duces the abstract con

no de graph

P P

straints C C in Figure Wedene sba P be

qsort Uses a parallel Quicksort algorithm to sort

the least abstract set environment satisfying the abstract

integers

poly Computes the square of a term p olynomial

constraints with resp ect to P

and evaluates the result at a given value of x

sba Denition

P P

Figure Description of the b enchmark programs

sba P ufEjE satises C C g

large subset of functional Scheme We used the ex

The corresp ondence b etween set constraints and abstract

tended Gambit compiler to test the eectiveness of touch

constraints implies that sba P is a nite representation

optimization on the suite of b enchmarks contained in

for sba P

Feeleys PhD thesis on a GP sharedmemory

multipro cessor Figure describ es these b enchmarks

sba P Theorem sba P F

Eachbenchmark was tested on the original compiler

Since AbsEnv is nite of size O jP j we can cal

P standard and on the mo died compiler touch opti

sba P in an iterative manner starting with the culate mized The results of the test runs are do cumented

in Figure The rst two columns presentthenum

empty abstract set environment E x Since wecan

ber of touch op erations p erformed during the execution

extend E at most O jP j times this algorithm termi

of a b enchmark using the standard compiler column

E with a new nates Furthermore eachtimewe extend

and the sequential execution overhead of these touch

binding calculating the additional bindings implied by

op erations column To determine the absolute over

that new binding takes at most O jP j time Hence the

head of touch we also ran the programs on a single

algorithm runs in O jP j time

pro cessor after removing all touch op erations The next

Optimization algorithms can interpret the abstract

two columns contain the corresp onding measurements

sba P in a straightforward manner set environment

for the touch optimizing compiler The touch optimiza

For example the query on sba P from the touch op

tion algorithm reduces the number of touch op erations

timization algorithm

to a small fraction of the original numb er column

thus reducing the average overhead of touch op erations

sba P y PValue

pcek

from approximately to less than column

The last three columns show the relative sp eedup of

is equivalent to the following query on sba P

each b enchmark for one four and pro cessor cong

urations resp ectively The numb er compares the run

sba P y fc x M cons xy g

P P P

ning time of the b enchmarks using the standard compiler

In a similar manner other queries on sba P can easily

with the optimizing compiler As exp ected the relative

sp eedup decreases as the numb er of pro cessors increases

b e reformulated in terms of sba P

b ecause the execution time is then dominated byother

factors such as memory contention and communication

Exp erimental Results

costs For most b enchmarks the b enet of our touch

optimization is still substantial pro ducing an average

We extended the Gambit compiler whichmakes

sp eedup over the standard compiler of on four pro

no attempt to remove touch op erations from programs

cessors and of on pro cessors The exceptions are

with a prepro cessor that implements the setbased anal

the last three b enchmarks mst qsort and polyHow

ysis algorithm and the touch optimization algorithm



Five of the b enchmarks include a small numb er one or twoper

The analysis and the optimization algorithm are as de

b enchmark of explicit touch op erations for co ordinating sideeects

scrib ed in the previous sections extended to a suciently

They do not aect the validity of the analysis and touch optimization

algorithms

standard touch optimized

Benchmark touch es n touch es n sp eedup over standard

countK overhead countK overhead n n n

fib

queens

rantree

mm

scan

sum

tridiag

allpairs

abisort

mst

qsort

poly

Figure Benchmark Results

ers he derived from the semantic mappings ever even Feeley describ ed these as p o orly parallel

that translate syntax into calculus expressions The programs in which the eects of memory contention and

extension of this work to parallel compilers starts from communication costs are esp ecially visible It is there

a semantic mapping that translates a Schemelikelan fore not surprising that our optimizing compiler do es not

guage with pro cess creation and communication con improve the running time in these cases

structs into a higherorder calculus of communication

and computation After separating the compiler from

Related Work

the machine the correctness pro of is a combination of

the sequential correctness pro of and a correctness pro of

The literature on programming languages contains a num

for the parallel p ortion of the language The pro of tech

b er of semantics for parallel Schemelike languages The

niques are related to the ones we used to prove the equiv

only one that directly deals with parallelism based on

alence of the PCEKmachine and the PCmachine

transparent annotations is Moreaus PhD thesis

Kranz et al briey describ e a simplistic algo

Moreau studies the functional core of Scheme extended

rithm for touch optimization based on a rstorder typ e

with p call for evaluating function and argumentex

analysis The algorithm lowers the touch overhead to

pressions of an application in parallel and rstclass

from in standard b enchmarks that is it is

continuations His primary goal is to design a semantics

signicantly less eective than our touch optimization

for the language that treats p call as a pure annotation

The pap er do es not address the semantics of future or

His correctness pro ofs is far more complicated than our

the wellfoundedness of the optimizations Knopp

techniques due to the inclusion of continuations

rep orts the existence of a touch optimization algorithm

Indep endentlyReppy and Leroy dene a

based on abstract interpretation His pap er presents nei

formal op erational semantics for an MLlike language

ther a semantics nor the abstract interpretation He

with rstclass synchronization op erations Reppys lan

only rep orts the reduction of static counts of touch op

guage Concurrent ML can provide the future mecha

erations for an implementation of with

nism as an abstraction over the given primitives The

future Neither pap er gives an indication concerning

semantics is a twolevel rewriting system Reppy uses

the exp ense of the analysis algorithms

his semantics to proveatyp e soundness theorem Leroy

Our analysis metho ds most closely follows Heintzes

formulates a semantics for a subset of CML in the tradi

work on setbased analysis for the sequential language

tional natural semantics framework He also uses his

ML but the extension of his technique to paral

semantics to prove the typ e soundness of the complete

lel languages requires a substantial reformulation of the

language Neither Reppy nor Leroy used their semantics

derivation and correctness pro of Sp ecicallyHeintze

for developing analyses or optimizations

uses the natural semantics framework to dene a set

Jaganathan and Weeks dene an op erational se

based natural semantics from which he reads o safe

mantics for a simple function language extended with

ness conditions on set environments He then presents

the spawn construct by extending Deutschs transition

set constraints whose solution is the minimal safe set

semantics They describ ed an analysis for their lan

environment We start from a parallel abstract machine

guage that they intend to use in a forthcoming compiler

and avoid these intermediate steps by deriving our set

but they do not have an implementation of their analysis

constraints and proving their correctness directly from

for a full language like functional Scheme and they do

the abstract machine semantics

not have optimization algorithms that exploit the results

Other techniques for static analysis of sequential pro

of their analysis

grams include abstract interpretation and Shivers

Wand recently extended his work on correctness

CFA The relationship b etween abstract interpre

pro ofs for sequential compilers to parallel languages In

tation and setbased analysis was covered by Heintze

his prior work on the correctness of sequential compil

Flanagan C and Felleisen M The semantics of Sequential optimization techniques such as tagging

Future Rice University Comp Sci TR

optimization and softtyping are similar in char

Flanagan C and Felleisen M Wellfounded

acter to touch optimization Both techniques remove

touch optimization of Parallel Scheme Rice University

the typ edispatches required for dynamic typ echecking

Comp Sci TR

wherever p ossible without changing the b ehavior of pro

Flanagan C Sabry A Duba B F and

grams in the same fashion as weremove touch op era

Felleisen M The essence of compiling with continu

tions However the analyses relies on conventional typ e

ations In PLDI

inference techniques

Halstead R Multilisp A language for concurrent

symb olic computataion ACM Transactions on Pro

gramming Languages and Systems

Conclusion

Heintze N Set BasedProgram A nalysis PhD thesis

Carnegie Mellon University

The development of a semantics for futures directly

Heintze N Setbased analysis of ML programs In

leads to the derivation of a p owerful program analy

LFP

sis The analysis is computationally inexp ensivebut

Henglein F Global tagging optimization bytyp e in

yields enough information to eliminate numerous im

ference In LFP

plicit touch op erations We b elieve that the construc

Ito T and Halstead R Eds Paral lel Lisp Lan

tion of this simple touch optimization algorithm clearly

guages and Systems SpringerVerlag Lecture Notes in

illustrates howsemantics can contribute to the devel

Computer Science 

opmentofadvanced compilers Weintend to use our

Ito T and Matsui M A parallel lisp language

semantic characterization for the derivation of other pro

Pailisp and its kernel sp ecication

gram optimizations in Gambit and for the design of truly

Jagannathan S and Weeks S Analyzing stores

transparent future annotations for languages with im

and references in a parallel symb olic language In LFP

p erative constructs

Katz M and Weise D Continuing into the fu

ture on the interaction of futures and rstclass contin

Acknowledgments We thank Marc Feeley for discus

uations In LFP

sions concerning touch optimizations and for his assis

Kessler RR and R Swanson Concurrentscheme

tance in testing the eectiveness of our algorithm and

Nevin Heintze for discussions on setbased analysis and

Knopp J Improving the p erformance of parallel lisp

for access to his implementation of set based analysis for

by compile time analysis

ML

Kranz D Halstead R and Mohr E MulT A

highp erformance parallel lisp

References

Kranz D Halstead R and Mohr E MulT A

highp erformance parallel lisp In PLDI

Baker H and Hewitt C The incremental garbage

Leroy X Typage polymorphe dun langage algorith

collection of pro cesses In Proceedings of the Symposium

aris mique PhD thesis UniversiteP

on Articial Intel ligence and Programming Languages

Miller J MultiScheme A Paral lel Processing System

vol

PhD thesis MIT

BBN Advanced Computers Inc Cambridge MA

Inside the GP Mohr E Kranz R and Halstead R Lazy task

creation A technique for increasing the granularityof

Cousot P and Cousot R Abstract interpretation

parallel programs In LFP

A unied lattice mo del for static analyses of programs

by consruction or approximation of xp oints In POPL Moreau L Sound Evaluation of Paral lel Functional

Programs with FirstClass Continuations PhD thesis

Universite de Liege

Cousot P and Cousot R Higer order abstract

interpretation and application to comp ortment analysis Reppy JH HigherOrder Concurrency PhD thesis

generalizing strictness termination pro jection and p er Cornell University Jan

analysis of functional languages ICCL

Sabry A and Felleisen M Is continuationp assi ng

Deutsch A Modeles Operationnels de Language useful for data ow analysis In PLDI

de Programmation et RepresentationsdeRelations

Shivers O Controlow Analysis of HigherOrder

sue des Languages Rationnels avec Application a la

Languages or Taming Lambda PhD thesis Carnegie

Determination Statique de Proprietes de Partages Dy

Mellon University

namiques de Donnees PhD thesis Universite Paris VI

Swanson M Kessler R and Lindstrom G An

implementation of p ortable standard lisp on the BBN

Feeley M An Ecient and General Implementation

butteryInLFP

of Futures on Large Sc ale SharedMemory Multiproces

Wand M Compiler correctness for parallel languages

sors PhD thesis Department of Computer Science

Unpublished manuscript

Brandeis University

Wright A and R Cartwright A practical soft

Feeley M and Miller J S A parallel virtual ma

typ e system for scheme In LFP

chine for ecientscheme compilation In LFP

Felleisen M and Friedman D P Control op er

ators the SECDmachine and the lamb dacalculu s In

rd Working ConferenceontheFormal Description of

Programming Concepts Aug