How Much Nonstrictness do Lenient Programs Require

Klaus E Schauser Seth C Goldstein

Department of Computer Science Computer Science Division

University of California Santa Barbara University of California Berkeley

Santa Barbara CA Berkeley CA

schausercsucsbedu sethgcsb erkeleyedu

contribute to the result thus decreasing parallelism How Abstract

ever nonstrictness can also b e combined with eager evalu

Lenient languages such as Id have b een touted as among

ation This combination called lenient evaluation Tra

the b est functional languages for massively parallel machines

exhibits more parallelism than while retain

AHN Lenient evaluation combines nonstrict semantics

ing much of the latters expressive p ower

with eager evaluation Tra Nonstrictness gives these

Unfortunately nonstrictness comes at a high implemen

languages more expressive p ower than strict semantics while

tation cost since it requires negrained dynamic schedul

eager evaluation ensures the highest degree of parallelism

ing and synchronization Nonstrict semantics can make it

Unfortunately nonstrictness incurs a large overhead as it

imp ossible to statically determine the order in which ar

requires dynamic scheduling and synchronization As a re

guments are evaluated and op erations execute Expressions

sult many p owerful program analysis techniques have b een

can b e evaluated only after all of the arguments they dep end

develop ed to statically determine when nonstrictness is not

on are available This dynamic scheduling is exp ensive on

required CPJ Tra Sch

commo dity micropro cessors which eciently supp ort only

This pap er studies a large set of lenient programs and

a single thread of control and incur a high cost for context

quanties the degree of nonstrictness they require We

switching

identify several forms of nonstrictness including functional

Much research has b een devoted to analysis techniques

conditional and data structure nonstrictness Surprisingly

that determine when the full generality of nonstrictness

most Id programs require neither functional nor condi

is not required These techniques include strictness anal

tional nonstrictness Many b enchmark programs however

ysis Pey backwards analysis Hugb path analysis

make use of a limited form of data structure nonstrictness

BH and partitioning Tra SCvE HDGS For

The pap er refutes the myth that lenient programs require

lenient languages the compilation approach is to schedule

extensive nonstrictness

instructions statically into sequential threads and have dy

namic scheduling only b etween threads The task of identi

fying p ortions of the program that can b e scheduled stati

Intro duction

cally and ordered into threads is called partitioning Tra

Two op erations can b e placed into the same thread only

Nonstrict functional languages have b een widely regarded

if the compiler can statically determine the order in which

as an attractive basis for parallel computing Unlike sequen

they execute

tial languages they exp ose all the parallelism in a program

In Sch we presented a new thread partitioning al

In sideeectfree functional languages the arguments to a

gorithm separation constraint partitioning which improves

call can b e evaluated in parallel Further nonstrict

substantially on previous work Tra HDGS SCvE

execution can substantially enhance parallelism since func

To evaluate our partitioning algorithm we compared b ench

tions can start executing b efore all of their arguments have

mark programs compiled using our algorithm and using strict

b een provided In a language with nonstrict semantics it

partitioning Strict partitioning ignores p ossible nonstrict

is p ossible to dene functions which return a result even if

ness and compiles functions and conditionals strictly thus

one of the arguments diverges Among other things this

representing the b est p ossible partitioning Obviously strict

makes it p ossible to create recursively dened and cyclic

partitioning pro duces an incorrect partitioning ie leading

data structures since p ortions of the result of a function

to a deadlo ck for programs which require nonstrictness

call can b e used as arguments to that call Nonstrictness

Surprisingly almost all of our b enchmark programs still ran

increases the expressiveness of the language but requires a

correctly an indication that nonstrictness is rarely used in

more exible strategy than strict evaluation which evalu

lenient programs

ates all arguments b efore calling a function

This observation led us to pursue a more detailed study

Nonstrictness is usually combined with lazy evaluation

of how much nonstrictness lenient programs use and to

which delays evaluation of an expression until it is known to

fo cus on a larger set of programs In this pap er we eval

uate the strictness prop erties of a large collection of pro

grams written in Id a nonstrict functional language that

To app ear in th Annual ACM SIGPLANSIGARCH

uses lenient evaluation We dene three catagories of non

Conference on Languages and

strictness functional conditional and datastructure non

Computer Architecture FPCA

strictness We nd that functional and conditional non

strictness are rarely used in lenient programs On the other

hand data structure nonstrictness is used extensively We

more expressive power

catagorize datastructure nonstrictness into circu

further M-structures

recursive and dynamically scheduled nonstrictness and

lar I-structures

e see that the exp ensive dynamically scheduled nonstrict

w determinate

is seldom required ness Functional core

referentially transparent

this study our fo cus is on whether nonstrictness is

In determinate

required and if it is what kind is required for executing

simpler semantics

lenient programs correctly Therefore the study is not af

fected by whether the programs are executed in parallel or

sequentially We chose to study the sequential execution

Figure The three layers of Id from Nik

b ehavior However evaluating functions even strict func

tions nonstrictly can increase parallelism We nd that

nonstrictness is rarely used suggesting that future work on

elements Like functional arrays Istructures provide e

lenient languages should fo cus less on techniques for deter

cient indexed access to the individual elements In addi

mining when nonstrictness is not used and more on when

tion Istructures provide elementbyelement synchroniza

strict functions and conditionals should b e evaluated non

tion b etween the pro ducer and consumer Istructures are

strictly

not purely functional since they lose referential transpar

The remainder of the pap er is structured as follows Sec

ency Any function obtaining a p ointer to an Istructure

tion summarizes the dierent evaluation strategies for func

may store into it On the other hand determinacy is pre

tional languages strict lenient and lazy and previous ar

served b ecause each lo cation can b e stored into at most once

guments for lenient evaluation as the most suitable way to

Nondeterministic computation can b e expressed with M

write programs for massively parallel machines Section

structures mutable data structures that provide supp ort for

identies and illustrates three kinds of nonstrictness con

atomic up dates

ditional functional and data structure nonstrictness Sec

tion describ es the metho dology and the b enchmark pro

Evaluation Strategies

grams which were used for this study and presents the re

The two most widely used evaluation strategies for func

sults which show the typ e of nonstrictness each program

tional languages are strict and lazy evaluation We com

requires Section discusses related work and Section

pare them to lenient evaluation the strategy used in Id

suggests directions for future work

The three strategies dier in how they handle the evalua

tion of typ e constructors and function calls As presented

Lenient Evaluation

in Tra to evaluate a function call f e e

1 n

This section summarizes dierent evaluation strategies for

Strict evaluation rst evaluates all of the argument ex

functional languages strict lenient and lazy The three

pressions e to e and then evaluates the function

1 n

evaluation strategies dier in their expressiveness paral

b o dy passing in the evaluated arguments

lelism and eciency Many arguments have b een made

Lenient evaluation starts the evaluation of the function

in favor of lenient evaluation as emb o died in the language

1

b o dy f in parallel with the evaluation of all the argu

Id as the b est choice for massively parallel machines

ment expressions e to e evaluating each only as far

1 n

AHN Nik Being nonstrict lenient evaluation retains

as the data dep endencies p ermit

much of the expressive p ower of lazy evaluation at a much

lower overhead while exhibiting more parallelism than b oth

Lazy evaluation starts the evaluation of the function b o dy

strict and lazy evaluation Tra

passing the arguments in an unevaluated form

The evaluation rules for constants arithmetic primitives

The Lenient Language Id

and conditionals are the same for the three strategies The

Id is a functional language augmented with synchronizing

evaluation of a constant yields that constant Arithmetic

data structures It consists of three layers as shown in Fig

op erators are strict in their arguments so they evaluate the

ure The functional core has all of the prop erties of purely

arguments b efore p erforming the op eration The evaluation

functional languages including referential transparency and

of a conditional if e then e else e rst evaluates the pred

1 2 3

determinacy Id provides array comprehensions for e

icate e dep ending on the result either e or e is then

1 2 3

ciently expressing scientic computation requiring arrays or

evaluated The three evaluation strategies treat the condi

other large data structures These are purely functional con

tional conservatively they do not start evaluating e and e

2 3

structs and dene an array by dening each element With

sp eculatively b efore knowing the value of the predicate

array comprehensions elements of the array can b e dened

We see that under strict evaluation it is imp ossible to

in terms of other elements One limitation is that in or

call a function with only some of the arguments Thus one

der to b e purely functional the elements of the array have

cannot pass the result or part of it back into the function

to b e dened completely within the array comprehension

On the other hand this is p ossible under b oth lenient and

This ensures that referential transparency is preserved Be

lazy evaluation b oth of which implement nonstrict seman

cause this is not sucient for certain problems ANP

tics Lazy evaluation ensures that only expressions which

Id also provides two nonfunctional data structures I

contribute to the nal answer are evaluated the evaluation

structures ANP and Mstructures BNA

is demanddriven In contrast strict and lenient evaluation

Istructures are writeonce data structures which sepa

evaluate all expressions with the exception of those inside

rate the creation of the structure from the denition of its

the arms of conditionals they use eager evaluation which is

1

datadriven in the sense that an expression can b e evaluated

Parallel Haskell pH an ongoing development which integrates

as so on as its data inputs are available

many concepts of Id into Haskell may also b e based on lenient

evaluation

As Table shows the three evaluation strategies dier

substantially in their expressiveness the amount of paral

lelism they exp ose the implementation overhead and the

little parallelism The right subtree is completely attened amount of sp eculative computation These dierences are

b efore starting with the left subtree And this holds for discussed in the following subsections

every level so the tree is attened sequentially

Under lenient execution the attening of the two sub

Expressiveness strict lenient lazy

trees can b e pip elined Flattening of the right subtree can

Parallelism lazy  strict  lenient

b e started simultaneously with the attening of the left sub

Overhead strict  lenient  lazy

tree The resulting list for the right subtree is fed into the

Sp eculative comp lazy  lenient strict

function working on the left subtree In eect the entire list

is constructed in parallel This example shows that overlap

ping the consumer and pro ducer can yield more than just a

Table Comparison of the three evaluation strategies

2

constant factor increase in parallelism Nik

Lazy evaluation attens the subtrees in the opp osite or

der from strict evaluation Lazy evaluation rst attens the

3

left subtree and then starts working on the right subtree

Expressiveness

Therefore lazy evaluation also do es not exhibit any paral

As shown in Figure the expressiveness of the evalua

lelism

tion strategies forms a threelevel hierarchy Lazy evalua

tion results in more expressiveness than lenient evaluation

Evaluation Overhead

which in turn has more expressiveness than strict evalua

tion By more expressiveness we informally mean that

As should already b e clear the evaluation strategies dier

the translation of a program which uses nonstrictness into

substantially in their overhead The biggest dierences are

a strict equivalent may require a global reorganization of

in the costs for argument passing and dynamic scheduling

the entire program a more formal denition of expressive

Strict evaluation has the lowest overhead All arguments

ness is given in Fel Nonstrictness as present in lenient

can b e evaluated b efore calling a function and therefore can

and lazy evaluation signicantly increases expressiveness

b e passed by value Furthermore it is p ossible to stati

The programmer can create circular data structures and de

cally schedule all of the computation inside a function and

ne data structures recursively in terms of their own ele

pro duce ecient sequential co de The compilation of strict

ments In Section we present several of such examples

functional languages is therefore similar to well understo o d

4

Under the purely functional setting nonstrictness results

implementations of sequential languages Lazy evaluation

in more ecient programsin b oth space and time require

has a high overhead b ecause it requires arguments to b e

ments Huga Bir Joh In addition lazy evaluation

passed in unevaluated form unless it can b e shown that

provides the control structure to handle innite data struc

they contribute to the nal result The overhead of lenient

tures as long as only a nite part is accessed Using explicit

evaluation falls b etween that of lazy and strict evaluation

delays programmers can obtain the same b enet under le

Tra

nient evaluation Hel

Sp eculative Computation

Lazy evaluation more expressiveness

non-strictness

evaluation is the only strategy which completely avoids

infinite structures Lazy

e computation it evaluates only expressions which Lenient evaluation sp eculativ

non-strictness

tribute to the nal result The other two schemes do

circular structures con

computation eagerly thereby putting the burden on the pro

Strict evaluation

grammer to avoid innite or sp eculative computation

Summary

Figure Expressiveness of the three evaluation strategies

We have seen that b oth lenient and lazy evaluation real

ize nonstrictness ie they may start the evaluation of a

function b o dy b efore evaluating the arguments When non

Parallelism

strictness is combined with eager evaluation as in lenient

evaluation parallelism is increased substantially b ecause

The can have a strong impact on the

all arguments can b e evaluated in parallel with the func

amount of parallelism HA Lenient evaluation results

tion b o dy This is why many researchers have claimed that

in the largest amount of parallelism while lazy evaluation

lenient evaluation is b est for parallel implementation of func

shows the least amount of parallelism Tre Lazy evalua

tional languages

tion needs to show that an argument is actually required for

2

the nal answer b efore starting its evaluation while strict

Simulations for the dataow architecture TTDA show that with

lenient evaluation the critical path on a full binary tree of depth

and lenient evaluation can always evaluate indep endent ar

consists of time steps with a maximum parallelism of and

guments in parallel In addition lenient evaluation can ex

an average parallelism of instructions assuming the resources are

ploit pro ducerconsumer parallelism Consider the function

available If executed strictly the critical path would grow to

flat which pro duces a list of the leaves of a binary tree

time steps with a maximum parallelism of instructions

3

using accumulation lists Nik

Of course lazy evaluation would only pro duce as much of the

attened list as is actually required

4

def flat tree acc

Additional issues arise due to the use of higherorder functions

if leaf tree then cons tree acc partial applications parallelism and single assignment limitations

else flat left tree flat right tree acc

This example do es not require nonstrictness therefore

it can b e executed with any of the three evaluation strate

gies However under strict evaluation this co de exhibits

b gets the value of z y the value z  z and a the same value Forms of Nonstrictness

2

as y x and the result c b ecome z  z z  z z In this

In this section we identify dierent forms of nonstrictness

case the multiplication is executed b efore the addition If

and present examples to illustrate them The present work

the predicate is false the variables are evaluated in a dier

fo cuses on programs which execute under lenient evaluation

ent order First the variable a is b ound to the argument z

and do es not consider programs requiring lazy evaluation

then x and b evaluate to z z and nally y and the result

2

ie those that manipulate p otentially innite data struc

c evaluate to z z  z z z Now the addition

tures Thus we limit our discussion of the various forms of

is p erformed b efore the multiplication Again we see that

nonstrictness to lenient evaluation

b oth the addition and the multiplication have to b e sched

uled dynamically Though the op erations app ear outside of

the scop e of the conditional the conditional aects the order

Functional Nonstrictness

in which the values a b and c are available

Functional nonstrictness arises from feedback dep endencies

It may seem that in this example unlike the previous

from the results of a function invo cation to its arguments

one the scheduling is at least datadep endent since it is

To illustrate this form of nonstrictness and the need for dy

inuenced by the conditional and therefore dep ends on the

namic scheduling we use the following simple but somewhat

value of the predicate While this observation is correct we

5

contrived computation SCG

can obtain precisely the same b ehavior without conditionals

as shown in the next example

def two x y xx yy

def g z ab two z a in b

def f x y z yzx

def h z ab two b z in a

def f x y z zxy

def kt func z abc func x y z

In this example the function two takes two arguments

x aa

x and y and returns two results x  x and y y Inside the

y bb

function two there is no dep endence b etween the multiplica

in c

def g z kt f z

tion and addition thus co de to evaluate the two halves of the

def h z kt f z

pair can b e put in either order when compiling the function

under traditional strict evaluation This is not true under

Here the conditional is replaced with a call to a function

nonstrict evaluation In our example the function two is

taking three arguments and returning three results The ar

used in two dierent contexts which require nonstrictness

gument func determines which function is called it is f in

In the function g the argument z is given as the rst ar

the case of g and f in the case of h These two functions f

gument to the function two while the second argument to

and f do not p erform any computation they just shue

two is taken from its rst result This requires that two

the results around and thereby aect the order in which the

rst compute z  z return this result and then compute

2

addition and multiplication in the caller get executed This

z  z z  z z We see that in this case the multipli

example illustrates that in addition to the caller aecting

cation is executed b efore the addition In the function h just

the order in which op erations get executed in the callee the

the opp osite o ccurs The second result of the function two

callee can also aect the order in the caller In general it

is fed back in as the rst argument Here z z is computed

2

is the whole contextie the whole call treein which a

rst and then z z  z z z Now the addition

function app ears which determines the order This exam

is executed b efore the multiplication Thus the multiplica

ple also shows the duality b etween conditionals and function

tion and the addition have to b e scheduled indep endently

calls made through function variables A conditional can b e

Notice that the scheduling is indep endent of the data val

viewed as a call site where one of two functions is called

ues for the arguments it dep ends only on the context in

dep ending on the predicate AA These two functions

which the function is used and how results are fed back in

contain the co de of the then side and else side of the condi

as arguments

tional resp ectively

Conditional Nonstrictness

Data Structure Nonstrictness

Nonstrictness and the requirement for dynamic scheduling

In nonstrict languages data structure constructors exhibit

not only o ccur across function calls but can also app ear

the same form of nonstrictness as function calls ie the

within conditionals The following example taken from

result may b e required b efore all the elements are dened

Tra illustrates this

This gives the programmer the ability to dene circular data

structures or recursively dene some elements in terms of

def kt p z abc if p then yzx else zxy

x aa

other elements For example the recursive binding a

y bb

cons a denotes a simple cyclic list and a cons

in c

hd a denotes an tuple containing the same element

def g z kt true z

This p ower of nonstrictness also extends to list compre

def h z kt false z

hensions array comprehensions and I and Mstructures

AHN ANP BNA

In this example a single conditional steers the evaluation

We can dene the following hierarchy of nonstrictness

of three variables a b and c If the predicate is true then

in data structures

5

In all of the examples we present the dierent denitions of the

2

Functionally strict All elements must b e evaluated b e

function g always p erform the same computation z  z z  z z

Similarly the dierent denitions of the function h always compute

fore the data structure is created

2

z z  z z z In all cases a single addition and single

multiplication are executed in the functions g the multiplication o c

Circular A p ointer to a data structure may b e stored into

curs b efore the addition while in the functions h they execute in the

one of its elements

reverse order

Recursive An element of a data structure may b e dened the store to the fetch as indicated by the dashed line

in terms of other elements For this case we can distin and the multiplication has to b e executed b efore the ad

guish two sub cases which describ e the schedule used dition Note that this dep endence is not directly present

to ll in the elements in the function but is established through the synchroniz

ing Istructure Should the fetch o ccur b efore the store it

Static A static schedule can b e found for the pro

would get deferred until the store happ ens In the case where

gram which lls in the elements in the right order

l m the op erations would execute in the reverse order

Obviously the conditions k l and l m should not hold

Dynamic A dynamic schedule is required This im

together since deadlo ck would result

plies that presence bits are needed as fetches from

In general it is not known which function lls in an I

elements may defer

structure the consumer of an Istructure element cannot

name its pro ducer Therefore if a fetch defers it is necessary

The two small examples ab ove using cons can b e classi

to continue with the evaluation of any expression which do es

ed as circular and recursive resp ectively Other frequently

not dep end on the fetch even if the computation is in a

cited examples using recursively dened data structures are

dierent function activation This example illustrates that

wavefront ANP and the following function computing an

dep endencies have to unravel at run time and that they may

array of the rst n Fib onacci numb ers

travel not only through arguments results and internal call

sites but also through Istructure accesses

def fibarray n array n of

Exp erimental Results

i AiAi i to n

In this section we assess the degree of nonstrictness required

Both examples can b e scheduled statically ie a static

by a large set of Id programs Most of these programs were

scheduling of the program can b e found which ob eys the

implemented by other researchers

data dep endencies among the array elements lefttoright

in the case of b array and lefttoright toptob ottom in

Metho dology

the case of wavefront With such a schedule it is p ossible

to implement the arrays without any presence bits just like

To determine how much functional or conditional nonstrict

plain arrays in standard imp erative languages This may

ness our programs require we extend the compiler with

not always b e the case as the following example illustrates

options to treat function calls and conditionals in a strict

fashion Under this mo de a function is invoked only after

def dyn A k l m n Ak Am Am

all of its arguments have b een evaluated and results are

Al An An

returned only after they have b een completely computed

def g z A Iarray

Similarly we can also compile conditionals so that they are

A z

dyn A

evaluated only after the predicate and all inputs are avail

in A

able and return only after all results have b een pro duced

def h z A Iarray

In the case where a program requires functional or condi

A z

tional nonstrictness this compilation scheme leads either

dyn A

to a static dep endence cycle which the compiler detects or

in A

to deadlo ck at runtime In either case we know that the

program requires nonstrictness

This example shows that dynamic scheduling may b e

To check for data structure nonstrictness we extend the

required for nonstrict Istructures The function dyn takes

runtime system to handle synchronizing data structures

ve argumentsan Istructure A and four indices k l m

dierently These variations in the runtime system emu

and n It fetches an element from Am multiplies this

late the dierent degrees of data structure nonstrictness

with itself and stores the result into lo cation Ak The

For example to check whether elements are recursively de

function also fetches from lo cation An adds this element

ned we make structures fully strict in all elements A fetch

to itself and stores it into lo cation Al Figure shows

from a structure is deferred until all structure elements have

the dep endencies for the b o dy of function dyn and what

b een lled in The runtime system rep orts if the program

happ ens for the two cases where k n and l m

deadlo cks which is an indication of recursive dep endencies

among elements of the same structure Similarly to check

a program can execute using a static schedule we

A[m] A[n] A[m] A[n] A[m] A[n] whether

test whether any deferred fetches o ccur If no fetches defer

this is an indication that the current schedule lls in the

elements in the right order Obviously this approach

* + * + * + data

would determine only that a static schedule exists for the

current input of the program under the current partitioning

execution order To determine that this is also the case

:= A[k] := A[l] := A[k] := A[l] := A[k] := A[l] and

for any input we study the program in detail to ensure that

dep endencies are not related to problem size and other

k =n l = m data

input parameters for example wavefront or make sure

Figure The dep endencies for function dyn and two p os

that all p ossible data dep endencies are generated for ex

sible scenarios k n and l m

ample dyn As it turns out most of the original programs

execute with defers b ecause they are inherently parallel and

the compiler and runtime system just happ en to schedule a

If k n as is the case when dyn is called from the func

pro ducer and consumer in the wrong order We solved this

tion g then the store into lo cation Ak denes the value

which is fetched from An So there is a dep endence from

by adding explicit sequentialization statements barriers to

the program to ensure the correct order of execution

Program Lines Short Description and Reference

Symb olic computation

Compilation Framework

Parans Enumerate isomers of parans

AHN

The programs were all compiled using the Id compiler

Primes lter Generate primes by ltering Hel

with the Berkeley TAM backend The compiler uses a

Primes sieve Primes using sieve of Eratosthenes

frontend develop ed at MIT Tra which pro duces dataow

Tre

graphs The backend partitions these dataow graphs into

Compresspath Recursive doubling algorithm

threads and then generates co de for TAM a Threaded Ab

SPR

stract Machine CGSvE The TAM co de is then trans

NQueens Nqueens problem using circular

lated to the target machine Our translation path uses C

lists All

NQ Solutions Numb er of Nqueens solutions

as a p ortable intermediate form and pro duces co de for

All

the CM as well as for various standard sequential ma

Permutations Generate p ermutations of sets of

chines Gol For the exp eriments p erformed in this pap er

elements using circular structure

we limit ourselves to sequential execution on a SparcSta

All

tion As we have explained the compiler has b een ex

Id Compiler Id in Id compiler

tended with options to treat function calls and conditionals

Id Interpreter Top level Id interpreter

strictly as well as with a runtime system to test for the

+

Quicksort Quick sort on lists CSS

+

dierent degrees of nonstrictness in data structures

Arraysort Selection sort on arrays CSS

Bitonicsort Bitonic sort Mstructures SBc

Bubblesort Bubble sort Mstructures SBc

Programs

Heapsort IS Heap sort Istructures SBc

Heapsort MS Heap sort Mstructures SBc We used the b enchmark programs shown in Table The

Quicksort IS Quick sort Istructures SBc

programs vary substantially in size and application area as

Quicksort MS Quick sort Mstructures SBc

well as in b ehavior They range in size from a few to sev

Mergesort Merge sort Istructures SBc

eral thousand lines of source co de The small examples dis

Shellsort Shell sort Mstructures SBc

cussed so far in this pap er are included as a p oint of refer

Scientic computation

ence The application areas represented by these programs

Wavefront Simple wavefront SOR ANP

include scientic computing sorting and search problems

+

Pseudoknot Molecular Biology HFA

symb olic computing NAS parallel b enchmarks and small

Gamteb Monte Carlo neutron transp ort

kernels Most of the programs fall into the catagory of sci

+

BCS

entic computing the area sp ecically targeted by the im

MCNP Monte Carlo photon transp ort

plicitly parallel language Id AE Most of the programs

HL

contain many conditionals and function calls and exhibit

Simple Hydro dynamics and heat conduc

negrained b ehavior eg Quicksort while programs such

tion AE

Sp eech Sp eech pro cessing Sah

as the blo cked matrix multiply are more mediumgrained

+

DTW Dynamic time warp Sah

SGS

MM Matrix multiply HCAA

MMT Blo cked matrix multiply test

Nonstrictness Requirements

CGSvE

Eigen Eigen problem

Following the metho dology describ ed ab ove we determined

Householder Householder EigenSolver SBa

the degree of nonstrictness each of the programs requires

Jacobi Jacobi EigenSolver BH

The results of our measurements are summarized in Table

Jacobi group Jacobi EigenSolver group rota

The rst two columns indicate whether the program contains

tions BH

any functional or conditional nonstrictness The third col

FT fu NAS b enchmark FT functional

SBb

umn shows whether the program has any circular data struc

FT is NAS b enchmark FT Istructures

ture denitions The fourth column shows whether any data

SBb

element is dened recursively in terms of other elements of

FT ms NAS b enchmark FT

the same data structure Finally the last column indicates

Mstructures

whether the program requires dynamic scheduling Often we

SBb

were able to nd a static scheduling of the program p os

Small examples

sibly by inserting barriers that would allow it to execute

KTcond Traubs nonstrict conditional

without deferred fetches

Tra

Most of the Id programs still run correctly after they are

Two Nonstrict function with two results

compiled with strict functions and strict conditionals This

SCG

indicates that programmers rarely make use of the addi

Cub e Nonstrict cub e function SCvE

tional expressiveness provided by nonstrictness at the func

Flat Flatten leaves of tree Nik

Fib Doubly recursive Fib onacci

tional or conditional level On the other hand many of the

Fib lazy Innite list of Fib onacci numb ers

b enchmark programs make use of a limited form of data

Fib lenient Finite list of Fib onacci numb ers

structure nonstrictness The ability to dene circular data

Fib strict Fib onacci using linear recurrence

structures or array elements in terms of other elements is

Fib array Fib onacci using functional arrays

certainly imp ortant

For most of the programs we were able to nd a schedule

Table Benchmark programs The programs are available

that results in no fetches deferring at runtime This result

from httpwwwcsucsbeduschauserid

is contrary to the current thought that lenient programs

inherently require dynamic scheduling

sented in Section the nonstrict Cub e and the lenient

Form of nonstrictness

version of Fib onacci use functional nonstrictness We have

Program Func Cond Circ Recu Dyn

also included a lazy version of Fib onacci which generates the

Symb olic computation

innite list of Fib onacci numb ers and then returns the nth

Parans No No No Yes No

numb er Under our lenient evaluation this program fails to

Primes lter Yes No No Yes Yes

return the answer b ecause it runs out of heap space while

Primes sieve No No No No No

generating the innite list There are also several medium

Compresspath No No Yes No No

sized programsPrimes lter N Queens NQ Solutions and

NQueens Yes No No Yes Yes

Permutationswhich use functional nonstrictness They

NQ Solutions Yes No No Yes Yes

Permutations Yes No No Yes Yes

use it in a very similar way which is most easily illustrated

Id Compiler Yes Yes Yes Yes Yes

with the lenient Fib onacci function

Id Interpreter No No No Yes No

Quicksort No No No No No

list of Fibonacci numbers from i to n

Arraysort No No No No No

def fiblist l i n

Bitonicsort No No No Yes No

if i n then nil else

Bubblesort No No No Yes No

nth i lnth i l

Heapsort IS No No No Yes No

fiblist l i n

Heapsort MS No No No Yes No

def fib n li fiblist li n

Quicksort IS No No No Yes No

in nth n li

Quicksort MS No No No Yes No

Mergesort No No No Yes No

This function creates the list of Fib onacci numb ers from

Shellsort No No No Yes No

to n and then returns the nth numb er The function that

Scientic computation

creates the list of Fib onacci numb ers from i to n requires

Wavefront No No No Yes No

the Fib onacci numb ers b elow i so that it can compute the

Pseudoknot No No No No No

subsequent numb ers This is achieved by providing the base

Gamteb Yes No No Yes Yes

list for the rst two numb ers and then feeding the result

Gamtebdefsubst No No No Yes No

list The programs list li back in as an argument to fib

MCNP No No No Yes No

Primes lter N Queens NQ Solutions and Permutations

Simple No No No Yes No

are based on the same principle

Sp eech No No No Yes No

The largest program requiring functional nonstrictness

DTW No No No Yes No

MM No No No No No

is the Id Compiler Nonstrictness is used to create circular

MMT No No No No No

data structures where one element of a data structure con

Eigen No No No No No

tains a reference to the data structure itself One such pat

Householder No No No Yes No

tern which o ccurs frequently is found in the co de for creating

Jacobi No No No No No

variable structures that contain information p ertinent to Id

Jacobi group No No No No No

identiers This is done using a record which contains elds

FT fu No No No Yes No

ab out the identier Some of the record elds are dened as

FT is No No No Yes No

functional prop erties dened at creation time and not mo d

FT ms No No No Yes No

ied thereafter some are Istructure prop erties emtpy at

Small examples

creation time to b e lled in later at most once and some

KTcond No Yes No No Yes

are Mstructure prop erties mutuable capable of rep eated

Two Yes No No No Yes

mo dication As shown in the example co de b elow one

Cub e Yes No No No Yes

Flat No No No No No

of the functional elds vr node includes a reference to the

Fib No No No No No

record itself thus intro ducing nonstrictness

Fib lazy Lazy No No No Yes

Fib lenient Yes No No No Yes

def makevarstruct sourcename place arity opcode

Fib strict No No No No No

vs record

Fib array No No No Yes No

other entries

vrnode func vsnil

Table Nonstrictness requirements The entries indicate

In vs

whether the program requires functional conditional or

data structure nonstrictness Data structure nonstrictness

is further divided into circular recursive and dynamic non

One way of avoiding the mutual recursion would b e to

strictness

change the functional eld vr node into an Istructure eld

When creating the variable structure record this eld would

initially b e left empty After the record is created func

could b e invoked and the result used to ll in the eld

Discussion

vr node Of course this solution go es b eyond the purely

We now present the results in more detail fo cusing on the in

functional programming style but the nonfunctional asp ect

dividual programs and the dierent forms of nonstrictness

is not visible outside of this function

Another interesting program in this category is Gamteb

Functional Nonstrictness

which requires functional nonstrictness b ecause it uses the

librarydened accumulators to tally statistics Gamteb

Few programs require functional nonstrictness and most of

allo cates several of these accumulators using the function

those that do were written precisely to exhibit this b ehavior

make accumulator

The two exceptions are Gamteb and the Id Compiler which

we discuss in more detail b elow

result accum makeaccumulator done

For the small examples the nonstrict functions two pre

This library function takes three arguments the b ounds

in vse

of the accumulator the op eration used for accumulation

In vs

and nally a trigger to indicate that the accumulation has

nished The function returns two results the accumula

tor structure which is used to p erform the accumulations

Our results seem to b e a strong indication that except in

accum and the array which contains the nal result result

the simplest cases conditional nonstrictness is to o compli

The accumulation is done using Mstructures which p er

cated for programmers to use Just tracing the dep endencies

form the up dates in place It is imp ortant that the nal

of the small example given in Section already requires

result b e returned only after all of the accumulations are

substantial eort Unfortunately conditional nonstrictness

done Otherwise a consumer of the result may read incor

has a serious impact on co degeneration The compiler gen

rect values while the accumulation is still going on Thus

erates an indep endent switch for every input to a conditional

accumulator exhibits the classical form of the function make

and an indep endent merge for every output This results in

nonstrictness It has to return the accumulation structure

a large runtime overhead This can b e avoided only if the

after receiving the b ounds and the op eration and can return

compiler do es the appropriate analysis to gure out that the

the nal result only after receiving done when all accumula

conditional nonstrictness do es not o ccur

tions have completed Trying to pass in all three arguments

together will result in deadlo ck

Data Structure Nonstrictness

Nevertheless it is p ossible to mo dify the program so that

6

it do es not use functional nonstrictness What is required

Datastructure nonstrictness seems to b e the most imp or

is to separate the creation of the accumulator from the clos

tant form of nonstrictness Many programs use it Some

ing of it Then we can call them by separate functions and

create circular data structures but most dene array ele

have the programmer ensure that they execute in the correct

ments recursively in terms of other elements The prototyp

order eg using explicit barriers

ical example is wavefront or the fib array presented in Sec

tion The ability to dene data structures recursively is

Conditional Nonstrictness

essential for eciency Without this feature dening a new

element would require subsequent allo cation of new copies

Only two programs require conditional nonstrictness The

of the arrays Obviously imp erative languages have pro

rst is the small function presented in Section sp eci

vided the p ower of nonstrict data structures all along In

cally develop ed by Traub to illustrate the concept of non

imp erative languages the programmer can allo cate a data

strictness going through conditionals The second is the

structure and then ll it in any order values can even b e

Id Compiler which uses conditional nonstrictness to condi

up dated in place

tionally create circular data structures As shown b elow the

One feature imp erative data structures do not have is

example is very similar to the co de presented in the previous

presence bits to provide synchronization on an elementby

subsection But this time dep ending on the arity argument

element basis The programmer has to explicitly ensure

one of two records is created

that all dep endencies are satised and that the structure

is lled in the right order In Id presence bits automati

def makevarstruct sourcename place arity opcode

cally ensure the right order For almost all of the programs

vs if eq arity then

record we were able to nd a static schedule such that fetches of

entries for arity equal

data elements would never defer Often we were required

vrnode func vsnil

to enhance our b enchmark programs with explicit sequen

tialization statements we used barriers to ensure the right

else

execution order

record

Nonstrict data structures are critical and are required

entries for arity not

for ecient execution The lenient Fib onacci example from

vrnode func vsnil

Section and the Fib onacci array from Section il

lustrate this Both construct a data structure with the rst

In vs

n Fib onacci numb ers either a list or an array and then

return the nth numb er This is the idea underlying many

It is straightforward to change this co de to avoid using

dynamic programming problems Tra Thus one of the

conditional nonstrictness All we need to do is fully dene

reasons we dont see more functional nonstrictness may b e

the record in each arm of the conditional For example it

that we have data structure nonstrictness which gives us a

could b e changed into the following

similar expressive p ower and often can express complicated

dep endencies more elegantly and eciently

def makevarstruct sourcename place arity opcode

vs if eq arity then

Related work

vst record

entries for arity equal

As far as we know this is the rst study to fo cus on how

vrnode func vstnil

much nonstrictness lenient programs require

in vst

Tremblay did a similar study for lazy evaluation Tre

else

He collected a large set of b enchmark programs and de

vse record

termined whether they actually require lazy evaluation or

entries for arity not

whether lenient evaluation suces Lazy evaluation is re

vrnode func vsenil

quired only if the program manipulates innite lists His

6

observation is that for most programs lenient evaluation is

In fact the version of Gamteb that we tested did not exhibit

functional nonstrictness as the calls to allo cate the accumulators were

sucient He found that contrary to what other authors

inlined into the caller using Ids defsubst mechanism Only when

claimed most of the programs presented in Bir All

inlining was disabled did we encounter functional nonstrictness

to developing techniques which determine when strict func All do not require laziness Tremblay also studied how

tions and conditionals should b e evaluated in a nonstrict lazy evaluation impacts parallelism under the TTDA exe

fashion to b enecially increase the parallelism As the exam cution mo del

ple flat from Section clearly shows lenient evaluation Much research has b een devoted to analysis techniques

can exp ose large amounts of consumerpro ducer parallelism that determine when the full generality of nonstrictness

which may b e completely lost under strict evaluation While is not required These techniques include strictness anal

idealized parallelism proles derived from dataow simula ysis Pey backwards analysis Hugb and path analy

tions such as the TTDA are capable of exp osing the inherent sis BH For lenient languages this is approached by par

parallelism in a program under lenient evaluation more re titioning a program into sequential threads Tra Parti

alistic metho ds have to b e used to gain information ab out tioning algorithms have b een develop ed by Tra SCvE

parallelism which reects more accurately the execution on HDGS SCG

real parallel machines Many researchers have studied the parallel execution of a

+

We conclude that while it is desirable to exploit the in variety of lenient programs AHN Hel CSS SBc

creased parallelism under lenient evaluation is not necessary ANP AE Sah HCAA CGSvE SBa BH

+

to make the language nonstrict SBb SCG Nik SGS These studies have b een

p erformed on various hardware and simulation platforms

including the TTDA dataow machines Monso on architec

Acknowledgments

ture and MINT simulator as well as TAM implementations

on the CM Sequent and JMachine All of the programs

We are grateful to Martin Rinard Guy Tremblay Nathan

used in those studies are included in this pap er

Tawil Pedro Dinez and the anonymous referees for their

valuable comments Further we want to thank Guy Trem

blay Wim Bohm Olaf Lub eck Je Hammes Christina

Conclusions

Flo o d R Paul Johnson and the whole IdinId compiler

group at MIT for providing us with the large collection This pap er studies a large set of Id b enchmark programs

of Id programs Computational supp ort at Berkeley was and determines the degree of nonstrictness they require

provided by the NSF Infrastructure Grant numb er CDA Contrary to current thought most of the programs require

Klaus Erik Schauser received research supp ort neither functional nor conditional nonstrictness The only

from the Department of Computer Science at UCSB Seth large program which requires these forms of nonstrictness

Cop en Goldstein is supp orted by an ATT Graduate Fel is the Id compiler which uses them to dene circular data

lowship structures We b elieve that b oth functional and conditional

nonstrictness are seldom used b ecause they are dicult to

grapple with mentally The seemingly cyclic dep endencies

References

from results back to arguments automatically unravel at

AA Z Ariola and Arvind PTAC A parallel intermedi

runtime and are dicult to trace Another reason for the

ate language In Proceedings of the Conference

lack of functional nonstrictness may b e that programmers

on Functional Programming Languages and Com

tend to follow the imp erative programming style they are

puter Architecture pages Septemb er

used to For example many of the larger programs we stud

AE Arvind and K Ekanadham Future Scientic Pro

ied fall into the area of scientic computation and may have

gramming on Parallel Machines Journal of Paral lel

b een based on FORTRAN equivalents eg Gamteb and

and Distributed Computing Octob er

Simple

We think however that the main reason for the lack

AHN Arvind S K Heller and R S Nikhil Programming

of functional nonstrictness lies elsewhere Even a limited

Generality and Parallel Computers In Proc of the

use of nonstrict data structures viz the ability to dene

Fourth Int Symp on Biological and Articial Intel

data structure elements recursively provides essentially the

ligence Systems pages ESCOM Leider

same expressiveness as and often can enco de these recur

Trento Italy Septemb er

sive dep endencies more eciently than purely functional

All L Allison Lazy dynamicprogramming can b e eager

nonstrictness using lists Thus many of the programs make

Info Proc Letters

use of data structure nonstrictness An unexp ected result

All L Allison Application of recursively dened data is that for many of the programs we were able to determine

structures Australian Comp Journal

a schedule such that none of the fetches to data structures

deferred This seems to indicate that at least a sequential

implementation of lenient languages might b e able to elimi

ANP Arvind R S Nikhil and K K Pingali IStructures

Data Structures for Parallel Computing Technical nate much of the overhead in managing the presence bits of

Rep ort CSG Memo MIT Lab for Comp Sci

synchronizing data structures

February Also in Proc of the Graph Reduction

One form of nonstrictness that lenient evaluation do es

Workshop Santa Fe NM Octob er

not provide is supp ort for streams and innite data struc

ANP Arvind R S Nikhil and K K Pingali IStructures tures While we think that this may b e very useful we

Data Structures for Parallel Computing ACM

would caution against making a language lazy just for this

Transactions on Programming Languages and Sys

reason First by using explicit delays or control constructs

tems Octob er

programmers can obtain the same b enet without lazy eval

+

BCS P J Burns M Christon R Schweitzer O M uation Hel Second Tremblay has observed that many

Lub eck H J Wasserman M L Simmons and

supp osedly lazy programs require only lenient evaluation

D V Pryor Vectorization of MonteCarlo Particle

Tre

Transp ort An Architectural Study using the LANL

Since funtional and conditional nonstrictness are rarely

Benchmark Gamteb In Proc Supercomputing

used and often unnecessary we suggest eliminating them

IEEE Computer So ciety and ACM SIGARCH New

from the language We b elieve that data structure non

York NY Novemb er

strictness is sucient Researchers should devote more eort

Hugb R J M Hughes Backwards Analysis of Functional BH A Bloss and P Hudak Path Semantics In Mathe

Programs pages Elsevier Science Publish matical Foundations of Se

ers BV North Holland mantics LNCS SpringerVerlag April

Joh T Johnsson Attribute Grammars as a Functional

BH W Bohm and R E Hiromoto A Functional Imple

Programming Paradigm In Proceedings of the Con

mentation of the Jacobi EigenSolver Tech Rep ort

ference on Functional Programming Languages and

CS Colorado State University May

Computer Architecture February Springer

Bir R S Bird Using Circular Programs to Elimi

Verlag LNCS

nate Multiple Traversals of Data Acta Informatica

Nik R S Nikhil Id Version Reference Manual

Technical Rep ort CSG Memo MIT Lab for Comp

BNA P S Barth R S Nikhil and Arvind MStructures

Sci

Extending a Parallel Nonstrict Functional Lan

Nik R S Nikhil The Parallel Programming Language Id

guage with State In Proceedings of the Con

and its Compilation for Parallel Machines In Proc

ference on Functional Programming Languages and

Workshop on Massive Paral lelism Amal Italy

Computer Architecture Cambridge MA August

October Academic Press

Nik R S Nikhil An Overview of the Parallel Language

CGSvE D E Culler S C Goldstein K E Schauser and

Id a foundation for pH a parallel dialect of Haskell

T von Eicken TAM A Compiler Controlled

Technical rep ort Digital Equipment Corp Cam

Threaded Abstract Machine Journal of Paral lel and

bridge Research Lab oratory Septemb er

Distributed Computing July

Pey S L Peyton Jones The Implementation of Func

CPJ C Clack and S L PeytonJones Strictness Analysis

tional Programming Languages Prentice Hall

A Practical Approach In Proc Functional Pro

Sah A Sah Parallel Language Supp ort for Shared mem

gramming Languages and Computer Architecture

ory multipro cessors Masters thesis Computer Sci

Sept SpringerVerlag LNCS

ence Div University of California at Berkeley May

+

CSS D Culler A Sah K Schauser T von Eicken

and J Wawrzynek Finegrain Parallelism with

SBa S Sur and W Bohm Analysis of NonStrict Func

Minimal Hardware Supp ort A CompilerControlled

tional Implementations of the DongarraSorensen

Threaded Abstract Machine In Proc of th Int

Eigensolver Tech Rep ort CS Colorado State

Conf on Architectural Support for Programming

University Decemb er

Languages and Operating Systems SantaClara CA

SBb S Sur and W Bohm Ecient Declarative Programs

April

Exp erience in Implementing NAS Benchmark FT

Fel M Felleisen On the expressive p ower of program

Tech Rep ort CS Colorado State University

ming languages Science of Computer Programming

Octob er

Decemb er

SBc S Sur and W Bohm NAS parallel b enchmark in

Gol S C Goldstein Implementation of a Threaded Ab

teger sort IS p erformance on MINT Tech Rep ort

stract Machine on Sequential and Multipro cessors

CS Colorado State University May

Masters thesis Computer Science Division EECS

SCG K E Schauser D E Culler and S C Goldstein

UC Berkeley UCBCSD

Separation Constraint Partitioning A New Algo

rithm for Partitioning Nonstrict Programs into Se

HA P Hudak and S Anderson Pomset Interpretations

quential Threads In Proc Principles of Program

of Parallel Functional Programs In Proc Functional

ming Languages January

Programming Languages and Comp Arch Septem

b er

Sch K E Schauser Compiling Lenient Languages for

Paral lel Asynchronous Execution PhD thesis Com

HCAA J Hicks D Chiou B S Ang and Arvind Perfor

puter Science Div University of California at Berke

mance Studies of Id on the Monso on Dataow Sys

ley May

tem Journal of Paral lel and Distributed Computing

July

SCvE K E Schauser D Culler and T von Eicken

Compilercontrolled Multithreading for Lenient Par

HDGS J E Ho ch D M Davenp ort V G Grafe and K M

allel Languages In Proc Conf on Functional Pro

Steele Compiletime Partitioning of a Nonstrict

gramming Languages and Comp Arch Aug

Language into Sequential Threads In Proc Symp

+

SGS E Sp ertus S C Goldstein K E Schauser T von

on Paral lel and Distributed Processing Dec

Eicken D E Culler and W J Dally Evaluation

Hel S K Heller Ecient Lazy DataStructures on a

of Mechanisms for FineGrained Parallel Programs

Dataow Machine PhD thesis Dept of EECS MIT

in the JMachine and the CM In Proc of the

February

th Intl Symposium on Computer Architecture San

+

HFA P H Hartel M Feeley M Alt L Augustsson

Diego CA May

P Baumann M Beemster E Chailloux C H

SPR R C Sekar S Pawagi and IV Ramakrishnan

Flo o d W Grieskamp J H G van Groningen

Small domains sp ell fast strictness analysis In Proc

K Hammond B Hausman M Y Ivory P Lee

ACM Symp on Principles of Programming Lan

X Leroy S Lo osemore N Rojemo M Serrano J

guages pages

P Talpin J Thackray P Weis and P Wentworth

Tra K R Traub A Compiler for the MIT TaggedToken

Pseudoknot a FloatIntensive b enchmark for func

Dataow Architecture Technical Rep ort TR

tional compilers DRAFT In J R W Glauert edi

MIT Lab for Comp Sci August MS The

tor Implementation of Functional Languages pages

sis Dept of EECS MIT

Scho ol of Information Systems Univ of

Tra K R Traub Implementation of Nonstrict Func

East Anglia Norwich NR TJ UK Sep

tional Programming Languages MIT Press

HL J Hammes and O Lub eck MCNPID A Dataow

Tre G Tremblay Paral lel Implementation of Lazy Func

Photon Transp ort Co de LANL June

tional Languages using Abstract Demand Propaga

Huga J Hughes Why Functional Programming Matters

tion PhD thesis McGill University Scho ol of Com

The Computer Journal

puter Science Novemb er