Abstract Interpretation and Low-Level Code Optimization

Saumya Debray

Department of Computer Science
University of Arizona
Tucson, AZ

(This work was supported in part by the National Science Foundation under grant CCR-….)

Abstract

Abstract interpretation is widely accepted as a natural framework for semantics-based analysis of program properties. However, most formulations of abstract interpretation are in terms of high-level semantic entities that do not adequately address the needs of low-level optimizations. In this paper we discuss the role of abstract interpretation in low-level compiler optimizations, examine some of its limitations, and consider ways in which they might be addressed.

1 Introduction

The process of compilation, by which executable code is generated from a source program, can be thought of as a series of transformations and translations through a succession of languages, starting at the source language and ending at the target language. In this picture we can distinguish between two kinds of transformations: translations, which take a program in a language and produce a program in a different (usually lower-level) language, and optimizations, which transform a program in a language to another program in the same language. As an example, a compiler that we have implemented for a logic programming language called Janus works by translating the input programs into C, then invoking a C compiler to generate executable code. In this system we can identify the following language levels: the source language, the Janus virtual machine language, C, the intermediate representations within the C compiler, and the target machine language. In principle, optimizing transformations can be applied at each of these five language levels; our current implementation applies optimizations at three of these levels: the Janus virtual machine code, the intermediate representations of the C compiler, and the target machine code (the last two within the C compiler). In each case the optimizations can be seen as program transformations at a particular language level.

A fundamental requirement of the compilation process is that it should be semantics-preserving, in the sense that the meaning (or behavior) of the executable code should conform to what the semantics of the source program says it should be. For this to happen it is necessary, in general, that both translations and optimizations be semantics-preserving in this sense. Since our primary focus is on optimizations rather than translations, we will assume here that our translations satisfy this requirement and focus our attention on optimizations.

It is very often the case that an optimization is not universally applicable. In other words, in order to ensure that an optimization does not alter the observable behavior of a program in unacceptable ways, we have to ensure that certain preconditions, particular to that optimization, are satisfied. As an example, consider register allocation in a C compiler: the value of a variable can be kept in a register only if certain conditions regarding aliasing are fulfilled. In general this means that it may be necessary to examine a program and extract some information about its behavior, which can then be used for optimization purposes. Further, in order to verify that the properties so inferred describe all possible runtime behaviors of a program, it is necessary to be able to relate the analyses to the semantics of the language in a precise way.
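To make the aliasing precondition mentioned above concrete, the following C fragment is a minimal sketch of our own (the routine external_update is hypothetical): once a variable's address escapes, the compiler must assume that calls may modify it through a pointer, and so cannot safely cache it in a register across such calls.

    void external_update(int *p);      /* hypothetical routine that may write *p */

    int sum_direct(int n) {
        int c = 0;                     /* c has no aliases: it can live in a register */
        for (int i = 0; i < n; i++)
            c += i;
        return c;
    }

    int sum_aliased(int n) {
        int c = 0;
        int *p = &c;                   /* the address of c escapes ...               */
        for (int i = 0; i < n; i++) {
            external_update(p);        /* ... and c may be written through p here,   */
            c += i;                    /* so c must be reloaded from memory rather   */
        }                              /* than cached in a register across the call  */
        return c;
    }

An aliasing analysis that can show no such escape occurs is exactly the kind of precondition check referred to above.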

Semantics-based techniques such as abstract interpretation provide a natural framework for such program analyses. The general idea is to rely on the formal semantics of a program to specify all of its possible computational behaviors, and to derive finitely computable descriptions of such behaviors by systematically approximating the operational behavior of the program. The correctness of an analysis can then be derived from the mathematical relationships between the actual computational domain of the program and the domain of descriptions manipulated by the analysis, and between the actual operations executed by the program and the approximations to those operations used during the analysis.
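As a small illustration of our own (not an example from the paper), a classical sign analysis replaces concrete integers by the finite set of descriptions {NEG, ZERO, POS, TOP} and each concrete operation by an abstract counterpart:

    typedef enum { NEG, ZERO, POS, TOP } Sign;    /* finite domain of descriptions */

    /* Abstraction: maps a concrete integer to its description. */
    Sign sign_of(long x) { return x < 0 ? NEG : (x == 0 ? ZERO : POS); }

    /* Abstract multiplication: approximates concrete multiplication using only   */
    /* the descriptions of its operands.                                           */
    Sign sign_mul(Sign a, Sign b) {
        if (a == ZERO || b == ZERO) return ZERO;
        if (a == TOP || b == TOP)   return TOP;        /* unknown operand          */
        return (a == b) ? POS : NEG;                   /* e.g. NEG * NEG gives POS */
    }

Soundness here is precisely the relationship mentioned above between concrete and abstract operations: ignoring overflow, sign_of(x * y) is always described by sign_mul(sign_of(x), sign_of(y)).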

Benchmarks: aquad, bessel, binomial, chebyshev, e, fib, log, mandelbrot, muldiv, nrev, pi, sum, tak; Geometric Mean.
Columns: Execution Time (secs) and Heap Usage (words), each reported as no-opt, opt, and no-opt/opt.

Table 1: Performance improvements due to low-level optimizations (jc on a Sparcstation IPC)

Optimizing program transformations can be viewed at many levels, corresponding to the different levels of languages encountered during compilation. At a high level, for example, we have transformations such as finite differencing, recursion removal (i.e., the transformation of recursive programs to tail-recursive form), deforestation, and transformations for parallelization and vectorization, as well as the various transformations described by Bacon et al. At the level of intermediate code, we have machine-independent low-level optimizations such as copy elimination, closure representation optimization in functional languages, and dereferencing optimizations in logic programming languages. At a lower level still, we have machine-dependent transformations such as register allocation and instruction scheduling. Conceptually, we can divide these various optimizations into two classes: high-level optimizations, which correspond roughly to optimizations that can be expressed in terms of transformations on the source program or its abstract syntax tree, and low-level optimizations, which involve constructs and objects that are not visible at the source level and which therefore cannot be so expressed. This classification is not absolute, of course: whether or not an optimization is to be considered low-level depends, among other things, on the language being considered. For example, in a language with explicit constructs for iteration, the implementation of a tail-recursive procedure in terms of iteration could be considered a high-level optimization; in a language without source-level iterative constructs, however, this would be a low-level optimization.

There are two reasons why low-level optimizations are important. The first is that they are beyond the reach of the user. The point is that, when faced with a compiler that does not do much in the way of high-level optimizations, the determined user can in principle carry out the transformations manually where necessary, in order to obtain code with good performance. With a compiler that does not perform low-level optimizations, however, there is little that even the most determined of users can do. In particular, this implies that in the absence of low-level optimizations, even carefully crafted programs written by skilled programmers will incur performance penalties over which they have little control.

The second reason such optimizations are important is that they can produce substantial performance improvements. As an example of this, Table 1 gives some performance numbers for jc, an implementation of a dynamically typed logic programming language. The jc compiler currently performs only low-level optimizations: call forwarding, which is a form of jump redirection at the intermediate code level; a simple form of interprocedural register allocation for output value placement; and representation optimization (i.e., using unboxed values where possible) for numerical values.

As Table 1 indicates, for the benchmarks tested these optimizations more than double the speed of the programs on the average, and also lead to significant improvements in heap memory usage. (These numbers do not include the effects of tail call optimization, though strictly speaking that is a low-level optimization in our context; if the effects of tail call optimization are included, the speed improvement is by a factor of about ….) The speed of the resulting code is competitive with that of optimized C code written in a natural imperative style: on the benchmarks shown, the Janus programs, which are dynamically typed and use dataflow synchronization between producers and consumers, are on the average only … slower than C code compiled with gcc -O, about … faster than C compiled with cc -O, and … faster still than C compiled with cc -O. This indicates that low-level optimizations can be a valuable source of performance improvements.
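Call forwarding, the first of the jc optimizations listed above, redirects a call past entry-point code that is known to be unnecessary for that particular call site. The following C fragment is a simplified sketch of the general idea, not jc's actual code; the cell representation and entry points are our own illustration.

    #include <stdlib.h>

    typedef struct { enum { TAG_INT, TAG_OTHER } tag; long val; } Cell;

    /* Fast entry point: assumes the argument is already known to be an integer. */
    long increment_fast(Cell *x) {
        return x->val + 1;
    }

    /* Generic entry point: checks the tag, then falls through to the fast code. */
    long increment(Cell *x) {
        if (x->tag != TAG_INT) abort();     /* error handling elided */
        return increment_fast(x);
    }

    long caller(Cell *x) {
        if (x->tag != TAG_INT) abort();
        /* The caller has just established the tag itself, so its call can be     */
        /* forwarded directly to the fast entry, skipping the redundant test.     */
        return increment_fast(x);
    }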

The appeal of semantics-based program manipulation techniques is that they allow us to reason formally about the manipulations themselves, and certify with some confidence that such manipulations will not cause "bad things" to happen. This paper considers the applicability and relevance of semantics-based program analysis techniques, such as abstract interpretation, in the context of low-level code optimization. Specifically, we argue that "semantic mismatches" between the kinds of information typically produced by semantics-based analyses and the kinds of information needed by low-level optimizations limit the utility of such formally defensible analyses for these optimizations. We consider two kinds of semantic mismatch: in Section 2 we consider the level at which the concrete semantics is considered, and in Section 3 we consider the problem of estimating runtime execution frequencies and costs.

2 Low-Level Semantics and Abstract Interpretation

It is not difficult to see that while the kinds of information provided by abstract interpretation or other semantics-based analyses are perhaps necessary for code optimization, they are by no means sufficient. Part of the problem is that the concrete semantics on which abstract interpretations are typically based are, from the standpoint of low-level code optimization, not concrete enough. They usually have little to say about the registers and bit vectors and pointers and other such low-level entities that are actually manipulated during program execution. Indeed, the concrete semantics usually encountered can themselves be seen as abstractions of lower-level characterizations of program behavior, where some or all of the information about machine-level entities has been abstracted away. The problem, of course, is that we usually think of the process of abstraction as "forgetting" irrelevant aspects of the behavior of a program, while in this case it is precisely the most relevant aspects of the program's behavior that are being forgotten.

The problem can be addressed by abstract interpretation based on a low-level semantics. While this does not seem different from any other sort of abstract interpretation at a conceptual level, the practical details can become messy. As an example, it is very likely simpler and more convenient to manipulate a high-level representation of a program, such as an abstract syntax tree, for such analyses, since the number of different kinds of objects and operations that have to be dealt with in such representations is relatively small. However, it is not clear that a high-level program representation can encode low-level information in a reasonable way without implicit or explicit assumptions about the behavior of the code generator. This, in turn, implies that such analyses, while simple to implement initially, are potentially fragile.

An example of this situation arises in the context of dereference chain length analysis in Prolog systems. In general, variable-variable unifications during the execution of a Prolog program can cause pointer chains to be set up, and these need to be dereferenced before the value of a variable can be accessed. Dereferencing arbitrary-length tagged pointer chains is a fairly expensive operation, so static analyses that infer the lengths of dereference chains can be very helpful in improving program performance, in particular when they allow dereference operations to be omitted entirely. However, high-level semantics for Prolog typically do not have much to say about low-level aspects such as pointer-chain lengths: for example, when two variables are unified, such semantics say nothing about how the pointers are oriented. Because of this, dereference chain length analyses that manipulate high-level representations of programs, such as those of Van Roy and Taylor, must either limit their precision by refusing to handle any situation where the high-level semantics is not unambiguous, or expose themselves to potential fragilities by making assumptions about the code generator. (In our Janus system, for example, we found that an optimization to eliminate unnecessary dereference operations, based on an analysis that used the abstract syntax tree of the program, similar to the analyses of Van Roy and Taylor, led to incorrect code being generated when the mechanism for dealing with suspensions changed. It turned out that, as an ill-advised convenience hack, the analysis made implicit assumptions about whether or not the code generator would return output values in registers; these assumptions were rendered invalid when the code generator was modified to handle suspensions differently, but the analysis phase did not know about this.)
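To make the cost of dereferencing concrete, the fragment below sketches, in an invented C representation of tagged cells (not the representation used by any particular Prolog system), what following a pointer chain involves and why a bound on the chain length lets the loop be simplified or omitted.

    typedef struct Cell {
        enum { REF, INT } tag;
        union { struct Cell *ref; long n; } val;
    } Cell;

    /* General case: follow REF links until a non-REF cell (or an unbound         */
    /* variable, conventionally a REF cell pointing to itself) is reached.        */
    Cell *deref(Cell *c) {
        while (c->tag == REF && c->val.ref != c)
            c = c->val.ref;
        return c;
    }

    /* If analysis can bound the chain length by one, the loop collapses to a     */
    /* single test; if it can show the length is zero, no code is needed at all.  */
    Cell *deref_at_most_one(Cell *c) {
        return (c->tag == REF && c->val.ref != c) ? c->val.ref : c;
    }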

Closure analysis in the Orbit compiler for Scheme provides another example of the use of a high-level representation for analyzing low-level aspects of a program's behavior: in this case, decisions about the low-level representation of closures are based on the structure of the abstract syntax tree for the program.

[Figure 1: Program Analysis using Abstract Compilation. The source program P is mapped by the concrete semantics to its concrete meaning M; a transformation takes P to the approximate program P', and an abstraction takes M to the abstract meaning M'; P' is mapped by the concrete semantics to M'.]

It may be possible to get around this problem in some cases by "lifting" implementation-level aspects of a program to the source level, and then treating the analysis and optimization problems as high-level issues. This approach is taken in &-Prolog, a parallel Prolog system, which extends the source language to allow various lower-level parallelization and synchronization issues to be addressed at the source level. Another example of such an approach can be seen in exposing low-level representational aspects of data, such as whether they are boxed or unboxed, at the source level, and formulating representation optimizations in terms of source-level program transformations. However, it may not always be possible to capture low-level optimizations by lifting them to the source level in this way: for example, it is not clear how the implementation of aggregate updates in a single-assignment language via compiler-introduced destructive updates could be expressed at the source level.

The alternative is to use a lower-level representation, e.g., a sequence of intermediate code instructions. This has the advantage that the appropriate low-level details have been made explicit and can be reasoned about without having to resort to assumptions about the behavior of other parts of the compiler. This is conceptually cleaner and more defensible than the previous approach. However, there are two important practical problems that arise with this approach. First, the number of operations that have to be accounted for is likely to be considerably larger in a low-level representation than in a high-level representation. Second, relationships between objects (e.g., whether or not two objects overlap in memory) may be harder to reconstruct by examining a sequence of low-level operations.

Because of the large number of different operations that might be encountered in a low-level representation of a program, and the comparatively larger size of such a representation, one might expect a low-level abstract interpretation to be considerably slower than a high-level one. This problem can be alleviated to some extent by a technique that, with tongue firmly in cheek, we call "abstract compilation". The idea is the following: to reduce the cost of program analysis, instead of repeatedly traversing an internal representation of the program P being analyzed, we partially evaluate an abstract interpreter with respect to P, so as to produce a program P' which, when executed, yields the result of analyzing the original program P. In practice, for any particular analyses that we wish to implement in a compiler, we will know enough about the corresponding abstract interpreters that, instead of invoking a general-purpose partial evaluator on such an interpreter and the input program P, we can simply make a single pass over P and produce P'; indeed, we initially thought of this in terms of program transformation rather than partial evaluation. This is illustrated in Figure 1. The idea is similar to the notion of "need expressions" proposed by Maurer in the context of strictness analysis. McNerney also uses a similar approach for an abstract interpretation to verify the correctness of low-level compiler optimizations.
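As a small illustration of our own (reusing the sign-analysis domain sketched in the Introduction, and not drawn from the paper), consider the straight-line program P: x := -3; y := x*x; z := x*y. A single pass over P can emit the residual function analyze_P below; running it performs the analysis directly, with no interpretive dispatch over P's text.

    typedef enum { NEG, ZERO, POS, TOP } Sign;

    static Sign sign_mul(Sign a, Sign b) {
        if (a == ZERO || b == ZERO) return ZERO;
        if (a == TOP || b == TOP)   return TOP;
        return (a == b) ? POS : NEG;
    }

    /* The residual program P', produced by a single pass over P: each statement   */
    /* of P reappears with its concrete operation replaced by the abstract one.    */
    void analyze_P(Sign *x, Sign *y, Sign *z) {
        *x = NEG;                  /* abstract effect of  x := -3  */
        *y = sign_mul(*x, *x);     /* abstract effect of  y := x*x */
        *z = sign_mul(*x, *y);     /* abstract effect of  z := x*y */
    }

This is exactly the residue one would obtain by partially evaluating a sign-analysis interpreter with respect to P.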

At first glance, it might appear that such an approach is practical only in languages such as Prolog and Lisp, where it is easy to create program fragments on the fly and execute them.

For languages such as C, for example, the traditional model for generating executable code for a program would most likely incur much too much I/O overhead, in writing out a program (or executable code) into a file and then reading it back in, to make this worthwhile. However, recent work in dynamic code generation for such languages indicates that the runtime overhead associated with creating and executing code at runtime can be made small enough to make such an approach practical. The success of dynamic code generation in the SELF system also suggests that the abstract compilation approach may be practically usable in general.

The second problem referred to above is that relationships between objects that may be relatively straightforward to detect at a high level may be much harder to rediscover in a lower-level analysis. For example, a value that is easily identifiable as a list or a tree at a high level may be visible only as a jumble of pointers during a low-level analysis, making it much more complicated to rediscover relationships between its components (compare, for example, high-level type inference with comparable low-level analyses). On the other hand, not all structural relationships between objects may be amenable to high-level analysis: sharing relationships between objects, for example, may depend on specific implementation decisions that are invisible at a high level. We have found that combining high- and low-level analyses works well for this. The idea is to first carry out a high-level analysis and annotate the high-level representation of the program with this information. When this is translated to a lower-level representation (e.g., from an abstract syntax tree to a sequence of intermediate code instructions), the high-level properties are also translated into low-level terms alongside, and the low-level representation is annotated appropriately. Subsequent low-level optimizations can then use the low-level information in a straightforward way.
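A hedged sketch of the kind of annotated translation just described (the data structures are our own, purely illustrative): a property inferred on the abstract syntax tree, such as "this variable is known to hold an unboxed integer", is copied, in low-level terms, onto the intermediate-code instructions generated from that tree, so that later low-level passes can consult it without redoing the high-level analysis.

    /* High-level analysis result attached to an AST variable. */
    typedef struct {
        const char *name;
        int known_unboxed_int;          /* 1 if the high-level analysis proved this */
    } AstVar;

    /* Low-level (intermediate code) instruction carrying the translated annotation. */
    typedef struct {
        enum { IR_LOAD, IR_ADD, IR_STORE } op;
        int src, dst;                   /* virtual register numbers                  */
        int operand_is_unboxed_int;     /* low-level rendering of the AST property   */
    } IrInstr;

    /* During translation, the AST-level property is rewritten in low-level terms    */
    /* and attached to the generated instruction.                                    */
    IrInstr emit_load(const AstVar *v, int dst_reg) {
        IrInstr i;
        i.op = IR_LOAD;
        i.src = -1;                     /* source resolved later; illustrative only  */
        i.dst = dst_reg;
        i.operand_is_unboxed_int = v->known_unboxed_int;
        return i;
    }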

3 Cost Models and Code Optimization

A fundamental problem in low-level code optimization is that abstract interpretation can tell us only whether a particular optimization is permissible: it has nothing to say about whether or not it is desirable in a particular context. For example, we may discover, as a result of analysis, that a variable may be kept in a register over the course of a computation without affecting the result. It may turn out, however, that this is not a worthwhile thing to do, because it precludes the use of that register to hold another, more frequently accessed variable. The kinds of information typically obtained from abstract interpretation provide little guidance on the latter point.

One might feel that this is not, after all, such an important issue, because the primary technical problem in program analysis and optimization is to ensure that bad things do not happen, i.e., that an optimization does not cause a program to behave incorrectly. It is undeniably true that correctness is fundamentally more important than performance, and that we should always choose to compute a correct result (perhaps slowly) rather than an incorrect result quickly. It can be argued, however, that identifying "bad things happening" with semantic incorrectness takes too narrow a view of the situation. Given two computations that both produce the same correct solution to a problem, we would probably choose the one that is faster, or uses less memory, or is better according to some appropriate measure of performance. In such a setting, if the performance of a program is adversely affected by the poor decisions of an optimizer, one can certainly argue that bad things have happened.

As an example of a perfectly plausible optimization where inadequate attention to low-level details can lead to a performance degradation, consider subprogram inlining, which is conceptually very similar to the unfolding transformation of Burstall and Darlington. The main motivation behind this transformation, where a call to a subroutine is replaced by an appropriate instance of the body of the called subroutine, is to reduce program execution time by eliminating the overhead associated with calling the subroutine and eventually returning from it. Davidson and Holler have shown, however, that register usage can be adversely affected by inlining: first, the number of registers that have to be saved and restored at a subroutine call may increase after inlining; and second, register allocation decisions may change as a result of inlining, causing some frequently accessed variables to be stored in memory. This can cause the inlined program to actually run slower than the program without inlining. Cooper et al. report a similar experience, though for different reasons, with subprogram inlining in Fortran.

Richardson describes a somewhat different form of "bad things happening" in the context of this transformation: individual functions may grow enormously in size as a result of inlining, even though the overall growth in the size of the entire program may be relatively modest, leading to greatly increased time and space requirements during compilation and optimization, and in the worst case causing compilation to fail due to inadequate memory.

Another example of this phenomenon can be seen in stack allocation of closures in functional languages. The idea is that while closures need to be heap-allocated in general, with enough information about the lifetime of a closure in a program it may be possible to avoid this and allocate it on the stack instead (for a discussion of various low-level considerations for stack vs. heap allocation, see Appel and Shao). Unless care is exercised, however, this can lead to an increase in the memory requirements of a program, because dead variables in stack-allocated closures are nevertheless traversed by the garbage collector. In extreme cases this can cause a program to fail at runtime due to insufficient memory availability.

The final example of potentially pessimizing optimizations we consider is tabulation, also known as memoization, where calls to a function or procedure and the corresponding return values are noted in a table. The idea is that, by consulting this table, subsequent calls may be able to reuse a previously computed value and thereby avoid having to actually execute the called function. An oft-cited example of the benefits of tabulation is the naive exponential-time Fibonacci function, which runs in linear time with tabulation. However, if functions are tabulated without careful consideration of the relative costs and benefits of tabulation, the cost of table manipulation can overwhelm any benefits that accrue from it. As an example, in an experiment with tabulation using Ackermann's function, we found that the computation generated so many entries in the table that, even though table lookups incurred a great many successful hits, the cost of table management led to an overall slowdown in the program. The large number of table entries also led to a significant increase in the memory requirements of the program, raising again the specter of runtime failure due to insufficient memory.
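The oft-cited Fibonacci case looks like the following in C; this is a minimal sketch of our own with a fixed-size table (valid for n up to 90), not code from any of the systems discussed. The point of the Ackermann experiment above is that the same transformation, applied blindly, can generate so many table entries that lookup and table-management costs dominate.

    #include <stdio.h>

    #define MAXN 90
    static long long table[MAXN + 1];   /* table[n] == 0 means "not yet computed" */

    /* Tabulated Fibonacci: each value is computed at most once, so the            */
    /* exponential-time naive recursion becomes linear in n.                       */
    long long fib(int n) {
        if (n <= 1) return n;
        if (table[n] == 0)
            table[n] = fib(n - 1) + fib(n - 2);
        return table[n];
    }

    int main(void) {
        printf("fib(60) = %lld\n", fib(60));
        return 0;
    }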

These examples illustrate two points: first, without careful attention to low-level details, even apparently plausible optimizations can result in an overall degradation in program performance; and second, such performance degradations should be taken seriously as a "bad thing". In the worst case they can lead to execution failure in correctly written programs, and this is no better than an incorrectly performed optimization. A fundamental motivation behind program analysis frameworks such as abstract interpretation is to give such analyses a solid foundation in the mathematical semantics of programming languages, and thereby allow us to reason formally about properties such as correctness. This, in turn, is driven by the desire to ensure that any transformations that are performed do not change the behavior of a program in undesirable ways. This suggests the need for reasonable cost models that are able to account for low-level aspects of program execution in sufficient detail that optimizations guided by them can reasonably be expected to not go off too badly. Dean and Chambers discuss the use of such cost models to guide the subprogram inlining optimization discussed above.

Note that the need for low-level cost models does not go away if we lift low-level operations to the source level, as is done for boxing and unboxing operations using representation types. For example, Henglein and Jørgensen's notion of "formally optimal" boxing does not take into account machine-level costs or execution frequencies. Because of this, it may happen that a program compiled to formally optimal form is slower at runtime than one that is not "optimal" in this sense, but which uses a low-level cost model and execution frequency information to guide the placement of boxing and unboxing operations.

Unfortunately, the construction of reasonable low-level cost models seems nontrivial, for a number of reasons. First, it seems quite difficult to predict the concrete cost of a program (e.g., in terms of the number of machine cycles it takes to execute the program on a particular input), because even if we choose to ignore the characteristics and behavior of the underlying operating system, we would have to account for machine-level aspects of execution, such as cache behavior, in considerable detail. One possibility might be to abstract away from such "really low-level" (and more or less unpredictable) aspects and use some kind of abstract machine description that nevertheless models some of the more important aspects of an implementation. Such abstract cost models have been used successfully, for example, for data representation optimizations, for improving data locality, and for register allocation.

However, even with simplifications to the machine model to make it tractable, we may need estimates of execution frequencies for different parts of a program to give an estimate of its cost: this is crucial for optimizations where a reduction in cost in one part of a program may be traded against a possible increase in cost in another part. Where current systems use execution frequency estimates, however, they very often tend to rely on fairly simple-minded heuristics based on the static loop nesting structure of the program. This can lead to estimates that are quite imprecise. As an example, a common heuristic used for register allocation is to assume that each loop is executed some fixed number of times. Wall's studies indicate, however, that the profiles of basic block execution frequency and procedure call frequency obtained using this technique can be surprisingly poor, being in many cases not much better than random profiles. As users, we have experienced this problem in the context of our Janus compiler, which translates programs to C and invokes gcc.

Our lack of explicit control over register allocation in the C compiler, combined with its often imperfect execution frequency estimates, occasionally leads to the unexpected situation where transformations at the Janus virtual machine level that one would reasonably expect to yield speed improvements actually produce slowdowns in overall execution speed. (While gcc provides extensions that give some degree of user control over hardware register allocation, we do not use them at this time, for portability reasons.) As a concrete example, in a benchmark program to evaluate Chebyshev polynomials, when we turned off garbage collection, expecting an improvement in execution speed because of a reduction in the number of explicit overflow checks on the heap pointer, we found that the change in the number and distribution of static references to the heap pointer led to changes in the register allocation decisions in the C compiler that resulted in an overall slowdown of about ….

The problem is not entirely that static analysis problems such as the estimation of execution frequencies and costs are not amenable to formal methods. Early work on these problems includes that of Cohen and Zuckerman, who consider cost analysis of Algol programs; Wegbreit, whose pioneering work on cost analysis of Lisp programs addressed the treatment of recursion; and that of Ramshaw and Wegbreit, who discuss the formal verification of cost specifications. Since then, the question of cost analysis has been investigated by a number of researchers. Many of these use semantics-based methods: for example, Rosendahl uses abstract interpretation for cost analysis, and Wadler uses projection analysis. Despite this fact, the use of formally defensible semantics-based techniques for the estimation of execution frequencies or program costs does not seem very common in actual compilers. This could possibly be due to a perception that such techniques are interesting research tools but too expensive to be part of a compiler. Another reason may be that the information obtained from such analyses, which is typically propositions of the form "on an input of length N, the function f requires at most … computational steps", is not directly amenable to low-level code optimization applications, which would prefer to have more absolute information of the form "variable x is accessed … times".

Some recent work on dynamic control of task creation in parallel systems suggests how cost estimates based on semantics-based methods might be incorporated into compilers. In essence, the idea is to use polyvariant specialization at a low level to construct different versions for each procedure: one version handles inputs that are large enough to justify the overheads associated with the creation of parallel tasks, and another handles inputs that are small enough that sequential execution is preferable. At runtime, the appropriate version of a function is selected dynamically by comparing the size of the input arguments with a system-dependent "threshold size" for that function that is determined at compile time. In principle, one could imagine using a similar approach for other low-level optimizations as well: generate code for different versions of a program fragment to account for different optimization scenarios, and choose the one that is appropriate in any particular context, if necessary dynamically. (Chambers refers to this kind of application of polyvariant specialization to arbitrary pieces of a program, rather than being limited to, say, functions or procedures, as "splitting".) A straightforward implementation of this idea seems impractical because of the almost certain explosion in code size it would incur. Moreover, interactions between different low-level decisions in different versions would have to be taken into account. It would be interesting to see whether such problems could be addressed well enough to make it practical to incorporate semantics-based methods for execution frequency and cost analysis into compilers.
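A minimal sketch of the dispatch structure just described, with hypothetical names and an arbitrary threshold value (the real threshold would be a system-dependent quantity computed at compile time from a cost model):

    #include <stddef.h>

    #define PROCESS_THRESHOLD 4096   /* hypothetical, system-dependent cutoff */

    /* Sequential version: used when the input is too small to justify task creation. */
    static void process_sequential(double *a, size_t n) {
        for (size_t i = 0; i < n; i++) a[i] *= 2.0;
    }

    /* "Parallel" version: in a real system this would spawn tasks; here it stands in */
    /* for the specialized code that pays the task-creation overhead.                 */
    static void process_parallel(double *a, size_t n) {
        process_sequential(a, n);    /* placeholder for task-spawning code */
    }

    /* Dispatcher: the appropriate specialized version is selected dynamically by     */
    /* comparing the input size against the threshold fixed at compile time.          */
    void process(double *a, size_t n) {
        if (n >= PROCESS_THRESHOLD) process_parallel(a, n);
        else                        process_sequential(a, n);
    }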

4 Summary

Compiler optimizations can be divided into two broad classes: high-level optimizations, which correspond to transformations expressible in terms of source-level constructs, and low-level optimizations, which are not so expressible. While abstract interpretation is widely accepted as a natural framework for semantics-based program analyses, we have found that in many cases such analyses are not quite suitable for low-level optimizations. There are two main reasons for this. The first is that there is often a "semantic mismatch" between the kinds of information abstract interpretations provide and the kinds of information a compiler wants for its low-level optimizations: abstract interpretations are typically formulated in terms of high-level program semantics, while for low-level optimization we need information about machine-level entities like registers, pointers into memory, etc. The second reason is that, in order to carry out a low-level optimization, in general it is not enough to know that the optimization is permissible: we need to know also that it is desirable. Determining whether a particular optimization is desirable in a particular context requires low-level cost models, as well as knowledge about execution frequencies. While there has been a considerable body of work on semantics-based methods for execution cost analysis of programs, these techniques do not seem to be used very much within actual compilers, which tend to use simple and potentially imprecise heuristics.

Again, this is due in part to a semantic mismatch: semantics-based cost analyses typically yield cost functions (or execution frequency functions) that are expressed in terms of input size, while for optimization purposes it is easier to work with absolute values for execution frequencies and costs.

A fairly obvious solution to the first problem is to use a low-level concrete semantics that makes explicit the entities that are of interest in the context of low-level optimizations. The main pragmatic problem here is that low-level program representations tend to be considerably larger than high-level representations, making analyses more expensive. A possible solution is to reduce the overhead associated with interpreting a program over an abstract domain by using some form of "abstract compilation", i.e., by executing an appropriately modified form of the low-level representation of the program instead of interpreting its components. There is the additional issue that program properties that are relatively easily inferrable at a high level may be obscured in a lower-level analysis, but this can be handled by initially analyzing the program at a high level, and then translating the high-level program properties into low-level terms during the translation of the program into a lower-level language.

The second problem can be addressed, at least in principle, via polyvariant specialization at the low level. This idea has been applied to controlling dynamic task creation in parallel systems, and appears to work reasonably well. However, a significant problem that has to be addressed when applying this to low-level code optimization is that of controlling code growth.

The appeal of semantics-based program manipulation techniques is that they allow us to reason formally about the manipulations themselves, and certify with some confidence that such manipulations will not cause bad things to happen. Much of the current practice of low-level optimization seems guided by simple heuristics rather than careful semantic treatment. Because of this, it is not clear that much can be said about whether or not "bad things" can happen, and indeed we sometimes do encounter situations where apparently plausible "improvements" to a program can lead to a degradation in its performance. This is undesirable; but if semantics-based techniques can be adapted for low-level optimizations, it may be possible to reduce or eliminate such anomalous situations in the future.

Acknowledgements

Numerous valuable discussions with Manuel Hermenegildo are gratefully acknowledged.

References

A. V. Aho, R. Sethi, and J. D. Ullman. Compilers: Principles, Techniques, and Tools. Addison-Wesley.

A. Aiken, E. L. Wimmers, and T. K. Lakshman. Soft Typing with Conditional Types. Proc. ACM Symposium on Principles of Programming Languages, Portland, Oregon, Jan.

R. Allen and K. Kennedy. Automatic Translation of FORTRAN Programs to Vector Form. ACM Transactions on Programming Languages and Systems, Oct.

A. W. Appel and Z. Shao. An Empirical and Analytical Study of Stack vs. Heap Cost for Languages with Closures. Research Report, Dept. of Computer Science, Princeton University, March.

J. Arsac and Y. Kodratoff. Some Techniques for Recursion Removal from Recursive Functions. ACM Transactions on Programming Languages and Systems, Apr.

D. F. Bacon, S. L. Graham, and O. J. Sharp. Compiler Transformations for High-Performance Computing. Computing Surveys, Dec.

P. A. Bigot, D. Gudeman, and S. K. Debray. Output Value Placement in Moded Logic Programs. Proc. Eleventh Int. Conf. on Logic Programming, June. MIT Press.

P. A. Bigot and S. K. Debray. A Simple Approach to Supporting Untagged Objects in Dynamically Typed Languages. Draft Report, Dept. of Computer Science, University of Arizona, Tucson, Nov.

D. Bernstein, M. C. Golumbic, Y. Mansour, R. Y. Pinter, D. Q. Goldin, H. Krawczyk, and I. Nahshon. Spill Code Minimization Techniques for Optimizing Compilers. Proc. SIGPLAN Conference on Programming Language Design and Implementation, Portland, June.

R. S. Bird. Tabulation Techniques for Recursive Programs. Computing Surveys, Dec.

B. Bjerner and S. Holmström. A Compositional Approach to Time Analysis of First Order Lazy Functional Programs. Proc. ACM Conference on Functional Programming Languages and Computer Architecture.

F. Bueno, M. García de la Banda, and M. Hermenegildo. Effectiveness of Global Analysis in Strict Independence-Based Automatic Program Parallelization. Proc. International Symposium on Logic Programming, Nov. MIT Press.

R. M. Burstall and J. Darlington. A Transformation System for Developing Recursive Programs. Journal of the ACM, Jan.

S. Carr, K. S. McKinley, and C.-W. Tseng. Compiler Optimizations for Improving Data Locality. Proc. Sixth International Conference on Architectural Support for Programming Languages and Operating Systems, San Jose, California, Nov.

G. J. Chaitin. Register Allocation via Graph Coloring. Proc. ACM Conference on Compiler Construction, Boston, June.

C. Chambers. The Design and Implementation of the SELF Compiler, an Optimizing Compiler for Object-Oriented Programming Languages. PhD Dissertation, Stanford University.

D. R. Chase. Safety Considerations for Storage Allocation Optimizations. Proc. SIGPLAN Conference on Programming Language Design and Implementation, Atlanta, June.

D. R. Chase, M. Wegman, and F. K. Zadeck. Analysis of Pointers and Structures. Proc. ACM SIGPLAN Conference on Programming Language Design and Implementation, White Plains, NY, June.

W.-N. Chin. Safe Fusion of Functional Expressions. Proc. ACM Conference on Lisp and Functional Programming, San Francisco, June.

F. C. Chow and J. L. Hennessy. The Priority-Based Coloring Approach to Register Allocation. ACM Transactions on Programming Languages and Systems, Oct.

J. Cohen and C. Zuckerman. Two Languages for Estimating Program Efficiency. Communications of the ACM, June.

K. D. Cooper, M. W. Hall, and L. Torczon. Unexpected Side Effects of Inline Substitution. ACM Letters on Programming Languages and Systems, March.

P. Cousot and R. Cousot. Abstract Interpretation: A Unified Lattice Model for Static Analysis of Programs by Construction or Approximation of Fixpoints. Proc. Fourth ACM Symposium on Principles of Programming Languages.

P. Cousot and R. Cousot. Systematic Design of Program Analysis Frameworks. Proc. Sixth ACM Symposium on Principles of Programming Languages.

P. Cousot. Semantic Foundations of Program Analysis. In Program Flow Analysis: Theory and Applications, eds. S. S. Muchnick and N. D. Jones, Prentice-Hall.

J. W. Davidson and A. M. Holler. Subprogram Inlining: A Study of its Effects on Program Execution Time. IEEE Transactions on Software Engineering, Feb.

K. De Bosschere, S. K. Debray, D. Gudeman, and S. Kannan. Call Forwarding: A Simple Interprocedural Optimization Technique for Dynamically Typed Languages. Proc. ACM Symposium on Principles of Programming Languages, Portland, Oregon, Jan.

J. Dean and C. Chambers. Towards Better Inlining Decisions using Inlining Trials. Proc. ACM Conference on Lisp and Functional Programming, Orlando, Florida, June.

S. K. Debray. Optimizing Almost-Tail-Recursive Prolog Programs. Proc. Functional Programming Languages and Computer Architecture, Nancy, France, Sept.

S. K. Debray and D. S. Warren. Automatic Mode Inferencing for Logic Programs. J. Logic Programming, Sept.

S. K. Debray, N. Lin, and M. Hermenegildo. Task Granularity Analysis in Logic Programs. Proc. ACM SIGPLAN Conference on Programming Language Design and Implementation, June.

S. K. Debray and N.-W. Lin. Cost Analysis of Logic Programs. ACM Transactions on Programming Languages and Systems, Nov.

A. Deutsch. On Determining Lifetime and Aliasing of Dynamically Allocated Data in Higher-Order Functional Specifications. Proc. ACM Symposium on Principles of Programming Languages, Jan.

D. R. Engler and T. A. Proebsting. DCG: An Efficient, Retargetable Dynamic Code Generation System. Proc. Sixth International Conference on Architectural Support for Programming Languages and Operating Systems, San Jose, California, Nov.

I. Foster and W. Winsborough. Copy Avoidance through Compile-Time Analysis and Local Reuse. Proc. International Symposium on Logic Programming, San Diego, Nov. MIT Press.

P. B. Gibbons and S. S. Muchnick. Efficient Instruction Scheduling for a Pipelined Architecture. Proc. ACM SIGPLAN Conference on Compiler Construction, June.

K. Gopinath and J. Hennessy. Copy Elimination in Functional Languages. Proc. Sixteenth ACM Symposium on Principles of Programming Languages, Austin, TX, Jan.

D. Gudeman, K. De Bosschere, and S. K. Debray. jc: An Efficient and Portable Sequential Implementation of Janus. Proc. Joint Int. Conf. and Symp. on Logic Programming, Nov. MIT Press.

F. Henglein and J. Jørgensen. Formally Optimal Boxing. Proc. ACM Symposium on Principles of Programming Languages, Portland, OR, Jan.

M. Hermenegildo and K. Greene. The &-Prolog System: Exploiting Independent And-Parallelism. New Generation Computing.

M. Hermenegildo, R. Warren, and S. K. Debray. Global Flow Analysis as a Practical Compilation Tool. Journal of Logic Programming, Aug.

P. Hudak and A. Bloss. The Aggregate Update Problem in Functional Languages. Proc. Twelfth ACM Symposium on Principles of Programming Languages.

L. Huelsbergen, J. R. Larus, and A. Aiken. Using Run-Time List Sizes to Guide Parallel Thread Creation. Proc. ACM Conference on Lisp and Functional Programming, June.

S. Kaplan. Algorithmic Complexity of Logic Programs. Proc. Fifth International Conference on Logic Programming, Seattle. MIT Press.

D. Keppel, S. J. Eggers, and R. R. Henry. A Case for Runtime Code Generation. Technical Report, Department of Computer Science, University of Washington.

D. Krantz. ORBIT: An Optimizing Compiler for Scheme. PhD Dissertation, Yale University. Also available as a Technical Report, Dept. of Computer Science, Yale University, Feb.

D. Krantz, R. Kelsey, J. Rees, P. Hudak, J. Philbin, and N. Adams. ORBIT: An Optimizing Compiler for Scheme. Proc. SIGPLAN Symposium on Compiler Construction.

D. Le Métayer. ACE: An Automatic Complexity Evaluator. ACM Transactions on Programming Languages and Systems, April.

X. Leroy. Unboxed Objects and Polymorphic Typing. Proc. ACM Symposium on Principles of Programming Languages, Albuquerque, NM, Jan.

T. S. McNerney. Verifying the Correctness of Compiler Transformations on Basic Blocks using Abstract Interpretation. Proc. Symposium on Partial Evaluation and Semantics-Based Program Manipulation, New Haven, CT, June.

A. Marien, G. Janssens, A. Mulkers, and M. Bruynooghe. The Impact of Abstract Interpretation: an Experiment in Code Generation. Proc. Sixth International Conference on Logic Programming, Lisbon, Portugal, June. MIT Press.

D. Maurer. Strictness Computation using Special λ-Expressions. In Programs as Data Objects, Springer-Verlag LNCS, Oct.

A. Mulkers, W. Winsborough, and M. Bruynooghe. Analysis of Shared Data Structures for Compile-Time Garbage Collection in Logic Programs. Proc. Seventh International Conference on Logic Programming, Jerusalem, June. MIT Press.

A. Mulkers, W. Winsborough, and M. Bruynooghe. Live-Structure Dataflow Analysis for Prolog. ACM Transactions on Programming Languages and Systems, March.

R. Paige and S. Koenig. Finite Differencing of Computable Expressions. ACM Transactions on Programming Languages and Systems, July.

J. C. Peterson. Untagged Data in Tagged Environments: Choosing Optimal Representations at Compile Time. Proc. Functional Programming Languages and Computer Architecture, London, Sept.

S. Peyton Jones and J. Launchbury. Unboxed Values as First Class Citizens in a Non-Strict Functional Language. Proc. Functional Programming Languages and Computer Architecture.

M. L. Powell. A Portable Optimizing Compiler for Modula-2. Proc. SIGPLAN Symposium on Compiler Construction, Montreal, Canada, June.

T. A. Proebsting and C. N. Fischer. Linear-time Optimal Code Scheduling for Delayed-Load Architectures. Proc. ACM SIGPLAN Conference on Programming Language Design and Implementation, Toronto, June.

L. H. Ramshaw. Formalizing the Analysis of Algorithms. PhD Thesis, Stanford University. Also available as a report, Xerox Palo Alto Research Center, Palo Alto, California.

B. Reistad and D. Gifford. Static Dependent Costs for Estimating Execution Time. Proc. ACM Conference on Lisp and Functional Programming, Orlando, Florida, June.

S. E. Richardson. Evaluating Interprocedural Code Optimization Techniques. PhD Dissertation, Stanford University. Also available as a Technical Report, Computer Systems Laboratory, Stanford University, Feb.

M. Rosendahl. Automatic Complexity Analysis. Proc. ACM Conference on Functional Programming Languages and Computer Architecture.

D. Sands. Complexity Analysis for a Lazy Higher-Order Language. Proc. European Symposium on Programming, May. Springer-Verlag LNCS.

E. Schonberg, J. T. Schwartz, and M. Sharir. An Automatic Technique for Selection of Data Representations in SETL Programs. ACM Transactions on Programming Languages and Systems, April.

M. Sharir. Some Observations Concerning Formal Differentiation of Set Theoretic Expressions. ACM Transactions on Programming Languages and Systems, April.

J. Shultis. On the Complexity of Higher-Order Programs. Technical Report, University of Colorado, Feb.

A. Taylor. LIPS on a MIPS: Results from a Prolog Compiler for a RISC. Proc. Seventh International Conference on Logic Programming, Jerusalem, Israel, June.

A. Taylor. High Performance Prolog Implementation. PhD thesis, University of Sydney, Australia.

K. Thompson. A New C Compiler. Proc. Summer UKUUG Conference, London, July.

P. Van Roy. Can Logic Programming Execute as Fast as Imperative Programming? PhD thesis, University of California at Berkeley.

P. Wadler. Strictness Analysis Aids Time Analysis. Proc. ACM Symposium on Principles of Programming Languages, Jan.

P. Wadler. Deforestation: Transforming Programs to Eliminate Trees. Proc. European Symposium on Programming, Nancy, France, March. Springer-Verlag LNCS.

T. A. Wagner, V. Maverick, S. L. Graham, and M. A. Harrison. Accurate Static Estimators for Program Optimization. Proc. ACM SIGPLAN Conference on Programming Language Design and Implementation, Orlando, Florida, June.

D. W. Wall. Predicting Program Behavior Using Real or Estimated Profiles. Proc. SIGPLAN Conference on Programming Language Design and Implementation, Toronto, Canada, June.

B. Wegbreit. Mechanical Program Analysis. Communications of the ACM, Sept.

B. Wegbreit. Verifying Program Performance. Journal of the ACM, Oct.

M. E. Wolf and M. S. Lam. A Data Locality Optimizing Algorithm. Proc. SIGPLAN Conference on Programming Language Design and Implementation, Toronto, Canada, June.