Alias Analysis of Executable Code

Saumya Debray, Robert Muth, Matthew Weippert

Department of Computer Science
University of Arizona
Tucson, AZ, USA

{debray, muth, weippert}@cs.arizona.edu

This work was supported in part by the National Science Foundation under grant CCR. Copyright ACM. To appear in the Proceedings of the Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, January.

Abstract

Recent years have seen increasing interest in systems that reason about and manipulate executable code. Such systems can generally benefit from information about aliasing. Unfortunately, most existing alias analyses are formulated in terms of high-level language features, and are unable to cope with features, such as pointer arithmetic, that pervade executable programs. This paper describes a simple algorithm that can be used to obtain aliasing information for executable code. In order to be practical, the algorithm is careful to keep its memory requirements low, sacrificing precision where necessary to achieve this goal. Experimental results indicate that it is nevertheless able to provide a reasonable amount of information about memory references across a variety of benchmark programs.

Introduction

Recent years have seen increasing interest in reasoning about and manipulating executable files. When working with an executable file, we typically have information about the entire program (including, potentially, library functions) that is usually not available at compile time. Because of this, code manipulation and optimization at this level offers benefits that are difficult or impossible to obtain using traditional compilers. As with the compilation of source-level programs, code transformations on executable code can benefit greatly from pointer alias information. For example, inlining library routines may open up opportunities for moving invariant load instructions out of loops, but alias information is needed in order to identify such invariant load instructions. To obtain the full benefits of a superscalar architecture such as the DEC Alpha, link-time optimizers such as Spike, alto, and OM need to carry out instruction scheduling again after link-time optimizations; without pointer alias information, however, the scheduler must be conservative in its treatment of loads and stores, and this can limit the amount of code reordering that is possible. As a final example, it may be possible to scavenge registers at link time, e.g., by examining the register usage of library functions, but the ability to use such scavenged registers effectively is likely to be limited in the absence of pointer alias information.

There is an extensive body of work on pointer alias analysis of various kinds (see Section ). In almost all cases, these are high-level analyses carried out on representations of source programs, in terms of source language constructs, and typically disregarding "nasty" features such as type casts, pointer arithmetic, and out-of-bounds array accesses. Such analyses turn out, unfortunately, to be of limited utility at the machine code level, because at this level all we have are the nasty features. The contents of registers and memory words are untyped bitstrings, so the issue of type casts is in some sense moot: everything is potentially an address. Memory accesses typically involve some address arithmetic to compute a base address into a register, followed by the use of a displacement off the base address to carry out the actual memory reference. Address arithmetic may also arise due to particular language features, e.g., the use of tag bits in dynamically typed languages to indicate the type of the value pointed at. Dereferencing operations in the executable code for such programs will involve nontrivial arithmetic involving the tag bits that is invisible (and irrelevant) at the source level; at the level of executable programs we can't tell what source language a particular piece of code was derived from, and different components of a program might have been written in different source languages, so we must be able to deal with all such address arithmetic in a reasonable way.

If the number of arguments to a function is large enough, some of the arguments may have to be passed on the stack. In such a case, the arguments passed on the stack will typically reside at the top of the caller's stack frame, and the callee will "reach into" the caller's frame to access them: this is nothing but an out-of-bounds array reference. Finally, executable programs may include library functions in hand-written assembly code that violate familiar and comfortable source-level assumptions, e.g., that execution does not jump out of the middle of one function and into the middle of another (this happens, for example, in some Fortran library routines). To illustrate some of the problems that arise, consider the fragment of C code shown in Figure 1, together with the corresponding assembly code (footnote 1). The point to note is the extensive use of address arithmetic to access memory, even in this very simple program fragment. For example, in order to determine whether the two store instructions through x and y in g might write to the same memory location, we need to be able to reason about the contents of the registers holding g's two arguments, which are defined through the arithmetic operations in the two add instructions that precede the call in f. As this example illustrates, pointer arithmetic cannot be ignored during alias analysis at the machine code level.

Footnote 1: The assembly code shown corresponds to that obtained using gcc -O on a DEC Alpha workstation, with some edits to enhance readability. On the Alpha, arguments to functions are typically passed in registers, and register r30 is used as the stack pointer.

In this paper we describe a low-level, flow-sensitive, context-insensitive, interprocedural pointer alias analysis algorithm, designed and implemented in the context of the alto link-time optimizer, that can handle significant pointer arithmetic and features, such as out-of-bounds references, that are ignored by most existing alias analysis algorithms.

For simplicity in the discussion that follows, we assume a more or less canonical RISC instruction set. Memory is accessed only through explicit load and store instructions, which have the form, e.g., "load $reg_a$, $k(reg_b)$" and "store $reg_a$, $k(reg_b)$", where $k$ is a constant, and have the effect of reading from or writing to the location whose address is $k$ + contents of $reg_b$. To model arithmetic, we assume the instructions "add $src_1$, $src_2$, $dest$" and "mult $src_1$, $src_2$, $dest$", where $dest$ is a destination register and $src_1$ and $src_2$ are source registers; to simplify the discussion, we abuse notation and allow either $src_1$ or $src_2$ to be an integer constant denoting an immediate operand. These instructions compute, respectively, the sum and product of $src_1$ and $src_2$ into $dest$; many other operations can be expressed in terms of these (e.g., subtraction and register-to-register moves can be modelled in terms of addition), so we do not consider them separately. In addition to these, we assume the usual complement of tests, conditional jumps, and direct and indirect unconditional jumps: the only effect of these instructions is to determine the control flow graph of the program, so we do not consider them explicitly in the context of alias analysis. We also ignore operations on floating point registers, since it seems unlikely that such operations would be used for address computations.

Local Alias Analysis

A technique called instruction inspection, commonly used in compile-time instruction schedulers, can be used to reason about memory references within a basic block. Here, two memory reference instructions $i_1$ and $i_2$ are taken to be non-conflicting if either of the following conditions holds:

1. they use distinct offsets from the same base register $r$, and $r$ is not redefined between $i_1$ and $i_2$; or

2. one of the instructions uses a register known to point to the stack and the other uses a register known to point to the global data area.

Unfortunately, this simple approach does not work if information about address arithmetic needs to be propagated across basic block boundaries. In the next section we describe a global analysis that can be used to handle this.

Residue-Based Global Alias Analysis

The Basic Idea

An alias analysis will, in general, associate each register with a set of possible addresses at each program point, so we need to abstract sets of addresses to descriptions, or abstract address sets. These need to be easy to compute and compactly representable, with operations such as union, intersection, checking containment, etc., that are cheap enough to be practical for the analysis of large programs. A simple way to satisfy these criteria is to consider only some fixed number, say $m$, of the low-order bits of an address. That is, addresses are represented by their mod-$k$ residues, where $k = 2^m$. The set of all mod-$k$ residues is $Z_k = \{0, \ldots, k-1\}$. An abstract address set can then be represented as a bit vector of length $k$; since $m$ (and therefore $k$) is fixed, set operations such as union, intersection, checking containment, etc., can be carried out in $O(1)$ bit-vector operations.
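The bit-vector representation of abstract address sets can be sketched as follows. This is a minimal illustration rather than alto's implementation: the choice $m = 6$ and all function names (`residue_set`, `contains`, `members`, and so on) are assumptions introduced here.

```python
# Sketch only: M_BITS and every helper name below are hypothetical.
M_BITS = 6            # assumed m, so k = 2**m = 64
K = 1 << M_BITS

def residue_set(addresses):
    """Abstract concrete addresses to a k-bit vector of mod-k residues."""
    bits = 0
    for addr in addresses:
        bits |= 1 << (addr % K)   # bit r is set iff residue r is present
    return bits

def union(a, b):
    return a | b

def intersection(a, b):
    return a & b

def contains(a, b):
    """True iff every residue in b is also in a."""
    return (b & ~a) == 0

def members(bits):
    """List the residues present in a bit vector, smallest first."""
    return [r for r in range(K) if (bits >> r) & 1]

# 136 and 168 differ by less than k, so they keep distinct residues (8 and 40).
print(members(residue_set([136, 168])))
```

Each operation is a single machine-level bitwise operation when $k$ matches the word size, which is the point of fixing $m$ in advance.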

Source Code                  Executable Code

int f()
{                            add r…, …, r…        allocate stack frame
                             store r…, …(r…)      save return address
  int x,                     x is at displacement … in f's stack frame
      y;                     y is at displacement … in f's stack frame
  ...
  g(&y, &x);                 add r…, …, r…        r… = &y
                             add r…, …, r…        r… = &x
                             bsr r…, g            r… = return addr; goto g
  ...
}

int g(int *x, int *y)        arg 1 in r…, arg 2 in r…
{                            add r…, …, r…        allocate stack frame
                             store r…, …(r…)      save return address
  *x = ...                   store …, …(r…)
  *y = ...                   store …, …(r…)
  ...
}

Figure 1: A fragment of a C program and the corresponding assembly code

This representation can cope with address arithmetic (e.g., as illustrated in Figure 1), since such arithmetic translates in a straightforward way to mod-$k$ arithmetic. Finally, since two distinct values that differ by less than $k = 2^m$ have distinct mod-$k$ residues, the representation can distinguish between addresses involving distinct "small" displacements (i.e., less than $2^m$) from a base register.

It turns out that mod-$k$ residues are not, by themselves, adequate for our purposes. The problem is that in many cases we won't be able to predict the actual value of a register $r$ (e.g., the stack pointer) at a program point, which means we won't be able to say anything about a displacement $k$ from $r$, i.e., the address corresponding to $k(r)$, either. To deal with this problem, we extend abstract address sets to address descriptors, which take an additional component that refers to an instruction:

Definition. An address descriptor is a pair $\langle I, M \rangle$, where $I$ is either an instruction or one of the distinguished values {none, any}, and $M$ is a set of mod-$k$ residues. Given an address descriptor $A = \langle I, M \rangle$, the instruction $I$ is said to be the defining instruction of $A$, while $M$ is called the residue set of $A$.

The intuition is that, given an address descriptor $\langle I, M \rangle$, $M$ denotes a set of mod-$k$ residues relative to whatever value is computed by instruction $I$. A value of none indicates that the corresponding residue set represents mod-$k$ residues of absolute addresses, while a value of any indicates that the address descriptor denotes all possible addresses. More formally, suppose that we are given an operational semantics for the instruction set under consideration. Such a semantics is conceptually simple, if somewhat tedious to specify, for the simple instruction set considered here; we omit a formal specification due to space constraints and rely instead on the informal description of the instructions given at the end of the Introduction. Given a program $P$ and an instruction $I$ in $P$, let $val_P(I)$ denote the set of values $w$ such that, for some input to $P$, there is an execution path from the entry point of $P$ to the instruction $I$ that causes $I$ to compute $w$ into its destination register; $val_P(I) = \emptyset$ if $I$ does not compute a value into a register or if control never reaches $I$. Extend this to the special values none and any as follows: for any program $P$, $val_P(\mathrm{none}) = \{0\}$, and $val_P(\mathrm{any})$ is the set of all values. Then, for an analysis using mod-$k$ residues, the set of addresses denoted by an address descriptor $A = \langle I, M \rangle$ in $P$, that is, the concretization of $A$ in the context of $P$, is

$$conc_P(\langle I, M \rangle) = \{\, w + ik + x \mid w \in val_P(I),\ x \in M,\ i \geq 0 \,\}.$$

As this indicates, different values may be computed by different executions of a particular instruction. This implies that, for the purposes of alias analysis, it is not enough to consider address descriptors in isolation. This issue is addressed in more detail in the section on reasoning about alias relationships.
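To make the concretization concrete, the following sketch tests whether a given address belongs to $conc_P(\langle I, M \rangle)$. It assumes $k = 64$ and takes the set $val_P(I)$ as an explicit argument, which a real analysis does not have available statically; the function names are hypothetical and serve only for experimentation.

```python
K = 64  # assumed modulus k = 2**m with m = 6

def in_concretization(addr, values, residues, k=K):
    """True iff addr = w + i*k + x for some w in values, x in residues, i >= 0.

    `values` stands in for val_P(I). Because every x lies in [0, k), the
    condition reduces to addr >= w and (addr - w) mod k being in residues.
    """
    return any(addr >= w and (addr - w) % k in residues for w in values)

def in_absolute(addr, residues, k=K):
    """Membership for a descriptor <none, M>, using val_P(none) = {0}."""
    return in_concretization(addr, {0}, residues, k)

# A register defined by an instruction that computed 1000, with residues {8, 40}:
print(in_concretization(1008, {1000}, {8, 40}))
print(in_concretization(1016, {1000}, {8, 40}))
```

The second query fails because 16 is not among the residues; the same descriptor would admit 1104, since 1104 = 1000 + 1·64 + 40.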

The relative precision of different address descriptors can be characterized via the binary relation $\preceq$:

Definition. An address descriptor $\langle I_2, M_2 \rangle$ is more precise than a descriptor $\langle I_1, M_1 \rangle$, written $\langle I_1, M_1 \rangle \preceq \langle I_2, M_2 \rangle$, if and only if (i) $I_1 = \mathrm{any}$ or $M_1 = Z_k$; or (ii) $M_2 = \emptyset$; or (iii) $I_1 = I_2$ and $M_1 \supseteq M_2$.

It is straightforward to show that $\preceq$ is reflexive and transitive, i.e., a preorder. It can be extended to a partial order in the usual way: define the relation $\approx$ as $A_1 \approx A_2$ if and only if $A_1 \preceq A_2$ and $A_2 \preceq A_1$ (it is easy to show that this is an equivalence relation), and consider the quotient of $\preceq$ with respect to $\approx$. The set of address descriptors forms a lattice with respect to this partial order. In the remainder of this discussion we abuse notation and write $\preceq$ to refer to the resulting partial order. In particular, the equivalence class containing $\langle I, Z_k \rangle$ for all $I$, as well as $\langle \mathrm{any}, M \rangle$ for all $M$, denotes a total lack of information, and is written $\bot$; the equivalence class containing $\langle I, \emptyset \rangle$ for all $I$ denotes the empty set of addresses, and is written $\top$.

Our analysis associates an address descriptor with each register at each program point of interest (footnote 2). If a register $r$ has an associated address descriptor $\langle I, M \rangle$ at a program point, we will sometimes abuse terminology and refer to instruction $I$ as "the defining instruction for $r$" at that point.

Footnote 2: Strictly speaking, the analysis should map each register at each program point to a set of address descriptors. For pragmatic reasons (see the section on propagating address descriptors for details) we use a widening operation to ensure that at each program point, each register is mapped to a singleton set of address descriptors. For simplicity, we do not distinguish between such a set and the single address descriptor it contains.

The Analysis Algorithm

Effects of Individual Instructions

As mentioned earlier, the defining instruction component of an address descriptor allows us to refer to mod-$k$ residues relative to whatever value is computed by the defining instruction. When examining an instruction $I$ with destination register $r$, if we can't say anything about the value of $r$ after instruction $I$, then instead of setting the address descriptor for $r$ to $\bot$, we use $I$ as the defining instruction for $r$ and associate the address descriptor $\langle I, \{0\} \rangle$ with $r$ at the point immediately after $I$. To simplify the discussion, we assume that an immediate operand $c$ yields an address descriptor $\langle \mathrm{none}, \{c \bmod k\} \rangle$ in an analysis based on mod-$k$ residues. Individual instructions are analyzed as shown in Figure 2. The reasoning behind these operations is as follows:

- For load instructions, our analysis currently doesn't keep track of the contents of memory locations, except for read-only sections of the text and data segments (footnote 3). Otherwise, we can say nothing about the contents of $r$ after the load instruction, so the resulting address descriptor is $\langle I, \{0\} \rangle$.

- A store instruction does not affect address descriptors, since it does not affect the contents of any register.

- For an instruction "add $src_a$, $src_b$, $dest$", Figure 2 shows two cases. The correctness of the first case follows straightforwardly from the rules for mod-$k$ arithmetic. The second case is obviously safe, but merits some discussion. If $A_a = \bot$, $A_b = \bot$, or $I_a \neq I_b$, it's easy to see that we can't say anything about the result of the operation. If $I_a = I_b = I_0$ for some $I_0$, it's tempting to think that the resulting address descriptor could be given as $\langle I_0, M' \rangle$, where $M' = \{(x_a + x_b) \bmod k \mid x_a \in M_a, x_b \in M_b\}$, but this is not the case, since $M'$ doesn't account for the fact that the values being added have as components two, possibly different, values from $val_P(I_0)$.

- For an instruction "mult $src_a$, $src_b$, $dest$", Figure 2 shows three cases. The correctness of the first case follows easily from the rules for mod-$k$ arithmetic; the second case can be thought of as widening $A_b$ to $\langle \mathrm{none}, Z_k \rangle$, which is obviously safe, and then applying the first case; the reasoning for the third case is analogous to that for the add instruction above.

Footnote 3: Our implementation uses the contents of these read-only sections to obtain global addresses: these include global variables, as well as addresses of jump tables and functions called indirectly through function pointers.

In typical RISC code, the most commonly encountered address expression, by far, involves a fixed displacement off a base register, which corresponds to the add instruction discussed above. As such, it is especially important that this case be handled efficiently. Suppose that the instruction under consideration is "add $reg_a$, $c$, $reg_b$". It turns out that, given an address descriptor $\langle I, M \rangle$ for $reg_a$, with $M$ represented as a bit vector, the bit vector $M'$ in the descriptor $\langle I, M' \rangle$ for $reg_b$ can be obtained simply by rotating up the bit vector for $M$ by $c$ bits, and this is easy to implement efficiently. In a mod-$k$ residue analysis the residue set for $reg_b$ is $M' = \{(x + c) \bmod k \mid x \in M\}$; if these sets are represented as bit vectors with the smallest element on the right, then rotating the vector for $M$ up (i.e., to the left) by $c$ bits gives precisely the bit vector for $M'$.
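The rotation used for "add $reg_a$, $c$, $reg_b$" can be checked against the set-based definition with a short sketch. The value $k = 64$ and the helper names are assumptions made for illustration; the displacement is reduced mod $k$ before rotating so that large or negative constants are handled.

```python
K = 64  # assumed k; the bit vector then has 64 positions

def to_bits(residues, k=K):
    """Encode a set of residues as a bit vector (smallest element on the right)."""
    bits = 0
    for r in residues:
        bits |= 1 << (r % k)
    return bits

def rotate_up(bits, c, k=K):
    """Rotate a k-bit vector left by c positions (c is reduced mod k first)."""
    c %= k
    mask = (1 << k) - 1
    return ((bits << c) | (bits >> (k - c))) & mask

def shift_residues(residues, c, k=K):
    """Reference definition: M' = {(x + c) mod k | x in M}."""
    return {(x + c) % k for x in residues}

# Residue 60 plus a displacement of 8 wraps around to residue 4.
print(rotate_up(to_bits({60}), 8) == to_bits({4}))
```

Bits pushed past position $k-1$ re-enter at the low end, which is exactly the wrap-around of addition mod $k$.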

Input: An instruction $I$.

Output: An address descriptor for the destination register of $I$.

Method: case $I$ of

"load $r$, $addr$": If $addr$ corresponds to a read-only memory location with contents $val$, then the address descriptor for $r$ is $\langle \mathrm{none}, \{val \bmod k\} \rangle$; otherwise it is $\langle I, \{0\} \rangle$.

"store $r$, $addr$": this instruction does not have any effect on any address descriptors.

"add $src_a$, $src_b$, $dest$": Let the address descriptors for $src_a$ and $src_b$ immediately before instruction $I$ be $A_a = \langle I_a, M_a \rangle$ and $A_b = \langle I_b, M_b \rangle$ respectively. There are two possibilities:

  1. If $A_a \neq \bot$, $A_b \neq \bot$, and $I_a = \mathrm{none}$ (the situation where $I_b = \mathrm{none}$ is symmetric): let $A' = \langle I_b, M' \rangle$, where $M' = \{(x_a + x_b) \bmod k \mid x_a \in M_a, x_b \in M_b\}$. The address descriptor for $dest$ is $\langle I, \{0\} \rangle$ if $A' = \bot$, and is $A'$ otherwise.

  2. Otherwise, we can't say anything about the result of this operation, so the address descriptor for $dest$ after $I$ is taken to be $\langle I, \{0\} \rangle$.

"mult $src_a$, $src_b$, $dest$": Let the address descriptors for $src_a$ and $src_b$ immediately before instruction $I$ be $A_a = \langle I_a, M_a \rangle$ and $A_b = \langle I_b, M_b \rangle$ respectively. There are three possibilities:

  1. If $A_a \neq \bot$, $A_b \neq \bot$, and both $I_a$ and $I_b$ are none: let $M_c = \{(x_a \times x_b) \bmod k \mid x_a \in M_a, x_b \in M_b\}$ and $A' = \langle \mathrm{none}, M_c \rangle$. The address descriptor for $dest$ is $\langle I, \{0\} \rangle$ if $A' = \bot$, and is $A'$ otherwise.

  2. Otherwise, if $A_a \neq \bot$, $A_b \neq \bot$, and $I_a = \mathrm{none}$ (the case where $I_b = \mathrm{none}$ is symmetric): let $M_c = \{(x_a \times x_b) \bmod k \mid x_a \in M_a, x_b \in Z_k\}$ and $A' = \langle \mathrm{none}, M_c \rangle$. The address descriptor for $dest$ is $\langle I, \{0\} \rangle$ if $A' = \bot$, and is $A'$ otherwise.

  3. Otherwise, we can't say much about the result of the multiplication, so the address descriptor for $dest$ after instruction $I$ is $\langle I, \{0\} \rangle$.

esac

Figure 2: Analysis of individual instructions
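The case analysis for individual instructions can be sketched as a set of transfer functions. This is an illustrative reading of the rules for load, add, and mult, not alto's code: descriptors are modelled as `(instruction, frozenset_of_residues)` pairs, instructions are identified by arbitrary labels, $k = 64$ is assumed, and every name below is hypothetical.

```python
K = 64                        # assumed k
FULL = frozenset(range(K))    # Z_k
NONE, ANY = "none", "any"     # the two distinguished "instruction" values

def is_bottom(d):
    """<any, M> and <I, Z_k> carry no information."""
    return d[0] == ANY or d[1] == FULL

def fresh(instr):
    """<I, {0}>: the destination is described relative to I itself."""
    return (instr, frozenset({0}))

def immediate(c):
    return (NONE, frozenset({c % K}))

def analyze_load(instr, readonly_value=None):
    # Only loads from read-only locations with known contents are tracked.
    if readonly_value is not None:
        return (NONE, frozenset({readonly_value % K}))
    return fresh(instr)

def analyze_add(instr, a, b):
    if not is_bottom(a) and not is_bottom(b) and NONE in (a[0], b[0]):
        base = b[0] if a[0] == NONE else a[0]
        result = (base, frozenset((x + y) % K for x in a[1] for y in b[1]))
        return fresh(instr) if is_bottom(result) else result
    return fresh(instr)       # covers both-relative operands as well

def analyze_mult(instr, a, b):
    if is_bottom(a) or is_bottom(b):
        return fresh(instr)
    if a[0] == NONE and b[0] == NONE:
        residues = frozenset((x * y) % K for x in a[1] for y in b[1])
    elif NONE in (a[0], b[0]):
        known = a if a[0] == NONE else b   # the other operand is widened to Z_k
        residues = frozenset((x * y) % K for x in known[1] for y in range(K))
    else:
        return fresh(instr)
    result = (NONE, residues)
    return fresh(instr) if is_bottom(result) else result

sp = fresh("I1")                                  # e.g. a frame allocation
print(analyze_add("I2", sp, immediate(136)))      # descriptor stays relative to I1
```

Multiplying an absolute residue 4 by an unknown-base value still leaves useful information here (only multiples of 4 are possible), which is why the second mult case is worth having.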

0

Propagating Address Descriptors

Conceptually, if we consider all possible execution paths through a program, each register at each program point will correspond to a set of values; abstracting from this, one would expect an analysis to map each register to a set of address descriptors at each program point. Given the handling of individual instructions as described in the previous section, the analysis is now a conceptually straightforward forward dataflow analysis where we compute the meet-over-all-paths solution (footnote 4), with union as the meet operator.

Footnote 4: Since our current implementation is not context-sensitive in its treatment of interprocedural information, a meet-over-all-paths solution suffices: a context-sensitive treatment would have required a meet-over-all-valid-paths solution.

It turns out that if each register at each program point is mapped to a set of address descriptors, the memory requirements for the analysis can become excessive for large programs. This is partly because fully linked executables tend to be considerably larger than source language modules, and partly because reasoning about address arithmetic is usually less precise than, say, reasoning about aliasing at the source level. As a pragmatic measure, therefore, a widening operation is used to ensure that at each program point, each register is mapped to a singleton set of address descriptors or, equivalently, a single address descriptor.

[Figure 3: flowgraph with basic blocks B1 to B11, each annotated with its instruction count. Instructions shown explicitly: (1) add r30,-272,r30, in block B1; (2) add r30,136,r21; (3) add r21,32,r21; (4) store ..., 80(r30); (5) load ..., 0(r21); (6) add r21,32,r21.]

Figure 3: Flowgraph for Example (program: ijpeg; function: jpeg_idct_ifast)

As mentioned in the section on the basic idea, the set of address descriptors forms a lattice with respect to the precision ordering $\preceq$. The widening operation is defined to be simply the meet operation with respect to $\preceq$. In effect, what this does is that if a program point $B$ has two predecessors $B_0$ and $B_1$ such that the address descriptors for a register $r$ at $B_0$ and $B_1$ are $A_0 = \langle I_0, M_0 \rangle$ and $A_1 = \langle I_1, M_1 \rangle$ respectively, where neither $A_0$ nor $A_1$ is $\top$ and $I_0 \neq I_1$, then the address descriptor for $r$ at $B$ is $A_0 \wedge A_1 = \bot$.

While this widening results in less accurate information in some sense (this is reflected in the experimental results on the precision of our analysis, shown in Table 1), it doesn't really change the alias relationships that are determined. To see this, consider a basic block $B$ with two predecessors $B_0$ and $B_1$. Suppose that we have a register $r_a$ whose address descriptors at the exit from $B_0$ and $B_1$ are given by $\langle I_a^0, M_a^0 \rangle$ and $\langle I_a^1, M_a^1 \rangle$ respectively, and we want to determine whether this is possibly aliased to a register $r_b$ with address descriptor $\langle I_b, M_b \rangle$ at the entry to $B$. If the defining instructions from two address descriptors are different, we can't say much about any relationship that may hold between them. This means that if $I_a^0 \neq I_a^1$, it will necessarily be the case that $I_b$ will be different from at least one of $I_a^0$ and $I_a^1$, leading us to conclude that we cannot rule out aliasing between $r_a$ and $r_b$: this is the same conclusion as that from the result of the widening operation. Conversely, if $I_a^0 = I_a^1 = I_b$, then whether or not $r_a$ and $r_b$ are possible aliases depends on whether or not $M_b$ has a nonempty intersection with $M_a^0 \cup M_a^1$: again, this is the same as with the widening operation.

The resulting analysis is reasonably memory-efficient: for each basic block we need two address descriptors per register, one for the in set at the entry to the block, and one for the out set at the exit. Thus, for a given choice of $k$, the analysis requires $2RN(k + w)$ bits of memory for a program with $N$ basic blocks on a machine with $R$ registers, where $w$ is the number of bits per machine word (footnote 5).

Footnote 5: This can be reduced to $RN(k + w)$ bits, as in our implementation, by storing only out sets, since the in set of a block can be computed fairly easily from the out sets of its predecessors.

Reasoning about Alias Relationships

Given two address descriptors $A_1 = \langle I_1, M_1 \rangle$ and $A_2 = \langle I_2, M_2 \rangle$ at two points in a program, under what conditions can we conclude that they definitely do not refer to the same address? If $I_1 \neq I_2$, we cannot say much about any relationship that may hold between $A_1$ and $A_2$, and so have to assume that they may refer to the same location. However, it is not sufficient to require that $I_1 = I_2$ and $M_1 \cap M_2 = \emptyset$, since the value computed by a particular instruction may be different when that instruction is executed at different times.
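The widening (meet) applied where control-flow paths join can be sketched as below. The sketch assumes $k = 64$ and the same `(instruction, frozenset_of_residues)` encoding of descriptors used for illustration elsewhere; `meet` and `at_join` are hypothetical names, and a descriptor with an empty residue set is treated as $\top$.

```python
from functools import reduce

K = 64                        # assumed k
FULL = frozenset(range(K))
ANY = "any"
BOTTOM = (ANY, FULL)          # canonical representative of "no information"

def is_bottom(d):
    return d[0] == ANY or d[1] == FULL

def is_top(d):
    return d[0] != ANY and not d[1]    # empty residue set

def meet(a, b):
    """Greatest lower bound with respect to the precision ordering."""
    if is_top(a):
        return b
    if is_top(b):
        return a
    if is_bottom(a) or is_bottom(b) or a[0] != b[0]:
        return BOTTOM                  # different defining instructions
    merged = (a[0], a[1] | b[1])       # same instruction: union the residues
    return BOTTOM if is_bottom(merged) else merged

def at_join(out_descriptors):
    """Descriptor for one register at a join, from its predecessors' out sets."""
    return reduce(meet, out_descriptors)

same = at_join([("I1", frozenset({8})), ("I1", frozenset({40}))])
diff = at_join([("I1", frozenset({8})), ("I2", frozenset({8}))])
print(same, is_bottom(diff))
```

Because each register keeps a single descriptor per program point, storage stays proportional to the number of basic blocks, at the price of collapsing to $\bot$ whenever the predecessors disagree on the defining instruction.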

The following proposition gives a simple sufficient condition for determining that two address expressions denote disjoint sets of addresses:

Proposition. Address descriptors $A_1 = \langle I, M_1 \rangle$ at program point $p_1$ and $A_2 = \langle I, M_2 \rangle$ at program point $p_2$ denote disjoint sets of addresses if (i) $I$ dominates both $p_1$ and $p_2$; (ii) either $p_1$ dominates $p_2$ or $p_2$ dominates $p_1$; and (iii) $M_1 \cap M_2 = \emptyset$.

Proof. Conditions (i) and (ii) ensure that both the program points $p_1$ and $p_2$ see the same value computed by instruction $I$. Condition (iii) then ensures that, relative to this value, the set of addresses referred to at $p_1$ is disjoint from that referred to at $p_2$. $\Box$

Example. As an example of the application of this analysis to a real program, Figure 3 shows the flowgraph of the function jpeg_idct_ifast, which implements a fast integer inverse discrete cosine transform, from the SPEC-95 benchmark program ijpeg. To reduce clutter, only a few relevant instructions are shown explicitly: the number in brackets at the lower left-hand corner of each basic block indicates the total number of instructions in that basic block. Register r30 is the stack pointer, while r21 is used to walk through a local array of structures with a stride of 32 bytes.

Using the current implementation of our analysis, which uses mod-64 residues, the address descriptor for register r21 immediately after instruction (2) is computed as $\langle (1), \{8\} \rangle$, where (1) is the instruction in block B1 that defines the value of r30. Each iteration of the loop increments r21 by 32, so the address descriptor for r21 on entry to the block containing instructions (4) and (5) is $\langle (1), \{8, 40\} \rangle$; however, register r30 is not changed in the loop, so its address descriptor in that block is $\langle (1), \{0\} \rangle$. Since the requirements of the proposition above are trivially satisfied within that block, we can conclude from this that the store instruction (4), which assigns to location 80(r30), refers to a different location than instruction (5), which accesses location 0(r21). $\Box$

Alias Analysis in alto

Alto ("Another Link-Time Optimizer"), a prototype link-time optimizer we have implemented, uses a combination of an extended version of the local analysis described in the section on local alias analysis and the global analysis described in the section on residue-based global alias analysis to reason about aliases in executable code: we conclude that a pair of memory references will not access overlapping sets of locations if either analysis is able to determine that this is so. We first carry out context-insensitive interprocedural constant propagation to identify references to global addresses, followed by the global alias analysis described earlier. The extended local analysis proceeds as follows: two memory reference instructions $i_1$ and $i_2$ do not conflict if one of the following holds:

1. one of the instructions uses a register known to point to the stack and the other uses a register known to point to the global data area (note that, because of the constant propagation carried out earlier, in this case $i_1$ and $i_2$ need not belong to the same basic block); or

2. $i_1$ and $i_2$, which use address expressions $k_1(r_1)$ and $k_2(r_2)$ respectively, are both in the same basic block $B$, and there are two (possibly empty) chains of instructions whose effects are to compute the value $c_1$ + contents of $r_0$ into register $r_1$ and $c_2$ + contents of $r_0$ into $r_2$, for some register $r_0$, such that either both chains use the same definition of $r_0$ in the block $B$, or neither uses any definition of $r_0$ in $B$; and $c_1 + k_1 \neq c_2 + k_2$.

Experimental Results

We evaluated our analysis on the SPEC-95 benchmarks, as well as some non-SPEC applications: agrep, a pattern matching utility; appbt and appsp, computational fluid dynamics codes originally from NAS (footnote 6); barnes-hut, a simulation program to compute n-body gravitational interactions; latex, a popular document formatting tool; and pseudoknot, a numerical benchmark that finds the 3-dimensional structure of a nucleic acid molecule. The input programs were compiled with the DEC C compiler V…, invoked as "cc -O… -Wl,-r -Wl,-d -Wl,-z -non_shared", for the C programs, and the DEC Fortran compiler version …, invoked as "f… -O… -Wl,-r -Wl,-d -Wl,-z -non_shared", for the Fortran programs, resulting in statically linked executables. The measurements reported here were carried out after first removing dead and unreachable code from these executables, as well as trivial loads (no-ops inserted for scheduling and alignment purposes, and redundant loads of the gp register), using alto. The timings were obtained on a DEC Alpha workstation with a … MHz Alpha … processor with … Mbytes of main memory, running Digital Unix …. Table 1 shows the precision of the analysis, while Table 2 shows its time and space requirements.

Footnote 6: We used the sequential C versions available from ftp.cs.wisc.edu/wwt/Misc/NAS.
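The disjointness condition of the proposition on address descriptors can be sketched as a small predicate and applied to the ijpeg flowgraph instructions "store ..., 80(r30)" and "load ..., 0(r21)". The sketch assumes mod-64 residues, models descriptors as `(instruction, frozenset_of_residues)` pairs, and takes the two dominance conditions as booleans supplied by the caller rather than computing dominators; all names are hypothetical.

```python
K = 64  # assumed k (one 64-bit word per residue set)

def displaced(desc, disp, k=K):
    """Descriptor of the address disp(reg), given the descriptor of reg."""
    instr, residues = desc
    return (instr, frozenset((x + disp) % k for x in residues))

def definitely_disjoint(d1, d2, instr_dominates_both, points_ordered):
    """Sufficient condition only: False means 'may alias', not 'does alias'.

    instr_dominates_both: the common defining instruction dominates both points.
    points_ordered: one of the two program points dominates the other.
    """
    same_instr = d1[0] == d2[0]
    return bool(same_instr and instr_dominates_both and points_ordered
                and not (d1[1] & d2[1]))

# r30 is defined by instruction (1); r21 = r30 + 136, then stepped by 32 in a loop.
r30 = ("(1)", frozenset({0}))
r21 = ("(1)", frozenset({136 % K, (136 + 32) % K}))   # residues 8 and 40

store_addr = displaced(r30, 80)   # 80(r30): residue 16
load_addr = displaced(r21, 0)     # 0(r21): residues 8 and 40
print(definitely_disjoint(store_addr, load_addr, True, True))
```

If the two descriptors had different defining instructions, or if either dominance condition failed, the predicate would return False and the references would have to be treated as possible aliases.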

Table 1: Precision of Analysis (load/store instructions)

Columns: Program | Total | One | Few | Total Known | Unknown

(a) SPEC-95 benchmarks: applu, apsi, compress, fpppp, gcc, go, hydro2d, ijpeg, li, m88ksim, mgrid, perl, su2cor, swim, tomcatv, turb3d, vortex, wave5

(b) Non-SPEC applications: agrep, appbt, appsp, barnes-hut, latex, pseudoknot

Key:
  Total: total no. of load/store instructions (static counts)
  One: no. of load/store instructions whose mod-$k$ residue set has cardinality 1
  Few: no. of load/store instructions whose mod-$k$ residue set has cardinality $n$, $1 < n < k$
  Total Known = One + Few
  Unknown = Total - Total Known

Precision

Traditionally, the precision of alias analysis algorithms is often presented in terms of the average size of points-to sets or alias sets. In our context, however, there are no points-to or alias sets: a more meaningful measure, perhaps, is the (relative) number of memory references, i.e., load and store instructions, for which the analysis is able to provide information that would not have been available otherwise. This information is presented in Table 1. The numbers presented correspond to mod-$k$ residues with $k = 64$ (this choice was determined in part by the fact that the set of mod-$k$ residues for this choice of $k$ corresponds to a bit vector that fits exactly in one 64-bit machine word), combined with the local analysis described earlier.

It can be seen that, in the programs tested, the analysis is able to provide information for roughly … of the memory reference instructions. Preliminary investigations indicate that much of the loss in precision occurs due to three reasons. First, since we don't keep track of the contents of memory, information about a register is lost if it is saved to memory and subsequently restored. Second, the widening operation described in the section on propagating address descriptors causes information to be lost if a register can have different defining instructions at different predecessors of a join point in the control flow graph. The third reason, which is related to the second, is that since our analysis is context-insensitive at the interprocedural level, pointer arguments to a procedure with multiple call sites will become widened to $\bot$.

Table 2: Cost of Analysis

Columns: Program | Basic Blocks | Instructions | Analysis Time (sec) | Memory used (Mbytes)

(a) SPEC-95 benchmarks: applu, apsi, compress, fpppp, gcc, go, hydro2d, ijpeg, li, m88ksim, mgrid, perl, su2cor, swim, tomcatv, turb3d, vortex, wave5

(b) Non-SPEC applications: agrep, appbt, appsp, barnes-hut, latex, pseudoknot

Cost

Table 2 gives the time and space costs of our analysis. Columns 2 and 3 give the size of each benchmark, measured, respectively, in the total number of basic blocks and instructions in the program. Column 4 then gives the total analysis time, in seconds, while column 5 gives the total memory requirements of the analysis, in Mbytes. The analysis times range from about … seconds to … seconds, with the gcc program an outlier with a total analysis time of a little over a minute. These numbers are somewhat higher than we would like, but the reason for this is that every instruction within a basic block is examined whenever that basic block is processed. As Figure 4 indicates, the time taken to analyze a program, in practice, varies essentially linearly with the number of instructions in the program. The memory requirement of the analysis typically varies from about … Mbytes to … Mbytes, with gcc having a high requirement of about … Mbytes. Because of the widening operation described in the section on propagating address descriptors, the memory requirements of the analysis are linear in the number of basic blocks in the input program: we feel that this is essential if the analysis is to be usable for large programs.

Utility

At this point, the only optimization for which we have had the time to evaluate the utility of the alias analysis described here involves reducing the number of load operations executed, by using scavenged registers to eliminate some unnecessary load instructions, moving loop-invariant load instructions (typically arising due to inlining) out of loops, and via partial redundancy elimination. Preliminary results are shown in Table 3, which gives dynamic counts of the number of load instructions for some of our benchmarks. The column No-alias gives the number of load operations executed in the absence of any alias analysis at all, i.e., where any pair of references to memory were considered to potentially access the same locations.

60

50

40

30

20 Analysis Time (secs)

10

0 50 100 150 200 250 300

Program Size (no. of instructions x 1000)

Figure Variation of analysis time with input size

6

Program Load Operations Executed Improvement

Noalias Inspect Alto Inspect Alto

appbt

appsp

barneshut

compress

fpppp

ijp eg

li

mksim

p erl

pseudoknot

wave

Table Utility of Analysis Deletion of unnecessary load instructions

potentially access the same locations; the Inspect column gives the number of load operations when we used simple inspection, as described earlier, for intra-block load optimizations; and the Alto column gives the number of load operations executed when programs were optimized using our analyses to disambiguate memory references. Since all other optimizations, such as deletion of dead/unreachable code, inlining, etc., are carried out in the same way by all three versions considered, with the only difference arising out of the way in which potential conflicts in memory accesses were identified, Noalias forms a fair basis for comparisons. The last two columns give the percentage reduction in the number of load operations obtained using local inspection, measured as (Noalias - Inspect) / Noalias, and global analysis, measured as (Noalias - Alto) / Noalias, respectively.

It can be seen from the Utility of Analysis table that improvements due to purely local alias analysis are small to nonexistent. This does not come as a surprise, since at the optimization level used, global register allocation has already been carried out by the compiler, leaving few loads available for easy removal. Global analysis gives better results, with a larger fraction of the load instructions removed for the fpppp benchmark and for appbt. The reason for the improvement for fpppp is that it contains a very heavily executed basic block that is so large that the register pressure forces the compiler to spill a number of variables to memory; alto is able to scavenge some registers at link time and use them to retain some of the spilled variables in registers, thereby allowing the spill code to be deleted. The overall percentage improvements are nevertheless relatively modest; this is consistent with the results of Cooper and Lu. To a great extent, the reasons for this are twofold: first, the compiler has already done a good job of removing memory operations via global register allocation; and second, in many cases a lack of free registers prevented us from optimizing away load operations that our alias analysis had inferred as optimizable. To some extent, imprecision in our analysis, arising from the sources discussed earlier, also affected the number of memory
operations deemed suitable for optimization.

Related Work

While a number of systems have been described for link-time code optimization, to the best of our knowledge any alias analysis carried out by these systems is limited to fairly simple local analyses.

There is an extensive body of work on pointer alias analysis of various kinds; representative examples appear in the references.

The work most closely related to ours is that of Wilson and Lam, who describe a low-level pointer alias analysis for C programs. Their work attempts to deal with "nasty" features of real programs and can handle simple pointer increments and decrements, but is unable to cope with the more complex address arithmetic common in executable code (see the earlier example). Also, it restricts itself to C language features, and so cannot handle arithmetic arising from idiosyncrasies of other languages (e.g., manipulation of pointers with tag bits) that may be encountered in executable code. Their algorithm is context-sensitive at the interprocedural level, however, while our current implementation is context-insensitive; conceptually, it would not be too difficult to obtain a context-sensitive version of our algorithm, but we have not had time to implement this yet. The remaining analyses cited are all high-level analyses that typically disregard type casts, pointer arithmetic, out-of-bounds array accesses, etc. As argued earlier, such analyses are of limited utility at the machine code level.

Also related is the work on dependence analysis in the scientific computing literature. While the goals of this work are conceptually similar to ours (namely, disambiguating array references whose indices can involve arithmetic expressions), the algorithms used for dependence analysis are very different from that described here. Since dependence analysis is typically formulated as a source-level intraprocedural analysis, the analysis problems tend to be relatively small in size. Because of this, dependence analyses are able to use relatively more sophisticated, but also more expensive, algorithms than ours. We do not know of any attempts to apply such algorithms for whole-program analysis, and it is not obvious to us that the algorithms involved would scale up to problems of this size.

Conclusions

Recent years have seen increasing interest in reasoning about and manipulating executable files. Such manipulations can benefit greatly from information about aliasing. Unfortunately, there is a fundamental mismatch between the features present in executable programs and the features handled by existing pointer alias analyses: such analyses are typically formulated in terms of source-level constructs and do not handle features such as pointer arithmetic and out-of-bound array references, whereas these are precisely the features encountered in executable programs. This paper describes a simple algorithm that can handle these features and which can be used for alias analysis of executable programs. In order to be practical, the algorithm is careful to keep its memory requirements low, sacrificing precision where necessary to achieve this goal. Experimental results indicate that it is nevertheless able to provide nontrivial information about a fraction of the memory references across a variety of benchmark programs.

Acknowledgements

Comments from the anonymous reviewers helped improve the paper significantly.

References

A. V. Aho, R. Sethi, and J. D. Ullman, Compilers: Principles, Techniques and Tools, Addison-Wesley.

J. E. Barnes and P. Hut, "A Hierarchical O(N log N) Force Calculation Algorithm", Nature.

D. R. Chase, M. Wegman, and F. K. Zadeck, "Analysis of Pointers and Structures", Proc. SIGPLAN Conference on Programming Language Design and Implementation, June.

J.-D. Choi, M. Burke, and P. Carini, "Efficient Flow-Sensitive Interprocedural Computation of Pointer-Induced Aliases and Side Effects", Proc. ACM Symposium on Principles of Programming Languages, Jan.

R. Cohn, D. Goodwin, P. G. Lowney, and N. Rubin, "Spike: An Optimizer for Alpha/NT Executables", Proc. USENIX Windows NT Workshop, Aug.

K. D. Cooper and K. Kennedy, "Fast Interprocedural Alias Analysis", Proc. ACM Symposium on Principles of Programming Languages, Jan.

K. D. Cooper and J. Lu, "Register Promotion in C Programs", Proc. SIGPLAN Conference on Programming Language Design and Implementation, June.

P. Cousot and R. Cousot, "Abstract Interpretation: A Unified Lattice Model for Static Analysis of Programs by Construction or Approximation of Fixpoints", Proc. Fourth ACM Symposium on Principles of Programming Languages.

D. Coutant, "Retargetable High-Level Alias Analysis", Proc. ACM Symposium on Principles of Programming Languages, Jan.

K. De Bosschere and S. K. Debray, "alto: A Link-Time Optimizer for the DEC Alpha", Technical Report, Dept. of Computer Science, The University of Arizona, June.

A. Deutsch, "On determining lifetime and aliasing of dynamically allocated data in higher-order functional specifications", Proc. ACM Symposium on Principles of Programming Languages, Jan.

A. Deutsch, "Interprocedural May-Alias Analysis for Pointers: Beyond k-limiting", Proc. SIGPLAN Conference on Programming Language Design and Implementation, June.

A. Diwan, K. S. McKinley, and J. E. B. Moss, "Type-Based Alias Analysis", Manuscript, Dept. of Computer Science, University of Massachusetts, Amherst.

M. Emami, R. Ghiya, and L. J. Hendren, "Context-Sensitive Interprocedural Points-to Analysis in the Presence of Function Pointers", Proc. SIGPLAN Conference on Programming Language Design and Implementation, June.

M. F. Fernández, "Simple and Effective Link-Time Optimization of Modula-3 Programs", Proc. SIGPLAN Conference on Programming Language Design and Implementation, June.

D. W. Goodwin, "Interprocedural Dataflow Analysis in an Executable Optimizer", Proc. SIGPLAN Conference on Programming Language Design and Implementation, June.

R. L. Graham, D. E. Knuth, and O. Patashnik, Concrete Mathematics, Addison-Wesley.

S. Horwitz, P. Pfeiffer, and T. Reps, "Dependence Analysis for Pointer Variables", Proc. SIGPLAN Conference on Programming Language Design and Implementation, June.

J. Hummel, L. J. Hendren, and A. Nicolau, "A General Data Dependence Test for Dynamic Pointer-Based Data Structures", Proc. SIGPLAN Conference on Programming Language Design and Implementation, June.

M. S. Johnson and T. C. Miller, "Effectiveness of a Machine-Level Global Optimizer", Proc. SIGPLAN Symposium on Compiler Construction, June.

N. D. Jones and S. S. Muchnick, "Flow analysis and optimization of LISP-like structures", in Program Flow Analysis, eds. S. S. Muchnick and N. D. Jones, Prentice Hall.

N. D. Jones and S. S. Muchnick, "A flexible approach to interprocedural data flow analysis and programs with recursive data structures", Proc. ACM Symposium on Principles of Programming Languages, Jan.

W. Landi and B. G. Ryder, "Pointer-induced Aliasing: A Problem Classification", Proc. ACM Symposium on Principles of Programming Languages, Jan.

W. Landi and B. G. Ryder, "A Safe Approximate Algorithm for Interprocedural Pointer Aliasing", Proc. SIGPLAN Conference on Programming Language Design and Implementation, June.

J. R. Larus and E. Schnarr, "EEL: Machine-independent Executable Editing", Proc. SIGPLAN Conference on Programming Language Design and Implementation, June.

J. R. Larus and P. N. Hilfinger, "Detecting Conflicts Between Structure Accesses", Proc. SIGPLAN Conference on Programming Language Design and Implementation, June.

T. Romer, G. Voelker, D. Lee, A. Wolman, W. Wong, H. Levy, B. N. Bershad, and J. B. Chen, "Instrumentation and Optimization of Win32/Intel Executables", USENIX Windows NT Workshop (to appear).

E. Ruf, "Context-Insensitive Alias Analysis Reconsidered", Proc. SIGPLAN Conference on Programming Language Design and Implementation, June.

M. Shapiro and S. Horwitz, "Fast and Accurate Flow-Insensitive Points-To Analysis", Proc. ACM Symposium on Principles of Programming Languages, Jan.

A. Srivastava and D. W. Wall, "A Practical System for Intermodule Code Optimization at Link-Time", Journal of Programming Languages, March.

A. Srivastava and D. W. Wall, "Link-time Optimization of Address Calculation on a 64-bit Architecture", Proc. SIGPLAN Conference on Programming Language Design and Implementation, June.

B. Steensgaard, "Points-to Analysis in Almost Linear Time", Proc. ACM Symposium on Principles of Programming Languages, Jan.

D. W. Wall, "Global Register Allocation at Link Time", Proc. SIGPLAN Symposium on Compiler Construction, July.

W. E. Weihl, "Interprocedural data flow analysis in the presence of pointers, procedure variables, and label variables", Proc. ACM Symposium on Principles of Programming Languages, Jan.

R. P. Wilson and M. S. Lam, "Efficient Context-Sensitive Pointer Analysis for C Programs", Proc. SIGPLAN Conference on Programming Language Design and Implementation, June.

M. Wolfe, Optimizing Supercompilers for Supercomputers, MIT Press, Cambridge, Mass.

S. Wu and U. Manber, "Agrep: A Fast Approximate Pattern-Matching Tool", Usenix Winter Technical Conference, San Francisco, Jan.

H. Zima and B. Chapman, Supercompilers for Parallel and Vector Computers, ACM Press, New York.