
Relocating Machine Instructions by Currying

Norman Ramsey

University of Virginia

Relocation adjusts machine instructions to account for changes in the locations of the instructions themselves or of external symbols to which they refer. Standard linkers implement a finite set of relocation transformations, suitable for a single architecture. These transformations are enumerated, named, and engraved in a machine-dependent object-file format, and linkers must recognize them by name. These names and their associated transformations are an unnecessary source of machine-dependence.

An alternative is to use SLED (Specification Language for Encoding and Decoding) to specify representations of machine instructions. Instructions are described by constructors, which denote functions mapping lists of operands to instructions' binary representations. Any operand can be designated as relocatable, meaning that the operand's value need not be known at the time the instruction is encoded. From a SLED specification, the New Jersey Machine-Code Toolkit can generate functions that encode instructions in the native binary representation. For instructions with relocatable operands, the toolkit also computes relocating transformations. Tool writers can create machine-independent software that uses these transformations to relocate machine instructions. For example, mld, a retargetable linker built with the toolkit, needs only […] lines of C code for relocation, and that code is machine-independent.

The toolkit discovers relocating transformations by currying encoding functions. An attempt to encode an instruction with a relocatable operand results in the creation of a closure. The closure can be applied when the values of the relocatable operands become known. Currying provides a general, machine-independent method of relocation.

Currying rewrites a λ term into two nested λ terms. The standard implementation has the first λ allocate a closure and store therein its operands and a pointer to the second λ. Using this strategy in the toolkit means that, when it builds an application, the toolkit generates code for many different inner λ terms, one for each instruction that uses a relocatable address. Hoisting some of the computation out of the second λ into the first makes many of the second λs identical; a handful are enough for a whole instruction set. This optimization reduces the size of machine-dependent assembly and linking code by […] for the Alpha, MIPS, SPARC, and PowerPC, and by about […] for the Pentium. It also makes the second λs equivalent to relocating transformations named in standard object-file formats.

Categories and Subject Descriptors: D.4.9 [Operating Systems]: Systems Programs and Utilities (Linkers); D.1.1 [Programming Techniques]: Applicative (Functional) Programming; D.3.2 [Programming Languages]: Language Classifications (specialized application languages)

General Terms: Relocation, Linking, Currying

Additional Key Words and Phrases: higher-order functions

INTRODUCTION

Compiling whole programs is slow; compiling units separately and linking the compiled units into a program speeds up the edit-compile-go cycle. For separate compilation, a compiler must be able to emit instructions and data without knowing the exact locations either of the instructions and data the compiler itself emits or of the instructions and data emitted by other compilations. Using assembly language makes this task easy, because in assembly language, all locations are represented symbolically. Symbolic, assembly-like units can be linked to form programs [Fraser and Hanson; Jones], but the linker or loader must translate all units from symbolic form into the binary representation required by the target hardware. It is believed to be more efficient to translate each unit separately into a binary form called relocatable object code.

Author's current address: Norman Ramsey, Department of Computer Science, University of Virginia, Charlottesville, VA. Email: nr@cs.virginia.edu. This work was supported in part by NSF grant number ASC-[…].

Object code must contain more than just instructions and data. To support delayed binding of locations, it must also represent:

- The symbols defined in the object file and the locations to which they are bound.
- The symbols imported from other units, i.e., external symbols.
- The transformations that must be applied to the instructions and data to account for their eventual placement at absolute addresses, and also for the placements of the external symbols on which they depend.

Applying these transformations is called relocation.

Current object-code formats force tool writers to handle relocation in a machine-dependent way. Given an instruction-set architecture, a human being examines the instructions and determines which operands can be relocatable addresses and what relocating transformations are needed. Each transformation is named, and linkers and other tools must recognize transformations by name. The names are informal and machine-dependent, so retargetable tools that manipulate object code must recognize each set of names on each machine.

This paper makes several contributions. It presents a machine-independent, automatic method of discovering relocating transformations. It presents an optimization that makes the cost of the automatic method comparable to the cost of hand-implemented methods and makes the discovered transformations equivalent to the transformations used in standard object-file formats. Finally, the paper gives a machine-independent representation of the transformations.

This new technique for relocating machine instructions is an enabling technology for building machine-independent tools for static, incremental, and dynamic linking. It will also simplify the construction of retargetable tools that transform object code. Object-code transformation, which is growing in importance, is used for profiling and tracing [Ball and Larus], testing [Hastings and Joyce], enforcing protection [Wahbe et al.], optimization [Srivastava and Wall], and binary translation [Sites et al.]. There are even frameworks for creating applications that transform object code [Johnson; Larus and Schnarr; Srivastava and Eustace].

The techniques presented here build on the New Jersey Machine-Code Toolkit [Ramsey and Fernandez], which reads a compact machine description and generates functions that encode instructions. The machine description is written in SLED (Specification Language for Encoding and Decoding), which relates three representations of instructions: a symbolic representation akin to assembly language, assembly language itself, and the binary representation used by the hardware. Real instruction sets can be specified with modest effort; our Alpha, MIPS, SPARC, and Pentium specifications [Ramsey and Fernandez] are […], […], […], and […] lines. In SLED, an instruction is represented symbolically by its name and a list of its operands; the collection of all instructions resembles an algebraic data type. The machine description indicates which operands are relocatable addresses, and currying the encoding function with respect to those operands results in a relocating transformation.

Currying rewrites the encoding function into two nested λ terms. In the standard implementation, the outer λ allocates a closure and stores therein its operands and a pointer to the inner λ, which uses the contents of the closure to encode (relocate) the instruction. The inner λs are the relocating transformations discovered by the toolkit, and the closures take the place of relocation entries in traditional object files.

Using the standard implementation of currying, the toolkit generates code for many different inner λ terms, one for each instruction that uses a relocatable address. Hoisting some of the computation out of the inner λ into the outer λ makes many of the inner λs identical; a handful are enough for a whole instruction set. This optimization is closely related to fully lazy lambda-lifting [Peyton Jones]. It reduces the size of machine-dependent assembly and linking code by […] for the Alpha, MIPS, SPARC, and PowerPC, and by about […] for the Pentium. It also makes the relocating transformations discovered by the toolkit equivalent to those that are now implemented by hand. To support machine-independent use of these transformations, the toolkit associates each one with a string that can be interpreted to have the effect of applying the transformation. These strings can be used in an object file as meaningful, formal, machine-independent names.

DESCRIBING INSTRUCTION REPRESENTATIONS

SLED describes the binary representation of an instruction as a sequence of tokens. On a RISC machine, each instruction is a single 32-bit token. On a machine like the Pentium, formats vary; for example, the instruction add DX has an […]-bit opcode token, followed by another […]-bit token that has both opcode and address-mode bits, followed by the […]-bit displacement and the […]-bit immediate operand.

Each token in an instruction is partitioned into fields; a field is a contiguous range of bits within a token. On RISC machines, different instruction formats are represented by different partitions of the instruction token. Fields contain opcodes, operands, modes, or other information. Opcodes and operands can be distributed among multiple fields.

Patterns constrain the values of fields; they may constrain fields in a single token or in a sequence of tokens. They can be used to describe binary representations of opcodes, of whole instructions, and of groups of instructions.

Constructors connect the symbolic and binary representations of instructions. At a symbolic level, an instruction is an opcode (the constructor) applied to a list of operands. The result of the application is a sequence of tokens, which is described by a pattern. For each constructor, the toolkit derives an encoding function that emits the constructor's binary representation. We get relocating transformations by currying the encoding functions. The encoding functions generated from a machine description form part of an application-program interface (API) to an assembler for that machine. The toolkit includes a library of other functions, which complete the API.

Tokens and fields

A machine description includes the names, sizes, and positions of the fields used to form tokens. This information can be found in architecture manuals. For example, the MIPS manual [Kane, p. A-[…]] uses a picture to specify fields:

    31..26   25..21   20..16   15..0
    op       rs       rt       immediate

    31..26   25..0
    op       target

    31..26   25..21   20..16   15..11   10..6   5..0
    op       rs       rt       rd       shamt   funct

The picture can be formalized in SLED as follows:

    fields of instruction (32)
      op 26:31  rs 21:25  rt 16:20  rd 11:15  shamt 6:10  funct 0:5
      target 0:25  immed 0:15  offset 0:15  base 21:25  cond 16:20
      breakcode 6:25  ft 16:20  fs 11:15  fd 6:10  format […]

This declaration defines not only the fields used in the formats pictured above, but also offset, cond, and other synonyms that appear in the MIPS manual.
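As a rough illustration of what a field declaration means at the bit level, the following C sketch reads and writes a field given its low and high bit positions, using the positions shown in the picture above. The helper names `get_field` and `set_field` are hypothetical and are not part of the toolkit.

```c
#include <stdint.h>

/* Hypothetical helpers, not the toolkit's API.
   Extract the field occupying bits lo..hi (inclusive, bit 0 least
   significant) of a 32-bit token. Assumes 0 <= lo <= hi <= 31. */
static uint32_t get_field(uint32_t token, unsigned lo, unsigned hi) {
    unsigned width = hi - lo + 1;
    uint32_t mask = (width == 32) ? 0xffffffffu : ((1u << width) - 1u);
    return (token >> lo) & mask;
}

/* Store value into bits lo..hi of token, leaving all other bits unchanged. */
static uint32_t set_field(uint32_t token, unsigned lo, unsigned hi,
                          uint32_t value) {
    unsigned width = hi - lo + 1;
    uint32_t mask = (width == 32) ? 0xffffffffu : ((1u << width) - 1u);
    return (token & ~(mask << lo)) | ((value & mask) << lo);
}
```

For example, `get_field(word, 21, 25)` reads the rs field and `get_field(word, 0, 5)` reads funct.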

Patterns

Patterns constrain both the division of streams into tokens and the values of the fields in those tokens. They are composed from constraints on fields. A constraint fixes the range of values a field may have. The typical range has a single element, e.g., op = […]. Patterns may be composed by conjunction (&), concatenation (;), or disjunction (|). Conjoining patterns constrains fields within a single token; concatenating them constrains a sequence of tokens. This paper uses patterns that constrain all the bits in a sequence of tokens; such patterns are equivalent to binary representations.

Constructors

A constructor connects the symbolic and binary representations of an instruction by mapping a list of operands to a pattern. The left-hand side of a constructor specification gives the instruction's name, operands, and assembly-language syntax. The right-hand side contains a pattern that describes the instruction's binary representation. That pattern may contain free identifiers, which refer to the constructor's operands. For example, the following constructor describes the MIPS add instruction:

    constructors
      add rd, rs, rt is op = 0 & funct = 32 & rd & rs & rt

where, on the right-hand side, rd is an abbreviation for the pattern constraining the field rd to be equal to the first operand, since the first operand is named rd. The same rule applies to the uses of rs and rt on the right-hand side.
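To make the link between a constructor and its encoding function concrete, here is a minimal C sketch of what an encoder for add might look like. It assumes the standard MIPS values op = 0 and funct = 32 for add and the field positions from the picture above; the function name `encode_add` is hypothetical, and the code the toolkit actually generates may differ.

```c
#include <stdint.h>

/* Hypothetical encoder for the MIPS add constructor. Each conjunct of the
   pattern constrains one field of the same token, so the encoder or's the
   shifted field values together. Register numbers are masked to 5 bits. */
static uint32_t encode_add(unsigned rd, unsigned rs, unsigned rt) {
    return (0u << 26)               /* op = 0 (assumed MIPS value)      */
         | ((rs & 31u) << 21)       /* rs field, bits 21..25            */
         | ((rt & 31u) << 16)       /* rt field, bits 16..20            */
         | ((rd & 31u) << 11)       /* rd field, bits 11..15            */
         | 32u;                     /* funct = 32 (assumed MIPS value)  */
}
```

Because every operand is a plain field value, nothing here needs relocation; the interesting cases are constructors whose operands are addresses.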

Some instructions have operands that cannot be used directly as field values. The most common are PC-relative branches, in which the operand is the target address, but the corresponding field contains the difference between the target address and the program counter. Constructor specifications may include equations that express relationships between operands and fields; the equations appear in braces after the operands. For example, the specifications for the MIPS bne and bltzal instructions are:

    constructors
      bltzal rs, addr { addr = L + 4 * offset! } is
          op = 1 & cond = 16 & rs & offset; L: epsilon
      bne rs, rt, addr { addr = L + 4 * offset! } is
          op = 5 & rs & rt & offset; L: epsilon

epsilon is the pattern specifying the empty sequence of tokens. Here it serves only as an anchor for the label L, which is bound to the location of the instruction following the branch. The exclamation point in offset! is a sign-extension operator. The equation in braces specifies the relationship between the target address addr and the offset used in the instruction's binary representation:

    A branch target address is computed from the sum of the address of the instruction in the delay slot and the 16-bit offset, sign-extended and multiplied by 4 [Kane, p. A-[…]].

The toolkit solves this equation to compute offset as a function of addr and the program counter. The equation has a solution only when the target address and the program counter differ by a multiple of 4 and when the computed offset fits in 16 bits, and the generated encoding function checks these conditions.
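The two solvability conditions can be illustrated with a small C sketch that solves addr = L + 4 * offset for offset, taking L to be the address of the delay-slot instruction (pc + 4). The function name `solve_branch_offset` and its return convention are assumptions made for this example, not the toolkit's generated code.

```c
#include <stdint.h>

/* Hypothetical solver for addr = L + 4 * offset, with L = pc + 4.
   On success, stores the 16-bit field value in *field and returns 1.
   Returns 0 when the equation has no solution: the distance is not a
   multiple of 4, or the offset does not fit in 16 signed bits. */
static int solve_branch_offset(uint32_t addr, uint32_t pc, uint32_t *field) {
    int64_t delta = (int64_t)addr - ((int64_t)pc + 4);
    if (delta % 4 != 0)
        return 0;                           /* multiple-of-4 condition  */
    int64_t offset = delta / 4;
    if (offset < -32768 || offset > 32767)
        return 0;                           /* fits-in-16-bits condition */
    *field = (uint32_t)offset & 0xffffu;    /* two's-complement field    */
    return 1;
}
```

A branch to itself, for instance, has delta = -4 and therefore offset = -1, which is stored in the field as 0xffff.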

INSTRUCTION ENCODING AND RELOCATION

The toolkit uses the field locations and constraints to figure out the bit manipulations needed to encode an instruction. If addr and the program counter are known at the time a bltzal, for example, is encoded, we can emit the binary representation directly by using the following function:¹

\[ \lambda(\mathit{rs}, \mathit{addr}).\ \mathit{emit}\bigl(1 << 26 \mid 16 << 16 \mid ((\mathit{addr} - 4 - \mathit{PC}) >> 2) \mathbin{\&} (2^{16} - 1) \mid \mathit{rs} << 21\bigr) \]

where I use C notation for bit manipulation: $1 << 26$ encodes the op constraint, $16 << 16$ encodes the cond constraint, and $\mathit{rs} << 21$ puts the operand rs into the instruction. The remaining disjunct expresses the computation needed to compute the value of the offset field, given a target address, addr, and a program counter, PC. The computation includes arithmetic, a shift, and a narrowing to 16 bits. PC represents the address at which the instruction is to be located.
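The direct encoding just described can be written as an ordinary C function. The sketch below assumes the MIPS values op = 1 and cond = 16 for bltzal and, like the presentation above, omits the multiple-of-4 and fits-in-16-bits checks; the name `encode_bltzal` is hypothetical and the function returns the token instead of emitting it.

```c
#include <stdint.h>

/* Hypothetical direct encoder for bltzal when addr and pc are both known.
   Unsigned arithmetic wraps modulo 2^32, so a backward branch still yields
   the right low 16 bits after the shift and mask. */
static uint32_t encode_bltzal(unsigned rs, uint32_t addr, uint32_t pc) {
    uint32_t offset = ((addr - 4u - pc) >> 2) & 0xffffu;
    return (1u << 26)               /* op = 1    */
         | (16u << 16)              /* cond = 16 */
         | offset                   /* offset field, bits 0..15 */
         | ((rs & 31u) << 21);      /* rs field  */
}
```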

If PC and addr are unknown, we can't emit the instruction; we must create relocation information instead. A typical compiler or assembler emits the instruction with the displacement bits set to zero, along with a relocation entry that tells the linker how to adjust the displacement bits when the relevant locations become known. The relocation entry names the instruction, the address on which it depends, and the transformation needed to adjust the displacement bits.

¹ To simplify the presentation, I have omitted such details as the check that the target address and program counter differ by a multiple of 4.


The toolkit can discover relocating transformations from SLED's description of an instruction. The description tells how an instruction's operands determine its final binary representation, after relocation. When called upon to emit an instruction referring to an unknown location, an assembler must delay encoding, emit a partial instruction, and record a relocating transformation that can be used to compute the final instruction once the location is known. This procedure amounts to currying the encoding function. We must know which operands are relocatable addresses, since theirs are the values that may not be known when an object file is created. The MIPS specification contains the directive relocatable addr, which specifies that all operands named addr are relocatable addresses.

Currying the bltzal encoding function yields

\[ \lambda \mathit{rs}.\ \lambda \mathit{addr}.\ \mathit{emit}\bigl(1 << 26 \mid 16 << 16 \mid ((\mathit{addr} - 4 - \mathit{PC}) >> 2) \mathbin{\&} (2^{16} - 1) \mid \mathit{rs} << 21\bigr) \]

When applied to a particular rs, this encoding function returns a closure containing rs and the inner λ term. To generate C or Modula-3 code, it helps to convert to an explicit closure-passing style [Appel, Chapter […]]. Converted functions, i.e., function values, are represented by closures. A closure is a record containing a λ term, which represents the function's algorithmic content, and the values of the function's free variables. In the λ term, the function's free variables are replaced by references to the closure. These references take the form $R.i$, which denotes the ith element (numbered from 1) of closure record R. The closure becomes an explicit argument to the λ term, so after the transformation, the λ term has no free variables.

In relocation, the inner λ terms describe relocating transformations. When an encoding function is curried, different applications of the outer function create different closures. These closures share a λ term, but they differ in the other contents of the closure record: the values of the free variables. For example, every bltzal closure is a pair of the form

\[ R_{\mathit{bltzal}} = \bigl(\lambda(R, \mathit{addr}).\ \mathit{emit}(1 << 26 \mid 16 << 16 \mid ((\mathit{addr} - 4 - \mathit{PC}) >> 2) \mathbin{\&} (2^{16} - 1) \mid R.2 << 21),\ \mathit{rs}\bigr) \]

but different closures may have different values of rs.

Closure conversion also changes the way functions are invoked. In the original form, we could invoke a relocation closure $R = \lambda \mathit{addr}. \cdots$ by simple function application: $R(\mathit{addr})$. After closure conversion, we must fetch the λ term out of the closure and pass the closure as an extra argument. If the closure-converted version is $R_{\mathit{bltzal}}$, we invoke it by $R_{\mathit{bltzal}}.1(R_{\mathit{bltzal}}, \mathit{addr})$.

The implementation of closure conversion is straightforward. We add a closure argument to each function. We discover the free variables in the body of the function and put each in the closure, and we replace each occurrence of a free variable in the body with code to get its value from the closure.
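Closure-passing style maps naturally onto C structs and function pointers. The following sketch shows one possible rendering of the curried bltzal encoder before any hoisting: the record holds the code pointer and the free variable rs, and invocation passes the record back to the code. All names (`Closure`, `curry_bltzal`, `apply`) are hypothetical, the program counter is passed as an explicit argument for simplicity, and the function returns the token rather than emitting it.

```c
#include <stdint.h>

typedef struct Closure Closure;
typedef uint32_t (*Code)(const Closure *self, uint32_t addr, uint32_t pc);

struct Closure {
    Code code;      /* the closed inner term (no free variables)  */
    unsigned rs;    /* value of the inner term's free variable rs */
};

/* Inner term after closure conversion: rs is fetched from the record. */
static uint32_t bltzal_code(const Closure *self, uint32_t addr, uint32_t pc) {
    return (1u << 26) | (16u << 16)
         | (((addr - 4u - pc) >> 2) & 0xffffu)
         | ((self->rs & 31u) << 21);
}

/* Outer term: applying it to rs builds the closure. */
static Closure curry_bltzal(unsigned rs) {
    Closure c = { bltzal_code, rs };
    return c;
}

/* Invocation: fetch the code and pass the closure as an extra argument. */
static uint32_t apply(const Closure *c, uint32_t addr, uint32_t pc) {
    return c->code(c, addr, pc);
}
```

Two closures built by `curry_bltzal` share the same `code` pointer and differ only in the stored rs, which mirrors the sharing described above.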

OPTIMIZING RELOCATION CLOSURES

In the scheme outlined above, each relocatable instruction needs its own closure function. Compiling these functions takes time, and they take up space in an application or an object file. We can reduce the number of closure functions by moving computation from the inner λ to the outer λ. I call this movement hoisting, by analogy with the CPS transformation that moves variable definitions from one scope to another [Appel, Chapter […]]. It simplifies the inner λs, creating opportunities for them to be shared. Hoisting is very closely related to fully lazy lambda-lifting [Peyton Jones, Chapter […]], and the analysis required to implement it is reminiscent of the binding-time analyses used in partial evaluation [Jones et al.]. Unlike these other techniques, hoisting is not intended to make programs run faster. Hoisting might result in marginally faster linking, but its purpose is to reduce the number of different λ terms needed to implement relocation.

Hoisting is implemented by a variation on closure conversion. Operands of outer λs are available before those of inner λs. If we think in terms of binding times, λ-bound arguments are always "late," i.e., not available to compute with until the function is applied. Free variables are always "early," i.e., available to compute with when the closure is created. In ordinary closure conversion, free variables are replaced with references to the closure. To perform the hoisting transformation, we want to replace not only free variables, but also terms that depend only on free variables. Such terms are called free expressions by Peyton Jones, and a free expression that is not a proper subexpression of another free expression is said to be maximal. Fully lazy lambda-lifting rewrites λ terms to make the maximal free expressions additional operands; hoisting moves them into the closure. For example, to convert the function $\lambda c.\ a + b + c$, we hoist $a + b$, creating a closure of the form $R = (\lambda(R, c).\ R.2 + c,\ a + b)$.

We can implement closure conversion with hoisting by rewriting a function's abstract syntax tree in a bottom-up walk:

- Leaf nodes are free expressions, unless they are variables bound by the innermost enclosing λ-abstraction.
- Internal nodes are free expressions if and only if all their children are free expressions. To simplify the computation, replace each free internal node $e = f(e_1, e_2, \ldots, e_n)$ with a fresh variable v, which is free by definition. To remember what v stands for, create the substitution $v \mapsto f(e_1, e_2, \ldots, e_n)$. Compose these substitutions during the tree walk.

When we reach a λ-abstraction, all free expressions have been replaced with variables, and since no variable can be a proper subexpression of another, the free variables represent the maximal free expressions of the original λ term. We could recover the original body of the λ term by applying the substitution to it, but instead we closure-convert the rewritten form, then apply the substitution to the closure. Thus, in the example given above:

(1) We begin with $\lambda c.\ a + b + c$.
(2) We rewrite it to $\lambda c.\ v + c$, with substitution $v \mapsto a + b$.
(3) By ordinary closure conversion, we get $R = (\lambda(R, c).\ R.2 + c,\ v)$.
(4) We apply the substitution to the closure, producing $R = (\lambda(R, c).\ R.2 + c,\ a + b)$. We can save computation by applying the substitution only to the variables in the closure; applying it to the λ term has no effect, since after closure conversion, the λ term has no free variables.
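The bottom-up classification in the walk described above can be sketched in a few lines of C. This is a simplified illustration, not the toolkit's implementation: it only counts maximal free expressions in a binary expression tree and does not build substitutions or closures. The `Node` type and both function names are hypothetical.

```c
#include <stddef.h>

/* Hypothetical expression tree: a leaf has left == NULL; an internal node
   is a binary operator applied to left and right. */
typedef struct Node {
    int bound;                  /* leaves only: 1 if bound by the innermost lambda */
    const struct Node *left;
    const struct Node *right;
} Node;

/* Returns 1 if the subtree is a free expression. A free child of a
   non-free parent cannot grow any further, so it is maximal and counted. */
static int walk(const Node *n, int *maximal) {
    if (n->left == NULL)
        return !n->bound;
    int lf = walk(n->left, maximal);
    int rf = walk(n->right, maximal);
    if (lf && rf)
        return 1;               /* free internal node: may still grow upward */
    *maximal += lf + rf;
    return 0;
}

/* Number of maximal free expressions in the body of a lambda. */
static int count_maximal_free(const Node *body) {
    int maximal = 0;
    if (walk(body, &maximal))
        maximal = 1;            /* the whole body is a single free expression */
    return maximal;
}
```

With c bound and a, b free, the tree (a + b) + c has one maximal free expression, while a + (b + c) has two; grouping therefore affects how much can be hoisted as a single value.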


To get better results with closures for machine instructions, we rewrite expressions involving associative and commutative operators to bring free expressions together. This rewriting step can reduce the number of maximal free expressions, resulting in simpler λs and smaller closures. The relevant operators include integer addition (assuming that it does not overflow) and bitwise or. Briggs and Cooper use an equivalent technique to improve the effectiveness of partial redundancy elimination in a traditional optimizing compiler; the rank they assign to each variable corresponds to the number of λs between a free occurrence of a variable and its binding instance.

Rearranging associative and commutative operators gives the following relocation closure for bltzal:²

\[ R_{\mathit{bltzal}} = \bigl(\lambda(R, \mathit{addr}).\ \mathit{emit}(R.2 \mid ((\mathit{addr} - \mathit{PC} - R.3) >> 2) \mathbin{\&} (2^{16} - 1)),\ \ 1 << 26 \mid 16 << 16 \mid \mathit{rs} << 21,\ \ 4\bigr) \]

The new λ term can be shared with other relative-branch instructions, since all information about the opcode and about the register argument rs has been hoisted out of the λ term and into the closure.

Hoisting moves integer literals, like 4 in this example, into closures. Such literals take up space, and we can improve the closures by using a heuristic: if a value to be stored in the closure is an integer literal, push it back into the λ term instead of storing it in the closure. We don't push other constant expressions into the λ term. The heuristic works because integer literals tend to arise from address computations, which are typically the same across instructions, but other constant expressions often come from opcodes, which are different for every instruction. To preserve the distinction, we delay constant folding until after hoisting.

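Applying the heuristic to the bltzal instruction yields a smaller closure:

\[ R_{\mathit{bltzal}} = \bigl(\lambda(R, \mathit{addr}).\ \mathit{emit}(R.2 \mid ((\mathit{addr} - \mathit{PC} - 4) >> 2) \mathbin{\&} (2^{16} - 1)),\ \ 1 << 26 \mid 16 << 16 \mid \mathit{rs} << 21\bigr) \]

The literal 4 has moved back into the λ term, and the closure record is back down to two elements.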
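The payoff of hoisting is that one closure function can serve many instructions. The C sketch below illustrates this for two MIPS relative branches: both closures point at the same function and differ only in the hoisted opcode and register bits. It assumes the MIPS opcode values op = 1, cond = 16 for bltzal and op = 5 for bne; the names `BranchClosure`, `reloc_branch16`, `make_bltzal`, and `make_bne` are hypothetical, the program counter is passed explicitly, and condition checks are omitted.

```c
#include <stdint.h>

typedef struct BranchClosure BranchClosure;
typedef uint32_t (*Reloc)(const BranchClosure *self, uint32_t addr, uint32_t pc);

struct BranchClosure {
    Reloc code;       /* shared relocating transformation       */
    uint32_t bits;    /* hoisted opcode and register-field bits */
};

/* One function serves every branch with a 16-bit PC-relative offset. */
static uint32_t reloc_branch16(const BranchClosure *self,
                               uint32_t addr, uint32_t pc) {
    return self->bits | (((addr - pc - 4u) >> 2) & 0xffffu);
}

/* Outer terms: all instruction-specific computation happens here. */
static BranchClosure make_bltzal(unsigned rs) {
    BranchClosure c = { reloc_branch16,
                        (1u << 26) | (16u << 16) | ((rs & 31u) << 21) };
    return c;
}

static BranchClosure make_bne(unsigned rs, unsigned rt) {
    BranchClosure c = { reloc_branch16,
                        (5u << 26) | ((rs & 31u) << 21) | ((rt & 31u) << 16) };
    return c;
}
```

Keeping the literal 4 inside `reloc_branch16` rather than in the record corresponds to the integer-literal heuristic: the record stays at two elements.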

In the examples given so far, each instruction is represented by a single token. If an instruction has operands x and y, and if its binary representation is computed by the function $f(x, y)$, we can characterize the hoisting transformation as moving part of this computation outside the λ-abstraction:

\[ \lambda x.\ \lambda y.\ \mathit{emit}(f(x, y)) \;\Rightarrow\; \lambda x.\ \langle \lambda y.\ \mathit{emit}(f''(f'(x), y)),\ f'(x) \rangle \]

where the angle brackets $\langle\ \rangle$ stand for closure creation and $f''(f'(x), y) = f(x, y)$. To keep the inner λ simple, I have inlined references to the closure record.

² The astute reader may wonder why the literal $2^{16} - 1$ used in masking is not moved to the closure. The toolkit's intermediate form restricts masking operations to constants of the form $2^k - 1$, and the constant k is attached directly to the operator, so $2^{16} - 1$ is not a free expression in the sense defined above. For similar reasons, the 2 in $(\cdots) >> 2$ is not moved to the closure.

When an instruction is represented by a sequence of tokens, as is common on CISC machines, there is an opportunity for further improvement: we can move the sequence operator itself outside the λ-abstraction:

\[ \lambda x.\ \lambda y.\ \mathit{seq}\{\mathit{emit}(f_1(x, y)); \ldots; \mathit{emit}(f_n(x, y))\} \;\Rightarrow\; \lambda x.\ \mathit{seq}\{\langle \lambda y.\ \mathit{emit}(f_1''(f_1'(x), y)),\ f_1'(x) \rangle; \ldots; \langle \lambda y.\ \mathit{emit}(f_n''(f_n'(x), y)),\ f_n'(x) \rangle\} \]

In other words, instead of creating one closure to relocate all the tokens in the sequence, we can create a separate closure to relocate each token. Although it may create more closures, this improvement yields smaller closures, because each closure holds information about only one token, and it yields fewer unique closure functions, because it creates opportunities for sharing closure functions between different instructions. The improvement is especially useful on the Pentium, where normally only one token in a sequence depends on the relocatable address, and the others can be emitted immediately, requiring no further relocation. Formally, when $f_i(x, y)$ depends only on x, we rewrite

\[ \langle \lambda y.\ \mathit{emit}(f_i''(f_i'(x), y)),\ f_i'(x) \rangle \;\Rightarrow\; \mathit{emit}(f_i(x)) \]

which emits the token and creates no closure. The measurements for hoisting in Section 6 incorporate this improvement.

RELOCATION CLOSURES IN C

Creating efficient C code to perform relocation by currying requires some refinements. There is no need to put global variables in any closure, because globals are accessible to all functions. Therefore, there is no need to convert top-level functions to closure-passing style, because all their free variables are globals. This is just as well, since C programmers expect functions in an API to be implemented in standard C style, not in closure-passing style.

The encoding functions and relocation closures generated by the toolkit treat relocatable addresses as values of an abstract data type with two operations: known and force.³ Force takes a relocatable address and produces an integer (absolute address). Known tells whether force can be applied. The relocatable address is supplied when the instruction is encoded; what may not yet be available is the actual location denoted by the address. We have to keep track of the address so we can force it to a location at relocation time, and the easiest way is to store it in the closure.
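One simple way to picture the known/force abstraction is the label-plus-offset representation that the paper's footnote attributes to the toolkit's library. The C sketch below is an assumption-laden illustration, not the library's actual types: `Label`, `RAddr`, and `bind_label` are hypothetical names chosen for this example.

```c
#include <stdint.h>

/* Hypothetical label: its location becomes known when it is bound. */
typedef struct Label {
    int bound;            /* nonzero once the location is known */
    uint32_t location;
} Label;

/* Hypothetical relocatable address: a label plus a constant offset. */
typedef struct RAddr {
    const Label *label;
    int32_t offset;
} RAddr;

/* known: can force be applied yet? */
static int known(RAddr a) {
    return a.label->bound;
}

/* force: the absolute address; only meaningful when known(a) is true. */
static uint32_t force(RAddr a) {
    return a.label->location + (uint32_t)a.offset;
}

/* Binding the label makes every address based on it known. */
static void bind_label(Label *l, uint32_t location) {
    l->location = location;
    l->bound = 1;
}
```

Because an `RAddr` refers to its label rather than copying a location, a closure can store the address at encoding time and force it later.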

Ordinary encoding functions, which create no relocation information, emit code at a "current location," which is part of the global state of the toolkit's encoding library. Relocation closures should not emit instructions at the current location, but at the location of the original encoding attempt. This location, too, is stored in the closure, and instead of emit, which emits a token at the current location, we use emit_at, which emits a token at a location given explicitly.

The program counter, PC, gets special treatment. It is another name for the location of the original encoding attempt, and we have to save this location so we know where to put the relocated instruction. If we handled PC as we handle other variables, we would store it in the closure, but since it is already in a special part of the closure, we rewrite references to PC to refer to that location.

³ The toolkit's library of machine-independent assembly and linking code represents a relocatable address as a label plus a constant offset. This representation is adequate for almost all applications [Szymanski], but application writers could substitute another representation.

    <type of closure>=
      typedef struct Oclosure {
        ClosureHeader h;            /* contains lambda-term, etc. */
        ClosureLocation loc;
        struct { RAddr a; unsigned u; } v;
      } *OClosure;

    <relocating transformation>=
      static void clofun(OClosure c, Emitter emitat) {
        emitat(c->loc,
               c->v.u | (((location(c->v.a) - pclocation(c->loc) - 4) >> 2) & 0xffff),
               4);
      }

    <closure creation>=
      OClosure c = (OClosure) malloc(sizeof *c);
      static struct closureheader h = { clofun };
      c->h = &h;
      <initialize c->loc with current PC>
      c->v.a = addr;
      c->v.u = 1 << 26 | 16 << 16 | rs << 21;
      <save closure c for future use>

Fig. 1. Representing closures in C.

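Applying these refinements to the MIPS bltzal instruction produces an encoding function that can be represented as follows:

\[ \lambda(\mathit{rs}, \mathit{addr}).\ \bigl\langle \lambda R.\ \mathit{emit\_at}\bigl(R_{\mathit{PC}},\ R.2 \mid ((\mathit{force}(R_{\mathit{addr}}) - \mathit{force}(R_{\mathit{PC}}) - 4) >> 2) \mathbin{\&} (2^{16} - 1)\bigr),\ \ 1 << 26 \mid 16 << 16 \mid \mathit{rs} << 21 \bigr\rangle \]

The real encoding function is still more complicated, since it emits the instruction directly when addr and PC are known, and it also checks the multiple-of-4 and fits-in-16-bits conditions.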

The closure-converted form is easily represented in C, as shown in Figure 1. As with the other examples, Figure 1 omits all checking code, as well as such details as converting pointer types and recording the size of the closure. The C code binds emitat as late as possible, which enables different implementations in different applications. The final argument to emitat is the size of the token being emitted; that size has been omitted from the other examples in this paper.

The closure shown in Figure 1 has the same information as a relocation entry used in standard object-code formats like COFF [Gircys] and ELF [Prentice Hall a]. For example, a COFF relocation entry contains an r_vaddr field that corresponds to the loc field; both store the location of the instruction to be relocated. It contains an r_symndx field that corresponds to the v.a field; both store the relocatable address on which the relocation depends. Finally, it contains an r_type field that corresponds to the h field; both identify the relocating transformation. ELF relocation entries are similar, except ELF combines r_symndx and r_type into a single word. Relocation entries in standard formats have nothing corresponding to the v.u field of the closure shown in Figure 1; instead, they store that information in the space to be occupied by the instruction after relocation. The toolkit could use this space-saving trick, which would reduce the "largest closure" numbers in Table I, but for the time being, it seems more interesting to make relocation closures idempotent. Idempotent closures should be useful in tools that relocate instructions repeatedly, like incremental linkers.

A final refinement is needed to write relocation closures to disk. In memory, the relocating transformation is represented as a function pointer, which is neither machine-independent nor meaningful when written to disk. Instead, we describe relocating transformations using a subset of PostScript [Ramsey], extended with special operators to get addresses and values out of closures. The machine-independent representation of the transformation in the bltzal closure, again omitting the tests of conditions, is

    cla force -4 add clloc force sub
    -2 bitshift 16 narrows clv orb
    clloc force 4 emitat

The first line takes the relocatable address from the closure, subtracts 4, and subtracts the location of the instruction being relocated, computing addr − 4 − PC. The second line shifts right 2 bits, narrows to 16 bits, and combines the result with the rest of the instruction, as stored in the closure. The third line stores the instruction, which is 4 bytes wide, at the proper location.
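To show why a stack-based string can stand in for a function pointer, here is a small C sketch of an interpreter for the arithmetic part of such a transformation. It is a simplified stand-in, not the toolkit's PostScript subset: the closure-access and emitat operators are replaced by PUSH instructions carrying values that are assumed to be already forced, narrowing is modeled as masking without a range check, and the `Op` and `Instr` types are hypothetical. A negative shift count means a right shift, as with PostScript's bitshift.

```c
#include <stdint.h>

typedef enum { PUSH, ADD, SUB, BITSHIFT, NARROWS, ORB } Op;
typedef struct { Op op; int32_t arg; } Instr;    /* arg is used by PUSH only */

/* Runs a well-formed program needing at most 16 stack slots and returns the
   value left on top of the stack. Shift and narrow counts must be below 32. */
static uint32_t run(const Instr *prog, int n) {
    uint32_t stack[16];
    int sp = 0;
    for (int i = 0; i < n; i++) {
        if (prog[i].op == PUSH) {
            stack[sp++] = (uint32_t)prog[i].arg;
            continue;
        }
        uint32_t y = stack[--sp];               /* top of stack    */
        uint32_t x = stack[--sp];               /* next-to-top     */
        uint32_t r;
        switch (prog[i].op) {
        case ADD:      r = x + y; break;
        case SUB:      r = x - y; break;
        case BITSHIFT: r = (y >> 31) ? x >> (0u - y) : x << y; break;
        case NARROWS:  r = x & ((1u << y) - 1u); break;
        default:       r = x | y; break;        /* ORB */
        }
        stack[sp++] = r;
    }
    return stack[sp - 1];
}
```

Running the sequence push addr, push -4, add, push loc, sub, push -2, bitshift, push 16, narrows, push bits, orb reproduces the instruction word that the C closure function would compute from the same values.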

The toolkit generates a table that associates the function pointers used in closures with machine-independent strings like the one shown above. A machine-independent object file might include one copy of each transformation. To minimize the space required to store these transformations, the toolkit can encode the transformation in a specialized but unreadable bytecode. It packs the full transformation in the bltzal closure, including the tests of conditions, into the […]-byte string given by the C literal

rxvnxnxvxecxxxxM

The upper half of Table I, in the next section, shows how much space is needed to hold the bytecodes for all the transformations on each of five target machines.

EXPERIMENTAL RESULTS

Table I. Space savings from hoisting optimization. Columns are the target
specifications (Alpha, MIPS, SPARC, PPC, Pentium). The upper part gives, for each
target, the numbers of instructions, relocatable instructions, and closure functions
(plain and hoisted), the bytecode size, and the size of the largest closure. The
lower part gives, for each host, the object-code sizes of the encoding and relocating
routines, plain and hoisted, with the ratio of the two. Ratios, one per host, grouped
by target: Alpha 0.86 0.84 0.88 0.87 0.89; MIPS 0.74 0.77 0.79 0.78 0.78;
SPARC 0.74 0.75 0.75 0.72 0.76; PPC 0.79 0.73 0.77 0.76 0.77;
Pentium 0.59 0.58 0.59 0.61 0.60.

I have implemented currying and hoisting in the New Jersey Machine-Code Toolkit
[Ramsey and Fernandez]. mld [Fernandez], a retargetable, optimizing linker, uses
encoding functions and relocating transformations generated by the toolkit. mld
needs only lines of C code for relocation, and it uses the same code on all
platforms; the code keeps a list of relocation closures and applies them when the
addresses on which they depend become known. Other applications that might use the
generated encoding and relocating code include assemblers, linkers, whole-program
optimizers, and object-code transformers.
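The bookkeeping described for mld, a list of closures applied once their addresses are known, can be sketched as follows. This is a minimal Python illustration and not mld's C code: `Label`, `Relocator`, and the `apply(loc, address)` callback convention are hypothetical names chosen for the example.

```python
class Label:
    """A relocatable address whose value may not be known yet."""

    def __init__(self, name):
        self.name = name
        self.value = None  # set once the address becomes known


class Relocator:
    """Keeps relocation closures until the labels they depend on are defined."""

    def __init__(self):
        self.pending = []  # (label, loc, apply) triples still waiting

    def add(self, label, loc, apply):
        """Record a closure; apply(loc, address) patches the instruction at loc."""
        self.pending.append((label, loc, apply))

    def resolve(self):
        """Apply every closure whose label is now defined; keep the rest.

        Returns the number of closures still waiting.
        """
        waiting = []
        for label, loc, apply in self.pending:
            if label.value is None:
                waiting.append((label, loc, apply))
            else:
                apply(loc, label.value)
        self.pending = waiting
        return len(waiting)
```

A linker built this way would call `resolve` whenever new symbol values arrive; the code is the same for every target because all machine dependence lives in the `apply` functions.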

One can imagine several measures of the performance of a relocation method:
the space required to store the relocation code, the time required to execute it,
the space required to store object modules, and the time required to execute the
relocated binary code. Ideally, one could use these measures to compare currying
with standard methods of relocation, but mld is the only linker that uses relocation
by currying, and mld's assumptions make meaningful comparisons difficult. For
example, mld uses no object modules, so it is impossible to measure their sizes.
This section focuses on the size of the relocation code and the time required to
execute it. There is no need to compare the speed of binary codes as relocated by
currying or by handwritten code, since both methods result in identical binaries.

This section also compares SPARC relocating transformations discovered by the
toolkit with transformations defined by the ELF object-code standard.

Code size and hoisting

I used the toolkit to generate encoding and relocating code for the Alpha, MIPS,
SPARC, and Pentium, as specified by Ramsey and Fernandez, and also for
the PowerPC, as specified by Doug Currie of Flavors Technology. Table I
shows the amount of space consumed in an application by generated encoding
functions and relocating transformations. The column labels across the top name
the specifications of the target machines for which object code can be generated
or relocated. The results in Table I depend only on the specifications and on the
toolkit itself; they are independent of the program being relocated.

The upper part of Table I describes properties of the instruction-set specifications
and of the code generated to implement them. Each instruction accounts for an
encoding function, as does each addressing mode. A relocatable instruction has
an operand that is or contains a relocatable address. The table shows how hoisting
reduces the number of closure functions. On the Pentium, the number of closure
functions without hoisting is greater than the number of relocatable instructions,
because the toolkit expands addressing modes inline and generates a different
closure for each combination of instruction and addressing mode. Many instructions
on the Pentium use effective addresses, which come in modes, of which involve
relocatable addresses.

The bytecode size in Table I shows how many bytes of machine-independent
bytecode are needed to represent the closure functions, as described above. This size
is shown only for the hoisted functions; before hoisting, the closure functions don't
have a meaningful bytecode representation, because the bytecode isn't rich enough
to handle the more complicated closures. The last line in the top half of Table I
shows the number of extra words, in addition to the location, stored in the largest
closure. Only on the Pentium does relocating one token at a time result in smaller
closures. This result makes sense because the Pentium is the only machine that
uses sequences of tokens in which not all tokens depend on a relocatable address.

The toolkit supports cross-architecture assembly and linking. The lower part of
Table I shows how much space the encoding and relocating functions take up for
every combination of host and target machine (see footnote 4). Each row label
identifies a different host machine on which the relocation code runs. The data in
the table are the sizes, as compiled with gcc, for code generated with and without
hoisting. The savings from hoisting are shown in bold as ratios. The reduction in
object-code size ranges from on the RISC specifications to about on the Pentium
specification. The differences in savings are explained by the differences in the
proportion of instructions that use relocatable addresses.

When generating encoding functions, the toolkit trades space for time, generating
specialized code for every combination of instruction and addressing mode. The
toolkit does not encode an effective address until it knows in what instruction the
address is used. Because of the inline expansion of addressing modes, this tactic
is spectacularly costly on the Pentium. A meaningful measurement of the value
of hoisting on the Pentium will have to await the elimination of this code bloat.
Practical applications like mld use a subset of the full Pentium specification.

Hoisting reduces the size of relocation code to less than two percent of the size
of encoding code. The sizes of relocation functions in SPARC object files are

Alpha MIPS SPARC PPC Pentium

K K K K K

To compare one of these sizes with handwritten code, I examined the SPARC
relocation code used in the GNU and Solaris linkers. These linkers implement
all of the transformations in the ELF standard, but the toolkit implements
only five. The GNU code is table-driven; the Solaris code is not.

The sizes of the three different implementations of relocation are comparable.
The toolkit uses bytes of code to implement transformations. The GNU
linker uses bytes of code to interpret table entries, and the table requires
bytes per transformation, for a total of bytes for transformations. These
measurements describe the GNU code as altered to work inside mld; as part of
the alteration, I made several simplifying assumptions and eliminated code
accordingly. The alterations are described in greater detail below. The Solaris
linker uses bytes of code to implement transformations.

Speed of relocation

I estimated differences in relocation speed by transplanting GNU and Solaris
relocation code into mld. I removed significant portions of the GNU code to try to
make the link-time assumptions like mld's assumptions. For example, I eliminated
support for multiple symbol tables and for generation of relocatable object code,
and I removed many sanity checks and assertions. The GNU code relocates into
and out of specialized sections, but mld works directly in memory, so where
possible I modified the GNU code to use memory addresses directly instead of sections
and offsets. The Solaris code required similar but less sweeping modifications.

To use the modified code in mld, I translated the toolkit's relocation closures
into ELF-style relocation entries. I used mld to link four of the SPEC benchmarks
(eqntott, li, gcc, and espresso), doing relocation all three ways. The
three methods yield identical instructions. Using spix from the Shade distribution
[Cmelik and Keppel], I measured the number of SPARC instructions needed
to relocate these programs. Relocation by currying takes fewer instructions
than the GNU relocation code, but the Solaris relocation code takes fewer
instructions than relocation by currying. These measurements should not be taken
too seriously, because the foreign relocation code is far removed from its original
context, and the three methods vary in the assumptions they make and the amount
of work they do. For example, relocation by currying checks to see if symbols are
defined, the GNU code makes some less stringent sanity tests, and the Solaris code
makes no checks at all. The measurements do indicate that relocation by currying
costs about the same as standard methods.

4. Not having access to a PowerPC to act as a host machine, I used an RS, which
has a similar instruction set.

The most interesting variation in method may be that the GNU and Solaris code
require that the contents of each section be stored in contiguous memory. This
requirement is awkward for mld because it uses a lifetime-based memory allocator
[Hanson], and it does not know section sizes in advance. Relocation by
currying permits the contents of sections to be split into any number of contiguous
blocks, but there is a performance penalty. Every time an instruction is relocated,
the relocation code must search for the contiguous block containing that
instruction. If the searching is done in advance and the search time not counted, the cost
of relocation by currying drops by about . This change simulates the operation
of a linker of object modules, in which sections are always contiguous because they
are made so by an assembler.
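The search for the block that contains an instruction can be illustrated with a small sketch. The text does not say how mld performs this search; the Python below assumes a binary search over block start addresses, and `BlockMap` and `find` are hypothetical names.

```python
from bisect import bisect_right


class BlockMap:
    """Maps an address to the contiguous block that contains it."""

    def __init__(self, blocks):
        # blocks: iterable of (start_address, bytearray) pairs, non-overlapping
        self.blocks = sorted(blocks, key=lambda block: block[0])
        self.starts = [start for start, _ in self.blocks]

    def find(self, address):
        """Return (data, offset) so that data[offset] is the byte at `address`."""
        # Index of the last block whose start is <= address, or -1 if none.
        i = bisect_right(self.starts, address) - 1
        if i >= 0:
            start, data = self.blocks[i]
            if address < start + len(data):
                return data, address - start
        raise KeyError(hex(address))
```

Each relocation pays for one such lookup, which is the penalty the paragraph above describes. When a section is a single block, as in a linker of object modules, the lookup disappears.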

Relationship to standards

The relocating transformations discovered by the toolkit are equivalent to those
used in standard object formats. For example, the toolkit discovers five
transformations for the SPARC, and they are equivalent to the transformations named
R_SPARC_WDISP, R_SPARC_WDISP, R_SPARC_HI, R_SPARC_LO, and R_SPARC_
in the ELF format for the SPARC [Prentice Hall b], provided we represent the
relocatable address as the sum of the label S and the offset A. In ELF terminology,
these values are called the symbol and the addend. The toolkit discovers only five
transformations because the SLED specification for the SPARC designates fewer
operands as relocatable than does standard SPARC assembly language. We can
make the toolkit discover more transformations simply by making more operands
relocatable; adding a few lines to the SPARC specification helps the toolkit discover
R_SPARC_ and R_SPARC_. If we add constructors to store relocatable addresses
in bit and bit tokens, the toolkit discovers R_SPARC_ and R_SPARC_.
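To make the S + A representation concrete, here is a sketch of three ELF-style calculations written as Python functions. The formulas follow my recollection of the SPARC processor supplement (word displacements such as R_SPARC_WDISP30 and R_SPARC_WDISP22, and the R_SPARC_HI22 and R_SPARC_LO10 pair); the numeric suffixes and field widths are assumptions that should be checked against the supplement, and P denotes the place being relocated.

```python
def wdisp(s, a, p, width):
    """Word displacement: (S + A - P) >> 2, kept to `width` bits.

    The masking here truncates silently; a real linker would also verify
    that the displacement fits in the field.
    """
    return ((s + a - p) >> 2) & ((1 << width) - 1)


def hi22(s, a):
    """High 22 bits of S + A, as used by a sethi-style instruction."""
    return ((s + a) >> 10) & 0x3FFFFF


def lo10(s, a):
    """Low 10 bits of S + A, as used in an immediate operand."""
    return (s + a) & 0x3FF
```

For any 32-bit value of S + A, `(hi22(s, a) << 10) | lo10(s, a)` reassembles the address, which is why the two transformations are normally used as a pair.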

There are transformations the toolkit does not discover. Some are specialized
versions of the ones that are discovered. For example, several ELF transformations
are specialized to refer to locations relative to the start of a global offset table or
a procedure linkage table. Some relocation entries in standard object files cannot
be discovered by the toolkit because they represent more than just transformations.
For example, the RSPARCDAT relocation entry names the same transformation
as R_SPARC_, but it also instructs the linker to create an entry in the global
offset table.


Size of object code

Because mld creates no object modules, I cannot directly measure the effects of
relocation by currying on the sizes of object modules, but I can make predictions
based on the experiment of embedding GNU relocation code in mld. Each
relocation closure can be translated into a relocation entry using the standard ELF
representation. The only auxiliary information needed is a mapping of small
integers to λ-terms; instead of being fixed by a machine-dependent object-code
standard, this mapping must be stored in an object file. As shown in Table I, one
could use the toolkit's bytecoded representation of λ-terms to store this mapping
in no more than bytes per object file. It might be possible to reduce the size of
the bytecodes by using Proebsting's superoperator technique [Proebsting].
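The translation suggested above can be sketched in a few lines of Python. This is an illustration under assumptions, not a description of any existing object format: `build_object_tables` is a hypothetical helper, and each closure is modeled as an (offset, symbol, addend, transformation) tuple whose last element is the machine-independent encoding of its transformation.

```python
def build_object_tables(closures):
    """Turn closures into ELF-style entries plus a per-file transformation table.

    closures -- iterable of (offset, symbol, addend, transformation) tuples
    Returns (table, entries): `table` lists each distinct transformation once,
    and every entry refers to its transformation by a small integer index.
    """
    table, index_of, entries = [], {}, []
    for offset, symbol, addend, transformation in closures:
        if transformation not in index_of:
            # First use of this transformation: give it the next small integer.
            index_of[transformation] = len(table)
            table.append(transformation)
        entries.append((offset, symbol, addend, index_of[transformation]))
    return table, entries
```

The table plays the role that the fixed list of relocation types plays in a machine-dependent standard; because hoisting keeps the number of distinct transformations small, the table stays small as well.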

DISCUSSION

Relocation by currying is a simple, abstract, machine-independent model of
relocation. Abstract relocatable addresses have a cost; exposing the label + offset
representation at code-generation time would enable extra savings. Offsets are
always available at encoding time, and they could be hoisted out of closure
functions. The closures would take less space, because a label occupies at most half the
space of a label + offset pair. The ELF object-code standard enables such space
optimizations by providing for relocation entries both with and without offsets.
Exposing the representation of relocatable addresses would also make it possible
to treat certain labels, like those of the ELF global offset table and procedure
linkage table, as special cases. Such treatment would make it possible to shrink
machine-independent object code by moving these special labels back into the λ-terms.

Currying and hoisting make it possible to write efficient, machine-independent
tools that relocate machine instructions. In particular, by keeping the number of
distinct relocating transformations small, hoisting makes a machine-independent
object code practical. The New Jersey Machine-Code Toolkit can derive C
implementations of relocating transformations from a set of machine descriptions, and
a tool writer can incorporate those implementations to provide efficient relocation
on a number of platforms. If the tool includes an interpreter for the bytecode
representation of relocating transformations, it can relocate instructions for any
machine, even a machine that doesn't exist when it is released.

Acknowledgements

Mary Fernandez helped create the toolkit on which this work is based, and she put
together an mld I could use for performance measurements. Peter Sestoft provided
helpful pointers to the literature on partial evaluation and functional
programming. Mary Fernandez, Vince Russo, Zhong Shao, and Michal Young criticized the
manuscript in helpful ways.

REFERENCES

Appel, A. W. Compiling with Continuations. Cambridge University Press, Cambridge.

Ball, T. and Larus, J. R. Optimally profiling and tracing programs. In Conference Record
of the Annual ACM Symposium on Principles of Programming Languages. Albuquerque, NM.


Briggs, P. and Cooper, K. D. Effective partial redundancy elimination. Proceedings of
the ACM SIGPLAN Conference on Programming Language Design and Implementation,
in SIGPLAN Notices. June.

Cmelik, B. and Keppel, D. Shade: A fast instruction-set simulator for execution profiling.
In Proceedings of the ACM SIGMETRICS Conference on Measurement and Modeling of
Computer Systems.

Fernandez, M. F. Simple and effective link-time optimization of Modula programs.
Proceedings of the ACM SIGPLAN Conference on Programming Language Design and
Implementation, in SIGPLAN Notices. June.

Fraser, C. W. and Hanson, D. R. A machine-independent linker. Software: Practice and
Experience. Apr.

Fraser, C. W. and Hanson, D. R. A Retargetable C Compiler: Design and Implementation.
Benjamin/Cummings, Redwood City, CA.

Gircys, G. R. Understanding and Using COFF. Nutshell Handbooks. O'Reilly & Associates,
Sebastopol, CA.

Hanson, D. R. Fast allocation and deallocation of memory based on object lifetimes.
Software: Practice and Experience. Jan.

Hastings, R. and Joyce, B. Purify: Fast detection of memory leaks and access errors. In
Proceedings of the Winter USENIX Conference. San Francisco, CA.

Johnson, S. C. Postloading for fun and profit. In Proceedings of the Winter USENIX
Conference.

Jones, D. W. Assembly language as object code. Software: Practice and Experience.
Aug.

Jones, N. D., Sestoft, P., and Søndergaard, H. Mix: A self-applicable partial evaluator
for experiments in compiler generation. Lisp and Symbolic Computation.

Kane, G. MIPS RISC Architecture. Prentice Hall, Englewood Cliffs, NJ.

Larus, J. R. and Schnarr, E. EEL: Machine-independent executable editing. Proceedings
of the ACM SIGPLAN Conference on Programming Language Design and Implementation,
in SIGPLAN Notices. June.

Peyton Jones, S. L. The Implementation of Functional Programming Languages.
International Series in Computer Science. Prentice Hall, Englewood Cliffs, NJ.

Prentice Hall a. System V Application Binary Interface, Third ed. Prentice Hall,
Englewood Cliffs, NJ. Unix Press.

Prentice Hall b. System V Application Binary Interface, SPARC Architecture Processor
Supplement, Third ed. Prentice Hall, Englewood Cliffs, NJ. Unix Press.

Proebsting, T. A. Optimizing an ANSI C interpreter with superoperators. In Conference
Record of the Annual ACM Symposium on Principles of Programming Languages.
San Francisco, California.

Ramsey, N. A retargetable debugger. PhD thesis, Princeton University, Department of
Computer Science. Also Technical Report CS-TR.

Ramsey, N. and Fernandez, M. F. New Jersey Machine-Code Toolkit architecture
specifications. Tech. Rep. TR, Department of Computer Science, Princeton University. Oct.
Revised December.

Ramsey, N. and Fernandez, M. F. Specifying representations of machine instructions.
ACM Transactions on Programming Languages and Systems. To appear.

Sites, R. L., Chernoff, A., Kirk, M. B., Marks, M. P., and Robinson, S. G. Binary
translation. Communications of the ACM. Feb.

Srivastava, A. and Eustace, A. ATOM: A system for building customized program
analysis tools. Proceedings of the ACM SIGPLAN Conference on Programming Language
Design and Implementation, in SIGPLAN Notices. June.

Srivastava, A. and Wall, D. W. A practical system for intermodule code optimization.
Journal of Programming Languages. Also available as WRL Research Report,
December.


Szymanski, T. G. Assembling code for machines with span-dependent instructions.
Communications of the ACM. Apr.

Wahbe, R., Lucco, S., Anderson, T. E., and Graham, S. L. Efficient software-based fault
isolation. In Proceedings of the Fourteenth ACM Symposium on Principles