Relocating Machine Instructions by Currying
Norman Ramsey
University of Virginia

Relocation adjusts machine instructions to account for changes in the locations of the instructions themselves or of external symbols to which they refer. Standard linkers implement a finite set of relocation transformations, suitable for a single architecture. These transformations are enumerated, named, and engraved in a machine-dependent object-file format, and linkers must recognize them by name. These names and their associated transformations are an unnecessary source of machine-dependence.

An alternative is to use SLED (Specification Language for Encoding and Decoding) to specify representations of machine instructions. Instructions are described by constructors, which denote functions mapping lists of operands to instructions' binary representations. Any operand can be designated as relocatable, meaning that the operand's value need not be known at the time the instruction is encoded. From a SLED specification, the New Jersey Machine-Code Toolkit can generate functions that encode instructions in the native binary representation. For instructions with relocatable operands, the toolkit also computes relocating transformations. Tool writers can create machine-independent software that uses these transformations to relocate machine instructions. For example, mld, a retargetable linker built with the toolkit, needs only [...] lines of C code for relocation, and that code is machine-independent.

The toolkit discovers relocating transformations by currying encoding functions. An attempt to encode an instruction with a relocatable operand results in the creation of a closure. The closure can be applied when the values of the relocatable operands become known. Currying provides a general, machine-independent method of relocation.

Currying rewrites a λ-term into two nested λ-terms. The standard implementation has the first λ allocate a closure and store therein its operands and a pointer to the
second λ. Using this strategy in the toolkit means that when it builds an application, the toolkit generates code for many different inner λ-terms, one for each instruction that uses a relocatable address. Hoisting some of the computation out of the second λ into the first makes many of the second λs identical; a handful are enough for a whole instruction set. This optimization reduces the size of machine-dependent assembly and linking code by [...] for the Alpha, MIPS, SPARC, and PowerPC, and by about [...] for the Pentium. It also makes the second λs equivalent to relocating transformations named in standard object-file formats.

Categories and Subject Descriptors: D.[...] [Operating Systems]: Systems Programs and Utilities - Linkers; D.[...] [Programming Techniques]: Applicative (Functional) Programming; D.[...] [Programming Languages]: Language Classifications - specialized application languages

General Terms: Relocation, Linking, Currying

Additional Key Words and Phrases: higher-order functions

Author's current address: Norman Ramsey, Department of Computer Science, University of Virginia, Charlottesville, VA [...]. Email: nr@cs.virginia.edu. This work supported in part by NSF grant number ASC [...].

INTRODUCTION

Compiling whole programs is slow; compiling units separately and linking the compiled units into a program speeds up the edit-compile-go cycle. For separate compilation, a compiler must be able to emit instructions and data without knowing the exact locations either of the instructions and data the compiler itself emits or of the instructions and data emitted by other compilations. Using assembly language makes this task easy, because in assembly language all locations are represented symbolically. Symbolic, assembly-like units can be linked to form programs [Fraser and Hanson; Jones], but the linker or loader must translate all units from symbolic form into the binary representation required by the target hardware. It is believed to be more efficient to translate each unit separately into a binary form called relocatable object code. Object
code must contain more than just instructions and data. To support delayed binding of locations, it must also represent:

- The symbols defined in the object file and the locations to which they are bound.
- The symbols imported from other units, i.e., external symbols.
- The transformations that must be applied to the instructions and data to account for their eventual placement at absolute addresses, and also for the placements of the external symbols on which they depend. Applying these transformations is called relocation.

Current object-code formats force tool writers to handle relocation in a machine-dependent way. Given an instruction-set architecture, a human being examines the instructions and determines which operands can be relocatable addresses and what relocating transformations are needed. Each transformation is named, and linkers and other tools must recognize transformations by name. The names are informal and machine-dependent, so retargetable tools that manipulate object code must recognize each set of names on each machine.

This paper makes several contributions. It presents a machine-independent, automatic method of discovering relocating transformations. It presents an optimization that makes the cost of the automatic method comparable to the cost of hand-implemented methods and makes the discovered transformations equivalent to the transformations used in standard object-file formats. Finally, the paper gives a machine-independent representation of the transformations.

This new technique for relocating machine instructions is an enabling technology for building machine-independent tools for static, incremental, and dynamic linking. It will also simplify the construction of retargetable tools that transform object code. Object-code transformation, which is growing in importance, is used for profiling and tracing [Ball and Larus], testing [Hastings and Joyce], enforcing protection [Wahbe et al.], optimization [Srivastava and Wall], and binary translation [Sites et
al.]. There are even frameworks for creating applications that transform object code [Johnson; Larus and Schnarr; Srivastava and Eustace].

The techniques presented here build on the New Jersey Machine-Code Toolkit [Ramsey and Fernandez], which reads a compact machine description and generates functions that encode instructions. The machine description is written in SLED (Specification Language for Encoding and Decoding), which relates three representations of instructions: a symbolic representation akin to assembly language, assembly language itself, and the binary representation used by the hardware. Real instruction sets can be specified with modest effort; our Alpha, MIPS, SPARC, and Pentium specifications [Ramsey and Fernandez] are [...] lines.

In SLED, an instruction is represented symbolically by its name and a list of its operands; the collection of all instructions resembles an algebraic data type. The machine description indicates which operands are relocatable addresses, and currying the encoding function with respect to those operands results in a relocating transformation. Currying rewrites the encoding function into two nested λ-terms. In the standard implementation, the outer λ allocates a closure and stores therein its operands and a pointer to the inner λ, which uses the contents of the closure to encode (relocate) the instruction. The inner λs are the relocating transformations discovered by the toolkit, and the closures take the place of relocation entries in traditional object files.

Using the standard implementation of currying, the toolkit generates code for many different inner λ-terms, one for each instruction that uses a relocatable address. Hoisting some of the computation out of the inner λ into the outer makes many of the inner λs identical; a handful are enough for a whole instruction set. This optimization is closely related to fully lazy lambda-lifting [Peyton Jones]. It reduces the size of machine-dependent assembly and linking
code by [...] for the Alpha, MIPS, SPARC, and PowerPC, and by about [...] for the Pentium. It also makes the relocating transformations discovered by the toolkit equivalent to those that are now implemented by hand. To support machine-independent use of these transformations, the toolkit associates each one with a string that can be interpreted to have the effect of applying the transformation. These strings can be used in an object file as meaningful, formal, machine-independent names.

DESCRIBING INSTRUCTION REPRESENTATIONS

SLED describes the binary representation of an instruction as a sequence of tokens. On a RISC machine, each instruction is a single [...]-bit token. On a machine like the Pentium, formats vary; for example, the instruction add [...] DX [...] has an [...]-bit opcode token, followed by another [...]-bit token that has both opcode and addressing-mode bits, followed by the [...]-bit displacement and the [...]-bit immediate operand.

Each token in an instruction is partitioned into fields; a field is a contiguous range of bits within a token. On RISC machines, different instruction formats are represented by different partitions of the instruction token. Fields contain opcodes, operands, modes, or other information. Opcodes and operands can be distributed among multiple fields.

Patterns constrain the values of fields; they may constrain fields in a single token or in a sequence of tokens. They can be used to describe binary representations of opcodes, of whole instructions, and of groups of instructions.

Constructors connect the symbolic and binary representations of instructions. At a symbolic level, an