Alias Analysis of Executable Code

Saumya Debray, Robert Muth, Matthew Weippert
Department of Computer Science, University of Arizona, Tucson, AZ, USA
{debray, muth, weippert}@cs.arizona.edu

This work was supported in part by the National Science Foundation under grant CCR. Copyright ACM. To appear in the Proceedings of the Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, January.

Abstract

Recent years have seen increasing interest in systems that reason about and manipulate executable code. Such systems can generally benefit from information about aliasing. Unfortunately, most existing alias analyses are formulated in terms of high-level language features, and are unable to cope with features such as pointer arithmetic that pervade executable programs. This paper describes a simple algorithm that can be used to obtain aliasing information for executable code. In order to be practical, the algorithm is careful to keep its memory requirements low, sacrificing precision where necessary to achieve this goal. Experimental results indicate that it is nevertheless able to provide a reasonable amount of information about memory references across a variety of benchmark programs.

Introduction

Recent years have seen increasing interest in reasoning about and manipulating executable files. When working with an executable file, we typically have information about the entire program (including, potentially, library functions) that is usually not available at compile time. Because of this, code manipulation and optimization at this level offers benefits that are difficult or impossible to obtain using traditional compilers.

As with the compilation of source-level programs, code transformations on executable code can benefit greatly from pointer alias information. For example, inlining library routines may open up opportunities for moving invariant load instructions out of loops, but alias information is needed in order to identify such invariant load instructions. To obtain the full benefits of a superscalar architecture such as the DEC Alpha, link-time optimizers such as Spike, alto, and OM need to carry out instruction scheduling again after link-time optimizations; without pointer alias information, however, the scheduler must be conservative in its treatment of loads and stores, and this can limit the amount of code reordering that is possible. As a final example, it may be possible to scavenge registers at link time, e.g., by examining the register usage of library functions, but the ability to use such scavenged registers effectively is likely to be limited in the absence of pointer alias information.

There is an extensive body of work on pointer alias analysis of various kinds (surveyed later in the paper). In almost all cases, these are high-level analyses carried out on representations of source programs in terms of source language constructs, and typically disregarding "nasty" features such as type casts, pointer arithmetic, and out-of-bounds array accesses. Such analyses turn out, unfortunately, to be of limited utility at the machine code level, because at this level all we have are the nasty features:

The contents of registers and memory words are untyped bitstrings, so the issue of type casts is in some sense moot: everything is potentially an address.

Memory accesses typically involve some address arithmetic to compute a base address into a register, followed by the use of a displacement off the base address to carry out the actual memory reference.

Address arithmetic may also arise due to particular language features, e.g., the use of tag bits in dynamically typed languages to indicate the type of the value pointed at. Dereferencing operations in the executable code for such programs will involve nontrivial arithmetic involving the tag bits that is invisible (and irrelevant) at the source level. At the level of executable programs we can't tell what source language a particular piece of code was derived from, and different components of a program might have been written in different source languages, so we must be able to deal with all such address arithmetic in a reasonable way.

If the number of arguments to a function is large enough, some of the arguments may have to be passed on the stack. In such a case, the arguments passed on the stack will typically reside at the top of the caller's stack frame, and the callee will "reach into" the caller's frame to access them: this is nothing but an out-of-bounds array reference.

Finally, executable programs may include library functions in hand-written assembly code that violate familiar and comfortable source-level assumptions, e.g., that execution does not jump out of the middle of one function and into the middle of another (this happens, for example, in some Fortran library routines).

To illustrate some of the problems that arise, consider the fragment of C code shown in the figure, together with the corresponding assembly code. The point to note is the extensive use of address arithmetic to access memory, even in this very simple program fragment. For example, in order to determine whether the two store instructions in g might write to the same memory location, we need to be able to reason about the contents of the registers that hold its two arguments, which are defined through the arithmetic operations in the add instructions of f. As this example illustrates, pointer arithmetic cannot be ignored during alias analysis at the machine code level.

Figure: A fragment of a C program and the corresponding assembly code. Source code: a function f declares two local variables x and y, which live at fixed displacements in f's stack frame, and calls g on the addresses of y and x; a function g takes two pointer arguments x and y, passed in registers, and stores through each of them. Executable code: f allocates its stack frame with an add, saves the return address with a store, computes the addresses of y and x with two add instructions, and transfers control to g with a bsr; g allocates its stack frame, saves the return address, and performs the two stores.

Note: The assembly code shown corresponds to that obtained using gcc -O on a DEC Alpha workstation, with some edits to enhance readability. On the Alpha, arguments to functions are typically passed in registers, and a dedicated register is used as the stack pointer.

In this paper we describe a low-level flow-sensitive, context-insensitive, interprocedural pointer alias analysis algorithm, designed and implemented in the context of the alto link-time optimizer, that can handle significant pointer arithmetic and features such as out-of-bound references that are ignored by most existing alias analysis algorithms.

For simplicity in the discussion that follows, we assume a more or less canonical RISC instruction set. Memory is accessed only through explicit load and store instructions, which have the form "load reg_a, k(reg_b)" and "store reg_a, k(reg_b)", where k is a constant, and have the effect of reading from or writing to the location whose address is k + contents of reg_b. To model arithmetic, we assume the instructions "add src1, src2, dest" and "mult src1, src2, dest", where dest is a destination register and src1 and src2 are source registers; to simplify the discussion, we abuse notation and allow either src1 or src2 to be an integer constant denoting an immediate operand. These instructions compute, respectively, the sum and product of src1 and src2 into dest. Many other operations can be expressed in terms of these, e.g., subtraction and register-to-register moves can be modelled in terms of addition, so we do not consider these separately. In addition to these, we assume the usual complement of tests, conditional jumps, and direct and indirect unconditional jumps. The only effect of these instructions is to determine the control flow graph of the program, so we do not consider them explicitly in the context of alias analysis. We also ignore operations on floating point registers, since it seems unlikely that such operations would be used for address computations.

Local Alias Analysis

A technique called instruction inspection, commonly used in compile-time instruction schedulers, can be used to reason about memory references within a basic block. Here, two memory reference instructions i1 and i2 are taken to be non-conflicting if either of the following conditions holds:

    they use distinct offsets from the same base register r, and r is not redefined between i1 and i2; or

    one of the instructions uses a register known to point to the stack, and the other uses a register known to point to the global data area.

Unfortunately, this simple approach does not work if information about address arithmetic needs to be propagated across basic block boundaries. In the next section we describe a global analysis that can be used to handle this.

Residue-Based Global Alias Analysis

The Basic Idea

An alias analysis will, in general, associate each register with a set of possible addresses at each program point, so we need to abstract sets of addresses to descriptions, or abstract address sets. These need to be easy to compute and compactly representable, with operations such as union, intersection, checking containment, etc., that are cheap enough to be practical for the analysis of large programs. A simple way to satisfy these criteria is to consider only some fixed number, say m, of the low order bits of an address. That is, addresses are represented by their mod-k residues, where k = 2^m. The set of all mod-k residues is Z_k = {0, ..., k-1}. An abstract address set can then be represented as a bit vector of length k: since m (and therefore k) is fixed, set operations such as union, intersection, checking containment

of none indicates that the corresponding residue set
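The representation of abstract address sets as bit vectors of mod-k residues can be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: the choice m = 6, the function names, and the sample base address and displacements are all hypothetical. The sketch relies on the arithmetic fact that (a + c) mod k = ((a mod k) + c) mod k, so adding a constant to every address in a set corresponds to rotating the bit vector.

```python
# Hypothetical sketch: abstract address sets as bit vectors of mod-k residues.
M = 6                 # assumed number of low-order address bits tracked (m)
K = 1 << M            # modulus k = 2**m
FULL = (1 << K) - 1   # bit vector with all k residues set: "could be any address"


def from_addresses(addrs):
    """Abstract a concrete collection of addresses to its set of mod-k residues."""
    bits = 0
    for a in addrs:
        bits |= 1 << (a % K)
    return bits


def union(x, y):
    """Residues that appear in either set."""
    return x | y


def intersection(x, y):
    """Residues that appear in both sets."""
    return x & y


def contains(x, y):
    """True if every residue in y is also in x."""
    return y & ~x == 0


def add_const(x, c):
    """Effect of adding the constant c to every address: rotate left by c mod k."""
    s = c % K
    return ((x << s) | (x >> (K - s))) & FULL


def may_alias(x, y):
    """Two references can touch the same address only if their residue sets overlap."""
    return intersection(x, y) != 0


# Hypothetical usage: two pointers derived from one base by different displacements.
base = from_addresses([0x1000])   # made-up base address, residue 0
p = add_const(base, 16)           # made-up displacement 16
q = add_const(base, 24)           # made-up displacement 24
print(may_alias(p, q))                      # distinct residues, so no alias reported
print(may_alias(p, add_const(base, 80)))    # 80 and 16 agree mod 64, so a possible alias
```

The last call shows where precision is given up: displacements that differ by a multiple of k are indistinguishable, which is the price of keeping each set to a fixed k bits.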
