Backwardscompatible b ounds checking for arrays and p ointers in

programs

Richard W M Jones and Paul H J Kelly

Department of Computing

Imp erial College of Science Technology and Medicine

Queens Gate London SW BZ

Abstract within which the resulting p ointer should p oint

A p ointer in C can b e used in a context divorced

This pap er presents a new approach to enforcing array

from the name of the storage region for which it is valid

b ounds and p ointer checking in the C language Check

its intended referent and this has prevented a fully

ing is rigorous in the sense that the result of p ointer

satisfactory b ounds checking mechanism from b eing de

arithmetic must refer to the same ob ject as the orig

veloped There is overwhelming evidence that b ounds

inal p ointer this ob ject is sometimes called the in

checking is desirable and a number of schemes have

tended referent The novel asp ect of this work is

b een presented The main dierence b etween our work

that checked co de can interoperate without restriction

and Kendalls bcc and Steens rtcc is that in

with unchecked co de without interface problems with

our scheme the representation of p ointers is unchanged

some eective checking and without false alarms This

This is crucial since it means that interoperation with

backwards compatibility prop erty allows the overheads

nonchecked mo dules and libraries still works and much

of checking to b e conned to susp ect mo dules and also

checking is still p ossible Compared with interpretative

facilitates the use of libraries for which source co de is

schemes like SabreC we oer the p otential for much

not available The pap er describ es the scheme its pro

higher p erformance Patil and Fischer present a

totype implementation as an extension to the GNU C

sophisticated technique with very low overheads using

compiler presents exp erimental results to evaluate its

a second CPU to p erform checking in parallel Unfor

eectiveness and discusses p erformance issues and the

tunately their scheme requires function interfaces to b e

eectiveness of some simple optimisations

changed to carry information ab out p ointers so also

has the interoperation problem

Another approach is exemplied by the commercially

Introduction and related work

available checking package Purify Purify pro cesses

the binary representation of the software so can handle

C is unusual among programming languages in provid

binaryonly co de Each memory access instruction is

ing the programmer with the full p ower of p ointers

mo died to maintain a bit map of valid storage regions

Languages in the PascalAlgol family have arrays and

and whether each byte has b een initialised Accesses

p ointers with the restriction that arithmetic on p oint

to unallo cated or uninitialised lo cations are rep orted

ers is disallowed Languages like BCPL allow arbitrary

as errors Purify catches many imp ortant bugs and is

op erations on p ointers but lack types and so require

fairly ecient However Purify do es not catch abuse

clumsy scaling by ob ject sizes

of p ointer arithmetic which yields a p ointer to a valid

An advantage of the PascalAlgol approach is that

region which is not the intended referent Fischer and

array references can b e checked at runtime fairly e

Patil provide evidence for the imp ortance of this

ciently in fact so eciently that there is a go o case

renement

for b oundschecking in pro duction co de Bounds check

Our goals in this pap er are to describ e a metho d of

ing is easy for arrays b ecause the array subscript syn

b ounds checking C programs that fullls the following

tax sp ecies b oth the address calculation and the array

criteria

Backwards compatibility the ability to mix checked

co de and unchecked libraries for which the source

Presented at AADEBUG LinkopingSweden

may b e proprietary or otherwise unavailable

Works with all common C programming styles

Rigorously rejects violations of the ANSI C stan Bounds checking is not blo cked or weakened by the

dard use of a cast ie type co ercion Casts can prop erly

b e used to change the type of the ob ject to which a

Checks static and stack ob jects as well as ob jects

p ointer refers but cannot b e used to turn a p ointer

dynamically allo cated with malloc

to one ob ject into a p ointer to another A corollary

is that b ounds checking is not type checking it do es

Understands scop e of automatic variables

not prevent storage from b eing declared with one data

Performance including the ability to b e able to

structure and used with another

distribute programs with checks compiled in

More subtly note that for this reason b ounds check

ing in C cannot easily validate use of arrays of structs

There remain some circumstances in which checking is

which contain arrays in turn

incomplete as we describ e later these are fairly un

Casts and unions can b e used to create a p ointer

common in practice The main shortcoming of the im

from an ob ject of any other type in a machinedependent

plementation describ ed in this pap er is that the p erfor

way This cannot b e checked using our technique nor

mance is currently p o or However the approach has

by earlier approaches to b ounds checking since there is

fundamental p erformance advantages over previously

no ob ject for the p ointer to b e derived from

published work Because checked co de interoperates

easily with unchecked co de the p erformance p enalty

is conned to those mo dules where it is needed Fur

The technique and its advantages

thermore there is substantial scop e for optimisation of

lo opinvariant p ointers and p ointers which are induc

In this section we review earlier approaches and explain

tion variables Because the p ointer representation is

the basis for the new approach

unchanged there is no residual overhead once checking

co de is eliminated We return to this issue in Section

Earlier approaches to carrying b ounds

information

Overview of this pap er

The next section reviews the problem of b ounds check Storage object

ing for C and the limitations the language places on

the checking that can b e done In the following section

the new approach is introduced and we explain how unlike earlier schemes our b ounds checking scheme al

base:

lows interoperation with unchecked co de Then we give

pointer:

some details of our implementation and discuss some

their eectiveness Finally we dis

optimisations and limit:

the eectiveness of the scheme in the light of our

cuss Enhanced pointer

exp erience with some large and wellknown C programs

Figure Mo died p ointer representation p ointer

Ob jects b ounds checking in C and its

baseaddressextent triple

limitations

In earlier work in this area b ounds

information is carried with each p ointer at runtime A

ANSI C conveniently allows us to dene an object as

simple approach is to represent each p ointer as a triple

the fundamental unit of memory allo cation Ob jects

the p ointer together with the storage regions base ad

are created by declarations or allo cations such as those

dress and limit or extent Checking is then straight

shown in Table which may b e static automatic ie

forward The larger size of p ointers requires changes in

stackallocated or dynamically allo cated

storage allo cation and the co de generator must b e mo d

Ob jects are stored sequentially in memory and can

ied to copy p ointers correctly The change in p ointer

not overlap Op erations are p ermitted which manipu

size can b e avoided by replacing each p ointer with an

late p ointers within ob jects but p ointer op erations are

index into a table which contains the p ointerbaselimit

not p ermitted to cross b etween two ob jects There is

triple

no ordering dened b etween ob jects and the program

The net eect of b oth metho ds is the same When

mer should never b e allowed to make assumptions ab out

the program at runtime comes to use a p ointer it

how ob jects are arranged in memory

must rst verify that the op eration that is ab out to b e

int a A simple variable

int a An array

struct f g a A single record

struct f g a An array of records

malloc A single unit of memory allo cated with malloc

Table Typical ob jects

p erformed is correct It uses the information ab out the Every valid p ointervalued expression in C derives

base and size of the array or structure b eing p ointed to its result from exactly one original storage ob ject If

to decide if a particular index is legal the result of the p ointer calculation refers to a dierent

ob ject it is invalid

Although it sometimes useful to know where an in

Unchanged p ointer representation

valid p ointer has b een calculated rep orting every in

stance can yield many false alarms We therefore re

The problem with b oth these schemes is that the mo d

place such incorrectlyderived p ointers with a p ointer

ied p ointer representation is not interpreted correctly

value which is always invalid called ILLEGAL Dened

by co de compiled without b ounds checking enabled

as void in our implementation This ensures

This is a problem wherever a p ointer is passed to or

that a b ounds error is rep orted when the p ointer is ac

from an unchecked pro cedure whether as a parameter

tually used

a result or in a global variable It is of course often

p ossible to translate p ointers where necessary called

encapsulation in bcc and rtcc but this is incon

Example p ointers to ob jects

venient and dicult to do reliably eg where a func

tion p ointer may refer either to checked or an unchecked

Because of these diculties in rtcc only op

routine Dead space between objects erating system calls are encapsulated all libraries must

b e recompiled a b cd e f

In this pap er we show that the p ointer representa

tion need not b e changed This avoids the need either

for encapsulation or recompilation The result is im

proved functionality eg to work with mo dules and

libraries provided in binaryonly form and p otentially p2 p3

improved p erformance since welltested mo dules

also p1

can run without checking

Checking p ointer use how the scheme

works

Figure Ob jects arranged in memory

Given these considerations in our metho d p ointers are

Figure shows an example layout for several ob jects

represented as simple addresses as in ordinary C pro

of various sizes p erhaps arising from static allo cations

grams We maintain a table of all known valid stor

or from calls to malloc Supp ose we have p ointers p

age ob jects Using the table we can map a p ointer to

p and p referring to the ob jects or p erhaps to their in

a descriptor of the ob ject into which it p oints which

ternal comp onents their type is immaterial since casts

contains the base extent and additional information to

may have b een used Table shows p ermissible p ointer

improve error rep orting

op erations given the rule that p ointer op erations are

We have to check b oth p ointer arithmetic and p ointer

only p ermitted to take place within an ob ject and not

use Pointer arithmetic must b e checked b ecause the re

b etween ob jects

sult must never b e allowed to refer to an ob ject dierent

from the one from which it is originally derived This

is b ecause the ob ject for which the p ointer is valid can

only b e determined by checking the p ointer itself by

lo oking it up in the ob ject table

p p Permitted Both p ointers are within the same

ob ject

p p Not p ermitted Makes assumptions ab out the

layout of ob jects in memory

Increment p until p p Not p ermitted As so on as p is incremented

b eyond the end of ob ject b a b ounds error will

b e rep orted

Table Permissible op erations on p ointers pp

in Figure

Unfortunately we cannot pad parameters passed to Problem legal outofrange array p oint

functions since this would mean that the parameter

ers

layouts assumed by checked and nonchecked co de would

An awkward complication arises with arrays Consider

b e incompatible This results in a small ambiguity We

the correct co de in Figure

resolved this partially in our implementation by agging

function parameters and treating them sp ecially Essen

f

tially when lo oking up p ointers to parameters we treat

f

a reference to aN as a p ossible p ointer to the next

int p

ob ject in memory If there is an adjacent parameter

int a int mallo c sizeofint

then the p ointer will p oint to the next ob ject

for p a p < a p

This is an instance where checking is incomplete a

p

p ointer to an array passed as a parameter can b e in

return a

cremented to p oint to the later parameters without an

g

error b eing rep orted Using the p ointer to refer to ear

lier parameters or elsewhere will b e trapp ed correctly

Figure Iterating over an array

In practice this solution was satisfactory since al

though it is p ossible to pass actual structures and struc

On exit from the lo op p p oints to a The nal

tures containing arrays as parameters this is very rare

p increments p b eyond the range for which it is valid

and even then most cases can b e caught The infre

although the resulting p ointer is never used Accord

quency of use and the fact that we catch many cases

ing to the denition of p ermissible p ointer op erations

anyway make this p otential lo ophole an extremely mi

ab ove this should b e agged as an error since p may

nor concern

now p oint to a dierent ob ject

The ANSI C standard section lines

states that for an array declared Type aN a pro

Ob jects originating in unchecked co de

grammer may only generate p ointers to elements a

a up to aN The last element do es not literally

When an ob ject is allo cated in checked co de it is en

exist and any attempt to dereference a p ointer to aN

tered in the ob ject table When the resulting p ointer

will result in undened b ehaviour or in our case a

is used in checked co de b ounds checking works fully

b ounds error It is not p ermissible to create a p ointer

If the p ointer is passed to unchecked co de unchecked

to for instance element a of an array and such

accesses can o ccur

programs will not b e p ortable to architectures where

When a p ointer is passed from unchecked to checked

all ob jects are stored in separate segments

co de it may originate either from a checked or unchecked

To overcome this problem we place at least one byte

allo cation note that dynamicallyallocated ob jects are

of dead space b etween ob jects in memory allo cations

always registered since even unchecked co de must call

are often aligned to or byte b oundaries in mem

the checked malloc function

ory so there may b e several bytes b etween adjacent ob

There are two cases

jects A p ointer to aN can now b e distinguished from

The p ointer passed from unchecked to checked co de

a p ointer to the next adjacent ob ject in memory see

p oints into a checked ob ject

the App endix for an example

This may b e correct as it may have b een derived

1

There is a subtle assumption here if the size of the ob ject were

from a p ointer passed to it or it may b e the result

not an integer multiple of the array element size then aN could lie

more than one byte b eyond its limit dep ending on the size of the

insucient space for aN

element type However this case is a b ounds error since there is

f

from a call to the mo died malloc storage allo

f

cator In this case checking will pro ceed normally

int a

It may b e incorrect the p ointer may b e improp

if goto lab el

erly derived from some other ob ject This case is

f

indistinguishable and no error will b e rep orted

int b

The p ointer passed from unchecked to checked co de

p oints into an ob ject which do es not app ear in the

lab el

ob ject table b ecause the space was allo cated in

unchecked co de

if goto lab el

This is detected when the p ointer is used Al

g

though it may b e helpful to issue a warning mes

lab el

sage and to p erform basic sanity checks the pro

gram can pro ceed without false alarms This is

g

b ecause the key check is whether in p ointer arith

metic the result refers to the same ob ject as the

Figure Stack ob jects created and destroyed by goto

p ointer from which it was derived If the original

p ointer is not registered the result should not b e

Accidental use of unchecked p ointers in checked

In our implementation we used the C construc

co de to damage checked ob jects is thereby pre

tordestructor mechanism of our compiler GCC to track

vented

such variables This is fairly common since many C

compilers are built to handle C to o Details lie b e

yond the scop e of this pap er

Maintaining the ob ject table tracking

Parameters are a sp ecial form of stack ob ject Care

creation and deletion of ob jects

must b e taken to ensure that parameters are created

once on entry to the function and deleted on exit even

At runtime we track ob jects as they are created and

if the pro cedure exits with return early on The C

deleted We maintain an ordered list of ob jects in mem

constructordestructor mechanism can handle this to o

ory and employ a fast metho d to convert p ointers to ob

Ordinary stack ob jects must b e padded as describ ed

jects Several suitable structures are available for this

in section Parameters are not padded so that

purp ose We used a splay tree in our implementation

checked and unchecked functions have compatible pa

but other structures such as tries and skiplists might

rameter layouts ANSI C prevents using the return

b e suitable

value of a function as an lvalue immediately Since all

Static ob jects global variables variables declared as

return values are therefore copied into a variable in the

static in functions and string constants p ersist over

calling function there is no need to take sp ecial action

the lifetime of the program A simple mo dication to

checking or padding aggregate function results

the compiler andor the linker can b e made to pro duce

a list of these ob jects As indicated ab ove it is not

necessary to nd ob jects in the unchecked parts of the

Implementation in an existing compiler

co de

Dynamically allo cated ob jects those declared with

We implemented our b ounds checking scheme in the

malloc and destroyed with free can b e tracked by a

GNU C compiler GCC In this section we briey ex

simple mo dication to the C library Although malloc

plain how this was done The resulting program is freely

often introduces padding anyway care is needed with

available from a variety of sources

ob jects allo cated dynamically by other means such as

mmap and sbrk

Checking p ointer op erations

Stack ob jects present greater diculties since the

C goto command may mean that they are created and

We altered GCC to replace p ointer op erations with calls

destroyed at several dierent places in the co de see

to a library of checking functions Typically when the

Figure

programmer writes p i where p has a p ointer type

In this co de fragment b is in scop e b etween the inner

and i is an integer the compiler replaces it with

set of curly brackets The goto label statement has

b ounds check ptr plus intp i sizeofT T

the side eect of creating b and goto label destroys

FILE LINE

it In addition b must b e created and destroyed if and

when control passes the inner curly brackets

Using existing C mechanisms to track T is the type of the p ointer p FILE and LINE

are macros that expand to the current le and line num

stack ob jects

b er and are used to lo cate errors when they o ccur

Ordinary stack ob jects not function parameters are

padded by tricking GCC into b elieving they are one byte

operatoroperand types

larger than they really are A patch to the GCC alloca

p ointer integer array reference

function catches variablesized stack ob jects

p ointer element reference to record eld

In order to deregister stackallocated ob jects on blo ck

p ointer integer yields pointer

exit we used the constructordestruc tor mechanism built

p ointer integer yields pointer

into GCC and designed to handle C ob jects even

p ointer p ointer yields integer

where they may b e created or destroyed by uses of goto

p ointer p ointer comparisons

The co de shown in Figure contains several stack vari

p ointer p ointer

ables in dierent scop es The co de is compiled as if the

p ointer p ointer

user had written the version in Figure

p ointer p ointer

p ointer p ointer

Finding statically allo cated ob jects at

p ointer p ointer

p ointer dereference

compile and link time

p ointer postincrement

We mo died the backend of GCC slightly to construct a

p ointer postdecrement

table of statically allo cated ob jects such as global vari

p ointer preincrement

ables and string constants Each source le compiled

p ointer predecrement

with b ounds checking enabled will contain such a table

and this is automatically loaded at runtime b efore the

Table Op erators requiring checking

program starts running The design of GCC enabled

this to b e done in a straightforward manner

Table shows the op erators where checking co de has

It is desirable to track down ob jects declared in unchecked

to b e added Note that we must check p ointer arith

co de to o although not strictly necessary as describ ed

metic as well as p ointer use We also check p ointer

earlier A simple to ol was written that takes a library

comparisons and subtractions since the result is valid

archive or ob ject le and writes out a table of static ob

only if the op erands refer to the same aggregate

jects contained therein This table can then b e linked

In order to handle comp ound op erators correctly and

to the program

eciently we sp ecically detect and replace the follow

Static ob jects are padded by asking the linker to

ing patterns

allo cate one extra byte after each ob ject

pointer is replaced with pointer

Minimal mo dications to malloc and

pointerinteger is replaced with pointer integer

free

pointer element is replaced with pointer o

We mo died the GNU malloc library to register dynam

setofelement

ically allo cated ob jects as they are created and dereg

As describ ed ab ove certain p ointer op erations silently

ister them as they are freed A single extra byte of

return the sp ecial representation ILLEGAL Dened as

padding is added to each ob ject when it is allo cated

void in our implementation when they fail This

The new library is linked automatically and replaces

allows programmers to make illegal p ointers and only

all calls to the previous malloc family of functions

have them caught later if the programmer attempts to

dereference them For instance in an array declared

Mo dications to C library functions

int a attempting to generate a results in an

ILLEGAL p ointer which is caught when used later All

Unlike many other C compilers GCC usually works with

p ointer op erations catch ILLEGAL p ointers passed and

the systeminstalled C library on whatever op erating

throw b ounds errors

system it runs In most instances the source to these

libraries is not freely available so users will b e forced to

run them without b ounds checking This implies that a

call to a function such as strcpy passing a bad p ointer

will not result in a b ounds error but in a segmentation

fault or in random damage to memory

int sum int n int a

To detect such errors we replaced many C library

f

functions with ecient b oundschecked versions Calls

int i s

to the ANSI str and mem functions are checked in

for i i < n i

this way The implementations of memcpy and strcpy

s ai

also check for illegal copying of overlapping memory seg

return s

ments

g

Splay trees to lo ok up p ointers quickly

Figure Vector sum example with stack ob jects

In order to reduce the overhead of converting p ointers

to ob jects on the o ccasions when that is necessary we

int sum int n int a

store the ob ject list as a splay tree Splay trees

f

are binary trees where frequently used no des migrate

b ounds push function enters a function context A

towards the top of the tree In tests it was found that

matching call to b ounds p op function will

the lo okup function was iterated on average times

delete parameters

p er call on a typical large program We unrolled the rst

two iterations of the lo op to optimise these cases

b ounds push function sum

b ounds add parameter object n sizeof int

b ounds add parameter object a sizeof int

Performance and optimisations

Extra scop e created around the function GCC will

For the b ounds checking scheme outlined ab ove to b e

b ounds p op function when leaving this call

useful careful consideration must b e given to optimis

scop e

ing the co de pro duced In particular it is p ossible to

reduce the number of accesses to the splay tree that

f

are required quite considerably In the next few para

Declare stack objects and use GCCs destructor

graphs we describ e some simple optimisations we have

b ounds delete stack object is mechanism to ensure

implemented some further optimisations which should

called for each variable however we leave scop e

b e straightforward to add and we briey discuss the

even if we leave with goto

problematic cases which remain

int i

b ounds add stack object i sizeof int

Eliminating calls to register unused vari

int s

ables

b ounds add stack object s sizeof int

If the programmer never takes the address of a stack

for i i < n i

variable then no p ointer can ever b e generated that

s int

refers to that variable and so it is unnecessary even

b ounds check array reference a i

to consider that variable for b ounds checking purp oses

sizeof int

This is extremely eective as addressable lo cal vari

b ounds delete stack object s

ables are rare in typical programs

b ounds delete stack object i

g

Eliminating lo okups in lo ops over ar

end

b ounds p op function sum Delete a n

rays

return s

For further signicant gains in p erformance we sug

g

gest a simple scheme for optimising lo ops over arrays

using co de motion Consider the fragment of co de in

Figure Vector sum example with stack ob ject man

Figure after b ounds checking co de has b een added in

agement using the C constructordestructor mech

a simpleminded way In Figure we have made the

anism

p ointertoob ject conversion explicit by inlining part of

the pro cedure call

Evaluation

int a i

for i i < i

We have used the mo died compiler to recompile a wide

This is the co de substitution for ai i

variety of applications software In this section we re

int

view our exp erience with reference to some substan

b ounds check array referencea i sizeofint i

tial and freelyavailable C programs We comment on

the problems we encountered the eectiveness of the

scheme in nding errors and the p erformance of the re

Figure Co de after simpleminded substitution of a

sulting co de with b ounds checking enabled for the entire

checking function

program excluding libraries

We compiled the scripting and GUI language TclTk

int a i

in its entirety around lines of co de We made

for i i < i

changes to the source co de see table

f

object obj b ounds nd object a

if obj obj>base < ai

no of instances

ai < obj>extent

ai i

else

Contravening ANSI standard by

throw a b ounds error and exit

p ointing to negative array osets

g

Fixing p ointer nasties such as

adding osets to NULL p ointers

Figure Co de after partially inlining the checking

Using p ointers that refer to ob jects

function

freed in a realloc

Changes to supp ort goto restric

Clearly the call to do the p ointertoob ject conver

tion caused by using C con

bounds find object should b e moved outside sion

structor and destructor mecha

the lo op in the co de motion phase of the optimiser

nism

An ecient compiler would then b e able to remove

the b ounds checking tests objbase ai and

ai objextent entirely and replace them with

Table Changes made to the source of TclTk

two tests outside the lo op

This may b e done if there is a way to sp ecify that

The resulting interpreter ran all the Tk demos cor

the call is a constant function ie has the same return

rectly although noticably more slowly than without

value when called multiple times provided that ob jects

checking The interactive scripts were still quite usable

are not added or deleted in b etween calls GCC do es not

and resp onsive but the authors would not recommend

provide a way to encapsulate this subtlety and so our

using b ounds checking in pro duction co de until the fur

implementation do es not yet make this optimisation

ther optimisations suggested ab ove have b een made

TM

Lo ops which iterate through arrays using p ointers

We also compiled Ghostscript a freeware PostScript

instead of incrementing an array subscript as ab ove

interpreter We needed to x the nonANSI imple

are more dicult bounds find object will b e ap

mentation of stacks that Ghostscript uses it initial

plied to the p ointer which is not lo op invariant Here

izes p ointers to the element of each stack but the

a more sp ecialised optimisation for induction variables

changes involved were relatively minor and the pro

should help

gram ran without error Again there was a noticable

slowdown when drawing complex graphical images but

the program was by no means unusable

Diculties optimising lo ops over linked

GCC itself compiles with the b ounds checking patches

structures

Unfortunately GCC makes extensive use of obstacks

which are large singlyallo cated areas of memory that

Lo ops over linked lists tree structures and the like pro

may contain many variablesized ob jects Since the

vide a greater challenge We were not able to devise an

b ounds checking library treats these areas of memory

ecient metho d of optimising lo ops that traverse linked

as single ob jects simple b ounds errors b etween the el

data structures although the splay tree we used to im

ementary ob jects contained inside are not detected In

plement the ob ject table will tend to cache frequently

hindsight we should have mo died GCCs obstack li

used ob jects like the elements in the list near the top

brary very slightly to interact correctly with the b ounds enabled is substantial but in many cases this can b e

checking library by allo cating and deleting the simple alleviated by optimisation and this is the most press

ob jects explicitly ing direction for further enhancements The technique

MicroEMACS a simple text editor that has b een has b een applied to a wide variety of C programs with

p orted and used widely actually has b ounds errors which generally go o d results

this program picked up immediately

Although it is p ossible to construct programs that

Acknowledgements We would like to acknowledge

p erform very badly indeed when b ounds checking is

the helpful comments of our anonymous referees

added such as programs that solely iterate over long

linked lists doing almost no work at each no de real

References

programs are for the most part quite usable Never

theless a go o d implementation of this technique must

American National Standard for Information Sys

consider optimisation issues very carefully It is un

tems C Technical Rep ort

likely that we could ever achieve the p erfor

ANSI X ANSI Inc New York USA

mance loss that would b e acceptable if programs are

to b e distributed with b ounds checks compiled in In

practice most programs showed a times slowdown

LO Andersen Program Analysis and Specializa

which is comparable to other commercial b ounds check

tion for the C Programming Language PhD thesis

ing packages

DIKU University of Cop enhagen Denmark

Fischer and Patil provide interesting evi

DIKU Research Rep ort

dence for the practical imp ortance of checking p ointers

DClark Splay trees Dr Dobbs Journal page

are used to refer only to the intended referent compared

December

with the checking provided by to ols such as Purify

DDSleator and RETarjan Selfadjusting binary

search trees Journal of the ACM

Further work

We plan to investigate optimisation techniques further

DWFlater YYesha and EKPark Extensions

and when we have done so we will present b enchmark

to the C programming language for enhanced fault

p erformance comparisons While we hop e to achieve

detection Software Practice and Experience

fairly go o d p erformance using conventional data ow

June

analysis as describ ed earlier there is also scop e for inter

pro cedural optimisation and ultimately it may b e p os

R Hastings and B Joyce Purify fast detection

sible to validate nontrivial examples at compiletime

of memory leaks and access errors In Proceedings

using for example partial evaluation

of the Winter USENIX Conference pages

The range query lo okup on which checking is based

is critical to p erformance and there is scop e for exp er

JLSteen Adding runtime checking to the

imental work to tune our splay tree approach and to

p ortable C compiler Software Practice and Ex

study alternatives

perience

There remain some lo opholes in our checker The

most serious in practice is that it is p ossible to manu

MVZelkowitz PR McMullin KRMerkel and

facture erroneous p ointers using unions casts and uni

HJLarsen Error checking with p ointer variables

tialised data At considerable p erformance cost we

In Proceedings of the ACM National Confer

could maintain a shadow of the accessible store indi

ence ACM New York USA

cating whether it has b een initialised and whether it is

John Ousterhout Tcl and the Tk Toolkit Addison

a p ointer There is scop e for optimisation and doing so

Wesley

would b e a substantial pro ject

Harish Patil and Charles Fischer Lowcost con

current checking of p ointer and array accesses in

Summary and conclusions

C programs In nd International Workshop on

We have shown how b ounds checking can b e provided

Automated and Algorithmic Debugging AADE

in a convenient form with recompilation conned to

BUG St Malo France May

the les where problems are susp ected The execution

time p enalty for co de compiled with b ounds checking

Harish Patil and Charles Fischer Lowcost con

current checking of p ointer and array accesses in C

programs Software Practice and Experience

RWMJones Bounds checking patches for the

GNU C compiler Available via the world

wide web from httpwwwdsedocicacuk

rjboundscheckinghtml and via anonymous

ftp from

ftpdsedocicacukpubmiscbcc

SCKendall Bcc runtime checking for C pro

grams In USENIX Toronto Summer Con

ference Proceedings USENIX Asso ciation El Cer

rito California USA

SKaufer RLop ez and SPratap Sab erC an

interpreterbased programming environment for

the C language In USENIX San Francisco

Summer Conference Proceedings USENIX Asso ci

ation El Cerrito California USA

App endix Examples

This app endix presents a number of small examples which illustrate the techniques p ower and limitations

Basic example illustrating simple b ounds checking

include

void main f

char Afg

char Bfabcdef ghig

char p A

while

putcharp

g

Output from the b oundschecking runtime system

ShowItWorkscBounds error attempt to reference memory overrunning the end of an object

ShowItWorksc Pointer value xeae

ShowItWorksc Object A

ShowItWorksc Address in memory xead xeae

ShowItWorksc Size bytes

ShowItWorksc Element size bytes

ShowItWorksc Numb er of elements

ShowItWorksc Created at ShowItWorksc line

ShowItWorksc Storage class stack

Simple example showing AN is a valid p ointer

A p ointer is allowed to refer to the byte after the object from which it

is derived The array is padded by one byte if necessary so that this

is distinguishable from an illegal op eration

include

main

f

int a p

Initialize array a to

for p a p < a p

p

Now p p oints to a which is a valid address but if we

try to use it well get an error

p OK > sets a to

p Bounds error > tries to set a to

g

Output from the b oundschecking runtime system

OneBeyondArrayBoundscBounds error attempt to reference memory overrunning the end of an object

OneBeyondArrayBoundsc Pointer value xeae

OneBeyondArrayBoundsc Object a

OneBeyondArrayBoundsc Address in memory xeab xeadf

OneBeyondArrayBoundsc Size bytes

OneBeyondArrayBoundsc Element size bytes

OneBeyondArrayBoundsc Numb er of elements

OneBeyondArrayBoundsc Created at OneBeyondArrayBoundsc line

OneBeyondArrayBoundsc Storage class stack

Checking of outofrange automatics

include

char G

void f f

char Afg

G A

g

void main f

f

putcharG

g

In this example the global variable G is used to capture a p ointer into a stackallocated array The p ointer is invalid

after the function f has returned Output from the b oundschecking runtime system

OutOfRangeAutomaticscBounds warning unchecked stack object used at address xbef

Arrays within structures are not checked

struct f

int obj

int obj

g s

main f

int i

for i i < i

sobji i no b ounds error reference is within allo cation object

g

This example illustrates a limitation on b ounds checking as we have dened it The variable s consists of a single

storage ob ject and the b ounds checking do es not verify that its use is consistent with the type declaration To do so

would considerably add to the systems complexity but more imp ortantly would lead to false rep orts in situations

where casts are used quite legitimately

Arrays within arrays are not checked

Abuse of subarrays of a multidimensional array cannot b e checked

int i

double a

main f

for i i < i

ai i No b ounds error reference is within allo cation object

g

As in the previous example the array a consists of a single ob ject and b ounds errors are rep orted only when a

reference outside the whole array is derived

Pointer to unchecked ob ject passed to checked co de

In le uncheckedc

fn void int unchecked

f

static int a

return a

g

In le checkedc

extern int unchecked fn void

int main

f

int a unchecked fn i

for i i < i

ai i No b ounds error

g

When the variable a is used it is found to have no corresp onding ob ject table entry Although a warning can b e

issued here it is not necessarily an error since the p ointer may have b een imp orted from an unchecked mo dule Note

that this problem can b e overcome by adding the ob ject to the ob ject tree by hand using

bounds note constructed object

Correct interoperation with nontrivial system calls

Example to show interworking with system calls etc under SunOS

Allo cate a page region set VM protection to disallow access install a handler to

catch the resulting faults reenable access and continue Lo op runs over end of region

include

include

include

char region

int pagesize

void SEGVHandlersig co de scp addr

int sig co de struct sigcontext scp char addr

f

Reinstate the page in question

char pagebase char intaddr pagesize pagesize

READ j PROT WRITE mprotectpagebase pagesize PROT

Now we should return and restart the faulting instruction

g

void main f

char p

signalSIGSEGV SEGVHandler

pagesize getpagesize

region vallo cpagesize

mprotectregion pagesize PROT NONE

for p region p<regionpagesize ppagesize

p p

g

Output from the b oundschecking runtime system

SignalscBounds error attempt to reference memory overrunning the end of an object

Signalsc Pointer value x

Signalsc Object unnamed

Signalsc Address in memory x xf

Signalsc Size bytes

Signalsc Element size bytes

Signalsc Numb er of elements

Signalsc Storage class heap

This example is intended to demonstrate that b ounds checking can b e used even in quite sophisticated contexts

with subtle interoperation with the op erating system Although the actions of the system calls themselves are not

checked of course they could b e the fault address addr passed to the signal handler is checkable with no sp ecial

arrangement