Software Reliability via Run-Time Result-Checking

Hal Wasserman

University of California, Berkeley

and

Manuel Blum

City University of Hong Kong and

University of California, Berkeley

We review the field of result-checking, discussing simple checkers and self-correctors. We argue that such checkers could profitably be incorporated in software as an aid to efficient debugging and enhanced reliability. We consider how to modify traditional checking methodologies to make them more appropriate for use in real-time, real-number computer systems. In particular, we suggest that checkers should be allowed to use stored randomness: i.e., that they should be allowed to generate, preprocess, and store random bits prior to run-time, and then to use this repeatedly in a series of run-time checks. In a case study of checking a general real-number linear transformation (for example, a Fourier Transform), we present a simple checker which uses stored randomness, and a self-corrector which is particularly efficient if stored randomness is employed.

Categories and Subject Descriptors: D.2.5 [Software Engineering]: Testing and Debugging; F.2.1 [Analysis of Algorithms]: Numerical Algorithms--Computation of transforms; F.3.1 [Logics and Meanings of Programs]: Specifying and Verifying and Reasoning about Programs

General Terms: Reliability, Algorithms, Verification

Additional Key Words and Phrases: Built-in testing, concurrent error detection, debugging, fault tolerance, Fourier Transform, result-checking, self-correcting

This work was supported in part by a MICRO grant from Hughes Aircraft Company and the State of California, by NSF Grant CCR92-01092, and by NDSEG Fellowship DAAH04-93-G-0267.

A preliminary version of this paper appears as: "Program result-checking: a theory of testing meets a test of theory," Proc. 35th IEEE Symp. Foundations of Computer Science (1994), pp. 382-392.

Name: Hal Wasserman
Affiliation: Computer Science Division, University of California at Berkeley
Address: Berkeley, CA 94720, [email protected], http.cs.berkeley.edu/halw

Name: Manuel Blum
Affiliation: Computer Science Department, City University of Hong Kong
Address: 83 Tat Chee Ave., Kowloon Tong, Kowloon, Hong Kong, [email protected]
Affiliation: Computer Science Division, University of California at Berkeley
Address: Berkeley, CA 94720, [email protected], http.cs.berkeley.edu/blum

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or direct commercial advantage and that copies show this notice on the first page or initial screen of a display along with the full citation. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, to republish, to post on servers, to redistribute to lists, or to use any component of this work in other works, requires prior specific permission and/or a fee. Permissions may be requested from Publications Dept, ACM Inc., 1515 Broadway, New York, NY 10036 USA, fax +1 (212) 869-0481, or [email protected].


1. RESULT-CHECKING AND ITS APPLICATIONS

1.1 Assuring Software Reliability

Methodologies for assuring software reliability form an important part of the technology of programming. Yet the problem of efficiently identifying software bugs remains a difficult one, and one to which no perfect solution is likely to be found.

Software is generally debugged via testing suites: one runs a program on a variety of carefully selected inputs, identifying a bug whenever the program fails to perform correctly. This approach leaves two important questions incompletely answered.

First, how do we know whether or not the program's performance is correct? Generally, some sort of an oracle is used here: our program's output may be compared to the output of an older, slower program believed to be more reliable, or the programmers may subject the output to a painstaking (and likely subjective and incomplete) examination by eye.

Second, given that a test suite feeds a program only selected inputs out of the often enormous space of all possibilities, how do we assure that every bug in the code will be evidenced? Indeed, particular combinations of circumstances leading to a failure may well go untested. Furthermore, if the testing suite fails to accurately simulate the input distribution which the program will encounter in its lifetime, a supposedly debugged program may in fact fail quite frequently.

One alternative to testing is formal verification, a methodology in which mathematical claims about the behavior of a program are stated and proved. Thus it is possible to demonstrate once and for all that the program must behave correctly on all possible inputs. Unfortunately, constructing such proofs for even simple programs has proved unexpectedly difficult. Moreover, many programmers would likely find it unpleasant to have to formalize their expectations of program behavior into mathematical theorems.

Another alternative is fault tolerance. According to this methodology, the reliability of critical software may be enhanced by having several groups of programmers create separate versions. At run-time, all of the versions are executed, and their outputs are compared. The gross inefficiency of this approach is evident, in terms of programming manpower as well as either increased run-time or additional parallel-hardware requirements. Moreover, the method fails if common misconceptions among the several development groups result in corresponding errors [Butler and Finelli 1993].

We find more convincing inspiration in the field of communications, where error-detecting and error-correcting codes allow for the identification of arbitrary run-time transmission errors. Such codes are pragmatic and are backed by mathematical guarantees of reliability.

We wish to provide such run-time error-identification in a more general computational context. This motivates us to review the field of result-checking.

1.2 Simple Checkers

It is a matter of theoretical curiosity that, for certain computations, the time required to carry out the computation is asymptotically greater than the time required, given a tentative answer, to determine whether or not the answer is correct.


As an easy example, consider the following task: given as input a composite integer c, output any non-trivial factor d of c. Carrying out this computation is currently believed to be difficult, and yet, given I/O pair ⟨c, d⟩, it takes just one division to determine whether or not d is a correct output on input c.
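The factoring check described above can be sketched in a few lines; the function name and interface below are illustrative assumptions, not from the paper:

```python
def check_factor(c, d):
    """Simple checker for the factoring task: given I/O pair (c, d),
    accept iff d is a non-trivial factor of the composite input c.
    A single divisibility test suffices -- far cheaper than factoring."""
    return 1 < d < c and c % d == 0
```

Note that the checker runs in time polynomial in the length of c, while no comparably fast factoring algorithm is known.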

These ideas have been formalized into the concept of a simple checker [Blum 1988]. Let f be a function with smallest possible computation time T(n), where n is input length (or, if a strong lower bound cannot be determined, we informally set T(n) equal to the smallest known computation time for f). Then a simple checker for f is an algorithm (generally randomized) with I/O specifications as follows:

—Input: I/O pair ⟨x, y⟩.

—Correct output: If y = f(x), accept; otherwise, reject.

—Reliability: For all ⟨x, y⟩: on input ⟨x, y⟩ the checker must return correct output with probability (over internal randomization) ≥ p_c, for p_c a constant close to 1.

—"Little-o rule": The checker is limited to time o(T(n)).

The "little-o rule" is important in that it requires a simple checker to be efficient, and also in that it forces the checker to determine whether or not y = f(x) by means other than simply recomputing f(x); hence the checker must be in some manner different from the program which it checks. We will see in Section 1.5 that this gives rise to certain hopes that checkers are indeed "independent" of the programs they check, and so may more reliably identify program errors.[1]

For an example of simple checking, consider a sorting task: input ~x = (x_1, ..., x_n) is an array of integers; output ~y = (y_1, ..., y_n) should be ~x sorted in increasing order. Completing this task, at least via a general comparison-based sort, is known to require time Ω(n log n). Thus, if we limit our checker to time O(n), this will suffice to satisfy the little-o rule.

Given ⟨~x, ~y⟩, to check correctness we must first verify that the elements of ~y are in increasing order. This may easily be done in time O(n). But we must also check that ~y is a permutation of ~x. It might be convenient here to modify our sorter's I/O specifications, requiring that each element of ~y have attached a pointer to its original position in ~x. Any natural sorting program could easily be altered to maintain such pointers, and once they are available we can readily complete our check in time O(n).

But what if ~y cannot be augmented with such pointers? Similarly, what if ~x and ~y are only available on-line from sequential storage, so that O(1)-time pointer-dereferencing is not possible? Then we may still employ a randomized method due to [Wegman and Carter 1981; Blum 1988], which requires only one pass through each of ~x and ~y.

[1] Variants of the little-o rule have also been considered: for example, requiring checkers to use less space than the programs they check, or to be implementable with smaller-depth circuits. For example, a rather trivial result is that, if the successive configurations of an arbitrary Turing Machine computation are available, then the computation may be checked by a very shallow circuit. Unfortunately, it is only the mechanical operation of the Turing Machine which is being checked here, not the correctness of the Turing Machine's program. Hence we return to the traditional little-o rule, which has generally proved to be the best route to creative, non-trivial checkers.


We randomly select a deterministic hash function h from a suitably defined set of possibilities,[2] and we compare h(x_1) + ... + h(x_n) with h(y_1) + ... + h(y_n). If ~y is indeed a permutation of ~x, the two values must be equal; conversely, it is readily proven that, if ~y is not a permutation of ~x, the two values will differ with probability ≥ 1/2. Thus, if we accept only ⟨~x, ~y⟩ pairs which pass t such trials, then our checker has probability of error ≤ (1/2)^t, which may be made arbitrarily small.
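The sorting check just described can be sketched as follows. The hash family actually required is unusual (see [Blum 1988]); as an illustrative stand-in, this sketch fingerprints each multiset with a random-point evaluation of the polynomial prod(r - x_i) mod a large prime, a standard one-pass technique with comparable error guarantees. All names and parameters here are assumptions for illustration:

```python
import random

def passes_permutation_check(xs, ys, p=(1 << 61) - 1):
    """One randomized trial of a multiset-equality check ("is ys a
    permutation of xs?"), in one pass over each array.

    Pick a uniform-random point r mod the prime p and compare
    prod(r - x_i) with prod(r - y_i) (mod p).  If ys is a permutation
    of xs the two products always agree; otherwise (for integer entries
    smaller than p) they differ with probability >= 1 - n/p over the
    choice of r."""
    if len(xs) != len(ys):
        return False
    r = random.randrange(p)
    fx = fy = 1
    for x, y in zip(xs, ys):
        fx = fx * ((r - x) % p) % p
        fy = fy * ((r - y) % p) % p
    return fx == fy

def check_sorted_output(xs, ys, trials=20):
    """Simple checker for sorting: verify that ys is in increasing order
    and (with high probability) a permutation of xs, in time O(n)."""
    in_order = all(ys[i] <= ys[i + 1] for i in range(len(ys) - 1))
    return in_order and all(passes_permutation_check(xs, ys)
                            for _ in range(trials))
```

Both passes touch each array element once, so the checker respects the O(n) budget of the little-o rule.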

A final note: our definition of simple checking is not new, but has perhaps not been clearly expressed in the checking literature. In particular, simple checking has been defined only implicitly, as the most efficient case of complex checking (Section 1.3). Here our particular interest in very fast checking has led us to bring out this distinction.

1.3 Self-Correctors

Again let f be a function with smallest (known) computation time T(n). Let D be a well-defined probability distribution on inputs to f, and let P be a program such that, if x is chosen D-randomly, P(x) equals f(x) with probability of error limited to a small value p. In the current paper, D will always be the uniform-random distribution; so we are requiring simply that P computes f correctly on most inputs.

That P indeed has this property may be determined (with high probability) by testing P on ≈ 1/p D-random inputs and using some sort of an oracle, or a simple checker, to determine the correctness of each output. We envision this testing stage as being completed prior to run-time. Another possibility is the use of a self-tester [Blum et al. 1993], which can give such assurances efficiently, at run-time, and with little reliance on any outside oracle.

A self-corrector for f [Blum et al. 1993; Lipton 1991] is then an algorithm (generally randomized) with I/O specifications as follows:

—Input: x (an input to f), along with P, a program known to compute f with probability of error on D-random inputs limited to a small value p. The corrector is allowed to call P repeatedly, using it as a subroutine.

—Correct output: f(x).

—Reliability: For all ⟨x, P⟩: on input ⟨x, P⟩ the corrector must return correct output with probability (over internal randomization) ≥ p_c, for p_c a constant close to 1.

—Time-bound: The corrector's time-bound, including subroutine calls to P, must be limited to a constant multiple of P's time-bound; the corrector's time-bound, counting each subroutine call to P as just one step, must be o(T(n)).

As an example [Blum et al. 1993], consider a task of multiplication over a finite field: input is w, x ∈ F; output is product wx ∈ F. We assume addition over F to be quicker and more reliable than multiplication.

[2] A rather unusual set of hash functions is required; moreover, the time required to calculate h, and hence to complete a check, is debatable. Refer to [Blum 1988] for details.

Assume we know from testing that program P computes multiplication correctly for all but a 1/100 fraction of possible ⟨w, x⟩ inputs. (Note that this knowledge is in itself only a very weak assurance of the reliability of P, as that 1/100 fraction of "difficult inputs" might well appear far more than 1/100 of the time in the life of the program.) Then a self-corrector may be specified as follows:

Algorithm 1. On input ⟨w, x⟩,

—Generate uniform-random r_1, r_2 ∈ F.

—Call P four times to calculate P(w − r_1, x − r_2), P(w − r_1, r_2), P(r_1, x − r_2), and P(r_1, r_2).

—Return, as our corrected value for wx,

    y_c := P(w − r_1, x − r_2) + P(w − r_1, r_2) + P(r_1, x − r_2) + P(r_1, r_2).

Why does this work? Note that each of the four calls to P is on a pair of inputs uniform-randomly distributed over F^2, and so will return the correct answer with probability of error at most 1/100. Thus, all four return values are likely to be correct: the probability of error is at most 4/100. And if all the return values are correct, then

    y_c = P(w − r_1, x − r_2) + P(w − r_1, r_2) + P(r_1, x − r_2) + P(r_1, r_2)
        = (w − r_1)(x − r_2) + (w − r_1)r_2 + r_1(x − r_2) + r_1 r_2
        = [(w − r_1) + r_1][(x − r_2) + r_2]
        = wx.

Note that corrected output y_c is superior to unmodified output P(w, x) in that there are no "difficult inputs": for any ⟨w, x⟩, each time we compute a corrected value y_c for wx, the chance of error (over the choice of r_1, r_2) is ≤ 4/100. Moreover, observe that, if we compute several corrected outputs y_c^(1), ..., y_c^(t) (using new random values of r_1, r_2 for each y_c^(i)) and pick the majority answer as our final output, we thereby make the chance of error arbitrarily small. The price we pay for this enhanced reliability is a multiplication of P's usual run-time by a factor of 4t. This performance loss will be further considered below.
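Algorithm 1, combined with the majority vote just described, can be sketched as follows. The concrete field F = Z/10007 and the untrusted-program interface `prog(a, b)` are assumptions for this sketch, not details fixed by the paper:

```python
import random
from collections import Counter

# Assumption for this sketch: F is the field of integers mod a small prime.
P_MOD = 10_007

def correct_product(prog, w, x, trials=5):
    """Self-corrector for multiplication over F (Algorithm 1 plus a
    majority vote).

    `prog(a, b)` is an untrusted program claimed to compute a*b in F
    correctly on most uniform-random input pairs.  Each trial calls prog
    on four uniform-random pairs and combines the answers using only
    (assumed reliable) field additions; the majority answer over the
    independent trials is returned."""
    votes = Counter()
    for _ in range(trials):
        r1 = random.randrange(P_MOD)
        r2 = random.randrange(P_MOD)
        y_c = (prog((w - r1) % P_MOD, (x - r2) % P_MOD)
               + prog((w - r1) % P_MOD, r2)
               + prog(r1, (x - r2) % P_MOD)
               + prog(r1, r2)) % P_MOD
        votes[y_c] += 1
    return votes.most_common(1)[0][0]
```

Even if prog fails on some fixed "difficult" pair, the corrector almost never queries that pair, since all four calls are at uniform-random locations.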

A final note on self-correcting. Above we speak of needing an assurance that P is correct on all but a small fraction of inputs. Such language implies an assumption that P's behavior is fixed (except for possible randomization) prior to the self-correcting process: i.e., that P is not an adversary which can adapt in response to the corrector. This assumption is necessary, as an adaptable adversary could easily fool a corrector such as the above.[3] A checker, in contrast, is required to be secure against even an adaptable adversary.

One may also define a complex checker [Blum 1988], which outputs an accept/reject correctness-check similar to that of a simple checker, but which, like a self-corrector, is allowed to poll P at several locations (and so, like a self-corrector, tends to be less efficient than a simple checker). In general, a complex checker seeks to determine the correctness of P(x) by comparing P(x) to several of P's other outputs. More formally, a complex checker for f is defined as follows:

[3] [Gemmell et al. 1991] however demonstrates that self-testing/correcting may be possible even when a program has limited adversarial power.


—Input: x (an input to f), along with P, a program which may compute f on all, some, or no inputs. The checker is allowed to call P repeatedly, using it as a subroutine.

—Correct output: If P is correct both on x and on all the other outputs sampled by the checker, must accept. If P(x) is incorrect, must reject.

—Reliability: For all ⟨x, P⟩: on input ⟨x, P⟩ the checker must return correct output with probability (over internal randomization) ≥ p_c, for p_c a constant close to 1.

—Time-bound: The checker's time-bound, including subroutine calls to P, must be limited to a constant multiple of P's time-bound; the checker's time-bound, counting each subroutine call to P as just one step, must be o(T(n)) (where T(n) is again the smallest possible time to compute f).

Observe a necessary weakness of this definition: in the case that P(x) is correct but at least one of P's other outputs sampled by the checker is incorrect, the checker may either accept or reject. This weakness is necessary (if we are to allow algorithms which go beyond simple checking) because, in the case that P returns only junk answers, we cannot expect the checker to figure out whether or not P(x) is correct. In spite of this weakness, the complex checker tells us what we need to know: if the checker accepts, we know that P(x) is correct; if the checker rejects, we know that P failed on at least one of a small, known set of inputs.

An example complex checker (for graph isomorphism) is described in Section 1.7. For more information on result-checking, refer to the annotated bibliography.

1.4 Debugging via Checkers

We will argue here that embedding checkers in software products may be a substantial aid in debugging those products and assuring their reliability. It is our hope that result-checking may form the basis for a debugging methodology more rigorous than the testing suite and more pragmatic than verification.

Throughout the process of software testing, embedded checkers may be used to identify incorrect output. They will thereby serve as an efficient alternative to a conventional testing oracle. Furthermore, even after software is put into use, leaving the checkers in the code will allow for the effective detection of lingering bugs. We envision that, each time a checker finds a bug, an informative output will be written to a file; this file will then be periodically reviewed by software-maintenance engineers. Another possibility is an immediate warning message, so that the user of a critical system will not be duped into accepting erroneous output.

While it could be argued that buggy output would be noticed by the user in any case, an automatic system for identifying errors should in fact reduce the probability that bugs will be ignored or forgotten. Moreover, a checker can catch an error in a program component whose effects may not be readily apparent to the user. Thus a checker might well identify a bug in a critical system before it goes on to cause a catastrophic failure. Moreover, checkers may trigger automatic self-correction: potentially a method for the creation of extremely reliable systems.

By writing a checker, one group of programmers may assure themselves that another group's code is not undermining theirs by passing along erroneous output. Similarly, the software engineer who creates the specifications for a program component can write a checker to verify that the programmers have truly understood and provided the functionality he required. Since checkers depend only on the I/O specifications of a computational task, one may check even a program component whose internals are not available for examination, such as a library of utilities running on a remote machine. Thus checkers facilitate the discovery of those misunderstandings at the "seams" of programs which so often underlie a software failure. Evidently, all of this applies well to object-oriented paradigms.

Unlike verification, checking reveals incorrect output originating in any cause, whether due to software bugs, hardware bugs, or transient run-time errors. Moreover, since a checker concerns itself only with the I/O specifications of a computational task, the introduction of a potentially unreliable new algorithm for solving an old task may not require substantial change to the associated checker. And so, as a program is modified throughout its lifetime, often without an adequate repeat of the testing process, its checkers will continue to guard against errors.

1.5 Checker-Based Debugging Concerns

—Performance loss: Simple checkers are inherently efficient: due to the little-o rule, a simple checker should not significantly slow the program P that it checks. Self-correctors, on the other hand, require multiple calls to P, and so multiply execution-time by a small constant (or, alternatively, require additional parallel hardware). In a real-time computer system, such a slowdown may not be acceptable.

Hence it is desirable to check a given program by implementing both a simple checker and a self-corrector. The simple checker may then be run on each output, while only in the (hopefully rare) case that the output is incorrect need the self-corrector be run as well; hence the self-corrector is kept off the common-case path. Alternatively, in Section 1.6 we will consider a change to our definitions which will allow for certain self-correcting algorithms to run at real-time speeds.

—Real-number issues (see [Gemmell et al. 1991; Ar et al. 1993]): Traditional result-checking algorithms, such as the example self-corrector from Section 1.3, often rely on the orderly properties of finite fields. In many programming tasks, we are more likely to encounter real numbers, i.e., approximate reals represented by a fixed number of bits.

Such numbers will be limited to a legal subdomain of ℝ. In Section 2.4, we will see how to modify a traditional self-correcting methodology to account for this. Moreover, limited precision will likely result in round-off errors within P. We must then determine how much arithmetic error may allowably occur within P, and must check correctness only up to the limit of this error-delta. In Section 2, we will see that limited-precision calculations, while a substantial challenge, are by no means fatal to our checking project.

—Testing checkers: A checker, like any software component, must be carefully debugged. Otherwise, a degenerate checker C may indiscriminately report program P to be correct, thereby giving a false sense of assurance.

The problem of constructing a suitable testing suite for a checker is non-trivial. Evidently testing a checker on correct program output is insufficient: we must also test the checker's ability to recognize bugs, and not just flagrant, random bugs, but various and subtle ones. One possibility would be to employ a mutation model of software error: i.e., to test C on the output from mutated variants of P.


—Buggy checkers: What if we nevertheless fail to debug our checker? Indeed, it may be objected that result-checking begs the question of software reliability: for checking may fail due to the checker's code being as buggy as the program's. To this objection we respond with three arguments: arguments which are, however, heuristic rather than rigorous.

First, checkers can often be simpler than the programs they check, and so, presumably, less likely to have bugs. For example, modern computers may use intricate, arithmetically unstable algorithms to compute matrix multiplication in time, say, O(n^2.38), rather than the standard O(n^3). And yet Freivalds's O(n^2)-time checker for matrix multiplication [Freivalds 1979] uses only the simplest of multiplication algorithms, and so is seemingly likely to be bug-free.
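Freivalds's check can be sketched as follows; representing matrices as Python lists-of-lists and drawing the random vector from {0, 1}^n are standard choices for exposition, not details from the paper:

```python
import random

def freivalds_check(A, B, C, trials=20):
    """Freivalds's O(n^2)-per-trial randomized check that C = A*B for
    n x n matrices [Freivalds 1979].  Each trial picks a random 0/1
    vector r and compares A*(B*r) with C*r using only simple
    matrix-vector products; if C != A*B, a single trial catches the
    error with probability >= 1/2."""
    n = len(A)

    def matvec(M, v):
        # Plain inner-product matrix-vector multiply: simple and easy to trust.
        return [sum(M[i][j] * v[j] for j in range(n)) for i in range(n)]

    for _ in range(trials):
        r = [random.randint(0, 1) for _ in range(n)]
        if matvec(A, matvec(B, r)) != matvec(C, r):
            return False  # C is certainly not A*B
    return True  # C = A*B except with probability <= (1/2)**trials
```

The checker never multiplies two matrices together, so it stays within the o(n^2.38) budget while relying only on the most elementary arithmetic.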

Second, observe that one of the following four conditions must hold each time a (possibly buggy) simple checker C checks the output of a program P:

(I). P(x) is correct; C correctly accepts it.
(II). P(x) is incorrect; C correctly rejects it.
(III). "False alarm": P(x) is correct; C incorrectly rejects it.
(IV). "Missed bug": P(x) is incorrect; C incorrectly accepts it.

Only a "missed bug" is a truly bad outcome: for a "false alarm," while annoying, at least draws our attention to a bug in C.

To reduce the likelihood of missed bugs, it suffices to rule out a strong correlation between x values at which P fails and x values at which C(x, P(x)) fails. It is our heuristic contention that such a correlation is unlikely. Indeed, recall that one effect of the "little-o rule" is that C doesn't have sufficient time to merely reduplicate the computation performed by P. Thus we claim (heuristically) that C must be doing something essentially different from what P does, and so, if buggy, may reasonably be expected to make different errors. But then we would expect few correlated errors; moreover, we would expect more uncorrelated than correlated errors, so that a given bug could well be identified and fixed before it goes on to generate any correlated errors.[4]

Finally, say that correlated errors nonetheless occur. This might in particular be expected due to faults in hardware or software components on which both P and C depend. We argue that, even in this case, a "missed bug" may not result. Indeed, consider our checker for a sorting program (Section 1.2), which determines whether or not ~y is a permutation of ~x by comparing h(x_1) + ... + h(x_n) with h(y_1) + ... + h(y_n): assuming that the sorting program has failed, h(x_1) + ... + h(x_n) and h(y_1) + ... + h(y_n) will not match (with probability ≥ 1/2). If the checker simultaneously fails, it may miscalculate h(x_1) + ... + h(x_n) and/or h(y_1) + ... + h(y_n); but, we argue, it seemingly is not likely to fail in exactly such a manner as to get values for the two sums which happen to match each other.

[4] It could be argued that the little-o rule only prohibits C from duplicating the most time-consuming of P's components; P's simpler components may be trivially duplicated in C, and so, if buggy, might produce correlated errors. To this argument we respond that C's primary function is to check the central algorithm of P; smaller components needed as subroutines by both P and C should, if necessary, be checked separately.

As an illustration of a related issue, consider a standard max-flow program, which proceeds by successively adjusting flows until it finds a configuration which achieves optimal flow across a particular cut. A checker for this program need do little more than verify that optimal flow is indeed achieved across the cut, which is what the program itself does in its final stage. Hence checking this sort of program seems trivial and ineffective. Our response is simply that such a program indeed requires no separate checker, because it is naturally self-checking.

Many checkers proceed in this way, calculating two quantities and declaring an error if they do not match. Degenerate checker behavior seems unlikely to yield matching quantities; and so we argue that checkers, even when buggy themselves, have a natural resistance to the bad case of missing a program bug.

1.6 Stored Randomness

We here introduce an extension to traditional checker definitions: we suggest that randomized checkers, rather than having to generate fresh random bits for each check, should be allowed to use preprocessed stored randomness. That is, prior to run-time, while the program is idle, or during boot-up, or even during software development, we generate random bits and do some (perhaps lengthy) preprocessing; these random bits, together with preprocessed values dependent on them, form one or more random packages. These packages are stored; then, at run-time, each check requires such a package as an additional input.[5]

Stored randomness has several potential advantages. First, it allows for much or all of a checker's randomness to be generated before run-time. This could be useful particularly in doing very quick checks of low-level functionalities such as microprocessor arithmetic (as in [Blum and Wasserman 1996]): in this context, having to generate random bits at run-time could be an inconvenience. Second, we shall see that stored randomness allows for the use of checking algorithms which would otherwise require too much computation at run-time: by paying a time-cost of Ω(T(N)) in preprocessing, we "cheat" some work out of the way, so that each run-time check can then be completed in time o(T(N)).

For a trivial example of stored randomness, we can reconsider the multiplication self-corrector of Section 1.3. If in a preprocessing stage r_1, r_2 are generated, P(r_1, r_2) is calculated, and package R = ⟨r_1, r_2, P(r_1, r_2)⟩ is stored, this leaves only three calls to P at run-time, rather than four. However, this improvement is not particularly impressive. Stored randomness will be further explicated through more substantial examples: in Sections 2.1 and 2.5, we will present a simple checker which is possible only if one allows stored randomness, and a self-corrector which, if one allows stored randomness, can run at real-time speeds.
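The trivial stored-randomness variant above can be sketched as a small change to the Section 1.3 self-corrector. As before, the field F = Z/10007 and the function names are assumptions for illustration; the sketch also assumes, as the text envisions, that preprocessing happens at a time when P can be trusted or verified:

```python
import random

# Assumption for this sketch: F is the field of integers mod a small prime.
P_MOD = 10_007

def make_package(prog):
    """Preprocessing (before run-time): generate the random pair and
    precompute prog on it, forming package R = <r1, r2, prog(r1, r2)>."""
    r1 = random.randrange(P_MOD)
    r2 = random.randrange(P_MOD)
    return (r1, r2, prog(r1, r2))

def correct_product_stored(prog, w, x, package):
    """Run-time self-correction of wx over F using a stored random
    package: only three calls to prog remain at run-time, since the
    fourth value P(r1, r2) was precomputed."""
    r1, r2, p_r1_r2 = package
    return (prog((w - r1) % P_MOD, (x - r2) % P_MOD)
            + prog((w - r1) % P_MOD, r2)
            + prog(r1, (x - r2) % P_MOD)
            + p_r1_r2) % P_MOD
```

The same package may be reused across a series of run-time checks, which is exactly the "stale randomness" trade-off that Sections 2.1 and 2.5 analyze.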

How much randomness must be stored? Let N be our fixed input length, and let ℓ be the number of checks we expect to carry out at run-time. One approach would be to preprocess ℓ packages. Then each check would have its own "fresh" randomness, which would certainly be good from the point of view of reliability. However, we might expect ℓ to be ≫ N or even unknown during preprocessing; so this trivial approach could have too high a cost in preprocessing time and storage space.

An opposite approach would be to generate a single random package and use it for every check. But would the repeated use of a "stale" random package reduce the reliability of our checker? Sections 2.1 and 2.5 initially choose this approach; and Lemmas 4 and 10 argue that the consequent reduction in reliability is not fatal. Moreover, in Sections 2.1 and 2.5 we go on to suggest an intermediate approach, which requires a number of stored packages that is O(N) and independent of ℓ. Lemmas 5 and 11 argue that the resulting stored-random checkers are then highly reliable.

[5] This method requires input length N to be fixed prior to run-time; equivalently, new preprocessing stages will be required as input length increases. This use of stored information is analogous to that in the P/poly model of computation.

1.7 Variations on the Checking Paradigm

It may still be objected that most programming tasks are less amenable to checking than are the clean mathematical problems with which theoreticians usually deal. Nevertheless, it is our experience that many such tasks may be subjected to interesting checks. The following is a list of checking ideas which may suggest to the reader how definitions can be generalized to make checking more suitable as a tool for software debugging.

—Partial checks: One may find it sufficient to check only certain aspects of a computational output. Programmers may focus on properties they think particularly likely to reveal bugs, or particularly critical to the success of the system. Certain checkers might only be capable of identifying outputs incorrect by a very large error-delta. Even just checking inputs and outputs separately to verify that they fall in a legal domain may prove useful.

- Timing checks: One might wish to monitor the execution-time of a software component; an unexpectedly large (or small) time could then reveal a bug. For instance, a programmer's expectation that a particular subroutine should take time ≤ 10n² on input of variable length n could be checked against actual performance.
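A timing check of this sort can be sketched as a small wrapper. This is only an illustration: the wall-clock budget here is a hypothetical stand-in for the programmer's "≤ 10n² steps" estimate, and `subroutine` is a toy component.

```python
import time

def check_timing(f, x, budget_seconds):
    # Run f on x; flag a potential bug if execution exceeds the stated budget.
    start = time.perf_counter()
    result = f(x)
    elapsed = time.perf_counter() - start
    return result, elapsed <= budget_seconds

def subroutine(xs):
    # Toy stand-in for the monitored software component.
    return sorted(xs)

n = 1000
result, within_budget = check_timing(subroutine, list(range(n, 0, -1)), 1.0)
```

In a deployed system one would log or trigger a fallback rather than merely return the flag.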

- Checking via interactive proofs: There exist interactive proofs of correctness for certain mathematical computations; and many such interactive proofs may equivalently be regarded as complex checkers. For example, it follows from the IP method of [Shamir 1992] that, if program P claims to solve a PSPACE-complete problem, then there is a checker for P which requires polynomial time plus a polynomial number of calls to P.

Similarly, the following interactive-proof method, due to [Goldreich et al. 1991], may equivalently be regarded as a complex checker. Say that P claims to decide graph isomorphism. If P says that A ≅ B, we can check this by running P on successively reduced versions of A and B, thereby forcing P to reveal a particular isomorphism. Less trivially, if P claims that A ≇ B, we repeat several times the following check: generate C, a random isomorphism either of A (with probability 1/2) or of B (with probability 1/2); then ask P to tell us which one of A or B is isomorphic to C. If A ≅ B, P can only guess which of the two we permuted.

For more on interactive proof, refer to the bibliography.
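The non-isomorphism check above can be simulated in a few lines. This is a minimal sketch: the adjacency-set graph representation, the number of rounds, and the toy program P (which distinguishes the two example graphs by edge count) are all illustrative choices, not part of the protocol itself.

```python
import random

def permute(adj, perm):
    # Relabel a graph (dict of node -> neighbor set) by the bijection perm.
    return {perm[u]: {perm[v] for v in nbrs} for u, nbrs in adj.items()}

def random_isomorph(adj):
    nodes = list(adj)
    perm = dict(zip(nodes, random.sample(nodes, len(nodes))))
    return permute(adj, perm)

def check_non_isomorphism(A, B, P_which, trials=20):
    # P has claimed A and B are non-isomorphic.  Each round, hand P a random
    # isomorphic copy of one of them; a sound P can say which graph it came
    # from, while a guessing P survives each round with probability 1/2.
    for _ in range(trials):
        source = random.choice(["A", "B"])
        C = random_isomorph(A if source == "A" else B)
        if P_which(C) != source:
            return False    # caught: the non-isomorphism claim is rejected
    return True

A = {0: {1}, 1: {0, 2}, 2: {1}}          # path on 3 vertices
B = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}}    # triangle

def P_which(C):
    # Toy P: the two example graphs differ in edge count.
    edges = sum(len(nbrs) for nbrs in C.values()) // 2
    return "A" if edges == 2 else "B"

print(check_non_isomorphism(A, B, P_which))   # True: the claim survives
```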

- Changing I/O specifications: In Section 1.2, we saw how an easy augmentation to the output of a sorting program could make the program easier to check. Similarly, consider the problem of checking a (log n)-time binary-search task. As traditionally stated (see [Bentley 1986, p. 35], where it features in a relevant discussion of the difficulty of writing correct programs), binary search has as input a key k and a numerical array a[1], ..., a[n], where a[1] ≤ ... ≤ a[n], and as output i such that a[i] = k, or 0 if k is not in the array.

Note that the problem as stated is uncheckable: for, if the output is 0, it will take Ω(log n) steps to confirm that k is indeed not in the array. But say that we modify the output specification for our binary-search task to read: output i such that a[i] = k, or, if k is not in the array, ⟨j, j + 1⟩ such that a[j] < k < a[j + 1]. Any natural binary-search program can easily be modified to give such output, and completing an O(1)-time check is then straightforward.
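The augmented specification and its O(1)-time check can be sketched as follows (0-based indexing, with j = -1 and j + 1 = n standing in for the boundary cases; the function names are illustrative):

```python
import bisect

def bsearch_with_witness(a, k):
    # Binary search over sorted list a, with the augmented output spec:
    # ('found', i) with a[i] == k, or ('absent', (j, j+1)) meaning
    # a[j] < k < a[j+1] (j == -1 or j+1 == len(a) at the boundaries).
    i = bisect.bisect_left(a, k)
    if i < len(a) and a[i] == k:
        return ('found', i)
    return ('absent', (i - 1, i))

def check(a, k, answer):
    # O(1)-time check of the augmented output: a constant number of
    # array accesses and comparisons.
    tag, val = answer
    if tag == 'found':
        return 0 <= val < len(a) and a[val] == k
    j, j1 = val
    below_ok = j == -1 or a[j] < k
    above_ok = j1 == len(a) or a[j1] > k
    return j1 == j + 1 and below_ok and above_ok
```

Note that the check also rejects a malformed witness, e.g. a 'found' answer pointing at the wrong cell.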

- Weak or occasional checks [Blum, A. 1994]: Certain pseudo-checkers have only a small probability of noticing a bug. For example, if a search program, given key k and array a[1], ..., a[n], claims that k does not occur in a, we could check this by selecting an element of a at random and verifying that it is not equal to k. This weak checker has probability 1/n of identifying an incorrect claim.

Alternatively, to save time, a lengthy check might be employed only on occasional I/O pairs. Given that many bugs cause frequent errors over the lifetime of a program, such weak or occasional checks may well be useful.
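The one-random-probe check can be sketched directly; the point of the toy run below is that a program whose bug causes frequent wrong claims is caught with high probability over many runs, even though each individual check succeeds only with probability 1/n.

```python
import random

def weak_absence_check(a, k, claim_absent):
    # Audit a claim that k does not occur in a by probing one random cell.
    # Exposes a false claim with probability 1/n per invocation.
    if not claim_absent:
        return True    # this weak check only audits 'absent' claims
    return random.choice(a) != k

# A searcher that (wrongly) always answers 'absent' on this array is
# caught quickly over repeated runs.
a = [7] * 10
caught = any(not weak_absence_check(a, 7, claim_absent=True)
             for _ in range(50))
```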

- Batch checks [Rubinfeld 1992]: For certain computational tasks, if one stores up a number of I/O pairs, one may check them all at once more quickly than one could have checked them separately. While it may be too late to correct output after completing such a retroactive check, this method can, when applicable, provide a very time-efficient identification of bugs.

- Inefficient auto-correcting: When a checker identifies a program error, it could correct by, for example, loading and running an older, slower, but hopefully more reliable version of the program. This should be necessary only on rare occasions, so the overall loss of speed should not be unreasonable. For example, a trivial O(n³)-time matrix-multiplication program can be kept in reserve in case a more sophisticated O(n^2.38)-time program fails.
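For matrix multiplication this pattern can be sketched with Freivalds' classical randomized product check as the trigger (the check itself is not from this paper; the always-wrong "fast" routine below merely simulates a buggy optimized multiplier):

```python
import random

def freivalds_check(A, B, C, trials=16):
    # Freivalds' randomized test of C == A*B: O(n^2) work per trial,
    # and a wrong product escapes each trial with probability <= 1/2.
    n = len(A)
    for _ in range(trials):
        r = [random.choice((0, 1)) for _ in range(n)]
        Br = [sum(B[i][j] * r[j] for j in range(n)) for i in range(n)]
        ABr = [sum(A[i][j] * Br[j] for j in range(n)) for i in range(n)]
        Cr = [sum(C[i][j] * r[j] for j in range(n)) for i in range(n)]
        if ABr != Cr:
            return False
    return True

def naive_mul(A, B):
    # The slow but trusted O(n^3) reserve program.
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def multiply_autocorrected(fast_mul, A, B):
    # Trust the fast routine; fall back to the reserve when the check fails.
    C = fast_mul(A, B)
    return C if freivalds_check(A, B, C) else naive_mul(A, B)

broken_fast = lambda A, B: [[0] * len(A) for _ in A]  # simulated buggy routine
result = multiply_autocorrected(broken_fast, [[1, 2], [3, 4]], [[1, 2], [3, 4]])
```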

2. CASE STUDY: CHECKING REAL-NUMBER LINEAR TRANSFORMATIONS

We will now consider the problem of checking and correcting a general linear transformation, defined as follows: input to program P is n-length vector ~x of real numbers; correct output is n-length vector ~y = ~xA, where A is a fixed n × n matrix of real coefficients. Extensions of our results to more general cases (e.g., ~y of different length than ~x, or components complex rather than real) are straightforward. However, we do require n and A to be fixed prior to run-time.

Evidently, any such transformation may be computed in O(n²) arithmetic operations. We assume that, for non-trivial transformations A, the smallest possible time to compute the transformation may range from Θ(n) to Θ(n²). Our simple checker will take time O(n) (as will our self-corrector, not counting one or more calls to P), so the little-o rule is satisfied for all but the Θ(n)-time transformations.⁶ For example, the Fourier Transform is linear and may be computed in time Θ(n log n) via the Fast Fourier Transform; if one makes the reasonable assumption that an O(n)-time Fourier Transform is not possible, then our algorithms qualify as checkers for this transform.

Assume now that program P is intended to carry out the ~x ↦ ~xA transformation on a computer with a limited-accuracy, fixed-point representation of real numbers. P's I/O specifications might then be formulated as follows:

[Footnote 6: In the more general case that ~y has length m, any non-trivial linear transformation will have smallest possible computation-time on range Θ(max{m, n}) to Θ(mn). It is easily verified that our simple checker and our self-corrector each take time O(max{m, n}); and so, once again, the little-o rule is satisfied for all but the fastest of transformations.]

- Input: ~x = (x_1, ..., x_n), where each x_i is a fixed-point real number and so is limited to a legal subdomain of ℝ: say, to domain [-1, 1].

- Output: P(~x) = ~y = (y_1, ..., y_n), where each y_i is a fixed-point real.

- Let ε ∈ ℝ⁺ be a constant; let ~δ = (δ_1, ..., δ_n) be the "error vector" ~y − ~xA. Then:
  - I/O pair ⟨~x, ~y⟩ is definitely correct iff, for all i, |δ_i| ≤ ε.
  - I/O pair ⟨~x, ~y⟩ is definitely incorrect iff there exists i such that |δ_i| ≥ 6ε√n.

Note that, due to arithmetic round-off errors within P, a small error-delta is to be regarded as acceptable: specifically, we model P's expected error as a flat error of at most ε in each component of ~y. Also note that we will only require the simple checker of Section 2.1 to accept I/O which is definitely correct and to reject I/O which is definitely incorrect. It is seemingly unavoidable that there is an intermediate region of I/O pairs ⟨~x, ~y⟩ which could allowably be either accepted or rejected. However, this intermediate region currently appears to be altogether too large (i.e., condition "exists i such that |δ_i| ≥ 6ε√n" is too strong). It seems that we should be able to do better; and, in Section 2.2, we shall suggest an improvement.

2.1 A Simple Checker

Motivation: To develop a simple checker for P, we will take the approach of generating a randomized vector ~r and trying to determine whether or not ~y ≈ ~xA by calculating whether or not ~y · ~r ≈ (~xA) · ~r. This method is a variant of that in [Ar et al. 1993, Section 4.1.1].

To facilitate the calculation of (~xA) · ~r, we will employ the method of stored randomness (Section 1.6): by generating ~r and preprocessing it together with A prior to run-time, we will then be able to complete the calculation of (~xA) · ~r with just O(n) run-time arithmetic operations. Thus we will achieve the strong result of checking a Θ(n²)-time computation in time O(n).

Algorithm 2. Our checker's preprocessing stage, to be completed prior to run-time, is specified as follows:

- For k := 1 to 10:
  - Generate and store ~r^(k), an n-length vector of form (±1, ±1, ..., ±1), where each ±1 is chosen positive or negative with independent 50/50 probability.
  - Calculate and store ~v^(k) := ~r^(k) Aᵀ (where Aᵀ denotes the transpose of A).⁷

Then, to check an I/O pair ⟨~x, ~y⟩ at run-time, we employ this O(n)-time check:

- For k := 1 to 10:
  - Calculate D^(k) := ~y · ~r^(k) − ~x · ~v^(k).
  - If |D^(k)| ≥ 6ε√n, reject.
- If all 10 tests are passed, accept.

[Footnote 7: Since this calculation need be done only a few times, and when time is not at a premium, its correctness can be assured, e.g., by employing simple brute-force methods, or by checking the result by hand.]
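For concreteness, Algorithm 2 can be sketched in plain Python (the matrix A, the value of ε, and the list-based arithmetic are illustrative; a production checker would use the platform's fixed-point arithmetic):

```python
import random

def preprocess(A, trials=10):
    # Preprocessing stage: for each k, store a random +/-1 vector r^(k) and
    # v^(k) = r^(k) A^T, so that at run-time x . v^(k) equals (x A) . r^(k).
    n = len(A)
    package = []
    for _ in range(trials):
        r = [random.choice((-1.0, 1.0)) for _ in range(n)]
        v = [sum(A[i][j] * r[j] for j in range(n)) for i in range(n)]
        package.append((r, v))
    return package

def check(x, y, package, eps):
    # Run-time O(n) check: reject iff some |D| = |y.r - x.v| >= 6*eps*sqrt(n).
    n = len(x)
    threshold = 6.0 * eps * n ** 0.5
    for r, v in package:
        D = sum(yi * ri for yi, ri in zip(y, r)) - \
            sum(xi * vi for xi, vi in zip(x, v))
        if abs(D) >= threshold:
            return False
    return True

A = [[1.0, 2.0], [3.0, 4.0]]
pkg = preprocess(A)
x = [0.5, -0.25]
good_y = [-0.25, 0.0]   # exactly x A
bad_y = [2.0, 0.0]      # grossly wrong output
```

Each run-time check costs two length-n dot products per stored pair, independent of the cost of computing ~xA itself.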


Why does this work? First note that

  D^(k) = ~y · ~r^(k) − ~x · ~v^(k)
        = ~y · ~r^(k) − ~x · (~r^(k) Aᵀ)
        = ~y · ~r^(k) − (~xA) · ~r^(k)
        = (~y − ~xA) · ~r^(k)
        = ~δ · ~r^(k)
        = ±δ_1 ± ... ± δ_n,

where each ± is positive or negative with independent 50/50 probability.⁸

Lemma 1. For any definitely correct ⟨~x, ~y⟩,

  Pr over ~r^(k) [ |D^(k)| ≥ 6ε√n ] ≤ 1/10,000,000.

Proof. Each |δ_i| ≤ ε. Thus, by a Chernoff Bound [Alon et al. 1992, Theorem A.16],

  Pr [ |±δ_1 ± ... ± δ_n| ≥ 6ε√n ] ≤ 2e^(−(6ε√n)²/2nε²) = 2e^(−18) < 1/10,000,000.

Lemma 2. For any definitely incorrect ⟨~x, ~y⟩,

  Pr over ~r^(k) [ |D^(k)| < 6ε√n ] ≤ 1/2.

Proof. We know there exists i such that |δ_i| ≥ 6ε√n. Fix all components of ~r^(k) except for r_i^(k). Then observe that, for at least one of the two possible values ±1 of r_i^(k), |D^(k)| = |±δ_1 ± ... + r_i^(k) δ_i + ... ± δ_n| must be ≥ 6ε√n. Thus |D^(k)| < 6ε√n for at most 1/2 of the equally likely choices of random vector ~r^(k).

Lemma 3. (I). For any definitely correct ⟨~x, ~y⟩,

  Pr over ~r^(1), ..., ~r^(10) [checker mistakenly rejects] ≤ 1/1,000,000.

(II). For any definitely incorrect ⟨~x, ~y⟩,

  Pr over ~r^(1), ..., ~r^(10) [checker mistakenly accepts] ≤ 2^(−10) < 1/1,000.

Proof. (I). By Lemma 1, the probability of a mistaken reject for each ~r^(k) is ≤ 1/10,000,000. Thus, the probability of a mistaken reject on any of the 10 tries is ≤ 10 · 1/10,000,000 = 1/1,000,000.

(II). By Lemma 2, the probability of failing to reject at each ~r^(k) is ≤ 1/2. Thus, the probability of rejecting on none of the 10 independent tries is ≤ 2^(−10).

[Footnote 8: We have employed a linear-algebra identity: the dot-product of two vectors is equivalently their matrix-product once the second vector (i.e., row-matrix) has been transposed; thus ~x · (~r^(k) Aᵀ) = ~x (~r^(k) Aᵀ)ᵀ = ~x A (~r^(k))ᵀ = (~xA) · ~r^(k).]

While Lemma 3 embodies our intuition of the checker's correctness, it does not deal with the implications of using the same stored "random package" R = ⟨~r^(1), ..., ~r^(10), ~v^(1), ..., ~v^(10)⟩ in each run-time check. The following claims suggest that this use of stored randomness does not seriously undermine the reliability of the checker:

Lemma 4. Assume that we preprocess random package R and use this package to check ℓ runs of P. Then, for probabilities over the choice of R (where all results except (II) are independent of the value of ℓ):

(I). Assume that a bug causes at least one definitely incorrect I/O pair. Then the probability that our checker will miss this bug is ≤ 1/1,000.

(II). If ℓ ≤ 10,000, then the probability that any definitely correct I/O pair will be mistakenly rejected is ≤ 1/100.

(III). The probability that ≥ 1/2 of all definitely incorrect I/O pairs will be mistakenly accepted is ≤ 1/500.

(IV). The probability that ≥ 1/10,000 of all definitely correct I/O pairs will be mistakenly rejected is ≤ 1/100.

Proof. (I) and (II) follow readily from Lemma 3. To prove (III), note a consequence of Lemma 3 (II): if we select R randomly and ⟨~x, ~y⟩ uniform-randomly from the list of all definitely incorrect I/O pairs out of our ℓ runs, then the probability that our checker accepts given input ⟨~x, ~y⟩ and randomization R is ≤ 1/1,000. But if (III) were false, then this same probability would, on the contrary, be > (1/500) · (1/2) = 1/1,000. (IV) is proved analogously.

While Lemma 4 gives a reasonably strong assurance, it is not entirely satisfactory. Indeed, when stored randomness is used as described above, there is no run-time randomization, so that if, for example, the checker fails on an input ⟨~x, ~y⟩ which occurs repeatedly in the ℓ runs, then it will fail on each repetition. This has several potentially bad consequences. First, if P is regarded as an adversary, it can defeat the checker by learning its "bad inputs." Second, observe that if fresh randomness could be used for each check, then each check would have independent probability of error ≤ 1/1,000; hence the probability of many checker errors during ℓ runs of P would be exponentially small. With stored randomness, the probability of many errors is small (per Lemma 4 (III, IV)) but not exponentially small. Note that frequent errors could result in system failure, if checker errors trigger lengthy self-correcting procedures and if we are dealing with a soft real-time system in which average run-time must be kept low.

Hence we now introduce a more sophisticated approach to stored randomness, one which allows stored-random checkers to be as reliable as checkers using fresh random bits.

Recall that R = ⟨~r^(1), ..., ~r^(10), ~v^(1), ..., ~v^(10)⟩ is the preprocessed "random package" needed by our checker, and that ⟨~x, ~y⟩ is an I/O pair to be checked. Also recall that we are assuming input length to be fixed before run-time: specifically, let us assume that the value of ⟨~x, ~y⟩ can be completely represented by N bits, so that there are at most 2^N possible values for ⟨~x, ~y⟩.

Algorithm 3. At preprocessing time, we create, not just a single random package R, but 4N random packages R*_1, ..., R*_4N. Then, at run-time, we pick a random R*_u ∈ {R*_1, ..., R*_4N}, and use R*_u to check ⟨~x, ~y⟩. For a smaller probability of error, we can repeat this check with several more R*_u ∈ {R*_1, ..., R*_4N} and return the majority answer.
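Algorithm 3 is independent of the underlying checker, so it can be sketched schematically; here `make_package` and `one_check` are placeholders for whatever preprocessing and per-package check the base checker provides (for Algorithm 2, a package would bundle the ~r and ~v vectors), and the toy instantiation at the bottom is purely illustrative.

```python
import random

def preprocess_packages(make_package, N, copies=4):
    # Algorithm 3 preprocessing: draw 4N independent random packages.
    return [make_package() for _ in range(copies * N)]

def stored_random_check(x, y, packages, one_check, t=5):
    # Run-time: check <x, y> with t randomly drawn packages and return the
    # majority verdict; only O(t log N) fresh random bits are consumed.
    votes = sum(one_check(x, y, random.choice(packages)) for _ in range(t))
    return votes > t // 2

# Toy instantiation: packages are opaque random seeds, and the per-package
# check simply compares x and y.
packages = preprocess_packages(lambda: random.getrandbits(32), N=8)
equal = lambda x, y, pkg: x == y
```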

The following lemma argues that our improved stored-random checker has an

independent probability of error at each run-time check which can b e made arbi-

trarily small. Also observe that P cannot fo ol the checker even if P is an adversary

 

whichknows the values of R ;:::;R .

1

4N

5N  

Lemma 5. With probability of failure  2 , R ;:::;R wil l be such that, for

1

4N

1

 

all h~x; ~y i,atmost of R ;:::;R arebad for checking h~x; ~y i. (Hence, checking

1

4N

4

  

with t values of R 2 fR ;:::;R g and taking the majority answer results in a

u

1

i

4N

run-time error-probability which is exponential ly smal l in t.)

Proof. Analogous to the standard proof that BPP ⊆ P/poly:

Recall from Lemma 3 that, for any given ⟨~x, ~y⟩, the probability (over the random choice of a package R) that the checker fails given input ⟨~x, ~y⟩ and randomization R is ≤ 2^(−10).

Hence, for any given ⟨~x, ~y⟩, the probability (over the random choice of packages R_1, ..., R_N) that all of R_1, ..., R_N are bad for checking ⟨~x, ~y⟩ is ≤ 2^(−10N).

Hence the probability (over the random choice of R*_1, ..., R*_4N) that there exists any ⟨~x, ~y⟩ and any size-N subset R_1, ..., R_N of R*_1, ..., R*_4N such that all of R_1, ..., R_N are bad for checking ⟨~x, ~y⟩ is

  ≤ 2^N · (4N choose N) · 2^(−10N) ≤ 2^N · 2^(4N) · 2^(−10N) = 2^(−5N).

This is in fact a general method allowing any simple checker to make use of stored randomness without compromising its reliability. Only O(log N) bits of run-time randomness are then needed for each check.

2.2 A Stronger Simple Checker

Recall that ~δ = (δ_1, ..., δ_n) is the "error vector" ~y − ~xA. Also recall that, while the full value of vector ~δ is not calculable in time O(n), we can use stored, preprocessed ~r^(k) and ~v^(k) to efficiently calculate the randomized scalar value ~r^(k) · ~δ.

The assurance that our simple checker will reject instances for which any |δ_i| ≥ 6ε√n is not entirely satisfactory: it leaves too large a gap between instances which are reliably accepted and those which are reliably rejected. A stronger method would be to approximate

  |~δ| = √(δ_1² + ... + δ_n²)

and accept or reject by comparing this value to a suitable threshold. This method is particularly appropriate in that engineers often use this root-mean-square measure to represent overall error in calculations such as Fourier Transforms.

What we need then is an efficient method for approximating |~δ|. The lemmas below prove that it is indeed possible to efficiently approximate |~δ|².

Lemma 6. Let ~r^(1), ..., ~r^(m) be n-length vectors, each component ±1 with independent 50/50 probability, and consider the randomized quantity

  X = (1/m) Σ_{k=1..m} [ ~r^(k) · ~δ ]².

For any positive α and p: if m ≥ 2/(α²p), then, with probability greater than 1 − p,

  (1 − α)|~δ|² ≤ X ≤ (1 + α)|~δ|².

Proof. Observe that

  [~r^(k) · ~δ]² = [ Σ_{i=1..n} r_i^(k) δ_i ]²
               = Σ_{i=1..n} δ_i² + Σ_{i=1..n} Σ_{j≠i} r_i^(k) r_j^(k) δ_i δ_j
               = |~δ|² + 2 Σ_{i=1..n−1} Σ_{j=i+1..n} r_i^(k) r_j^(k) δ_i δ_j.

Hence

  X = |~δ|² + (2/m) Σ_{k=1..m} Σ_{i=1..n−1} Σ_{j=i+1..n} r_i^(k) r_j^(k) δ_i δ_j
    = |~δ|² + (2/m) R,

where R := Σ_{k=1..m} Σ_{i=1..n−1} Σ_{j=i+1..n} r_i^(k) r_j^(k) δ_i δ_j is the randomized "error term" (times 2/m) whose size we wish to bound.

Note that R is a sum of mn(n−1)/2 pairwise independent quantities. Thus:

  Var[R] = Σ_{k=1..m} Σ_{i=1..n−1} Σ_{j=i+1..n} Var[ r_i^(k) r_j^(k) δ_i δ_j ]
         = Σ_{k=1..m} Σ_{i=1..n−1} Σ_{j=i+1..n} δ_i² δ_j²
         ≤ Σ_{k=1..m} (1/2) Σ_{i=1..n} Σ_{j=1..n} δ_i² δ_j²
         = Σ_{k=1..m} (1/2) [ |~δ|² · |~δ|² ]
         = m|~δ|⁴/2.

Thus the variance of "error term" (2/m)R is ≤ (2/m)² · m|~δ|⁴/2 = 2|~δ|⁴/m. Since we are assuming m ≥ 2/(α²p), this is ≤ pα²|~δ|⁴.

But then the probability that |(2/m)R| > α|~δ|² must be less than p. This follows by Chebyshev's Inequality: i.e., if the contrary were true, then we would have Var[(2/m)R] > pα²|~δ|⁴, which contradicts what we have proved above.


Lemma 7. Recall once again that, using preprocessed ⟨~r^(k), ~v^(k)⟩, we can calculate ~r^(k) · ~δ in time O(n). Then: for any positive α and p, we can, in time O(n (1/α)² log(1/p)), calculate a value Y such that, with probability greater than 1 − p,

  (1 − α)|~δ|² ≤ Y ≤ (1 + α)|~δ|².

Proof. Let t := ⌈6 ln(1/p)⌉ and m := ⌈8/α²⌉. We repeat 2t times the calculation of a randomized X-value as described in Lemma 6, using preprocessed ⟨~r^(1), ..., ~r^(2tm), ~v^(1), ..., ~v^(2tm)⟩. We then return the median of the X-values X_1, ..., X_2t as our value for Y.

Why does this work? By Lemma 6, each X_i approximates |~δ|² to within the desired range with independent probability of error less than 1/4. And the median of the X-values can be out of range only if at least t of the 2t X-values are out of range. By a Chernoff Bound, the probability of this is ≤ e^(−t/6) ≤ p.
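The median-of-means estimator of Lemmas 6 and 7 can be sketched as follows. Note the hedges: the constants t and m follow the reconstructed choices in the proof above, and for illustration the dot product ~r · ~δ is supplied directly as a callable, whereas in the checker it would be computed in O(n) time as ~y · ~r − ~x · ~v from the stored package.

```python
import math
import random
import statistics

def estimate_sq_norm(dot_with_delta, n, alpha=0.5, p=0.05):
    # Median of 2t runs of the Lemma-6 estimator X = (1/m) sum_k (r^(k).delta)^2.
    t = max(1, math.ceil(6 * math.log(1 / p)))
    m = math.ceil(8 / alpha ** 2)
    xs = []
    for _ in range(2 * t):
        total = 0.0
        for _ in range(m):
            r = [random.choice((-1.0, 1.0)) for _ in range(n)]
            total += dot_with_delta(r) ** 2
        xs.append(total / m)
    return statistics.median(xs)

delta = [0.3, -0.1, 0.2, 0.0, 0.4]      # illustrative error vector; |delta|^2 = 0.30
est = estimate_sq_norm(lambda r: sum(ri * di for ri, di in zip(r, delta)),
                       n=len(delta))
```

With the default parameters the estimate lands within a factor (1 ± 0.5) of 0.30 with probability at least 0.95.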

2.3 Simple Checker: Variants

- A fuller statement of our algorithm would include consideration of arithmetic round-off error in the checker itself as well as in P. However, under the reasonable assumption (for a fixed-point system) that addition and negation do not generate round-off errors, our checker could generate error only in the n real-number multiplications needed to calculate ~x · ~v^(k). It is our heuristic expectation that such error would not be large enough to significantly affect the validity of our checker.

- Evidently, different values could be used for the constants in our algorithm of Section 2.1. Our goal was to particularize a pragmatic checker: one which satisfies reasonably well each of the following desirable conditions: small gap between the definitions of definitely correct and definitely incorrect; small chance of a mistaken accept; very small chance of a mistaken reject; and small run-time.

To generalize: let our definition of definitely incorrect be that there exists |δ_i| ≥ cε√n. Let our algorithm calculate t values of D^(k) and reject iff any |D^(k)| ≥ cε√n. (We used c = 6, t = 10 above.) Then the error bound in Lemma 1 is 2e^(−c²/2). The error bound in Lemma 3 (I) is 2te^(−c²/2), and that in Lemma 3 (II) is (1/2)^t. Lemma 4 (III) generalizes: for any q ∈ (0, 1], the probability that an at-least-q portion of all definitely incorrect I/O pairs will be mistakenly accepted is ≤ β/q, where β is the error bound from Lemma 3 (II). Lemma 4 (IV) generalizes analogously. Lemma 5 applies to any simple checker which has probability of error ≤ 2^(−10) (and any checker can have its error-bound reduced to this level by repeating the check a small constant number of times).
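The generalized bounds can be tabulated directly; for the paper's choice c = 6, t = 10 the following sketch reproduces the constants of Lemma 3:

```python
import math

def checker_bounds(c, t):
    # Generalized error bounds from Section 2.3:
    # mistaken reject <= 2 t e^(-c^2/2); mistaken accept <= (1/2)^t.
    return 2 * t * math.exp(-c * c / 2), 0.5 ** t

reject_bound, accept_bound = checker_bounds(c=6, t=10)
```

This makes the trade-off explicit: raising c shrinks the mistaken-reject bound but widens the gap between definitely correct and definitely incorrect, while raising t shrinks the mistaken-accept bound at the cost of run-time.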

2.4 A Self-Corrector

Motivation: To develop a self-corrector for P, we start with the traditional approach of adding a random displacement to the location at which we must poll P. However, to prevent the components of our input-vectors from going out of legal domain [-1, 1], we here find it necessary to poll P at a location which is a weighted average of desired input ~x and a random input ~r. If we weight the average so that ~r dominates, the resulting vector is near-uniform random, allowing us to prove the reliability of our self-corrector. This method may be compared to those employed in [Gemmell et al. 1991].


Assume that we have tested P on a large number of random input vectors: i.e., vectors (of fixed length n) generated by choosing each component uniform-randomly from the set of fixed-point real numbers on legal domain [-1, 1]. Assume we have verified that P is correct on these random input vectors (± an allowable error-delta); a version of our simple checker might be employed to facilitate this verification. Through such a testing stage (or by use of a self-tester [Ergün 1995]), we can satisfy ourselves that (with high probability) the fraction of inputs on which P returns incorrect output is very small: say, ≤ 1/10,000,000.

Once we have this assurance, we can employ the following self-corrector for P, whose time-bound is two calls to P plus O(n):

Algorithm 4. On input ~x,

- Generate random input vector ~r.
- Call P to calculate P(~r).
- Call P to calculate P((1/n)~x + (1 − 1/n)~r). (Note that this is possible because, for any legal input vectors ~x and ~r, (1/n)~x + (1 − 1/n)~r will also be legal, i.e., each component will be in legal domain [-1, 1].)
- Return, as our corrected value for P(~x),

  ~y_c := n · P((1/n)~x + (1 − 1/n)~r) − (n − 1) · P(~r).
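Algorithm 4 can be sketched as follows, assuming P is given as a callable; the stand-in program `apply_A` and the 2 × 2 matrix are illustrative, and floating-point arithmetic is used in place of the paper's fixed-point model.

```python
import random

def self_correct(P, x):
    # Algorithm 4: one call to P at a random point, one at the weighted
    # average (1/n) x + (1 - 1/n) r, then recombine by linearity.
    n = len(x)
    r = [random.uniform(-1.0, 1.0) for _ in range(n)]
    blended = [xi / n + (1.0 - 1.0 / n) * ri for xi, ri in zip(x, r)]
    P_blend, P_r = P(blended), P(r)
    return [n * b - (n - 1) * pr for b, pr in zip(P_blend, P_r)]

def apply_A(v):
    # A correct stand-in program for the fixed linear map v -> vA.
    A = [[1.0, 2.0], [3.0, 4.0]]
    return [sum(v[i] * A[i][j] for i in range(len(v))) for j in range(len(v))]

y_c = self_correct(apply_A, [0.5, -0.25])   # approximately [-0.25, 0.0]
```

When P is correct at both polled points, linearity makes the recombination cancel the random displacement exactly (up to round-off).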

Lemma 8. For any ~x:

  Pr over ~r [ P((1/n)~x + (1 − 1/n)~r) is incorrect ] ≤ 3/10,000,000.

Proof. Fix ~x. As ~r varies over all legal input vectors, each component of vector (1/n)~x + (1 − 1/n)~r varies over a (1 − 1/n) fraction of legal domain [-1, 1]. Thus, since ~r is a random input vector, the value of (1/n)~x + (1 − 1/n)~r is distributed uniform-randomly over a "neighborhood"⁹ of input vectors whose size, as a fraction of the space of all legal input vectors, is (1 − 1/n)^n ≥ 1/e > 1/3.

So the probability of hitting an incorrect output at P((1/n)~x + (1 − 1/n)~r) is at most three times the probability of hitting an incorrect output at a truly uniform-random location, which we know to be ≤ 1/10,000,000.

Lemma 9. For any ~x: ~y_c will (approximately) equal desired output ~xA, with probability of error (over the choice of ~r) ≤ 4/10,000,000.

Proof. Based on testing and on Lemma 8, we know that P((1/n)~x + (1 − 1/n)~r) and P(~r) are both likely to be correct: chance of error is ≤ 3/10,000,000 + 1/10,000,000 = 4/10,000,000. And if they are indeed both correct, then (barring problems of round-off error, which will be discussed below),

  ~y_c = n · P((1/n)~x + (1 − 1/n)~r) − (n − 1) · P(~r)
       = n · ((1/n)~x + (1 − 1/n)~r) A − (n − 1) ~r A
       = n · (1/n) ~xA + n (1 − 1/n) ~rA − (n − 1) ~rA
       = ~xA.

[Footnote 9: Arithmetic rounding must be done with care to assure this uniformity.]

Unfortunately, our self-corrector reduces the arithmetic precision of P. For, in calculating

  ~y_c := n · P((1/n)~x + (1 − 1/n)~r) − (n − 1) · P(~r),

log n bits of information are lost when we divide ~x by n (assuming fixed-point arithmetic). Moreover, the final multiplications by n and (n − 1) will magnify the ±ε errors in the values returned by P. It would seem, then, that the self-corrected P may have poly(n) times the arithmetic error of the original P. A more conclusive error-analysis would require knowledge of the internals of P.

Thus we will have greater confidence in this corrector if our system allows for the use of more bits of arithmetic precision than are seemingly required for the ordinary operation of P. In particular, O(log n) bits of additional precision might be expected to compensate for a poly(n)-factor increase in error. However, a more detailed error-analysis, requiring knowledge of the internals of P, would be needed to determine whether or not this is in fact true.

Rather than delving further into error-analyses, we merely note that questions of the round-off errors which checkers and correctors should expect and will produce are important to real-number checking, and may be of interest to the numerical analysis community.

2.5 A Faster Self-Corrector

Another problem with our self-corrector is that it slows execution speed by a factor of at least 2. We can fix this problem by employing the method of stored randomness (Section 1.6):

Algorithm 5. Prior to run-time, we generate and store a random input vector ~r. We also calculate and store (n − 1)P(~r). By repeatedly using these stored values, we then need only one run-time call to P to calculate

  ~y_c := n · P((1/n)~x + (1 − 1/n)~r) − (n − 1) · P(~r).

Observe that our modified self-corrector, like our simple checker, adds only O(n) arithmetic operations to the execution-time of P. Assuming that P is an ω(n)-time program, the resulting loss of performance should be negligible.

Proceeding as in Lemma 4, we can readily derive the following results, which suggest that the use of stored randomness does not seriously undermine the reliability of our corrector:

Lemma 10. Assume that we generate "random package" R = ⟨~r, (n − 1)P(~r)⟩ and use this stored package to self-correct ℓ runs of P. Then, for probabilities over the choice of R:

(I). If ℓ ≤ 10,000, then the probability that any corrected output will be erroneous is ≤ 4/1,000.

(II). For any ℓ, the probability that ≥ 1/10,000 of the corrected outputs will be erroneous is ≤ 4/1,000.

Lemma 10, like Lemma 4, is not entirely satisfactory. As in Section 2.1, we can derive a stronger result by using a slightly more sophisticated method:

Fix input length: assume that N bits suffice to represent any legal input ~x. Also fix the behavior of P: i.e., assume (as is usual for a self-corrector) that P is not able to adapt in response to the corrector.

Algorithm 6. At preprocessing time, we generate, not one random package R = ⟨~r, (n − 1)P(~r)⟩, but 4N packages R*_1, ..., R*_4N. Then, for each run-time self-correction, we use a randomly chosen package R*_u ∈ {R*_1, ..., R*_4N}.

The following lemma argues that each run-time self-correction will then have a small independent probability of error. Proof is analogous to that of Lemma 5.

Lemma 11. With probability of failure ≤ 2^(−5N), R*_1, ..., R*_4N will be such that, for all ~x, at most 1/4 of R*_1, ..., R*_4N are bad for correcting P(~x). Moreover, by increasing the number of stored packages we can reduce error-bounds 2^(−5N) and 1/4 as desired.

This is a general method allowing any self-corrector to take advantage of stored randomness without compromising its reliability. However, a difference between this result and Lemma 5 should be noted: here P is not allowed to be an adversary which can adapt to the corrector. This seems a necessary requirement, as a program knowing the values of stored packages R*_1, ..., R*_4N could readily make use of this knowledge to fool the corrector.

2.6 Self-Corrector: Variants

- To generalize: let p be an upper bound on the portion of input vectors ~x for which P(~x) is incorrect. (We used p = 1/10,000,000 above.) Then the error bound in Lemma 8 is ep, and that in Lemma 9 is (e + 1)p. Lemma 10 (II) generalizes: for any q ∈ (0, 1], the probability that an at-least-q portion of the corrected outputs will be erroneous is ≤ (e + 1)p/q. Lemma 11 applies to any self-corrector which has probability of error ≤ 2^(−10).

- Having calculated corrected output ~y_c as described above, we could then employ a version of our simple checker to verify the correctness of ~y_c. This modified form of P then provides both checking and self-correcting, while adding only O(n) arithmetic operations to P's usual execution-time.

- The checking situation would be somewhat different if our computer used floating-point, rather than fixed-point, representations. Our model of P's arithmetic errors as a flat ±ε in each component of output would need modification, and the problem of designing expressions such as (1/n)~x + (1 − 1/n)~r (i.e., randomized expressions near-uniformly distributed over the set of legal inputs) would be more complex. In [Blum and Wasserman 1996], we deal with floating-point arithmetic by checking only operations on normalized mantissas (for we regard such operations as the most arduous and error-prone portion of microprocessor arithmetic); we thereby reduce, essentially, to the fixed-point case.

3. CONCLUSION

We have stated our belief that result-checking may prove a valuable tool for software debugging and for the creation of extremely reliable systems. Continuing work toward this goal advances on two fronts. First there is theoretical research: work which adapts checking methodologies to make them more easily applicable. This can include: making checking algorithms more efficient; extending checking to work with limited-accuracy real numbers; and developing philosophical/mathematical analyses of the potential reliability of checkers.

The current paper specifies an efficient linear-function checker (Sections 2.1-2.3) and corrector (Sections 2.4-2.6), using stored randomness as a "cheat" to save time. These algorithms also deal with issues of checking real-number computations, extending the work of [Gemmell et al. 1991; Ar et al. 1993]. Moreover, Section 1.5 gives some preliminary result-checking "philosophy." The reader may also refer to [Blum and Wasserman 1996], which considers the checking of symbolic-manipulation libraries and of microprocessor arithmetic.

Future theoretical researchmight b e exp ected to derivechecking algorithms with

powers analogous to those of the current pap er, but applicable to broader classes

of functions. Also interesting is the topic of storedrandomness |and, in general, of

saving time bycheating work into a prepro cessing stage. The metho ds of Sections

2.1 and 2.5 seem to b e the only checking algorithms so far to make essential use of

prepro cessing. More examples will p erhaps b e forthcoming.

Result-checking theory must b e joined with applied research. With Hughes Air-

craft we are conducting pilot studies of checker-based debugging [Bo ettcher and

Mellema 1995]. In particular, Dr. David Shreve of Hughes has implemented the

simple checker of Section 2.1 as a means of testing a Fourier Transform used in

radar software. Through such studies, it must b e determined how e ective result-

checkers are at agging bugs, and whether creating checkers is an ecientuseof

programming man-hours. As mentioned in Section 1.5, the problem of debugging

checkers is also intriguing.

Finally, an extensive topic for further study, bridging theory and application, is the creation of partial checkers for "messy" functionalities commonly found in real-world software. Applied result-checking will ultimately depend on the development of a rich, general family of heuristics for software checking.

The following bibliography of checking literature is only partial. It is not intended as a comprehensive survey.

REFERENCES

Alon, N., Spencer, J. H., and Erdős, P. 1992. The Probabilistic Method. John Wiley and Sons. Cited for a Chernoff Bound in Section 2.1.

Ar, S., Blum, M., Codenotti, B., and Gemmell, P. 1993. Checking approximate computations over the reals. In Proc. 25th ACM Symp. Theory of Computing (1993), pp. 786–795. Extends checking methodologies to computations on limited-accuracy real numbers. Examples: matrix multiplication, inversion, and determinant; solving systems of linear equations.

Arora, S. and Safra, S. 1992. Probabilistic checking of proofs: a new characterization of NP. In Proc. 33rd IEEE Symp. Foundations of Computer Science (1992), pp. 2–13. Equates NP languages with those in the interactive-proof class PCP(log n, √(log n)).

Babai, L. and Fortnow, L. 1991. Arithmetization: a new method in structural complexity theory. Computational Complexity 1, 1, 41–46. Preliminary version: "A characterization of #P by arithmetic straight line programs," Proc. 31st IEEE Symp. Foundations of Computer Science (1990), pp. 26–34. Further develops a technique from [Babai, Fortnow, and Lund] of translating Boolean formulae into multivariate polynomials; hence allows for polynomial concepts to be applied to the study of certain complexity classes.

Babai, L., Fortnow, L., Levin, L., and Szegedy, M. 1991. Checking computations in polylogarithmic time. In Proc. 23rd ACM Symp. Theory of Computing (1991), pp. 21–31. A variant of [Babai, Fortnow, and Lund] with lower time-bounds, this paper introduces an unusual type of very fast checker for NP computations. Such checkers could be regarded as "hardware checkers," in that they ensure that the hardware follows instructions correctly, but don't ensure that the software is correct.

Babai, L., Fortnow, L., and Lund, C. 1991. Non-deterministic exponential time has two-prover interactive protocols. Computational Complexity 1, 1, 3–40. Preliminary version: Proc. 31st IEEE Symp. Foundations of Computer Science (1990), pp. 16–25. Proves that NEXP = 2IP, and hence that NEXP-complete problems have complex checkers.

Beaver, D. and Feigenbaum, J. 1990. Hiding instances in multioracle queries. In Proc. 7th Symp. Theoretical Aspects of Computer Science (1990), pp. 37–48. Considers the question: can one get one or more servers to compute f(x) without revealing the value of x to any server? Also proves that polynomials can be self-corrected.

Beigel, R. and Feigenbaum, J. 1992. On being incoherent without being very hard. Computational Complexity 2, 1, 1–17. Response to questions of [Blum and Kannan 1995; Yao 1990], including a proof that all NP-complete languages are coherent.

Bentley, J. 1986. Programming Pearls. Addison-Wesley. A discussion here of the difficulty of writing a correct binary-search program is cited in Section 1.7.

Blum, A. 1994. Personal communication. Idea of weak checking (Section 1.7).

Blum, M. 1988. Designing programs to check their work. Technical Report TR-88-009, Int'l Computer Science Institute. Formal introduction of simple and complex checking (though anticipated by, e.g., [Freivalds 1979]). Considers checking of graph isomorphism, sorting, and several group-theoretic computations. See also [Blum and Kannan 1995].

Blum, M., Evans, W., Gemmell, P., Kannan, S., and Naor, M. 1994. Checking the correctness of memories. Algorithmica 12, 2/3 (Aug./Sept.), 225–244. Introduces data-structure checking. Demonstrates that, given a small, secure database, one may check the correctness of a large, adversarial database.

Blum, M. and Kannan, S. 1995. Designing programs that check their work. Journal of the ACM 42, 1 (Jan.), 269–291. Preliminary version: Proc. 21st ACM Symp. Theory of Computing (1989), pp. 86–97. Closely related to [Blum 1988].

Blum, M., Luby, M., and Rubinfeld, R. 1993. Self-testing/correcting with applications to numerical problems. Journal of Computer and System Sciences 47, 3, 549–595. Preliminary version: Proc. 22nd ACM Symp. Theory of Computing (1990), pp. 73–83. Introduces self-testing and self-correcting, and gives applications to a variety of fundamental mathematical computations.
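The linearity self-test underlying this line of work can be conveyed by a small sketch. This is a simplified illustration over the integers mod m, not the paper's full construction (which works over general groups and analyzes error probabilities carefully); the sample programs are hypothetical:

```python
import random

def self_test_linearity(P, n_trials=100, modulus=2**31 - 1):
    """BLR-style self-test: a program P claiming to compute an additive
    (linear) function mod m should satisfy P(x + y) == P(x) + P(y)
    on random pairs (x, y). Returns the observed failure rate."""
    failures = 0
    for _ in range(n_trials):
        x = random.randrange(modulus)
        y = random.randrange(modulus)
        if P((x + y) % modulus) % modulus != (P(x) + P(y)) % modulus:
            failures += 1
    return failures / n_trials

# A genuinely linear function passes every trial...
assert self_test_linearity(lambda x: (7 * x) % (2**31 - 1)) == 0.0
# ...while an affine (hence non-linear) one fails every trial.
assert self_test_linearity(lambda x: x + 1) == 1.0
```

The point of such a test is that it never needs a second, trusted implementation: the program is checked against its own outputs at random points.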

Blum, M. and Wasserman, H. 1996. Reflections on the Pentium division bug. IEEE Transactions on Computers 45, 4 (April), 385–393. A companion piece to the current paper; argues that checking and correcting may be used to enhance the reliability of microprocessor arithmetic.

Boettcher, C. and Mellema, D. J. 1995. Program checkers: practical applications to real-time software. In Test Facility Working Group Conf. (1995). Progress report on a pilot study with Manuel Blum intended to test the usefulness of result-checking as a software development tool.

Butler, R. W. and Finelli, G. B. 1993. The infeasibility of quantifying the reliability of life-critical real-time software. IEEE Transactions on Software Engineering 19, 1 (Jan.), 3–12. Demonstrates inherent limitations of conventional software testing and of fault tolerance.

Cleve, R. and Luby, M. 1990. A note on self-testing/correcting methods for trigonometric functions. Technical Report 90-032, Int'l Computer Science Institute. Applies the methods of [Blum et al. 1993] to trigonometric functions.



Ergün, F. 1995. Testing multivariate linear functions: overcoming the generator bottleneck. In Proc. 27th ACM Symp. Theory of Computing (1995), pp. 407–416. Extends self-testing to functions (e.g., the Fourier Transform) for which traditional self-testers prove inefficient. See also [Ravikumar and Sivakumar 1995].

Fortnow, L. and Sipser, M. 1988. Are there interactive protocols for co-NP languages? Information Processing Letters 28, 5 (Aug.), 249–251. Suggests that co-NP may not be contained in IP. [Lund et al. 1992] later proved the contrary.

Freivalds, R. 1979. Fast probabilistic algorithms. In Mathematical Foundations of Computer Science, Number 74 in Lecture Notes in Computer Science, pp. 57–69. Springer-Verlag. Includes a simple checker for matrix multiplication.
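Freivalds' matrix-multiplication checker is simple enough to sketch directly. The version below (integer matrices, random 0/1 vectors) is one common presentation of the idea:

```python
import random

def freivalds_check(A, B, C, trials=20):
    """Probabilistic check that A*B == C for n-by-n integer matrices.
    Each trial compares A(Br) with Cr for a random 0/1 vector r,
    costing O(n^2) per trial instead of the O(n^3) of recomputation;
    a wrong C survives each trial with probability at most 1/2."""
    n = len(A)
    def matvec(M, v):
        return [sum(M[i][j] * v[j] for j in range(n)) for i in range(n)]
    for _ in range(trials):
        r = [random.randint(0, 1) for _ in range(n)]
        if matvec(A, matvec(B, r)) != matvec(C, r):
            return False   # a single mismatch proves C is wrong
    return True

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
assert freivalds_check(A, B, [[19, 22], [43, 50]])       # correct product
assert not freivalds_check(A, B, [[19, 22], [43, 51]])   # wrong entry caught
```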

Gemmell, P., Lipton, R. J., Rubinfeld, R., Sudan, M., and Wigderson, A. 1991. Self-testing/correcting for polynomials and for approximate functions. In Proc. 23rd ACM Symp. Theory of Computing (1991), pp. 32–42. Extends self-testing/correcting of polynomials to several difficult cases: e.g., for domains other than finite fields, or for programs capable of (limited) adversarial adaptation. Also commences the extension of traditional checking methodologies to computations on limited-accuracy reals.

Gemmell, P. and Sudan, M. 1992. Highly resilient correctors for polynomials. Information Processing Letters 43, 4 (Sept.), 169–174. Gives near-optimal self-correctors for programs which claim to compute multivariate polynomials. As long as a program is correct on a 1/2 + ε fraction of inputs (for real ε > 0), self-correcting is possible.

Goldreich, O., Micali, S., and Wigderson, A. 1991. Proofs that yield nothing but their validity or all languages in NP have zero-knowledge proof systems. Journal of the ACM 38, 3 (July), 691–729. Preliminary version: Proc. 27th IEEE Symp. Foundations of Computer Science (1986), pp. 174–187. Includes interactive-proof algorithms for problems including graph isomorphism.

Kannan, S. 1990. Program Result-Checking with Applications. Ph.D. thesis, Computer Science Division, University of California, Berkeley. Includes checkers for group-theory and linear-algebra problems.

Lipton, R. J. 1990. Efficient checking of computations. In Proc. 7th Symp. Theoretical Aspects of Computer Science, Number 415 in Lecture Notes in Computer Science, pp. 207–215. Springer-Verlag. Proves that logspace suffices to check many computations.

Lipton, R. J. 1991. New directions in testing. In Distributed Computing and Cryptography, Volume 2 of DIMACS Series in Discrete Mathematics and Theoretical Computer Science, pp. 191–202. American Mathematical Society. Independently paralleling [Blum et al. 1993], introduces what are essentially self-correctors (though under a different name). Proves that #P-complete problems have self-correctors.

Lund, C., Fortnow, L., Karloff, H., and Nisan, N. 1992. Algebraic methods for interactive proof systems. Journal of the ACM 39, 4 (Oct.), 859–868. Preliminary version: Proc. 31st IEEE Symp. Foundations of Computer Science (1990), pp. 2–10. Methodology for constructing an interactive proof system for any language in the polynomial-time hierarchy. Applications to [Shamir; Babai, Fortnow, and Lund].

Micali, S. 1992. CS proofs and error-detecting computation. Technical report, MIT Lab for Computer Science. Also: "CS proofs," Technical Report TM-510, MIT Lab for Computer Science, 1994. Gives result-checkers for NP-complete problems, subject to the assumptions that we have available a random oracle which can serve as a cryptographically secure hash-function, and that the program being checked has insufficient time to find collisions in this hash-function.

Nisan, N. 1989. Co-SAT has multi-prover interactive proofs. E-mail announcement. Initiated events leading to [Lund, Fortnow, Karloff, and Nisan; Shamir; Babai, Fortnow, and Lund].


Ravikumar, S. and Sivakumar, D. 1995. On self-testing without the generator bottleneck. In Proc. 15th Conf. Foundations of Software Technology and Theoretical Computer Science, Number 1026 in Lecture Notes in Computer Science, pp. 248–262. Springer-Verlag. Extends the results of [Ergün 1995].

Rubinfeld, R. 1990. A Mathematical Theory of Self-Checking, Self-Testing, and Self-Correcting Programs. Ph.D. thesis, Computer Science Division, University of California, Berkeley. Closely related to [Blum et al. 1993].

Rubinfeld, R. 1992. Batch checking with applications to linear functions. Information Processing Letters 42, 2 (May), 77–80. Introduces batch checking: i.e., collecting I/O pairs and checking them simultaneously. For certain functions it is possible for this combined check to be more efficient than separate checks.

Rubinfeld, R. 1994. On the robustness of functional equations. In Proc. 35th IEEE Symp. Foundations of Computer Science (1994), pp. 288–299. Shows that functions satisfying any of a broad class of functional equations (e.g., f(x + y) = f(x) + f(y)) may be self-tested and self-corrected. Hence specifies testers and correctors for many naturally occurring functions, such as tan x, 1/(1 + cot x), and cosh x.

Rubinfeld, R. and Sudan, M. 1992. Self-testing polynomial functions efficiently and over rational domains. In Proc. 3rd ACM-SIAM Symp. Discrete Algorithms (1992), pp. 23–32. Extends checking methodologies from finite fields to integer and rational domains.

Rubinfeld, R. and Sudan, M. 1996. Robust characterizations of polynomials with applications to program testing. SIAM Journal on Computing 25, 2 (April), 252–271. Demonstrates that polynomials have properties which allow them to be self-tested and self-corrected. See also [Rubinfeld 1994].

Schwartz, J. T. 1980. Fast probabilistic algorithms for verification of polynomial identities. Journal of the ACM 27, 4 (Oct.), 701–717. A fundamental result (credit for which is also given to Zippel and DeMillo–Lipton): to determine (with high probability) whether two polynomials are identical, it generally suffices to check their equality at a random location. Applications include: checking multiset equality; proving that two straight-line arithmetic programs compute the same function.
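The random-evaluation principle can be sketched concretely. The following toy version tests univariate polynomials given as black boxes; the degree bound, domain size, and example polynomials are illustrative choices:

```python
import random

def probably_equal(p, q, degree_bound, trials=10):
    """Schwartz-Zippel-style identity test: two polynomials of degree <= d
    that are not identical agree at a point drawn uniformly from a set of
    size S with probability at most d/S, so agreement at a few random
    points implies identity with high probability."""
    S = 100 * degree_bound      # sample domain much larger than the degree
    for _ in range(trials):
        x = random.randint(1, S)
        if p(x) != q(x):
            return False        # a single disagreement is a proof of difference
    return True

# (x + 1)^2 versus its expansion: the same polynomial, degree 2.
assert probably_equal(lambda x: (x + 1) ** 2, lambda x: x * x + 2 * x + 1, 2)
# (x + 1)^2 versus x^2 + 1: they differ by 2x, nonzero on the whole domain.
assert not probably_equal(lambda x: (x + 1) ** 2, lambda x: x * x + 1, 2)
```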

Shamir, A. 1992. IP = PSPACE. Journal of the ACM 39, 4 (Oct.), 869–877. Preliminary version: Proc. 31st IEEE Symp. Foundations of Computer Science (1990), pp. 11–15. It follows from this result that a program P which claims to solve a PSPACE-complete problem may be checked in polynomial time plus a polynomial number of calls to P.

Vainstein, F. S. 1991. Error detection and correction in numerical computations by algebraic methods. In Proc. 9th Symp. Applied Algebra, Algebraic Algorithms and Error-Correcting Codes, Number 539 in Lecture Notes in Computer Science, pp. 456–464. Springer-Verlag. See [Vainstein 1993].

Vainstein, F. S. 1993. Algebraic Methods in Hardware/Software Testing. Ph.D. thesis, EECS Department, Boston University. Uses the theory of algebraic and transcendental field-extensions to design partial complex checkers for all functions that may be constructed from x, e^x, sin(ax + b), and cos(ax + b) using operators + and fractional exponentiation.

Valiant, L. G. 1979. The complexity of computing the permanent. Theoretical Computer Science 8, 2 (April), 189–201. Defines #P-completeness and proves that computing the permanent of a matrix is #P-complete. See also [Lipton 1991].

Wegman, M. N. and Carter, J. L. 1981. New hash functions and their use in authentication and set equality. Journal of Computer and System Sciences 22, 3 (June), 265–279. Includes an idea for a simple check of multiset equality (completed in [Blum 1988]).

Yao, A. C. 1990. Coherent functions and program checkers. In Proc. 22nd ACM Symp. Theory of Computing (1990), pp. 84–94. Function f is coherent iff on input (x, y) one can determine whether or not f(x) = y via a BPP^f algorithm which is not allowed to query f at x. Author proves the existence of incoherent (and thus uncheckable) functions in EXP. See also [Beigel and Feigenbaum 1992].