SelfValidated Numerical Metho ds

and Applications

Jorge Stol

Instituto de Computacao Universidade Estadual de Campinas UNICAMP

Luiz Henrique de Figueiredo

Lab oratorio Nacional de Computacao Cientca LNCC

Preface

st

This monograph is a set of course notes written for the Brazil

ian Collo quium held at IMPA in July It gives an

overview of the eld of selfvalidated numerics computation mo dels in

which approximate results are automatically provided with guaranteed

error b ounds We fo cus in particular on two such mo dels interval

arithmetic and ane arithmetic

Interval arithmetic IA was develop ed in the s by Ramon E

Mo ore IA is the simplest and most ecient of all validated nu

merics mo dels and not surprisingly the most widely known and used

After two decades of relative neglect IA has been enjoying a strong

and steady resurgence driven largely by its successful use in all kinds

of practical applications We are condent that many readers of this

monograph will nd IA to b e a useful to ol in their own work as well

Ane arithmetic AA is a more complex and exp ensive computation

mo del designed to give tighter and more informative b ounds than IA in

certain situations where the latter is known to p erform p o orly The AA

mo del was prop osed and develop ed recently by the authors al

though a similar mo del had b een develop ed in by E R Hansen

Apart from its usefulness for certain sp ecial applications AA is b eing

presented here as an example of the many topics for research that are

still unexplored in the eld of selfvalidated numerical metho ds

We ap ologize to the reader for the length and verb osity of these notes

but likePascal we didnt have the time to make them shorter

Je nai fait celleci plus longue que parce que je nai pas eu le loisir de la faire

plus courte Blaise Pascal Lettres Provinciales XVI i

ii

Acknowledgements

st

We thank the Organizing Committee of the Brazilian Mathematics

Collo quium for the opp ortunity to present this course

We wish to thank Joao Comba who help ed implementa prototyp e

ane arithmetic package in Mo dula and Marcus Vinicius Andrade

who help ed debug the C version and wrote an implicit surface raytracer

based on it Ronald van Iwaarden contributed an indep endent imple

mentation of AA and investigated its p erformance on branchandb ound

Douglas Priest and Helmut Jarausch

provided co de and advice for rounding mo de control

We wish to thank also Sergey P Shary and Lyle Ramshaw for valu

able comments and references and Paulo Correa de Mello for the use of

his computing equipment

The authors research has been supp orted over the years partly

by grants from Brazilian research funding agencies CNPq CAPES

FAPESP FAPERJ and by the State University of Campinas UNI

CAMP the Pontical Catholic University of Rio de Janeiro PUCRio

the Institute for Pure and IMPA the National

Lab oratory for Scientic Computation LNCC and Digitals Systems

Research Center DEC SRC in Palo Alto

Figures and were generated with Geomview Figure

was generated with xfarbe Figures and were kindly provided

by Luiz Velho The other gures were generated with software written

by the authors

Jorge Stol Luiz Henrique de Figueiredo

Instituto de Computacao UNICAMP Lab Nacional de Computacao Cientca

Caixa Postal Rua Lauro Muller

Campinas SP Brazil Rio de Janeiro RJ Brazil

stolfidccunicampbr lhflnccbr

May

Contents

Intro duction

Approximate computations

Error mo dels

Floatingp ointnumber systems

The IEEE oatingp oint standard

Intro duction

Intervals

Computing with IA

Sp ecic op erations

Utility op erations

The error explosion problem

Avoiding error explosion

Ane arithmetic

Ane forms

Joint range of ane forms

Sp ecial ane forms

Conversions b etween IA and AA

Computing with AA

Ane op erations

Nonane op erations

Optimal ane approximations

Square ro ot

The minrange approximation

Exp onential iii

iv Contents

Recipro cal

Multiplication

Division

The mixed AAIA mo del

Comparing AA and IA

Implementation issues

Optimization techniques

Hansens Generalized Interval Arithmetic

Some applications

Zeros of functions

Level sets

Ray tracing

Global optimization

Surface intersection

Bibliography

Chapter

Intro duction

Approximate computations

Many numerical computations esp ecially those concerned with mo d

eling physical phenomena are inherently approximate they will not

deliver the true exact values of the target quantities but only some

values that are in some sense near the true ones

Approximate numerical computation has been an essential to ol of

science and technology for several centuries The history of the eld is

actually as long as that of science itself the ancient Babylonians were

already computing recipro cals and square ro ots as truncated sexagesimal

a fractions One of Archimedess bestknown endeavors was

numerical approximation for The greatest mathematicians

of mo dern history such as Newton and Gauss were deeply concerned

with this eld The rst electronic computers were expressly designed for

numerical computing and that application is still an overriding concern

in the design of mo dern CPU chips

Error analysis

The dierence between a computed value and the true value of the

corresp onding quantity is commonly called the error of that computed

value

Some sources of error are external to the computation the inputs

mayhave b een contaminated by measurement error or missing data or

Intro duction

the computation may b e based on a simplied mathematical mo del that

later proves to b e inadequate

Other sources of error are internal due to the discrete nature of

digital computing to resource limitations on computing time storage

capacity or program complexity or to compatibility constraints such

as hardware oatingp oint formats and decimal inputoutput conver

sion These factors usually force the original mathematical mo del to b e

replaced by a discrete approximation with nite steps truncated

rounded arithmetic etc

The accuracy of numeric algorithms is notoriously hard to analyze

In practice it is often imp ossible or unfeasible to predict mathematically

the magnitude of the roundo and truncation errors hidden in the output

of a numeric program

In order to be truly useful every approximate numerical pro cedure

must be accompanied by an accuracy specication a statement that

denes the magnitude of the errors in the output values usually as a

function of the input values and their errors The accuracy sp ecication

must b e supp orted byanerror analysis a mathematical pro of that the

output errors ob ey the sp ecications

Unfortunatelyeven for relatively simple algorithms a rigorous error

analysis is often prohibitively long or dicult or b oth Moreover a

useful accuracy sp ecication often requires that the inputs satisfy a host

of prerequisitesthis matrix must be wellconditioned that function

must have b ounded derivatives these formulas should not overow etc

In practice these prerequisites are often imp ossible to guarantee or even

to check

As a consequence numerical algorithms are often put to use without

accuracy sp ecications much less a prop er error analysis Interpretation

of the results is left to the user who must rely on his intuition crude

tests or pure luck

This unfortunate state of aairs has even led to prejudice against

approximate numerical computing in contemp orary computer science

tsuccess in applications The eld is despite its imp ortance and eviden

generally p erceived by outsiders as sloppy and fuzzy and hence not

really precise and scientic and hence not a resp ectable part computer

sciencewhere as in mathematics there is no place for things that are correct

Approximate computations

A simple and common approach for estimating the error in a oating

point computation is to rep eat it using more precision and compare the

results If the results agree in many decimal places then the computa

tion is assumed to be correct at least in the common part However

this common practice can b e seriously misleading as the following sim

ple example by Rump shows Consider the evaluation of

f y x x y y y y xy

for x and y Note that x y and all co ecients in f

are exactly representable in oatingp oint b e it binary or decimal

that computing f in FORTRAN on a IBM S Rump rep orts

mainframe yield

f using single precision

f using double precision

f using extended precision

Since these three values agree in the rst seven places common practice

would accept the computation as correct However the true value is f

not even the sign is right in the computed results

Similar results can b e obtained with Maple

x y

evaluate in floating point

fyxxyyyyxy

f

evaluate in exact rational arithmetic

yyxy fyxxyy

f

show decimal equivalent

evalff

Selfvalidated numerics

In resp onse to this problem there have arisen several mo dels for self

validated computation SVC also called automatic result verica

tion in which the computer itself keeps track of the accuracy of

Maple is a registered trademark of Ontario Inc

Intro duction

computed quantities as part of the pro cess of computing them So if

the magnitude of the error cannot b e predicted at least it can b e known

a posteriori

The simplest and most p opular of these mo dels is R Mo ores interval

arithmetic which we study at length in Chapter There are

many other mo dels however such as E Hansens generalized interval

arithmetic and the el lipsoid calculus of Chernousko Kurzhanski

and Ovseevich In Chapter we describ e our own mo del similar

to Hansens whichwecallane arithmetic

A basic limitation of selfvalidated numerical mo dels is that they can

only determine the output errors a posteriori Still this capability is

quite sucient in certain applications esp ecially those where the errors

are mostly due to external causes If the output errors as computed by

the mo del are deemed to o large then the resp onse must involve some

external actionacquire more data stop the pro cess alert a human

op erator etc

In cases where the output errors are mainly due to internal approxi

mations and rounding a selfvalidated mo del makes it p ossible to auto

matically redo or rene the computation with increased precision until

the output errors are acceptable This approach has b een turned into a

fundamental principle in lazy real arithmetic

Of course there are many applications such as realtime pro cess

control where one needs a priori guarantees on the accuracy and time

liness of the results In such cases the selfvalidated approach is not

of much help one still needs to p erform a rigorous error analysis prior

to implementation in order to guarantee that the output errors will b e

always acceptable

Error mo dels

For this monograph we will start from a very general mo del of self

validated computation We assume that the original goal was to evaluate

m

n

z f x for some mathematical function f R R but we actually

had to implementa discrete computation Z F X where X and Z

are approximate values discrete mathematical ob jects that carry only



This would allow the correct result to b e obtained in Rumps example using only

oatingp oint arithmetic

Error mo dels

partial information ab out the values of the corresp onding continuous

quantities x and z

There are many dierent selfvalidated computation mo dels that t

this pattern They are distinguished by the nature of the approximate

values they use probability distributions intervals boxes ellipsoids

polytopes condence intervals lazy digital expansions interval bags

and many more

Probabilistic error mo dels

In the natural sciences and engineering it is customary to view approxi

mate values in a statistical sense the computed result Z is seen to dene

a for the unknown true quantities z z

n

Typically the errors are assumed to follow a Gaussian normal dis

tribution The computed result Z should then sp ecify the mean and

variance of each true quantity z and p ossibly the full covariance ma

i

trix for the z that is the joint Gaussian probability distribution of

i

the vector z z

n

In this framework a selfvalidated computing mo del should automat

ically compute the statistical parameters for the output quantities z

i

given those for the inputs x

j

Unfortunately this mo del is limited to relatively simple situations

where the measurement errors can be mo deled by a Gaussian distri

butions and the computations use only linear formulas with negligible

roundo errors When these conditions do not hold computing the prob

ability distribution of the error or even its mean and variance app ears

to b e an intractable mathematical problem

mo dels Rangebased

Toavoid the apparent limitations of the probabilistic mo del most self

validated numerical mo dels are based on range analysis ie use ranges

rather than distributions as approximate values

Sp ecically the approximate value Z denes a range Z for the

quantity z ie a set of real values that is guaranteed to contain the true

value of z provided that the input quantities x lie in the range X

sp ecied by the input value X We refer to this prop erty as the funda

mental invariant of range analysis see Section

Intro duction

In the simplest range analysis mo dels such as interval arithmetic

each comp onent Z of the computed value denes a range of values Z

i i

for the corresp onding real quantity z The output Z do es not include

i

any constraint relating two or more quantities z in other words the

i

range Z for the output vector z z z is merely the Cartesian

n

pro ducts of those individual ranges Z Z Z All combinations

n

Z Z are in principle allowed by this mo del of z z in the b ox

n n

In more sophisticated mo dels such as ane arithmetic and the el

lipsoid calculus the output Z includes also some information on par

tial dep endencies between the quantities z and the inputs x Thus

i j

the computed result Z may give more information ab out the vector z

than one would get by considering its meaning for each z indep en

i

dently Therefore the Z together dene a joint range for the vector

i

z z x x which is generally a prop er subset of the Cartesian

n m

pro duct of the individual ranges

Some mo dels fall halfway between these two extremes Hansenss

generalized interval arithmetic for example records only correla

tions b etween the output quantities z and the inputs x but not among

i j

the inputs or among the outputs see Section

For eciency reasons each rangebased SVC mo del restricts its ranges

nm

to a sp ecic family R of subsets of R whose members can be

nm

eciently represented handled and combined boxes ellipsoids p oly

top es etc

ariant of range analysis The fundamental inv

Whatever the shap e of the allowed ranges all rangebased SVC mo dels

m n

provide for every function f R R a range extension F R

m

R with the following prop erty which we shall call the fundamental

n

invariant of range analysis

If the input vector x x lies in the range jointly deter

m

mined by the given approximate values X X then the

m

quantities z z f x x are guaranteed to lie in the

n m

range jointly dened by the approximate values Z Z

n

F X X

m

Ideally the joint range determined by the outputs Z should b e as tight

i

as p ossible namely the set of all vectors f x x such that x X

m j j

Error mo dels

In practice however a rangebased numerical routine is allowed to err

on the conservative side if that is necessary to keep the output ranges

representable or desirable for eciency reasons We shall see plentyof

examples in the following chapters

Relative accuracy

As we shall see amajor problem in all forms of rangebased computa

tion is the excessive conservativism of the resultsthe computed ranges

are often much wider than necessary Therefore in order to eectively

compare dierent algorithms and approaches we must develop some

quantitative measure of this conservativism

Let Z F X be a range computation that purp orts to represent

m n

the mathematical computation z f x where f R R Its

relative accuracy is by denition the ratio b etween the size of the ideal

range Y f f x x X g and that of the computed range Z By

size we mean the measure appropriate to the space in question length

for one dimension area for twoandsoon

Because of the fundamental invariant of range analysis the relative

accuracy is therefore a number between zero meaning that the output

range is innitely wider than the ideal range and one meaning that

the two ranges are essentially the same

measure then the relative Note that if the input range has zero

accuracy is likely to b e zero b ecause of roundo errors Therefore the

concept is useful only when X is large enough to make roundo errors

irrelevant

Condencerange mo dels

The fundamental invariant stated ab ove requires that the input values

lie in the range describ ed by the X This requirement is to o strict

i

for scientic and engineering applications where the input quantities

are obtained by physical measurement and hence may be aected by

essentially unb ounded error

In order to use range analysis in such applications wemust replace

absolute guarantees by probability statements That is each approxi

mate value Z sp ecies a condence range for the corresp onding quan

tity z a range of values Z as b efore and also a real number p the Z

Intro duction

probabilityor condence level of z lying in that range

More generally an ensembleofapproximate values Y Y Y for

k

k

quantities y y sp ecies a subset of R and an asso ciated condence

k

level p the probabilityof the vector y y lying in that set Note

Y

k

that this approach is quite dierent from the Gaussianerror mo del here

no assumption is made ab out the shap e or variance of the probability

distribution except that its integral inside the range Y isatleastp

Y

In this condenceinterval framework an SVC mo del should tell us

howto compute the jointrange and condence level for the outputs of

a formula giv en the same data ab out the inputs Again the mo del

is allowed to err on the conservative side when computing ranges and

probabilities

Most of the ordinary ie nonprobabilistic rangebased SVCmod

els can b e easily adapted to the condencerange interpretation Sp ecif

ically one should compute the ranges as in the nonprobabilistic mo del

and then evaluate the condence level according to the laws of proba

bility For example if x lies in the interval with probability

and y lies in with probability then x y lies in

with probability at least In general we have

p p p

x y

f xy

Floatingp oint number systems

A oatingpoint number system is a scheme for representing real numbers

in discrete machines Toallow a wide range of real numbers to be

represented oatingp ointnumb er systems enco de a fraction part called

mantissa and a scale part called exponent More precisely a oating

point number system has a base usually or and enco des real

numb ers as adic fractions of the form

d d d

p

e e

d d

p

p

where the mantissa m d d is written in base and e is

p

the exp onent A oatingp oint number system is characterized by the

base the precision p and the exponent range e e e Most

min max computers use base whereas most hand calculators use base

The IEEE oatingp oint standard

Figure shows a oatingp oint system with p e

min

and e Note that oatingp ointnumb ers are not uniformly

max

spaced but instead display logarithmic clustering

Figure Nonuniform distribution of oatingp ointnumb ers

There are two sources of errors when representing real numbers in

oatingp oint First a oatingp oint number system only represents a

nite set of numb ers Thus most real numb ers will b e either to o large

in absolute value to b e represented resulting in overow or to o small in

absolute value resulting in underow Second not every real number is

a adic fraction and so most real numb ers in the range of the oating

pointnumb er system will fall b etween two oatingp ointnumb ers and

one of them has to be chosen to represent it This choice is called

rounding and the error committed is called the roundo error

For example decimal fractions suchasor are very p opular

as a step sizes in numerical computation but have no exact represen

tation in binary oatingp oint systems Thus in binary oatingp oint

arithmetic and b ecause of roundo errors

It would b e much safer to use diadic fractions instead suchas or

sp ecially in long computations

The IEEE oatingp oint standard

The IEEE oatingp oint standard is arguably one of the most sig

nicantdevelopments in numerical computing since the adventofFOR

TRAN Until the widespread adoption of the IEEE standard in the early

s every computer manufacturer and often every computer mo del

had its own oatingp ointnumb er system with its own base precision

yoverow un range and its own semantics for rounding commutativit

derow division by zero etc Moreover those rules were usually illogi

cal and p o orly do cumented b eing generally the consequence of decisions

made bythe hardware designers who were more concerned with sp eed and cost than with mathematical precision

Intro duction

The IEEE standard ended this era of confusion The standard

p ostulates quite rigid formats for singleprecision bit and double

precision bit oatingp oint formats To be precise there are two

IEEE oatingp oint standards ANSIIEEE Std and IEEE

Std The rst one deals sp ecically with bit and bit bi

nary formats whereas the second covers oatingp oint systems of any

base and precision Since most machines follow b oth standards wecan

safely view them as one

The standard also ties down precisely the semantics of the four basic

arithmetic op erations and of certain common transcendental functions

by requiring that they b e as correct as logically p ossible That is hard

ware conforming to the IEEE standard must interpret the op erands as

rational numb ers compute the exact result as in mathematics and then

round it to the nearest representable number in a sp ecic direction

ver the standard provides control over rounding a feature that Moreo

is essential to SVC see Section Finally the standard sp ecies

precisely the results of exceptional op erations such as division byzero

overow and underow To this end it reserves certain bit patterns to

denote two innite values and and a series of error co des

or notanumbers NaN and extends the semantics of all op erations

to accept and return these sp ecial values

Essentially the standard do es not leave unsp ecied a single signi

cant bit of the oatingp oint mo del Thanks to this unforgiving strict

ness every oatingp ointnumber that can be represented in the IEEE

standard format can b e stored in any standardcompliantmachine with

out loss of precision Moreover any standardcompliant pro cessor that

p erforms the same sequence of oatingp oint op erations on the same

data will return precisely the same result down to the last bit This

provided a muchwelcome p ortabilityofnumerical programs and data

The IEEE standard is so thorough and lled such a need that it

was quickly adopted by most computer manufacturers down to the last

bit Likeevery standard this one has several technical aws whichare

all the more irritating for having b een cast in stone for decades to come

Still as in most other elds a bad standard is b etter than no standard

Besides all its practical contributions to p ortability robustness and

do cumentation the IEEE standard has had an enormous psychological

impact on the programming community Suddenly it b ecame worth

The IEEE oatingp oint standard

while to worry ab out roundo errors in a precise way It b ecame p os

sible at least in principle to design truly robust numerical algorithms

and give rigorous pro ofs of their correctness even in the presence of

underow and overow Thus the IEEE standard prepared the wayfor

selfvalidated computation

Sp ecial oatingp oint values

As mentioned ab ove the IEEE oatingp oint standard denes in addi

tion to ordinary nite numb ers certain sp ecial values

the innities and whose meaning and prop erties are for

the most part obvious

the notanumber values collectively denoted by NaN which are

the conventional result of indeterminate op erations like

and

negative zero which by denition is the recipro cal of the

and viceversa

As we shall see the innities and are very useful for self

validated computation since they allow us to represent the notion of

no lower b ound and no upp er b ound resp ectively

The NaN values behave rather p eculiarly in comparisons a NaN is

neither greater than equal to nor less than any other valueincluding

itself A NaN may signify either as no value more than one value

or anyrealvalue dep ending on the context Since manySVC mo dels

have other ways of representing these concepts NaN values tend to be

little used

We shall use the term oat for any IEEE oatingp oint value

nite innite or NaN We shall denote by F the set of all nite oats

whichwe shall consider as a subset of the real line Randby F the set

of numeric oats F F f g Note that NaN do es not b elong

to either F or F

Negative zero

One of the most controversial features of the IEEE standard is the ex

istence of a negative zero While it is p ossible to conco ct

Intro duction

examples where this feature saves an instruction or two in the vast ma

jority of applications this value is an annoying distraction and a p ossible

source of subtle bugs

Unlike innite values which merely extend the domain of arithmetic

op erations without changing their semantics for ordinary numb ers the

intro duction of negative zero actually aects the semantics of manyop

erations in nonobvious and mathematically inconsistent ways For in

stance the square ro ot of negative zero is dened to b e negative zero

Fortunately negative zero behaves like ordinary zero in many re

sp ects In particular in numeric comparisons negative zero turns out

to b e equal to and not less than ordinary zero Thus we can usually

pretend that ordinary zero and minus zero are the same value and we

shall adopt this viewp oint here Howev er one must watch out for o cca

sional pitfalls for instance IO routines will usually print negativezero

as and its sign comes out as instead of

Rounding mo de control

A feature of the IEEE standard that is highly relevant to SVC is its

provision for rounding control A standardcompliant pro cessor must

allow the programmer to sp ecify the direction in which computed results

are rounded to representable numb ers As we shall see this feature is

essential for the ecient implementation of interval arithmetic and other

selfvalidated computation mo dels

In this monograph weusethe notation hE i for the value of expres

sion E evaluated in IEEE oatingp oint arithmetic with the default

rounding mo de to nearest or to even We also write E for a nu

meric oat p ossibly that is greater than or equal to the value of a

formula E that is the value of E rounded up to a representable number

not necessarily the smallest one Similarlywe write E for the value

of E rounded down to a representable value

In the sp ecial case when E consists of a single arithmetic op eration

E and E are by denition the result of computing E on an IEEE

compliant pro cessor with the rounding direction set as sp ecied If E

contains two or more op erations then eachmust b e rounded in the ap

propriate direction so as to ensure that the nal result is rounded as

x x

the denomi sp ecied For example when evaluating x x

nator must be rounded towards whereas the numerator and the

The IEEE oatingp oint standard

quotientmust b e rounded towards

Sometimes the correct rounding mo de to use when evaluating a sub

expression dep ends on the values of other subexpressions For example

x x

in xy the result of y must be rounded up if x is p ositive

and down if x is negative We shall generally avoid such complicated

situations and use external tests to ensure that each op eration inside a

or can b e evaluated with a single statically determined rounding

mo de

Unfortunately the roundingmo de controls of most pro cessors are

extremely inconvenient to use so that changing the rounding mo de is

h exp ensivetypically a dozen machine instructions To minimize suc

changes we can use the identities

E E E E

These formulas work for all oat values b ecause according to the IEEE

standard sign negation is involutory exact and never overows

Single vs double precision

When implementing an SVCmodelonemayhavetocho ose b etween us

ing single precision bit or double precision bit Three decades

ago the dierence in sp eed and storage space dictated that most nu

merical computing should b e p erformed in single precision with double

precision b eing used only when really necessary Nowadays storage is

rarely a limiting factor and most oatingp oint pro cessors will b e just

as fast on bit op erations as on bit ones so the advantages of single

precision have all but disapp eared and double precision is increasingly

b eing seen as the default For instance the original denition of the C

programming language stated that all oatingp oint arithmetic was to

b e done in double precision p

Moreover many mathematical libraries and programming languages

will automatically convert arguments and results to the doubleprecision

format Now another questionable feature of IEEE denormalized

numb ers has the unfortunate consequence of making conversion be

tween single and double precision very exp ensive on certain machines

b ecause it has to b e done by software Therefore it is quite p ossible in

the authors own exp erience for a numerical computation to b ecome

much faster when converted from single to double precision

Chapter

Interval arithmetic

In this chapter we describ e interval arithmetic the simplest and most

ecient of all validated numerics mo dels We also discuss how to write

pro cedures for most elementary op erations and functions

Intro duction

Interval arithmetic IA also known as interval analysis is a range

based mo del for numerical computation where each real quantity x is

represented byaninterval x of oatingp ointnumb ers Those intervals

are added subtracted multiplied etc in such a way that each com

puted interval x is guaranteed to contain the unknown value of the

corresp onding real quantity x

Interval arithmetic was invented in the s by Ramon E Mo ore

then a Ph D student at Stanford University Its prestige and

p opularity among the community has b een somewhat

of a rollercoaster ride Interest in IA was quite high for a few years af

ter Mo ores thesis at which time it seem to have been oversold as a

panacea for all numerical computation problems As the euphoria sub

sided a reaction set in and for the next two decades IA was viewed very

negativelyto suchanextent that authors who needed to use intervals

in their algorithms rep ortedly had to call them segments or ranges

in order to get their pap ers published

However in recent years there has been a strong and steady resur

gence of interest in IA Practitioners and researchers in the most varied

Interval arithmetic

eldsfrom pure mathematics to computer graphics to economics

have found that IA provides a simple and relatively ecient solution

to computational problems that were intractable under the classical

approach IA is now appreciated for its ability to manipulate imprecise

data keep track automatically of truncation and roundo errors and

prob e the b ehavior of functions eciently and reliably over whole sets

of arguments at once

Successful applications of IA include for example robust ro ot nders

for ray tracing domain enumeration for solid mo deling

surface intersection global optimization

We will discuss some imp ortant applications of IA in

Chapter It is also noteworthy that interval arithmetic recently played

ey role in settling the double bubble conjecture a longstanding a k

op en problem in the theory of minimal surfaces

This revival of IA was greatly help ed by the publishing and universal

acceptance of the IEEE oatingp oint standard The standard man

dated the implementation of directed rounding which is indisp ensable

for practical implementations of IA Moreover adoption of the standard

by all ma jor computer manufacturers encouraged the development of

p ortable IA packages Finally the high visibility of the

standard undoubtedly made programmers more aware of the oating

point roundo problem and hence interested in selfvalidated numeric

computation

Interval arithmetic and related techniques nowhave a dedicated jour

nal Reliable Computing formerly Interval Computations published by

Kluwer Academic Publishers a central web site containing a wealth of

information and links and several established conferences

Intervals

In interval arithmetic each quantity x is represented by an interval

x hi ofrealnumb ers meaning that the true value of x is x xlo

known to satisfy x lo x x hi

Those intervals are added subtracted multiplied etc in sucha way

that each computed interval is guaranteed to contain the unknown

value of the quantity it represents Thus for example the sum and

Intervals

dierence of two intervals x and y is computed as

x y xlo ylo x hi yhi

x y xlo y hi x hi y lo

These formulas ignore roundo errors overow and other details which

we address in Section

The reader maycheck that these are the smallest intervals that con

tain x y and x y resp ectively for all p ossible pairs x x and y y

Analogous formulas can be devised for multiplication division square

ro ot and all common mathematical functions see Section

Precise denition of intervals

In the spirit of SVC b efore pro ceeding any further wemust dene very

precisely the representation and semantics of intervals in the IA mo del

Accordinglywe dene a nonempty interval as a set of the form

xlo x hi f x R x lo x x hi g

where x lo the lower bound of the interval is in F fg and x hi

the upper bound is in F fg

We also dene the empty interval as synonymous of the empty set

The upp er and lower b ounds of are not dened

Note that every nite oat x can b e represented as an interval x

x Every other real numb er can b e approximated byaninterval a b

where a and b are consecutiveoatvalues p ossibly innite

Note that the bounds of an interval are oat values p ossibly innite

but its elements are drawn from the nite real numbers R Thus for

p

example the interval includes and and but not

is the same as R the set of all nite In particular

real numb ers

The obvious way to represent a nonempty interval x in the com

puter is by record with two oat comp onents x lo and x hi The empty

interval could then be represented by any such pair with x lo x hi

We shall use sp ecically b ecause this choice simplies the

implementation in some cases

Since and are not real numb ers the pairs withx lo xhi

or x lo x hi would denote the empty set to o However

Interval arithmetic

it is advisable to outlaw these two pairs altogether so that the common

test x can b e consistently implemented as x lo x hi

In summary a pair xlo x hi represents a nonempty interval if

x lo F fg x hi F fg and x lo x hi or the empty

interval if x lo x hi The pairs

NaN NaN a NaN NaN a

are not valid intervals for any oat a

Wesay that an intervalx stradd les arealnumber z whenx lo z

x hi When x lo z or x hi z and the interval is not empty wesay

that x merely touches z

Computing with IA

For every op eration f x y from reals to reals such as sum pro duct

square ro ot etc the interval arithmetic mo del denes a corresp onding

interval extension f x y This op eration returns some interval

preferably the smallest onethat contains all values of f x y where

the variables xy range indep endently ov er the given intervalsx y

For elementary op erations implementing these interval extensions is

generally straightforward we need only devise formulas for the maxi

mum and minimum values of f when the arguments x y vary inde

pendently over sp ecied intervals Often a case analysis is required

For certain functions determining the exact maxima and minima

may be too dicult In such cases it is acceptable to return any com

putable interval that contains the theoretical range of the function not

necessarily the smallest one That is we are allowed to increase the

upp er b ound and decrease the lower b ound doing so do es not violate

ariant of range analysis Section the fundamental inv

Once we have implemented interval extensions for all elementary

op erations and functions interval extensions for a complicated function

can b e computed by comp osing these primitiveformulas in the same way

the primitive op erations are comp osed to compute the function itself In

other words any algorithm for computing a function using primitiveop

erations can b e readily and automatically interpreted as an algorithm

for computing an interval extension for the same function This is sp e

cially elegant to implement with programming languages that supp ort

Computing with IA

op erator overloading suchasAda C Fortran and PascalXSC

but can be easily implemented in any programming language either

manually or with the aid of a precompiler Thus the class of functions

for which interval extensions can be easily and automatically com

puted is much larger than the class of rational p olynomial functions

This proves the fundamental invariant of range analysis for IA

Handling roundo errors

A common reason for widening the result interval is the need to rep

resent the endp oints as oatingp oint values In order to preserve the

fundamental invariant we must be careful to round each b ound in the

prop er direction namely the lower b ound must be rounded towards

and the upp er b ound towards

This concern also applies to anyintermediate values that may aect

the computed b ounds Suchvalues must always b e rounded in the most

conservative directionthe one which leads the resulting interval to b e

widened rather than narrowed In particular we can always replace the

input intervals by wider ones See Section for an example where

this action is necessary

Handling overow

In some op erations wemust also worry ab out the p ossibilityofoverow

when computing the extrema of f in the given interval Fortunately in

IEEEcompliant oatingp oint arithmetic overow generally pro duces a

sp ecial innityvalue with the appropriate sign so that we do not need

those cases explicitly Thus if overow o ccurs the resulting to handle

interval will automatically extend to innity in either or b oth directions

and the fundamental invariant will b e preserved

However the IEEE standard also sp ecies that certain op erations

such as or and result in the sp ecial nota

numb er value NaN Recall that we decided in Section to forbid

intervals with NaN endp oints b ecause of its ambiguous meaning and

bizarre prop erties Therefore whenever an op eration might return NaN

we must test for that event and return either R or

as appropriate The reason whywe outlawed the intervals

and is precisely to reduce the need for such tests

Interval arithmetic

Handling domain violations

When implementing an elementary function such as square ro ot and

logarithm which is dened only for a prop er subset of the real line one

should simply ignore any part of the argument interval that is outside

domain of denition

Thus for example the square ro ot routine when given

should simply return It would not b e appropriate to signal an

error in this case b ecause the argumentinterval is merely a conservative

estimate of the true range of the corresp onding quantity The fact that

the interval extends into the negative values do es not imply that the

quantitymaybenegative

This soft p olicy towards undened values may seem to violate the

fundamental invariant of range analysis After all do es not

p

contain all p ossible values of x when x ranges over some of

those values are undened or imaginary However if an algorithm

p

says to compute x at some point and exp ects a real result then the

squarero ot routine must assume that the true value of x would always

b e p ositiveinany exact evaluation of the algorithm If the true value of x

could be negative at that point then the algorithm would be logically

incorrect Now the IA mo del cannot guarantee that the nal intervals

are correct if the exact algorithm is not correct otherwise

would b e the only valid output

On the other hand if the argument interval to an IA op eration is

entirely outside the domain of denition of the corresp onding function

then something is clearly wrong with the program In that case the IA

routine should probably signal an error

Another alternativeinsuch cases is to use a sup ersoft p olicyand

return the emptyinterval It is then the programmers resp onsibility

to test whether the result is and take action if necessary In that

case for consistencyevery IA op eration should return whenever one

or more op erands are

Early detection vs soft failure

The choice b etween signalling an error and returning is a sp ecial case

of a classical dilemma of software engineering the tension b etween early

detection versus soft failure

Sp ecic op erations

The early detection p olicy requires that exceptional conditions

such as endofle or division by zero be treated as immediate breaks

in the control ow which require explicit handling One advantage of

this p olicy is that programming errors that derive from or cause such

conditions tend to b e detected so oner and hence are easier to debug It

also has the merit of forcing the programmer to b e aware and handle all

exceptional cases

The soft failure p olicyincontrast implies that exceptional condi

tions should b e handled as ordinarily as p ossible namely by enco ding

them as distinguished values that can b e returned and assigned likeany

other value This approach usually simplies the co de that follows pro

cedure calls since it is not necessary to handle the exceptional results

explicitly On the other hand under this policy one often needs extra

tests at pro cedure entry to recognize and handle the exceptional values

ware engineering exp erts seem still divided on this issue The de Soft

signers of the IEEE standard avoided taking a stand on this matter they

provided innities and NaNs in accordance to the softfailure approach

but also allowed the programmer to sp ecify whether the creation of such

values should cause an error trap as required for early detection

Sp ecic op erations

We now describ e in detail how to compute interval extensions for the

elementary op erations and functions

We shall use a Pascallike syntax Iteration and branching will be

sp ecied with for while and if commands with obvious semantics

However command grouping will be indicated by indentation alone

without the begin end brackets of Pascal Comments app ear in

y Assignment statements will b e written variable italics guarded b

value Variables will b e declared by var name TypeasinPascal but

declarations may app ear at the b eginning of any comp ound statement

as in C or Algol The typ e Float stands for any IEEE oatingp oint

value except NaNandFinite is Float nf g

For actual implementations of interval op erations see the public do

main libraries

Interval arithmetic

Negation

We b egin with negation b ecause of its simplicity

IAnegx Interval Interval

Computes x

return x hi x lo

Note the reversal of the upp er and lower b ounds Note also that

negation of intervals unlike almost any other op eration is not aected

by roundo or overow Note nally that the co de ab ove works even

for the sp ecial intervals R and

Addition

The co de for addition is straightforward except that wemust explicitly

return if either argumentis

IAaddx y Interval Interval

Computes x y

if x or y then

return

else

return x lo ylo x hi yhi

The reader may want to check that this algorithm works even for

intervals with one or two innite b ounds Note that a and

b imply a b even when a b overows the nite

oatingp oint range and similarly for a b and Therefore the

algorithm ab ove cannot return the forbidden intervals

as long as they are not given as arguments and

One might think that the initial tests for could be avoided if

were consistently represented by since the general formula

would then give

y y lo y hi

as desired Unfortunately this simplication would fail when adding

to R b ecause is NaN So the tests seem

unavoidable

Sp ecic op erations

Translation

A common sp ecial case of addition is translation of an interval by a nite

oat c whichmay b e regarded as an interval of zero width There is no

reasonable way to extend this op eration for innite values of cbecause

a b is whichisnota valid interval

IAshiftx Interval c Finite Interval

Computes x c

if x then

return

else

return x lo c x hi c

Subtraction

To subtract twointervals we merely add the rst to the negation of the

second Combining the two op erations into a single pro cedure weget

IAsubx y Interval Interval

Computes x y

if x or y then

return

else

return x lo y hi x hi y lo

Scaling

Scaling an interval by a p ositive factor is straightforward scaling by a

negative factor requires swapping the b ounds Scaling by of course

should result in the degenerate interval Wemust handle this case

explicitlyincase the interval has innite b ounds b ecause NaN

in IEEE arithmetic As in the case of translation there is no reasonable

way to dene scaling when c is innite

Interval arithmetic

IAscalex Interval c Finite Interval

Computes c x

if x then

return

else if c then

c x hi return c x lo

else if c then

return c x hi c x lo

else

return

Multiplication

For the multiplication routine we need formulas for the maximum and

minimum of xy when the pair x y ranges over a rectangle xlo

x hi ylo y hi The key observation here is that for a xed value

of x the pro duct xy is linear hence monotonic in y and viceversa It

follows that the extrema must o ccur at corners of the rectangle

The simplest implementation is thus

IAmulx y Interval Interval

Computes x y naive version

if x or y then

return

else if x or y then

return

else

a min fx lo y lo x lo y hi x hi y lo x hi y hi g

b max f x lo y lo x lo y hi x hi y lo x hi y hi g

return a b

Note that we must handle separately the cases where one of the

op erands is in case the other one has innite b ounds which

would lead to NaN b ounds in the result

Sp ecic op erations

This routine requires eight multiplications in all nontrivial cases

However we can reduce this to only two multiplications in most cases

and four in only one case by testing the signs of the op erands in order

to determine which corners yield the maximum and minimum pro duct

There are nine main cases to consider dep ending on whether each

interval is entirely nonnegative entirely nonp ositive or straddles zero

see Figure If either x or y has consistent sign then the two

extremal corners are determined by the sign combinations alone If

both intervals straddle zero then the signs alone do not suce we

must evaluate the pro duct at all four corners and compare the results

Figure The nine cases for multiplication p ossible maximum 

p ossible minimum

Interval arithmetic

IAmulx y Interval Interval

Computes x y

if x or y then

return

or y then else if x

return

else if x lo then

if y lo then

return x lo y lo x hi y hi

else if y hi then

x lo y hi return x hi y lo

else

return x hi y hi x hi y lo

else if x hi then

if y lo then

return x lo y hi x hi y lo

else if y hi then

return x hi y hi x lo y lo

else

return x lo y hi x lo y lo

else

if y lo then

return x lo y hi x hi y hi

else if y hi then

x lo y lo return x hi y lo

else

a min fx lo y hi x hi y lo g

b max f x lo y lo x hi y hi g

return a b

Recipro cal

The recipro cal function x is not dened for x Hence as discussed

in Section the IA implementation must implicitly exclude that

argumentvalue from the input interval x

The algorithm has two main cases dep ending on whether the input

interval x straddles zero or not In the rst case the recipro cal may

Sp ecic op erations

assume arbitrarily large and arbitrarily small real values hence the cor

rect interval result must be R In the second case the

recipro cal is monotonic so we only need to evaluate it at the interval

endp oints

However input intervals with zero b ounds must be handled sepa

rately The IEEE standard denes asand but

we cannot trust that zero b ounds in the argument will have the right

sign due to the existence of negative zero

IAinvx Interval Interval

Computes x

if x then

return

else if x lo and x hi then

return

else if x then

return

else

y y

if x hi then a else a x hi

x x

if x lo then b else b x lo

b return a

x x

y y

It should be noted that x hi may be and x lo may

be even when the denominators are nite Fortunately these p os

sible overows will not aect the correctness of the result

Division

The division of x by y could b e implemented in IA as the pro duct of x

by y However we may gain a couple of bits of accuracy by co ding

a sp ecial routine for division For example the latter should b e able to

without any roundo error whereas evaluate

the recipro cal of would intro duce some rounding error

In fact if the b ounds ofy are to o close to zero then their recipro cals

may overow leading to an innite range for x y even when the

quotientmay still b e nite

Like multiplication and recipro cal the division algorithm must be

broken down into several distinct cases dep ending on the signs of the

Interval arithmetic

op erands The co de structure is a bit simpler than multiplication how

ever b ecause in the three cases where y straddles zero the result is

simply the entire real line

IAdivx y Interval Interval

Computes x y

if x or y or y then

return

else if x then

return

else if y lo then

if x lo then

x x

y y

return x lo y hi x hi y lo

else if x hi then

x x

y y

x hi y hi x lo y lo return

else

x x

y y

return x lo y lo x hi y lo

else if y hi then

if x lo then

x x

y y

return x hi y hi x lo y lo

else if x hi then

x x

y y

x lo y hi return x hi y lo

else

x x

y y

return x hi y hi x lo y hi

else

return

Square ro ot

The IA version of square ro ot is extremely simple b ecause the function

is monotonic and is not liable to overow or underow for any nite

arguments The only sp ecial precaution we must take is to remove the

negative part of the input range if any using the sup ersoft p olicy describ ed in Section

Sp ecic op erations

IAsqrtx Interval Interval

p

Computes x

if x or x hi then

return

else if x lo then

x x

p

x hi return

else

x x

p p

return x lo x hi

y y

Logarithm

The co de for logarithm is quite similar to that of square ro ot except

that log is not dened at zero and tends to at its right

IAlogx Interval Interval

Computes logx

if x or x hi then

return

else if x lo then

x x

return log x hi

else

x x

y y

return log x hi log x lo

This co de can b e used for computing logarithms in any base Note

that we do not need to handle the case x lo separately b ecause

log is in IEEEcompliant platforms

Exp onential

x

The exp onential function expx e is also monotonic and dened

everywhere

Interval arithmetic

IAexpx Interval Interval

Computes expx

if x then

return

else

x x

y y

expx hi return expx lo

The oatingp oint evaluation of exp x will overow for values of x

ab ove a few hundred Ideally these overows should not require sp e

x x

y y

cial handling in the co de expx hi should b e and expx lo

should be the maximum nite value in which case the interval result

would b e correct

Sine and cosine

There are several features of the trigonometric functions sin and cos

that require sp ecial attention For one thing they are nonmonotonic

therefore when computing their extremal values in some interval we

must consider their lo cal maxima and minima as well as the interval

endp oints

The lo cal extrema of cos occur at integer multiples of Thus

after disp osing of the empty case our rst step is to scale the input

interval x by If the resulting range straddles an even integer then

the interval x contains a maximum and we can set the upp er b ound

of the result to Symmetrically if the scaled range straddles an odd

integer then the lower bound is If only one of these conditions

holds then we must compare the values of cos at the endp oints of x

in order to determine the other b ound for the result Finally if the

scaled interval straddles no integers then cos is monotonic increasing

or decreasing in the original interval x and the extremal values occur

at x lo and x hi

take into account all roundo errors In Here as always we must

particular since is not exactly representable as a oat value wemust

treat it as an interval lo hi of nonzero width where lo and

hi are consecutive oats

Sp ecic op erations

IAcosx Interval Interval

Computes cosx

if x then

return

else

Scale x by

y y y y

if x lo then a x lo hi else a x lo lo

x x x x

if x hi then b x hi lo else b x hi hi

odd and even integers in a b Check for

m bac

n dbe

if n m then

x hi There are no extremal values in xlo

if evenm then

u cosx lo v cosx hi

else

u cosx hi v cosx lo

else if n m then

At most one extremal value in xlo x hi

if evenm then

u v max fcosx lo cosx hi g

else

u min f cosx lo cosx hi g v

else

x hi Thereseem to be maxima and minima in xlo

u v

return u v

The notation b ac means as usual the greatest integer not greater

than a In order to avoid integer overow which in most machines is

either fatal or hard to detect this quantitymust b e computed entirely

in oatingp oint with the floor function from the standard C math

libraryorequivalent That way the op eration m bac will return the

correct result without rounding even if j aj is very large or the exp onent

of b ac is greater than that of a Similar remarks apply to d be

The logic b ehind this algorithm is somewhat subtle Note that a

and b may b e aected by roundo errors and so n m do es not

Interval arithmetic

imply that the original interval x actually straddles any lo cal maxima

or minima Nevertheless we can safely assume that it do es and use the

parityofm to decide whether this presumed extremum is or A

similar observation applies to the case n m Here we are relying

on an imp ortant principle of interval arithmetic in a correctly imple



mented IA op eration replacing any argument x by another interval x

that contains x will not aect the validity of the result

The co de works even when x is innite in one or b oth directions

Thecodeforsinisentirely analogous except that the lo cal extrema are

Thus wemust subtract from a rounding down and shifted by

from b rounding up

Other elementary functions

The preceding examples should oer sucient guidance for the reader to

implementany other elementary function f in IA assuming the availabil

ity of a routine that computes f in oatingp oint with directed rounding

When such a routine is not available however the task b ecomes

much harder Implementing say arctan x with directedrounding and

lastbit accuracy requires far more work than most numerical program

mers can spare

Still if one has an algorithm F x that computes f xwithknown

error b ound x then one can simulate crudely the desired directed

x x

y y

rounding pro cedure by computing F x x or F x x as

appropriate

Utility op erations

We will now describ e some useful IA op erations that are sp ecic to

intervals rather than mere interval extensions of ordinary op erations

Midp oint

The midpoint of an interval is a nite oat value contained in the inter

val and as close as p ossible to its center x lo xhi By denition

the midp oint of a semiinnite interval is either M or M where

M MaxFloat is the maximum nite oatingp oint numb er and the

yconvention midp ointof R is b

Utility op erations

Because of p ossible overow we must divide each endp oint by

b efore adding them Because of p ossible underow in the division we

must round each term in a dierent direction

IAmidx Interval Float

Computes the approximate midpoint of x Assumes x R

if x or x R then

return

else if x lo xhi then

return x lo

else if x lo then

return MaxFloat

else if x hi then

return MaxFloat

else

x x

y y

return h x lo hix i

The divisions by are exact except for numb ers of very small mag

nitude when the quotientmay b e rounded by half a unit in the last bit

x x

y y

At worst the sum x lo x hi will dier from the exact mid

pointby half a unit in the last bit and therefore will lie in the original

interval x

The nal rounding of the sum may increase the error to of the

last bit so the returned result may not be the most accurate answer

p ossible In any case since rounding cannot cross over a representable

oat the result will still b e inside x

Radius

The halfwidth or radius of an interval is half of the dierence b etween

the upp er and low er endp oints rounded upwards As sp ecial cases the

halfwidth of unb ounded intervals is and that of is zero

The halfwidth of every b ounded interval is representable as a nite

oat This prop erty makes the halfwidth more useful than the total

width x hi x lo which may overow for some b ounded intervals

Another useful prop erty of the halfwidth is that it is zero if and only if

the interval is empty or contains a single p oint

Interval arithmetic

In order to maximize the usefulness of the halfwidth r we must

round it in such a way that the given interval x is contained in the

m r where m is the midp oint of x as computed interval m r

by IAmid The simplest waytoachieve this goal is to derive the half

width from the midp oint

IAradx Interval Float

Computes the halfwidth of x

if x or x lo xhi then

return

else

m IAmidx

return max fm x lo x hi mg

Thanks to the rules of IEEE arithmetic this co de will return

whenever x lo or x hi Recall that and

are not valid intervals

In practice since IArad and IAmid are often used together it may

b e convenienttocombine them into a single pro cedure that returns b oth

parameters

Meet intersection

The settheoretical intersection or meetoftwointervals x and y is an

other interval p ossibly empty denoted byx y orx y The intersection

is trivial to compute

IAmeetx y Interval Interval

Returns x y ie x y

if x or y or x lo y hi or x hi y lo then

return

else

return max f x lo y lo g min fx hi y hi g

If the internal representation of is anypaira b with ab then

the if test can b e eliminatedthe max min formula will automatically return when appropriate

Utility op erations

This op eration is typically used when we obtain by dierent lines of

 

reasoning or computation two ranges x and x that are b oth known to

 

contain some quantity x The intervalx x condenses that information

into a single interval

Join convex hull

The union of twointervalsx andy may not b e a single interval However

we can easily compute their join or convex hul l which is the smallest

interval x y that contains b oth

IAjoinx y Interval Interval

Returns x y

if x then

return y

else if y

return x

else

max fx hi y hi g return min fx lo y lo g

The tests for can b e omitted only if the empty interval is consis

tently representedas If pairs like are also allowed

to represent then they must be handled as sp ecial cases as shown

ab ove

The join op eration is often used when co ding the interval version of

a conditional algorithm If x is variable then a test like

if x

then y f x

else y g x

can often b e translated into

u f x

v gx

y u v

where f and g are the interval extensions of f and g

Interval arithmetic

Note in this example that the case x is incorrectly included in

b oth branches The rst statement should havebeen

u f x

but standard IA only allows for closed intervals However as discussed in

Section the computation f must tolerate the widening of

even if f is undened for x to

The error explosion problem

The main weakness of IA is that it tends to be to o conservative the

computed interval for a quantity may be much wider than the exact

range of that quantity often to the point of uselessness This problem

is particularly severe in long computation chains where the intervals

computed at one stage are inputs for the next stage Unfortunately

such deep computations are not uncommon in practical applications

This overconservatism is mainly due to the assumption that the

unknown values of the arguments to primitive op erations may vary

independently over the given interval If this assumption is not valid

that is if there are any mathematical constraints between those quan

tities then not all combinations of values in the given intervals will

be valid In that case the result interval computed byIAmaybemuch

wider than the exact range of the result quantity

As an extreme example when weevaluate the expression x x with

for xweget IA given the intervalx

instead of which is the true range of the expression The

IA subtraction routine cannot tell that the two given intervals actually

denote the same quantity since they could also denote two indep endent

quantities that just happ en to have the same range

For a less extreme and more typical example consider evaluating

x x where x is known to lie in the interval x Applying

the formulas of Section blindlyweget

x

x x

On the other hand a trivial analysis shows that the true range of

x x is The relative accuracy of the IA computation is

The error explosion problem

thus meaning the resulting interval was

times wider than what it should b e

The large discrepancy b etween the twointervals is due to the inverse

relation b etween the quantities x and xwhich is not known to the

interval multiplication algorithm This problem aects all op erations

with two or more arguments if the corresp onding quantities are not

indep endent and are correlated in the wrong way then the result

interval maybemuch wider than necessary

Error explosion

The overconservatism of IA is particularly bad in a long computation

chain b ecause the overall relative accuracy of the chain tends to b e the

pro duct of the relative accuracies of the individual stages In such cases

one often observes an error explosion as the evaluation advances down

the chain the relative accuracy of the computed intervals decreases at

an exp onential rate Thus after a few such stages the intervals may

easily b e to o wide to b e useful by many orders of magnitude

For an example of this phenomenon consider the function g x

p p

x x x Figure a shows the graph of g x black

curve and the result of evaluating g x with standard IA for consec

utive equal intervals x in Figure b shows the same data

for the second iterate hxg g x of the same function Although the

k k

iterates g converge to a constant function the intervalsg x computed

by standard IA diverge

Figure Error explosion in IA estimates for iterated functions

Interval arithmetic

Error explosion and sub division depth

When the IA evaluation of f x y pro duces an interval that is to o

wide for the purp ose at hand we can often improve matters by parti

tioning the argument range x y into two or more subranges

evaluating f on each of these and combining the results into a single

interval However this technique is not very eective against error ex

e accuracy of an IA op eration is generally plosion b ecause the relativ

indep endentof the width of the input intervals That is IA has basi

cally rstorder approximation error So if the relative accuracy of a

computation is to o small to be useful by a factor of then we will

probably have to split the domain into subintervals to obtain a

useful result

Avoiding error explosion

In order to avoid this error explosion problem we should try to arrange

the computation in sucha wayastoavoid unfavorable correlations b e

tween the arguments of the IA op erations In particular minimizing

the number of o ccurrences of a variable in a formula usually results in

tighter range estimates if eachvariable o ccurs only once then the range

estimates pro duced by IA are exact

Another general technique is to lump several arithmetic op erations

into a single macro op eration and write a sp ecialpurp ose IA routine

for it Since the routine can take into account the correlation between

shared subexpressions it may be able to compute a tighter range for

the result than which could b e given by using IA at each step

However these remedies can only be applied to relatively simple

computations over restricted domains When the expression to b e com

puted is determined only at runtime or involves dozens of variables

and op erations avoiding bad correlations is almost imp ossible In such

cases one should consider using more sophisticated SVC mo dels such

as those describ ed in Chapter

Powers

A trivial but imp ortant example of avoidable error explosion is the eval

n

uation of powers z x The naive IA implementation based on re

Avoiding error explosion

p eated multiplication will show p o or accuracy for intervals that straddle

zero In particular for x the evaluation of z x in IA

even though x cannot b e negative as x x will give z

For this reason IA libraries should always include sp ecial routines

for squares and other integer p owers Here is a typical example

IAsqrx Interval Interval

Computes x

if x then

return

else if x lo then

x x

return x lo x hi

y y

else if x hi then

x x

return x hi x lo

y y

else if x hi x lo then

x x

x hi return

else

x x

return x lo

Bezier b ounding for p olynomials

Another imp ortant example is the evaluation of a p olynomial hxover

an interval x A naive IA evaluation either as a sum of powers or

through Horners rule is likely to be aected by negative correlation

among its terms We can obtain a tighter range for hx by computing

the BezierBernstein co ecients of h over x and returning the

smallest interval that contains them all

Alternating series

As another example consider the problem of evaluating a series f x

P



i

x If the argument x or some of the co ecients a are negative a

i i

i

then there will often b e adverse correlation b etween the various terms In

that case the series computed with IA will have p o or relative accuracy

even if the series itself is strongly convergent

However if we happ en to know that the terms have alternating signs

and nonincreasing magnitude then we can usually improve the relative

Interval arithmetic

accuracy by evaluating two terms at a time and estimating the maxi

mum range of the pair analytically

As a concrete example supp ose we want to evaluate sin x by the

Taylor series



k

X

x

k

z

k

k

for x This is merely an example of series manip

ulation and not necessarily the b est way to compute sin x Evaluating

each term separatelyweget

The sum of the rst three intervals is and the sum

of the rst four is Since the terms have alternating

signs we know that the innite series lies between these two partial

sums thus we can safely set z

Now the true range of sin x in that interval is

The relative accuracy is thus only If wecontinued adding more

terms the total interval would not shrink any furtherin fact it would

slowly grow wider b ecause of accumulated roundo error

On the other hand we can rewrite the series as



k k

X

x x

z

k k

k



k

X

x x

k k k

k

If x lies in then each term of this series is nonnegative and

monotonically increasing with x So for argument ranges x contained

in that interval a tight range for each term will b e

x x

y y

f xhi r f xlo

k k k

Avoiding error explosion

where

k

x x

f x

k

k k k

In our case the rst few terms are

Since the original series is alternating and x lies in the

sum of all the terms with k in the series lies between and

Thus we can safely replace those terms by the interval

and return z The relative

accuracy is now

Unfortunately these remedies are hardly applicable when the expres

sion to be computed is determined only at runtime or involves more

than a few variables and op erations

Chapter

Ane arithmetic

In this chapter we describ e another metho d for range analysis whichwe

call ane arithmetic AA This mo del is similar to standard interval

arithmetic to the extent that it automatically keeps track of rounding

and truncation errors for each computed quantity In addition AA keeps

trackofcorrelations between those quantities

Thanks to this extra information AA is able to provide muchtighter

b ounds for the computed quantities with errors that are approximately

quadratic in the uncertainty of the input variables This advantage of

AA is esp ecially noticeable in computations of great arithmetic depth

or sub ject to cancellation errors

As one may exp ect the AA mo del is more complex and exp ensive

than ordinary interval arithmetic However we b elieve that its higher

accuracy will b e worth the extra cost in many applications as indicated

by the examples given in Chapter

Ane forms

In ane arithmetic a partially unknown quantity x is represented by

an ane form x which is a rstdegree p olynomial

x x x x x

n n

The co ecients x are nite oatingp oint numb ers and the are

i i

symb olic real variables whose values are unknown but assumed to lie in

the interval U We call x the central value of the ane

Ane arithmetic

form x the co ecients x are the partial deviations and the are the

i i

noise symbols

Each noise symbol stands for an indep endent comp onent of the

i

total uncertainty of the quantity x the corresp onding co ecient x gives

i

the magnitude of that comp onent The source of the uncertaintymaybe

either external due to original measurement error indeterminacyor

numerical approximation aecting some input quantity or internal

due to arithmetic roundo series truncation function approximation

and other numerical errors committed in the computation of x

In particular the internal sources of error include the need to cast

the results of nonlinear op erations as ane forms As we shall see in

Section this casting requires approximating a nonlinear function of

the noise symbols by an ane function The error of this approxima

i

tion will b e represented in the result by a new noise symbol

k

The fundamental invariant of ane arithmetic

The semantics of ane forms is formalized bythefundamental invariant

of ane arithmetic

At any stable instant in an AA computation there is a single

assignment of values from U to each of the noise variables

in use at the time that makes the value of every ane form

equal to the value of the corresponding quantity in the ideal

computation

By stable instant we mean any time when the algorithm is not p erform

ing an AA op eration

t range of ane forms Join

The key feature of the AA mo del is that the same noise symbol may

contribute to the uncertaintyoftwo or more quantities inputs outputs

or intermediate results arising in the evaluation of an expression

The sharing of a noise symbol by two ane forms x and y in

i

dicates some partial dep endency between the underlying quantities x

and y The magnitude and sign of the dep endency is determined by the

Joint range of ane forms

corresp onding co ecients x and y Note that the signs of the co e

i i

cients are not signicant in themselves but the relative sign of x and y

i i

denes the direction of the correlation

For example supp ose that the quantities x and y are represented by

the ane forms

x

y

From this data we can tell that x lies in the interval x

and y lies in y However since they b oth include the same

noise variables and with nonzero co ecients they are not entirely

indep endent of each other In fact the pair x y is constrained to lie

in the dark grey region of R depicted in Figure

Obviously this dep endency information would b e lost if we were to

replace x and y by the intervals x and y Taken individually these

intervals enco de precisely the same ranges of values as the ane forms

Taken jointly however they only tell us that the pair x y lies in the

shown in light grey in Figure rectangle x y

28

12

614

Figure Joint range of two partially dep endent quantities in AA

The shap e of joint ranges

In Figure observethatthejoint range of x and y is a convex poly

gon symmetric around the central pointx y Each pair of parallel

Ane arithmetic

sides corresp onds to a noise variable app earing in x or y The co ef

i

cients of determine the length and direction of those two sides the

i

corresp onding plane vectors are x y and x y The value

i i i i

pairs x y lying on those sides are obtained from the ane forms by

varying over U while all other noise variables are xed at or

i j

in some sp ecic pattern

In general if wehave m ane forms dep ending on n noise symb ols

alues for the corresp onding m quantities then the set of p ossible jointv

m

will be a centersymmetric convex p olytop e in R That polytope is

m n

the parallel pro jection on R of the hyp ercub e U by the ane map

consisting of the m ane forms

Each k dimensional face of this p olytop e corresp onds to a subset E

of k noise variables app earing in the ane forms The p oints on that

k

face are obtained by ranging the variables in E over U while xing the

remaining variables at some sp ecic combination of and

Sp ecial ane forms

meaning no As in IA it is convenient to have sp ecial ane forms

value and R meaning any real value

Note that the set of values describ ed by an ordinary ane form is

necessarily b ounded b ecause all co ecients are nite oats and non

empty b ecause the set always contains the center value Therefore

the precise computer representations of and R must b e established by

convention For details over p ossible representations see Section

Note also that the sp ecial form R do es not record any dep endency

information That is if x y Rthenwe cannot infer any constraint

or relationship b etween quantities x and y The range of the p ointx y

as implied by those ane forms is the whole plane R

tors may b e tempted to allow innite oats Prosp ective AA implemen

as co ecients as inx in order to representunb ounded quan

k

tities within the general AA mo del Unfortunately such innite forms

cannot convey the relationship b etween say x and x when x has un

b ounded range Moreover in order to avoid NaNs the ane arithmetic

routines would have to test for such forms and handle them separately

Therefore such innite forms would be merely equivalent representa

tions of the single sp ecial value Rasdenedabove

Conversions between IA and AA

Computer representation

Computer representations of ane forms will b e discussed in Section

Until then we will describ e AA algorithms in a representationindep endent

manner based on the following conventions

We assume an unlimited supply of noise variables eachidentied

i

byitsindex i a p ositiveinteger

We denote by E x the set of indices of all noise variables that

app ear in the ane form x

We denote by x the co ecient of the noise variable in the ane

i i

form x In particular x if i E x

i

The central value of x will b e denoted by x

The pro cedure newsym is assumed to return the index of a new

noise variable not used in any ane form computed so far

For any nite oat value cwe denote byc its AA representation

namely the ane form x with center value x c and E x fg

Conversions between IA and AA

Conversion b etween ane forms and ordinary intervals is often required

esp ecially in the input and output of numerical programs Although

simple in principle the conversions requires some care in the handling

of roundo errors Also sp ecial intervals such as and unb ounded

intervals must b e handled separately

ws that may Because of the unavoidable roundo errors and overo

o ccur in the conversion b etween intervals and ane forms the twomod

els are not exactly equivalent

In particular some nite ane forms must b e converted to innite or

semiinnite intervals Conversely all semiinnite intervals and some

nite ones must b e converted to R

Ane arithmetic

Conversion from AA to IA

If a quantity x is describ ed by the ane formx x x x

n n

then its value is guaranteed to b e in the interval

x radx x x radx

where

n

X

radx j x j

i

i

Note that x is the smallest interval that contains all p ossible values

of x assuming that each ranges indep endently over the interval U

i

Obviouslythisconversion discards all knowledge of constraints be

tween the computed quantities that was preserved in their ane forms

The quantity radx dened by plays an imp ortant role in arith

metic op erations see Section Wecallitthetotal deviation of x

AAradx AAForm Float

Computes radx

if x then

return

else if x R then

return

else

x x

P

return fj x j i Exg

i

IAfromAAx AAForm Interval

Converts x to interval

if x then

return

if x R then

return R

else

r AAradx

x r return x r

Computing with AA

Conversion from IA to AA

Given an interval x a b representing some quantity x in IA an

equivalent ane form for the same quantity is given by x x x

k k

where x is the midp ointoftheinterval and x is its halfwidth

k

b a b a

and x x

k

The noise symbol represents the uncertainty in the value of x that

k

is implicit in its interval representation x Since the interval tells us

nothing ab out p ossible constraints between the value of x and that of

other variables must b e distinct from all other noise symb ols used so

k

far in the same computation

AAfromIAx Interval AAForm

Converts x to ane form

if x then

return

if x lo or x hi then

return R

else

r IAradx

if r then return R

m IAmidx

k newsym

return m r

k

Note that the correctness of this co de dep ends on IAradx b eing

large enough to comp ensate any rounding of IAmidx as explained

in Section

Computing with AA

In order to evaluate a formula with AA wemust replace each elementary

op eration on real quantities by a corresp onding op eration on their ane

forms returning an ane form

Lets consider sp ecically a binary op eration z f x y The cor

resp onding AA op eration z f x y is a pro cedure that computes an

ane form for z f x y that is consistent with ane forms x y

Ane arithmetic

By denition

x x x x

n n

y y y y

n n

n

for some unknown values of U Therefore the quantity z

n

is a function of the namely

i

z f x y

f x x x y y y

n n n n

x y by an ane form The challenge nowistoreplacef

z z z z

n n

that preserves as much information as p ossible ab out the constraints

between x y and z that are implied by but without implying

any other constraints that cannot b e deduced from the given data

Ane op erations

If the op eration f itself is an ane function of its arguments x and y

then formula can b e expanded and rearranged into an ane combi

nation of the noise symbols Except for roundo errors and overows

i

this ane combination describ es all the information ab out the quanti

ties x y and z that can b e deduced from the given ane formsx and y

and the op eration f In particular for any R

x y x y x y x y

n n n

x x x x

n n

x x x x

n n

x Note that according to the formulas ab ove the dierence x

between an ane form and itself is identically zero The subtraction

formula knows that in this case the op erands are actually the same

quantity and not just two quantities that happ en to have the same range

of p ossible values from the fact that they share the same noise symbols

with the same co ecients

Ane op erations

By the same token linear identities such as x y x y or

x x x which do not hold in IA do hold in AA except for

oatingp oint roundo errors More generally computations that con

sist only of ane op erations with numeric co ecients will have relative

accuracy near unity

Negation

Negation is one of the few exact AA op erations

AAnegx AAForm AAForm

Computes x

if x or x R then

return x

else

var z AAForm

z x

for each i in E x do

z x

i i

return z

Handling roundo errors

For op erations other than negation we must take into account the

oatingp oint roundo errors that may o ccur when computing the co ef

cients of the result

One might think that as in IA it suces to round each co ecient z

i

in the safe direction namely away from zero However in AA there

is no safe direction for rounding a partial deviation z If the noise

i

variable o ccurs in some other ane form w then any change in z

i i

in either direction would imply a dierent correlation b etween the

quantities z and w and would falsify the fundamental invariantofane

arithmetic

In order to preserve the fundamental invariant whenever a computed

co ecient z diers from its correct value by some amount d we must

i

account for this error byaddingan extra term d where is a noise

r r

symb ol that do es not o ccur in any other ane form

Ane arithmetic

General ane op erations

An AA op eration must also handle the sp ecial forms and R Moreover

if any co ecient of the result overows then the op eration must return

R as the result

All these cases are taken into account by the following co de which

computes the general ane op eration in twovariables x y The

routine also accepts an extra uncertainly co ecient to be added to

the result This uncertaintyiscombined with the rounding error term

AAaffinex y AAForm Finite Float

AAForm

Computes x y

Assumes

if x or y then

return

else if x R or y R or then

return R

else

var z AAForm

z hx y i

if jz j then return R

a x y

b x y

x x

max fb z z ag

for each i in E x Ey do

z hx y i

i i i

a x y

i i

b x y

i i

x x

maxf b z z ag

i i

if then return R

k newsym z

k

return z

Note that we are allowed to round away from zero only b ecause

the noise variable is not yet shared byany other ane form

k

In practical implementations it is worth having several versions of

this co de sp ecialized for addition subtraction scaling and translation

The inner lo op can b e signicantly simplied in these cases

Nonane op erations

It is quite annoying that wehavetoevaluate the co ecient x y

i i

three times with dierent rounding mo des only to obtain an upp er

b ound on its error We can save two multiplications at the cost of

losing one bit of precision by computing just a and b and then setting

x x

z a b a

Here is a place where a trivial redesign of the oatingp oint pro

cessor interface would make a signicant dierence in computing time

For example supp ose the pro cessor returned in a sp ecial register fperr

the most recent op eration some upp er b ound to the roundo error of

p erformed Then we could simplify the b o dy of the for lo op ab oveby

u hx i fperr

i

v hy i fperr

i

z hu v i fperr

i

thus saving four multiplications and two additions p er co ecient

Nonane op erations

Lets now consider the case of a nonane op eration z f x y If x

and y are describ ed by the ane forms x and y then z is describ ed by

the formula

z f x x x y y y

n n n n

f

n

n

where f is a function from U to R If f is not ane then z cannot

be expressed exactly as an ane combination of the noise symbols

i

In that case wemust pick some ane function of the

i

a

f z z z

n n n

n

that approximates f reasonably well over its domain U and

n

then add to it an extra term z to represent the error intro duced by

k k

this approximation That is we return

a

z f z

n

k k

z z z z

n n

k k

Ane arithmetic

The term z will representthe residual or approximation error

k k

a

e f f

n n n

The noise symbol must b e distinct from all other noise symb ols that

k

already app eared in the same computation and the co ecient z must

k

b e an upp er b ound on the absolute magnitude of e thatis

jz jmax fje j U g

n n

k

Note that the substitution of z for e representsalossof

n

k k

information from this point on the noise symbol will b e implicitly

k

assumed to b e indep endentfrom when in fact it is a function of

n

them Any subsequent op eration that takes z as input will not b e aware

of this constraint between and and therefore may return an

n

k

ane form that is less precise than necessary

a

However as we shall see if the approximation f is prop erly chosen

then the error term z will dep end quadratically on the widths of the

k

ranges of the input variablesx andy so that its magnitude will decrease

even in the relative sense as those ranges b ecome smaller

Selecting the ane approximation

There are n degrees of freedom in the choice of the ane approxima

a

tion f In order to keep the algorithms reasonably simple and ecient

a

that are themselves ane com we will consider only approximations f

binations of the input forms x and ythatis

a

f x y

n

Thus wehave only three parameters to determine instead of n

For some op erations f the most accurate ane approximation to

f e may not be of the form However the restriction

n

to has relatively minor consequences The reason is that for

smo oth functions f the dierence between the two optimal approxi

mations restricted and unrestricted dep ends quadratically on the size

of the input ranges

Moreover for univariate functions f x the restriction is p erfectly

harmless b ecause it can be shown that the b est ane approximation

to f is indeed of the form x

Optimal ane approximations

The general algorithm

a

Once we have selected the approximation f of the form we can

use the generalpurp ose routine AAaffine of Section to compute

a

the ane form f x yx y To this form we then add the

extra term z which can b e combined with the roundo error incurred

k k

by AAaffine

In summary the general binary op eration z f x y can b e imple

mented as follows

AAForm AAfx y AAForm

Computes f x y

h Cho ose i

h Find max fjf x y x y j U gi

n

return AAaffinex y

Of course this same approach can be used for op erations with one ar

gument or more than two arguments

Optimal ane approximations

There are many goals wecan aim for when cho osing the ane approx

imation Accuracy is usually an imp ortant goal but hardly the

only one We will often have to settle for a less accurate solution in

exchange of eciency co de simplicity or other practical criteria

Accuracy measures

The accuracy of the result z can be quantied in many ways For in

co ef stance we can measure its error by the magnitude of the extra

cient z This number measures the uncertainty in the true value of

k

quantity z that the ane formz allows but fails to relate to the argument

uncertainties

n

Alternatively we can use the volume of the p olytop e P jointly

xy z

determined by the ane forms of x y and z f x y This volume

measures the uncertainty in the lo cation of the p ointx y z

Fortunately it turns out that for approximations of the form

the two error measures are equivalent It is easy to see that in this case

Ane arithmetic

the p olytop e P is a prism with vertical axis and parallel oblique bases

xy z

whose pro jection on the xy plane is the joint p olytop e P dened byx

xy

and y and whose height in the z direction is jz j Therefore the

k

volume of P is j z j times the volume of P Since the latter do es

xy z xy

k

a

not dep end on the approximation f minimizing the volume of P is

xy z

equivalent to minimizing j z j

k

Chebyshev minimax approximations

Approximations that minimize the maximum absolute error are the sub

ject of Chebyshev

Sp ecically let F be some space of functions p olynomials ane

forms etc An element of F that minimizes the maximum absolute

dierence from a given function f over some sp ecied domain is known

as a Chebyshev or minimax F approximation to f over

Chebyshev approximation theory is a welldevelop ed eld with many

nontrivial results and a vast literature Fortunately for us the sub

theory of ane approximations is relatively simple and easy to under

stand in geometric terms

Univariate Chebyshev ane approximations

In particular for univariate functions the minimax ane approximation

is characterized by the following prop erty

Theorem Let f be a bounded and continuous function from some

b to R Let h be the ane function closedandbounded interval I a

that best approximates f in I under the minimax error criterion Then

there exist three distinct points uv and w in I where the error f x hx

has maximum magnitude and the sign of the error alternates when the

three points are considered in increasing order

This theorem provides an algorithm for nding the optimum approx

imation in many cases via the following corollary

Minimumerror Chebyshev approximations are not to b e confused with the trun

cated expansions of f in the Chebyshev orthogonal polynomial basis The latter

do not minimize the maximum error although they usually come quite close to

Square ro ot

Theorem Let f beabounded and twicedierentiable function dened



on some interval I a b whose second derivative f does not change

a

sign inside I Let f xx be its minimax ane approximation

in I Then

The coecient is simply f b f ab a the slope of the

line r x that interpolates the points a f a and b f b

The maximum absolute error wil l occur twice with the same sign

once with the opposite at the endpoints a and b of the range and



sign at every interior point u of I where f u

The independent term is such that u f ur uand

the maximum absolute error is jf u r uj

Note that this result gives us an algorithm for nding the optimum



co ecients and as long as wecansolve the equation f u

Geometry of Chebyshev approximations

Recall that the goal of AA is to keep track of the relationships b etween

the quantities o ccurring in a computation When we use a Chebyshev

minimumerror approximation in the computation of z f x we are

in a sense trying to preserve as much information as we can ab out the

relationship ofz andx More precisely consider the set P of all p ossible

pairs of values x z that are consistent with the ane forms x and z

that is

P f x z x x x x z x z

n n

k k

U g

n

k

The set P is a parallelogram with altitude x hi x lo and base z

k



rotated see Figure Clearlyby minimizing the approximation

error we are minimizing the area of this parallelogram which we

can view as a measure of how much information was lost ab out the

relationship b etween x and z

Square ro ot

To illustrate the use of Theorem lets examine in detail how the square

p

ro ot op eration z x is implemented in AA

Ane arithmetic

a u b

Figure Geometry of Chebyshev approximations

The exact solution

As explained in Section the rst step is to select a go o d ane

approximation for the nonane function

p

p

x x x x

n n

when the noise variables range indep endently over U Since

n

square ro ot is an univariate function it can be shown that the best

ane approximation in the sense of minimizing the maximum absolute

error has the form

x x x x

n n

In fact the problem reduces to nding the best ane approximation

p

x to the univariate function x when x ranges over the interval

xa b See Figure

p

x has negative For the time b eing lets assume that a Since

second derivative for all p ositive x Theorem applies and tells us that

p

p

is the slop e of the line r x that go es through a a and b b

namely

p

p

b a

p

p

b a

b a

p

The p oint u where the graph of x has slop e is the solution of

p

u namely

p

p

a b a b

u

Square ro ot

a u b

Figure Chebyshev approximation for the square ro ot function

According to Theorem the optimum indep endenttermis

p p

p p

a b a b f ur u

p

u

p

a b

and the maximum error is

p

p

f u r u b a

p

p

a b

The maximum absolute error o ccurs at the endp oints of the in

terval where the curve lies below the line x and at the point

p

p

c a b where the curveliesabove the line

p

x is Therefore the optimal ane form for z

z z z z

n n

k k

where is a new noise variable and

k

z x

z x i n

i i

z

k

Geometric interpretation

Geometrically these computations determine the parallelogram P with

p

twovertical sides that encloses the graph of x in the intervalx and has

Ane arithmetic

the smallest p ossible vertical extent see Figure Since the width

ofx is xed that is also the enclosing parallelogram with minimum area

The ane function x is the oblique axis of P and the maximum

error is half of P s vertical extent

The parallelogram P is merely the joint range of the pair z x as

implied by the ane forms x and z Thus by minimizing the maxi

mum error we are preserving as much information as we can ab out the

p

x and x relationship of z

p

In contrast consider evaluating z x with standard IA assuming

that x and x have the same range The pairs of values x y consistent

with these intervals cover the entire rectangle R x z whose area is

greater area than that of P

Coping with roundo errors

Formulas assume that we can compute and exactly

In practice the computation of must b e carried out in oating p oint

and so we will get only an approximation to the optimum slop e see Figure

a u b

Figure Approximation to optimum slop e

Now to compute we cannot simply substitute for in for

mula b ecause the derivation of that formula used the fact that

was the slop e of the chord r x Instead wemust compute conservative

p

estimates for the dierence x x at the endp oints of I and at the

p

point v of I where the slop e of x equals

Square ro ot

p

y y

d a a

a

x x

p

d v v

v

p

d b b

y y

b

Assuming is close enough to that v lies inside I d will b e greater

v

p

than the other two values and the dierence x x for any x I

will lie in the interval spanned by d d and d Wemay then take the

a v

b

and its approximate midp oint of this range as the indep endent term

approximate radius as the maximum error estimate see Figure

The p oint v is but we do not need to compute it explicitly

Substituting symb olically into equation weget

r

d

v

Formulas must then be changed to use and in

stead of and Naturallythe computation of z z z by these

n

formulas will be aected by roundo errors which must be estimated

and combined with to obtain the error term z

k

Actually the computations b ecome a bit simpler if we work with

instead of itself Here is the detailed co de

AAsqrtx AAForm AAForm

p

Computes x

if x or x R then

return x

x AAtoIAx

if x then return

if x hi then return R

Chebsqrtx

return AAinvaffinex

The routine AAinvaffine computes x in a manner entirely

similar to AAaffine Section

ThecodeforChebsqrt is

Ane arithmetic

Chebsqrtx Interval Float

Computes a Chebyshev approximation x

p

x for x x to

Assumes x is nonempty and bounded and x lo

p

r x lo

y y

a

x x

p

x hi r

b

r r

a

b

y y

d r r

a a a

y y

d r r

b b b

d min f d d g

min a

b

x x

d

max

IAmidd d

min max

IAradd d

min max

return

There are several subtle points in this co de First the interval

a b over which the ane approximation is computed is not the

range x lo x hi ofx but the slightlywiderinterval r r That

a

b

is a and b are dened retroactively as the exact squares of the ap

proximate square ro ots r and r This convention allows us to avoid

a

b

p

b b some square ro ots For instance in the computation of d

y y

b

p

where we need b rounded down we can use r in its place b ecause

b

x x

p p

b b r On the other hand we must use r instead of

y y

b b

x hi for the second b in that formula

The pro cedure also assumes that the value lies b etween r and r

a

b

which is true in the IEEE oatingp oint standard This condition en

p

sures that the maximum of x x lies b etween r and r and therefore

a

b

that the minimum is either at r or r Actually for maximum accuracy

a

b

should be rounded to the nearest Float Rounding up or down is

more ecient however and the precision loss is minimal

Note that the implied range of x is clipp ed to the interval

As we observed in Section it is convenient to assume that any

intrusion of x into the negative numb ers may be due to sloppiness of

the range computation and do esnt imply that the real quantity x can

assume negativevalues

Square ro ot

The handling of overows could b e improved As it is the pro cedure

returns R if the computation of x overows which may happ en if

some of the co ecients x are near the end of the nite Float range

i

However the square ro ot of jx j jx j jx j is always a nite value so

n

the overow could b e avoided with some care Spurious overows may

x x

also o ccur in the computation of r d

ro ots Note that the cost of this algorithm is essentially two square

plus a few oatingp oint op erations for each x

i

Oversho ot

p

x has The use of a Chebyshev approximation in the computation of

one signicant drawback Note that the range of values for z that is

implied by the ane form z that is the vertical extent of the parallel

ogram P in Figure is actually wider than the range that would b e

computed using ordinary interval arithmetic The explanation is that

the new noise variable actually has a hidden nonlinear dep endency

k

on the other noise variables such that the value of z is negative when

k k

the other terms approach the maximum value If we were to take this

dep endency into account we would conclude that the maximum value

p

of z is indeed b but since we assume that the are indep endent we

i

must count z as p ositive at the upp er end of the interval

k k

This problem is particularly vexing when the range of xasimplied

by its ane form is partially negative If we compute z as describ ed

the implied range for z will contain some negativevalueseven though

the square ro ot function is never negative

So the ane form based on Chebyshev approximation trades some

knowledge ab out the range of z for knowledge ab out the relationship

between z and x If there is any merit to the AA approach then in

complex computations the trade should generally be worthwhile that

is in subsequen t op eration we hop e to gain enough by cancellation of

noise terms to comp ensate for the extrawide interval

In any case recall that the co ecient z dep ends quadratically on

k

the width of the input interval which is still a qualitativeimprovement

over the rstorder errors of ordinary interval arithmetic

Ane arithmetic

The minrange approximation

p

As a matter of fact we can pro duce a ane form z for x that implies

a tight range for z if we settle for less than the optimal Chebyshev

approximation We only need to cho ose the co ecients and in such

away that the joint range P of the forms x and z x has the

p

same vertical extent as the piece of the graph of x subtended by the

interval a b see Figure

a b

Figure Minrange approximation for the square ro ot

It is easy to see that the smallest such parallelogram has the top side

tangent to the graph at the higher endp ointoftheinterval a b The

parameters of this approximation are

p p

p

b b a

p p

b b

p

We say that x is a minrange ane approximation to x

in the interval a b

Note that the set P is still a prop er subset of the rectangle R a

p

p

b a b so the resulting ane formz is strictly more informative

than the result of ordinary interval arithmetic Moreover the ratio of

the areas is

s

r

jP j a b a

p

p

jR j b b

b a

which go es to zero linearly as the relative width b ab go es to zero

In other words the mo died AA approximation ab ove still has higher

order of convergence than IA

The minrange approximation

Incidentally the IA square ro ot algorithm can be viewed as this

same idea taken to the extreme where the parallelogram P is the whole

p

p

b a b see Figure This corresp onds to rectangle a

p

approximating x by a constant function in the interval a b that

is cho osing

p p

p p

a b b a

With these choices the returned ane formz x contains

k

only the indep endent term and the single noise term whichisnot

k

related to any other quantity and this is essentially the ane form

p

p

interpretation of the ordinary interval a b

a b

Figure The IA approximation for square ro ot

There are other reasons that may justify the choice of a suboptimal

ane approximation suchasavoiding overows simplifying the algebra

or the handling of rounding errors We will see some examples in the

following sections

Handling roundo errors

The routine MinRangesqrt below implements the minrange approxi

mation formulas with due care for roundo errors It is meantto be a

replacement for the routine Chebsqrt called in AAsqrt

As in Chebsqrt the approximation is actually computed for the

p



r which contains x Since x for all interval I r

a

b

p

x I the dierence x x is minimum at x r and maximum at

a

x r The correctness of the result then follows b

Ane arithmetic

MinRangesqrtx Interval Float

Computes an ane approximation x

p

x for x x minimizing the output range to

Assumes x is nonempty and bounded and x lo

p

r x lo

y y

a

x x

p

x hi r

b

r

b

y y

d r r

min a a

x x

d r r

max

b b

IAmidd d

min max

IAradd d

min max

return

Exp onential

x

The exp onential function f x expx e in AA is quite similar

to square ro ot The key step is computing an ane approximation

a

f xx to expxintheinterval x x

The Chebyshev approximation

Since the second derivativeofexpiseverywhere p ositive the Chebyshev

approximation is parallel to the chord ie

b a

e e

b a

We cannot compute the exact value of but only some approxima

a b

tion In any case as long as lies b etween e and e the dierence

x

e x will be maximum at either x a or x b and minimum at

x b

x u log the abscissa where e has slop e If e then the

x

dierence e x will be maximum at either x a and minimum at

x b See Figure a Either way from these extremal dierences we

can compute and as in the square ro ot formulas The complete pro cedure is

Exp onential

AAexpx AAForm AAForm

Computes expx

if x or x R then

return x

x AAtoIAx

if x then return

if x hi then return R

Chebexpx

return AAaffinex

where Chebexp is

Chebexpx Interval Float

Computes a Chebyshev approximation x

to expx for x x

Assumes x is nonempty and bounded from above

x x

e expx hi

b

w x hi x lo

if w then

e

a

else

y y

e expxlo

a

x x

e e w

a

b

if then

d e

min a

d e

max

b

else if e then

b

y y

d expx hi x hi

min

x x

d expx lo x lo

max

else

x x

d expx lo x lo

a

d e x hi

b b

y y

d log

min

d max f d d g

max a

b

IAmidd d

min max

IAradd d

min max

return

Ane arithmetic

a u b a b

Figure Chebyshev left and minrange right approximation for exp

The minrange approximation

The Chebyshev approximation is somewhat tricky and exp ensive to

compute and is plagued by large undersho ots when the interval is

mo derately wide In particular it extends into the negative range for

x

radx The reason is obvious if one lo oks at a plot of y e over

such a wide range

In practice therefore one may prefer to use the minrange approxi

mation see Section which is easier to compute and has no over

or undersho ot This approximation preserves less information on the

x

dep endency b etween x and e but this loss is signicant only for wide

intervals where the dep endency is mostly nonlinear anyway For small

intervals the minrange approximation still has quadratic error

In the minrange approximation the slop e is chosen as the deriva

x

b that tive of e at the lowest end of the argument interval a

a

is e See Figure b The co ecients and are then computed

as in Chebexp

Recipro cal

MinRangeexpx Interval Float

Computes a minrange ane approximation x

to expx for x x Assumes x is nonempty and bounded

y y

e expx lo

a

x x

e expx hi

b

e

a

if then

d e

max

b

d

min

else

d e x hi

max

b

e x lo d

a min

IAmidd d

min max

IAradd d

min max

return

Recipro cal

For the recipro cal f xx in AA we pro ceed prettymuch as for the

square ro ot except that overow and undersho ot are amajor concern

For these reasons and for the sake of co de simplicity it seems b est to use

the minrange approximation instead of the Chebyshev one Figure

a b a u b

Figure Minrange left and Chebyshev right approximation for x

If the interval x a b includes zero then the recipro cal may

have arbitrarily large andor arbitrarily small values so the only valid

Ane arithmetic

result is R If the interval is entirely p ositivea then the slop e of

the minrange approximation is the derivative of x at x b namely

b We must round b downwards in order to prevent over

ow The case of negative argument range b is similar

AAinvx AAForm AAForm

Computes x

if x or x R then

return x

x AAtoIAx

if x then return

if x lo and x hi then return R

MinRangeinvx

return AAaffinex

where MinRangeinv is

MinRangeinvx Interval Float

Computes a minrange approximation x

to x for x x

Assumes x is nonempty and zerofree

a minfj x lo j j x hi jg

b maxfj x lo j jx hi jg

y y

b

x x

d a a

max

y y

d b b

min

IAmidd d

min max

if x lo then

d IAradd

max min

return

Multiplication

Lets now consider the multiplication of ane forms that is the evalua

tion of z f x y xy given the ane formsx andy for the op erands

x and y

Multiplication

The pro duct of the ane forms is of course a quadratic p olynomial

f on the noise symb ols

n

f x y

n

n n

X X

x x y y

i i i i

i i

n n n

X X X

x y x y y x x y

i i i i i i i

i i i

It is not hard to see that the b est ane approximation to f

n

consists of the ane terms from the expansion ab ove

n

X

A x y x y y x

n i i i

i

plus the b est ane approximation to the last term

n n n n

X X X X

Q x y x y

n i i i i i j i j

i i i j

Observe that Q is a centersymmetric function in the sense that

n

Q Q Moreover its domain U is also center

n n

n n

symmetric that is U i U From these

n n

prop erties it follows easily that the b est Chebyshev ane approxima

n

tion to Q over U is itself a centersymmetric ane function that is

to say a constant function

More precisely if a and b are the minimum and maximum values

n

b est ane approximation to the latter of Q over U thenthe

n

is the constant function a b and its maximum error is b a

Thus we should return

a b b a

z A

n

k

where is a new noise symbol

k

Computing the extremal values a and b of Q in U is not trivial The

b est algorithm known to the authors runs in O m log m time where m

is the numb er of nonzero noise terms in x and y

Ane arithmetic

Fortunately the exact b ounds are not necessary We can use instead

the trivial range estimate Q radxrady That is we can return

n

X

z x y x y y x radxrady

i i i

k

i

It can b e shown that the error of this approximation is at most four times

the error of the b est ane approximation The worst case happ ens when



the joint range of x and y is a square rotated with resp ect to the

axes that is whenx x u v and y y u v where u andv are

ane forms with the same range r r but disjoint noise symb ols

In that case the residual Q isu v and its true range is r r

On the other hand since radx rady r the trivial estimate will

be Q r r

The reader mayhave noticed that the trivial estimate Q is precisely

the range of Q as it would b e computed by standard IA without taking

into account any correlations between the two factors But then how

could the result of formula b e more accurate that the ordinary IA

pro duct of x andy The answer is that formula uses standard

IA only to estimate the quadratic residual The formula still allows

negatively correlated terms to cancel out in the linear part whereas in

standard IA even the linear part maybeoverestimated

Indeed even though it may b e four times wider than the optimum

the trivial estimate Q radx rady is still quadratic in the total

y which is enough to make the AA width of the input intervals x and

multiplication asymptotically more accurate than standard IA as the

input ranges get smaller

Multiplication example

To illustrate these formulas lets evaluate the expression

z x r x s

for x r and s Converting the

ordinary intervals to ane forms weget

x r s

Multiplication

Therefore

x r

x s

z x r x s

In the quadratic term each factor considered indep endently has

the range Therefore a quick estimate for the range of that

term is

Q

Using this estimate we obtain for z the ane form

z

The range of z implied by this ane form ab oveis

A more precise analysis reveals that the true range of the quadratic

term Q ab ove assuming the input noise symb ols e are indep endent

is actually and that of the pro duct z is The relative

accuracy of this AA computation is therefore

For comparison standard IA would return

whose relative accuracy is In words the

IA interval is more than twice as wide as it should b e whereas the AA

result is only wider

Observe that in this example the uncertainty asso ciated to the noise

symbol which is shared by both op erands happ ened to cancel out

to rst order in the nal result This cancellation do es not o ccur in the

IA computation which is the main reason for the larger uncertainty in the IA result

Ane arithmetic

The multiplication routine

As usual the implementation must also estimate the roundo errors and

add them to the term z Here is the complete co de

k

AAmulx y AAForm AAForm

Computes x y

if x or y then

return

else if x R or y R then

return R

else

var p Interval

r AAradx

x

r AArady

y

x x

r r

x y

p IAmulx x y y

y

x

IAmidp

IAradp

return AAaffinex y

Note that the form x y includes two instances of the term x y

one of which is canceled by the term Implementors should consider

expanding the call to AAaffine inline and optimizing the computation

of z to b e just

Division

Division of ane forms is harder than multiplication To b egin with

division by quantities that may wander close to zero that is whose

uncertainty is comparable to their average magnitude is inherently in

accurate and unstable a mo dest slop in the computation of the divisor

may cause its range to overlap zero in which case the division cannot

b e carried out

ays to compute an acceptable ane That b eing said there are manyw

formz for the quotient x y The simplest is to rewritex y as a pro duct

The mixed AAIA mo del

x y This twostep approach do es have quadratic convergence

b ecause any rst order correlations between x and y are preserved by

the AA recipro cal routine

AAdivx y AAForm AAForm

Computes x y

return AAmulx AAinvy

The mixed AAIA mo del

The oversho ot and undersho ot problem that we observed in the anal

p

ysis of x and expxshow that AAs goal of recording the correlation

between quantities leads sometimes to range estimates for individual

variables that are worse than those pro duced by standard IA This prob

lem is more likely to happ en in simple computations where uncertainty

cancellation do es not haveachancetooccur

We have already seen one way of coping with this problem namely

using minrange approximations instead of Chebyshev ones This solu

tion do es cure the oversho ot problem but loses some of the correlation

information For example the minrange approximation to sin x over

is which contains no hint of the fact that sin x is

monotonically increasing in that interval

Another solution to the oversho ot problem which actually improves

the overall accuracy of computations is to combine the AA and IA rep

resentations in a single mo del That is the representation x of a quan

tity x consists of b oth an ordinary interval x and an ane form x The

purp ose of the former is to provide tight ranges for individual variables

in simple op erations while the latter is optimized to record correlations

between quantities The joint range implied by representations x y

is then the intersection of the joint range ofx y a centersymmetric

convex p olytop e and the b ox x y

There is more to this mixed AAIA model AAIA than just p erform

ing the same computation in AA and IA and intersecting the resulting

ranges The two mo dels can and should interact synergistically at each

step with each mo del using the others information to improve its own

accuracy

Ane arithmetic

f x y will Sp ecically the AAIA pro cedure that implements z

use the IA ranges of the arguments x and y when selecting the ane

approximation x y for f and use it to compute the AA

comp onent z f x y of the result The pro cedure will then compute

the IA comp onentas z f x y z

Since the IA ranges x and y generally tighter than the AAimplied

ranges x and y the ane approximation chosen by the AAIA pro ce

dure is likely to b e tighter in the sense of having a smaller error term

than its pureAA counterpart Conversely whenever the correlation in

formation results in an accurate AA form the interval z will b e tighter

than its pureIA counterpart

Thus the mixed AAIA mo del can often pro duce b etter results than

running the IA and AA computations in parallel and may pro duce

usable results even in cases where both pure mo dels fail due to error

explosion

Comparing AA and IA

Numerical exp eriments seems to conrm our claim that AA is in general

substantially more precise than standard IA and less prone to error

explosion

ObviouslyAA is more complex and exp ensive than IA However

orth the extra cost in many we b elieve that its higher accuracy will b e w

elds where IAs error explosion may b e a problem such as computer

graphics We shall see some examples in Chapter

Example iterated functions

For instance consider the function

q q

g x x x x

whichwe used in Section to illustrate the error explosion problem

Figure shows the result of evaluating g and its iterate hxg g x

with AA over equal intervals in This picture should b e

compared to Figure whichshows the results of standard IA over the

same intervals

Comparing AA and IA

Figure Avoiding error explosion in iterated functions with AA

Cancellation of uncertainty

The main reason whyAA is usually more accurate than IA is the can

cellation phenomenon describ ed in Section which tends to make

the range of computed quantities smaller than the corresp onding inter

vals computed by standard IA Indeed except for roundo errors any

computation chain that involves only ane op erations will b e evaluated

by AA with relative accuracy that is the range of the computed

ane form will b e the true range of the corresp onding quantity

Cancellation of internal errors

Another feature of AA is that the ane form of each computed quan

tity keeps track of how muc h of its uncertainty is attributable to the

linearization and roundo errors committed at each previous step sep

arately Thus these linearization errors themselves may cancel out in

later op erations instead of always adding up as they usually do in IA

For example let x x x and y y y and consider the

following AA computation

p

u x y v u z u v

The rst step will compute an ane form u u u u u

where the term u represents the linearization and roundo errors of

the division Similarly the second step will compute v v v

v v v where v represents the linearization and round

o errors of the square ro ot Note the term v which records the

uncertainty in v that was inherited from the previous division step In

the last step this term will b e subtracted from u meaning that the

error committed in the division do es not aect z as much as it aects u

Ane arithmetic

Needless to say in standard IA the errors corresp onding to v and u

would b e added instead of subtracted

Quadratic convergence

Since the AA mo del keeps track of the rstorder dep endency b etween

variables the loss of information in an AA computationas measured by

the new error co ecients z will in general dep end quadratically on

k

the size of the input intervals Therefore as the ranges of the op erands

get smaller the error term z will b ecome less imp ortantnot only

k k

in absolute terms but also relative to the other terms

That is in AA the relative accuracy of each op eration Section

will b e inversely prop ortional to the width of the input intervals Thus

in a long computation chain halving the input intervals will not just

all steps of the chain more halve the output ones but will also make

accurate and therefore improve the accuracy of the result by a factor

that is roughly exp onential in the length of the chain

One should keep in mind however that quadratic convergence ap

plies only to the ane approximation errors and not to arithmetic

roundo errors which are prop ortional to the magnitude of the quanti

ties irresp ective of their uncertainties The relative roundo errors are



small ab out for double precision co ecients but they do dene

alower limit to the size of input intervals Once roundo errors b egin to

dominate the uncertainty of the result reducing the width of the inputs

eg by domain sub division will not b e of muchhelp

When to use AA

Since AA errors are quadratic whereas IA errors are linear there is usu

ally a critical width for the input intervals b eyond which AA is not ac

curate enough to b e worth its added exp ense Therefore in applications

such as global optimization and zero nding which dep end on recursive

domain exploration one should ideally use the faster IA mo del at rst

ve b ecome small enough and switch to AA once the subregions ha

The problem with this idea is that the critical width cannot be ef

fectively determined b eforehand Therefore in practice one will simply

try computing the range with IA and redo the test with AA if IA was

inconclusive

Implementation issues

Implementation issues

To test the practicality and usefulness of AA wehave implemented the

p

basic op erations in C for the Sun SPARCstation We

describ e b elow some choices that we made in our prototyp e implemen

tation but which are not part of the AA mo del prop er

Representation of ane forms

In our prototyp e implementation of AA we represent an ane form x

dep ending on m noise symb ols byanarrayofm consecutive bit

words The rst two words contain the central value x and the num

ber m then come the m terms each consisting of a partial deviation x

i

and the corresp onding index i aninteger value that uniquely identi

es the noise symbol All real quantities are enco ded as IEEE bit

i

oatingp ointnumbers

The noise symb ol indices need to b e stored b ecause ane forms are

quite sparse although a longrunning program may create billions of

indep endent noise symb ols each ane form will typically dep end only

on a small subset of them Therefore it is imp erativethatwe store for

each ane form x only the terms x that are not zero

i i

Thus in general each ane form that o ccurs in a computation will

have a dierent number of terms with a dierent set of noise symbol

indices Two ane forms are dep endent only when they include terms

with the same index

Algorithms that op erate on two or more ane forms such as the

addition and multiplication routines describ ed ab ove typically need to

while computing match corresp onding terms from the given op erands

the terms of the result In order to sp eed up this matching we make

sure that the terms of every ane form are always sorted in increasing

order of their noise symb ol indices

Memory and index management

Ane forms are typically stored in a sp ecial storage pool SA which is

managed like a stack In general a routine that p erforms AA compu

tations should reset the SA topofstack pointer right b efore exiting to

the value it had on entry This action implicitly discards all ane forms

Ane arithmetic

computed during the routines execution and recycles their storage Of

course if the routine is supp osed to return any of these ane forms

then it must copy them to the new topofstack p osition and adjust the

pointer accordingly

As mentioned ab ove new noise symb ols are constantly b eing created

while the program runs Practically every time we compute a new ane

form weneedtointro duce a brand new noise symb ol to represent the

linearization and roundo errors committed in that op eration The noise

symbols do not consume any storage by themselves but each requires

a distinct index For this purp ose we use a global counter that keeps

track of the highest index in use at any moment

To avoid running out of indices after AA op erations an in

to manage the noise dustrial strength implementation of AA should

symb ol namespace to o as a stack when exiting from a pro cedure one

should reset the noise symbol counter to the value it had up on entry

This action implicitly discards all the noise symbols created during

the pro cedure and allows their indices to be recycled If the pro ce

dure returns an ane form as its result then any new noise symbols

that o ccur in the latter must be renumb ered while the result is copied

to its prop er lo cation

Space and time cost

Consider the AA evaluation of an expression or a sequence of chained

expressions with m op erations where the input values are ane forms

that dep end on a certain set of n noise symb ols Each op eration

n

will contribute one more noise symbol to this set representing the lin

earization and roundo errors of that step Therefore each computed

value will dep end at most on n m noise symb ols Since the cost of

any basic AA op eration is prop ortional to the size of the op erands the

whole expression can b e evaluated in O mn m time and space

Optimization techniques

High computational cost is the main obstacle to the use of AA in prac

tical applications esp ecially those with mo dest precision requirements

Implementors of AA libraries should therefore b e sensitiveto eciency

Optimization techniques

issues We will now describ e some helpful optimization techniques for

AA programs

Condensing noise variables

On long computations with m n most terms in each ane form will

b e recording errors due to previous op erations In general it is not worth

keeping track of all those errors separately For the sake of eciency

it is desirable to insert at selected points extra co de to condense the

ane forms

The idea is to replace two or more terms z z by a single

i i j j

term z where z jz j jz j and is a brandnew noise

i j

k k k k

symbol This op eration reduces the size of the ane form z p ossibly at

the cost of losing correlation information

No information will be lost if the noise variables are

i j

exclusive to z that is they do not app ear in any other ane form

that is still alive A value is alive at some point if it may be used

further on Also the loss of information is likely to be minimal if the

condensed co ecients jz j jz j are small compared to the other

i j

noise co ecients of z The condensation mightmake a dierence only if

the larger co ecients happ ened to cancel out later on Hence we may

condense all terms ofz form which are smaller than some fraction of z

For example supp ose the computation is a lo op where only two

variables x and y are carried from one iteration to the next Supp ose

x and y b egin as simple ane forms each dep ending on a single noise

variable At the end of the rst iteration the joint range of those two

variables will b e a centersymmetric p olygon D whose complexityisat

m where m is the numb er of op erations p erformed in the worst

b o dy of the lo op If wejustwent on each iteration would add another

m terms to those forms

We can solve this problem by condensing all noise variables at the

end of each iteration That is we replace x and y by the new forms

x x x x

i i j j

y y y y

i i j j

where are two brand new noise symb ols The joint range of these

i j

new forms is a parallelogram P with sides parallel to x y andx y

i i j j

Ane arithmetic

The new co ecients x x y y should b e chosen so that this parallel

i j i j

ogram contains the original domain D preserving that AA invariant

The parallelogram of minimum area enclosing the convex p olygon D

can b e computed in O m log m steps and its area is at most times

the area D Thus at a mo dest cost in time and accuracy we can keep

the size of the ane forms b ounded by O m indenitely

Dep ending on the context it may be imp ortant to know how the

nal result correlates with the input quantities In that case we should

condense only internal noise variables ie those that were created

during the computation itself but preserve the external ones ie

those that w ere present in the input forms

This technique b egs the question howmany terms do we real ly need

to keep in the ane forms Supp ose that at some p oint of the compu

tation there are k ane forms that are still alive ie that may be

used later In principle at that p ointwe could replace all those forms

by a new set of k ane forms dep ending exclusively on k new noise

variables in such a way that the volume of the joint range increases

only by a constant factor that dep ends on k

Unfortunately this result is not very useful since computing the

smallest k dimensional paralelotop e that contains the old joint range is

a dicult problem when k

Static storage allo cation

Another promising optimization is the compilation of AA algorithms

into an ordinary programming language likeC

Many applications of AA such as those describ ed in Chapter can

b e co ded as pro cedures that take ordinary intervals as parameters con

vert them to ane forms and evaluate a linear nonlo oping chain of

expressions on those values In such cases the compiler could predict

statically the set of noise symb ols aecting each computed ane form

The compiler could then allo cate the ane forms statically on the or

dinary pro cedurecall stack using a separate simple variable for each

co ecient The noise symb ol indices would then b e sup eruous In this

context the AA arithmetic op erations that lo op over the terms such

ould b e expanded inline avoiding the as AAaffine and its variants w

overhead of merging the term lists

Hansens Generalized Interval Arithmetic

Shared subexpressions

In actual programs it is common for the same subformula to app ear

as an op erand of two or more op erations With ordinary oatingp oint

or with standard IA evaluating such shared subexpressions more than

once is merely a waste of time With AA however multiple evalua

tions may also make the results less accurate The reason is that each

evaluation of a shared subformula represents the linearization errors of

the latter by a dierent set of noise symb ols preventing those errors

from canceling out in later steps Therefore when co ding expressions

like x y x y for AA evaluation it is doubly imp ortant to

identify common subexpressions like x and y and compute each of

them only once Symb olic manipulations programs can identify common

subexpressions and make co ding much easier For example in Maple

wehave

Cxyxyoptimized

t xx

t yy

t ty

t tttt

Hansens Generalized Interval Arithmetic

Ane arithmetic can be viewed as a simplication of generalized in

terval arithmetic GIA a computation mo del prop osed in by

E R Hansen For a fuller discussion ab out GIA including ap

plications see

In its original formulation GIA addresses sp ecically the problem of

computing one or more functions from a xed set of n quantities x x

n

which are given as ordinary intervals x x Every quantity z that is

n

derived from these variables including the function result is represented

as a list z of n intervals with the understanding that

n

z x x x

n n

Note that the x in this formula are unknown quantities so the formula

i

is kept unevaluatedjust as ane forms of AA From this representation

Ane arithmetic

we can obtain a range for z namely

z x x x

n n

where the formula is evaluated as in standard IA

This mo del is obviously similar to AA not only in its representation

but also on its main virtuenamely that in principle it can mo del ane

dep endencies b etween quantities with error that shrinks quadratically

with the size of the input intervals

Conceptual dierences

However GIA and AA are not mathematically equivalent and neither

is a sp ecial case of the other In general conversion from one represen

tation to the other entails loss of information The dierence is evident

when one considers the joint range of twoquantities u and v when they

are describ ed in either mo del

As wehave seen in AA the joint range is always a centersymmetric

convex p olygon in the uv plane In a computation with n input vari

ables and m steps the joint range may have up to n m sides In

GIA the joint range may be a nonconvex p olygon whose complexity

is prop ortional to n alone Consider for example the GIA forms

x u

v x

where x is the only input variable If x ranges over then the

value pairs u v that are allowed by these forms is the bowtieshap ed

region shown in Figure

Moreover the fact that the GIA co ecients are intervals rather than

numb ers implies that uncertainty cancellation will not be as complete

as it can b e in the AA mo del For example ifu is the form given ab ove

then the GIA evaluation of u u will pro duce

x y

whereas the analogous AA computation returns exactly zero For the

same token we can exp ect that error explosion will o ccur in GIA more often than in AA

Hansens Generalized Interval Arithmetic

Figure Joint range in GIA

Finallyby lumping all internal errors into the indep endentinterval

or in the co ecients the GIA mo del prevents those errors from

n

canceling out

Practical dierences

GIA and AA dier also in a number of practical details that aect

their eciency and exibility First in the GIA representation each

co ecienttakes twice as much space as in AA and the extra space do es

not seem to translate into increased accuracy

Another dierence is that the number of terms in a GIA form is

xed and their order is tied to the order of the function arguments

en when This choice forces one to handle and op erate on all n terms ev

the quantity at hand dep ends on a small subset of the input variables

Moreover the GIA representation is implicitly tied to a sp ecic set

of variables which limits the scop e of GIA forms to the b o dy of a single

function or to a set of functions with the same arguments a feature

that hinders the comp osition of complex functions from simpler ones

For example supp ose that the function f x y is dened in terms

of some previously dened function g u v w The values of u v and

w computed by f with GIA will b e expressed as GIA forms with three

co ecients over the variables x and y whereas the values handled in

ternally by g are fourterm GIA forms over u v andw Therefore the

Ane arithmetic

call to g from f will require some nontrivial conversion b etween the two

typ es of GIA forms usually with loss of information

Ane arithmetic is more exible in this regard In the implementa

tion originally prop osed byComba and Stol and detailed in Sec

tion the names of the noise variables are recorded in the ane

form and their scop e is the entire program As a consequence ane

forms can be passed across function b oundaries without conversion or

loss of information An example of application where this issue is quite

relevant is the raytracing of implicit surfaces describ ed in Section

Chapter

Some applications

In this chapter we describ e some applications of interval arithmetic to

problems in numerical analysis computer graphics geometric mo del

ing and global optimization We also show that ane arithmetic can

pro duce b etter results than IA

Zeros of functions

Solving equations is a fundamental problem in mathematics The sim

plest kind of equation is f x dened by a real function f R

of a single variable x A solution also called a root is of course any

number x such that f x We also say that x is a zero of f

Sometimes nding any one zero of f is sucient Sometimes all zeros

are required

There exist explicit formulas for computing all zeros of p olynomial

functions of degree at most The formula for degree is trivial The

formula for degree is the well known quadratic formula The formulas

for degrees and based on the famous Cardano solution of the cubic

are less well known and not frequently used b ecause they may require

complex arithmetic even when all ro ots are real

No formulas exist for solving p olynomial equations of higher degree

or transcendental equations Thus in general wemust resort to numer

ical approximation metho ds Moreover we should keep in mind that

It can be argued that mathematics advances by continuously redening what

equation and solution mean

Some applications

it is very probable that the zeros of f will have no exact oatingp oint

representation even if weknowintheoryhow to compute them exactly

For example the trivial equation x has no exact oatingp oint

solution in base or So approximate solutions are all we can

obtain with a computer

What do es it mean to solve an equation f x approximately

Two natural interpretations are

Find a number close to a root More precisely given nd

f Clearly x suchthat jx x j for some exact ro ot x of

the whole p oint is to nd such x without knowing x

Find a root of an equation that is close to the original equation

More precisely given nd x such that jf xj In

this casex is a solution of the equation f x f x whichmay

be thought as a p erturbation of the original equation b ecause

f x is small

Both denitions of approximate solution are useful sometimes even

in combination the adequate denition dep ends on the application In

sense in the oatingp oint world pro any case both denitions make

vided that the tolerances and are not to o small

There are many classical numerical metho ds for nding zeros of func

tions However to a certain extent their success dep ends on having

previously isolated a ro ot Once a ro ot has b een isolated these metho ds

usually converge quickly Thus the hard part in solving equations is

isolating the ro ots

One way to isolate a ro ot is to nd a small interval a b such

that f a and f bhave dierent signs In this case wesay that a bis

termediate Value Theorem if f is a bracketing interval for f By the In

continuous in a bracketing interval a b then f must have at least one

zero in a b

Bisection

A simple algorithm that is guaranteed to nd a ro ot of f in a bracketing

interval a bisbisection This algorithm successively divides a bracket

ing interval at its midp ointinto two parts cho osing the half that is still

abracketing interval

Zeros of functions

bisecta b f f R R

a

b

Finds a zero of f in a bracketing interval a b

c a b

if b a then return c

f f c

c

if f f then

a c

return bisecta c f f

a c

else

return bisectc b f f

c

b

Starting with a call to bisecta b f afb this algorithm com

putes a sequence of nested bracketing intervals each interval half as

wide as its predecessor Thus bisection always converges to a ro ot of f

This observation can b e used as the basis of a constructive pro of of the

Intermediate Value Theorem

There are many things that may go wrong when we try to implement

the bisection algorithm in oating point For one thing if a and b are

nite but large then the formula a b may return Also if

b a is not rounded prop erly ie upwards then the result may

not be within distance of a ro ot We can avoid these problems by

using IAmid and IArad to compute these quantities see Section

Another p ossible problem is that the pro duct f f may underowto

a c

zero even when f and f If this happ ens then the pro cedure

a c

will take the wrong branch and the result may be arbitrarily far from

any ro ot Thus instead of multiplying the values we must work with

the signs only

A subtler problem is that bisect maygointo innite recursion if c

is still greater than If IAmid gets rounded to a or b while b a

has been prop erly co ded then this can happ en only when a and b are

consecutive Float values In that case the user has chosen to o small a

tolerance and there is no way to get any closer to the bracketed ro ot

or to tell whichof a and b is closer to it We should then give up and

return either a or b

Incorp orating these changes into the basic algorithm weget

Some applications

FPbisecta b Finite Finite Finite

Given a nite nonempty interval a b and f g

such that f a and f b

returns a value c that is either within distance of a root of f

or is one of the two representable Floats closest to such root

b c IAmida

if c a or c b or IArada b then return c

f f c

c

if f then

c

return FPbisecta c

else if f then

c

return FPbisectc b

else

return c

The search b egins with a call to FPbisecta b signf b

Bisection is slow only one bit of precision is obtained at each itera

tion Thus it needs dlog ja bj e steps to reach tolerance

The wellknown iterative metho d of Newton if prop erly used will

have faster convergence doubling the numberofbitsateach step How

ever Newtons metho d is not guaranteed to converge in all cases Even

when the mathematical function f satises all the conditions for conver

gence the rounding errors in its oatingp oint implementation f may

cause Newtons metho d to lo op or diverge In Section we will see

howtoachieveconvergence rates similar to Newtons while retaining the

robustness of bisection

al bisection Interv

Bisection is one example of a guaranteed metho d not a common situa

tion in numerical metho ds However bisection has limitations it needs

to start with a bracketing interval and it only nds one ro ot in the

interval even if there are manyroots

A more serious limitation is that bisect do es not nd a ro ot of the

intended function f but rather a ro ot of its oatingp oint implementa

tion f Even if the values of f and f are very close their ro ots maybe

arbitrarily dierent

Zeros of functions

As long as we are conned to the realm limitedprecision arithmetic

there is no satisfactory solution to this problem The inevitable roundo

errors can always turn a nearro ot into a true ro ot or viceversa How

ever if we have an interval implementation f of f then we can use it

to discard parts of the domainx that are guaranteed not to contain any

ro ots of f

 

The idea is to evaluate z f x for various subintervals x of



starting with itself Ifz do es not contain the value zero thenx cannot

contain a ro ot of f and can b e discarded Otherwise the subinterval is

split into twohalves which are recursively tested This pro cess continues

tervals are smaller than a sp ecied tolerance or cannot b e until the in

sub divided any further

In the end we are left with a subset x of consisting of zero or

more intervals that is guaranteed to contain al l the ro ots of f in

even if is not bracketing for f Moreover the intervals that make

up x are small or indivisible and when tested with f their signs turn

out ambiguous We will say that x is an approximate root set of the

original function f

IArootsx Interval stream of Interval

Given a nite interval x outputs a sequenceof subintervals

that are either indivisible or have radius at most

and constitute an approximate root set of f

if f x then

c IAmidx

if c xlo or c xhi or IAradx then

output x

else

c output IArootsxlo

output IArootsc x hi

For many applications the approximate ro ot set x computed by IAroots

is an acceptable surrogate for the true set of ro ots of f In a sense this

improved bisection algorithm is able to nd al l ro ots of f inside any

given interval Obviously the accuracy and usefulness of x is limited

by the accuracy of f and by our computation budget

Since IAroots explores the left half b efore the right half it nds all

ro ots in order from left to right as they o ccur inx If only the smallest

Some applications

ro ot is required then the metho d can b e mo died to stop as so on as a

ro ot is found This variant is useful in applications suchasray tracing

when one is interested in nding the rst intersection of a ray with a

surface see Section or sp ectral analysis of matrices when one is

usually interested only in the rst few eigenvalues

Note that IAroots may rep ort two or more intervals for eachroot

or nearro ot of f The reason is that the range estimator f is liable

to return an ambiguous sign for intervals that are close to a ro ot but do

not straddle it Therefore wemaywant to pip e the output of IAroots

through a lter that merges any consecutive abutting intervals This step

y exactly do es not guarantee that each ro ot of f will be represented b

one interval but it may reduce the callers overhead

Using ane arithmetic

If the ranges computed by f are much wider than the true range of f

in x then IAroots may b ecome quite slow since it will be forced to

split the intervals more nely in order to resolve sign ambiguities It

may also output a large number of intervals for each ro ot and also

manyintervals that contain no ro ot

As we observed in Section this problem is particularly likely

when f is a complicated function with correlated subexpressions When

this interval explosion may be a problem we may consider adapting

IAroots to use ane arithmetic instead of standard IA

The trivial approach is to use AA only inside the range estimator f

that is we compute z f xbyconverting x to an ane form x then

evaluatingz f x in the AA mo del or the mixed AAIA mo del and

returning z z

However if we are going to use AA then we might as well take

advantage of the extra information it provides Sp ecically we should

by a similar routine AAroots that evaluatesz f x replace IAroots

itself and then uses the information contained in the ane form z to

cho ose the split p oint c

The routine AAroots b egins by converting the interval x to the

ane form x x The result of z f x will b e

z z z z z

n n

We may then conclude that the graph of f in x is contained in the

Zeros of functions

parallelogram P with center x z and horizontal pro jection x whose

top and b ottom sides haveslopez x and whose left and right sides are

vertical and have length where jz j jz j jz j It follows

n



then that any ro ots of f in x must be conned to the subinterval x

where the parallelogram P intersects the xaxis Thus we can replace x



by x b efore splitting the interval in half Here is the detailed co de

AArootsx Interval stream of Interval

Given a nite interval x outputs a sequenceof subintervals

that are either indivisible or have radius at most

and constitute an approximate root set of f

var k newsym

var x AAForm IAmidx IAradx

k

var z AAForm f x

x x AArootAuxx z k

if x then

c IAmidx

if c xlo or c xhi or IAradx then

output x

else

c output AArootsxlo

output AArootsc x hi

The routine AArootAux computes the intersection of P and the xaxis

AArootAuxx z AAForm k Index Interval

Given ane forms x x x and z

k k



returns an interval x containing the intersection

of the xaxis with the joint range of x and z

x x

P

var fj z j i Ez nfk gg

i

var z z z

var u IAdivx x z z

k k k k

return IAshiftux

For smo oth functions AAroots will exhibit quadratic convergence

doubling the number of bits at each step until the interval x b ecomes

so small that roundo errors start to dominate In any case convergence

will b e at least linear one bit gained p er iteration as in IAroots

Some applications

Level sets

Many applications deal with scalar elds h R where is a subset

d

of R For example h may be temp erature pressure or height Of

sp ecial interest are the subsets of where h is constant called the level



sets of h ie the sets h cfx hxcgforc R A picture

showing how the level sets of h are distributed and how their geometry

changes as c varies contains a great deal of qualitative information ab out

the b ehavior of h Figure Such contour maps are widely used

in scientic applications Familiar examples are temp erature charts in

weather maps and level curves in altitude maps

0.200

0.100

0.000

-0.100

-0.200

-0.300

-0.400

-0.500

Figure Contour map of a scalar eld

Enumeration

Consider the computation of an approximation of a single level set C



Without loss of generalitywemaytake C h the level set at level

zero b ecause the level set at level c of h is the level set at level zero of

h c For concreteness we shall consider only the twodimensional case

ie d However the discussion b elow can b e mo died for arbitrary

dimension

The zero set C is the set of solutions of the equation hx Since

we are now dealing with an equation in several unknowns the set C

do es not need to be a single point or even nite In general C is a

curve but it can b e almost anything We also say that C is an implicit

 d

Every closed set in R is the zero set of a C function

Level sets

curve dened by the equation hx Implicit curves and surfaces

are imp ortant in geometric mo deling

A simple and general technique for computing an approximation of

alevel curve C in is enumeration

decomp ose into small cells

identify which cells intersect C

approximate C within each intersecting cell

The simplest cellular decomp ositions used in step are regular grids

of squares or triangles Figure Such decomp ositions are often used

in practice b ecause their top ology and geometry are well understo o d If

the cells are suciently small then C can be approximated by linear

segments in step

Figure Simple cellular decomp ositions

Step the enumeration of the intersecting cells is usually the most

exp ensive step in this metho d In the simplest schema the cells that

intersect C are identied by sampling The function h is evaluated at the

vertices of each cell if the signs of those values are not identical then the

cell necessarily intersects C again this follows from the Intermediate

Value Theorem if h is continuous The p oints where C intersects the

b oundary of the cell can be found by bisection or by simple linear

interp olation if the cell is small

Obviously the converse do es not hold when all the values of h at

the vertices ha ve the same sign we cannot conclude that the cell do es

Some applications

not intersect C This is a form of aliasing in the sampling related to

the size of the cell Therefore cell sizes must be carefully chosen to

avoid missing features due to undersampling Figure On the other

hand cho osing very small cells can be exp ensive a uniform cellular

d

decomp osition of having n cub es along each main direction has n

d

cub es but only O n cub es will intersect C Thus cho osing a smaller

cell size to avoid aliasing in the sampling will greatly increase the number

of cells to b e scanned and also increase the fraction of useless tests

Therefore the approximation of level curves with uniform enumeration

t is simple but not ecien

Figure Missing features due to undersampling

Adaptive enumeration

To nd the cells intersecting a level curve C without visiting all cells

in a cellular decomp osition of we need a way to discard quickly

and reliably large p ortions of that cannot contain pieces of C Point

sampling can only prove the presence of C in some region of to reduce

the numb er of cells scanned we need a test pro cedure that can also prove

the absence of C in a region

Range analysis can provide such a test If wehaveaninterval version

h of h then we can enumerate the cells intersecting C by an adaptive

pro cedure which is essentially a twodimensional version of IAroots

see Section We explore recursively starting with itself as the

Level sets

initial cell Ifacellisproved to b e empty then it is ignored otherwise

it is sub divided into smaller cells which are then explored recursively

until the cells are small enough to approximate C

For rectangular decomp ositions a simple way to divide a cell into

sub cells is to bisect it orthogonally to its widest direction or cyclically

bisect along one of the co ordinate directions at each step resulting in a

d tree decomp osition of Another p opular variant divides the cells

into four equal parts resulting in a quadtree decomp osition of

The meaning of small enough dep ends on the application For

rendering it might mean smaller than a pixel For other applications

such as mo deling it may dep end on some other numerical criterion For

instance testing how closely h can b e approximated by a linear function

inside the cell allows p olygonal approximations to adapt to the curvature

of C

Note that the test pro cedure is not required to be complete in the

it may fail to prove either the presence or the absence of C sense that

in a given cell In particular a cell that is declared small enough may

still have unknown status Each application must decide what to do

with those indeterminate cells discard them treat them just like the

cells that do intersect C or handle them in some sp ecial way Point

sampling may b e useful at this stage to help identify some intersecting

cells

To test whether a cell K intersects C we evaluate hx y

of K onto the co ordinate axes The where x and y are the pro jections

interval thus computed is guaranteed to contain all values of h for p oints

inside K If this interval do es not contain zero then K cannot contain

zeros of h Of course the converse do es not holdif the interval contains

zero we cannot conclude that h vanishes somewhere in K

An adaptiveenumeration metho d based on IA is

Some applications

IAroots K Cell stream of Cell

Enumerates a set of subcel ls of cel l K

that are either smal l or indivisible and

which constitute an approximate solution of hx y in K

if hK then

smallK then if is

output K

else

K K IAdivide K

n

if n then

output K

else

for i in f ng do

K output IAroots

i

Examples

Figure a shows a full enumeration of the cubic curve dened implicitly

by y x x in the square using a

grid Intersecting cells were identied by sampling and app ear in grey

note how few they are only out of The p oints where the curve

crosses cell edges marked with white dots were computed by simple

linear interp olation and have b een joined into a p olygonal approxima

tion for the curve Note that a level curve can have several connected

comp onents Figure b shows an adaptiveenumeration based on IA of

the same curve but now using a grid Note that large p ortions

of were discarded at early stages

Figure shows adaptiveenumerations using d trees of the quar

tic curve dened by hx y x y xy xy in the square

Note howAAwas able to b e compute a much

b etter approximation than IA b ecause of the many correlations in h

Ray tracing

Interval analysis has also b een used for reliable raytracing of surfaces

sp ecicallyto determine all intersections b etween an implicit sur

face hx y z and a line segment pq the ray

Ray tracing

Figure Full enumeration left and hierarchical enumeration using IA

right

Figure Adaptiveenumeration of quartic with IA left and AA right

Some applications

This problem is equivalent to that of nding the ro ots of the univari

ate function

f th tx tx ty ty tz tz

p q p q p q

in the interval A robust and reasonably ecient algorithm for

the latter combines interval analysis with Newtons ro otnding metho d

Weevaluate u f t using IA for the whole interval t If the

resulting intervalu is strictly p ositive or strictly negative then weknow

that the ray pq do es not intersect the surface Otherwise weevaluate the



f t in that interval also using IA From the intervals derivative v

u andv we can compute a subinterval t of t that must contain all the

ro ots of f in t If t is less than half as wide as t then we rep eat the

searchint recursively Otherwise we split t in two equal parts and

rep eat the search recursively in eachhalf The recursion stops when the

interval t is small enough for the application

The order of convergence of this algorithm is somewhere between

linear and quadratic dep ending on the accuracy of the computed inter

vals u and v However evaluating f t in the IA mo del is equivalent

to evaluating hx y z on the intervals x x x y y y

p q p q

z z z that is evaluating h on the axisaligned b ounding b ox

p q

of the segment pq instead of only along the segment itself Once again

the problem arises b ecause the IA routines havenoway of knowing that

the arguments x y and z of hx y z are highly correlated

Obviously the b ounding box of the segment pq may intersect the

surface even when the segment itself do es not Even assuming that

hx y z will b e computed accurately which as we saw is unlikely to

happ en with standard IA this fact alone will surely lead to slow con

vergence and to manyevaluations of f on ray segments that eventually

turn out not to contain any ro ots

Replacing standard IA by AA will generally improve the p erformance

of this algorithm Even without any algebraic manipulation AA will

automatically notice that the ane forms x y and z are strongly cor

related and will use this fact to pro duce tighter b ounds for f t

Moreover as the interval t decreases the deviation of the computed

ane form u should b e increasingly dominated by the single error term

u whose noise symbol is that of the input interval t Recall that

j j j

if f is mo derately well b ehaved then the other partial deviations of u

Global optimization

should decrease quadratically with the size of t But in that case the

co ecient u is a go o d estimate of the derivativeoff in the interval and

j

we can use it to guess the p osition of the ro ot for the next iteration In

other words AA allows us to carry out Newtons ro otnding algorithm

without explicitly computing the derivativeoff

Global optimization

Another imp ortant and dicult problem is global optimization that is

the computation of the global maximum or minimum of a function over

its domain There are two main variants for this problem unconstrained

optimization whichisover the entire domain and constrained optimiza

tionwhich is only in a subregion of the domain usually dened implic

itly by nonlinear equations and inequations

We shall consider the boxconstrained global minimization problem

d

given a ddimensional box R that is the Cartesian pro duct of d

and a continuous objective function f R nd its real intervals

global minimum f min f f xx g and the set of all global min

imizers f f x f x f g Actuallywe shall consider the

approximate numerical version of this problem instead of nding all

b

minimizers f we seek only to identify some subset of that is

guaranteed to contain The goal then b ecomes to make the measure

b

of as small as p ossible for a given computation budget

There are many metho ds for ndind lo cal minima but it would seem

that nding a global minimum with a computer is a hop eless task In

deed this is probably correct if we are restricted to computing the

values of f at a nite set of sample points in b ecause f may oscil

late arbitrarily between these sample p oints Nevertheless combining

general branchandb ound techniques with range analysis can provide

robust algorithms for global optimization b ecause range estimates can

be used to discard large subregions of that cannot contain a global

minimum

Branchandb ound metho ds for global optimization

Branchandb ound is a general numerical technique for solving global

minimization problems A branchandb ound algorithm generally alter

Some applications

nates b etween two main steps branching whichis a recursive sub divi

sion of the domain and boundingwhich is the computation of lower

and upp er b ounds for the global minimum of f in a subregion of By

keeping track of the currentbestupp er b ound for the global minimum

of f one can discard subregions that cannot contain a global minimizer

ie subregions where the lower b ound for f is greater than the current

upp er b ound for the global minimum f Subregions that cannot b e dis

carded in this way are kept in a list L to b e further pro cessed Thus at

b

anytimetheset L is a valid solution to the global minimization

problem such as dened ab ove The algorithm stops when the current

b

solution is adequate for the application based on the sizes of the

boxes in L on the estimated range for f or on some other criterion

the function f is continuous This algorithm converges provided that

the branching step is such that the width of the widest box in L tends

to zero and the range estimates for f shrink to a single value as the

diameter of go es to zero

The basic branchandb ound algorithm outlined ab ove admits end

less variations dep ending on how the branching and b ounding steps are

implemented The simplest branching metho d is to

bisect the currentbox orthogonally to its widest direction Alterna

tively one can cyclically bisect along one of the co ordinate directions at

each step This similar to enumeration techniques for level sets

The correctness of general branchandb ound metho ds requires range

estimates that are guaranteed to contain the values of f in a subregion

of On the other hand the eciency of such metho ds dep ends on the

quality of those estimates One usually trades quality for sp eed when

computing estimates however tight estimates even if more exp ensive

to compute sometimes provide overall faster algorithms

Example

Consider the GoldsteinPrice function

x y x x y xy y f x y

x y x x y xy y

The global minimum of f in the box is

f f

Surface intersection

This is an easy function for lo cal optimization but which is very

dicult for a branchandb ound IA algorithm The dep endency problem

generates a large number of boxes and causes it to partition the region

much more nely than is required for AA as shown in Figure In this

gure b oxes shown in white have b een eliminated and b oxes shown in

grey remain at termination and are thus guaranteedtocontain all global

minimizer for f in Intuitively a go o d algorithm should generate a

picture with few large white b oxes and few small grey boxes This is

interpreted as its ability to both quickly discard large subregions of

and lo cate all global minimizers very precisely

Figure Domain decomp ositions for minimizing the GoldsteinPrice func

tion with IA left and AA right

Surface intersection

Parametric surfaces are the most p opular primitives used in computer

aided geometric design CAGD They are easy to approximate and ren

der and there is a huge literature on sp ecial classes of surfaces suitable

for shap e design such as Bezier and splines surfaces for which sp ecial

algorithms exist However using parametric surfaces for mo deling

solids in CSG systems requires ecient and robust metho ds for comput

ing surface intersection mainly for trimming surfaces into patches that can b e sewn together to b ound complex shap es

Some applications

Metho ds for computing surface intersections

Several metho ds have b een prop osed for solving the imp ortant problem

of computing the intersection of two parametric surfaces These meth

ods can b e classied into two ma jor classes continuation methods and

decomposition methods

Continuation methods also called marching methods use a lo cal ap

proach to the surface intersection problem Starting from a p ointknown

to be on both surfaces these metho ds build an approximation for the

intersection curvebymarching along the curve successively computing

a new point based on the previous point or points Continua

tion metho ds must use numerical approximations not only for marching

along the curv e but also for nding starting p oints Since the intersec

tion curve may have several connected comp onents a starting point is

needed on each comp onent Moreover care must b e taken for handling

closed comp onents correctly In some applications such as trimming in

tersection curves computed with continuation metho ds must b e mapp ed

back to the parameter domains to dene trimming curves This maybe

a dicult inverse problem

Decomposition methods on the other hand use a more global ap

proach to the problem A simple decomp osition metho d is to build

p olygonal approximations for b oth surfaces and then intersect the cor

resp onding p olyhedral surfaces Although it is easy to build p olygonal

approximations for parametric surfaces such approximations need to b e

Anaive very ne to provide a go o d approximation for the intersection

p olygonal approximation is obtained by simply sub dividing the parame

ter domain uniformly into many small rectangles However intersecting

such ne p olygonal approximation is itself a dicult task Even if wedo

not care ab out geometric degeneracies this is a high complexity

task If there are n rectangles along each main direction in parameter

space then there are n faces in each p olyhedron A naive algorithm

that computes the intersection of the two p olyhedra by testing every

p ossible pair of faces has to consider n cases most of which do not

contribute to the intersection This algorithm is not practical b ecause

it is very exp ensive to rene an approximation

Since decomp osition metho ds work directly on parameter domains

no inverse problem needs to be solved to nd trimming curves On

the other hand decomp osition metho ds compute trimming curves in a

Surface intersection

piecewise unstructured way the pieces must b e somehow glued together

into complete curves

Adaptive decomposition methods avoid the cost of uniform decom

p ositions by sub dividing the domain until the surface is approximately

planar In that way the asso ciated p olygonal approximation is adapted

to the lo cal curvature of the surface b eing ner in regions of high cur

vature and coarser in regions of low curvature where the surface is

almost at Such metho ds are generally restricted to sp ecic typ es of

surfaces whose nature can b e exploited to derive ecient tests for lo cal

atness

A decomp osition metho d based on interval analysis

The decomp osition metho d prop osed by Gleicher and Kass takes

a global approach for sub dividing the domains using range analysis

Given a rectangle in each domain use IA to compute an estimate for

the range of values taken by the corresp onding parametric function on

each rectangle This estimate is a b ounding box for a surface patch

ie a rectangular box in d space aligned with the co ordinate axes

and guaranteed to contain the piece of the surface corresp onding to the

given rectangle in parameter space If two b ounding boxes do not in

tersect then the corresp onding surfaces patches cannot intersect If the

bounding boxes do intersect then the surfaces patches may intersect

In this case the corresp onding rectangles in the domains are sub divided

pieces and the pro cess is rep eated until either the sur into four equal

faces patches are proved disjointora user dened tolerance is reached

the patches are then assumed to intersect In this way a quadtree de

comp osition is built for each domain

For eciency Gleicher and Kass keep track of all pairs of patches

that might intersect each leaf no de in one quadtree contains a list of

leaf no des in the other quadtree that it overlaps This list is rened and

distributed to its children when a no de is sub divided

Examples

We show two examples of how the GleicherKass algorithm for surface

intersection can be improved by using AA instead of IA sp ecially for

surfaces that are common in CAGD

Some applications

Lofted parab olas

Consider a cubic patch obtained by lofting a parab ola to another parab ola

More precisely take three p oints a a a in R and consider the

quadratic Bezier curve dened by these p oints

ua u a u ua u u

Take three other p oints b b b in R andtheBezier parab ola dened

by them

ub u b u ub u u

Now sweep to linearly to obtain a surface

f u v v uv u u v

Lofting is a common op eration in CAGD

Because the parametrization f contains several o ccurrences of u and

u and of v and v the terms are strongly correlated and we

exp ect AA to provide tighter b ounds for f than IA This exp ectation is

met Figure shows the domain decomp ositions built with IA and AA

for computing the intersection of the two lofted parab olas shown in Fig

ure using six levels of recursive sub division Note how AA exploits

correlations to give much tighter approximations for the intersection

quickly discarding large parts of b oth domains

Figure Twointersecting lofted parab olas

Surface intersection

Figure Domain decomp ositions computed with IA top and AA b ottom

for intersecting the twoskew parab olic cylinders shown in Figure

Some applications

Bicubic patches

Consider now bicubic patches the most common surface patches in

CAGD A bicubic patch is a pro duct Bezier surface dened bya

mesh of sixteen control p oints a R i j

ij

X X

f u v a B uB v

ij

i j

i j

n

where u v and B is the ith Bernstein p olynomial of degree n

i

n

ni

i n

t t B t

i

i

Lofted parab olas are also tensor pro duct Bezier surfaces

Figure sho ws the domain decomp ositions built with IA and AA

for computing the intersection of the two bicubic patches shown in Fig

ure Because tensor pro duct parametrizations contain many occur

rences of strongly correlated terms we exp ect AA to provide tighter

b ounds than IA Again this exp ectation is met An extra sub division

step with AA is sucienttoshow that the intersection curve is not a lo op

Figure Figure shows the trimming curves corresp onding to

the intersection curve

Figure Twointersecting bicubic patches

Figure Domain decomp ositions computed with IA top and AA b ottom

for intersecting the two bicubic patches shown in Figure

Figure Extra sub division step with AA shows that intersection curveis

not a lo op

Figure Trimming curves for the intersecting bicubic patches

Bibliography

M Abramowitz and I A Stegun editors Handbook of mathemati

cal functions with formulas graphs and mathematical tablesDover

Publications Inc New York Reprint of the edition

G Alefeld A Frommer and B Lang eds Scientic Comput

ing and Validated Numerics volume of Mathematical Research

AkademieVerlag Pro ceedings of the International Symp o

sium on Scientic Computing Computer Arithmetic and Validated

Numerics SCAN Wupp ertal Germany

ANSIIEEE ANSIIEEE Std IEEE New York IEEE

Standard for Binary FloatingPoint Arithmetic

R Baker Kearfott Rigorous Global Search Continuous Problems

Klu wer Octob er

R E Barnhill Surfaces in computeraided geometric design A

survey with new results Computer Aided Geometric Design

R E Barnhill G Farin M Jordan and B R Pip er Sur

facesurface intersection Computer Aided Geometric Design

W Barth R Lieger and M Schindler Ray tracing general para

metric surfaces using interval arithmetic The Visual Computer

F L Chernousko and A I Ovseevich Metho d of ellipsoids Guar

anteed estimation in dynamical systems un der uncertainties and

Bibliography

control In Abstracts of the International Conference on Interval

and Compute rAlgebraic Methods in Science and Engineering IN

TERVAL page St Petersburg Russia April

D M Claudio and S M Rump Inclusion metho ds for real and

complex functions in one variable Revista de Informatica Teorica

e Aplicada January

J L D Comba and J Stol Ane arithmetic and its applications

to computer graphics In Proceedings of VI SIBGRAPI Brazilian

Symposium on Computer Graphics and Image Processing pages

Available at httpdccunicampbrstolfi

L H de Figueiredo Surface intersection using ane arithmetic

In Proceedings of Graphics Interface pages May

Available at ftpcsguwaterloocapublhfgipsgz

L H de Figueiredo and J Stol Adaptive enumeration of im

plicit surfaces with ane arithmetic Computer Graphics Forum

L H de Figueiredo R Van Iwaarden and J Stol Fast

interval branchandb ound metho ds for unconstrained global

optimization with ane arithmetic March Sub

mitted to SIAM Journal of Optimization Available at

ftpftpicadpucriobrpublhfdocgopsgz

T Du Interval arithmetic and recursive sub division for implicit

functions and constructive solid geometry Computer Graphics

SIGGRAPH Proceedings July

R Farouki and V Ra jan Algorithms for p olynomials in Bernstein

form Computer AidedGeometric Design

D Filip R Magedson and R Markot Surface algorithms us

ing b ounds on derivatives Computer Aided Geometric Design

Geomview Software written at the Geometry Center Universityof

Minnesota Available at httpwwwgeomumnedusoftware

Bibliography

M Gleicher and M Kass An interval renement technique for

surface intersection In Proceedings of Graphics Interfacepages

May

D Goldb erg What every computer scientist should know ab out

oatingp oint arithmetic ACM Computing Surveys

E Hansen A generalized interval arithmetic In K Nickel editor

Interval Mathematics number in Lecture Notes in Computer

Science pages Springer Verlag

E Hansen Global optimization using interval analysis the one

dimensional case Journal of Optimization Theory and Applications

E Hansen Global optimization using interval analysis the multi

dimensional case Numerische Mathematik

E Hansen Glob al Optimization using Interval Analysis Number

in Monographs and Textb o oks in Pure and Applied Mathemat

ics M Dekker New York

J Hass M Hutchings and R Schlay The double bubble conjec

ture Electronic Research Announcements of the American Mathe

matical Society electronic

C M Homann Geometric and Solid Modeling An Introduction

Morgan Kaufmann

R Horst and H Tuy Global Optimization SpringerVerlag Berlin

K Ichida and Y Fujii An interval arithmetic metho d for global

optimization Computing

Interval computations httpcsutepeduintervalcompmainhtml

R B Kearfott Interval Newtongeneralized bisection when there

are singularities near ro ots AnnalsofOperations Research

Bibliography

R B Kearfott An interval branch and b ound algorithm for b ound

constrained optimization problems Journal of Global Optimization

ARITHMETIC a R B Kearfott Algorithm INTERVAL

Fortran mo dule for an interval data typ e ACM Transactions on

Mathematical Software

B W Kernighan and D M Ritchie The C Programming Language

PrenticeHall

O Knupp el BIAS basic interval arithmetic subroutines Tech

nical Rep ort Department of Computer Science III Tech

nical University of HamburgHarburg July Avaliable at

httpwwwtituharburgde

O Knupp el PROFIL programmers runtime optimized fast

interval library Technical Rep ort Department of Computer

Science III Technical University of HamburgHarburg July

Avaliable at httpwwwtituharburgde

O Knupp el and T Simenec PROFILBIAS extensions Tech

nical Rep ort Department of Computer Science III Techni

cal Universityof HamburgHarburg November Avaliable at

httpwwwtituharburgde

D E Knuth AncientBabylonian algorithms Communications of

the ACM Errata ibid

U Kulisch and H J Stetter editors Scientic computation with

automatic result verication volume of Computing Supplemen

tum SpringerVerlag Vienna Pap ers from the conference

held in Karlsruhe Septemb er Octob er

A B Kurzhanski Ellipsoidal calculus for uncertain dynam

ics In Abstracts of the International Conference on Interval and

ComputerAlgebraic Methods in Science and Engineering INTER

VAL page St Petersburg Russia April

D Michelucci and JM Moreau Lazy arithmetic Techni

cal rep ort Ecole des Mines de StEtienne Available at

ftpftpemsefrpubpapersLAZYlazypsgz

Bibliography

D PMitchell Robust rayintersection with interval arithmetic In

Proceedings of Graphics Interface pages May

R Mo ore E Hansen and A Leclerc Rigorous metho ds for global

optimization In C A Floudas and P M Pardalos editors Re

cent Advances in Global Optimization pages Princeton

University Press Princeton NJ

R E Mo ore Interval Analysis PrenticeHall

R E Mo ore Methods and Applications of Interval Analysis SIAM

Philadelphia

S P Mudur and P A Koparkar Interval metho ds for pro cess

ing geometric ob jects IEEE Computer Graphics Applications

O Neugebauer A History of Ancient Mathematical Astronomy

SpringerVerlag New York Studies in the History of Math

ematics and Physical Sciences No

A Neumaier Interval Methods for Systems of Equations Cam

bridge University Press New York

C E Pearson editor Handbook of Applied Mathematics Selected

Results and MethdosVan Nostrand Reinhold second edition

A Preusser xfarb e Visualization of Darrays Available at

httpwwwfhiberlinmpgdegrzpubxfarbe dochtml

H Ratschek and J Rokne Computer Methods for the Range of

Functions Ellis Horwo o d Ltd

H Ratschek and J Rokne New Computer Methods for Global

Optimization Ellis Horwo o d Ltd

H Ratschek and RL Voller What can interval analysis do for

global optimization Journal of Global Optimization

D Ratz Boxsplitting strategies for the interval GaussSeidel step

in a global optimization metho d Computing

Bibliography

S M Rump Algorithms for veried inclusions theory and prac

tice In R E Mo ore editor Reliability in computing The role of

interval methods in scientic computingvolume of Perspectives

in Computing pages Academic Press Inc Boston MA

J M Snyder Interval analysis for computer graphics Computer

Graphics SIGGRAPH Proceedings July

A J Stewart Lo cal robustness and its applications to p olyhe

dral intersection International Journal on Computational Geome

try and Applications

An ane arithmetic library in C Avaliable J Stol libaa

at httpwwwdccunicampbrstolfi

K G Suern Quadtree algorithms for contouring functions of two

variables The Computer Journal

K G Suern and E D Fackerell Interval metho ds in computer

graphics Computers Graphics

G Taubin Rasterizing algebraic curves and surfaces IEEE Com

puter Graphics and Applications

J A Tupp er Graphing equations with generalized interval

arithmetic Masters thesis Graduate Department of Com

puter Science University of Toronto Available ay

httpwwwdgputorontocapeoplemooncakemschtml

R van Iwaarden An Improved Unconstrained Global Opti

mization Algorithm PhD thesis Department of Mathemat

ics University of Colorado at Denver Available at

httpwwwcshopeedurvaniwaaphdpsgz

J H Wilkinson Rounding errors in algebraic processes Prentice

Hall Inc Englewo o d Clis NJ