Floating Point Representation

FloatingPoint Repre s entation Michael L. Overton c copyr ight 1996 1 Computer Repre s entation of Numbers Computers whichworkwith real ar ithmetic us e a system calle d oating point. Suppose a real number x has thebinary expans ion E x = m 2 ; where 1 m<2 and m =(b :b b b :::) : 0 1 2 3 2 Tostore a number in oating p oint repre s entation, a computer word i s divided into 3 elds, repre s entingthe s ign, theexponent E ,andthe s igni cand m re sp ectively. A 32-bit word could b e divided into elds as follows: 1 bit for the s ign, 8 bitsfortheexponentand 23 bits for the s igni cand. Since the exp onent eld i s 8 bits, it can b e us e d to represent exp onentsbetween 128 and 127. The s igni cand eld can store the rst23bitsofthebinary repre s entation of m,namely b :b :::b : 0 1 22 If b ;b ;::: are not all zero, thi s oating p oint repre s entation of x i s not 23 24 exact butapproximate. A numberiscalleda oating point number if it can b e store d exactly on the computer us ingthegiven oatingpoint repre s entation scheme, i.e. in thi s cas e, b ;b ;::: are all zero. For example, thenumber 23 24 2 11=2= (1:011) 2 2 would b e repre s ented by 0 E =2 1.0110000000000000000000 ; 1 andthenumber 6 71 = (1:000111) 2 2 would b e repre s ented by 0 E =6 1.0001110000000000000000 : Toavoid confus ion, the exp onent E , whichisactually store d in a binary repre s entation, i s shown in decimal for themoment. The oatingpoint repre s entation of a nonzero number is unique as long as we require that1 m<2. If it were not for thi s requirement, thenumber 11=2 could also b e wr itten 4 (0:01011) 2 2 and could therefore b e repre s ented by 0 E =4 0.0101100000000000000000 : However, thi s i s not allowe d s ince b =0 andsom<1. A more intere sting 0 example i s 1=10 = (0:0001100110011 :::) : 2 Since thi s binary expans ion i s in nite, wemust truncate the expans ion some- where. (An alter native, namely rounding, i s di scuss e d later.) The s imple st way totruncatethe expans ion to 23 bitswould givethe representation 0 E =0 0.0001100110011001100110 ; butthi s means m<1 s ince b =0.Aneven wors e choice of repre s entation 0 would b e the following: s ince 4 1=10 = (0:00000001100110011 :::) 2 ; 2 thenumb er could b e repre s ented by 0 E =4 0.0000000110011001100110 : Thi s i s clearly a bad choice s ince le ss of the binary expans ion of 1=10 i s store d, due tothe space wasted bythe leading zeros in the s igni cand eld. This is the reason why m<1,i.e.b =0, is not al lowed. The only allowable 0 repre s entation for 1=10 us e s thefactthat 4 1=10 = (1:100110011 :::) 2 ; 2 2 givingthe repre s entation 0 E = 4 1.1001100110011001100110 : Thi s repre s entation includes more of the binary expansion of 1=10 than the others, and i s said tobenormalized, s ince b = 1, i.e. m> 1. Thus noneof 0 theavailable bitsiswasted bystor ing leading zeros. We can s ee f rom thi s example whythename oating point is used: the binary point of thenumber 1=10 can b e oated toanyposition in the bitstr ing we likebychoosingtheappropr iateexponent: thenormalize d repre s entation, with b =1,istheone whichshould b e always b e us e d when p oss ible. It i s 0 clear that an irrational number suchas i s also repre s ente d most accurately by a normalize d repre s entation: s igni cand bitsshould not b e wasted by stor ing leading zeros. However, thenumber zero i s sp ecial. It cannot b e normalize d, s ince all the bitsinits repre s entation are zero. The exp onent E i s irrelevantand can b e s et to zero. Thus, zero could b e repre s ented as 0 E =0 0.0000000000000000000000 : The gap b etween thenumber1andthenext large st oatingpointnumber 1 i s calle d the precision of the oatingpointsystem, or, often, the machine precision,andweshall denotethi s by .Inthesystem just described, the next oating p oint bigger than 1 i s 1:0000000000000000000001; 22 withthe last bit b =1.Therefore, the preci s ion i s =2 . 22 Exerci s e 1 What is the smal lest possible positive normalized oating point number using the system just described? 1 m< Exerci s e 2 Could nonzero numbers instead be normalizedsothat 2 1? Would this bejustasgood? It i s quite instructivetosuppose thatthe computer wordsizeismuch smaller than 32 bitsandworkoutindetail whatallthe p oss ible oating numb ers are in such a cas e. Suppose thatthe s igni cand eld has ro om only tostore b :b b ,andthatthe only p oss ible value s for the exp onent E are 1, 0 1 2 0and1.Weshall call thi s system our toy oating point system.Thesetof toy oating p ointnumbers is shown in Figure 1 1 Actually,the usual de nition of preci s ion i s onehalf of thi s quantity, for reasons that will b ecomeapparentinthenext s ection. Weprefertoomitthe f actor of onehalf in the de nition . 3 - 0 1 2 3 :::::: Figure 1: TheToyFloatingPointNumbers 1 The large st number is (1:11) 2 =(3:5) ,andthesmalle st p os itivenor- 2 10 1 malize d number is (1:00) 2 =(0:5) . All of thenumbers shown are 2 10 normalize d except zero. Since thenext oating p ointnumb er bigger than 1 i s 1.25, the preci s ion of thetoysystem i s =0:25. Notethatthegapbetween oating p ointnumbers becomes smal ler as themagnitudeofthenumb ers thems elve s get smaller, and bigger as thenumb ers get bigger. Notealsothat the gap b etween b etween zero andthesmallest positivenumber is much bigger than thegapbetween thesmalle st p os itivenumber andthenext p os itive numb er. Weshall showinthenext s ection howthi s gap can b e \ lle d in" withtheintro duction of \subnormal numb ers". 2 IEEE FloatingPointRepresentation In the 1960's and 1970's, eachcomputer manuf acturer developed itsown oating p ointsystem, leadingto a lot of incons i stency as tohowthe same program b ehave d on di erentmachines. For example, although most machine s us e d binary oatingpoint systems, the IBM 360/370 s er ie s, which dominate d computingdur ingthi s p er io d, us e d a hexadecimal bas e, i.e. num- E bers were repre s ented as m 16 .Other machines, such as HP calculators, usedadecimal oatingpointsystem. Through the e ortsofseveral computer scientists, particularly W. Kahan, a binary oating p ointstandard was developed in the early 1980's and, most imp ortantly, followed very carefully bythe pr incipal manuf acturers of oatingpointchips for p ersonal computers, namely Intel andMotorola. Thi s standard has b ecome known as theIEEE oating p ointstandard s ince it was developed andendors e d bya working 2 committee of the Institute for Electr ical and Electronics Engineers. (There i s also a decimal vers ion of thestandard butweshall not di scuss thi s.) The IEEE standard has three very imp ortant requirements: 2 ANSI/IEEE Std 754-1985. Thanks to Jim Demmel for intro ducingtheauthor tothe standard. 4 cons i stent representation of oatingpointnumb ers across all machines adoptingthestandard correctly rounded arithmetic (tobeexplained in thenext s ection) cons i stentand s ens ible treatment of exceptional s ituations suchasdi- vi s ion byzero(to b e di scuss e d in the followingsection). We will not describe thestandard in detail, butwe will cover themain p oints. Westart withthe following obs ervation. In thelastsection, wechos e to E ,where 1 m<2, i.e. normalize a nonzero number x so that x = m 2 m =(b :b b b :::) ; 0 1 2 3 2 with b =1.Inthe s imple oatingpointmodel di scuss e d in the previous 0 s ection, westore d the leading nonzero bit b in the rst p os itionofthe eld 0 provided for m.Note, however, that s ince weknowthi s bit has thevalue one, it is not necessary to storeit. Cons equently,we can us e the 23 bitsofthe s igni cand eld tostore b ;b ;:::;b instead of b ;b ;:::;b 2, changingthe 1 2 23 0 1 2 22 23 machine preci s ion f rom =2 to =2 : Since the bitstr ingstore d in the s igni cand eld i s now actually the fractional part of the s igni cand, weshall refer henceforthtothe eldasthe fraction eld. Given a str ing of bitsinthe f raction eld, it i s nece ssary toimaginethatthesymb ols \1." app ear in f ront of thestring, even though these symb ols are not store d. Thi s technique i s calle d hidden bit normalization andwas us e d by Digital for theVax machine in thelate 1970's. Exerci s e 3 Show that the hidden bit technique does not result in a more accurate representation of 1=10. Would this stil l be true if we had started with a eld width of 24 bits before applying the hidden bit technique? Note an imp ortant p oint: s ince zero cannot b e normalize d tohavea leading nonzero bit, hidden bit repre s entation requires a special technique for storing zero.Weshall s ee whatthi s i s shortly.Apatter n of all zeros in the f raction eld of a normalize d numb er repre s entsthe s igni cand 1.0, not 0.0.

Load more