Floating Point Representation

Michael L. Overton
Copyright (c) 1996

1 Computer Representation of Numbers

Computers which work with real arithmetic use a system called floating point. Suppose a real number $x$ has the binary expansion

$$x = \pm m \times 2^E, \quad \text{where } 1 \le m < 2$$

and

$$m = (b_0.b_1 b_2 b_3 \ldots)_2.$$

To store a number in floating point representation, a computer word is divided into 3 fields, representing the sign, the exponent $E$, and the significand $m$ respectively. A 32-bit word could be divided into fields as follows: 1 bit for the sign, 8 bits for the exponent and 23 bits for the significand. Since the exponent field is 8 bits, it can be used to represent exponents between $-128$ and $127$. The significand field can store the first 23 bits of the binary representation of $m$, namely

$$b_0.b_1 \ldots b_{22}.$$

If $b_{23}, b_{24}, \ldots$ are not all zero, this floating point representation of $x$ is not exact but approximate. A number is called a floating point number if it can be stored exactly on the computer using the given floating point representation scheme, i.e. in this case, $b_{23}, b_{24}, \ldots$ are all zero. For example, the number

$$11/2 = (1.011)_2 \times 2^2$$

would be represented by

    | 0 | E = 2 | 1.0110000000000000000000 |

and the number

$$71 = (1.000111)_2 \times 2^6$$

would be represented by

    | 0 | E = 6 | 1.0001110000000000000000 |

To avoid confusion, the exponent $E$, which is actually stored in a binary representation, is shown in decimal for the moment.

The floating point representation of a nonzero number is unique as long as we require that $1 \le m < 2$. If it were not for this requirement, the number $11/2$ could also be written

$$(0.01011)_2 \times 2^4$$

and could therefore be represented by

    | 0 | E = 4 | 0.0101100000000000000000 |

However, this is not allowed since $b_0 = 0$ and so $m < 1$. A more interesting example is

$$1/10 = (0.0001100110011\ldots)_2.$$

Since this binary expansion is infinite, we must truncate the expansion somewhere. (An alternative, namely rounding, is discussed later.)
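The normalization and truncation just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `normalize` and its return layout are hypothetical, and exact rational arithmetic (`fractions.Fraction`) stands in for the real number $x$.

```python
from fractions import Fraction

def normalize(x, bits=23):
    """Write nonzero x as +/- m * 2**E with 1 <= m < 2 and truncate m to
    b0.b1...b_{bits-1}. Returns (sign bit, E, significand string, exact flag).
    Hypothetical helper for illustration only."""
    x = Fraction(x)
    if x == 0:
        raise ValueError("zero cannot be normalized")
    sign = 0 if x > 0 else 1
    m, E = abs(x), 0
    while m >= 2:          # shift the binary point to the left
        m /= 2
        E += 1
    while m < 1:           # shift the binary point to the right
        m *= 2
        E -= 1
    digits = []
    for _ in range(bits):  # peel off b0, b1, ..., one bit at a time
        b = int(m)         # integer part is 0 or 1
        digits.append(str(b))
        m = (m - b) * 2
    exact = (m == 0)       # True when no nonzero bits were dropped
    return sign, E, digits[0] + "." + "".join(digits[1:]), exact

print(normalize(Fraction(11, 2)))
print(normalize(71))
print(normalize(Fraction(1, 10)))
```

The exact flag is true for $11/2$ and $71$, which are floating point numbers in this system, and false for $1/10$, whose expansion has to be cut off.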
The simplest way to truncate the expansion to 23 bits would give the representation

    | 0 | E = 0 | 0.0001100110011001100110 |

but this means $m < 1$ since $b_0 = 0$. An even worse choice of representation would be the following: since

$$1/10 = (0.00000001100110011\ldots)_2 \times 2^4,$$

the number could be represented by

    | 0 | E = 4 | 0.0000000110011001100110 |

This is clearly a bad choice since less of the binary expansion of $1/10$ is stored, due to the space wasted by the leading zeros in the significand field. This is the reason why $m < 1$, i.e. $b_0 = 0$, is not allowed. The only allowable representation for $1/10$ uses the fact that

$$1/10 = (1.100110011\ldots)_2 \times 2^{-4},$$

giving the representation

    | 0 | E = -4 | 1.1001100110011001100110 |

This representation includes more of the binary expansion of $1/10$ than the others, and is said to be normalized, since $b_0 = 1$, i.e. $m \ge 1$. Thus none of the available bits is wasted by storing leading zeros.

We can see from this example why the name floating point is used: the binary point of the number $1/10$ can be floated to any position in the bitstring we like by choosing the appropriate exponent. The normalized representation, with $b_0 = 1$, is the one which should always be used when possible. It is clear that an irrational number such as $\pi$ is also represented most accurately by a normalized representation: significand bits should not be wasted by storing leading zeros. However, the number zero is special. It cannot be normalized, since all the bits in its representation are zero. The exponent $E$ is irrelevant and can be set to zero. Thus, zero could be represented as

    | 0 | E = 0 | 0.0000000000000000000000 |

The gap between the number 1 and the next larger floating point number is called the precision of the floating point system [1], or, often, the machine precision, and we shall denote this by $\epsilon$. In the system just described, the next floating point number bigger than 1 is

$$1.0000000000000000000001,$$

with the last bit $b_{22} = 1$. Therefore, the precision is $\epsilon = 2^{-22}$.
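To make the comparison of the three candidate representations of $1/10$ concrete, the following sketch computes the truncation error of each one exactly. The helper name `truncated_value` is hypothetical, and the 23-bit significand layout $b_0.b_1 \ldots b_{22}$ is the one assumed in the text above.

```python
from fractions import Fraction

def truncated_value(x, E, bits=23):
    """Value actually stored when x >= 0 is written as m * 2**E and m is
    truncated to b0.b1...b_{bits-1} (one bit before the binary point).
    Hypothetical helper for illustration only."""
    m = Fraction(x) / Fraction(2) ** E
    if not 0 <= m < 2:
        raise ValueError("m must fit in the form b0.b1b2...")
    scale = 2 ** (bits - 1)                    # bits - 1 fractional bits
    m_trunc = Fraction(int(m * scale), scale)  # drop everything past b_{bits-1}
    return m_trunc * Fraction(2) ** E

x = Fraction(1, 10)
errors = {E: x - truncated_value(x, E) for E in (4, 0, -4)}
for E, err in errors.items():
    print(f"E = {E:2d}: truncation error = {float(err):.3e}")
```

The error shrinks as the leading zeros are removed, and it is smallest for the normalized choice $E = -4$.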
[1] Actually, the usual definition of precision is one half of this quantity, for reasons that will become apparent in the next section. We prefer to omit the factor of one half in the definition.

Exercise 1 What is the smallest possible positive normalized floating point number using the system just described?

Exercise 2 Could nonzero numbers instead be normalized so that $1/2 \le m < 1$? Would this be just as good?

It is quite instructive to suppose that the computer word size is much smaller than 32 bits and work out in detail what all the possible floating point numbers are in such a case. Suppose that the significand field has room only to store $b_0.b_1 b_2$, and that the only possible values for the exponent $E$ are $-1$, $0$ and $1$. We shall call this system our toy floating point system. The set of toy floating point numbers is shown in Figure 1.

[Figure 1: The Toy Floating Point Numbers, marked on a number line with ticks at 0, 1, 2, 3]

The largest number is $(1.11)_2 \times 2^1 = (3.5)_{10}$, and the smallest positive normalized number is $(1.00)_2 \times 2^{-1} = (0.5)_{10}$. All of the numbers shown are normalized except zero. Since the next floating point number bigger than 1 is 1.25, the precision of the toy system is $\epsilon = 0.25$. Note that the gap between floating point numbers becomes smaller as the magnitude of the numbers themselves gets smaller, and bigger as the numbers get bigger. Note also that the gap between zero and the smallest positive number is much bigger than the gap between the smallest positive number and the next positive number. We shall show in the next section how this gap can be "filled in" with the introduction of "subnormal numbers".

2 IEEE Floating Point Representation

In the 1960's and 1970's, each computer manufacturer developed its own floating point system, leading to a lot of inconsistency as to how the same program behaved on different machines. For example, although most machines used binary floating point systems, the IBM 360/370 series, which dominated computing during this period, used a hexadecimal base, i.e. numbers were represented as $\pm m \times 16^E$. Other machines, such as HP calculators, used a decimal floating point system. Through the efforts of several computer scientists, particularly W. Kahan, a binary floating point standard was developed in the early 1980's and, most importantly, followed very carefully by the principal manufacturers of floating point chips for personal computers, namely Intel and Motorola. This standard has become known as the IEEE floating point standard since it was developed and endorsed by a working committee of the Institute of Electrical and Electronics Engineers [2]. (There is also a decimal version of the standard but we shall not discuss this.) The IEEE standard has three very important requirements:

  - consistent representation of floating point numbers across all machines adopting the standard
  - correctly rounded arithmetic (to be explained in the next section)
  - consistent and sensible treatment of exceptional situations such as division by zero (to be discussed in the following section).

[2] ANSI/IEEE Std 754-1985. Thanks to Jim Demmel for introducing the author to the standard.

We will not describe the standard in detail, but we will cover the main points.

We start with the following observation. In the last section, we chose to normalize a nonzero number $x$ so that $x = \pm m \times 2^E$, where $1 \le m < 2$, i.e.

$$m = (b_0.b_1 b_2 b_3 \ldots)_2,$$

with $b_0 = 1$. In the simple floating point model discussed in the previous section, we stored the leading nonzero bit $b_0$ in the first position of the field provided for $m$. Note, however, that since we know this bit has the value one, it is not necessary to store it. Consequently, we can use the 23 bits of the significand field to store $b_1, b_2, \ldots, b_{23}$ instead of $b_0, b_1, \ldots, b_{22}$, changing the machine precision from $\epsilon = 2^{-22}$ to $\epsilon = 2^{-23}$. Since the bitstring stored in the significand field is now actually the fractional part of the significand, we shall refer henceforth to the field as the fraction field.
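The change in precision from $2^{-22}$ to $2^{-23}$ can be checked numerically. The sketch below is illustrative only: `gap_above_one` is a hypothetical helper, and the comparison with real hardware assumes that Python's `struct` format `f` packs values as IEEE single precision words, so that adding one to the integer bit pattern of 1.0 gives the next larger floating point number.

```python
import struct

def gap_above_one(stored_bits, hidden_bit):
    """Gap between 1 and the next larger number when `stored_bits` bits are
    stored for the significand, with or without an implied leading 1.
    Hypothetical helper for illustration only."""
    significand_bits = stored_bits + (1 if hidden_bit else 0)
    # The last significand bit has weight 2**-(significand_bits - 1).
    return 2.0 ** -(significand_bits - 1)

eps_explicit = gap_above_one(23, hidden_bit=False)  # b0 stored explicitly
eps_hidden = gap_above_one(23, hidden_bit=True)     # b0 implied, b1..b23 stored

# Assumption: struct's "f" format is IEEE single precision.
(bits_of_one,) = struct.unpack(">I", struct.pack(">f", 1.0))
(next_float,) = struct.unpack(">f", struct.pack(">I", bits_of_one + 1))

print(eps_explicit, eps_hidden, next_float - 1.0)
```

Under these assumptions the measured gap `next_float - 1.0` agrees with the hidden bit value and is half the gap of the explicit-bit layout.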
Given a string of bits in the fraction field, it is necessary to imagine that the symbols "1." appear in front of the string, even though these symbols are not stored. This technique is called hidden bit normalization and was used by Digital for the Vax machine in the late 1970's.

Exercise 3 Show that the hidden bit technique does not result in a more accurate representation of $1/10$. Would this still be true if we had started with a field width of 24 bits before applying the hidden bit technique?

Note an important point: since zero cannot be normalized to have a leading nonzero bit, hidden bit representation requires a special technique for storing zero. We shall see what this is shortly. A pattern of all zeros in the fraction field of a normalized number represents the significand 1.0, not 0.0.
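The role of the hidden bit can be seen by inspecting actual stored bits. The following is a minimal sketch, assuming that Python's `struct` format `f` produces an IEEE single precision word whose low 23 bits are the fraction field; the helper names `fraction_field` and `significand` are hypothetical, and the reconstruction applies to normalized numbers only.

```python
import struct

def fraction_field(x):
    """Return the 23-bit fraction field of x, stored in single precision,
    as a bit string. Assumes the low 23 bits of the word hold the fraction."""
    (word,) = struct.unpack(">I", struct.pack(">f", x))
    return format(word & 0x7FFFFF, "023b")

def significand(x):
    """Restore the hidden bit: imagine "1." in front of the fraction bits.
    Valid for normalized numbers only (not for zero)."""
    return 1 + int(fraction_field(x), 2) / 2 ** 23

for value in (1.0, 5.5, 71.0):
    print(value, fraction_field(value), significand(value))
```

For 1.0 the fraction field is all zeros, yet the reconstructed significand is 1.0. For $11/2$ the stored bits begin 011, matching $(1.011)_2$ with the leading 1 left out.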
