Question Floating Point Numbers Learning Outcomes Number Representation

31/10/2011 Floating Point Numbers Question 6.626068 x 10 -34 Do you have your laptop here? A yes B no C what’s a laptop D where is here? E none of the above • Eddie Edwards • [email protected] • https://www.doc.ic.ac.uk/~eedwards/compsys • Heavily based on notes from Naranker Dulay Eddie Edwards 2008 Floating Point Numbers 7.2 Learning Outcomes Number Representation - recap We have seen how to represent integers At the end of this lecture you should positive integers as binary, octal and hexadecimal Understand the representation of real numbers in other bases negative integers as one's complement, two's complement, Excess-n (e.g 2) BCD, ASCI..... Know the mantissa/exponent representation (in base 10, 2 etc.) We have also seen how to perform arithmetic Be able to express numbers in normalised/un-normalised form Addition Be able to convert fractions/decimals between bases by adding the binary bits Know the IEEE 754 floating point format (32 and 64 bit) overflow conditions Know the special values and when they should occur Multiplication/division Understand the issues of accuracy in floating point representation same “long hand” techniques as base 10 slightly complicated in two's compliment can take the absolute values, perform calculation, then sort out the sign Eddie Edwards 2008 Floating Point Numbers 7.3 Eddie Edwards 2008 Floating Point Numbers 7.4 1 31/10/2011 Numbers: Large, Small, Fractional Large Integers Population of the World 6,879,009,033 people Example: How can we represent integers up to 30 decimal digits long? US National Debt (1990) $3, 144, 830, 000, 000 30 1 Light Year 9, 130, 000, 000, 000 km Binary log (10 ) = ~ 100 bits (1 decimal digit = 3.322 bits) Mass of the Sun 2, 000, 000, 000, 000, 000, 000, 000, 000, 2 000 kg BCD 30 x 4-bit = 120 bits Diameter of an Electron 0.000, 000, 000, 000, 000, 000, 01 m Mass of an Electron 0.000, 000, 000, 000, 000, 000, 000, 000, ASCII 30 x 8-bit = 240 bits 000, 000, 9 kg Smallest Measurable 0.000, 000, 000, 000, 000, 000, 000, 000, length of Time 000, 000, 000, 000, 000, 000, 1 sec Pi (to 8 decimal places) 3.14159265... The Pentium includes instructions for writing multi-precision integer Standard Rate of VAT 17.5 routines using Binary Coded Decimal (BCD) Arithmetic & ASCII arithmetic Eddie Edwards 2008 Floating Point Numbers 7.5 Eddie Edwards 2008 Floating Point Numbers 7.6 Floating Pointing Numbers Zones of Expressibility Scientific Notation Example: Assume numbers are formed with a Signed 3-digit Mantissa and a Signed 2-digit Exponent –99 +99 Numbers span from ±.001 x 10 to ±.999 x 10 Number = M x 10 E Decimal E Number = M x 2 Binary Zones of Expressibility Zero M is the Mantissa (or Significand or Fraction or Argument ) Negative Expressible Negative Positive Expressible Positive E is the Exponent (or Characteristic) Overflow –ve Nums Underflow Underflow +ve Nums Overflow 10 (or for binary, 2) is the Radix (or Base ) Digits (bits) in Exponent -> Range (Bigness/Smallness) Digits (bits) in Mantissa -> Precision (Exactness) +99 –.999 x 10 –.001 x 10 –99 0 +. 001 x 10 –99 +. 999 x 10 +99 Eddie Edwards 2008 Floating Point Numbers 7.7 Eddie Edwards 2008 Floating Point Numbers 7.8 2 31/10/2011 Reals vs. Floating Point Numbers Normalised Floating Point Numbers Floating Point Numbers can have multiple forms, e.g. Mathematical Real Floating-point Number 0.232 x 10 4 = 2.32 x 10 3 = 23.2 x 10 2 Range -Infinity .. +Infinity Finite = 2 320 x 10 0 = 232 000 x 10 -2 No. of Values Infinite Finite For hardware implementation its desirable for each number to have a Spacing Constant & Infinite Gap between numbers varies unique representation => Normalised Form Errors ? Incorrect results are We’ll normalise Mantissa's in the Range [ 1 .. R ) where R is the Base, possible e.g.: [ 1 .. 10 ) for DECIMAL [ 1 .. 2 ) for BINARY Eddie Edwards 2008 Floating Point Numbers 7.9 Eddie Edwards 2008 Floating Point Numbers 7.10 Normalised Forms (Base 10) Binary & Decimal Fractions Binary Decimal Number Normalised Form 0.1 0.5 23.2 x 10 4 2.32 x 10 5 0.01 0.25 -3 -3 –4.01 x 10 –4.01 x 10 0.001 0.125 0.11 0.75 343 000 x 10 3.43 x 10 5 0.111 0.875 0.000 000 098 9 x 10 0 9.89 x 10 -8 0.011 0.375 0.101 0.625 Eddie Edwards 2008 Floating Point Numbers 7.11 Eddie Edwards 2008 Floating Point Numbers 7.12 3 31/10/2011 Binary Fraction to Decimal Fraction Decimal Fraction to Binary Fraction Example: What is 0.6875 in binary ? Example : What is the binary value 0.011010 in decimal ? 10 . 0 1 1 0 1 0.6875 * 2= 1 .3750 0.3750 * 2= 0 .7500 32 16 8 4 2 1 Sum = 888+8+++4444++++11 = 131313 0.7500 * 2= 1 .5000 0.5000 * 2= 1 .0000 0.0000 * 2= 0 Answer: 13 / 32 = 0.40625 Answer: 0. 1011 2 0.1 Example : What is 0 . 0 0011 0011 00 in decimal ? Example : What is 10 in binary ? Answer: (((323232+32 +++16161616++++2222++++1111)))) / 512 = 51 /// 512 = 0.099609375 Eddie Edwards 2008 Floating Point Numbers 7.13 Eddie Edwards 2008 Floating Point Numbers 7.14 0.1 10 in binary? Normalised Binary Floating Point Numbers What is 0.1 in binary ? 10 Number Normalised Binary Normalised Decimal 0.1 * 2 = 0 .2 0.2 * 2 = 0 .4 100.01 x 21 1.0001 x 23 8.5 x 10 0 0.4 * 2 = 0 .8 0.8 * 2 = 1 .6 1010.11 x 22 1.01011 x 25 4.3 x 10 1 0.6 * 2 = 1 .2 0.2 * 2 = 0 .4 and then repeating 0.4, 0.8, 0.6 0.00101 x 2-2 1.01 x 2-5 3.90625 x 10 -2 1100101 x 2-2 1.100101 x 2+4 9.86328125 x 10 -2 Answer 0.0 0011 0011 0011 0011 0011 0011 ..... 2 Eddie Edwards 2008 Floating Point Numbers 7.15 Eddie Edwards 2008 Floating Point Numbers 7.16 4 31/10/2011 Floating Point Multiplication Truncation and Rounding E1 E2 N1 x N2 = (M1 x 10 ) x ( M2 x 10 ) For many computations the result of a floating point operation can be E1 E2 too large to store in the Mantissa. = ( M1 x M2) x ( 10 x 10 ) E1+E2 = ( M1 x M2) x (10 ) Example: with a 2-digit mantissa 1 1 2 2.3 x 101010 11 *** 2.3 x 101010 11 = 5.29 x 101010 22 i.e. We multiply the Mantissas and Add the Exponents 2 TRUNCATION => 5.2 x 10 (Biased Error) 1 0 5.3 x 10 2 Example: 20 * 6 = ( 2.0 x 10 ) x ( 6.0 x 10 ) ROUNDING => (Unbiased Error) 1+0 = ( 2.0 x 6.0 ) x ( 10 ) = 12.0 x 10 1 2 We must also normalise the result, so the final answer = 1.2 x 10 Eddie Edwards 2008 Floating Point Numbers 7.17 Eddie Edwards 2008 Floating Point Numbers 7.18 Floating Point Addition Exponent Overflow & Underflow EXPONENT OVERFLOW occurs when the Result is too Large 3 2 A floating point addition such as 4.5 x 101010 33 +++ 6.7 x 101010 22 is not a simple mantissa i.e. when the Result’s Exponent > Maximum Exponent addition, unless the exponents are the same 99 99 198 => we need to ensure that the mantissas are aligned first. Example : if Max Exponent is 99 then 10 * 10 = 10 (overflow) NNN1N111 +++ NNN2N222 = ( MMM1M1 x 101010 EEE1E1 ) + ( MMM2M2 x 101010 EEE2E2 ))) On Overflow => Proceed with incorrect value or infinity value or raise an EEE2E222---EEE1E111 EEE1E111 = ( MMM1M111 +++ MMM2M222 x 101010 ) x) x 101010 Exception To align, choose the number with the smaller exponent & shift mantissa the corresponding number of digits to the right. EXPONENT UNDERFLOW occurs when the Result is too Small Example: 4.5 xxx101010 333 +++ 6.7 x 101010 222 === 4.5 xxx101010 333 +++ 0.67 xxx 101010 333 i.e. when the Result’s Exponent < Smallest Exponent 333 -99 -99 -198 = 5.17 x 101010 Example : if Min Exp. is –99 then 10 * 10 = 10 (underflow) = 5.2 x 101010 333 (rounded) On Underflow => Proceed with zero value or raise an Exception Eddie Edwards 2008 Floating Point Numbers 7.19 Eddie Edwards 2008 Floating Point Numbers 7.20 5 31/10/2011 Comparing Floating-Point Values Because of the potential for producing in-exact results, comparing floating-point values should account for close results. If we Know the liKely magnitude and precision of results we can adjust for closeness ( epsilon ), for example, for equality we can: a = b a > ( b - e ) AND a < ( b + e ) Floating point numbers - questions a = 1 a > (1 - 0.000005) AND a < 1 + 0.000005 a > 0.999995 AND a < 1.000005 Alternatively we can calculate | a - b | < e e.g. | a - 1 | < 0.000005 A more general approach is to calculate the closeness based on the relative size of the two numbers being compared. Eddie Edwards 2008 Floating Point Numbers 7.21 Eddie Edwards 2008 Floating Point Numbers 7.22 What is the binary notation for 3.625 IEEE Floating-Point Standard A 11.011 B 10.101 C 11.101 D 101.11 E 11.11 IEEE: Institute of Electrical & Electronic Engineers (USA) Comprehensive standard for Binary Floating-Point Arithmetic What is binary 0.1101 in decimal? Widely adopted => Predictable results independent of architecture The standard defines: A 0.8125 B 0.8 C 0.8625 D 0.9125 E 0.7865 The format of binary floating-point numbers Semantics of arithmetic operations Rules for error conditions Eddie Edwards 2008 Floating Point Numbers 7.23 Eddie Edwards 2008 Floating Point Numbers 7.24 6 31/10/2011 Single Precision Format (32-bit) Exponent Field Sign Exponent Significand In the IEEE Standard, exponents are stored as Excess (Bias) Values, not as 2’s S E F Complement Values 1 bit 8 bits 23 bits Example: In 8-bit Excess 127 –127 would be held as 0000 0000 The mantissa is called the SIGNIFICAND in the IEEE standard ..

Question Floating Point Numbers Learning Outcomes Number Representation

Design of 32 Bit Floating Point Addition and Subtraction Units Based on IEEE 754 Standard Ajay Rathor, Lalit Bandil Department of Electronics & Communication Eng

Floating Point Representation (Unsigned) Fixed-Point Representation

Floating Point Numbers and Arithmetic

Floating Point Numbers Dr

A Decimal Floating-Point Speciftcation

Chap02: Data Representation in Computer Systems

Names for Standardized Floating-Point Formats

Largest Denormalized Negative Infinity ---Number with Hex

Chap02: Data Representation in Computer Systems

EECC250 - Shaaban #1 Lec # 0 Winter99 11-29-99 Number Systems Used in Computers

Floating-Point (FP) Numbers

Floating Point Representation (Unsigned) Fixed-Point Representation