Floating Point Numbers and Arithmetic

Overview Floating Point Numbers & • Floating Point Numbers Arithmetic • Motivation: Decimal Scientific Notation – Binary Scientific Notation • Floating Point Representation inside computer (binary) – Greater range, precision • Decimal to Floating Point conversion, and vice versa • Big Idea: Type is not associated with data • MIPS floating point instructions, registers CS 160 Ward 1 CS 160 Ward 2 Review of Numbers Other Numbers •What about other numbers? • Computers are made to deal with –Very large numbers? (seconds/century) numbers 9 3,155,760,00010 (3.1557610 x 10 ) • What can we represent in N bits? –Very small numbers? (atomic diameter) -8 – Unsigned integers: 0.0000000110 (1.010 x 10 ) 0to2N -1 –Rationals (repeating pattern) 2/3 (0.666666666. .) – Signed Integers (Two’s Complement) (N-1) (N-1) –Irrationals -2 to 2 -1 21/2 (1.414213562373. .) –Transcendentals e (2.718...), π (3.141...) •All represented in scientific notation CS 160 Ward 3 CS 160 Ward 4 Scientific Notation Review Scientific Notation for Binary Numbers mantissa exponent Mantissa exponent 23 -1 6.02 x 10 1.0two x 2 decimal point radix (base) “binary point” radix (base) •Computer arithmetic that supports it called •Normalized form: no leadings 0s floating point, because it represents (exactly one digit to left of decimal point) numbers where binary point is not fixed, as it is for integers •Alternatives to representing 1/1,000,000,000 –Declare such variable in C as float –Normalized: 1.0 x 10-9 –Not normalized: 0.1 x 10-8, 10.0 x 10-10 CS 160 Ward 5 CS 160 Ward 6 Floating Point Representation [1] Floating Point Representation [2] yyyy 38 • Normal format: +1.xxxxxxxxxxtwo*2 two • What if result too large? (> 2.0x10 ) • Multiple of Word Size (32 bits) – Overflow! – Overflow => Exponent larger than 31 30 23 22 0 represented in 8-bit Exponent field S Exponent Significand • What if result too small? (>0, < 2.0x10-38 ) 1 bit 8 bits 23 bits – Underflow! • S represents Sign of significand (number) – Underflow => Negative exponent larger than Exponent represents y’s represented in 8-bit Exponent field Significand represents x’s • How to reduce chances of overflow or underflow? • Represent numbers as small as 2.0 x 10-38 to as large as 2.0 x 1038 CS 160 Ward 7 CS 160 Ward 8 Double Precision Fl. Pt. IEEE 754 Fl. Pt. Standard [1] • Next Multiple of Word Size (64 bits) • Single Precision, DP similar 31 30 20 19 0 • Sign bit: 1 means negative S Exponent Significand 0 means positive 1 bit 11 bits 20 bits Significand (cont’d) • Significand: 32 bits – To pack more bits, leading 1 implicit for • Double Precision (vs. Single Precision) normalized numbers – 1 + 23 bits single, 1 + 52 bits double – C variable declared as double – always true: 0 < Significand < 1 – Represent numbers almost as small as 2.0 x 10-308 to almost as large as 2.0 x 10308 • Note: 0 has no leading 1, so reserve – But primary advantage is greater accuracy exponent value 0 just for number 0 due to larger significand CS 160 Ward 9 CS 160 Ward 10 IEEE 754 Fl. Pt. Standard [2] IEEE 754 Fl. Pt. Standard [3] •Negative Exponent? •Called Biased Notation, where bias is number –2’s comp? 1.0 x 2-1 v. 1.0 x2+1 (1/2 v. 2) subtract to get real number –IEEE 754 uses bias of 127 for single precision 1/2 0 1111 1111 000 0000 0000 0000 0000 0000 –Subtract 127 from Exponent field to get actual value for 2 0 0000 0001 000 0000 0000 0000 0000 0000 exponent –1023 is bias for double precision –This notation using integer compare of 1/2 v. 2 makes 1/2 > 2! • Summary (single precision): •Instead, pick notation 0000 0001 is most 3130 2322 0 negative, and 1111 1111 is most positive S Exponent Significand –1.0 x 2-1 v. 1.0 x2+1 (1/2 v. 2) 1 bit 8 bits 23 bits 1/2 0 0111 1110 000 0000 0000 0000 0000 0000 • (-1)S x (1 + Significand) x 2(Exponent-127) 2 0 1000 0000 000 0000 0000 0000 0000 0000 – Double precision identical, except with exponent bias of 1023 and significand of 52 bits CS 160 Ward 11 CS 160 Ward 12 Understanding the Significand [1] Understanding the Significand [2] • Method 1 (Fractions): • Method 2 (Place Values): – Convert from scientific notation – In decimal: 0.34010 => 34010/100010 0 -1 => 3410/10010 – In decimal: 1.6732 = (1x10 ) + (6x10 ) + (7x10-2) + (3x10-3) + (2x10-4) – In binary: 0.110 => 110 /1000 = 6 /8 2 2 2 10 10 0 -1 -2 => 11 /100 = 3 /4 – In binary: 1.1001 = (1x2 ) + (1x2 ) + (0x2 ) 2 2 10 10 + (0x2-3) + (1x2-4) – Advantage: less purely numerical, more thought – Interpretation of value in each position extends oriented; this method usually helps people beyond the decimal/binary point understand the meaning of the significand better – Advantage: good for quickly calculating significand value; use this method for translating FP numbers CS 160 Ward 13 CS 160 Ward 14 Ex: Converting Binary FP to Decimal Continuing Example: Binary to ??? 0 0110 1000 101 0101 0100 0011 0100 0010 0011 0100 0101 0101 0100 0011 0100 0010 • Convert 2’s Comp. Binary to Integer: • Sign: 0 => positive 229+228+226+222+220+218+216+214+29+28+26+ 21 • Exponent: = 878,003,010ten – 0110 1000two = 104ten • Convert Binary to Instruction: – Bias adjustment: 104 - 127 = -23 0011 0100 0101 0101 0100 0011 0100 0010 • Significand: 13 221 17218 – 1 + 1x2-1+ 0x2-2 + 1x2-3 + 0x2-4 + 1x2-5 +... =1+2-1+2-3 +2-5 +2-7 +2-9 +2-14 +2-15 +2-17 +2-22 ori $s5, $v0, 17218 = 1.0 + 0.666115 • Convert Binary to ASCII: -23 -7 • Represents: 1.666115ten*2 ~ 1.986*10 0011 0100 0101 0101 0100 0011 0100 0010 (about 2/10,000,000) 4UCB CS 160 Ward 15 CS 160 Ward 16 Big Idea: Type not associated with Data Converting Decimal to FP [1] 0011 0100 0101 0101 0100 0011 0100 0010 • Simple Case: If denominator is an exponent of •What does bit pattern mean: 2 (2, 4, 8, 16, etc.), then it’s easy. –1.986 *10-7? 878,003,010? “4UCB”? • Show IEEE 32-bit representation of -0.75 ori $s5, $v0, 17218? – -0.75 = -3/4 •Data can be anything; operation of instruction that accesses operand determines its type! – -11two/100two = -0.11two -1 –Side-effect of stored program concept: – Normalized to -1.1two x 2 instructions stored as numbers – (-1)S x (1 + Significand) x 2(Exponent-127) •Power/danger of unrestricted addresses/ pointers: use ASCII as Fl. Pt., instructions as data, integers – (-1)1 x (1 + .100 0000 ... 0000) x 2(126-127) as instructions, ... 1 0111 1110 100 0000 0000 0000 0000 0000 CS 160 Ward 17 CS 160 Ward 18 Converting Decimal to FP [2] Converting Decimal to FP [3] • Not So Simple Case: If denominator is not an • Fact: All rational numbers have a exponent of 2. repeating pattern when written out in – Then we can’t represent number precisely, but that’s decimal. why we have so many bits in significand: for • Fact: This still applies in binary. precision – Once we have significand, normalizing a number to • To finish conversion: get the exponent is easy. – Write out binary number with repeating – So how do we get the significand of a neverending pattern. number? – Cut it off after correct number of bits (different for single vs. double precision). – Derive Sign, Exponent and Significand fields. CS 160 Ward 19 CS 160 Ward 20 Hairy Example [1] Hairy Example [2] • How to represent 1/3 in IEEE 32-bit? • Sign: 0 • 1/3 • Exponent = -2 + 127 = 125 =01111101 = 0.33333…10 10 2 = 0.25 + 0.0625 + 0.015625 + 0.00390625 • Significand = 0101010101… + 0.0009765625 + … 0 0111 1101 0101 0101 0101 0101 0101 010 = 1/4 + 1/16 + 1/64 + 1/256 + 1/1024 + … = 2-2 + 2-4 + 2-6 + 2-8 + 2-10 + … 0 = 0.0101010101… 2 * 2 -2 = 1.0101010101… 2 * 2 CS 160 Ward 21 CS 160 Ward 22 Representation for +/- Infinity Representation for 0 • In FP, divide by zero should produce +/- • Represent 0? infinity, not overflow. – exponent all zeroes • Why? – significand all zeroes too – OK to do further computations with infinity – What about sign? e.g., X/0 > Y may be a valid comparison – +0: 0 00000000 00000000000000000000000 • IEEE 754 represents +/- infinity – -0: 1 00000000 00000000000000000000000 • Why two zeroes? – Most positive exponent reserved for infinity – Helps in some limit comparisons – Significands all zeroes CS 160 Ward 23 CS 160 Ward 24 Special Numbers FP Addition • What have we defined so far? • Much more difficult than with integers (Single Precision) • Can’t just add significands Exponent Significand Object • How do we do it? 00 0 – De-normalize to match exponents 0 nonzero Denormalized – Add significands to get resulting significand 1-254 anything +/- fl. pt. # – Keep the same exponent 255 0 +/- infinity – Normalize (possibly changing exponent) 255 nonzero ??? • Note: If signs differ, just perform a subtract instead. CS 160 Ward 25 CS 160 Ward 26 FP Subtraction FP Addition/Subtraction • Similar to addition • Problems in implementing FP add/sub: • How do we do it? – If signs differ for add (or same for sub), what will be the sign of the result? – De-normalize to match exponents – Subtract significands • Question: How do we integrate this into the integer arithmetic unit? – Keep the same exponent – Normalize (possibly changing exponent) • Answer: We don’t! CS 160 Ward 27 CS 160 Ward 28 Fl.

Floating Point Numbers and Arithmetic

TI 83/84: Scientific Notation on Your Calculator

Primitive Number Types

Design of 32 Bit Floating Point Addition and Subtraction Units Based on IEEE 754 Standard Ajay Rathor, Lalit Bandil Department of Electronics & Communication Eng

C++ Data Types

Floating Point Representation (Unsigned) Fixed-Point Representation

Floating Point Numbers

Floating Point Numbers Dr

A Decimal Floating-Point Speciftcation

Princeton University COS 217: Introduction to Programming Systems C Primitive Data Types

36 Scientific Notation

Floating Point CSE410, Winter 2017

Fortran Reference Guide