Machine Numbers: How Floating Point Numbers Are Stored?

Floating-point number representation

What do we need to store when representing floating point numbers in a computer?

$x = \pm 1.f \times 2^{m}$

Three pieces: the sign, the exponent $m$, and the significand (the fraction bits $f$).

Initially, different floating-point representations were used in computers, generating inconsistent program behavior across different machines. In the 1980s, computer manufacturers started adopting a standard representation for floating-point numbers: the IEEE (Institute of Electrical and Electronics Engineers) 754 Standard.

Numerical form: $x = (-1)^{s}\, 1.f \times 2^{m}$

Representation in memory: three fields stored side by side, the sign $s$, the stored exponent $c = m + \text{shift}$, and the significand $f$.

Precisions:
• IEEE-754 Single precision (32 bits): $s$ 1 bit, $c$ 8 bits, $f$ 23 bits
• IEEE-754 Double precision (64 bits): $s$ 1 bit, $c$ 11 bits, $f$ 52 bits

IEEE-754 Single Precision (32-bit)

$x = (-1)^{s}\, 1.f \times 2^{m}, \qquad c = m + \text{shift}$

• sign $s$ (1-bit)
• exponent $c$ (8-bit)
• significand $f$ (23-bit)

The 8-bit exponent field can hold $c \in [0, 255]$. The patterns $c = 0$ (all zeros) and $c = 255$ (all ones) are reserved for special cases, so normalized numbers use $c \in [1, 254]$. With shift $= 127$ (a design choice), $m = c - 127 \in [-126, 127]$.

Example: Represent the number $x = -67.125$ using the IEEE Single-Precision Standard.

$(67.125)_{10} = (1000011.001)_2 = (1.000011001)_2 \times 2^{6}$

So $s = 1$, $m = 6$, $c = 6 + 127 = 133 = (10000101)_2$, and $f$ is $000011001$ padded with zeros to 23 bits:

$x = 1 \;|\; 10000101 \;|\; 00001100100000000000000$

• Machine epsilon ($\epsilon_m$): defined as the distance (gap) between 1 and the next larger floating point number. For single precision, $\epsilon_m = 2^{-23} \approx 1.2 \times 10^{-7}$.
• Smallest positive normalized FP number: $\text{UFL} = 1.00\ldots0 \times 2^{-126} = 2^{-126} \approx 1.2 \times 10^{-38}$
• Largest positive normalized FP number: $\text{OFL} = 1.11\ldots1 \times 2^{127} = 2^{128}(1 - 2^{-24}) \approx 3.4 \times 10^{38}$

Special Values

$x = (-1)^{s}\, 1.f \times 2^{m}$, stored as $s \;|\; c \;|\; f$

1) Zero: $c = 000\ldots0$ and $f = 000\ldots0$ (the sign bit gives $+0$ and $-0$)
2) Infinity: $c = 111\ldots1$ and $f = 000\ldots0$; $+\infty$ ($s = 0$) and $-\infty$ ($s = 1$)
3) NaN (results from operations with undefined results): $c = 111\ldots1$ and $f$ anything but all zeros
4) Subnormal: $c = 000\ldots0$ and $f$ anything but all zeros

IEEE-754 Double Precision (64-bit)

$x = (-1)^{s}\, 1.f \times 2^{m}, \qquad c = m + \text{shift}$

• sign $s$ (1-bit)
• exponent $c$ (11-bit)
• significand $f$ (52-bit), giving precision $p = 53$ once the implicit leading 1 is counted

The 11-bit exponent field can hold $c \in [0, 2047]$. The patterns $c = 0$ and $c = 2047$ are saved for special cases, so normalized numbers use $c \in [1, 2046]$. With shift $= 1023$ (again a choice), $m = c - 1023 \in [-1022, 1023]$.

• Machine epsilon ($\epsilon_m$): defined as the distance (gap) between 1 and the next larger floating point number. $\epsilon_m = 1.00\ldots01 \times 2^{0} - 1 = 2^{-52} \approx 2.2 \times 10^{-16}$
• Smallest positive normalized FP number: $\text{UFL} = 1.00\ldots0 \times 2^{-1022} = 2^{-1022} \approx 2.2 \times 10^{-308}$
• Largest positive normalized FP number: $\text{OFL} = 1.11\ldots1 \times 2^{1023} = 2^{1024}(1 - 2^{-53}) \approx 1.8 \times 10^{308}$

Subnormal (or denormalized) numbers

• There is a noticeable gap around zero, present in any normalized floating-point system: in double precision, normalized magnitudes run from about $10^{-308}$ to $10^{308}$, and nothing lies between $0$ and $\pm 2^{-1022}$.
• Relax the requirement of normalization and allow the leading digit to be zero, but only when the exponent is at its minimum ($m = L$).
• Computations with subnormal numbers are often slow.

Representation in memory (another special case): $c$ = all zeros, $f \neq 0$.

Numerical value: $x = (-1)^{s}\, 0.f \times 2^{L}$, where $L$ is the minimum exponent ($L = -126$ in single precision, $L = -1022$ in double precision).

• IEEE-754 Single precision (32 bits): $c = 00000000$, $f \neq 0$. Smallest positive subnormal: $0.00\ldots01 \times 2^{-126} = 2^{-23} \times 2^{-126} = 2^{-149} \approx 1.4 \times 10^{-45}$
• IEEE-754 Double precision (64 bits): $c = 00000000000$, $f \neq 0$. Smallest positive subnormal: $0.00\ldots01 \times 2^{-1022} = 2^{-52} \times 2^{-1022} = 2^{-1074} \approx 4.9 \times 10^{-324}$

In double precision the subnormals therefore extend the representable range from about $10^{-308}$ down to about $10^{-324}$.

Summary for Single Precision

$x = (-1)^{s}\, 1.f \times 2^{m}, \qquad m = c - 127$

| Stored binary exponent ($c$) | Significand fraction ($f$) | Value |
| --- | --- | --- |
| 00000000 | 0000…0000 | zero |
| 00000000 | any $f \neq 0$ | $(-1)^{s}\, 0.f \times 2^{-126}$ |
| 00000001 | any $f$ | $(-1)^{s}\, 1.f \times 2^{-126}$ |
| ⋮ | ⋮ | ⋮ |
| 11111110 | any $f$ | $(-1)^{s}\, 1.f \times 2^{127}$ |
| 11111111 | any $f \neq 0$ | NaN |
| 11111111 | 0000…0000 | infinity |

Iclicker question

A number system can be represented as $x = \pm 1.b_1 b_2 b_3 \times 2^{m}$ for $m \in [-5, 5]$ and $b_i \in \{0, 1\}$. Here $n = 3$ fraction bits, so $p = 4$.

1) What is the smallest positive normalized FP number?
a) 0.0625 b) 0.09375 c) 0.03125 d) 0.046875 e) 0.125
Answer: c). $1.000 \times 2^{-5} = 0.03125$

2) What is the largest positive normalized FP number?
a) 28 b) 60 c) 56 d) 32
Answer: b). $1.111 \times 2^{5} = 2^{6}(1 - 2^{-4}) = 60$

3) How many additional numbers (positive and negative) can be represented when using subnormal representation?
a) 7 b) 14 c) 3 d) 6 e) 16
Answer: b). The subnormals are $\pm 0.b_1 b_2 b_3 \times 2^{-5}$ with $b_1 b_2 b_3 \neq 000$, which gives 7 positive and 7 negative values.

4) What is the smallest positive subnormal number?
a) 0.00390625 b) 0.00195313 c) 0.03125 d) 0.0136719
Answer: a). $0.001 \times 2^{-5} = 2^{-8} = 0.00390625$

5) Determine machine epsilon.
a) 0.0625 b) 0.00390625 c) 0.0117188 d) 0.125
Answer: d). $1.001 \times 2^{0} - 1 = 2^{-3} = 0.125$

A number system can be represented as $x = \pm 1.b_1 b_2 b_3 b_4 \times 2^{m}$ for $m \in [-6, 6]$ and $b_i \in \{0, 1\}$. Here $n = 4$ fraction bits, so $p = 5$.

1) Let’s say you want to represent the decimal number 19.625 using the binary number system above. Can you represent this number exactly?
No. $(19.625)_{10} = (10011.101)_2 = 1.0011101 \times 2^{4}$ needs 7 fraction bits, and only 4 are available.

2) What is the range of integer numbers that you can represent exactly using this binary system?
$(1)_{10} = (1)_2 = 1.0000 \times 2^{0}$
$(2)_{10} = (10)_2 = 1.0000 \times 2^{1}$
$(3)_{10} = (11)_2 = 1.1000 \times 2^{1}$
⋮
$(31)_{10} = (11111)_2 = 1.1111 \times 2^{4}$
$(32)_{10} = (100000)_2 = 1.0000 \times 2^{5}$
Every integer up to $2^{p} = 32$ fits in the 5 available significant bits, but $(33)_{10} = (100001)_2$ needs 6. The integers in $[-32, 32]$ are therefore the largest contiguous range that is represented exactly.
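The bit layout and the limits quoted on the slides can be checked on a real machine. The following is a minimal Python sketch, not part of the original slides: `decode_single` is a hypothetical helper name, and the code assumes the platform's `float` is an IEEE-754 double (true for standard CPython builds). It splits the single-precision encoding of −67.125 into the fields $s$, $c$, $f$, rebuilds the value from $(-1)^{s}\, 1.f \times 2^{c - 127}$, and compares the double-precision machine epsilon, UFL, OFL and smallest subnormal against the powers of two derived above.

```python
import struct
import sys


def decode_single(x):
    """Return (s, c, f) for x stored in IEEE-754 single precision.

    Hypothetical helper for illustration: packs x as a 32-bit float,
    reinterprets the 4 bytes as an unsigned integer, and slices the fields.
    """
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    s = bits >> 31            # 1 sign bit
    c = (bits >> 23) & 0xFF   # 8 stored-exponent bits
    f = bits & 0x7FFFFF       # 23 fraction bits
    return s, c, f


# Worked example from the slides: x = -67.125
s, c, f = decode_single(-67.125)
print(s, format(c, "08b"), format(f, "023b"))

# Rebuild the number from its fields: (-1)^s * 1.f * 2^(c - 127)
value = (-1) ** s * (1 + f / 2**23) * 2.0 ** (c - 127)
print(value)

# Double-precision limits (assumes Python floats are IEEE-754 doubles)
print(sys.float_info.epsilon == 2.0**-52)                    # machine epsilon
print(sys.float_info.min == 2.0**-1022)                      # UFL
print(sys.float_info.max == (2.0 - 2.0**-52) * 2.0**1023)    # OFL
print(5e-324 == 2.0**-1074)                                  # smallest subnormal
```

The same slicing idea carries over to double precision by packing with `">d"`, unpacking with `">Q"`, and using field widths of 1, 11 and 52 bits with a shift of 1023.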
