Table 1: Values Represented by Bit Patterns in IEEE Single Format

Single-Format Bit Pattern                              Value
0 < e < 255                                            (-1)^s × 2^(e-127) × 1.f   (normal numbers)
e = 0; f ≠ 0 (at least one bit in f is nonzero)        (-1)^s × 2^(-126) × 0.f    (subnormal numbers)
e = 0; f = 0 (all bits in f are zero)                  (-1)^s × 0.0               (signed zero)
s = 0; e = 255; f = 0 (all bits in f are zero)         +INF                       (positive infinity)
s = 1; e = 255; f = 0 (all bits in f are zero)         -INF                       (negative infinity)
s = u; e = 255; f ≠ 0 (at least one bit in f is nonzero)   NaN                    (Not-a-Number)

Bit Patterns in Single-Storage Format and their IEEE Values

Common Name                          Bit Pattern (Hex)    Decimal Value
+0                                   00000000             0.0
-0                                   80000000             -0.0
1                                    3f800000             1.0
2                                    40000000             2.0
maximum normal number                7f7fffff             3.40282347e+38
minimum positive normal number       00800000             1.17549435e-38
maximum subnormal number             007fffff             1.17549421e-38
minimum positive subnormal number    00000001             1.40129846e-45
+∞                                   7f800000             Infinity
−∞                                   ff800000             -Infinity
Not-a-Number                         7fc00000             NaN

Table 2: Values Represented by Bit Patterns in IEEE Double Format

Double-Format Bit Pattern                              Value
0 < e < 2047                                           (-1)^s × 2^(e-1023) × 1.f  (normal numbers)
e = 0; f ≠ 0 (at least one bit in f is nonzero)        (-1)^s × 2^(-1022) × 0.f   (subnormal numbers)
e = 0; f = 0 (all bits in f are zero)                  (-1)^s × 0.0               (signed zero)
s = 0; e = 2047; f = 0 (all bits in f are zero)        +INF                       (positive infinity)
s = 1; e = 2047; f = 0 (all bits in f are zero)        -INF                       (negative infinity)
s = u; e = 2047; f ≠ 0 (at least one bit in f is nonzero)  NaN                    (Not-a-Number)

Bit Patterns in Double-Storage Format and their IEEE Values

Common Name                          Bit Pattern (Hex)    Decimal Value
+0                                   00000000 00000000    0.0
-0                                   80000000 00000000    -0.0
1                                    3ff00000 00000000    1.0
2                                    40000000 00000000    2.0
max normal number                    7fefffff ffffffff    1.7976931348623157e+308
min positive normal number           00100000 00000000    2.2250738585072014e-308
max subnormal number                 000fffff ffffffff    2.2250738585072009e-308
min positive subnormal number        00000000 00000001    4.9406564584124654e-324
+∞                                   7ff00000 00000000    Infinity
−∞                                   fff00000 00000000    -Infinity
Not-a-Number                         7ff80000 00000000    NaN

Table 3: Values Represented by Bit Patterns in Double-Extended Format (x86)

Double-Extended Bit Pattern (x86)                      Value
j = 0; 0 < e < 32767                                   Unsupported
j = 1; 0 < e < 32767                                   (-1)^s × 2^(e-16383) × 1.f (normal numbers)
j = 0; e = 0; f ≠ 0 (at least one bit in f is nonzero) (-1)^s × 2^(-16382) × 0.f  (subnormal numbers)
j = 1; e = 0                                           (-1)^s × 2^(-16382) × 1.f  (pseudo-denormal numbers)
j = 0; e = 0; f = 0 (all bits in f are zero)           (-1)^s × 0.0               (signed zero)
j = 1; s = 0; e = 32767; f = 0 (all bits in f are zero)    +INF                   (positive infinity)
j = 1; s = 1; e = 32767; f = 0 (all bits in f are zero)    -INF                   (negative infinity)
j = 1; s = u; e = 32767; f = .1uuu...uu                QNaN                       (quiet NaNs)
j = 1; s = u; e = 32767; f = .0uuu...uu ≠ 0 (at least one of the u in f is nonzero)   SNaN   (signaling NaNs)

Figure 1: The floating-point number line

    #include <stdio.h>

    int main(void)
    {
        float y, z;

        /* Neither constant is exactly representable in single format. */
        y = 838861.2;
        z = 1.3;
        printf("y: %18.11f\n", y);
        printf("z: %18.11f\n", z);
        return 0;
    }

The output from this program should be similar to:

    y: 838861.18750000000
    z:      1.29999995232

Range and Precision of Storage Formats

Format                      Sig Digits (Binary)   Smallest Positive      Largest Positive       Sig Digits (Decimal)
single                      24                    1.175... × 10^-38      3.402... × 10^+38      6-9
double                      53                    2.225... × 10^-308     1.797... × 10^+308     15-17
double extended (x86)       64                    3.362... × 10^-4932    1.189... × 10^+4932    18-21
double extended (x86 64)    113                   3.362... × 10^-4932    1.189... × 10^+4932    33-36

Standards: POSIX, BSD 4.3, ISO 9899

acos     arccosine, returns value in [0, π]
asin     arcsine, returns value in [−π/2, π/2]
atan     arctangent, returns value in [−π/2, π/2]
atan2    takes y and x separately to resolve the quadrant degeneracy in atan(y/x)
ceil     smallest integral value not less than x
cos      cosine
cosh     hyperbolic cosine
exp      exponentiate
fabs     absolute value of floating-point number
floor    largest integral value not greater than x
fmod     floating-point remainder function
frexp    convert floating-point number to fractional and integral components
ldexp    multiply floating-point number by integral power of 2
log      natural log
log10    log base ten
modf     extract signed integral and fractional values from floating-point number
pow      raise number to a power
sin      sine
sinh     hyperbolic sine
sqrt     square root of a number
tan      tangent
tanh     hyperbolic tangent

