Floating Point Arithmetic
APM 6646
Suhan Zhong

Note: This note mainly uses materials from the book Applied Linear Algebra (Second Edition).

Limitations of Digital Representations

Q1: How do we represent real numbers on a digital machine?
Q2: Can we represent all real numbers on a computer?

In fact, digital computers can only represent a finite subset of the real numbers. Let us use $\mathbf{F}$ to denote this finite set. Since $\mathbf{F}$ is finite, we can easily conclude that:

1. $\mathbf{F}$ is bounded ($\max_{x \in \mathbf{F}} |x| < \infty$).
2. Numbers in $\mathbf{F}$ have gaps ($\min_{x, y \in \mathbf{F},\, x \neq y} |x - y| > 0$).

Both constraints limit numerical computations, but 1 is no longer a serious problem, since modern computers can represent numbers sufficiently large and small. Constraint 2, however, remains an important concern throughout scientific computing.

Now let us take IEEE double precision arithmetic as an example. In this arithmetic, the interval $[1, 2]$ is represented by the discrete subset

$$1,\quad 1 + 2^{-52},\quad 1 + 2 \times 2^{-52},\quad 1 + 3 \times 2^{-52},\quad \ldots,\quad 2. \tag{1}$$

The interval $[2, 4]$ is represented by the same numbers multiplied by 2,

$$2,\quad 2 + 2^{-51},\quad 2 + 2 \times 2^{-51},\quad 2 + 3 \times 2^{-51},\quad \ldots,\quad 4, \tag{2}$$

and in general, the interval $[2^j, 2^{j+1}]$ is represented by (1) times $2^j$. Note that in IEEE double precision arithmetic, the relative gaps between adjacent numbers are never larger than $2^{-52} \approx 2.22 \times 10^{-16}$. Though this seems negligible, it is surprising how many carelessly constructed algorithms turn out to be unstable! One example is the difference between the classical Gram-Schmidt and modified Gram-Schmidt algorithms; a numerical comparison is sketched at the end of this note.

Machine Epsilon

IEEE arithmetic is an example of an arithmetic system based on a floating point representation of the real numbers. The resolution of $\mathbf{F}$ is traditionally summarized by a number known as machine epsilon:

$$\epsilon_{\text{machine}} = \tfrac{1}{2} \beta^{1-t}, \tag{3}$$

where the integer $\beta \geq 2$ is known as the base and the integer $t > 1$ is known as the precision. $\epsilon_{\text{machine}}$ is half the distance between 1 and the next larger floating point number. It has the following property:

$$\text{For all } x \in \mathbb{R}, \text{ there exists } x' \in \mathbf{F} \text{ such that } |x - x'| \leq \epsilon_{\text{machine}} |x|. \tag{4}$$

(How can we prove this property? Hint: use the definition of $\epsilon_{\text{machine}}$.) For the values of $\beta$ and $t$ common on various computers, $\epsilon_{\text{machine}}$ usually lies between $10^{-6}$ and $10^{-35}$. In IEEE single and double precision arithmetic, $\epsilon_{\text{machine}}$ is specified to be $2^{-24} \approx 5.96 \times 10^{-8}$ and $2^{-53} \approx 1.11 \times 10^{-16}$, respectively.

Let $\mathrm{fl}: \mathbb{R} \to \mathbf{F}$ be a function giving the closest floating point approximation to a real number, its rounded equivalent in the floating point system. Then:

$$\text{For all } x \in \mathbb{R}, \text{ there exists } \epsilon \text{ with } |\epsilon| \leq \epsilon_{\text{machine}} \text{ such that } \mathrm{fl}(x) = x(1 + \epsilon). \tag{5}$$

In other words, the difference between a real number and its closest floating point approximation is never larger than $\epsilon_{\text{machine}}$ in relative terms.

Floating Point Arithmetic

Now let us focus on elementary arithmetic operations on $\mathbf{F}$. Let $x, y \in \mathbf{F}$, let $*$ be one of the operations $+, -, \times, \div$ (on $\mathbb{R}$), and let $\circledast$ be its floating point analogue (on $\mathbf{F}$). Then $x \circledast y$ must be given exactly by

$$x \circledast y = \mathrm{fl}(x * y). \tag{6}$$

From this we can conclude that the computer has a simple and powerful property.

Fundamental Axiom of Floating Point Arithmetic. For all $x, y \in \mathbf{F}$, there exists $\epsilon$ with $|\epsilon| \leq \epsilon_{\text{machine}}$ such that

$$x \circledast y = (x * y)(1 + \epsilon).$$

That is, every operation of floating point arithmetic is exact up to a relative error of size at most $\epsilon_{\text{machine}}$.
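
Numerical Illustrations

The gap structure of (1) and (2) and the rounding property (5) can be observed directly in any language with IEEE double precision. Below is a minimal Python sketch (the variable name eps_machine is ours, not from the note); it uses the fact that Python's sys.float_info.epsilon is the distance from 1 to the next larger double, i.e. $2\epsilon_{\text{machine}} = 2^{-52}$ in the convention of (3), and math.ulp (Python 3.9+), which returns the gap just above a given double.

import math
import sys
from fractions import Fraction

# eps_machine in the convention of (3), with beta = 2 and t = 53:
eps_machine = 2.0 ** -53

# sys.float_info.epsilon is the gap between 1.0 and the next larger
# double, 2**-52, i.e. twice eps_machine as defined in (3).
assert sys.float_info.epsilon == 2.0 ** -52 == 2 * eps_machine

# The gaps of (1) and (2): math.ulp(x) is the gap just above x.
assert math.ulp(1.0) == 2.0 ** -52   # gaps in [1, 2]
assert math.ulp(2.0) == 2.0 ** -51   # gaps in [2, 4], twice as large

# 1 + eps_machine rounds back down to 1 (round-to-nearest, ties to even),
# while 1 + 2*eps_machine is the next representable number above 1.
assert 1.0 + 2.0 ** -53 == 1.0
assert 1.0 + 2.0 ** -52 > 1.0

# Property (5): fl(1/3) differs from 1/3 by a relative error at most
# eps_machine.  Fraction(float) converts exactly, so the comparison
# below involves no further rounding.
x = 1.0 / 3.0                                        # fl(1/3)
rel_err = abs(Fraction(x) - Fraction(1, 3)) / Fraction(1, 3)
assert rel_err <= Fraction(1, 2 ** 53)
print("all machine epsilon checks passed")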
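The Fundamental Axiom can likewise be checked empirically for individual operations, assuming the platform's double arithmetic is correctly rounded as IEEE 754 requires. The sketch below (test ranges and sample counts are our own choices) compares each floating point result against exact rational arithmetic via the fractions module.

import random
from fractions import Fraction

EPS_MACHINE = Fraction(1, 2 ** 53)   # eps_machine for IEEE double precision

# The four elementary operations; on floats these are the floating point
# analogues (*) of (6), on Fractions they are exact.
ops = {"+": lambda a, b: a + b, "-": lambda a, b: a - b,
       "*": lambda a, b: a * b, "/": lambda a, b: a / b}

random.seed(0)
for _ in range(10_000):
    x = random.uniform(-1e8, 1e8)
    y = random.uniform(1e-8, 1e8)    # keep y nonzero for division
    for name, op in ops.items():
        computed = Fraction(op(x, y))            # x (*) y, represented exactly
        exact = op(Fraction(x), Fraction(y))     # x * y in exact arithmetic
        if exact != 0:
            # |x (*) y - x * y| <= eps_machine * |x * y|
            assert abs(computed - exact) <= EPS_MACHINE * abs(exact), name
print("fundamental axiom verified on all samples")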
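Finally, the note cites the difference between the classical and modified Gram-Schmidt algorithms as an example of how rounding errors of size $\epsilon_{\text{machine}}$ can destabilize a carelessly constructed algorithm. The sketch below is our own illustration, not the note's; the ill-conditioned Hilbert matrix test case is an assumption chosen to expose the effect. Running it shows that the classical variant loses far more orthogonality than the modified one.

import numpy as np

def classical_gram_schmidt(A):
    """Classical Gram-Schmidt: project each column against the previous
    orthonormal columns using coefficients computed from the original column."""
    m, n = A.shape
    Q = np.zeros((m, n))
    for j in range(n):
        v = A[:, j] - Q[:, :j] @ (Q[:, :j].T @ A[:, j])
        Q[:, j] = v / np.linalg.norm(v)
    return Q

def modified_gram_schmidt(A):
    """Modified Gram-Schmidt: subtract each projection immediately, so later
    projections use the already partially orthogonalized vectors."""
    m, n = A.shape
    Q = A.astype(float)
    for j in range(n):
        Q[:, j] /= np.linalg.norm(Q[:, j])
        for k in range(j + 1, n):
            Q[:, k] -= (Q[:, j] @ Q[:, k]) * Q[:, j]
    return Q

# An ill-conditioned test matrix: the 10 x 10 Hilbert matrix.
n = 10
A = np.array([[1.0 / (i + j + 1) for j in range(n)] for i in range(n)])

# Loss of orthogonality ||Q^T Q - I|| for each variant.
for name, gs in [("classical", classical_gram_schmidt),
                 ("modified ", modified_gram_schmidt)]:
    Q = gs(A)
    loss = np.linalg.norm(Q.T @ Q - np.eye(n))
    print(f"{name} Gram-Schmidt: ||Q^T Q - I|| = {loss:.2e}")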