CPSC 303: WHAT THE CONDITION NUMBER DOES AND DOES NOT TELL US
JOEL FRIEDMAN
Contents
1. Motivating Examples from Interpolation
1.1. An n = 1 Example: the Tangent Line of Calculus
1.2. Degeneracy in Interpolation is Not Degenerate and Yields Derivatives
1.3. An n = 2 Example With One Derivative
1.4. A Doubly Degenerate Example and Taylor Series
2. More Motivation (From General Linear Algebra) for the Condition Number
2.1. Sensitivity in General n × n Systems
2.2. Sensitivity in General Interpolation
2.3. Optimal Sensitivity: Diagonal Systems
2.4. Cancellation
2.5. A Mixed Signs Example
3. Cancellation, Awkwardness and Norms, and Relative Error
3.1. Common Norms and Awkward Norms
3.2. The Lp-Norm (or ℓp-Norm or p-Norm)
3.3. Cancellation and the Triangle Inequality
3.4. Relative Error
3.5. Further Remarks on Norms
4. The L2-Norm (or 2-Norm) of a Matrix
5. The Lp-Norm (or p-Norm) of a Matrix
6. Some 2 × 2 Matrix Norm Formulas
7. The (L2- and) Lp-Condition Number of a Matrix
8. Interpolation: What The Condition Number Does and Does Not Tell Us
8.1. Formulas for the Inverse
8.2. The 2 × 2 Interpolation for a Tangent Line
8.3. A 3 × 3, Doubly Degenerate Example
9. The Doubly-Normed Condition Number
Exercises
Copyright: Copyright Joel Friedman 2020. Not to be copied, used, or revised without explicit written permission from the copyright owner.
Research supported in part by an NSERC grant.
Disclaimer: The material may be sketchy and/or contain errors, which I will elaborate upon and/or correct in class. For those not in CPSC 303: use this material at your own risk...
The main goals of this article are to motivate and define the condition number of a square matrix, and to explain the usefulness and shortcomings of this definition regarding interpolation. At the end of this article we will define what we call a doubly-normed condition number, which involves two norms, making it much easier to understand what is going on with the examples regarding interpolation. The condition number is studied in more detail in CPSC 302 and Section 5.8 of the course textbook [A&G] (by Ascher and Greif). It is defined in terms of relative error, measured in various ℓp-norms for p ≥ 1 (see Section 4.2 of [A&G]). We will review these concepts after some motivating examples.
1. Motivating Examples from Interpolation
Consider fitting data (x_i, y_i), i = 0, ..., n, with a polynomial p(x) (we switch from v(x) to p(x) after Section 10.1 of [A&G]) of degree at most n, where x_0, x_2, x_3, ..., x_n are fixed and distinct, and x_1 = x_1(ε) = x_0 + ε, where we think of ε > 0 as very small; eventually we will take a limit as ε → 0.
1.1. An n = 1 Example: the Tangent Line of Calculus. Consider the case n = 1 with data points (x_0, y_0), (x_1, y_1), where x_0 = 2 and x_1 = 2 + ε, which gives us the 2 × 2 system
\[
(1)\qquad \begin{bmatrix} 1 & 2 \\ 1 & 2+\varepsilon \end{bmatrix} \begin{bmatrix} c_0 \\ c_1 \end{bmatrix} = \begin{bmatrix} y_0 \\ y_1 \end{bmatrix},
\]
or equivalently
\[
(2)\qquad \begin{bmatrix} 1 & 2 \\ 0 & \varepsilon \end{bmatrix} \begin{bmatrix} c_0 \\ c_1 \end{bmatrix} = \begin{bmatrix} y_0 \\ y_1 - y_0 \end{bmatrix},
\]
or equivalently
\[
(3)\qquad \begin{bmatrix} 1 & 2 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} c_0 \\ c_1 \end{bmatrix} = \begin{bmatrix} y_0 \\ (y_1 - y_0)/\varepsilon \end{bmatrix}.
\]
The condition numbers of systems (1) and (2) turn out to be very large (i.e., very bad) for small ε, but that of (3) is reasonable. Of course, if y_0 = f(x_0) and y_1 = f(x_1) where f is a differentiable function, then
\[
(y_1 - y_0)/\varepsilon = \frac{f(2+\varepsilon) - f(2)}{\varepsilon},
\]
whose limit as ε → 0 is f′(2). Hence the ε → 0 limit of (3) when y_i = f(x_i) is
\[
\begin{bmatrix} 1 & 2 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} c_0 \\ c_1 \end{bmatrix} = \lim_{\varepsilon \to 0} \begin{bmatrix} f(2) \\ \bigl(f(2+\varepsilon) - f(2)\bigr)/\varepsilon \end{bmatrix} = \begin{bmatrix} f(2) \\ f'(2) \end{bmatrix}
\;\Longrightarrow\; c_1 = f'(2),\quad c_0 = f(2) - 2f'(2),
\]
which is the line p(x) = f(2) − 2f′(2) + f′(2)x = f(2) + (x − 2)f′(2) (you should recognize f(2) + (x − 2)f′(2) from calculus as the tangent line to f at x = 2).
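The claim about the conditioning of (1), (2), and (3) is easy to check numerically. The following is a minimal pure-Python sketch (my illustration, not part of the notes); it uses the ∞-norm condition number κ_∞(A) = ‖A‖_∞ ‖A⁻¹‖_∞, one standard choice among the condition numbers defined later.

```python
# Illustrative sketch (not from the notes): estimate the infinity-norm
# condition number of the 2x2 systems (1), (2), (3) for a small epsilon.

def inv2(A):
    """Inverse of a 2x2 matrix via the cofactor formula."""
    (a, b), (c, d) = A
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def norm_inf(A):
    """Matrix infinity-norm: maximum absolute row sum."""
    return max(sum(abs(x) for x in row) for row in A)

def cond_inf(A):
    """Condition number kappa_inf(A) = ||A||_inf * ||A^-1||_inf."""
    return norm_inf(A) * norm_inf(inv2(A))

eps = 1e-6
A1 = [[1, 2], [1, 2 + eps]]   # system (1)
A2 = [[1, 2], [0, eps]]       # system (2), after one elimination step
A3 = [[1, 2], [0, 1]]         # system (3), after dividing row 2 by eps

for name, A in [("(1)", A1), ("(2)", A2), ("(3)", A3)]:
    print(name, cond_inf(A))
```

For ε = 10⁻⁶ the first two condition numbers are of order 10⁶–10⁷ (consistent with the order-1/ε behavior discussed below), while system (3) has κ_∞ = 9 regardless of ε.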
1.2. Degeneracy in Interpolation is Not Degenerate and Yields Derivatives. The “degeneracy” above, i.e., x_1 = x_0 + ε with x_0 fixed and ε → 0—and similar “degeneracies” where some of the x_i are infinitesimally close—will be extremely important when we study splines (Chapter 11 of [A&G]) and are an essential part of our discussion of interpolation (Chapter 10 of [A&G]). Furthermore, such degeneracies also arise and are an essential topic in linear interpolation more general than polynomial interpolation (such interpolation is briefly discussed in Section 10.1 of [A&G]).
The main point is that these degeneracies—such as x_1 = x_0 + ε—become derivative conditions in the ε → 0 limit. Furthermore, even before we take the limit, the fact that we divide certain equations by ε makes the condition number go from bad to good. It follows that the condition number does not tell us what we really want to know about degenerate interpolation and derivatives, at least until we divide certain equations by ε. It is extremely important—once we define and study the condition number—to understand this.
1.3. An n = 2 Example With One Derivative. A similar example to the last, except with n = 2, would be x_0 = 2, x_1 = 2 + ε, x_2 = 3, which yields the system
\[
(4)\qquad \begin{bmatrix} 1 & 2 & 4 \\ 1 & 2+\varepsilon & (2+\varepsilon)^2 \\ 1 & 3 & 9 \end{bmatrix} \begin{bmatrix} c_0 \\ c_1 \\ c_2 \end{bmatrix} = \begin{bmatrix} y_0 \\ y_1 \\ y_2 \end{bmatrix},
\]
or equivalently,
\[
(5)\qquad \begin{bmatrix} 1 & 2 & 4 \\ 0 & \varepsilon & 4\varepsilon+\varepsilon^2 \\ 1 & 3 & 9 \end{bmatrix} \begin{bmatrix} c_0 \\ c_1 \\ c_2 \end{bmatrix} = \begin{bmatrix} y_0 \\ y_1 - y_0 \\ y_2 \end{bmatrix},
\]
and both systems turn out to be “poorly conditioned” for small ε; by contrast, the equivalent system
\[
\begin{bmatrix} 1 & 2 & 4 \\ 0 & 1 & 4+\varepsilon \\ 1 & 3 & 9 \end{bmatrix} \begin{bmatrix} c_0 \\ c_1 \\ c_2 \end{bmatrix} = \begin{bmatrix} y_0 \\ (y_1 - y_0)/\varepsilon \\ y_2 \end{bmatrix}
\]
is well conditioned; if y_i = f(x_i) then the ε → 0 limit of this system is
\[
\begin{bmatrix} 1 & 2 & 4 \\ 0 & 1 & 4 \\ 1 & 3 & 9 \end{bmatrix} \begin{bmatrix} c_0 \\ c_1 \\ c_2 \end{bmatrix} = \begin{bmatrix} f(2) \\ f'(2) \\ f(3) \end{bmatrix},
\]
which means that p(x) = c_0 + c_1 x + c_2 x^2 is the (unique) polynomial of degree at most 2 with p(2) = f(2), p′(2) = f′(2), and p(3) = f(3).
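As a sanity check (an illustrative sketch, not from the notes; the choice f(x) = x³ is mine), one can solve system (4) for a small ε and verify that the interpolant's derivative p′(2) = c_1 + 4c_2 approaches f′(2):

```python
# Illustrative check: interpolate f at 2, 2+eps, 3 and watch p'(2) -> f'(2).

def solve3(A, b):
    """Gaussian elimination with partial pivoting for a 3x3 system."""
    n = 3
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(M[i][k]))
        M[k], M[p] = M[p], M[k]
        for i in range(k + 1, n):
            m = M[i][k] / M[k][k]
            for j in range(k, n + 1):
                M[i][j] -= m * M[k][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

f = lambda x: x ** 3          # example function; f'(2) = 12
eps = 1e-5
xs = [2.0, 2.0 + eps, 3.0]
A = [[1.0, x, x * x] for x in xs]
b = [f(x) for x in xs]
c0, c1, c2 = solve3(A, b)
print("p'(2) =", c1 + 4 * c2)   # should be close to f'(2) = 12
```

Even though the unscaled system (4) has a large condition number for this ε, double precision still has enough headroom here; the point of the scaled system is that this headroom disappears as ε shrinks further.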
1.4. A Doubly Degenerate Example and Taylor Series. Consider an example with x_0 = 2, x_1 = 2 + ε, and x_2 = 2 + 2ε. This gives the system
\[
\begin{bmatrix} 1 & 2 & 4 \\ 1 & 2+\varepsilon & (2+\varepsilon)^2 \\ 1 & 2+2\varepsilon & (2+2\varepsilon)^2 \end{bmatrix} \begin{bmatrix} c_0 \\ c_1 \\ c_2 \end{bmatrix} = \begin{bmatrix} y_0 \\ y_1 \\ y_2 \end{bmatrix};
\]
subtracting the first row from the second row and the third row yields
\[
\begin{bmatrix} 1 & 2 & 4 \\ 0 & \varepsilon & 4\varepsilon+\varepsilon^2 \\ 0 & 2\varepsilon & 8\varepsilon+4\varepsilon^2 \end{bmatrix} \begin{bmatrix} c_0 \\ c_1 \\ c_2 \end{bmatrix} = \begin{bmatrix} y_0 \\ y_1 - y_0 \\ y_2 - y_0 \end{bmatrix};
\]
subtracting 2 times the second row from the third row yields
\[
(6)\qquad \begin{bmatrix} 1 & 2 & 4 \\ 0 & \varepsilon & 4\varepsilon+\varepsilon^2 \\ 0 & 0 & 2\varepsilon^2 \end{bmatrix} \begin{bmatrix} c_0 \\ c_1 \\ c_2 \end{bmatrix} = \begin{bmatrix} y_0 \\ y_1 - y_0 \\ y_2 - y_0 - 2(y_1 - y_0) \end{bmatrix};
\]
all of the above systems are extremely poorly conditioned: once we define the condition number, we will see that all of the above systems in this subsection—which involve a “double degeneracy”—have condition number of order 1/ε²; the systems (1) and (2) (and (5)) will have condition number of order 1/ε. However, when we divide the second row by ε and the third row by ε², we get the system
\[
(7)\qquad \begin{bmatrix} 1 & 2 & 4 \\ 0 & 1 & 4+\varepsilon \\ 0 & 0 & 2 \end{bmatrix} \begin{bmatrix} c_0 \\ c_1 \\ c_2 \end{bmatrix} = \begin{bmatrix} y_0 \\ (y_1 - y_0)/\varepsilon \\ \bigl(y_2 - y_0 - 2(y_1 - y_0)\bigr)/\varepsilon^2 \end{bmatrix},
\]
which is “well-conditioned,” i.e., has a moderate condition number. If f = f(x) is twice differentiable at x = 2, then as ε → 0 we have
\[
\begin{bmatrix} y_0 \\ (y_1 - y_0)/\varepsilon \\ \bigl(y_2 - y_0 - 2(y_1 - y_0)\bigr)/\varepsilon^2 \end{bmatrix}
\xrightarrow{\ \varepsilon \to 0\ }
\begin{bmatrix} f(2) \\ f'(2) \\ L \end{bmatrix},
\]
where
\[
L = \lim_{\varepsilon \to 0} \frac{f(2+2\varepsilon) - 2f(2+\varepsilon) + f(2)}{\varepsilon^2}.
\]
We may compute L either (1) by applying L'Hôpital's Rule twice:
\[
L = \lim_{\varepsilon \to 0} \frac{f(2+2\varepsilon) - 2f(2+\varepsilon) + f(2)}{\varepsilon^2}
= \lim_{\varepsilon \to 0} \frac{2f'(2+2\varepsilon) - 2f'(2+\varepsilon)}{2\varepsilon}
= \lim_{\varepsilon \to 0} \frac{4f''(2+2\varepsilon) - 2f''(2+\varepsilon)}{2}
= \frac{4f''(2) - 2f''(2)}{2} = f''(2);
\]
or (2) by using Taylor series:
\[
f(2+2\varepsilon) = f(2) + 2\varepsilon f'(2) + \frac{(2\varepsilon)^2}{2} f''(2) + o(\varepsilon^2),
\qquad
f(2+\varepsilon) = f(2) + \varepsilon f'(2) + \frac{\varepsilon^2}{2} f''(2) + o(\varepsilon^2),
\]
which gives (after some algebra)
\[
L = \lim_{\varepsilon \to 0} \frac{f(2+2\varepsilon) - 2f(2+\varepsilon) + f(2)}{\varepsilon^2}
= \lim_{\varepsilon \to 0} \frac{\varepsilon^2 f''(2) + o(\varepsilon^2)}{\varepsilon^2} = f''(2).
\]
Hence the above limit implies that the ε → 0 limit of (7) is
\[
\begin{bmatrix} 1 & 2 & 4 \\ 0 & 1 & 4 \\ 0 & 0 & 2 \end{bmatrix} \begin{bmatrix} c_0 \\ c_1 \\ c_2 \end{bmatrix} = \begin{bmatrix} f(2) \\ f'(2) \\ f''(2) \end{bmatrix}.
\]
After some algebra, we see that p(x) = c_0 + c_1 x + c_2 x^2 can be written as
\[
p(x) = f(2) + (x-2)f'(2) + \frac{(x-2)^2}{2} f''(2),
\]
which you should recognize as the second-order Taylor expansion of f about x = 2.
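The limit L is easy to test numerically. The following sketch (illustrative, not from the notes; the choice f = sin is mine) evaluates the second difference quotient for shrinking ε and compares it with f″(2) = −sin(2):

```python
import math

# Illustrative check: the forward second difference
#   (f(2+2e) - 2 f(2+e) + f(2)) / e^2
# tends to f''(2) as e -> 0.  Here f = sin, so f''(2) = -sin(2).

f = math.sin
for eps in [1e-2, 1e-3, 1e-4]:
    L = (f(2 + 2 * eps) - 2 * f(2 + eps) + f(2)) / eps ** 2
    print(eps, L, -math.sin(2))
```

Note that ε cannot be taken too small in floating point: the numerator suffers exactly the cancellation discussed in Section 2.4, and the rounding error grows like 1/ε² as ε shrinks.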
2. More Motivation (From General Linear Algebra) for the Condition Number

The condition number is an important consideration in any type of linear algebra, not merely interpolation. In this section we give additional motivation from linear algebra to study the condition number.
2.1. Sensitivity in General n × n Systems. The idea behind the condition number involves solving an n × n system of equations Ax = b, where x is an n × 1 matrix (a “column vector”) of unknowns, b is a given n × 1 vector (the “constants” of the equation or “system of equations”), and A is an n × n matrix (the “coefficients” of the equation); here we typically work over the real or complex numbers. We typically assume that A is invertible, so that for any b the system Ax = b has a unique solution x = A^{-1}b. In practice we are often interested in all possible values of $\tilde{A}^{-1}\tilde{b}$ where the pairs $(\tilde{A}, \tilde{b})$ range over some values that are “close to” (A, b), and we hope that the values of $\tilde{A}^{-1}\tilde{b}$ do not differ by much from one another. We may also be interested in the “typical range of values” of $\tilde{A}^{-1}\tilde{b}$ where $(\tilde{A}, \tilde{b})$ ranges over some probability distribution.
Generally speaking (and rather imprecisely) we say that an n × n system is sensitive if a small change in its constants (i.e., in b) and/or in its coefficients (i.e., in A) gives a large change in the solution. Analyzing such changes precisely can be very difficult, but the condition number will measure this to some extent. Let us give some examples of questions regarding sensitivity.
2.2. Sensitivity in General Interpolation.
Example 2.1. We measure three data points (x_0, y_0), (x_1, y_1), (x_2, y_2), which we fit exactly with a polynomial v(x) = c_0 + c_1 x + c_2 x^2; this determines c_0, c_1, c_2 as
\[
\begin{bmatrix} c_0 \\ c_1 \\ c_2 \end{bmatrix} = A^{-1} b,
\quad \text{where} \quad
A = \begin{bmatrix} 1 & x_0 & x_0^2 \\ 1 & x_1 & x_1^2 \\ 1 & x_2 & x_2^2 \end{bmatrix},
\qquad
b = \begin{bmatrix} y_0 \\ y_1 \\ y_2 \end{bmatrix}.
\]
However, if there are errors in measuring the data (x_i, y_i), or if the model is only an approximation, we may be interested in knowing all possible values of A^{-1}b defined as above, with each x_i and each y_i replaced by a range of possible values. We may also be interested in the same question where each x_i and y_i varies over some probability distribution.
Knowing all values or a typical value in this first example is very difficult; we might settle for an approximate solution or a bound on how “bad” the solution can get.
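One concrete way to explore Example 2.1 is a small Monte Carlo experiment. The sketch below is purely illustrative (the data values x = 1, 2, 3 and y = 1, 4, 9, the 1% perturbation size, and the use of Cramer's rule are all my assumptions, not from the notes): perturb each x_i and y_i by up to 1% and record how far the fitted coefficients move.

```python
import random

# Hypothetical experiment: spread of the fitted coefficients c0, c1, c2
# under independent 1% perturbations of the data (x_i, y_i).

def det3(M):
    a, b, c = M[0]; d, e, f = M[1]; g, h, i = M[2]
    return a*(e*i - f*h) - b*(d*i - f*g) + c*(d*h - e*g)

def fit(xs, ys):
    """Solve the 3x3 Vandermonde system by Cramer's rule."""
    A = [[1.0, x, x * x] for x in xs]
    D = det3(A)
    cs = []
    for k in range(3):
        Ak = [row[:] for row in A]
        for i in range(3):
            Ak[i][k] = ys[i]
        cs.append(det3(Ak) / D)
    return cs

random.seed(0)
base_x, base_y = [1.0, 2.0, 3.0], [1.0, 4.0, 9.0]   # made-up data (y = x^2)
samples = []
for _ in range(1000):
    xs = [x * (1 + random.uniform(-0.01, 0.01)) for x in base_x]
    ys = [y * (1 + random.uniform(-0.01, 0.01)) for y in base_y]
    samples.append(fit(xs, ys))
spread = [max(s[k] for s in samples) - min(s[k] for s in samples)
          for k in range(3)]
print("spread of c0, c1, c2:", spread)
```

Even in this tame example the coefficient spreads are noticeably larger than 1% of the coefficients themselves, which is the kind of amplification the condition number is meant to bound.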
2.3. Optimal Sensitivity: Diagonal Systems.

Example 2.2. We wish to determine all possible values of x = A^{-1}b where
\[
A = \begin{bmatrix} 10^7 & 0 \\ 0 & 3 \end{bmatrix}, \qquad b = \begin{bmatrix} 4 \pm 0.04 \\ 3 \pm 0.03 \end{bmatrix}.
\]
Here A is known exactly, and each value of b is known to within 1%. (Technically, 4 ± 0.04 refers to the set of real x with 3.96 ≤ x ≤ 4.04.) It is easy to see that each value of x is known to within 1%, namely
\[
x_1 = 10^{-7}(4 \pm 0.04), \qquad x_2 = (1/3)(3 \pm 0.03).
\]
A similar remark holds whenever A is a fixed diagonal matrix: if each component of b is known to within 1%, or any percent, the solution x is known to within the same percent. This is an optimal situation; we shall soon see that if A is not diagonal—more precisely, when A^{-1} has rows of mixed signs—then the situation is worse.

2.4. Cancellation. The next example involves a fact about cancellation: we easily check that the sum of two positive numbers known to within 1% is still known to within 1%, for example
\[
(12 \pm 0.12) + (9 \pm 0.09) = 21 \pm 0.21.
\]
(Technically the expression 9 ± 0.09 refers to the set of x in the closed interval [8.91, 9.09], i.e., 8.91 ≤ x ≤ 9.09.) However,
\[
(8)\qquad (12 \pm 0.12) - (9 \pm 0.09) = 3 \pm 0.21
\]
(since the left-hand side can be as large as (12 + 0.12) − (9 − 0.09) = 3 + 0.21 and, similarly, as small as 3 − 0.21). Hence the difference of two positive quantities known to within 1% is not generally known to within 1%; the trouble arises when the main terms are combined with opposite signs.

2.5. A Mixed Signs Example.

Example 2.3. We wish to determine all possible values of x = A^{-1}b where
\[
A = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}, \qquad b = \begin{bmatrix} 4 \pm 0.04 \\ 3 \pm 0.03 \end{bmatrix}.
\]
The formula for a 2 × 2 inverse gives
\[
A^{-1} = \frac{1}{\det(A)} \begin{bmatrix} 4 & -2 \\ -3 & 1 \end{bmatrix} = \begin{bmatrix} -2 & 1 \\ 3/2 & -1/2 \end{bmatrix},
\]
and hence
\[
x = A^{-1} \begin{bmatrix} 4 \pm 0.04 \\ 3 \pm 0.03 \end{bmatrix}
= \begin{bmatrix} -2 & 1 \\ 3/2 & -1/2 \end{bmatrix} \begin{bmatrix} 4 \pm 0.04 \\ 3 \pm 0.03 \end{bmatrix}
= \begin{bmatrix} -5 \pm 0.11 \\ 9/2 \pm 0.075 \end{bmatrix},
\]
where the −5 ± 0.11 results from the fact that (−2)(4 ± 0.04) + (1)(3 ± 0.03) can be as large as
\[
(-2)(4 - 0.04) + (1)(3 + 0.03) = -5 + 0.11
\]
and, similarly, as small as −5 − 0.11. Hence
\[
x_1 = -5 \pm 0.11, \qquad x_2 = 9/2 \pm 0.075 = 4.5 \pm 0.075.
\]
Notice that in the above example, x1, x2 are not determined to within 1% because of the cancellation.
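Since each x_i is a linear function of b, the extreme values of A^{-1}b over the box of possible b occur at the corners of the box. The following sketch (illustrative; the numbers are those of Example 2.3) checks the intervals above by enumerating the four corners:

```python
from itertools import product

# Illustrative check of Example 2.3: each x_i = (A^-1 b)_i is linear in b,
# so its extremes over the box of possible b occur at the box's corners.

Ainv = [[-2.0, 1.0], [1.5, -0.5]]   # inverse of [[1, 2], [3, 4]]

corners = product([4 - 0.04, 4 + 0.04], [3 - 0.03, 3 + 0.03])
xs = [[row[0] * b1 + row[1] * b2 for row in Ainv] for (b1, b2) in corners]
x1_lo, x1_hi = min(v[0] for v in xs), max(v[0] for v in xs)
x2_lo, x2_hi = min(v[1] for v in xs), max(v[1] for v in xs)
print("x1 in [%.2f, %.2f]" % (x1_lo, x1_hi))   # -5 +/- 0.11
print("x2 in [%.3f, %.3f]" % (x2_lo, x2_hi))   # 4.5 +/- 0.075
```

The computed intervals [−5.11, −4.89] and [4.425, 4.575] match −5 ± 0.11 and 4.5 ± 0.075; the first interval is wider than 1% of |x_1| precisely because the row (−2, 1) of A^{-1} has mixed signs.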
3. Cancellation, Awkwardness and Norms, and Relative Error

In this section we discuss sensitivity and cancellation in R^n or C^n for n ≥ 2; in the previous section we dealt with expressions like 9 ± 0.09, which is the case of R^n with n = 1.
The first thing to note is that sets such as the one in the previous section,
\[
(9)\qquad \begin{bmatrix} 4 \pm 0.04 \\ 3 \pm 0.03 \end{bmatrix}
= \left\{ \begin{bmatrix} b_1 \\ b_2 \end{bmatrix} \;\middle|\; 3.96 \le b_1 \le 4.04,\ 2.97 \le b_2 \le 3.03 \right\},
\]
tend to be very awkward to deal with (we’ll explain why soon), and matters are worse with their analogs in R^n for large n. We can work with such sets, but it is usually simpler—and almost as powerful—to introduce norms (also called lengths or magnitudes) on R^n or C^n.
3.1. Common Norms and Awkward Norms. We tend to work with the following norms for n-dimensional vectors v = [v_1 ... v_n]^T:

(1) the most common is the 2-norm (which coincides with, and is also called, the L^2-norm or ℓ^2-norm):
\[
\|v\|_2 \overset{\text{def}}{=} \sqrt{|v_1|^2 + \cdots + |v_n|^2}
\]
(writing |v_1|^2 instead of v_1^2 makes this expression valid for both v ∈ R^n and v ∈ C^n); we use the 2-norm most often because it simplifies computations and has a nice “geometric interpretation,” even though it is not always the most relevant norm for a given application;

(2) the 1-norm
\[
\|v\|_1 \overset{\text{def}}{=} |v_1| + \cdots + |v_n|;
\]

and (3) the max-norm or ∞-norm (the textbook [A&G] tends to use ∞-norm)
\[
\|v\|_\infty \overset{\text{def}}{=} \|v\|_{\max} = \max\bigl(|v_1|, \ldots, |v_n|\bigr).
\]

In any norm ‖·‖, the distance between two vectors v and u is the length (or norm) of v − u, i.e., ‖v − u‖. Notice that the set in (9) is contained in the set
\[
\bigl\{\, b \;\big|\; \|b - [4\ 3]^T\|_\infty \le 0.04 \,\bigr\},
\]
i.e., the set of elements of ∞-distance at most 0.04 from [4 3]^T. Furthermore, (9) contains the set
\[
\bigl\{\, b \;\big|\; \|b - [4\ 3]^T\|_\infty \le 0.03 \,\bigr\};
\]
for these reasons (9) is “pretty well” approximated using the ‖·‖_∞ norm. One could describe the set in (9) exactly by introducing a weighted ∞-norm, such as
\[
(10)\qquad \|v\|_\infty^{\text{weird weight}} \overset{\text{def}}{=} \max\bigl(|v_1|,\ (4/3)|v_2|\bigr),
\]
which gives a 4/3 weight to the v_2 component; however, this tends to be rather awkward to work with—each such set may require a different weight and therefore a different norm. Later in the course (and elsewhere in the literature) we may see examples of weighted 2-norms where the weights are chosen for a good reason (and do not depend on the particular vector, such as [4 3]^T in the case above).
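These norms are one-liners in code. The following sketch (illustrative, not from the notes) implements the three common norms, plus the weighted norm (10), to show that the box (9) is exactly the ball of weighted radius 0.04 around [4 3]^T:

```python
import math

# Illustrative helpers: the 1-, 2-, and infinity-norms, and the
# "weird weight" norm of (10) that describes the box (9) exactly.

def norm1(v):
    return sum(abs(x) for x in v)

def norm2(v):
    return math.sqrt(sum(abs(x) ** 2 for x in v))

def norm_inf(v):
    return max(abs(x) for x in v)

def norm_weird(v):
    return max(abs(v[0]), (4 / 3) * abs(v[1]))

v = [3.0, -4.0]
print(norm1(v), norm2(v), norm_inf(v))   # 7.0 5.0 4.0

# b lies in the set (9) exactly when norm_weird(b - [4, 3]) <= 0.04:
b = [4.03, 2.98]
d = [b[0] - 4, b[1] - 3]
print(norm_weird(d) <= 0.04)
```

The point of the weighted norm is only descriptive convenience: as the text notes, each box generally needs its own weights, which is why the plain ∞-norm approximation is usually preferred.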
3.2. The Lp-Norm (or ℓp-Norm or p-Norm). The textbook [A&G], Section 4.2, defines for any p ≥ 1 the p-norm (also known as the L^p-norm or ℓ^p-norm):