
NUMERICAL LINEAR ALGEBRA

Robert D. Skeel

© 2006 Robert D. Skeel, February 10, 2006

Chapter 1

DIRECT METHODS—PART I

1.1 Linear Systems of Equations
1.2 Data Error
1.3 Gaussian Elimination
1.4 LU Factorization
1.5 Symmetric Positive Definite Matrices

1.1 Linear Systems of Equations

1.1.1 Network analysis

[Figure: a resistance network with nodes E1 and E2, branch currents I1, I2, I3, resistances of 4, 5, and 3 ohms, and a 2 V source.]

A resistance network has unknown currents in its branches and unknown potentials at its nodes. A computer representation might look like

    branch  from  to   R    V
       1      2    1   4    2
       2      1    2   5   -4
       3      1    2   3    0

Ohm's Law $E_{\mathrm{to}} = E_{\mathrm{from}} + V - RI$ gives equations

branch 1:  $E_1 = E_2 + 2 - 4I_1$
branch 2:  $E_2 = E_1 - 4 - 5I_2$
branch 3:  $E_2 = E_1 - 3I_3$

Kirchhoff's Current Law (a conservation law) gives equations

node 1:  $I_1 - I_2 - I_3 = 0$
node 2:  $-I_1 + I_2 + I_3 = 0$

These equations are redundant; hence set $E_2 = 0$. Usually these equations are reduced to a smaller system by one of two techniques:

1. loop analysis—complicated to program;

2. nodal analysis—eliminate currents and solve for nodal potentials, i.e., use the 3 branch equations to substitute for the currents in the 1st node equation (a numerical check follows below):
\[ \tfrac{1}{4}(2 - E_1) - \tfrac{1}{5}(-4 + E_1) - \tfrac{1}{3}(0 + E_1) = 0. \]
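The arithmetic can be checked with a short program. The following C sketch (mine, not part of the original notes) solves the node-1 equation for E1 and recovers the currents from the branch equations:

    #include <stdio.h>

    /* Nodal analysis for the example network: with E2 = 0, the node-1
       equation (1/4)(2 - E1) - (1/5)(-4 + E1) - (1/3)E1 = 0 is a single
       linear equation in E1; the branch equations then give the currents. */
    int main(void) {
        double E1 = (2.0/4.0 + 4.0/5.0) / (1.0/4.0 + 1.0/5.0 + 1.0/3.0);
        double I1 = (2.0 - E1) / 4.0;   /* branch 1: E1 = E2 + 2 - 4 I1 */
        double I2 = (E1 - 4.0) / 5.0;   /* branch 2: E2 = E1 - 4 - 5 I2 */
        double I3 = E1 / 3.0;           /* branch 3: E2 = E1 - 3 I3 */
        printf("E1 = %g, I1 = %g, I2 = %g, I3 = %g\n", E1, I1, I2, I3);
        printf("KCL check (should be 0): %g\n", I1 - I2 - I3);
        return 0;
    }

The KCL check confirms $I_1 - I_2 - I_3 = 0$ up to roundoff.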

Here are some other applications which give rise to linear systems of equations:

• AC networks: capacitance, inductance, complex numbers;
• hydraulic networks: pressure, rate of flow (flux);
• framed structures: displacements, forces, stiffness, Newton's 1st Law, Hooke's Law;
• surveying networks.

The last 3 are nonlinear.

1.1.2 Matrices

To quote Jennings, "they provide a concise and simple method of describing lengthy and otherwise complicated computations." A matrix $A \in \mathbb{R}^{m\times n}$ is expressed
\[ A = [a_{ij}] = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{pmatrix}. \]
For example, the directed graph

[Figure: a directed graph with 5 nodes and 8 numbered branches.]

can be represented by the node–branch incidence matrix
\[ \begin{array}{c|rrrrrrrr} & 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 \\ \hline 1 & -1 & -1 & 0 & 0 & 0 & 0 & 0 & 1 \\ 2 & 1 & 0 & -1 & 0 & -1 & 0 & 0 & 0 \\ 3 & 0 & 0 & 1 & -1 & 0 & -1 & 0 & 0 \\ 4 & 0 & 1 & 0 & 1 & 0 & 0 & -1 & 0 \\ 5 & 0 & 0 & 0 & 0 & 1 & 1 & 1 & -1 \end{array} \]
(rows indexed by nodes, columns by branches). Special types of matrices are a column vector
\[ x = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_m \end{pmatrix}, \quad x \in \mathbb{R}^m, \]
a diagonal matrix
\[ D = \begin{pmatrix} d_1 & 0 & \cdots & 0 \\ 0 & d_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & d_n \end{pmatrix} = \mathrm{diag}(d_1, d_2, \ldots, d_n), \]
and an identity matrix $I = \mathrm{diag}(1, 1, \ldots, 1)$, whose $k$th column
\[ e_k = (0, \ldots, 0, 1, 0, \ldots, 0)^T \]
has a 1 in position $k$ and zeros elsewhere.

Three operations are defined for matrices:

(i) αA

(ii) A + B

(iii) $AB =: C$, where $A$ is $m \times n$, $B$ is $n \times p$, $C$ is $m \times p$, and
\[ c_{ij} = \sum_{k=1}^{n} a_{ik} b_{kj}. \]
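As an illustration, here is a minimal C sketch (mine, not from the notes) of this definition; it computes $c_{ij} = \sum_k a_{ik}b_{kj}$ with a triply nested loop:

    /* C = AB for A (m x n) and B (n x p); c_ij = sum_k a_ik * b_kj. */
    void matmul(int m, int n, int p,
                const double A[m][n], const double B[n][p], double C[m][p]) {
        for (int i = 0; i < m; i++)
            for (int j = 0; j < p; j++) {
                C[i][j] = 0.0;
                for (int k = 0; k < n; k++)
                    C[i][j] += A[i][k] * B[k][j];  /* sum over the inner index */
            }
    }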

Note that
\[ \alpha \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix} = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix} [\alpha]. \]
Generally $AB \neq BA$. We define the transpose by

\[ A^T =: C \quad\text{where } c_{ij} = a_{ji}. \]

This permits a compact definition of a column vector by means of $x = (x_1, x_2, \ldots, x_n)^T$. Note that $(AB)^T = B^TA^T$. A lower or left triangular matrix, typically denoted by the symbol $L$, has the form
\[ \begin{pmatrix} \times & & & \\ \times & \times & & \\ \times & \times & \times & \\ \times & \times & \times & \times \end{pmatrix}. \]

An upper or right triangular matrix, typically denoted by $U$ or $R$, has the form
\[ \begin{pmatrix} \times & \times & \times & \times \\ & \times & \times & \times \\ & & \times & \times \\ & & & \times \end{pmatrix}. \]

A unit triangular matrix has ones on the diagonal. The determinant $\det(A)$, whose definition is complicated, has the properties

\[ \det(\alpha A) = \alpha^n \det(A), \qquad \det(A^T) = \det(A), \qquad \det(AB) = \det(A)\det(B), \]
\[ \det(A) \neq 0 \iff A^{-1} \text{ exists}. \]
Also $(AB)^{-1} = B^{-1}A^{-1}$.

1.1.3 Partitioned Matrices

This notion makes it possible to express many ideas without introducing the clutter of subscripts. An example of partitioning a matrix into blocks is
\[ M = \begin{pmatrix} 0 & 0 & -1 & -1 \\ 0 & 0 & 1 & 1 \\ -1 & 1 & 5 & 0 \\ -1 & 1 & 0 & 7 \end{pmatrix} = \begin{pmatrix} 0 & A^T \\ A & R \end{pmatrix} \]
where
\[ A = \begin{pmatrix} -1 & 1 \\ -1 & 1 \end{pmatrix}, \qquad R = \begin{pmatrix} 5 & 0 \\ 0 & 7 \end{pmatrix}. \]
In matrix operations, blocks can be treated as scalars except that multiplication is noncommutative. The product
\[ \begin{pmatrix} A_{11} & A_{12} & \ldots & A_{1q} \\ A_{21} & A_{22} & \ldots & A_{2q} \\ \vdots & \vdots & & \vdots \\ A_{p1} & A_{p2} & \ldots & A_{pq} \end{pmatrix} \begin{pmatrix} B_{11} & B_{12} & \ldots & B_{1r} \\ B_{21} & B_{22} & \ldots & B_{2r} \\ \vdots & \vdots & & \vdots \\ B_{q1} & B_{q2} & \ldots & B_{qr} \end{pmatrix} \]
(with row block sizes $l_1, \ldots, l_p$, inner block sizes $m_1, \ldots, m_q$, and column block sizes $n_1, \ldots, n_r$) can be conveniently expressed because the partitioning is conformable. The (1,1)-block of the product is
\[ A_{11}B_{11} + A_{12}B_{21} + \cdots + A_{1q}B_{q1}. \]

Here are some examples:
\[ Ax =: \begin{pmatrix} c_1 & c_2 & \cdots & c_n \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix} = x_1 c_1 + x_2 c_2 + \cdots + x_n c_n, \]
\[ Ax =: \begin{pmatrix} r_1^T \\ r_2^T \\ \vdots \\ r_m^T \end{pmatrix} x = \begin{pmatrix} r_1^T x \\ r_2^T x \\ \vdots \\ r_m^T x \end{pmatrix} \]
(the two preceding examples are computational alternatives),
\[ AB =: A \begin{pmatrix} b_1 & b_2 & \cdots & b_p \end{pmatrix} = \begin{pmatrix} Ab_1 & Ab_2 & \cdots & Ab_p \end{pmatrix}, \]
\[ AB =: \begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{pmatrix} \begin{pmatrix} r_1^T \\ r_2^T \\ r_3^T \end{pmatrix} = \begin{pmatrix} a_{11}r_1^T + a_{12}r_2^T + a_{13}r_3^T \\ a_{21}r_1^T + a_{22}r_2^T + a_{23}r_3^T \\ a_{31}r_1^T + a_{32}r_2^T + a_{33}r_3^T \end{pmatrix}. \]
The last example states that the rows of $AB$ are linear combinations of rows of $B$. Therefore, matrix premultiplication ⇔ row operations.
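To make the two computational alternatives for $Ax$ concrete, here is a C sketch (mine, not from the notes): the first routine accumulates multiples of columns, the second forms inner products with rows. Both compute the same $y = Ax$:

    /* y = Ax by columns: accumulate x_j times column j of A. */
    void matvec_by_columns(int m, int n, const double A[m][n],
                           const double x[n], double y[m]) {
        for (int i = 0; i < m; i++) y[i] = 0.0;
        for (int j = 0; j < n; j++)
            for (int i = 0; i < m; i++)
                y[i] += A[i][j] * x[j];
    }

    /* y = Ax by rows: y_i is the inner product of row i with x. */
    void matvec_by_rows(int m, int n, const double A[m][n],
                        const double x[n], double y[m]) {
        for (int i = 0; i < m; i++) {
            y[i] = 0.0;
            for (int j = 0; j < n; j++)
                y[i] += A[i][j] * x[j];
        }
    }

The two orderings do the same arithmetic; they differ only in the order in which the elements of $A$ are referenced.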

1.1.4 Linear Spaces

A multiset¹ of vectors $x_1, x_2, \ldots, x_k$ is linearly independent if
\[ \alpha_1 x_1 + \alpha_2 x_2 + \cdots + \alpha_k x_k = 0 \;\Rightarrow\; \alpha_1 = \alpha_2 = \cdots = \alpha_k = 0; \]
otherwise it is linearly dependent, which implies that one of them is a linear combination of the others. $\mathbb{R}^n$ is a vector space. A subspace $S$ is a subset which is also a vector space, i.e.,
\[ x \in S \Rightarrow \alpha x \in S, \qquad x, y \in S \Rightarrow x + y \in S. \]

(What are the possible subspaces for $n = 3$?) Recall that $\mathrm{span}\{x_1, x_2, \ldots, x_k\} := \cdots$. The dimension of $S$ := maximum number of linearly independent vectors. A linearly independent multiset having that many elements is a basis, e.g., $\mathbb{R}^n$ has a basis $e_1, e_2, \ldots, e_n$. If $y_1, y_2, \ldots, y_k$ is a basis for $S$, then for any $x \in S$ there exist unique $\alpha_1, \alpha_2, \ldots, \alpha_k$ such that $x = \alpha_1 y_1 + \alpha_2 y_2 + \cdots + \alpha_k y_k$, or
\[ x = \underbrace{[y_1, y_2, \ldots, y_k]}_{\text{basis}} \underbrace{\begin{pmatrix} \alpha_1 \\ \alpha_2 \\ \vdots \\ \alpha_k \end{pmatrix}}_{\text{coordinates}}, \]

¹A multiset, or bag, of $k$ elements is a $k$-tuple in which the ordering of elements does not matter.

which we note is a conformable partitioning. Consider a matrix $A \in \mathbb{R}^{m\times n}$ expressed as
\[ A = [c_1, c_2, \ldots, c_n] \quad\text{and}\quad A = \begin{pmatrix} r_1^T \\ r_2^T \\ \vdots \\ r_m^T \end{pmatrix}. \]

The range
\[ \mathcal{R}(A) = \mathrm{span}\{c_1, c_2, \ldots, c_n\} = \{Ax : x \in \mathbb{R}^n\}. \]
The null space
\[ \mathcal{N}(A) = \{x \in \mathbb{R}^n : Ax = 0\}. \]
Also $\mathrm{rank}(A) = \dim[\mathcal{R}(A)]$. It can be shown that $\mathrm{rank}(A^T) = \mathrm{rank}(A)$. The problem $Ax = b$ can be written

c1x1 + c2x2 + ···+ cnxn = b.

It has a solution if $b \in \mathcal{R}(A)$. There is always a solution if $\mathcal{R}(A) = \mathbb{R}^m$, i.e., $\mathrm{rank}(A) = m$, which implies $n \ge m$. The solution is unique if $c_1, c_2, \ldots, c_n$ is a basis, which implies $n = m$ and $x = A^{-1}b$. The inner product for $x, y \in \mathbb{R}^n$ is $x^Ty$. The outer product for $x \in \mathbb{R}^m$, $y \in \mathbb{R}^n$ is $xy^T \in \mathbb{R}^{m\times n}$, e.g.,
\[ I = e_1e_1^T + e_2e_2^T + \cdots + e_ne_n^T. \]

Review questions

1. Give a complete definition of the product $AB$ where $A$ is an $m\times n$ matrix with elements $a_{ij}$ and $B$ is an $n \times p$ matrix with elements $b_{ij}$.

2. If $AB$ is a square matrix, when is it true that $\det(AB) = \det(A)\det(B)$?

3. Give an expression for the product $AB$ where $B$ is partitioned into columns.

4. Define what it means for a multiset of vectors $x_1, x_2, \ldots, x_k$ to be linearly independent.

5. Define a subspace of a linear space.

6. Define $\mathrm{span}\{x_1, x_2, \ldots, x_k\}$.

7. Define the dimension of a subspace.

8. Define a basis for a subspace.

9. Define the range of a matrix.

10. Define the null space of a matrix.

11. Define the column rank of a matrix.

12. If A is an m × n matrix, what can we say about its row rank and its column rank?

Exercises

1. In the resistance network of section 1.1.1 do the currents $I_1, I_2, I_3$ depend on the arbitrary choice for $E_2$?

2. Let $A^T$ denote the node–branch incidence matrix given in the example, let $i = [I_1, I_2, \ldots, I_8]^T$, $v = [V_1, V_2, \ldots, V_8]^T$, $R = \mathrm{diag}(R_1, R_2, \ldots, R_8)$, and $e = [E_1, E_2, E_3, E_4, E_5]$ where $I_j, V_j, R_j$ are the current, voltage source, and resistance along branch $j$ and $E_i$ is the potential at node $i$. Use these vectors and matrices to express Ohm's Law and Kirchhoff's Current Law as vector equations. Then eliminate $i$ to get a single vector equation for $e$.

3. In an electrical resistance network such as given in the example, we can represent a simple loop by a column vector b of dimension 8 with elements −1, 0, and 1. What would be the meaning of the values −1, 0, and 1? Let AT be the node–branch incidence matrix. What can we say about ATb? Explain.

4. Show that if $x, y \in \mathbb{R}^n$, then $(xy^T)^k = (x^Ty)^{k-1}xy^T$.

5. Prove the Woodbury formula

\[ (A + UV^T)^{-1} = A^{-1} - A^{-1}U(I + V^TA^{-1}U)^{-1}V^TA^{-1} \]

where $A \in \mathbb{R}^{n\times n}$, $U, V \in \mathbb{R}^{n\times k}$, and both $A$ and $I + V^TA^{-1}U$ are nonsingular.

6. Show that $(A^T)^{-1} = (A^{-1})^T$.

7. Show that $(AB)^T = B^TA^T$.

8. Use Exercise 1.10 to show that $(ABC)^T = C^TB^TA^T$. Avoid subscripts.

9. (Stewart, p. 66) Prove that the equation Ax = b has a solution if and only if for any y, yTA = 0 implies yTb =0.

10. (advanced) (Stewart, p. 67) Show that if A ∈ Rm×n and rank (A)=r,thenA = UV T where U and V are of full rank r. Do not use the approach suggested by Stewart; rather, partition A and U by columns and V by rows and columns.

11.

12. Let $A$ be a given square matrix such that for any $y$, $Ax = y$ has at least one solution. Show that for any $y$, $Ax = y$ has exactly one solution.

13. Determine the null space of
\[ \begin{pmatrix} 1 & 0 & 1 & 0 \\ 0 & 1 & 0 & 1 \\ 1 & 0 & 1 & 0 \end{pmatrix}. \]

14. Let AT be the m by n node–branch incidence matrix for an electrical resistance network of m + 1 nodes and n branches. What does it mean in terms of the network if A is not of full rank?

15. (continuation of Exercise 3) Using linear algebra ideas, show that there exists a matrix $B$ of full rank with $n$ rows such that any simple loop $b = Bx$ for some $x$.

16. (advanced) Consider the $2n \times 2n$ matrix
\[ \begin{pmatrix} P^T & 0 \\ 0 & Q^T \\ -I & R \end{pmatrix} \]
where $R$ is an $n$ by $n$ diagonal matrix having nonnegative entries, $Q^TP = 0$, and $P$, $Q$, and $\begin{pmatrix} Q^T \\ -R \end{pmatrix}$ are of full rank. Show that the $2n \times 2n$ matrix is of full rank.

17. Show that if $\mathcal{R}(A) \cap \mathcal{R}(I - A) = \{0\}$, then $A^2 = A$.

18. Determine
\[ \begin{pmatrix} L & 0 \\ \ell^T & \lambda \end{pmatrix}^{-1} \]
where $L$ is a nonsingular square matrix and $\lambda$ is a nonzero scalar. This result can be used to develop a recursive algorithm for inverting a nonsingular lower triangular matrix.

19. Prove by induction on the order of the matrices that the product of two lower triangular matrices is lower triangular. Use partitioned matrices.

20. Determine
\[ \begin{pmatrix} L & 0 \\ \ell^T & 0 \end{pmatrix}^n \]
for any positive integer $n$.

21. Using induction on $n$ and the result of exercise 20, prove that if $A \in \mathbb{R}^{n\times n}$ is strictly lower triangular, then $A^n$ is zero. (A matrix is strictly lower triangular if its only nonzero elements are below the main diagonal.)

22. The product of two matrices $C = AB$ can be computed by initializing an array

cij =0;

for all i and j and then executing

cij = cij + aik ∗ bkj;

for all i, j and k. This second assignment is therefore put inside a triply nested loop. There are six possible nestings of these loops: ijk, ikj, jik, jki, kij, kji. Which of these give the correct result?

23. Let $A$ be an $n$ by $n$ matrix. Which of the following statements is not equivalent to the other four? Explain briefly.

(i) There exists $x \neq 0$ such that $Ax = 0$.
(ii) $\det(A) = 0$.
(iii) $\mathrm{rank}(A) < n$.

24. (a) Define what it means for a subset of vectors S to be a linear subspace.

(b) Let $S_3$ be the intersection $S_1 \cap S_2$ of two linear subspaces $S_1$ and $S_2$. Prove that $S_3$ is a linear subspace.

25. Let $A$ be an $m$ by $n$ matrix and $B$ an $n$ by $p$ matrix. We can express their product as
\[ AB = (Ae_1)(e_1^TB) + (Ae_2)(e_2^TB) + \cdots + (Ae_n)(e_n^TB). \]
Complete the algorithm below so that it evaluates the expression on the right-hand side from left to right (it is not enough to have a mathematically correct answer):

for i := 1 to m do
    for j := 1 to p do
        c[i, j] := 0
    end for
end for;

for := 1 to do

for := 1 to do

for := 1 to do

c[i, j] := c[i, j] + a[i, k] ∗ b[k, j]
end for
end for
end for

26. Can the number of solutions to a linear system $Ax = b$ where $A$ is square ever be determined solely from the matrix $A$ without knowing the right-hand-side vector $b$? Explain.

27. Give an explicit formula for the inverse of the partitioned nonsingular matrix
\[ \begin{pmatrix} A & B \\ 0 & D \end{pmatrix} \]
in terms of the blocks $A$, $B$, $D$ where $A$ and $D$ are square nonsingular matrices of possibly different dimension.

1.2 Data Error

1.2.1 Condition of a problem

A mathematical problem is a mapping

input data 7→ output data (results).

E.g., $A, b \mapsto x$ (the solution of $Ax = b$), or $A \mapsto L, U$ and $L, U, b \mapsto x$, or $\alpha, \beta \mapsto x_1, x_2$ (the roots of $x^2 + \alpha x + \beta$). Error might enter in two places:

1. data error, which does not depend on the algorithm, might already be present in the input data;

2. computational error, which does depend on the algorithm, might be introduced by the computational process into the output data.

What are the sources of error? Data error might be

1. measurement error,

2. modeling error,

3. previous computational error.

Computational error might be

1. roundoff error: finite-precision arithmetic,

2. truncation error: truncating infinite series, iteration, discretization.

This second kind of error can be estimated and controlled.

definition of error (in general)

Let $\tilde\alpha$ be regarded as an approximation to $\alpha$.

• The (absolute) error in $\tilde\alpha$ is usually defined as $\tilde\alpha - \alpha$ (so that to obtain the exact value one must take away the error from the approximate value).

• The relative error in $\tilde\alpha$ is usually defined as $(\tilde\alpha - \alpha)/\alpha$. It is often preferable to absolute error because it is scale invariant.

• The remainder for $\tilde\alpha$ is $\alpha - \tilde\alpha$ (so that to obtain the exact value one must add the remainder to the approximate value).

A logarithmic measure of error is usually more convenient.

• Define, if $|\text{error}| =: 10^{-d}$,
\[ \text{decimal places of accuracy} := d = -\log_{10}|\text{error}|, \]

• and, if $|\text{relative error}| = 10^{-\delta}$,
\[ \text{digits of accuracy} := \delta = -\log_{10}|\text{relative error}|, \]

e.g., $40 = 39.998$ to 2.7 decimal places of accuracy and to 4.3 digits of accuracy.

How do we represent uncertain values? According to one version of the SIGNIFICANT DIGIT CONVENTION a number should be expressed so that it has no more than 5 ulps (units in the last place) of error. E.g., 12.00 denotes a value in [11.95, 12.05] and 12.0 denotes a value in [11.5, 12.5]. Hence, a value in [12.44, 12.55] is represented by 12.5, which denotes a value in [12.0, 13.0], i.e., a loss of information. WE WILL NOT USE THE SIGNIFICANT DIGIT CONVENTION, i.e., 12.00 and 12.0 denote the same point on the real line rather than nested intervals. A value in [12.44, 12.55] will be represented by $12.{}^{55}_{44}$, which is an interval. For example, the resistance network of §1.1.1 might have resistances $R_1 = 4 \pm .04$, $R_2 = 5 \pm .05$, $R_3 = 3 \pm .03$, i.e., $R_1 \in [3.96, 4.04]$, $R_2 \in [4.95, 5.05]$, $R_3 \in [2.97, 3.03]$. For each possible combination $(R_1, R_2, R_3)$ there is a possible solution $(I_1, I_2, I_3)$:

[Figure: the set of possible solutions, a region in $(I_1, I_2, I_3)$-space.]

This is complicated to describe. It is probably adequate to compute $I_1^{\min}, I_1^{\max}, I_2^{\min}, I_2^{\max}, I_3^{\min}, I_3^{\max}$. We consider now the very important matter of error propagation.

[Figure: in the input space, the given value x* differs from the true value x by the data error; the desired mapping takes x to y* and x* to y, the difference being the propagated data error; the mapping defined by the algorithm takes x* to the computed value ỹ, which differs from y by the computational error.]

E.g., for the roots of a quadratic,

[Figure: the graphs of $x^2 - 2x + 1$ and $x^2 - (2 + 10^{-6})x + 1$ near their roots.]

changing the coefficient 2 to $2 + 10^{-6}$ causes the roots to change from $1, 1$ to $1 \pm 10^{-3}$. The relative change in the coefficient is $\frac12 \times 10^{-6}$, but the relative change in the roots is $10^{-3}$, which is 2000 times as great.

ill conditioning: small relative changes in the input data can produce large relative changes in results.
conditioning: sensitivity of results to small changes in input data.
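A few lines of C (mine, not from the notes) reproduce this: solving the perturbed quadratic by the quadratic formula shows the roots moving by about $10^{-3}$:

    #include <math.h>
    #include <stdio.h>

    /* Perturbing the middle coefficient of x^2 - 2x + 1 by 1e-6 moves the
       double root at 1 to approximately 1 +- 1e-3. */
    int main(void) {
        double eps = 1e-6;
        double b = 2.0 + eps;                    /* perturbed coefficient */
        double disc = b * b - 4.0;               /* = 4*eps + eps^2 */
        double r1 = (b + sqrt(disc)) / 2.0;
        double r2 = (b - sqrt(disc)) / 2.0;
        printf("roots: %.10f %.10f\n", r1, r2);  /* about 1.001 and 0.999 */
        return 0;
    }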

1.2.2 Norms

We first discuss vector norms. Which error is smaller?
\[ [0.02, -0.02, 0.03]^T \quad\text{or}\quad [0.19, 0.01, -0.01]^T? \]
We need a single number to measure the size of an array of errors. E.g., the Euclidean norm
\[ \|x\|_2 = \sqrt{x^Tx} = \sqrt{x_1^2 + x_2^2 + \cdots + x_n^2}. \]
However, $x$ is seldom a physical vector, e.g.,
\[ x = [3.1\,\mathrm{v},\; 4.2\,\mathrm{v},\; 0.4\,\mathrm{amps},\; -1.0\,\mathrm{amp},\; 0.6\,\mathrm{amps}]^T. \]
We could use instead the maximum norm

\[ \|x\|_\infty = \max_{1\le i\le n} |x_i| \]
or a weighted maximum norm $\|Dx\|_\infty$ where $D$ is a diagonal matrix with positive diagonal elements. A Hölder norm is defined by
\[ \|x\|_p := (|x_1|^p + |x_2|^p + \cdots + |x_n|^p)^{1/p}, \quad 1 \le p < \infty, \qquad \|x\|_\infty = \lim_{p\to\infty} \|x\|_p. \]

These norms are generalized measures of vector length. The unit circle in different norms:
\[ \|x\|_1 = 1 \Leftrightarrow |x_1| + |x_2| = 1. \]

[Figure: the unit circles $\|x\|_\infty = 1$ (a square), $\|x\|_2 = 1$ (a circle), and $\|x\|_1 = 1$ (a diamond).]
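For concreteness, here is a C sketch (mine, not from the notes) of the 1-, 2-, and ∞-norms; the 2-norm version is the naive one and can overflow for extreme data:

    #include <math.h>

    double norm1(int n, const double x[]) {
        double s = 0.0;
        for (int i = 0; i < n; i++) s += fabs(x[i]);     /* sum of |x_i| */
        return s;
    }
    double norm2(int n, const double x[]) {
        double s = 0.0;
        for (int i = 0; i < n; i++) s += x[i] * x[i];    /* sum of squares */
        return sqrt(s);
    }
    double norm_inf(int n, const double x[]) {
        double m = 0.0;
        for (int i = 0; i < n; i++)
            if (fabs(x[i]) > m) m = fabs(x[i]);          /* largest |x_i| */
        return m;
    }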

What makes a vector norm?

DEFINITION. $\|\cdot\|$ is a norm on $\mathbb{R}^n$ if

1. $x \neq 0 \Rightarrow \|x\| > 0$,

2. $\|\alpha x\| = |\alpha|\,\|x\|$,

3. $\|x + y\| \le \|x\| + \|y\|$ (TRIANGLE INEQUALITY).

[Figure: a triangle with sides x, y, and x + y.]

Consequences are (can you prove them?)

1. $\|0\| = 0$,

2. $|\,\|x\| - \|y\|\,| \le \|x - y\|$.

Here is a picture for $\|\cdot\|_2$:

[Figure: vectors x and y, comparing $\|x - y\|$ with $|\,\|x\| - \|y\|\,|$.]

One can prove that the Hölder norms are really norms.

THEOREM (Cauchy-Schwarz-Bunyakowski inequality)
\[ |x^Ty| \le \|x\|_2\|y\|_2. \]
Proof. Trick. Start with
\[ 0 \le \left\| \frac{x}{\|x\|_2} \pm \frac{y}{\|y\|_2} \right\|_2^2. \]
Recall from vector calculus $x^Ty = \|x\|_2\|y\|_2\cos\theta$ where $\theta$ is the angle between $x$ and $y$. From the Cauchy-Schwarz-Bunyakowski inequality, one can show (can you?) that
\[ \|x + y\|_2 \le \|x\|_2 + \|y\|_2. \]

subordinate matrix norms

These are also known as induced matrix norms. Suppose we have a norm $\|\cdot\|$ for column vectors of every dimension. (Assume $\|\cdot\| = |\cdot|$ for $\mathbb{R}^1$.) For $A \in \mathbb{R}^{m\times n}$ define
\[ \|A\| = \max_{x\neq 0} \frac{\|Ax\|}{\|x\|}, \]
which is the maximum amount that multiplication by $A$ can magnify the length of any vector $x$. In particular, $\|A\|_p$ is defined in this manner. CAUTION: Maxima do not always exist (e.g., the set of negative reals has an l.u.b. of 0 but no maximum), but it can be proved that the maximum does always exist in the above definition.

PROPERTIES: Let $A, B \in \mathbb{R}^{m\times n}$, $C \in \mathbb{R}^{n\times p}$.

1. $A \neq 0 \Rightarrow \|A\| > 0$ (norm axiom 1),
2. $\|\alpha A\| = |\alpha|\,\|A\|$ (norm axiom 2),
3. $\|A + B\| \le \|A\| + \|B\|$ (norm axiom 3),
4. $\|AC\| \le \|A\|\,\|C\|$ (consistency),
5. $\|Ax^*\| = \|A\|\,\|x^*\|$ for some $x^* \in \mathbb{R}^n$, $x^* \neq 0$.

A special case of 4 is $\|Ax\| \le \|A\|\,\|x\|$ where $x \in \mathbb{R}^n$.

Note that
\[ \max_{x\neq 0} \frac{\|Ax\|}{\|x\|} = \max_{x\neq 0} \left\| A\frac{x}{\|x\|} \right\| = \max_{\|u\|=1} \|Au\|. \]

Example 1.
\[ A = \begin{pmatrix} 2 & -1 \\ 1 & 1 \end{pmatrix}. \]

[Figure: the unit circle $\|x\|_1 = 1$ and its image $\{Ax : \|x\|_1 = 1\}$, with corners $\pm Ae_1$, $\pm Ae_2$.]

Note that $\|Ax\|_1$ varies from 1 to 3 and that $\|A\|_1 = \max\{\|Ae_1\|_1, \|Ae_2\|_1\}$, which is the maximum "distance" to the corners. More generally,
\[ \|A\|_1 = \max\{\|Ae_1\|_1, \|Ae_2\|_1, \ldots, \|Ae_n\|_1\} = \max_{1\le j\le n} \sum_{i=1}^{m} |a_{ij}| \quad\text{MAX COLUMN SUM}. \]

Example 2.

[Figure: the unit circle $\|x\|_2 = 1$ and its image $\{Ax : \|x\|_2 = 1\}$, an ellipse with semiaxes $\sigma_1$ and $\sigma_2$; $\|A\|_2 = \sigma_1$.]

$\sigma_1, \sigma_2$ are called singular values. This is called the spectral norm.

Example $\infty$.

[Figure: the unit circle $\|x\|_\infty = 1$ and its image $\{Ax : \|x\|_\infty = 1\}$, a hyperparallelepiped; $\|A\|_\infty$ is the maximum distance to a corner.]

It can be shown that
\[ \|A\|_\infty = \max_{1\le i\le m} \sum_{j=1}^{n} |a_{ij}| \quad\text{MAX ROW SUM}. \]

(Can you show that $\|AB\|_\infty \le \|A\|_\infty\|B\|_\infty$ using the formula above?) Note.
\[ \|A^T\|_2 = \|A\|_2, \qquad \|A^T\|_1 = \|A\|_\infty. \]

general matrix norms

A general matrix norm must satisfy only Properties 1–3. An example of a matrix norm that is not subordinate to any vector norm is the Frobenius norm
\[ \|A\|_F = \sqrt{\sum_{i=1}^{m}\sum_{j=1}^{n} |a_{ij}|^2}. \]

It also satisfies Property 4 but not 5. The proof of 4 is
\[ \|AC\|_F^2 = \sum_{i=1}^{m}\sum_{j=1}^{p}\left( \sum_{k=1}^{n} a_{ik}c_{kj} \right)^2 \le \sum_{i=1}^{m}\sum_{j=1}^{p}\left( \sum_{k=1}^{n} a_{ik}^2 \right)\left( \sum_{l=1}^{n} c_{lj}^2 \right) = \|A\|_F^2\|C\|_F^2. \]
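The three norms with explicit formulas—max column sum, max row sum, and Frobenius—are easy to compute. The following C sketch (mine, not from the notes) does so directly from the definitions:

    #include <math.h>

    double norm1_mat(int m, int n, const double A[m][n]) {   /* max column sum */
        double best = 0.0;
        for (int j = 0; j < n; j++) {
            double s = 0.0;
            for (int i = 0; i < m; i++) s += fabs(A[i][j]);
            if (s > best) best = s;
        }
        return best;
    }
    double norminf_mat(int m, int n, const double A[m][n]) { /* max row sum */
        double best = 0.0;
        for (int i = 0; i < m; i++) {
            double s = 0.0;
            for (int j = 0; j < n; j++) s += fabs(A[i][j]);
            if (s > best) best = s;
        }
        return best;
    }
    double normF_mat(int m, int n, const double A[m][n]) {   /* Frobenius */
        double s = 0.0;
        for (int i = 0; i < m; i++)
            for (int j = 0; j < n; j++) s += A[i][j] * A[i][j];
        return sqrt(s);
    }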

1.2.3 Condition numbers

$Ax = b$. Consider the effect $\delta x$ of a perturbation $\delta b$:

A(x + δx)=b + δb.

[Figure: the solution in the $(x_1, x_2)$-plane, perturbed slightly.]

\[ \delta x = A^{-1}\delta b, \qquad \|\delta x\| \le \|A^{-1}\|\,\|\delta b\| \text{ with equality possible.} \]

\[ \underbrace{\frac{\|\delta x\|}{\|x\|}}_{\substack{\text{relative error}\\ \text{in result}}} \le \underbrace{\frac{\|A^{-1}\|\,\|b\|}{\|A^{-1}b\|}}_{\substack{\text{maximum}\\ \text{amplification factor}}} \cdot \underbrace{\frac{\|\delta b\|}{\|b\|}}_{\substack{\text{relative error}\\ \text{in r.h.s.}}} \tag{1.1} \]

Now look at perturbations in coefficient matrix:

Ax = b

\[ (A + \delta A)(x + \delta x) = b, \qquad \delta x = -A^{-1}\,\delta A\,(x + \delta x), \]
\[ \|\delta x\| \le \|A^{-1}\|\,\|\delta A\|\,\|x + \delta x\| \text{ with equality possible,} \]
\[ \|\delta x\| \le \|A^{-1}\|\,\|\delta A\|\,\|x\| + \|A^{-1}\|\,\|\delta A\|\,\|\delta x\|, \]

\[ \|\delta x\| \le \frac{\|A^{-1}\|\,\|\delta A\|\,\|x\|}{1 - \|A^{-1}\|\,\|\delta A\|}, \qquad \frac{\|\delta x\|}{\|x\|} \le \frac{\kappa(A)\frac{\|\delta A\|}{\|A\|}}{1 - \kappa(A)\frac{\|\delta A\|}{\|A\|}}, \]
where the condition number $\kappa(A) := \|A^{-1}\|\,\|A\|$. Thus
\[ \underbrace{\frac{\|\delta x\|}{\|x\|}}_{\substack{\text{relative error}\\ \text{in result}}} \le \underbrace{\kappa(A)}_{\substack{\text{max amplification}\\ \text{factor}}} \underbrace{\frac{\|\delta A\|}{\|A\|}}_{\substack{\text{relative error}\\ \text{in } A}} + O\!\left( \left( \kappa(A)\frac{\|\delta A\|}{\|A\|} \right)^{2} \right). \]

Note that the coefficient from eq. (1.1) satisfies

\[ \frac{\|A^{-1}\|\,\|b\|}{\|A^{-1}b\|} \le \max_b \frac{\|A^{-1}\|\,\|b\|}{\|A^{-1}b\|} = \max_x \frac{\|A^{-1}\|\,\|Ax\|}{\|x\|} = \kappa(A) \]
with equality for special choices of $b$. Thus $\kappa(A)$ is a good measure of sensitivity. We do not know $\kappa(A)$, although DGECON of LAPACK estimates it. As an example of how it is used, suppose $\kappa(A) = 10^{6}$ and the relative error in $A$ is $10^{-14}$ (data error). Then
\[ \text{relative error in } x \le \frac{10^{-8}}{1 - 10^{-8}} \quad (\text{propagated data error}). \]
The 2-norm condition number $\kappa_2(A) = \|A^{-1}\|_2\|A\|_2$. If $n = 2$, this is
\[ \frac{\text{major axis}}{\text{minor axis}} \]
of the ellipse $\{Au : \|u\| = 1\}$, i.e., the extent to which multiplication by $A$ elongates the unit hypersphere. Using norms to measure ill conditioning often yields a gross overestimate.
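For a 2×2 matrix the condition number can be computed exactly from the explicit inverse. A C sketch (mine, not from the notes; the nearly singular example matrix is my own) for $\kappa_1(A) = \|A\|_1\|A^{-1}\|_1$:

    #include <math.h>
    #include <stdio.h>

    /* kappa_1 for A = [a b; c d], using A^-1 = (1/det)[d -b; -c a].
       The example below has condition number about 4e4. */
    int main(void) {
        double a = 1.0, b = 1.0, c = 1.0, d = 1.0001;
        double det = a * d - b * c;
        /* 1-norms are max column sums of A and of A^-1 */
        double normA  = fmax(fabs(a) + fabs(c), fabs(b) + fabs(d));
        double normAi = fmax(fabs(d) + fabs(c), fabs(b) + fabs(a)) / fabs(det);
        printf("kappa_1(A) = %g\n", normA * normAi);
        return 0;
    }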

Review questions

1. What is an ill-conditioned problem?

2. What are the 3 properties of any vector norm?

3. Give the formula for the 1-norm, the 2-norm, and the ∞-norm.

4. Define the 2-norm of a vector $v$ without making reference to the elements of $v$.

5. What is the relationship between the difference of norms of two vectors and the norm of the difference of two vectors? Express your answer algebraically.

6. State the Cauchy-Schwarz(-Bunyakowski) inequality for vectors of arbitrary dimension. Do not make reference to the elements of the vectors and be specific about the type of norm.

7. Define a subordinate matrix norm.

8. For a subordinate matrix norm, what is the relationship between the product of norms of two matrices and the norm of the product of two matrices. Express your answer algebraically.

9. Give the formula for the max norm of a matrix.

10. Give the formula for the Frobenius norm of a matrix.

11. Give the formula for the condition number of a matrix.

12. The condition number measures the effect on the solution x of Ax = b of perturbing A. Using algebra, explain in what sense this is true.

Exercises

1. Does the rank of a matrix satisfy the three norm axioms? Explain.

2. Show that for a subordinate matrix norm $\|I\| = 1$.

3. Given that $Ax = y$ where $x = [3\;\; 2\;\; {-1}]^T$ and $y = [-3\;\; 0\;\; 2]^T$, what can we say about $\|A\|_2$?

4. Somehow determine a formula for what you believe $\|D\|_2$ to be for a diagonal matrix $D$. Call this quantity $\nu(D)$. Prove that $\nu(D) = \|D\|_2$ by showing

(i) $\exists x^* \neq 0$ such that $\|Dx^*\|_2 = \nu(D)\|x^*\|_2$,

(ii) $\forall x$, $\|Dx\|_2 \le \nu(D)\|x\|_2$.

5. Prove that $\|v^T\|_2 = \|v\|_2$, where $v$ is a column vector.

6. Prove that $\|vv^T\|_2 = \|v\|_2^2$, where $v$ is a column vector. (Hint: what 2 things must be shown? Also, no need to consider individual elements of $v$.)

7. Give a complete proof of the Cauchy-Schwarz-Bunyakowski inequality.

8. Show that for some matrix $A$ the Frobenius norm does not satisfy property 5.

9. Let $x = A^{-1}b$ where $b = (10, 25, -40, 25)^T$ and $\kappa(A) = 60$ in the one-norm. If we make a change $\Delta b = (.04, -.05, .16, -.05)$ in $b$, what can we say about the corresponding change $\Delta x$ in $x$?

10. Let $\|A\| = \max_{ij} |a_{ij}|$. Is this norm a subordinate matrix norm? Explain. Hint: show that it does not have a property that any subordinate norm would have.

11. Prove $\|A\|_F^2 = \operatorname{trace}(A^TA)$.

12. Prove or disprove: $\|I\| = 1$ if $\|\cdot\|$ is a subordinate matrix norm.

13. Prove $\|(A+E)^{-1} - A^{-1}\| \le \|E\|\,\|A^{-1}\|\,\|(A+E)^{-1}\|$.

14. Prove the formula
\[ \|A\|_1 = \max_{1\le j\le n} \sum_{i=1}^{m} |a_{ij}|. \]

15. Find some vector $y \neq 0$ of dimension $n$ such that $\|y\|_1 = \sqrt{n}\,\|y\|_2$. (Do this for general $n$.)

16. From an existing vector norm || · || one can define a new vector norm by

||x||A := ||Ax||

if A is nonsingular. Why must A be nonsingular?

17. What is the numerical value of $\|x+y\|_2$ given that $\|x\|_2 = 1$, $\|y\|_2 = 3$, and $x^Ty = -2$?

18. If a matrix $A$ is multiplied by a scalar $-3$, how does this change the norm of its inverse?

1.3 Gaussian Elimination

Consider one equation in one unknown,

\[ a_{11}x_1 = b_1. \]
Do we compute an inverse, $x_1 = \left( \frac{1}{a_{11}} \right) \cdot b_1$? No, we divide by the coefficient: $x_1 = \frac{b_1}{a_{11}}$. Similarly we should (almost) NEVER solve $Ax = b$ by computing $A^{-1}$. Rather $x = b$ divided by $A$.

We often write this as $A^{-1}b$, but it must never be thought that $A^{-1}$ is to be computed. We give an algorithm for $U^{-1}b$ where
\[ U = \begin{pmatrix} u_{11} & u_{12} & \cdots & u_{1n} \\ & u_{22} & \cdots & u_{2n} \\ & & \ddots & \vdots \\ 0 & & & u_{nn} \end{pmatrix}. \]

Let $x = U^{-1}b$. Then

\[ \begin{aligned} u_{11}x_1 + \cdots + u_{1,n-1}x_{n-1} + u_{1n}x_n &= b_1, \\ &\;\;\vdots \\ u_{n-1,n-1}x_{n-1} + u_{n-1,n}x_n &= b_{n-1}, \\ u_{nn}x_n &= b_n, \end{aligned} \]
which is solved by back substitution:

for (i = n; i ≥ 1; i−−) {
    xi = bi;
    for (j = i + 1; j ≤ n; j++)
        xi = xi − uij ∗ xj;
    xi = xi / uii;
}

Each execution of the inner loop requires $n - i$ multiplications so that altogether there are
\[ (n-1) + (n-2) + \cdots + 0 = \underbrace{n}_{\text{number of terms}} \cdot \underbrace{\frac{n-1}{2}}_{\text{average of 1st and last}} \text{ mults.} \]
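For reference, here is a runnable C version of back substitution (a sketch of the algorithm above with 0-based indexing, not code from the notes):

    /* x = U^-1 b; U is upper triangular with nonzero diagonal. */
    void back_substitute(int n, const double U[n][n],
                         const double b[n], double x[n]) {
        for (int i = n - 1; i >= 0; i--) {
            x[i] = b[i];
            for (int j = i + 1; j < n; j++)
                x[i] -= U[i][j] * x[j];   /* subtract the known terms */
            x[i] /= U[i][i];              /* divide by the diagonal entry */
        }
    }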

The computation $L^{-1}b$ can be done by forward substitution. Cramer's rule is fine for $2\times 2$ and maybe $3\times 3$ matrices; otherwise, its cost is astronomical. It is very useful theoretically. To solve a linear system with Gauss–Jordan, form the equation
\[ \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} & b_1 \\ a_{21} & a_{22} & \cdots & a_{2n} & b_2 \\ \vdots & \vdots & & \vdots & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} & b_n \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \\ -1 \end{pmatrix} = 0. \]

Using the row operations

(i) add a multiple of one row to another,
(ii) multiply a row by a scalar,
(iii) interchange rows (if needed to avoid division by zero),

reduce the coefficient matrix to the identity column by column:
\[ \begin{pmatrix} \times & \times & \times & \times \\ \times & \times & \times & \times \\ \times & \times & \times & \times \end{pmatrix} \to \begin{pmatrix} 1 & \times & \times & \times \\ 0 & \times & \times & \times \\ 0 & \times & \times & \times \end{pmatrix} \to \begin{pmatrix} 1 & 0 & \times & \times \\ 0 & 1 & \times & \times \\ 0 & 0 & \times & \times \end{pmatrix} \to \begin{pmatrix} 1 & 0 & 0 & \times \\ 0 & 1 & 0 & \times \\ 0 & 0 & 1 & \times \end{pmatrix} \]

The last column of the fully reduced matrix is the solution. For several problems Ax1 = b1,Ax2 = b2,...,Axm = bm, you get

[A|b1|b2|···|bm] →···→[I|x1|x2|···|xm].

For an inverse, $A(A^{-1}e_i) = e_i$; whence,
\[ [A|I] \to \cdots \to [I|A^{-1}]. \]
Gauss–Jordan is elegant but inefficient (it costs 50% too much) and not so accurate. Gaussian elimination can be considered as a recursive algorithm:

Solve n equations in n unknowns {
    if (n = 1)
        x1 = b1/a11;
    else {
        eliminate 1st unknown from eqns 2, 3, ..., n
            by subtracting appropriate multiples of eqn 1;    // forward elimination
        solve for x2, x3, ..., xn using eqns 2, 3, ..., n;
        x1 = (b1 − a12x2 − a13x3 − ··· − a1nxn)/a11;          // back substitution
    }
}

Here is a nonrecursive algorithm:

for (k = 1; k ≤ n − 1; k++) {
    eliminate kth unknown from equations k+1, k+2, ..., n
        by subtracting appropriate multiples of eqn k;
}
for (k = n; k ≥ 1; k−−)
    xk = (bk − ak,k+1xk+1 − ak,k+2xk+2 − ··· − aknxn)/akk;

The kth step of forward elimination applied to the coefficients of the unknowns is

for k+1 ≤ i, j ≤ n do
    aij = aij − (aik ∗ akj)/akk;

Note the symmetry.
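A C sketch (mine, not from the notes) of the kth elimination step, applied in place to A and b with 0-based indexing and no pivoting; $a_{kk}$ is assumed nonzero:

    void eliminate_step(int n, double A[n][n], double b[n], int k) {
        for (int i = k + 1; i < n; i++) {
            double m = A[i][k] / A[k][k];   /* multiplier for row i */
            for (int j = k + 1; j < n; j++)
                A[i][j] -= m * A[k][j];     /* a_ij = a_ij - a_ik a_kj / a_kk */
            b[i] -= m * b[k];               /* same operation on the rhs */
        }
    }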

operation count for the kth stage:
\[ (n-k) \times \left\{ \begin{array}{l} 1 \text{ division} \\ n-k \text{ multiplications} \\ n-k \text{ additions} \end{array} \right\} = (n-k)(n-k) \text{ mults.} \]
Altogether there are

\[ (n-1)^2 + (n-2)^2 + \cdots + 1^2 \approx \frac{n^3}{3} \text{ mults.} \]
In many situations (iterative refinement, nonlinear systems of equations) we want to solve a sequence of problems

Ax1 = b1,Ax2 = b2,Ax3 = b3, ...

where b2 is some function of x1,b3 is some function of x2, etc. In such situations considerable savings are possible if the Gaussian elimination process is split into

1. an algorithm which performs elimination on the coefficient matrix producing an upper triangular matrix U and a record of row operations and

2. a second algorithm which repeats the elimination for the right hand side b producing c and then does back substitution $x = U^{-1}c$.

Gaussian elimination consists of forward elimination on A, forward elimination on b, and back substitution; the last two steps together constitute a backsolve.

Example.
\[ \begin{pmatrix} A & b \end{pmatrix} = \begin{pmatrix} 10 & 1 & -5 & 1 \\ -20 & 3 & 20 & 2 \\ 5 & 3 & 5 & 6 \end{pmatrix} \;\downarrow\; \begin{pmatrix} 10 & 1 & -5 & 1 \\ -2 & 5 & 10 & 4 \\ \frac12 & \frac52 & \frac{15}{2} & \frac{11}{2} \end{pmatrix} \;\downarrow\; \begin{pmatrix} 10 & 1 & -5 & 1 \\ -2 & 5 & 10 & 4 \\ \frac12 & \frac12 & \frac52 & \frac72 \end{pmatrix} \]
(the multipliers are recorded in place of the eliminated entries). Back substitution then gives $x_3 = 1.4$, $x_2 = -2$, $x_1 = 1$.

Review questions

1. Write a nonrecursive algorithm for Gaussian elimination (without pivoting) applied to a system $Ax = b$ of $n$ equations in $n$ unknowns.

2. Neglecting lower order terms, what is the number of operations required for Gaussian elimination applied to a system of $n$ equations in $n$ unknowns?

3. Neglecting lower order terms, what is the number of operations required for the back substitution phase of Gaussian elimination applied to a system of $n$ equations in $n$ unknowns?

4. Explain how Gauss-Jordan differs from Gaussian elimination in terms of their forward elimination and back substitution phases.

5. What is the difference between a backsolve and back substitution?

Exercises

1. Write an algorithm for solving $Lx = b$ where $L$ is unit lower triangular. Your algorithm should reference only those elements $\ell_{ij}$ with $j < i$.

2. where $E_j$ has zero or more columns and has all its nonzero elements in the first $j$ rows. Do this for the matrix $A^T$ on page 46 of Jennings. Then determine a basis for the null space of the reduced matrix $B^T$. Relate this to the original resistance network.

3. (advanced) Write an algorithm for finding the null space of a general matrix $A^T$ (assuming exact arithmetic). It may be necessary to interchange rows to avoid division by zero (mathematical pivoting).

4. Write an algorithm to generate the inverse of an upper triangular matrix $U$. The elements of $V = U^{-1}$ should overwrite those of $U$. Do not reference the strictly lower triangular part of $U$ nor any scratch arrays.

5. The algorithm given for back substitution references the elements of U row by row. Interchange the order of the loops so that the elements are referenced column by column, that is, we first reference (fetch) the last column, then the second last column and so on. Hint: one would call this revised algorithm backward elimination.

6. Generalize the augmented matrix form of Ax = b to the case of m right hand sides: Ax(1) = b(1), Ax(2) = b(2), ..., Ax(m) = b(m). That is, write down an equation of the form matrix · vector = 0 that is equivalent to the m systems of equations.

7. How does the computational work in solving an n × n triangular system of linear equations compare with that for solving a general n × n system of linear equations? For full credit be quite specific.

1.4 LU Factorization

We show that forward elimination on A produces an LU factorization of A (Turing, 1948).

FACT: a row operation can be accomplished by premultiplying the coefficient matrix by a certain matrix, this second matrix being the result of performing the row operation on the identity matrix. We can turn this thought around and define a row operation to be an operation which can be expressed as $RA$ where $R$ is nonsingular. Its purpose is to simplify matrix "division":

\[ A^{-1} = A^{-1}R^{-1}R = (RA)^{-1}R. \]

Example.
\[ A := \begin{pmatrix} 2 & 4 & -2 \\ 1 & -1 & 5 \\ 4 & 1 & -2 \end{pmatrix}, \]
desired row operations: subtract $\frac12 \times$ row 1 from row 2 and subtract $2 \times$ row 1 from row 3; corresponding matrix:
\[ M_1 = \begin{pmatrix} 1 & 0 & 0 \\ -\frac12 & 1 & 0 \\ -2 & 0 & 1 \end{pmatrix} = I - \begin{pmatrix} 0 \\ \frac12 \\ 2 \end{pmatrix} e_1^T, \]
\[ A^{(1)} := M_1A = A - \begin{pmatrix} 0 \\ \frac12 \\ 2 \end{pmatrix} e_1^TA = \begin{pmatrix} 2 & 4 & -2 \\ 0 & -3 & 6 \\ 0 & -7 & 2 \end{pmatrix}; \]
then subtract $\frac73 \times$ row 2 from row 3:
\[ M_2 = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & -\frac73 & 1 \end{pmatrix} = I - \begin{pmatrix} 0 \\ 0 \\ \frac73 \end{pmatrix} e_2^T, \]
\[ A^{(2)} := M_2A^{(1)} = A^{(1)} - \begin{pmatrix} 0 \\ 0 \\ \frac73 \end{pmatrix} e_2^TA^{(1)} = \begin{pmatrix} 2 & 4 & -2 \\ 0 & -3 & 6 \\ 0 & 0 & -12 \end{pmatrix}. \]

In sum, $M_2M_1A =: U$ where $U$ is upper triangular and $M_1$, $M_2$ are Gauss transformations, or
\[ A = \underbrace{M_1^{-1}M_2^{-1}}_{\text{lower triangular}} U, \]
a factorization. More generally $U = M_{n-1}M_{n-2}\cdots M_2M_1A$ where
\[ M_k = I - m_ke_k^T, \qquad m_k = (0, \ldots, 0, m_{k+1,k}, \ldots, m_{nk})^T, \]
i.e., $M_k$ is the identity except for $-m_{k+1,k}, \ldots, -m_{nk}$ below the diagonal in column $k$, and

\[ A = \underbrace{M_1^{-1}M_2^{-1}\cdots M_{n-2}^{-1}M_{n-1}^{-1}}_{=:\,L} U. \]

Note that the inverse of $M_k$ is obtained by negating the off diagonal elements:
\[ M_k^{-1} = I + m_ke_k^T, \]
\[ M_1^{-1}M_2^{-1} = I + m_1e_1^T + m_2e_2^T + \underbrace{m_1e_1^Tm_2e_2^T}_{0}, \]
\[ M_1^{-1}M_2^{-1}M_3^{-1} = I + m_1e_1^T + m_2e_2^T + m_3e_3^T + \underbrace{(m_1e_1^T + m_2e_2^T)m_3e_3^T}_{0}, \]
and so on, whence
\[ L = M_1^{-1}M_2^{-1}\cdots M_{n-1}^{-1} = I + m_1e_1^T + m_2e_2^T + \cdots + m_{n-1}e_{n-1}^T = \begin{pmatrix} 1 & & & & \\ m_{21} & 1 & & & \\ m_{31} & m_{32} & 1 & & \\ \vdots & \vdots & \vdots & \ddots & \\ m_{n1} & m_{n2} & m_{n3} & \cdots & 1 \end{pmatrix}. \]

Backsolving

\[ Ax = b, \qquad x = A^{-1}b = U^{-1}(L^{-1}b), \]
where division by $L$ is forward substitution and division by $U$ is back substitution, or
\[ x = U^{-1}(M_{n-1}(\cdots(M_2(M_1b))\cdots)), \]
where the multiplications by $M_1, M_2, \ldots, M_{n-1}$ are forward elimination on $b$ and again division by $U$ is back substitution. Benefits of viewing Gaussian elimination as LU factorization include the following:

1. It suggests an efficient way to solve a pair of systems of the form Ax = b, ATy = c.

2. It suggests alternative algorithms such as the forward substitution method given above.
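To make the factorization viewpoint concrete, here is a C sketch (mine, not from the notes) in which the multipliers $m_{ik}$ overwrite the eliminated entries of A, so one array holds both L and U; a second routine then performs the backsolve:

    /* LU factorization without pivoting: on return, the strict lower
       triangle of A holds the multipliers (L) and the upper triangle
       holds U. The pivots A[k][k] are assumed nonzero. */
    void lu_factor(int n, double A[n][n]) {
        for (int k = 0; k < n - 1; k++)
            for (int i = k + 1; i < n; i++) {
                A[i][k] /= A[k][k];              /* store multiplier m_ik */
                for (int j = k + 1; j < n; j++)
                    A[i][j] -= A[i][k] * A[k][j];
            }
    }

    /* Backsolve Ax = b given the factorization above; b is overwritten
       by c = L^-1 b (forward elimination), then x = U^-1 c is formed. */
    void lu_solve(int n, const double LU[n][n], double b[n], double x[n]) {
        for (int k = 0; k < n; k++)
            for (int i = k + 1; i < n; i++)
                b[i] -= LU[i][k] * b[k];         /* apply multipliers to b */
        for (int i = n - 1; i >= 0; i--) {
            x[i] = b[i];
            for (int j = i + 1; j < n; j++)
                x[i] -= LU[i][j] * x[j];
            x[i] /= LU[i][i];                    /* back substitution */
        }
    }

Once lu_factor has been run, each new right hand side costs only a backsolve, which is the point of the splitting described above.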

LU-theorem

Because Gaussian elimination computes an LU factorization, it will work only when an LU factorization exists. Do all square matrices have an LU factorization? No; consider
\[ \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \]
a nice matrix (why?). Gaussian elimination fails, but that does not necessarily² prove anything. Try solving for L and U:
\[ \begin{pmatrix} 1 & 0 \\ \ell_{21} & 1 \end{pmatrix} \begin{pmatrix} u_{11} & u_{12} \\ 0 & u_{22} \end{pmatrix} = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} \quad\Rightarrow\quad u_{11} = 0,\; u_{12} = 1,\; \ell_{21}u_{11} = 1\;\cdots\; \text{IMPOSSIBLE.} \]
This proves it.

Defn: A principal submatrix of an $n \times n$ matrix is obtained by deleting up to $n-1$ rows and the corresponding columns, e.g., from
\[ \begin{pmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{pmatrix} \quad\text{one obtains}\quad \begin{pmatrix} 1 & 3 \\ 7 & 9 \end{pmatrix}. \]

Defn: A leading principal submatrix is obtained by deleting only trailing rows and columns. How many leading principal submatrices does an $n \times n$ matrix have? How many principal submatrices?

Theorem. An $n \times n$ matrix has an LU factorization where $L$ is unit lower triangular and $U$ is upper triangular if the leading principal submatrices of order $1, 2, \ldots, n-1$ are all nonsingular. Moreover the factorization is unique.

Proof: (constructive) By induction. True for matrices of order 1: $a_{11} = 1 \cdot u_{11}$ is uniquely solvable for $u_{11}$. Assume true for matrices of order $n-1$. Let $A \in \mathbb{R}^{n\times n}$ and try solving for $L$ and $U$ where we partition the matrices as follows:
\[ L = \begin{pmatrix} \hat L & 0 \\ \ell^T & 1 \end{pmatrix}, \qquad U = \begin{pmatrix} \hat U & u \\ 0 & \delta \end{pmatrix}, \qquad A = \begin{pmatrix} \hat A & b \\ c^T & \alpha \end{pmatrix}, \]
with blocks of order $n-1$ and 1. Thus, $LU = A$ becomes
\[ \hat L\hat U = \hat A, \qquad \hat U^T\ell = c, \qquad \hat Lu = b, \qquad \ell^Tu + \delta = \alpha. \]
Leading principal submatrices of $\hat A$ of order $1, 2, \ldots, n-2$ are nonsingular $\Rightarrow \exists!\,\hat L, \hat U$. $\exists!\,u$ clearly. $\hat A$ nonsingular $\Rightarrow \hat U^T$ nonsingular $\Rightarrow \exists!\,\ell$. $\exists!\,\delta$ clearly. True for matrices of order $n$. □

²Actually there exists an LU factorization if and only if G. elim. does not fail, but this is not obvious. It must be proved.

Corollary. A strictly diagonally dominant matrix has a unique LU factorization.

Definition. A matrix A is strictly diagonally dominant if
\[ |a_{ii}| > \sum_{\substack{j=1 \\ j\neq i}}^{n} |a_{ij}|, \qquad i = 1, 2, \ldots, n. \]

Proof of corollary:

matrix strictly diagonally dominant ⇒ principal submatrices strictly diagonally dominant,
strictly diag. dom. ⇒ nonsing. □

LU factorizations exist for rectangular matrices:

[Figure: A = LU for a wide rectangular A and for a tall rectangular A.]

Compact schemes

These do not require recording of intermediate results. Hence they are suitable for calculators.

• construction in LU theorem

• Doolittle’s method

In both cases one uses the $n^2$ equations $LU = A$ to solve for elements of $L$ and $U$. The order in which one determines the unknowns of these nonlinear equations is all important.

[Figure: the order in which the entries of L and U are computed, for the LU theorem construction and for Doolittle's method.]

The computations are identical to Gauss. elim., i.e., the results are identical even in finite prec. arithmetic. However, the order varies.

• Crout’s method: U is unit upper triangular; otherwise it is like Doolittle’s method.

Review questions 1. Given a row operation or a set of row operations, how does one construct a matrix R such that premultiplication by R is equivalent to the row operation(s)?

2. What is a Gauss transformation and how does one invert one?

3. In what sense does Gaussian elimination compute an LU factorization of a matrix?

4. What is a principal submatrix of a matrix and what is a leading principal submatrix of a matrix?

5. State a sufficient condition for the existence of an LU factorization as given in the LU Theorem. Under this condition is the factorization necessarily unique?

6. Define a strictly diagonally dominant matrix.

7. The entries of an LU factorization can be computed only for certain orderings. Give an example of a possible ordering.

Exercises

1. Show that for a strictly diagonally dominant matrix

(i) any principal submatrix also is,
(ii) it is nonsingular.

2. (Outer product form of LU factorization.) By considering $(1, n-1)$ by $(1, n-1)$ partitionings of the matrices involved, give a recursive algorithm for the LU factorization via Gaussian elimination (in the same spirit as the proof of the LU theorem).

3. Write a nonrecursive algorithm for the LU theorem method.

4. Write a nonrecursive algorithm for Doolittle’s method.

5. Given an array A containing a unit lower triangular matrix L and an upper triangular matrix U stored in the usual fashion, the following algorithm computes the product LU storing the result in an array B:

for (i = 1; i ≤ n; i++) {
    for (j = 1; j ≤ i − 1; j++) {
        bij = ai1 ∗ a1j;
        for (k = 2; k ≤ j; k++)
            bij = bij + aik ∗ akj;
    };
    for (j = i; j ≤ n; j++) {
        bij = aij;
        for (k = 1; k ≤ i − 1; k++)
            bij = bij + aik ∗ akj;
    };
}

How would you modify this algorithm if, instead of using $B$, the contents of $A$ are overwritten with the elements of the product $LU$? The modified algorithm should use no array storage other than the $n^2$ elements of the array $A$.

6. Assume that $A$ can be written as the product of a unit lower triangular matrix $L$ and an upper triangular matrix. Prove or disprove: $A = LDU_1$ where $D$ is diagonal and $U_1$ is unit upper triangular.

7. Gaussian elimination (without row interchanges) will successfully solve the linear system of equations $Ax = b$ if and only if (choose one)

(i) A is nonsingular.
(ii) all principal submatrices of A are nonsingular.
(iii) all leading principal submatrices of A are nonsingular.
(iv) all diagonal elements of A are nonzero.
(v) all square submatrices of A are nonsingular.

8. Find an example of a square matrix having an LU factorization and having a singular leading principal submatrix of order less than $n$ where $n$ is the dimension of the matrix.

9. The process of forward elimination applied to $Ly = b$ where $L$ is unit lower triangular can be expressed $y = M_{n-1}(\cdots(M_2(M_1b))\cdots)$ where $M_k$ is defined in the class notes. Give a very similar description for forward substitution in terms of elementary matrices $M_k'$.

10. Without doing any calculations explain why the following matrix does not have an LU factorization:
\[ \begin{pmatrix} 1 & 1 & 1 & 0 \\ 1 & 0 & 1 & 2 \\ 1 & 0 & 1 & 3 \\ 1 & 1 & 1 & 0 \end{pmatrix} \]

1.5 Symmetric Positive Definite Matrices

A symmetric matrix satisfies
\[ A^T = A \]
and a skew-symmetric matrix
\[ A^T = -A. \]
A symmetric matrix is said to be positive definite if
\[ x^TAx > 0 \quad\text{for all } x \neq 0. \]
E.g.,
\[ \begin{pmatrix} 1 & \frac12 & \frac13 \\ \frac12 & \frac13 & \frac14 \\ \frac13 & \frac14 & \frac15 \end{pmatrix}. \]
If $A = [a_{ij}]$ is symm. pos. def. then all principal submatrices are s.p.d. Hence, $a_{ii} > 0$. Also,
\[ \begin{pmatrix} a_{ii} & a_{ij} \\ a_{ij} & a_{jj} \end{pmatrix} \text{ is pos. def.} \;\Rightarrow\; a_{ii}a_{jj} > a_{ij}^2 \;\Rightarrow\; \max\{a_{ii}, a_{jj}\} > |a_{ij}| \;\Rightarrow\; \text{max elt of } A \text{ lies on the diagonal.} \]

Corollary of LU-theorem. A symmetric positive definite matrix has a unique LU factorization.

Proof of corollary:

matrix pos. def. ⇒ principal submatrices pos. def.,
pos. def. ⇒ nonsing. □

E.g.,
\[ \begin{pmatrix} 1 & \frac12 & \frac13 \\ \frac12 & \frac13 & \frac14 \\ \frac13 & \frac14 & \frac15 \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 \\ \frac12 & 1 & 0 \\ \frac13 & 1 & 1 \end{pmatrix} \begin{pmatrix} 1 & \frac12 & \frac13 \\ 0 & \frac1{12} & \frac1{12} \\ 0 & 0 & \frac1{180} \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 \\ \frac12 & 1 & 0 \\ \frac13 & 1 & 1 \end{pmatrix} \begin{pmatrix} 1 & & \\ & \frac1{12} & \\ & & \frac1{180} \end{pmatrix} \begin{pmatrix} 1 & 0 & 0 \\ \frac12 & 1 & 0 \\ \frac13 & 1 & 1 \end{pmatrix}^{T} \]

More generally, let an s.p.d. matrix $A$ have an LU factorization $LU$ and express $U = DU_1$ where $U_1$ is unit upper triangular. However, $A = A^T = U^TL^T = U_1^T(DL^T)$, and uniqueness of the LU factorization implies that $U_1^T = L$. We conclude that
\[ A = LDL^T \]
where $L$ is unit lower triangular and $D$ is diagonal. In addition, can we infer that $D$ is positive?
\[ 0 < (L^{-T}e_i)^TA(L^{-T}e_i) = e_i^TL^{-1}LDL^TL^{-T}e_i = d_{ii}. \]
Letting $D^{1/2} = \mathrm{diag}(\sqrt{d_{11}}, \sqrt{d_{22}}, \ldots, \sqrt{d_{nn}})$, we have
\[ A = LD^{1/2}D^{1/2}L^T = (LD^{1/2})(LD^{1/2})^T =: GG^T, \]
which we call the Cholesky factorization. The matrix $G$ is sometimes called a Cholesky triangle. In our example it is
\[ G = \begin{pmatrix} 1 & & \\ \frac12 & \frac{\sqrt3}{6} & \\ \frac13 & \frac{\sqrt3}{6} & \frac{\sqrt5}{30} \end{pmatrix}. \]

A symmetric matrix $A$ is positive definite if and only if $A$ has an LU factorization where the diagonal elements of $U$ are positive. By computing a Cholesky factorization, our hope is to reduce storage and computation by one half. You need store only the lower half of $A$, which you overwrite with $G$. For a symmetric matrix
\[ A = \begin{pmatrix} 1 & \frac12 & \frac13 \\ \frac12 & \frac13 & \frac14 \\ \frac13 & \frac14 & \frac15 \end{pmatrix} \]
you can use packed storage
\[ 1 \quad \tfrac12 \quad \tfrac13 \quad \tfrac13 \quad \tfrac14 \quad \tfrac15 \]
instead of unpacked storage
\[ \begin{matrix} 1 & \times & \times \\ \frac12 & \frac13 & \times \\ \frac13 & \frac14 & \frac15 \end{matrix} \]

To develop an algorithm, look at the $(i,j)$ element of $A = GG^T$, $j \le i$:
\[ a_{ij} = \sum_{k=1}^{j} g_{ik}g_{jk}. \]

[Figure: in A = GG^T, entry a_ij is the product of row i of G with row j of G, each sum running through column j.]

Use this equation to solve for $g_{ij}$, because it is the most "advanced" unknown element in the above equation. There are two cases:

case I, $j = i$:  $g_{ii} = \left( a_{ii} - \sum_{k=1}^{i-1} g_{ik}^2 \right)^{1/2}$;

case II, $j < i$:  $g_{ij} = \left( a_{ij} - \sum_{k=1}^{j-1} g_{ik}g_{jk} \right) \big/ g_{jj}$.

Note we can overwrite $g_{ij}$ onto $a_{ij}$.

ALGORITHM Cholesky Decomposition (row version)

for (i = 1; i ≤ n; i++) {
    for (j = 1; j < i; j++) {
        gij = aij;
        for (k = 1; k ≤ j − 1; k++)
            gij = gij − gik ∗ gjk;
        gij = gij / gjj;
    }
    gii = aii;
    for (k = 1; k ≤ i − 1; k++)
        gii = gii − gik ∗ gik;
    gii = sqrt(gii);
}

Operation count. Altogether there are $n$ square roots and
\[ \sum_{i=1}^{n}\left( \sum_{j=1}^{i-1}(j-1) + (i-1) \right) = \sum_{i=1}^{n} \frac12 i(i-1) \approx \frac{n^3}{6} \text{ mults.} \]
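A runnable C version of the row-oriented algorithm above (a sketch with 0-based indexing, mine, not code from the notes); G overwrites the lower triangle of A, and the strictly upper triangle is never referenced:

    #include <math.h>

    int cholesky(int n, double A[n][n]) {
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < i; j++) {        /* case II: g_ij, j < i */
                for (int k = 0; k < j; k++)
                    A[i][j] -= A[i][k] * A[j][k];
                A[i][j] /= A[j][j];
            }
            for (int k = 0; k < i; k++)          /* case I: g_ii */
                A[i][i] -= A[i][k] * A[i][k];
            if (A[i][i] <= 0.0) return -1;       /* not positive definite */
            A[i][i] = sqrt(A[i][i]);
        }
        return 0;
    }

The sign test on the diagonal doubles as a positive definiteness check, consistent with the characterization above.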

For symmetric indefinite matrices there are also algorithms using $\frac16 n^3$ mults.

For symmetric positive definite Toeplitz matrices
\[ T = \begin{pmatrix} r_0 & r_1 & r_2 & \cdots & r_{n-2} & r_{n-1} \\ r_{-1} & r_0 & r_1 & \ddots & & r_{n-2} \\ r_{-2} & r_{-1} & r_0 & \ddots & & \vdots \\ \vdots & & \ddots & \ddots & \ddots & r_2 \\ r_{2-n} & & & \ddots & \ddots & r_1 \\ r_{1-n} & r_{2-n} & \cdots & r_{-2} & r_{-1} & r_0 \end{pmatrix} \]
there are algorithms for $T^{-1}$ requiring $\frac74 n^2$ mults—note the exponent!

Review questions 1. What is a symmetric matrix? a skew symmetric matrix?

2. What is a symmetric positive definite matrix?

3. Which matrix properties discussed previously imply that a matrix is nonsingular?

4. Which matrix properties discussed previously are inherited by the principal submatri- ces of a matrix?

5. Which matrix properties discussed previously are sufficient for the existence of a unique LU factorization?

6. Which matrix properties discussed previously are sufficient for the existence of a unique Cholesky factorization?

7. The entries of a Cholesky factorization can be computed only for certain orderings. Give an example of a possible ordering. Give the equations for computing the entries in this order.

8. Neglecting lower order terms, what is the number of operations required for Cholesky factorization applied to a symmetric positive definite system of n equations in n un- knowns.

Exercises

1. Show that the inverse of a symmetric matrix is symmetric.

2. Show that the inverse of a positive definite matrix is positive definite. Hint: a $1 \times 1$ matrix is symmetric.

3. Modify the Cholesky decomposition algorithm above so that it computes the $LDL^T$ factorization with $L$ and $D$ stored in the obvious places.

4. Given a Cholesky factorization for a symmetric positive definite matrix A, what is the cost of computing A−1b?

5. Prove by induction that a symmetric positive definite matrix $A$ has a unique Cholesky factorization $GG^T$. Do not use the fact that the LU factorization is unique, but rather give a construction like that of the LU theorem. At some point in the proof you will need to know that a certain scalar quantity is positive. Simply insert the statement **to be shown later**.

6. Complete the preceding problem.

7. (Outer product form of Cholesky factorization) By considering (n − 1, 1) by (n − 1, 1) partitionings of the matrices involved, give a recursive algorithm for the Cholesky factorization GGT of a symmetric positive definite matrix A. Cf. exercise 4.2.

8. Rewrite the Cholesky decomposition algorithm so that it determines the elements of the Cholesky triangle $G$ of a matrix $A$ column by column (first column, second column, ..., last column).

9. Is symmetry inherited by a principal submatrix?

10. Show that for a positive definite matrix

(i) any principal submatrix also is,
(ii) it is nonsingular.

11. (advanced) Show that if $A = A^T$ is strictly diagonally dominant with positive diagonal elements, then $A$ is positive definite. Hint: $|x_ix_j| \le \frac12(x_i^2 + x_j^2)$.

12. Define $B = A^TA$ where $A$ is an $m \times n$ real matrix. Show that $B$ is positive definite if and only if $\mathrm{rank}(A) = n$.

13. If $\begin{pmatrix} A & B \\ C & D \end{pmatrix}$ is a symmetric partitioning of a symmetric matrix, what can we say about the blocks $A$, $B$, $C$, $D$?

1.5.13 Determine the Cholesky factorization for
\[ A = \begin{pmatrix} 4 & 2 & -4 \\ 2 & 2 & -2 \\ -4 & -2 & 5 \end{pmatrix} \]
by reducing the problem to that of finding the Cholesky decomposition of a 2 by 2 matrix, whose decomposition is as follows:
\[ \text{either}\quad \begin{pmatrix} 4 & 2 \\ 2 & 2 \end{pmatrix} = \begin{pmatrix} 2 & 0 \\ 1 & 1 \end{pmatrix}\begin{pmatrix} 2 & 1 \\ 0 & 1 \end{pmatrix} \quad\text{or}\quad \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}. \]

(Two decompositions are given because there are two reasonable ways of solving this problem.)

1.5.14 This question asks for the formulas needed to develop the algorithm for Cholesky decomposition $A = GG^T$ where only the elements on or below the diagonal of $A$ are stored. The starting point for these formulas is the $(i,j)$th element of the matrix equation $A = GG^T$:
\[ a_{ij} = \sum_{k=1}^{j} g_{ik}g_{jk}, \qquad j \le i. \]

Obtain from this equation explicit formulas for $g_{ii}$, $1 \le i \le n$, and for $g_{ij}$, $1 \le j < i \le n$.
