CS537

Numerical Analysis

Lecture 4

System of Linear Equations

Professor Jun Zhang

Department of Computer Science, University of Kentucky, Lexington, KY 40506-0633

February 20, 2019

System of Linear Equations

a x + a x ++ a x = b 11 1 + 12 2 + + 1n n = 1 a21 x1 a22 x2  a2n xn b2    + + + = an1 x1 an2 x2  ann xn bn where aij are coefficients, xi are unknowns, and bi are right-hand sides. Written in a compact form is n = =  aij x j bi , i 1,,n j=1 The system can also be written in a form Ax = b where the coefficient matrix is a11 a12  a1n  a a  a  A =  21 22 2n          an1 an2  ann  and x = [x , x ,, x ]T ,b = [b ,b ,b ]T 1 2 n 1 2 n 2 An Upper Triangular System

The solution of a general linear system is not easily available

The solution can be obtained easily from an upper triangular system

a_{11} x_1 + a_{12} x_2 + a_{13} x_3 + \cdots + a_{1n} x_n = b_1
             a_{22} x_2 + a_{23} x_3 + \cdots + a_{2n} x_n = b_2
                          a_{33} x_3 + \cdots + a_{3n} x_n = b_3
                                       \vdots
                     a_{n-1,n-1} x_{n-1} + a_{n-1,n} x_n = b_{n-1}
                                              a_{nn} x_n = b_n

Objective: Modify a general linear system into an upper triangular system

Three elementary operations on a linear system do not change its solution: interchanging two equations, multiplying an equation by a nonzero constant, and adding a multiple of one equation to another.

Carl Friedrich Gauss (April 30, 1777 – February 23, 1855), German mathematician and scientist

Gaussian Elimination

Linear systems are solved by Gaussian elimination, which repeatedly multiplies a row by a number and adds it to another row so as to eliminate a particular variable.

For a particular step, this amounts to

a_{ij} \leftarrow a_{ij} - \frac{a_{ik}}{a_{kk}} a_{kj}   (k \le j \le n)

b_i \leftarrow b_i - \frac{a_{ik}}{a_{kk}} b_k

After this step, the variable x_k is eliminated from the (k+1)th and later equations.

Gaussian elimination modifies the matrix into an upper triangular form, with a_{ij} = 0 for all i > j. The solution of the upper triangular system is then easily obtained by a back substitution procedure.
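As a concrete illustration (not part of the original slides), here is a minimal Python/NumPy sketch of the forward-elimination step without pivoting; the function name gaussian_eliminate and its interface are my own choices.

```python
import numpy as np

def gaussian_eliminate(A, b):
    """Reduce A x = b to an upper triangular system (no pivoting).

    A minimal sketch: it assumes every pivot A[k, k] stays nonzero.
    """
    A = A.astype(float).copy()
    b = b.astype(float).copy()
    n = len(b)
    for k in range(n - 1):            # pivot row k
        for i in range(k + 1, n):     # rows below the pivot
            m = A[i, k] / A[k, k]     # multiplier a_ik / a_kk
            A[i, k:] -= m * A[k, k:]  # a_ij <- a_ij - (a_ik/a_kk) a_kj
            b[i] -= m * b[k]          # b_i  <- b_i  - (a_ik/a_kk) b_k
    return A, b
```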

Illustration of Gaussian Elimination

Back Substitution

The obtained upper triangular system is

a_{11} x_1 + a_{12} x_2 + a_{13} x_3 + \cdots + a_{1n} x_n = b_1
             a_{22} x_2 + a_{23} x_3 + \cdots + a_{2n} x_n = b_2
                          a_{33} x_3 + \cdots + a_{3n} x_n = b_3
                                       \vdots
                     a_{n-1,n-1} x_{n-1} + a_{n-1,n} x_n = b_{n-1}
                                              a_{nn} x_n = b_n

From the last equation we can compute

x_n = \frac{b_n}{a_{nn}}

substitute its value into the other equations, and repeat the process:

x_i = \frac{1}{a_{ii}} \left( b_i - \sum_{j=i+1}^{n} a_{ij} x_j \right),   i = n-1, n-2, \ldots, 1
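A minimal Python/NumPy sketch of this back substitution (my own illustration; it pairs with the elimination sketch shown earlier):

```python
import numpy as np

def back_substitute(U, c):
    """Solve the upper triangular system U x = c by back substitution."""
    n = len(c)
    x = np.zeros(n)
    x[n - 1] = c[n - 1] / U[n - 1, n - 1]                 # x_n = b_n / a_nn
    for i in range(n - 2, -1, -1):                        # remaining rows, bottom up
        x[i] = (c[i] - U[i, i + 1:] @ x[i + 1:]) / U[i, i]
    return x

# Usage with the earlier sketch:
# U, c = gaussian_eliminate(A, b); x = back_substitute(U, c)
```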

Condition Number and Error

A quantity used to measure the quality of a matrix is called the condition number, defined as

\kappa(A) = \|A\| \, \|A^{-1}\|

The condition number measures how errors in the matrix A or in the right-hand side vector b are transferred to the solution. If A has a large condition number, a small error in A or b may yield a large error in the solution x = A^{-1}b. Such a matrix is called ill-conditioned.

The error e is defined as the difference between the exact solution and a computed solution, e = x - \tilde{x}. Since the exact solution is generally unknown, we measure the residual r = b - A\tilde{x} as an indicator of the size of the error.

Error Correction

If the error e could be found, the approximate solution could be corrected as

x = \tilde{x} + e

In practice we cannot afford to solve the residual equation A e = r exactly for the error (note that A e = A(x - \tilde{x}) = b - A\tilde{x} = r); it is as expensive as solving the original system.

In many modern iterative methods, we instead aim to solve an approximate equation of the form

B \hat{e} = r

where B is a "good" approximation to A, so that \hat{e} is a good approximation to the error e. We can then correct the approximate solution as

\hat{x} = \tilde{x} + \hat{e}

which is a better approximation than \tilde{x}, and we can repeat the process again.
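This correction loop can be written as a short sketch (my own illustration; approx_solve is a hypothetical routine standing in for the approximate solve with B, not something defined in the lecture):

```python
import numpy as np

def correct(A, b, x, approx_solve, steps=3):
    """Error-correction iteration: approximately solve for the error, then update x."""
    for _ in range(steps):
        r = b - A @ x        # residual r = b - A x~
        e = approx_solve(r)  # approximate error, from B e = r (assumed routine)
        x = x + e            # corrected solution x^ = x~ + e^
    return x
```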

Small Pivot

Consider the system

\epsilon x_1 + x_2 = 1
         x_1 + x_2 = 2

for some small \epsilon. After the first step of Gaussian elimination,

\epsilon x_1 + x_2 = 1
(1 - 1/\epsilon) x_2 = 2 - 1/\epsilon

We have

x_2 = \frac{2 - 1/\epsilon}{1 - 1/\epsilon}

x_1 = \frac{1 - x_2}{\epsilon}

For very small \epsilon, the computed result will be x_2 = 1 and x_1 = 0. The correct results are

x_1 = \frac{1}{1 - \epsilon} \approx 1

x_2 = \frac{1 - 2\epsilon}{1 - \epsilon} \approx 1
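A quick numerical check of this effect in double precision (my own illustration, not from the slides):

```python
# Naive elimination with the tiny pivot eps reproduces the wrong answer.
eps = 1e-20

x2 = (2 - 1/eps) / (1 - 1/eps)   # rounds to exactly 1.0 in floating point
x1 = (1 - x2) / eps              # then evaluates to 0.0
print(x1, x2)                    # 0.0 1.0  (both should be close to 1)

# Exact values for comparison
print(1 / (1 - eps), (1 - 2*eps) / (1 - eps))   # ~1.0 ~1.0
```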

Scaled Partial Pivoting

We need to choose as the pivot an element which is large relative to the other elements of its row.

Let L = (l_1, l_2, \ldots, l_n) be an index array of integers. We first compute an array of scale factors S = (s_1, s_2, \ldots, s_n), where

s_i = \max_{1 \le j \le n} |a_{ij}|   (1 \le i \le n)

The first pivot row i is chosen such that the ratio |a_{i1}| / s_i is the greatest. Suppose this index is l_1; then appropriate multiples of equation l_1 are subtracted from the other equations to eliminate x_1 from them.

Suppose initially L = (l_1, l_2, \ldots, l_n) = (1, 2, \ldots, n). If our first choice is l_j, we interchange l_j and l_1 in the index array rather than actually interchanging the first and the l_j-th rows, to avoid moving data around in memory.

Example

Straightforward Gaussian elimination does not work well (it is not robust) on

\epsilon x_1 + x_2 = 1
         x_1 + x_2 = 2

The scale factors are computed as S = (1, 1). In the first step, the array of ratios is (\epsilon, 1), so the 2nd row is the pivot row.

After eliminating x_1 from the 1st equation, we have

(1 - \epsilon) x_2 = 1 - 2\epsilon
     x_1 + x_2 = 2

It follows that

x_2 = \frac{1 - 2\epsilon}{1 - \epsilon} \approx 1

x_1 = 2 - x_2 \approx 1

We compute the correct results by using the scaled partial pivoting strategy.

Gaussian Elimination with Partial Pivoting
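A minimal Python/NumPy sketch of Gaussian elimination with scaled partial pivoting, using the index array L and the scale array S described above (function and variable names are my own choices; this is an illustrative sketch rather than the lecture's own code):

```python
import numpy as np

def solve_scaled_pivoting(A, b):
    """Forward elimination with scaled partial pivoting, then back substitution.

    Pivot rows are selected through the index array L; rows are never
    physically swapped in memory.
    """
    A = A.astype(float).copy()
    b = b.astype(float).copy()
    n = len(b)
    L = list(range(n))                 # index array (l_1, ..., l_n)
    s = np.abs(A).max(axis=1)          # scale factors s_i = max_j |a_ij|

    for k in range(n - 1):
        # choose the row whose ratio |a_{l_i,k}| / s_{l_i} is largest
        ratios = [abs(A[L[i], k]) / s[L[i]] for i in range(k, n)]
        j = k + int(np.argmax(ratios))
        L[k], L[j] = L[j], L[k]        # interchange indices, not rows
        p = L[k]
        for i in range(k + 1, n):
            r = L[i]
            m = A[r, k] / A[p, k]
            A[r, k:] -= m * A[p, k:]
            b[r] -= m * b[p]

    # back substitution, following the index array
    x = np.zeros(n)
    for i in range(n - 1, -1, -1):
        r = L[i]
        x[i] = (b[r] - A[r, i + 1:] @ x[i + 1:]) / A[r, i]
    return x

# The earlier small-pivot example is now handled correctly:
# solve_scaled_pivoting(np.array([[1e-20, 1.], [1., 1.]]), np.array([1., 2.]))
# -> approximately [1., 1.]
```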

Long Operation Count

We count the number of multiplications and divisions and ignore additions and subtractions.

In the 1st step, finding the pivot costs n divisions.

An additional n operations are needed to multiply the pivot row by a factor, for each of the n - 1 eliminations; this costs n(n - 1) operations. The total cost of this step is n^2 operations.

The computation is repeated on the remaining (n - 1) equations. The total cost of Gaussian elimination with scaled partial pivoting is

n^2 + (n-1)^2 + \cdots + 4^2 + 3^2 + 2^2 = \frac{n(n+1)(2n+1)}{6} - 1 \approx \frac{n^3}{3}

Back substitution costs n(n - 1)/2 operations.

Tridiagonal and Banded Systems

A banded system has a coefficient matrix such that a_{ij} = 0 if |i - j| \ge w. For a tridiagonal system, w = 2.

\begin{pmatrix}
d_1 & c_1 & & & & \\
a_1 & d_2 & c_2 & & & \\
 & a_2 & d_3 & c_3 & & \\
 & & \ddots & \ddots & \ddots & \\
 & & & a_{n-2} & d_{n-1} & c_{n-1} \\
 & & & & a_{n-1} & d_n
\end{pmatrix}
\begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ \vdots \\ x_{n-1} \\ x_n \end{pmatrix}
=
\begin{pmatrix} b_1 \\ b_2 \\ b_3 \\ \vdots \\ b_{n-1} \\ b_n \end{pmatrix}

General elimination procedure:

d_i \leftarrow d_i - \left(\frac{a_{i-1}}{d_{i-1}}\right) c_{i-1}

b_i \leftarrow b_i - \left(\frac{a_{i-1}}{d_{i-1}}\right) b_{i-1}

The array ci is not modified. No additional nonzero is created

The matrix can be stored in three vector arrays.

Tridiagonal Systems

The back substitution is straightforward:

x_n = \frac{b_n}{d_n}

x_i = \frac{b_i - c_i x_{i+1}}{d_i}   (i = n-1, \ldots, 1)

No pivoting is performed; otherwise the procedure would be quite different because of fill-in (the array c would be modified).
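A minimal Python sketch of this tridiagonal elimination and back substitution (my own illustration; the three arrays a, d, c match the notation above, and no pivoting is done, so nonzero eliminated diagonals are assumed, e.g. for diagonally dominant systems):

```python
import numpy as np

def solve_tridiagonal(a, d, c, b):
    """Solve a tridiagonal system stored in three arrays.

    a : subdiagonal   a_1 .. a_{n-1}
    d : main diagonal d_1 .. d_n
    c : superdiagonal c_1 .. c_{n-1}
    """
    d = np.array(d, dtype=float).copy()
    b = np.array(b, dtype=float).copy()
    n = len(d)
    for i in range(1, n):                        # elimination sweep
        m = a[i - 1] / d[i - 1]
        d[i] -= m * c[i - 1]                     # d_i <- d_i - (a_{i-1}/d_{i-1}) c_{i-1}
        b[i] -= m * b[i - 1]                     # b_i <- b_i - (a_{i-1}/d_{i-1}) b_{i-1}
    x = np.zeros(n)
    x[-1] = b[-1] / d[-1]                        # x_n = b_n / d_n
    for i in range(n - 2, -1, -1):
        x[i] = (b[i] - c[i] * x[i + 1]) / d[i]   # x_i = (b_i - c_i x_{i+1}) / d_i
    return x
```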

Diagonal dominance: a matrix A = (a_{ij})_{n \times n} is diagonally dominant if

|a_{ii}| > \sum_{j=1, j \ne i}^{n} |a_{ij}|   (1 \le i \le n)

For a diagonally dominant tridiagonal system, no pivoting is needed, i.e., no division by zero occurs.

We want to show that Gaussian elimination preserves this dominance of the diagonal; for the tridiagonal system the assumption reads |d_i| > |a_{i-1}| + |c_i|.

Tridiagonal Systems

The new coefficient matrix has 0 elements in the a_i positions. The new diagonal elements are determined recursively as

\hat{d}_1 = d_1

\hat{d}_i = d_i - \left(\frac{a_{i-1}}{\hat{d}_{i-1}}\right) c_{i-1}   (2 \le i \le n)

We assume that

|d_i| > |a_{i-1}| + |c_i|

and we want to show that

|\hat{d}_i| > |c_i|

We use induction to prove the inequality. It is obviously true for i = 1, since \hat{d}_1 = d_1 and the assumption gives |d_1| > |c_1|.

Tridiagonal Systems

If we assume that

|\hat{d}_{i-1}| > |c_{i-1}|

we can prove the inequality for index i:

|\hat{d}_i| = \left| d_i - \left(\frac{a_{i-1}}{\hat{d}_{i-1}}\right) c_{i-1} \right|
            \ge |d_i| - |a_{i-1}| \frac{|c_{i-1}|}{|\hat{d}_{i-1}|}
            > |a_{i-1}| + |c_i| - |a_{i-1}| = |c_i|

where the last step uses |c_{i-1}| / |\hat{d}_{i-1}| < 1 together with the assumption |d_i| > |a_{i-1}| + |c_i|.

It follows that the new diagonal entries will not be zero, so the Gaussian elimination procedure can be carried out without any problem.

Example of a pentadiagonal matrix: it is nearly tridiagonal, having nonzero entries only on the main diagonal and the first two diagonals above and below it.

LU Factorization 1

As we showed before, an n \times n system of linear equations can be written in matrix form as Ax = b, where the coefficient matrix A has the form

A = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{pmatrix}

x is the unknown vector and b is the known right-hand side vector.

We also assume A is of full rank, and most entries of A are not zero

LU Factorization 2

There are two special forms of matrices. One is (unit) lower triangular

L = \begin{pmatrix} 1 & 0 & \cdots & 0 \\ l_{21} & 1 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ l_{n1} & l_{n2} & \cdots & 1 \end{pmatrix}

The other is upper triangular

U = \begin{pmatrix} u_{11} & u_{12} & \cdots & u_{1n} \\ 0 & u_{22} & \cdots & u_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & u_{nn} \end{pmatrix}

We want to find a pair of L and U matrices such that A = LU.

Example

Take a system of linear equations

\begin{pmatrix} 6 & -2 & 2 & 4 \\ 12 & -8 & 6 & 10 \\ 3 & -13 & 9 & 3 \\ -6 & 4 & 1 & -18 \end{pmatrix}
\begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{pmatrix} =
\begin{pmatrix} 16 \\ 26 \\ -19 \\ -34 \end{pmatrix}

The Gaussian elimination process finally yields an upper triangular system

\begin{pmatrix} 6 & -2 & 2 & 4 \\ 0 & -4 & 2 & 2 \\ 0 & 0 & 2 & -5 \\ 0 & 0 & 0 & -3 \end{pmatrix}
\begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{pmatrix} =
\begin{pmatrix} 16 \\ -6 \\ -9 \\ -3 \end{pmatrix}

This could be achieved by multiplying the original system with a matrix M, such that MAx = Mb.

Example

We want the matrix M to be special, so that MA is upper triangular:

MA = \begin{pmatrix} 6 & -2 & 2 & 4 \\ 0 & -4 & 2 & 2 \\ 0 & 0 & 2 & -5 \\ 0 & 0 & 0 & -3 \end{pmatrix} = U

The question is: can we find such a matrix M? Look at the first step of the Gaussian elimination:

\begin{pmatrix} 6 & -2 & 2 & 4 \\ 0 & -4 & 2 & 2 \\ 0 & -12 & 8 & 1 \\ 0 & 2 & 3 & -14 \end{pmatrix}
\begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{pmatrix} =
\begin{pmatrix} 16 \\ -6 \\ -27 \\ -18 \end{pmatrix}

This step can be achieved by multiplying the original system with a lower triangular matrix M_1:

M_1 A x = M_1 b

Example

Here the lower triangular matrix M_1 is

M_1 = \begin{pmatrix} 1 & 0 & 0 & 0 \\ -2 & 1 & 0 & 0 \\ -\tfrac{1}{2} & 0 & 1 & 0 \\ 1 & 0 & 0 & 1 \end{pmatrix}

This matrix is nonsingular, because it is lower triangular with a main diagonal containing all 1's.

The nonzero off-diagonal elements in the first column are the negatives of the multipliers, placed in the positions where 0's were created in the original matrix:

M_1 A x = M_1 b

Example

If we continue to the second step of the Gaussian elimination, we have

\begin{pmatrix} 6 & -2 & 2 & 4 \\ 0 & -4 & 2 & 2 \\ 0 & 0 & 2 & -5 \\ 0 & 0 & 4 & -13 \end{pmatrix}
\begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{pmatrix} =
\begin{pmatrix} 16 \\ -6 \\ -9 \\ -21 \end{pmatrix}

This can be achieved with the multiplication of another lower triangular matrix

M_2 = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & -3 & 1 & 0 \\ 0 & \tfrac{1}{2} & 0 & 1 \end{pmatrix}

Thus, we have

M_2 M_1 A x = M_2 M_1 b

Example

The last step is

\begin{pmatrix} 6 & -2 & 2 & 4 \\ 0 & -4 & 2 & 2 \\ 0 & 0 & 2 & -5 \\ 0 & 0 & 0 & -3 \end{pmatrix}
\begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{pmatrix} =
\begin{pmatrix} 16 \\ -6 \\ -9 \\ -3 \end{pmatrix}

with another lower triangular matrix

M_3 = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & -2 & 1 \end{pmatrix}

And we have

M_3 M_2 M_1 A x = M_3 M_2 M_1 b

Example

If we define M = M_3 M_2 M_1, then M is lower triangular and MA is upper triangular.

Since M_3 M_2 M_1 A = U, and since the inverse of a lower triangular matrix is again lower triangular and the product of two lower triangular matrices is lower triangular, we have

A = M^{-1} U = M_1^{-1} M_2^{-1} M_3^{-1} U = LU

This shows that we have transformed a matrix A into the product of a lower triangular matrix and an upper triangular matrix

This process is called LU factorization or LU decomposition

Example

The lower triangular matrix is

L = M_1^{-1} M_2^{-1} M_3^{-1}

which in the current example is

L = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 2 & 1 & 0 & 0 \\ \tfrac{1}{2} & 3 & 1 & 0 \\ -1 & -\tfrac{1}{2} & 2 & 1 \end{pmatrix}

And we have

LU = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 2 & 1 & 0 & 0 \\ \tfrac{1}{2} & 3 & 1 & 0 \\ -1 & -\tfrac{1}{2} & 2 & 1 \end{pmatrix}
\begin{pmatrix} 6 & -2 & 2 & 4 \\ 0 & -4 & 2 & 2 \\ 0 & 0 & 2 & -5 \\ 0 & 0 & 0 & -3 \end{pmatrix} =
\begin{pmatrix} 6 & -2 & 2 & 4 \\ 12 & -8 & 6 & 10 \\ 3 & -13 & 9 & 3 \\ -6 & 4 & 1 & -18 \end{pmatrix} = A

Illustration of LU Factorization

Example

The original system Ax = b can be factored into LUx = b. The solution process can be separated into two phases. First, a forward elimination step with an intermediate variable z:

Lz = b

Second, a back substitution step:

Ux = z

The advantage of LU factorization (LU decomposition) is that several linear systems with the same coefficient matrix but different right-hand side vectors can be solved more efficiently: we only need to do the factorization once. The forward elimination and back substitution steps are O(n^2) operations.

Compute the Inverse

The system of linear equations Ax = b can be symbolically solved as x = A^{-1}b. But the inverse of A is seldom computed explicitly, because performing Gaussian elimination is less expensive.

If the explicit inverse of a matrix A is needed, it can be obtained by using the LU factorization of A. Note that AX = I, so we can solve a series of linear systems

LU [x^{(1)}, x^{(2)}, \ldots, x^{(n)}] = [I^{(1)}, I^{(2)}, \ldots, I^{(n)}]

and then

A^{-1} = [x^{(1)}, x^{(2)}, \ldots, x^{(n)}]
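A minimal Python/NumPy sketch of the two-phase solve and of the column-by-column inverse (my own illustration, assuming L and U are already available, e.g. from the example above; in practice one would call a library routine such as scipy.linalg.lu_factor / lu_solve):

```python
import numpy as np

def forward_substitute(L, b):
    """Solve L z = b for unit lower triangular L (forward elimination phase)."""
    n = len(b)
    z = np.zeros(n)
    for i in range(n):
        z[i] = b[i] - L[i, :i] @ z[:i]      # diagonal of L is 1
    return z

def back_substitute(U, z):
    """Solve U x = z for upper triangular U (back substitution phase)."""
    n = len(z)
    x = np.zeros(n)
    for i in range(n - 1, -1, -1):
        x[i] = (z[i] - U[i, i + 1:] @ x[i + 1:]) / U[i, i]
    return x

def lu_solve(L, U, b):
    """Solve A x = b given A = L U: first L z = b, then U x = z."""
    return back_substitute(U, forward_substitute(L, b))

def inverse_from_lu(L, U):
    """Columns of A^{-1} are the solutions of A x^(j) = I^(j)."""
    n = L.shape[0]
    I = np.eye(n)
    return np.column_stack([lu_solve(L, U, I[:, j]) for j in range(n)])
```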

Singular Value Decomposition (SVD)

The eigenvalues and eigenvectors of a matrix A satisfy

Ax = \lambda x

i.e., the application of A to the vector x is equivalent to a scaling. Here \lambda is an eigenvalue and x is the associated eigenvector.

The singular values of a matrix A are the nonnegative square roots of the eigenvalues of A^T A.

By the spectral theorem for symmetric matrices, A^T A can be diagonalized by an orthogonal matrix Q, as

A^T A = Q \Sigma Q^{-1}

where (orthogonality property)

Q^T Q = Q Q^T = I,  i.e.,  Q^T = Q^{-1}

The diagonal matrix \Sigma contains the eigenvalues of A^T A on its diagonal.

Singular Value Decomposition II

We can see that

A^T A Q = Q \Sigma

So the columns of Q are eigenvectors of A^T A.

If \lambda is an eigenvalue of A^T A and x is a corresponding eigenvector, then A^T A x = \lambda x, and

\|Ax\|^2 = (Ax)^T (Ax) = x^T A^T A x = \lambda x^T x = \lambda \|x\|^2

It follows that the eigenvalue \lambda is real and nonnegative.

Since Q is an orthogonal matrix, its columns form an orthonormal basis. They are unit eigenvectors of A^T A.

If v_j is the jth column of Q, we have

A^T A v_j = \lambda_j v_j

Singular Value Decomposition III

For a matrix A, the SVD is

A = U \Sigma V^T

where U and V are orthogonal matrices and \Sigma is a diagonal matrix containing the singular values.
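In NumPy the SVD is available directly; a small sketch (my own illustration, reusing the matrix from the LU example):

```python
import numpy as np

A = np.array([[ 6.,  -2., 2.,   4.],
              [12.,  -8., 6.,  10.],
              [ 3., -13., 9.,   3.],
              [-6.,   4., 1., -18.]])

U, s, Vt = np.linalg.svd(A)                   # A = U @ diag(s) @ Vt
print(s)                                      # singular values, largest first
print(np.allclose(A, U @ np.diag(s) @ Vt))    # True: reconstruction check
```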

SVD – Example

• A = U \Sigma V^T, an example. The rows of A are documents (the first four are "Eng" documents, the last three "Med" documents) and the columns are the terms data, inf., retrieval, brain, lung:

A = \begin{pmatrix} 1 & 1 & 1 & 0 & 0 \\ 2 & 2 & 2 & 0 & 0 \\ 1 & 1 & 1 & 0 & 0 \\ 5 & 5 & 5 & 0 & 0 \\ 0 & 0 & 0 & 2 & 2 \\ 0 & 0 & 0 & 3 & 3 \\ 0 & 0 & 0 & 1 & 1 \end{pmatrix}
= \begin{pmatrix} 0.18 & 0 \\ 0.36 & 0 \\ 0.18 & 0 \\ 0.90 & 0 \\ 0 & 0.53 \\ 0 & 0.80 \\ 0 & 0.27 \end{pmatrix}
\begin{pmatrix} 9.64 & 0 \\ 0 & 5.29 \end{pmatrix}
\begin{pmatrix} 0.58 & 0.58 & 0.58 & 0 & 0 \\ 0 & 0 & 0 & 0.71 & 0.71 \end{pmatrix}

The factors can be interpreted as topics: the two columns of U correspond to an "Eng" topic and a "Med" topic, so U is the document-to-topic similarity matrix; the diagonal entries of \Sigma give the strength of each topic (9.64 for the "Eng" topic, 5.29 for the "Med" topic); and V^T is the word-to-topic similarity matrix. (Example from the Faloutsos, Miller, Tsourakakis KDD'09 tutorial.)

Truncated SVD

For a matrix A, the truncated rank-k SVD is

A_k = U_k \Sigma_k V_k^T

where U_k and V_k consist of the first k columns of U and V, and \Sigma_k is the diagonal matrix of the k largest singular values.
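A small NumPy sketch of the rank-k truncation (my own illustration):

```python
import numpy as np

def truncated_svd(A, k):
    """Return the rank-k approximation A_k = U_k Sigma_k V_k^T."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# For the document-term example above, truncated_svd(A, 1) keeps only the
# dominant ("Eng") topic and zeros out the "Med" part of the data.
```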

SVD – Dimension Reduction

• Why is it called “dimension reduction”? Modified data: rank 1

A_1 = U_1 \Sigma_1 V_1^T:

\begin{pmatrix} 1 & 1 & 1 & 0 & 0 \\ 2 & 2 & 2 & 0 & 0 \\ 1 & 1 & 1 & 0 & 0 \\ 5 & 5 & 5 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \end{pmatrix}
\approx \begin{pmatrix} 0.18 \\ 0.36 \\ 0.18 \\ 0.90 \\ 0 \\ 0 \\ 0 \end{pmatrix}
\begin{pmatrix} 9.64 \end{pmatrix}
\begin{pmatrix} 0.58 & 0.58 & 0.58 & 0 & 0 \end{pmatrix}

Only the dominant ("Eng") topic is kept; the "Med" part of the data becomes zero.

Gene H. Golub (February 29, 1932 – November 16, 2007), American mathematician and computer scientist

Iterative Solution Methods

Obtain an approximate solution to a linear system iteratively:

a_{11} x_1 + a_{12} x_2 + a_{13} x_3 + \cdots + a_{1n} x_n = b_1
a_{21} x_1 + a_{22} x_2 + a_{23} x_3 + \cdots + a_{2n} x_n = b_2
\vdots
a_{n1} x_1 + a_{n2} x_2 + a_{n3} x_3 + \cdots + a_{nn} x_n = b_n

Create a single equation for each variable:

x_1 = \frac{b_1 - (a_{12} x_2 + a_{13} x_3 + \cdots + a_{1n} x_n)}{a_{11}}

x_2 = \frac{b_2 - (a_{21} x_1 + a_{23} x_3 + \cdots + a_{2n} x_n)}{a_{22}}

\vdots

x_n = \frac{b_n - (a_{n1} x_1 + a_{n2} x_2 + a_{n3} x_3 + \cdots + a_{n,n-1} x_{n-1})}{a_{nn}}

We assign an arbitrary value to each of the variables on the right-hand side to start the iteration.

Jacobi Iteration

We repeatedly use each set of computed values to form the next iterate, and hope the process will converge to the true solution. (We may stop early if necessary.) This iteration procedure is called the Jacobi iteration method. Convergence is not guaranteed for general matrices.

x_1^{(k)} = \frac{b_1 - (a_{12} x_2^{(k-1)} + a_{13} x_3^{(k-1)} + \cdots + a_{1n} x_n^{(k-1)})}{a_{11}}

x_2^{(k)} = \frac{b_2 - (a_{21} x_1^{(k-1)} + a_{23} x_3^{(k-1)} + \cdots + a_{2n} x_n^{(k-1)})}{a_{22}}

\vdots

x_n^{(k)} = \frac{b_n - (a_{n1} x_1^{(k-1)} + a_{n2} x_2^{(k-1)} + a_{n3} x_3^{(k-1)} + \cdots)}{a_{nn}}
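A minimal NumPy sketch of the Jacobi iteration (function name and stopping rule are my own choices):

```python
import numpy as np

def jacobi(A, b, x0=None, tol=1e-10, max_iter=500):
    """Jacobi iteration: every new component uses only old values."""
    n = len(b)
    x = np.zeros(n) if x0 is None else np.array(x0, dtype=float)
    D = np.diag(A)                    # diagonal entries a_ii
    R = A - np.diagflat(D)            # off-diagonal part of A
    for _ in range(max_iter):
        x_new = (b - R @ x) / D       # x_i^(k) = (b_i - sum_{j != i} a_ij x_j^(k-1)) / a_ii
        if np.linalg.norm(x_new - x, np.inf) < tol:
            return x_new
        x = x_new
    return x
```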

Carl Gustav Jacobi (December 10, 1804 – February 18, 1851), Prussian mathematician

Gauss-Seidel Iteration

If the iteration process converges, it may be faster if we use the new values as soon as they are available. This gives the Gauss-Seidel iteration method:

x_1^{(k)} = \frac{b_1 - (a_{12} x_2^{(k-1)} + a_{13} x_3^{(k-1)} + \cdots + a_{1n} x_n^{(k-1)})}{a_{11}}

x_2^{(k)} = \frac{b_2 - (a_{21} x_1^{(k)} + a_{23} x_3^{(k-1)} + \cdots + a_{2n} x_n^{(k-1)})}{a_{22}}

\vdots

x_n^{(k)} = \frac{b_n - (a_{n1} x_1^{(k)} + a_{n2} x_2^{(k)} + a_{n3} x_3^{(k)} + \cdots)}{a_{nn}}

In general, Gauss-Seidel converges faster than Jacobi, but the latter is more attractive to run on parallel computers. Convergence is not guaranteed in general.
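A matching NumPy sketch of Gauss-Seidel, in which each new component is used immediately (again my own minimal version):

```python
import numpy as np

def gauss_seidel(A, b, x0=None, tol=1e-10, max_iter=500):
    """Gauss-Seidel iteration: new values are used as soon as they are available."""
    n = len(b)
    x = np.zeros(n) if x0 is None else np.array(x0, dtype=float)
    for _ in range(max_iter):
        x_old = x.copy()
        for i in range(n):
            # already-updated components x[:i], not-yet-updated x_old[i+1:]
            s = A[i, :i] @ x[:i] + A[i, i + 1:] @ x_old[i + 1:]
            x[i] = (b[i] - s) / A[i, i]
        if np.linalg.norm(x - x_old, np.inf) < tol:
            break
    return x
```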

Philipp Ludwig von Seidel (October 23, 1821 – August 13, 1896), German mathematician

Seidel was a student of Jacobi

SOR Method

Around 1950, David Young at Harvard University found that the iteration can be accelerated by using a weighted average of the successive approximate solutions. Given some parameter 0 < \omega < 2, we perform

x_i^{(k)} = \omega \tilde{x}_i^{(k)} + (1 - \omega) x_i^{(k-1)},   i = 1, 2, \ldots, n

where \tilde{x}_i^{(k)} is the Gauss-Seidel value.

This step symbolizes the beginning of modern iterative methods
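SOR only adds this weighted average on top of the Gauss-Seidel update; a minimal NumPy sketch with a user-chosen omega (my own illustration):

```python
import numpy as np

def sor(A, b, omega=1.5, x0=None, tol=1e-10, max_iter=500):
    """SOR: Gauss-Seidel value blended with the previous iterate, 0 < omega < 2."""
    n = len(b)
    x = np.zeros(n) if x0 is None else np.array(x0, dtype=float)
    for _ in range(max_iter):
        x_old = x.copy()
        for i in range(n):
            s = A[i, :i] @ x[:i] + A[i, i + 1:] @ x_old[i + 1:]
            gs = (b[i] - s) / A[i, i]               # Gauss-Seidel value x~_i^(k)
            x[i] = omega * gs + (1 - omega) * x_old[i]
        if np.linalg.norm(x - x_old, np.inf) < tol:
            break
    return x
```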

With an arbitrary initial guess, convergence of these basic iterative methods is guaranteed if either

1.) The matrix is strictly diagonally dominant; or

2.) The matrix is symmetric positive definite, i.e., A = A^T and x^T A x > 0 for all x \ne 0.

David M. Young, Jr. (1923 – 2008), American mathematician and scientist

Splitting the Coefficient Matrix

The system of linear equations is Ax = b. We can split the coefficient matrix A as

A = M - N

such that M is nonsingular and M^{-1} is easy to compute. So we have

(M - N)x = b   and   Mx = Nx + b

So

x = M^{-1}(Nx + b) = M^{-1} N x + M^{-1} b

We can use this relationship for an iteration scheme:

x^{(k)} = M^{-1} N x^{(k-1)} + M^{-1} b

Iteration Matrix

The matrix M-1N is called the iteration matrix

For the iteration process to be inexpensive to carry out, M must be easy to invert:

x^{(k)} = M^{-1} N x^{(k-1)} + M^{-1} b

Since N = M - A, we have

x^{(k)} = M^{-1}(M - A) x^{(k-1)} + M^{-1} b = (I - M^{-1} A) x^{(k-1)} + M^{-1} b

If x^{(k)} converges to the solution x, then

x = (I - M^{-1} A) x + M^{-1} b

It follows that Ax = b.

Convergence

We define the error vector at each iteration step as e^{(k)} = x^{(k)} - x. It follows that

e^{(k)} = x^{(k)} - x
        = (I - M^{-1} A) x^{(k-1)} + M^{-1} b - x
        = (I - M^{-1} A) x^{(k-1)} - (I - M^{-1} A) x
        = (I - M^{-1} A)(x^{(k-1)} - x)
        = (I - M^{-1} A) e^{(k-1)}

If we want the error to become smaller and smaller, we want the iteration matrix I - M^{-1} A to be small, in some sense.

Iteration Matrix

So we want the matrix M^{-1} A to be close to the identity matrix. That is to say, we want M to be close to A.

In fact, if all eigenvalues of the iteration matrix I - M^{-1} A are smaller than one in magnitude, then the iteration e^{(k)} = (I - M^{-1} A) e^{(k-1)} converges to zero and the original iteration converges. This requires

\rho(I - M^{-1} A) < 1

i.e., a spectral radius of the iteration matrix smaller than one guarantees the convergence of the induced iteration method.
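The condition \rho(I - M^{-1} A) < 1 can be checked numerically; a small sketch for the Jacobi-type choice M = D (my own illustration, with a made-up diagonally dominant matrix):

```python
import numpy as np

def spectral_radius(T):
    """Largest eigenvalue magnitude of the iteration matrix T."""
    return max(abs(np.linalg.eigvals(T)))

A = np.array([[4., 1., 1.],
              [1., 5., 2.],
              [1., 2., 6.]])            # strictly diagonally dominant
M = np.diag(np.diag(A))                 # Jacobi choice: M = D
T = np.eye(3) - np.linalg.solve(M, A)   # iteration matrix I - M^{-1} A
print(spectral_radius(T))               # < 1, so the induced iteration converges
```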

Stationary Iterative Methods

Write A = D - L - U, where D is the diagonal of A and -L and -U are the strictly lower and strictly upper triangular parts of A.

For the Jacobi iteration, the splitting is A = D - (L + U).

For the Gauss-Seidel method, A = (D - L) - U.

For the SOR method, the iteration is

(D - \omega L) x^{(k)} = [\omega U + (1 - \omega) D] x^{(k-1)} + \omega b

These iterative methods are called stationary methods, or classic iterative methods. They converge slowly.

Modern iterative methods, such as the multigrid method and Krylov subspace methods, converge much more rapidly. Those methods will be studied in CS623.
