Solving Systems of Linear Equations

● Motivation
● The idea behind Google's PageRank
● An example from economics
● Gaussian elimination
● LU Decomposition
● Iterative Methods
– The Jacobi method
● Summary

Motivation

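● Systems of linear equations of the form $A\vec{x}=\vec{b}$, or in components

\[
\begin{aligned}
a_{11}x_1 + a_{12}x_2 + \dots + a_{1n}x_n &= b_1\\
a_{21}x_1 + a_{22}x_2 + \dots + a_{2n}x_n &= b_2\\
&\;\;\vdots\\
a_{m1}x_1 + a_{m2}x_2 + \dots + a_{mn}x_n &= b_m
\end{aligned}
\qquad
A=\begin{pmatrix}
a_{11} & a_{12} & \dots & a_{1n}\\
a_{21} & a_{22} & \dots & a_{2n}\\
\vdots & \vdots & & \vdots\\
a_{m1} & a_{m2} & \dots & a_{mn}
\end{pmatrix},\quad
\vec{b}=\begin{pmatrix} b_1\\ b_2\\ \vdots\\ b_m \end{pmatrix}
\]

are found in many contexts. For instance, we need them to find roots of systems of non-linear equations using the Newton method.

Short Reminder (1)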

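● Matrix-vector multiplication

\[
A\vec{x}=
\begin{bmatrix} a_{11} & a_{12} & a_{13}\\ a_{21} & a_{22} & a_{23}\\ a_{31} & a_{32} & a_{33} \end{bmatrix}
\begin{bmatrix} x_1\\ x_2\\ x_3 \end{bmatrix}
=
\begin{bmatrix} a_{11}x_1+a_{12}x_2+a_{13}x_3\\ a_{21}x_1+a_{22}x_2+a_{23}x_3\\ a_{31}x_1+a_{32}x_2+a_{33}x_3 \end{bmatrix}
\]

● Matrix-matrix multiplication

\[
\begin{bmatrix} a_{11} & a_{12} & a_{13}\\ a_{21} & a_{22} & a_{23}\\ a_{31} & a_{32} & a_{33} \end{bmatrix}
\begin{bmatrix} b_{11} & b_{12} & b_{13}\\ b_{21} & b_{22} & b_{23}\\ b_{31} & b_{32} & b_{33} \end{bmatrix}
=
\Big[\,\textstyle\sum_k a_{ik}b_{kj}\,\Big]_{ij}
\]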
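The two products above can be written out in a few lines of plain Python. This is only a sketch: the function names `mat_vec` and `mat_mul` and the small test matrices are made up for illustration, and matrices are assumed to be stored as lists of rows.

```python
def mat_vec(A, x):
    """Return the product A x for a matrix A (list of rows) and a vector x."""
    return [sum(a_ik * x_k for a_ik, x_k in zip(row, x)) for row in A]


def mat_mul(A, B):
    """Return the product AB, with (AB)[i][j] = sum_k A[i][k] * B[k][j]."""
    cols_of_B = list(zip(*B))  # transpose B so that its columns are easy to access
    return [[sum(a * b for a, b in zip(row, col)) for col in cols_of_B] for row in A]


A = [[1, 2], [3, 4]]
B = [[0, 1], [1, 0]]
print(mat_vec(A, [1, 1]))  # [3, 7]
print(mat_mul(A, B))       # [[2, 1], [4, 3]]
```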

Short Reminder (2)

● Graphical illustration of 2d systems
● Every equation represents a line

[Figure: two panels, "non-singular" and "singular"]

● The following situations are possible
– Lines cross at exactly one point -> unique solution (non-singular)
– Lines are parallel (singular) and don't cross -> no solution
– Lines overlap completely -> infinitely many solutions
● In higher dimensions the same situations are possible

Short Reminder (3)

● Focus on square matrices

● Some criteria that an n×n matrix A is non-singular:
– The inverse of A (denoted $A^{-1}$) exists
– $\det(A) \neq 0$
– $\mathrm{rank}(A) = n$
– For any vector $z \neq 0$ it holds that $Az \neq 0$
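As a small illustration of the determinant criterion, the sketch below computes det(A) by cofactor expansion along the first row. The choice of cofactor expansion is an assumption made for brevity (its cost grows factorially with n, so it is only sensible for tiny matrices), and the two 2×2 matrices are made-up examples of crossing and parallel lines.

```python
def det(A):
    """Determinant by cofactor expansion along the first row (small n only)."""
    n = len(A)
    if n == 1:
        return A[0][0]
    total = 0
    for j in range(n):
        minor = [row[:j] + row[j + 1:] for row in A[1:]]  # drop row 0 and column j
        total += (-1) ** j * A[0][j] * det(minor)
    return total


crossing = [[1, 1], [1, -1]]  # x + y = ..., x - y = ...: the lines cross once
parallel = [[1, 2], [2, 4]]   # second row is twice the first: parallel or identical lines
print(det(crossing))  # -2 -> non-singular
print(det(parallel))  # 0  -> singular
```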

Example: How Google Search Works (sort of)

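● In a particular group of people, who is the most popular?
● Say there are three people: Jane, Charlie, Fred
● We ask them to list their friends
● J lists C; C lists J and F; F lists J
● Could write this as a matrix (column = person who lists, row = person listed):

\[
\begin{array}{c|ccc}
 & J & C & F\\ \hline
J & 0 & 1 & 1\\
C & 1 & 0 & 0\\
F & 0 & 1 & 0\\ \hline
\#\text{total} & 1 & 2 & 1
\end{array}
\]

Example (2)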

● Some people list more friends than others. To compensate we normalise each column by its sum, which gives the “linking matrix” (rows and columns ordered J, C, F):

\[
L=\begin{pmatrix} 0 & 1/2 & 1\\ 1 & 0 & 0\\ 0 & 1/2 & 0 \end{pmatrix}
\]
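The column normalisation can be sketched as follows. The dictionary `friends` encodes the friend lists of the example; the function name `linking_matrix` and the use of exact fractions are assumptions made for this illustration.

```python
from fractions import Fraction

# Friend lists from the example: who lists whom.
friends = {"J": ["C"], "C": ["J", "F"], "F": ["J"]}
people = ["J", "C", "F"]


def linking_matrix(friends, people):
    """Column q holds 1/len(friends[q]) in the row of every person that q lists."""
    index = {name: i for i, name in enumerate(people)}
    L = [[Fraction(0)] * len(people) for _ in people]
    for lister, listed in friends.items():
        for name in listed:
            L[index[name]][index[lister]] = Fraction(1, len(listed))
    return L


L = linking_matrix(friends, people)
for name, row in zip(people, L):
    print(name, [str(entry) for entry in row])
```

Every column of the result sums to 1, which is what the normalisation is meant to achieve.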

● We now want to find a vector p=(j,c,f) which assigns a popularity to each person
● Idea: a person's popularity is the weighted sum of the popularities of the people who list them → p=Lp

Example (3)

[Figure: graph of the three people, with labels 1, 1/2, 1/2 and 1/3, 1/3, 1/3]

A node (person) is more popular the
● more people list it as a friend
● fewer friends these people list
● more popular these friends are (i.e. having one very popular good friend can make you more popular than having 10 not so popular friends)

Example (4)

● This defines a system of linear equations!

\[
L=\begin{pmatrix} 0 & 1/2 & 1\\ 1 & 0 & 0\\ 0 & 1/2 & 0 \end{pmatrix}
\quad (\text{rows and columns ordered } J, C, F)
\]

● E.g.: j = 1/2 c + f, c = j, f = 1/2 c, or Lp = p
● In this case this is easy to solve
● System is under-determined: e.g. set j=1 as a scale -> c=1, f=1/2, i.e.
● Charlie and Jane are most popular, Fred is less so.

Remark

● We could use this to assign a popularity to web-pages
● The friend lists from before would then just refer to the lists of links in web-pages, i.e. how many web-pages link to a given page (normalised)?
● The popularity p could be interpreted as page rank (or we might wish to add contextual information (c) for search queries and not base them purely on popularity, e.g. p=Lp + c)
● We might wonder if the system Lp=p always has positive solutions p
● -> This is ensured by the Frobenius-Perron theorem (for non-negative, irreducible L)

Example: Setting Prices in an Economy

● This is a problem that goes back to Wassily Leontief (who got the Nobel prize for economics in 1973)

● Suppose in a far away land of Eigenbazistan, in a small country town called Matrixville, there lived a Farmer, a Tailor, a Carpenter, a Coal Miner and Slacker Bob. The Farmer produced food; the Tailor, clothes; the Carpenter, housing; the Coal Miner supplied energy; and Slacker Bob made High Quality 100% Proof Moonshine, half of which he drank himself.

● Let's assume that:

– No outside supply or demand
– Everything produced is consumed.
– Need to specify what fraction of each good is consumed by each person
● Problem: What are the incomes of each person?

Consumption

              Food   Clothes   Housing   Energy   High Quality 100% Proof Moonshine
Farmer        0.25   0.15      0.25      0.18     0.20
Tailor        0.15   0.28      0.18      0.17     0.05
Carpenter     0.22   0.19      0.22      0.22     0.10
Coal Miner    0.20   0.15      0.20      0.28     0.15
Slacker Bob   0.18   0.23      0.15      0.15     0.50

● I.e. the farmer consumes 25% of all food, 15% of all clothes, 25% of all housing, 18% of all energy and 20% of all moonshine

● Let's say price levels are $p_F, p_T, p_{CA}, p_{CO}, p_S$
● Since this is a closed economy, we require, e.g. for the farmer:

\[
p_F = 0.25\,p_F + 0.15\,p_T + 0.25\,p_{CA} + 0.18\,p_{CO} + 0.20\,p_S
\]

Again ... a System of Linear Equations

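\[
\begin{aligned}
p_F &= 0.25\,p_F + 0.15\,p_T + 0.25\,p_{CA} + 0.18\,p_{CO} + 0.20\,p_S\\
p_T &= 0.15\,p_F + 0.28\,p_T + 0.18\,p_{CA} + 0.17\,p_{CO} + 0.05\,p_S\\
p_{CA} &= 0.22\,p_F + 0.19\,p_T + 0.22\,p_{CA} + 0.22\,p_{CO} + 0.10\,p_S\\
p_{CO} &= 0.20\,p_F + 0.15\,p_T + 0.20\,p_{CA} + 0.28\,p_{CO} + 0.15\,p_S\\
p_S &= 0.18\,p_F + 0.23\,p_T + 0.15\,p_{CA} + 0.15\,p_{CO} + 0.50\,p_S
\end{aligned}
\]

● Again this is a system of the type Ap=p (i.e. under-determined, can set one price arbitrarily)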

● Not so easy to solve by hand ... computer methods would be useful!
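As a first taste of a computer method, the sketch below simply applies p -> Ap over and over, starting from equal prices, and then rescales so that the moonshine price is 1. This repeated substitution is an assumption made for illustration, not the method developed in the following slides. It relies on every column of the consumption table summing to 1 and on all entries being positive. The names `apply` and `solve_prices` are made up.

```python
# Consumption matrix from the table: row = consumer, column = good.
A = [
    [0.25, 0.15, 0.25, 0.18, 0.20],  # Farmer
    [0.15, 0.28, 0.18, 0.17, 0.05],  # Tailor
    [0.22, 0.19, 0.22, 0.22, 0.10],  # Carpenter
    [0.20, 0.15, 0.20, 0.28, 0.15],  # Coal Miner
    [0.18, 0.23, 0.15, 0.15, 0.50],  # Slacker Bob
]


def apply(A, p):
    """One step p -> A p."""
    return [sum(a * x for a, x in zip(row, p)) for row in A]


def solve_prices(A, steps=200):
    """Iterate p -> A p from equal prices, then rescale so the last price is 1."""
    p = [1.0] * len(A)
    for _ in range(steps):
        p = apply(A, p)
    return [x / p[-1] for x in p]


prices = solve_prices(A)
# Largest violation of A p = p; should be at rounding-error level.
residual = max(abs(x - y) for x, y in zip(apply(A, prices), prices))
print(prices, residual)
```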

Gaussian Elimination

● One idea to “mechanise” solving such systems is Gaussian elimination

● Consider the system:

\[
\begin{aligned}
2x + y - z &= 8\\
-3x - y + 2z &= -11\\
-2x + y + 2z &= -3
\end{aligned}
\]

● Observations:
– Can multiply equations by non-zero constants, swap equations, or add or subtract equations without changing the solution
– Apply such operations to transform the system into a convenient form?
● What is a convenient form? -> triangular

Gaussian Elimination (2)

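● Triangular form:

\[
\begin{aligned}
2x + y - z &= 8 &\quad (1)\\
\tfrac{1}{3}y + \tfrac{1}{3}z &= \tfrac{2}{3} &\quad (2)\\
-z &= 1 &\quad (3)
\end{aligned}
\]

● If our system had the above form we could just solve it by successive substitution, i.e.

\[
\begin{aligned}
(3):&\quad z = -1 &&\text{(substitute into (2))}\\
(2):&\quad \tfrac{1}{3}y - \tfrac{1}{3} = \tfrac{2}{3} \;\to\; y = 3 &&\text{(substitute into (1))}\\
(1):&\quad 2x + 3 - (-1) = 8 \;\to\; x = 2
\end{aligned}
\]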
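The successive substitution above can be sketched for a general upper triangular system as follows. The function name `back_substitute` is made up, and exact fractions are an assumption chosen so that the result matches the hand calculation digit for digit.

```python
from fractions import Fraction


def back_substitute(U, b):
    """Solve U x = b for an upper triangular U with non-zero diagonal."""
    n = len(b)
    x = [Fraction(0)] * n
    for i in range(n - 1, -1, -1):  # last equation first
        known = sum(U[i][j] * x[j] for j in range(i + 1, n))
        x[i] = Fraction(b[i] - known) / U[i][i]
    return x


third = Fraction(1, 3)
U = [[2, 1, -1], [0, third, third], [0, 0, -1]]  # equations (1), (2), (3)
b = [8, 2 * third, 1]
print([str(v) for v in back_substitute(U, b)])  # ['2', '3', '-1']
```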

How to Transform into Triangular Form?

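● Start with:

\[
\begin{aligned}
2x + y - z &= 8 &\quad (1)\\
-3x - y + 2z &= -11 &\quad (2)\\
-2x + y + 2z &= -3 &\quad (3)
\end{aligned}
\]

● Pivot (1). Multiply (2) by 2/3 and add (1); the result replaces (2):

\[
\begin{aligned}
2x + y - z &= 8 &\quad (1)\\
\tfrac{y}{3} + \tfrac{z}{3} &= \tfrac{2}{3} &\quad (2')\\
-2x + y + 2z &= -3 &\quad (3)
\end{aligned}
\]

● Multiply (3) by 1 and add (1); the result replaces (3):

\[
\begin{aligned}
2x + y - z &= 8 &\quad (1)\\
\tfrac{y}{3} + \tfrac{z}{3} &= \tfrac{2}{3} &\quad (2')\\
2y + z &= 5 &\quad (3')
\end{aligned}
\]

How to Transform into Triangular Form? (2)

● Now multiply (2') by -6 and add to (3'):

\[
\begin{aligned}
2x + y - z &= 8 &\quad (1)\\
\tfrac{y}{3} + \tfrac{z}{3} &= \tfrac{2}{3} &\quad (2')\\
-z &= 1 &\quad (3'')
\end{aligned}
\]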

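● Matrix is now in triangular form. Can diagonalise it as follows:

Diagonalising ...

● Multiply (2') by 3 and add (3'') to obtain (2''); multiply (3'') by -1 and add to (1):

\[
\begin{aligned}
2x + y &= 7 &\quad (1)\\
y &= 3 &\quad (2'')\\
-z &= 1 &\quad (3'')
\end{aligned}
\]

● And finally: multiply (2'') by -1 and add to (1):

\[
\begin{aligned}
2x &= 4 &\quad (1')\\
y &= 3 &\quad (2'')\\
-z &= 1 &\quad (3'')
\end{aligned}
\]

● Finally, scale coefficients: x = 2, y = 3, z = -1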

Gaussian Elimination

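● Usually, this is done using only the coefficient scheme, the “augmented” matrix
● Can also be used to invert matrices: just append the identity matrix and perform all operations on it as well, while reducing the coefficient matrix to diagonal form

\[
\left[\begin{array}{rrr|rrr}
2 & 1 & -1 & 1 & 0 & 0\\
-3 & -1 & 2 & 0 & 1 & 0\\
-2 & 1 & 2 & 0 & 0 & 1
\end{array}\right]
\]

(left block: coefficient matrix, right block: identity matrix)

Example ...

\[
\left[\begin{array}{rrr|rrr}
2 & 1 & -1 & 1 & 0 & 0\\
-3 & -1 & 2 & 0 & 1 & 0\\
-2 & 1 & 2 & 0 & 0 & 1
\end{array}\right]
\]

\[
\left[\begin{array}{rrr|rrr}
2 & 1 & -1 & 1 & 0 & 0\\
0 & 1/2 & 1/2 & 3/2 & 1 & 0\\
0 & 2 & 1 & 1 & 0 & 1
\end{array}\right]
\begin{array}{l} \\ 3/2\cdot(1)+(2)\\ (1)+(3) \end{array}
\]

\[
\left[\begin{array}{rrr|rrr}
2 & 1 & -1 & 1 & 0 & 0\\
0 & 1/2 & 1/2 & 3/2 & 1 & 0\\
0 & 0 & -1 & -5 & -4 & 1
\end{array}\right]
\begin{array}{l} \\ \\ -4\cdot(2)+(3) \end{array}
\]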

Example (2)

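\[
\left[\begin{array}{rrr|rrr}
2 & 1 & -1 & 1 & 0 & 0\\
0 & 1/2 & 1/2 & 3/2 & 1 & 0\\
0 & 0 & -1 & -5 & -4 & 1
\end{array}\right]
\]

\[
\left[\begin{array}{rrr|rrr}
2 & 1 & 0 & 6 & 4 & -1\\
0 & 1 & 0 & -2 & -2 & 1\\
0 & 0 & -1 & -5 & -4 & 1
\end{array}\right]
\begin{array}{l} (1)-(3)\\ 2\cdot(2)+(3)\\ \ \end{array}
\]

\[
\left[\begin{array}{rrr|rrr}
2 & 0 & 0 & 8 & 6 & -2\\
0 & 1 & 0 & -2 & -2 & 1\\
0 & 0 & -1 & -5 & -4 & 1
\end{array}\right]
\begin{array}{l} (1)-(2)\\ \ \\ \ \end{array}
\]

\[
\left[\begin{array}{rrr|rrr}
1 & 0 & 0 & 4 & 3 & -1\\
0 & 1 & 0 & -2 & -2 & 1\\
0 & 0 & 1 & 5 & 4 & -1
\end{array}\right]
\begin{array}{l} 1/2\cdot(1)\\ 1\cdot(2)\\ -1\cdot(3) \end{array}
\]

The right block is $A^{-1}$.

Example (3)

● OK, we found $A^{-1}$, how to check if this is correct?
● Suppose we now have $A^{-1}$ … how is this useful to find x?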
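One way to answer both questions is to multiply things out: $AA^{-1}$ should give the identity, and $\vec{x}=A^{-1}\vec{b}$ should reproduce the solution found earlier. A minimal sketch, with the entries of `A_inv` copied from the right block of the final scheme and the right-hand side `b` taken from the Gaussian elimination example (the helper `mat_mul` is made up):

```python
def mat_mul(A, B):
    """Matrix product with (AB)[i][j] = sum_k A[i][k] * B[k][j]."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


A = [[2, 1, -1], [-3, -1, 2], [-2, 1, 2]]
A_inv = [[4, 3, -1], [-2, -2, 1], [5, 4, -1]]  # right block of the final scheme
b = [8, -11, -3]

identity = [[1 if i == j else 0 for j in range(3)] for i in range(3)]
print(mat_mul(A, A_inv) == identity)  # True

# x = A^{-1} b, computed row by row.
x = [sum(a * v for a, v in zip(row, b)) for row in A_inv]
print(x)  # [2, 3, -1]
```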

Some Remarks

● Pivot may be zero (or otherwise “inconvenient”) -> reorder rows
● Possibly divide by very small numbers (if pivot is small), better reorder to use the largest possible pivot
● Computational complexity is of order $O(n^3)$ (roughly: n-1 rows need to be dealt with, this process involves order n operations and needs to be repeated less than n times)

Pseudocode

for k = 1 ... min(m,n):
    Find the k-th pivot:
    i_max := argmax (i = k ... m, abs(A[i, k]))
    if A[i_max, k] = 0
        error "Matrix is singular!"
    swap rows(k, i_max)
    Do for all rows below pivot:
    for i = k + 1 ... m:
        Do for all remaining elements in current row:
        for j = k + 1 ... n:
            A[i, j] := A[i, j] - A[k, j] * (A[i, k] / A[k, k])
        Fill lower triangular matrix with zeros:
        A[i, k] := 0
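A direct Python transcription of the pseudocode might look as follows. It is a sketch under two assumptions: indices are 0-based instead of 1-based, and exact fractions are used so that the zero test on the pivot is not disturbed by rounding. The function name `to_echelon` is made up, and the input is the augmented matrix [A | b] of the example system.

```python
from fractions import Fraction


def to_echelon(A):
    """Bring A (list of rows, modified in place) into row echelon form."""
    m, n = len(A), len(A[0])
    for k in range(min(m, n)):
        # Find the k-th pivot: the row at or below k with the largest |A[i][k]|.
        i_max = max(range(k, m), key=lambda i: abs(A[i][k]))
        if A[i_max][k] == 0:
            raise ValueError("Matrix is singular!")
        A[k], A[i_max] = A[i_max], A[k]
        for i in range(k + 1, m):          # all rows below the pivot
            factor = A[i][k] / A[k][k]
            for j in range(k + 1, n):      # remaining elements in the current row
                A[i][j] -= A[k][j] * factor
            A[i][k] = 0                    # entry below the pivot is now zero
    return A


# Augmented matrix [A | b] of the example system, with exact fractions.
aug = [[Fraction(v) for v in row] for row in
       [[2, 1, -1, 8], [-3, -1, 2, -11], [-2, 1, 2, -3]]]
for row in to_echelon(aug):
    print([str(v) for v in row])
```

Because of the row swaps, the intermediate rows differ from the hand calculation, but the resulting triangular system has the same solution.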

Turns an m×n matrix into “echelon” form, which can then be solved by back-substitution

LU Decomposition

● A slightly different strategy is LU factorization

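● Write A as a product of two triangular matrices, one lower (L) and one upper (U) triangular: A=LU

\[
\begin{bmatrix} a_{11} & a_{12} & a_{13}\\ a_{21} & a_{22} & a_{23}\\ a_{31} & a_{32} & a_{33} \end{bmatrix}
=
\begin{bmatrix} l_{11} & 0 & 0\\ l_{21} & l_{22} & 0\\ l_{31} & l_{32} & l_{33} \end{bmatrix}
\begin{bmatrix} u_{11} & u_{12} & u_{13}\\ 0 & u_{22} & u_{23}\\ 0 & 0 & u_{33} \end{bmatrix}
\]

● How does this help to solve Ax=b? Ax = LUx = b
● Can first solve Ly=b and then Ux=y

LU Decomposition (2)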

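● Ly=b is “easy” to solve because

\[
\begin{bmatrix} l_{11} & 0 & 0\\ l_{21} & l_{22} & 0\\ l_{31} & l_{32} & l_{33} \end{bmatrix}
\begin{bmatrix} y_1\\ y_2\\ y_3 \end{bmatrix}
=
\begin{bmatrix} l_{11}y_1\\ l_{21}y_1 + l_{22}y_2\\ l_{31}y_1 + l_{32}y_2 + l_{33}y_3 \end{bmatrix}
=
\begin{bmatrix} b_1\\ b_2\\ b_3 \end{bmatrix}
\]

we can solve it by forward substitution (first equation first)

● Similarly, Ux=y is “easy” (back-substitution, last equation first):

\[
\begin{bmatrix} u_{11} & u_{12} & u_{13}\\ 0 & u_{22} & u_{23}\\ 0 & 0 & u_{33} \end{bmatrix}
\begin{bmatrix} x_1\\ x_2\\ x_3 \end{bmatrix}
=
\begin{bmatrix} u_{11}x_1 + u_{12}x_2 + u_{13}x_3\\ u_{22}x_2 + u_{23}x_3\\ u_{33}x_3 \end{bmatrix}
=
\begin{bmatrix} y_1\\ y_2\\ y_3 \end{bmatrix}
\]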
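The two triangular solves can be sketched as below. The factors `L` and `U` are one LU factorisation of the matrix from the Gaussian elimination example, worked out by hand for this illustration (multiplying them gives back A). The function names are made up, and exact fractions are an assumption for readability.

```python
from fractions import Fraction


def forward_substitute(L, b):
    """Solve L y = b for a lower triangular L, first equation first."""
    y = []
    for i in range(len(b)):
        known = sum(L[i][j] * y[j] for j in range(i))
        y.append(Fraction(b[i] - known) / L[i][i])
    return y


def back_substitute(U, y):
    """Solve U x = y for an upper triangular U, last equation first."""
    n = len(y)
    x = [Fraction(0)] * n
    for i in range(n - 1, -1, -1):
        known = sum(U[i][j] * x[j] for j in range(i + 1, n))
        x[i] = Fraction(y[i] - known) / U[i][i]
    return x


half = Fraction(1, 2)
L = [[1, 0, 0], [-3 * half, 1, 0], [-1, 4, 1]]
U = [[2, 1, -1], [0, half, half], [0, 0, -1]]
b = [8, -11, -3]

y = forward_substitute(L, b)   # solves L y = b
x = back_substitute(U, y)      # solves U x = y
print([str(v) for v in x])     # ['2', '3', '-1']
```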

LU Decomposition (3)

● Essentially variant of Gaussian elimination

● Like in Gaussian elimination we might have to exchange
– Rows (“partial pivoting”): PA=LU
– Rows and columns (“full pivoting”): PAQ=LU
– Where P is a permutation matrix that accounts for row permutations and Q for column permutations
● Any square matrix allows an LUP factorization
● For positive definite symmetric (or Hermitian) matrices there is a similar decomposition, the Cholesky decomposition $A=LL^{*}$ (often used in statistics)

Doolittle Algorithm (1)

● One algorithm to generate LUP

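● Similarly to GE we aim to remove the entries of A under the diagonal, which is achieved by multiplying A with an appropriate matrix L' from the left, e.g.:

\[
\begin{bmatrix} a_{11} & a_{12} & a_{13}\\ a_{21} & a_{22} & a_{23}\\ a_{31} & a_{32} & a_{33} \end{bmatrix}
\]

– Second row: multiply the first row by $-a_{21}/a_{11}$ and add it to the second row
– Third row: multiply the first row by $-a_{31}/a_{11}$ and add it to the third row

● This is equivalent to multiplying A from the left by

\[
\begin{bmatrix} 1 & 0 & 0\\ -a_{21}/a_{11} & 1 & 0\\ -a_{31}/a_{11} & 0 & 1 \end{bmatrix}
\begin{bmatrix} a_{11} & a_{12} & a_{13}\\ a_{21} & a_{22} & a_{23}\\ a_{31} & a_{32} & a_{33} \end{bmatrix}
=
\begin{bmatrix}
a_{11} & a_{12} & a_{13}\\
0 & -\frac{a_{21}}{a_{11}}a_{12} + a_{22} & -\frac{a_{21}}{a_{11}}a_{13} + a_{23}\\
0 & -\frac{a_{31}}{a_{11}}a_{12} + a_{32} & -\frac{a_{31}}{a_{11}}a_{13} + a_{33}
\end{bmatrix}
\]

Doolittle Algorithm (2)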

● Start with matrix $A^{(0)}=A$
● At stage n-1 left multiply $A^{(n-1)}$ with a matrix $L_n$ that has diagonal entries 1 and further entries only in the nth column under the diagonal. These entries are

\[
l_{i,n} = -\frac{a^{(n-1)}_{i,n}}{a^{(n-1)}_{n,n}}
\]

● In N-1 steps obtain an upper triangular matrix $U=A^{(N-1)}$
● We have: $A = L_1^{-1}L_2^{-1}\dots L_{N-1}^{-1}A^{(N-1)}$
● Since products and inverses of lower triangular matrices are lower triangular we have (*) $L = L_1^{-1}L_2^{-1}\dots L_{N-1}^{-1}$ and A=LU

Doolittle (3)

● Still need to determine L from (*)

\[
L_n=\begin{bmatrix}
1 & & & & & \\
 & \ddots & & & & \\
 & & 1 & & & \\
 & & l_{n+1,n} & 1 & & \\
 & & \vdots & & \ddots & \\
 & & l_{N,n} & & & 1
\end{bmatrix},
\qquad
L_n^{-1}=\begin{bmatrix}
1 & & & & & \\
 & \ddots & & & & \\
 & & 1 & & & \\
 & & -l_{n+1,n} & 1 & & \\
 & & \vdots & & \ddots & \\
 & & -l_{N,n} & & & 1
\end{bmatrix}
\]

Doolittle (4)

● And hence:

\[
L=\begin{bmatrix}
1 & 0 & 0 & \dots & 0\\
-l_{21} & 1 & 0 & \dots & 0\\
-l_{31} & -l_{32} & 1 & \dots & 0\\
\vdots & \vdots & \ddots & \ddots & \vdots\\
-l_{N1} & -l_{N2} & \dots & -l_{N,N-1} & 1
\end{bmatrix}
\]
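The stages above can be sketched in a few lines. This version does no pivoting, so it assumes that every pivot $a^{(n-1)}_{n,n}$ is non-zero. The function name `doolittle` is made up and indices are 0-based; the input is the matrix from the Gaussian elimination example.

```python
from fractions import Fraction


def doolittle(A):
    """Return (L, U) with A = L U and ones on the diagonal of L. No pivoting."""
    N = len(A)
    U = [[Fraction(v) for v in row] for row in A]  # is turned into A^(N-1)
    L = [[Fraction(int(i == j)) for j in range(N)] for i in range(N)]
    for n in range(N - 1):
        if U[n][n] == 0:
            raise ZeroDivisionError("zero pivot: row exchange (LUP) needed")
        for i in range(n + 1, N):
            l_in = -U[i][n] / U[n][n]  # entry of L_n as defined above
            U[i] = [u_i + l_in * u_n for u_i, u_n in zip(U[i], U[n])]
            L[i][n] = -l_in            # L collects the entries -l_in
    return L, U


L, U = doolittle([[2, 1, -1], [-3, -1, 2], [-2, 1, 2]])
for row in L:
    print([str(v) for v in row])
for row in U:
    print([str(v) for v in row])
```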

Beyond Doolittle ...

● LU decomposition has some slight advantages over Gaussian elimination
– i.e. once we have L and U we can solve systems for every r.h.s. b
– Small modifications can also be used to calculate the matrix inverse
– Generalizations are e.g. the Crout algorithm or LUP decomposition (the wiki pages are very good on this topic and give many more “advanced” references: http://en.wikipedia.org/wiki/LU_decomposition)

Iteration Methods

● Time complexity of LU decomposition or Gaussian elimination is $O(n^3)$, i.e. dealing with millions of equations (which arise when solving certain PDEs) can take a long time ...
● There are faster methods that only come up with approximate solutions, these are iteration methods
● The trade-off of these methods is that they might not converge for general types of coefficient matrices A
● For diagonally dominant matrices a widely used method is the Jacobi method

Jacobi Method

● Want to solve Ax=b (*)

● Write A=D+R
● Where D contains the diagonal elements of A and R is the rest
● Diagonally dominant means that “D is large compared to R”, e.g. $|a_{ii}| > \sum_{j\neq i}|a_{ij}|$ for every row i
● Write (*) as: $A\vec{x}=(D+R)\vec{x}=\vec{b} \;\Rightarrow\; \vec{x}=D^{-1}(\vec{b}-R\vec{x})$
● Iterate this (n is the iteration number): $\vec{x}_{n+1}=D^{-1}(\vec{b}-R\vec{x}_n)$

Example Jacobi

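● Consider the system:

\[
\begin{aligned} 2x + y &= 11\\ 5x + 7y &= 13 \end{aligned}
\qquad
A=\begin{bmatrix} 2 & 1\\ 5 & 7 \end{bmatrix},\quad
\vec{b}=\begin{bmatrix} 11\\ 13 \end{bmatrix}
\]

● Jacobi: $\vec{x}_{n+1}=D^{-1}(\vec{b}-R\vec{x}_n)$, say $\vec{x}_0=[1,1]^T$, i.e. $\vec{x}_{n+1}=\vec{c}-T\vec{x}_n$ with

\[
D=\begin{bmatrix} 2 & 0\\ 0 & 7 \end{bmatrix},\quad
R=\begin{bmatrix} 0 & 1\\ 5 & 0 \end{bmatrix}
\;\to\;
D^{-1}=\begin{bmatrix} 1/2 & 0\\ 0 & 1/7 \end{bmatrix}
\]

\[
\vec{c}=D^{-1}\vec{b}=[11/2,\,13/7]^T,\qquad
T=D^{-1}R=\begin{bmatrix} 0 & 1/2\\ 5/7 & 0 \end{bmatrix}
\]

Example Jacobi (2)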

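● Hence we have to iterate:

\[
\vec{x}_{n+1}=\vec{c}-T\vec{x}_n
=\begin{bmatrix} 11/2\\ 13/7 \end{bmatrix}
-\begin{bmatrix} 0 & 1/2\\ 5/7 & 0 \end{bmatrix}\vec{x}_n
\]

● With $\vec{x}_0=[1,1]^T$:

\[
\vec{x}_1=\begin{bmatrix} 11/2\\ 13/7 \end{bmatrix}
-\begin{bmatrix} 0 & 1/2\\ 5/7 & 0 \end{bmatrix}\begin{bmatrix} 1\\ 1 \end{bmatrix}
=\begin{bmatrix} 5\\ 8/7 \end{bmatrix}
\]

● Iterating 25 times one obtains: x=[7.111, -3.222]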
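The iteration is short enough to try out directly. A minimal sketch in plain Python floats, working with the entries of A instead of forming D, R and T explicitly (the function name `jacobi` is made up):

```python
def jacobi(A, b, x0, steps):
    """Run `steps` Jacobi sweeps x <- D^{-1} (b - R x) and return the final iterate."""
    n = len(b)
    x = list(x0)
    for _ in range(steps):
        # Every component of the new iterate uses only the previous iterate x.
        x = [(b[i] - sum(A[i][j] * x[j] for j in range(n) if j != i)) / A[i][i]
             for i in range(n)]
    return x


A = [[2, 1], [5, 7]]
b = [11, 13]
print(jacobi(A, b, [1, 1], 1))   # first iterate, [5, 8/7] in floating point
print(jacobi(A, b, [1, 1], 25))  # close to [64/9, -29/9]
```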

● Exact -> x=64/9=7.1111..., y=-29/9=-3.2222...

The Gauss-Seidel Method

● For the Jacobi method, convergence is guaranteed for (strictly) diagonally dominant matrices
● Gauss-Seidel can be used for general matrices, but convergence is only guaranteed if the matrix is
– Diagonally dominant or
– Symmetric and positive definite
● Idea: A=L+U, with L lower triangular (including the diagonal) and U strictly upper triangular

\[
\vec{x}_{n+1}=L^{-1}(\vec{b}-U\vec{x}_n)
\]

Software Packages

● LINPACK
– Package for solving a wide variety of linear systems and special systems (e.g. symmetric, banded, etc.)
– Has become a standard benchmark for comparing the performance of computers
● LAPACK
– More recent replacement of LINPACK: higher performance on modern computer architectures including some parallel computers
● Available from Netlib: http://www.netlib.org/

Summary

● Exact methods to solve linear systems of equations
– Gaussian elimination
– LU decomposition (Doolittle algorithm)
– With full/partial pivoting -> in practice stable
– $O(n^3)$
● Iterative methods, e.g. Jacobi method
– Faster: suited to millions of equations that might arise when solving certain PDEs
– Limited convergence
– More sophisticated: weighted Jacobi, Gauss-Seidel, successive over-relaxation, ...

References

● The English wiki pages are quite comprehensive on this topic (and I used them quite a bit for preparing the lecture)
● More comprehensive:
– David M. Young, Jr., Iterative Solution of Large Linear Systems, Academic Press, 1971 (reprinted by Dover, 2003)
– Richard S. Varga, Matrix Iterative Analysis, second ed. (of 1962 Prentice Hall edition), Springer-Verlag, 2002