
Part II

Vector spaces and linear transformations


Chapter 2

Vector spaces

In this chapter we introduce the notion of a vector space, which is fundamental for the approximation methods that we will later develop, in particular in the form of an orthogonal projection onto a subspace, representing the best possible approximation in that subspace. Any vector in a vector space can be expressed in terms of a set of basis vectors, and we here introduce the process of constructing an orthonormal basis from an arbitrary basis, which provides the foundation for a range of matrix factorization methods we will use to solve systems of linear equations and eigenvalue problems. We use the Euclidean space R^n as an illustrative example, but the concept of a vector space is much more general than that, forming the basis for the theory of approximation and partial differential equations.

2.1 Vector spaces

Vector space

We denote the elements of R, the real numbers, as scalars. A vector space, or linear space, is then defined by a set V which is closed under two basic operations on V, vector addition and scalar multiplication,

(i) x, y ∈ V ⇒ x + y ∈ V,

(ii) x ∈ V, α ∈ R ⇒ αx ∈ V,

satisfying the expected algebraic rules for addition and multiplication. A vector space defined over R is a real vector space. More generally, we can define vector spaces over the complex numbers C, or any algebraic field F.


The Euclidean space R^n

The Euclidean space R^n is a vector space consisting of the set of column vectors

x = (x_1, ..., x_n)^T,   (2.1)

where (x_1, ..., x_n) is a row vector with x_j ∈ R, and where v^T denotes the transpose of the vector v. In R^n the basic operations are defined by component-wise addition and multiplication, such that,

(i) x + y = (x_1 + y_1, ..., x_n + y_n)^T,

(ii) αx = (αx_1, ..., αx_n)^T.

A geometrical interpretation of a vector space will prove to be useful. For example, the vector space R^2 can be interpreted as the vector arrows in the Euclidean plane, defined by: (i) a direction with respect to a fixed point (origo), and (ii) a magnitude (the Euclidean length).


Figure 2.1: Geometrical interpretation of a vector x = (x_1, x_2)^T in the Euclidean plane R^2 (left), scalar multiplication αx with α = 0.5 (center), and vector addition x + y (right).

Vector subspace

A subspace of a vector space V is a subset S ⊂ V, such that S together with the basic operations in V defines a vector space in its own right. For example, the planes

S_1 = {x ∈ R^3 : x_3 = 0},   (2.2)

S_2 = {x ∈ R^3 : ax_1 + bx_2 + cx_3 = 0, a, b, c ∈ R},   (2.3)

are both subspaces of R^3, see Figure 2.2.


Figure 2.2: Illustration of the Euclidean space R^3 with the three coordinate axes in the directions of the standard basis vectors e_1, e_2, e_3, and two subspaces S_1 and S_2, where S_1 is the x_1x_2-plane and S_2 a generic plane in R^3 that includes origo, with the indicated planes extending to infinity.

Basis

For a set of vectors {v_i}_{i=1}^n in V, we refer to the sum Σ_{i=1}^n α_i v_i, with α_i ∈ R, as a linear combination of the set of vectors v_i. All possible linear combinations of the set of vectors v_i define a subspace,

S = {v ∈ V : v = Σ_{i=1}^n α_i v_i, α_i ∈ R},   (2.4)

and we say that the vector space S is spanned by the set of vectors {v_i}_{i=1}^n, denoted by S = span{v_i}_{i=1}^n = <v_1, ..., v_n>.

We say that the set {v_i}_{i=1}^n is linearly independent, if

Σ_{i=1}^n α_i v_i = 0  ⇒  α_i = 0, ∀i = 1, ..., n.   (2.5)

A linearly independent set {v_i}_{i=1}^n is a basis for the vector space V, if all v ∈ V can be expressed as a linear combination of the vectors in the basis,

v = Σ_{i=1}^n α_i v_i,   (2.6)

where α_i ∈ R are the coordinates of v with respect to the basis {v_i}_{i=1}^n. The dimension of V, dim(V), is the number of vectors in any basis for V, and any basis of V has the same dimension.

The standard basis {e_1, ..., e_n} = {(1, 0, ..., 0)^T, ..., (0, ..., 0, 1)^T} spans R^n, such that any x ∈ R^n can be expressed as

x = Σ_{i=1}^n x_i e_i,   (2.7)

where dim R^n = n, and we refer to the coordinates x_i ∈ R in the standard basis as Cartesian coordinates.

Norm

To measure the size of vectors we introduce the norm ‖·‖ of a vector in the vector space V. A norm must satisfy the following conditions:

(i) ‖x‖ ≥ 0, ∀x ∈ V, and ‖x‖ = 0 ⇔ x = 0,

(ii) ‖αx‖ = |α| ‖x‖, ∀x ∈ V, α ∈ R,

(iii) ‖x + y‖ ≤ ‖x‖ + ‖y‖, ∀x, y ∈ V,

where (iii) is the triangle inequality.

A normed vector space is a vector space on which a norm is defined. For example, R^n is a normed vector space on which the l_2-norm is defined,

‖x‖_2 = (Σ_{i=1}^n x_i^2)^{1/2} = (x_1^2 + ... + x_n^2)^{1/2},   (2.8)

which corresponds to the Euclidean length of the vector x.

Inner product

A function (·,·) : V × V → R on the vector space V is an inner product if

(i) (αx + βy, z) = α(x, z) + β(y, z),

(ii) (x, αy + βz) = α(x, y) + β(x, z),

(iii) (x, y) = (y, x),

(iv) (x, x) ≥ 0, ∀x ∈ V, and (x, x) = 0 ⇔ x = 0,

for all x, y, z ∈ V and α, β ∈ R.

An inner product space is a vector space on which an inner product is defined, and each inner product induces an associated norm by

‖x‖ = (x, x)^{1/2},   (2.9)

and thus an inner product space is also a normed space. An inner product and its associated norm satisfy the Cauchy-Schwarz inequality.

Theorem 1 (Cauchy-Schwarz inequality). For the associated norm ‖·‖ of the inner product (·,·) in the vector space V, we have that

|(x, y)| ≤ ‖x‖ ‖y‖, ∀x, y ∈ V.   (2.10)

Proof. Let s ∈ R so that

0 ≤ ‖x + sy‖^2 = (x + sy, x + sy) = ‖x‖^2 + 2s(x, y) + s^2 ‖y‖^2,

and then choose s as the minimizer of the right hand side, that is s = −(x, y)/‖y‖^2, which proves the theorem.

The Euclidean space R^n is an inner product space with the Euclidean inner product, also referred to as the scalar product or dot product, defined by

(x, y)_2 = x · y = x_1 y_1 + ... + x_n y_n,   (2.11)

which induces the l_2-norm ‖x‖_2 = (x, x)_2^{1/2}. In R^n we often drop the subscript for the Euclidean inner product and norm, with the understanding that (x, y) = (x, y)_2 and ‖x‖ = ‖x‖_2.

We can also define general l_p-norms as

‖x‖_p = (Σ_{i=1}^n |x_i|^p)^{1/p},   (2.12)

for 1 ≤ p < ∞. In Figure 2.3 we illustrate the l_1-norm,

‖x‖_1 = |x_1| + ... + |x_n|,   (2.13)

and the l_∞-norm, defined by

‖x‖_∞ = max_{1 ≤ i ≤ n} |x_i|.   (2.14)

In fact, the Cauchy-Schwarz inequality is a special case of the Hölder inequality for general l_p-norms in R^n.
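As a concrete illustration, the l_p-norms and the inequalities above can be checked numerically; a small Python (NumPy) sketch with arbitrary example vectors:

import numpy as np

x = np.array([1.0, -2.0, 3.0])
y = np.array([2.0, 0.5, -1.0])

norm1 = np.sum(np.abs(x))        # l_1-norm, cf. (2.13)
norm2 = np.sqrt(np.sum(x**2))    # l_2-norm (Euclidean length), cf. (2.8)
norminf = np.max(np.abs(x))      # l_infinity-norm, cf. (2.14)

# Cauchy-Schwarz (2.10) and Hoelder with p = 1, q = infinity (2.16)
assert abs(x @ y) <= norm2 * np.sqrt(np.sum(y**2)) + 1e-12
assert abs(x @ y) <= norm1 * np.max(np.abs(y)) + 1e-12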


Figure 2.3: Illustration of l_p-norms in R^n through the unit circles ‖x‖_p = 1, for p = 1, 2, ∞ (from left to right).

Theorem 2 (Hölder inequality). For 1 ≤ p, q ≤ ∞ and 1/p + 1/q = 1,

|(x, y)| ≤ ‖x‖_p ‖y‖_q, ∀x, y ∈ R^n.   (2.15)

In particular, we have that

|(x, y)| ≤ ‖x‖_1 ‖y‖_∞, ∀x, y ∈ R^n.   (2.16)

2.2 Orthogonal projections

Orthogonality

An inner product space provides a means to generalize the concept of measuring angles between vectors, from the Euclidean plane to general vector spaces, where in particular two vectors x and y are orthogonal if (x, y) = 0. If a vector v ∈ V is orthogonal to all vectors s in a subspace S ⊂ V, that is

(v, s) = 0, ∀s ∈ S,

then v is said to be orthogonal to S. For example, the vector (0, 0, 1)^T ∈ R^3 is orthogonal to the subspace spanned in R^3 by the vectors (1, 0, 0)^T and (0, 1, 0)^T.

We denote by S^⊥ the orthogonal complement of S in V, defined as

S^⊥ = {v ∈ V : (v, s) = 0, ∀s ∈ S}.   (2.17)

The only vector in V that is an element of both S and S^⊥ is the zero vector, and any vector v ∈ V can be decomposed into two orthogonal components s_1 ∈ S and s_2 ∈ S^⊥, such that v = s_1 + s_2, where the dimension of S^⊥ is equal to the codimension of the subspace S in V, that is

dim(S^⊥) = dim(V) − dim(S).   (2.18)

Orthogonal projection


Figure 2.4: Illustration in the Euclidean plane R^2 of βy, the projection of the vector x in the direction of the vector y, with x − βy orthogonal to y.

The orthogonal projection of a vector x in the direction of another vector y is the vector βy with β = (x, y)/‖y‖^2, such that the difference between the two vectors is orthogonal to y, that is

(x − βy, y) = 0.   (2.19)

Further, the orthogonal projection of a vector v ∈ V onto the subspace S ⊂ V is a vector v_s ∈ S such that

(v − v_s, s) = 0, ∀s ∈ S,   (2.20)

where v_s represents the best approximation of v in the subspace S ⊂ V, with respect to the norm induced by the inner product of V.
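As a concrete illustration, a small Python (NumPy) sketch, with arbitrary example vectors, computing the projection βy and checking the orthogonality condition (2.19):

import numpy as np

x = np.array([2.0, 1.0])
y = np.array([1.0, 1.0])

beta = (x @ y) / (y @ y)     # beta = (x, y)/||y||^2
proj = beta * y              # orthogonal projection of x onto y

# the difference x - beta*y is orthogonal to y, cf. (2.19)
assert abs((x - proj) @ y) < 1e-12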

Theorem 3 (Best approximation).

‖v − v_s‖ ≤ ‖v − s‖, ∀s ∈ S.   (2.21)

Proof. For any vector s ∈ S we have that

‖v − v_s‖^2 = (v − v_s, v − v_s) = (v − v_s, v − s) + (v − v_s, s − v_s) = (v − v_s, v − s),

since (v − v_s, s − v_s) = 0, by (2.20) and the fact that s − v_s ∈ S. The result then follows from the Cauchy-Schwarz inequality and division of both sides by ‖v − v_s‖,

(v − v_s, v − s) ≤ ‖v − v_s‖ ‖v − s‖  ⇒  ‖v − v_s‖ ≤ ‖v − s‖.

To emphasize the geometric properties of an inner product space V, it is sometimes useful to visualize a subspace S as a plane in R^3, see Figure 2.5.


Figure 2.5: The orthogonal projection v_s ∈ S is the best approximation of v ∈ V in the subspace S ⊂ V.

Orthonormal basis

We refer to a set of non-zero vectors {v_i}_{i=1}^n in the inner product space V as an orthogonal set, if all vectors v_i are pairwise orthogonal, that is if

(v_i, v_j) = 0, ∀i ≠ j.   (2.22)

If {v_i}_{i=1}^n is an orthogonal set in the subspace S ⊂ V, and dim(S) = n, then {v_i}_{i=1}^n is a basis for S, that is all v_s ∈ S can be expressed as

v_s = α_1 v_1 + ... + α_n v_n = Σ_{i=1}^n α_i v_i,   (2.23)

with the coordinate α_i = (v_s, v_i)/‖v_i‖^2 being the projection of v_s in the direction of the basis vector v_i.

If Q = {q_i}_{i=1}^n is an orthogonal set, and ‖q_i‖ = 1 for all i, we say that Q is an orthonormal set. Let Q be an orthonormal basis for S, then

v_s = (v_s, q_1)q_1 + ... + (v_s, q_n)q_n = Σ_{i=1}^n (v_s, q_i)q_i, ∀v_s ∈ S,   (2.24)

where the coordinate (v_s, q_i) is the projection of the vector v_s onto the basis vector q_i. An arbitrary vector v ∈ V can be expressed as

v = r + Σ_{i=1}^n (v, q_i)q_i,   (2.25)

where the vector r = v − Σ_{i=1}^n (v, q_i)q_i is orthogonal to S, that is r ∈ S^⊥, a fact that we will use repeatedly.

Thus the vector r ∈ V satisfies the condition

(r, s) = 0, ∀s ∈ S,   (2.26)

and from (2.21) we know that r is the vector in V that corresponds to the minimal projection error of the vector v onto S with respect to the norm in V. We will refer to the vector r as the residual.
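A small Python (NumPy) sketch, with an arbitrary choice of orthonormal set and vector, illustrating the expansion (2.24) and the orthogonality of the residual (2.26):

import numpy as np

# orthonormal basis for a two-dimensional subspace S of R^3
q1 = np.array([1.0, 0.0, 0.0])
q2 = np.array([0.0, 1.0, 0.0])
v = np.array([1.0, 2.0, 3.0])

# orthogonal projection of v onto S, cf. (2.24)
vs = (v @ q1) * q1 + (v @ q2) * q2

# the residual r = v - vs is orthogonal to S, cf. (2.25)-(2.26)
r = v - vs
assert abs(r @ q1) < 1e-12 and abs(r @ q2) < 1e-12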

2.3 Exercises

Problem 1. Prove that the plane S_1 is a subspace of R^3, where S_1 = {x ∈ R^3 : x_3 = 0}. Under what condition is the plane S_2 = {x ∈ R^3 : ax_1 + bx_2 + cx_3 + d = 0, a, b, c, d ∈ R} a subspace of R^3?

Problem 2. Prove that the standard basis in R^n is linearly independent.

Problem 3. Prove that the Euclidean l_2-norm ‖·‖_2 is a norm.

Problem 4. Prove that the scalar product (·,·)_2 is an inner product.

Problem 5. Prove that ‖·‖_2 is induced by the inner product (·,·)_2.

Problem 6. Prove that |(x, y)| ≤ ‖x‖_1 ‖y‖_∞, ∀x, y ∈ R^n.

Problem 7. Prove that the vector (0, 0, 1)^T ∈ R^3 is orthogonal to the subspace spanned in R^3 by the vectors (1, 0, 0)^T and (0, 1, 0)^T.

Problem 8. Let {q_i}_{i=1}^n be an orthonormal basis for the subspace S ⊂ V. Prove that r ∈ S^⊥, with r = v − Σ_{i=1}^n (v, q_i)q_i.

Problem 9. Let {w_i}_{i=1}^n be a basis for the subspace S ⊂ V, so that all s ∈ S can be expressed as s = Σ_{i=1}^n α_i w_i.

(a) Prove that (2.20) is equivalent to finding the vector v_s ∈ S that satisfies the n equations of the form

(v − v_s, w_i) = 0, i = 1, ..., n.

(b) Since v_s ∈ S, we have that v_s = Σ_{j=1}^n β_j w_j. Prove that (2.20) is equivalent to finding the set of coordinates β_j that satisfies

Σ_{j=1}^n β_j (w_j, w_i) = (v, w_i), i = 1, ..., n.

(c) Let {q_i}_{i=1}^n be an orthonormal basis for the subspace S ⊂ V, so that we can express v_s = Σ_{j=1}^n β_j q_j. Use the result in (b) to prove that (2.20) is equivalent to the condition that the coordinates are given as β_j = (v, q_j).

Chapter 3

Linear transformations and matrices

Linear transformations, or linear maps, between vector spaces represent an important class of functions, in their own right, but also as approximations of more general nonlinear transformations. A linear transformation acting on a Euclidean vector can be represented by a matrix. Many of the concepts we introduce in this chapter generalize to linear transformations acting on functions in infinite dimensional spaces, for example integral and differential operators, which are fundamental for the study of differential equations.

3.1 Matrix algebra

Linear transformation as a matrix

A function f : R^n → R^m defines a linear transformation, if

(i) f(x + z) = f(x) + f(z),

(ii) f(αx) = αf(x),

for all x, z ∈ R^n and α ∈ R. In the standard basis {e_1, ..., e_n} we can express the ith component of the vector y = f(x) ∈ R^m as

y_i = f_i(x) = f_i(Σ_{j=1}^n x_j e_j) = Σ_{j=1}^n x_j f_i(e_j),

where f_i : R^n → R for all i = 1, ..., m. In component form, we write this as

y_1 = a_11 x_1 + ... + a_1n x_n
  .                                      (3.1)
y_m = a_m1 x_1 + ... + a_mn x_n

with a_ij = f_i(e_j). That is, y = Ax, where A is an m × n matrix,

A = [ a_11 ... a_1n ]
    [  .    .    .  ]                    (3.2)
    [ a_m1 ... a_mn ]

The set of real valued m × n matrices defines a vector space R^{m×n}, by the basic operations of (i) component-wise matrix addition, and (ii) component-wise scalar multiplication, that is

A + B = [ a_11 + b_11 ... a_1n + b_1n ]        αA = [ αa_11 ... αa_1n ]
        [      .      ...      .      ],            [   .    ...   .   ],
        [ a_m1 + b_m1 ... a_mn + b_mn ]             [ αa_m1 ... αa_mn ]

with A, B ∈ R^{m×n} and α ∈ R.

Matrix-vector product

A matrix A ∈ R^{m×n} defines a linear transformation x ↦ Ax, by the operations of matrix-vector product and component-wise scalar multiplication,

A(x + y) = Ax + Ay, x, y ∈ R^n,
A(αx) = αAx, x ∈ R^n, α ∈ R.

In index notation we write a vector b = (b_i), and a matrix A = (a_ij), with i the row index and j the column index. For an m × n matrix A, and x an n-dimensional column vector, we define the matrix-vector product b = Ax to be the m-dimensional column vector b = (b_i), such that

b_i = Σ_{j=1}^n a_ij x_j, i = 1, ..., m.   (3.3)

With a_j the jth column of A, an m-dimensional column vector, we can express the matrix-vector product as a linear combination of the set of column vectors {a_j}_{j=1}^n,

b = Ax = Σ_{j=1}^n x_j a_j,   (3.4)

or in matrix form

b = [ a_1 a_2 ... a_n ] [ x_1 ]
                        [  .  ]  = x_1 a_1 + x_2 a_2 + ... + x_n a_n.
                        [ x_n ]

The vector space spanned by {a_j}_{j=1}^n is the column space, or range, of the matrix A, so that range(A) = span{a_j}_{j=1}^n. The null space, or kernel, of an m × n matrix A is the set of vectors x ∈ R^n such that Ax = 0, with 0 the zero vector in R^m, that is null(A) = {x ∈ R^n : Ax = 0}.

The dimension of the column space is the column rank of the matrix A, rank(A). We note that the column rank is equal to the row rank, corresponding to the space spanned by the row vectors of A, and the maximal rank of an m × n matrix is min(m, n), which we refer to as full rank.

Matrix-matrix product

The matrix-matrix product B = AC is a matrix in R^{l×n}, defined for two matrices A ∈ R^{l×m} and C ∈ R^{m×n}, as

b_ij = Σ_{k=1}^m a_ik c_kj,   (3.5)

with B = (b_ij), A = (a_ik) and C = (c_kj). We sometimes omit the summation sign and use the Einstein convention, where repeated indices imply summation over those same indices, so that we express the matrix-matrix product (3.5) simply as b_ij = a_ik c_kj.

Similarly as for the matrix-vector product, we may interpret the columns b_j of the matrix-matrix product B as a linear combination of the columns a_k with coefficients c_kj,

b_j = Ac_j = Σ_{k=1}^m c_kj a_k,   (3.6)

or in matrix form

[ b_1 b_2 ... b_n ] = [ a_1 a_2 ... a_m ] [ c_1 c_2 ... c_n ].

The composition f(g(x)) of two linear transformations g : R^n → R^m and f : R^m → R^l, with associated matrices C ∈ R^{m×n} and A ∈ R^{l×m}, corresponds to the matrix-matrix product AC acting on x ∈ R^n.

Matrix transpose and the inner and outer products

The transpose (or adjoint) of an m × n matrix A = (a_ij) is defined as the n × m matrix A^T = (a_ji), with the column and row indices reversed. Using the matrix transpose, the inner product of two vectors v, w ∈ R^n can be expressed in terms of a matrix-matrix product v^T w, as

(v, w) = v^T w = v_1 w_1 + ... + v_n w_n.   (3.7)

Similarly, the outer product, or tensor product, of two vectors v ∈ R^m and w ∈ R^n, denoted by v ⊗ w, is defined as the m × n matrix corresponding to the matrix-matrix product v w^T, that is

v ⊗ w = v w^T = [ v_1 w_1 ... v_1 w_n ]
                [    .           .    ]
                [ v_m w_1 ... v_m w_n ].

In index notation we can express the inner and the outer products as (v, w) = v_i w_i and (v ⊗ w)_ij = v_i w_j, respectively.

The transpose has the property that (AB)^T = B^T A^T, and thus satisfies the equation (Ax, y) = (x, A^T y), for any x ∈ R^n, y ∈ R^m and A ∈ R^{m×n}, which follows from the definition of the inner product in Euclidean vector spaces, since

(Ax, y) = (Ax)^T y = x^T A^T y = (x, A^T y).   (3.8)

A square matrix A ∈ R^{n×n} is said to be symmetric (or self-adjoint) if A = A^T, so that (Ax, y) = (x, Ay). If in addition (Ax, x) > 0 for all non-zero x ∈ R^n, we say that A is a symmetric positive definite matrix. A square matrix is said to be normal if A^T A = A A^T.

Matrix norms

To measure the size of a matrix, we first introduce the Frobenius norm, corresponding to the l_2-norm of the matrix A interpreted as an mn-vector, that is

‖A‖_F = (Σ_{i=1}^m Σ_{j=1}^n |a_ij|^2)^{1/2}.   (3.9)

The Frobenius norm is induced by the following inner product over the space R^{m×n},

(A, B) = tr(A^T B),   (3.10)

with the trace of a square n × n matrix C = (c_ij) defined by

tr(C) = Σ_{i=1}^n c_ii.   (3.11)

Figure 3.1: Illustration of the map x ↦ Ax of the unit circle ‖x‖_2 = 1 (left) to an ellipse (right), corresponding to the matrix A in (3.13).

Matrix norms for A ∈ R^{m×n} are also induced by the respective l_p-norms on R^m and R^n, in the form

‖A‖_p = sup_{x ∈ R^n, x ≠ 0} ‖Ax‖_p/‖x‖_p = sup_{x ∈ R^n, ‖x‖_p = 1} ‖Ax‖_p.   (3.12)

The last equality follows from the definition of a norm, and shows that the induced matrix norm can be defined in terms of its map of unit vectors, which we illustrate in Figure 3.1 for the matrix

A = [ 1  2 ]
    [ 0  2 ].   (3.13)

We further have the following inequality,

‖Ax‖_p ≤ ‖A‖_p ‖x‖_p,   (3.14)

which follows from (3.12).

Determinant

The determinant of a square matrix A is denoted det(A) or |A|. For a 2 × 2 matrix we have the explicit formula

det(A) = | a  b | = ad − bc.   (3.15)
         | c  d |

For example, the determinant of the matrix A in (3.13) is computed as det(A) = 1·2 − 2·0 = 2.

The formula for the determinant is extended to a 3 × 3 matrix by

det(A) = | a  b  c |
         | d  e  f | = a | e  f | − b | d  f | + c | d  e |
         | g  h  i |     | h  i |     | g  i |     | g  h |

       = a(ei − fh) − b(di − fg) + c(dh − eg),   (3.16)

and by recursion this formula can be generalized to any square matrix.

For a 2 × 2 matrix the absolute value of the determinant is equal to the area of the parallelogram that represents the image of the unit square under the map x ↦ Ax, and similarly for a 3 × 3 matrix the volume of the parallelepiped representing the mapped unit cube. More generally, the absolute value of the determinant det(A) represents a scale factor of the linear transformation A.

Matrix inverse

If the column vectors {a_j}_{j=1}^n of a square n × n matrix A form a basis for R^n, then all vectors b ∈ R^n can be expressed as b = Ax, where the vector x ∈ R^n holds the coordinates of b in the basis {a_j}_{j=1}^n.

In particular, all x ∈ R^n can be expressed as x = Ix, where I is the square n × n identity matrix in R^n, taking the standard basis as column vectors,

x = Ix = [ e_1 e_2 ... e_n ] [ x_1 ]   [ 1  0 ... 0 ] [ x_1 ]
                             [ x_2 ]   [ 0  1 ... 0 ] [ x_2 ]
                             [  .  ] = [ .  .  .  . ] [  .  ],
                             [ x_n ]   [ 0  0 ... 1 ] [ x_n ]


Figure 3.2: The map x ↦ Ax (right) of the unit square (left), for the matrix A in (3.13), with the corresponding area given as |det(A)| = 2.

with the vector entries x_i corresponding to the Cartesian coordinates of the vector x.

A square matrix A ∈ R^{n×n} is invertible, or non-singular, if there exists an inverse matrix A^{-1} ∈ R^{n×n}, such that

A^{-1} A = A A^{-1} = I,   (3.17)

which also means that (A^{-1})^{-1} = A. Further, for two n × n matrices A and B, we have the property that (AB)^{-1} = B^{-1} A^{-1}.

Theorem 4 (Inverse matrix). For a square matrix A ∈ R^{n×n}, the following statements are equivalent:

(i) A has an inverse A^{-1},

(ii) det(A) ≠ 0,

(iii) rank(A) = n,

(iv) range(A) = R^n,

(v) null(A) = {0}.

The matrix inverse is unique. To see this, assume that there exist two matrices B_1 and B_2 such that AB_1 = AB_2 = I; which by linearity gives that A(B_1 − B_2) = 0, but since null(A) = {0} we have that B_1 = B_2.

3.2 Orthogonal projectors

Orthogonal matrix

A square matrix Q ∈ R^{n×n} is orthogonal, or unitary, if Q^T = Q^{-1}. With q_j the columns of Q we thus have that Q^T Q = I, or in matrix form,

[ q_1^T ]
[ q_2^T ] [ q_1 q_2 ... q_n ] = I,
[   .   ]
[ q_n^T ]

so that the columns q_j form an orthonormal basis for R^n.

Multiplication by an orthogonal matrix preserves the angle between two vectors x, y ∈ R^n, since

(Qx, Qy) = (Qx)^T Qy = x^T Q^T Q y = x^T y = (x, y),   (3.18)

and thus also the length of a vector,

‖Qx‖ = (Qx, Qx)^{1/2} = (x, x)^{1/2} = ‖x‖.   (3.19)

For example, counter-clockwise rotation by an angle θ in R^2 takes the form of multiplication by an orthogonal matrix,

Q_rot = [ cos(θ)  −sin(θ) ]
        [ sin(θ)   cos(θ) ],   (3.20)

whereas reflection in the line with a slope given by the angle θ corresponds to multiplication by the orthogonal matrix,

Q_ref = [ cos(2θ)   sin(2θ) ]
        [ sin(2θ)  −cos(2θ) ],   (3.21)

where we note the general fact that for a rotation det(Q_rot) = 1, and for a reflection det(Q_ref) = −1.

Orthogonal projector

A projection matrix, or projector, is a square matrix P such that

P^2 = PP = P.   (3.22)

It follows that

Pv = v,   (3.23)

for all vectors v ∈ range(P), since v is of the form v = Px for some x, and thus Pv = P^2 x = Px = v. For v ∉ range(P) we have that P(Pv − v) = P^2 v − Pv = 0, so that the projection error Pv − v ∈ null(P).

The matrix I − P is also a projector, the complementary projector to P, since (I − P)^2 = I − 2P + P^2 = I − P. The range and null space of the two projectors are related as

range(I − P) = null(P),   (3.24)

and

range(P) = null(I − P),   (3.25)

so that P and I − P separate R^n into the two subspaces S_1 = range(P) and S_2 = range(I − P), since the only v ∈ range(P) ∩ range(I − P) is the zero vector; v = v − Pv = (I − P)v = 0.


Figure 3.3: The projection P_y x of a vector x in the direction of another vector y, its orthogonal complement P_y^⊥ x, and P_y^r x, the reflection of x in the hyperplane H defined by y as a normal vector.

If the two subspaces S_1 and S_2 are orthogonal, we say that P is an orthogonal projector. This is equivalent to the condition P = P^T, since the inner product between two vectors in S_1 and S_2 then vanishes,

(Px, (I − P)y) = (Px)^T (I − P)y = x^T P^T (I − P)y = x^T (P − P^2)y = 0,

and if P is an orthogonal projector, so is I − P.

For example, the orthogonal projection of one vector x in the direction of another vector y, expressed in (2.19), corresponds to an orthogonal projector P_y, by

βy = (x, y)y/‖y‖^2 = y(y, x)/‖y‖^2 = y(y^T x)/‖y‖^2 = (y y^T/‖y‖^2) x = P_y x.   (3.26)

Similarly we can define the orthogonal complement P_y^⊥ x, and P_y^r x, the reflection of x in the hyperplane H defined by y as a normal vector, so that

P_y = y y^T/‖y‖^2,   P_y^⊥ = I − y y^T/‖y‖^2,   P_y^r = I − 2 y y^T/‖y‖^2,   (3.27)

where P_y and P_y^⊥ define orthogonal projectors and P_y^r an orthogonal reflector, and where we note that a hyperplane is a subspace of V of codimension 1.
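A small Python (NumPy) sketch, with an arbitrary vector y, constructing the three matrices in (3.27) and checking the projector property (3.22) and the orthogonality of the reflector:

import numpy as np

y = np.array([1.0, 2.0])
P = np.outer(y, y) / (y @ y)                       # orthogonal projector onto span{y}
P_perp = np.eye(2) - P                             # projector onto the orthogonal complement
P_ref = np.eye(2) - 2.0 * np.outer(y, y) / (y @ y) # reflector in the hyperplane with normal y

assert np.allclose(P @ P, P)                       # P^2 = P, cf. (3.22)
assert np.allclose(P, P.T)                         # orthogonal projector: P = P^T
assert np.allclose(P_ref.T @ P_ref, np.eye(2))     # the reflector is an orthogonal matrix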

3.3 Exercises

Problem 10. Formulate the algorithms for matrix-vector and matrix-matrix products in pseudo code.

Problem 11. Prove the equivalence of the definitions of the induced matrix norm, defined by

‖A‖_p = sup_{x ∈ R^n, x ≠ 0} ‖Ax‖_p/‖x‖_p = sup_{x ∈ R^n, ‖x‖_p = 1} ‖Ax‖_p.   (3.28)

Problem 12. For A ∈ R^{m×l}, B ∈ R^{l×n}, prove that (AB)^T = B^T A^T.

Problem 13. For A, B ∈ R^{n×n}, prove that (AB)^{-1} = B^{-1} A^{-1}.

Problem 14. Prove the inequality (3.14).

Problem 15. Prove that an orthogonal matrix is normal.

Problem 16. Show that the matrices A and B are orthogonal and compute their determinants. Which matrix represents a rotation and a reflection, respectively?

A = [ cos(θ)  −sin(θ) ]     B = [ cos(θ)   sin(θ) ]
    [ sin(θ)   cos(θ) ]         [ sin(θ)  −cos(θ) ]   (3.29)

Problem 17. For P a projector, prove that range(I − P) = null(P), and that range(P) = null(I − P).

Problem 18. For the vector y = (1, 0)^T, compute the action of the projectors P_y, P_y^⊥, P_y^r on a general vector x = (x_1, x_2)^T.

Chapter 4

Linear operators in R^n

In this chapter we give some examples of linear operators in the vector space R^n, used extensively in various fields, including computer graphics, robotics, computer vision, image processing, and computer aided design. We also meet differential equations for the first time, in the form of matrix operators acting on discrete approximations of functions, defined by their values at the nodes of a grid.

4.1 Differential and integral operators

Difference and summation matrices

Subdivide the interval [0, 1] into a structured grid T_h with n intervals and n + 1 nodes x_i, such that 0 = x_0 < x_1 < ... < x_n = 1, where h = x_i − x_{i−1} is the interval length. For a function f(x) on [0, 1], we can then approximate the integral

F(x_i) = ∫_0^{x_i} f(s) ds ≈ Σ_{k=1}^i f(x_k) h ≡ F_h(x_i),   (4.1)

which defines a function F_h(x_i) ≈ F(x_i) for all nodes x_i in the subdivision, based on Riemann sums.

The function F_h defines a linear transformation L_h of the vector of sampled function values at the nodes, y = (f(x_1), ..., f(x_n))^T, which can be expressed by the following matrix equation,

L_h y = [ h  0 ... 0 ] [ f(x_1) ]   [ f(x_1)h               ]
        [ h  h ... 0 ] [ f(x_2) ] = [ f(x_1)h + f(x_2)h     ],   (4.2)
        [ .  .     . ] [   .    ]   [          .            ]
        [ h  h ... h ] [ f(x_n) ]   [ Σ_{k=1}^n f(x_k)h     ]

where L_h is a summation matrix, with an associated inverse L_h^{-1},

L_h = h [ 1 0 ... 0 ]             L_h^{-1} = h^{-1} [  1  0 ...  0 ]
        [ 1 1 ... 0 ]      ⇒                        [ −1  1 ...  0 ]   (4.3)
        [ . .     . ]                               [  .  .      . ]
        [ 1 1 ... 1 ]                               [  0 ... −1  1 ]

The inverse matrix L_h^{-1} corresponds to a difference matrix over the same subdivision T_h, approximating the slope (derivative) of the function f(x). To see this, multiply the matrix L_h^{-1} with y = (f(x_i)),

L_h^{-1} y = h^{-1} [  1  0 ...  0 ] [ f(x_1) ]   [ (f(x_1) − f(x_0))/h     ]
                    [ −1  1 ...  0 ] [ f(x_2) ] = [ (f(x_2) − f(x_1))/h     ],
                    [  .  .      . ] [   .    ]   [           .             ]
                    [  0 ... −1  1 ] [ f(x_n) ]   [ (f(x_n) − f(x_{n−1}))/h ]

where we recall that f(x_0) = f(0) = 0.


Figure 4.1: Approximation of the integral of a function f(x) in the form of Riemann sums (left), and approximation of the derivative of f(x) by slopes computed from function values in the nodes x_i (right), on a subdivision of [0, 1] with interval length h.

As the interval length h → 0, the summation and difference matrices converge to integral and differential operators, such that for each x ∈ (0, 1),

L_h y → ∫_0^x f(s) ds,   L_h^{-1} y → f′(x).   (4.4)

Further, we have for the product of the two matrices that

y = L_h L_h^{-1} y → f(x) = ∫_0^x f′(s) ds,   (4.5)

as h → 0, which corresponds to the Fundamental Theorem of Calculus.

Difference operators

The matrix L_h^{-1} in (4.3) corresponds to a backward difference operator D_h^−, and similarly we can define a forward difference operator D_h^+, by

D_h^− = h^{-1} [  1  0  0 ...  0 ]      D_h^+ = h^{-1} [ −1  1  0 ...  0 ]
               [ −1  1  0 ...  0 ]                     [  0 −1  1 ...  0 ]
               [  .  .  .      . ]                     [  .  .  .      . ]
               [  0 ... −1  1  0 ]                     [  0 ...  0 −1  1 ]
               [  0 ...  0 −1  1 ]                     [  0 ...  0  0 −1 ]

The matrix-matrix product D_h^− D_h^+ takes the form,

D_h^− D_h^+ = h^{-2} [ −1  1  0 ...  0 ]
                     [  1 −2  1 ...  0 ]
                     [  .  .  .      . ]   (4.6)
                     [  0 ...  1 −2  1 ]
                     [  0 ...  0  1 −2 ]

which corresponds to an approximation of a second order differential operator. The matrix A = −D_h^− D_h^+ is diagonally dominant, that is

|a_ii| ≥ Σ_{j≠i} |a_ij|,   (4.7)

and symmetric positive definite, since

x^T A x = ... + x_i(−x_{i−1} + 2x_i − x_{i+1}) + ... + x_n(−x_{n−1} + 2x_n)
        = ... − x_i x_{i−1} + 2x_i^2 − x_i x_{i+1} − x_{i+1} x_i + ... − x_{n−1} x_n + 2x_n^2
        = ... + (x_i − x_{i−1})^2 + (x_{i+1} − x_i)^2 + ... + x_n^2 > 0,

for any non-zero vector x.

The finite difference method

For a vector y = (u(x_i)), the ith row of the matrix D_h^− D_h^+ corresponds to a finite difference stencil, with u(x_i) function values sampled at the nodes x_i of the structured grid representing the subdivision of the interval I = (0, 1),

[(D_h^− D_h^+)y]_i = (u(x_{i+1}) − 2u(x_i) + u(x_{i−1}))/h^2
                   = ((u(x_{i+1}) − u(x_i))/h − (u(x_i) − u(x_{i−1}))/h)/h.

Similarly, the difference operators D_h^− and D_h^+ correspond to finite difference stencils over the grid, and we have that for x ∈ I,

(D_h^− D_h^+)y → u″(x),   (D_h^−)y → u′(x),   (D_h^+)y → u′(x),   (4.8)

as the grid size h → 0.


Figure 4.2: Examples of finite difference stencils corresponding to the difference operator −(D_h^− D_h^+) over structured grids in R (lower left), R^2 (right) and R^3 (upper left).

The finite difference method for solving differential equations is based on approximation of differential operators by such difference stencils over a grid. We can thus, for example, approximate the differential equation

−u″(x) + u′(x) = f(x),   (4.9)

by the matrix equation

−(D_h^− D_h^+)y + (D_h^−)y = b,   (4.10)

with b_i = f(x_i). The finite difference method extends to multiple dimensions, where the difference stencils are defined over structured (Cartesian) grids in R^2 or R^3, see Figure 4.2.

Solution of differential equations

Since the second order difference matrix A = −(D_h^− D_h^+) is symmetric positive definite, there exists a unique inverse A^{-1}. For example, in the case of n = 5 and the difference matrix A below, we have that

A = 1/h^2 [  2 −1  0  0  0 ]          A^{-1} = h^2/6 [ 5 4 3 2 1 ]
          [ −1  2 −1  0  0 ]     ⇒                   [ 4 8 6 4 2 ]
          [  0 −1  2 −1  0 ]                         [ 3 6 9 6 3 ]
          [  0  0 −1  2 −1 ]                         [ 2 4 6 8 4 ]
          [  0  0  0 −1  2 ]                         [ 1 2 3 4 5 ]

The matrix A^{-1} corresponds to a symmetric integral (summation) operator, where the matrix elements decay with the distance from the diagonal. The integral operator has the property that when multiplied with a vector y = (y_i), each element y_i is transformed into an average of all the vector elements of y, with most weight given to the elements close to y_i.

Further, for y = (u(x_i)) and b = (f(x_i)), the solution to the differential equation

−u″(x) = f(x)   (4.11)

can be approximated by

y = A^{-1} b.   (4.12)

We can thus compute approximate solutions for any function f(x) on the right hand side of the equation (4.11). We note, however, that while the n × n matrix A is sparse, with only a few non-zero elements near the diagonal, the inverse A^{-1} is a dense matrix without zero elements.

In general the dense matrix A^{-1} has a much larger memory footprint than the sparse matrix A. Therefore, for large matrices, it may be impossible to hold the matrix A^{-1} in memory, so that instead iterative solution methods must be used without the explicit construction of the matrix A^{-1}.
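A small Python (NumPy) sketch illustrating (4.11)-(4.12); the right hand side f(x) = sin(πx) and the homogeneous boundary values are example choices:

import numpy as np

n = 50
h = 1.0 / (n + 1)
x = np.linspace(h, 1.0 - h, n)          # interior nodes of the subdivision

# second order difference matrix A = tridiag(-1, 2, -1)/h^2
A = (np.diag(2.0 * np.ones(n)) - np.diag(np.ones(n - 1), 1)
     - np.diag(np.ones(n - 1), -1)) / h**2

f = np.sin(np.pi * x)                   # example right hand side
y = np.linalg.solve(A, f)               # approximation of u with -u'' = f, u(0) = u(1) = 0

# compare with the exact solution sin(pi x)/pi^2
assert np.max(np.abs(y - f / np.pi**2)) < 1e-2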

4.2 Projective geometry

Affine transformations

An affine transformation, or affine map, is a linear transformation composed with a translation, corresponding to a multiplication by a matrix A, followed by addition of a position vector b, that is

x ↦ Ax + b.   (4.13)

For example, an object defined by a set of vectors in R^2 can be scaled by a diagonal matrix, or rotated by a Givens rotation matrix,

A_rot = [ cos(θ)  −sin(θ) ]
        [ sin(θ)   cos(θ) ],   (4.14)

with θ a counter-clockwise rotation angle.

Any two triangles in the Euclidean plane R^2 are related to each other through an invertible affine map. There also exist affine maps from R^2 to a surface (manifold) in R^3, although such a map is not invertible, see Figure 4.3.


Figure 4.3: Affine maps x ↦ Ax + b of the reference triangle, with corners in (0, 0), (1, 0), (0, 1); in R^2 (left); to a surface (manifold) in R^3 (right).

Homogeneous coordinates

By using homogeneous coordinates, or projective coordinates, we can express any affine transformation as one single matrix multiplication, including translation. The underlying definition is that the representation of a geometric object x is homogeneous if λx = x, for all real numbers λ ≠ 0.

An R^2 vector x = (x_1, x_2)^T in standard Cartesian coordinates is represented as x = (x_1, x_2, 1)^T in homogeneous coordinates, from which follows that any object u = (u_1, u_2, u_3) in homogeneous coordinates can be expressed in Cartesian coordinates, by

[ u_1 ]   [ u_1/u_3 ]
[ u_2 ] = [ u_2/u_3 ]   ⇒   [ x_1 ] = [ u_1/u_3 ]
[ u_3 ]   [    1    ]       [ x_2 ]   [ u_2/u_3 ].   (4.15)


Figure 4.4: The relation between homogeneous (projective) coordinates and Cartesian coordinates.

It follows that in homogeneous coordinates, rotation by an angle θ and translation by a vector (t_1, t_2) can both be expressed as matrices,

A_rot = [ cos(θ)  −sin(θ)  0 ]        A_trans = [ 1  0  t_1 ]
        [ sin(θ)   cos(θ)  0 ],                 [ 0  1  t_2 ].   (4.16)
        [   0        0     1 ]                  [ 0  0   1  ]

An advantage of homogeneous coordinates is also the ability to apply combinations of transformations by multiplying the respective matrices, which is used extensively e.g. in robotics, computer vision, and computer graphics. For example, an affine transformation can be expressed by the matrix-matrix product A_trans A_rot.
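A small Python (NumPy) sketch composing a rotation and a translation in homogeneous coordinates as in (4.16); the angle and translation vector are arbitrary example values:

import numpy as np

theta, t1, t2 = np.pi / 2, 1.0, 0.0

A_rot = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                  [np.sin(theta),  np.cos(theta), 0.0],
                  [0.0,            0.0,           1.0]])
A_trans = np.array([[1.0, 0.0, t1],
                    [0.0, 1.0, t2],
                    [0.0, 0.0, 1.0]])

x = np.array([1.0, 0.0, 1.0])        # the Cartesian point (1, 0) in homogeneous coordinates
u = A_trans @ A_rot @ x              # rotate first, then translate
assert np.allclose(u[:2] / u[2], [1.0, 1.0])   # (1, 0) -> (0, 1) -> (1, 1)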

4.3 Computer graphics

Vector graphics

Vector graphics is based on the representation of primitive geometric objects defined by a set of parameters, such as a circle in R^2 defined by its center and radius, or a cube in R^3 defined by its corner points. Lines and polygons are other common objects, and for special purposes more advanced objects are used, such as NURBS (Non-uniform rational B-splines) for computer aided design (CAD), and PostScript fonts for digital typesetting.

These objects may be characterized by their parameter values in the form of vectors in R^n, and operations on such objects can be defined by affine transformations acting on the vectors of parameters.

Raster graphics

Whereas vector graphics describes an image in terms of geometric objects such as lines and curves, raster graphics represents an image as an array of color values positioned in a grid pattern. In 2D each square cell in the grid is called a pixel (from picture element), and in 3D each cube cell is known as a voxel (volumetric pixel).

In 2D image processing, the operation of a convolution, or filter, is the multiplication of each pixel and its neighbours by a convolution matrix, or kernel, to produce a new image where each pixel is determined by the kernel, similar to the stencil operators in the finite difference method. Common kernels include the Sharpen and Gaussian blur filters,

[  0 −1  0 ]            [ 1  2  1 ]
[ −1  5 −1 ],   (1/16)  [ 2  4  2 ],   (4.17)
[  0 −1  0 ]            [ 1  2  1 ]

where we note the similarity to the finite difference stencil of the second order derivative (Laplacian) and its inverse.
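A small Python (NumPy) sketch applying a 3 × 3 kernel to the interior pixels of a grayscale image; the random image and the simple boundary treatment are example choices:

import numpy as np

def convolve2d(image, kernel):
    """Apply a 3x3 kernel to each interior pixel of a 2D image."""
    out = np.zeros_like(image)
    for i in range(1, image.shape[0] - 1):
        for j in range(1, image.shape[1] - 1):
            patch = image[i-1:i+2, j-1:j+2]
            out[i, j] = np.sum(patch * kernel)
    return out

sharpen = np.array([[0, -1, 0], [-1, 5, -1], [0, -1, 0]], dtype=float)
blur = np.array([[1, 2, 1], [2, 4, 2], [1, 2, 1]], dtype=float) / 16.0

image = np.random.rand(64, 64)
sharpened = convolve2d(image, sharpen)
blurred = convolve2d(image, blur)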

Figure 4.5: Raster image (left), transformed by a Sharpen (middle) and a blur (right) filter.

Part III

Algebraic equations


Chapter 5

Linear system of equations

In this chapter we study methods for solving linear systems of equations. That is, we seek a solution in terms of a vector x that satisfies a set of linear equations that can be formulated as a matrix equation Ax = b.

For a square non-singular matrix A, we can construct direct solution methods based on factorization of the matrix A into a product of matrices that are easy to invert. In the case of a rectangular matrix A we formulate a least squares problem, where we seek a solution x that minimizes the norm of the residual b − Ax.

5.1 Linear system of equations

A linear system of equations can be expressed as the matrix equation

Ax = b,   (5.1)

with A a given matrix and b a given vector, for which x is the unknown solution vector. Given our previous discussion, b can be interpreted as the image of x under the linear transformation A, or alternatively, x can be interpreted as the coefficients of b expressed in the column space of A.

For a square non-singular matrix A the solution x can be expressed in terms of the inverse matrix as

x = A^{-1} b.   (5.2)

For some matrices the inverse matrix A^{-1} is easy to construct, such as in the case of a diagonal matrix D = (d_ij), for which d_ij = 0 for all i ≠ j. Here the inverse is directly given as the diagonal matrix D^{-1} with diagonal entries 1/d_ii. Similarly, for an orthogonal matrix Q the inverse is given by the transpose Q^{-1} = Q^T. On the contrary, for a general matrix A, computation of the inverse is not straightforward. Instead we seek to transform the general matrix into a product of matrices that are easy to invert.

We will introduce two factorizations that can be used for solving Ax = b, in the case of A being a general square non-singular matrix: QR factorization and LU factorization. Factorization followed by inversion of the factored matrices is an example of a direct method for solving Ax = b. We note that to solve the equation we do not have to construct the inverse matrix explicitly; instead we only need to compute the action of matrices on a vector, which is important in terms of the memory footprint of the algorithms.

Triangular matrices

Apart from diagonal and orthogonal matrices, triangular matrices are easy to invert. We distinguish between two classes of triangular matrices: a lower triangular matrix L = (l_ij), with l_ij = 0 for i < j, and an upper triangular matrix U = (u_ij), with u_ij = 0 for i > j. The product of lower triangular matrices is lower triangular, and the product of upper triangular matrices is upper triangular. Similarly, the inverse of a lower triangular matrix is lower triangular, and the inverse of an upper triangular matrix is upper triangular.

The equations Lx = b and Ux = b take the form

[ l_11  0   ...  0   ] [ x_1 ]   [ b_1 ]      [ u_11 u_12 ... u_1n ] [ x_1 ]   [ b_1 ]
[ l_21 l_22 ...  0   ] [ x_2 ]   [ b_2 ]      [  0   u_22 ... u_2n ] [ x_2 ]   [ b_2 ]
[  .    .        .   ] [  .  ] = [  .  ],     [  .    .        .   ] [  .  ] = [  .  ],
[ l_n1 l_n2 ... l_nn ] [ x_n ]   [ b_n ]      [  0    0   ... u_nn ] [ x_n ]   [ b_n ]

solved by forward substitution and backward substitution, respectively,

x_1 = b_1/l_11,   x_2 = (b_2 − l_21 x_1)/l_22,   ...,   x_n = (b_n − Σ_{i=1}^{n−1} l_ni x_i)/l_nn,

and

x_n = b_n/u_nn,   x_{n−1} = (b_{n−1} − u_{n−1,n} x_n)/u_{n−1,n−1},   ...,   x_1 = (b_1 − Σ_{i=2}^{n} u_1i x_i)/u_11,

where both algorithms correspond to O(n^2) operations.
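A Python (NumPy) sketch of the two substitution algorithms, using 0-based indexing:

import numpy as np

def forward_substitution(L, b):
    """Solve Lx = b for a lower triangular matrix L."""
    n = len(b)
    x = np.zeros(n)
    for i in range(n):
        x[i] = (b[i] - L[i, :i] @ x[:i]) / L[i, i]
    return x

def backward_substitution(U, b):
    """Solve Ux = b for an upper triangular matrix U."""
    n = len(b)
    x = np.zeros(n)
    for i in reversed(range(n)):
        x[i] = (b[i] - U[i, i+1:] @ x[i+1:]) / U[i, i]
    return x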

5.2 QR factorization

Classical Gram-Schmidt orthogonalization

For a square matrix A ∈ R^{n×n} we denote the successive vector spaces spanned by its column vectors a_j as

<a_1> ⊆ <a_1, a_2> ⊆ <a_1, a_2, a_3> ⊆ ... ⊆ <a_1, ..., a_n>.   (5.3)

Assuming that A has full rank, for each such vector space we now construct an orthonormal basis q_j, such that <q_1, ..., q_j> = <a_1, ..., a_j>, for all j ≤ n.

Given a_j, we can successively construct vectors v_j that are orthogonal to the spaces <q_1, ..., q_{j−1}>, since by (2.25) we have that

v_j = a_j − Σ_{i=1}^{j−1} (a_j, q_i) q_i,   (5.4)

for all j = 1, ..., n, where each vector is then normalized to get q_j = v_j/‖v_j‖. This is the classical Gram-Schmidt iteration.

Modified Gram-Schmidt orthogonalization

If we let Q̂_{j−1} be an n × (j−1) matrix with the column vectors q_i, for i ≤ j−1, we can rewrite (5.4) in terms of an orthogonal projector P_j,

v_j = a_j − Σ_{i=1}^{j−1} (a_j, q_i) q_i = a_j − Σ_{i=1}^{j−1} q_i q_i^T a_j = (I − Q̂_{j−1} Q̂_{j−1}^T) a_j = P_j a_j,

with Q̂_{j−1} Q̂_{j−1}^T an orthogonal projector onto range(Q̂_{j−1}), the column space of Q̂_{j−1}. The matrix P_j = I − Q̂_{j−1} Q̂_{j−1}^T is thus an orthogonal projector onto the space orthogonal to range(Q̂_{j−1}), with P_1 = I. The Gram-Schmidt iteration can therefore be expressed in terms of the projector P_j as

q_j = P_j a_j / ‖P_j a_j‖,   (5.5)

for j = 1, ..., n.

Alternatively, P_j can be constructed by successive multiplication of projectors P^⊥_{q_i} = I − q_i q_i^T, orthogonal to each individual vector q_i, such that

P_j = P^⊥_{q_{j−1}} ... P^⊥_{q_2} P^⊥_{q_1}.   (5.6)

The modified Gram-Schmidt iteration corresponds to instead using this formula to construct P_j, which leads to a more robust algorithm than the classical Gram-Schmidt iteration.

Algorithm 1: Modified Gram-Schmidt iteration
for i = 1 to n do
    v_i = a_i
end
for i = 1 to n do
    r_ii = ‖v_i‖
    q_i = v_i / r_ii
    for j = i + 1 to n do
        r_ij = q_i^T v_j
        v_j = v_j − r_ij q_i
    end
end
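A Python (NumPy) sketch of Algorithm 1, using 0-based indexing and returning the factors Q and R:

import numpy as np

def modified_gram_schmidt(A):
    """QR factorization by modified Gram-Schmidt, cf. Algorithm 1."""
    n = A.shape[1]
    V = A.astype(float).copy()
    Q = np.zeros_like(V)
    R = np.zeros((n, n))
    for i in range(n):
        R[i, i] = np.linalg.norm(V[:, i])
        Q[:, i] = V[:, i] / R[i, i]
        for j in range(i + 1, n):
            R[i, j] = Q[:, i] @ V[:, j]
            V[:, j] = V[:, j] - R[i, j] * Q[:, i]
    return Q, R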

QR factorization

By introducing the notation r_ij = (a_j, q_i) and r_jj = ‖a_j − Σ_{i=1}^{j−1} (a_j, q_i) q_i‖, we can rewrite the classical Gram-Schmidt iteration (5.4) as

a_1 = r_11 q_1
a_2 = r_12 q_1 + r_22 q_2   (5.7)
  .
a_n = r_1n q_1 + ... + r_nn q_n,

which corresponds to the QR factorization A = QR, with a_j the column vectors of the matrix A, Q an orthogonal matrix and R an upper triangular matrix, that is

[ a_1 a_2 ... a_n ] = [ q_1 q_2 ... q_n ] [ r_11 r_12 ... r_1n ]
                                          [      r_22 ... r_2n ]
                                          [            .    .  ]
                                          [               r_nn ]

Existence and uniqueness of the QR factorization of a non-singular matrix A follow by construction from Algorithm 1.

The modified Gram-Schmidt iteration of Algorithm 1 corresponds to successive multiplication by upper triangular matrices R_k on the right of the matrix A, such that the resulting matrix Q is an orthogonal matrix,

A R_1 R_2 ... R_n = Q,   (5.8)

and with the notation R^{-1} = R_1 R_2 ... R_n, the matrix R = (R^{-1})^{-1} is also an upper triangular matrix.

Householder QR factorization

Whereas Gram-Schmidt iteration amounts to triangular orthogonalization of the matrix A, we may alternatively formulate an algorithm for orthogonal triangularization, where entries below the diagonal of A are zeroed out by successive application of orthogonal matrices Q_k, so that

Q_n ... Q_2 Q_1 A = R,   (5.9)

where we note that the matrix product Q_n ... Q_2 Q_1 is also an orthogonal matrix.

In the Householder algorithm, orthogonal matrices are chosen of the form

Q_k = [ I  0 ]
      [ 0  F ],   (5.10)

with I the (k−1) × (k−1) identity matrix, and with F an (n−k+1) × (n−k+1) orthogonal matrix. Q_k is constructed to successively introduce zeros below the diagonal of the kth column of A, while leaving the upper k−1 rows untouched, thus taking the form

Q_k Â_{k−1} = [ I  0 ] [ Â_11  Â_12 ]   [ Â_11    Â_12  ]
              [ 0  F ] [  0    Â_22 ] = [  0    F Â_22  ],   (5.11)

with Â_{k−1} = Q_{k−1} ... Q_2 Q_1 A, and with Â_ij representing the sub-matrices, or blocks, of Â_{k−1} with corresponding block structure as Q_k.

To obtain a triangular matrix, F should introduce zeros in all subdiagonal entries of the matrix. We want to construct F such that for x an n−k+1 column vector, we get

F x = (±‖x‖, 0, ..., 0)^T = ±‖x‖ e_1,   (5.12)

with e_1 = (1, 0, ..., 0)^T a standard basis vector.

Further, we need F to be an orthogonal matrix, which we achieve by formulating F in the form of a reflector, so that Fx is the reflection of x in a hyperplane orthogonal to the vector v = ±‖x‖ e_1 − x, that is

F = I − 2 vv^T/(v^T v).   (5.13)

We now formulate the full algorithm for QR factorization based on this Householder reflector, where we use the notation A_{i:j,k:l} for a sub-matrix with a row index in the range i, ..., j, and a column index in the range k, ..., l.

Figure 5.1: Householder reflections across the two hyperplanes H^+ and H^−.

Algorithm 2: Householder QR factorization
for k = 1 to n do
    x = A_{k:n,k}
    v_k = sign(x_1) ‖x‖_2 e_1 + x
    v_k = v_k / ‖v_k‖
    A_{k:n,k:n} = A_{k:n,k:n} − 2 v_k (v_k^T A_{k:n,k:n})
end

We note that Algorithm 2 does not explicitly construct the matrix Q, although from the vectors v_k we can compute the matrix-vector product with Q = Q_1 Q_2 ... Q_n or Q^T = Q_n ... Q_2 Q_1.
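A Python (NumPy) sketch of Algorithm 2, assuming A has full rank and returning R together with the Householder vectors v_k; in line with the remark above, the matrix Q is not formed explicitly, and the guard for x_1 = 0 is an implementation choice:

import numpy as np

def householder_qr(A):
    """Householder QR factorization, cf. Algorithm 2."""
    A = A.astype(float).copy()
    m, n = A.shape
    vs = []
    for k in range(n):
        x = A[k:, k].copy()
        v = x.copy()
        sign = 1.0 if x[0] >= 0 else -1.0      # use +1 when x_1 = 0
        v[0] += sign * np.linalg.norm(x)       # v = sign(x_1)||x|| e_1 + x
        v /= np.linalg.norm(v)
        A[k:, k:] -= 2.0 * np.outer(v, v @ A[k:, k:])
        vs.append(v)
    return np.triu(A[:n, :]), vs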

5.3 LU factorization

Similar to Householder triangularization, Gaussian elimination transforms a square n × n matrix A into an upper triangular matrix U, by successively inserting zeros below the diagonal. In the case of Gaussian elimination, this is done by adding multiples of each row to the other rows, which corresponds to multiplication by a sequence of lower triangular matrices L_k from the left, so that

L_{n−1} ... L_2 L_1 A = U.   (5.14)

By setting L^{-1} = L_{n−1} ... L_2 L_1, we obtain the factorization A = LU, with L = L_1^{-1} L_2^{-1} ... L_{n−1}^{-1}.

The kth step in the Gaussian elimination algorithm involves division by the diagonal element u_kk, and thus for stability of the algorithm in finite precision arithmetic it is necessary to avoid a small number in that position, which is achieved by reordering the rows, or pivoting. With a permutation matrix P, the LU factorization with pivoting may be expressed as PA = LU.

Algorithm 3: Gaussian elimination with pivoting
Starting from the matrices U = A, L = I, P = I
for k = 1 to n − 1 do
    Select i ≥ k to maximize |u_ik|
    Interchange the rows k and i in the matrices U, L, P
    for j = k + 1 to n do
        l_jk = u_jk / u_kk
        u_{j,k:n} = u_{j,k:n} − l_jk u_{k,k:n}
    end
end
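A compact Python (NumPy) version of Algorithm 3, using 0-based indexing and returning P, L, U with PA = LU:

import numpy as np

def lu_pivot(A):
    """Gaussian elimination with partial pivoting, cf. Algorithm 3: PA = LU."""
    U = A.astype(float).copy()
    n = U.shape[0]
    L = np.eye(n)
    P = np.eye(n)
    for k in range(n - 1):
        i = k + np.argmax(np.abs(U[k:, k]))   # select the pivot row
        U[[k, i], :] = U[[i, k], :]           # interchange rows k and i
        P[[k, i], :] = P[[i, k], :]
        L[[k, i], :k] = L[[i, k], :k]         # swap the already computed part of L
        for j in range(k + 1, n):
            L[j, k] = U[j, k] / U[k, k]
            U[j, k:] -= L[j, k] * U[k, k:]
    return P, L, U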

Cholesky factorization

For the case of a symmetric positive definite matrix, A can be decomposed into a product of a lower triangular matrix L and its transpose L^T, which is referred to as the Cholesky factorization,

A = L L^T.   (5.15)

In the Cholesky factorization algorithm, symmetry is exploited to perform Gaussian elimination from both the left and right of the matrix A at the same time, which results in an algorithm at half the computational cost of LU factorization.

5.4 Least squares problems

We now consider a system of linear equations Ax = b, for which we have n unknowns but m > n equations, that is x ∈ R^n, A ∈ R^{m×n} and b ∈ R^m. There exists no inverse matrix A^{-1}, and if the vector b ∉ range(A) we say that the system is overdetermined, and thus no exact solution x exists to the equation Ax = b. Instead we seek the solution x ∈ R^n that minimizes the l_2-norm of the residual b − Ax ∈ R^m, which is referred to as the least squares problem

min_{x ∈ R^n} ‖b − Ax‖_2.   (5.16)

A geometric interpretation is that we seek the vector x ∈ R^n such that the Euclidean distance between Ax and b is minimal, which corresponds to

Ax = Pb,   (5.17)

where P ∈ R^{m×m} is the orthogonal projector onto range(A).


Figure 5.2: Geometric interpretation of the least squares problem.

Thus the residual r = b − Ax is orthogonal to range(A), that is (Ay, r) = (y, A^T r) = 0, for all y ∈ R^n, so that (5.16) is equivalent to

A^T r = 0,   (5.18)

which corresponds to the n × n system

A^T A x = A^T b,   (5.19)

referred to as the normal equations. The normal equations thus provide a way to solve the m × n least squares problem by solving instead a square n × n system. The square matrix A^T A is nonsingular if and only if A has full rank, for which the solution is given as x = (A^T A)^{-1} A^T b, where the matrix (A^T A)^{-1} A^T is known as the pseudoinverse of A.
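A Python (NumPy) sketch solving the normal equations (5.19) for the overdetermined system of Problem 23 in the exercises below, and comparing with a library least squares solver:

import numpy as np

A = np.array([[2.0, 3.0],
              [1.0, 4.0],
              [3.0, 1.0]])
b = np.array([2.0, 1.0, 3.0])

# solve the normal equations A^T A x = A^T b, cf. (5.19)
x_normal = np.linalg.solve(A.T @ A, A.T @ b)

# compare with NumPy's least squares solver
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)
assert np.allclose(x_normal, x_lstsq)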

5.5 Exercises

Problem 19. Prove that the product of lower triangular matrices is lower triangular, and the product of upper triangular matrices is upper triangular.

Problem 20. Express the classical Gram-Schmidt factorization in pseudo code.

Problem 21. Carry out the algorithms for QR and LU factorization for a 3 × 3 matrix A.

Problem 22. Implement the algorithms for QR and LU factorization, and test the computer program for n × n matrices with n large.

Problem 23. Derive the normal equations for the system

[ 2  3 ]           [ 2 ]
[ 1  4 ] [ x_1 ] = [ 1 ].   (5.20)
[ 3  1 ] [ x_2 ]   [ 3 ]

Chapter 6

The eigenvalue problem

An eigenvalue of a square matrix represents the action of the linear transformation as a scaling of the associated eigenvector, and the action of certain matrices on general vectors can be represented as a scaling of the set of basis vectors used to represent the vector. If iterated, some of these basis vectors will be amplified, whereas others will decay, which can characterize the stability of iterative algorithms, or a physical system described by the matrix, and also implies algorithms for computational approximation of eigenvalues and eigenvectors based on power iteration. We also present the QR algorithm, based on constructing a similarity transform of the matrix.

6.1 Eigenvalues and eigenvectors

Complex vector spaces

In this section we change the focus from real vector spaces to complex vector spaces. We let z ∈ C denote a complex scalar, of the form z = a + ib, with i^2 = −1, and a, b ∈ R the real and imaginary parts, for which we define the complex conjugate as z̄ = a − ib.

The complex vector space C^n is defined by the basic operations of componentwise addition and scalar multiplication of complex numbers, and with the transpose of a vector x ∈ C^n replaced by the adjoint x*, corresponding to the transpose of the vector with the entries replaced by their complex conjugates. Similarly, the adjoint of a complex m × n matrix A = (a_ij) is the n × m matrix A* = (ā_ji). If A = A* the matrix A is Hermitian, and if AA* = A*A it is normal.


The inner product of x, y ∈ C^n is defined by

(x, y) = x*y = Σ_{i=1}^n x̄_i y_i,   (6.1)

with the associated norm for x ∈ C^n,

‖x‖ = (x, x)^{1/2}.   (6.2)

Matrix spectrum and eigenspaces

We consider a square matrix A ∈ C^{n×n} acting on the complex vector space C^n. An eigenvector of A is a nonzero vector x ∈ C^n, such that

Ax = λx,   (6.3)

with λ ∈ C the corresponding eigenvalue. If A is a nonsingular matrix then λ^{-1} is an eigenvalue of the inverse matrix A^{-1}, which is clear from multiplication of (6.3) by A^{-1} from the left,

A^{-1}Ax = λA^{-1}x  ⇔  x = λA^{-1}x  ⇔  λ^{-1}x = A^{-1}x.   (6.4)

The set of all eigenvalues is denoted the spectrum of A,

Λ(A) = {λ_j}_{j=1}^n,   (6.5)

and the spectral radius of A is defined as

ρ(A) = max_{λ_j ∈ Λ(A)} |λ_j|.   (6.6)

The sum and the product of all eigenvalues are related to the trace and the determinant of A as

det(A) = Π_{j=1}^n λ_j,   tr(A) = Σ_{j=1}^n λ_j.   (6.7)

The subspace of C^n spanned by the eigenvectors corresponding to λ, together with the zero vector, is the eigenspace E_λ. The eigenspace E_λ is an invariant subspace under A, so that AE_λ ⊆ E_λ, and dim(E_λ) is the number of linearly independent eigenvectors corresponding to the eigenvalue λ, known as the geometric multiplicity of λ.

We have that the eigenspace E_λ = null(λI − A), since (λI − A)x = 0, and thus for a nonempty eigenspace E_λ, the matrix λI − A is singular, so that

det(λI − A) = 0.   (6.8)

Characteristic equation

The characteristic polynomial of the matrix A ∈ C^{n×n} is the degree n polynomial

p_A(z) = det(zI − A),   (6.9)

with z ∈ C. For λ an eigenvalue of A, we thus have that

p_A(λ) = 0,   (6.10)

which we refer to as the characteristic equation, and by the fundamental theorem of algebra we can express p_A(z) as

p_A(z) = (z − λ_1)(z − λ_2) ... (z − λ_n),   (6.11)

where each λ_j is an eigenvalue of A, not necessarily unique. The multiplicity of each eigenvalue λ as a root of the equation p_A(λ) = 0 is the algebraic multiplicity of λ, where an eigenvalue is said to be simple if its algebraic multiplicity is 1. The algebraic multiplicity of an eigenvalue is at least as great as its geometric multiplicity.

Theorem 5 (Cayley-Hamilton theorem). Every nonsingular matrix satisfies its own characteristic equation, p_A(A) = 0, that is,

A^n + c_{n−1}A^{n−1} + ... + c_1 A + c_0 I = 0,   (6.12)

for some constants c_i, which gives that

A^{-1} = −(1/c_0)(A^{n−1} + c_{n−1}A^{n−2} + ... + c_1 I),   (6.13)

and thus that the inverse matrix A^{-1} is equal to a sum of powers of A.

Eigenvalue decompositions

A defective matrix is a matrix which has one or more defective eigenvalues, where a defective eigenvalue is an eigenvalue for which its algebraic multiplicity exceeds its geometric multiplicity.

Theorem 6 (Eigenvalue decomposition). Each nondefective matrix A ∈ C^{n×n} has an eigenvalue decomposition

A = XΛX^{-1},   (6.14)

where X ∈ C^{n×n} is a nonsingular matrix with the eigenvectors of A as column vectors, and where Λ ∈ C^{n×n} is a diagonal matrix with the eigenvalues of A on the diagonal.

We also say that a nondefective matrix is diagonalizable. Given that the factorization (6.14) exists, we have that

AX = XΛ,   (6.15)

which expresses (6.3) as

Ax_j = λ_j x_j,   (6.16)

with λ_j the jth diagonal entry of Λ, and x_j the jth column of X.

For some matrices the eigenvectors can be chosen to be pairwise orthogonal, so that the matrix A is unitarily diagonalizable, that is

A = QΛQ*,   (6.17)

with Q ∈ C^{n×n} an orthogonal matrix with orthonormal eigenvectors of A as column vectors, and Λ ∈ C^{n×n} a diagonal matrix with the eigenvalues of A on the diagonal.

Theorem 7. A matrix is unitarily diagonalizable if and only if it is normal.

Hermitian matrices have real eigenvalues, and thus in the particular case of a real symmetric matrix, no complex vector spaces are needed to characterize the matrix spectrum and eigenspaces.

Theorem 8. A Hermitian matrix is unitarily diagonalizable with real eigenvalues.

Irrespective of whether the matrix is nondefective or Hermitian, any square matrix always has a Schur factorization, with the diagonal matrix replaced by an upper triangular matrix.

Theorem 9 (Schur factorization). For every square matrix A there exists a Schur factorization

A = QTQ*,   (6.18)

where Q is an orthogonal matrix, and T is an upper triangular matrix with the eigenvalues of A on the diagonal.

More generally, if X ∈ C^{n×n} is nonsingular, the map A ↦ X^{-1}AX is a similarity transformation of A, and we say that two matrices A and B are similar if there exists a similarity transformation such that B = X^{-1}AX.

Theorem 10. Two similar matrices have the same eigenvalues with the same algebraic and geometric multiplicities.

6.2 Eigenvalue algorithms

QR algorithm

To compute the eigenvalues of a matrix A, one may seek the roots of the characteristic polynomial. However, for a large matrix, polynomial root finding is expensive and unstable. Instead the most efficient algorithms are based on computing eigenvalues and eigenvectors by constructing one of the factorizations (6.14), (6.17) or (6.18).

We now present the QR algorithm, in which a Schur factorization (6.18) of a matrix A is constructed from successive QR factorizations.

Algorithm 4: QR algorithm
A^(0) = A
for k = 1, 2, ... do
    Q^(k) R^(k) = A^(k−1)
    A^(k) = R^(k) Q^(k)
end

We note that for each iteration A^(k) of the algorithm, we have that

A^(k) = R^(k) Q^(k) = (Q^(k))^{-1} A^(k−1) Q^(k),   (6.19)

so that A^(k) and A^(k−1) are similar, and thus have the same eigenvalues. Under suitable assumptions A^(k) will converge to an upper triangular matrix, or in the case of a Hermitian matrix a diagonal matrix, with the eigenvalues on the diagonal.

The basic QR algorithm can be accelerated: (i) by Householder reflectors to reduce the initial matrix A^(0) to Hessenberg form, that is a matrix with zeros below the first subdiagonal (or in the case of a Hermitian matrix a tridiagonal form), (ii) by introducing a shift, so that instead of A^(k) we factorize the matrix A^(k) − μ^(k)I, which has identical eigenvectors, and where μ^(k) is an eigenvalue estimate, and (iii) if any off-diagonal element is close to zero, all off-diagonal elements on that row are zeroed out to deflate the matrix A^(k) into submatrices on which the QR algorithm is then applied.
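A bare-bones Python (NumPy) sketch of the unshifted QR algorithm of Algorithm 4, using a library QR factorization in each step; the test matrix and the fixed number of iterations are example choices:

import numpy as np

def qr_algorithm(A, iterations=200):
    """Unshifted QR algorithm, cf. Algorithm 4."""
    Ak = A.astype(float).copy()
    for _ in range(iterations):
        Q, R = np.linalg.qr(Ak)
        Ak = R @ Q
    return Ak

A = np.array([[2.0, 1.0], [1.0, 3.0]])
Ak = qr_algorithm(A)
# for a symmetric matrix the iterates approach a diagonal matrix of eigenvalues
assert np.allclose(np.sort(np.diag(Ak)), np.sort(np.linalg.eigvalsh(A)))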

Rayleigh quotient

To simplify the presentation, in the rest of this section we restrict attention to matrices that are real and symmetric, for which all eigenvalues λ_j are real and the corresponding eigenvectors q_j are orthonormal.

We now consider the question: given a vector x ∈ R^n, what is the real number α ∈ R that best approximates an eigenvalue of A in the sense that ‖Ax − αx‖ is minimized?

If x = q_j is an eigenvector of A, then α = λ_j is the corresponding eigenvalue. If not, α is the solution to the n × 1 least squares problem

min_{α ∈ R} ‖Ax − αx‖,   (6.20)

for which the normal equations are given as

x^T Ax = x^T αx.   (6.21)

With α = r(x), we define the Rayleigh quotient as

r(x) = (x^T A x)/(x^T x),   (6.22)

where r(x) is an approximation of an eigenvalue λ_j, if x is close to the eigenvector q_j. In fact, r(x) converges quadratically to r(q_j) = λ_j, that is

r(x) − r(q_j) = O(‖x − q_j‖^2),   (6.23)

as x → q_j.

Power iteration

For a real symmetric n × n matrix A, the eigenvectors {q_j}_{j=1}^n form an orthonormal basis for R^n, so that we can express any vector v ∈ R^n in terms of the eigenvectors,

v = Σ_{j=1}^n α_j q_j,   (6.24)

with the coordinates α_j = (v, q_j). Further, we can express the map v ↦ Av in terms of the corresponding eigenvalues λ_j, as

Av = Σ_{j=1}^n α_j A q_j = Σ_{j=1}^n α_j λ_j q_j,   (6.25)

and thus the map amounts to a scaling of each eigenvector q_j by λ_j. If iterated, this map gives

A^k v = Σ_{j=1}^n α_j A^k q_j = Σ_{j=1}^n α_j λ_j^k q_j,   (6.26)

so that each scaled eigenvector λ_j^k q_j of A^k converges to zero if |λ_j| < 1, or to infinity if |λ_j| > 1.

Now assume that α_1 = (v, q_1) ≠ 0, and that the eigenvalues of A are ordered such that

|λ_1| > |λ_2| > ... > |λ_n|,   (6.27)

where we say that λ_1 is the dominant eigenvalue, and q_1 the dominant eigenvector. Thus |λ_j/λ_1| < 1 for all j > 1, which implies that

(λ_j/λ_1)^k → 0,   (6.28)

as k → ∞. We can write

A^k v = λ_1^k (α_1 q_1 + Σ_{j=2}^n α_j (λ_j/λ_1)^k q_j),   (6.29)

and thus the approximation

A^k v ≈ λ_1^k α_1 q_1,   (6.30)

improves as k increases. That is, A^k v approaches a multiple of the eigenvector q_1, which can be obtained by normalization, so that

v^(k) ≡ A^k v/‖A^k v‖ ≈ q_1,   (6.31)

from which an approximation of the corresponding eigenvalue can be obtained by the Rayleigh quotient, which leads us to power iteration.

Algorithm 5: Power iteration
v^(0) such that ‖v^(0)‖ = 1
for k = 1, 2, ... do
    w = A v^(k−1)                  (apply A)
    v^(k) = w/‖w‖                  (normalize)
    λ^(k) = (v^(k))^T A v^(k)      (Rayleigh quotient)
end
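A direct Python (NumPy) implementation of Algorithm 5; the random starting vector is an example choice which with high probability satisfies α_1 ≠ 0:

import numpy as np

def power_iteration(A, iterations=100):
    """Power iteration, cf. Algorithm 5."""
    v = np.random.rand(A.shape[0])
    v /= np.linalg.norm(v)
    for _ in range(iterations):
        w = A @ v                     # apply A
        v = w / np.linalg.norm(w)     # normalize
        lam = v @ A @ v               # Rayleigh quotient
    return lam, v

A = np.array([[2.0, 1.0], [1.0, 3.0]])
lam, v = power_iteration(A)
assert np.isclose(lam, np.max(np.linalg.eigvalsh(A)))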

Inverse iteration The convergence of the Power iteration to the dominant eigenvector is linear by a constant factor / ,whereastheconvergencetothedominant | 2 1| 60 CHAPTER 6. THE EIGENVALUE PROBLEM eigenvalue is quadratic in the same factor, due to the convergence of the Rayleigh quotient. Eciency of the algorithm thus depends on the size of the factor 2/1 . The idea of inverse iteration, is to apply power iteration to the| matrix| 1 1 (A µI) ,witheigenvalues (j µ) and with the same eigenvectors as A,since { }

$$Av = \lambda v \;\Leftrightarrow\; (A - \mu I)v = (\lambda - \mu)v \;\Leftrightarrow\; (\lambda - \mu)^{-1} v = (A - \mu I)^{-1} v. \quad (6.32)$$

With $\mu$ an approximation of $\lambda_j$, the eigenvalue $(\lambda_j - \mu)^{-1}$ can be expected to be dominant and much larger than the other eigenvalues, which results in an accelerated convergence of power iteration. Rayleigh quotient iteration is inverse iteration where $\mu$ is updated to the eigenvalue approximation of the previous step of the iteration, and the convergence to an eigenvalue/eigenvector pair is cubic.

Algorithm 6: Rayleigh quotient iteration
Choose $v^{(0)}$ such that $\|v^{(0)}\| = 1$
$\lambda^{(0)} = (v^{(0)})^T A v^{(0)}$
for $k = 1, 2, \ldots$ do
  Solve $(A - \lambda^{(k-1)} I) w = v^{(k-1)}$ for $w$  (apply $(A - \lambda^{(k-1)} I)^{-1}$)
  $v^{(k)} = w / \|w\|$  (normalize)
  $\lambda^{(k)} = (v^{(k)})^T A v^{(k)}$  (Rayleigh quotient)
end
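A corresponding sketch of Algorithm 6 in NumPy, solving the shifted system with a dense factorization at each step; for a large sparse matrix one would use a sparse solver instead, and in practice the iteration is stopped when the shifted matrix becomes nearly singular. The names and fixed iteration count are illustrative assumptions.

```python
import numpy as np

def rayleigh_quotient_iteration(A, v0, num_iters=10):
    """Rayleigh quotient iteration for a real symmetric matrix A."""
    v = v0 / np.linalg.norm(v0)
    lam = v @ A @ v                           # lambda^(0)
    I = np.eye(A.shape[0])
    for _ in range(num_iters):
        w = np.linalg.solve(A - lam * I, v)   # apply (A - lambda I)^{-1}
        v = w / np.linalg.norm(w)             # normalize
        lam = v @ A @ v                       # Rayleigh quotient
    return lam, v

A = np.array([[2.0, 1.0], [1.0, 3.0]])
lam, v = rayleigh_quotient_iteration(A, np.array([1.0, 0.0]))
print(lam)
```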

The QR algorithm as a power iteration We now revisit the QR algorithm. Let Q(k) and R(k) be the matrices gen- erated from the (unshifted) QR algorithm, then the matrix products

$$\underline{Q}^{(k)} = Q^{(1)} Q^{(2)} \cdots Q^{(k)}, \quad (6.33)$$

and

$$\underline{R}^{(k)} = R^{(k)} R^{(k-1)} \cdots R^{(1)}, \quad (6.34)$$

correspond to a QR factorization of the $k$th power of $A$,

$$A^k = \underline{Q}^{(k)} \underline{R}^{(k)}, \quad (6.35)$$

which can be proven by induction.

That is, the QR algorithm constructs successive orthonormal bases for the powers $A^k$, thus functioning like a power iteration that simultaneously iterates on the whole set of approximate eigenvectors. Further, the diagonal elements of the $k$th iterate $A^{(k)}$ are the Rayleigh quotients of $A$ corresponding to the column vectors of $\underline{Q}^{(k)}$,

$$A^{(k)} = (\underline{Q}^{(k)})^T A \underline{Q}^{(k)}, \quad (6.36)$$

and thus the diagonal elements of $A^{(k)}$ converge (quadratically) to the eigenvalues of $A$. With the accelerations (i)-(iii), the QR algorithm exhibits a cubic convergence rate in both eigenvalues and eigenvectors.

6.3 Exercises

Problem 24. Prove that the eigenspace $E_\lambda$, with $\lambda$ an eigenvalue of the matrix $A$, is invariant under $A$.

Chapter 7

Iterative methods for large sparse systems

In this chapter we revisit the problem of solving linear systems of equations, but now in the context of large sparse systems. The price to pay for direct methods based on matrix factorization is that the factors of a sparse matrix may not be sparse, so that for large sparse systems direct methods become too expensive, both in memory and in execution time. Instead we introduce iterative methods, for which matrix sparsity is exploited to develop fast algorithms with a low memory footprint.

7.1 Sparse matrix algebra

Large sparse matrices

We say that a matrix $A \in \mathbb{R}^{n \times n}$ is large if $n$ is large, and that $A$ is sparse if most of its elements are zero. If a matrix is not sparse, we say that the matrix is dense. Whereas for a dense matrix the number of nonzero elements is $O(n^2)$, for a sparse matrix it is only $O(n)$, which has obvious implications for the memory footprint and efficiency of algorithms that exploit the sparsity of a matrix.

A diagonal matrix is a sparse matrix $A = (a_{ij})$ for which $a_{ij} = 0$ for all $i \neq j$, and a diagonal matrix can be generalized to a banded matrix, for which there exists a number $p$, the bandwidth, such that $a_{ij} = 0$ for all $i < j - p$ and $i > j + p$. For example, a tridiagonal matrix $A$ is a banded matrix with $p = 1$,

$$A = \begin{bmatrix} x & x & 0 & 0 & 0 & 0 \\ x & x & x & 0 & 0 & 0 \\ 0 & x & x & x & 0 & 0 \\ 0 & 0 & x & x & x & 0 \\ 0 & 0 & 0 & x & x & x \\ 0 & 0 & 0 & 0 & x & x \end{bmatrix}, \quad (7.1)$$

where $x$ represents a nonzero element.

Compressed row storage

The compressed row storage (CRS) format is a data structure for efficient representation of a sparse matrix by three arrays, containing the nonzero values, the respective column indices, and the extents of the rows. For example, the following sparse matrix

$$A = \begin{bmatrix} 3 & 2 & 0 & 0 & 0 & 0 \\ 0 & 2 & 1 & 0 & 0 & 2 \\ 0 & 2 & 1 & 0 & 0 & 0 \\ 0 & 0 & 3 & 2 & 4 & 0 \\ 0 & 4 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 2 & 3 \end{bmatrix}, \quad (7.2)$$

is represented as

val     = [3 2 2 1 2 2 1 3 2 4 4 1 2 3]
col idx = [1 2 2 3 6 2 3 3 4 5 2 5 5 6]
row ptr = [1 3 6 8 11 13 15]

where val contains the nonzero matrix elements, col idx their column indices, and row ptr the indices in the other two arrays corresponding to the start of each row (with a final entry pointing one past the last nonzero).

Sparse matrix-vector product

For a sparse matrix $A$, algorithms can be constructed for efficient matrix-vector multiplication $b = Ax$, exploiting the sparsity of $A$ by avoiding multiplications by the zero elements of $A$. For example, the CRS data structure implies an efficient algorithm for sparse matrix-vector multiplication, for which both the memory footprint and the number of floating point operations are of the order $O(n)$, rather than $O(n^2)$ as in the case of a dense matrix.

Algorithm 7: Sparse matrix-vector multiplication
for $i = 1:n$ do
  $b_i = 0$
  for $j = \text{row ptr}(i) : \text{row ptr}(i+1) - 1$ do
    $b_i = b_i + \text{val}(j)\, x(\text{col idx}(j))$
  end
end
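A direct transcription of Algorithm 7 into Python, using 0-based indexing instead of the 1-based indexing of the text; the arrays are the CRS representation of the example matrix (7.2).

```python
import numpy as np

# CRS representation of the matrix in (7.2), with 0-based indices
val     = np.array([3, 2, 2, 1, 2, 2, 1, 3, 2, 4, 4, 1, 2, 3], dtype=float)
col_idx = np.array([0, 1, 1, 2, 5, 1, 2, 2, 3, 4, 1, 4, 4, 5])
row_ptr = np.array([0, 2, 5, 7, 10, 12, 14])

def crs_matvec(val, col_idx, row_ptr, x):
    """Sparse matrix-vector product b = A x for A stored in CRS format."""
    n = len(row_ptr) - 1
    b = np.zeros(n)
    for i in range(n):
        for j in range(row_ptr[i], row_ptr[i + 1]):
            b[i] += val[j] * x[col_idx[j]]
    return b

x = np.ones(6)
print(crs_matvec(val, col_idx, row_ptr, x))   # row sums of A
```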

7.2 Iterative methods

Iterative methods for large sparse linear systems

For a given nonsingular matrix $A \in \mathbb{R}^{n \times n}$ and vector $b \in \mathbb{R}^n$, we consider the problem of finding a vector $x \in \mathbb{R}^n$ such that

$$Ax = b, \quad (7.3)$$

where $n$ is large, and the matrix $A$ is sparse. Now, in contrast to direct methods, we do not seek to construct the exact solution $x = A^{-1}b$ by matrix factorization, which is too expensive. Instead we develop iterative methods based on multiplication by a (sparse) iteration matrix, which generates a sequence of approximations $\{x^{(k)}\}_{k \ge 0}$ that converges towards $x$, with the error at iteration $k$ given as

$$e^{(k)} = x - x^{(k)}. \quad (7.4)$$

Error estimation and stopping criterion

Since the exact solution is unknown, the error is not directly computable, but can be expressed in terms of a computable residual,

$$r^{(k)} = b - Ax^{(k)} = Ax - Ax^{(k)} = Ae^{(k)}. \quad (7.5)$$

The relative error can be estimated in terms of the relative residual and the condition number of $A$ with respect to the norm $\|\cdot\|$, defined as

$$\kappa(A) = \|A\|\,\|A^{-1}\|. \quad (7.6)$$

Theorem 11 (Error estimate). For $\{x^{(k)}\}_{k \ge 0}$ a sequence of approximate solutions to the linear system of equations $Ax = b$, the relative error can be estimated as

$$\frac{\|e^{(k)}\|}{\|e^{(0)}\|} \le \kappa(A)\,\frac{\|r^{(k)}\|}{\|r^{(0)}\|}. \quad (7.7)$$

Proof. By (7.5), we have that

$$\|e^{(k)}\| = \|A^{-1} r^{(k)}\| \le \|A^{-1}\|\,\|r^{(k)}\|, \quad (7.8)$$

and similarly

$$\|r^{(0)}\| = \|Ae^{(0)}\| \le \|A\|\,\|e^{(0)}\|, \quad (7.9)$$

from which the result follows by the definition of the condition number.

The error estimate (7.7) may be used as a stopping criterion for the iterative algorithm, since we know that the relative error is bounded by the computable residual. That is, we terminate the iterative algorithm if the following condition is satisfied,

$$\frac{\|r^{(k)}\|}{\|r^{(0)}\|} < TOL, \quad (7.10)$$

with $TOL > 0$ the chosen tolerance. However, to use the relative error with respect to the initial approximation is problematic, since the choice of $x^{(0)}$ may be arbitrary, without significance for the problem at hand. It is often more suitable to formulate a stopping criterion based on the following condition,

$$\frac{\|r^{(k)}\|}{\|b\|} < TOL, \quad (7.11)$$

which corresponds to the choice $x^{(0)} = 0$, for which $r^{(0)} = b$.

Convergence of iterative methods

The iterative methods that we will develop are all based on the idea of fixed point iteration,

$$x^{(k+1)} = g(x^{(k)}), \quad (7.12)$$

where the map $x \mapsto g(x)$ may be a linear transformation in the form of a matrix, or a general nonlinear function. By the Banach fixed point theorem, if the map satisfies certain stability conditions, the fixed point iteration (7.12) generates a Cauchy sequence, that is, a sequence for which the approximations $x^{(k)}$ become closer and closer as $k$ increases.

A Cauchy sequence in a normed vector space $X$ is defined as a sequence $\{x^{(k)}\}_{k=1}^\infty$ for which

$$\lim_{n \to \infty} \|x^{(m)} - x^{(n)}\| = 0, \quad (7.13)$$

with $m > n$. If all Cauchy sequences in $X$ converge to an element $x \in X$, that is,

$$\lim_{n \to \infty} \|x - x^{(n)}\| = 0, \quad x \in X, \quad (7.14)$$

we refer to $X$ as a Banach space. The vector spaces $\mathbb{R}^n$ and $\mathbb{C}^n$ are both Banach spaces, whereas, for example, the vector space of rational numbers $\mathbb{Q}$ is not. To see this, recall that $\sqrt{2}$ is a real number that is the limit of a Cauchy sequence of rational numbers constructed by iterated bisection of the interval $[1, 2]$. Further, a Banach space that is also an inner product space is referred to as a Hilbert space, which is central for the theory of differential equations.

Rate of convergence

We are not only interested in whether an iterative method converges, but also how fast, that is, the rate of convergence. We say that a sequence of approximate solutions $\{x^{(k)}\}_{k=1}^\infty$ converges with order $p$ to the exact solution $x$ if

$$\lim_{k \to \infty} \frac{|x - x^{(k+1)}|}{|x - x^{(k)}|^p} = C, \quad C > 0, \quad (7.15)$$

where $p = 1$ corresponds to a linear order of convergence, and $p = 2$ a quadratic order of convergence. We can approximate the rate of convergence by extrapolation,

$$p \approx \frac{\log\big(|x^{(k+1)} - x^{(k)}| / |x^{(k)} - x^{(k-1)}|\big)}{\log\big(|x^{(k)} - x^{(k-1)}| / |x^{(k-1)} - x^{(k-2)}|\big)}, \quad (7.16)$$

which is useful in practice when the exact solution is not available.
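As a small illustration of (7.16), the following sketch estimates the order of convergence from the last four iterates of a sequence; the test sequence (a fixed point iteration for cosine) is only an example and not part of the text.

```python
import math

def estimate_order(x):
    """Estimate the convergence order p from the last four iterates, cf. (7.16)."""
    d1 = abs(x[-1] - x[-2])
    d2 = abs(x[-2] - x[-3])
    d3 = abs(x[-3] - x[-4])
    return math.log(d1 / d2) / math.log(d2 / d3)

# Example: x_{k+1} = cos(x_k) converges linearly to its fixed point
xs = [1.0]
for _ in range(20):
    xs.append(math.cos(xs[-1]))
print(estimate_order(xs))   # close to 1 for a linearly convergent sequence
```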

7.3 Stationary iterative methods

Stationary iterative methods

Stationary iterative methods are formulated as a linear fixed point iteration of the form

$$x^{(k+1)} = Mx^{(k)} + c, \quad (7.17)$$

with $M \in \mathbb{R}^{n \times n}$ the iteration matrix, $\{x^{(k)}\}_{k \ge 0} \subset \mathbb{R}^n$ a sequence of approximations, and $c \in \mathbb{R}^n$ a vector.

Theorem 12 (Banach fixed point theorem for matrices). If $\|M\| < 1$, the fixed point iteration (7.17) converges to the solution of the equation $x = Mx + c$.

Proof. For any $k > 1$, we have that

$$\|x^{(k+1)} - x^{(k)}\| = \|Mx^{(k)} - Mx^{(k-1)}\| = \|M(x^{(k)} - x^{(k-1)})\| \le \|M\|\,\|x^{(k)} - x^{(k-1)}\| \le \|M\|^k \|x^{(1)} - x^{(0)}\|.$$

Further, for $m > n$,

$$\|x^{(m)} - x^{(n)}\| = \|x^{(m)} - x^{(m-1)} + \ldots + x^{(n+1)} - x^{(n)}\| \le (\|M\|^{m-1} + \ldots + \|M\|^{n})\,\|x^{(1)} - x^{(0)}\|,$$

so that, with $\|M\| < 1$, we have that

$$\lim_{n \to \infty} \|x^{(m)} - x^{(n)}\| = 0, \quad (7.18)$$

that is, $\{x^{(n)}\}_{n=1}^\infty$ is a Cauchy sequence, and since the vector space $\mathbb{R}^n$ is complete, all Cauchy sequences converge, so there exists an $x \in \mathbb{R}^n$ such that

$$x = \lim_{n \to \infty} x^{(n)}. \quad (7.19)$$

By taking the limit of both sides of (7.17) we find that $x$ satisfies the equation $x = Mx + c$.

An equivalent condition for convergence of (7.17) is that the spectral radius $\rho(M) < 1$. In particular, for a real symmetric matrix $A$, the spectral radius is identical to the induced 2-norm, that is, $\rho(A) = \|A\|$.

Richardson iteration

The linear system $Ax = b$ can be formulated as a fixed point iteration through the Richardson iteration, with iteration matrix $M = I - A$,

$$x^{(k+1)} = (I - A)x^{(k)} + b, \quad (7.20)$$

which will converge if $\|I - A\| < 1$, or $\rho(I - A) < 1$. We note that for an initial approximation $x^{(0)} = 0$, we obtain for $k = 0$,

$$x^{(1)} = (I - A)x^{(0)} + b = b,$$

for $k = 1$,

$$x^{(2)} = (I - A)x^{(1)} + b = (I - A)b + b = 2b - Ab,$$

and for $k = 2$,

$$x^{(3)} = (I - A)x^{(2)} + b = (I - A)(2b - Ab) + b = 3b - 3Ab + A^2 b,$$

and more generally, the iterate $x^{(k)}$ is a linear combination of powers of the matrix $A$ acting on $b$, that is,

$$x^{(k)} = \sum_{i=0}^{k-1} \alpha_i A^i b, \quad (7.21)$$

with $\alpha_i \in \mathbb{R}$.

Preconditioned Richardson iteration

To improve the convergence of Richardson iteration we can precondition the system $Ax = b$ by multiplying both sides of the equation by a matrix $B$, so that we get the new system

$$BAx = Bb, \quad (7.22)$$

for which Richardson iteration will converge if $\|I - BA\| < 1$, or equivalently $\rho(I - BA) < 1$, and we then refer to $B$ as an approximate inverse of $A$. The preconditioned Richardson iteration takes the form

$$x^{(k+1)} = (I - BA)x^{(k)} + Bb, \quad (7.23)$$

and the preconditioned residual $Bb - BAx^{(k)}$ is used as the basis for a stopping criterion.
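A minimal sketch of the (preconditioned) Richardson iteration (7.20) and (7.23); with $B$ the identity it reduces to the unpreconditioned method, and the example uses the Jacobi preconditioner $B = D^{-1}$ of the next section. The particular stopping rule and test matrix are assumptions of this example.

```python
import numpy as np

def richardson(A, b, B=None, tol=1e-8, max_iter=1000):
    """Preconditioned Richardson iteration x^{k+1} = (I - BA)x^k + Bb."""
    n = len(b)
    if B is None:
        B = np.eye(n)                  # unpreconditioned iteration
    x = np.zeros(n)
    r0 = np.linalg.norm(B @ b)
    for _ in range(max_iter):
        r = B @ (b - A @ x)            # preconditioned residual
        if np.linalg.norm(r) < tol * r0:
            break
        x = x + r                      # x + B(b - Ax) = (I - BA)x + Bb
    return x

A = np.array([[2.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
B = np.diag(1.0 / np.diag(A))          # Jacobi preconditioner B = D^{-1}
print(richardson(A, b, B))
```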

Iterative methods based on matrix splitting An alternative to Richardson iteration is matrix splitting,wherestationary iterative methods are formulated based on splitting the matrix into a sum

$$A = A_1 + A_2, \quad (7.24)$$

where $A_1$ is chosen as a nonsingular matrix that is easy to invert, such as a diagonal matrix $D$, a lower triangular matrix $L$, or an upper triangular matrix $U$.

Jacobi iteration Jacobi iteration is based on the splitting

$$A_1 = D, \quad A_2 = R = A - D, \quad (7.25)$$

which gives the iteration matrix $M_J = -D^{-1}R$ and $c = D^{-1}b$, or in terms of the elements of $A = (a_{ij})$,

$$x_i^{(k+1)} = a_{ii}^{-1}\Big(b_i - \sum_{j \ne i} a_{ij} x_j^{(k)}\Big), \quad (7.26)$$

where the diagonal matrix $D$ is trivial to invert. To use Jacobi iteration as a preconditioner, we choose $B = D^{-1}$.
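A sketch of Jacobi iteration (7.26) in NumPy; the vectorized update via the splitting $A = D + R$ and the diagonally dominant test matrix are implementation choices of this example, not prescribed by the text.

```python
import numpy as np

def jacobi(A, b, tol=1e-8, max_iter=1000):
    """Jacobi iteration for Ax = b based on the splitting A = D + R."""
    D = np.diag(A)                     # diagonal of A
    R = A - np.diag(D)                 # off-diagonal part
    x = np.zeros_like(b)
    for _ in range(max_iter):
        x_new = (b - R @ x) / D        # x_i = a_ii^{-1}(b_i - sum_{j!=i} a_ij x_j)
        if np.linalg.norm(x_new - x) < tol:
            return x_new
        x = x_new
    return x

A = np.array([[4.0, 1.0, 0.0], [1.0, 4.0, 1.0], [0.0, 1.0, 4.0]])
b = np.array([1.0, 2.0, 3.0])
print(jacobi(A, b))
```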

Gauss-Seidel iteration Gauss-Seidel iteration is based on the splitting

$$A_1 = L, \quad A_2 = R = A - L, \quad (7.27)$$

which gives the iteration matrix $M_{GS} = -L^{-1}R$ and $c = L^{-1}b$, or

$$x_i^{(k+1)} = a_{ii}^{-1}\Big(b_i - \sum_{j < i} a_{ij} x_j^{(k+1)} - \sum_{j > i} a_{ij} x_j^{(k)}\Big), \quad (7.28)$$

where the lower triangular matrix $L$ is inverted by forward substitution. Gauss-Seidel iteration as a preconditioner leads to the choice $B = L^{-1}$, where the inversion corresponds to a forward substitution.

7.4 Krylov methods

Krylov subspace

A Krylov method is an iterative method for the solution of the system $Ax = b$, based on, for each iteration, finding an approximation $x^{(k)} \approx x = A^{-1}b$ in a Krylov subspace $\mathcal{K}_k$, spanned by the vectors $b, Ab, \ldots, A^{k-1}b$, that is,

$$\mathcal{K}_k = \langle b, Ab, \ldots, A^{k-1}b \rangle. \quad (7.29)$$

The basis for Krylov methods is that, by the Cayley-Hamilton theorem, the inverse of a matrix $A$ is a linear combination of its powers $A^k$, which is also expressed in (7.21).

GMRES

The idea of GMRES (generalized minimal residuals) is, at each step $k$ of the iteration, to find the vector $x^{(k)} \in \mathcal{K}_k$ that minimizes the norm of the residual $r^{(k)} = b - Ax^{(k)}$, which corresponds to the least squares problem

$$\min_{x^{(k)} \in \mathcal{K}_k} \|b - Ax^{(k)}\|. \quad (7.30)$$

But instead of expressing the approximation $x^{(k)}$ as a linear combination of the Krylov vectors $b, Ab, \ldots, A^{k-1}b$, which leads to an unstable algorithm, we construct an orthonormal basis $\{q_j\}_{j=1}^k$ for $\mathcal{K}_k$, such that

$$\mathcal{K}_k = \langle q_1, q_2, \ldots, q_k \rangle, \quad (7.31)$$

with $Q_k$ the $n \times k$ matrix with the basis vectors $q_j$ as columns. Thus we can express the approximation as $x^{(k)} = Q_k y$, with $y \in \mathbb{R}^k$ a vector with the coordinates of $x^{(k)}$, so that the least squares problem takes the form

$$\min_{y \in \mathbb{R}^k} \|b - AQ_k y\|. \quad (7.32)$$

Algorithm 8: Arnoldi iteration

$q_1 = b / \|b\|$
for $k = 1, 2, 3, \ldots$ do
  $v = Aq_k$
  for $j = 1:k$ do
    $h_{jk} = q_j^T v$
    $v = v - h_{jk} q_j$
  end
  $h_{k+1,k} = \|v\|$
  $q_{k+1} = v / h_{k+1,k}$
end

The Arnoldi iteration is just the modified Gram-Schmidt iteration (Algorithm 1) that constructs a partial similarity transformation of $A$ into a Hessenberg matrix $\tilde{H}_k \in \mathbb{R}^{(k+1) \times k}$,

$$AQ_k = Q_{k+1}\tilde{H}_k, \quad (7.33)$$

that is,

$$\begin{bmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{n1} & \cdots & a_{nn} \end{bmatrix} \begin{bmatrix} q_1 & \cdots & q_k \end{bmatrix} = \begin{bmatrix} q_1 & \cdots & q_{k+1} \end{bmatrix} \begin{bmatrix} h_{11} & \cdots & h_{1k} \\ h_{21} & \ddots & \vdots \\ & \ddots & \vdots \\ & & h_{k+1,k} \end{bmatrix}.$$

Multiplication of (7.32) by $Q_{k+1}^T$ does not change the norm, so that the least squares problem takes the form

$$\min_{y \in \mathbb{R}^k} \|Q_{k+1}^T b - \tilde{H}_k y\|, \quad (7.34)$$

where we note that since $q_1 = b/\|b\|$, we have that $Q_{k+1}^T b = \|b\| e_1$, with $e_1 = (1, 0, \ldots, 0)^T$ the first vector in the standard basis of $\mathbb{R}^{k+1}$, so that we can write (7.34) as

$$\min_{y \in \mathbb{R}^k} \|\,\|b\| e_1 - \tilde{H}_k y\|, \quad (7.35)$$

which is a $(k+1) \times k$ least squares problem that we solve for $y \in \mathbb{R}^k$ at each iteration $k$, to get $x^{(k)} = Q_k y$.

Algorithm 9: GMRES

$q_1 = b / \|b\|$
while $\|r^{(k)}\| / \|r^{(0)}\| > TOL$ do
  Arnoldi iteration step $k$ $\to Q_k, \tilde{H}_k$  (orthogonalize)
  $\min_{y \in \mathbb{R}^k} \|\,\|b\| e_1 - \tilde{H}_k y\|$  (least squares problem)
  $x^{(k)} = Q_k y$  (construct solution)
end
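The following is a compact, hedged sketch of Algorithm 9 built directly on the Arnoldi iteration (Algorithm 8) and the small least squares problem (7.35); it stores the full basis $Q_k$ and is meant to illustrate the structure of the method rather than to be an efficient implementation (restarting, Givens rotations, and breakdown handling are omitted).

```python
import numpy as np

def gmres(A, b, tol=1e-8, max_iter=50):
    """Basic GMRES: minimize ||b - A Q_k y|| over the Krylov subspace K_k."""
    n = len(b)
    Q = np.zeros((n, max_iter + 1))
    H = np.zeros((max_iter + 1, max_iter))
    beta = np.linalg.norm(b)
    Q[:, 0] = b / beta
    x = np.zeros(n)
    for k in range(max_iter):
        # Arnoldi step (modified Gram-Schmidt)
        v = A @ Q[:, k]
        for j in range(k + 1):
            H[j, k] = Q[:, j] @ v
            v = v - H[j, k] * Q[:, j]
        H[k + 1, k] = np.linalg.norm(v)
        if H[k + 1, k] > 1e-14:
            Q[:, k + 1] = v / H[k + 1, k]
        # Solve the small least squares problem min ||beta*e1 - H y||
        e1 = np.zeros(k + 2)
        e1[0] = beta
        y, *_ = np.linalg.lstsq(H[:k + 2, :k + 1], e1, rcond=None)
        x = Q[:, :k + 1] @ y
        if np.linalg.norm(b - A @ x) / beta < tol:
            break
    return x

A = np.array([[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]])
b = np.array([1.0, 2.0, 3.0])
print(gmres(A, b))
```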

Conjugate gradient method

For a symmetric positive definite matrix $A$, we can define the $A$-norm of a vector $x \in \mathbb{R}^n$ as

$$\|x\|_A = (x, Ax)^{1/2}, \quad (7.36)$$

with $(\cdot, \cdot)$ the $l_2$ inner product. The conjugate gradient method (CG) is based on minimization of the error $e^{(k)} = x - x^{(k)}$ in the $A$-norm, or equivalently, by (7.5), minimization of the residual $r^{(k)} = b - Ax^{(k)}$ in the $A^{-1}$-norm,

$$\|e^{(k)}\|_A = (e^{(k)}, Ae^{(k)})^{1/2} = (e^{(k)}, r^{(k)})^{1/2} = (A^{-1}r^{(k)}, r^{(k)})^{1/2} = \|r^{(k)}\|_{A^{-1}},$$

to be compared to GMRES, where the residual is minimized in the $l_2$-norm. Further, to solve the minimization problem in CG we do not solve a least squares problem over the Krylov subspace $\mathcal{K}_k$; instead we iteratively construct a search direction $p^{(k)}$ and a step length $\alpha^{(k)}$ to find the new approximate solution $x^{(k)}$ from the previous iterate $x^{(k-1)}$. In particular, this means that we do not have to store the full Krylov basis.

Algorithm 10: Conjugate gradient method
$x^{(0)} = 0$, $r^{(0)} = b$, $p^{(0)} = r^{(0)}$
while $\|r^{(k)}\| / \|r^{(0)}\| > TOL$ do
  $\alpha^{(k)} = \|r^{(k-1)}\|^2 / \|p^{(k-1)}\|_A^2$  (step length)
  $x^{(k)} = x^{(k-1)} + \alpha^{(k)} p^{(k-1)}$  (approximate solution)
  $r^{(k)} = r^{(k-1)} - \alpha^{(k)} A p^{(k-1)}$  (residual)
  $\beta^{(k)} = \|r^{(k)}\|^2 / \|r^{(k-1)}\|^2$  (improvement)
  $p^{(k)} = r^{(k)} + \beta^{(k)} p^{(k-1)}$  (search direction)
end
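A corresponding NumPy sketch of Algorithm 10; the squared-norm expressions for the step length and the improvement factor follow the reading of the algorithm above, and the small SPD test matrix is an assumption of the example.

```python
import numpy as np

def conjugate_gradient(A, b, tol=1e-10, max_iter=None):
    """Conjugate gradient method for Ax = b with A symmetric positive definite."""
    n = len(b)
    max_iter = max_iter or n
    x = np.zeros(n)
    r = b.copy()                       # r^(0) = b for x^(0) = 0
    p = r.copy()                       # p^(0) = r^(0)
    rs = r @ r
    r0 = np.sqrt(rs)
    for _ in range(max_iter):
        Ap = A @ p
        alpha = rs / (p @ Ap)          # ||r||^2 / ||p||_A^2
        x = x + alpha * p
        r = r - alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol * r0:
            break
        beta = rs_new / rs             # ||r_new||^2 / ||r_old||^2
        p = r + beta * p
        rs = rs_new
    return x

A = np.array([[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]])
b = np.array([1.0, 2.0, 3.0])
print(conjugate_gradient(A, b))
```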

The key to the success of the CG method is that the residuals are mutually orthogonal,

$$(r^{(k)}, r^{(j)}) = 0, \quad \forall j < k, \quad (7.37)$$

and that the search directions are $A$-conjugate, that is, orthogonal in the $A$-inner product,

$$(p^{(k)}, p^{(j)})_A = 0, \quad \forall j < k, \quad (7.38)$$

where the $A$-inner product, for $A$ a symmetric positive definite matrix, satisfies

$$(x, y)_A = x^T A y = (Ay)^T x = y^T A^T x = y^T A x = (y, x)_A, \quad (7.39)$$

where we note that $(\cdot, \cdot)_A$ induces the $A$-norm,

$$\|x\|_A = (x, x)_A^{1/2}, \quad (7.40)$$

which is also referred to as the energy norm for the equation $Ax = b$.

Theorem 13 (CG characteristics). For the CG method applied to the equation $Ax = b$, with $A$ an $n \times n$ symmetric positive definite matrix, the orthogonality relations (7.37) and (7.38) hold, and

$$\mathcal{K}_k = \langle b, Ab, \ldots, A^{k-1}b \rangle = \langle x^{(1)}, x^{(2)}, \ldots, x^{(k)} \rangle = \langle p^{(0)}, p^{(1)}, \ldots, p^{(k-1)} \rangle = \langle r^{(0)}, r^{(1)}, \ldots, r^{(k-1)} \rangle,$$

with the approximate solutions $x^{(k)}$, search directions $p^{(k)}$, and residuals $r^{(k)}$ constructed from Algorithm 10. Further, $x^{(k)}$ is the unique point in $\mathcal{K}_k$ that minimizes $\|e^{(k)}\|_A$, and the convergence is monotonic, that is,

$$\|e^{(k)}\|_A \le \|e^{(k-1)}\|_A, \quad (7.41)$$

with $e^{(k)} = 0$ for some $k \le n$.

7.5 Exercises

Problem 25. What is the representation of the following sparse matrix in the CRS format?

$$\begin{bmatrix} 2 & -1 & 0 & 0 & 0 \\ -1 & 2 & -1 & 0 & 0 \\ 0 & -1 & 2 & -1 & 0 \\ 0 & 0 & -1 & 2 & -1 \\ 0 & 0 & 0 & -1 & 2 \end{bmatrix} \quad (7.42)$$

$$x^{(k+1)} = (I - A)x^{(k)} + b. \quad (7.43)$$

(a) Under what condition does the iteration converge to a solution of the equation $Ax = b$?

(b) Prove that with $x^{(0)} = 0$, the Richardson iterate $x^{(4)}$ is a linear combination of the vectors $\{b, Ab, A^2b, A^3b\}$.

(c) Prove that the residual $r^{(k)} = b - Ax^{(k)}$ is equal to $Ae^{(k)}$, with the error $e^{(k)} = x - x^{(k)}$.

(d) State the Richardson iteration of the preconditioned system $BAx = Bb$.

Chapter 8

Nonlinear algebraic equations

We now turn to systems of nonlinear algebraic equations that cannot be expressed as matrix equations. The fundamental idea to solve such equations is fixed point iteration, which we have met previously for linear systems of equations, in the context of stationary iterative methods. Fixed point iteration converges linearly, with a rate determined by the Lipschitz constant of the iteration function. In case we have access to a sufficiently good initial guess, we can formulate Newton's method, which exhibits a quadratic rate of convergence.

8.1 Continuous functions

A continuous function can be roughly characterized as a function for which small changes in input result in small changes in output. More formally, a function $f: I \to \mathbb{R}$ is said to be (uniformly) continuous on the interval $I = [a, b]$ if for each $\epsilon > 0$ we can find a $\delta > 0$ such that

$$|x - y| < \delta \;\Rightarrow\; |f(x) - f(y)| < \epsilon, \quad \forall x, y \in I, \quad (8.1)$$

and Lipschitz continuous on the interval $I$ if there exists a real number $L_f > 0$, the Lipschitz constant of $f$, such that

$$|f(x) - f(y)| \le L_f |x - y|, \quad \forall x, y \in I. \quad (8.2)$$

The vector space of real valued continuous functions on the interval $I$ is denoted by $\mathcal{C}^0(I)$, or $\mathcal{C}(I)$, which is closed under the basic operations of pointwise addition and scalar multiplication, defined by

$$(f + g)(x) = f(x) + g(x), \quad \forall x \in I, \quad (8.3)$$
$$(\alpha f)(x) = \alpha f(x), \quad \forall x \in I, \quad (8.4)$$

for $f, g \in \mathcal{C}(I)$ and $\alpha \in \mathbb{R}$. Similarly, we let $\mathrm{Lip}(I)$ denote the vector space of Lipschitz continuous functions together with the same basic operations, and we note that any Lipschitz continuous function is also continuous, that is, $\mathrm{Lip}(I) \subset \mathcal{C}(I)$. Further, we denote the vector space of continuous functions with continuous derivatives up to order $k$ by $\mathcal{C}^k(I)$, with $\mathcal{C}^\infty(I)$ the vector space of continuous functions with continuous derivatives of arbitrary order.

Local approximation of continuous functions

By Taylor's theorem we can construct a local approximation of a function $f \in \mathcal{C}^k(I)$ near any point $y \in I$, in terms of the function and its first $k$ derivatives evaluated at the point $y$. For example, we can approximate $f(x)$ by a linear function,

$$f(x) \approx f(y) + f'(y)(x - y), \quad (8.5)$$

corresponding to the tangent line of the function at $x = y$, with the approximation error reduced quadratically when decreasing the distance $|x - y|$.

Theorem 14 (Taylor's theorem). For $f \in \mathcal{C}^2(I)$, we have that

$$f(x) = f(y) + f'(y)(x - y) + \frac{1}{2} f''(\xi)(x - y)^2, \quad (8.6)$$

for $y \in I$ and $\xi \in [x, y]$.

Figure 8.1: The tangent line $f(y) + f'(y)(x - y)$ at $y \in [a, b]$.

8.2 Nonlinear scalar equations

Fixed point iteration

For a nonlinear function $f: \mathbb{R} \to \mathbb{R}$, we seek a solution $x$ to the equation

$$f(x) = 0, \quad (8.7)$$

for which we can formulate a fixed point iteration $x^{(k+1)} = g(x^{(k)})$, as

$$x^{(k+1)} = g(x^{(k)}) = x^{(k)} + \alpha f(x^{(k)}), \quad (8.8)$$

where $\alpha$ is a parameter to be chosen. The fixed point iteration (8.8) converges to a unique fixed point $x = g(x)$ that satisfies equation (8.7), under the condition that the function $g \in \mathrm{Lip}(\mathbb{R})$ with $L_g < 1$, which we can prove by arguments similar to the case of a linear system of equations (7.17). For any $k > 1$, we have that

$$|x^{(k+1)} - x^{(k)}| = |g(x^{(k)}) - g(x^{(k-1)})| \le L_g |x^{(k)} - x^{(k-1)}| \le L_g^k |x^{(1)} - x^{(0)}|,$$

so that for $m > n$,

$$|x^{(m)} - x^{(n)}| = |x^{(m)} - x^{(m-1)} + \ldots + x^{(n+1)} - x^{(n)}| \le (L_g^{m-1} + \ldots + L_g^{n})\,|x^{(1)} - x^{(0)}|,$$

by the triangle inequality, and with $L_g < 1$ we have that $\{x^{(n)}\}_{n=1}^\infty$ is a Cauchy sequence,

$$\lim_{n \to \infty} |x^{(m)} - x^{(n)}| = 0, \quad (8.9)$$

which implies that there exists an $x \in \mathbb{R}$ such that

$$\lim_{n \to \infty} |x - x^{(n)}| = 0, \quad (8.10)$$

since $\mathbb{R}$ is a Banach space. Uniqueness of $x$ follows from assuming that there exists another solution $y \in \mathbb{R}$ such that $y = g(y)$, for which we have that

$$|x - y| = |g(x) - g(y)| \le L_g |x - y| \;\Rightarrow\; (1 - L_g)|x - y| \le 0 \;\Rightarrow\; |x - y| = 0,$$

and thus $x = y$ is the unique solution to the equation $x = g(x)$.

Newton's method

The analysis above suggests that the fixed point iteration (8.8) converges linearly, since

$$|x - x^{(k+1)}| = |g(x) - g(x^{(k)})| \le L_g |x - x^{(k)}|, \quad (8.11)$$

so that for the error $e^{(k)} = x - x^{(k)}$ we have that $|e^{(k+1)}| \le L_g |e^{(k)}|$. However, for the choice $\alpha = -f'(x^{(k)})^{-1}$, which we refer to as Newton's method, the fixed point iteration (8.8) exhibits quadratic convergence. The geometric interpretation of Newton's method is that $x^{(k+1)}$ is determined from the tangent line of the function $f(x)$ at $x^{(k)}$.

Algorithm 11: Newton's method
Given an initial approximation $x^{(0)} \in \mathbb{R}$ and $f: \mathbb{R} \to \mathbb{R}$
while $|f(x^{(k)})| > TOL$ do
  Compute $f'(x^{(k)})$
  $x^{(k+1)} = x^{(k)} - f'(x^{(k)})^{-1} f(x^{(k)})$
end
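A minimal sketch of Algorithm 11, applied here to $f(x) = x^2 - 2$ as an illustrative example; the function, its derivative, and the tolerance are assumptions of the example, not part of the text.

```python
def newton(f, df, x0, tol=1e-12, max_iter=50):
    """Newton's method for the scalar equation f(x) = 0."""
    x = x0
    for _ in range(max_iter):
        if abs(f(x)) < tol:
            break
        x = x - f(x) / df(x)           # x^(k+1) = x^(k) - f'(x^(k))^{-1} f(x^(k))
    return x

# Example: the positive root of f(x) = x^2 - 2, i.e. sqrt(2)
root = newton(lambda x: x**2 - 2, lambda x: 2 * x, x0=1.0)
print(root)
```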

Figure 8.2: Geometric interpretation of Newton's method, with the approximation $x^{(k+1)}$ obtained as the zero of the tangent at $x^{(k)}$.

The quadratic convergence rate of Newton's method follows from Taylor's theorem, evaluated at $x^{(k)}$,

$$0 = f(x) = f(x^{(k)}) + f'(x^{(k)})(x - x^{(k)}) + \frac{1}{2} f''(\xi)(x - x^{(k)})^2, \quad (8.12)$$

with $\xi \in [x, x^{(k)}]$. We divide by $f'(x^{(k)})$ to get

$$x - (x^{(k)} - f'(x^{(k)})^{-1} f(x^{(k)})) = -\frac{1}{2} f'(x^{(k)})^{-1} f''(\xi)(x - x^{(k)})^2, \quad (8.13)$$

so that, with $x^{(k+1)} = x^{(k)} - f'(x^{(k)})^{-1} f(x^{(k)})$, we get

$$|e^{(k+1)}| = \frac{1}{2} |f'(x^{(k)})^{-1} f''(\xi)|\, |e^{(k)}|^2, \quad (8.14)$$

which displays the quadratic convergence of the sequence $x^{(k)}$ close to $x$.

8.3 Systems of nonlinear equations

Continuous functions and derivatives in $\mathbb{R}^n$

In the normed space $\mathbb{R}^n$, a function $f: \mathbb{R}^n \to \mathbb{R}^n$ is (uniformly) continuous, denoted $f \in \mathcal{C}(\mathbb{R}^n)$, if for each $\epsilon > 0$ we can find a $\delta > 0$ such that

$$\|x - y\| < \delta \;\Rightarrow\; \|f(x) - f(y)\| < \epsilon, \quad \forall x, y \in \mathbb{R}^n, \quad (8.15)$$

and Lipschitz continuous, denoted $f \in \mathrm{Lip}(\mathbb{R}^n)$, if there exists a real number $L_f > 0$ such that

$$\|f(x) - f(y)\| \le L_f \|x - y\|, \quad \forall x, y \in \mathbb{R}^n. \quad (8.16)$$

We define the partial derivative as

$$\frac{\partial f_i}{\partial x_j} = \lim_{h \to 0} \frac{f_i(x_1, \ldots, x_j + h, \ldots, x_n) - f_i(x_1, \ldots, x_j, \ldots, x_n)}{h}, \quad (8.17)$$

for $i$ the index of the function component of $f = (f_1, \ldots, f_n)$, and $j$ the index of the coordinate $x = (x_1, \ldots, x_n)$. The Jacobian matrix $f' \in \mathbb{R}^{n \times n}$, at $x \in \mathbb{R}^n$, is defined as

$$f' = \begin{bmatrix} \frac{\partial f_1}{\partial x_1} & \cdots & \frac{\partial f_1}{\partial x_n} \\ \vdots & \ddots & \vdots \\ \frac{\partial f_n}{\partial x_1} & \cdots & \frac{\partial f_n}{\partial x_n} \end{bmatrix} = \begin{bmatrix} (\nabla f_1)^T \\ \vdots \\ (\nabla f_n)^T \end{bmatrix}, \quad (8.18)$$

with the gradient $\nabla f_i(x) \in \mathbb{R}^n$ defined by

$$\nabla f_i = \left( \frac{\partial f_i}{\partial x_1}, \ldots, \frac{\partial f_i}{\partial x_n} \right)^T, \quad (8.19)$$

for $i = 1, \ldots, n$. The vector space of continuous functions $f: \mathbb{R}^n \to \mathbb{R}^n$ with continuous partial derivatives up to order $k$ is denoted by $\mathcal{C}^k(\mathbb{R}^n)$, with $\mathcal{C}^\infty(\mathbb{R}^n)$ the vector space of continuous functions with continuous derivatives of arbitrary order.

Fixed point iteration for nonlinear systems

Now consider a system of nonlinear equations: find $x \in \mathbb{R}^n$ such that

$$f(x) = 0, \quad (8.20)$$

with $f: \mathbb{R}^n \to \mathbb{R}^n$, for which we can formulate a fixed point iteration $x^{(k+1)} = g(x^{(k)})$, with $g: \mathbb{R}^n \to \mathbb{R}^n$, just as in the case of the scalar problem.

Algorithm 12: Fixed point iteration for solving the system $f(x) = 0$
Given an initial approximation $x^{(0)} \in \mathbb{R}^n$ and $f: \mathbb{R}^n \to \mathbb{R}^n$
while $\|f(x^{(k)})\| > TOL$ do
  $x^{(k+1)} = x^{(k)} + \alpha f(x^{(k)})$
end

Existence of a unique solution to the fixed point iteration follows by the Banach fixed point theorem.

Theorem 15 (Banach fixed point theorem in $\mathbb{R}^n$). The fixed point iteration of Algorithm 12 converges to a unique solution if $L_g < 1$, with $L_g$ the Lipschitz constant of the function $g(x) = x + \alpha f(x)$.

Proof. For $k > 1$ we have that

$$\|x^{(k+1)} - x^{(k)}\| = \|x^{(k)} - x^{(k-1)} + \alpha(f(x^{(k)}) - f(x^{(k-1)}))\| = \|g(x^{(k)}) - g(x^{(k-1)})\| \le L_g^k \|x^{(1)} - x^{(0)}\|,$$

and for $m > n$,

$$\|x^{(m)} - x^{(n)}\| = \|x^{(m)} - x^{(m-1)} + \ldots + x^{(n+1)} - x^{(n)}\| \le (L_g^{m-1} + \ldots + L_g^{n})\,\|x^{(1)} - x^{(0)}\|.$$

Since $L_g < 1$, $\{x^{(n)}\}_{n=1}^\infty$ is a Cauchy sequence, which implies that there exists an $x \in \mathbb{R}^n$ such that

$$\lim_{n \to \infty} \|x - x^{(n)}\| = 0,$$

since $\mathbb{R}^n$ is a Banach space. Uniqueness follows from assuming that there exists another solution $y \in \mathbb{R}^n$ such that $f(y) = 0$, so that

$$\|x - y\| = \|g(x) - g(y)\| \le L_g \|x - y\| \;\Rightarrow\; (1 - L_g)\|x - y\| \le 0,$$

and thus $x = y$ is the unique solution to the equation $f(x) = 0$.

Newton's method for nonlinear systems

Newton's method for a system in $\mathbb{R}^n$ is analogous to the method for scalar equations, but with the inverse of the derivative replaced by the inverse of the Jacobian matrix $(f'(x^{(k)}))^{-1}$. The inverse is not constructed explicitly; instead we solve a linear system of equations $A\Delta x = b$, with $A = f'(x^{(k)})$, $\Delta x = \Delta x^{(k+1)} = x^{(k+1)} - x^{(k)}$ the increment, and $b = -f(x^{(k)})$ the residual. Depending on how the Jacobian is computed, and how the linear system is solved, we get different versions of Newton's method. If the system is large and sparse, we use iterative methods for solving the linear system $A\Delta x = b$, and if the Jacobian $f'(\cdot)$ is not directly available we use an approximation, obtained, for example, by a difference approximation based on $f(\cdot)$.

Algorithm 13: Newton's method for systems of nonlinear equations
Given an initial approximation $x^{(0)} \in \mathbb{R}^n$ and $f: \mathbb{R}^n \to \mathbb{R}^n$
while $\|f(x^{(k)})\| > TOL$ do
  Compute $f'(x^{(k)})$  (compute Jacobian)
  Solve $f'(x^{(k)})\,\Delta x^{(k+1)} = -f(x^{(k)})$  (solve for $\Delta x^{(k+1)}$)
  $x^{(k+1)} = x^{(k)} + \Delta x^{(k+1)}$  (update by $\Delta x^{(k+1)}$)
end
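A sketch of Algorithm 13 using a dense linear solve for the Newton increment; here the Jacobian is supplied analytically, whereas the text also allows a difference approximation. The example system is an assumption for illustration.

```python
import numpy as np

def newton_system(f, jac, x0, tol=1e-10, max_iter=50):
    """Newton's method for f(x) = 0, f: R^n -> R^n, with Jacobian jac(x)."""
    x = x0.astype(float)
    for _ in range(max_iter):
        fx = f(x)
        if np.linalg.norm(fx) < tol:
            break
        dx = np.linalg.solve(jac(x), -fx)   # solve f'(x) dx = -f(x)
        x = x + dx                           # update by the increment
    return x

# Example: intersection of the circle x^2 + y^2 = 1 with the line y = x
f = lambda x: np.array([x[0]**2 + x[1]**2 - 1.0, x[1] - x[0]])
jac = lambda x: np.array([[2 * x[0], 2 * x[1]], [-1.0, 1.0]])
print(newton_system(f, jac, np.array([1.0, 0.5])))   # close to (1/sqrt(2), 1/sqrt(2))
```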

The quadratic convergence rate of Newton's method for systems follows from Taylor's formula in $\mathbb{R}^n$, which states that

$$f(x) - (f(x^{(k)}) + f'(x^{(k)})(x - x^{(k)})) = O(\|x - x^{(k)}\|^2).$$

For $f(x) = 0$, and assuming the Jacobian matrix $f'(x^{(k)})$ to be nonsingular, we have that

$$x - (x^{(k)} - f'(x^{(k)})^{-1} f(x^{(k)})) = O(\|x - x^{(k)}\|^2),$$

and with $x^{(k+1)} = x^{(k)} - f'(x^{(k)})^{-1} f(x^{(k)})$, we get that

$$\frac{\|e^{(k+1)}\|}{\|e^{(k)}\|^2} = O(1), \quad (8.21)$$

for the error $e^{(k)} = x - x^{(k)}$. The quadratic convergence rate then follows for $x^{(k)}$ close to $x$.

8.4 Exercises

Problem 27. State a fixed point iteration to solve $f(x) = \sin(x) - x = 0$.

Problem 28. State Newton's method to solve $f(x) = \sin(x) - x = 0$.

Part IV

Differential equations


Chapter 9

Initial value problems

Differential equations are fundamental to model the laws of Nature, such as Newton's laws of motion, Einstein's equations of general relativity, Schrödinger's equation of quantum mechanics, and Maxwell's equations of electromagnetics. The case of a single independent variable we refer to as an ordinary differential equation, whereas partial differential equations involve several independent variables. We first consider the initial value problem, an ordinary differential equation where the independent variable naturally represents time, for which we develop solution methods based on time-stepping algorithms.

9.1 The scalar initial value problem

We consider the following ordinary differential equation (ODE) for a scalar function $u: \mathbb{R}^+ \to \mathbb{R}$, with derivative $\dot{u} = du/dt$,

$$\dot{u}(t) = f(u(t), t), \quad 0 < t \le T, \qquad u(0) = u_0, \quad (9.1)$$

with $u_0$ a given initial condition.


9.2 Time stepping methods

The variable $t \in [0, T]$ is often interpreted as time, and numerical methods to solve (9.1) can be based on the idea of time-stepping, where successive approximations $U(t_n)$ are computed on a partition $0 = t_0 < t_1 < \ldots < t_N = T$, with time steps $k_n = t_n - t_{n-1}$.

Figure 9.1: Partition of the interval $I = [0, T]$, $0 = t_0 < t_1 < \ldots < t_N = T$, with subintervals $I_n = (t_{n-1}, t_n)$ of length $k_n = t_n - t_{n-1}$.

Forward Euler method

For node $t_{n-1}$, we may approximate the derivative $\dot{u}(t_{n-1})$ by

$$\dot{u}(t_{n-1}) \approx \frac{u(t_n) - u(t_{n-1})}{k_n}, \quad (9.2)$$

so that

$$u(t_n) \approx u(t_{n-1}) + k_n \dot{u}(t_{n-1}) = u(t_{n-1}) + k_n f(u(t_{n-1}), t_{n-1}), \quad (9.3)$$

which motivates the forward Euler method for successive computation of the approximations $U_n = U(t_n)$.

Algorithm 14: Forward Euler method

$U_0 = u_0$  (initial approximation)
for $n = 1, 2, \ldots, N$ do
  $U_n = U_{n-1} + k_n f(U_{n-1}, t_{n-1})$  (explicit update)
end
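A sketch of Algorithm 14 with uniform time steps; the test problem $\dot{u} = -u$, $u(0) = 1$, is an assumption of the example.

```python
import numpy as np

def forward_euler(f, u0, T, N):
    """Forward Euler for the scalar IVP du/dt = f(u, t), u(0) = u0, on [0, T]."""
    k = T / N                        # uniform time step k_n = k
    t = np.linspace(0.0, T, N + 1)
    U = np.zeros(N + 1)
    U[0] = u0                        # initial approximation
    for n in range(1, N + 1):
        U[n] = U[n - 1] + k * f(U[n - 1], t[n - 1])   # explicit update
    return t, U

# Example: du/dt = -u, u(0) = 1, with exact solution exp(-t)
t, U = forward_euler(lambda u, t: -u, u0=1.0, T=1.0, N=100)
print(U[-1])   # approximately exp(-1)
```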

We note that the forward Euler method is explicit, meaning that $U_n$ is directly computable from the previous solution $U_{n-1}$ in the time-stepping algorithm. The method is thus also referred to as the explicit Euler method.

Backward Euler method

Alternatively, we may approximate the derivative $\dot{u}(t_n)$ by

$$\dot{u}(t_n) \approx \frac{u(t_n) - u(t_{n-1})}{k_n}, \quad (9.4)$$

so that

$$u(t_n) \approx u(t_{n-1}) + k_n \dot{u}(t_n) = u(t_{n-1}) + k_n f(u(t_n), t_n), \quad (9.5)$$

which motivates the backward Euler method for successive computation of the approximations $U_n = U(t_n)$.

Algorithm 15: Backward Euler method

$U_0 = u_0$  (initial approximation)
for $n = 1, 2, \ldots, N$ do
  $U_n = U_{n-1} + k_n f(U_n, t_n)$  (solve algebraic equation)
end

Contrary to the forward Euler method, the backward Euler method is implicit, thus also referred to as the implicit Euler method, meaning that $U_n$ is not directly computable from $U_{n-1}$, but is obtained from the solution of an algebraic (possibly nonlinear) equation,

$$x = U_{n-1} + k_n f(x, t_n), \quad (9.6)$$

for example, by the fixed point iteration

$$x^{(k+1)} = U_{n-1} + k_n f(x^{(k)}, t_n). \quad (9.7)$$

We note that the fixed point iteration (9.7) converges if $k_n L_f < 1$, with $L_f$ the Lipschitz constant of the function $f(\cdot, t_n)$, thus if the time step $k_n$ is small enough.
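A sketch of Algorithm 15 in which the algebraic equation (9.6) is solved by the fixed point iteration (9.7) at each step; the fixed number of inner iterations and the test problem are assumptions of this example.

```python
import numpy as np

def backward_euler(f, u0, T, N, fp_iters=20):
    """Backward Euler, solving U_n = U_{n-1} + k f(U_n, t_n) by fixed point iteration."""
    k = T / N
    t = np.linspace(0.0, T, N + 1)
    U = np.zeros(N + 1)
    U[0] = u0
    for n in range(1, N + 1):
        x = U[n - 1]                          # initial guess for the inner iteration
        for _ in range(fp_iters):
            x = U[n - 1] + k * f(x, t[n])     # x^(j+1) = U_{n-1} + k f(x^(j), t_n)
        U[n] = x
    return t, U

# Example: du/dt = -u, u(0) = 1; the inner iteration converges since k * L_f < 1
t, U = backward_euler(lambda u, t: -u, u0=1.0, T=1.0, N=100)
print(U[-1])   # approximately exp(-1)
```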

Time stepping as quadrature

There is a strong connection between time-stepping methods and numerical approximation of integrals, referred to as quadrature. For example, assume that the initial condition is zero and that the function $f$ in (9.1) does not depend on the solution $u$, but on the time $t$ only, that is, $f = f(t)$. The solution $u(t)$ is then the primitive function of $f(t)$ that satisfies $u(0) = 0$, corresponding to the area under the graph of the function $f(t)$ over the interval $[0, t]$. We can approximate this primitive function by left and right rectangular rule quadrature, or Riemann sums, which we illustrate in Figure 9.2. The two approximations of the area under the graph then correspond to the forward and backward Euler approximations of the initial value problem (9.1) with $u_0 = 0$ and $f = f(t)$.

Figure 9.2: Left (left) and right (right) rectangular rule quadrature, or Riemann sums, approximating the primitive function of f(t), corresponding to the area under the graph.

More generally, by the Fundamental Theorem of Calculus we have, for each subinterval $I_n = (t_{n-1}, t_n)$, that

$$u(t_n) = u(t_{n-1}) + \int_{t_{n-1}}^{t_n} f(u(t), t)\, dt, \quad (9.8)$$

from which we can derive suitable time stepping methods corresponding to different quadrature rules used to evaluate the integral in (9.8).

Quadrature as interpolation

Quadrature, as an approximate integral of an exact function, can alternatively be expressed as exact integration of an approximate function. For example, the rectangular quadrature rules in Figure 9.2 correspond to exact integration of a piecewise constant (over each subinterval $I_n$) approximation of the function $f(t)$, with its value determined by the function value at the left or right endpoint of the interval, $f(t_{n-1})$ or $f(t_n)$.

From this perspective, one may ask if such a piecewise constant approximation can be chosen in a more clever way to reduce the error in the approximation of the integral, which naturally leads to the midpoint rule, where the piecewise constant function is chosen based on the function value at the midpoint of the subinterval, that is, $f((t_{n-1} + t_n)/2)$. Further, we may seek to approximate the function by a higher order polynomial. By linear interpolation over the partition $\mathcal{T}_k$, corresponding to approximation of the function by a continuous piecewise linear polynomial which is exact at each node $t_n$, exact integration corresponds to the trapezoidal rule.

Figure 9.3: Midpoint (left) and trapezoidal (right) quadrature rules, corresponding to interpolation by a piecewise constant and a piecewise linear polynomial, respectively.

Interpolation as time stepping

By (9.8), the midpoint and trapezoidal rules can also be formulated as time stepping methods. However, in the case of a time stepping method, interpolation cannot be directly based on the function $f(u(t), t)$, since $u(t)$ is unknown. Instead we seek an approximate solution $U(t)$ to the initial value problem (9.1) as a piecewise polynomial of a certain order, determined by $f(U_n, t_n)$ through the successive approximations $U_n$.

Algorithm 16: Trapezoidal time stepping method

$U_0 = u_0$  (initial approximation)
for $n = 1, 2, \ldots, N$ do
  $U_n = U_{n-1} + \frac{k_n}{2}(f(U_n, t_n) + f(U_{n-1}, t_{n-1}))$  (solve algebraic equation)
end

Both the midpoint method and the trapezoidal method are implicit, and thus require the solution of an algebraic equation, possibly nonlinear, at each time step.

To seek approximate solutions to differential equations as piecewise polynomials is a powerful idea that we will meet many times. Any piecewise polynomial function can be expressed as a linear combination of basis functions, for which the coordinates are to be determined, for example, based on minimization or orthogonality conditions on the residual of the differential equation.

Figure 9.4: Examples of a discontinuous piecewise constant polynomial determined by its value at the right endpoint of each subinterval (left), and a continuous piecewise linear polynomial determined by its values in the nodes of the partition (right).

The residual of (9.1) is given by

$$R(U(t)) = f(U(t), t) - \dot{U}(t), \quad (9.9)$$

and in the case of a continuous piecewise linear approximation $U(t)$, an orthogonality condition enforces the integral of the residual to be zero over

each time interval In,thatis

$$\int_{t_{n-1}}^{t_n} R(U(t))\, dt = 0, \quad (9.10)$$

or equivalently,

$$U_n = U_{n-1} + \int_{t_{n-1}}^{t_n} f(U(t), t)\, dt,$$

which corresponds to (9.8). If $f(\cdot, \cdot)$ is a linear function, it follows that

$$U_n = U_{n-1} + \frac{k_n}{2}(f(U_n, t_n) + f(U_{n-1}, t_{n-1})), \quad (9.11)$$

else we have to choose a quadrature rule to approximate the integral, with (9.11) corresponding to the trapezoidal rule for a nonlinear function $f(\cdot, \cdot)$.

The $\theta$-method

We can formulate the forward and backward Euler methods, and the trapezoidal method, as one single method with a parameter $\theta$, the $\theta$-method. For $\theta = 1$ we get the explicit Euler method, for $\theta = 0$ the implicit Euler method, and $\theta = 0.5$ corresponds to the trapezoidal rule.

Algorithm 17: The $\theta$-method for initial value problems

$U_0 = u_0$  (initial approximation)
for $n = 1, 2, \ldots, N$ do
  $U_n = U_{n-1} + k_n((1 - \theta)f(U_n, t_n) + \theta f(U_{n-1}, t_{n-1}))$  (update)
end

Theorem 16 (Local error estimate for the $\theta$-method). For the $\theta$-method over one subinterval $I_n = (t_{n-1}, t_n)$ of length $k_n = t_n - t_{n-1}$, with $U_{n-1} = u(t_{n-1})$, we have the following local error estimate,

$$|u(t_n) - U_n| = O(k_n^3), \quad (9.12)$$

for $\theta = 1/2$, and if $\theta \neq 1/2$,

$$|u(t_n) - U_n| = O(k_n^2). \quad (9.13)$$

Proof. With the notation $f_n = f(U_n, t_n)$, so that $f_{n-1} = \dot{u}(t_{n-1})$ and $f_n = \dot{u}(t_n)$, and observing that $U_{n-1} = u(t_{n-1})$, we have by Taylor's formula that

$$|u(t_n) - U_n| = |u(t_n) - (U_{n-1} + k_n((1-\theta)f_n + \theta f_{n-1}))|$$
$$= |u(t_{n-1}) + k_n\dot{u}(t_{n-1}) + \tfrac{1}{2}k_n^2\ddot{u}(t_{n-1}) + O(k_n^3) - (u(t_{n-1}) + k_n((1-\theta)\dot{u}(t_n) + \theta\dot{u}(t_{n-1})))|$$
$$= |k_n\dot{u}(t_{n-1}) + \tfrac{1}{2}k_n^2\ddot{u}(t_{n-1}) + O(k_n^3) - k_n((1-\theta)(\dot{u}(t_{n-1}) + k_n\ddot{u}(t_{n-1}) + O(k_n^2)) + \theta\dot{u}(t_{n-1}))|$$
$$= |\theta - \tfrac{1}{2}|\,|\ddot{u}(t_{n-1})|\,k_n^2 + O(k_n^3).$$

9.3 System of initial value problems

We now consider systems of initial value problems for a vector valued function $u: \mathbb{R}^+ \to \mathbb{R}^n$ defined on the interval $I = [0, T]$, with derivative $\dot{u} = du/dt = (du_1/dt, \ldots, du_n/dt)^T$ defined by a function $f: \mathbb{R}^n \times \mathbb{R}^+ \to \mathbb{R}^n$, such that

$$\dot{u}(t) = f(u(t), t), \quad 0 < t \le T, \qquad u(0) = u_0, \quad (9.14)$$

with $u_0 \in \mathbb{R}^n$ a given initial condition.

Time stepping methods for (9.14) are analogous to the scalar case (9.1), including the $\theta$-method of Algorithm 17, with the difference that for implicit methods a system of (possibly nonlinear) equations needs to be solved.

Newton's laws of motion

Newton's laws of motion for a particle can be formulated as an initial value problem (9.14), with Newton's second law expressing that force equals mass times acceleration,

$$m\ddot{x}(t) = F(t), \quad (9.15)$$

given by

$$u = \begin{bmatrix} v \\ x \end{bmatrix}, \quad f = \begin{bmatrix} F/m \\ v \end{bmatrix}, \quad (9.16)$$

for $x = x(t)$ the particle position, $v = v(t) = \dot{x}(t)$ the velocity, $m$ the mass of the particle, and $F = F(t)$ the applied force.

For example, the force $F = mg$ models gravitation, with $g$ the gravitational acceleration, and the force $F = -kx$ models an elastic spring (Hooke's law) with spring constant $k$. Newton's first law, which expresses that a particle remains in its state of motion in the absence of a force, follows from (9.15) in the case $F = 0$.

The N-body problem

Newton's third law states that if one particle $p_i$ exerts a force $F_{ji}$ on another particle $p_j$, then $p_j$ exerts a force $F_{ij} = -F_{ji}$ on $p_i$, of the same magnitude but in the opposite direction.

The N-body problem refers to the initial value problem (9.14) describing Newton's laws of motion for a system of $N$ particles $\{p_i\}_{i=1}^N$, with the pairwise force interactions $F_{ij}$ given by the system under study, for example gravitation in celestial mechanics, Coulomb interactions in electrostatics, Hookean springs in elasticity theory, or interatomic potentials in molecular dynamics simulations. The N-body problem takes the form (9.14) in $\mathbb{R}^{2N}$,

$$u = \begin{bmatrix} v_1 \\ \vdots \\ v_N \\ x_1 \\ \vdots \\ x_N \end{bmatrix}, \quad f = \begin{bmatrix} F_1/m_1 \\ \vdots \\ F_N/m_N \\ v_1 \\ \vdots \\ v_N \end{bmatrix}, \quad (9.17)$$

with the resulting force on particle $p_i$ given by the sum of all pairwise interactions,

$$F_i = \sum_{j \ne i} F_{ij}. \quad (9.18)$$

To solve (9.17) using a time-stepping method is an $O(N^2)$ algorithm, which is very expensive for large $N$. Optimized algorithms of order $O(N)$ can be developed based on the idea of clustering the force from multiple particles at a distance.

Celestial mechanics

Newton's gravitational law models pairwise gravitational force interactions, which can be used to model the solar system, for example. Every particle

$p_i$ is affected by the sum of the forces from all other particles, and the force $F_{ij}$ acting on particle $p_i$ from $p_j$ is given by

$$F_{ij} = G\,\frac{m_i m_j}{\|x_i - x_j\|^2}, \quad (9.19)$$

where $m_i \in \mathbb{R}$ and $x_i \in \mathbb{R}^n$ denote the mass and position of particle $p_i$.

Mass-spring model

The forces in a mass-spring model represent pairwise force interactions between adjacent particles in a lattice, connected via springs, such that the force $F_{ij}$ acting on particle $p_i$ from $p_j$ is given by

$$F_{ij} = -k_{ij}(x_i - x_j), \quad (9.20)$$

with $k_{ij} = k_{ji}$ the relevant spring constant.

9.4 Exercises

Problem 29. State the $\theta$-method to solve the equation

$$\dot{u}(t) = u(t), \quad u(0) = 1. \quad (9.21)$$

Problem 30. State the trapezoidal method to solve the equation

$$\dot{u}_1(t) = u_2(t), \quad (9.22)$$
$$\dot{u}_2(t) = -u_1(t), \quad (9.23)$$

with $u_1(0) = 0$ and $u_2(0) = 1$.

Chapter 10

Function approximation

We have studied methods for computing solutions to algebraic equations in the form of real numbers or finite dimensional vectors of real numbers. In contrast, solutions to differential equations are scalar or vector valued functions, which only in simple special cases are analytical functions that can be expressed by a closed mathematical formula. Instead we use the idea of approximating general functions by linear combinations of a finite set of simple analytical functions, for example trigonometric functions, splines, or polynomials, for which attractive features are orthogonality and locality. We focus in particular on piecewise polynomials defined by the finite set of nodes of a mesh, which exhibit both near orthogonality and local support.

10.1 Function approximation

The Lebesgue space $L^2(I)$

Inner product spaces provide tools for approximation based on orthogonal projections onto subspaces. We now introduce an inner product space for functions on the interval $I = [a, b]$, the Lebesgue space $L^2(I)$, defined as the class of all square integrable functions $f: I \to \mathbb{R}$,

$$L^2(I) = \{ f : \int_a^b |f(x)|^2\, dx < \infty \}. \quad (10.1)$$

The vector space $L^2(I)$ is closed under the basic operations of pointwise addition and scalar multiplication, by the inequality

$$(a + b)^2 \le 2(a^2 + b^2), \quad \forall a, b \ge 0, \quad (10.2)$$

which follows from Young's inequality.

95 96 CHAPTER 10. FUNCTION APPROXIMATION

Theorem 17 (Young's inequality). For $a, b \ge 0$ and $\epsilon > 0$,

$$ab \le \frac{1}{2\epsilon}a^2 + \frac{\epsilon}{2}b^2. \quad (10.3)$$

Proof. $0 \le (a - \epsilon b)^2 = a^2 + \epsilon^2 b^2 - 2ab\epsilon$.

The $L^2$ inner product is defined by

$$(f, g) = (f, g)_{L^2(I)} = \int_a^b f(x)g(x)\, dx, \quad (10.4)$$

with the associated $L^2$ norm,

$$\|f\| = \|f\|_{L^2(I)} = (f, f)^{1/2} = \left( \int_a^b |f(x)|^2\, dx \right)^{1/2}, \quad (10.5)$$

for which the Cauchy-Schwarz inequality is satisfied,

$$|(f, g)| \le \|f\|\,\|g\|. \quad (10.6)$$

Approximation of functions in $L^2(I)$

We seek to approximate a function $f$ in a vector space $V$ by a linear combination of functions $\{\phi_j\}_{j=1}^n \subset V$, that is,

$$f(x) \approx f_n(x) = \sum_{j=1}^n \alpha_j \phi_j(x), \quad (10.7)$$

with $\alpha_j \in \mathbb{R}$. If linearly independent, the set $\{\phi_j\}_{j=1}^n$ spans a finite dimensional subspace $S \subset V$,

$$S = \{ f_n \in V : f_n = \sum_{j=1}^n \alpha_j \phi_j(x), \; \alpha_j \in \mathbb{R} \}, \quad (10.8)$$

with the set $\{\phi_j\}_{j=1}^n$ a basis for $S$. For example, in a Fourier series the basis functions $\phi_j$ are trigonometric functions; in a power series they are monomials.

The question is now how to determine the coordinates $\alpha_j$ so that $f_n(x)$ is a good approximation of $f(x)$. One approach to the problem is to use the techniques of orthogonal projection previously studied for vectors in $\mathbb{R}^n$; an alternative approach is interpolation, where the $\alpha_j$ are chosen such that $f_n(x_i) = f(x_i)$ in a set of nodes $x_i$, for $i = 1, \ldots, n$. If we cannot evaluate the function $f(x)$ in arbitrary points $x$, but only have access to a set of sampled data points $\{(x_i, f_i)\}_{i=1}^m$, with $m \ge n$, we can formulate a least squares problem to determine the coordinates $\alpha_j$ that minimize the error $f_n(x_i) - f_i$, in a suitable norm.

$L^2$ projection

The $L^2$ projection $Pf$, onto the subspace $S \subset V$ defined by (10.8), of a function $f \in V$, with $V = L^2(I)$, is the orthogonal projection of $f$ on $S$, that is,

$$(f - Pf, s) = 0, \quad \forall s \in S, \quad (10.9)$$

which corresponds to

$$\sum_{j=1}^n \alpha_j (\phi_i, \phi_j) = (f, \phi_i), \quad \forall i = 1, \ldots, n. \quad (10.10)$$

By solving the matrix equation $Ax = b$, with $a_{ij} = (\phi_i, \phi_j)$, $x_j = \alpha_j$, and $b_i = (f, \phi_i)$, we obtain the $L^2$ projection as

$$Pf(x) = \sum_{j=1}^n \alpha_j \phi_j(x). \quad (10.11)$$

We note that if $\phi_i(x)$ has local support, that is, $\phi_i(x) \neq 0$ only on a subinterval of $I$, then the matrix $A$ is sparse, and for $\{\phi_i\}_{i=1}^n$ an orthonormal basis, $A$ is the identity matrix with $\alpha_j = (f, \phi_j)$.

Interpolation

The interpolant $\pi f \in S$ is determined by the condition that $\pi f(x_i) = f(x_i)$, for $n$ nodes $\{x_i\}_{i=1}^n$. That is,

$$f(x_i) = \pi f(x_i) = \sum_{j=1}^n \alpha_j \phi_j(x_i), \quad i = 1, \ldots, n, \quad (10.12)$$

which corresponds to the matrix equation $Ax = b$, with $a_{ij} = \phi_j(x_i)$, $x_j = \alpha_j$, and $b_i = f(x_i)$.

The matrix $A$ is the identity matrix under the condition that $\phi_j(x_i) = 1$ for $i = j$, and zero otherwise. We then refer to $\{\phi_i\}_{i=1}^n$ as a nodal basis, for which $\alpha_j = f(x_j)$, and we can express the interpolant as

$$\pi f(x) = \sum_{j=1}^n \alpha_j \phi_j(x) = \sum_{j=1}^n f(x_j) \phi_j(x). \quad (10.13)$$

Regression

If we cannot evaluate the function $f(x)$ in arbitrary points, but only have access to a set of data points $\{(x_i, f_i)\}_{i=1}^m$, with $m \ge n$, we can formulate the least squares problem

$$\min_{f_n \in S} \|f_i - f_n(x_i)\| = \min_{\{\alpha_j\}_{j=1}^n} \|f_i - \sum_{j=1}^n \alpha_j \phi_j(x_i)\|, \quad i = 1, \ldots, m, \quad (10.14)$$

which corresponds to minimization of the residual $b - Ax$, with $a_{ij} = \phi_j(x_i)$, $b_i = f_i$, and $x_j = \alpha_j$, which we can solve, for example, by forming the normal equations,

$$A^T A x = A^T b. \quad (10.15)$$

10.2 Piecewise polynomial approximation

Polynomial spaces

We introduce the vector space $\mathcal{P}^q(I)$, defined by the set of polynomials

$$p(x) = \sum_{i=0}^q c_i x^i, \quad x \in I, \quad (10.16)$$

of at most order $q$ on an interval $I \subset \mathbb{R}$, with the basis functions $x^i$ and coordinates $c_i$, and the basic operations of pointwise addition and scalar multiplication,

$$(p + r)(x) = p(x) + r(x), \quad (\alpha p)(x) = \alpha p(x), \quad (10.17)$$

for $p, r \in \mathcal{P}^q(I)$ and $\alpha \in \mathbb{R}$. One basis for $\mathcal{P}^q(I)$ is the set of monomials $\{x^i\}_{i=0}^q$; another is $\{(x - c)^i\}_{i=0}^q$, which gives the power series,

$$p(x) = \sum_{i=0}^q a_i (x - c)^i = a_0 + a_1(x - c) + \ldots + a_q(x - c)^q, \quad (10.18)$$

for $c \in I$, with a Taylor series being an example of a power series,

$$f(x) = f(y) + f'(y)(x - y) + \frac{1}{2}f''(y)(x - y)^2 + \ldots \quad (10.19)$$

Lagrange polynomials

For a set of nodes $\{x_i\}_{i=0}^q$, we define the Lagrange polynomials $\{\lambda_i\}_{i=0}^q$ by

$$\lambda_i(x) = \frac{(x - x_0)\cdots(x - x_{i-1})(x - x_{i+1})\cdots(x - x_q)}{(x_i - x_0)\cdots(x_i - x_{i-1})(x_i - x_{i+1})\cdots(x_i - x_q)} = \prod_{j \ne i} \frac{x - x_j}{x_i - x_j},$$

which constitute a basis for $\mathcal{P}^q(I)$, and we note that

$$\lambda_i(x_j) = \delta_{ij}, \quad (10.20)$$

with the Kronecker delta defined as

$$\delta_{ij} = \begin{cases} 1, & i = j, \\ 0, & i \ne j, \end{cases} \quad (10.21)$$

so that $\{\lambda_i\}_{i=0}^q$ is a nodal basis, which we refer to as the Lagrange basis. We can express any $p \in \mathcal{P}^q(I)$ as

$$p(x) = \sum_{i=0}^q p(x_i)\lambda_i(x), \quad (10.22)$$

and by (10.13) we can define the polynomial interpolant $\pi_q f \in \mathcal{P}^q(I)$,

$$\pi_q f(x) = \sum_{i=0}^q f(x_i)\lambda_i(x), \quad x \in I, \quad (10.23)$$

for a continuous function $f \in \mathcal{C}(I)$.

Piecewise polynomial spaces

We now introduce piecewise polynomials defined over a partition of the interval $I = [a, b]$,

$$a = x_0 < x_1 < \ldots < x_{m+1} = b, \quad (10.24)$$

with subintervals $I_i = (x_{i-1}, x_i)$ of length $h_i = x_i - x_{i-1}$, forming a mesh $\mathcal{T}_h = \{I_i\}$, over which we define the vector space of discontinuous piecewise polynomials of order $q$,

$$W_h^{(q)} = \{ v : v|_{I_i} \in \mathcal{P}^q(I_i), \; i = 1, \ldots, m+1 \}, \quad (10.26)$$

and the continuous piecewise polynomials on $I$, defined by

$$V_h^{(q)} = \{ v \in W_h^{(q)} : v \in \mathcal{C}(I) \}. \quad (10.27)$$

The basis functions for $W_h^{(q)}$ can be defined in terms of the Lagrange basis, for example,

$$\lambda_{i,0}(x) = \frac{x_i - x}{x_i - x_{i-1}} = \frac{x_i - x}{h_i}, \quad (10.28)$$
$$\lambda_{i,1}(x) = \frac{x - x_{i-1}}{x_i - x_{i-1}} = \frac{x - x_{i-1}}{h_i}, \quad (10.29)$$

defining the basis functions for $W_h^{(1)}$ by

$$\phi_{i,j}(x) = \begin{cases} 0, & x \notin [x_{i-1}, x_i], \\ \lambda_{i,j}, & x \in [x_{i-1}, x_i], \end{cases} \quad (10.30)$$

for $i = 1, \ldots, m+1$, and $j = 0, 1$. For $V_h^{(q)}$ we need to construct continuous basis functions, for example,

$$\phi_i(x) = \begin{cases} 0, & x \notin [x_{i-1}, x_{i+1}], \\ \lambda_{i,1}, & x \in [x_{i-1}, x_i], \\ \lambda_{i+1,0}, & x \in [x_i, x_{i+1}], \end{cases} \quad (10.31)$$

for $V_h^{(1)}$, which we also refer to as hat functions.

Figure 10.1: Illustration of a mesh $\mathcal{T}_h = \{I_i\}$, with subintervals $I_i = (x_{i-1}, x_i)$ of length $h_i = x_i - x_{i-1}$, and $\phi_{i,1}(x)$ a basis function for $W_h^{(1)}$ (left), and a basis function $\phi_i(x)$ for $V_h^{(1)}$ (right).

$L^2$ projection in $V_h^{(1)}$

The $L^2$ projection of a function $f \in L^2(I)$ onto the space of continuous piecewise linear polynomials $V_h^{(1)}$, on a subdivision of the interval $I$ with $n$ nodes, is given by

$$Pf(x) = \sum_{j=1}^n \alpha_j \phi_j(x), \quad (10.32)$$

with the coordinates $\alpha_j$ determined from the matrix equation

$$Mx = b, \quad (10.33)$$

with $m_{ij} = (\phi_j, \phi_i)$, $x_j = \alpha_j$, and $b_i = (f, \phi_i)$. The matrix $M$ is sparse, since $m_{ij} = 0$ for $|i - j| > 1$, and for large $n$ we need to use an iterative method to solve (10.33). We compute the entries of the matrix $M$, referred to as a mass matrix, from the definition of the basis functions (10.31), starting with the diagonal entries,

$$m_{ii} = (\phi_i, \phi_i) = \int_0^1 \phi_i^2(x)\, dx = \int_{x_{i-1}}^{x_i} \lambda_{i,1}^2(x)\, dx + \int_{x_i}^{x_{i+1}} \lambda_{i+1,0}^2(x)\, dx$$
$$= \int_{x_{i-1}}^{x_i} \frac{(x - x_{i-1})^2}{h_i^2}\, dx + \int_{x_i}^{x_{i+1}} \frac{(x_{i+1} - x)^2}{h_{i+1}^2}\, dx = \left[ \frac{1}{h_i^2}\frac{(x - x_{i-1})^3}{3} \right]_{x_{i-1}}^{x_i} + \left[ -\frac{1}{h_{i+1}^2}\frac{(x_{i+1} - x)^3}{3} \right]_{x_i}^{x_{i+1}} = \frac{h_i}{3} + \frac{h_{i+1}}{3},$$

and similarly we compute the off-diagonal entries,

$$m_{i,i+1} = (\phi_i, \phi_{i+1}) = \int_0^1 \phi_i(x)\phi_{i+1}(x)\, dx = \int_{x_i}^{x_{i+1}} \lambda_{i+1,0}(x)\lambda_{i+1,1}(x)\, dx = \int_{x_i}^{x_{i+1}} \frac{(x_{i+1} - x)(x - x_i)}{h_{i+1}^2}\, dx$$
$$= \frac{1}{h_{i+1}^2}\int_{x_i}^{x_{i+1}} (x_{i+1}x - x_{i+1}x_i - x^2 + x x_i)\, dx = \frac{1}{6h_{i+1}^2}(x_{i+1}^3 - 3x_{i+1}^2 x_i + 3x_{i+1}x_i^2 - x_i^3) = \frac{1}{6h_{i+1}^2}(x_{i+1} - x_i)^3 = \frac{h_{i+1}}{6},$$

and

$$m_{i,i-1} = (\phi_i, \phi_{i-1}) = \int_0^1 \phi_i(x)\phi_{i-1}(x)\, dx = \ldots = \frac{h_i}{6}.$$
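To make the computation concrete, the following hedged sketch assembles the tridiagonal mass matrix on a uniform mesh of $[0, 1]$ and computes the $L^2$ projection of a given function; restricting to interior nodes and approximating the load vector $b_i = (f, \phi_i)$ by the nodal quadrature $b_i \approx h f(x_i)$ are simplifying assumptions of this example.

```python
import numpy as np

def l2_projection_p1(f, n):
    """L2 projection onto continuous piecewise linears on a uniform mesh of [0, 1],
    using only the n interior nodes (boundary values dropped for simplicity)."""
    h = 1.0 / (n + 1)
    x = np.linspace(h, 1.0 - h, n)           # interior nodes
    # Mass matrix: 2h/3 on the diagonal, h/6 on the off-diagonals
    M = (2 * h / 3) * np.eye(n) + (h / 6) * (np.eye(n, k=1) + np.eye(n, k=-1))
    # Load vector b_i = (f, phi_i), approximated here by b_i ~ h * f(x_i)
    b = h * f(x)
    alpha = np.linalg.solve(M, b)            # nodal coefficients of Pf
    return x, alpha

x, alpha = l2_projection_p1(lambda x: np.sin(np.pi * x), n=20)
print(alpha[:3])
```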

10.3 Exercises

Problem 31. Prove that the sum of two functions $f, g \in L^2(I)$ is a function in $L^2(I)$.

Problem 32. Prove that the $L^2$ projection $Pf$ of a function $f \in L^2(I)$, onto the subspace $S \subset L^2(I)$, is the best approximation in $S$, in the sense that

$$\|f - Pf\| \le \|f - s\|, \quad \forall s \in S. \quad (10.34)$$

Chapter 11

The boundary value problem

The boundary value problem in one variable is an ordinary differential equation for which an initial condition is not enough; instead we need to specify boundary conditions at each end of the interval. Contrary to the initial value problem, the independent variable does not represent time, but should rather be thought of as a spatial coordinate.

11.1 The boundary value problem

The boundary value problem

We consider the following boundary value problem, for which we seek a function $u(x) \in \mathcal{C}^2(0, 1)$, such that

$$-u''(x) = f(x), \quad x \in (0, 1), \quad (11.1)$$
$$u(0) = u(1) = 0, \quad (11.2)$$

given a source term $f(x)$, and boundary conditions at the endpoints of the interval $I = [0, 1]$. We want to find an approximate solution to the boundary value problem in the form of a continuous piecewise polynomial that satisfies the boundary conditions (11.2), that is, we seek

$$U \in V_h = \{ v \in V_h^{(q)} : v(0) = v(1) = 0 \}, \quad (11.3)$$

such that the error $e = u - U$ is small in some suitable norm $\|\cdot\|$.

The residual of the equation is defined as

$$R(w) = w'' + f, \quad (11.4)$$

with $R(u) = 0$ for $u = u(x)$ the solution of the boundary value problem. Our strategy is now to find an approximate solution $U \in V_h \subset \mathcal{C}(0, 1)$ such that $R(U) \approx 0$.

We have two natural methods to find a solution $U \in V_h$ with a minimal residual: (i) the least squares method, where we seek the solution with the minimal residual measured in the $L^2$-norm,

$$\min_{U \in V_h} \|R(U)\|, \quad (11.5)$$

and (ii) Galerkin's method, where we seek the solution for which the residual is orthogonal to the subspace $V_h$,

$$(R(U), v) = 0, \quad \forall v \in V_h. \quad (11.6)$$

With an approximation space consisting of piecewise polynomials, we refer to the methods as a least squares finite element method and a Galerkin finite element method. With a trigonometric approximation space we refer to Galerkin's method as a spectral method.

Galerkin finite element method

The finite element method (FEM) based on (11.6) takes the form: find $U \in V_h$, such that

$$-\int_0^1 U''(x)v(x)\, dx = \int_0^1 f(x)v(x)\, dx, \quad (11.7)$$

for all test functions $v \in V_h$. For (11.7) to be well defined, we need to be able to represent the second order derivative $U''$, which is not obvious for low order polynomials, such as linear polynomials, or piecewise constants. To reduce this constraint, we can use partial integration to move one derivative from the approximation $U$ to the test function $v$, so that

$$-\int_0^1 U''(x)v(x)\, dx = \int_0^1 U'(x)v'(x)\, dx - [U'(x)v(x)]_0^1 = \int_0^1 U'(x)v'(x)\, dx,$$

since $v \in V_h$, and thus satisfies the boundary conditions. The finite element method now reads: find $U \in V_h$, such that

$$\int_0^1 U'(x)v'(x)\, dx = \int_0^1 f(x)v(x)\, dx, \quad (11.8)$$

for all $v \in V_h$.

The discrete problem

We now let $V_h$ be the space of continuous piecewise linear functions that satisfy the boundary conditions (11.2), that is,

$$U \in V_h = \{ v \in V_h^{(1)} : v(0) = v(1) = 0 \}, \quad (11.9)$$

so that we can write any function $v \in V_h$ as

$$v(x) = \sum_{i=1}^n v_i \phi_i(x), \quad (11.10)$$

over a mesh $\mathcal{T}_h$ with $n$ internal nodes $x_i$, and $v_i = v(x_i)$ since $\{\phi_i\}_{i=1}^n$ is a nodal basis. We thus search for an approximate solution

$$U(x) = \sum_{j=1}^n U_j \phi_j(x), \quad (11.11)$$

with $U_j = U(x_j)$. If we insert (11.10) and (11.11) into (11.8), we get

$$\sum_{j=1}^n U_j \int_0^1 \phi_j'(x)\phi_i'(x)\, dx = \int_0^1 f(x)\phi_i(x)\, dx, \quad i = 1, \ldots, n, \quad (11.12)$$

which corresponds to the matrix equation

$$Sx = b, \quad (11.13)$$

with $s_{ij} = (\phi_j', \phi_i')$, $x_j = U_j$, and $b_i = (f, \phi_i)$. The matrix $S$ is sparse, since $s_{ij} = 0$ for $|i - j| > 1$, and for large $n$ we need to use an iterative method to solve (11.13). We compute the entries of the matrix $S$, referred to as a stiffness matrix, from the definition of the basis functions (10.31), starting with the diagonal entries,

$$s_{ii} = (\phi_i', \phi_i') = \int_0^1 (\phi_i')^2(x)\, dx = \int_{x_{i-1}}^{x_i} (\lambda_{i,1}')^2(x)\, dx + \int_{x_i}^{x_{i+1}} (\lambda_{i+1,0}')^2(x)\, dx$$
$$= \int_{x_{i-1}}^{x_i} \left(\frac{1}{h_i}\right)^2 dx + \int_{x_i}^{x_{i+1}} \left(\frac{1}{h_{i+1}}\right)^2 dx = \frac{1}{h_i} + \frac{1}{h_{i+1}},$$

and similarly we compute the off-diagonal entries,

$$s_{i,i+1} = (\phi_i', \phi_{i+1}') = \int_0^1 \phi_i'(x)\phi_{i+1}'(x)\, dx = \int_{x_i}^{x_{i+1}} -\frac{1}{h_{i+1}}\cdot\frac{1}{h_{i+1}}\, dx = -\frac{1}{h_{i+1}},$$

and

$$s_{i,i-1} = (\phi_i', \phi_{i-1}') = \int_0^1 \phi_i'(x)\phi_{i-1}'(x)\, dx = \ldots = -\frac{1}{h_i}.$$
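Combining the stiffness matrix entries above with a simple load vector approximation gives a complete, if simplified, finite element solver for (11.1)-(11.2) on a uniform mesh; the nodal quadrature for $b_i$ and the choice $f(x) = 1$ in the example are assumptions of this sketch, not part of the text.

```python
import numpy as np

def fem_poisson_1d(f, n):
    """P1 finite element solution of -u'' = f on (0,1), u(0) = u(1) = 0,
    on a uniform mesh with n interior nodes."""
    h = 1.0 / (n + 1)
    x = np.linspace(h, 1.0 - h, n)            # interior nodes
    # Stiffness matrix: 2/h on the diagonal, -1/h on the off-diagonals
    S = (2.0 / h) * np.eye(n) - (1.0 / h) * (np.eye(n, k=1) + np.eye(n, k=-1))
    # Load vector b_i = (f, phi_i), approximated by nodal quadrature h * f(x_i)
    b = h * f(x)
    U = np.linalg.solve(S, b)                 # nodal values U_j = U(x_j)
    return x, U

# Example: f = 1, with exact solution u(x) = x(1 - x)/2
x, U = fem_poisson_1d(lambda x: np.ones_like(x), n=20)
print(np.max(np.abs(U - x * (1 - x) / 2)))    # small error
```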

The variational problem

Galerkin's method is based on the variational formulation, or weak form, of the boundary value problem, where we search for a solution in a vector space $V$ for which the variational form is well defined: find $u \in V$, such that

$$\int_0^1 u'(x)v'(x)\, dx = \int_0^1 f(x)v(x)\, dx, \quad (11.14)$$

for all $v \in V$. To construct an appropriate vector space $V$ for (11.14) to be well defined, we need to extend $L^2$ spaces to also include derivatives, which we refer to as Sobolev spaces. We introduce the vector space $H^1(0, 1)$, defined by

$$H^1(0, 1) = \{ v \in L^2(0, 1) : v' \in L^2(0, 1) \}, \quad (11.15)$$

and the vector space that also satisfies the boundary conditions (11.2),

$$H_0^1(0, 1) = \{ v \in H^1(0, 1) : v(0) = v(1) = 0 \}. \quad (11.16)$$

The variational form (11.14) is now well defined for $V = H_0^1(0, 1)$, since

$$\int_0^1 u'(x)v'(x)\, dx \le \|u'\|\,\|v'\| < \infty, \quad (11.17)$$

by the Cauchy-Schwarz inequality, and

$$\int_0^1 f(x)v(x)\, dx \le \|f\|\,\|v\| < \infty, \quad (11.18)$$

for $f \in L^2(0, 1)$.

Optimality of Galerkin's method

Galerkin's method (11.12) corresponds to searching for an approximate solution in a finite dimensional subspace $V_h \subset V$, for which (11.14) is satisfied for all test functions $v \in V_h$. The Galerkin solution $U$ is the best possible approximation in $V_h$, in the sense that

$$\|u - U\|_E \le \|u - v\|_E, \quad \forall v \in V_h, \quad (11.19)$$

with the energy norm defined by

$$\|w\|_E = \left( \int_0^1 |w'(x)|^2\, dx \right)^{1/2}. \quad (11.20)$$

Thus $U \in V_h$ represents a projection of $u \in V$ onto $V_h$, with respect to the inner product defined on $V$,

$$(v, w)_E = \int_0^1 v'(x)w'(x)\, dx, \quad (11.21)$$

with $\|w\|_E^2 = (w, w)_E$. The Galerkin orthogonality,

$$(u - U, v)_E = 0, \quad \forall v \in V_h, \quad (11.22)$$

expresses the optimality of the approximation $U$, as

$$\|u - U\|_E \le \|u - v\|_E, \quad \forall v \in V_h, \quad (11.23)$$

which follows by

$$\|u - U\|_E^2 = (u - U, u - U)_E = (u - U, u - v)_E + (u - U, v - U)_E = (u - U, u - v)_E \le \|u - U\|_E \|u - v\|_E,$$

for any $v \in V_h$.

11.2 Exercises

Problem 33. Derive the variational formulation and the finite element method for the boundary value problem

$$-(a(x)u'(x))' + c(x)u(x) = f(x), \quad x \in (0, 1), \quad (11.24)$$
$$u(0) = u(1) = 0, \quad (11.25)$$

with $a(x) > 0$ and $c(x) \ge 0$.

Chapter 12

Partial differential equations

12.1 Differential operators in $\mathbb{R}^n$

Differential operators

We recall the definition of the gradient of a scalar function as

$$\nabla f = \left( \frac{\partial f}{\partial x_1}, \ldots, \frac{\partial f}{\partial x_n} \right)^T, \quad (12.1)$$

which we can interpret as the differential operator

$$\nabla = \left( \frac{\partial}{\partial x_1}, \ldots, \frac{\partial}{\partial x_n} \right)^T, \quad (12.2)$$

acting on the function $f = f(x)$, with $x \in \Omega \subset \mathbb{R}^n$. With this interpretation we express two second order differential operators, the Laplacian $\Delta f$,

$$\Delta f = \nabla^T \nabla f = \frac{\partial^2 f}{\partial x_1^2} + \ldots + \frac{\partial^2 f}{\partial x_n^2}, \quad (12.3)$$

and the Hessian $Hf$,

$$Hf = \nabla\nabla^T f = \begin{bmatrix} \frac{\partial^2 f}{\partial x_1 \partial x_1} & \cdots & \frac{\partial^2 f}{\partial x_1 \partial x_n} \\ \vdots & \ddots & \vdots \\ \frac{\partial^2 f}{\partial x_n \partial x_1} & \cdots & \frac{\partial^2 f}{\partial x_n \partial x_n} \end{bmatrix}. \quad (12.4)$$

For a vector valued function $f: \mathbb{R}^n \to \mathbb{R}^m$, we define the Jacobian matrix by

$$f' = \begin{bmatrix} \frac{\partial f_1}{\partial x_1} & \cdots & \frac{\partial f_1}{\partial x_n} \\ \vdots & \ddots & \vdots \\ \frac{\partial f_m}{\partial x_1} & \cdots & \frac{\partial f_m}{\partial x_n} \end{bmatrix} = \begin{bmatrix} (\nabla f_1)^T \\ \vdots \\ (\nabla f_m)^T \end{bmatrix}, \quad (12.5)$$

and for $m = n$ we define the divergence by

$$\nabla \cdot f = \frac{\partial f_1}{\partial x_1} + \ldots + \frac{\partial f_n}{\partial x_n}. \quad (12.6)$$

Partial integration in $\mathbb{R}^n$

For a scalar function $f: \mathbb{R}^n \to \mathbb{R}$ and a vector valued function $g: \mathbb{R}^n \to \mathbb{R}^n$, we have the following generalization of partial integration over $\Omega \subset \mathbb{R}^n$, referred to as Green's theorem,

$$(\nabla f, g) = -(f, \nabla \cdot g) + (f, g \cdot n)_\Gamma, \quad (12.7)$$

where we use the notation

$$(v, w)_\Gamma = (v, w)_{L^2(\Gamma)} = \int_\Gamma vw\, ds, \quad (12.8)$$

for the boundary integral, with $L^2(\Gamma)$ the Lebesgue space defined over the boundary $\Gamma$.

Sobolev spaces

The $L^2$ space for $\Omega \subset \mathbb{R}^n$ is defined by

$$L^2(\Omega) = \{ v : \int_\Omega |v|^2\, dx < \infty \}, \quad (12.9)$$

where in the case of a vector valued function $v: \mathbb{R}^n \to \mathbb{R}^n$ we let

$$|v|^2 = \|v\|_2^2 = v_1^2 + \ldots + v_n^2. \quad (12.10)$$

To construct appropriate vector spaces for the variational formulation of partial differential equations, we need to extend $L^2$ spaces to also include derivatives. The Sobolev space $H^1(\Omega)$ is defined by

$$H^1(\Omega) = \{ v \in L^2(\Omega) : \frac{\partial v_i}{\partial x_j} \in L^2(\Omega), \; \forall i, j = 1, \ldots, n \}, \quad (12.11)$$

and we define

$$H_0^1(\Omega) = \{ v \in H^1(\Omega) : v(x) = 0, \; x \in \Gamma \}, \quad (12.12)$$

to be the space of functions that are zero on the boundary $\Gamma$.

12.2 Poisson’s equation

The Poisson equation

We now consider the Poisson equation for a function $u \in \mathcal{C}^2(\Omega)$,

$$-\Delta u = f, \quad x \in \Omega, \quad (12.13)$$

with $\Omega \subset \mathbb{R}^n$ and $f \in \mathcal{C}(\Omega)$. For the equation to have a unique solution we need to specify boundary conditions. We may prescribe Dirichlet boundary conditions,

$$u = g_D, \quad x \in \Gamma_D, \quad (12.14)$$

Neumann boundary conditions,

$$\nabla u \cdot n = g_N, \quad x \in \Gamma_N, \quad (12.15)$$

with $n = n(x)$ the outward unit normal on $\Gamma_N$, or a linear combination of the two, which we refer to as a Robin boundary condition.

Homogeneous Dirichlet boundary conditions

We now state the variational formulation of the Poisson equation with homogeneous Dirichlet boundary conditions,

$$-\Delta u = f, \quad x \in \Omega, \quad (12.16)$$
$$u = 0, \quad x \in \Gamma, \quad (12.17)$$

which we obtain by multiplication by a test function $v \in V = H_0^1(\Omega)$ and integration over $\Omega$, using Green's theorem, which gives

$$(\nabla u, \nabla v) = (f, v), \quad (12.18)$$

since the boundary term vanishes as the test function is an element of the vector space $H_0^1(\Omega)$.

Homogeneous Neumann boundary conditions

We now state the variational formulation of the Poisson equation with homogeneous Neumann boundary conditions,

-\Delta u = f, \quad x \in \Omega,   (12.19)
\nabla u \cdot n = 0, \quad x \in \Gamma,   (12.20)

which we obtain by multiplication by a test function v \in V = H^1(\Omega) and integration over \Omega, using Green's theorem, which gives,

(\nabla u, \nabla v) = (f, v),   (12.21)

since the boundary term vanishes by the Neumann boundary condition. Thus the variational forms (12.18) and (12.21) are similar, with the only difference being the choice of test and trial spaces. However, it turns out that the variational problem (12.21) has no unique solution, since for any solution u \in V, also u + C is a solution, with C a constant. To ensure a unique solution, we need an extra condition on the solution; for example, we may change the approximation space to

V = \{ v \in H^1(\Omega) : \int_\Omega v(x) \, dx = 0 \}.   (12.22)

Non-homogeneous boundary conditions

We now state the variational formulation of the Poisson equation with non-homogeneous boundary conditions,

-\Delta u = f, \quad x \in \Omega,   (12.23)
u(x) = g_D, \quad x \in \Gamma_D,   (12.24)
\nabla u \cdot n = g_N, \quad x \in \Gamma_N,   (12.25)

with \Gamma = \Gamma_D \cup \Gamma_N, which we obtain by multiplication by a test function v \in V, with

V = \{ v \in H^1(\Omega) : v(x) = g_D(x), \ x \in \Gamma_D \},   (12.26)

and integration over \Omega, using Green's theorem, which gives,

(\nabla u, \nabla v) = (f, v) + (g_N, v)_{\Gamma_N}.   (12.27)

The Dirichlet boundary condition is enforced through the trial space, and is thus referred to as an essential boundary condition, whereas the Neumann boundary condition is enforced through the variational form, and is thus referred to as a natural boundary condition.

The finite element method

To compute approximate solutions to the Poisson equation, we can formulate a finite element method based on the variational formulation of the

equation, replacing the Sobolev space V with a polynomial space V_h, constructed from a set of basis functions \{\phi_i\}_{i=1}^M over a mesh \mathcal{T}_h, defined as a collection of elements \{K_i\}_{i=1}^N and nodes \{N_i\}_{i=1}^M.

For the Poisson equation with homogeneous Dirichlet boundary conditions, the finite element method takes the form: find U \in V_h, such that,

(\nabla U, \nabla v) = (f, v), \quad v \in V_h,   (12.28)

with V_h \subset H^1_0(\Omega).

The variational form (12.28) corresponds to a linear system of equations Ax = b, with a_{ij} = (\nabla\phi_j, \nabla\phi_i), x_j = U(N_j), and b_i = (f, \phi_i), with \phi_i(x) the basis function associated with the node N_i.
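As an illustration, the following is a minimal sketch of (12.28) in one space dimension, assuming a uniform mesh, piecewise linear (hat function) basis functions and a simple one-point quadrature for the load vector; the function name and the test right-hand side are ours, not taken from the text.

```python
import numpy as np

def assemble_poisson_1d(n_elements, f):
    """Assemble A x = b for -u'' = f on (0,1), u(0) = u(1) = 0,
    using piecewise linear basis functions on a uniform mesh."""
    h = 1.0 / n_elements
    n_interior = n_elements - 1                  # interior nodes carry the unknowns
    A = np.zeros((n_interior, n_interior))
    b = np.zeros(n_interior)
    for i in range(n_interior):
        A[i, i] = 2.0 / h                        # (phi_i', phi_i')
        if i + 1 < n_interior:
            A[i, i + 1] = A[i + 1, i] = -1.0 / h # (phi_{i+1}', phi_i')
        x_i = (i + 1) * h                        # node N_{i+1}
        b[i] = f(x_i) * h                        # approximation of (f, phi_i)
    return A, b

A, b = assemble_poisson_1d(10, lambda x: np.pi**2 * np.sin(np.pi * x))
U = np.linalg.solve(A, b)                        # nodal values U(N_j)
```

A production code would of course use sparse matrix storage and proper quadrature over each element, but the structure of the linear system is the same.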

12.3 Linear partial differential equations

The abstract problem

We express a linear partial differential equation as the abstract problem,

Lu = f, \quad x \in \Omega,   (12.29)

with boundary conditions,

Bu = g, \quad x \in \Gamma,   (12.30)

for which we can derive a variational formulation: find u \in V such that,

a(u, v) = L(v), \quad v \in V,   (12.31)

with a : V \times V \to R a bilinear form, that is, a function which is linear in both arguments, and L : V \to R a linear form. In a Galerkin method we seek an approximation U \in V_h such that

a(U, v) = L(v), \quad v \in V_h,   (12.32)

with V_h \subset V a finite dimensional subspace, which in the case of a finite element method is a piecewise polynomial space.

Energy error estimation

A bilinear form a(\cdot, \cdot) on the Hilbert space V is symmetric, if

a(v, w) = a(w, v), \quad \forall v, w \in V,   (12.33)

and coercive, or elliptic, if

a(v, v) \ge c\|v\|^2, \quad \forall v \in V,   (12.34)

with c > 0. A symmetric and elliptic bilinear form defines an inner product on V, which induces a norm which we refer to as the energy norm,

\|w\|_E = a(w, w)^{1/2}.   (12.35)

The Galerkin approximation is optimal in the energy norm, since by Galerkin orthogonality,

a(u - U, v) = 0, \quad v \in V_h,   (12.36)

we have that

\|u - U\|_E^2 = a(u - U, u - U) = a(u - U, u - v) + a(u - U, v - U)
              = a(u - U, u - v) \le \|u - U\|_E \|u - v\|_E,

so that

\|u - U\|_E \le \|u - v\|_E, \quad v \in V_h.   (12.37)

12.4 Heat equation

We consider the heat equation,

\dot{u}(x, t) - \Delta u(x, t) = f(x, t), \quad (x, t) \in \Omega \times I,   (12.38)
u(x, t) = 0, \quad (x, t) \in \Gamma \times I,   (12.39)
u(x, 0) = u_0(x), \quad x \in \Omega,   (12.40)

on the domain \Omega \subset R^n with boundary \Gamma, and with the time interval I = (0, T). To find an approximate solution to the heat equation, we use semi-discretization, where space and time are discretized separately, using the finite element method and time stepping, respectively.

For each t \in (0, T], multiply the equation by a test function v \in V = H^1_0(\Omega) and integrate in space over \Omega to get the variational formulation,

\int_\Omega \dot{u}(x, t) v(x) \, dx + \int_\Omega \nabla u(x, t) \cdot \nabla v(x) \, dx = \int_\Omega f(x, t) v(x) \, dx,

from which we formulate a finite element method: find U \in V_h \subset V, such that,

\int_\Omega \dot{U}(x, t) v(x) \, dx + \int_\Omega \nabla U(x, t) \cdot \nabla v(x) \, dx = \int_\Omega f(x, t) v(x) \, dx,

for all v \in V_h. The finite element method corresponds to the system of initial value problems,

M\dot{U}(t) + SU(t) = b(t),   (12.41)

with mass matrix m_{ij} = \int_\Omega \phi_j(x) \phi_i(x) \, dx, stiffness matrix s_{ij} = \int_\Omega \nabla\phi_j(x) \cdot \nabla\phi_i(x) \, dx, and load vector b_i(t) = \int_\Omega f(x, t) \phi_i(x) \, dx, which is solved by time stepping to get,

U(x, t) = \sum_{j=1}^m U_j(t) \phi_j(x).   (12.42)
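A minimal sketch of solving (12.41) by implicit (backward) Euler time stepping follows; it assumes the matrices M and S and the load function b(t) have already been assembled, and the function name and the uniform time step are illustrative.

```python
import numpy as np

def backward_euler(M, S, b, U0, dt, n_steps):
    """Time stepping for M U'(t) + S U(t) = b(t):
    (M + dt*S) U^{n+1} = M U^n + dt*b(t_{n+1})."""
    U = np.asarray(U0, dtype=float).copy()
    A = M + dt * S                       # system matrix, constant in time
    for n in range(n_steps):
        t = (n + 1) * dt
        U = np.linalg.solve(A, M @ U + dt * b(t))
    return U

# Hypothetical usage with precomputed FEM matrices M, S and a load function b(t):
# U_T = backward_euler(M, S, b, U0, dt=0.01, n_steps=100)
```

The implicit step is chosen here because it is stable independently of the time step, at the cost of one linear solve per step.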

12.5 Exercises

Problem 34. Derive the variational formulation (12.27), and formulate the finite element method.

Problem 35. Derive (12.41) from the variational formulation of the heat equation.

Problem 36. Multiply (12.38) by u(x, t) and integrate over \Omega, to show that for f(x, t) = 0,

\frac{d}{dt} \|u(t)\|^2 \le 0.   (12.43)

Part V

Optimization


Chapter 13

Minimization problems

13.1 Unconstrained minimization

The minimization problem

Find \bar{x} \in D \subset R^n, such that

f(\bar{x}) \le f(x), \quad \forall x \in D,   (13.1)

with D \subset R^n the search space, \bar{x} \in D the optimal solution, and f : D \to R the objective function (or cost function).

A stationary point, or critical point, \hat{x} \in D is a point for which the gradient of the objective function is zero, that is,

\nabla f(\hat{x}) = 0,   (13.2)

and we refer to x^* \in D as a local minimum if there exists \delta > 0, such that,

f(x^*) \le f(x), \quad \forall x : \|x - x^*\| \le \delta.   (13.3)

If the minimization problem is convex, an interior local minimum is a global minimum, where in a convex minimization problem the search space is convex, i.e.

(1 - t)x + ty \in D,   (13.4)

and the objective function is convex, i.e.

(1 - t)f(x) + tf(y) \ge f((1 - t)x + ty),   (13.5)

for all x, y \in D and t \in [0, 1].

Gradient descent method

The level set of the function f : D \subset R^n \to R is defined as

L_c(f) = \{ x \in D : f(x) = c \},   (13.6)

where we note that L_c(f) represents a level curve in R^2, a level surface in R^3, and more generally a hypersurface of dimension n - 1 in R^n.

Theorem 18. If f \in C^1(D), then the gradient \nabla f(x) is orthogonal to the level set L_c(f) at x \in D.

The gradient descent method is an iterative method that computes approximations to a local minimum of (13.1), by searching for the next iterate in a direction orthogonal to the level set L_c(f) in which the objective function decreases, the direction of steepest descent, with a step length \alpha.

Algorithm 18: Method of steepest descent
  Start from x^(0)                              ▷ initial approximation
  for k = 1, 2, ... do
    x^(k+1) = x^(k) - α^(k) ∇f(x^(k))           ▷ step with length α^(k)
  end
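A minimal sketch of Algorithm 18, assuming a fixed step length; the test function and its gradient are hypothetical examples, not taken from the text.

```python
import numpy as np

def steepest_descent(grad_f, x0, alpha=0.1, n_iter=100):
    """Method of steepest descent with a fixed step length alpha."""
    x = np.asarray(x0, dtype=float)
    for _ in range(n_iter):
        x = x - alpha * grad_f(x)        # step in the direction of steepest descent
    return x

# Example: f(x) = x1^2 + 2*x2^2 with gradient (2*x1, 4*x2); minimum at the origin.
x_min = steepest_descent(lambda x: np.array([2*x[0], 4*x[1]]), x0=[1.0, 1.0])
```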

Newton's method

For f \in C^2(D) we know by Taylor's formula that,

f(x) \approx f(y) + \nabla f(y) \cdot (x - y) + \frac{1}{2}(x - y)^T Hf(y)(x - y),   (13.7)

for x, y \in D, with Hf the Hessian matrix.

Newton's method to find a local minimum in the form of a stationary point is based on (13.7) with x = x^{(k+1)}, y = x^{(k)} and \Delta x = x^{(k+1)} - x^{(k)}, for which we seek the stationary point, by

0 = \frac{d}{d(\Delta x)} \left( f(x^{(k)}) + \nabla f(x^{(k)}) \cdot \Delta x + \frac{1}{2} \Delta x^T Hf(x^{(k)}) \Delta x \right)
  = \nabla f(x^{(k)}) + Hf(x^{(k)}) \Delta x,

which gives Newton's method as an iterative method with increment

\Delta x = -(Hf(x^{(k)}))^{-1} \nabla f(x^{(k)}).   (13.8)

Algorithm 19: Newton's method for finding a stationary point
  Start from x^(0)                              ▷ initial approximation
  for k = 1, 2, ... do
    Hf(x^(k)) Δx = -∇f(x^(k))                   ▷ solve linear system for Δx
    x^(k+1) = x^(k) + Δx                        ▷ update approximation
  end
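A minimal sketch of Algorithm 19; the objective function, its gradient and its Hessian below are hypothetical examples.

```python
import numpy as np

def newton_stationary_point(grad_f, hess_f, x0, n_iter=20):
    """Newton's method for a stationary point: solve Hf(x) dx = -grad f(x), update x."""
    x = np.asarray(x0, dtype=float)
    for _ in range(n_iter):
        dx = np.linalg.solve(hess_f(x), -grad_f(x))   # Newton increment (13.8)
        x = x + dx
    return x

# Example: f(x) = x1^2 + 2*x2^2; gradient (2*x1, 4*x2), constant Hessian diag(2, 4).
x_hat = newton_stationary_point(
    lambda x: np.array([2*x[0], 4*x[1]]),
    lambda x: np.diag([2.0, 4.0]),
    x0=[1.0, 1.0],
)
```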

13.2 Linear system of equations

We now revisit the problem of finding a solution x \in R^n to the system of linear equations

Ax = b,   (13.9)

with A \in R^{m \times n} and b \in R^m, with m \ge n.

Least squares method

The linear system of equations Ax = b can be solved by minimization algorithms, for example, in the form of a least squares problem,

\min_{x \in D} f(x), \quad f(x) = \|Ax - b\|^2,   (13.10)

with A \in R^{m \times n} and b \in R^m, and m \ge n. The gradient is computed as,

\nabla f(x) = \nabla(\|Ax - b\|^2) = \nabla((Ax)^T Ax - (Ax)^T b - b^T Ax + b^T b)
            = \nabla(x^T A^T A x - 2x^T A^T b + b^T b) = A^T A x + (A^T A)^T x - 2A^T b
            = A^T A x + A^T A x - 2A^T b = 2A^T(Ax - b),

which gives the following gradient descent method

x^{(k+1)} = x^{(k)} - \alpha^{(k)} 2A^T(Ax^{(k)} - b).   (13.11)
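As a concrete illustration of (13.11), the following sketch applies gradient descent to an overdetermined least squares problem; the data and the fixed step length are hypothetical choices made only for this example.

```python
import numpy as np

def least_squares_gd(A, b, alpha=1e-3, n_iter=5000):
    """Gradient descent (13.11) for min ||Ax - b||^2."""
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        x = x - alpha * 2 * A.T @ (A @ x - b)   # step along -grad f(x) = -2 A^T (Ax - b)
    return x

# Hypothetical overdetermined system (m = 4, n = 2):
A = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
b = np.array([1.0, 2.0, 2.0, 4.0])
x_gd = least_squares_gd(A, b)
# Agrees (approximately) with the normal equations solution A^T A x = A^T b:
x_ne = np.linalg.solve(A.T @ A, A.T @ b)
```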

Quadratic forms

We consider the minimization problem,

\min_{x \in D} f(x),   (13.12)

where f(x) is the quadratic form

f(x) = \frac{1}{2} x^T A x - b^T x + c,   (13.13)

with x, b \in R^n, A \in R^{n \times n} and c \in R, with a stationary point given by

0 = \nabla f(x) = \nabla\left(\frac{1}{2} x^T A x - b^T x + c\right) = \frac{1}{2} Ax + \frac{1}{2} A^T x - b,   (13.14)

which in the case A is a symmetric matrix corresponds to the linear system of equations,

Ax = b.   (13.15)

To prove that the solution x = A^{-1}b is the solution of the minimization problem, study the error e = y - x, with y \in D, for which we have that

f(x + e) = \frac{1}{2}(x + e)^T A(x + e) - b^T(x + e) + c
         = \frac{1}{2} x^T A x + e^T A x + \frac{1}{2} e^T A e - b^T x - b^T e + c
         = \left(\frac{1}{2} x^T A x - b^T x + c\right) + \frac{1}{2} e^T A e + (e^T b - b^T e)
         = f(x) + \frac{1}{2} e^T A e,

where we have used that Ax = b, so that e^T A x - b^T e = e^T b - b^T e = 0.

We find that if A is a positive definite matrix, x = A^{-1}b is a global minimum, and thus any system of linear equations with a symmetric positive definite matrix may be reformulated as minimization of the associated quadratic form.

If A is not positive definite, it may be negative definite with the minimum being -\infty, singular with non-unique minima, or else the quadratic form f(x) has a saddle-point.

Gradient descent method

To solve the minimization problem for a quadratic form, we may use a gradient descent method, for which the negative of the gradient gives the residual,

-\nabla f(x^{(k)}) = b - Ax^{(k)} = r^{(k)},   (13.16)

that is,

x^{(k+1)} = x^{(k)} - \alpha \nabla f(x^{(k)}) = x^{(k)} + \alpha r^{(k)}.   (13.17)

To choose a step length \alpha that minimizes f(x^{(k+1)}), we compute the derivative

\frac{d}{d\alpha} f(x^{(k+1)}) = \nabla f(x^{(k+1)})^T \frac{d}{d\alpha} x^{(k+1)} = \nabla f(x^{(k+1)})^T r^{(k)} = -(r^{(k+1)})^T r^{(k)},

which gives that \alpha should be chosen such that the successive residuals are orthogonal, that is

(r^{(k+1)})^T r^{(k)} = 0,   (13.18)

which gives that

(r^{(k+1)})^T r^{(k)} = 0
(b - Ax^{(k+1)})^T r^{(k)} = 0
(b - A(x^{(k)} + \alpha r^{(k)}))^T r^{(k)} = 0
(b - Ax^{(k)})^T r^{(k)} - \alpha (Ar^{(k)})^T r^{(k)} = 0
(r^{(k)})^T r^{(k)} = \alpha (Ar^{(k)})^T r^{(k)},

so that

\alpha = \frac{(r^{(k)})^T r^{(k)}}{(Ar^{(k)})^T r^{(k)}}.   (13.19)

Algorithm 20: Steepest descent method for Ax = b
  Start from x^(0)                              ▷ initial approximation
  for k = 1, 2, ... do
    r^(k) = b - Ax^(k)                          ▷ compute residual r^(k)
    α^(k) = (r^(k))^T r^(k) / (Ar^(k))^T r^(k)  ▷ compute step length α^(k)
    x^(k+1) = x^(k) + α^(k) r^(k)               ▷ step with length α^(k)
  end
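A minimal sketch of Algorithm 20; the test matrix and right-hand side are hypothetical, and a small guard is added to stop once the residual is (numerically) zero.

```python
import numpy as np

def steepest_descent_spd(A, b, x0=None, n_iter=200):
    """Algorithm 20: steepest descent for Ax = b with A symmetric positive definite."""
    x = np.zeros_like(b) if x0 is None else np.asarray(x0, dtype=float)
    for _ in range(n_iter):
        r = b - A @ x                        # residual, equal to -grad f(x)
        if r @ r < 1e-30:                    # already converged
            break
        alpha = (r @ r) / (r @ (A @ r))      # exact line search step (13.19)
        x = x + alpha * r
    return x

# Hypothetical SPD test system:
A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
x = steepest_descent_spd(A, b)
```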

Conjugate gradient method revisited

We now revisit the conjugate gradient (CG) method, here as a method for minimizing the quadratic form corresponding to a linear system of equations with a symmetric positive definite matrix. The idea is to formulate a search method,

x^{(k+1)} = x^{(k)} + \alpha^{(k)} d^{(k)},   (13.20)

with a set of orthogonal search directions \{d^{(k)}\}_{k=0}^{n-1}, where the step length \alpha^{(k)} is determined by the condition that e^{(k+1)} = x - x^{(k+1)} should be A-orthogonal, or conjugate, to d^{(k)}, thus

(d^{(k)})^T A e^{(k+1)} = 0
(d^{(k)})^T A (e^{(k)} - \alpha^{(k)} d^{(k)}) = 0,

so that

\alpha^{(k)} = \frac{(d^{(k)})^T A e^{(k)}}{(d^{(k)})^T A d^{(k)}} = \frac{(d^{(k)})^T r^{(k)}}{(d^{(k)})^T A d^{(k)}}.   (13.21)

To construct the orthogonal search directions d^{(k)} we can use the Gram-Schmidt iteration, whereas if we choose the search direction to be the residual we get the steepest descent method.
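A minimal sketch of the conjugate gradient method in its standard form, for a hypothetical symmetric positive definite test matrix; the beta-update that generates the next search direction is stated here without derivation, and each step uses the step length (13.21).

```python
import numpy as np

def conjugate_gradient(A, b, x0=None, tol=1e-10, max_iter=None):
    """Conjugate gradient method for Ax = b, with A symmetric positive definite."""
    n = len(b)
    x = np.zeros(n) if x0 is None else np.asarray(x0, dtype=float)
    r = b - A @ x
    if np.linalg.norm(r) < tol:
        return x
    d = r.copy()                                  # first search direction: the residual
    for _ in range(max_iter or n):
        Ad = A @ d
        alpha = (d @ r) / (d @ Ad)                # step length (13.21)
        x = x + alpha * d
        r_new = r - alpha * Ad
        if np.linalg.norm(r_new) < tol:
            break
        beta = (r_new @ r_new) / (r @ r)          # makes the new direction conjugate to d
        d = r_new + beta * d
        r = r_new
    return x

A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
x = conjugate_gradient(A, b)
```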

13.3 Constrained minimization

The constrained minimization problem

We now consider the constrained minimization problem,

\min_{x \in D} f(x),   (13.22)
g(x) = c,   (13.23)

with the objective function f : D \to R, and the constraints g : D \to R^m, with x \in D \subset R^n and c \in R^m.

We define the Lagrangian \mathcal{L} : R^{n+m} \to R, as

\mathcal{L}(x, \lambda) = f(x) - \lambda \cdot (g(x) - c),   (13.24)

with the dual variables, or Lagrange multipliers, \lambda \in R^m, from which we obtain the optimality conditions,

\nabla_x \mathcal{L}(x, \lambda) = \nabla f(x) - \lambda \cdot \nabla g(x) = 0,   (13.25)
\nabla_\lambda \mathcal{L}(x, \lambda) = g(x) - c = 0,   (13.26)

that is, n + m equations from which we can solve for the unknown variables (x, \lambda) \in R^{n+m}.

Example in R^2

For f : R^2 \to R, g : R^2 \to R and c = 0, the Lagrangian takes the form

\mathcal{L}(x, \lambda) = f(x) - \lambda g(x),   (13.27)

with the optimality conditions

\nabla_x \mathcal{L}(x, \lambda) = \nabla f(x) - \lambda \nabla g(x) = 0,   (13.28)
\nabla_\lambda \mathcal{L}(x, \lambda) = g(x) = 0,   (13.29)

so that \nabla f = \lambda \nabla g(x), which corresponds to the curve defined by the constraint g(x) = 0 being parallel to a level curve of f(x) at \bar{x} \in R^2, the solution to the constrained minimization problem.
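A small sketch of the optimality conditions (13.28)-(13.29), solved symbolically for the hypothetical example f(x) = x1^2 + x2^2 with constraint g(x) = x1 + x2 - 1 = 0 (assuming SymPy is available):

```python
import sympy as sp

x1, x2, lam = sp.symbols("x1 x2 lam")
f = x1**2 + x2**2                           # objective function
g = x1 + x2 - 1                             # constraint g(x) = 0

L = f - lam * g                             # Lagrangian (13.27)
eqs = [sp.diff(L, x1), sp.diff(L, x2), g]   # optimality conditions (13.28)-(13.29)
sol = sp.solve(eqs, [x1, x2, lam], dict=True)
print(sol)                                  # x1 = x2 = 1/2, lam = 1
```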

13.4 Optimal control

The constrained optimal control problem

We now consider the constrained optimal control problem,

\min_{\alpha} f(x, \alpha),   (13.30)
g(x, \alpha) = c,   (13.31)

with the objective function f : D \times A \to R, and the constraints g : D \times A \to R^m, with x \in D \subset R^n, \alpha \in A \subset R^l and c \in R^m.

We define the Lagrangian \mathcal{L} : R^{n+m+l} \to R, as

\mathcal{L}(x, \lambda, \alpha) = f(x, \alpha) - \lambda \cdot (g(x, \alpha) - c),   (13.32)

with the dual variables, or Lagrange multipliers, \lambda \in R^m, from which we obtain the optimality conditions,

\nabla_x \mathcal{L}(x, \lambda, \alpha) = \nabla_x f(x, \alpha) - \lambda \cdot \nabla_x g(x, \alpha) = 0,   (13.33)
\nabla_\lambda \mathcal{L}(x, \lambda, \alpha) = g(x, \alpha) - c = 0,   (13.34)
\nabla_\alpha \mathcal{L}(x, \lambda, \alpha) = \nabla_\alpha f(x, \alpha) - \lambda \cdot \nabla_\alpha g(x, \alpha) = 0,   (13.35)

that is, n + m + l equations from which we can solve for the unknown variables (x, \lambda, \alpha) \in R^{n+m+l}.

Example

We now consider the constrained minimization problem,

\min_{x \in D} c^T x,   (13.36)
Ax = b,   (13.37)

with x, b, c \in R^n, and A \in R^{n \times n}.

We define the Lagrangian \mathcal{L} : R^{2n} \to R, as

\mathcal{L}(x, \lambda) = c^T x - \lambda^T (Ax - b),   (13.38)

with \lambda \in R^n, from which we obtain the optimality conditions,

\nabla_x \mathcal{L}(x, \lambda) = c - A^T \lambda = 0,   (13.39)
\nabla_\lambda \mathcal{L}(x, \lambda) = Ax - b = 0.   (13.40)
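The optimality conditions (13.39)-(13.40) form one linear system in (x, \lambda); a minimal sketch with hypothetical data (and ignoring any inequality constraints on x) is:

```python
import numpy as np

# Hypothetical data for min c^T x subject to Ax = b:
A = np.array([[1.0, 2.0], [3.0, 1.0]])
b = np.array([1.0, 2.0])
c = np.array([1.0, 1.0])

n = len(c)
# Stack (13.39)-(13.40) as one (2n) x (2n) linear system in (x, lambda):
#   [ 0   -A^T ] [x  ]   [-c]
#   [ A    0   ] [lam] = [ b]
K = np.block([[np.zeros((n, n)), -A.T],
              [A, np.zeros((n, n))]])
rhs = np.concatenate([-c, b])
x_lam = np.linalg.solve(K, rhs)
x, lam = x_lam[:n], x_lam[n:]
```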

Chapter 14

Adaptive finite element methods


Chapter 15

PDE constrained optimal control

Parameter estimation as a PDE-constrained optimization problem

Symmetric positive definite matrices

We recall that a matrix A \in R^{m \times m} is symmetric positive definite (SPD) if the following conditions are satisfied: (i) A is symmetric (self-adjoint), so that A = A^T, and (ii) (Ax, x) = x^T Ax > 0, for all non-zero x \in R^m.

It follows from the definition that x^T Ay = y^T Ax, for any x, y \in R^m. Further, all inner products on R^m take the form (x, y) = x^T Ay, where A is an SPD matrix. In the case of the standard Euclidian inner product, A = I.

An SPD matrix A is invertible, and the quadratic form x^T Ax is convex. More generally, any quadratic function from R^m to R can be written as x^T Ax + x^T b + c, where A is a symmetric m \times m matrix, b is an m-vector, and c is a real number. This quadratic function is strictly convex when A is SPD, and hence has a unique finite global minimum. In particular, the unique global minimizer of the quadratic form

F(x) = \frac{1}{2} x^T Ax - x^T b,   (15.1)

is the solution to the matrix equation Ax = b.

The energy norm of a vector x \in R^n with respect to a symmetric positive definite matrix A \in R^{n \times n} is defined as

\|x\|_A = (x, Ax)^{1/2} = (x^T Ax)^{1/2},   (15.2)

which is related to the weighted inner product

(x, y)_A = (x, Ay),   (15.3)

for x, y \in R^n.
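A small sketch of the energy norm (15.2) and the weighted inner product (15.3), with a hypothetical SPD matrix and a positive-definiteness check via the eigenvalues:

```python
import numpy as np

A = np.array([[4.0, 1.0], [1.0, 3.0]])         # hypothetical SPD matrix
assert np.allclose(A, A.T)                     # symmetry
assert np.all(np.linalg.eigvalsh(A) > 0)       # positive definiteness

def inner_A(x, y, A):
    """Weighted inner product (x, y)_A = x^T A y."""
    return x @ A @ y

def norm_A(x, A):
    """Energy norm ||x||_A = (x^T A x)^(1/2)."""
    return np.sqrt(inner_A(x, x, A))

x = np.array([1.0, -2.0])
y = np.array([0.5, 1.0])
print(inner_A(x, y, A), norm_A(x, A))
```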

Part VI

Outroduction


Chapter 16

Outroduction

16.1
