
Numerical Chap. 1: Basic Concepts from Linear Algebra

Heinrich Voss [email protected]

Hamburg University of Technology Institute for Numerical Simulation

Vectors

The space $\mathbb{R}^n$ is defined by

\[
\mathbb{R}^n := \{(x_1,\dots,x_n)^T : x_j \in \mathbb{R},\ j = 1,\dots,n\},
\]
with componentwise addition and scalar multiplication,
\[
\begin{pmatrix} x_1\\ x_2\\ \vdots\\ x_n \end{pmatrix}
+
\begin{pmatrix} y_1\\ y_2\\ \vdots\\ y_n \end{pmatrix}
=
\begin{pmatrix} x_1+y_1\\ x_2+y_2\\ \vdots\\ x_n+y_n \end{pmatrix},
\qquad
\alpha
\begin{pmatrix} x_1\\ x_2\\ \vdots\\ x_n \end{pmatrix}
=
\begin{pmatrix} \alpha x_1\\ \alpha x_2\\ \vdots\\ \alpha x_n \end{pmatrix},
\]
and $\mathbb{C}^n$ correspondingly.

A subset X ⊂ Rn is a subspace of Rn if it is closed with respect to addition and multiplication by scalars, i.e.

\[
x, y \in X \;\Rightarrow\; x + y \in X,
\qquad
\alpha \in \mathbb{R},\ x \in X \;\Rightarrow\; \alpha x \in X.
\]

Subspaces

A set of vectors {a1,..., am} ⊂ Cn is linearly independent if

\[
\sum_{j=1}^m \alpha_j a^j = 0 \;\Rightarrow\; \alpha_j = 0,\ j = 1,\dots,m.
\]

Otherwise, a nontrivial combination of the aj is zero, and {a1,..., am} is said to be linearly dependent.

Given vectors $a^1,\dots,a^m$, the set of all linear combinations of these vectors is a subspace referred to as the span of $a^1,\dots,a^m$:

\[
\mathrm{span}\{a^1,\dots,a^m\} = \Big\{ \sum_{j=1}^m \alpha_j a^j : \alpha_j \in \mathbb{C} \Big\}.
\]

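The following NumPy snippet is not part of the original slides; it is a minimal sketch of how linear independence can be checked numerically: stack the vectors as the columns of a matrix and compare its rank with the number of vectors. The example vectors are arbitrary.

import numpy as np

# columns are the vectors a^1, a^2, a^3 in C^4 (arbitrary example data)
A = np.array([[1.0, 0.0, 1.0],
              [2.0, 1.0, 3.0],
              [0.0, 1.0, 1.0],
              [1.0, 1.0, 2.0]])

rank = np.linalg.matrix_rank(A)
print(rank == A.shape[1])   # False: a^3 = a^1 + a^2, so the set is linearly dependent
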
Subspaces ct.

If $\{a^1,\dots,a^m\}$ is linearly independent and $b \in \mathrm{span}\{a^1,\dots,a^m\}$, then $b$ has a unique representation as a linear combination of the $a^j$.

If $S_1,\dots,S_k$ are subspaces of $\mathbb{C}^n$, then their sum

\[
S := S_1 + \cdots + S_k := \Big\{ \sum_{j=1}^k a^j : a^j \in S_j,\ j = 1,\dots,k \Big\}
\]

is also a subspace of $\mathbb{C}^n$. $S$ is said to be the direct sum if each $x \in S$ has a unique representation $x = a^1 + \cdots + a^k$ with $a^j \in S_j$. In this case we write

S = S1 ⊕ · · · ⊕ Sk .

The intersection of the subspaces S1,..., Sk is also a subspace,

S = S1 ∩ · · · ∩ Sk .

Dimension

The subset $\{a^{i_1},\dots,a^{i_k}\}$ is a maximal linearly independent subset of $\{a^1,\dots,a^m\}$ if it is linearly independent and is not contained properly in any linearly independent subset of $\{a^1,\dots,a^m\}$. If $\{a^{i_1},\dots,a^{i_k}\}$ is a maximal linearly independent subset, then

\[
\mathrm{span}\{a^{i_1},\dots,a^{i_k}\} = \mathrm{span}\{a^1,\dots,a^m\}.
\]

If $S \subset \mathbb{C}^n$ is a subspace, it is always possible to find a maximal linearly independent subset $\{a^1,\dots,a^k\}$ of $S$. Then $S = \mathrm{span}\{a^1,\dots,a^k\}$, and $\{a^1,\dots,a^k\}$ is called a basis of $S$.

All bases for a subspace S have the same number of elements. This number is the dimension of S, and it is denoted by dim(S).

Linear map

A map $A : \mathbb{C}^n \to \mathbb{C}^m$ is called linear if $A(x+y) = Ax + Ay$ for every $x, y \in \mathbb{C}^n$ and $A(\lambda x) = \lambda Ax$ for every $x \in \mathbb{C}^n$, $\lambda \in \mathbb{C}$.

For $j = 1,\dots,n$ let $e_j \in \mathbb{C}^n$ be the $j$-th canonical unit vector, having a 1 in its $j$-th component and zeros elsewhere. Then for $x = (x_j)_{j=1,\dots,n} \in \mathbb{C}^n$
\[
Ax = A\Big(\sum_{j=1}^n x_j e_j\Big) = \sum_{j=1}^n A(x_j e_j) = \sum_{j=1}^n x_j A e_j =: \sum_{j=1}^n x_j a^j.
\]

Hence, the images $a^j := A e_j$ of the canonical basis vectors characterize the linear map $A$. We therefore identify $A$ with the $m \times n$ matrix
\[
A = \begin{pmatrix}
a_{11} & a_{12} & \dots & a_{1n}\\
a_{21} & a_{22} & \dots & a_{2n}\\
\vdots & \vdots & \ddots & \vdots\\
a_{m1} & a_{m2} & \dots & a_{mn}
\end{pmatrix}.
\]
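As a small illustration (not from the slides, and with an arbitrarily chosen map f), the matrix of any linear map can be assembled column by column from the images of the canonical unit vectors, exactly as described above:

import numpy as np

def f(x):
    # an arbitrary linear map C^4 -> C^3: forward differences
    return np.array([x[1] - x[0], x[2] - x[1], x[3] - x[2]])

n, m = 4, 3
A = np.zeros((m, n))
for j in range(n):
    e = np.zeros(n)
    e[j] = 1.0          # canonical unit vector e_j
    A[:, j] = f(e)      # the j-th column is the image A e_j = a^j

x = np.array([1.0, 4.0, 9.0, 16.0])
print(np.allclose(A @ x, f(x)))   # True: the matrix reproduces the map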

Matrix-vector product

For $A = (a_{jk}) \in \mathbb{C}^{m\times n}$ and $x = (x_k) \in \mathbb{C}^n$ we have

\[
Ax = \Big(\sum_{k=1}^n a_{jk} x_k\Big)_{j=1,\dots,m} =: b \in \mathbb{C}^m.
\]
The vector $b$ is called the matrix-vector product of $A$ and $x$.

For every x ∈ Cn the matrix-vector product b = Ax is a linear combination of the columns aj of the matrix A.
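A short NumPy check of this observation (illustrative only): the product A @ x coincides with the linear combination of the columns of A with coefficients x_j.

import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 4))
x = rng.standard_normal(4)

combo = sum(x[j] * A[:, j] for j in range(A.shape[1]))   # sum of weighted columns
print(np.allclose(A @ x, combo))   # True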

Matrix-matrix product

Let A : Cn → Cm and B : Cm → Cp. Then the composition

$B \circ A : \mathbb{C}^n \to \mathbb{C}^p$, $(B \circ A)x = B(Ax)$, is linear as well, and

\[
BAx = B(Ax) = B\Big(\sum_{k=1}^n a_{jk}x_k\Big)_{j=1,\dots,m}
= \Big(\sum_{j=1}^m b_{ij}\sum_{k=1}^n a_{jk}x_k\Big)_{i=1,\dots,p}
= \Big(\sum_{k=1}^n\Big(\sum_{j=1}^m b_{ij}a_{jk}\Big)x_k\Big)_{i=1,\dots,p}.
\]

Hence, the composite map of $B$ and $A$ is represented by the matrix-matrix product $C := BA \in \mathbb{C}^{p\times n}$ with elements

\[
c_{ik} = \sum_{j=1}^m b_{ij} a_{jk}, \qquad i = 1,\dots,p,\ k = 1,\dots,n.
\]

Notice that the matrix-matrix product of B and A is only defined if the number of columns of B equals the number of rows of A.
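A direct translation of this formula into NumPy (purely illustrative): the triple loop computes c_ik = sum_j b_ij a_jk and agrees with the built-in product B @ A.

import numpy as np

rng = np.random.default_rng(1)
p, m, n = 2, 3, 4
B = rng.standard_normal((p, m))
A = rng.standard_normal((m, n))

C = np.zeros((p, n))
for i in range(p):
    for k in range(n):
        for j in range(m):
            C[i, k] += B[i, j] * A[j, k]   # c_ik = sum_j b_ij a_jk

print(np.allclose(C, B @ A))   # True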

Range of a matrix

The range of a matrix A, written range(A), is the set of vectors that can be expressed as Ax for some x. The formula b = Ax leads naturally to the following characterization of range(A).

Theorem 1 range(A) is the space spanned by the columns of A.

Proof: $Ax = \sum_{j=1}^n x_j a^j$ is a linear combination of the columns $a^j$ of $A$. Conversely, any vector $y$ in the space spanned by the columns of $A$ can be written as a linear combination of the columns, $y = \sum_{j=1}^n x_j a^j$. Forming a vector $x$ out of the coefficients $x_j$, we have $y = Ax$, and thus $y$ is in the range of $A$.

In view of Theorem 1, the range of a matrix A is also called the column space of A.

Nullspace of A

The nullspace of $A \in \mathbb{C}^{m\times n}$, written null(A), is the set of vectors $x$ that satisfy $Ax = 0$, where $0$ is the zero vector in $\mathbb{C}^m$.

The entries of each vector x ∈ null(A) give the coefficients of an expansion of zero as a linear combination of columns of A:

\[
0 = x_1 a^1 + x_2 a^2 + \cdots + x_n a^n.
\]

The column rank of a matrix is the dimension of its column space. Similarly, the row rank of a matrix is the dimension of the space spanned by its rows.

Row rank always equals column rank (among other proofs, this is a corollary of the singular value decomposition, discussed later), so we refer to this number simply as the rank of a matrix.
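A brief NumPy illustration (not part of the slides): the rank can be computed with matrix_rank, and an orthonormal basis of the nullspace can be read off from the rows of V^H in the singular value decomposition that correspond to zero singular values (the SVD itself is introduced at the end of this chapter).

import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0]])          # rank-1 matrix, arbitrary example

r = np.linalg.matrix_rank(A)
U, s, Vh = np.linalg.svd(A)
null_basis = Vh[r:].T                     # columns span null(A)

print(r)                                  # 1
print(np.allclose(A @ null_basis, 0.0))   # True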

Full rank

An m × n matrix of full rank is one that has the maximal possible rank (the lesser of m and n).

This means that a matrix of full rank with m ≥ n must have n linearly independent columns. Such a matrix can also be characterized by the property that the map it defines is one-to-one.

Theorem 2 A matrix $A \in \mathbb{C}^{m\times n}$ with $m \ge n$ has full rank if and only if it maps no two distinct vectors to the same vector.
Proof: If $A$ is of full rank, its columns are linearly independent, so they form a basis for range(A). This means that every $b \in \mathrm{range}(A)$ has a unique linear expansion in terms of the columns of $A$, and thus every $b \in \mathrm{range}(A)$ corresponds to a unique $x$ such that $b = Ax$.

Conversely, if $A$ is not of full rank, its columns $a^j$ are dependent, and there is a nontrivial linear combination such that $\sum_{j=1}^n c_j a^j = 0$. The nonzero vector $c$ formed from the coefficients $c_j$ satisfies $Ac = 0$. But then $A$ maps distinct vectors to the same vector since, for any $x$, it holds that $Ax = A(x + c)$.

Inverse matrix

A nonsingular or invertible matrix is a square matrix of full rank.

Note that the m columns of a nonsingular m × m matrix A form a basis for the whole space Cm. Therefore, we can uniquely express any vector as a linear combination of them.

In particular, the canonical unit vector $e_j$ can be expanded:

\[
e_j = \sum_{i=1}^m z_{ij} a^i.
\]

Let $Z \in \mathbb{C}^{m\times m}$ be the matrix with entries $z_{ij}$, and let $z_j$ denote the $j$th column of $Z$. Then $e_j = A z_j$, and putting these vectors together gives

\[
AZ = (e_1,\dots,e_m) =: I,
\]

where $I$ is the $m \times m$ identity. $Z$ is the inverse of $A$, and is denoted by $Z =: A^{-1}$.
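A small NumPy sketch of this construction (illustrative; in practice one rarely forms the inverse explicitly): the columns z_j of the inverse solve A z_j = e_j, so solving one linear system per unit vector yields Z with AZ = I. The random matrix is assumed to be nonsingular.

import numpy as np

rng = np.random.default_rng(2)
m = 4
A = rng.standard_normal((m, m))          # assumed nonsingular for this example

Z = np.column_stack([np.linalg.solve(A, e) for e in np.eye(m)])
print(np.allclose(A @ Z, np.eye(m)))     # True: Z = A^{-1}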

Gaussian elimination

The simplest way to solve a linear system (by hand or on a computer) is Gaussian elimination.

It transforms a linear system to an equivalent one with upper-triangular system matrix by applying simple linear transformations.

Let $A \in \mathbb{C}^{m\times m}$ be given. The idea is to transform $A$ into an upper-triangular matrix by introducing zeros below the diagonal, first in column 1, then in column 2, etc. This is done by subtracting suitable multiples of each row from the subsequent ones. This elimination process is equivalent to multiplying $A$ by a sequence of lower-triangular matrices $L_j$ on the left:

\[
L_{m-1} L_{m-2} \cdots L_1 A = U.
\]

LU factorization

Setting $L := L_1^{-1} L_2^{-1} \cdots L_{m-1}^{-1}$ gives $A = LU$. Thus we obtain an LU factorization of $A$,
\[
A = LU,
\]
where $U$ is upper-triangular and $L$ is (as a product of lower-triangular matrices) lower-triangular.

It turns out that L can be chosen such that all diagonal entries are equal to 1. A matrix with this property is called unit lower-triangular.

Example

\[
A = \begin{pmatrix} 2 & 1 & 3 & 4\\ -2 & 1 & -1 & -2\\ 4 & 4 & 5 & 11\\ -2 & 1 & -7 & -1 \end{pmatrix}
\]
The first step of Gaussian elimination looks like this: the first row is added to the second one, twice the first row is subtracted from the third one, and the first row is added to the fourth one. This can be written as
\[
L_1 A =
\begin{pmatrix} 1&0&0&0\\ 1&1&0&0\\ -2&0&1&0\\ 1&0&0&1 \end{pmatrix}
\begin{pmatrix} 2&1&3&4\\ -2&1&-1&-2\\ 4&4&5&11\\ -2&1&-7&-1 \end{pmatrix}
=
\begin{pmatrix} 2&1&3&4\\ 0&2&2&2\\ 0&2&-1&3\\ 0&2&-4&3 \end{pmatrix}
\]
Next we subtract the second row from the third and the fourth row:
\[
L_2 L_1 A =
\begin{pmatrix} 1&0&0&0\\ 0&1&0&0\\ 0&-1&1&0\\ 0&-1&0&1 \end{pmatrix}
\begin{pmatrix} 2&1&3&4\\ 0&2&2&2\\ 0&2&-1&3\\ 0&2&-4&3 \end{pmatrix}
=
\begin{pmatrix} 2&1&3&4\\ 0&2&2&2\\ 0&0&-3&1\\ 0&0&-6&1 \end{pmatrix}
\]

Example ct.

1 0 0 0 2 1 3 4 2 1 3 4 0 1 0 0 0 2 2 2 0 2 2 2 L2L1A =     =   0 −1 1 0 0 2 −1 3 0 0 −3 1 0 −1 0 1 0 2 −4 3 0 0 −6 1 Finally we subtract twice the third row from the fourth row

1 0 0 0 2 1 3 4 2 1 3 4  0 1 0 0 0 2 2 2 0 2 2 2  L3L2L1A =     =   0 0 1 0 0 0 −3 1 0 0 −3 1  0 0 −2 1 0 0 −6 1 0 0 0 −1

To exhibit the full factorization $A = LU$ we need to compute the product $L = L_1^{-1} L_2^{-1} L_3^{-1}$.

Surprisingly, this turns out to be trivial. The inverse of Lj , j = 1, 2, 3 is just Lj itself, but with each entry below the diagonal negated:

Example ct.

\[
L_1^{-1} =
\begin{pmatrix} 1&0&0&0\\ 1&1&0&0\\ -2&0&1&0\\ 1&0&0&1 \end{pmatrix}^{-1}
=
\begin{pmatrix} 1&0&0&0\\ -1&1&0&0\\ 2&0&1&0\\ -1&0&0&1 \end{pmatrix}, \quad \dots
\]
The product $L_1^{-1} L_2^{-1} L_3^{-1}$ is just the unit lower-triangular matrix with the nonzero subdiagonal entries of $L_1^{-1}$, $L_2^{-1}$ and $L_3^{-1}$ inserted in the appropriate places:
\[
L = \begin{pmatrix} 1&0&0&0\\ -1&1&0&0\\ 2&1&1&0\\ -1&1&2&1 \end{pmatrix}.
\]
Together we have
\[
A =
\begin{pmatrix} 2&1&3&4\\ -2&1&-1&-2\\ 4&4&5&11\\ -2&1&-7&-1 \end{pmatrix}
=
\begin{pmatrix} 1&0&0&0\\ -1&1&0&0\\ 2&1&1&0\\ -1&1&2&1 \end{pmatrix}
\begin{pmatrix} 2&1&3&4\\ 0&2&2&2\\ 0&0&-3&1\\ 0&0&0&-1 \end{pmatrix}
= LU.
\]

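A quick numerical check of this worked example (NumPy, illustrative only): multiplying the L and U obtained above indeed reproduces A.

import numpy as np

A = np.array([[ 2, 1,  3,  4],
              [-2, 1, -1, -2],
              [ 4, 4,  5, 11],
              [-2, 1, -7, -1]], dtype=float)
L = np.array([[ 1, 0, 0, 0],
              [-1, 1, 0, 0],
              [ 2, 1, 1, 0],
              [-1, 1, 2, 1]], dtype=float)
U = np.array([[2, 1,  3,  4],
              [0, 2,  2,  2],
              [0, 0, -3,  1],
              [0, 0,  0, -1]], dtype=float)

print(np.allclose(L @ U, A))   # True
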
General case

Suppose $x_k$ is the $k$th column of the working matrix at the beginning of step $k$. Then the transformation has to be chosen such that
\[
x_k = \begin{pmatrix} x_{1k}\\ \vdots\\ x_{kk}\\ x_{k+1,k}\\ \vdots\\ x_{mk} \end{pmatrix}
\;\longrightarrow\;
\begin{pmatrix} x_{1k}\\ \vdots\\ x_{kk}\\ 0\\ \vdots\\ 0 \end{pmatrix}.
\]

To this end we subtract $\ell_{jk}$ times row $k$ from row $j$, where $\ell_{jk} = x_{jk}/x_{kk}$, $k < j \le m$, which is performed by multiplying with
\[
L_k = \begin{pmatrix}
1 & & & & &\\
 & \ddots & & & &\\
 & & 1 & & &\\
 & & -\ell_{k+1,k} & 1 & &\\
 & & \vdots & & \ddots &\\
 & & -\ell_{mk} & & & 1
\end{pmatrix}.
\]

General case ct.

In the numerical example, we noted that Lk can be inverted by negating its subdiagonal entries, and that L can be formed by collecting the entries `jk in the appropriate places. These observations are true in the general case.

With
\[
\ell_k = (0,\dots,0,\ \ell_{k+1,k},\dots,\ell_{mk})^T
\]

the matrix $L_k$ can be written as $L_k = I - \ell_k e_k^H$, where $e_k$ is the $k$th standard unit vector. The sparsity pattern of $\ell_k$ implies $e_k^H \ell_k = 0$, and therefore

\[
(I - \ell_k e_k^H)(I + \ell_k e_k^H) = I - \ell_k e_k^H \ell_k e_k^H = I.
\]

In other words, the inverse of $L_k$ is $I + \ell_k e_k^H$.
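A tiny NumPy check of this identity (illustrative only): build L_k = I − ℓ_k e_k^H as a rank-one update of the identity and verify that flipping the sign of the subdiagonal entries inverts it. The multipliers are arbitrary.

import numpy as np

m, k = 5, 1                                   # k = 1 is the second column (0-based)
ell = np.zeros(m)
ell[k+1:] = np.array([0.5, -2.0, 3.0])        # arbitrary multipliers below position k

e_k = np.zeros(m); e_k[k] = 1.0
L_k     = np.eye(m) - np.outer(ell, e_k)      # I - ell e_k^H
L_k_inv = np.eye(m) + np.outer(ell, e_k)      # I + ell e_k^H

print(np.allclose(L_k @ L_k_inv, np.eye(m)))  # True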

General case ct.

That $L = L_1^{-1}\cdots L_{m-1}^{-1}$ can be formed by collecting the entries $\ell_{jk}$ in the appropriate places is proved by induction.

Assume that
\[
L_1^{-1}\cdots L_k^{-1} = I + \sum_{j=1}^k \ell_j e_j^H.
\]

Then it follows from $e_j^H \ell_{k+1} = 0$ for $j = 1,\dots,k$ that

\[
L_1^{-1}\cdots L_k^{-1}L_{k+1}^{-1}
= \Big(I + \sum_{j=1}^k \ell_j e_j^H\Big)\big(I + \ell_{k+1}e_{k+1}^H\big)
= I + \sum_{j=1}^{k+1} \ell_j e_j^H + \sum_{j=1}^{k} \ell_j e_j^H\ell_{k+1}e_{k+1}^H
= I + \sum_{j=1}^{k+1} \ell_j e_j^H.
\]

Gaussian elimination

In practical Gaussian elimination, the matrices Lk are never formed and multiplied explicitly. The multipliers `k are computed and stored directly into L, and the transformations Lk are then applied implicitly.

Gaussian elimination without pivoting

U = A, L = I
for k = 1:m-1 do
    for j = k+1:m do
        ℓ_jk = u_jk / u_kk
        u_{j,k:m} = u_{j,k:m} − ℓ_jk u_{k,k:m}
    end for
end for

Three matrices A, L, U are not really needed; to minimize memory use on the computer, both L and U can be written into the same array as A.
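A runnable NumPy version of the algorithm above (a sketch: no pivoting, so it assumes all pivots u_kk are nonzero, and it works on copies instead of overwriting A):

import numpy as np

def lu_nopivot(A):
    """LU factorization without pivoting; assumes nonzero pivots."""
    m = A.shape[0]
    U = A.astype(float).copy()
    L = np.eye(m)
    for k in range(m - 1):
        for j in range(k + 1, m):
            L[j, k] = U[j, k] / U[k, k]        # multiplier ell_jk
            U[j, k:] -= L[j, k] * U[k, k:]     # subtract ell_jk times row k
    return L, U

A = np.array([[ 2, 1,  3,  4],
              [-2, 1, -1, -2],
              [ 4, 4,  5, 11],
              [-2, 1, -7, -1]], dtype=float)
L, U = lu_nopivot(A)
print(np.allclose(L @ U, A))   # True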

Linear systems

If $A$ is factored into $L$ and $U$, a system $Ax = b$ is reduced to the form $LUx = b$.

Thus it can be solved by solving two triangular systems: first $Ly = b$ for the unknown $y$ (forward substitution), then $Ux = y$ for the unknown $x$ (back substitution).

This is particularly advantageous, if several linear systems with the same system matrix have to be solved.
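A minimal sketch of the two triangular solves in NumPy (the function names are my own; L is assumed unit lower-triangular and U nonsingular upper-triangular, here taken from the worked example above):

import numpy as np

def forward_substitution(L, b):
    # solve L y = b for unit lower-triangular L
    m = len(b)
    y = np.zeros(m)
    for i in range(m):
        y[i] = b[i] - L[i, :i] @ y[:i]
    return y

def back_substitution(U, y):
    # solve U x = y for upper-triangular U
    m = len(y)
    x = np.zeros(m)
    for i in range(m - 1, -1, -1):
        x[i] = (y[i] - U[i, i+1:] @ x[i+1:]) / U[i, i]
    return x

L = np.array([[ 1, 0, 0, 0],
              [-1, 1, 0, 0],
              [ 2, 1, 1, 0],
              [-1, 1, 2, 1]], dtype=float)
U = np.array([[2, 1,  3,  4],
              [0, 2,  2,  2],
              [0, 0, -3,  1],
              [0, 0,  0, -1]], dtype=float)
b = np.array([1.0, 2.0, 3.0, 4.0])

x = back_substitution(U, forward_substitution(L, b))
print(np.allclose(L @ (U @ x), b))   # True: x solves (LU)x = b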

Failure of Gaussian elimination

Unfortunately, Gaussian elimination as presented so far is unusable for solving general linear systems, for it is not stable.

The instability is related to another, more obvious difficulty. For certain matrices, Gaussian elimination fails entirely, because it attempts division by zero. For example, consider

\[
A = \begin{pmatrix} 0 & 1\\ 1 & 1 \end{pmatrix}
\]
This matrix has full rank. Nevertheless, Gaussian elimination fails at the first step.

Pivoting

At step k of Gaussian elimination, multiples of row k are subtracted from rows k + 1,..., m of the working matrix X in order to introduce zeros in entry k of these rows. In this operation row k, column k, and especially the entry xkk play special roles. We call xkk the pivot.

From every entry in the submatrix $X_{k+1:m,\,k:m}$ is subtracted the product of a number in row $k$ and a number in column $k$, divided by $x_{kk}$.

However, there is no reason why the $k$th row and column must be chosen for the elimination. For example, we could just as easily introduce zeros in column $k$ by adding multiples of some row $i$ with $k < i \le m$ to the other rows.

Similarly, we could introduce zeros in column $j$ rather than column $k$; in a linear system with this matrix as system matrix, this amounts to eliminating the unknown $x_j$ from all remaining equations but one.

Pivoting ct.

All in all, we are free to choose any entry of $X_{k:m,\,k:m}$ as the pivot, as long as it is nonzero. The possibility that an entry $x_{kk} = 0$ might arise implies that some flexibility in the choice of the pivot may sometimes be necessary, even from a purely mathematical point of view.

For numerical stability, however, it is desirable to pivot even when xkk is nonzero if there is a larger element available. In practice, it is common to pick as pivot the largest number among a set of entries being considered as candidates.

The structure of the elimination process quickly becomes confusing if zeros are introduced in arbitrary patterns through the matrix. To see what is going on, we want to retain the triangular structure, and there is an easy way to do this.

We shall not think of the pivot $x_{ij}$ as left in place. Instead, at step $k$, we shall imagine that the rows and columns of the working matrix are permuted so as to move $x_{ij}$ into the $(k, k)$ position. Then, when the elimination is done, zeros are introduced into entries $k+1,\dots,m$ of column $k$, just as in Gaussian elimination without pivoting. This interchange of rows and perhaps columns is what is usually thought of as pivoting.

Partial pivoting

If every entry of $X_{k:m,\,k:m}$ is considered as a possible pivot at step $k$, there are $(m - k + 1)^2$ entries to be examined to determine the largest. This expensive strategy is called complete pivoting.

In practice, equally good pivots can be found by considering a much smaller number of entries. The standard method for doing this is partial pivoting. Here, only rows are interchanged. The pivot at each step is chosen as the largest (in absolute value) of the $m - k + 1$ subdiagonal entries in column $k$. To bring the $k$th pivot into the $(k, k)$ position, no columns need to be permuted; only row $k$ is swapped with the row containing the pivot.

As usual in numerical linear algebra, this algorithm can be expressed as a matrix product. We saw earlier that an elimination step corresponds to left-multiplication by an elementary lower-triangular matrix $L_k$. Partial pivoting complicates matters by applying a permutation matrix $P_k$ on the left of the working matrix before each elimination. After $m - 1$ steps, $A$ becomes an upper-triangular matrix $U$:

\[
L_{m-1}P_{m-1}L_{m-2}P_{m-2}\cdots L_2P_2L_1P_1 A = U.
\]

Example

\[
A = \begin{pmatrix} 2&1&3&4\\ -2&1&-1&-2\\ 4&4&5&11\\ -2&1&-7&-1 \end{pmatrix}
\]

Since |a31| ≥ |aj1| for j = 1, 2, 3, 4 we interchange rows three and one:

0 0 1 0  2 1 3 4   4 4 5 11 0 1 0 0 −2 1 −1 −2 −2 1 −1 −2 P1A =     =   1 0 0 0  4 4 5 11  2 1 3 4  0 0 0 1 −2 1 −7 −1 −2 1 −7 −1

Next we eliminate the subdiagonal elements of the first column:

\[
L_1 P_1 A =
\begin{pmatrix} 1&0&0&0\\ 1/2&1&0&0\\ -1/2&0&1&0\\ 1/2&0&0&1 \end{pmatrix}
\begin{pmatrix} 4&4&5&11\\ -2&1&-1&-2\\ 2&1&3&4\\ -2&1&-7&-1 \end{pmatrix}
=
\begin{pmatrix} 4&4&5&11\\ 0&3&3/2&7/2\\ 0&-1&1/2&-3/2\\ 0&3&-9/2&9/2 \end{pmatrix}
\]

Example ct.

$|x_{22}|$ is already maximal in the second column; no permutation is necessary:
\[
L_2 L_1 P_1 A =
\begin{pmatrix} 1&0&0&0\\ 0&1&0&0\\ 0&1/3&1&0\\ 0&-1&0&1 \end{pmatrix}
\begin{pmatrix} 4&4&5&11\\ 0&3&3/2&7/2\\ 0&-1&1/2&-3/2\\ 0&3&-9/2&9/2 \end{pmatrix}
=
\begin{pmatrix} 4&4&5&11\\ 0&3&3/2&7/2\\ 0&0&1&-1/3\\ 0&0&-6&1 \end{pmatrix}
\]

$P_3$ permutes rows four and three:
\[
P_3 L_2 L_1 P_1 A =
\begin{pmatrix} 1&0&0&0\\ 0&1&0&0\\ 0&0&0&1\\ 0&0&1&0 \end{pmatrix}
\begin{pmatrix} 4&4&5&11\\ 0&3&3/2&7/2\\ 0&0&1&-1/3\\ 0&0&-6&1 \end{pmatrix}
=
\begin{pmatrix} 4&4&5&11\\ 0&3&3/2&7/2\\ 0&0&-6&1\\ 0&0&1&-1/3 \end{pmatrix}
\]
And the final elimination step yields
\[
L_3 P_3 L_2 L_1 P_1 A =
\begin{pmatrix} 1&0&0&0\\ 0&1&0&0\\ 0&0&1&0\\ 0&0&1/6&1 \end{pmatrix}
\begin{pmatrix} 4&4&5&11\\ 0&3&3/2&7/2\\ 0&0&-6&1\\ 0&0&1&-1/3 \end{pmatrix}
=
\begin{pmatrix} 4&4&5&11\\ 0&3&3/2&7/2\\ 0&0&-6&1\\ 0&0&0&-1/6 \end{pmatrix}
\]

PA = LU

In our example, the relation $L_3 P_3 L_2 L_1 P_1 A = U$ reads
\[
\begin{pmatrix} 0&0&1&0\\ 0&1&1/2&0\\ 0&-1&0&1\\ 1&1/6&-1/3&1/6 \end{pmatrix}
\begin{pmatrix} 2&1&3&4\\ -2&1&-1&-2\\ 4&4&5&11\\ -2&1&-7&-1 \end{pmatrix}
=
\begin{pmatrix} 4&4&5&11\\ 0&3&3/2&7/2\\ 0&0&-6&1\\ 0&0&0&-1/6 \end{pmatrix}
\]
If we knew in advance the permutations of the rows which are performed in the course of the Gaussian elimination, we could apply these permutations first (which corresponds to one multiplication of $A$ by a permutation matrix $P$) and determine the LU factorization of $PA$ without pivoting:
\[
PA =
\begin{pmatrix} 0&0&1&0\\ 0&1&0&0\\ 0&0&0&1\\ 1&0&0&0 \end{pmatrix}
\begin{pmatrix} 2&1&3&4\\ -2&1&-1&-2\\ 4&4&5&11\\ -2&1&-7&-1 \end{pmatrix}
=
\begin{pmatrix} 1&0&0&0\\ -1/2&1&0&0\\ -1/2&1&1&0\\ 1/2&-1/3&-1/6&1 \end{pmatrix}
\begin{pmatrix} 4&4&5&11\\ 0&3&3/2&7/2\\ 0&0&-6&1\\ 0&0&0&-1/6 \end{pmatrix}
= LU
\]

PA = LU ct.

The factorization PA = LU can be determined in the course of the Gaussian elimination not knowing the permutations in advance.

Gaussian elimination generates the decomposition

\[
L_{m-1}P_{m-1}L_{m-2}P_{m-2}L_{m-3}\cdots L_2P_2L_1P_1 A = U.
\]
The left-hand side can be rewritten as

\[
\begin{aligned}
& L_{m-1}\big(P_{m-1}L_{m-2}P_{m-1}^{-1}\big)P_{m-1}P_{m-2}L_{m-3}\cdots L_2P_2L_1P_1 A\\
&\quad = L_{m-1}L_{m-2}'\,P_{m-1}P_{m-2}L_{m-3}\cdots L_2P_2L_1P_1 A\\
&\quad = L_{m-1}L_{m-2}'\big(P_{m-1}P_{m-2}L_{m-3}P_{m-2}^{-1}P_{m-1}^{-1}\big)P_{m-1}P_{m-2}P_{m-3}L_{m-4}\cdots L_2P_2L_1P_1 A\\
&\quad = L_{m-1}L_{m-2}'L_{m-3}'\,P_{m-1}P_{m-2}P_{m-3}L_{m-4}\cdots L_2P_2L_1P_1 A = \cdots\\
&\quad = \big(L_{m-1}'L_{m-2}'\cdots L_1'\big)\big(P_{m-1}P_{m-2}\cdots P_1\big)A = U
\end{aligned}
\]

with
\[
L_k' = P_{m-1}P_{m-2}\cdots P_{k+1}\,L_k\,P_{k+1}^{-1}\cdots P_{m-2}^{-1}P_{m-1}^{-1},
\qquad k = 1,\dots,m-2,
\]
and $L_{m-1}' := L_{m-1}$.

PA = LU ct.

Multiplying $L_k$ by $P_{k+1}$ on the left exchanges rows $k+1$ and $\ell$ for some $\ell > k+1$, and multiplying by $P_{k+1}^{-1}$ on the right exchanges columns $k+1$ and $\ell$. Hence, $P_{k+1}L_k P_{k+1}^{-1}$ has the same structure as $L_k$, and this structure is kept when multiplying with further permutations $P_{k+2},\dots,P_{m-1}$ and their inverses on the left and right, respectively.

The matrices $L_k'$ are unit lower-triangular and easily invertible by negating the subdiagonal entries, just as in Gaussian elimination without pivoting.

Writing $L = (L_{m-1}'L_{m-2}'\cdots L_1')^{-1}$ and $P = P_{m-1}\cdots P_2P_1$, we have $PA = LU$.

Gaussian elimination

Gaussian elimination with partial pivoting

U = A, L = I, P = I
for k = 1:m-1 do
    select i ≥ k to maximize |u_ik|
    u_{k,k:m} ↔ u_{i,k:m}
    ℓ_{k,1:k-1} ↔ ℓ_{i,1:k-1}
    p_{k,:} ↔ p_{i,:}
    for j = k+1:m do
        ℓ_jk = u_jk / u_kk
        u_{j,k:m} = u_{j,k:m} − ℓ_jk u_{k,k:m}
    end for
end for
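A runnable NumPy sketch of this algorithm (my own translation of the pseudocode above, not the authors' code); it returns P, L and U with PA = LU:

import numpy as np

def lu_partial_pivot(A):
    """LU factorization with partial pivoting: returns P, L, U with P @ A = L @ U."""
    m = A.shape[0]
    U = A.astype(float).copy()
    L = np.eye(m)
    P = np.eye(m)
    for k in range(m - 1):
        i = k + np.argmax(np.abs(U[k:, k]))   # row index of the largest pivot candidate
        U[[k, i], k:] = U[[i, k], k:]         # swap rows of U (columns k..m-1)
        L[[k, i], :k] = L[[i, k], :k]         # swap the multipliers computed so far
        P[[k, i], :]  = P[[i, k], :]          # record the permutation
        for j in range(k + 1, m):
            L[j, k] = U[j, k] / U[k, k]
            U[j, k:] -= L[j, k] * U[k, k:]
    return P, L, U

A = np.array([[ 2, 1,  3,  4],
              [-2, 1, -1, -2],
              [ 4, 4,  5, 11],
              [-2, 1, -7, -1]], dtype=float)
P, L, U = lu_partial_pivot(A)
print(np.allclose(P @ A, L @ U))   # True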

Adjoint matrix

The complex conjugate of a scalar $z \in \mathbb{C}$, written $\bar z$ or $z^H$, is obtained by negating its imaginary part. For real $z \in \mathbb{R}$, we have $\bar z = z$.

The Hermitian conjugate or adjoint of an $m \times n$ matrix $A \in \mathbb{C}^{m\times n}$, written $A^H$, is the $n \times m$ matrix whose $(i,j)$ entry is the complex conjugate of the $(j,i)$ entry of $A$, i.e.
\[
A = \begin{pmatrix} a_{11} & a_{12} & a_{13}\\ a_{21} & a_{22} & a_{23} \end{pmatrix}
\;\Rightarrow\;
A^H = \begin{pmatrix} \bar a_{11} & \bar a_{21}\\ \bar a_{12} & \bar a_{22}\\ \bar a_{13} & \bar a_{23} \end{pmatrix}.
\]

If A = AH , then A is called Hermitian. By definition, a Hermitian matrix must be square. For real A, the adjoint simply interchanges the rows and columns of A. In this case, the adjoint is also known as the transpose, and is written AT . If a real matrix is Hermitian, that is, A = AT , then it is also said to be symmetric.

Inner product

The inner product of two column vectors $x, y \in \mathbb{C}^n$ is the product of the adjoint of $x$ by $y$:
\[
x^H y := \sum_{j=1}^n \bar x_j y_j.
\]

The Euclidean length of a vector $x \in \mathbb{C}^n$ is written $\|x\|$, and can be defined as the square root of the inner product of $x$ with itself:
\[
\|x\| = \sqrt{x^H x} = \Big(\sum_{j=1}^n |x_j|^2\Big)^{1/2}.
\]

The cosine of the angle $\varphi$ between $x$ and $y$ can be expressed in terms of the inner product as
\[
\cos\varphi = \frac{x^H y}{\|x\|\cdot\|y\|}.
\]

Orthogonal vectors

A pair of vectors $x$ and $y$ are orthogonal if $x^H y = 0$. If $x$ and $y$ are real, this means they lie at right angles to each other in $\mathbb{R}^n$.

Two sets of vectors X and Y are orthogonal (also stated "X is orthogonal to Y ") if every x ∈ X is orthogonal to every y ∈ Y .

A set of nonzero vectors $S$ is orthogonal if its elements are pairwise orthogonal, i.e., $x, y \in S$, $x \neq y \Rightarrow x^H y = 0$.

A set of vectors $S$ is orthonormal if it is orthogonal and, in addition, every $x \in S$ has $\|x\| = 1$.

Orthogonal vectors ct.

Theorem The vectors in an orthogonal set S are linearly independent.

Proof: For $v_1,\dots,v_k \in S$ let
\[
\sum_{j=1}^k c_j v_j = 0.
\]

Multiplying by $v_i^H$ for $i \in \{1,\dots,k\}$ one gets

\[
0 = v_i^H \sum_{j=1}^k c_j v_j = \sum_{j=1}^k c_j v_i^H v_j = c_i v_i^H v_i = c_i \|v_i\|^2
\;\Rightarrow\; c_i = 0,
\]

which implies the linear independence of $S$.

As a corollary of the Theorem it follows that if an orthogonal set S ⊂ Cm contains m vectors, then it is a basis for Cm.

Representation by orthonormal basis

Given a vector $b \in \mathbb{C}^m$ and a basis $\{q_1,\dots,q_m\}$ of $\mathbb{C}^m$, one usually has to solve a linear system to obtain the representation $b = \sum_{j=1}^m \beta_j q_j$ with respect to this basis, namely
\[
\begin{pmatrix} q_{11} & q_{12} & \dots & q_{1m}\\ q_{21} & q_{22} & \dots & q_{2m}\\ \vdots & & & \vdots\\ q_{m1} & q_{m2} & \dots & q_{mm} \end{pmatrix}
\begin{pmatrix} \beta_1\\ \beta_2\\ \vdots\\ \beta_m \end{pmatrix}
=
\begin{pmatrix} b_1\\ b_2\\ \vdots\\ b_m \end{pmatrix},
\qquad\text{where } q_j = \begin{pmatrix} q_{1j}\\ q_{2j}\\ \vdots\\ q_{mj} \end{pmatrix}.
\]

If $\{q_1,\dots,q_m\}$ is an orthonormal basis, i.e. $q_i^H q_j = \delta_{ij}$, where $\delta_{ij}$ is the Kronecker symbol equal to 1 if $i = j$ and 0 if $i \neq j$, then
\[
q_i^H b = q_i^H(\beta_1 q_1 + \beta_2 q_2 + \cdots + \beta_m q_m) = \sum_{j=1}^m \beta_j q_i^H q_j = \beta_i,
\]
and the representation of $b$ is given by
\[
b = \sum_{j=1}^m (q_j^H b)\,q_j = \sum_{j=1}^m (q_j q_j^H)\,b.
\]
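A small NumPy illustration (not from the slides): take an orthonormal basis from a QR factorization of a random real matrix and verify that b can be reconstructed from the coefficients q_j^T b without solving a linear system.

import numpy as np

rng = np.random.default_rng(3)
m = 5
Q, _ = np.linalg.qr(rng.standard_normal((m, m)))   # columns form an orthonormal basis
b = rng.standard_normal(m)

coeffs = Q.T @ b                                   # beta_j = q_j^H b (real case)
b_rebuilt = sum(coeffs[j] * Q[:, j] for j in range(m))
print(np.allclose(b_rebuilt, b))                   # True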

Representation by orthonormal basis ct.

\[
b = \sum_{j=1}^m (q_j^H b)\,q_j = \sum_{j=1}^m (q_j q_j^H)\,b.
\]

This formula contains two different ways to represent $b$: once with $(q_j^H b)\,q_j$, and again with $(q_j q_j^H)\,b$.

These expressions are equal, but they have different interpretations.

In the first case, we view $b$ as a sum of coefficients $q_j^H b$ times vectors $q_j$.

In the second, we view $b$ as a sum of orthogonal projections of $b$ onto the various directions $q_j$. The $j$th projection operation is achieved by the rank-one matrix $q_j q_j^H$.

Unitary matrices

A square matrix $Q \in \mathbb{C}^{m\times m}$ is unitary (in the real case, we also say orthogonal) if $Q^H = Q^{-1}$, i.e., if $Q^H Q = I$.

In other words, the columns $q_j$ of a unitary matrix form an orthonormal basis of $\mathbb{C}^m$:

\[
q_i^H q_j = \delta_{ij} = \begin{cases} 1 & \text{if } i = j,\\ 0 & \text{if } i \neq j. \end{cases}
\]

$\delta_{ij}$ is called the Kronecker delta.

Vector norms

Norms serve the purpose of measuring the length of vectors. A vector norm on $\mathbb{C}^n$ is a function

\[
\|\cdot\| : \mathbb{C}^n \to \mathbb{R}_+ := \{\alpha \in \mathbb{R} : \alpha \ge 0\}
\]
that satisfies the following properties:
(i) $\|x\| = 0 \Leftrightarrow x = 0$
(ii) $\|\alpha x\| = |\alpha|\cdot\|x\|$ for every $x \in \mathbb{C}^n$ and $\alpha \in \mathbb{C}$
(iii) $\|x + y\| \le \|x\| + \|y\|$ for every $x, y \in \mathbb{C}^n$

Example: $\|x\|_p = (|x_1|^p + \cdots + |x_n|^p)^{1/p}$, called the $p$-norm. In particular,

\[
\|x\|_1 = |x_1| + \cdots + |x_n|, \qquad
\|x\|_2 = \sqrt{|x_1|^2 + \cdots + |x_n|^2}, \qquad
\|x\|_\infty = \max_{j=1,\dots,n} |x_j|.
\]

Properties of vector norms

Hölder’s inequality

\[
|x^H y| \le \|x\|_p \cdot \|y\|_q \qquad\text{where } \frac{1}{p} + \frac{1}{q} = 1.
\]

Important special case: Cauchy–Schwarz inequality

\[
|x^H y| \le \|x\|_2 \cdot \|y\|_2.
\]

All norms on $\mathbb{C}^n$ are equivalent, i.e. if $\|\cdot\|$ and $\|\cdot\|'$ are two norms on $\mathbb{C}^n$, then there exist positive constants $C_1$ and $C_2$ such that

\[
C_1\|x\| \le \|x\|' \le C_2\|x\| \qquad\text{for every } x \in \mathbb{C}^n.
\]

\[
\|x\|_2 \le \|x\|_1 \le \sqrt{n}\,\|x\|_2, \qquad
\|x\|_\infty \le \|x\|_2 \le \sqrt{n}\,\|x\|_\infty, \qquad
\|x\|_\infty \le \|x\|_1 \le n\,\|x\|_\infty.
\]

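A quick NumPy check of these inequalities on an arbitrary random vector (illustrative only):

import numpy as np

rng = np.random.default_rng(4)
n = 7
x = rng.standard_normal(n)

n1, n2, ninf = np.linalg.norm(x, 1), np.linalg.norm(x, 2), np.linalg.norm(x, np.inf)
print(n2 <= n1 <= np.sqrt(n) * n2)       # True
print(ninf <= n2 <= np.sqrt(n) * ninf)   # True
print(ninf <= n1 <= n * ninf)            # True
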
Errors

Suppose that $\hat x$ is an approximation to $x$. For a given vector norm $\|\cdot\|$,

\[
\epsilon_{\mathrm{abs}} := \|x - \hat x\|
\]
is the absolute error of $\hat x$. If $x \neq 0$ then
\[
\epsilon_{\mathrm{rel}} := \frac{\|x - \hat x\|}{\|x\|}
\]
is the relative error of $\hat x$.

If
\[
\frac{\|x - \hat x\|_\infty}{\|x\|_\infty} \approx 10^{-p},
\]
then the largest component of $\hat x$ has approximately $p$ correct significant digits.

If $x = (9.876,\ 0.0543)^T$ and $\hat x = (9.875,\ 0.0700)^T$, then $\|x - \hat x\|_\infty/\|x\|_\infty \approx 1.6\cdot 10^{-3} \approx 10^{-3}$, and the first component has about 3 correct leading digits whereas the second component has no correct significant digit.

Matrix norms

Let $A \in \mathbb{C}^{m\times n}$, let $\|\cdot\|_n$ be a vector norm on $\mathbb{C}^n$ and $\|\cdot\|_m$ a vector norm on $\mathbb{C}^m$. Then
\[
\|A\|_{m,n} := \sup_{x \neq 0} \frac{\|Ax\|_m}{\|x\|_n}
\]

is the matrix norm subordinate to the vector norms $\|\cdot\|_n$ and $\|\cdot\|_m$. From
\[
\frac{\|Ax\|_m}{\|x\|_n} = \Big\| A \frac{x}{\|x\|_n} \Big\|_m
\]
it follows that $\|A\|_{m,n} = \max\{\|Ax\|_m : \|x\|_n = 1\}$.

In particular, this observation guarantees that the maximum is attained by some $x \in \mathbb{C}^n$, since the mapping $x \mapsto \|Ax\|_m$ is continuous and $\{x : \|x\|_n = 1\}$ is compact.

Properties of matrix norms

\[
\|A\|_{m,n} = 0 \;\Longleftrightarrow\; \|Ax\|_m = 0 \text{ for every } x \in \mathbb{C}^n
\;\Longleftrightarrow\; Ax = 0 \text{ for every } x \in \mathbb{C}^n
\;\Longleftrightarrow\; A = O
\]

\[
\|\alpha A\|_{m,n} = \max\{\|\alpha Ax\|_m : \|x\|_n = 1\}
= \max\{|\alpha|\cdot\|Ax\|_m : \|x\|_n = 1\}
= |\alpha|\cdot\|A\|_{m,n}
\]

\[
\begin{aligned}
\|A + B\|_{m,n} &= \max\{\|Ax + Bx\|_m : \|x\|_n = 1\}\\
&\le \max\{\|Ax\|_m + \|Bx\|_m : \|x\|_n = 1\}\\
&\le \max\{\|Ax\|_m : \|x\|_n = 1\} + \max\{\|Bx\|_m : \|x\|_n = 1\}\\
&= \|A\|_{m,n} + \|B\|_{m,n}
\end{aligned}
\]

Hence $\|\cdot\|_{m,n}$ is a vector norm on the vector space $\mathbb{C}^{m\times n}$.

Submultiplicativity of matrix norms

\[
\|Ax\|_m \le \|A\|_{m,n}\cdot\|x\|_n \qquad\text{for every } x \in \mathbb{C}^n \text{ and every } A \in \mathbb{C}^{m\times n}
\]
follows immediately from the definition of the matrix norm.

For every $A \in \mathbb{C}^{m\times n}$ and every $B \in \mathbb{C}^{n\times p}$ it holds that

\[
\|AB\|_{m,p} \le \|A\|_{m,n}\cdot\|B\|_{n,p}.
\]

Indeed, for every $x \in \mathbb{C}^p$

\[
\|ABx\|_m = \|A(Bx)\|_m \le \|A\|_{m,n}\cdot\|Bx\|_n \le \|A\|_{m,n}\cdot\|B\|_{n,p}\cdot\|x\|_p,
\]

and therefore

\[
\|AB\|_{m,p} = \max\{\|ABx\|_m : \|x\|_p = 1\} \le \|A\|_{m,n}\cdot\|B\|_{n,p}.
\]

Geometric interpretation

The matrix norm $\|A\|_{m,n}$ is the smallest nonnegative number $\mu$ such that

\[
\|Ax\|_m \le \mu\cdot\|x\|_n \qquad\text{for every } x \in \mathbb{C}^n.
\]

Hence, $\|A\|_{m,n}$ is the maximum elongation of a vector $x$ by the mapping $x \mapsto Ax$, with respect to the norm $\|\cdot\|_n$ in the domain $\mathbb{C}^n$ and $\|\cdot\|_m$ in the range $\mathbb{C}^m$.

From now on we only consider the case that the same (type of) norm is used in the domain and in the range (even if the two spaces are of different dimensions), and we denote the matrix norm by the same symbol that is used for the vector norm. Hence, if $A \in \mathbb{C}^{5\times 9}$, then $\|A\|_\infty$ denotes the matrix norm of $A$ with respect to the vector norm $\|\cdot\|_\infty$ on both $\mathbb{C}^9$ and $\mathbb{C}^5$.

Matrix ∞-norm

\[
\|A\|_\infty = \max_{i=1,\dots,m} \sum_{j=1}^n |a_{ij}|
\]

For every $x \in \mathbb{C}^n$ it holds that

\[
\|Ax\|_\infty = \max_{i=1,\dots,m}\Big|\sum_{j=1}^n a_{ij}x_j\Big|
\le \max_{i=1,\dots,m}\sum_{j=1}^n |a_{ij}|\cdot|x_j|
\le \|x\|_\infty\cdot\max_{i=1,\dots,m}\sum_{j=1}^n |a_{ij}|.
\]

Thus
\[
\|A\|_\infty \le \max_{i=1,\dots,m}\sum_{j=1}^n |a_{ij}|. \qquad (*)
\]

Matrix ∞-norm ct.

Let $k \in \{1,\dots,m\}$ be such that
\[
\sum_{j=1}^n |a_{ij}| \le \sum_{j=1}^n |a_{kj}| \qquad\text{for every } i = 1,\dots,m,
\]
and define $x \in \mathbb{C}^n$ by $x_j := 1$ if $a_{kj} = 0$, and $x_j := \bar a_{kj}/|a_{kj}|$ otherwise.

Then $\|x\|_\infty = 1$ and
\[
\|Ax\|_\infty = \max_{i=1,\dots,m}\Big|\sum_{j=1}^n a_{ij}x_j\Big|
\ge \Big|\sum_{j=1}^n a_{kj}x_j\Big|
= \sum_{j=1}^n |a_{kj}| = \max_{i=1,\dots,m}\sum_{j=1}^n |a_{ij}|.
\]
Hence
\[
\|A\|_\infty = \max\{\|Ay\|_\infty : \|y\|_\infty = 1\} \ge \|Ax\|_\infty \ge \max_{i=1,\dots,m}\sum_{j=1}^n |a_{ij}|,
\]
which together with inequality (*) yields the proposition.

Matrix 1-norm and 2-norm

Analogously, the 1-norm of a matrix $A \in \mathbb{C}^{m\times n}$ is easily shown to be
\[
\|A\|_1 := \max_{j=1,\dots,n}\sum_{i=1}^m |a_{ij}|.
\]
The matrix 2-norm, called the spectral norm of $A \in \mathbb{C}^{m\times n}$, is the square root of the largest eigenvalue of $A^H A$. This follows from Rayleigh's principle:
\[
\frac{\|Ax\|_2^2}{\|x\|_2^2} = \frac{x^H A^H A x}{x^H x} \le \max\{\lambda : A^H A v = \lambda v \text{ for some } v \neq 0\}.
\]

Hence, $\|A\|_2$ is the square root of the maximum eigenvalue of $A^H A$. The spectral norm can easily be bounded by
\[
\|A\|_2 \le \sqrt{\|A\|_1\cdot\|A\|_\infty}\,.
\]

Indeed, let $z \neq 0$ be such that $A^H A z = \|A\|_2^2\,z$. Then
\[
\|A\|_2^2\,\|z\|_1 = \|A^H A z\|_1 \le \|A^H\|_1\,\|A\|_1\,\|z\|_1 = \|A\|_\infty\,\|A\|_1\,\|z\|_1.
\]

Frobenius norm

For every $A \in \mathbb{C}^{m\times n}$ it holds that
\[
\|A\|_2 \le \|A\|_F := \Big(\sum_{i=1}^m\sum_{j=1}^n |a_{ij}|^2\Big)^{1/2}.
\]

$\|A\|_F$ is called the Frobenius norm or Schur norm. $\|\cdot\|_F$ is a vector norm on $\mathbb{C}^{m\times n}$ (the Euclidean norm on $\mathbb{C}^{m\cdot n}$), but it is not a matrix norm subordinate to a vector norm, since in that case it would hold for $n = m$ and the identity matrix $I$ that
\[
\|I\| = \max\{\|Ix\| : \|x\| = 1\} = 1, \qquad\text{whereas } \|I\|_F = \sqrt{n}.
\]
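A small NumPy check of these relations on a random matrix (illustrative only); numpy.linalg.norm with ord = 1, 2, inf and 'fro' computes exactly the norms discussed here.

import numpy as np

rng = np.random.default_rng(5)
A = rng.standard_normal((4, 6))

n1   = np.linalg.norm(A, 1)         # maximum column sum
n2   = np.linalg.norm(A, 2)         # spectral norm (largest singular value)
ninf = np.linalg.norm(A, np.inf)    # maximum row sum
nF   = np.linalg.norm(A, 'fro')     # Frobenius norm

print(n2 <= np.sqrt(n1 * ninf))     # True
print(n2 <= nF)                     # True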

Frobenius norm ct.

From the Cauchy–Schwarz inequality it follows for every $x \in \mathbb{C}^n$ that
\[
\|Ax\|_2^2 = \sum_{i=1}^m\Big|\sum_{j=1}^n a_{ij}x_j\Big|^2
\le \sum_{i=1}^m\Big(\sum_{j=1}^n |a_{ij}|^2\Big)\Big(\sum_{j=1}^n |x_j|^2\Big)
= \|A\|_F^2\cdot\|x\|_2^2,
\]

and hence $\|A\|_2 \le \|A\|_F$.

Singular value decomposition

The singular value decomposition (SVD) is motivated by the following geometric fact: the image of the unit sphere under any $m \times n$ matrix is a hyperellipse.

The SVD is applicable to both real and complex matrices. However, in describing the geometric interpretation, we assume as usual that the matrix is real.

The term "hyperellipse" may be unfamiliar, but this is just the m-dimensional generalization of an ellipse. We may define a hyperellipse in Rm as the surface obtained by stretching the unit sphere in Rm by some factors m σ1, . . . , σm (possibly zero) in some orthogonal directions ul ,..., um ∈ R .

For convenience, let us take the ui to be unit vectors, i.e., kui k2 = l. The vectors {σi ui } are the principal semiaxes of the hyperellipse, with lengths σ1, . . . , σm.

If A has rank r, exactly r of the lengths σi will turn out to be nonzero, and in particular, if m > n, at most n of them will be nonzero.

SVD ct.

Let $S$ be the unit sphere in $\mathbb{R}^n$, and take any $A \in \mathbb{R}^{m\times n}$ with $m \ge n$. For simplicity, suppose for the moment that $A$ has full rank $n$.

The image AS is a hyperellipse in Rm. We now define some properties of A in terms of the shape of AS. First, we define the n singular values of A. These are the lengths of the n principal semiaxes of AS, written as σ1, . . . , σn. It is conventional to assume that the singular values are numbered in descending order, σ1 ≥ σ2 ≥ · · · ≥ σn.

Next, we define the $n$ left singular vectors of $A$. These are the unit vectors $u_1, u_2,\dots,u_n$ oriented in the directions of the principal semiaxes of $AS$, numbered to correspond with the singular values. Thus the vector $\sigma_i u_i$ is the $i$th largest principal semiaxis of $AS$.

Finally, we define the $n$ right singular vectors of $A$. These are the unit vectors $\{v_1, v_2,\dots,v_n\} \subset S$ that are the preimages of the principal semiaxes of $AS$, numbered so that $Av_j = \sigma_j u_j$.

Reduced SVD

The equations relating right singular vectors {vj } and left singular vectors {uj } can be written as Avj = σj uj , j = 1,..., n.

This collection of vector equations can be expressed as a matrix equation,
\[
A\,[v_1,\dots,v_n] = [u_1,\dots,u_n]
\begin{pmatrix} \sigma_1 & & \\ & \ddots & \\ & & \sigma_n \end{pmatrix},
\]

or more compactly $AV = \hat U\hat\Sigma$, where $\hat\Sigma \in \mathbb{R}^{n\times n}$ is a diagonal matrix with positive entries, and $\hat U \in \mathbb{R}^{m\times n}$ and $V \in \mathbb{R}^{n\times n}$ have orthonormal columns.

Multiplying on the right by $V^H$ (notice that $V$ is unitary!) one gets

\[
A = \hat U\hat\Sigma V^H,
\]

which is called the reduced singular value decomposition or reduced SVD of A.

Full SVD

In most applications the SVD is used in exactly the form just described. However, this is not the way in which the idea of an SVD is usually formulated in textbooks. We have introduced the term "reduced" and the hats on $U$ and $\Sigma$ in order to distinguish the factorization from the more standard "full" SVD.

The columns of $\hat U$ are $n$ orthonormal vectors in the $m$-dimensional space $\mathbb{C}^m$. Unless $m = n$, they do not form a basis of $\mathbb{C}^m$, nor is $\hat U$ a unitary matrix. However, by adjoining an additional $m - n$ orthonormal columns, $\hat U$ can be extended to a unitary matrix. Let us do this in an arbitrary fashion, and call the result $U$.

If $\hat U$ is replaced by $U$, then $\hat\Sigma$ will have to change too. For the product to remain unaltered, the last $m - n$ columns of $U$ should be multiplied by zero. Accordingly, let $\Sigma$ be the $m \times n$ matrix consisting of $\hat\Sigma$ in the upper $n \times n$ block together with $m - n$ rows of zeros below.

We now have a new factorization, the full SVD of $A$:
\[
A = U\Sigma V^H
\]
where $U$ is $m \times m$ and unitary, $V$ is $n \times n$ and unitary, and $\Sigma$ is $m \times n$ and diagonal with positive real entries.

Full SVD ct.

Having described the full SVD, we can now discard the simplifying assumption that A has full rank.

If $A$ is rank-deficient, the factorization $A = U\Sigma V^H$ is still appropriate. All that changes is that now not $n$ but only $r$ of the left singular vectors of $A$ are determined by the geometry of the hyperellipse.

To construct the unitary matrix U, we introduce m − r instead of just m − n additional arbitrary orthonormal columns. The matrix V will also need n − r arbitrary orthonormal columns to extend the r columns determined by the geometry. The matrix Σ will now have r positive diagonal entries, with the remaining n − r equal to zero.

By the same token, the reduced SVD also makes sense for matrices $A$ of less than full rank. One can take $\hat U$ to be $m \times n$, with $\hat\Sigma$ of dimensions $n \times n$ with some zeros on the diagonal, or further compress the representation so that $\hat U$ is $m \times r$ and $\hat\Sigma$ is $r \times r$ and strictly positive on the diagonal.

Formal definition of SVD

Let $m$ and $n$ be arbitrary (not necessarily $m \ge n$). Given $A \in \mathbb{C}^{m\times n}$, a singular value decomposition (SVD) of $A$ is a factorization
\[
A = U\Sigma V^H
\]

where U ∈ Cm×m is unitary, V ∈ Cn×n is unitary, and Σ ∈ Cm×n is diagonal.

In addition, it is assumed that the diagonal entries $\sigma_j$ of $\Sigma$ are nonnegative and in nonincreasing order; that is, $\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_p$, where $p = \min(m, n)$.

Note that the diagonal matrix Σ has the same shape as A even when A is not square, but U and V are always square unitary matrices.
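A short NumPy illustration of these shapes (not from the slides): full_matrices=True gives the full SVD, full_matrices=False the reduced one; in both cases the factors reproduce A.

import numpy as np

rng = np.random.default_rng(6)
m, n = 5, 3
A = rng.standard_normal((m, n))

# full SVD: U is m x m, Sigma is m x n, Vh is n x n
U, s, Vh = np.linalg.svd(A, full_matrices=True)
Sigma = np.zeros((m, n))
Sigma[:n, :n] = np.diag(s)
print(U.shape, Sigma.shape, Vh.shape)            # (5, 5) (5, 3) (3, 3)
print(np.allclose(U @ Sigma @ Vh, A))            # True

# reduced SVD: U_hat is m x n, Sigma_hat is n x n
U_hat, s, Vh = np.linalg.svd(A, full_matrices=False)
print(np.allclose(U_hat @ np.diag(s) @ Vh, A))   # True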

It is clear that the image of the unit sphere in $\mathbb{R}^n$ under a map $A = U\Sigma V^H$ must be a hyperellipse in $\mathbb{R}^m$. The unitary map $V^H$ preserves the sphere, the diagonal matrix $\Sigma$ stretches the sphere into a hyperellipse aligned with the canonical basis, and the final unitary map $U$ rotates or reflects the hyperellipse without changing its shape. Thus, if we can prove that every matrix has an SVD, we shall have proved that the image of the unit sphere under any linear map is a hyperellipse.

Existence and Uniqueness

Theorem: Every matrix $A \in \mathbb{C}^{m\times n}$ has a singular value decomposition $A = U\Sigma V^H$.

Furthermore, the singular values σj are uniquely determined, and, if A is square and the σj are distinct, the left and right singular vectors {uj } and {vj } are uniquely determined up to complex signs (i.e., complex scalar factors of absolute value 1).

Proof: To prove existence of the SVD, we isolate the direction of the largest action of A, and then proceed by induction on the dimension of A.

Set $\sigma_1 = \|A\|_2$. By a compactness argument, there must be a vector $v_1 \in \mathbb{C}^n$ with $\|v_1\|_2 = 1$ and $\|u_1\|_2 = \sigma_1$, where $u_1 = Av_1$.

Consider any extensions of $v_1$ to an orthonormal basis $\{v_j\}$ of $\mathbb{C}^n$ and of $u_1$ to an orthonormal basis $\{u_j\}$ of $\mathbb{C}^m$, and let $U_1$ and $V_1$ denote the unitary matrices with columns $u_j$ and $v_j$, respectively.

Proof ct.

\[
U_1^H A V_1 =
\begin{pmatrix} u_1^H\\ \vdots\\ u_m^H \end{pmatrix} A
\begin{pmatrix} v_1 & \dots & v_n \end{pmatrix}
=
\begin{pmatrix} u_1^H\\ \vdots\\ u_m^H \end{pmatrix}
\begin{pmatrix} Av_1 & \dots & Av_n \end{pmatrix}
=
\begin{pmatrix} u_1^H\\ \vdots\\ u_m^H \end{pmatrix}
\begin{pmatrix} \sigma_1 u_1 & \dots & Av_n \end{pmatrix}
=
\begin{pmatrix} \sigma_1 & w^H\\ 0 & B \end{pmatrix} =: S,
\]
where $0$ is a column vector of dimension $m-1$, $w^H$ is a row vector of dimension $n-1$, and $B \in \mathbb{C}^{(m-1)\times(n-1)}$. Furthermore,
\[
\Big\| \begin{pmatrix} \sigma_1 & w^H\\ 0 & B \end{pmatrix}\begin{pmatrix} \sigma_1\\ w \end{pmatrix} \Big\|_2
\ge \sigma_1^2 + w^H w
= \sqrt{\sigma_1^2 + w^H w}\,\Big\|\begin{pmatrix} \sigma_1\\ w \end{pmatrix}\Big\|_2,
\]
implying $\|S\|_2 \ge \sqrt{\sigma_1^2 + w^H w}$.

Since $U_1$ and $V_1$ are unitary, it follows that $\|S\|_2 = \|A\|_2 = \sigma_1$, so this implies $w = 0$.

Proof ct.

If n = 1 or m = 1, we are done.

Otherwise, the submatrix $B$ describes the action of $A$ on the subspace orthogonal to $v_1$. By the induction hypothesis, $B$ has an SVD $B = U_2\Sigma_2 V_2^H$.

Now it is easily verified that

\[
A = U_1
\begin{pmatrix} 1 & 0\\ 0 & U_2 \end{pmatrix}
\begin{pmatrix} \sigma_1 & 0\\ 0 & \Sigma_2 \end{pmatrix}
\begin{pmatrix} 1 & 0\\ 0 & V_2 \end{pmatrix}^H V_1^H
\]

is an SVD of A, completing the existence proof.

For the uniqueness claim, the geometric justification is straightforward: if the semiaxis lengths of a hyperellipse are distinct, then the semiaxes themselves are determined by the geometry, up to signs.
