
NLA Reading Group Spring’13 by İsmail Arı

$b$ is a linear combination of the columns of $A$.

Let us re-write the matrix-vector multiplication $b = Ax$, defined entrywise by $b_i = \sum_{j=1}^{n} a_{ij} x_j$, as a single vector equation:

$$b = \sum_{j=1}^{n} x_j a_j,$$

where $a_j$ denotes the $j$th column of $A$.

“As mathematicians, we are used to viewing the formula $Ax = b$ as a statement that $A$ acts on $x$ to produce $b$. The new formula, by contrast, suggests the interpretation that $x$ acts on $A$ to produce $b$.”
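As a quick illustration, here is a minimal NumPy sketch of the two views; the matrix and vector are arbitrary examples:

```python
import numpy as np

# An arbitrary example: b = Ax as a linear combination of the columns of A.
A = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])   # 3-by-2 matrix
x = np.array([2.0, -1.0])

b_product = A @ x                             # A acts on x
b_columns = x[0] * A[:, 0] + x[1] * A[:, 1]   # x combines the columns of A

print(np.allclose(b_product, b_columns))      # True
```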

The map from vectors of coefficients of polynomials $p$ of degree $< n$ to vectors $(p(x_1), p(x_2), \dots, p(x_m))$ of sampled polynomial values is linear.

The product $Ac$ gives the sampled polynomial values: if $p(x) = c_1 + c_2 x + \cdots + c_n x^{n-1}$ and $A$ is the Vandermonde matrix with entries $a_{ij} = x_i^{\,j-1}$, then

$$Ac = \begin{pmatrix} p(x_1) \\ p(x_2) \\ \vdots \\ p(x_m) \end{pmatrix}.$$
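A minimal NumPy sketch of this map; the sample points and coefficients are arbitrary choices:

```python
import numpy as np

x_pts = np.array([0.0, 1.0, 2.0, 3.0])      # m = 4 sample points
c = np.array([1.0, -2.0, 0.5])              # coefficients of p, degree < n = 3

A = np.vander(x_pts, N=3, increasing=True)  # a_ij = x_i^(j-1)
sampled = A @ c                             # (p(x_1), ..., p(x_m))

p = np.polynomial.Polynomial(c)             # the same polynomial, evaluated directly
print(np.allclose(sampled, p(x_pts)))       # True
```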

Do not see $Ac$ as $m$ distinct scalar summations. Instead, see $A$ as a matrix of columns, each giving sampled values of a monomial*.

Thus, $Ac$ is a single vector summation that at once gives a linear combination of these monomials:

$$Ac = \sum_{j=1}^{n} c_j a_j,$$

where column $a_j$ holds the sampled values of the monomial $x^{j-1}$.

*In mathematics, a monomial is, roughly speaking, a polynomial which has only one term.

In the matrix-matrix product $B = AC$, each column of $B$ is a linear combination of the columns of $A$.

Thus $b_j$ is a linear combination of the columns $a_k$ with coefficients $c_{kj}$:

$$b_j = Ac_j = \sum_{k=1}^{m} c_{kj} a_k.$$
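A small NumPy check of this column-by-column view; the shapes are chosen arbitrarily:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 3))
C = rng.standard_normal((3, 2))

B = A @ C
# Column j of B is A applied to column j of C,
# i.e. a linear combination of the columns of A with coefficients C[:, j].
for j in range(C.shape[1]):
    assert np.allclose(B[:, j], A @ C[:, j])
print("column view verified")
```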

Example: let $R$ be the $n \times n$ upper-triangular matrix with $r_{ij} = 1$ for $i \le j$ and $r_{ij} = 0$ otherwise. Then the $j$th column of $B = AR$ is the sum of the first $j$ columns of $A$, and the matrix $R$ is a discrete analogue of an indefinite integral operator.
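A minimal sketch of this example, assuming the upper-triangular matrix of ones described above:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((3, 4))
R = np.triu(np.ones((4, 4)))     # r_ij = 1 for i <= j: the "running sum" matrix

B = A @ R
# Column j of B is the cumulative sum of the first j columns of A.
print(np.allclose(B, np.cumsum(A, axis=1)))  # True
```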

range($A$) is the space spanned by the columns of $A$.

null($A$) is the set of vectors that satisfy $Ax = 0$, where $0$ is the $0$-vector in $\mathbb{C}^m$.

The column/row rank of a matrix is the dimension of its column/row space. Column rank always equals row rank, so we simply call this the rank of the matrix.

A matrix $A$ of size $m \times n$ with $m \ge n$ has full rank iff it maps no two distinct vectors to the same vector.

A nonsingular or invertible matrix is a square matrix of full rank.

If $A$ is nonsingular, there is a unique matrix $Z$ satisfying $AZ = I$, where $I$ is the $m \times m$ identity. The matrix $Z$ is the inverse of $A$, written $A^{-1}$.

For an $m \times m$ matrix $A$, the following conditions are equivalent:
(a) $A$ has an inverse $A^{-1}$,
(b) $\operatorname{rank}(A) = m$,
(c) $\operatorname{range}(A) = \mathbb{C}^m$,
(d) $\operatorname{null}(A) = \{0\}$,
(e) $0$ is not an eigenvalue of $A$,
(f) $0$ is not a singular value of $A$,
(g) $\det(A) \ne 0$.

We mention the determinant because, though a theoretically convenient notion, it rarely finds a useful role in numerical algorithms.

Do not think of $x$ as the result of applying $A^{-1}$ to $b$. Instead, think of it as the unique vector that satisfies the equation $Ax = b$.

Multiplication by $A^{-1}$ is a change-of-basis operation.

$A^{-1}b$ is the vector of coefficients of the expansion of $b$ in the basis of columns of $A$.
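A short NumPy illustration of this viewpoint, with arbitrary example data:

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((3, 3))   # assumed nonsingular for this sketch
b = rng.standard_normal(3)

x = np.linalg.solve(A, b)         # coefficients of b in the basis of columns of A
print(np.allclose(A @ x, b))      # b is recovered as the combination sum_j x_j a_j
```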

The complex conjugate of a scalar $z$, written $\bar{z}$ or $z^*$, is obtained by negating its imaginary part.

The hermitian conjugate or adjoint of an $m \times n$ matrix $A$, written $A^*$, is the $n \times m$ matrix whose $i,j$ entry is the complex conjugate of the $j,i$ entry of $A$.

If $A = A^*$, $A$ is hermitian.

For real $A$, the adjoint is known as the transpose and written $A^T$.

If $A = A^T$, then $A$ is symmetric.

The inner product of two vectors $x, y \in \mathbb{C}^m$ is $x^* y = \sum_{i=1}^{m} \bar{x}_i y_i$. The Euclidean length of $x$ is

$$\|x\| = \sqrt{x^* x} = \left( \sum_{i=1}^{m} |x_i|^2 \right)^{1/2}.$$

The inner product is bilinear, i.e., linear in each vector separately:

$$(x_1 + x_2)^* y = x_1^* y + x_2^* y, \qquad x^*(y_1 + y_2) = x^* y_1 + x^* y_2, \qquad (\alpha x)^* (\beta y) = \bar{\alpha} \beta \, x^* y.$$

A pair of vectors $x$ and $y$ are orthogonal if $x^* y = 0$.

Two sets of vectors $X$ and $Y$ are orthogonal if every $x \in X$ is orthogonal to every $y \in Y$.

A set of nonzero vectors $S$ is orthogonal if its elements are pairwise orthogonal.

A set of nonzero vectors $S$ is orthonormal if it is orthogonal and, in addition, every $x \in S$ has $\|x\| = 1$.

The vectors in an orthogonal set $S$ are linearly independent.

Sketch of the proof:
• Assume they were not independent, and form a nonzero vector as a linear combination of the members of $S$.
• Observe that its length must be larger than 0.
• Use the bilinearity of inner products and the orthogonality of $S$ to contradict the assumption.

⇒ If an orthogonal set $S \subseteq \mathbb{C}^m$ contains $m$ vectors, then it is a basis for $\mathbb{C}^m$.

Inner products can be used to decompose arbitrary vectors into orthogonal components.

Assume $q_1, q_2, \dots, q_n$ is an orthonormal set and $v$ is an arbitrary vector.

Utilizing the scalars $q_j^* v$ as coordinates in an expansion, we find that the remainder

$$r = v - (q_1^* v) q_1 - (q_2^* v) q_2 - \cdots - (q_n^* v) q_n$$

is orthogonal to $q_1, q_2, \dots, q_n$.

Thus we see that $v$ can be decomposed into $n + 1$ orthogonal components:

$$v = r + \sum_{j=1}^{n} (q_j^* v) q_j = r + \sum_{j=1}^{n} (q_j q_j^*) v.$$

This expression has two different interpretations:

1. We view $v$ as a sum of coefficients $q_j^* v$ times vectors $q_j$.

2. We view $v$ as a sum of orthogonal projections of $v$ onto the various directions $q_j$. The $j$th projection operation is achieved by the very special rank-one matrix $q_j q_j^*$.
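A small NumPy sketch of this decomposition; the orthonormal vectors here come from a QR factorization of random data, an arbitrary choice:

```python
import numpy as np

rng = np.random.default_rng(3)
Q, _ = np.linalg.qr(rng.standard_normal((5, 2)))   # q_1, q_2: orthonormal columns
v = rng.standard_normal(5)

coeffs = Q.conj().T @ v            # the scalars q_j^* v
projection = Q @ coeffs            # sum of projections (q_j q_j^*) v
r = v - projection                 # remainder, orthogonal to every q_j

print(np.allclose(Q.conj().T @ r, 0))   # r is orthogonal to q_1, q_2
print(np.allclose(projection + r, v))   # v decomposes into n+1 orthogonal parts
```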

If $Q^* = Q^{-1}$, $Q$ is unitary.

$Q^* b$ is the vector of coefficients of the expansion of $b$ in the basis of columns of $Q$.

Multiplication by a unitary matrix or its adjoint preserves geometric structure in the Euclidean sense, because inner products are preserved.

The invariance of inner products means that angles between vectors are preserved, and so are their lengths:

$$(Qx)^*(Qy) = x^* y, \qquad \|Qx\| = \|x\|.$$

In the real case, multiplication by an orthogonal matrix $Q$ corresponds to a rigid rotation (if $\det Q = 1$) or reflection (if $\det Q = -1$) of the vector space.
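A quick NumPy check; the unitary matrix is obtained from a QR factorization, an arbitrary construction:

```python
import numpy as np

rng = np.random.default_rng(4)
Q, _ = np.linalg.qr(rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3)))
x = rng.standard_normal(3)
y = rng.standard_normal(3)

# np.vdot conjugates its first argument, matching (Qx)^*(Qy).
print(np.allclose(np.vdot(Q @ x, Q @ y), np.vdot(x, y)))      # inner products preserved
print(np.allclose(np.linalg.norm(Q @ x), np.linalg.norm(x)))  # lengths preserved
```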

The essential notions of size and distance in a vector space are captured by norms.

In order to conform to a reasonable notion of length, a norm must satisfy, for all vectors $x$ and $y$ and for all scalars $\alpha \in \mathbb{C}$:

(1) $\|x\| \ge 0$, and $\|x\| = 0$ only if $x = 0$,
(2) $\|x + y\| \le \|x\| + \|y\|$,
(3) $\|\alpha x\| = |\alpha| \, \|x\|$.

Important examples are the $p$-norms:

$$\|x\|_1 = \sum_{i=1}^{m} |x_i|, \qquad \|x\|_2 = \left( \sum_{i=1}^{m} |x_i|^2 \right)^{1/2}, \qquad \|x\|_\infty = \max_{1 \le i \le m} |x_i|, \qquad \|x\|_p = \left( \sum_{i=1}^{m} |x_i|^p \right)^{1/p}.$$

The closed unit ball $\{x \in \mathbb{C}^m : \|x\| \le 1\}$ corresponding to each norm can be illustrated for the case $m = 2$.

Introduce the diagonal matrix $W$ whose $i$th diagonal entry is the weight $w_i \ne 0$.

Example: a weighted 2-norm,

$$\|x\|_W = \|Wx\|_2 = \left( \sum_{i=1}^{m} |w_i x_i|^2 \right)^{1/2}.$$
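A minimal sketch, with arbitrarily chosen weights:

```python
import numpy as np

w = np.array([1.0, 2.0, 0.5])      # nonzero weights
W = np.diag(w)
x = np.array([3.0, -1.0, 4.0])

weighted = np.linalg.norm(W @ x)                 # ||Wx||_2
explicit = np.sqrt(np.sum(np.abs(w * x) ** 2))   # the same sum, written out
print(np.allclose(weighted, explicit))           # True
```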

The most important norms in this book are the unweighted 2-norm and its induced matrix form.

An $m \times n$ matrix can be viewed as a vector in an $mn$-dimensional space: each of the $mn$ entries of the matrix is an independent coordinate. ⇒ Any $mn$-dimensional norm can be used for measuring the “size” of such a matrix.

However, certain special matrix norms are more useful than the vector norms.

These are the induced matrix norms, defined in terms of the behavior of a matrix as an operator between its normed domain and range spaces.

Given vector norms $\|\cdot\|_{(n)}$ and $\|\cdot\|_{(m)}$ on the domain and range of $A \in \mathbb{C}^{m \times n}$, respectively, the induced matrix norm $\|A\|_{(m,n)}$ is the smallest number $C$ for which

$$\|Ax\|_{(m)} \le C \, \|x\|_{(n)} \quad \text{for all } x \in \mathbb{C}^n.$$

In other words, it is the maximum factor by which $A$ can stretch a vector $x$:

$$\|A\|_{(m,n)} = \sup_{x \ne 0} \frac{\|Ax\|_{(m)}}{\|x\|_{(n)}}.$$
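A rough NumPy sketch of this definition: sampling random directions gives lower bounds on the stretch factor, which the exact induced 2-norm (the largest singular value) dominates.

```python
import numpy as np

rng = np.random.default_rng(5)
A = rng.standard_normal((4, 3))

# Sample stretch factors ||Ax|| / ||x|| over many random directions.
xs = rng.standard_normal((3, 10000))
stretch = np.linalg.norm(A @ xs, axis=0) / np.linalg.norm(xs, axis=0)

exact = np.linalg.norm(A, 2)       # induced 2-norm = largest singular value
print(stretch.max() <= exact + 1e-12)   # True: no sample exceeds the norm
print(stretch.max(), exact)             # sampled maximum approaches the exact value
```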

For any $m \times n$ matrix $A$, $\|A\|_1$ is equal to the maximum column sum of $A$.

Consider $x$ in the closed unit ball $\{x \in \mathbb{C}^n : \|x\|_1 \le 1\}$. Then

$$\|Ax\|_1 = \Big\| \sum_{j=1}^{n} x_j a_j \Big\|_1 \le \sum_{j=1}^{n} |x_j| \, \|a_j\|_1 \le \max_{1 \le j \le n} \|a_j\|_1.$$

By choosing $x = e_j$, where $j$ maximizes $\|a_j\|_1$, we attain this bound:

$$\|A\|_1 = \max_{1 \le j \le n} \|a_j\|_1.$$

For any $m \times n$ matrix $A$, $\|A\|_\infty$ is equal to the maximum row sum of $A$:

$$\|A\|_\infty = \max_{1 \le i \le m} \|a_i^*\|_1,$$

where $a_i^*$ denotes the $i$th row of $A$.
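A quick NumPy check of both formulas, using a random matrix as an arbitrary example:

```python
import numpy as np

rng = np.random.default_rng(6)
A = rng.standard_normal((4, 3))

max_col_sum = np.abs(A).sum(axis=0).max()   # maximum column sum
max_row_sum = np.abs(A).sum(axis=1).max()   # maximum row sum

print(np.allclose(max_col_sum, np.linalg.norm(A, 1)))       # ||A||_1
print(np.allclose(max_row_sum, np.linalg.norm(A, np.inf)))  # ||A||_inf
```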

Let $p$ and $q$ satisfy $\frac{1}{p} + \frac{1}{q} = 1$, with $1 \le p, q \le \infty$. Then the Hölder inequality states that, for any vectors $x$ and $y$,

$$|x^* y| \le \|x\|_p \, \|y\|_q.$$

The Cauchy–Schwarz inequality is the special case $p = q = 2$:

$$|x^* y| \le \|x\|_2 \, \|y\|_2.$$

Consider $A = a^*$, a matrix with a single row, where $a$ is a column vector. For any $x$, the Cauchy–Schwarz inequality gives

$$\|Ax\|_2 = |a^* x| \le \|a\|_2 \, \|x\|_2.$$

This bound is tight: observe that

$$\|Aa\|_2 = |a^* a| = \|a\|_2 \, \|a\|_2.$$

Therefore, we have $\|A\|_2 = \|a\|_2$.

Consider the rank-one outer product $A = uv^*$, where $u$ is an $m$-vector and $v$ is an $n$-vector. For any $n$-vector $x$, we can bound

$$\|Ax\|_2 = \|u (v^* x)\|_2 = |v^* x| \, \|u\|_2 \le \|u\|_2 \, \|v\|_2 \, \|x\|_2.$$

Therefore, we have $\|A\|_2 \le \|u\|_2 \|v\|_2$.

This inequality is an equality for the case $x = v$, so $\|A\|_2 = \|u\|_2 \|v\|_2$.
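A short NumPy check of the rank-one case, with random vectors as an arbitrary example:

```python
import numpy as np

rng = np.random.default_rng(7)
u = rng.standard_normal(4)
v = rng.standard_normal(3)

A = np.outer(u, v)                 # A = u v^T
lhs = np.linalg.norm(A, 2)         # induced 2-norm
rhs = np.linalg.norm(u) * np.linalg.norm(v)
print(np.allclose(lhs, rhs))       # True: ||uv*||_2 = ||u||_2 ||v||_2
```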

For any $x$, $\|ABx\|_{(l)} \le \|A\|_{(l,m)} \|Bx\|_{(m)} \le \|A\|_{(l,m)} \|B\|_{(m,n)} \|x\|_{(n)}$. Therefore, the induced norm of $AB$ must satisfy

$$\|AB\|_{(l,n)} \le \|A\|_{(l,m)} \, \|B\|_{(m,n)}.$$

The most important matrix norm which is not induced by a vector norm is the Hilbert–Schmidt or Frobenius norm, defined by

$$\|A\|_F = \left( \sum_{i=1}^{m} \sum_{j=1}^{n} |a_{ij}|^2 \right)^{1/2}.$$

Observe that this is the same as the 2-norm of the matrix when viewed as an $mn$-dimensional vector. Alternatively, we can write

$$\|A\|_F = \sqrt{\operatorname{tr}(A^* A)} = \sqrt{\operatorname{tr}(A A^*)}.$$
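A minimal NumPy check of the trace identity, with an arbitrary complex matrix:

```python
import numpy as np

rng = np.random.default_rng(8)
A = rng.standard_normal((3, 4)) + 1j * rng.standard_normal((3, 4))

fro = np.linalg.norm(A, 'fro')
via_trace = np.sqrt(np.trace(A.conj().T @ A).real)   # tr(A*A) is real and nonnegative
print(np.allclose(fro, via_trace))                   # True
```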

Let $C = AB$; then the $i,j$ entry of $C$ is the inner product of the $i$th row of $A$ with the $j$th column of $B$, so by the Cauchy–Schwarz inequality $|c_{ij}| \le \|a_i\|_2 \, \|b_j\|_2$. Summing over all $i, j$ gives

$$\|AB\|_F \le \|A\|_F \, \|B\|_F.$$

The matrix 2-norm and Frobenius norm are invariant under multiplication by unitary matrices: for any $A \in \mathbb{C}^{m \times n}$ and unitary $Q \in \mathbb{C}^{m \times m}$,

$$\|QA\|_2 = \|A\|_2, \qquad \|QA\|_F = \|A\|_F.$$

This fact is still valid if $Q$ is generalized to a rectangular matrix with orthonormal columns. Recall the transformation used in PCA.
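A final NumPy check, with unitary $Q$ obtained from a QR factorization, an arbitrary construction:

```python
import numpy as np

rng = np.random.default_rng(9)
Q, _ = np.linalg.qr(rng.standard_normal((4, 4)))   # real orthogonal, hence unitary
A = rng.standard_normal((4, 3))

print(np.allclose(np.linalg.norm(Q @ A, 2), np.linalg.norm(A, 2)))          # 2-norm
print(np.allclose(np.linalg.norm(Q @ A, 'fro'), np.linalg.norm(A, 'fro')))  # Frobenius
```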
