Chapter 4: Matrix Norms
The analysis of matrix-based algorithms often requires use of matrix norms. These algorithms need a way to quantify the "size" of a matrix or the "distance" between two matrices. For example, suppose an algorithm only works well with full-rank, n×n matrices, and it produces inaccurate results when supplied with a nearly rank-deficient matrix. Obviously, the concept of ε-rank (also known as numerical rank), defined by

    \mathrm{rank}(A, \epsilon) = \min_{\|A - B\| \le \epsilon} \mathrm{rank}(B),    (4-1)

is of interest here. All matrices B that are "within" ε of A are examined when computing the ε-rank of A.

We define a matrix norm in terms of a given vector norm; in our work, we use only the vector p-norm, denoted ‖X‖_p. Let A be an m×n matrix, and define

    \|A\|_p = \sup_{\vec{X} \ne 0} \frac{\|A\vec{X}\|_p}{\|\vec{X}\|_p},    (4-2)

where "sup" stands for supremum, also known as least upper bound. Note that we use the same ‖·‖_p notation for both vector and matrix norms; however, the meaning should be clear from context. Since the matrix norm is defined in terms of the vector norm, we say that the matrix norm is subordinate to the vector norm. Also, we say that the matrix norm is induced by the vector norm.

Now, since ‖X‖_p is a scalar, we have

    \|A\|_p = \sup_{\vec{X} \ne 0} \frac{\|A\vec{X}\|_p}{\|\vec{X}\|_p} = \sup_{\vec{X} \ne 0} \left\| A \frac{\vec{X}}{\|\vec{X}\|_p} \right\|_p.    (4-3)

In (4-3), note that X/‖X‖_p has unit length; for this reason, we can express the norm of A in terms of a supremum over all vectors of unit length,

    \|A\|_p = \sup_{\|\vec{X}\|_p = 1} \|A\vec{X}\|_p.    (4-4)

That is, ‖A‖_p is the supremum of ‖AX‖_p on the unit ball ‖X‖_p = 1.

Careful consideration of (4-2) reveals that

    \|A\vec{X}\|_p \le \|A\|_p \|\vec{X}\|_p    (4-5)

for all X. However, ‖AX‖_p is a continuous function of X, and the unit ball ‖X‖_p = 1 is closed and bounded (in real analysis texts, such a set is said to be compact). Now, on a closed and bounded set, a continuous function always achieves its maximum and minimum values. Hence, in the definition of the matrix norm, we can replace the "sup" with "max" and write

    \|A\|_p = \max_{\vec{X} \ne 0} \frac{\|A\vec{X}\|_p}{\|\vec{X}\|_p} = \max_{\|\vec{X}\|_p = 1} \|A\vec{X}\|_p.    (4-6)

When computing the norm of A, the definition is used as a starting point. The process has two steps.
1) Find a "candidate" for the norm, call it K for now, that satisfies ‖AX‖_p ≤ K‖X‖_p for all X.
2) Find at least one nonzero X₀ for which ‖AX₀‖_p = K‖X₀‖_p.
Then you have your norm: set ‖A‖_p = K.

MatLab's Matrix Norm Functions
From an application standpoint, the 1-norm, 2-norm and the ∞-norm are among the most important; MatLab computes all three. In MatLab, the 1-norm, 2-norm and ∞-norm are invoked by the statements norm(A,1), norm(A,2), and norm(A,inf), respectively. The 2-norm is the default in MatLab: the statement norm(A) is interpreted as norm(A,2). Since the 2-norm is used in the majority of applications, we adopt it as our default as well. In what follows, an "un-designated" norm ‖A‖ is to be interpreted as the 2-norm ‖A‖₂.
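As a quick check of these statements, the short MatLab fragment below (a minimal sketch; the matrix A and the test vector are arbitrary examples, not taken from the text) invokes all three norm functions, confirms that norm(A) defaults to the 2-norm, and illustrates the consistency inequality (4-5).

    % Minimal sketch: the three induced matrix norms in MatLab.
    % The matrix A is an arbitrary example, not from the text.
    A = [1 -2 3; -4 5 -6];
    n1   = norm(A,1);      % matrix 1-norm
    n2   = norm(A,2);      % matrix 2-norm (the MatLab default)
    ninf = norm(A,inf);    % matrix infinity-norm
    disp(norm(A) == norm(A,2));                 % prints 1: norm(A) means norm(A,2)
    x = randn(3,1);                             % any nonzero vector
    disp(norm(A*x,2) <= norm(A,2)*norm(x,2));   % inequality (4-5) holds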
The Matrix 1-Norm
Recall that the vector 1-norm is given by

    \|\vec{X}\|_1 = \sum_{i=1}^{n} |x_i|.    (4-7)

Subordinate to the vector 1-norm is the matrix 1-norm

    \|A\|_1 = \max_j \left( \sum_i |a_{ij}| \right).    (4-8)

That is, the matrix 1-norm is the maximum of the column sums. To see this, let the m×n matrix A be represented in the column format

    A = [\vec{A}_1 \; \vec{A}_2 \; \cdots \; \vec{A}_n].    (4-9)

Then we can write

    A\vec{X} = [\vec{A}_1 \; \vec{A}_2 \; \cdots \; \vec{A}_n]\vec{X} = \sum_{k=1}^{n} x_k \vec{A}_k,    (4-10)

where x_k, 1 ≤ k ≤ n, are the components of the arbitrary vector X. The triangle inequality and standard analysis applied to the norm of (4-10) yield

    \|A\vec{X}\|_1 = \Big\| \sum_{k=1}^{n} x_k \vec{A}_k \Big\|_1
                   \le \sum_{k=1}^{n} |x_k| \, \|\vec{A}_k\|_1
                   \le \Big( \max_j \|\vec{A}_j\|_1 \Big) \sum_{k=1}^{n} |x_k|
                   = \Big( \max_j \|\vec{A}_j\|_1 \Big) \|\vec{X}\|_1.    (4-11)

With the development of (4-11), we have completed step #1 for computing the matrix 1-norm. That is, we have found a constant

    K = \max_j \|\vec{A}_j\|_1

such that ‖AX‖₁ ≤ K‖X‖₁ for all X. Step #2 requires us to find at least one vector for which we have equality in (4-11). But this is easy; to maximize ‖AX‖₁, it is natural to select the X₀ that puts "all of the allowable weight" on the component that will "pull out" the maximum column sum. That is, the "optimum" vector is

    \vec{X}_0 = [0 \; \cdots \; 0 \; 1 \; 0 \; \cdots \; 0]^T  (the 1 in the jth position),

where j is the index that satisfies

    \|\vec{A}_j\|_1 = \max_k \|\vec{A}_k\|_1

(the sum of the magnitudes in the jth column is equal to, or larger than, the sum of the magnitudes in any other column). When X₀ is used, we have equality in (4-11). This completes step #2, so (4-8) is the matrix 1-norm.

The Matrix ∞-Norm
Recall that the vector ∞-norm is given by

    \|\vec{X}\|_\infty = \max_k |x_k|,    (4-12)

the magnitude of the vector's largest component. Subordinate to the vector ∞-norm is the matrix ∞-norm

    \|A\|_\infty = \max_i \left( \sum_j |a_{ij}| \right).    (4-13)

That is, the matrix ∞-norm is the maximum of the row sums. To see this, let A be an arbitrary m×n matrix and compute

    \|A\vec{X}\|_\infty = \max_i \Big| \sum_k a_{ik} x_k \Big|
                        \le \max_i \sum_k |a_{ik} x_k|
                        \le \Big( \max_i \sum_k |a_{ik}| \Big) \max_k |x_k|
                        = \Big( \max_i \sum_k |a_{ik}| \Big) \|\vec{X}\|_\infty.    (4-14)

We have completed the first step: we have found a constant K = \max_i \sum_k |a_{ik}| for which

    \|A\vec{X}\|_\infty \le K \|\vec{X}\|_\infty    (4-15)

for all X.

Step #2 requires that we find a nonzero X₀ for which equality holds in (4-14) and (4-15). A close examination of these formulas leads to the conclusion that equality prevails if X₀ is defined to have the components

    x_k = \overline{a_{ik}} / |a_{ik}|  for  a_{ik} \ne 0,  and  x_k = 1  for  a_{ik} = 0,  1 \le k \le n    (4-16)

(the overbar denotes complex conjugate), where i is the index of the maximum row sum. That is, in (4-16), use the index i for which

    \sum_k |a_{ik}| \ge \sum_k |a_{jk}|,  j \ne i.    (4-17)

Hence, (4-13) is the matrix ∞-norm as claimed.
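Both the column-sum formula (4-8) and the row-sum formula (4-13) are easy to verify numerically. The MatLab sketch below (the test matrix and the 1e-12 tolerance are arbitrary choices, not from the text) checks both against norm, and confirms that the unit vector X₀ described above achieves equality in (4-11).

    % Minimal sketch: verify (4-8), (4-13), and the maximizing vector X0.
    % The test matrix and the 1e-12 tolerance are arbitrary choices.
    A = randn(4,3);
    colSums = sum(abs(A),1);                        % column sums of |a_ij|
    rowSums = sum(abs(A),2);                        % row sums of |a_ij|
    disp(abs(norm(A,1)   - max(colSums)) < 1e-12);  % 1-norm = max column sum
    disp(abs(norm(A,inf) - max(rowSums)) < 1e-12);  % inf-norm = max row sum
    [~, j] = max(colSums);                          % index of the max column sum
    X0 = zeros(3,1);  X0(j) = 1;                    % all weight at position j
    disp(abs(norm(A*X0,1) - norm(A,1)) < 1e-12);    % equality in (4-11) achieved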
The Matrix 2-Norm
Recall that the vector 2-norm is given by

    \|\vec{X}\|_2 = \sqrt{ \sum_{i=1}^{n} |x_i|^2 } = \sqrt{ \langle \vec{X}, \vec{X} \rangle }.    (4-18)

Subordinate to the vector 2-norm is the matrix 2-norm

    \|A\|_2 = \sqrt{ \text{largest eigenvalue of } A^*A }.    (4-19)

Due to this connection with eigenvalues, the matrix 2-norm is called the spectral norm. To see (4-19) for an arbitrary m×n matrix A, note that A*A is n×n and Hermitian. By Theorem 4.2.1 (see Appendix 4.1), the eigenvalues of A*A are real-valued. Also, A*A is at least positive semi-definite, since X*(A*A)X = (AX)*(AX) ≥ 0 for all X. Hence, the eigenvalues of A*A are both real-valued and non-negative; denote them as

    \sigma_1^2 \ge \sigma_2^2 \ge \sigma_3^2 \ge \cdots \ge \sigma_n^2 \ge 0.    (4-20)

Note that these eigenvalues are arranged according to size, with σ₁² being the largest. Their non-negative square roots σ_k are known as the singular values of matrix A. Corresponding to these eigenvalues are n orthonormal (hence, independent) eigenvectors U₁, U₂, ..., Uₙ with

    (A^*A)\vec{U}_k = \sigma_k^2 \vec{U}_k, \quad 1 \le k \le n.    (4-21)

The n eigenvectors form the columns of a unitary n×n matrix U that diagonalizes A*A under similarity (matrix U*(A*A)U is diagonal, with the eigenvalues (4-20) on the diagonal).

Since the n eigenvectors U₁, U₂, ..., Uₙ are independent, they can be used as a basis, and the vector X can be expressed as

    \vec{X} = \sum_{k=1}^{n} c_k \vec{U}_k,    (4-22)

where the c_k are X-dependent constants.
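The characterization (4-19) can also be checked numerically before working through the rest of the derivation. In the MatLab sketch below (the test matrix and tolerance are arbitrary choices, not from the text), the value returned by norm(A,2) is compared against the square root of the largest eigenvalue of A*A and against the largest singular value of A; in MatLab, A' is the conjugate transpose, i.e., A*.

    % Minimal sketch: check (4-19) numerically; the test matrix is arbitrary.
    A = randn(4,3);
    lambda = eig(A'*A);        % eigenvalues of A*A: real and non-negative
    disp(abs(norm(A,2) - sqrt(max(lambda))) < 1e-12);  % 2-norm = sqrt(max eig)
    disp(abs(norm(A,2) - max(svd(A)))       < 1e-12);  % = largest singular value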