Lecture 5 Ch. 5, Norms for Vectors and Matrices

KTH ROYAL INSTITUTE OF TECHNOLOGY Norms for vectors and matrices — Why? Lecture 5 Problem: Measure size of vector or matrix. What is “small” and what is “large”? Ch. 5, Norms for vectors and matrices Problem: Measure distance between vectors or matrices. When are they “close together” or “far apart”? Emil Björnson/Magnus Jansson/Mats Bengtsson April 27, 2016 Answers are given by norms. Also: Tool to analyze convergence and stability of algorithms. 2 / 21 Vector norm — axiomatic definition Inner product — axiomatic definition Definition: LetV be a vector space over afieldF(R orC). Definition: LetV be a vector space over afieldF(R orC). A function :V R is a vector norm if for all A function , :V V F is an inner product if for all || · || → �· ·� × → x,y V x,y,z V, ∈ ∈ (1) x 0 nonnegative (1) x,x 0 nonnegative || ||≥ � �≥ (1a) x =0iffx= 0 positive (1a) x,x =0iffx=0 positive || || � � (2) cx = c x for allc F homogeneous (2) x+y,z = x,z + y,z additive || || | | || || ∈ � � � � � � (3) x+y x + y triangle inequality (3) cx,y =c x,y for allc F homogeneous || ||≤ || || || || � � � � ∈ (4) x,y = y,x Hermitian property A function not satisfying (1a) is called a vector � � � � seminorm. Interpretation: “Angle” (distance) between vectors. Interpretation: Size/length of vector. 3 / 21 4 / 21 Connections between norm and inner products Examples / Corollary: If , is an inner product, then x =( x,x ) 1 2 � The Euclidean norm (l ) onC n: �· ·� || || � � 2 is a vector norm. 2 2 2 1/2 x 2 = ( x1 + x 2 + + x n ) . Called: Vector norm derived from an inner product. || || | | | | ··· | | Satisfies parallelogram identity (Necessary and sufficient � The sum norm (l1), also called one-norm or Manhattan condition): norm: x = x + x + + x . 1 || || 1 | 1| | 2| ··· | n| ( + 2 + 2) = 2 + 2 x y x y x y � The max norm (l ): 2 || || || − || || || || || ∞ Theorem (Cauchy-Schwarz inequality): x = max x1 , x 2 ,..., x n || || ∞ {| | | | | |} x,y 2 x,x y,y The sum and max norms cannot be derived from an inner |� �| ≤ � �� We have equality iffx= cy for somec F (i.e., linearly product! ∈ dependent) 5 / 21 6 / 21 Unit balls for different norms Examples cont’d x 1 x 1 � Thel -norm onC n is (p 1): || || ... ≤ || || ... ≤ p ≥ The shape of the unit x2 x2 x = ( x p + x p + + x p)1/p || || p | 1| | 2| ··· | n| ball characterizes the � Norms may also be constructed from others, e.g.,: norm. x1 x1 x = max x , x || || {|| || p1 || ||p2 } x 1 x 1 ... ... or let nonsingularT M n and be a given, then Fill in which norm || || ≤ || || ≤ ∈ || · || x2 x2 x = Tx . corresponds to which || || T || || unit ball! (same notation sometimes used for x W =x ∗Wx) x1 x1 || || � Norms on infinite-dimensional vector spaces (e.g., all continuous functions on an interval[a,b]): Properties: Convex and compact (forfinite dimensions), “similarly” defined (sums become integrals) includes the origin. 7 / 21 8 / 21 Convergence Convergence: Finite dimension Assume: Vector spaceV overR orC. Corollary: For any vector norms and on a || · ||α || · ||β Definition: The sequence x (k) of vectors inV converges to finite-dimensionalV , there exists 0 C <C < { } ≤ m M ∞ x V with respect to iff such that ∈ || · || x(k) x 0 ask . C x x C x x V || − ||→ →∞ m|| || α ≤ || ||β ≤ M || || α ∀ ∈ Conclusion: Convergence in one norm convergence in all Infinite dimension: ⇒ norms. � Sequence can converge in one norm, but not another. Note: Result also holds for pre-norms, without the � Important to state choice of norm. triangle inequality. Definition: Two norms are equivalent if convergence in one of the norms always implies convergence in the other. Conclusion: All norms are equivalent in thefinite dimensional case. 9 / 21 10 / 21 Convergence: Cauchy sequence Dual norms Definition: The dual norm of is Definition: A sequence x (k) inV is a Cauchy sequence �·� { } D y ∗x with respect to if for every�> 0 there is aN � >0 y = max Rey ∗x= max y ∗x = max | | || · || � � x: x =1 x: x =1 | | x=0 x such that � � � � � � � ( ) ( ) x k1 x k2 � Norm Dual norm || − ||≤ Examples: for allk ,k N . 2 2 1 2 ≥ � �·� �·� 1 �·� �·�∞ Theorem: A sequence x (k) in afinite dimensionalV 1 { } �·�∞ �·� converges to a vector inViff it is a Cauchy sequence. � Dual of dual norm is the original norm. � Euclidean norm is its own dual. � Generalized Cauchy-Schwarz: y x x y D | ∗ |≤� �� 11 / 21 12 / 21 Vector norms applied to matrices Matrix norm — axiomatic definition M is a vector space (of dimensionn 2) Definition: :M R is a matrix norm if for all n ||| · ||| n → A,B M , Conclusion: We can apply vector norms to matrices. ∈ n (1) A 0 nonnegative Examples: Thel 1 norm: A 1 = , aij . ||| |||≥ || || i j | | (1a) A =0iffA=0 positive Thel 2 norm (Euclidean/Frobenius norm): ||| ||| 1/2 � A = a 2 . (2) cA = c A for allc C homogeneous || ||2 i,j | ij | ||| ||| | | ||| ||| ∈ Thel norm: A = maxi,j aij . (3) A+B A + B triangle inequality ∞ � � || ||∞� | | ||| |||≤ ||| ||| ||| ||| (4) AB A B submultiplicative ||| |||≤ ||| ||| ||| ||| Observation: Matrices have certain properties (e.g., multiplication). Observations: � All vector norms satisfy (1)-(3), some May be useful to define particular matrix norms. may satisfy (4). � Generalized matrix norm if not satisfying (4). 13 / 21 14 / 21 Which vector norms are matrix norms? Induced matrix norms A and A are matrix norms. Definition: Let be a vector norm onC n. The matrix 1 2 || · || || || || || norm A is not a matrix norm (but a generalized matrix A = max Ax || ||∞ ||| ||| x =1 || || norm). || || is induced by . However, A =n A is a matrix norm. || · || ||| ||| || || ∞ Properties of induced norms : ||| · ||| � I = 1. ||| ||| � The only matrix normN(A) withN(A) A for ≤ ||| ||| allA M ∈ n isN( ) = . · ||| · ||| Last property called minimal matrix norm. 15 / 21 16 / 21 Examples Application: Computing Spectral radius The maximum column sum (induced byl ): Recall: Spectral radius:ρ(A) = max λ :λ σ(A) . 1 {| | ∈ } Not a matrix norm, but very related. A 1 = max aij ||| ||| j | | Theorem: For any matrix norm andA M n, �i ||| · ||| ∈ ρ(A) A . The spectral norm (induced byl 2): ≤ ||| ||| Lemma: For anyA M n and�> 0, there is such that A 2 = max √λ:λ σ(A ∗A) ∈ ||| · ||| ||| ||| { ∈ } ρ(A) A ρ(A) +� The maximum row sum (induced byl ): ≤ ||| |||≤ ∞ Corollary: For any matrix norm andA M , ||| · ||| ∈ n A = max aij ρ(A) = lim Ak 1/k ||| |||∞ i | | k ||| ||| �j →∞ 17 / 21 18 / 21 Application: Convergence ofA k Application: Power series k Lemma: If there is a matrix norm with A < 1 then Theorem: k∞=0 ak A converges if there is a matrix norm k ||| ||| k limk A = 0. such that ∞= ak A converges. →∞ � k 0 | | ||| ||| k Theorem: limk A =0iffρ(A)< 1. Corollary: If �A < 1 for some matrix norm, thenI A is →∞ ||| ||| − Matrix extension of lim xk =0iff x < 1. invertible and k ∞ →∞ | | 1 k (I A) − = A − �k=0 1 k Matrix extension of(1 x) = ∞ x for x < 1. − − k=0 | | 1 1 Useful to compute “error” between�A − and(A+E) − . 19 / 21 20 / 21 Unitarily invariant and condition number Definition: A matrix norm is unitarily invariant if UAV = A for allA M and all unitary matrices ||| ||| ||| ||| ∈ n U,V M . ∈ n Examples: Frobenius norm and spectral norm . || · ||2 ||| · |||2 Definition: Condition number for matrix inversion with respect to the matrix norm of nonsingularA M is ||| · ||| ∈ n 1 κ(A) = A− A ||| ||| ||| ||| Frequently used in perturbation analysis in numerical linear algebra. Observation: κ(A) 1 (from submultiplicative property). ≥ Observation: For unitarily invariant norms:κ(UAV)=κ(A). 21 / 21.

Lecture 5 Ch. 5, Norms for Vectors and Matrices

C H a P T E R § Methods Related to the Norm Al Equations

The Column and Row Hilbert Operator Spaces

Matrices and Linear Algebra

Matrix Norms 30.4

5 Norms on Linear Spaces

A Degeneracy Framework for Scalable Graph Autoencoders

Chapter 6 the Singular Value Decomposition Ax=B Version of 11 April 2019

Domain Generalization for Object Recognition with Multi-Task Autoencoders

Norms of Vectors and Matrices and Eigenvalues and Eigenvectors - (7.1)(7.2)

Deep Subspace Clustering Networks

Eigenvalues and Eigenvectors

7.4 Matrix Norms and Condition Numbers This Section Considers the Accuracy of Computed Solutions to Linear Systems