EE531 - System Identification Jitkomut Songsiri 3. Reviews on Linear algebra

matrices and vectors • linear equations • range and nullspace of matrices • norm and inner product spaces • factorizations • function of vectors, gradient and Hessian • function of matrices •

3-1 Vector notation n-vector x: x1 x x = 2  .  x   n   also written as x =(x , x ,...,x ) • 1 2 n set of n-vectors is denoted Rn (Euclidean space) • x : ith element or component or entry of x • i x is also called a column vector • y = y y y is called a row vector • 1 2 ··· n   unless stated otherwise, a vector typically means a column vector

Reviews on Linear algebra 3-2 Special vectors zero vectors: x = (0, 0,..., 0) all-ones vectors: x = (1, 1, , 1) (we will denote it by 1) ··· standard unit vectors: ek has only 1 at the kth entry and zero otherwise

1 0 0 e1 = 0 , e2 = 1 , e3 = 0 0 0 1       (standard unit vectors in R3) unit vectors: any vector u whose norm (magnitude) is 1, i.e.,

u , u2 + u2 + + u2 = 1 k k 1 2 ··· n q example: u = (1/√2, 2/√6, 1/√2) −

Reviews on Linear algebra 3-3 Inner products definition: the inner product of two n-vectors x, y is

x y + x y + + x y 1 1 2 2 ··· n n also known as the dot product of vectors x, y notation: xT y properties ✎

(αx)T y = α(xT y) for scalar α • (x + y)T z = xT z + yT z • xT y = yT x •

Reviews on Linear algebra 3-4 Euclidean norm

x = x2 + x2 + + x2 = √xT x k k 1 2 ··· n properties q

also written x to distinguish from other norms • k k2 αx = α x for scalar α • k k | |k k x + y x + y (triangle inequality) • k k ≤ k k k k x 0 and x = 0 only if x = 0 • k k≥ k k interpretation

x measures the magnitude or length of x • k k x y measures the distance between x and y • k − k

Reviews on Linear algebra 3-5 Matrix notation an m n matrix A is defined as ×

a11 a12 ... a1n a21 a22 ... a2n A =  . . .  , or A =[aij]m n ...... × a a ... a   m1 m2 mn  

a are the elements, or coefficients, or entries of A • ij m n set of m n-matrices is denoted R × • × A has m rows and n columns (m, n are the dimensions) • the (i, j) entry of A is also commonly denoted by A • ij A is called a square matrix if m = n •

Reviews on Linear algebra 3-6 Special matrices : A = 0 0 0 0 0 0 ··· 0 A = . . ···... 0 0 0 0  ···    aij = 0, for i = 1,...,m,j = 1,...,n : A = I 1 0 0 0 1 ··· 0 A = . . ···... 0 0 0 1  ···  a square matrix with a = 1,a = 0 for i = j  ii ij 6

Reviews on Linear algebra 3-7 : a square matrix with a = 0 for i = j ij 6

a1 0 0 0 a ··· 0 A = 2  . . ···... .   0 0 a   ··· n   : a square matrix with zero entries in a triangular part

uppertriangular lowertriangular

a11 a12 a1n a11 0 0 0 a ··· a a a ··· 0 A = 22 2n A = 21 22  . . ···...   . . ···...   0 0 a  a a a   ··· nn  n1 n2 ··· nn     a = 0 for i j a = 0 for i j ij ≥ ij ≤

Reviews on Linear algebra 3-8 notation example: 2 2-block matrix A × B C A = D E   for example, if B,C,D,E are defined as

2 1 0 1 7 B = , C = ,D = 0 1 , E = 4 1 1 3 8 1 9 1 − −         then A is the matrix 21 0 1 7 A = 38 1 9 1 0 1 4 1 1 − − note: dimensions of the blocks must be compatible 

Reviews on Linear algebra 3-9 Column and Row partitions write an m n-matrix A in terms of its columns or its rows × T b1 T b2 A = a1 a2 an =   ··· .   bT   m   a for j = 1, 2,...,n are the columns of A • j bT for i = 1, 2,...,m are the rows of A • i 1 2 1 example: A = 4 9 0   1 2 1 a = , a = , a = , bT = 1 2 1 , bT = 4 9 0 1 4 2 9 3 0 1 2          

Reviews on Linear algebra 3-10 Matrix-vector product product of m n-matrix A with n-vector x ×

a11x1 + a12x2 + ... + a1nxn a x + a x + ... + a x Ax = 21 1 22 2 2n n  .  a x + a x + ... + a x   m1 1 m2 2 mn n   dimensions must be compatible: # columns in A = # elements in x • if A is partitioned as A = a a a , then 1 2 ··· n   Ax = a x + a x + + a x 1 1 2 2 ··· n n Ax is a linear combination of the column vectors of A • the coefficients are the entries of x •

Reviews on Linear algebra 3-11 Product with standard unit vectors post-multiply with a column vector

0 a11 a12 ... a1n 0 a1k . a21 a22 ... a2n . a2k Aek =   =   = the kth column of A ...... 1 .   a a ... a  . a   m1 m2 mn    mk   0       pre-multiply with a row vector

a11 a12 ... a1n a a ... a eT A = 0 0 1 0  21 22 2n  k ··· ··· ......   am1 am2 ... amn   = a a a = the kth row of A  k1 k2 ··· kn  

Reviews on Linear algebra 3-12

Definition: trace of a square matrix A is the sum of the diagonal entries in A

tr(A)= a + a + + a 11 22 ··· nn example: 2 1 4 A = 0 1 5 3− 4 6 trace of A is 2 1+6=7   − properties ✎

tr(AT )= tr(A) • tr(αA + B)= α tr(A)+ tr(B) • tr(AB)= tr(BA) •

Reviews on Linear algebra 3-13 Eigenvalues

n n λ C is called an eigenvalue of A C × if ∈ ∈ det(λI A) = 0 − equivalent to:

there exists nonzero x Cn s.t. (λI A)x = 0, i.e., • ∈ − Ax = λx

any such x is called an eigenvector of A (associated with eigenvalue λ)

there exists nonzero w Cn such that • ∈ wT A = λwT

any such w is called a left eigenvector of A

Reviews on Linear algebra 3-14 Computing eigenvalues

(λ) = det(λI A) is called the characteristic polynomial of A • X − (λ) = 0 is called the characteristic equation of A • X eigenvalues of A are the root of characteristic polynomial •

Reviews on Linear algebra 3-15 Properties

if A is n n then (λ) is a polynomial of order n • × X if A is n n then there are n eigenvalues of A • × even when A is real, eigenvalues and eigenvectors can be complex, e.g., • 2 0 1 2 1 − A = − , A = 6 2 0 1 2 − −    19 5 4 −   if A and λ are real, we can choose the associated eigenvector to be real • if A is real then eigenvalues must occur in complex conjugate pairs • if x is an eigenvector of A, so is αx for any α C, α = 0 • ∈ 6 an eigenvector of A associated with λ lies in (λI A) • N −

Reviews on Linear algebra 3-16 Important facts denote λ(A) an eigenvalue of A

λ(αA)= αλ(A) for any α C • ∈ tr(A) is the sum of eigenvalues of A • det(A) is the product of eigenvalues of A • A and AT share the same eigenvalues ✎ • λ(AT )= λ(A) ✎ • λ(AT A) 0 • ≥ λ(Am)=(λ(A))m for any integer m • A is invertible if and only if λ = 0 is not an eigenvalue of A ✎ •

Reviews on Linear algebra 3-17 Eigenvalue decomposition if A is diagonalizable then A admits the decomposition

1 A = TDT −

D is diagonal containing the eigenvalues of A • columns of T are the corresponding eigenvectors of A • note that such decomposition is not unique (up to scaling in T ) • recall: A is diagonalizable iff all eigenvectors of A are independent

Reviews on Linear algebra 3-18 Inverse of matrices

Definition: a square matrix A is called invertible or nonsingular if there exists B s.t.

AB = BA = I

B is called an inverse of A • it is also true that B is invertible and A is an inverse of B • if no such B can be found A is said to be singular • assume A is invertible

an inverse of A is unique • 1 the inverse of A is denoted by A− •

Reviews on Linear algebra 3-19 assume A, B are invertible

Facts ✎

1 1 1 (αA)− = α− A− for nonzero α • T T 1 1 T A is also invertible and (A )− =(A− ) • 1 1 1 AB is invertible and (AB)− = B− A− • 1 1 1 (A + B)− = A− + B− • 6

Reviews on Linear algebra 3-20 Inverse of 2 2 matrices × the matrix a b A = c d   is invertible if and only if ad bc = 0 − 6 and its inverse is given by

1 1 d b A− = − ad bc c a − −  example: 2 1 1 1 3 1 A = , A− = − 1 3 7 1 2 −   

Reviews on Linear algebra 3-21 Invertible matrices

✌ Theorem: for a square matrix A, the following statements are equivalent

1. A is invertible

2. Ax = 0 has only the trivial solution (x = 0)

3. the reduced echelon form of A is I

4. A is invertible if and only if det(A) = 0 6

Reviews on Linear algebra 3-22 Inverse of special matrices diagonal matrix a1 0 0 0 a ··· 0 A = 2  . . ···... .   0 0 a   ··· n   a diagonal matrix is invertible iff the diagonal entries are all nonzero

a = 0, i = 1, 2,...,n ii 6 the inverse of A is given by

1/a1 0 0 0 1/a ··· 0 A 1 = 2 −  . . ···... .   0 0 1/a   ··· n   1 the diagonal entries in A− are the inverse of the diagonal entries in A

Reviews on Linear algebra 3-23 triangular matrix:

uppertriangular lowertriangular

a11 a12 a1n a11 0 0 0 a ··· a a a ··· 0 A = 22 2n A = 21 22  . . ···...   . . ···...   0 0 a  a a a   ··· nn  n1 n2 ··· nn     a = 0 for i j a = 0 for i j ij ≥ ij ≤ a triangular matrix is invertible iff the diagonal entries are all nonzero

a = 0, i = 1, 2,...,n ii 6 ∀

product of lower (upper) triangular matrices is lower (upper) triangular • the inverse of a lower (upper) triangular matrix is lower (upper) triangular •

Reviews on Linear algebra 3-24 : A = AT ✎

for any square matrix A, AAT and AT A are always symmetric •

1 if A is symmetric and invertible, then A− is symmetric • if A is invertible, then AAT and AT A are also invertible •

Reviews on Linear algebra 3-25 Symmetric matrix

n n T A R × is called symmetric if A = A ∈

Facts: if A is symmetric

all eigenvalues of A are real • all eigenvectors of A are orthogonal • A admits a decomposition • A = UDU T where U T U = UU T = I (U is unitary) and D is diagonal

(of course, the diagonals of D are eigenvalues of A)

Reviews on Linear algebra 3-26

n n a matrix U R × is called unitary if ∈ U T U = UU T = I

1 1 cos θ sin θ example: 1 − , − √2 1 1 sin θ cos θ     Facts:

a real unitary matrix is also called orthogonal • 1 T a unitary matrix is always invertible and U − = U • columns vectors of U are mutually orthogonal • norm is preserved under a unitary transformation: • y = Ux = y = x ⇒ k k k k

Reviews on Linear algebra 3-27

n n A R × is an idempotent (or projection) matrix if ∈ A2 = A examples: identity matrix

Facts: Let A be an idempotent matrix

eigenvalues of A are all equal to 0 or 1 • I A is idempotent • − if A = I, then A is singular • 6

Reviews on Linear algebra 3-28

a square matrix P is a projection matrix if and only if P 2 = P

P is a linear transformation from Rn to a subspace of Rn, denoted as S • columns of P are the projections of standard basis vectors • S is the range of P • from P 2 = P , it means if P is applied twice on a vector in S, it gives the • same vector examples: • 1 0 1/2 1/2 P = ,P = 0 0 1/2 1/2    

Reviews on Linear algebra 3-29 Orthogonal projection matrix a projection matrix is called orthogonal if and only if P = P T

P is bounded, i.e., P x x • k k ≤ k k P x 2 = xT P T P x = xT P 2x = xT P x P x x k k2 ≤ k kk k (by Cauchy-Schwarz inequality – more on this later) if P is an orthogonal projection onto a line spanned by a unit vector u, • P = uuT

(we see that (P ) = 1 as the dimension of a line is 1)

T 1 T another example: P = A(A A)− A for any matrix A •

Reviews on Linear algebra 3-30

n n A R × is nilpotent if ∈ Ak = 0, for some positive integer k

Example: any triangular matrices with 0’s along the main diagonal

0 1 0 0 0 1 0 0 1 ··· 0 , (shift matrix) 0 0 . . . ···... .   0 0 0 1  ···    also related to deadbeat control for linear discrete-time systems Facts:

the characteristic equation for A is λn = 0 • all eigenvalues are 0 •

Reviews on Linear algebra 3-31 Positive definite matrix a symmetric matrix A is positive semidefinite, written as A 0 if  xT Ax 0, x Rn ≥ ∀ ∈ and positive definite, written as A 0 if ≻ xT Ax > 0, for all nonzero x Rn ∈

Facts: A 0 if and only if 

all eigenvalues of A are non-negative • all principle minors of A are non-negative •

Reviews on Linear algebra 3-32 1 1 example: A = − 0 because 1 2  −  1 1 x xT Ax = x x 1 1 2 1− 2 x −  2 = x 2 + 2x2 2x x 1 2 − 1 2 =(x x )2 + x2 0 1 − 2 2 ≥ or we can check from

eigenvalues of A are 0.38 and 2.61 (real and positive) • 1 1 the principle minors are 1 and − = 1 (all positive) • 1 2 −

note: A 0 does not mean all entries of A are positive! 

Reviews on Linear algebra 3-33 Properties: if A 0 then 

all the diagonal terms of A are nonnegative • all the leading blocks of A are positive semidefinite • BABT 0 for any B •  if A 0 and B 0, then so is A + B •   A has a square root, denoted as a symmetric A1/2 such that • A1/2A1/2 = A

Reviews on Linear algebra 3-34 Schur complement a consider a symmetric matrix X partitioned as

AB X = CD   Schur complement of D in X is defined as • 1 S = A BD− C, if det D = 0 − 6 we can show that det X = det D det S

Schur complement of A in X is defined as • 1 S = D CA− B, if det A = 0 − 6 we can show that det X = det A det S

Reviews on Linear algebra 3-35 Matrix inversion lemma the inverse of X can be expressed with the terms involving Schur complement an LDU decomposition is

1 1 AB IBD− A BD− C 0 I 0 = − 1 CD 0 I 0 D D− CI      

1 this proves that the inverse of the whole box is det(A BD− C)det D − 1 if D and S = A BD− C are invertible, the inverse of the whole block is −

1 1 1 AB − I 0 S− 0 I BD− = 1 1 − CD D− CI 0 D− 0 I   −    1 1 1 S− S− BD− = 1 1 1 − 1 1 1 D− CS− D− + D− CS− BD− − 

Reviews on Linear algebra 3-36 Schur complement of positive semidefinite matrix

AB 1 T T 1 X = ,S = A BD− B ,S = D B A− B, BT D 1 − 2 −   Facts:

X 0 if and only if D 0 and S 0 • ≻ ≻ 2 ≻ if D 0 then X 0 if and only if S 0 • ≻  2  det X = det D det S = det A det S • 1 2 analogous results for S2

X 0 if and only if A 0 and S 0 • ≻ ≻ 1 ≻ if A 0 then X 0 if and only if S 0 • ≻  1 

Reviews on Linear algebra 3-37 Linear equations a general linear system of m equations with n variables is described by

a x + a x + + a x = b 11 1 12 2 ··· 1n n 1 a x + a x + + a x = b 21 1 22 2 ··· 2n n 2 . = . a x + a x + + a x = b m1 1 m2 2 ··· mn n m

where aij,bj are constants and x1, x2,...,xn are unknowns

equations are linear in x , x ,...,x • 1 2 n existence and uniqueness of a solution depend on a and b • ij j

Reviews on Linear algebra 3-38 Linear equation in matrix form the linear system of m equations in n variables

a x + a x + + a x = b 11 1 12 2 ··· 1n n 1 a x + a x + + a x = b 21 1 22 2 ··· 2n n 2 . = . a x + a x + + a x = b m1 1 m2 2 ··· mn n m in matrix form: Ax = b where

a11 a12 ... a1n x1 b1 a a ... a x b A = 21 22 2n , x = 2 , b = 2  ......   .   .  a a ... a  x  b   m1 m2 mn  n  m      

Reviews on Linear algebra 3-39 Three types of linear equations

square if m = n (A is square) • a a x b 11 12 1 = 1 a a x b  21 22 2  2

underdetermined if m < n (A is fat) •

x1 a11 a12 a13 b1 x2 = a21 a22 a23   b2   x3     overdetermined if m > n (A is skinny) •

a11 a12 b1 x1 a21 a22 = b2   x2   a31 a32   b3    

Reviews on Linear algebra 3-40 Existence and uniqueness of solutions existence:

no solution • a solution exists • uniqueness: – the solution is unique – there are infinitely many solutions every system of linear equations has zero, one, or infinitely many solutions

there are no other possibilities

Reviews on Linear algebra 3-41 Nullspace the nullspace of an m n matrix is defined as ×

(A)= x Rn Ax = 0 N { ∈ | }

the set of all vectors that are mapped to zero by f(x)= Ax • the set of all vectors that are orthogonal to the rows of A • if Ax = b then A(x + z)= b for all z (A) • ∈ N also known as of A • (A) is a subspace of Rn ✎ • N

Reviews on Linear algebra 3-42 Zero nullspace matrix

A has a zero nullspace if (A)= 0 • N { } if A has a zero nullspace and Ax = b is solvable, the solution is unique • columns of A are independent •

n n ✌ equivalent conditions: A R × ∈

A has a zero nullspace • A is invertible or nonsingular • columns of A are a basis for Rn •

Reviews on Linear algebra 3-43 Range space the range of an m n matrix A is defined as × (A)= y Rm y = Ax for some x Rn R { ∈ | ∈ } the set of all m-vectors that can be expressed as Ax • the set of all linear combinations of the columns of A = a a • 1 ··· n   (A)= y y = x a + x a + + x a , x Rn R { | 1 1 2 2 ··· n n ∈ }

the set of all vectors b for which Ax = b is solvable • also known as the column space of A • (A) is a subspace of Rm ✎ • R

Reviews on Linear algebra 3-44 Full range matrices

A has a full range if (A)= Rm R

✌ equivalent conditions:

A has a full range • columns of A span Rm • Ax = b is solvable for every b • (AT )= 0 • N { }

Reviews on Linear algebra 3-45 Rank and Nullity

m n rank of a matrix A R × is defined as ∈ rank(A) = dim (A) R m n nullity of a matrix A R × is ∈ nullity(A) = dim (A) N

Facts ✌

rank(A) is maximum number of independent columns (or rows) of A • rank(A) min(m, n) ≤

rank(A)= rank(AT ) •

Reviews on Linear algebra 3-46 Full rank matrices

m n for A R × we always have rank(A) min(m, n) ∈ ≤ we say A is full rank if rank(A) = min(m, n)

for square matrices, full rank means nonsingular (invertible) • for skinny matrices (m n), full rank means columns are independent • ≥ for fat matrices (m n), full rank means rows are independent • ≤

Reviews on Linear algebra 3-47 Theorems

m n Rank-Nullity Theorem: for any A R × , • ∈ rank(A)+dim (A)= n N

the system Ax = b has a solution if and only if b (A) • ∈R the system Ax = b has a unique solution if and only if • b (A), and (A)= 0 ∈R N { }

Reviews on Linear algebra 3-48 Vector space a vector space or linear space (over R) consists of

a set • V a vector sum + : • V×V→V a scalar multiplication : R • ×V→V a distinguished element 0 • ∈V which satisfy a list of properties

Reviews on Linear algebra 3-49 is called a vector space over R, denoted by ( , R) V V if elements, called vectors of satisfy the following main operations: V 1. vector addition: x, y x + y ∈ V ⇒ ∈V 2. scalar multiplication:

for any α R, x αx ∈ ∈ V ⇒ ∈V the definition 2 implies that a vector space contains the zero vector • 0 ∈V

the two conditions can be combined into one operation: • x, y , α R αx + αy ∈V ∈ ⇒ ∈V

Reviews on Linear algebra 3-50 Inner product space a vector space with an additional structure called inner product an inner product space is a vector space over R with a map V , : R h· ·i V×V→ for all x, y, z V and all scalars a R, it satisfies ∈ ∈ conjugate symmetry: x, y = y, x • h i h i linearity in the first argument: • ax, y = a x, y , x + y, z = x, z + y, z h i h i h i h i h i

positive definiteness • x, x 0, and x, x = 0 x = 0 h i≥ h i ⇐⇒

Reviews on Linear algebra 3-51 Examples of inner product spaces

Rn • x, y = yT x = x y + x y + + x y h i 1 1 2 2 ··· n n m n R × • X,Y = tr(Y T X) h i

2(a,b): space of real functions defined on (a,b) for which its second-power •Lof the absolute value is Lebesgue integrable, i.e.,

b 2 f 2(a,b) = f(t) dt < ∈L ⇒ s | | ∞ Za the inner product of this space is

b f,g = f(t)g(t)dt h i Za

Reviews on Linear algebra 3-52 Orthogonality let ( , R) be an inner product space V x and y are orthogonal: • x y x, y = 0 ⊥ ⇐⇒ h i

orthogonal complement in of S , denoted by S⊥, is defined by • V ⊂V

S⊥ = x x, s = 0, s S { ∈V|h i ∀ ∈ }

admits the orthogonal decomposition: •V

= ⊥ V M⊕M where is a subspace of M V

Reviews on Linear algebra 3-53 Orthonormal basis

φ , n 0 is an orthonormal (ON) set if { n ≥ }⊂V

1, i = j φi,φj = h i 0, i = j ( 6 and is called an orthonormal basis for if V

1. φ , n 0 is an ON set { n ≥ } 2. span φ , n 0 = { n ≥ } V we can construct an orthonormal basis from the Gram-Schmidt orthogonalization

Reviews on Linear algebra 3-54 Orthogonal expansion let φ n be an orthonormal basis for a vector of dimension n { i}i=1 V for any x , we have the orthogonal expansion: ∈V n x = x, φ φ h ii i i=1 X meaning: we can project x into orthogonal subspaces spanned by each φi the norm of x is given by n x 2 = x, φ 2 k k |h ii| i=1 X can be easily calculated by the sum square of projection coefficients

Reviews on Linear algebra 3-55 Adjoint of a Linear Transformation let A : be a linear transformation V→W the adjoint of A, denoted by A∗ is defined by

Ax, y = x, A∗y , x , y h iW h iV ∀ ∈V ∈W

A∗ is a linear transformation from to W V one can show that

= (A) (A∗) W R ⊕ N = (A∗) (A) V R ⊕ N

Reviews on Linear algebra 3-56 Example

A : Cn Cm and denote A = a → { ij} for x Cn and y Cm, and with the usual inner product in Cm, we have ∈ ∈

m m n Ax, y m = (Ax) y = a x y h iC i i  ij j i i=1 i=1 j=1 X X X n m  n m

= xj aijyi = xj aijyi j=1 i=1 ! j=1 i=1 ! X X X X n T T = xj A y , x, A y n j h iC j=1 X  

T hence, A∗ = A

Reviews on Linear algebra 3-57 Basic properties of A∗

Let A∗ : be the adjoint of A W→V facts:

A∗y, x = y, Ax (A∗)∗ = A • h i h i ⇔ A∗ is a linear transformation • (αA)∗ = αA∗ for α C • ∈ let A and B be linear transformations, then •

(A + B)∗ = A∗ + B∗ and (AB)∗ = B∗A∗

Reviews on Linear algebra 3-58 Normed vector space a normed vector space is a vector space over a R with a map V : R k · k V → called norm that satisfies

homogenity • αx = α x , x , α R k k | |k k ∀ ∈V ∀ ∈ triangular inequality • x + y x + y , x, y k k ≤ k k k k ∀ ∈V

positive definiteness • x 0, x = 0 x = 0, x k k≥ k k ⇐⇒ ∀ ∈V

Reviews on Linear algebra 3-59 Cauchy-Schwarz inequality for any x, y in an inner product space ( , R) V x, y x y |h i|≤k kk k moreover, for y = 0, 6 x, y = x y x = αy, α R h i k kk k ⇐⇒ ∃ ∈ proof. for any scalar α

0 x + αy 2 = x 2 + α2 y 2 + α x, y + α y, x ≤ k k k k k k h i h i if y = 0 then the inequality is trivial x, y if y = 0, then we can choose α = h i 6 − y 2 k k and the C-S inequality follows

Reviews on Linear algebra 3-60 Example of vector and matrix norms

n m n x R and A R × ∈ ∈ 2-norm • x = √xT x = x2 + x2 + + x2 k k2 1 2 ··· n q m n T 2 A F = tr(A A)= aij k k v | | ui=1 j=1 q uX X t 1-norm • x = x + x + + x , A = a k k1 | 1| | 2| ··· | n| k k1 ij | ij| -norm P • ∞

x = max x1 , x2 ,..., xn , A = max aij k k∞ k {| | | | | |} k k∞ ij | |

Reviews on Linear algebra 3-61 Operator norm

m n matrix operator norm of A R × is defined as ∈ Ax A = max k k = max Ax k k x =0 x x =1 k k k k6 k k k k also often called induced norm properties:

1. for any x, Ax A x k k ≤ k kk k 2. aA = a A (scaling) k k | |k k 3. A + B A + B (triangle inequality) k k ≤ k k k k 4. A = 0 if and only if A = 0 (positiveness) k k 5. AB A B (submultiplicative) k k ≤ k kk k

Reviews on Linear algebra 3-62 examples of operator norms

2-norm or spectral norm •

T A 2 , max Ax 2 = λmax(A A) k k x 2=1 k k k k q

1-norm • m A 1 , max Ax 1 = max aij k k x =1 k k j=1,...,n | | k k1 i=1 X -norm • ∞ n A , max Ax = max aij k k∞ x =1 k k∞ i=1,...,m | | k k∞ j=1 X note that the notation of norms may be duplicative

Reviews on Linear algebra 3-63 Matrix factorizations

LU factorization • QR factorization • singular value decomposition • Cholesky factorization •

Reviews on Linear algebra 3-64 LU factorization for any n n matrix A, it admits a decomposition × A = P LU with row pivoting

P , L unit lower triangular, U upper triangular • the decomposition exists if and only if A is nonsingular • it is obtained from the Gaussian elimination process •

Reviews on Linear algebra 3-65 QR factorization

m n a tall matrix A R × with m n is decomposed as ∈ ≥ R A = QR = Q Q 1 1 2 0     m n T Q R × is an (Q Q = I) • ∈ n n R R × is an upper triangular • ∈ m n if rank(A)= n, then n columns in Q1 R × forms an orthonormal basis • for (A) and that R is invertible ∈ R 1 if rank(A) < n then R contains a zero in the diagonal • 1 QR is obtained by many methods, e.g., Gram Schmidt, Householder transform •

Reviews on Linear algebra 3-66 Singular value decomposition

m n ler A R × with rank(A)= r min(m, n) then ∈ ≤

σ1 Σ 0 σ A = U + V T , Σ = 2 0 0 +  ...     σ   r m r  m (m r)  T U = U U ,U R × ,U R × − ,U U = I 1 2 1 ∈ 2 ∈ m n r n (n r) T V = V V , V R × , V R × − , V V = I 1 2 1 ∈ 2 ∈ n   the singular values of A: • σ σ σ > σ = = σ = 0, p = min(m, n) 1 ≥ 2 ≥···≥ r r+1 ··· p are the square root of the eigenvalues of AT A

Reviews on Linear algebra 3-67 columns of U are the eigenvectors of AT A • columns of V are the eigenvectors of AAT • the reduced form of SVD is A = U Σ V T • 1 + 1 the Frobenious norm of A is A = tr(Σ ) • k kF + A is the maximum singular value of A • k k2 rank(A) is the number of nonzero singular value of A •

Reviews on Linear algebra 3-68 Cholesky factorization

every positive definite matrix A can be factored as

A = LLT where L is lower triangular with positive diagonal elements

L is called the Cholesky factor of A • can be interpreted as ‘square root’ of a positive define matrix •

Reviews on Linear algebra 3-69 Derivative and Gradient

Suppose f : Rn Rm and x int dom f → ∈ m n the derivative (or Jacobian) of f at x is the matrix Df(x) R × : ∈

∂fi(x) Df(x)ij = , i = 1,...,m, j = 1,...,n ∂xj

when f is scalar-valued (i.e., f : Rn R), the derivative Df(x) is a row • vector → its is called the gradient of the function: •

T ∂f(x) f(x)= Df(x) , f(x)i = , i = 1,...,n ∇ ∇ ∂xi

which is a column vector in Rn

Reviews on Linear algebra 3-70 Second Derivative suppose f is a scalar-valued function (i.e., f : Rn R) → the second derivative or of f at x, denoted 2f(x) is ∇ 2 2 ∂ f(x) f(x)ij = , i = 1,...,n, j = 1,...,n ∇ ∂xi∂xj example: the quadratic function f : Rn R → f(x)=(1/2)xT P x + qT x + r, where P Sn,q Rn, and r R ∈ ∈ ∈ f(x)= P x + q • ∇ 2f(x)= P • ∇

Reviews on Linear algebra 3-71 Chain rule assumptions:

f : Rn Rm is differentiable at x int dom f • → ∈ g : Rm Rp is differentiable at f(x) int dom g • → ∈ define the composition h : Rn Rp by • → h(z)= g(f(z)) then h is differentiable at x, with derivative

Dh(x)= Dg(f(x))Df(x) special case: f : Rn R, g : R R, and h(x)= g(f(x)) → →

h(x)= g′(f(x)) f(x) ∇ ∇

Reviews on Linear algebra 3-72 example: h(x)= f(Ax + b)

Dh(x)= Df(Ax + b)A h(x)= AT f(Ax + b) ⇒ ∇ ∇ example: h(x)=(1/2)(Ax b)T P (Ax b) − − h(x)= AT P (Ax b) ∇ −

Reviews on Linear algebra 3-73 Function of matrices

m n we typically encounter some scalar-valued functions of matrix X R × ∈ f(X)= tr(AT X) (linear in X) • f(X)= tr(XT AX) (quadratic in X) • definition: the derivative of f (scalar-valued function) with respect to X is

∂f ∂f ∂f ∂x11 ∂x12 ··· ∂x1n ∂f  ∂f ∂f ∂f  = ∂x21 ∂x22 ∂x2n ∂X . ···.. .  . . .   ∂f ∂f ∂f  ∂x ∂x ∂x   m1 m2 ··· mn    note that the differential of f can be generalized to

∂f f(X + dX) f(X)= ,dX + higher order term − h∂X i

Reviews on Linear algebra 3-74 Derivative of a trace function let f(X)= tr(AT X)

T T f(X) = (A X)ii = (A )kiXki i i X X Xk = AkiXki i X Xk

∂f then we can read that ∂X = A (by the definition of derivative) we can also note that

f(X + dX) f(X)= tr(AT (X + dX)) tr(AT X)= tr(AT dX)= dX,A − − h i

∂f then we can read that ∂X = A

Reviews on Linear algebra 3-75 f(X)= tr(XT AX) • f(X + dX) f(X) = tr((X + dX)T A(X + dX)) tr(XT AX) − − tr(XT AdX)+ tr(dXT AX) ≈ = dX,AT X + AX,dX h i h i

∂f T then we can read that ∂X = A X + AX f(X)= Y XH 2 where Y and H are given • k − kF f(X + dX) = tr((Y XH dXH)T (Y XH dXH)) − − − − f(X + dX) f(X) tr(HT dXT (Y XH)) tr((Y XH)T dXH) − ≈ − − − − = tr((Y XH)HT dXT ) tr(H(Y XH)T dX) − − − − = 2 (Y XH)HT ,dX − h − i then we identifiy that ∂f = 2(Y XH)HT ∂X − −

Reviews on Linear algebra 3-76 Derivative of a log det function let f : Sn R be defined by f(X) = logdet(X) → 1/2 1/2 1/2 1/2 log det(X + dX) = logdet(X (I + X− dXX− )X ) 1/2 1/2 = logdet X + logdet(I + X− dXX− ) n

= logdet X + log(1 + λi) i=1 X 1/2 1/2 where λi is an eigenvalue of X− dXX−

n f(X + dX) f(X) λ (log(1 + x) x, x 0) − ≈ i ≈ → i=1 X 1/2 1/2 = tr(X− dXX− ) 1 = tr(X− dX)

∂f 1 we identify that ∂X = X−

Reviews on Linear algebra 3-77 References

H. Anton, Elementary Linear Algebra, 10th edition, Wiley, 2010

K.B. Petersen and M.S. Pedersen, et.al., The Matrix Cookbook, Technical University of Denmark, 2008

S. Boyd and L. Vandenberghe, Convex Optimization, Cambridge, 2004

R.A. Horn and C.R. John Son, Matrix Analysis, 2nd edition, Cambridge Press, 2013

Chapter 2 in T. Katayama, Subspace methods for system identification, Springer, 2006

Reviews on Linear algebra 3-78