Positive Definite Matrix

Chia-Ping Chen

Professor Department of Computer Science and Engineering National Sun Yat-sen University

Linear Algebra

1/57 Outline

Quadratic form Function approximation Singular value decomposition Minimum principles

2/57 Chen P Positive Definite Matrix notation xT Ax: quadratic form f(x): a multi-variate function ∇f(x): the gradient vector of f(x) H(x): Hessian matrix of a multi-variate function σ: a singular value Σ: a singular value matrix A = UΣV T : the singular value decomposition of A A+: pseudo-inverse of A

3/57 Chen P Positive Definite Matrix Quadratic Function and Quadratic Form

4/57 Chen P Positive Definite Matrix Definition (quadratic function) A quadratic function of n variables is a sum of second-order terms. n n X X f(x1, . . . , xn) = cijxixj i=1 j=1

Definition (quadratic form) Let A be a matrix of order n × n. The quadratic form of A is xT Ax Note xT Ax is a quadratic function of n variables

5/57 Chen P Positive Definite Matrix symmetric matrices suffice Consider a quadratic function

n n X X f(x1, . . . , xn) = cijxixj i=1 j=1

Deﬁne A with 1 a = (c + c ) ij 2 ij ji The quadratic function is the quadratic form of A

T f(x1, . . . , xn) = x Ax

A is symmetric The eigenvalues of A are real

6/57 Chen P Positive Definite Matrix Example (Quadratic form of a symmetric matrix) Consider a symmetric matrix

"a b# A = b c

The quadratic form of A is " #" # T h i a b x1 x Ax = x1 x2 b c x2 2 2 = ax1 + 2bx1x2 + cx2 which is a quadratic function.

7/57 Chen P Positive Definite Matrix Definition (positive definite matrix) We consider real symmetric matrices only. Matrix A is positive deﬁnite if the quadratic form of A is always positive

xT Ax > 0 for any x 6= 0.

8/57 Chen P Positive Definite Matrix PD ⇒ eigenvalues > 0 Let A be positive deﬁnite. The eigenvalues of A are positive.

Proof. Let λ be an eigenvalue of A and s be a corresponding eigenvector. Then As = λs It follows that sT As = λ(sT s) Hence sT As λ = > 0 sT s

9/57 Chen P Positive Definite Matrix eigenvalues > 0 ⇒ PD If all eigenvalues of A are positive, A is positive deﬁnite.

Proof. By the spectral theorem, we have A = QΛQT where Q is orthogonal. Consider the quadratic form of A.

yT y z }| { z }| { T T T T X 2 x Ax = x Q Λ Q x = y Λy = λiyi i

T P 2 For x 6= 0, we have y 6= 0 and thus x Ax = i λiyi > 0. Hence A is positive deﬁnite.

10/57 Chen P Positive Definite Matrix PD ⇒ determinants > 0 Let A be positive deﬁnite. Every leading principal sub-matrix of A has a positive determinant.

T h T T i k Proof. Consider x = xk 0 with xk ∈ R . For xk 6= 0

" #" # h i A B x xT Ax = xT 0T k k = xT A x > 0 k BT C 0 k k k

So Ak is positive deﬁnite, the eigenvalues of Ak are positive, and k Y |Ak| = λk,i > 0 i=1 where λk,i is an eigenvalue of Ak.

11/57 Chen P Positive Definite Matrix determinants > 0 ⇒ pivots > 0 If every leading principal sub-matrix of A has a positive determinant, A has all positive pivots.

Proof. A is non-singular since |A|= 6 0. Let A = LDU be the LDU decomposition of A. Then

"A B# "L 0#"D 0#"U ∗# "L D U ∗# k = k k k = k k k BT C ∗ ∗ 0 ∗ 0 ∗ ∗ ∗

So Ak = LkDkU k. Let d1, . . . , dn be the pivots of A. Then

|Dk| |Ak| d1 = a11 > 0, dk = = > 0, k = 2, . . . , n |Dk−1| |Ak−1|

12/57 Chen P Positive Definite Matrix pivots > 0 ⇒ PD If A has all positive pivots, A is positive deﬁnite.

Proof. Let A = LDU be the LDU decomposition of A.

A = AT ⇒ LDU = U T DLT ⇒ U = LT

Thus A = LDLT = LD1/2D1/2LT = RT R where R = D1/2LT is non-singular. For x 6= 0

xT Ax = xT RT Rx = (Rx)T (Rx) = kRxk2 > 0

Hence A is positive deﬁnite.

13/57 Chen P Positive Definite Matrix Lemma (equivalent conditions for PD) The following conditions are equivalent.

1 A is positive deﬁnite

2 The eigenvalues of A are positive

3 The determinants of the leading principal sub-matrices of A are positive

4 The pivots of A are positive

The previous slides show

1 ⇔ 2 and 1 ⇒ 3 ⇒ 4 ⇒ 1

14/57 Chen P Positive Definite Matrix Example (positive definite matrix)  2 −1 0    A = −1 2 −1 0 −1 2

The quadratic form of A is

T 2 2 2 x Ax = 2x1 + 2x2 + 2x3 − 2x1x2 − 2x2x3 1 2 3 2 2 4 = 2 x − x + x − x + x2 1 2 2 2 2 3 3 3 3 The eigenvalues, the determinants, and the pivots are √ spectrum(A) = {2, 2 ± 2}, |A1| = 2, |A2| = 3, |A3| = 4

     1  1 0 0 2 1 − 2 0  1   3   2  A = − 2 1 0  2  0 1 − 3  2 4 0 − 3 1 3 0 0 1 15/57 Chen P Positive Definite Matrix ellipsoid If A is positive deﬁnite, xT Ax = 1 deﬁnes an ellipsoid.

A = QΛQT by the spectral theorem, so

T T T T X 2 x Ax = 1 ⇒ x QΛQ x = y Λy = λiyi = 1 i

h i T where Q = q1 ... qn and y = Q x

From the perspective of basis Q = {q1,... qn}, it is clear P 2 that i λiyi = 1 is an ellipsoid

The axes of symmetry are qi’s, and the intercepts are

q −1 yi = ± λi

16/57 Chen P Positive Definite Matrix Example (quadratic form and ellipse)

17/57 Chen P Positive Definite Matrix Definition (negative definite and semidefinite) Let A be a real symmetric matrix. A is negative definite if xT Ax < 0 for x 6= 0 A is positive semidefinite if xT Ax ≥ 0 for any x A is negative semidefinite if xT Ax ≤ 0 for any x

18/57 Chen P Positive Definite Matrix Approximation

19/57 Chen P Positive Definite Matrix Definition (partial derivatives)

Let f(x1, . . . , xn) be a multi-variate function. The ﬁrst-order partial derivatives of f are ∂f fxi = , i = 1, . . . , n ∂xi The second-order partial derivatives of f are

2 ∂fxi ∂ f fxixj = = , i, j = 1, . . . , n ∂xj ∂xj∂xi

Note that

fxixj = fxj xi

20/57 Chen P Positive Definite Matrix Definition (gradient and Hessian)

Let f(x1, . . . , xn) be a function. The gradient of f is a vector of 1st-order partial derivatives   fx1  .  ∇f =  .   .  fxn

The Hessian of f is a matrix of 2nd-order partial derivatives   fx1x1 . . . fx1xn  . . .  H =  . .. .   . .  fxnx1 . . . fxnxn

21/57 Chen P Positive Definite Matrix Definition (function approximation) Let f(x) be a function.

The 1st-order approximation to f(x) near a point x0 is

T f(x) ≈ f(x0) + ∇f(x0) (x − x0)

The 2nd-order approximation to f(x) near x0 is

T f(x) ≈ f(x0) + ∇f(x0) (x − x0) 1 + (x − x )T H(x )(x − x ) 2 0 0 0

22/57 Chen P Positive Definite Matrix Definition (stationary points)

A point x0 is a stationary point of f(x) if

∇f(x0) = 0

Let x0 be a stationary point of f(x). The 2nd-order approximation to f(x) near x0 is 1 f(x) ≈ f(x ) + (x − x )T H(x )(x − x ) 0 2 0 0 0

23/57 Chen P Positive Definite Matrix Example (function approximation) Find the 2nd-order approximation near (0, 0) to f(x, y) deﬁned by f(x, y) = 2x2 + 4xy + y2

"f # "4x + 4y# "f f # "4 4# ∇f = x = , H = xx xy = fy 4x + 2y fyx fyy 4 2

"4 4# f(0) = 0, ∇f(0) = 0, H(0) = 4 2 1 f(x) ≈ f(0) + (x − 0)T H(0)(x − 0) = 2x2 + 4xy + y2 2

24/57 Chen P Positive Definite Matrix Example (function approximation) Find the 2nd-order approximation near (0, 0) to F (x, y) deﬁned by F (x, y) = 7 + 2(x + y)2 − y sin y − x3

"F # " 4(x + y) − 3x2 # ∇F = x = Fy 4(x + y) − sin y − y cos y "F F # "4 − 6x 4 # H = xx xy = Fyx Fyy 4 4 − 2 cos y + y sin y "4 4# F (0) = 7, ∇F (0) = 0, H(0) = 4 2 1 F (x) ≈ F (0) + (x − 0)T H(0)(x − 0) = 7 + 2x2 + 4xy + y2 2

25/57 Chen P Positive Definite Matrix Definition (local minimum and local maximum) Let f(x) be a function.

A point x0 is a local minimum of f(x) if

f(x) ≥ f(x0)

for any x in a neighborhood of x0.

A point x0 is a local maximum of f(x) if

f(x) ≤ f(x0)

for any x in a neighborhood of x0.

26/57 Chen P Positive Definite Matrix optimality of a stationary point

Let f(x) be a function. Let x0 be a stationary point of f(x), and H be the Hessian of f(x) at x0.

x0 is a local minimum if H is positive semideﬁnite.

x0 is a local maximum if H is negative semideﬁnite.

x0 is a saddle point if it is neither a local maximum nor a local minimum. For example, (0, 0) is a saddle point of F (x, y).

27/57 Chen P Positive Definite Matrix bowl or saddle

28/57 Chen P Positive Definite Matrix Singular Value Decomposition

29/57 Chen P Positive Definite Matrix basic idea of singular value decomposition Let A be a real matrix. Both AT A and AAT are square matrices A can be expressed by the eigenvalues of AT A and the eigenvectors of AT A and AAT This is called singular value decomposition (SVD) SVD is an extension of eigenvalue decomposition

30/57 Chen P Positive Definite Matrix AT A and AAT Let A be a real matrix. AT A and AAT are positive semi-deﬁnite They have only real and non-negative eigenvalues

xT AT A x = (Ax)T (Ax) = kAxk2 ≥ 0

T yT AAT y = AT y AT y = kAT yk2 ≥ 0

T T AT A = AT AT = AT A

31/57 Chen P Positive Definite Matrix Definition (singular value and singular vector) Let A be a real matrix. A singular value of A is the square root of a positive eigenvalue of AT A .

Let A be a real matrix and σ be a singular value of A. σ2 is an eigenvalue of AT A σ2 is also an eigenvalue of AAT There exists v 6= 0 such that AT A v = σ2v

There exists u 6= 0 such that AAT u = σ2u

u is a left singular vector and u is a right singular vector 32/57 Chen P Positive Definite Matrix Definition (singular space) Let A be a real matrix and σ be a singular value of A. The right singular space of A corresponding to σ is

n T 2 o Rσ(A) = v | A A v = σ v The left singular space of A corresponding to σ is

n T 2 o Lσ(A) = u | AA u = σ u

33/57 Chen P Positive Definite Matrix Lemma (linearly independent singular vectors∗) Let A be a real matrix of rank r. There exists a linearly independent set containing r right singular vectors of A.

Proof. Let A be of order m × n with rank r. Note AT A x = 0 ⇒ xT AT Ax = 0 ⇒ Ax = 0 ⇒ AT A x = 0

T Thus dim N A A = dim N(A) = n − r, and the multi- plicity of eigenvalue 0 of AT A is (n − r). The sum of the multiplicities of the non-zero eigenvalues of AT A is

n − (n − r) = r

So there are r linearly independent right singular vectors of A.

34/57 Chen P Positive Definite Matrix Theorem (singular value decomposition) Let A be a real matrix of order m × n. Then

A = UΣV T where Σ is an m × n ”diagonal” matrix with the singular values of A as the leading diagonal elements U is an m × m orthogonal matrix with the eigenvectors of AAT as columns V is an n × n orthogonal matrix with the eigenvectors of AT A as columns

35/57 Chen P Positive Definite Matrix construction of SVD Let A be a real matrix with rank r.

Find singular values σ1 . . . σr

Find right singular vectors v1 ... vr Avi Find left singular vectors ui = , since σi

T 2 T AA Avi Aσi vi 2 AA ui = = = σi ui σi σi

T Find orthonormal vr+1 ... vn in E0 A A T Find orthonormal ur+1 ... um in E0 AA Construct         U = u1 ... um , V = v1 ... vn

36/57 Chen P Positive Definite Matrix We show U T AV = Σ, which then implies A = UΣV T . For Avj j = 1 . . . r, we have uj = or Avj = σjuj, so σj

T T T (U AV )ij = ui Avj = ui (σjuj) = σjδij, i = 1 . . . m

T For j = r + 1 . . . n, we have A A vj = 0 or Avj = 0, so

T T (U AV )ij = ui Avj = 0, i = 1 . . . m

Combining the results, we get

U T AV = Σ

Hence A = UΣV T

37/57 Chen P Positive Definite Matrix matrices in singular value decomposition Let A be a real matrix with SVD A = UΣV T . The column vectors in U form an eigenbasis of AAT

diagonal z }| { AAT = UΣV T V ΣT U T = U ΣΣT U T

The column vectors in V form an eigenbasis of AT A

diagonal z }| { AT A = V ΣT U T UΣV T = V ΣT Σ V T

38/57 Chen P Positive Definite Matrix singular vectors and fundamental subspaces∗ Let A be a real matrix with SVD A = UΣV T . The right singular vectors of A in V form an orthonormal T basis of C A The left singular vectors of A in U form an orthonormal basis of C (A)

39/57 Chen P Positive Definite Matrix Let A be of order m × n and rank r with SVD A = UΣV T . T vr+1,..., vn are eigenvectors of A A with eigenvalue 0 T A A vj = 0 vj = 0

T so they form a basis of N(A A) = N(A). It follows that v1,..., vr form a basis of the orthogonal complement of T N(A), i.e. C A . The ﬁrst r columns of AV = UΣ means

Avi = σiui

Thus u1,..., ur are vectors in C (A). Since they are linearly independent, they form a basis of C(A).

40/57 Chen P Positive Definite Matrix Example (singular value decomposition) Find an SVD of "−1 1 0# A = 0 −1 1

The eigenvalues of

 1 −1 0  T   A A = −1 2 −1 0 −1 1 are λ1 = 3, λ2 = 1, λ3 = 0. Hence the singular values of A are √ σ1 = 3, σ2 = 1

41/57 Chen P Positive Definite Matrix Orthonormal eigenvectors of AT A are

 1  −1 1 1 1 1       v1 = √ −2 , v2 = √  0  , v3 = √ 1 6 1 2 1 3 1

The corresponding left singular vectors of A are " # " # Av1 1 −1 Av2 1 1 u1 = = √ , u2 = = √ σ1 2 1 σ2 2 1

So   √ √1 − √2 √1 " √1 √1 #" # 6 6 6 T − 3 0 0  1 1  A = UΣV = 2 2 − √ 0 √  √1 √1 0 1 0  2 2  2 2 √1 √1 √1 3 3 3

42/57 Chen P Positive Definite Matrix SVD for approximation Let A be a real matrix of rank r with SVD A = UΣV T . Then A = UΣV T T T = σ1u1v1 + ··· + σrurvr

= A1 + ··· + Ar

A is the sum of r real matrices of rank 1 An image of size 1000 × 1000 can be compressed with a rate of 90% when 50 terms are used

Data Compression with SVD 43/57 Chen P Positive Definite Matrix Theorem (pseudo-inverse∗ and SVD) Let A be a real matrix with SVD A = UΣV T . The pseudo inverse of A can be deﬁned by

A+ = V Σ+U T

For any b, the minimum-length least-squares solution to Ax = b is x+ = A+b

44/57 Chen P Positive Definite Matrix 45/57 Chen P Positive Definite Matrix Minimum Principles∗

46/57 Chen P Positive Definite Matrix minimum principle for square system Let A be positive deﬁnite. Consider Ax = b. The system is non-singular

It can be solved by minimum principle: x0 is a solution of Ax = b if and only if it minimizes 1 P (x) = xT Ax − xT b 2 Note ∇ xT Ax = 2Ax, ∇ xT b = b so ∇P (x) = Ax − b

47/57 Chen P Positive Definite Matrix 48/57 Chen P Positive Definite Matrix minimum principle for over-determined system Let Ax = b be an over-determined system of linear equations. Such a system can be solved by minimum principle.

Speciﬁcally, the sum of squared errors as a function of x is

E(x) = kAx − bk2 = (Ax − b)T (Ax − b) = xT AT Ax − 2xT AT b + bT b

An x0 that achieves the minimum of E(x) satisﬁes

∇E(x) = 0 x=x0 That is T T A Ax0 = A b 49/57 Chen P Positive Definite Matrix Definition (Rayleigh quotient) Let A be a symmetric matrix. The Rayleigh quotient of A is

xT Ax R(x) = xT x

Let Q = q1,..., qn be an orthonormal eigenbasis of A, corresponding to eigenvalues λ1 ≤ · · · ≤ λn P For any x = xiqi, we have i T P x q A P x q xT Ax i i i i R(x) = = i i T T x x P P xiqi xiqi i i P 2 λixi i = P 2 xi i 50/57 Chen P Positive Definite Matrix Theorem (extremum of Rayleigh quotient)

Let A be a symmetric matrix and Q = {q1,..., qn} be orthonormal eigenbasis of A, corresponding to eigenvalues

λ1 ≤ · · · ≤ λn

The global minimum of the Rayleigh quotient of A is λ1

The global maximum of the Rayleigh quotient of A is λn

The minimum λ1 is attained by q1 (i.e. [xQ] = {δi,1})

λ1 = min R(x) x

The maximum λn is attained by qn (i.e. [xQ] = {δi,n})

λn = max R(x) x

51/57 Chen P Positive Definite Matrix diagonal elements and eigenvalues Let A be a symmetric matrix.

For a unit vector along coordinate axis, R(ei) = aii

Thus aii is bounded by eigenvalues

λ1 ≤ aii ≤ λn

In the cases of all positive eigenvalues for A, we have 1 1 1 √ ≤ √ ≤ √ λn aii λ1

The intercept of ellipsoid xT Ax = 1 along a coordinate axis is bounded

52/57 Chen P Positive Definite Matrix 53/57 Chen P Positive Definite Matrix Theorem (saddle points of Rayleigh quotient)

Let A be a symmetric matrix and Q = {q1,..., qn} be orthonormal eigenbasis of A, corresponding to eigenvalues

λ1 ≤ · · · ≤ λn

The eigenvectors q2,..., qn−1 are saddle points of R(x).

Consider q2 for example.

If we move from q2 along q1, R(x) decreases

If we move from q2 along q2, R(x) does not change

If we move from q2 along q3, R(x) increases

54/57 Chen P Positive Definite Matrix Theorem (Rayleigh quotient in a hyperplane) Let A be a symmetric matrix with orthonormal eigenvectors q1,..., qn and eigenvalues λ1 ≤ · · · ≤ λn. λ2 = max min R(x) v x∈v⊥ Let v be a vector and consider R(x) in the subspace v⊥.

For v = q1 λ2 = min R(x) ⊥ x∈q1 Given v, R(x) can be smaller as x can have component

along q1 λ2 ≥ min R(x) x∈v⊥ Thus λ2 = max min R(x) v x∈v⊥ 55/57 Chen P Positive Definite Matrix Corollary (Rayleigh quotient in a subspace) For the maximum in a hyperplane, we have

λn−1 ≤ max R(x) x∈v⊥ λn−1 = min max R(x) v x∈v⊥ Let V be a subspace of dimension j. We have λj+1 = max min R(x) V x∈V⊥ λn−j = min max R(x) V x∈V⊥

56/57 Chen P Positive Definite Matrix Theorem (intertwining of eigenvalues) Let A be a real symmetric matrix and B be (n − 1) × (n − 1) matrix formed by stripping the last row and column of A.

λ1(A) ≤ λ1(B) ≤ λ2(A) ≤ λ2(B) ≤ · · · ≤ λn−1(B) ≤ λn(A)

Example (intertwining of eigenvalues)  2 −1 0  " 2 −1# A = −1 2 −1 , B =   −1 2 0 −1 2 √ √ λ1(A) = 2 − 2, λ2(A) = 2, λ3(A) = 2 + 2

λ1(B) = 1, λ2(B) = 3

57/57 Chen P Positive Definite Matrix