Positive Definite Matrix
Chia-Ping Chen
Professor Department of Computer Science and Engineering National Sun Yat-sen University
Linear Algebra
Outline
- Quadratic form
- Function approximation
- Singular value decomposition
- Minimum principles
notation
- $x^T A x$: quadratic form
- $f(x)$: a multi-variate function
- $\nabla f(x)$: the gradient vector of $f(x)$
- $H(x)$: Hessian matrix of a multi-variate function
- $\sigma$: a singular value
- $\Sigma$: a singular value matrix
- $A = U \Sigma V^T$: the singular value decomposition of $A$
- $A^+$: pseudo-inverse of $A$
Quadratic Function and Quadratic Form
Definition (quadratic function)
A quadratic function of $n$ variables is a sum of second-order terms.
$$f(x_1, \dots, x_n) = \sum_{i=1}^{n} \sum_{j=1}^{n} c_{ij} x_i x_j$$

Definition (quadratic form)
Let $A$ be a matrix of order $n \times n$. The quadratic form of $A$ is
$$x^T A x$$
Note that $x^T A x$ is a quadratic function of $n$ variables.
symmetric matrices suffice
Consider a quadratic function
$$f(x_1, \dots, x_n) = \sum_{i=1}^{n} \sum_{j=1}^{n} c_{ij} x_i x_j$$
Define $A$ with
$$a_{ij} = \frac{1}{2}(c_{ij} + c_{ji})$$
The quadratic function is the quadratic form of $A$
$$f(x_1, \dots, x_n) = x^T A x$$
- $A$ is symmetric
- The eigenvalues of $A$ are real
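The symmetrization step can be checked numerically. The following is a minimal sketch in plain Python; the coefficient matrix `C`, the vector `x`, and the helper names `quadratic_form` and `symmetrize` are hypothetical and chosen only for illustration.

```python
def quadratic_form(M, x):
    """Return x^T M x for a square matrix M (list of rows) and a vector x."""
    n = len(x)
    return sum(M[i][j] * x[i] * x[j] for i in range(n) for j in range(n))


def symmetrize(C):
    """Return A with a_ij = (c_ij + c_ji) / 2."""
    n = len(C)
    return [[(C[i][j] + C[j][i]) / 2 for j in range(n)] for i in range(n)]


# Hypothetical non-symmetric coefficients c_ij, for illustration only.
C = [[1.0, 4.0, 0.0],
     [0.0, 2.0, -6.0],
     [2.0, 0.0, 3.0]]
A = symmetrize(C)
x = [1.0, -2.0, 0.5]

f_from_C = quadratic_form(C, x)  # value of the quadratic function
f_from_A = quadratic_form(A, x)  # value of the quadratic form of A
```

Both values agree because $c_{ij} x_i x_j + c_{ji} x_j x_i$ depends only on the sum $c_{ij} + c_{ji}$.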
Example (Quadratic form of a symmetric matrix)
Consider a symmetric matrix
$$A = \begin{bmatrix} a & b \\ b & c \end{bmatrix}$$
The quadratic form of $A$ is
$$x^T A x = \begin{bmatrix} x_1 & x_2 \end{bmatrix} \begin{bmatrix} a & b \\ b & c \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = a x_1^2 + 2b x_1 x_2 + c x_2^2$$
which is a quadratic function.
Definition (positive definite matrix)
We consider real symmetric matrices only. Matrix $A$ is positive definite if the quadratic form of $A$ is always positive
$$x^T A x > 0 \quad \text{for any } x \neq 0.$$
PD ⇒ eigenvalues > 0
Let $A$ be positive definite. The eigenvalues of $A$ are positive.

Proof. Let $\lambda$ be an eigenvalue of $A$ and $s$ be a corresponding eigenvector. Then
$$A s = \lambda s$$
It follows that
$$s^T A s = \lambda (s^T s)$$
Since $s \neq 0$, we have $s^T A s > 0$ and $s^T s > 0$. Hence
$$\lambda = \frac{s^T A s}{s^T s} > 0$$
eigenvalues > 0 ⇒ PD
If all eigenvalues of $A$ are positive, $A$ is positive definite.

Proof. By the spectral theorem, we have $A = Q \Lambda Q^T$ where $Q$ is orthogonal. Consider the quadratic form of $A$.
$$x^T A x = \overbrace{x^T Q}^{y^T} \Lambda \overbrace{Q^T x}^{y} = y^T \Lambda y = \sum_i \lambda_i y_i^2$$
For $x \neq 0$, we have $y \neq 0$ and thus $x^T A x = \sum_i \lambda_i y_i^2 > 0$. Hence $A$ is positive definite.
PD ⇒ determinants > 0
Let $A$ be positive definite. Every leading principal sub-matrix of $A$ has a positive determinant.

Proof. Consider $x = \begin{bmatrix} x_k^T & 0^T \end{bmatrix}^T$ with $x_k \in \mathbb{R}^k$. For $x_k \neq 0$
$$x^T A x = \begin{bmatrix} x_k^T & 0^T \end{bmatrix} \begin{bmatrix} A_k & B \\ B^T & C \end{bmatrix} \begin{bmatrix} x_k \\ 0 \end{bmatrix} = x_k^T A_k x_k > 0$$
So $A_k$ is positive definite, the eigenvalues of $A_k$ are positive, and
$$|A_k| = \prod_{i=1}^{k} \lambda_{k,i} > 0$$
where $\lambda_{k,i}$ is an eigenvalue of $A_k$.
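determinants > 0 ⇒ pivots > 0
If every leading principal sub-matrix of $A$ has a positive determinant, $A$ has all positive pivots.

Proof. $A$ is non-singular since $|A| \neq 0$. Let $A = LDU$ be the LDU decomposition of $A$. Then
$$\begin{bmatrix} A_k & B \\ B^T & C \end{bmatrix} = \begin{bmatrix} L_k & 0 \\ * & * \end{bmatrix} \begin{bmatrix} D_k & 0 \\ 0 & * \end{bmatrix} \begin{bmatrix} U_k & * \\ 0 & * \end{bmatrix} = \begin{bmatrix} L_k D_k U_k & * \\ * & * \end{bmatrix}$$
So $A_k = L_k D_k U_k$. Since $L_k$ and $U_k$ are triangular with unit diagonals, $|A_k| = |D_k|$. Let $d_1, \dots, d_n$ be the pivots of $A$. Then
$$d_1 = a_{11} > 0, \qquad d_k = \frac{|D_k|}{|D_{k-1}|} = \frac{|A_k|}{|A_{k-1}|} > 0, \quad k = 2, \dots, n$$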
pivots > 0 ⇒ PD
If $A$ has all positive pivots, $A$ is positive definite.

Proof. Let $A = LDU$ be the LDU decomposition of $A$.
$$A = A^T \Rightarrow LDU = U^T D L^T \Rightarrow U = L^T$$
Thus
$$A = L D L^T = L D^{1/2} D^{1/2} L^T = R^T R$$
where $R = D^{1/2} L^T$ is non-singular. For $x \neq 0$
$$x^T A x = x^T R^T R x = (Rx)^T (Rx) = \|Rx\|^2 > 0$$
Hence $A$ is positive definite.
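The factor $R$ in this proof can be computed directly. The sketch below is one possible plain-Python implementation (a Cholesky-style recurrence, not necessarily the procedure intended in the lecture); `upper_factor` and `matvec` are hypothetical helper names, and the test matrix is the 3 × 3 tridiagonal matrix used in the examples of these slides.

```python
import math


def upper_factor(A):
    """Return upper-triangular R with A = R^T R for a symmetric matrix A.

    Raises ValueError when a non-positive pivot appears, i.e. when A is
    not positive definite.
    """
    n = len(A)
    R = [[0.0] * n for _ in range(n)]
    for i in range(n):
        # The i-th pivot d_i equals a_ii minus the squares already above it.
        pivot = A[i][i] - sum(R[k][i] ** 2 for k in range(i))
        if pivot <= 0:
            raise ValueError("matrix is not positive definite")
        R[i][i] = math.sqrt(pivot)
        for j in range(i + 1, n):
            s = sum(R[k][i] * R[k][j] for k in range(i))
            R[i][j] = (A[i][j] - s) / R[i][i]
    return R


def matvec(M, x):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(m * xj for m, xj in zip(row, x)) for row in M]


A = [[2.0, -1.0, 0.0],
     [-1.0, 2.0, -1.0],
     [0.0, -1.0, 2.0]]
R = upper_factor(A)

x = [1.0, -2.0, 3.0]  # arbitrary non-zero test vector
quad = sum(xi * yi for xi, yi in zip(x, matvec(A, x)))  # x^T A x
norm_sq = sum(v * v for v in matvec(R, x))              # ||R x||^2
```

The squared diagonal entries of `R` are the pivots of `A`, and `quad` matches `norm_sq` up to rounding.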
Lemma (equivalent conditions for PD)
The following conditions are equivalent.
1. $A$ is positive definite
2. The eigenvalues of $A$ are positive
3. The determinants of the leading principal sub-matrices of $A$ are positive
4. The pivots of $A$ are positive
The previous slides show
1 ⇔ 2 and 1 ⇒ 3 ⇒ 4 ⇒ 1
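Conditions 3 and 4 are easy to test by elimination. The following is a minimal sketch using exact rational arithmetic from the Python standard library; the function names `pivots`, `leading_minors`, and `is_positive_definite` are hypothetical, and the input is assumed to be a real symmetric matrix.

```python
from fractions import Fraction


def pivots(A):
    """Pivots from Gaussian elimination without row exchanges."""
    n = len(A)
    M = [[Fraction(v) for v in row] for row in A]
    d = []
    for k in range(n):
        if M[k][k] == 0:
            raise ValueError("zero pivot: elimination needs a row exchange")
        d.append(M[k][k])
        for i in range(k + 1, n):
            factor = M[i][k] / M[k][k]
            for j in range(k, n):
                M[i][j] -= factor * M[k][j]
    return d


def leading_minors(A):
    """|A_1|, ..., |A_n| as running products of the pivots."""
    minors, product = [], Fraction(1)
    for d in pivots(A):
        product *= d
        minors.append(product)
    return minors


def is_positive_definite(A):
    """Condition 4: all pivots positive (A assumed symmetric)."""
    try:
        return all(d > 0 for d in pivots(A))
    except ValueError:
        return False


A = [[2, -1, 0],
     [-1, 2, -1],
     [0, -1, 2]]
d = pivots(A)
minors = leading_minors(A)
```

Computing the minors as running products of pivots uses $|A_k| = d_1 \cdots d_k$, which holds when no row exchange is needed.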
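Example (positive definite matrix)
$$A = \begin{bmatrix} 2 & -1 & 0 \\ -1 & 2 & -1 \\ 0 & -1 & 2 \end{bmatrix}$$
The quadratic form of $A$ is
$$x^T A x = 2x_1^2 + 2x_2^2 + 2x_3^2 - 2x_1x_2 - 2x_2x_3 = 2\left(x_1 - \tfrac{1}{2}x_2\right)^2 + \tfrac{3}{2}\left(x_2 - \tfrac{2}{3}x_3\right)^2 + \tfrac{4}{3}x_3^2$$
The eigenvalues, the determinants, and the pivots are
$$\text{spectrum}(A) = \{2, 2 \pm \sqrt{2}\}, \quad |A_1| = 2, \quad |A_2| = 3, \quad |A_3| = 4$$
$$A = \begin{bmatrix} 1 & 0 & 0 \\ -\frac{1}{2} & 1 & 0 \\ 0 & -\frac{2}{3} & 1 \end{bmatrix} \begin{bmatrix} 2 & 0 & 0 \\ 0 & \frac{3}{2} & 0 \\ 0 & 0 & \frac{4}{3} \end{bmatrix} \begin{bmatrix} 1 & -\frac{1}{2} & 0 \\ 0 & 1 & -\frac{2}{3} \\ 0 & 0 & 1 \end{bmatrix}$$

ellipsoid
If $A$ is positive definite, $x^T A x = 1$ defines an ellipsoid.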
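- $A = Q \Lambda Q^T$ by the spectral theorem, so
$$x^T A x = 1 \Rightarrow x^T Q \Lambda Q^T x = y^T \Lambda y = \sum_i \lambda_i y_i^2 = 1$$
where $Q = \begin{bmatrix} q_1 & \dots & q_n \end{bmatrix}$ and $y = Q^T x$
- From the perspective of basis $Q = \{q_1, \dots, q_n\}$, it is clear that $\sum_i \lambda_i y_i^2 = 1$ is an ellipsoid
- The axes of symmetry are the $q_i$'s, and the intercepts are
$$y_i = \pm \sqrt{\lambda_i^{-1}}$$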
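As a small worked case of the statement above, take the 2 × 2 matrix below. It is chosen here only for illustration, and its eigenvalues and eigenvectors can be checked by hand.

```latex
A = \begin{bmatrix} 2 & -1 \\ -1 & 2 \end{bmatrix}, \qquad
\lambda_1 = 1,\; q_1 = \tfrac{1}{\sqrt{2}} \begin{bmatrix} 1 \\ 1 \end{bmatrix}, \qquad
\lambda_2 = 3,\; q_2 = \tfrac{1}{\sqrt{2}} \begin{bmatrix} 1 \\ -1 \end{bmatrix}

% With y = Q^T x: y_1 = (x_1 + x_2)/\sqrt{2}, \; y_2 = (x_1 - x_2)/\sqrt{2}
x^T A x = 2x_1^2 - 2x_1 x_2 + 2x_2^2 = y_1^2 + 3 y_2^2 = 1

% Intercepts along the axes q_1 and q_2
y_1 = \pm 1, \qquad y_2 = \pm \tfrac{1}{\sqrt{3}}
```

The longer axis of the ellipse lies along $q_1$, the eigenvector of the smaller eigenvalue.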
Example (quadratic form and ellipse)
Definition (negative definite and semidefinite)
Let $A$ be a real symmetric matrix.
- $A$ is negative definite if $x^T A x < 0$ for any $x \neq 0$
- $A$ is positive semidefinite if $x^T A x \geq 0$ for any $x$
- $A$ is negative semidefinite if $x^T A x \leq 0$ for any $x$
Approximation
Definition (partial derivatives)
Let $f(x_1, \dots, x_n)$ be a multi-variate function. The first-order partial derivatives of $f$ are
$$f_{x_i} = \frac{\partial f}{\partial x_i}, \quad i = 1, \dots, n$$
The second-order partial derivatives of $f$ are
$$f_{x_i x_j} = \frac{\partial f_{x_i}}{\partial x_j} = \frac{\partial^2 f}{\partial x_j \partial x_i}, \quad i, j = 1, \dots, n$$
Note that, when the second-order partial derivatives are continuous,
$$f_{x_i x_j} = f_{x_j x_i}$$
Definition (gradient and Hessian)
Let $f(x_1, \dots, x_n)$ be a function. The gradient of $f$ is a vector of 1st-order partial derivatives
$$\nabla f = \begin{bmatrix} f_{x_1} \\ \vdots \\ f_{x_n} \end{bmatrix}$$
The Hessian of $f$ is a matrix of 2nd-order partial derivatives
$$H = \begin{bmatrix} f_{x_1 x_1} & \dots & f_{x_1 x_n} \\ \vdots & \ddots & \vdots \\ f_{x_n x_1} & \dots & f_{x_n x_n} \end{bmatrix}$$
Definition (function approximation)
Let $f(x)$ be a function.
- The 1st-order approximation to $f(x)$ near a point $x_0$ is
$$f(x) \approx f(x_0) + \nabla f(x_0)^T (x - x_0)$$
- The 2nd-order approximation to $f(x)$ near $x_0$ is
$$f(x) \approx f(x_0) + \nabla f(x_0)^T (x - x_0) + \frac{1}{2} (x - x_0)^T H(x_0) (x - x_0)$$
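The 2nd-order approximation can be tried out numerically. The sketch below estimates the gradient and Hessian of a two-variable function by central differences (a swapped-in numerical technique, with step sizes `h` picked by assumption) and evaluates the approximation. The test function `F` is borrowed from a worked example in these slides; all helper names are hypothetical.

```python
import math


def gradient(f, x, y, h=1e-5):
    """Central-difference estimate of (f_x, f_y) at (x, y)."""
    fx = (f(x + h, y) - f(x - h, y)) / (2 * h)
    fy = (f(x, y + h) - f(x, y - h)) / (2 * h)
    return [fx, fy]


def hessian(f, x, y, h=1e-4):
    """Central-difference estimate of [[f_xx, f_xy], [f_yx, f_yy]]."""
    fxx = (f(x + h, y) - 2 * f(x, y) + f(x - h, y)) / h ** 2
    fyy = (f(x, y + h) - 2 * f(x, y) + f(x, y - h)) / h ** 2
    fxy = (f(x + h, y + h) - f(x + h, y - h)
           - f(x - h, y + h) + f(x - h, y - h)) / (4 * h ** 2)
    return [[fxx, fxy], [fxy, fyy]]


def second_order(f, x0, y0, x, y):
    """2nd-order approximation of f near (x0, y0), evaluated at (x, y)."""
    g = gradient(f, x0, y0)
    H = hessian(f, x0, y0)
    dx, dy = x - x0, y - y0
    linear = g[0] * dx + g[1] * dy
    quadratic = 0.5 * (H[0][0] * dx * dx + 2 * H[0][1] * dx * dy
                       + H[1][1] * dy * dy)
    return f(x0, y0) + linear + quadratic


def F(x, y):
    return 7 + 2 * (x + y) ** 2 - y * math.sin(y) - x ** 3


H0 = hessian(F, 0.0, 0.0)                        # close to [[4, 4], [4, 2]]
approx = second_order(F, 0.0, 0.0, 0.1, -0.05)   # quadratic model value
exact = F(0.1, -0.05)                            # true function value
```

Near the origin the gap between `approx` and `exact` is dominated by the cubic term $-x^3$ that the 2nd-order model leaves out.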
Definition (stationary points)
A point $x_0$ is a stationary point of $f(x)$ if
$$\nabla f(x_0) = 0$$
Let $x_0$ be a stationary point of $f(x)$. The 2nd-order approximation to $f(x)$ near $x_0$ is
$$f(x) \approx f(x_0) + \frac{1}{2} (x - x_0)^T H(x_0) (x - x_0)$$
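Example (function approximation)
Find the 2nd-order approximation near $(0, 0)$ to $f(x, y)$ defined by
$$f(x, y) = 2x^2 + 4xy + y^2$$
$$\nabla f = \begin{bmatrix} f_x \\ f_y \end{bmatrix} = \begin{bmatrix} 4x + 4y \\ 4x + 2y \end{bmatrix}, \quad H = \begin{bmatrix} f_{xx} & f_{xy} \\ f_{yx} & f_{yy} \end{bmatrix} = \begin{bmatrix} 4 & 4 \\ 4 & 2 \end{bmatrix}$$
$$f(\mathbf{0}) = 0, \quad \nabla f(\mathbf{0}) = \mathbf{0}, \quad H(\mathbf{0}) = \begin{bmatrix} 4 & 4 \\ 4 & 2 \end{bmatrix}$$
$$f(\mathbf{x}) \approx f(\mathbf{0}) + \frac{1}{2} (\mathbf{x} - \mathbf{0})^T H(\mathbf{0}) (\mathbf{x} - \mathbf{0}) = 2x^2 + 4xy + y^2$$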
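Example (function approximation)
Find the 2nd-order approximation near $(0, 0)$ to $F(x, y)$ defined by
$$F(x, y) = 7 + 2(x + y)^2 - y \sin y - x^3$$
$$\nabla F = \begin{bmatrix} F_x \\ F_y \end{bmatrix} = \begin{bmatrix} 4(x + y) - 3x^2 \\ 4(x + y) - \sin y - y \cos y \end{bmatrix}$$
$$H = \begin{bmatrix} F_{xx} & F_{xy} \\ F_{yx} & F_{yy} \end{bmatrix} = \begin{bmatrix} 4 - 6x & 4 \\ 4 & 4 - 2\cos y + y \sin y \end{bmatrix}$$
$$F(\mathbf{0}) = 7, \quad \nabla F(\mathbf{0}) = \mathbf{0}, \quad H(\mathbf{0}) = \begin{bmatrix} 4 & 4 \\ 4 & 2 \end{bmatrix}$$
$$F(\mathbf{x}) \approx F(\mathbf{0}) + \frac{1}{2} (\mathbf{x} - \mathbf{0})^T H(\mathbf{0}) (\mathbf{x} - \mathbf{0}) = 7 + 2x^2 + 4xy + y^2$$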
Definition (local minimum and local maximum)
Let $f(x)$ be a function.
- A point $x_0$ is a local minimum of $f(x)$ if
$$f(x) \geq f(x_0)$$
for any $x$ in a neighborhood of $x_0$.
- A point $x_0$ is a local maximum of $f(x)$ if
$$f(x) \leq f(x_0)$$
for any $x$ in a neighborhood of $x_0$.
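optimality of a stationary point
Let $f(x)$ be a function. Let $x_0$ be a stationary point of $f(x)$, and $H$ be the Hessian of $f(x)$ at $x_0$.
- $x_0$ is a local minimum if $H$ is positive definite.
- $x_0$ is a local maximum if $H$ is negative definite.
- $x_0$ is a saddle point if it is neither a local maximum nor a local minimum. For example, $(0, 0)$ is a saddle point of $F(x, y)$: the determinant of $H(\mathbf{0})$ is $4 \cdot 2 - 4 \cdot 4 = -8 < 0$, so $H(\mathbf{0})$ has one positive and one negative eigenvalue.
- If $H$ is only semidefinite, the 2nd-order approximation does not decide the type of $x_0$.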
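For functions of two variables the test reduces to the signs of the determinant and of one diagonal entry of the Hessian. The following is a minimal sketch under that assumption (a symmetric 2 × 2 Hessian at a stationary point); the function name `classify_stationary_point` is hypothetical.

```python
def classify_stationary_point(H):
    """Classify a stationary point from a symmetric 2 x 2 Hessian H."""
    (a, b), (_, c) = H
    det = a * c - b * b  # product of the two eigenvalues
    if det > 0:
        # Eigenvalues share a sign, which is the sign of a.
        return "local minimum" if a > 0 else "local maximum"
    if det < 0:
        # One positive and one negative eigenvalue.
        return "saddle point"
    # A zero eigenvalue: the 2nd-order test is inconclusive.
    return "undetermined"


# Hessian at (0, 0) from the worked examples in these slides.
kind = classify_stationary_point([[4, 4], [4, 2]])
```

For larger Hessians the same decision can be made from the signs of the pivots or of the eigenvalues.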
bowl or saddle
Singular Value Decomposition
basic idea of singular value decomposition
Let $A$ be a real matrix.
- Both $A^T A$ and $A A^T$ are square matrices
- $A$ can be expressed by the eigenvalues of $A^T A$ and the eigenvectors of $A^T A$ and $A A^T$
- This is called singular value decomposition (SVD)
- SVD is an extension of eigenvalue decomposition
$A^T A$ and $A A^T$
Let $A$ be a real matrix.
- $A^T A$ and $A A^T$ are positive semidefinite
- They have only real and non-negative eigenvalues
$$x^T (A^T A) x = (Ax)^T (Ax) = \|Ax\|^2 \geq 0$$
$$y^T (A A^T) y = (A^T y)^T (A^T y) = \|A^T y\|^2 \geq 0$$
$$(A^T A)^T = A^T (A^T)^T = A^T A$$
Definition (singular value and singular vector)
Let $A$ be a real matrix. A singular value of $A$ is the square root of a positive eigenvalue of $A^T A$.

Let $A$ be a real matrix and $\sigma$ be a singular value of $A$.
- $\sigma^2$ is an eigenvalue of $A^T A$
- $\sigma^2$ is also an eigenvalue of $A A^T$
- There exists $v \neq 0$ such that
$$(A^T A) v = \sigma^2 v$$
- There exists $u \neq 0$ such that
$$(A A^T) u = \sigma^2 u$$
- $u$ is a left singular vector and $v$ is a right singular vector

Definition (singular space)
Let $A$ be a real matrix and $\sigma$ be a singular value of $A$. The right singular space of $A$ corresponding to $\sigma$ is
$$R_\sigma(A) = \{ v \mid (A^T A) v = \sigma^2 v \}$$
The left singular space of $A$ corresponding to $\sigma$ is
$$L_\sigma(A) = \{ u \mid (A A^T) u = \sigma^2 u \}$$
Lemma (linearly independent singular vectors∗)
Let $A$ be a real matrix of rank $r$. There exists a linearly independent set containing $r$ right singular vectors of $A$.

Proof. Let $A$ be of order $m \times n$ with rank $r$. Note
$$(A^T A) x = 0 \Rightarrow x^T A^T A x = 0 \Rightarrow Ax = 0 \Rightarrow (A^T A) x = 0$$
Thus $\dim N(A^T A) = \dim N(A) = n - r$, and the multiplicity of eigenvalue 0 of $A^T A$ is $(n - r)$. The sum of the multiplicities of the non-zero eigenvalues of $A^T A$ is
$$n - (n - r) = r$$
So there are $r$ linearly independent right singular vectors of $A$.
Theorem (singular value decomposition)
Let $A$ be a real matrix of order $m \times n$. Then
$$A = U \Sigma V^T$$
where
- $\Sigma$ is an $m \times n$ "diagonal" matrix with the singular values of $A$ as the leading diagonal elements
- $U$ is an $m \times m$ orthogonal matrix with the eigenvectors of $A A^T$ as columns
- $V$ is an $n \times n$ orthogonal matrix with the eigenvectors of $A^T A$ as columns
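construction of SVD
Let $A$ be a real matrix with rank $r$.
- Find singular values $\sigma_1, \dots, \sigma_r$
- Find orthonormal right singular vectors $v_1, \dots, v_r$
- Find left singular vectors $u_i = \frac{A v_i}{\sigma_i}$, since
$$A A^T u_i = \frac{A A^T A v_i}{\sigma_i} = \frac{A \sigma_i^2 v_i}{\sigma_i} = \sigma_i^2 u_i$$
- Find orthonormal $v_{r+1}, \dots, v_n$ in $E_0(A^T A)$
- Find orthonormal $u_{r+1}, \dots, u_m$ in $E_0(A A^T)$
- Construct $U = \begin{bmatrix} u_1 & \dots & u_m \end{bmatrix}$, $V = \begin{bmatrix} v_1 & \dots & v_n \end{bmatrix}$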
We show $U^T A V = \Sigma$, which then implies $A = U \Sigma V^T$.
- For $j = 1, \dots, r$, we have $u_j = \frac{A v_j}{\sigma_j}$ or $A v_j = \sigma_j u_j$, so
$$(U^T A V)_{ij} = u_i^T A v_j = u_i^T (\sigma_j u_j) = \sigma_j \delta_{ij}, \quad i = 1, \dots, m$$
- For $j = r + 1, \dots, n$, we have $(A^T A) v_j = 0$ or $A v_j = 0$, so
$$(U^T A V)_{ij} = u_i^T A v_j = 0, \quad i = 1, \dots, m$$
Combining the results, we get
$$U^T A V = \Sigma$$
Hence
$$A = U \Sigma V^T$$
matrices in singular value decomposition
Let $A$ be a real matrix with SVD $A = U \Sigma V^T$.
- The column vectors in $U$ form an eigenbasis of $A A^T$
$$A A^T = U \Sigma V^T V \Sigma^T U^T = U \overbrace{\left(\Sigma \Sigma^T\right)}^{\text{diagonal}} U^T$$
- The column vectors in $V$ form an eigenbasis of $A^T A$
$$A^T A = V \Sigma^T U^T U \Sigma V^T = V \overbrace{\left(\Sigma^T \Sigma\right)}^{\text{diagonal}} V^T$$
singular vectors and fundamental subspaces∗
Let $A$ be a real matrix with SVD $A = U \Sigma V^T$.
- The right singular vectors of $A$ in $V$ form an orthonormal basis of $C(A^T)$
- The left singular vectors of $A$ in $U$ form an orthonormal basis of $C(A)$
Let $A$ be of order $m \times n$ and rank $r$ with SVD $A = U \Sigma V^T$.
- $v_{r+1}, \dots, v_n$ are eigenvectors of $A^T A$ with eigenvalue 0
$$(A^T A) v_j = 0 \, v_j = 0$$
so they form a basis of $N(A^T A) = N(A)$. It follows that $v_1, \dots, v_r$ form a basis of the orthogonal complement of $N(A)$, i.e. $C(A^T)$.
- The first $r$ columns of $AV = U\Sigma$ state that
$$A v_i = \sigma_i u_i$$
Thus $u_1, \dots, u_r$ are vectors in $C(A)$. Since they are linearly independent and $\dim C(A) = r$, they form a basis of $C(A)$.
Example (singular value decomposition)
Find an SVD of
$$A = \begin{bmatrix} -1 & 1 & 0 \\ 0 & -1 & 1 \end{bmatrix}$$
The eigenvalues of
$$A^T A = \begin{bmatrix} 1 & -1 & 0 \\ -1 & 2 & -1 \\ 0 & -1 & 1 \end{bmatrix}$$
are $\lambda_1 = 3$, $\lambda_2 = 1$, $\lambda_3 = 0$. Hence the singular values of $A$ are
$$\sigma_1 = \sqrt{3}, \quad \sigma_2 = 1$$
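Orthonormal eigenvectors of $A^T A$ are
$$v_1 = \frac{1}{\sqrt{6}} \begin{bmatrix} 1 \\ -2 \\ 1 \end{bmatrix}, \quad v_2 = \frac{1}{\sqrt{2}} \begin{bmatrix} -1 \\ 0 \\ 1 \end{bmatrix}, \quad v_3 = \frac{1}{\sqrt{3}} \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}$$
The corresponding left singular vectors of $A$ are
$$u_1 = \frac{A v_1}{\sigma_1} = \frac{1}{\sqrt{2}} \begin{bmatrix} -1 \\ 1 \end{bmatrix}, \quad u_2 = \frac{A v_2}{\sigma_2} = \frac{1}{\sqrt{2}} \begin{bmatrix} 1 \\ 1 \end{bmatrix}$$
So
$$A = U \Sigma V^T = \begin{bmatrix} -\frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \end{bmatrix} \begin{bmatrix} \sqrt{3} & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix} \begin{bmatrix} \frac{1}{\sqrt{6}} & -\frac{2}{\sqrt{6}} & \frac{1}{\sqrt{6}} \\ -\frac{1}{\sqrt{2}} & 0 & \frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{3}} & \frac{1}{\sqrt{3}} & \frac{1}{\sqrt{3}} \end{bmatrix}$$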
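The factors found in this example can be multiplied back together as a check. The sketch below does this in plain Python, entering $U$, $\Sigma$, and $V^T$ by hand; the helpers `matmul` and `transpose` are hypothetical names introduced only for this check.

```python
import math


def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


def transpose(X):
    return [list(col) for col in zip(*X)]


s2, s3, s6 = math.sqrt(2), math.sqrt(3), math.sqrt(6)

A = [[-1.0, 1.0, 0.0],
     [0.0, -1.0, 1.0]]
U = [[-1 / s2, 1 / s2],
     [1 / s2, 1 / s2]]
Sigma = [[s3, 0.0, 0.0],
         [0.0, 1.0, 0.0]]
Vt = [[1 / s6, -2 / s6, 1 / s6],    # v_1^T
      [-1 / s2, 0.0, 1 / s2],       # v_2^T
      [1 / s3, 1 / s3, 1 / s3]]     # v_3^T

product = matmul(matmul(U, Sigma), Vt)   # should reproduce A
max_error = max(abs(product[i][j] - A[i][j])
                for i in range(2) for j in range(3))
gram_V = matmul(Vt, transpose(Vt))       # should be the 3 x 3 identity
```

The same check with `matmul(transpose(U), U)` confirms that $U$ is orthogonal as well.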
SVD for approximation
Let $A$ be a real matrix of rank $r$ with SVD $A = U \Sigma V^T$. Then
$$A = U \Sigma V^T = \sigma_1 u_1 v_1^T + \dots + \sigma_r u_r v_r^T = A_1 + \dots + A_r$$
- $A$ is the sum of $r$ real matrices of rank 1
- An image of size $1000 \times 1000$ can be compressed with a rate of 90% when 50 terms are used: the 50 triples $(\sigma_i, u_i, v_i)$ take $50 \times (1 + 1000 + 1000) = 100{,}050$ numbers in place of $10^6$
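Data Compression with SVD

Theorem (pseudo-inverse∗ and SVD)
Let $A$ be a real matrix of order $m \times n$ with SVD $A = U \Sigma V^T$. The pseudo-inverse of $A$ can be defined by
$$A^+ = V \Sigma^+ U^T$$
where $\Sigma^+$ is the $n \times m$ "diagonal" matrix with the reciprocals $1/\sigma_i$ of the singular values of $A$ as the leading diagonal elements.
For any $b$, the minimum-length least-squares solution to $Ax = b$ is
$$x^+ = A^+ b$$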
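The formula $A^+ = V \Sigma^+ U^T$ can be tried on the 2 × 3 matrix from the SVD example in these slides. The sketch below enters the factors by hand and forms $\Sigma^+$ by transposing $\Sigma$ and inverting its non-zero entries; the right-hand side `b` and the helper names are hypothetical.

```python
import math


def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


def transpose(X):
    return [list(col) for col in zip(*X)]


s2, s3, s6 = math.sqrt(2), math.sqrt(3), math.sqrt(6)

A = [[-1.0, 1.0, 0.0],
     [0.0, -1.0, 1.0]]
U = [[-1 / s2, 1 / s2],
     [1 / s2, 1 / s2]]
Vt = [[1 / s6, -2 / s6, 1 / s6],
      [-1 / s2, 0.0, 1 / s2],
      [1 / s3, 1 / s3, 1 / s3]]
# Sigma^+ is 3 x 2: reciprocals of sigma_1 = sqrt(3) and sigma_2 = 1.
Sigma_plus = [[1 / s3, 0.0],
              [0.0, 1.0],
              [0.0, 0.0]]

A_plus = matmul(matmul(transpose(Vt), Sigma_plus), transpose(U))

b = [[1.0], [2.0]]           # hypothetical right-hand side (column vector)
x_plus = matmul(A_plus, b)   # minimum-length solution of A x = b
check = matmul(A, x_plus)    # reproduces b because A has full row rank
```

Here `x_plus` has no component along the null-space direction $(1, 1, 1)$, which is what makes it the shortest solution.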
Minimum Principles∗
minimum principle for square system
Let $A$ be positive definite. Consider $Ax = b$.
- The system is non-singular
- It can be solved by minimum principle: $x_0$ is a solution of $Ax = b$ if and only if it minimizes
$$P(x) = \frac{1}{2} x^T A x - x^T b$$
Note
$$\nabla (x^T A x) = 2Ax, \quad \nabla (x^T b) = b$$
so
$$\nabla P(x) = Ax - b$$
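The "if and only if" can also be seen without gradients. The short derivation below is a sketch that assumes $A x_0 = b$ with $A$ symmetric positive definite, so that $x^T b = x^T A x_0$ and $x_0^T b = x_0^T A x_0$.

```latex
P(x) - P(x_0)
  = \tfrac{1}{2} x^T A x - x^T A x_0 - \tfrac{1}{2} x_0^T A x_0 + x_0^T A x_0
  = \tfrac{1}{2} x^T A x - x^T A x_0 + \tfrac{1}{2} x_0^T A x_0
  = \tfrac{1}{2} (x - x_0)^T A (x - x_0) \;\ge\; 0
```

Because $A$ is positive definite, equality holds only for $x = x_0$, so the solution of $Ax = b$ is the unique minimizer of $P$.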
minimum principle for over-determined system
Let $Ax = b$ be an over-determined system of linear equations. Such a system can be solved by minimum principle.
Specifically, the sum of squared errors as a function of $x$ is
$$E(x) = \|Ax - b\|^2 = (Ax - b)^T (Ax - b) = x^T A^T A x - 2 x^T A^T b + b^T b$$
An $x_0$ that achieves the minimum of $E(x)$ satisfies
$$\nabla E(x) \big|_{x = x_0} = 0$$
That is
$$A^T A x_0 = A^T b$$

Definition (Rayleigh quotient)
Let $A$ be a symmetric matrix. The Rayleigh quotient of $A$ is
$$R(x) = \frac{x^T A x}{x^T x}$$
Let $Q = \{q_1, \dots, q_n\}$ be an orthonormal eigenbasis of $A$, corresponding to eigenvalues
$$\lambda_1 \leq \dots \leq \lambda_n$$
For any $x = \sum_i x_i q_i$, we have
$$R(x) = \frac{x^T A x}{x^T x} = \frac{\left(\sum_i x_i q_i\right)^T A \left(\sum_i x_i q_i\right)}{\left(\sum_i x_i q_i\right)^T \left(\sum_i x_i q_i\right)} = \frac{\sum_i \lambda_i x_i^2}{\sum_i x_i^2}$$

Theorem (extremum of Rayleigh quotient)
Let $A$ be a symmetric matrix and $Q = \{q_1, \dots, q_n\}$ be an orthonormal eigenbasis of $A$, corresponding to eigenvalues
$$\lambda_1 \leq \dots \leq \lambda_n$$
- The global minimum of the Rayleigh quotient of $A$ is $\lambda_1$
- The global maximum of the Rayleigh quotient of $A$ is $\lambda_n$
- The minimum $\lambda_1$ is attained by $q_1$ (i.e. $[x]_Q = \{\delta_{i,1}\}$)
$$\lambda_1 = \min_x R(x)$$
- The maximum $\lambda_n$ is attained by $q_n$ (i.e. $[x]_Q = \{\delta_{i,n}\}$)
$$\lambda_n = \max_x R(x)$$
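The bounds can be observed numerically. The sketch below evaluates the Rayleigh quotient of the 3 × 3 tridiagonal example matrix from these slides at its extreme eigenvectors (entered by hand and easy to verify) and at random vectors; the seed, the sample count, and the name `rayleigh` are arbitrary choices made for illustration.

```python
import math
import random


def rayleigh(A, x):
    """Rayleigh quotient x^T A x / x^T x for a non-zero vector x."""
    Ax = [sum(a * xj for a, xj in zip(row, x)) for row in A]
    return sum(xi * yi for xi, yi in zip(x, Ax)) / sum(xi * xi for xi in x)


A = [[2.0, -1.0, 0.0],
     [-1.0, 2.0, -1.0],
     [0.0, -1.0, 2.0]]
lam_min, lam_max = 2 - math.sqrt(2), 2 + math.sqrt(2)
q_min = [1.0, math.sqrt(2), 1.0]    # eigenvector for 2 - sqrt(2)
q_max = [1.0, -math.sqrt(2), 1.0]   # eigenvector for 2 + sqrt(2)

rng = random.Random(0)  # fixed seed so the run is repeatable
samples = [rayleigh(A, [rng.uniform(-1, 1) for _ in range(3)])
           for _ in range(1000)]

r_min = rayleigh(A, q_min)               # equals the smallest eigenvalue
r_max = rayleigh(A, q_max)               # equals the largest eigenvalue
r_axis = rayleigh(A, [1.0, 0.0, 0.0])    # equals the diagonal entry a_11
```

Every sampled value falls between `lam_min` and `lam_max`, and the eigenvectors need not be normalized because $R(x)$ is invariant under scaling of $x$.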
diagonal elements and eigenvalues
Let $A$ be a symmetric matrix.
- For a unit vector along a coordinate axis, $R(e_i) = a_{ii}$
- Thus $a_{ii}$ is bounded by eigenvalues
$$\lambda_1 \leq a_{ii} \leq \lambda_n$$
- When all eigenvalues of $A$ are positive, we have
$$\frac{1}{\sqrt{\lambda_n}} \leq \frac{1}{\sqrt{a_{ii}}} \leq \frac{1}{\sqrt{\lambda_1}}$$
- The intercept of the ellipsoid $x^T A x = 1$ along a coordinate axis, $\pm 1/\sqrt{a_{ii}}$, is bounded
Theorem (saddle points of Rayleigh quotient)
Let $A$ be a symmetric matrix and $Q = \{q_1, \dots, q_n\}$ be an orthonormal eigenbasis of $A$, corresponding to eigenvalues
$$\lambda_1 \leq \dots \leq \lambda_n$$
The eigenvectors $q_2, \dots, q_{n-1}$ are saddle points of $R(x)$.
Consider $q_2$ for example, assuming $\lambda_1 < \lambda_2 < \lambda_3$.
- If we move from $q_2$ along $q_1$, $R(x)$ decreases
- If we move from $q_2$ along $q_2$, $R(x)$ does not change
- If we move from $q_2$ along $q_3$, $R(x)$ increases
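The three cases follow from one short computation. As a sketch, take $x = q_2 + t\,q_k$ with $k \neq 2$ and a real step $t$ (symbols introduced here for illustration); orthonormality of the eigenvectors removes the cross terms.

```latex
R(q_2 + t\,q_k)
  = \frac{(q_2 + t\,q_k)^T A \,(q_2 + t\,q_k)}{(q_2 + t\,q_k)^T (q_2 + t\,q_k)}
  = \frac{\lambda_2 + t^2 \lambda_k}{1 + t^2}
  = \lambda_2 + \frac{t^2}{1 + t^2}\,(\lambda_k - \lambda_2)
```

The correction term has the sign of $\lambda_k - \lambda_2$: non-positive for $k = 1$ and non-negative for $k = 3$. Moving along $q_2$ itself only rescales $q_2$, which leaves $R$ unchanged.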
Theorem (Rayleigh quotient in a hyperplane)
Let $A$ be a symmetric matrix with orthonormal eigenvectors $q_1, \dots, q_n$ and eigenvalues $\lambda_1 \leq \dots \leq \lambda_n$.
$$\lambda_2 = \max_v \min_{x \in v^\perp} R(x)$$
Let $v$ be a vector and consider $R(x)$ in the subspace $v^\perp$.
- For $v = q_1$
$$\lambda_2 = \min_{x \in q_1^\perp} R(x)$$
- Given $v$, $R(x)$ can be smaller as $x$ can have a component along $q_1$
$$\lambda_2 \geq \min_{x \in v^\perp} R(x)$$
Thus
$$\lambda_2 = \max_v \min_{x \in v^\perp} R(x)$$

Corollary (Rayleigh quotient in a subspace)
For the maximum in a hyperplane, we have
$$\lambda_{n-1} \leq \max_{x \in v^\perp} R(x)$$
$$\lambda_{n-1} = \min_v \max_{x \in v^\perp} R(x)$$
Let $V$ be a subspace of dimension $j$. We have
$$\lambda_{j+1} = \max_V \min_{x \in V^\perp} R(x)$$
$$\lambda_{n-j} = \min_V \max_{x \in V^\perp} R(x)$$
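Theorem (intertwining of eigenvalues)
Let $A$ be a real symmetric matrix of order $n \times n$ and $B$ be the $(n-1) \times (n-1)$ matrix formed by stripping the last row and column of $A$.
$$\lambda_1(A) \leq \lambda_1(B) \leq \lambda_2(A) \leq \lambda_2(B) \leq \dots \leq \lambda_{n-1}(B) \leq \lambda_n(A)$$

Example (intertwining of eigenvalues)
$$A = \begin{bmatrix} 2 & -1 & 0 \\ -1 & 2 & -1 \\ 0 & -1 & 2 \end{bmatrix}, \quad B = \begin{bmatrix} 2 & -1 \\ -1 & 2 \end{bmatrix}$$
$$\lambda_1(A) = 2 - \sqrt{2}, \quad \lambda_2(A) = 2, \quad \lambda_3(A) = 2 + \sqrt{2}$$
$$\lambda_1(B) = 1, \quad \lambda_2(B) = 3$$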
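The eigenvalues in this example can be confirmed from the characteristic polynomials, and the interlacing then reads off numerically. The decimal values below are rounded to three places.

```latex
\det(B - \lambda I) = (2 - \lambda)^2 - 1 = (\lambda - 1)(\lambda - 3)

\det(A - \lambda I) = (2 - \lambda)\left[(2 - \lambda)^2 - 2\right]

2 - \sqrt{2} \approx 0.586 \;\le\; 1 \;\le\; 2 \;\le\; 3 \;\le\; 2 + \sqrt{2} \approx 3.414
```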