Lectures on Dynamic Systems and Control

Mohammed Dahleh, Munther A. Dahleh, George Verghese
Department of Electrical Engineering and Computer Science
Massachusetts Institute of Technology
Chapter 4

Matrix Norms and Singular Value Decomposition

4.1 Introduction
In this lecture, we introduce the notion of a norm for matrices. The singular value decomposition or SVD of a matrix is then presented. The SVD exposes the 2-norm of a matrix, but its value to us goes much further: it enables the solution of a class of matrix perturbation problems that form the basis for the stability robustness concepts introduced later; it solves the so-called total least squares problem, which is a generalization of the least squares estimation problem considered earlier; and it allows us to clarify the notion of conditioning, in the context of matrix inversion. These applications of the SVD are presented at greater length in the next lecture.
Example 4.1 To provide some immediate motivation for the study and application of matrix norms, we begin with an example that clearly brings out the issue of matrix conditioning with respect to inversion. The question of interest is how sensitive the inverse of a matrix is to perturbations of the matrix.

Consider inverting the matrix
\[
A = \begin{bmatrix} 100 & 100 \\ 100.2 & 100 \end{bmatrix} \tag{4.1}
\]
A quick calculation shows that
\[
A^{-1} = \begin{bmatrix} -5 & 5 \\ 5.01 & -5 \end{bmatrix} \tag{4.2}
\]
Now suppose we invert the perturbed matrix
\[
A + \delta A = \begin{bmatrix} 100 & 100 \\ 100.1 & 100 \end{bmatrix} \tag{4.3}
\]
The result now is
\[
(A + \delta A)^{-1} = A^{-1} + \delta(A^{-1}) = \begin{bmatrix} -10 & 10 \\ 10.01 & -10 \end{bmatrix} \tag{4.4}
\]
Here δA denotes the perturbation in A and δ(A^{-1}) denotes the resulting perturbation in A^{-1}. Evidently a 0.1% change in one entry of A has resulted in a 100% change in the entries of A^{-1}. If we want to solve the problem Ax = b where b = [1  -1]^T, then x = A^{-1}b = [-10  10.01]^T, while after perturbation of A we get x + δx = (A + δA)^{-1}b = [-20  20.01]^T. Again, we see a 100% change in the entries of the solution with only a 0.1% change in the starting data.
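The arithmetic in this example is easy to check. The following Python sketch (our illustration; the notes themselves work by hand or in Matlab) inverts the two 2 x 2 matrices in closed form and reproduces the 100% swing in the inverse:

```python
def inv2(M):
    """Closed-form inverse of a 2x2 matrix M = [[a, b], [c, d]]."""
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

A  = [[100.0, 100.0], [100.2, 100.0]]
dA = [[0.0, 0.0], [-0.1, 0.0]]          # a 0.1% change in one entry of A
Ap = [[A[i][j] + dA[i][j] for j in range(2)] for i in range(2)]

Ainv  = inv2(A)    # ~ [[-5, 5], [5.01, -5]]
Apinv = inv2(Ap)   # ~ [[-10, 10], [10.01, -10]], entries roughly doubled
```

Every entry of the inverse changes by about 100%, exactly as observed above.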
The situation seen in the above example is much worse than what can ever arise in the scalar case. If a is a scalar, then d(a^{-1})/(a^{-1}) = -da/a, so the fractional change in the inverse of a has the same magnitude as the fractional change in a itself. What is seen in the above example, therefore, is a purely matrix phenomenon. It would seem to be related to the fact that A is nearly singular, in the sense that its columns are nearly dependent, its determinant is much smaller than its largest element, and so on. In what follows (see next lecture), we shall develop a sound way to measure nearness to singularity, and show how this measure relates to sensitivity under inversion.

Before understanding such sensitivity to perturbations in more detail, we need ways to measure the "magnitudes" of vectors and matrices. We have already introduced the notion of vector norms in Lecture 1, so we now turn to the definition of matrix norms.
4.2 Matrix Norms

An m x n complex matrix may be viewed as an operator on the (finite dimensional) normed vector space C^n:
\[
A : (\mathbb{C}^n, \|\cdot\|_2) \longrightarrow (\mathbb{C}^m, \|\cdot\|_2) \tag{4.5}
\]
where the norm here is taken to be the standard Euclidean norm. Define the induced 2-norm of A as follows:
\[
\|A\|_2 \triangleq \sup_{x \neq 0} \frac{\|Ax\|_2}{\|x\|_2} \tag{4.6}
\]
\[
= \max_{\|x\|_2 = 1} \|Ax\|_2 . \tag{4.7}
\]
The term "induced" refers to the fact that the definition of a norm for vectors such as Ax and x is what enables the above definition of a matrix norm. From this definition, it follows that the induced norm measures the amount of "amplification" the matrix A provides to vectors on the unit sphere in C^n, i.e. it measures the "gain" of the matrix.

Rather than measuring the vectors x and Ax using the 2-norm, we could use any p-norm, the interesting cases being p = 1, 2, ∞. Our notation for this is
\[
\|A\|_p = \max_{\|x\|_p = 1} \|Ax\|_p . \tag{4.8}
\]
An important question to consider is whether or not the induced norm is actually a norm, in the sense defined for vectors in Lecture 1. Recall the three conditions that define a norm:

1. ||x|| ≥ 0, and ||x|| = 0 ⟺ x = 0;
2. ||αx|| = |α| ||x||;
3. ||x + y|| ≤ ||x|| + ||y||.

Now let us verify that ||A||_p is a norm on C^{m x n}, using the preceding definition:

1. ||A||_p ≥ 0, since ||Ax||_p ≥ 0 for any x. Furthermore, ||A||_p = 0 ⟺ A = 0, since ||A||_p is calculated from the maximum of ||Ax||_p evaluated on the unit sphere.

2. ||αA||_p = |α| ||A||_p follows from ||αy||_p = |α| ||y||_p (for any y).

3. The triangle inequality holds since:
\[
\|A + B\|_p = \max_{\|x\|_p = 1} \|(A + B)x\|_p
\leq \max_{\|x\|_p = 1} \big( \|Ax\|_p + \|Bx\|_p \big)
\leq \|A\|_p + \|B\|_p .
\]
Induced norms have two additional properties that are very important:

1. ||Ax||_p ≤ ||A||_p ||x||_p, which is a direct consequence of the definition of an induced norm;

2. For A ∈ C^{m x n}, B ∈ C^{n x r},
\[
\|AB\|_p \leq \|A\|_p \|B\|_p \tag{4.9}
\]
which is called the submultiplicative property. This also follows directly from the definition:
\[
\|ABx\|_p \leq \|A\|_p \|Bx\|_p \leq \|A\|_p \|B\|_p \|x\|_p \quad \text{for any } x .
\]
Dividing by ||x||_p:
\[
\frac{\|ABx\|_p}{\|x\|_p} \leq \|A\|_p \|B\|_p ,
\]
from which the result follows.

Before we turn to a more detailed study of ideas surrounding the induced 2-norm, which will be the focus of this lecture and the next, we make some remarks about the other induced norms of practical interest, namely the induced 1-norm and induced ∞-norm. We shall also say something about an important matrix norm that is not an induced norm, namely the Frobenius norm.
It is a fairly simple exercise to prove that
\[
\|A\|_1 = \max_{1 \leq j \leq n} \sum_{i=1}^{m} |a_{ij}| \quad \text{(max of absolute column sums of } A\text{)} , \tag{4.10}
\]
and
\[
\|A\|_\infty = \max_{1 \leq i \leq m} \sum_{j=1}^{n} |a_{ij}| \quad \text{(max of absolute row sums of } A\text{)} . \tag{4.11}
\]
(Note that these definitions reduce to the familiar ones for the 1-norm and ∞-norm of column vectors in the case n = 1.)
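Both formulas are straightforward to evaluate. As an illustration (ours, not part of the original notes), the sketch below computes the column-sum and row-sum norms for the matrix A of Example 4.1:

```python
def norm_1(A):
    """Induced 1-norm: maximum absolute column sum, Eq. (4.10)."""
    return max(sum(abs(row[j]) for row in A) for j in range(len(A[0])))

def norm_inf(A):
    """Induced infinity-norm: maximum absolute row sum, Eq. (4.11)."""
    return max(sum(abs(x) for x in row) for row in A)

A = [[100.0, 100.0], [100.2, 100.0]]
print(norm_1(A))    # first column sum 100 + 100.2 dominates
print(norm_inf(A))  # second row sum 100.2 + 100 dominates
```

Both norms come out as 200.2 here; in general the two can differ.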
The proof for the induced ∞-norm involves two stages, namely:

1. Prove that the quantity in Equation (4.11) provides an upper bound γ:
\[
\|Ax\|_\infty \leq \gamma \|x\|_\infty \quad \forall x ;
\]
2. Show that this bound is achievable for some x = x̂:
\[
\|A\hat{x}\|_\infty = \gamma \|\hat{x}\|_\infty \quad \text{for some } \hat{x} .
\]

In order to show how these steps can be implemented, we give the details for the ∞-norm case. Let x ∈ C^n and consider
\[
\|Ax\|_\infty = \max_{1 \leq i \leq m} \Big| \sum_{j=1}^{n} a_{ij} x_j \Big|
\leq \max_{1 \leq i \leq m} \sum_{j=1}^{n} |a_{ij}| \, |x_j|
\leq \Big( \max_{1 \leq i \leq m} \sum_{j=1}^{n} |a_{ij}| \Big) \max_{1 \leq j \leq n} |x_j|
= \Big( \max_{1 \leq i \leq m} \sum_{j=1}^{n} |a_{ij}| \Big) \|x\|_\infty .
\]
The above inequalities show that an upper bound γ is given by
\[
\max_{\|x\|_\infty = 1} \|Ax\|_\infty \leq \gamma = \max_{1 \leq i \leq m} \sum_{j=1}^{n} |a_{ij}| .
\]
Now in order to show that this upper bound is achieved by some vector x̂, let i* be an index at which the expression for γ achieves its maximum, that is, γ = Σ_{j=1}^{n} |a_{i*j}|. Define the vector x̂ as
\[
\hat{x} = \begin{bmatrix} \mathrm{sgn}(a_{i^* 1}) \\ \mathrm{sgn}(a_{i^* 2}) \\ \vdots \\ \mathrm{sgn}(a_{i^* n}) \end{bmatrix} .
\]
Clearly ||x̂||_∞ = 1 and
\[
\|A\hat{x}\|_\infty = \sum_{j=1}^{n} |a_{i^* j}| = \gamma .
\]
The proof for the 1-norm proceeds in exactly the same way, and is left to the reader.
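The achievability step can be checked numerically: pick a maximizing row, build x̂ from the signs of its entries, and confirm that ||Ax̂||_∞ equals the row-sum bound. The sketch below assumes real entries and uses a small made-up matrix of our own:

```python
def sign(t):
    """sgn of a real number: +1, -1, or 0."""
    return (t > 0) - (t < 0)

A = [[1.0, -2.0, 3.0], [-4.0, 5.0, -6.0]]
row_sums = [sum(abs(x) for x in row) for row in A]
gamma = max(row_sums)              # upper bound from Eq. (4.11)
i_star = row_sums.index(gamma)     # a maximizing row index i*
x_hat = [sign(a) for a in A[i_star]]   # entries +/-1, so ||x_hat||_inf = 1
Ax = [sum(a * x for a, x in zip(row, x_hat)) for row in A]
assert max(abs(v) for v in Ax) == gamma   # the bound is attained
```

(For complex entries one would replace sgn(a) by the unit-modulus phase factor that makes each product a_{i*j} x_j nonnegative.)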
There are matrix norms, i.e. functions that satisfy the three defining conditions stated earlier, that are not induced norms. The most important example of this for us is the Frobenius norm:
\[
\|A\|_F \triangleq \Big( \sum_{j=1}^{n} \sum_{i=1}^{m} |a_{ij}|^2 \Big)^{\frac{1}{2}} \tag{4.12}
\]
\[
= \big( \mathrm{trace}\,(A'A) \big)^{\frac{1}{2}} \quad \text{(verify)} \tag{4.13}
\]
In other words, the Frobenius norm is defined as the root sum of squares of the entries, i.e. the usual Euclidean 2-norm of the matrix when it is regarded simply as a vector in C^{mn}. Although it can be shown that it is not an induced matrix norm, the Frobenius norm still has the submultiplicative property that was noted for induced norms. Yet other matrix norms may be defined (some of them without the submultiplicative property), but the ones above are the only ones of interest to us.
4.3 Singular Value Decomposition

Before we discuss the singular value decomposition of matrices, we begin with some matrix facts and definitions.

Some Matrix Facts:

- A matrix U ∈ C^{n x n} is unitary if U'U = UU' = I. Here, as in Matlab, the superscript ' denotes the (entry-by-entry) complex conjugate of the transpose, which is also called the Hermitian transpose or conjugate transpose.

- A matrix U ∈ R^{n x n} is orthogonal if U^T U = U U^T = I, where the superscript ^T denotes the transpose.

- Property: If U is unitary, then ||Ux||_2 = ||x||_2.

- If S = S' (i.e. S equals its Hermitian transpose, in which case we say S is Hermitian), then there exists a unitary matrix U such that U'SU = [diagonal matrix].¹

- For any matrix A, both A'A and AA' are Hermitian, and thus can always be diagonalized by unitary matrices.

- For any matrix A, the eigenvalues of A'A and AA' are always real and non-negative (proved easily by contradiction).
Theorem 4.1 (Singular Value Decomposition, or SVD) Given any matrix A ∈ C^{m x n}, A can be written as
\[
A = \underbrace{U}_{m \times m} \, \underbrace{\Sigma}_{m \times n} \, \underbrace{V'}_{n \times n} \tag{4.14}
\]
where U'U = I, V'V = I,
\[
\Sigma = \begin{bmatrix} \mathrm{diag}(\sigma_1, \ldots, \sigma_r) & 0 \\ 0 & 0 \end{bmatrix} , \tag{4.15}
\]
and σ_i = √(i-th nonzero eigenvalue of A'A). The σ_i are termed the singular values of A, and are arranged in order of descending magnitude, i.e.,
\[
\sigma_1 \geq \sigma_2 \geq \cdots \geq \sigma_r > 0 .
\]
Proof: We will prove this theorem for the case rank(A) = m; the general case involves very little more than what is required for this case. The matrix AA' is Hermitian, and it can therefore be diagonalized by a unitary matrix U ∈ C^{m x m}, so that
\[
U \Sigma_1 U' = AA' .
\]
Note that Σ_1 = diag(λ_1, λ_2, …, λ_m) has real positive diagonal entries λ_i, due to the fact that AA' is positive definite. We can write Σ_1 = Σ_2² = diag(σ_1², σ_2², …, σ_m²). Define V_1' ∈ C^{m x n} by V_1' = Σ_2^{-1} U' A. V_1' has orthonormal rows, as can be seen from the following calculation: V_1'V_1 = Σ_2^{-1} U' AA' U Σ_2^{-1} = I. Choose the matrix V_2' in such a way that
\[
V' = \begin{bmatrix} V_1' \\ V_2' \end{bmatrix}
\]
is n x n and unitary. Define the m x n matrix Σ = [Σ_2 | 0]. This implies that
\[
\Sigma V' = \Sigma_2 V_1' = U'A .
\]
In other words, we have A = UΣV'.

¹One cannot always diagonalize an arbitrary matrix; cf. the Jordan form.
Example 4.2 For the matrix A given at the beginning of this lecture, the SVD, computed easily in Matlab by writing [u, s, v] = svd(A), is
\[
A = \begin{bmatrix} .7068 & .7075 \\ .7075 & -.7068 \end{bmatrix}
\begin{bmatrix} 200.1 & 0 \\ 0 & 0.1 \end{bmatrix}
\begin{bmatrix} .7075 & .7068 \\ -.7068 & .7075 \end{bmatrix} \tag{4.16}
\]
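For a 2 x 2 real matrix the singular values can also be computed by hand: they are the square roots of the eigenvalues of AᵀA, which solve a quadratic. The sketch below (pure Python, standing in for the Matlab call above) recovers the values 200.1 and 0.1 quoted in (4.16):

```python
import math

A = [[100.0, 100.0], [100.2, 100.0]]
# Form M = A^T A, a 2x2 symmetric matrix
m11 = A[0][0] ** 2 + A[1][0] ** 2
m12 = A[0][0] * A[0][1] + A[1][0] * A[1][1]
m22 = A[0][1] ** 2 + A[1][1] ** 2
# Eigenvalues of M from its characteristic quadratic
tr, det = m11 + m22, m11 * m22 - m12 * m12
disc = math.sqrt(tr * tr - 4 * det)
sigma1 = math.sqrt((tr + disc) / 2)   # largest singular value, about 200.1
sigma2 = math.sqrt((tr - disc) / 2)   # smallest singular value, about 0.1
```

Note also that sigma1 * sigma2 equals |det(A)| = 20, and sigma1² + sigma2² equals the squared Frobenius norm of A, both useful sanity checks.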
Observations:

i)
\[
AA' = U \Sigma V' V \Sigma^T U' = U \Sigma \Sigma^T U'
= U \begin{bmatrix} \mathrm{diag}(\sigma_1^2, \ldots, \sigma_r^2) & 0 \\ 0 & 0 \end{bmatrix} U' , \tag{4.17}
\]
which tells us U diagonalizes AA';

ii)
\[
A'A = V \Sigma^T U' U \Sigma V' = V \Sigma^T \Sigma V'
= V \begin{bmatrix} \mathrm{diag}(\sigma_1^2, \ldots, \sigma_r^2) & 0 \\ 0 & 0 \end{bmatrix} V' , \tag{4.18}
\]
which tells us V diagonalizes A'A;
iii) If U and V are expressed in terms of their columns, i.e.,
\[
U = \begin{bmatrix} u_1 & u_2 & \cdots & u_m \end{bmatrix}
\quad \text{and} \quad
V = \begin{bmatrix} v_1 & v_2 & \cdots & v_n \end{bmatrix} ,
\]
then
\[
A = \sum_{i=1}^{r} \sigma_i u_i v_i' , \tag{4.19}
\]
which is another way to write the SVD. The u_i are termed the left singular vectors of A, and the v_i are its right singular vectors. From this we see that we can alternately interpret Ax as
\[
Ax = \sum_{i=1}^{r} \sigma_i u_i \underbrace{(v_i' x)}_{\text{projection}} , \tag{4.20}
\]
which is a weighted sum of the u_i, where the weights are the products of the singular values and the projections of x onto the v_i.
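The dyadic expansion (4.19) can be checked against Example 4.2: summing σ_i u_i v_i' over i = 1, 2 with the rounded factors printed in (4.16) should reproduce A to within the accuracy of the displayed digits. A sketch (the four-digit numbers are copied from the example, so agreement is only approximate):

```python
sigmas = [200.1, 0.1]
U = [[0.7068, 0.7075], [0.7075, -0.7068]]   # columns are u1, u2
V = [[0.7075, -0.7068], [0.7068, 0.7075]]   # columns are v1, v2
A = [[100.0, 100.0], [100.2, 100.0]]

# rebuilt[i][j] = sum_k sigma_k * u_k[i] * v_k[j], i.e. Eq. (4.19)
rebuilt = [[sum(sigmas[k] * U[i][k] * V[j][k] for k in range(2))
            for j in range(2)] for i in range(2)]
assert all(abs(rebuilt[i][j] - A[i][j]) < 0.05
           for i in range(2) for j in range(2))
```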
Observation (iii) tells us that Ra(A) = span{u_1, …, u_r} (because Ax = Σ_{i=1}^{r} c_i u_i, where the c_i are scalar weights). Since the columns of U are independent, dim Ra(A) = r = rank(A), and {u_1, …, u_r} constitute a basis for the range space of A. The null space of A is given by span{v_{r+1}, …, v_n}. To see this:
\[
U \Sigma V' x = 0 \iff \Sigma V' x = 0
\iff \begin{bmatrix} \sigma_1 v_1' x \\ \vdots \\ \sigma_r v_r' x \end{bmatrix} = 0
\iff v_i' x = 0 , \; i = 1, \ldots, r
\iff x \in \mathrm{span}\{ v_{r+1}, \ldots, v_n \} .
\]
Example 4.3 One application of singular value decomposition is to the solution of a system of algebraic equations. Suppose A is an m x n complex matrix and b is a vector in C^m. Assume that the rank of A is equal to k, with k < m. We are looking for a solution of the linear system Ax = b. By applying the singular value decomposition procedure to A, we get
\[
A = U \Sigma V' = U \begin{bmatrix} \Sigma_1 & 0 \\ 0 & 0 \end{bmatrix} V'
\]
where Σ_1 is a k x k non-singular diagonal matrix. We will express the unitary matrices U and V columnwise as
\[
U = \begin{bmatrix} u_1 & u_2 & \ldots & u_m \end{bmatrix} ,
\quad
V = \begin{bmatrix} v_1 & v_2 & \ldots & v_n \end{bmatrix} .
\]
A necessary and sufficient condition for the solvability of this system of equations is that u_i'b = 0 for all i satisfying k < i ≤ m. Otherwise, the system of equations is inconsistent. This condition means that the vector b must be orthogonal to the last m − k columns of U. Therefore the system of linear equations can be written as
\[
\begin{bmatrix} \Sigma_1 & 0 \\ 0 & 0 \end{bmatrix} V' x = U' b
= \begin{bmatrix} u_1' b \\ u_2' b \\ \vdots \\ u_k' b \\ 0 \\ \vdots \\ 0 \end{bmatrix} .
\]
Using the above equation and the invertibility of Σ_1, we can rewrite the system of equations as
\[
\begin{bmatrix} v_1' \\ v_2' \\ \vdots \\ v_k' \end{bmatrix} x
= \begin{bmatrix} \frac{1}{\sigma_1} u_1' b \\ \frac{1}{\sigma_2} u_2' b \\ \vdots \\ \frac{1}{\sigma_k} u_k' b \end{bmatrix} .
\]
By using the fact that
\[
\begin{bmatrix} v_1' \\ v_2' \\ \vdots \\ v_k' \end{bmatrix}
\begin{bmatrix} v_1 & v_2 & \ldots & v_k \end{bmatrix} = I ,
\]
we obtain a solution of the form
\[
x = \sum_{i=1}^{k} \frac{1}{\sigma_i} (u_i' b) \, v_i .
\]
From the observations that were made earlier, we know that the vectors v_{k+1}, v_{k+2}, …, v_n span the kernel of A, and therefore a general solution of the system of linear equations is given by
\[
x = \sum_{i=1}^{k} \frac{1}{\sigma_i} (u_i' b) \, v_i + \sum_{i=k+1}^{n} \alpha_i v_i ,
\]
where the coefficients α_i, with i in the interval k + 1 ≤ i ≤ n, are arbitrary complex numbers.
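When A is square and full rank (k = n), the kernel term disappears and this formula gives the unique solution. A sketch using the rounded SVD factors of Example 4.2 and the right-hand side b = [1, -1]ᵀ of Example 4.1 (approximate, since the displayed factors carry only four digits):

```python
sigmas = [200.1, 0.1]
U = [[0.7068, 0.7075], [0.7075, -0.7068]]   # columns are u1, u2
V = [[0.7075, -0.7068], [0.7068, 0.7075]]   # columns are v1, v2
b = [1.0, -1.0]

# x = sum_i (1/sigma_i) (u_i' b) v_i
coeff = [sum(U[r][i] * b[r] for r in range(2)) / sigmas[i] for i in range(2)]
x = [sum(coeff[i] * V[r][i] for i in range(2)) for r in range(2)]
# should approximate the exact solution [-10, 10.01] from Example 4.1
```

The small singular value σ_2 = 0.1 amplifies the rounding in the printed factors by a factor of ten, another face of the conditioning issue raised in Example 4.1.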
4.4 Relationship to Matrix Norms

The singular value decomposition can be used to compute the induced 2-norm of a matrix A.

Theorem 4.2
\[
\|A\|_2 \triangleq \sup_{x \neq 0} \frac{\|Ax\|_2}{\|x\|_2}
= \sigma_1 = \sigma_{\max}(A) , \tag{4.21}
\]
which tells us that the maximum amplification is given by the maximum singular value.
Proof:
\[
\sup_{x \neq 0} \frac{\|Ax\|_2}{\|x\|_2}
= \sup_{x \neq 0} \frac{\|U \Sigma V' x\|_2}{\|x\|_2}
= \sup_{x \neq 0} \frac{\|\Sigma V' x\|_2}{\|x\|_2}
= \sup_{y \neq 0} \frac{\|\Sigma y\|_2}{\|V y\|_2}
= \sup_{y \neq 0} \frac{\Big( \sum_{i=1}^{r} \sigma_i^2 |y_i|^2 \Big)^{\frac{1}{2}}}{\Big( \sum_{i=1}^{r} |y_i|^2 \Big)^{\frac{1}{2}}}
= \sigma_1 .
\]
For y = [1  0  ⋯  0]^T, ||Σy||_2 = σ_1, and the supremum is attained. (Notice that this corresponds to x = v_1. Hence, Av_1 = σ_1 u_1.)
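The final remark, Av_1 = σ_1 u_1, can be checked with the numbers of Example 4.2: applying A to the first right singular vector yields a vector of length about σ_1 = 200.1, and no unit vector is amplified more. A sketch with the rounded printed factors:

```python
import math

A = [[100.0, 100.0], [100.2, 100.0]]
v1 = [0.7075, 0.7068]            # first right singular vector (rounded)
Av1 = [sum(A[i][j] * v1[j] for j in range(2)) for i in range(2)]
gain = math.sqrt(sum(t * t for t in Av1))   # ||A v1||_2, near sigma1 = 200.1
assert abs(gain - 200.1) < 0.1
# no unit vector is amplified more: spot-check directions around the circle
for theta in [0.1 * k for k in range(63)]:
    x = [math.cos(theta), math.sin(theta)]
    Ax = [sum(A[i][j] * x[j] for j in range(2)) for i in range(2)]
    assert math.sqrt(sum(t * t for t in Ax)) <= gain + 0.1
```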
Another application of the singular value decomposition is in computing the minimal amplification a full rank matrix exerts on elements with 2-norm equal to 1.

Theorem 4.3 Given A ∈ C^{m x n}, suppose rank(A) = n. Then
\[
\min_{\|x\|_2 = 1} \|Ax\|_2 = \sigma_n(A) . \tag{4.22}
\]
Note that if rank(A) < n, then there is an x such that the minimum is zero (rewrite A in terms of its SVD to see this).
Proof: For any ||x||_2 = 1,
\[
\|Ax\|_2 = \|U \Sigma V' x\|_2
= \|\Sigma V' x\|_2 \quad \text{(invariant under multiplication by unitary matrices)}
= \|\Sigma y\|_2
\]
for y = V'x. Now
\[
\|\Sigma y\|_2^2 = \sum_{i=1}^{n} |\sigma_i y_i|^2 \geq \sigma_n^2 .
\]
Note that the minimum is achieved for y = [0  0  ⋯  0  1]^T; thus the proof is complete.

Figure 4.1: Graphical depiction of the mapping involving A ∈ C^{2 x 2}. Note that Av_1 = σ_1 u_1 and that Av_2 = σ_2 u_2.
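Dually to Theorem 4.2, the minimum is realized along the last right singular vector: with the numbers of Example 4.2, ||Av_2||_2 comes out near σ_2 = 0.1 even though every entry of A is about 100. A sketch with the rounded factors:

```python
import math

A = [[100.0, 100.0], [100.2, 100.0]]
v2 = [-0.7068, 0.7075]           # last right singular vector (rounded)
Av2 = [sum(A[i][j] * v2[j] for j in range(2)) for i in range(2)]
gain = math.sqrt(sum(t * t for t in Av2))   # ||A v2||_2, near sigma2 = 0.1
assert abs(gain - 0.1) < 0.01
```

The near-cancellation along v_2 is exactly the near-singularity of A observed in Example 4.1.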
The Frobenius norm can also be expressed quite simply in terms of the singular values. We leave you to verify that
\[
\|A\|_F \triangleq \Big( \sum_{j=1}^{n} \sum_{i=1}^{m} |a_{ij}|^2 \Big)^{\frac{1}{2}}
= \big( \mathrm{trace}\,(A'A) \big)^{\frac{1}{2}}
= \Big( \sum_{i=1}^{r} \sigma_i^2 \Big)^{\frac{1}{2}} . \tag{4.23}
\]
Example 4.4 Matrix Inequality

We say A ≤ B, for two square matrices A and B, if
\[
x'Ax \leq x'Bx \quad \text{for all } x \neq 0 .
\]
It follows that for any matrix A, not necessarily square,
\[
\|A\|_2 \leq \gamma \iff A'A \leq \gamma^2 I .
\]
Exercises

Exercise 4.1 Verify that for any A, an m x n matrix, the following holds:
\[
\frac{1}{\sqrt{n}} \|A\|_\infty \leq \|A\|_2 \leq \sqrt{m} \, \|A\|_\infty .
\]

Exercise 4.2 Suppose A' = A. Find the exact relation between the eigenvalues and singular values of A. Does this hold if A is not conjugate symmetric?

Exercise 4.3 Show that if rank(A) = 1, then ||A||_2 = ||A||_F.
Exercise 4.4 This problem leads you through the argument for the existence of the SVD, using an iterative construction. Showing that A = UΣV', where U and V are unitary matrices, is equivalent to showing that U'AV = Σ.

a) Argue from the definition of ||A||_2 that there exist unit vectors (measured in the 2-norm) x ∈ C^n and y ∈ C^m such that Ax = σy, where σ = ||A||_2.

b) We can extend both x and y above to orthonormal bases, i.e. we can find unitary matrices V_1 and U_1 whose first columns are x and y respectively:
\[
V_1 = \begin{bmatrix} x & \tilde{V}_1 \end{bmatrix} , \quad U_1 = \begin{bmatrix} y & \tilde{U}_1 \end{bmatrix} .
\]
Show that one way to do this is via Householder transformations, as follows:
\[
V_1 = I - 2 \, \frac{h h'}{h' h} , \quad h = x - [1, 0, \ldots, 0]' ,
\]
and likewise for U_1.

c) Now define A_1 = U_1' A V_1. Why is ||A_1||_2 = ||A||_2?

d) Note that
\[
A_1 = \begin{bmatrix} y'Ax & y'A\tilde{V}_1 \\ \tilde{U}_1'Ax & \tilde{U}_1'A\tilde{V}_1 \end{bmatrix}
= \begin{bmatrix} \sigma & w' \\ 0 & B \end{bmatrix} .
\]
What is the justification for claiming that the lower left element in the above matrix is 0?

e) Now show that
\[
\Big\| A_1 \begin{bmatrix} \sigma \\ w \end{bmatrix} \Big\|_2 \geq \sigma^2 + w'w ,
\]
and combine this with the fact that ||A_1||_2 = ||A||_2 = σ to deduce that w = 0, so
\[
A_1 = \begin{bmatrix} \sigma & 0 \\ 0 & B \end{bmatrix} .
\]
At the next iteration, we apply the above procedure to B, and so on. When the iterations terminate, we have the SVD.

[The reason that this is only an existence proof and not an algorithm is that it begins by invoking the existence of x and y, but does not show how to compute them. Very good algorithms do exist for computing the SVD; see Golub and Van Loan's classic, Matrix Computations, Johns Hopkins Press, 1989. The SVD is a cornerstone of numerical computations in a host of applications.]
Exercise 4.5 Suppose the m x n matrix A is decomposed in the form
\[
A = U \begin{bmatrix} \Sigma & 0 \\ 0 & 0 \end{bmatrix} V'
\]
where U and V are unitary matrices, and Σ is an invertible r x r matrix (the SVD could be used to produce such a decomposition). Then the "Moore-Penrose inverse", or pseudo-inverse, of A, denoted by A^+, can be defined as the n x m matrix
\[
A^+ = V \begin{bmatrix} \Sigma^{-1} & 0 \\ 0 & 0 \end{bmatrix} U' .
\]
(You can invoke it in Matlab with pinv(A).)

a) Show that A^+A and AA^+ are symmetric, and that AA^+A = A and A^+AA^+ = A^+. (These four conditions actually constitute an alternative definition of the pseudo-inverse.)

b) Show that when A has full column rank then A^+ = (A'A)^{-1}A', and that when A has full row rank then A^+ = A'(AA')^{-1}.

c) Show that, of all x that minimize ||y - Ax||_2 (and there will be many, if A does not have full column rank), the one with smallest length ||x||_2 is given by x̂ = A^+y.
Exercise 4.6 All the matrices in this problem are real. Suppose
\[
A = Q \begin{bmatrix} R \\ 0 \end{bmatrix}
\]
with Q being an m x m orthogonal matrix and R an n x n invertible matrix. (Recall that such a decomposition exists for any matrix A that has full column rank.) Also let Y be an m x p matrix of the form
\[
Y = Q \begin{bmatrix} Y_1 \\ Y_2 \end{bmatrix}
\]
where the partitioning in the expression for Y is conformable with the partitioning for A.

(a) What choice X̂ of the n x p matrix X minimizes the Frobenius norm, or equivalently the squared Frobenius norm, of Y - AX? In other words, find
\[
\hat{X} = \operatorname{argmin} \, \|Y - AX\|_F^2 .
\]
Also determine the value of ||Y - AX̂||_F^2. (Your answers should be expressed in terms of the matrices Q, R, Y_1 and Y_2.)

(b) Can your X̂ in (a) also be written as (A'A)^{-1}A'Y? Can it be written as A^+Y, where A^+ denotes the (Moore-Penrose) pseudo-inverse of A?

(c) Now obtain an expression for the choice X̄ of X that minimizes
\[
\|Y - AX\|_F^2 + \|Z - BX\|_F^2
\]
where Z and B are given matrices of appropriate dimensions. (Your answer can be expressed in terms of A, B, Y, and Z.)
Exercise 4.7 Structured Singular Values

Given a complex square matrix A, define the structured singular value function as follows:
\[
\mu_{\Delta}(A) = \frac{1}{\min_{\Delta \in \mathbf{\Delta}} \{ \sigma_{\max}(\Delta) \mid \det(I - \Delta A) = 0 \}}
\]
where 𝚫 is some set of matrices.

a) If 𝚫 = {αI : α ∈ C}, show that μ_Δ(A) = ρ(A), where ρ is the spectral radius of A, defined as ρ(A) = max_i |λ_i|, and the λ_i's are the eigenvalues of A.

b) If 𝚫 = {Δ ∈ C^{n x n}}, show that μ_Δ(A) = σ_max(A).

c) If 𝚫 = {diag(δ_1, …, δ_n) | δ_i ∈ C}, show that
\[
\rho(A) \leq \mu_{\Delta}(A) = \mu_{\Delta}(D^{-1} A D) \leq \sigma_{\max}(D^{-1} A D)
\]
where
\[
D \in \{ \mathrm{diag}(d_1, \ldots, d_n) \mid d_i > 0 \} .
\]
Exercise 4.8 Consider again the structured singular value function of a complex square matrix A defined in the preceding problem. If Δ has more structure, it is sometimes possible to compute μ_Δ(A) exactly. In this problem, assume A is a rank-one matrix, so that we can write A = uv', where u, v are complex vectors of dimension n. Compute μ_Δ(A) when

(a) Δ = diag(δ_1, …, δ_n), δ_i ∈ C;

(b) Δ = diag(δ_1, …, δ_n), δ_i ∈ R.

To simplify the computation, minimize the Frobenius norm of Δ in the definition of μ_Δ(A).

MIT OpenCourseWare
http://ocw.mit.edu

6.241J / 16.338J Dynamic Systems and Control
Spring 2011

For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.