Inner Products and Norms (part III)
Prof. Dan A. Simovici
UMB
Outline
1 Approximating Subspaces
2 Gram Matrices
3 The Gram-Schmidt Orthogonalization Algorithm
4 QR Decomposition of Matrices
5 Gram-Schmidt Algorithm in R
Approximating Subspaces
Definition. A subspace T of an inner product linear space L is an approximating subspace if for every x ∈ L there is a unique element of T that is closest to x.
Theorem Let T be a subspace in the inner product space L. If x ∈ L and t ∈ T , then x − t ∈ T ⊥ if and only if t is the unique element of T closest to x.
Proof.
Suppose that x − t ∈ T⊥. Then, for any u ∈ T with u ≠ t, we have
\[
\| x - u \|^2 = \| (x-t) + (t-u) \|^2 = \| x - t \|^2 + \| t - u \|^2,
\]
by observing that x − t ∈ T⊥ and t − u ∈ T and applying Pythagoras' Theorem to x − t and t − u. Therefore, we have \( \| x - u \|^2 > \| x - t \|^2 \), so t is the unique element of T closest to x.
Proof (cont'd).
Conversely, suppose that t is the unique element of T closest to x and x − t ∉ T⊥, that is, there exists u ∈ T such that (x − t, u) ≠ 0. This implies, of course, that u ≠ 0_L. We have
\[
\| x - (t + au) \|^2 = \| x - t - au \|^2 = \| x - t \|^2 - 2(x-t, au) + |a|^2 \| u \|^2.
\]
Since \( \| x - (t+au) \|^2 > \| x - t \|^2 \) (by the definition of t), we have \( -2(x-t, au) + |a|^2 \| u \|^2 > 0 \) for every a ∈ F. For \( a = \frac{1}{\|u\|^2}(x-t, u) \) we have:
\[
-2\left(x-t, \frac{(x-t,u)}{\|u\|^2}\, u\right) + \left|\frac{(x-t,u)}{\|u\|^2}\right|^2 \|u\|^2
= -2\,\frac{|(x-t,u)|^2}{\|u\|^2} + \frac{|(x-t,u)|^2}{\|u\|^2}
= -\frac{|(x-t,u)|^2}{\|u\|^2} < 0,
\]
which is a contradiction.
Theorem. A subspace T of an inner product linear space L is an approximating subspace of L if and only if L = T ⊕ T⊥.
Proof.
Let T be an approximating subspace of L and let x ∈ L. We have x − t ∈ T⊥, where t is the element of T that best approximates x. If y = x − t, we can write x uniquely as x = t + y, where t ∈ T and y ∈ T⊥, so L = T ⊕ T⊥.
Conversely, suppose that L = T ⊕ T⊥, where T is a subspace of L. Every x ∈ L can be uniquely written as x = t + y, where t ∈ T and y ∈ T⊥, so x − t ∈ T⊥. Thus t is the element of T that is closest to x, so T is an approximating subspace of L.
Theorem Any subspace T of a finite-dimensional inner product linear space L is an approximating subspace of L.
Proof.
Let T be a subspace of L. It suffices to show that L = T ⊕ T⊥. If T = {0_L}, then T⊥ = L and the statement is immediate. Therefore, we can assume that T ≠ {0_L}. We need to verify only that every x ∈ L can be uniquely written as a sum x = t + v, where t ∈ T and v ∈ T⊥. Let t1,...,tm be an orthonormal basis of T, that is, a basis such that
\[
(t_i, t_j) = \begin{cases} 1 & \text{if } i = j, \\ 0 & \text{otherwise,} \end{cases}
\]
for 1 ≤ i, j ≤ m. Define t = (x, t1)t1 + ··· + (x, tm)tm and v = x − t.
Proof (cont'd).
The vector v is orthogonal to every vector ti because
(v, ti ) = (x − t, ti ) = (x, ti ) − (t, ti ) = 0.
Therefore v ∈ T⊥ and x has the desired decomposition. To prove that the decomposition is unique, suppose that x = s + w, where s ∈ T and w ∈ T⊥. Since s + w = t + v we have s − t = v − w ∈ T ∩ T⊥ = {0_L}, which implies s = t and w = v.
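The decomposition in the proof is directly computable. The sketch below, in Python with plain lists as vectors and hypothetical helper names (`dot`, `project`), builds the closest point t = (x, t1)t1 + ··· + (x, tm)tm for an orthonormal family and checks that the residual is orthogonal to the subspace:

```python
def dot(u, v):
    # standard inner product on R^n
    return sum(a * b for a, b in zip(u, v))

def project(x, onb):
    # t = (x, t1)t1 + ... + (x, tm)tm for an orthonormal family onb
    t = [0.0] * len(x)
    for ti in onb:
        c = dot(x, ti)
        t = [a + c * b for a, b in zip(t, ti)]
    return t

# T = span{e1, e2} in R^3; the closest point to x in T drops the third coordinate
onb = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
x = [3.0, 4.0, 5.0]
t = project(x, onb)                       # [3.0, 4.0, 0.0]
v = [a - b for a, b in zip(x, t)]         # residual x - t
print(t, [dot(v, ti) for ti in onb])      # residual is orthogonal to T
```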
Theorem. Let T be a subspace of a finite-dimensional inner product space L. We have (T⊥)⊥ = T.
Proof.
Observe that T ⊆ (T ⊥)⊥. Indeed, if t ∈ T , then (t, z) = 0 for every z ∈ T ⊥, so t ∈ (T ⊥)⊥. To prove the reverse inclusion, let x ∈ (T ⊥)⊥. We can write x = u + v, where u ∈ T and v ∈ T ⊥, so x − u = v ∈ T ⊥. Since T ⊆ (T ⊥)⊥, we have u ∈ (T ⊥)⊥, so x − u ∈ (T ⊥)⊥. Consequently, x − u ∈ T ⊥ ∩ (T ⊥)⊥ = {0}, so x = u ∈ T . Thus, (T ⊥)⊥ ⊆ T , which concludes the argument.
Corollary. Let Z be a subset of R^n. We have (Z⊥)⊥ = ⟨Z⟩.
Proof.
Let Z be a subset of R^n. Since Z ⊆ ⟨Z⟩, it follows that ⟨Z⟩⊥ ⊆ Z⊥. Let now y ∈ Z⊥ and let z = a1 z1 + ··· + ap zp ∈ ⟨Z⟩, where z1,...,zp ∈ Z. Since (y, z) = a1(y, z1) + ··· + ap(y, zp) = 0, it follows that y ∈ ⟨Z⟩⊥. Thus, we have Z⊥ = ⟨Z⟩⊥.
This allows us to write (Z⊥)⊥ = (⟨Z⟩⊥)⊥. Since ⟨Z⟩ is a subspace of R^n, we have (⟨Z⟩⊥)⊥ = ⟨Z⟩, so (Z⊥)⊥ = ⟨Z⟩.
Let W = {w1,..., wn} be a basis in the real n-dimensional inner product space L. If x = x1w1 + ··· + xnwn and y = y1w1 + ··· + ynwn, then
\[
(x, y) = \sum_{i=1}^{n} \sum_{j=1}^{n} x_i y_j (w_i, w_j),
\]
due to the bilinearity of the inner product. Let A = (a_{ij}) ∈ R^{n×n} be the matrix defined by a_{ij} = (w_i, w_j) for 1 ≤ i, j ≤ n. The symmetry of the inner product implies that the matrix A itself is symmetric. Now, the inner product can be expressed as
\[
(x, y) = (x_1, \dots, x_n)\, A \begin{pmatrix} y_1 \\ \vdots \\ y_n \end{pmatrix}.
\]
We refer to A as the matrix associated with W .
Theorem. Let S be a subspace of R^n such that dim(S) = k. There exists a matrix A ∈ R^{n×k} having orthonormal columns such that S = Ran(A).
Proof.
Let v1,...,vk be an orthonormal basis of S and define the matrix A = (v1,...,vk). We have x ∈ S if and only if x = a1 v1 + ··· + ak vk for some a ∈ R^k, which is equivalent to x = Aa. This amounts to x ∈ Ran(A), so S = Ran(A).
For an orthonormal basis of an n-dimensional space, the associated matrix is the identity matrix I_n. In this case, we have
(x, y) = x1 y1 + x2 y2 + ··· + xn yn for x, y ∈ L. Observe that if W = {w1,...,wn} is an orthonormal set and x ∈ ⟨W⟩, which means that x = a1 w1 + ··· + an wn, then a_i = (x, w_i) for 1 ≤ i ≤ n.
Let W = {w1,...,wn} be an orthonormal set and let x ∈ ⟨W⟩. The equality x = (x, w1)w1 + ··· + (x, wn)wn is the Fourier expansion of x with respect to the orthonormal set W. Furthermore, we have Parseval's equality:
\[
\| x \|^2 = (x, x) = \sum_{i=1}^{n} (x, w_i)^2.
\]
Thus, if 1 ≤ q ≤ n we have
\[
\sum_{i=1}^{q} (x, w_i)^2 \le \| x \|^2.
\]
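Parseval's equality is easy to verify numerically. A minimal Python sketch (the helper `dot` and the chosen basis are ours, not from the slides) uses the standard basis of R^2 rotated by 45 degrees:

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

# an orthonormal basis of R^2 (the standard basis rotated by 45 degrees)
w1 = [1 / math.sqrt(2),  1 / math.sqrt(2)]
w2 = [1 / math.sqrt(2), -1 / math.sqrt(2)]

x = [3.0, 1.0]
coeffs = [dot(x, w) for w in (w1, w2)]   # Fourier coefficients (x, wi)

# Parseval: ||x||^2 equals the sum of squared Fourier coefficients
lhs = dot(x, x)
rhs = sum(c * c for c in coeffs)
print(lhs, rhs)
```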
Gram Matrices
Definition
Let V = (v1,...,vm) be a sequence of vectors in an inner product space. The Gram matrix of this sequence is the matrix G_V = (g_{ij}) ∈ R^{m×m} defined by g_{ij} = (v_i, v_j) for 1 ≤ i, j ≤ m.
Note that GV is a symmetric matrix.
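A Gram matrix is straightforward to compute from the definition; the following Python sketch (plain lists as vectors, helper names `dot` and `gram` are ours) also makes the symmetry visible:

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def gram(vectors):
    # G[i][j] = (v_i, v_j); the result is symmetric by construction
    return [[dot(vi, vj) for vj in vectors] for vi in vectors]

V = [[1.0, 0.0, 1.0], [1.0, 3.0, 0.0]]
G = gram(V)
print(G)   # [[2.0, 1.0], [1.0, 10.0]]
```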
Theorem
Let V = (v1,..., vm) be a sequence of m vectors in an inner product linear space (L, (·, ·)). If {v1,..., vm} is linearly independent, then the Gram matrix GV is positive definite.
Proof.
Suppose that V is linearly independent and let x ∈ R^m. We have
\[
x' G_V x = \sum_{i=1}^{m} \sum_{j=1}^{m} x_i (v_i, v_j) x_j
= \left( \sum_{i=1}^{m} x_i v_i, \sum_{j=1}^{m} x_j v_j \right)
= \left\| \sum_{i=1}^{m} x_i v_i \right\|^2 \ge 0.
\]
Therefore, if x' G_V x = 0, we have x1 v1 + ··· + xm vm = 0. Since {v1,...,vm} is linearly independent it follows that x1 = ··· = xm = 0, so x = 0_m. Thus, G_V is indeed positive definite.
Example. Let S = {x1,...,xn} be a finite set, L a C-linear space, and L^S the linear space that consists of the functions defined on S with values in L. We define a linear basis {e1,...,en} of this space consisting of the functions
\[
e_i(x) = \begin{cases} 1 & \text{if } x = x_i, \\ 0 & \text{otherwise,} \end{cases}
\]
for 1 ≤ i ≤ n. If E = (e1,...,en), the Gram matrix of E is positive definite and the inner product of two functions \( f = \sum_{i=1}^{n} a_i e_i \) and \( g = \sum_{j=1}^{n} b_j e_j \) is
\[
(f, g) = \left( \sum_{i=1}^{n} a_i e_i, \sum_{j=1}^{n} b_j e_j \right)
= \sum_{i=1}^{n} \sum_{j=1}^{n} a_i (e_i, e_j) b_j.
\]
The Gram matrix of an arbitrary sequence is positive semidefinite, as the reader can easily verify.
Definition
Let V = (v1,..., vm) be a sequence of m elements of an inner product real linear space. The Gramian of V is the number det(GV).
Theorem
If V = (v1,...,vm) is a sequence of elements in an inner product real linear space, then V is linearly independent if and only if det(G_V) ≠ 0.
Proof.
Suppose that det(G_V) ≠ 0 and that V is not linearly independent. In other words, there exist numbers a1,...,am, at least one of which is not 0, such that a1 v1 + ··· + am vm = 0. This implies the equalities
\[
a_1 (v_1, v_j) + \dots + a_m (v_m, v_j) = 0
\]
for 1 ≤ j ≤ m, so the system G_V a = 0_m has a non-trivial solution a1,...,am. This implies det(G_V) = 0, which contradicts the initial assumption.
Conversely, suppose that V is linearly independent and det(G_V) = 0. Then, the linear system
\[
a_1 (v_1, v_j) + \dots + a_m (v_m, v_j) = 0,
\]
for 1 ≤ j ≤ m, has a non-trivial solution a1,...,am. If w = a1 v1 + ··· + am vm, this amounts to (w, v_i) = 0 for 1 ≤ i ≤ m. This, in turn, implies (w, w) = 0, so w = 0, which contradicts the linear independence of V.
If L is an inner product complex linear space, then the Gram matrix of V is a Hermitian matrix.
The next theorem shows that every positive definite matrix A ∈ C^{n×n} can be regarded as a Gram matrix of a vector sequence. It has a very important counterpart in the framework of Hilbert spaces known as Mercer's Theorem.
Theorem (Cholesky's Decomposition Theorem). Let A ∈ C^{n×n} be a Hermitian positive definite matrix. There exists a unique upper triangular matrix R with real, positive diagonal elements such that A = R^H R.
Proof.
The argument is by induction on n ≥ 1. The base step, n = 1, is immediate. Suppose that a decomposition exists for all Hermitian positive definite matrices of order n, and let A ∈ C^{(n+1)×(n+1)} be a Hermitian positive definite matrix. We can write
\[
A = \begin{pmatrix} a_{11} & \mathbf{a}^H \\ \mathbf{a} & B \end{pmatrix},
\]
where B ∈ C^{n×n}. Note that a_{11} > 0 and B is a Hermitian positive definite matrix. It is easy to verify the identity:
\[
A = \begin{pmatrix} \sqrt{a_{11}} & 0^H \\ \frac{1}{\sqrt{a_{11}}}\mathbf{a} & I_n \end{pmatrix}
\begin{pmatrix} 1 & 0^H \\ 0 & B - \frac{1}{a_{11}}\mathbf{a}\mathbf{a}^H \end{pmatrix}
\begin{pmatrix} \sqrt{a_{11}} & \frac{1}{\sqrt{a_{11}}}\mathbf{a}^H \\ 0 & I_n \end{pmatrix}.
\]
Proof (cont'd).
Let R1 ∈ C^{(n+1)×(n+1)} be the upper triangular non-singular matrix
\[
R_1 = \begin{pmatrix} \sqrt{a_{11}} & \frac{1}{\sqrt{a_{11}}}\mathbf{a}^H \\ 0 & I_n \end{pmatrix}.
\]
This allows us to write
\[
A = R_1^H \begin{pmatrix} 1 & 0^H \\ 0 & A_1 \end{pmatrix} R_1,
\]
where \( A_1 = B - \frac{1}{a_{11}}\mathbf{a}\mathbf{a}^H \). Since
\[
\begin{pmatrix} 1 & 0^H \\ 0 & A_1 \end{pmatrix} = (R_1^H)^{-1} A R_1^{-1},
\]
the matrix on the left-hand side is positive definite, so \( A_1 = B - \frac{1}{a_{11}}\mathbf{a}\mathbf{a}^H \in C^{n×n} \) is a Hermitian positive definite matrix.
Proof (cont'd).
By the inductive hypothesis, A1 can be factored as
\[
A_1 = P^H P,
\]
where P is an upper triangular matrix with real, positive diagonal elements. Therefore
\[
\begin{pmatrix} 1 & 0^H \\ 0 & A_1 \end{pmatrix}
= \begin{pmatrix} 1 & 0^H \\ 0 & P \end{pmatrix}^H
\begin{pmatrix} 1 & 0^H \\ 0 & P \end{pmatrix}.
\]
Thus,
\[
A = R_1^H \begin{pmatrix} 1 & 0^H \\ 0 & P \end{pmatrix}^H
\begin{pmatrix} 1 & 0^H \\ 0 & P \end{pmatrix} R_1.
\]
Proof (cont'd).
If R is defined as
\[
R = \begin{pmatrix} 1 & 0^H \\ 0 & P \end{pmatrix} R_1
= \begin{pmatrix} 1 & 0^H \\ 0 & P \end{pmatrix}
\begin{pmatrix} \sqrt{a_{11}} & \frac{1}{\sqrt{a_{11}}}\mathbf{a}^H \\ 0 & I_n \end{pmatrix}
= \begin{pmatrix} \sqrt{a_{11}} & \frac{1}{\sqrt{a_{11}}}\mathbf{a}^H \\ 0 & P \end{pmatrix},
\]
then A = R^H R and R is clearly an upper triangular matrix with real, positive diagonal elements. We refer to the matrix R as the Cholesky factor of A.
Corollary
If A ∈ C^{n×n} is a Hermitian positive definite matrix, then det(A) > 0.
Proof.
We have A = R^H R, where R is an upper triangular matrix with real, positive diagonal elements, so det(A) = det(R^H) det(R) = (det(R))². Since det(R) is the product of its diagonal elements, det(R) is a real, positive number, which implies det(A) > 0.
Example Let A be the symmetric matrix
\[
A = \begin{pmatrix} 3 & 0 & 2 \\ 0 & 2 & 1 \\ 2 & 1 & 2 \end{pmatrix}.
\]
This matrix is positive definite and it can be written as
\[
A = \begin{pmatrix} \sqrt{3} & 0 & 0 \\ 0 & 1 & 0 \\ \frac{2}{\sqrt{3}} & 0 & 1 \end{pmatrix}
\begin{pmatrix} 1 & 0 & 0 \\ 0 & 2 & 1 \\ 0 & 1 & \frac{2}{3} \end{pmatrix}
\begin{pmatrix} \sqrt{3} & 0 & \frac{2}{\sqrt{3}} \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix},
\]
because
\[
A_1 = \begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix}
- \frac{1}{3}\begin{pmatrix} 0 \\ 2 \end{pmatrix}\begin{pmatrix} 0 & 2 \end{pmatrix}
= \begin{pmatrix} 2 & 1 \\ 1 & \frac{2}{3} \end{pmatrix}.
\]
Example (cont'd).
Applying the same equality to A1 we have
\[
A_1 = \begin{pmatrix} \sqrt{2} & 0 \\ \frac{1}{\sqrt{2}} & 1 \end{pmatrix}
\begin{pmatrix} 1 & 0 \\ 0 & \frac{1}{6} \end{pmatrix}
\begin{pmatrix} \sqrt{2} & \frac{1}{\sqrt{2}} \\ 0 & 1 \end{pmatrix}.
\]
Since the matrix \( \left( \frac{1}{6} \right) \) can be factored directly, we have
\[
A_1 = \begin{pmatrix} \sqrt{2} & 0 \\ \frac{1}{\sqrt{2}} & 1 \end{pmatrix}
\begin{pmatrix} 1 & 0 \\ 0 & \frac{1}{\sqrt{6}} \end{pmatrix}
\begin{pmatrix} 1 & 0 \\ 0 & \frac{1}{\sqrt{6}} \end{pmatrix}
\begin{pmatrix} \sqrt{2} & \frac{1}{\sqrt{2}} \\ 0 & 1 \end{pmatrix}
= \begin{pmatrix} \sqrt{2} & 0 \\ \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{6}} \end{pmatrix}
\begin{pmatrix} \sqrt{2} & \frac{1}{\sqrt{2}} \\ 0 & \frac{1}{\sqrt{6}} \end{pmatrix}.
\]
Example (cont'd).
In turn, this implies
\[
\begin{pmatrix} 1 & 0 & 0 \\ 0 & 2 & 1 \\ 0 & 1 & \frac{2}{3} \end{pmatrix}
= \begin{pmatrix} 1 & 0 & 0 \\ 0 & \sqrt{2} & 0 \\ 0 & \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{6}} \end{pmatrix}
\begin{pmatrix} 1 & 0 & 0 \\ 0 & \sqrt{2} & \frac{1}{\sqrt{2}} \\ 0 & 0 & \frac{1}{\sqrt{6}} \end{pmatrix}.
\]
Example (cont'd).
The final Cholesky decomposition of A is
\[
A = \begin{pmatrix} \sqrt{3} & 0 & \frac{2}{\sqrt{3}} \\ 0 & \sqrt{2} & \frac{1}{\sqrt{2}} \\ 0 & 0 & \frac{1}{\sqrt{6}} \end{pmatrix}^H
\begin{pmatrix} \sqrt{3} & 0 & \frac{2}{\sqrt{3}} \\ 0 & \sqrt{2} & \frac{1}{\sqrt{2}} \\ 0 & 0 & \frac{1}{\sqrt{6}} \end{pmatrix}.
\]
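The recursive factorization in the proof amounts to the usual Cholesky recurrence. Below is a minimal Python sketch for real symmetric positive definite matrices (the Hermitian case would replace transposition by conjugate transposition), checked against the 3×3 example above; the function name `cholesky` is ours:

```python
import math

def cholesky(A):
    # returns upper triangular R with positive diagonal such that A = R^T R
    n = len(A)
    R = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            s = A[i][j] - sum(R[k][i] * R[k][j] for k in range(i))
            if i == j:
                R[i][i] = math.sqrt(s)   # fails if A is not positive definite
            else:
                R[i][j] = s / R[i][i]
    return R

A = [[3.0, 0.0, 2.0], [0.0, 2.0, 1.0], [2.0, 1.0, 2.0]]
R = cholesky(A)
# diagonal of R: sqrt(3), sqrt(2), 1/sqrt(6), as computed above
print([R[i][i] for i in range(3)])
```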
Cholesky's Decomposition Theorem can be extended to positive semidefinite matrices.
Theorem (Cholesky's Decomposition Theorem for Positive Semidefinite Matrices). Let A ∈ C^{n×n} be a Hermitian positive semidefinite matrix. There exists an upper triangular matrix R with real, non-negative diagonal elements such that A = R^H R.
Observe that for positive semidefinite matrices the diagonal elements of R are non-negative numbers and the uniqueness of R no longer holds.
Example. Let
\[
A = \begin{pmatrix} 1 & -1 \\ -1 & 1 \end{pmatrix}.
\]
Since x'Ax = (x1 − x2)², it is clear that A is a positive semidefinite but not a positive definite matrix. Let R be a matrix of the form
\[
R = \begin{pmatrix} r_1 & r \\ 0 & r_2 \end{pmatrix}
\]
such that A = R'R. It is easy to see that the last equality is equivalent to r1² = 1, r r1 = −1, and r² + r2² = 1, which forces r = −r1 and r2 = 0. Thus, if the sign restriction on the diagonal is dropped, we have two distinct Cholesky factors, namely
\[
\begin{pmatrix} 1 & -1 \\ 0 & 0 \end{pmatrix} \quad\text{and}\quad \begin{pmatrix} -1 & 1 \\ 0 & 0 \end{pmatrix}.
\]
Theorem
A Hermitian matrix A ∈ C^{n×n} is positive definite if and only if all its leading principal minors are positive.
Proof.
If A is positive definite, then every leading principal submatrix is positive definite, so each leading principal minor of A is positive.
Conversely, suppose that A ∈ C^{n×n} is a Hermitian matrix having positive leading principal minors. We prove by induction on n that A is positive definite. The base case, n = 1, is immediate. Suppose that the statement holds for matrices in C^{(n−1)×(n−1)}. Note that A can be written as
\[
A = \begin{pmatrix} B & \mathbf{b} \\ \mathbf{b}^H & a \end{pmatrix},
\]
where B ∈ C^{(n−1)×(n−1)} is a Hermitian matrix. Since the leading minors of B are the first n − 1 leading minors of A, it follows by the inductive hypothesis that B is positive definite. Thus, there exists a Cholesky decomposition B = R^H R, where R is an upper triangular matrix with real, positive diagonal elements. Since R is invertible, let w = (R^H)^{−1} b.
Proof (cont'd).
The matrix B is invertible. Therefore, we have det(A) = det(B)(a − b^H B^{−1} b) > 0. Since det(B) > 0 it follows that a > b^H B^{−1} b. Note that b^H B^{−1} b = w^H w, so a − w^H w > 0 and we can write a − w^H w = c² for some positive c. This allows us to write
\[
A = \begin{pmatrix} R^H & 0 \\ \mathbf{w}^H & c \end{pmatrix}
\begin{pmatrix} R & \mathbf{w} \\ 0^H & c \end{pmatrix} = C^H C,
\]
where C is the upper triangular matrix with positive diagonal elements
\[
C = \begin{pmatrix} R & \mathbf{w} \\ 0^H & c \end{pmatrix}.
\]
Since C is invertible, x^H A x = ‖Cx‖² > 0 for x ≠ 0, which implies the positive definiteness of A.
Let A, B ∈ C^{n×n}. We write A ≻ B if A − B ≻ O, that is, if A − B is a positive definite matrix. Similarly, we write A ⪰ B if A − B ⪰ O, that is, if A − B is positive semidefinite.
Theorem. Let A0, A1,..., Am be m + 1 Hermitian matrices in C^{n×n} such that A0 is positive definite. There exists a > 0 such that for any t ∈ [−a, a] the matrix B_m(t) = A0 + A1 t + ··· + Am t^m is positive definite.
Proof.
Since all matrices A0,..., Am are Hermitian, note that x^H A_i x are real numbers for 0 ≤ i ≤ m. Therefore, p_m(t) = x^H B_m(t) x is a polynomial in t with real coefficients and p_m(0) = x^H A0 x is a positive number if x ≠ 0. Since p_m is a continuous function, there exists an interval [−a, a] such that t ∈ [−a, a] implies p_m(t) > 0 if x ≠ 0; to choose a independently of x, it suffices to consider x on the unit sphere, which is compact. This shows that B_m(t) is positive definite.
The Gram-Schmidt Orthogonalization Algorithm
The Gram-Schmidt algorithm constructs an orthonormal basis for a subspace U of C^n, starting from an arbitrary basis {u1,...,um} of U. The orthonormal basis {w1,...,wm} is constructed sequentially such that ⟨w1,...,wk⟩ = ⟨u1,...,uk⟩ for 1 ≤ k ≤ m.
The Gram-Schmidt Algorithm
Note that W1 = ⟨w1⟩ = ⟨u1⟩, which allows us to define \( w_1 = \frac{1}{\|u_1\|} u_1 \). Note that ‖w1‖ = 1. Suppose that we have constructed {w1,...,wk} as an orthonormal basis for ⟨u1,...,uk⟩ and we seek to construct w_{k+1} such that {w1,...,wk, w_{k+1}} is an orthonormal basis for ⟨u1,...,uk, u_{k+1}⟩. The Fourier expansion of u_{k+1} relative to the orthonormal basis {w1,...,wk, w_{k+1}} is
\[
u_{k+1} = \sum_{i=1}^{k+1} (u_{k+1}, w_i) w_i,
\]
which implies that
\[
w_{k+1} = \frac{u_{k+1} - \sum_{i=1}^{k} (u_{k+1}, w_i) w_i}{(u_{k+1}, w_{k+1})}.
\]
Note that by the Fourier expansion of u_{k+1} with respect to the orthonormal set {w1,...,wk, w_{k+1}} we have
\[
\| u_{k+1} \|^2 = \sum_{i=1}^{k} (u_{k+1}, w_i)^2 + (u_{k+1}, w_{k+1})^2.
\]
Therefore,
\[
\left\| u_{k+1} - \sum_{i=1}^{k} (u_{k+1}, w_i) w_i \right\|^2
= \left( u_{k+1} - \sum_{i=1}^{k} (u_{k+1}, w_i) w_i,\; u_{k+1} - \sum_{i=1}^{k} (u_{k+1}, w_i) w_i \right)
= \| u_{k+1} \|^2 - 2 \sum_{i=1}^{k} (u_{k+1}, w_i)^2 + \sum_{i=1}^{k} (u_{k+1}, w_i)^2
= \| u_{k+1} \|^2 - \sum_{i=1}^{k} (u_{k+1}, w_i)^2
= (u_{k+1}, w_{k+1})^2.
\]
The last equalities imply
\[
\left\| u_{k+1} - \sum_{i=1}^{k} (u_{k+1}, w_i) w_i \right\| = |(u_{k+1}, w_{k+1})|.
\]
It follows that we can define
\[
w_{k+1} = \frac{u_{k+1} - \sum_{i=1}^{k} (u_{k+1}, w_i) w_i}{\left\| u_{k+1} - \sum_{i=1}^{k} (u_{k+1}, w_i) w_i \right\|},
\]
or as
\[
w_{k+1} = -\frac{u_{k+1} - \sum_{i=1}^{k} (u_{k+1}, w_i) w_i}{\left\| u_{k+1} - \sum_{i=1}^{k} (u_{k+1}, w_i) w_i \right\|}.
\]
Example.
Let A ∈ R^{3×2} be the matrix
\[
A = \begin{pmatrix} 1 & 1 \\ 0 & 0 \\ 1 & 3 \end{pmatrix}.
\]
It is easy to see that rank(A) = 2. Its columns u1, u2 belong to R^3 and we construct an orthonormal basis for the subspace generated by these columns. We begin by defining
\[
w_1 = \frac{1}{\| u_1 \|_2} u_1 = \begin{pmatrix} \frac{\sqrt{2}}{2} \\ 0 \\ \frac{\sqrt{2}}{2} \end{pmatrix}.
\]
Example (cont'd).
The second vector is
\[
w_2 = \frac{u_2 - (u_2, w_1) w_1}{\| u_2 - (u_2, w_1) w_1 \|}.
\]
Note that \( (u_2, w_1) = 1 \cdot \frac{\sqrt{2}}{2} + 3 \cdot \frac{\sqrt{2}}{2} = 2\sqrt{2} \), so
\[
\| u_2 - (u_2, w_1) w_1 \|
= \left\| \begin{pmatrix} 1 \\ 0 \\ 3 \end{pmatrix} - 2\sqrt{2} \begin{pmatrix} \frac{\sqrt{2}}{2} \\ 0 \\ \frac{\sqrt{2}}{2} \end{pmatrix} \right\|
= \left\| \begin{pmatrix} -1 \\ 0 \\ 1 \end{pmatrix} \right\| = \sqrt{2}.
\]
Thus,
\[
w_2 = \frac{1}{\sqrt{2}} \left( u_2 - (u_2, w_1) w_1 \right)
= \begin{pmatrix} -\frac{\sqrt{2}}{2} \\ 0 \\ \frac{\sqrt{2}}{2} \end{pmatrix}.
\]
Example (cont'd).
Thus, the orthonormal basis we are seeking consists of the vectors
\[
\begin{pmatrix} \frac{\sqrt{2}}{2} \\ 0 \\ \frac{\sqrt{2}}{2} \end{pmatrix}
\quad\text{and}\quad
\begin{pmatrix} -\frac{\sqrt{2}}{2} \\ 0 \\ \frac{\sqrt{2}}{2} \end{pmatrix}.
\]
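The computation above can be automated. A compact Python sketch of classical Gram-Schmidt on a list of vectors (helper names `dot` and `gram_schmidt` are ours), reproducing this example:

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def gram_schmidt(vectors):
    # produces an orthonormal list spanning the same subspace,
    # assuming the input vectors are linearly independent
    basis = []
    for u in vectors:
        w = list(u)
        for b in basis:
            c = dot(u, b)                      # Fourier coefficient (u, b)
            w = [wi - c * bi for wi, bi in zip(w, b)]
        norm = math.sqrt(dot(w, w))
        basis.append([wi / norm for wi in w])
    return basis

u1, u2 = [1.0, 0.0, 1.0], [1.0, 0.0, 3.0]     # columns of the matrix A above
w1, w2 = gram_schmidt([u1, u2])
print(w1)   # [sqrt(2)/2, 0, sqrt(2)/2]
print(w2)   # [-sqrt(2)/2, 0, sqrt(2)/2]
```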
Example
Let f1, f2, f3 ∈ C[0, 1] be the functions defined as
f1(x) = 1 + x,
f2(x) = 1 − x,
f3(x) = x²
for x ∈ [0, 1]. These functions are linearly independent because if a f1 + b f2 + c f3 = 0, we have the following implications:
x = 0 → a + b = 0,
x = 1 → 2a + c = 0,
x = 1/2 → (3a)/2 + b/2 + c/4 = 0,
which yield a = b = c = 0. Thus, the subspace ⟨f1, f2, f3⟩ of C[0, 1] has dimension 3.
Example (cont'd).
The set of pairwise orthonormal functions we seek is denoted by {g1, g2, g3}. For f1 we have
\[
\| f_1 \|^2 = \int_0^1 f_1^2 \, dx = \frac{7}{3},
\]
so \( g_1 = \frac{f_1}{\|f_1\|} = \frac{\sqrt{3}}{\sqrt{7}} (1 + x) \).
Example (cont'd).
The function g2 is
\[
g_2 = \frac{f_2 - (f_2, g_1) g_1}{\| f_2 - (f_2, g_1) g_1 \|}.
\]
We have
\[
f_2 - (f_2, g_1) g_1
= 1 - x - \left( \int_0^1 (1-x) \frac{\sqrt{3}}{\sqrt{7}} (1+x) \, dx \right) \frac{\sqrt{3}}{\sqrt{7}} (1+x)
= 1 - x - \frac{3}{7} \left( \int_0^1 (1 - x^2) \, dx \right) (1+x)
= 1 - x - \frac{3}{7} \cdot \frac{2}{3} (1+x)
= \frac{5}{7} - \frac{9}{7} x.
\]
Since \( \left\| \frac{5}{7} - \frac{9}{7} x \right\|^2 = \int_0^1 \left( \frac{5}{7} - \frac{9}{7} x \right)^2 dx = \frac{1}{7} \), it follows that \( g_2(x) = \sqrt{7} \left( \frac{5}{7} - \frac{9}{7} x \right) \) for x ∈ [0, 1].
Example (cont'd).
The third function is
\[
g_3 = \frac{f_3 - (f_3, g_1) g_1 - (f_3, g_2) g_2}{\| f_3 - (f_3, g_1) g_1 - (f_3, g_2) g_2 \|}.
\]
The inner products (f3, g1) and (f3, g2) are
\[
(f_3, g_1) = \int_0^1 x^2 \frac{\sqrt{3}}{\sqrt{7}} (1+x) \, dx = \frac{\sqrt{21}}{12},
\qquad
(f_3, g_2) = \int_0^1 x^2 \sqrt{7} \left( \frac{5}{7} - \frac{9}{7} x \right) dx = -\frac{\sqrt{7}}{12}.
\]
Example (cont'd).
We have:
\[
f_3 - (f_3, g_1) g_1 - (f_3, g_2) g_2
= x^2 - \frac{\sqrt{21}}{12} \cdot \frac{\sqrt{3}}{\sqrt{7}} (1+x) + \frac{\sqrt{7}}{12} \sqrt{7} \left( \frac{5}{7} - \frac{9}{7} x \right)
= x^2 - \frac{1}{4}(1+x) + \frac{5}{12} - \frac{3}{4} x
= x^2 - x + \frac{1}{6},
\]
and
\[
\| f_3 - (f_3, g_1) g_1 - (f_3, g_2) g_2 \|^2
= \int_0^1 \left( x^2 - x + \frac{1}{6} \right)^2 dx = \frac{1}{180},
\]
which yields
\[
g_3 = \sqrt{180} \left( x^2 - x + \frac{1}{6} \right).
\]
Example (cont'd).
We obtained the set of pairwise orthonormal functions:
\[
g_1 = \frac{\sqrt{3}}{\sqrt{7}} (1+x),
\qquad
g_2 = \sqrt{7} \left( \frac{5}{7} - \frac{9}{7} x \right),
\qquad
g_3 = \sqrt{180} \left( x^2 - x + \frac{1}{6} \right)
\]
for x ∈ [0, 1], which generate the same subspace as f1(x) = 1 + x, f2(x) = 1 − x, f3(x) = x².
Theorem. Let L = (v1,...,vm) be a sequence of m vectors in R^n. We have
\[
\det(G_L) \le \prod_{j=1}^{m} \| v_j \|_2^2.
\]
The equality takes place only if the vectors of L are pairwise orthogonal.
Proof.
Suppose that L is linearly independent and construct the orthonormal set {y1,...,ym}, where
\[
y_j = b_{j1} v_1 + \dots + b_{jj} v_j
\]
for 1 ≤ j ≤ m, using the Gram-Schmidt algorithm. Since b_{jj} ≠ 0, it follows that we can write
\[
v_j = c_{j1} y_1 + \dots + c_{jj} y_j
\]
for 1 ≤ j ≤ m, so that (v_j, y_p) = 0 if j < p and (v_j, y_p) = c_{jp} if p ≤ j. Thus, we have
\[
(v_1, \dots, v_m) = (y_1, \dots, y_m)
\begin{pmatrix}
(v_1, y_1) & (v_2, y_1) & \cdots & (v_m, y_1) \\
0 & (v_2, y_2) & \cdots & (v_m, y_2) \\
\vdots & \vdots & & \vdots \\
0 & 0 & \cdots & (v_m, y_m)
\end{pmatrix}.
\]
Proof (cont'd).
This implies
\[
G_L = \begin{pmatrix}
(v_1, v_1) & \cdots & (v_1, v_m) \\
\vdots & & \vdots \\
(v_m, v_1) & \cdots & (v_m, v_m)
\end{pmatrix}
= \begin{pmatrix}
(v_1, y_1) & 0 & \cdots & 0 \\
(v_2, y_1) & (v_2, y_2) & \cdots & 0 \\
\vdots & \vdots & & \vdots \\
(v_m, y_1) & (v_m, y_2) & \cdots & (v_m, y_m)
\end{pmatrix}
\begin{pmatrix}
(v_1, y_1) & (v_2, y_1) & \cdots & (v_m, y_1) \\
0 & (v_2, y_2) & \cdots & (v_m, y_2) \\
\vdots & \vdots & & \vdots \\
0 & 0 & \cdots & (v_m, y_m)
\end{pmatrix}.
\]
Therefore, we have
\[
\det(G_L) = \prod_{i=1}^{m} (v_i, y_i)^2 \le \prod_{i=1}^{m} (v_i, v_i),
\]
because (v_i, y_i)² ≤ (v_i, v_i)(y_i, y_i) and (y_i, y_i) = 1 for 1 ≤ i ≤ m. To have det(G_L) = ∏ (v_i, v_i) we must have v_i = k_i y_i, that is, the vectors v_i must be pairwise orthogonal.
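Hadamard's inequality for Gramians is easy to check numerically. A Python sketch (the helpers `dot` and `det` are ours; `det` uses Gaussian elimination without pivoting, which suffices for the Gram matrix chosen here) compares det(G_L) with the product of squared norms:

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def det(M):
    # determinant by Gaussian elimination (no pivoting)
    M = [row[:] for row in M]
    n, d = len(M), 1.0
    for i in range(n):
        d *= M[i][i]
        if M[i][i] == 0.0:
            return 0.0
        for r in range(i + 1, n):
            f = M[r][i] / M[i][i]
            M[r] = [a - f * b for a, b in zip(M[r], M[i])]
    return d

L = [[1.0, 0.0, 1.0], [1.0, 1.0, 0.0], [0.0, 1.0, 1.0]]
G = [[dot(u, v) for v in L] for u in L]
gramian = det(G)                  # det(G_L) = 4 for these vectors
bound = 1.0
for v in L:
    bound *= dot(v, v)            # product of squared norms = 8
print(gramian, bound)             # det(G_L) <= product of squared norms
```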
Theorem. Let V be a finite-dimensional inner product linear space. If U is an orthonormal set of vectors, then there exists a basis T of V that consists of orthonormal vectors such that U ⊆ T.
Proof.
Let U = {u1,...,um} be an orthonormal set of vectors in V. By the Extension Corollary, there is an extension Z = {u1,...,um, u_{m+1},...,un} of U to a basis of V, where n = dim(V). Now, apply the Gram-Schmidt algorithm to the set Z to produce an orthonormal basis W = {w1,...,wn} for the entire space V. It is easy to see that w_i = u_i for 1 ≤ i ≤ m, so U ⊆ W and W is an orthonormal basis of V that extends the set U.
Corollary
If A is an (m × n)-matrix with m > n having an orthonormal set of columns, then there exists an (m × (m − n))-matrix B such that (A B) is an orthogonal (unitary) square matrix.
Corollary
Let U be a subspace of an n-dimensional inner product linear space V such that dim(U) = m, where m < n. Then dim(U⊥) = n − m.
Proof.
Let u1,..., um be an orthonormal basis of U, and let
u1,..., um, um+1,..., un
be its completion to an orthonormal basis for V . Then, um+1,..., un is a basis of the orthogonal complement U⊥, so dim(U⊥) = n − m.
Theorem
A subspace U of R^n is m-dimensional if and only if it is the set of solutions of a homogeneous linear system Ax = 0, where A ∈ R^{(n−m)×n} is a full-rank matrix.
Proof.
Suppose that U is an m-dimensional subspace of R^n. If v1,...,v_{n−m} is a basis of the orthogonal complement of U, then v_i' x = 0 for every x ∈ U and 1 ≤ i ≤ n − m. These conditions are equivalent to the equality
\[
(v_1\; v_2\; \cdots\; v_{n-m})' \, x = 0,
\]
which shows that U is the set of solutions of the homogeneous linear system Ax = 0, where A = (v1 v2 ··· v_{n−m})'.
Conversely, if A ∈ R^{(n−m)×n} is a full-rank matrix, then the set of solutions of the homogeneous system Ax = 0 is the null space of A and, therefore, is an m-dimensional subspace.
QR Decomposition of Matrices
Theorem. Let A ∈ C^{m×n} be a full-rank matrix with rank(A) = n ≤ m. Then A can be factored as A = QR, where Q ∈ C^{m×n} and R ∈ C^{n×n} such that
1. the columns of Q constitute an orthonormal basis for Img(A), and
2. R is an upper triangular invertible matrix whose diagonal elements are real positive numbers, that is, r_{ii} > 0 for 1 ≤ i ≤ n.
Proof.
Let u1,...,un be the columns of A. Since rank(A) = n, these columns constitute a basis for Img(A). Starting from u1,...,un, construct an orthonormal basis w1,...,wn for Img(A) using the Gram-Schmidt algorithm. Define Q as the matrix with orthonormal columns
Q = (w1,...,wn).
We have ⟨u1,...,uk⟩ = ⟨w1,...,wk⟩ for 1 ≤ k ≤ n, so
\[
u_k = r_{1k} w_1 + \dots + r_{kk} w_k
= Q \begin{pmatrix} r_{1k} \\ \vdots \\ r_{kk} \\ 0 \\ \vdots \\ 0 \end{pmatrix}.
\]
Proof (cont'd).
We may assume that r_{kk} > 0; otherwise, that is, if r_{kk} < 0, replace w_k by −w_k. Clearly, rank(Q) = n. Since rank(A) ≤ min{rank(Q), rank(R)}, it follows that rank(R) = n, so R is an invertible matrix. Therefore, r_{kk} > 0 for 1 ≤ k ≤ n.
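The proof is constructive: the Gram-Schmidt coefficients are exactly the entries of R. A Python sketch for a real full-rank matrix given as a list of columns (the helpers `dot` and `qr_gram_schmidt` are ours), checked against the 3×2 matrix of the earlier example:

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def qr_gram_schmidt(cols):
    # cols: the n columns of A (each a list of m numbers), assumed full rank
    n = len(cols)
    Q, R = [], [[0.0] * n for _ in range(n)]
    for k, u in enumerate(cols):
        w = list(u)
        for i, q in enumerate(Q):
            R[i][k] = dot(u, q)              # r_ik = (u_k, w_i)
            w = [wi - R[i][k] * qi for wi, qi in zip(w, q)]
        R[k][k] = math.sqrt(dot(w, w))       # positive by full rank
        Q.append([wi / R[k][k] for wi in w])
    return Q, R

cols = [[1.0, 0.0, 1.0], [1.0, 0.0, 3.0]]    # columns of the matrix A above
Q, R = qr_gram_schmidt(cols)
# A = QR: column k of A equals sum_i R[i][k] * Q[i]
print(R)
```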
Gram-Schmidt Algorithm in R
Several packages in R implement the Gram-Schmidt algorithm: far, pracma, etc. Select a mirror, install a package, and consult its documentation.
Example.
Consider the matrix
\[
A = \begin{pmatrix} 1 & 1 \\ 0 & 1 \\ 1 & 0 \end{pmatrix}.
\]
Using the function orthonormalization of the far package, this can be entered as

> mat <- matrix(c(1,0,1,1,1,0), nrow=3, ncol=2)
> orth1 <- orthonormalization(mat, basis=FALSE, norm=FALSE)
> orth1
     [,1] [,2]
[1,]    1  0.5
[2,]    0  1.0
[3,]    1 -0.5
> orth1 <- orthonormalization(mat, basis=TRUE, norm=TRUE)
> orth1
          [,1]       [,2]       [,3]
[1,] 0.7071068  0.4082483 -0.5773503
[2,] 0.0000000  0.8164966  0.5773503
[3,] 0.7071068 -0.4082483  0.5773503