
Inner Products and Norms (part III)

Prof. Dan A. Simovici

UMB

1 / 74 Outline

1 Approximating Subspaces

2 Gram Matrices

3 The Gram-Schmidt Orthogonalization Algorithm

4 QR Decomposition of Matrices

5 Gram-Schmidt Algorithm in R

2 / 74 Approximating Subspaces

Definition
A subspace T of an inner product linear space L is an approximating subspace if for every x ∈ L there is a unique element in T that is closest to x.

Theorem
Let T be a subspace of the inner product linear space L. If x ∈ L and t ∈ T, then x − t ∈ T⊥ if and only if t is the unique element of T closest to x.

3 / 74 Approximating Subspaces Proof

Suppose that x − t ∈ T⊥. Then, for any u ∈ T with u ≠ t, we have

‖x − u‖² = ‖(x − t) + (t − u)‖² = ‖x − t‖² + ‖t − u‖²,

by observing that x − t ∈ T⊥ and t − u ∈ T and applying Pythagoras' Theorem to x − t and t − u. Therefore, we have ‖x − u‖² > ‖x − t‖², so t is the unique element of T closest to x.

4 / 74 Approximating Subspaces Proof (cont’d)

Conversely, suppose that t is the unique element of T closest to x and x − t ∉ T⊥, that is, there exists u ∈ T such that (x − t, u) ≠ 0. This implies, of course, that u ≠ 0_L. We have

‖x − (t + au)‖² = ‖x − t − au‖² = ‖x − t‖² − 2(x − t, au) + |a|² ‖u‖².

Since ‖x − (t + au)‖² > ‖x − t‖² (by the definition of t), we have −2(x − t, au) + |a|² ‖u‖² > 0 for every a ∈ F, a ≠ 0. For a = (1/‖u‖²)(x − t, u) we have:

−2(x − t, (1/‖u‖²)(x − t, u)u) + |(1/‖u‖²)(x − t, u)|² ‖u‖²
= −2 |(x − t, u)|²/‖u‖² + |(x − t, u)|²/‖u‖²
= −|(x − t, u)|²/‖u‖² < 0,

which is a contradiction.

5 / 74 Approximating Subspaces

Theorem
A subspace T of an inner product linear space L is an approximating subspace of L if and only if L = T ⊕ T⊥.

6 / 74 Approximating Subspaces Proof

Let T be an approximating subspace of L and let x ∈ L. We have x − t ∈ T⊥, where t is the element of T that best approximates x. If y = x − t, we can write x uniquely as x = t + y, where t ∈ T and y ∈ T⊥, so L = T ⊕ T⊥.
Conversely, suppose that L = T ⊕ T⊥, where T is a subspace of L. Every x ∈ L can be uniquely written as x = t + y, where t ∈ T and y ∈ T⊥, so x − t ∈ T⊥. Thus t is the element in T that is closest to x, so T is an approximating subspace of L.

7 / 74 Approximating Subspaces

Theorem Any subspace T of a finite-dimensional inner product linear space L is an approximating subspace of L.

8 / 74 Approximating Subspaces

Let T be a subspace of L. It suffices to show that L = T ⊕ T⊥. If T = {0_L}, then T⊥ = L and the statement is immediate. Therefore, we can assume that T ≠ {0_L}. We need to verify only that every x ∈ L can be uniquely written as a sum x = t + v, where t ∈ T and v ∈ T⊥. Let t1,..., tm be an orthonormal basis of T, that is, a basis such that

(ti, tj) = 1 if i = j, and (ti, tj) = 0 otherwise,

for 1 ≤ i, j ≤ m. Define t = (x, t1)t1 + ··· + (x, tm)tm and v = x − t.

9 / 74 Approximating Subspaces Proof (cont’d)

The vector v is orthogonal to every vector ti because

(v, ti ) = (x − t, ti ) = (x, ti ) − (t, ti ) = 0.

Therefore v ∈ T⊥ and x has the necessary decomposition. To prove that the decomposition is unique, suppose that x = s + w, where s ∈ T and w ∈ T⊥. Since s + w = t + v we have s − t = v − w ∈ T ∩ T⊥ = {0_L}, which implies s = t and w = v.
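The decomposition above is easy to experiment with numerically. A minimal sketch (the subspace basis and the vector x below are illustrative choices, not from the slides), projecting x onto T and checking that the residual lies in T⊥:

```python
import numpy as np

# T = span of the columns of `basis` in R^3; an orthonormal basis of T is
# obtained from numpy's QR factorization.
basis = np.array([[1.0, 0.0],
                  [0.0, 1.0],
                  [1.0, 0.0]])
Q, _ = np.linalg.qr(basis)          # columns of Q: orthonormal basis of T

x = np.array([1.0, 2.0, 3.0])
t = Q @ (Q.T @ x)                   # t = sum_i (x, t_i) t_i, the closest point of T to x
v = x - t                           # the residual y = x - t

# x - t lies in T^perp: it is orthogonal to every basis vector of T
print(np.allclose(basis.T @ v, 0))  # True
```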

10 / 74 Approximating Subspaces

Theorem
Let T be a subspace of a finite-dimensional inner product space L. We have (T⊥)⊥ = T.

11 / 74 Approximating Subspaces

Observe that T ⊆ (T⊥)⊥. Indeed, if t ∈ T, then (t, z) = 0 for every z ∈ T⊥, so t ∈ (T⊥)⊥.
To prove the reverse inclusion, let x ∈ (T⊥)⊥. We can write x = u + v, where u ∈ T and v ∈ T⊥, so x − u = v ∈ T⊥. Since T ⊆ (T⊥)⊥, we have u ∈ (T⊥)⊥, so x − u ∈ (T⊥)⊥. Consequently, x − u ∈ T⊥ ∩ (T⊥)⊥ = {0_L}, so x = u ∈ T. Thus, (T⊥)⊥ ⊆ T, which concludes the argument.

12 / 74 Approximating Subspaces

Corollary
Let Z be a subset of Rⁿ. We have (Z⊥)⊥ = ⟨Z⟩.

13 / 74 Approximating Subspaces Proof

Let Z be a subset of Rⁿ. Since Z ⊆ ⟨Z⟩ it follows that ⟨Z⟩⊥ ⊆ Z⊥. Let now y ∈ Z⊥ and let z = a1z1 + ··· + apzp ∈ ⟨Z⟩, where z1,..., zp ∈ Z. Since (y, z) = a1(y, z1) + ··· + ap(y, zp) = 0, it follows that y ∈ ⟨Z⟩⊥. Thus, we have Z⊥ = ⟨Z⟩⊥.
This allows us to write (Z⊥)⊥ = (⟨Z⟩⊥)⊥. Since ⟨Z⟩ is a subspace of Rⁿ, we have (⟨Z⟩⊥)⊥ = ⟨Z⟩, so (Z⊥)⊥ = ⟨Z⟩.

14 / 74 Approximating Subspaces

Let W = {w1,..., wn} be a basis in the real n-dimensional inner product space L. If x = x1w1 + ··· + xnwn and y = y1w1 + ··· + ynwn, then

(x, y) = Σ_{i=1}^{n} Σ_{j=1}^{n} xi yj (wi, wj),

due to the bilinearity of the inner product.
Let A = (aij) ∈ R^{n×n} be the matrix defined by aij = (wi, wj) for 1 ≤ i, j ≤ n. The symmetry of the inner product implies that the matrix A itself is symmetric. Now, the inner product can be expressed as

(x, y) = (x1,..., xn) A (y1,..., yn)′.

We refer to A as the matrix associated with W .

15 / 74 Approximating Subspaces

Theorem
Let S be a subspace of Rⁿ such that dim(S) = k. There exists a matrix A ∈ R^{n×k} having orthonormal columns such that S = Ran(A).

16 / 74 Approximating Subspaces Proof

Let v1,..., vk be an orthonormal basis of S. Define the matrix A as A = (v1,..., vk). We have x ∈ S if and only if x = a1v1 + ··· + akvk, which is equivalent to x = Aa. This amounts to x ∈ Ran(A), so S = Ran(A).

17 / 74 Approximating Subspaces

For an orthonormal basis in an n-dimensional space, the associated matrix is the identity matrix In. In this case, we have

(x, y) = x1y1 + x2y2 + ··· + xnyn

for x, y ∈ L. Observe that if W = {w1,..., wn} is an orthonormal set and x ∈ ⟨W⟩, which means that x = a1w1 + ··· + anwn, then ai = (x, wi) for 1 ≤ i ≤ n.

18 / 74 Approximating Subspaces

Let W = {w1,..., wn} be an orthonormal set and let x ∈ ⟨W⟩. The equality

x = (x, w1)w1 + ··· + (x, wn)wn

is the Fourier expansion of x with respect to the orthonormal set W. Furthermore, we have Parseval's equality:

‖x‖² = (x, x) = Σ_{i=1}^{n} (x, wi)².

Thus, if 1 ≤ q ≤ n we have

Σ_{i=1}^{q} (x, wi)² ≤ ‖x‖².
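A numerical illustration of the Fourier expansion and Parseval's equality (a sketch with an arbitrary orthonormal basis of R³ built from a seeded random matrix, not from the slides):

```python
import numpy as np

# Build an orthonormal basis w_1, w_2, w_3 of R^3 as the Q factor of a
# random matrix, then expand a vector x in that basis.
rng = np.random.default_rng(0)
W, _ = np.linalg.qr(rng.normal(size=(3, 3)))    # columns: orthonormal basis
x = np.array([1.0, -2.0, 0.5])

coeffs = W.T @ x                                # a_i = (x, w_i)
x_rec = W @ coeffs                              # Fourier expansion of x

print(np.allclose(x_rec, x))                    # True: expansion reconstructs x
print(np.isclose(np.sum(coeffs**2), x @ x))     # True: Parseval's equality
```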

19 / 74 Gram Matrices

Definition

Let V = (v1,..., vm) be a sequence of vectors in an inner product space. The Gram matrix of this sequence is the matrix GV = (gij) ∈ R^{m×m} defined by gij = (vi, vj) for 1 ≤ i, j ≤ m.

Note that GV is a symmetric matrix.

20 / 74 Gram Matrices

Theorem

Let V = (v1,..., vm) be a sequence of m vectors in an inner product linear space (L, (·, ·)). If {v1,..., vm} is linearly independent, then the Gram matrix GV is positive definite.

21 / 74 Gram Matrices Proof

Suppose that V is linearly independent and let x ∈ Rᵐ. We have

x′GVx = Σ_{i=1}^{m} Σ_{j=1}^{m} xi (vi, vj) xj
      = (Σ_{i=1}^{m} xi vi, Σ_{j=1}^{m} xj vj)
      = ‖Σ_{i=1}^{m} xi vi‖² ≥ 0.

Therefore, if x′GVx = 0, we have x1v1 + ··· + xmvm = 0. Since {v1,..., vm} is linearly independent it follows that x1 = ··· = xm = 0, so x = 0m. Thus, GV is indeed positive definite.
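The theorem is easy to check numerically. In this sketch (the vectors are illustrative choices, not from the slides), the Gram matrix of an independent family has positive eigenvalues, while a dependent family yields a singular Gram matrix:

```python
import numpy as np

# Gram matrix G = V'V of a linearly independent family: symmetric and
# positive definite.
V = np.array([[1.0, 1.0, 0.0],
              [0.0, 1.0, 1.0],
              [1.0, 0.0, 1.0]])            # columns v_1, v_2, v_3: independent
G = V.T @ V                                # g_ij = (v_i, v_j)
print(np.allclose(G, G.T))                 # True: symmetric
print(np.all(np.linalg.eigvalsh(G) > 0))   # True: positive definite

# For a dependent family the Gram matrix is only positive semidefinite:
W = np.array([[1.0, 2.0],
              [2.0, 4.0],
              [0.0, 0.0]])                 # second column = 2 * first
print(np.isclose(np.linalg.det(W.T @ W), 0.0))   # True: singular Gram matrix
```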

22 / 74 Gram Matrices

Example
Let S = {x1,..., xn} be a finite set, L a C-linear space, and L^S the linear space that consists of the functions defined on S with values in L. We define a linear basis {e1,..., en} of this space consisting of the functions

ei(x) = 1 if x = xi, and ei(x) = 0 otherwise,

for 1 ≤ i ≤ n. If E = (e1,..., en), the Gram matrix of E is positive definite and the inner product of two functions f = Σ_{i=1}^{n} ai ei and g = Σ_{j=1}^{n} bj ej is

(f, g) = (Σ_{i=1}^{n} ai ei, Σ_{j=1}^{n} bj ej) = Σ_{i=1}^{n} Σ_{j=1}^{n} ai (ei, ej) bj.

23 / 74 Gram Matrices

The Gram matrix of an arbitrary sequence is positive semidefinite, as the reader can easily verify. Definition

Let V = (v1,..., vm) be a sequence of m elements of an inner product real linear space. The Gramian of V is the number det(GV).

24 / 74 Gram Matrices

Theorem

If V = (v1,..., vm) is a sequence of elements in an inner product real linear space, then V is linearly independent if and only if det(GV) ≠ 0.

25 / 74 Gram Matrices Proof

Suppose that det(GV) ≠ 0 and that V is not linearly independent. In other words, there exist numbers a1,..., am, at least one of which is not 0, such that a1v1 + ··· + amvm = 0. This implies the equalities

a1(v1, vj) + ··· + am(vm, vj) = 0

for 1 ≤ j ≤ m, so the system GVa = 0m has the non-trivial solution a = (a1,..., am)′. This implies det(GV) = 0, which contradicts the initial assumption.
Conversely, suppose that V is linearly independent and det(GV) = 0. Then, the linear system

a1(v1, vj) + ··· + am(vm, vj) = 0,

for 1 ≤ j ≤ m, has a non-trivial solution a1,..., am. If w = a1v1 + ··· + amvm, this amounts to (w, vi) = 0 for 1 ≤ i ≤ m. This, in turn, implies (w, w) = 0,

so w = 0, which contradicts the linear independence of V.

26 / 74 Gram Matrices

If L is an inner product complex linear space, then the Gram matrix of V is a Hermitian matrix.
The next theorem shows that every positive definite matrix A ∈ C^{n×n} can be regarded as a Gram matrix of a vector sequence. It has a very important counterpart in the framework of Hilbert spaces known as Mercer's Theorem.
Theorem
(Cholesky's Decomposition Theorem) Let A ∈ C^{n×n} be a Hermitian positive definite matrix. There exists a unique upper triangular matrix R with real, positive diagonal elements such that A = R^H R.

27 / 74 Gram Matrices Proof

The argument is by induction on n ≥ 1. The base step, n = 1, is immediate.
Suppose that a decomposition exists for all Hermitian positive definite matrices of order n, and let A ∈ C^{(n+1)×(n+1)} be a Hermitian positive definite matrix. We can write

A = ( a11  a^H )
    ( a    B   ),

where B ∈ C^{n×n}. Note that a11 > 0 and B is a Hermitian positive definite matrix. It is easy to verify the identity:

A = ( √a11         0  ) ( 1  0                 ) ( √a11  (1/√a11) a^H )
    ( (1/√a11) a   In ) ( 0  B − (1/a11) a a^H ) ( 0     In           ).

28 / 74 Gram Matrices Proof (cont’d)

Let R1 ∈ C^{(n+1)×(n+1)} be the upper triangular non-singular matrix

R1 = ( √a11  (1/√a11) a^H )
     ( 0     In           ).

This allows us to write

A = R1^H ( 1  0  ) R1,
         ( 0  A1 )

where A1 = B − (1/a11) a a^H. Since

( 1  0  ) = (R1^H)⁻¹ A R1⁻¹,
( 0  A1 )

the matrix on the left is positive definite, so A1 = B − (1/a11) a a^H ∈ C^{n×n} is a Hermitian positive definite matrix.

29 / 74 Gram Matrices Proof (cont'd)

By the inductive hypothesis, A1 can be factored as

A1 = P^H P,

where P is an upper triangular matrix with real, positive diagonal elements. Therefore

( 1  0  ) = ( 1  0   ) ( 1  0 ).
( 0  A1 )   ( 0  P^H ) ( 0  P )

Thus,

A = R1^H ( 1  0   ) ( 1  0 ) R1.
         ( 0  P^H ) ( 0  P )

30 / 74 Gram Matrices Proof (cont’d)

If R is defined as

R = ( 1  0 ) R1 = ( √a11  (1/√a11) a^H )
    ( 0  P )      ( 0     P            ),

then A = R^H R and R is clearly an upper triangular matrix with real, positive diagonal elements. We refer to the matrix R as the Cholesky factor of A.

31 / 74 Gram Matrices

Corollary

If A ∈ C^{n×n} is a Hermitian positive definite matrix, then det(A) > 0.

32 / 74 Gram Matrices Proof

We have A = R^H R, where R is an upper triangular matrix with real, positive diagonal elements, so det(A) = det(R^H) det(R) = (det(R))². Since det(R) is the product of its diagonal elements, det(R) is a real, positive number, which implies det(A) > 0.

33 / 74 Gram Matrices

Example Let A be the symmetric matrix

A = ( 3  0  2 )
    ( 0  2  1 )
    ( 2  1  2 ).

This matrix is positive definite and it can be written as

A = ( √3    0  0 ) ( 1  0  0   ) ( √3  0  2/√3 )
    ( 0     1  0 ) ( 0  2  1   ) ( 0   1  0    )
    ( 2/√3  0  1 ) ( 0  1  2/3 ) ( 0   0  1    ),

because

A1 = ( 2  1 ) − (1/3) ( 0 ) ( 0  2 ) = ( 2  1   ).
     ( 1  2 )         ( 2 )            ( 1  2/3 )

34 / 74 Gram Matrices Example (cont’d)

Applying the same equality to A1 we have

A1 = ( √2    0 ) ( 1  0   ) ( √2  1/√2 ).
     ( 1/√2  1 ) ( 0  1/6 ) ( 0   1    )

Since the matrix (1/6) can be factored directly we have

A1 = ( √2    0 ) ( 1  0    ) ( 1  0    ) ( √2  1/√2 )
     ( 1/√2  1 ) ( 0  1/√6 ) ( 0  1/√6 ) ( 0   1    )

   = ( √2    0    ) ( √2  1/√2 ).
     ( 1/√2  1/√6 ) ( 0   1/√6 )

35 / 74 Gram Matrices Example (cont’d)

In turn, this implies

    1 0 0  1 0 0 1√ 0 0 √ 0 2 √1  0 2 1 = 0 2 0  2 , 1 1   0 1 2 0 √ √ 0 0 √1 3 2 6 6

36 / 74 Gram Matrices Example (cont’d)

The final Cholesky decomposition of A is

A = ( √3    0  0 ) ( 1  0     0    ) ( 1  0   0    ) ( √3  0  2/√3 )
    ( 0     1  0 ) ( 0  √2    0    ) ( 0  √2  1/√2 ) ( 0   1  0    )
    ( 2/√3  0  1 ) ( 0  1/√2  1/√6 ) ( 0  0   1/√6 ) ( 0   0  1    )

  = ( √3    0     0    ) ( √3  0   2/√3 ).
    ( 0     √2    0    ) ( 0   √2  1/√2 )
    ( 2/√3  1/√2  1/√6 ) ( 0   0   1/√6 )
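The factor obtained in this example can be verified numerically (a quick sanity check of the worked computation):

```python
import numpy as np

# The matrix A of the example and the upper triangular factor R obtained above.
A = np.array([[3.0, 0.0, 2.0],
              [0.0, 2.0, 1.0],
              [2.0, 1.0, 2.0]])
s3, s2, s6 = np.sqrt(3.0), np.sqrt(2.0), np.sqrt(6.0)
R = np.array([[s3,  0.0, 2/s3],
              [0.0, s2,  1/s2],
              [0.0, 0.0, 1/s6]])

print(np.allclose(R.T @ R, A))                   # True: A = R'R
print(np.allclose(R, np.linalg.cholesky(A).T))   # True: matches numpy's factor
```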

37 / 74 Gram Matrices

Cholesky's Decomposition Theorem can be extended to positive semidefinite matrices.
Theorem (Cholesky's Decomposition Theorem for Positive Semidefinite Matrices)
Let A ∈ C^{n×n} be a Hermitian positive semidefinite matrix. There exists an upper triangular matrix R with real, non-negative diagonal elements such that A = R^H R.

38 / 74 Gram Matrices

Observe that for positive semidefinite matrices, the diagonal elements of R are non-negative numbers and the uniqueness of R no longer holds.
Example
Let

A = (  1  −1 ).
    ( −1   1 )

Since x′Ax = (x1 − x2)², it is clear that A is a positive semidefinite but not a positive definite matrix. Let R be a matrix of the form

R = ( r1  r  )
    ( 0   r2 )

such that A = R′R. It is easy to see that the last equality is equivalent to r1² = 1, r r1 = −1, and r² + r2² = 1, which force r = −r1 and r2 = 0. Thus, we have two distinct Cholesky factors, namely

( 1  −1 )  and  ( −1  1 ).
( 0   0 )       (  0  0 )
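A quick numerical check of this non-uniqueness (the sign-flipped pair of upper triangular factors used here both reproduce A):

```python
import numpy as np

# Two different upper triangular matrices R with A = R'R: the Cholesky factor
# of a positive semidefinite matrix need not be unique.
A = np.array([[ 1.0, -1.0],
              [-1.0,  1.0]])
factors = (np.array([[ 1.0, -1.0], [0.0, 0.0]]),
           np.array([[-1.0,  1.0], [0.0, 0.0]]))

checks = [np.allclose(R.T @ R, A) for R in factors]
print(checks)   # [True, True]
```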

39 / 74 Gram Matrices

Theorem

A Hermitian matrix A ∈ C^{n×n} is positive definite if and only if all its leading principal minors are positive.

40 / 74 Gram Matrices Proof

If A is positive definite, then every leading principal submatrix of A is positive definite, so each leading principal minor of A is positive.
Conversely, suppose that A ∈ C^{n×n} is a Hermitian matrix having positive leading principal minors. We prove by induction on n that A is positive definite. The base case, n = 1, is immediate. Suppose that the statement holds for matrices in C^{(n−1)×(n−1)}. Note that A can be written as

A = ( B    b )
    ( b^H  a ),

where B ∈ C^{(n−1)×(n−1)} is a Hermitian matrix. Since the leading minors of B are the first n − 1 leading minors of A, it follows, by the inductive hypothesis, that B is positive definite. Thus, there exists a factorization B = R^H R, where R is an upper triangular matrix with real, positive diagonal elements. Since R is invertible, let w = (R^H)⁻¹ b.

41 / 74 Gram Matrices Proof (cont’d)

The matrix B is invertible. Therefore, we have det(A) = det(B)(a − b^H B⁻¹ b) > 0. Since det(B) > 0 it follows that a − b^H B⁻¹ b > 0. Because b^H B⁻¹ b = b^H (R^H R)⁻¹ b = w^H w, we can write a − w^H w = c² for some positive c. This allows us to write

A = ( R^H  0 ) ( R   w ) = C^H C,
    ( w^H  c ) ( 0′  c )

where C is the upper triangular matrix with real, positive diagonal elements

C = ( R   w )
    ( 0′  c ).

This implies immediately the positive definiteness of A.
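The criterion lends itself to a short sketch (the helper name `is_positive_definite` is an illustrative choice, not from the slides): compute each leading principal minor and check its sign.

```python
import numpy as np

def is_positive_definite(A: np.ndarray) -> bool:
    """Sylvester-style test: all leading principal minors of a Hermitian A
    must be positive."""
    return all(np.linalg.det(A[:k, :k]).real > 0
               for k in range(1, A.shape[0] + 1))

# The positive definite matrix from the Cholesky example above:
A = np.array([[3.0, 0.0, 2.0],
              [0.0, 2.0, 1.0],
              [2.0, 1.0, 2.0]])
print(is_positive_definite(A))                    # True

# An indefinite symmetric matrix fails the test:
B = np.array([[1.0, 2.0],
              [2.0, 1.0]])
print(is_positive_definite(B))                    # False
```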

42 / 74 Gram Matrices

Let A, B ∈ C^{n×n}. We write A ≻ B if A − B ≻ O, that is, if A − B is a positive definite matrix. Similarly, we write A ⪰ B if A − B ⪰ O, that is, if A − B is positive semidefinite.
Theorem

Let A0, A1,..., Am be m + 1 matrices in C^{n×n} such that A0 is positive definite and all matrices are Hermitian. There exists a > 0 such that for any t ∈ [−a, a] the matrix Bm(t) = A0 + A1t + ··· + Am t^m is positive definite.

43 / 74 Gram Matrices Proof

Since all matrices A0,..., Am are Hermitian, note that x^H Ai x are real numbers for 0 ≤ i ≤ m. Therefore, pm(t) = x^H Bm(t) x is a polynomial in t with real coefficients and pm(0) = x^H A0 x is a positive number if x ≠ 0. Since pm is a continuous function there exists an interval [−a, a] such that t ∈ [−a, a] implies pm(t) > 0 if x ≠ 0. This shows that Bm(t) is positive definite.

44 / 74 The Gram-Schmidt Orthogonalization Algorithm

The Gram-Schmidt algorithm constructs an orthonormal basis for a subspace U of Cⁿ, starting from an arbitrary basis {u1,..., um} of U. The orthonormal basis is constructed sequentially such that ⟨w1,..., wk⟩ = ⟨u1,..., uk⟩ for 1 ≤ k ≤ m.

45 / 74 The Gram-Schmidt Orthogonalization Algorithm The Gram-Schmidt Algorithm

Note that W1 = ⟨w1⟩ = ⟨u1⟩, which allows us to define w1 = (1/‖u1‖) u1. Note that ‖w1‖ = 1.
Suppose that we have constructed an orthonormal basis {w1,..., wk} for ⟨u1,..., uk⟩ and we seek to construct wk+1 such that {w1,..., wk, wk+1} is an orthonormal basis for ⟨u1,..., uk, uk+1⟩. The Fourier expansion of uk+1 relative to the orthonormal basis {w1,..., wk, wk+1} is

uk+1 = Σ_{i=1}^{k+1} (uk+1, wi) wi,

which implies that

wk+1 = (uk+1 − Σ_{i=1}^{k} (uk+1, wi) wi) / (uk+1, wk+1).

46 / 74 The Gram-Schmidt Orthogonalization Algorithm

Note that by the Fourier expansion of uk+1 with respect to the orthonormal set {w1,..., wk, wk+1} we have

‖uk+1‖² = Σ_{i=1}^{k} (uk+1, wi)² + (uk+1, wk+1)².

Therefore,

‖uk+1 − Σ_{i=1}^{k} (uk+1, wi) wi‖²
= (uk+1 − Σ_{i=1}^{k} (uk+1, wi) wi, uk+1 − Σ_{i=1}^{k} (uk+1, wi) wi)
= ‖uk+1‖² − 2 Σ_{i=1}^{k} (uk+1, wi)² + Σ_{i=1}^{k} (uk+1, wi)²
= ‖uk+1‖² − Σ_{i=1}^{k} (uk+1, wi)² = (uk+1, wk+1)².

47 / 74 The Gram-Schmidt Orthogonalization Algorithm

The last equalities imply

‖uk+1 − Σ_{i=1}^{k} (uk+1, wi) wi‖ = |(uk+1, wk+1)|.

It follows that we can define

wk+1 = (uk+1 − Σ_{i=1}^{k} (uk+1, wi) wi) / ‖uk+1 − Σ_{i=1}^{k} (uk+1, wi) wi‖,

or as

wk+1 = −(uk+1 − Σ_{i=1}^{k} (uk+1, wi) wi) / ‖uk+1 − Σ_{i=1}^{k} (uk+1, wi) wi‖.
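The step just described can be sketched in a few lines of NumPy (a minimal illustration assuming linearly independent columns; the sample matrix is an illustrative choice):

```python
import numpy as np

def gram_schmidt(U: np.ndarray) -> np.ndarray:
    """Orthonormalize the (assumed independent) columns of U sequentially."""
    W = np.zeros_like(U, dtype=float)
    for k in range(U.shape[1]):
        u = U[:, k]
        # subtract the Fourier components of u along w_1, ..., w_k
        w = u - W[:, :k] @ (W[:, :k].T @ u)
        W[:, k] = w / np.linalg.norm(w)      # normalize (the '+' sign choice)
    return W

U = np.array([[1.0, 1.0, 0.0],
              [0.0, 1.0, 1.0],
              [1.0, 0.0, 1.0]])
W = gram_schmidt(U)
print(np.allclose(W.T @ W, np.eye(3)))   # True: columns are orthonormal
```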

48 / 74 The Gram-Schmidt Orthogonalization Algorithm Example

Let A ∈ R^{3×2} be the matrix

A = ( 1  1 )
    ( 0  0 )
    ( 1  3 ).

It is easy to see that rank(A) = 2. Denoting the columns of A by u1 and u2, we have {u1, u2} ⊆ R³, and we construct an orthonormal basis for the subspace generated by these columns. We begin by defining

w1 = (1/‖u1‖₂) u1 = ( √2/2 )
                    (  0   )
                    ( √2/2 ).

49 / 74 The Gram-Schmidt Orthogonalization Algorithm Example cont’d

The second vector is w2 = (u2 − (u2, w1)w1)/‖u2 − (u2, w1)w1‖. Note that (u2, w1) = 1 · (√2/2) + 3 · (√2/2) = 2√2, so

u2 − (u2, w1)w1 = ( 1 )       ( √2/2 )   ( −1 )
                  ( 0 ) − 2√2 (  0   ) = (  0 ),
                  ( 3 )       ( √2/2 )   (  1 )

and ‖u2 − (u2, w1)w1‖ = √2.

Thus,

w2 = (1/√2)(u2 − (u2, w1)w1) = ( −√2/2 )
                               (   0   )
                               (  √2/2 ).

50 / 74 The Gram-Schmidt Orthogonalization Algorithm Example cont’d

Thus, the orthonormal basis we are seeking consists of the vectors:

( √2/2 )      ( −√2/2 )
(  0   )  and (   0   ).
( √2/2 )      (  √2/2 )
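The same result can be reproduced with NumPy's QR routine, which orthonormalizes the columns of A and agrees with w1, w2 above up to the sign of each column:

```python
import numpy as np

A = np.array([[1.0, 1.0],
              [0.0, 0.0],
              [1.0, 3.0]])
Q, _ = np.linalg.qr(A)

s = np.sqrt(2.0) / 2.0
expected = np.array([[s,  -s],
                     [0.0, 0.0],
                     [s,   s]])

# compare column by column, allowing a sign flip chosen by the QR routine
match = [np.allclose(Q[:, j], expected[:, j]) or np.allclose(Q[:, j], -expected[:, j])
         for j in range(2)]
print(match)   # [True, True]
```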

51 / 74 The Gram-Schmidt Orthogonalization Algorithm

Example

Let f1, f2, f3 ∈ C[0, 1] be the functions defined as

f1(x) = 1 + x,
f2(x) = 1 − x,
f3(x) = x²

for x ∈ [0, 1]. These functions are linearly independent because if af1 + bf2 + cf3 = 0, we have the following implications:

x = 0 → a + b = 0,
x = 1 → 2a + c = 0,
x = 1/2 → 3a/2 + b/2 + c/4 = 0,

which yield a = b = c = 0. Thus, the subspace ⟨f1, f2, f3⟩ of C[0, 1] has dimension 3.

52 / 74 The Gram-Schmidt Orthogonalization Algorithm Example (cont’d)

The set of pairwise orthonormal functions we seek is denoted by {g1, g2, g3}. For f1 we have

‖f1‖² = ∫_0^1 f1² dx = ∫_0^1 (1 + x)² dx = 7/3,

so g1 = f1/‖f1‖ = (√3/√7)(1 + x).

53 / 74 The Gram-Schmidt Orthogonalization Algorithm Example (cont’d)

The function g2 is

g2 = (f2 − (f2, g1)g1)/‖f2 − (f2, g1)g1‖.

We have

f2 − (f2, g1)g1 = 1 − x − (∫_0^1 (1 − x) · (√3/√7)(1 + x) dx) (√3/√7)(1 + x)
= 1 − x − (3/7) (∫_0^1 (1 − x²) dx) (1 + x)
= 1 − x − (3/7) · (2/3) · (1 + x) = 5/7 − (9/7)x.

Since ‖5/7 − (9/7)x‖² = ∫_0^1 (5/7 − (9/7)x)² dx = 1/7, it follows that g2(x) = √7 (5/7 − (9/7)x) for x ∈ [0, 1].

54 / 74 The Gram-Schmidt Orthogonalization Algorithm Example (cont’d)

The third function is

g3 = (f3 − (f3, g1)g1 − (f3, g2)g2)/‖f3 − (f3, g1)g1 − (f3, g2)g2‖.

The inner products (f3, g1) and (f3, g2) are

(f3, g1) = ∫_0^1 x² (√3/√7)(1 + x) dx = √21/12,
(f3, g2) = ∫_0^1 x² √7 (5/7 − (9/7)x) dx = −√7/12.

55 / 74 The Gram-Schmidt Orthogonalization Algorithm Example (cont’d)

We have:

f3 − (f3, g1)g1 − (f3, g2)g2
= x² − (√21/12)(√3/√7)(1 + x) + (√7/12) √7 (5/7 − (9/7)x)
= x² − (1/4)(1 + x) + 5/12 − (3/4)x
= x² − x + 1/6,

and

‖f3 − (f3, g1)g1 − (f3, g2)g2‖² = ∫_0^1 (x² − x + 1/6)² dx = 1/180,

which yields

g3 = √180 (x² − x + 1/6).

56 / 74 The Gram-Schmidt Orthogonalization Algorithm Example (cont’d)

We obtained the set of pairwise orthonormal functions:

g1 = (√3/√7)(1 + x),
g2 = √7 (5/7 − (9/7)x),
g3 = √180 (x² − x + 1/6)

for x ∈ [0, 1], which generate the same subspace as

f1(x) = 1 + x,
f2(x) = 1 − x,
f3(x) = x².

57 / 74 The Gram-Schmidt Orthogonalization Algorithm

Theorem
If L = (v1,..., vm) is a sequence of m vectors in Rⁿ, then

det(GL) ≤ Π_{j=1}^{m} ‖vj‖₂².

The equality takes place only if the vectors of L are pairwise orthogonal.
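The inequality and its equality case are easy to observe numerically (a sketch with seeded random vectors, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(1)
V = rng.normal(size=(4, 3))                     # columns v_1, v_2, v_3 in R^4
G = V.T @ V                                     # Gram matrix of the sequence
bound = np.prod(np.sum(V**2, axis=0))           # product of the squared norms

print(bool(np.linalg.det(G) <= bound))          # True: the inequality holds

Q, _ = np.linalg.qr(V)                          # orthonormal columns
print(np.isclose(np.linalg.det(Q.T @ Q), 1.0))  # True: equality case, both sides 1
```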

58 / 74 The Gram-Schmidt Orthogonalization Algorithm Proof

Suppose that L is linearly independent (if L is linearly dependent, then det(GL) = 0 and the inequality is trivial) and construct the orthonormal set {y1,..., ym} with

yj = bj1 v1 + ··· + bjj vj

for 1 ≤ j ≤ m, using the Gram-Schmidt algorithm. Since bjj ≠ 0, it follows that we can write

vj = cj1 y1 + ··· + cjj yj

for 1 ≤ j ≤ m, so that (vj, yp) = 0 if j < p and (vj, yp) = cjp if p ≤ j. Thus, we have

(v1,..., vm) = (y1,..., ym) ( (v1, y1)  (v2, y1)  ···  (vm, y1) )
                            (    0      (v2, y2)  ···  (vm, y2) )
                            (    0         0      ···  (vm, y3) )
                            (    ⋮         ⋮               ⋮    )
                            (    0         0      ···  (vm, ym) )

59 / 74 The Gram-Schmidt Orthogonalization Algorithm Proof

This implies

( (v1, v1)  ···  (v1, vm) )   ( v1′ )
(    ⋮             ⋮      ) = (  ⋮  ) (v1,..., vm)
( (vm, v1)  ···  (vm, vm) )   ( vm′ )

= ( (v1, y1)     0      ···     0     ) ( (v1, y1)  (v2, y1)  ···  (vm, y1) )
  ( (v2, y1)  (v2, y2)  ···     0     ) (    0      (v2, y2)  ···  (vm, y2) )
  (    ⋮         ⋮              ⋮     ) (    ⋮         ⋮               ⋮    )
  ( (vm, y1)  (vm, y2)  ···  (vm, ym) ) (    0         0      ···  (vm, ym) ).

Therefore, we have

det(GL) = Π_{i=1}^{m} (vi, yi)² ≤ Π_{i=1}^{m} (vi, vi),

because (vi, yi)² ≤ (vi, vi)(yi, yi) and (yi, yi) = 1 for 1 ≤ i ≤ m. To have det(GL) = Π_{i=1}^{m} (vi, vi) we must have vi = ki yi, that is, the vectors vi must be pairwise orthogonal.

60 / 74 The Gram-Schmidt Orthogonalization Algorithm

Theorem
Let V be a finite-dimensional inner product linear space. If U is an orthonormal set of vectors, then there exists a basis T of V that consists of orthonormal vectors such that U ⊆ T.

61 / 74 The Gram-Schmidt Orthogonalization Algorithm

Let U = {u1,..., um} be an orthonormal set of vectors in V. By the Extension Corollary, there is an extension Z = {u1,..., um, um+1,..., un} of U to a basis of V, where n = dim(V). Now, apply the Gram-Schmidt algorithm to the set Z to produce an orthonormal basis W = {w1,..., wn} for the entire space V. It is easy to see that wi = ui for 1 ≤ i ≤ m, so U ⊆ W and W is an orthonormal basis of V that extends the set U.

62 / 74 The Gram-Schmidt Orthogonalization Algorithm

Corollary

If A is an (m × n)-matrix with m > n having an orthonormal set of columns, then there exists an (m × (m − n))-matrix B such that (A B) is an orthogonal (unitary) matrix.

63 / 74 The Gram-Schmidt Orthogonalization Algorithm

Corollary

Let U be a subspace of an n-dimensional inner product linear space V such that dim(U) = m, where m < n. Then dim(U⊥) = n − m.

64 / 74 The Gram-Schmidt Orthogonalization Algorithm Proof

Let u1,..., um be an orthonormal basis of U, and let

u1,..., um, um+1,..., un

be its completion to an orthonormal basis for V . Then, um+1,..., un is a basis of the orthogonal complement U⊥, so dim(U⊥) = n − m.

65 / 74 The Gram-Schmidt Orthogonalization Algorithm

Theorem

A subspace U of Rⁿ is m-dimensional if and only if it is the set of solutions of a homogeneous linear system Ax = 0, where A ∈ R^{(n−m)×n} is a full-rank matrix.

66 / 74 The Gram-Schmidt Orthogonalization Algorithm Proof

Suppose that U is an m-dimensional subspace of Rⁿ. If v1,..., v(n−m) is a basis of the orthogonal complement of U, then vi′x = 0 for every x ∈ U and 1 ≤ i ≤ n − m. These conditions are equivalent to the equality

Ax = 0,

where A is the full-rank matrix whose rows are v1′, v2′,..., v(n−m)′, which shows that U is the set of solutions of a homogeneous linear system.
Conversely, if A ∈ R^{(n−m)×n} is a full-rank matrix, then the set of solutions of the homogeneous system Ax = 0 is the null space of A and, therefore, is an m-dimensional subspace.

67 / 74 QR Decomposition of Matrices QR Factorization

Theorem m×n Let A ∈ C be a full-rank matrix with rank(A) = n 6 m. Then A can m×n n×n be factored as A = QR, where Q ∈ C and R ∈ C such that 1 the columns of Q constitute an orthonormal basis for Img(A), and 2 R is an upper triangular such that its diagonal elements are real non-negative numbers, that is, rii > 0 for 1 6 i 6 n.

68 / 74 QR Decomposition of Matrices Proof

Let u1,..., un be the columns of A. Since rank(A) = n, these columns constitute a basis for Img(A). Starting from u1,..., un, construct an orthonormal basis w1,..., wn for Img(A) using the Gram-Schmidt algorithm. Define Q as the matrix

Q = (w1,..., wn).

We have ⟨u1,..., uk⟩ = ⟨w1,..., wk⟩ for 1 ≤ k ≤ n, so

uk = r1k w1 + ··· + rkk wk = Q (r1k,..., rkk, 0,..., 0)′.

69 / 74 QR Decomposition of Matrices

We may assume that rkk > 0; otherwise, that is, if rkk < 0, replace wk by −wk. Clearly, rank(Q) = n. Since rank(A) ≤ min{rank(Q), rank(R)}, it follows that rank(R) = n, so R is an invertible matrix. Therefore, rkk > 0 for 1 ≤ k ≤ n.
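The sign convention used in the proof can be sketched with NumPy, whose QR routine does not guarantee rkk > 0: flipping the sign of each offending column of Q and row of R restores the convention without changing the product.

```python
import numpy as np

A = np.array([[1.0, 1.0],
              [0.0, 0.0],
              [1.0, 3.0]])       # full-rank, the earlier example matrix
Q, R = np.linalg.qr(A)

signs = np.sign(np.diag(R))
signs[signs == 0] = 1.0          # full rank implies no zeros; kept for safety
Q, R = Q * signs, (R.T * signs).T   # Q[:, k] -> s_k Q[:, k], R[k, :] -> s_k R[k, :]

print(np.allclose(Q @ R, A))     # True: still a QR factorization of A
print(np.all(np.diag(R) > 0))    # True: positive diagonal, as in the theorem
```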

70 / 74 Gram-Schmidt Algorithm in R What to do in R

Several packages in R implement the Gram-Schmidt algorithm: far, pracma, etc. Select a mirror, install a package, and consult its documentation.

71 / 74 Gram-Schmidt Algorithm in R Example

Consider the matrix

A = ( 1  1 )
    ( 0  1 )
    ( 1  0 ).

This can be entered as

> mat <- matrix(c(1,0,1,1,1,0),nrow=3,ncol=2)
> orth1 <- orthonormalization(mat,basis=FALSE,norm=FALSE)
> orth1

 [, 1] [, 2]  [1, ] 1 0.5    [2, ] 0 1.0  [3, ] 1 −0.5


73 / 74 Gram-Schmidt Algorithm in R

> orth1 <- orthonormalization(mat,basis=TRUE,norm=TRUE)
> orth1

 [, 1] [, 2] [, 3]  [1, ] 0.7071068 0.4082483 −0.5773503   [2, ] 0.0000000 0.8164966 0.5773503  [3, ] 0.7071068 −0.4082483 0.5773503

74 / 74